Development of Anomaly Detection System Based on Distributed Log Tracing
https://doi.org/10.25205/1818-7900-2023-21-1-62-72
Abstract
Software system developers must respond quickly to failures in order to avoid reputational and financial losses for their customers, so behavioral anomalies in the operation of software systems need to be detected promptly. Various tools for automatic system monitoring are under active development, but logs remain the main tool for analyzing failures, since they record the state of the system at various points of execution. Modern systems often have a distributed microservice architecture, which significantly complicates log analysis: logs are collected centrally from many microservices and form a huge stream of information that is very difficult to analyze manually. Distributed tracing solves the problem of identifying the logs that belong to a specific request to the system, and this opens up wide opportunities for automatic analysis. Many solutions for detecting anomalies in logs already exist, but they do not take advantage of distributed tracing. This article addresses the problem of detecting behavioral anomalies in the operation of distributed software systems based on automatic analysis of log traces. The proposed solution combines several machine learning methods. Log traces are first preprocessed and cleaned using process mining methods. Log messages are then vectorized and clustered. Finally, a long short-term memory (LSTM) network is used to detect deviations in the sequences of processed logs. As a result of this work, a prototype of the anomaly detection system was developed and tested.
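The role of distributed tracing in the pipeline above can be illustrated with a minimal Python sketch. It is not the article's implementation: the record fields `trace_id`, `ts`, and `cluster_id`, the helper names, and the window size are all hypothetical, and the process mining, vectorization, clustering, and LSTM steps are omitted. It only shows how centrally collected logs might be regrouped into per-request sequences and cut into (history, next event) pairs of the kind a sequence model is commonly trained on.

```python
from collections import defaultdict


def group_by_trace(records):
    """Group log records into per-trace sequences of cluster IDs.

    Each record is assumed (hypothetically) to carry a trace ID, a timestamp,
    and the ID of the cluster its message was assigned to.
    """
    traces = defaultdict(list)
    for rec in records:
        traces[rec["trace_id"]].append(rec)
    # Order each trace by timestamp and keep only the cluster IDs.
    return {
        trace_id: [r["cluster_id"] for r in sorted(recs, key=lambda r: r["ts"])]
        for trace_id, recs in traces.items()
    }


def sliding_windows(sequence, window):
    """Yield (history, next_event) pairs from one trace sequence."""
    for i in range(len(sequence) - window):
        yield tuple(sequence[i:i + window]), sequence[i + window]


# Interleaved records from two requests, as a central collector might see them.
records = [
    {"trace_id": "a", "ts": 2, "cluster_id": 7},
    {"trace_id": "b", "ts": 1, "cluster_id": 3},
    {"trace_id": "a", "ts": 1, "cluster_id": 3},
    {"trace_id": "a", "ts": 3, "cluster_id": 9},
]

sequences = group_by_trace(records)
pairs = list(sliding_windows(sequences["a"], window=2))
```

A model trained on such pairs from normal traces could flag a trace as anomalous when the observed next event is one it considers unlikely; how the article's prototype scores deviations is not specified in the abstract.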
About the Author
D. A. Khudyakov
Russian Federation
Daniil A. Khudyakov, Student
Novosibirsk
For citations:
Khudyakov D.A. Development of Anomaly Detection System Based on Distributed Log Tracing. Vestnik NSU. Series: Information Technologies. 2023;21(1):62-72. (In Russ.) https://doi.org/10.25205/1818-7900-2023-21-1-62-72