References

intechngu

Вестник НГУ. Серия: Информационные технологии

Vestnik NSU. Series: Information Technologies

1818-79002410-0420

НГУ

10.25205/1818-7900-2024-22-2-57-67

intechngu-271

Research Article

Статьи

Использование цепей Маркова для автоматического завершения исходного кода программы

Code Completion Using Markov Chains

https://orcid.org/0009-0008-1060-4613

Тимофеев

В. С.

Timofeev

V. S.

Тимофеев Владислав Сергеевич, магистрант

Новосибирск

Vladislav S. Timofeev, Master’s Student

Novosibirsk

v.timofeev@g.nsu.ru

Новосибирский государственный университетРоссияNovosibirsk State UniversityRussian Federation

2024

10102024

2225767

2024

Тимофеев В.С.

Timofeev V.S.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://intechngu.elpub.ru/jour/article/view/271

В сфере программирования используются разнообразные инструменты с целью оптимизации процесса разработки. Среди них особое место занимают интегрированные среды разработки (IDE), обеспечивающие широкий спектр сервисов, включая текстовый редактор, отладчик и интеллектуальное завершение кода. Настоящая работа посвящена разработке модели, направленной на предсказание вариантов завершения исходного кода программы. Для улучшения точности модели были использованы комбинации цепей Маркова, основанные на различных методах вычисления текущего контекста программы: линейном и с использованием абстрактного синтаксического дерева (AST). Линейный метод анализа контекста представляет собой анализ токенизированного представления исходного кода, в то время как второй метод использует структуру исходного кода в виде AST. Объединение различных моделей позволяет сохранить больше семантической информации о коде и учитывать при автодополнении индивидуальный стиль написания кода. Разработанная модель демонстрирует высокую точность предсказаний при минимальном объеме вычислительных ресурсов, что делает ее применимой в интегрированных средах разработки.

Modern software engineers use many tools to speed up the development process. Many of them use integrated development environments (IDEs), which provide services such as text editors, debuggers and even intelligent code completion. This paper is dedicated to the development of a model for predicting variants of program source code termination. To improve the accuracy of the model, we used combinations of Markov chains constructed using different ways of calculating the current context of the program: linear and with AST. The linear way of computing the context is an analysis of the tokenized representation of the source code. The second method, on the other hand, uses a representation of the source code in the form of an abstract syntax tree. Combining the different models preserves more semantic information about the code, also adding the ability to support custom code writing style features. In order to compare the different models, a new dataset has been created specifically for the Pascal language. A detailed comparison of the working mechanisms as well as the prediction accuracy on the collected data is given. The proposed model showed high enough accuracy of predictions with minimal computation costs, which allows using it in integrated development environments.

автоматическое завершение кодацепи Маркованейронные сетиинтегрированные среды разработкиязык программирования Pascal

code completionMarkov chainsneural networksintegrated development environmentPascal programming language

References1

Marasoiu M., Church L., Blackwell A. An empirical investigation of code completion usage by professional software developers. Annual Workshop of the Psychology of Programming Interest Group, 2015.

Han S., Wallace D. R., Miller R. C. Code Completion from Abbreviated Input. 2009 IEEE/ ACM International Conference on Automated Software Engineering, Nov. 2009, pр. 332–343. DOI: 10.1109/ase.2009.64

Hou D., Pletcher D. M. An evaluation of the strategies of sorting, filtering, and grouping API methods for Code Completion. 2011 27th IEEE International Conference on Software Maintenance (ICSM), Sep. 2011, pр. 233–242. DOI: 10.1109/icsm.2011.6080790

Ginzberg A., Kostas L., Balakrishnan T. Automatic code completion. Stanford, Class Project, 2017.

Svyatkovskiy A., Zhao Y., Fu S., Sundaresan N. Pythia: AI-assisted Code Completion System. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Jul. 2019, pр. 2727–2735. DOI: 10.1145/3292500.3330699

Buksbaum D. Increasing Code Completion Accuracy in Pythia Models for Non-Standard Python Libraries. Doctoral dissertation. Nova Southeastern University, 2023, https://nsuworks.nova.edu/gscis_etd/1188

Eysholdt M., Behrens H. Xtext: implement your language faster than the quick and dirty way. Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion, Oct. 2010, pр. 307–309. DOI: 10.1145/1869542.1869625

Hellendoorn V. J., Devanbu P. Are deep neural networks the best choice for modeling source code? Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, Aug. 2017, pр. 763–773. DOI: 10.1145/3106237.3106290

Chan K. C., Lenard C. T., Mills T. M. An introduction to Markov chains. 49th Annual Conference of Mathematical Association of Victoria, 2012, pp. 40–47. DOI: 10.13140/2.1.1833.8248

Cormen T. H., Leiserson C. E., Rivest R. L., Stein C. Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill, 2001, pp. 214–217.

Wu B., Liang B., Zhang X. Turn tree into graph: Automatic code review via simplified AST driven graph convolutional network. Knowledge-Based Systems, 2022, pp. 109450. DOI: 10.1016/j.knosys.2022.109450

The authors declare that there are no conflicts of interest present.