
Vestnik NSU. Series: Information Technologies


Further training of the CodeBERT model for writing comments on SQL queries

https://doi.org/10.25205/1818-7900-2024-22-3-28-39

Abstract

Automated generation of source-code comments is a pressing topic in software development, where machine-translation models are used to “translate” code into textual descriptions. The CodeBERT model, pre-trained on six programming languages, is used for code search, documentation generation, and error correction. Because the model captures the semantics of natural language and of programming languages, as well as the connections between them, it is well suited for further training (fine-tuning) on various applied tasks involving code. The article discusses further training of the CodeBERT model for generating comments on SQL queries. This task is relevant because large projects may contain many SQL queries of varying complexity, and comments improve their readability and understanding; however, writing comments manually and keeping them up to date costs developers time and effort. The article proposes using the pre-trained CodeBERT model to generate comments on SQL code automatically, which saves time and keeps comments current. For further training, open datasets containing SQL queries together with comments on them are used. The test results show that the further-trained model successfully copes with the task of generating comments for SQL queries, which is also confirmed by the obtained values of the BLEU metric.
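As a minimal illustration of the kind of pipeline the abstract describes, the sketch below fine-tunes CodeBERT as the encoder of a sequence-to-sequence model that maps SQL queries to comments, using the Hugging Face transformers library. The dataset fields, hyperparameters, and the choice to warm-start the decoder from CodeBERT as well are assumptions for illustration, not the author's exact setup.

```python
# A minimal sketch (assumed setup, not the author's exact configuration):
# fine-tuning CodeBERT inside an encoder-decoder model for SQL -> comment.
import torch
from transformers import RobertaTokenizerFast, EncoderDecoderModel

tokenizer = RobertaTokenizerFast.from_pretrained("microsoft/codebert-base")

# Warm-start both encoder and decoder from CodeBERT; the decoder's
# cross-attention weights are freshly initialized and learned during tuning.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "microsoft/codebert-base", "microsoft/codebert-base"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def training_step(sql_batch, comment_batch):
    """One cross-entropy training step on a batch of (SQL, comment) pairs."""
    inputs = tokenizer(sql_batch, padding=True, truncation=True,
                       max_length=256, return_tensors="pt")
    targets = tokenizer(comment_batch, padding=True, truncation=True,
                        max_length=64, return_tensors="pt")
    labels = targets.input_ids.clone()
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    loss = model(input_ids=inputs.input_ids,
                 attention_mask=inputs.attention_mask,
                 labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# After training, comments are generated with beam search:
model.eval()
query = "SELECT name, COUNT(*) FROM orders GROUP BY name HAVING COUNT(*) > 5"
ids = tokenizer(query, return_tensors="pt").input_ids
out = model.generate(ids, num_beams=5, max_length=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Generated comments can then be scored against reference comments with a BLEU implementation, matching the evaluation reported in the abstract.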

About the Author

D. A. Komlev
NUST MISIS
Russian Federation

Danila A. Komlev, Graduate Student

Moscow



References

1. What Is Declarative Programming? URL: https://codefresh.io/learn/infrastructure-as-code/declarative-vs-imperative-programming-4-key-differences/

2. Vaswani A., Shazeer N., Parmar N. et al. Attention Is All You Need. URL: https://arxiv.org/abs/1706.03762

3. Masked Language Modeling in BERT. URL: https://www.scaler.com/topics/nlp/masked-language-model-explained/

4. Clark K., Luong M.-T., Manning C. D. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. URL: https://openreview.net/pdf?id=r1xMH1BtvB

5. Natural Language Processing: Bleu Score. URL: https://www.baeldung.com/cs/nlp-bleu-score

6. N-grams. URL: https://deepai.org/machine-learning-glossary-and-terms/n-gram

7. CodeXGLUE. URL: https://microsoft.github.io/CodeXGLUE/

8. Cross Entropy. URL: https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html

9. Beam Search. URL: https://d2l.ai/chapter_recurrent-modern/beam-search.html

10. Zhu Q., Zhang W., Zhou L., Liu T. Learning to Start for Sequence to Sequence Architecture. 2016. URL: https://www.researchgate.net/publication/306357583_Learning_to_Start_for_Sequence_to_Sequence_Architecture



For citations:


Komlev D.A. Further training of the CodeBERT model for writing comments on SQL queries. Vestnik NSU. Series: Information Technologies. 2024;22(3):28-39. (In Russ.) https://doi.org/10.25205/1818-7900-2024-22-3-28-39



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-7900 (Print)
ISSN 2410-0420 (Online)