
Vestnik NSU. Series: Information Technologies


Further training of the CodeBERT model for writing comments on SQL queries

https://doi.org/10.25205/1818-7900-2024-22-3-28-39

Abstract

Automated generation of source-code comments is a pressing topic in software development, where machine-translation models are used to “translate” code into textual descriptions. The CodeBERT model, pre-trained on six programming languages, is used for code search, documentation generation, and error correction. Because the model captures the semantics of natural language and of programming languages, as well as the connections between them, it is well suited for further training (fine-tuning) on various applied tasks involving code. The article discusses further training of the CodeBERT model for generating comments on SQL queries. This task is relevant because large projects may contain many SQL queries of varying complexity, and comments improve their readability and understanding; however, writing comments manually and keeping them up to date costs developers time and effort. The article proposes using the pre-trained CodeBERT model to generate comments on SQL code automatically, which saves time and keeps comments current. For further training, open datasets containing SQL queries together with comments on them are used. The test results show that the further-trained model successfully copes with the task of generating comments for SQL queries, which is also confirmed by the obtained values of the BLEU metric.
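As a minimal illustration of the kind of pipeline the abstract describes, the sketch below fine-tunes CodeBERT as the encoder of a sequence-to-sequence model that maps SQL queries to comments, using the Hugging Face transformers library. The dataset fields, hyperparameters, and the choice to warm-start the decoder from CodeBERT as well are assumptions for illustration, not the author's exact setup.

```python
# A minimal sketch (assumed setup, not the author's exact configuration):
# fine-tuning CodeBERT inside an encoder-decoder model for SQL -> comment.
import torch
from transformers import RobertaTokenizerFast, EncoderDecoderModel

tokenizer = RobertaTokenizerFast.from_pretrained("microsoft/codebert-base")

# Warm-start both encoder and decoder from CodeBERT; the decoder's
# cross-attention weights are freshly initialized and learned during tuning.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "microsoft/codebert-base", "microsoft/codebert-base"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def training_step(sql_batch, comment_batch):
    """One cross-entropy training step on a batch of (SQL, comment) pairs."""
    inputs = tokenizer(sql_batch, padding=True, truncation=True,
                       max_length=256, return_tensors="pt")
    targets = tokenizer(comment_batch, padding=True, truncation=True,
                        max_length=64, return_tensors="pt")
    labels = targets.input_ids.clone()
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    loss = model(input_ids=inputs.input_ids,
                 attention_mask=inputs.attention_mask,
                 labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# After training, comments are generated with beam search:
model.eval()
query = "SELECT name, COUNT(*) FROM orders GROUP BY name HAVING COUNT(*) > 5"
ids = tokenizer(query, return_tensors="pt").input_ids
out = model.generate(ids, num_beams=5, max_length=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Generated comments can then be scored against reference comments with a BLEU implementation, matching the evaluation reported in the abstract.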

About the Author

D. A. Komlev
NUST MISIS
Russian Federation

Danila A. Komlev, Graduate Student

Moscow



References

1. What Is Declarative Programming? URL: https://codefresh.io/learn/infrastructure-as-code/declarative-vs-imperative-programming-4-key-differences/

2. Vaswani A., Shazeer N., Parmar N. et al. Attention Is All You Need. URL: https://arxiv.org/abs/1706.03762

3. Masked Language Modeling in BERT. URL: https://www.scaler.com/topics/nlp/masked-language-model-explained/

4. Clark K., Luong M.-T., Manning C. D. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. URL: https://openreview.net/pdf?id=r1xMH1BtvB

5. Natural Language Processing: Bleu Score. URL: https://www.baeldung.com/cs/nlp-bleu-score

6. N-grams. URL: https://deepai.org/machine-learning-glossary-and-terms/n-gram

7. CodeXGLUE. URL: https://microsoft.github.io/CodeXGLUE/

8. Cross Entropy. URL: https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html

9. Beam Search. URL: https://d2l.ai/chapter_recurrent-modern/beam-search.html

10. Zhu Q., Zhang W., Zhou L., Liu T. Learning to Start for Sequence to Sequence Architecture. 2016. URL: https://www.researchgate.net/publication/306357583_Learning_to_Start_for_Sequence_to_Sequence_Architecture



For citations:


Komlev D.A. Further training of the CodeBERT model for writing comments on SQL queries. Vestnik NSU. Series: Information Technologies. 2024;22(3):28-39. (In Russ.) https://doi.org/10.25205/1818-7900-2024-22-3-28-39



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-7900 (Print)
ISSN 2410-0420 (Online)