The paper proposes a method for extracting Russian-language multicomponent terms with right definitions in their structure. An analysis of modern methods, techniques and software tools for extracting special terminology is carried out, and on its basis it is shown that they cover only terms with left definitions. The formal structure of Russian-language multicomponent terminological units with right definitions is investigated, with special attention paid to their grammatical features: gender, case and number of Russian nouns and adjectives. The inexpediency of applying lemmatisation to all components of a term is substantiated. The correctness of morphological analyzers of Russian texts is examined with respect to their applicability to the extraction of multicomponent terms. Models of five-component terms are given, which became the basis for developing the method of extracting Russian-language multicomponent terms with right definitions. The proposed structural models identify the nuclear element, the left and right definitions, and the grammatical features of the right definition of a Russian-language multicomponent term. The paper also illustrates the differences between the lists of Russian-language candidate terms produced by traditional approaches, which apply lemmatisation at the first stage, and by the proposed method for extracting multicomponent terms with right definitions.
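The structural models described in the abstract can be sketched, in a much simplified form, as a pattern matcher over pre-tagged tokens. The sketch below is an illustrative assumption, not the paper's method: it covers only one pattern, "ADJ* NOUN(nominative) NOUN(genitive)+", and takes part-of-speech and case tags as given rather than from a morphological analyzer.

```python
# Minimal sketch of matching one structural term model against a
# pre-tagged token sequence: optional left definitions (adjectives),
# a nuclear noun in the nominative, and right definitions in the genitive.
def match_term(tagged):
    """tagged: list of (token, pos, case). Returns (left, nucleus, right) or None."""
    i = 0
    left = []
    while i < len(tagged) and tagged[i][1] == "ADJ":
        left.append(tagged[i][0])
        i += 1
    if i >= len(tagged) or tagged[i][1] != "NOUN" or tagged[i][2] != "nom":
        return None                      # no nuclear element found
    nucleus = tagged[i][0]
    i += 1
    right = []
    while i < len(tagged) and tagged[i][1] == "NOUN" and tagged[i][2] == "gen":
        right.append(tagged[i][0])       # right definition in the genitive
        i += 1
    if not right:
        return None                      # this model requires a right definition
    return left, nucleus, right

# "система управления" (control system): nucleus + genitive right definition
tagged = [("система", "NOUN", "nom"), ("управления", "NOUN", "gen")]
print(match_term(tagged))  # → ([], 'система', ['управления'])
```

A real implementation would also check agreement in gender, number and case between components, which is exactly the grammatical information the paper argues is lost under blanket lemmatisation.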
This paper compares mathematical libraries of the web programming languages JavaScript, PHP and Python for building problem generators for selected topics of mathematical analysis and computational mathematics.
The main objective of the study is to run an experiment on a given set of tasks using the libraries Math.js, Algebrite, Nerdamer, MathPHP, NumPy, SymPy and SciPy, in order to determine which provide the best functionality and performance for symbolic and numerical computing.
The experimental study was carried out with the listed libraries: the corresponding tasks were computed and their execution time was measured. A comparative analysis of the obtained results is given, and the main problems that arose with the different libraries during the experiment are described. The results can be used by developers and researchers involved in the design and implementation of generators of mathematical problems. The work shows that the JavaScript and PHP libraries do not fully support all the functions needed to create generators of mathematical problems, while Python was considerably more efficient in both symbolic and numerical calculations.
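As a minimal illustration of the kind of generator the compared libraries serve, the sketch below builds a symbolic differentiation problem with SymPy (one of the Python libraries from the study) and times it with the standard library. The generator itself is our illustrative assumption, not one of the paper's benchmark tasks.

```python
# Hedged sketch: a tiny generator of derivative problems using SymPy,
# timed with time.perf_counter.
import time
import sympy as sp

x = sp.symbols('x')

def make_derivative_problem(coeffs):
    """Build a polynomial from coeffs (constant term first) and
    return (problem expression, its derivative as the answer key)."""
    expr = sum(c * x**i for i, c in enumerate(coeffs))
    return expr, sp.diff(expr, x)

start = time.perf_counter()
problem, answer = make_derivative_problem([3, 0, 2, 1])  # 3 + 2x^2 + x^3
elapsed = time.perf_counter() - start

print(problem)   # x**3 + 2*x**2 + 3
print(answer)    # 3*x**2 + 4*x
```

Measuring `elapsed` over a batch of such calls, and repeating the same task with Math.js, Algebrite, Nerdamer and MathPHP, is the general shape of the comparison the paper performs.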
Automated generation of source-code comments is a pressing topic in software development, where machine translation models are used to “translate” code into text descriptions. The CodeBERT model, pre-trained on six programming languages, is used for code search, documentation generation and error correction. Because it captures the semantics of natural language, of programming languages, and of the connections between them, it is well suited for fine-tuning on various applied tasks related to code. The article discusses fine-tuning CodeBERT to generate comments for SQL queries. This task is relevant because large projects may contain many SQL queries of varying complexity, and comments improve their readability and understanding; however, manually writing comments and keeping them up to date takes developers' time and effort. The article proposes using the pre-trained CodeBERT model to generate comments for SQL code automatically, which reduces this effort and helps keep comments current. For fine-tuning, open datasets containing SQL queries together with comments on them are used. The test results show that the fine-tuned model successfully copes with the task of generating comments for SQL queries, which is also confirmed by the obtained values of the BLEU metric.
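The BLEU metric mentioned above is built from clipped n-gram precisions between a generated text and a reference. The sketch below shows only that single ingredient in pure Python, on hypothetical SQL-comment tokens; it is not the full BLEU (which combines several n-gram orders with a brevity penalty) that the article reports.

```python
# Hedged sketch: clipped n-gram precision, the core ingredient of BLEU,
# applied to a generated SQL comment vs a reference comment.
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision of candidate tokens against reference tokens."""
    cand = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    if not cand:
        return 0.0
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(c, ref[g]) for g, c in Counter(cand).items())
    return clipped / len(cand)

reference = "select all active users from the users table".split()
candidate = "select active user rows from users table".split()
print(round(ngram_precision(candidate, reference, n=1), 2))  # → 0.71
```

In practice one would use a ready-made implementation (e.g. NLTK's `sentence_bleu`) rather than hand-rolling the metric; the point here is only what the score measures.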
The Scientific-production enterprise of geophysical equipment “Looch” develops and manufactures LWD telemetry systems used in drilling oil and gas wells. These systems include an orientation sensor (inclinometer) that estimates the position of the device in the well from the signals of three accelerometers and three magnetometers. The system modules operate at high temperatures (up to 120 °C), so temperature drift compensation is required to keep the orientation measurement error within specification. This paper provides estimates of the allowable drifts of the sensor readings, a polynomial temperature model of the accelerometers and magnetometers, and a compensation technique. An experiment was carried out whose results confirmed the suitability of the model and technique used.
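The general shape of polynomial temperature compensation can be sketched as follows: fit a polynomial to a sensor channel's drift over a calibration temperature sweep, then subtract the predicted drift at runtime. All numbers below are synthetic assumptions for illustration, not the enterprise's calibration data or the paper's model coefficients.

```python
# Hedged sketch: polynomial temperature-drift compensation for one
# sensor channel, with a synthetic calibration data set.
import numpy as np

# Synthetic calibration sweep: channel bias grows with temperature (°C).
temps = np.array([20.0, 40.0, 60.0, 80.0, 100.0, 120.0])
bias = 1e-4 * temps**2 + 2e-3 * temps        # assumed "true" drift

coeffs = np.polyfit(temps, bias, deg=2)      # fit a 2nd-order polynomial

def compensate(raw_reading, temperature):
    """Subtract the polynomial drift prediction from a raw reading."""
    return raw_reading - np.polyval(coeffs, temperature)

# A reading at 90 °C: true value 9.81 plus the temperature-induced bias.
raw = 9.81 + 1e-4 * 90.0**2 + 2e-3 * 90.0
print(round(compensate(raw, 90.0), 3))       # → 9.81
```

The paper's contribution lies in choosing the polynomial model, estimating the allowable drifts, and validating the technique experimentally over the full 120 °C range; this sketch shows only the arithmetic of applying such a model.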
Every day, thanks to the Internet, the amount of text data containing the subjective evaluations of authors grows. This information is used, for example, by numerous companies to assess the loyalty of their target audience. Because the volume of such texts grows extremely fast, manual processing becomes impractical, and automated sentiment analysis, an actively developing area of natural language processing, is applied instead. We collected a corpus of medical service reviews and trained three classifiers on it. We also performed a comparative analysis of the results of these models, which belong to traditional and deep machine learning. Our corpus of texts is public and can be useful to other researchers.
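As a toy stand-in for the traditional machine-learning side of such a comparison, the sketch below trains a minimal multinomial Naive Bayes classifier over bag-of-words counts. The data is invented English text for illustration; it is not the authors' corpus of medical service reviews, and the paper's actual classifiers are not specified here.

```python
# Hedged sketch: multinomial Naive Bayes with Laplace smoothing,
# a classic "traditional ML" baseline for sentiment classification.
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (tokens, label). Returns the fitted model."""
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, len(samples)

def predict(model, tokens):
    """Pick the label with the highest log-posterior for the tokens."""
    class_counts, word_counts, vocab, n = model
    best, best_lp = None, float("-inf")
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        lp = math.log(count / n)                    # class prior
        for t in tokens:                            # likelihood, smoothed
            lp += math.log((word_counts[label][t] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

data = [("the doctor was attentive and kind".split(), "pos"),
        ("quick appointment friendly staff".split(), "pos"),
        ("long wait and rude reception".split(), "neg"),
        ("the clinic was dirty and noisy".split(), "neg")]
model = train(data)
print(predict(model, "friendly attentive doctor".split()))  # → pos
```

A deep-learning counterpart would replace the bag-of-words counts with learned text representations; comparing the two families on one corpus is the shape of the study's analysis.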
ISSN 2410-0420 (Online)