We propose a method for scientific term extraction from Russian-language texts based on weakly supervised learning. This approach does not require a large amount of hand-labeled data. To implement the method, we collected a list of terms in a semi-automatic way and then annotated the texts of scientific articles with these terms. We used these texts to train a model and then used the model's predictions on another part of the text collection to extend the training set. A second model was trained on both text collections: the one annotated with the dictionary and the one annotated by the first model. The obtained results show that adding data, even when it is annotated automatically, improves the quality of scientific term extraction.
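A minimal sketch of this weak-supervision pipeline is given below; the seed terms, the toy sentences, and the placeholder tagger class are assumptions for illustration, not the authors' actual model.

def annotate_with_dictionary(tokens, term_set):
    """Assign a B/O tag to each token by simple dictionary matching (single-word terms only)."""
    tags = ["O"] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok.lower() in term_set:
            tags[i] = "B-TERM"
    return tags

class DummyTermTagger:
    """Stand-in for the sequence tagger; any token-level classifier fits this interface."""
    def fit(self, sentences, tag_sequences):
        self.known = {tok for sent, tags in zip(sentences, tag_sequences)
                      for tok, tag in zip(sent, tags) if tag != "O"}
        return self
    def predict(self, sentences):
        return [["B-TERM" if tok in self.known else "O" for tok in sent] for sent in sentences]

term_set = {"онтология", "корпус"}                       # hypothetical seed term list
labeled = [["онтология", "текстов"], ["параллельный", "корпус"]]
unlabeled = [["семантическая", "разметка"]]

# Stage 1: dictionary annotation + first model.
tags = [annotate_with_dictionary(sent, term_set) for sent in labeled]
model_1 = DummyTermTagger().fit(labeled, tags)

# Stage 2: pseudo-label the unlabeled part and retrain on the union of both collections.
pseudo = model_1.predict(unlabeled)
model_2 = DummyTermTagger().fit(labeled + unlabeled, tags + pseudo)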
The article proposes to analyze the cyber-situational awareness of an energy facility in three stages: i) analysis of cyber threats to the energy infrastructure; ii) modeling of scenarios of extreme situations in the energy sector caused by the realization of these cyber threats; iii) risk assessment of cybersecurity disruption to the energy infrastructure. Three methods are presented, one corresponding to each stage. Within the presented approach, the authors propose to apply semantic modeling methods to analyze the impact of cyber threats on energy facilities, taking energy security into account. Such methods are effective when the data for modeling the behavior of systems are absent or incomplete and that behavior defies formal description or accurate forecasting. The presented approach to the cyber-situational awareness analysis of energy facilities is considered as a synthesis of cybersecurity and situational awareness studies, characterized by the use of semantic modeling methods.
The article is devoted to the choice of a clear and concise form for presenting the results of analyzing and comparing programming languages and systems, convenient for assessing the expressive power of languages and the complexity of implementing systems. The formalization is adapted to the paradigm analysis of the definitions of programming languages and the selection of practical criteria for the decomposition of programs. Semantic decomposition of the definitions of languages and systems as part of the analysis of programming paradigms was chosen as the main approach. This choice makes it possible to single out autonomously developed typical program components that can be adapted to the design of various information systems. Many works on methods for developing software systems depend on the practicality of approaches to the decomposition of programs debugged using programming systems. The solution to this problem is useful for studying programming methods, studying the history of programming languages, comparing programming paradigms and the potential of the schemes and models used, assessing the level of novelty of newly created programming languages, and also for choosing criteria for program decomposition. In addition, the existence of such components allows us to form a methodology for teaching the development of information system components. Along the way, the distance in conceptual complexity between programming and the development of programming systems is shown.
In this paper, we take a close look at a web platform that provides the tools necessary for working with folklore materials and conducting scientific research based on them. Folklore studies consist of working with audio and video materials that contain the reproduction of elements of folk art in national languages, creating specific text recordings with translations and comments written in a public language, and building a picture of the world based on the available resources. To structure and present this content, we use an ontology-based approach, which allows linguists to describe not only the resources but also subject knowledge in the Semantic Web style, i.e. using hierarchies of classes, objects, and relationships between them. The main features of folklore research are the need to synchronize translations, which is achieved by creating a parallel corpus of texts, and the ability to label texts with entities of the subject area, which is called semantic markup. Moreover, each corpus is connected with a certain nationality and has both its own national language and a unique system of concepts of the world around it. Such a representation imposes many non-standard requirements on the platform, such as working with arbitrary languages, supporting many ontologies, enabling the creation and editing of national subject ontologies, semantic text markup, and presentation, navigation, and search across heterogeneous resources. The developed platform provides all the necessary tools for research, including tools for the development of ontologies in specific national subject areas and for manual annotation of texts in real time by several specialists. Resources of the web platform are located in the resource ontology, which includes such concepts as corpus, video resource, audio resource, graphic image, person, geographical location, genre of text, etc. Ontologies of subject areas are presented in the form of a hierarchy, where the ontology of universals, common to all folklore studies, is located at the top level, while the inherited ontologies are specialized for each represented national corpus. The web application is built with the Python Django framework and the TypeScript React library. Data storage is implemented using the PostgreSQL database.
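The sketch below is not the platform's actual data model; it only illustrates, with hypothetical class and field names, how the resource ontology concepts and the universals/national-ontology hierarchy described above could be represented.

from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class OntologyClass:
    """A concept in a subject ontology; parent=None marks a top-level universal."""
    name: str
    parent: OntologyClass | None = None

@dataclass
class Corpus:
    """A national corpus tied to a language and its own specialized concepts."""
    nationality: str
    language: str
    concepts: list[OntologyClass] = field(default_factory=list)

# A universal concept shared by all folklore corpora.
ritual = OntologyClass("Ritual")
# A national ontology specializes the universal concept for one corpus.
corpus = Corpus("ExampleNation", "xx", [OntologyClass("Spring ritual", parent=ritual)])

print(corpus.concepts[0].parent.name)   # -> Ritual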
Due to the growth in the number of scientific publications, tasks related to processing scientific articles are becoming more relevant. Such texts have a special structure and specific lexical and semantic content that should be taken into account during processing. Using information from knowledge bases can significantly improve the quality of text processing systems. This paper is dedicated to the entity linking task for scientific articles in Russian, where we consider scientific terms as entities. During our work, we annotated a corpus of scientific texts in which each term was linked to an entity from a knowledge base. We also implemented an entity linking algorithm and evaluated it on this corpus. The algorithm consists of two stages: generating candidates for an input term and ranking this set of candidates to choose the best match. To generate the set of candidates, we used string matching between an input term and entity names in the knowledge base. To rank the candidates and choose the most relevant entity for a term, we used information about the number of links to other entities within the knowledge base and to external sites. We analyzed the obtained results and proposed possible ways to improve the quality of the algorithm, for example, by using information about the context and the structure of the knowledge base. The annotated corpus is publicly available and can be useful for other researchers.
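A minimal sketch of the two-stage linking scheme (candidate generation by string matching, ranking by link counts) is shown below; the toy knowledge base, the similarity measure, and the threshold value are assumptions for illustration only.

from difflib import SequenceMatcher

# Hypothetical knowledge-base entries: entity name, number of links to other
# entities inside the knowledge base, number of links to external sites.
KB = [
    {"name": "машинное обучение", "internal_links": 120, "external_links": 15},
    {"name": "обучение с учителем", "internal_links": 40, "external_links": 5},
]

def generate_candidates(term, kb, threshold=0.6):
    """Stage 1: keep entities whose names are close enough to the input term."""
    return [e for e in kb
            if SequenceMatcher(None, term.lower(), e["name"].lower()).ratio() >= threshold]

def rank_candidates(candidates):
    """Stage 2: prefer more densely linked entities."""
    return sorted(candidates,
                  key=lambda e: e["internal_links"] + e["external_links"],
                  reverse=True)

candidates = generate_candidates("машинное обучение", KB)
best = rank_candidates(candidates)[0] if candidates else None
print(best["name"] if best else "no match")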
In this paper, the authors describe an algorithm for importing data from the social network Twitter and building weighted social graphs. To import data, the given posts are taken as a basis, and the users who have had any of the recorded interactions with them are downloaded. The algorithm then relies on a given configuration and uses it to calculate the weights on the edges of the resulting graph. The configuration takes into account the types of interactions between users. The authors introduce the concept of an (F, L, C, R)-model of information interaction.
The authors describe the developed algorithm and the implemented software for constructing weighted graphs. The paper shows the application of the algorithm and the three models using the example of both a single post and a series of posts.
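The sketch below illustrates the general idea of turning recorded interactions into a weighted graph under a configuration of per-interaction weights; the interaction records, the weight values, and the reading of F/L/C/R as follow/like/comment/retweet are assumptions, not the paper's exact (F, L, C, R)-model.

import networkx as nx

# Weight assigned to each interaction type (the "configuration").
WEIGHTS = {"F": 1.0, "L": 0.5, "C": 2.0, "R": 1.5}

# (actor, post author, interaction type) triples collected for a given post.
interactions = [
    ("user_a", "author", "L"),
    ("user_a", "author", "C"),
    ("user_b", "author", "R"),
]

def build_weighted_graph(interactions, weights):
    """Accumulate per-pair edge weights according to the configuration."""
    graph = nx.Graph()
    for actor, target, kind in interactions:
        w = weights[kind]
        if graph.has_edge(actor, target):
            graph[actor][target]["weight"] += w
        else:
            graph.add_edge(actor, target, weight=w)
    return graph

g = build_weighted_graph(interactions, WEIGHTS)
print(g["user_a"]["author"]["weight"])   # 0.5 + 2.0 = 2.5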
The occurrence of snow avalanches is mainly influenced by meteorological conditions and the configuration of snow cover layers. Machine learning methods have predictive power and are capable of predicting new events. An ensemble is built from the trained machine learning models that predicts the likelihood of an avalanche. The model obtained in the article is trained on avalanche data, meteorological data, and generated data on the state of the snow cover. This allows the resulting solution to be used in more mountainous areas than solutions that rely on a wider range of less readily available data.
Snow data is generated by the SNOWPACK software package.
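The following is a minimal sketch of such an avalanche-likelihood ensemble, not the authors' exact model: the synthetic features (including a simulated snow-layer index of the kind SNOWPACK can provide), the labels, and the choice of base learners are assumptions for illustration.

import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier

rng = np.random.default_rng(0)
# Columns: air temperature, 24h precipitation, wind speed, simulated snow-layer stability index.
X = rng.normal(size=(200, 4))
y = rng.integers(0, 2, size=200)          # 1 = avalanche observed, 0 = not observed

ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    voting="soft",                         # average the class probabilities of the base models
)
ensemble.fit(X, y)
print(ensemble.predict_proba(X[:1]))       # predicted avalanche probability for one day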
At present, the study of computer science is almost impossible to imagine without the use of electronic educational resources. These resources are most often presented as learning packages. The synthesis of learning packages is a labour- and resource-intensive process, and its results influence the efficiency of students' work. The article is devoted to creating an automated system for the synthesis of structured educational content. This system is a universal shell for computer science learning packages. It makes it possible to unify the approach to the educational process, to be applied regardless of course content, and to increase the efficiency of students' work in the university local network. The developed system includes six independent modules. These modules implement the selection of system loading properties, service functions, system loading, authentication, tools, and control of student competences. This article presents the operating concept of the modules; the concept of the module for selecting system loading properties is described in detail, and its operating algorithm is presented. The educational content is structured according to the computer science teaching program. All content is divided into two logical sections, each of which contains several subsections. These sections and subsections are nodes of a TreeView hierarchical tree, and elements of the educational content are attached to them by a special algorithm. A universal test system and interactive trainers specially developed in VBA are included in the competence control subsection. These trainers are programs that can generate tasks for several topics. The system considered in this article has several advantages: it occupies little space on the hard disk, and it works both in a network and on a local computer. The educational content can be flexibly included in this system regardless of the course volume and the students' basic knowledge level.
Automatic service composition is discussed in the article. A method is proposed for building service compositions based on processing statistical data on individual invocations of services (tasks) by users. The method is based on linking tasks to each other and determining data dependencies; parameters of services whose values are rigidly set by the service composition and parameters whose values can be changed by the user are distinguished. Service compositions are built in the form of a directed acyclic graph (DAG). Methods have been developed for reducing the set of obtained service compositions, which allow us to single out useful ones and rank them by degree of use. In particular, equivalent service compositions are identified based on the isomorphism of their DAGs, trivial ones are discarded, and only compositions that lead to a published result are retained.
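A minimal sketch of one reduction step described above, detecting equivalent service compositions through DAG isomorphism, is given below; the tiny example compositions and task names are hypothetical.

import networkx as nx

def composition(edges):
    """A service composition as a directed acyclic graph of task dependencies."""
    g = nx.DiGraph()
    g.add_edges_from(edges)
    return g

comp_a = composition([("load", "filter"), ("filter", "plot")])
comp_b = composition([("read", "clean"), ("clean", "render")])   # same structure, other task names
comp_c = composition([("load", "plot")])

# Structural equivalence: keep only one representative of each isomorphism class.
unique = []
for comp in (comp_a, comp_b, comp_c):
    if not any(nx.is_isomorphic(comp, kept) for kept in unique):
        unique.append(comp)
print(len(unique))   # 2: comp_b is structurally equivalent to comp_a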