Vol 16, No 3 (2018)

DEVELOPMENT OF EDUCATION IN BIOINFORMATICS. BASED ON MATERIALS PRESENTED AT THE MNSK-2018 STUDENT CONFERENCES, THE SCHOOL OF MOLECULAR MODELING AND THE HACKATHON IN NOVOSIBIRSK

Yu. Orlov, A. Bakulina

PDF (Rus)

5-6 59

THE COMPUTER DATABASE FOR THE ANALYSIS OF DIFFERENTIALLY EXPRESSING GENES, RELATED TO AGGRESSIVE BEHAVIOR ON MODELS OF LABORATORY ANIMALS

A. O. Bragin, K. A. Tabanyukhov, I. V. Chadaeva, A. V. Tsukanov, R. O. Babenko, I. V. Medvedeva, A. G. Bogomolov, V. N. Babenko, Yu. L. Orlov

PDF (Rus)

7-21 114

Abstract

Developing and applying computational tools for the analysis of transcriptome sequencing data from model laboratory animals is an important problem in bioinformatics. Studying gene expression in brain areas of laboratory animals with modern high-throughput sequencing methods provides a common basis for research into the genetic background of behavior. The study of the genetic determinants of aggressive behavior is relevant not only for research into the molecular mechanisms of behavior regulation but also has a broad practical component for work with animals and applications in agrobiology. A genetic predisposition of animals to aggressive behavior leads to differences in brain structure, and comparing these differences makes it possible to find both common and specific mechanisms of behavior regulation that promote the manifestation of aggression under provocative environmental conditions. Computer programs for gene splicing analysis and a prototype database of gene expression in brain areas of laboratory animals (gray rats selected for the manifestation of aggressive behavior) have been developed. We compiled a functional summary of over- and under-expressed genes in experiments on rats and described the isoforms and alternative splicing patterns of these genes.

COMPUTER ANALYSIS OF GENE ALTERNATIVE SPLICING IN GLIOMA CELL CULTURES BY RNA-seq DATA

S. S. Kovalev, E. Yu. Lieberfarb, N. V. Gubanova, A. O. Bragin, A. G. Galieva, A. V. Tsukanov, V. N. Babenko, Yu. L. Orlov

PDF (Rus)

22-36 98

Abstract

Fundamental biomedical research in oncology, the search for new markers of tumor development, and modern post-genomic studies of gene expression in cell cultures require glioma transcriptome profiling and analysis of individual gene isoforms. Such experiments, in turn, require the development of new computer tools and databases for the analysis of bulk sequencing data. The aim of our study is a computational search for genes and gene isoforms whose differential expression is associated with the development of glioblastoma. The work is based on modern high-throughput sequencing technologies and the analysis of international biomedical data banks. The search for candidate genes in tumors as therapeutic targets, including individual gene isoforms, is highly relevant to healthcare and modern high-tech medicine. This work presents the bioinformatics problems related to developing computer pipelines for processing transcriptomic data, identifying differentially expressed genes, analyzing alternative splicing, and describing the gene ontology categories of the gene sets found. The tasks of automatic search and description of gene functions in connection with cancer, visualization of results, and development of biomedical databases are considered. A prototype database of differential alternative splicing of genes is presented, "Differential Alternative Splicing of Human Genes in Secondary Glioblastoma (DASGG)", which can be accessed through a website to search for expression levels of individual isoforms in tumor cells.

CYTOSCAPE PLUGIN FOR RECONSTRUCTION OF STRUCTURAL RANDOM GRAPH MODELS OF BIOLOGICAL NETWORKS

N. L. Podkolodnyy, D. A. Gavrilov, N. N. Tverdokhleb, O. A. Podkolodnaya

PDF (Rus)

37-50 157

Abstract

Modern experimental technologies in molecular biology allow different types of biological networks to be reconstructed, including gene and metabolic networks, interactome networks, gene coexpression networks, disease networks, etc. This article presents a software tool for reconstructing structural random graph models of biological networks whose structural regularities coincide with those of the initial biological network. Such structural models can be used to test various statistical hypotheses on networks, to study the influence of structural regularities in biological networks on their function, and so on. Our tool generates structural random graph models with the following fixed characteristics: the distribution of vertex degrees, the joint degree distribution of vertices, the average degree of neighboring vertices, the clustering coefficient, the clustering spectrum, the frequency of structural motifs of various sizes, etc. The developed system is based on a client-server architecture and consists of a Cytoscape plugin and a remote computing service. The interaction between the client and the server is implemented through the gRPC framework using Protocol Buffers, a structured data serialization protocol. The system constructs structural random graph models of the given biological networks asynchronously using the Random Network Generator and GTrie Scanner software. The resulting structural model can be loaded for visualization and analysis using the Cytoscape package. This article also presents a computational experiment on reconstructing structural random graph models of a number of biological networks. An algorithm for estimating the computation time of structural models for this kind of biological network was constructed.
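As an illustration of the simplest constraint listed in this abstract, a fixed distribution of vertex degrees, the following minimal sketch randomizes an undirected graph with double edge swaps. It is not the Random Network Generator used by the authors; the function names `degree_sequence` and `rewire_preserving_degrees` and the toy edge list are assumptions made for this example only.

```python
import random
from collections import Counter

def degree_sequence(edges):
    """Return a Counter mapping each vertex to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def rewire_preserving_degrees(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph by double edge swaps.

    Each accepted swap replaces edges (a, b), (c, d) with (a, d), (c, b),
    which leaves every vertex degree unchanged.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done = 0
    attempts = 0
    max_attempts = 100 * n_swaps  # give up eventually on very dense graphs
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        # Reject swaps that would create a self-loop or a duplicate edge.
        if len({a, b, c, d}) < 4:
            continue
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

# Toy network (hypothetical): a 6-cycle with two chords.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3), (1, 4)]
randomized = rewire_preserving_degrees(edges, n_swaps=20, seed=1)
print(degree_sequence(randomized) == degree_sequence(edges))
```

Preserving the other characteristics named in the abstract (clustering spectrum, motif frequencies) would require additional acceptance checks on each swap, which this sketch does not attempt.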

PROGRAMS FOR STATISTICAL ANALYSIS, CLUSTERIZATION AND VISUALIZATION OF GENOME DISTRIBUTION OF TRANSCRIPTION FACTOR BINDING SITES

A. V. Tsukanov, N. G. Orlova, A. I. Dergilev, Yu. L. Orlov

PDF (Rus)

51-63 106

Abstract

The analysis of gene transcription regulation based on data from modern high-throughput sequencing technologies is a topical task in bioinformatics. It requires the development of new computer tools, including supercomputer applications. We consider the problems of processing genome-wide ChIP-seq profiles to detect transcription factor binding sites, determining the peaks of such profiles, and searching for binding sites in the nucleotide sequences of the peaks. Computer programs have been developed to analyze the location of binding sites in the genome relative to gene regions, to calculate clusters of such sites, and to visualize their positions in the genome. Clusters of transcription factor binding sites in the human genome have been calculated using the Cistrome database. We have calculated co-occurrence matrices for pairs of binding sites of different transcription factors in the genome for various types of tissues and cells. A computational experiment on generating random clusters in the genome was carried out, together with an assessment of the occurrence of large clusters for experimentally obtained transcription factor binding sites in the human genome. The patterns of occurrence of binding sites of pluripotency factors in embryonic stem cells were described. The developed software is available from the authors on request.
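The two computations named in this abstract, clustering of binding sites and counting the joint occurrence of factor pairs, can be sketched as follows. This is a minimal illustration and not the authors' software: the distance-threshold clustering rule, the `max_gap` parameter, the function names, and the toy site list (including the factor names) are all assumptions.

```python
from collections import defaultdict
from itertools import combinations

def cluster_sites(sites, max_gap):
    """Group (chrom, position, factor) sites into clusters.

    Sites on the same chromosome stay in one cluster while the distance
    between consecutive positions does not exceed max_gap.
    """
    clusters = []
    current = []
    for site in sorted(sites):
        chrom, pos, _ = site
        if current and (chrom != current[-1][0] or pos - current[-1][1] > max_gap):
            clusters.append(current)
            current = []
        current.append(site)
    if current:
        clusters.append(current)
    return clusters

def cooccurrence(clusters):
    """Count, for each pair of factors, the clusters that contain both."""
    counts = defaultdict(int)
    for cluster in clusters:
        factors = sorted({factor for _, _, factor in cluster})
        for pair in combinations(factors, 2):
            counts[pair] += 1
    return dict(counts)

# Hypothetical toy data, not taken from the Cistrome database.
sites = [
    ("chr1", 100, "OCT4"), ("chr1", 150, "SOX2"), ("chr1", 5000, "NANOG"),
    ("chr2", 120, "OCT4"), ("chr2", 130, "NANOG"),
]
clusters = cluster_sites(sites, max_gap=200)
print(len(clusters), cooccurrence(clusters))
```

Running the same counting on randomly placed sites would give a baseline against which the number of large clusters can be compared, in the spirit of the computational experiment the abstract mentions.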

DEVELOPMENT OF THE AUTOMATED MODULE FOR MINERALIZATION OF WATER

R. M. Bandishoeva

PDF (Rus)

64-73 102

Abstract

The article is devoted to the development of a fertigation control system. The system feeds fertilizers automatically, taking into account the pH and EC (electrical conductivity) values obtained from the respective sensors. To solve this problem, a mineralization module consisting of two submodules has been developed: a submodule for measuring and controlling pH and a submodule for measuring and regulating EC. The data are transferred to the microcontroller, which carries out further control actions. An intelligent system based on fuzzy logic is used for decision making.
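To make the idea of fuzzy-logic decision making concrete, here is a minimal sketch of a pH dosing rule base with triangular membership functions and weighted-average (zero-order Sugeno) defuzzification. It is purely illustrative: the target pH, the membership ranges, the output duty cycles, and the function names are hypothetical and are not taken from the article, and the EC submodule is not modeled.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 1 at peak, 0 at or beyond left and right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def acid_dose(ph, target=6.0):
    """Return an acid pump duty cycle in [0, 1] from the pH error."""
    error = ph - target
    # Each rule pairs a firing strength with a crisp output (hypothetical values).
    rules = [
        (triangular(error, -0.5, 0.0, 0.5), 0.0),  # near target: no dosing
        (triangular(error, 0.0, 0.75, 1.5), 0.4),  # slightly alkaline: moderate dosing
        (triangular(error, 1.0, 2.0, 3.0), 1.0),   # strongly alkaline: full dosing
    ]
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        # No rule fires: dose fully only when far above the covered range.
        return 1.0 if error >= 3.0 else 0.0
    return sum(strength * out for strength, out in rules) / total

for ph in (6.0, 6.75, 7.25, 8.0):
    print(ph, round(acid_dose(ph), 3))
```

On a microcontroller the same rule table would typically be evaluated in a fixed-rate loop, with a second table of the same shape for EC.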

DEVELOPING THE SYSTEM FOR AUTOMATIC SUMMARIZATION OF SCIENTIFIC TEXTS

T. V. Batura, A. M. Bakiyeva

PDF (Rus)

74-86 185

Abstract

The paper describes a new method of automatic text summarization. Based on this method, a system has been created that produces summaries of scientific and technical texts and determines their topics. The summarization process consists of five main steps: preprocessing, transformation, weight evaluation, sentence selection, and smoothing. The proposed method builds the summary from important sentences of the original document. The importance of sentences is partially determined by rhetorical analysis, which is performed using discourse markers and connectors. Keywords, multiword terms, and some special words that are often found in scientific and technical texts are also taken into account. We used additive regularization of topic models (ARTM) to extract keywords and discover the topics.
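The weight evaluation and sentence selection steps can be illustrated with a very small extractive summarizer. This sketch is an assumption-laden stand-in, not the authors' method: it uses plain word frequencies instead of ARTM keywords, a hypothetical list of discourse markers instead of rhetorical analysis, and it omits the transformation and smoothing steps entirely.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "that", "this", "we"}
# Hypothetical discourse markers that often introduce key statements.
MARKERS = ("we propose", "in conclusion", "the results show")

def summarize(text, n_sentences=2):
    # Preprocessing: split into sentences and lowercase content words.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokens = [[w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]
              for s in sentences]
    # Weight evaluation: mean document frequency of the words plus a marker bonus.
    freq = Counter(w for sent in tokens for w in sent)
    weights = []
    for s, sent in zip(sentences, tokens):
        base = sum(freq[w] for w in sent) / len(sent) if sent else 0.0
        bonus = 1.0 if any(m in s.lower() for m in MARKERS) else 0.0
        weights.append(base + bonus)
    # Sentence selection: keep the heaviest sentences in their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: -weights[i])
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)

text = ("Graphs model networks. We propose a graph method for networks. "
        "The weather was nice. Cats sleep.")
print(summarize(text, n_sentences=1))
```

Keeping the selected sentences in document order is a deliberate choice here, since reordering them by weight tends to break the flow of the resulting summary.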

INFORMATION MODELS AND PROJECT SOLUTIONS FOR THE ECCLESIA RESEARCH DATA STORING AND PROCESSING SYSTEM

M. A. Gorodnichev, A. V. Komissarov, A. V. Mozhina, P. V. Prochkin, P. D. Rudych, A. V. Yurchenko

PDF (Rus)

87-104 118

Abstract

Scientific research produces a lot of digital data that should be carefully gathered and stored for further use: processing, analysis, and publication. Building e-infrastructure for this purpose is one of the most topical problems of IT (or digital) curation of science. Starting from three data-processing problems in physiology, we are developing an information system that automates the gathering, storing, and analyzing of data. Problems encountered in the development of such a system are examined and analyzed, along with existing approaches and software solutions related to these problems. Based on the results of this analysis, a number of models and mechanisms for solving the encountered problems are proposed. The developed solutions include models and mechanisms for collecting and storing research data, a model that describes and formalizes data processing scenarios, and models and mechanisms for processing the collected data in a distributed computer system. As a result, an architecture of a computer system for collecting, storing, and processing research data is presented. The system is proposed as a tool for solving a wide spectrum of problems in scientific research that involve collecting and multi-step processing of various kinds of data.

GENERAL GESTURAL DICTIONARY DEVELOPMENT FOR NATURAL COMPUTER-BASED CONTACTLESS INTERFACE

V. A. Zeng

PDF (Rus)

105-112 117

Abstract

This article describes contactless systems and interfaces and the main principles of working with these technologies. Such systems can simplify interaction with a computer interface for users with disabilities. The features and advantages of natural interfaces and systems based on gesture control are also presented. The stages of forming a basic gesture dictionary for further use in a contactless interface are described in detail. A Microsoft Kinect device was reviewed as additional hardware for recognizing such hand gestures more accurately.

AN INTELLECTUAL SYSTEM FOR PROCESSING AND INTEGRATING KNOWLEDGE BASED ON SEMANTIC WEB TECHNOLOGIES

I. A. Korsun, D. E. Palchunov

PDF (Rus)

113-125 92

Abstract

The paper is devoted to the development of model-theoretic methods for extracting concept definitions from natural language texts. The information extracted from texts is represented as statements in a Description Logic (DL) language by transformation through fragments of atomic diagrams of algebraic systems. Such a representation is more expressive than that of algorithms in which information is represented in a database or by expressions in a formal language (for example, SQL).

SOME APPROACHES TO STORAGE INFORMATION ABOUT TEXTBOOK PROVISION OF EDUCATION PROCESS AT UNIVERSITIES

D. S. Matusevich, O. V. Izmesteva

PDF (Rus)

126-132 101

Abstract

The article explores three approaches to storing information about the textbook provision of the educational process under different types of integration between an Integrated Library System and a university information system: inside the Integrated Library System in RUSMARC format, in a relational database, and with OLAP technology. The advantages and disadvantages of each approach are analyzed.

ADAPTATION OF SERVICE MENU STRUCTURE FOR DIFFERENT TYPES OF USERS USING THE ONTOLOGY MODELING

R. S. Pogodin

PDF (Rus)

133-144 112

Abstract

The purpose of the article is to solve the problem of adapting large tree-like and linear menus of mobile and internet services to various types of users according to their interests, social status, and other parameters. A software system has been created that builds an optimal menu of services for classes of users divided by physical and socio-economic parameters. To adapt the menu, a modified algorithm for constructing an optimal graph of a USSD menu was used. An ontological approach was used for the formal representation of the notions of the subject area and for the extraction, representation, and processing of knowledge. User models describing users' needs, aims, and interests were used to adapt the interfaces. User behavior was formalized with the help of an ontological model of mobile and internet services. Each user can be assigned to a definite model based on their physical and social parameters. The program that adapts the menus contains two modules: a module that obtains call frequencies through ontology queries and a module that optimizes the menu graph. The menu optimization algorithm works with the DOT graph description language.
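The relationship between call frequencies, menu ordering, and DOT output can be sketched in a few lines. This is only an assumed simplification, not the modified USSD-menu algorithm of the article: it moves more frequently used items to earlier positions within each submenu and serializes the result as a DOT digraph. The menu items, the frequencies, and the function names are hypothetical.

```python
def subtree_frequency(node, freq):
    """Total call frequency of a menu node and all of its descendants."""
    name, children = node
    return freq.get(name, 0) + sum(subtree_frequency(c, freq) for c in children)

def reorder(node, freq):
    """Return a copy of the menu tree with children sorted by descending frequency."""
    name, children = node
    ordered = sorted((reorder(c, freq) for c in children),
                     key=lambda c: -subtree_frequency(c, freq))
    return (name, ordered)

def to_dot(node):
    """Serialize the menu tree as a DOT digraph; edge labels give item positions."""
    lines = ["digraph menu {"]

    def walk(n):
        name, children = n
        for position, child in enumerate(children, start=1):
            lines.append(f'    "{name}" -> "{child[0]}" [label="{position}"];')
            walk(child)

    walk(node)
    lines.append("}")
    return "\n".join(lines)

# Hypothetical menu and call frequencies for one class of users.
menu = ("Main", [("Balance", []),
                 ("Tariffs", [("Internet", []), ("Voice", [])]),
                 ("Help", [])])
freq = {"Balance": 50, "Internet": 40, "Voice": 30, "Help": 5}
print(to_dot(reorder(menu, freq)))
```

In the described system the frequencies would come from ontology queries for a given user model, so each class of users would receive its own ordering.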

EXPERIMENTAL STUDY OF THE ACCURACY OF COMPRESSION-BASED FORECASTING METHODS

K. S. Chirikhin, B. Ya. Ryabko

PDF (Rus)

145-158 94

Abstract

In information theory it is known that data compression methods can be used for forecasting stationary processes. In this paper a compression-based algorithm for time series forecasting is proposed and an empirical study of its accuracy is carried out. The algorithm can operate with arbitrary data compression methods. During the steps of the algorithm, predicted values from different methods are combined, and the method with the best compression ratio for the series has the greatest impact on the end result. The algorithm can be used for forecasting time series with discrete and continuous alphabets. Existing methods of time series preprocessing can be used to improve the accuracy of the forecast. The empirical study of the efficiency of the proposed algorithm was conducted on time series from the M3 Competition and the T-index series. Well-known archivers were used to generate the forecasts. The results of the calculations showed that the proposed method has relatively high accuracy and speed.
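The general principle can be sketched as follows: a candidate continuation of the series is scored by how short the compressed series becomes when the candidate is appended, and the scores of several compressors are mixed so that the shortest code dominates. This is a minimal sketch under stated assumptions, not the algorithm of the paper: the equal mixture weights, the choice of the three standard-library compressors, and the function names are assumptions, and no preprocessing or handling of continuous alphabets is shown.

```python
import bz2
import lzma
import zlib

# Any lossless compressors can be plugged in; these three ship with Python.
COMPRESSORS = (zlib.compress, bz2.compress, lzma.compress)

def code_length_bits(compress, data):
    """Length in bits of the compressed representation of data."""
    return 8 * len(compress(data))

def forecast(history, candidates):
    """Return a probability for each candidate continuation of history.

    Each compressor assigns a candidate c the score 2^(-L), where L is the
    code length of history + c; the scores are summed over compressors with
    equal weights and normalized over the candidates.
    """
    lengths = {
        c: [code_length_bits(f, history + c) for f in COMPRESSORS]
        for c in candidates
    }
    shortest = min(min(v) for v in lengths.values())
    # Shifting by the shortest length avoids floating-point underflow.
    scores = {c: sum(2.0 ** -(n - shortest) for n in v) for c, v in lengths.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

history = b"ab" * 200
print(forecast(history, [b"a", b"b"]))
```

Because general-purpose archivers emit whole bytes, two candidates that differ by a single symbol often receive identical code lengths, so a practical implementation would usually score longer continuations rather than one symbol at a time.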

Information about the Authors

Editorial Article

PDF (Rus)

159-160 93

Information for Authors

Editorial Article

PDF (Rus)

161-161 84


Vestnik NSU. Series: Information Technologies
