
Vestnik NSU. Series: Information Technologies

Vol 16, No 3 (2018)

Bioinformatics

7-21 56
Abstract
The development and application of computational tools for analyzing transcriptome sequencing data from model laboratory animals is an important problem in bioinformatics. Studying gene expression in brain areas of laboratory animals with modern high-throughput sequencing methods provides a common basis for research into the genetic background of behavior. The study of genetic determinants of aggressive behavior is not only relevant to research on the molecular mechanisms of behavior regulation but also has broad practical value for work with animals and for applications in agrobiology. A genetic predisposition of animals to aggressive behavior leads to differences in brain structure, and comparing these differences makes it possible to find both common and specific mechanisms of behavior regulation that promote aggression under provocative environmental conditions. We developed computer programs for gene splicing analysis and a prototype database of gene expression in brain areas of laboratory animals: gray rats selected for aggressive behavior. We produced a functional summary of over- and under-expressed genes in the rat experiments and described the isoforms and alternative splicing patterns of these genes.
22-36 52
Abstract
Fundamental biomedical research in oncology, the search for new markers of tumor development, and modern post-genomic studies of gene expression in cell cultures require glioma transcriptome profiling and analysis of individual gene isoforms. Such experiments, in turn, require the development of new computer tools and databases for the analysis of bulk sequencing data. The aim of our study is a computational search for genes and gene isoforms whose differential expression is associated with the development of glioblastoma. The work is based on modern high-throughput sequencing technologies and on the analysis of international biomedical data banks. The search for candidate genes in tumors as therapeutic targets, including individual gene isoforms, is highly relevant to healthcare and modern high-tech medicine. This work presents the bioinformatics problems related to developing computer pipelines for processing transcriptomic data, identifying differentially expressed genes, analyzing alternative splicing, and describing the Gene Ontology categories of the gene sets found. It also considers automatic search for and description of gene functions in connection with cancer, visualization of results, and development of biomedical databases. A prototype database of differential alternative splicing of genes is presented, "Differential Alternative Splicing of Human Genes in Secondary Glioblastoma (DASGG)", which can be used through a website to search for expression levels of individual isoforms in tumor cells.
37-50 115
Abstract
Modern experimental technologies in molecular biology make it possible to reconstruct different types of biological networks, including gene and metabolic networks, interatomic networks, gene coexpression networks, disease networks, etc. This article presents a software tool for reconstructing structural random graph models of biological networks whose structural regularities coincide with those of the initial biological network. Such structural models can be used to test various statistical hypotheses on networks, to study the influence of structural regularities in biological networks on their function, and so on. Our tool generates structural random graph models with the following fixed characteristics: the distribution of vertex degrees, the joint distribution of vertex degrees, the average degree of neighboring vertices, the clustering coefficient, the clustering spectrum, the frequency of structural motifs of various sizes, etc. The developed system is based on a client-server architecture and consists of a Cytoscape plug-in and a remote computing service. The interaction between the client and the server is implemented through the gRPC framework using Protocol Buffers (a structured data serialization protocol). The system makes it possible to construct structural random graph models of given biological networks asynchronously using the Random Network Generator and GTrie Scanner software. The resulting structural model can be loaded for visualization and analysis in the Cytoscape package. The article also presents a computational experiment on reconstructing structural random graph models of a number of biological networks. An algorithm was constructed for estimating the computation time of structural models for biological networks of this kind.
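As an illustration of the simplest fixed characteristic listed above, the vertex degree distribution, the following minimal sketch randomizes an undirected graph by repeated double edge swaps. It is not the authors' tool and does not reproduce Random Network Generator or GTrie Scanner; the function names `degree_preserving_rewire` and `degrees` and the toy ring network are assumptions made for this example.

```python
import random


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph while keeping every vertex degree.

    Hypothetical helper: repeatedly picks two edges (a, b) and (c, d) and
    replaces them with (a, d) and (c, b) when that creates neither a
    self-loop nor a duplicate edge.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = 0
    attempts = 0
    # The attempt cap guarantees termination even if few swaps are possible.
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if len({a, b, c, d}) < 4:
            continue  # edges share a vertex: swap would be a no-op or a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def degrees(edges):
    """Return a dict mapping each vertex to its degree."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


# Toy input (made up for illustration): a ring of 8 vertices.
ring = [(k, (k + 1) % 8) for k in range(8)]
rewired = degree_preserving_rewire(ring, n_swaps=20)
```

Each accepted swap removes one edge from and adds one edge to each of the four vertices involved, so the degree sequence is unchanged. Fixing the richer characteristics named in the abstract (clustering spectrum, motif frequencies) would need additional acceptance checks that this sketch does not attempt.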
51-63 49
Abstract
The analysis of gene transcription regulation based on data from modern high-throughput sequencing technologies is a pressing task in bioinformatics. It requires the development of new computer tools, including supercomputer applications. We consider the problems of processing genome-wide ChIP-seq profiles to detect transcription factor binding sites, determining the peaks of such profiles, and searching for binding sites in the nucleotide sequences of the peaks. Computer programs have been developed to analyze the location of binding sites in the genome relative to gene regions, to calculate clusters of such sites, and to visualize their positions in the genome. Clusters of transcription factor binding sites in the human genome have been calculated using the Cistrome database. We have calculated co-occurrence matrices for pairs of binding sites of different transcription factors in the genome for various types of tissues and cells. A computational experiment on generating random clusters in the genome was carried out, together with an assessment of the occurrence of large clusters for experimentally obtained transcription factor binding sites in the human genome. The patterns of occurrence of binding sites of pluripotency factors in embryonic stem cells were described. The developed software is available from the authors on request.
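The clustering and co-occurrence steps described above can be sketched as follows. The abstract does not state the clustering criterion, so this example assumes a simple one: sites on one chromosome belong to the same cluster when neighboring positions are at most `max_gap` base pairs apart. The function names, the `max_gap` parameter, and the coordinates are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def cluster_sites(positions, max_gap):
    """Group sorted positions into clusters of neighbors at most max_gap apart."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def cooccurrence(sites_by_factor, max_gap):
    """Count, for each pair of factors, the clusters that contain both."""
    labelled = sorted(
        (pos, factor)
        for factor, positions in sites_by_factor.items()
        for pos in positions
    )
    counts = Counter()
    current = set()   # factors seen in the cluster being built
    last = None       # position of the previous site
    for pos, factor in labelled:
        if last is not None and pos - last > max_gap:
            counts.update(combinations(sorted(current), 2))
            current = set()
        current.add(factor)
        last = pos
    counts.update(combinations(sorted(current), 2))  # flush the final cluster
    return counts


# Made-up coordinates on a single chromosome, for illustration only.
sites = {"OCT4": [100, 5000], "SOX2": [150, 9000], "NANOG": [180]}
print(cluster_sites([100, 150, 180, 5000, 9000], max_gap=100))
print(cooccurrence(sites, max_gap=100))
```

The pair counts can be arranged into a symmetric matrix indexed by factor name. Comparing them with counts obtained from randomly placed sites is one way to judge whether large clusters occur more often than expected.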

Information Technologies

64-73 57
Abstract
The article is devoted to the development of a fertigation management system. The system feeds fertilizers automatically, taking into account the pH and EC (electrical conductivity) values obtained from the respective sensors. To solve this problem, a mineralization module consisting of two submodules has been developed: one for measuring and controlling pH and one for measuring and regulating EC. The data are transferred to a microcontroller for further system management actions. Decision-making is performed by an intelligent system based on fuzzy logic.
74-86 133
Abstract
The paper describes a new method of automatic text summarization. Based on this method, a system has been created that produces summaries of scientific and technical texts and determines their topics. The summarization process consists of five main steps: preprocessing, transformation, weight evaluation, sentence selection, and smoothing. The proposed method builds the summary from important sentences of the original document. The importance of sentences is partially determined by rhetorical analysis, which is performed using discourse markers and connectors. Keywords, multiword terms, and some special words that are often found in scientific and technical texts are also taken into account. We used additive regularization of topic models (ARTM) to extract keywords and discover the topics.
87-104 66
Abstract
Scientific research produces a lot of digital data that should be carefully gathered and stored for further use: processing, analysis, and publication. Building an e-infrastructure for this purpose is one of the most topical problems of IT (or digital) curation of science. Starting from three data-processing problems in physiology, we are developing an information system for automating the gathering, storage, and analysis of data. Problems encountered in developing such a system are examined and analyzed, along with existing approaches and software solutions related to these problems. Based on the results of this analysis, a number of models and mechanisms for solving the encountered problems are proposed. The developed solutions include models and mechanisms for collecting and storing research data, a model that describes and formalizes data processing scenarios, and models and mechanisms for processing collected data in a distributed computer system. As a result, an architecture for a computer system for collecting, storing, and processing research data is presented. The system is proposed as a tool for solving a wide spectrum of problems in scientific research that involve collecting and multi-step processing of various kinds of data.
105-112 72
Abstract
This article describes contactless systems and interfaces and the main principles of working with these technologies. Such systems can simplify interaction with a computer interface for users with disabilities. The features and advantages of natural interfaces and of systems based on gesture control are also presented. The stages of forming a basic gesture dictionary for further use in a contactless interface are described in detail. A Microsoft Kinect device was reviewed as additional hardware for recognizing such hand gestures more accurately.
113-125 50
Abstract
The paper is devoted to the development of model-theoretic methods for extracting concept definitions from natural-language texts. The information extracted from texts is represented as statements in a Description Logic (DL) language by transformation through fragments of atomic diagrams of algebraic systems. Such a representation is more expressive than that of algorithms in which information is represented in a database or by expressions in a formal language (for example, SQL).
126-132 54
Abstract
The article explores three approaches to storing information about the provision of textbooks for the educational process under different types of integration between an Integrated Library System and a university information system: inside the Integrated Library System in RUSMARC format, in a relational database, and with OLAP technology. The advantages and disadvantages of each approach are analyzed.
133-144 67
Abstract
The purpose of the article is to solve the problem of adapting large tree-structured and linear menus of mobile and internet services to various types of users according to their interests, social status, and other parameters. A software system was created that builds an optimal menu of services for classes of users grouped by physical and socio-economic parameters. The menu is adapted with a modified algorithm for constructing an optimal graph of a USSD menu. An ontological approach was used for the formal representation of the concepts of the subject area and for the extraction, representation, and processing of knowledge. User models describing users' needs, aims, and interests were used to adapt the interfaces. User behavior was formalized with the help of an ontological model of mobile and internet services. Each user can be assigned to a particular model based on their physical and social parameters. The program that adapts the menus contains two modules: a module that obtains call frequencies through ontology queries and a menu graph optimization module. The menu optimization algorithm works with the DOT graph description language.
145-158 41
Abstract
It is known from information theory that data compression methods can be used to forecast stationary processes. In this paper a compression-based algorithm for time series forecasting is proposed and an empirical study of its accuracy is carried out. The algorithm can operate with arbitrary data compression methods. During the steps of the algorithm, the values predicted by different methods are combined, and the method with the best compression ratio for the series has the greatest impact on the end result. The algorithm can be used to forecast time series with discrete and continuous alphabets. Existing methods of time series preprocessing can be used to improve the accuracy of the forecast. The empirical study of the efficiency of the proposed algorithm was conducted on time series from the M3 Competition and on the T-index series. Well-known archivers were used to generate the forecasts. The results of the calculations showed that the proposed method has relatively high accuracy and speed.
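The idea of combining compressors so that the best one dominates can be sketched for a discrete alphabet as follows. This is a simplified illustration, not the algorithm of the paper: it assumes that each candidate next symbol is scored by 2 raised to minus the compressed length (in bits) of the history extended by that symbol, summed over compressors and then normalized. The function name `forecast_next` and the choice of zlib, bz2, and lzma as stand-ins for the archivers are assumptions.

```python
import bz2
import lzma
import zlib

# Standard-library compressors used as stand-ins for "well-known archivers".
COMPRESSORS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def forecast_next(series, alphabet):
    """Return a probability for each candidate next symbol of an ASCII series."""
    history = series.encode("ascii")
    # Code length in bits of history + candidate symbol, for every compressor.
    bits = {}
    for name, compress in COMPRESSORS.items():
        for symbol in alphabet:
            extended = history + symbol.encode("ascii")
            bits[name, symbol] = 8 * len(compress(extended))
    # Subtract the shortest code length so the exponents stay near zero.
    shortest = min(bits.values())
    scores = {
        symbol: sum(2.0 ** (shortest - bits[name, symbol]) for name in COMPRESSORS)
        for symbol in alphabet
    }
    total = sum(scores.values())
    return {symbol: score / total for symbol, score in scores.items()}


probabilities = forecast_next("ab" * 200, alphabet="ab")
print(probabilities)
```

Because the score decays exponentially with code length, the compressor that encodes the series most compactly contributes almost all of the weight, which matches the behavior described in the abstract. General-purpose compressors report lengths in whole bytes, so candidates often tie on short series. A continuous-valued series would first have to be quantized into a finite alphabet.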


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-7900 (Print)
ISSN 2410-0420 (Online)