Vestnik NSU. Series: Information Technologies

AUTOMATIC EXTRACTION OF FORMAL LATTICES FROM MEDICAL TEXTS BASED ON THE COMBINATION OF THE FORMAL CONCEPT ANALYSIS AND BOOTSTRAPPING TECHNOLOGIES

https://doi.org/10.25205/1818-7900-2018-16-4-140-152

Abstract

The article considers a new method of extracting concepts from domain-specific texts that combines formal concept analysis with bootstrapping, an information retrieval technique. Formal concept analysis is a powerful means of automatically deriving domain concepts, but it is designed for high-quality input data without missing values or inaccuracies. Obtaining such datasets directly from texts is difficult because text corpora are highly sparse. It therefore seems promising to improve the quality of the input data with bootstrapping, a technique that provides an intelligent search for fragmented information on the Internet. In this paper, we describe the steps for implementing automatic concept extraction from medical texts by filling in the blanks in highly sparse term co-occurrence matrices. The input data for formal concept analysis is represented as an object-feature table that reflects the distribution of attributes over the objects of the domain. The purpose of this paper is to show that, with a proper selection of initial search patterns, bootstrapping that uses open Internet resources as sources of knowledge becomes an effective tool for supporting conceptual modeling.
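
To make the object-feature table concrete, the following is a minimal sketch of how formal concepts can be derived from such a table. It is not the authors' implementation: the table contents, the names `CONTEXT`, `common_attributes`, `objects_having`, and `formal_concepts`, and the naive subset enumeration are all assumptions made for illustration. The terms in the table are invented and are not taken from the paper's corpus.

```python
from itertools import combinations

# Hypothetical object-feature table (formal context): object -> set of attributes.
# The terms are invented for illustration only.
CONTEXT = {
    "influenza": {"fever", "cough"},
    "bronchitis": {"cough"},
    "measles": {"fever", "rash"},
}


def common_attributes(objects, context):
    """Return the attributes shared by every object in `objects`."""
    # Start from all attributes so that the empty object set maps to the full set.
    result = set().union(*context.values())
    for obj in objects:
        result &= context[obj]
    return result


def objects_having(attributes, context):
    """Return the objects that possess every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects.

    This is a naive, exponential-time enumeration, suitable only for tiny tables.
    """
    concepts = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    for extent, intent in formal_concepts(CONTEXT):
        print(sorted(extent), sorted(intent))
```

In this sketch, a single missing cell changes the result: if the hypothetical "cough" entry for "influenza" were absent, the concept grouping "influenza" with "bronchitis" would disappear. That sensitivity is why the abstract stresses filling in blanks before applying formal concept analysis.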

About the Authors

A. B. Nugumanova
Sarsen Amanzholov East-Kazakhstan State University
Kazakhstan


E. M. Bayburin
Sarsen Amanzholov East-Kazakhstan State University
Kazakhstan


M. E. Mansurova
Al-Farabi Kazakh National University
Kazakhstan


V. B. Barakhnin
Institute of Computational Technologies SB RAS; Novosibirsk State University
Russian Federation



For citations:


Nugumanova A.B., Bayburin E.M., Mansurova M.E., Barakhnin V.B. AUTOMATIC EXTRACTION OF FORMAL LATTICES FROM MEDICAL TEXTS BASED ON THE COMBINATION OF THE FORMAL CONCEPT ANALYSIS AND BOOTSTRAPPING TECHNOLOGIES. Vestnik NSU. Series: Information Technologies. 2018;16(4):140-152. (In Russ.) https://doi.org/10.25205/1818-7900-2018-16-4-140-152


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-7900 (Print)
ISSN 2410-0420 (Online)