
Vestnik NSU. Series: Information Technologies


DEVELOPING THE SYSTEM FOR AUTOMATIC SUMMARIZATION OF SCIENTIFIC TEXTS

https://doi.org/10.25205/1818-7900-2018-16-3-74-86

Abstract

The paper describes a new method of automatic text summarization and a system built on it that produces summaries of scientific and technical texts and determines their topics. The summarization process consists of five main steps: preprocessing, transformation, weight evaluation, sentence selection, and smoothing. The proposed method builds the summary from the important sentences of the original document. Sentence importance is determined in part by rhetorical analysis, which relies on discourse markers and connectors. Keywords, multiword terms, and certain special words that occur frequently in scientific and technical texts are also taken into account. We used additive regularization of topic models (ARTM) to extract keywords and discover the topics.
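
The five steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the step internals, so the marker list, the weighting formula, the tokenization used as "transformation", and the order-restoring pass used as "smoothing" are all assumptions made for illustration. The keywords are passed in by hand here, whereas the paper obtains them with ARTM.

```python
import re

# Hypothetical discourse markers; the paper's actual list is not given in the abstract.
MARKERS = ("in conclusion", "we propose", "the results show", "therefore")


def split_sentences(text):
    # Step 1, preprocessing (simplified): split after sentence-final punctuation.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Step 2, transformation (assumed form): lowercase word tokens.
    return re.findall(r"[a-z]+", sentence.lower())


def weight(sentence, keywords):
    # Step 3, weight evaluation (assumed formula): keyword hits plus
    # a doubled bonus for each discourse marker found in the sentence.
    keyword_hits = sum(1 for t in tokenize(sentence) if t in keywords)
    lowered = sentence.lower()
    marker_hits = sum(1 for m in MARKERS if m in lowered)
    return keyword_hits + 2.0 * marker_hits


def summarize(text, keywords, k=2):
    sentences = split_sentences(text)
    scored = [(weight(s, keywords), i) for i, s in enumerate(sentences)]
    # Step 4, sentence selection: keep the k highest-weighted sentences,
    # breaking ties in favor of the earlier sentence.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    # Step 5, smoothing (stand-in): restore the original sentence order
    # so the summary reads in the same sequence as the source.
    chosen = sorted(i for _, i in top)
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    sample = (
        "Topic models are popular. "
        "We propose a new summarization method. "
        "The weather was nice. "
        "The results show that the summarization method works."
    )
    print(summarize(sample, {"summarization", "method"}, k=2))
```

In this sketch a sentence scores highly when it contains both keywords and a discourse marker, which mirrors the abstract's statement that importance is only partially determined by rhetorical analysis.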

About the Authors

T. V. Batura
Novosibirsk State University; A. P. Ershov Institute of Informatics Systems SB RAS
Russian Federation


A. M. Bakiyeva
Novosibirsk State University
Russian Federation




For citations:


Batura T.V., Bakiyeva A.M. DEVELOPING THE SYSTEM FOR AUTOMATIC SUMMARIZATION OF SCIENTIFIC TEXTS. Vestnik NSU. Series: Information Technologies. 2018;16(3):74-86. (In Russ.) https://doi.org/10.25205/1818-7900-2018-16-3-74-86



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-7900 (Print)
ISSN 2410-0420 (Online)