Extracting Semantic Relations from the Texts of Scientific Articles
https://doi.org/10.25205/1818-7900-2022-20-3-65-76
Abstract
Nowadays, the number of scientific publications available as electronic text is constantly growing. As a result, tasks related to processing the texts of scientific articles are becoming increasingly relevant. This paper addresses the task of extracting semantic relations between entities from the texts of scientific articles in Russian, where scientific terms are treated as entities. Relation extraction is useful in a number of specialized applications, such as search and question-answering systems, as well as in ontology construction. In this work, we created a corpus of 136 abstracts of scientific articles in Russian, annotated with 353 relations of the following types: USAGE, ISA, TOOL, SYNONYMS, PART_OF, CAUSE. This corpus was used to train machine learning models. In addition, we implemented an algorithm for automatic semantic relation extraction based on the BERT neural network model and tested it on the existing RuSERRC corpus. We conducted a number of experiments using vectors derived from different language models, as well as two neural network architectures. The developed tool and the annotated corpus are publicly available and may be useful to other researchers.
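The abstract does not spell out how term pairs are presented to BERT. One common approach for BERT-based relation classification is to wrap the two terms of a candidate pair in marker tokens and classify every ordered term pair, including pairs with no relation. The following is a minimal sketch of that preprocessing step under stated assumptions: the marker strings `[E1]`/`[E2]`, the extra `NO_RELATION` class, the offset-based `Term` type and the helpers `find_term`, `mark_entities` and `build_examples` are all hypothetical, not the authors' code or corpus format. The example sentence is English for readability; because the code works on character offsets, it applies equally to Russian text.

```python
from dataclasses import dataclass
from itertools import permutations

# Relation types annotated in the corpus (listed in the abstract).
RELATION_TYPES = ("USAGE", "ISA", "TOOL", "SYNONYMS", "PART_OF", "CAUSE")
# Assumption: an extra label for term pairs with no annotated relation.
NO_RELATION = "NO_RELATION"


@dataclass(frozen=True)
class Term:
    """A scientific term as character offsets [start, end) in the text (hypothetical format)."""
    start: int
    end: int


def find_term(text: str, surface: str) -> Term:
    """Return the offsets of the first occurrence of `surface` in `text`."""
    start = text.find(surface)
    if start < 0:
        raise ValueError(f"term not found: {surface!r}")
    return Term(start, start + len(surface))


def mark_entities(text: str, head: Term, tail: Term) -> str:
    """Wrap the head term in [E1] ... [/E1] and the tail term in [E2] ... [/E2]."""
    if head.start < tail.end and tail.start < head.end:
        raise ValueError("overlapping terms are not supported in this sketch")
    # Emit the two spans in textual order, whichever of them is the head.
    (a, la), (b, lb) = sorted([(head, "E1"), (tail, "E2")], key=lambda p: p[0].start)
    return (
        text[:a.start] + f"[{la}] " + text[a.start:a.end] + f" [/{la}]"
        + text[a.end:b.start] + f"[{lb}] " + text[b.start:b.end] + f" [/{lb}]"
        + text[b.end:]
    )


def build_examples(text, terms, gold):
    """Turn every ordered pair of distinct terms into a (marked_text, label) example.

    `gold` maps (head_index, tail_index) to one of RELATION_TYPES;
    unannotated pairs are labelled NO_RELATION.
    """
    for label in gold.values():
        if label not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {label}")
    examples = []
    for i, j in permutations(range(len(terms)), 2):
        marked = mark_entities(text, terms[i], terms[j])
        examples.append((marked, gold.get((i, j), NO_RELATION)))
    return examples


if __name__ == "__main__":
    sentence = "BERT is a neural network model."
    terms = [find_term(sentence, "BERT"), find_term(sentence, "neural network model")]
    # Illustrative gold annotation: (BERT, neural network model) is an ISA pair.
    for marked, label in build_examples(sentence, terms, {(0, 1): "ISA"}):
        print(label, "|", marked)
```

Running the demo prints one example per ordered pair: `ISA | [E1] BERT [/E1] is a [E2] neural network model [/E2].` and `NO_RELATION | [E2] BERT [/E2] is a [E1] neural network model [/E1].` Classifying ordered pairs lets a direction-sensitive relation such as ISA be distinguished from its reverse. In a full pipeline the marker strings would also have to be registered as special tokens with the BERT tokenizer; which pair representation and which two architectures the authors actually used is not stated in the abstract.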
About the Authors
O. Yu. Tikhobaeva
Russian Federation
Olga Yur. Tikhobaeva, Student
Novosibirsk
E. P. Bruches
Russian Federation
Elena P. Bruches, Junior Researcher; Senior Lecturer
Novosibirsk
T. V. Batura
Russian Federation
Tatiana Viktorovna Batura, PhD in Physics and Mathematics, Associate Professor, Head of Laboratory, Associate Professor
Novosibirsk
For citations:
Tikhobaeva O.Yu., Bruches E.P., Batura T.V. Extracting Semantic Relations from the Texts of Scientific Articles. Vestnik NSU. Series: Information Technologies. 2022;20(3):65-76. (In Russ.) https://doi.org/10.25205/1818-7900-2022-20-3-65-76