Vestnik NSU. Series: Information Technologies

Extracting Semantic Relations from the Texts of Scientific Articles

https://doi.org/10.25205/1818-7900-2022-20-3-65-76

Abstract

The number of scientific publications available as electronic text is constantly growing. As a result, tasks related to processing the text of scientific articles have become especially relevant. This paper addresses the task of extracting semantic relations between entities from the texts of scientific articles in Russian, where scientific terms are treated as entities. Relation extraction can be useful in specialized areas such as search and question-answering systems, as well as in the compilation of ontologies. We created a corpus of scientific texts consisting of 136 abstracts of scientific articles in Russian, in which 353 relations of the following types were annotated: USAGE, ISA, TOOL, SYNONYMS, PART_OF, CAUSE. This corpus was used to train machine learning models. In addition, we implemented an algorithm for automatic semantic relation extraction and tested it on the existing RuSERRC corpus. The algorithm is built on the BERT neural network model. We conducted a number of experiments using vectors derived from different language models, as well as two neural network architectures. The developed tool and the annotated corpus are publicly available and can be useful for other researchers.
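
As an illustration of the kind of data the abstract describes, the sketch below shows one possible way to represent an annotated relation between two terms and to prepare a sentence for a BERT-style relation classifier by wrapping the two entities in marker tokens. Only the six relation labels come from the abstract. The class names, the character-offset span format, the `[E1]`/`[E2]` markers, and the `mark_entities` helper are assumptions made for illustration and are not taken from the paper, its corpus format, or its released tool.

```python
from dataclasses import dataclass
from enum import Enum


class RelationType(Enum):
    """The six relation labels listed in the abstract."""
    USAGE = "USAGE"
    ISA = "ISA"
    TOOL = "TOOL"
    SYNONYMS = "SYNONYMS"
    PART_OF = "PART_OF"
    CAUSE = "CAUSE"


@dataclass(frozen=True)
class Span:
    """Hypothetical entity span given as character offsets."""
    start: int  # inclusive
    end: int    # exclusive


@dataclass(frozen=True)
class Relation:
    """Hypothetical annotation record: two term spans and a label."""
    head: Span
    tail: Span
    label: RelationType


def mark_entities(text: str, rel: Relation) -> str:
    """Wrap the head and tail terms in marker tokens.

    The marker strings are an assumption. The spans are assumed
    not to overlap or touch each other.
    """
    inserts = [
        (rel.head.start, "[E1] "),
        (rel.head.end, " [/E1]"),
        (rel.tail.start, "[E2] "),
        (rel.tail.end, " [/E2]"),
    ]
    # Insert from right to left so that earlier offsets stay valid.
    for pos, marker in sorted(inserts, key=lambda p: p[0], reverse=True):
        text = text[:pos] + marker + text[pos:]
    return text


if __name__ == "__main__":
    sentence = "BERT is a neural network model"
    tail_text = "neural network model"
    h = sentence.index("BERT")
    t = sentence.index(tail_text)
    example = Relation(
        head=Span(h, h + len("BERT")),
        tail=Span(t, t + len(tail_text)),
        label=RelationType.ISA,
    )
    print(example.label.value, "|", mark_entities(sentence, example))
```

The marked string would then be tokenized and passed to the classifier, which predicts one of the six labels for the marked pair. How the authors actually encode entity positions is not stated in the abstract.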

About the Authors

O. Yu. Tikhobaeva
Novosibirsk State University
Russian Federation

Olga Yur. Tikhobaeva, Student

Novosibirsk

E. P. Bruches
A. P. Ershov Institute of Informatics Systems SB RAS; Novosibirsk State University
Russian Federation

Elena P. Bruches, Junior Researcher; Senior Lecturer

Novosibirsk

T. V. Batura
A. P. Ershov Institute of Informatics Systems SB RAS; Novosibirsk State University
Russian Federation

Tatiana Viktorovna Batura, PhD in Physics and Mathematics, Associate Professor, Head of Laboratory, Associate Professor

Novosibirsk


For citations:


Tikhobaeva O.Yu., Bruches E.P., Batura T.V. Extracting Semantic Relations from the Texts of Scientific Articles. Vestnik NSU. Series: Information Technologies. 2022;20(3):65-76. (In Russ.) https://doi.org/10.25205/1818-7900-2022-20-3-65-76

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-7900 (Print)
ISSN 2410-0420 (Online)