Vestnik NSU. Series: Information Technologies

Using TXM Platform of Corpus Analysis for Text Analysis of Social Media

https://doi.org/10.25205/1818-7900-2023-21-2-29-38

Abstract

When a graph of interacting objects is built from data imported from social networks and instant messengers, text data are stored as vertex attributes. This paper describes a methodology for studying those texts based on corpus analysis procedures. Its purpose is to test the tools provided by the TXM software for comparative analysis of the texts of communities revealed on the graph of interacting objects. The authors propose a method for assessing the quality of implicit community detection on a graph obtained by importing data from the network of Telegram channels.

About the Authors

A. I. Fokina
HSE University
Russian Federation

Alina I. Fokina, postgraduate student

Moscow



A. A. Chepovskiy
HSE University
Russian Federation

Alexander A. Chepovskiy, Ph.D. (mathematics), Associate Professor

Moscow



A. M. Chepovskiy
HSE University; Federal Research Center “Informatics and Management”
Russian Federation

Andrey M. Chepovskiy, Dr Sc. (Eng)

Moscow



For citations:


Fokina A.I., Chepovskiy A.A., Chepovskiy A.M. Using TXM Platform of Corpus Analysis for Text Analysis of Social Media. Vestnik NSU. Series: Information Technologies. 2023;21(2):29-38. (In Russ.) https://doi.org/10.25205/1818-7900-2023-21-2-29-38

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-7900 (Print)
ISSN 2410-0420 (Online)