Development of Chatbots to Support Web Site Content Search Based on Thematic and Genre Characteristics

V. D. Rublev; E. A. Sidorova

doi:10.25205/1818-7900-2021-19-4-50-66

Abstract

The paper considers an approach to creating intelligent assistants in the form of chatbots that support information search based on preliminary genre and thematic clustering of website content. The approach addresses the tasks of finding the required information, providing information support to the user, and organizing feedback to improve search quality. A distinctive feature of the approach is the use of genre models developed for a given type of resource (educational, informational, etc.), on the basis of which the content of a particular site is structured by genre. The resulting genre structures make it possible to determine more accurately the boundaries of the thematic clusters related to the topic of the user's search query. To provide feedback, a simple script has been developed that not only clarifies the request but also implicitly collects information about what exactly the user found unsatisfactory in the search output. An experimental study was conducted on the Telegram platform, and the results were compared with those of the Yandex search engine.
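The general idea of narrowing a search by genre before matching by topic can be illustrated with a minimal sketch. It is not the authors' implementation: the page records, the genre labels, and the function names below are hypothetical, a hand-assigned genre label stands in for genre structuring, and a plain term-frequency cosine ranking stands in for thematic clustering.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic whitespace-separated tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(pages, query, genre=None, top_k=3):
    """Rank pages by similarity to the query, optionally within one genre.

    `pages` is a list of dicts with the hypothetical keys
    'url', 'genre', and 'text'.
    """
    q = Counter(tokenize(query))
    # Genre filter: a stand-in for restricting the search to one genre segment.
    candidates = [p for p in pages if genre is None or p["genre"] == genre]
    scored = [(cosine(q, Counter(tokenize(p["text"]))), p["url"]) for p in candidates]
    # Drop pages that share no terms with the query.
    scored = [(s, u) for s, u in scored if s > 0]
    # Highest score first; ties broken by URL for a stable order.
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [u for _, u in scored[:top_k]]


# Hypothetical site content, invented for illustration only.
PAGES = [
    {"url": "/admission/rules", "genre": "regulation",
     "text": "admission rules for master programs"},
    {"url": "/news/admission-day", "genre": "news",
     "text": "open day news about admission"},
    {"url": "/courses/nlp", "genre": "course",
     "text": "course on natural language processing"},
]

all_hits = search(PAGES, "admission rules")
news_hits = search(PAGES, "admission rules", genre="news")
```

In this sketch, a clarifying answer from the user (for example, that a news item rather than a regulation was wanted) would simply be passed as the `genre` argument on the next call, which loosely mirrors how a feedback step could narrow the output.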

Keywords

intelligent assistant, search engine, information search, genre model of the site, genre segmentation, thematic clustering

About the Authors

V. D. Rublev

Novosibirsk State University
Russian Federation

Vladislav D. Rublev, Master’s Student

Novosibirsk

E. A. Sidorova

A. P. Ershov Institute of Informatics Systems of the Siberian Branch of the Russian Academy of Sciences
Russian Federation

Elena A. Sidorova, Candidate of Sciences (Physics and Mathematics), Senior Researcher

Novosibirsk

For citations:

Rublev V.D., Sidorova E.A. Development of Chatbots to Support Web Site Content Search Based on Thematic and Genre Characteristics. Vestnik NSU. Series: Information Technologies. 2021;19(4):50-66. (In Russ.) https://doi.org/10.25205/1818-7900-2021-19-4-50-66

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 1818-7900 (Print)
ISSN 2410-0420 (Online)
