Paper2vec and Cite2vec Methods for Analyzing Collections of Scientific Publications
https://doi.org/10.25205/1818-7900-2021-19-3-61-69
Abstract
Visualizations help researchers understand collections of scientific publications, and various text-analysis methods can be used to build them. This article discusses two methods, Paper2vec and Cite2vec, that derive vector representations of documents from citation information. To demonstrate how these methods work and how they can be applied, visualizations were developed and are described in this paper.
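As a rough illustration of what "vector representations of documents using citation information" can mean, the sketch below builds a vector for each cited document from the words that surround its citations and compares documents by cosine similarity. This is a simple count-based stand-in, not the Paper2vec or Cite2vec training procedure itself; the toy data, the document IDs, and the function names are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: (cited document ID, words surrounding the citation).
citation_contexts = [
    ("doc_A", "word embeddings learned with neural networks".split()),
    ("doc_A", "neural word embeddings for text".split()),
    ("doc_B", "embeddings of words from neural language models".split()),
    ("doc_C", "graph layout for network visualization".split()),
]

def build_document_vectors(contexts):
    """Accumulate context-word counts per cited document.

    Assumption: a count-based substitute for learned embeddings.
    """
    vectors = {}
    for doc_id, words in contexts:
        vectors.setdefault(doc_id, Counter()).update(words)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

doc_vectors = build_document_vectors(citation_contexts)

# Documents cited in similar contexts end up with similar vectors.
sim_ab = cosine(doc_vectors["doc_A"], doc_vectors["doc_B"])
sim_ac = cosine(doc_vectors["doc_A"], doc_vectors["doc_C"])
```

In this toy setup, doc_A and doc_B are cited in similar contexts, so sim_ab comes out higher than sim_ac. Vectors of this kind could then be projected to two dimensions, for example with t-SNE, to produce a map of the collection.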
About the Author
N. I. Tikhonov
Russian Federation
Nikolay I. Tikhonov, Graduate Student
Novosibirsk
References
1. Apanovich Z. V. Evolution of Visualization Methods for Research Publication Collections. Elektronnye biblioteki, 2018, vol. 21, no. 1, pp. 2–42. (in Russ.)
2. Mikolov T., Sutskever I., Chen K., Corrado G. S., Dean J. Distributed Representations of Words and Phrases and Their Compositionality. Advances in Neural Information Processing Systems, 2013, vol. 26, pp. 3111–3119.
3. Pennington J., Socher R., Manning C. D. GloVe: Global Vectors for Word Representation. In: Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2014), 2014, pp. 1532–1543. DOI 10.3115/v1/D14-1162
4. Bojanowski P., Grave E., Joulin A., Mikolov T. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 2017, vol. 5, pp. 135–146. DOI 10.1162/tacl_a_00051
5. Peters M., Neumann M., Iyyer M., Gardner M., Clark C., Lee K., Zettlemoyer L. Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018, vol. 1, pp. 2227–2237. DOI 10.18653/v1/N18-1202
6. Tian H., Zhuo H. H. Paper2vec: Citation-Context Based Document Distributed Representation for Scholar Recommendation. ArXiv. abs/1703.06587, 2017.
7. Berger M., McDonough K., Seversky L. M. Cite2vec: Citation-Driven Document Exploration via Word Embeddings. IEEE Transactions on Visualization and Computer Graphics, 2017, vol. 23, no. 1, pp. 691–700. DOI 10.1109/TVCG.2016.2598667
8. Maaten L. van der, Hinton G. Visualizing data using t-SNE. Journal of Machine Learning Research, 2008, vol. 9, pp. 2579–2605.
For citations:
Tikhonov N.I. Paper2vec and Cite2vec Methods for Analyzing Collections of Scientific Publications. Vestnik NSU. Series: Information Technologies. 2021;19(3):61-69. (In Russ.) https://doi.org/10.25205/1818-7900-2021-19-3-61-69