Vestnik NSU. Series: Information Technologies

Scientific Texts Classification by Speciality with Machine Learning Methods

https://doi.org/10.25205/1818-7900-2022-20-2-27-36

Abstract

This article investigates the classification of scientific texts by specialty using machine learning (ML) and deep learning methods. An experimental study was conducted on a text classification method that accounts for preprocessing and for the specific features of scientific texts, with the aim of improving the accuracy and speed of classification. Methods for indexing and classifying texts by specialty were analyzed on a set of scientific texts. The quality of the ML algorithms was evaluated and compared, and results were obtained for classifying dissertations with machine learning methods using the existing training set of scientific materials.
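
As a rough illustration of the kind of pipeline the abstract describes (preprocessing, indexing, then classification by specialty), the following is a minimal sketch, not the authors' implementation. It assumes TF-IDF weighting for indexing and a k-nearest-neighbors classifier with cosine similarity; the abstract does not name the algorithms used. The toy corpus, the specialty labels, and all function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, keep alphabetic runs, drop tokens shorter than 3 letters.
    A stand-in for real preprocessing (stemming, stop words, etc.)."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return [t for t in cleaned.split() if len(t) > 2]

def fit_idf(docs_tokens):
    """Compute a smoothed inverse document frequency for every training term."""
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}

def tfidf(tokens, idf):
    """Build an L2-normalized sparse TF-IDF vector; unseen terms are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def knn_predict(query_vec, train_vecs, train_labels, k=3):
    """Majority vote among the k most similar training documents."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query_vec, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy training set: (text, specialty) pairs.
TRAIN = [
    ("neural network training gradient descent algorithm", "computer science"),
    ("database query index algorithm software", "computer science"),
    ("compiler software programming language parser", "computer science"),
    ("patient clinical trial treatment disease", "medicine"),
    ("disease diagnosis patient therapy hospital", "medicine"),
    ("surgery patient treatment outcome hospital", "medicine"),
]

train_tokens = [tokenize(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
idf = fit_idf(train_tokens)
train_vecs = [tfidf(tokens, idf) for tokens in train_tokens]

def predict(text, k=3):
    """Classify a new text by specialty using the toy training set."""
    return knn_predict(tfidf(tokenize(text), idf), train_vecs, train_labels, k)

print(predict("software algorithm for parser"))
print(predict("hospital treatment of the patient"))
```

Comparing several classifiers, as the study does, would amount to swapping `knn_predict` for another decision rule while keeping the same TF-IDF vectors, and then measuring accuracy and run time on a held-out set.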

About the Authors

B. Inomov
Tajik Technical University named after academician M. S. Osimi
Tajikistan

Behruz B. Inomov, Ph.D., Senior Lecturer, Digital Economy Department, Polytechnic Institute of the Tajik Technical University named after academician M. S. Osimi

Khujand



M. Tropmann-Frick
Hamburg University of Applied Sciences
Germany

Marina Tropmann-Frick, Professor of Data Science, Department of Computer Science, University of Applied Sciences (HAW Hamburg)

Hamburg




For citations:


Inomov B., Tropmann-Frick M. Scientific Texts Classification by Speciality with Machine Learning Methods. Vestnik NSU. Series: Information Technologies. 2022;20(2):27-36. (In Russ.) https://doi.org/10.25205/1818-7900-2022-20-2-27-36


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-7900 (Print)
ISSN 2410-0420 (Online)