Transparent Reduction of Dimension with Genetic Algorithm
https://doi.org/10.25205/1818-7900-2023-21-1-46-61
Abstract
There are domains, such as medicine and finance, where every transformation of the data must be transparent and interpretable. Dimension reduction is an important part of a preprocessing pipeline, but current algorithms for it are not transparent. In this work, we propose a genetic algorithm for transparent dimension reduction of numerical data. The algorithm constructs features in the form of expression trees built from a subset of the numerical features of the source data and common arithmetic operations. It is designed to maximize quality in binary classification tasks and to generate features that a human can explain, which is achieved by using human-interpretable operations during feature construction. Data transformed by the algorithm can also be used in visual analysis. A multi-criteria dynamic fitness function is provided to build features with high diversity.
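The abstract's core idea, evolving human-readable expression trees over source features, can be illustrated with a rough, self-contained sketch. This is not the paper's algorithm: the names, the mutation-only evolution loop, and the single-criterion fitness (absolute correlation of the constructed feature with the binary label) are all simplifying assumptions standing in for the multi-criteria dynamic fitness described above.

```python
import random
import numpy as np

# Human-interpretable arithmetic operations used as internal tree nodes.
OPS = {"+": np.add, "-": np.subtract, "*": np.multiply}

def random_tree(n_features, depth=2):
    # A leaf is the index of a source feature; an internal node is (op, left, right).
    if depth == 0 or random.random() < 0.3:
        return random.randrange(n_features)
    op = random.choice(list(OPS))
    return (op, random_tree(n_features, depth - 1), random_tree(n_features, depth - 1))

def evaluate(tree, X):
    # Compute the constructed feature for every row of X.
    if isinstance(tree, int):
        return X[:, tree]
    op, left, right = tree
    return OPS[op](evaluate(left, X), evaluate(right, X))

def to_infix(tree, names):
    # Render the tree as a readable expression, e.g. "(x0 * x1)".
    if isinstance(tree, int):
        return names[tree]
    op, left, right = tree
    return f"({to_infix(left, names)} {op} {to_infix(right, names)})"

def fitness(tree, X, y):
    # Single-criterion stand-in: |correlation| between the feature and the label.
    f = evaluate(tree, X)
    if np.std(f) == 0:
        return 0.0
    return abs(np.corrcoef(f, y)[0, 1])

def evolve(X, y, pop_size=30, generations=20):
    pop = [random_tree(X.shape[1]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, X, y), reverse=True)
        survivors = pop[: pop_size // 2]
        # Replenish with fresh random trees (mutation/crossover omitted for brevity).
        pop = survivors + [random_tree(X.shape[1]) for _ in survivors]
    return max(pop, key=lambda t: fitness(t, X, y))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = (X[:, 0] * X[:, 1] > 0).astype(float)  # label driven by a product feature
    best = evolve(X, y)
    print(to_infix(best, ["x0", "x1", "x2"]))  # a transparent constructed feature
```

Because every constructed feature is an explicit arithmetic expression over named source columns, the reduced representation stays inspectable, which is the property the abstract emphasizes.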
About the Author
N. A. Radeev, Russian Federation
Nikita A. Radeev, Master's student
Faculty of Information Technologies
General Informatics Department
Novosibirsk
References
1. M. H. ur Rehman, C. S. Liew, A. Abbas, P. P. Jayaraman, T. Y. Wah, S. U. Khan. Big Data Reduction Methods: A Survey. Data Sci. Eng., 2016, vol. 1, no. 4, p. 265–284, DOI: 10.1007/s41019-016-0022-0.
2. C. H. Yoon, R. Torrance, N. Scheinerman. Machine learning in medicine: should the pursuit of enhanced interpretability be abandoned? J. Med. Ethics, 2022, vol. 48, no. 9, p. 581–585, DOI: 10.1136/medethics-2020-107102.
3. P. Linardatos, V. Papastefanopoulos, S. Kotsiantis. Explainable AI: A Review of Machine Learning Interpretability Methods. Entropy, 2021, vol. 23, no. 1, Art. no. 1, DOI: 10.3390/e23010018.
4. A. J. Izenman. Introduction to manifold learning. Wiley Interdiscip. Rev. Comput. Stat., 2012, vol. 4, DOI: 10.1002/wics.1222.
5. H. Han, W. Li, J. Wang, G. Qin, X. Qin. Enhance Explainability of Manifold Learning. Neurocomputing, 2022, vol. 500, DOI: 10.1016/j.neucom.2022.05.119.
6. D. Elizondo, R. Birkenhead, M. Gámez, N. Rubio, E. Alfaro-Cortés. Linear separability and classification complexity. Expert Syst. Appl., 2012, vol. 39, p. 7796–7807, DOI: 10.1016/j.eswa.2012.01.090.
7. J. Koza, R. Poli. Genetic Programming. in Search Methodologies, 2005, p. 127–164. DOI: 10.1007/0-387-28356-0_5.
8. U.-M. O’Reilly, E. Hemberg. Genetic programming: a tutorial introduction. 2021, p. 453. DOI: 10.1145/3449726.3461394.
9. A. Vasuki. Genetic Programming. 2020, p. 61–76. DOI: 10.1201/9780429289071-5.
10. L. Kallel, B. Naudts, C. Reeves. Properties of Fitness Functions and Search Landscapes. 2000, DOI: 10.1007/978-3-662-04448-3_8.
11. M. Schmidt, H. Lipson. Distilling Free-Form Natural Laws from Experimental Data. Science, 2009, vol. 324, no. 5923, p. 81–85, DOI: 10.1126/science.1165893.
12. W. La Cava et al. Contemporary Symbolic Regression Methods and their Relative Performance. in Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 2021, vol. 1.
13. L. Sotto, P. Kaufmann, T. Atkinson, R. Kalkreuth, M. Basgalupp. Graph representations in genetic programming. Genet. Program. Evolvable Mach., 2021, vol. 22, DOI: 10.1007/s10710-021-09413-9.
14. P. Krtolica, P. Stanimirovic. Reverse Polish notation method. Int. J. Comput. Math., 2004, p. 273–284, DOI: 10.1080/00207160410001660826.
15. C. Ferreira. Gene Expression Programming: a New Adaptive Algorithm for Solving Problems. arXiv, 2001. DOI: 10.48550/arXiv.cs/0102027.
16. C. Ferreira. Gene Expression Programming in Problem Solving. In Soft Computing and Industry: Recent Applications. London: Springer, 2002, p. 635–653. DOI: 10.1007/978-1-4471-0123-9_54.
17. I. T. Jolliffe. Principal Component Analysis. Berlin: Springer, 1986, vol. 87, p. 41–64, DOI: 10.1007/b98835.
18. L. van der Maaten, G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 2008, vol. 9, p. 2579–2605.
19. B. Hosseini, B. Hammer. Interpretable Discriminative Dimensionality Reduction and Feature Selection on the Manifold. arXiv, arXiv:1909.09218, 2019. DOI: 10.48550/arXiv.1909.09218.
20. T. Uriot, M. Virgolin, T. Alderliesten, P. Bosman. On genetic programming representations and fitness functions for interpretable dimensionality reduction. arXiv, arXiv:2203.00528, 2022. DOI: 10.48550/arXiv.2203.00528.
21. A. Lensen, M. Zhang, B. Xue. Multi-Objective Genetic Programming for Manifold Learning: Balancing Quality and Dimensionality. Genet. Program. Evolvable Mach., 2020, vol. 21, no. 3, p. 399–431, DOI: 10.1007/s10710-020-09375-4.
22. M. Virgolin, T. Alderliesten, P. A. N. Bosman. On Explaining Machine Learning Models by Evolving Crucial and Compact Features. Swarm Evol. Comput., 2020, vol. 53, p. 100640, DOI: 10.1016/j.swevo.2019.100640.
23. A. Lensen, B. Xue, M. Zhang. Can Genetic Programming Do Manifold Learning Too? In Genetic Programming, Cham, 2019, p. 114–130. DOI: 10.1007/978-3-030-16670-0_8.
24. Q. Zhang, H. Li. MOEA/D: A Multiobjective Evolutionary Algorithm Based on Decomposition. IEEE Trans. Evol. Comput., 2007, vol. 11, no. 6, p. 712–731, DOI: 10.1109/TEVC.2007.892759.
25. S. Roweis, L. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 2000, vol. 290, p. 2323–2326, DOI: 10.1126/science.290.5500.2323.
26. B. K. Tripathy, S. Anveshrithaa, S. Ghela. Multidimensional Scaling (MDS). 2021, p. 41–51. DOI: 10.1201/9781003190554-6.
27. L. McInnes, J. Healy, J. Melville. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv, arXiv:1802.03426, 2020. DOI: 10.48550/arXiv.1802.03426.
28. C. Cortes, V. Vapnik. Support-vector networks. Mach. Learn., 1995, vol. 20, p. 273–297.
29. J. Vanschoren, J. N. van Rijn, B. Bischl, L. Torgo. OpenML: networked science in machine learning. ACM SIGKDD Explor. Newsl., 2014, vol. 15, no. 2, p. 49–60, DOI: 10.1145/2641190.2641198.
30. S. Moro, P. Cortez, R. Laureano. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. 2011.
31. I.-C. Yeh, K.-J. Yang, T.-M. Ting. Knowledge discovery on RFM model using Bernoulli sequence. Expert Syst. Appl., 2009, vol. 36, p. 5866–5871, DOI: 10.1016/j.eswa.2008.07.018.
32. K. P. Bennett, O. L. Mangasarian. Robust Linear Programming Discrimination of Two Linearly Inseparable Sets. Optim. Methods Softw., 1992, vol. 1, DOI: 10.1080/10556789208805504.
33. D. Dua, C. Graff. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences, 2019. [Online]. Available: http://archive.ics.uci.edu/ml
34. V. Sigillito, S. Wing, L. Hutton, K. Baker. Classification of radar returns from the ionosphere using neural networks. Johns Hopkins APL Tech. Dig. Appl. Phys. Lab., 1989, vol. 10.
35. I. Guyon, S. Gunn, A. Ben-Hur, G. Dror. Result Analysis of the NIPS 2003 Feature Selection Challenge. In Advances in Neural Information Processing Systems, 2004, vol. 17.
36. R. P. Gorman, T. Sejnowski. Analysis of hidden units in a layered network trained to classify sonar targets. Neural Netw., 1988, vol. 1, p. 75–89, DOI: 10.1016/0893-6080(88)90023-8.
For citations:
Radeev N.A. Transparent Reduction of Dimension with Genetic Algorithm. Vestnik NSU. Series: Information Technologies. 2023;21(1):46-61. (In Russ.) https://doi.org/10.25205/1818-7900-2023-21-1-46-61