Analysis and testing of the RadixSpline Learned index
https://doi.org/10.25205/1818-7900-2025-23-3-67-78
Abstract
Recent advances in machine learning have introduced a novel approach to database indexing known as learned indexes. Among these, the RadixSpline index has emerged as a particularly promising solution for efficient search in sorted key-value data. This paper presents a comprehensive analysis and performance evaluation of the RadixSpline index, which combines spline approximation with bounded error (GreedySpline) and a sparse radix table structure. We investigate the index’s behavior under different parameter configurations, focusing on two key parameters: the spline approximation error (err) and the radix prefix length (r). Our experimental study, conducted on a 25 MB subset of the MovieLens32M dataset, demonstrates that RadixSpline can achieve up to 30% faster search times compared to traditional binary search while reducing memory usage by over 80 % compared to RocksDB’s sparse index implementation. The study makes several important contributions: (1) we provide a detailed analysis of parameter sensitivity, identifying optimal configurations for various use cases; (2) we address implementation challenges not covered in the original work, including handling of non-contiguous prefixes and missing keys; (3) we demonstrate the index’s suitability for real-time applications through its single-pass construction property. Our results confirm that RadixSpline offers significant advantages over conventional indexing methods, particularly in scenarios requiring fast searches with limited memory resources. The findings suggest strong potential for integrating RadixSpline into production database systems, with RocksDB being a particularly promising candidate for such integration.
Keywords
About the Authors
I. M. StetskiiRussian Federation
Ilya M. Stetsky, Master’s Student
Novosibirsk
B. N. Pischik
Russian Federation
Boris N. Pishik, Associate Professor
Novosibirsk
References
1. Kraska T., Beutel A., Chi E.H., Dean J., Polyzotis N. The Case for Learned Index Structures // arXiv:1712.01208 [cs.DB]. 2017. DOI: 10.48550/arXiv.1712.01208
2. Kipf A., Marcus R., van Renen A., Stoian M., Kemper A., Kraska T., Neumann T. RadixSpline: A Single-Pass Learned Index // Proceedings of the Third International Workshop on Exploiting Artifi cial Intelligence Techniques for Data Management (aiDM ‘20). 2020. Art. No. 5. P. 1–5. DOI: 10.1145/3401071.3401659.
3. Neumann T., Michel S. Smooth Interpolating Histograms with Error Guarantees // Proceedings of the 11th International Conference on Extending Database Technology (EDBT ‘08). Cardiff , UK, 2008. P. 42–53. DOI: 10.1007/978-3-540-70504-8-12.
4. MovieLens32M Dataset [Электронный ресурс] / URL: GroupLens Research (дата обращения: 10.12.2024).
5. Index Block Format [Электронный ресурс] // RocksDB Wiki. – URL: https://github.com/facebook/rocksdb/wiki/Index-Block-Format (дата обращения: 10.08.2024).
6. RocksDB: A Persistent Key-Value Store [Электронный ресурс] / Meta Open Source. URL: https://github.com/facebook/rocksdb (дата обращения: 10.08.2024).
7. O’Neil P. The Log-Structured Merge-Tree (LSM-Tree) // Acta Informatica. 1996. Vol. 33. No. 4. P. 351–385. DOI: 10.1007/s002360050048.
8. Marcus R., Kipf A., van Renen A., Stoian M., Misra S., Kemper A., Neumann T., Kraska T. Benchmarking Learned Indexes // Proceedings of the VLDB Endowment (PVLDB). 2021. Vol. 14. No. 1. P. 1–13. DOI: 10.14778/3421424.3421425.
9. Li Z., Chan T. N., Yiu M. L., Jensen C. S. A Survey of Learned Indexes for the Multi-dimensional Space // arXiv:2403.06456 [cs.DB]. 2024. DOI: 10.48550/arXiv.2403.06456
10. Ferragina P., Vinciguerra G. A Critical Analysis of Recursive Model Indexes // arXiv:2106.16166 [cs.DB]. 2021. DOI: 10.48550/arXiv.2106.16166
11. Ding J., Minhas U. F., Yu J., Wang C., Li Y., Zhang H., Chandramouli B., Gehrke J., Kossmann D., Lomet D., Kraska T. Towards Practical Learned Indexing (Extended Abstracts) // arXiv:2108.05117 [cs.DB]. 2021. DOI: 10.48550/arXiv.2108.05117
12. Wu J., Zhang Y., Chen S., Wang J., Chen Y., Xing C. LSI: A Learned Secondary Index Structure // arXiv:2205.05769 [cs.DB]. 2022. DOI: 10.48550/arXiv.2205.05769
13. Hadian A., Heinis T. Shift-Table: A Low-Latency Learned Index for Range Queries using Model Correction // arXiv:2103.13160 [cs.DB]. 2021. DOI: 10.48550/arXiv.2103.13160
14. Vaidya K., Kraska T. Partitioned Learned Bloom Filter // Proceedings of the VLDB Endowment. 2020. Vol. 13. No. 12. P. 3298–3311. DOI: 10.14778/3415478.3415545
15. Spector B., Marcus R., van Renen A., Stoian M., Kemper A., Kraska T., Neumann T. Bounding the Last Mile: Effi cient Learned String Indexing // arXiv:2111.14905 [cs.DB]. 2021. DOI: 10.48550/arXiv.2111.14905
16. Crotty A. Hist-Tree: Those Who Ignore It Are Doomed to Learn [Электронный ресурс] // CIDR 2021. URL: https://cs.brown.edu/people/acrotty/pubs/cidr2021-paper20.pdf (дата обращения: 02.02.2025).
Review
For citations:
Stetskii I.M., Pischik B.N. Analysis and testing of the RadixSpline Learned index. Vestnik NSU. Series: Information Technologies. 2025;23(3):67-78. (In Russ.) https://doi.org/10.25205/1818-7900-2025-23-3-67-78

