Vestnik NSU. Series: Information Technologies

On Increasing the Quality of the Climate Observations Question-Answering System’s Output Data

https://doi.org/10.25205/1818-7900-2024-22-4-5-16

Abstract

The climate observations question-answering (QA) information system under development relies on heterogeneous climate data in various formats: text, numerical, graphic, video, audio, geographic, and monitoring data. Such a system therefore requires a tool for processing and analyzing these data.
Searching for and retrieving data is a central part of the system, since the quality of the generated answer depends heavily on it. How the data are retrieved is critical both to the output of a QA system and to decision-making tasks, because an LLM can generate contextually appropriate but factually incorrect answers that do not match the input. Using suitable metrics and algorithms for some data types but unsuitable ones for others can push the share of irrelevant data past the permissible threshold, which in turn lowers the quality of the answers. Retrieval-augmented generation (RAG) systems can also be used to optimize the input data for this task.
This work discusses various algorithms for data extraction and document ranking, as well as the possibility of using ensembles of LLM agents in the development of a QA system that works with climate data.
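
As an illustration of the kind of document ranking step discussed here, the sketch below fuses several ranked lists with reciprocal rank fusion (RRF). The abstract does not say which ranking algorithm the authors use, so RRF is chosen only as a common example; the function name, the document identifiers, and the smoothing constant k = 60 are assumptions made for this sketch.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant (60 is a commonly used default; an assumption here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical output of two retrievers for the same climate question.
keyword_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d2", "d3", "d1"]

fused = reciprocal_rank_fusion([keyword_ranking, dense_ranking])
print(fused)
```

Because RRF uses only rank positions, it can combine retrievers whose raw scores are not comparable, which may be useful when different data types are searched with different metrics.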

About the Authors

O. Yu. Gavenko
Federal Research Center for Information and Computational Technologies; Novosibirsk State University
Russian Federation

Olga Yu. Gavenko, Doctor of Sciences (Technical Sciences), Candidate of Sciences (Philology), Leading Researcher; Senior Lecturer of the Department of Mathematical Modeling

Novosibirsk



N. A. Shashok
Federal Research Center for Information and Computational Technologies
Russian Federation

Natalia A. Shashok, PhD Student

Novosibirsk




For citations:


Gavenko O.Yu., Shashok N.A. On Increasing the Quality of the Climate Observations Question-Answering System’s Output Data. Vestnik NSU. Series: Information Technologies. 2024;22(4):5-16. (In Russ.) https://doi.org/10.25205/1818-7900-2024-22-4-5-16


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-7900 (Print)
ISSN 2410-0420 (Online)