
Vestnik NSU. Series: Information Technologies


Preconditions-Based Algorithm for Safe Start of Replication in Fault-Tolerant PostgreSQL Cluster

https://doi.org/10.25205/1818-7900-2025-23-2-29-42

Abstract

Traditionally, fault-tolerant DBMS clusters based on PostgreSQL or its derivatives are built on replication that works by shipping the write-ahead log (WAL). The default checks only ensure the integrity of the received records. Under certain conditions, starting replication can leave a standby node with data that differs from the other nodes, or unable to complete its startup. Existing high-availability systems have to deal with this problem by recreating such nodes from backups, which is usually costly in terms of recovery time.
To address this issue, we propose an algorithm that prevents replication from starting when doing so is guaranteed to cause data divergence or a node startup failure. To detect such cases, the node collects information about the write-ahead logs in the cluster and performs additional checks. If replication has been blocked, the node can be synchronized automatically so that replication can then be started.
We tested the algorithm on various real-world cluster configurations with simulated failures; the experimental results indicate that the algorithm substantially reduces the chance of nodes becoming ineligible for restart.
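The abstract does not spell out the checks themselves. As a rough illustration of what a precondition test of this kind can look like, here is a minimal Python sketch built on standard PostgreSQL timeline semantics; it is not the authors' algorithm. It assumes the joining standby already knows its own timeline and end-of-WAL LSN, the primary's current timeline, the primary's timeline history file, and the oldest LSN whose WAL the primary still retains. The names `NodeWalInfo`, `check_replication_start`, and `primary_oldest_lsn` are hypothetical, and how these values are gathered across the cluster is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class HistoryEntry:
    """One line of a timeline history file: timeline `tli` ended at LSN `switchpoint`."""
    tli: int
    switchpoint: int


@dataclass(frozen=True)
class NodeWalInfo:
    """WAL facts about the joining standby (hypothetical structure)."""
    timeline: int  # timeline of the standby's last WAL record
    end_lsn: int   # LSN just past the standby's last WAL record


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'X/Y' (two hexadecimal halves) to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def parse_history(text: str) -> List[HistoryEntry]:
    """Parse a timeline history file: 'tli<TAB>switchpoint<TAB>reason' per line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        entries.append(HistoryEntry(int(fields[0]), parse_lsn(fields[1])))
    return entries


def check_replication_start(standby: NodeWalInfo,
                            primary_timeline: int,
                            primary_history: List[HistoryEntry],
                            primary_oldest_lsn: int) -> Tuple[bool, str]:
    """Decide whether the standby may start streaming from the primary.

    Returns (allowed, reason). Blocked cases are the ones this sketch treats
    as certain to cause divergence or a failed startup.
    """
    if standby.timeline != primary_timeline:
        # Find the LSN at which the primary switched away from the standby's timeline.
        switch: Optional[int] = None
        for entry in primary_history:
            if entry.tli == standby.timeline:
                switch = entry.switchpoint
                break
        if switch is None:
            return False, "standby timeline is not in primary history: resynchronize"
        if standby.end_lsn > switch:
            # The standby holds WAL the primary never had: the histories diverged.
            return False, "standby WAL extends past the switch point: resynchronize"
    if standby.end_lsn < primary_oldest_lsn:
        # Simplification: compares LSNs, not WAL segment boundaries.
        return False, "required WAL already removed on primary"
    return True, "ok"


if __name__ == "__main__":
    history = parse_history("1\t0/3000000\tno recovery target specified\n")
    diverged = NodeWalInfo(timeline=1, end_lsn=parse_lsn("0/3500000"))
    behind = NodeWalInfo(timeline=1, end_lsn=parse_lsn("0/2800000"))
    print(check_replication_start(diverged, 2, history, 0))
    print(check_replication_start(behind, 2, history, 0))
```

For example, if the primary's history says timeline 1 ended at 0/3000000, a standby still on timeline 1 whose WAL ends at 0/3500000 is refused, because it holds records the primary never had; the same standby ending at 0/2800000 may start streaming. In the blocked cases, a synchronization step (for instance with pg_rewind) would be the natural remedy, in the spirit of the automatic synchronization the abstract mentions.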

About the Authors

A. S. Rudometov
Novosibirsk State University
Russian Federation

Andrey S. Rudometov, Master’s Student

Novosibirsk



M. V. Rutman
Novosibirsk State University
Russian Federation

Mikhail V. Rutman, Associate Professor

Novosibirsk





For citations:


Rudometov A.S., Rutman M.V. Preconditions-Based Algorithm for Safe Start of Replication in Fault-Tolerant PostgreSQL Cluster. Vestnik NSU. Series: Information Technologies. 2025;23(2):29-42. (In Russ.) https://doi.org/10.25205/1818-7900-2025-23-2-29-42



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-7900 (Print)
ISSN 2410-0420 (Online)