Troubleshooting Postgres CrashLoopBackOff

🤔 Problem

After an unexpected crash or sudden stop of one of the Postgres containers, the database may no longer be able to locate a valid checkpoint record.

The following line then appears in the logs of the affected Postgres container:

CODE
PANIC:  could not locate a valid checkpoint record

Restarting the container does not resolve the issue on its own, because Postgres keeps looking for a checkpoint record that is probably corrupted. The pod therefore ends up in CrashLoopBackOff.
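To confirm that a pod is crash-looping for this reason, the logs of the previous container run can be searched for the PANIC line. The following is only a minimal sketch: the function name has_checkpoint_panic and the pod and container names in the usage comment are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Return success (exit 0) when the log text on stdin contains the
# checkpoint PANIC message quoted above.
has_checkpoint_panic() {
  grep -Eq 'PANIC:[[:space:]]+could not locate a valid checkpoint record'
}

# Usage (placeholder names). --previous reads the logs of the last
# terminated container instance, which is what a crash-looping pod needs.
# kubectl logs my-postgres-pod -c postgres --previous | has_checkpoint_panic \
#   && echo "checkpoint record is missing or corrupted"
```

If the function does not match, the crash loop probably has a different cause and the steps below may not apply.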

🌱 Solution

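The fix is to reset the write-ahead log (WAL) and other control information of the PostgreSQL database cluster with pg_resetwal. Data already written to the data files should not be affected, but changes that existed only in the lost WAL cannot be recovered, so check the database for consistency afterwards.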

Proceed with the following steps in the deployment.yaml of the affected deployment:

  1. Locate the faulty Postgres container.

  2. Add the following fields to it:

    1. command: ["sleep"]

    2. args: ["1000"]

  3. Save and redeploy. The container now runs sleep for 1000 seconds (about 16 minutes) instead of starting Postgres, so it should not fail and restart immediately.

  4. Open a bash shell in the pod.

  5. Run:

    CODE
    su postgres
    pg_resetwal /var/lib/postgresql/data

  6. Once pg_resetwal has completed, revert the changes from step 2 and redeploy so that Postgres starts normally and the database is accessible again.
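Steps 4 and 5 can also be run from outside the pod. The sketch below is an illustration, not part of the original procedure. It assumes kubectl access to the cluster and that the container runs as root, as the su postgres call above implies. The function name reset_wal and the pod and container names are hypothetical. The -n option of pg_resetwal is a dry run that prints the values it would write without modifying anything.

```shell
#!/usr/bin/env bash
# Run pg_resetwal as the postgres user inside the sleeping container.
# usage: reset_wal <pod> <container> [--dry-run]
reset_wal() {
  local pod="$1" container="$2" flag=""
  # With --dry-run, pass -n so pg_resetwal only prints what it would change.
  if [ "${3:-}" = "--dry-run" ]; then
    flag="-n "
  fi
  kubectl exec "$pod" -c "$container" -- \
    su postgres -c "pg_resetwal ${flag}/var/lib/postgresql/data"
}

# Usage (placeholder names):
# reset_wal my-postgres-pod postgres --dry-run   # inspect first
# reset_wal my-postgres-pod postgres             # perform the reset
```

Running the dry run first gives a chance to review the control values before the WAL is actually reset.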

📎 Related articles

https://stackoverflow.com/questions/60604699/postgres-k8s-panic-could-not-locate-a-valid-checkpoint-record-crashloopbac

https://www.postgresql.r2schools.com/postgresql-panic-could-not-locate-a-valid-checkpoint-record/
