@andywer
Created August 21, 2018 05:03

    # 2018-08-20: Postgres failure

    ## What happened

    The PostgreSQL container stopped unexpectedly and was restarted automatically, but after the restart it no longer accepted any connections, neither from the API service containers nor from the MacBook over the internet.

    Error in logs:
    ```
    FATAL: pg_hba.conf rejects connection for host "10.0.1.2", user "postgres", database "******", SSL off
    ```


    ## Cause

    Two lines had been added at the very top of the `/var/lib/postgresql/data/pg_hba.conf` file, even before the initial comment block (possibly automatically by a script in the Postgres Docker image; this is unconfirmed):

    ```
    host all postgres 0.0.0.0/0 reject
    host all pgdbadm 0.0.0.0/0 md5
    ```

    The first line caused the outage: it rejects every connection over TCP/IP made as the `postgres` user, which is the user named in the error above.
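
    PostgreSQL reads `pg_hba.conf` from top to bottom and applies the first record that matches a connection, which is why a `reject` line at the very top wins over anything further down. The following is a rough sketch of that lookup, not part of the original incident tooling. The helper name `first_host_rule`, the scratch file, and the trailing catch-all line are assumptions made for illustration, and the matching is deliberately simplified: it only looks at the record type and the user column.

    ```sh
    # Hypothetical helper: print the first "host" record in a pg_hba.conf-style
    # file whose user column is the given user or "all".
    # Simplified on purpose: database and address columns are not evaluated.
    first_host_rule() {
      awk -v user="$2" '
        /^[ \t]*(#|$)/ { next }   # skip comments and blank lines
        $1 == "host" && ($3 == user || $3 == "all") { print; exit }
      ' "$1"
    }

    # Demo: the two lines from this incident, followed by an assumed catch-all.
    sample="$(mktemp)"
    printf '%s\n' \
      'host all postgres 0.0.0.0/0 reject' \
      'host all pgdbadm 0.0.0.0/0 md5' \
      'host all all all md5' > "$sample"

    first_host_rule "$sample" postgres
    first_host_rule "$sample" pgdbadm
    ```

    For `postgres` this prints the `reject` record even though the catch-all further down would have allowed the connection.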


    ## Fix

    ```sh
    $ docker ps
    $ docker exec -it <postgres-container-ID> bash
    # In the container:
    $ vi /var/lib/postgresql/data/pg_hba.conf
    ```

    Either remove the top two lines entirely (untested) or change the first line of `pg_hba.conf` as follows:

    ```diff
    - host all postgres 0.0.0.0/0 reject
    + host all postgres 0.0.0.0/0 md5
    ```
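
    If an interactive editor is not convenient, the same one-line change could probably be scripted. This is only a sketch: the helper name `relax_postgres_rule` is made up, the pattern assumes the record is written with single spaces exactly as shown above, and the demo runs against a scratch file rather than the live `pg_hba.conf`.

    ```sh
    # Hypothetical helper: switch the prepended "reject" record for the postgres
    # user to md5 authentication, keeping a backup copy next to the file.
    relax_postgres_rule() {
      # -i.bak edits in place and saves the original as <file>.bak
      sed -i.bak 's|^host all postgres 0\.0\.0\.0/0 reject$|host all postgres 0.0.0.0/0 md5|' "$1"
    }

    # Demo on a scratch copy rather than the live file.
    scratch="$(mktemp)"
    printf '%s\n' \
      'host all postgres 0.0.0.0/0 reject' \
      'host all pgdbadm 0.0.0.0/0 md5' > "$scratch"
    relax_postgres_rule "$scratch"
    cat "$scratch"
    ```

    Keeping the `.bak` copy makes it easy to compare the file before and after, which may help when tracking down where the extra lines came from.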

    Then reload the configuration (still inside the Postgres container):

    ```sh
    $ su - postgres
    $ pg_ctl reload
    ```

    That's it. After the reload I could connect from the MacBook again and the API services recovered.


    ## How to prevent in the future

    This cannot be reliably prevented until the cause of the configuration change is known.
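
    In the meantime, one option might be to at least notice the change early by comparing the live file against a saved known-good copy. The sketch below is an assumption about how that could look, not something that was in place during the incident; the helper name `hba_unchanged` and both file paths are hypothetical, and the demo uses scratch files.

    ```sh
    # Hypothetical helper: succeed if the live file still matches a saved
    # known-good copy, otherwise print a warning and return non-zero.
    hba_unchanged() {
      if cmp -s "$1" "$2"; then
        return 0
      fi
      echo "WARNING: $2 differs from known-good copy $1" >&2
      return 1
    }

    # Demo with scratch files standing in for the known-good and live copies.
    good="$(mktemp)"; live="$(mktemp)"
    printf '%s\n' '# PostgreSQL Client Authentication Configuration File' > "$good"
    cp "$good" "$live"
    hba_unchanged "$good" "$live" && echo "unchanged"

    # Simulate the incident: a record gets prepended to the live file.
    { echo 'host all postgres 0.0.0.0/0 reject'; cat "$good"; } > "$live"
    hba_unchanged "$good" "$live" || echo "changed"
    ```

    Run periodically, a check like this would not prevent the outage, but it could flag the modified file before the next container restart picks it up.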