• Bug#1029548: Warnings about collation version mismatch

    From Raphael Hertzog@21:1/5 to Julian Gilbey on Thu Apr 24 11:00:01 2025
    Hi,

    On Tue, 24 Jan 2023, Julian Gilbey wrote:
    WARNING: database "postgres" has a collation version mismatch
    DETAIL: The database was created using collation version 2.35, but
    the operating system provides version 2.36.
    HINT: Rebuild all objects in this database that use the default
    collation and run ALTER DATABASE postgres REFRESH COLLATION VERSION,
    or build PostgreSQL with the right library version.

    While those are only warnings, you will actually get errors
    when you try to create new databases:
    createdb: error: database creation failed: ERROR: template database "template1" has a collation version mismatch

    And those errors are relatively frequent for users of rolling releases
    derived from Debian like Kali Linux. They even have a page for this: https://www.kali.org/docs/troubleshooting/postgresql-collation-mismatch-error/

    I think it would be nice if that operation could be automated.
    With dpkg triggers, it's easy to schedule a script on the postgresql
    package side when libc6 (or locales?) gets upgraded. We can record the
    libc6 version at installation time, and compare the version in the trigger
    run, and if the major version has changed, then we run the required
    ALTER DATABASE commands.

    What do you think Christoph?

    Cheers,

    PS: Not quite sure if there's a good CLI command to directly get the
    "collation version".
    --
    ⢀⣴⠾⠻⢶⣦⠀ Raphaël Hertzog <hertzog@debian.org>
    ⣾⠁⢠⠒⠀⣿⡁
    ⢿⡄⠘⠷⠚⠋ The Debian Handbook: https://debian-handbook.info/get/
    ⠈⠳⣄⠀⠀⠀⠀ Debian Long Term Support: https://deb.li/LTS

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Christoph Berg@21:1/5 to All on Thu Apr 24 12:40:01 2025
    Re: Raphael Hertzog
    I think it would be nice if that operation could be automated.
    With dpkg triggers, it's easy to schedule a script on the postgresql
    package side when libc6 (or locales?) gets upgraded. We can record the
    libc6 version at installation time, and compare the version in the trigger run, and if the major version has changed, then we run the required
    ALTER DATABASE commands.

    The correct default action is to reindex all text indexes. ALTER
    DATABASE is just a shortcut for when you know that the collation has
    not changed.

    I wouldn't want to wire that decision into postinst scripts. On
    Debian, it's not a problem since it only happens at release upgrade
    time.

    PS: Not quite sure if there's a good CLI command to directly get the "collation version".

    In the OS, it's the glibc version. In PG, it's
    pg_database.datcollversion.

    Christoph

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Raphael Hertzog@21:1/5 to Christoph Berg on Fri Apr 25 11:10:01 2025
    Hi,

    On Thu, 24 Apr 2025, Christoph Berg wrote:
    Re: Raphael Hertzog
    I think it would be nice if that operation could be automated.
    With dpkg triggers, it's easy to schedule a script on the postgresql package side when libc6 (or locales?) gets upgraded. We can record the libc6 version at installation time, and compare the version in the trigger run, and if the major version has changed, then we run the required
    ALTER DATABASE commands.

    The correct default action is to reindex all text indexes. ALTER
    DATABASE is just a shortcut for when you know that the collation has
    not changed.

    How can we determine if the collation information has changed between
    two glibc releases? (cc Aurélien in case he knows)

    I understand the correct default action is to reindex all text fields,
    but arguably the "template1" and "template0" databases
    are mostly empty and likely have only ascii content where the collation information has no impact (or rather almost 0% chance of having been
    modified), so it should still be safe to perform those operations on those databases no?

    I would argue that the "postgres" database could likely be in the same
    set, while it's not empty, does it have any non-ASCII content where the collation information might have an impact on indexes? Even if yes, I
    guess that the database is small enough that rebuilding all the text
    indexes is not very costly and can be automated too. What are the required commands to perform this reindex?

    Cheers,
    --
    ⢀⣴⠾⠻⢶⣦⠀ Raphaël Hertzog <hertzog@debian.org>
    ⣾⠁⢠⠒⠀⣿⡁
    ⢿⡄⠘⠷⠚⠋ The Debian Handbook: https://debian-handbook.info/get/
    ⠈⠳⣄⠀⠀⠀⠀ Debian Long Term Support: https://deb.li/LTS

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Christoph Berg@21:1/5 to All on Fri Apr 25 11:30:01 2025
    Re: Raphael Hertzog
    How can we determine if the collation information has changed between
    two glibc releases? (cc Aurlien in case he knows)

    The problem is, no one knows. Or else PG would already use that info.

    You could alternatively use the libicu collations, but these have
    effectively the same problem, just their track record of doing changes
    is more coordinated.

    I understand the correct default action is to reindex all text fields,
    but arguably the "template1" and "template0" databases
    are mostly empty and likely have only ascii content where the collation information has no impact (or rather almost 0% chance of having been modified), so it should still be safe to perform those operations on those databases no?

    template0 is always empty, yes.

    For template1 and postgres, you could just run `reindex database template1;`

    Christoph

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Aurelien Jarno@21:1/5 to Raphael Hertzog on Fri Apr 25 21:50:01 2025
    Hi,

    On 2025-04-25 10:58, Raphael Hertzog wrote:
    Hi,

    On Thu, 24 Apr 2025, Christoph Berg wrote:
    Re: Raphael Hertzog
    I think it would be nice if that operation could be automated.
    With dpkg triggers, it's easy to schedule a script on the postgresql package side when libc6 (or locales?) gets upgraded. We can record the libc6 version at installation time, and compare the version in the trigger
    run, and if the major version has changed, then we run the required
    ALTER DATABASE commands.

    The correct default action is to reindex all text indexes. ALTER
    DATABASE is just a shortcut for when you know that the collation has
    not changed.

    How can we determine if the collation information has changed between
    two glibc releases? (cc Aurélien in case he knows)

    I don't think there is an easy way to determine that. On the other hand,
    I think it's safe to assume that the collation information only
    changes for an upstream release, not for a debian revision.

    For a new upstream release, most of the time most of the locales do not
    see any collation change, but there is almost always at least one locale
    with collation changes (although that was not true for glibc 2.41).

    Regards
    Aurelien

    --
    Aurelien Jarno GPG: 4096R/1DDD8C9B aurelien@aurel32.net http://aurel32.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)