• [gentoo-dev] Defining TZ in the base system profile?

    From Joshua Kinard@21:1/5 to All on Thu Jan 19 02:50:01 2023
    So this article[1] from 2017 popped up again on the tech radar via hackernews[2] and a few other sites[3]. It
    annotates how if the envvar TZ is undefined on a Linux system, it causes glibc to generate a number of
    additional syscalls, mainly stat-related calls (in my tests, newfstatat()). If defined to an actual value,
    such as ":/etc/localtime" (or even an empty string), glibc will instead generate far fewer, if any at all, of
    these stat-related syscalls.

    Apparently, TZ is accessed quite frequently, so this has a compound effect, according to the article, in glibc
    making thousands of unnecessary stat-related syscalls to /etc/localtime (which must be hard-coded somewhere in
    glibc for this case). Given the article's age (five years old), I tested the example C program out, and it
    does appear to still be accurate on a modern glibc-based system. When TZ is undefined, I get exactly nine
    newfstatat calls on /etc/localtime. If I define TZ to ":/etc/localtime", I do not get any of these newfstatat
    calls, and if I set TZ to an empty string, glibc will call openat() against "/usr/share/zoneinfo/Universal"
    and then generate exactly two newfstatat syscalls on that handle to read it.

    I ran strace against the undefined TZ case and the ":/etc/localtime" case, normalized the hex addresses to
    get a clean diff, and this is what it looks like:

    --- a 2023-01-18 20:30:36.826805343 -0500
    +++ b 2023-01-18 20:30:45.106983600 -0500
    @@ -1,4 +1,4 @@
    -# strace ./tz_test
    +# TZ=":/etc/localtime" strace ./tz_test
    execve("./tz_test", ["./tz_test"], 0xhhhhhhhhhhhh /* XX vars */) = 0
    brk(NULL) = 0xhhhhhhhhhhhh
    mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xhhhhhhhhhhhh
    @@ -61,15 +61,6 @@ read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0
    lseek(3, -2260, SEEK_CUR) = 1292
    read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\6\0\0\0\0"..., 3584) = 2260
    close(3) = 0
    -newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3552, ...}, 0) = 0
    -newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3552, ...}, 0) = 0
    -newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3552, ...}, 0) = 0
    -newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3552, ...}, 0) = 0
    -newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3552, ...}, 0) = 0
    -newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3552, ...}, 0) = 0
    -newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3552, ...}, 0) = 0
    -newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3552, ...}, 0) = 0
    -newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3552, ...}, 0) = 0
    write(1, "Godspeed, dear friend!\n", 23Godspeed, dear friend!
    ) = 23
    exit_group(0) = ?

    For comparison, I tested the same program on FreeBSD and it does not exhibit this behavior at all, regardless
    of whether TZ is undefined, a value, or an empty string. I have yet to make a similar test on a mips/musl
    chroot to see how musl handles this.

    There is a rather old (2010) StackOverflow question[4] about it as well, and someone left an answer in March
    of last year about the specific code in glibc that handles TZ if it is set or is an empty string.

    So is adding a default definition of TZ to our base system /etc/profile something we want to look at? I
    haven't tried any other methods of benchmarking to see if not making those additional syscalls is just placebo
    or if there are actual impacts. Given how long this oddity has been around, I can't tell if it's a genuine
    bug in glibc, an unoptimized corner case, or just a big nothingburger.


    1. https://blog.packagecloud.io/set-environment-variable-save-thousands-of-system-calls/
    2. https://news.ycombinator.com/item?id=34346346
    3. https://vermaden.wordpress.com/posts/
    4. https://stackoverflow.com/questions/4554271/how-to-avoid-excessive-stat-etc-localtime-calls-in-strftime-on-linux


    Thoughts?

    --
    Joshua Kinard
    Gentoo/MIPS
    kumba@gentoo.org
    rsa6144/5C63F4E3F5C6C943 2015-04-27
    177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

    "The past tempts us, the present confuses us, the future frightens us. And our lives slip away, moment by
    moment, lost in that vast, terrible in-between."

    --Emperor Turhan, Centauri Republic

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ionen Wolkens@21:1/5 to Joshua Kinard on Thu Jan 19 07:10:01 2023
    On Wed, Jan 18, 2023 at 08:48:56PM -0500, Joshua Kinard wrote:

    > So this article[1] from 2017 popped up again on the tech radar via hackernews[2] and a few other sites[3]. It
    > annotates how if the envvar TZ is undefined on a Linux system, it causes glibc to generate a number of
    > additional syscalls, mainly stat-related calls (in my tests, newfstatat()). If defined to an actual value,
    > such as ":/etc/localtime" (or even an empty string), glibc will instead generate far fewer, if any at all, of
    > these stat-related syscalls.
    > [...]
    >
    > Thoughts?

    Sounds good to me from the little I know of it, albeit I do imagine it
    could raise issues with some packages that try to use/handle TZ
    themselves and no telling what obscure thing this is going to break.

    exa[1][2] is one example that sam mentioned, but I imagine there's
    more to find.

    Personally, I added it to /etc/env.d locally anyway and will see what comes
    of it for the things I use, not that this covers much at all :)

    [1] https://github.com/ogham/exa/issues/856
    [2] https://github.com/ogham/exa/pull/867
    --
    ionen

  • From Michał Górny@21:1/5 to Joshua Kinard on Thu Jan 19 06:50:01 2023
    On Wed, 2023-01-18 at 20:48 -0500, Joshua Kinard wrote:
    > So this article[1] from 2017 popped up again on the tech radar via hackernews[2] and a few other sites[3]. It
    > annotates how if the envvar TZ is undefined on a Linux system, it causes glibc to generate a number of
    > additional syscalls, mainly stat-related calls (in my tests, newfstatat()). If defined to an actual value,
    > such as ":/etc/localtime" (or even an empty string), glibc will instead generate far fewer, if any at all, of
    > these stat-related syscalls.
    >
    > [...]
    > So is adding a default definition of TZ to our base system /etc/profile something we want to look at? I
    > haven't tried any other methods of benchmarking to see if not making those additional syscalls is just placebo
    > or if there are actual impacts. Given how long this oddity has been around, I can't tell if it's a genuine
    > bug in glibc, an unoptimized corner case, or just a big nothingburger.


    Am I correct that there's no real difference between setting it to ":/etc/localtime" and the actual timezone?

    I suppose it would make sense to default it.

    --
    Best regards,
    Michał Górny

  • From Arsen Arsenović@21:1/5 to mgorny@gentoo.org on Thu Jan 19 13:20:02 2023
    Michał Górny <mgorny@gentoo.org> writes:

    > On Wed, 2023-01-18 at 20:48 -0500, Joshua Kinard wrote:
    >> So this article[1] from 2017 popped up again on the tech radar via hackernews[2] and a few other sites[3]. It
    >> annotates how if the envvar TZ is undefined on a Linux system, it causes glibc to generate a number of
    >> additional syscalls, mainly stat-related calls (in my tests, newfstatat()). If defined to an actual value,
    >> such as ":/etc/localtime" (or even an empty string), glibc will instead generate far fewer, if any at all, of
    >> these stat-related syscalls.
    >>
    >> [...]
    >> So is adding a default definition of TZ to our base system /etc/profile something we want to look at? I
    >> haven't tried any other methods of benchmarking to see if not making those additional syscalls is just placebo
    >> or if there are actual impacts. Given how long this oddity has been around, I can't tell if it's a genuine
    >> bug in glibc, an unoptimized corner case, or just a big nothingburger.
    >
    > Am I correct that there's no real difference between setting it to ":/etc/localtime" and the actual timezone?
    >
    > I suppose it would make sense to default it.

    Correct, from ``(libc)TZ Variable'':

    If the ‘TZ’ environment variable does not have a value, the operation
    chooses a time zone by default. In the GNU C Library, the default time
    zone is like the specification ‘TZ=:/etc/localtime’ (or
    ‘TZ=:/usr/local/etc/localtime’, depending on how the GNU C Library was
    configured; *note Installation::). Other C libraries use their own rule
    for choosing the default time zone, so there is little we can say about
    them.

    I don't suspect any downside to this approach.
    --
    Arsen Arsenović

  • From Michael Orlitzky@21:1/5 to Joshua Kinard on Thu Jan 19 15:50:01 2023
    On Wed, 2023-01-18 at 20:48 -0500, Joshua Kinard wrote:

    > So is adding a default definition of TZ to our base system /etc/profile something we want to look at? I
    > haven't tried any other methods of benchmarking to see if not making those additional syscalls is just placebo
    > or if there are actual impacts. Given how long this oddity has been around, I can't tell if it's a genuine
    > bug in glibc, an unoptimized corner case, or just a big nothingburger.


    I thought about doing this on my laptop, and talked myself out of it.
    The main counter-arguments are,

    1. ICU doesn't handle the :/etc/localtime format at the moment,

    * https://unicode-org.atlassian.net/browse/ICU-13694
    * https://github.com/nodejs/node/issues/37271

    You could readlink() it or whatever at boot, but that will cause
    changes to /etc/localtime to be mysteriously ignored.

    2. The stats are there for a "good" reason, namely to let glibc
    know if the timezone has changed on the fly.

    The first one is only a temporary deal-breaker, but the second is a
    tradeoff involving how often your timezone changes (user-dependent) and
    what the real performance impact is (probably not much).

  • From Haelwenn (lanodan) Monnier@21:1/5 to All on Tue Feb 14 13:50:01 2023
    [2023-01-18 20:48:56-0500] Joshua Kinard:
    > So is adding a default definition of TZ to our base system /etc/profile something we want to look at? I
    > haven't tried any other methods of benchmarking to see if not making those additional syscalls is just placebo
    > or if there are actual impacts. Given how long this oddity has been around, I can't tell if it's a genuine
    > bug in glibc, an unoptimized corner case, or just a big nothingburger.

    I would take it as a glibc bug / lack of optimisation. At least, it is definitely one where the fault
    lies in glibc, given that you showed another libc to be more optimized.

    And given that POSIX puts ":/etc/localtime" as implementation defined[1],
    I think we should avoid it, glibc isn't alone in dealing with timezones.

    1: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03
