• Bug#1100685: unattended-upgrades and needrestart may restart services u

    From Helmut Grohne@21:1/5 to All on Mon Mar 17 11:00:02 2025
    Package: unattended-upgrades,needrestart
    Severity: important
    Control: affects -1 + cron openssh-server systemd

    Hi,

    I hit a funky interaction bug with the last bookworm stable point
    release upgrade. In isolation, each component behaves reasonably, but
    their combination may result in unexpected service failure. I seek
    feedback on improving the situation.

    1. Services such as cron and ssh may leave processes behind after a
    restart. A long-running cron job and an existing ssh connection are
    examples of situations where this happens.
    2. By default, unattended-upgrades sets Unattended-Upgrade::MinimalSteps
    to true and therefore upgrades one source package at a time. As a
    consequence, it invokes needrestart a number of times.
    3. Every time needrestart is invoked, it examines all running services
    and treats those left-over cron jobs or ssh connections as a reason
    to restart the service, even if the main daemon process is no longer
    using an outdated copy.
    4. systemd imposes a limit on restarting services too frequently. If you
    restart a service 10 times within a minute, it temporarily ignores
    start requests and leaves the service in a failed state.
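    The interaction of points 2 and 4 can be illustrated with a small model, written in Python like the plugin later in this thread. It is a simplified sliding-window sketch and not systemd's actual rate-limit implementation. The limit of 10 starts per 60 seconds is taken from point 4. The real values for a unit come from its StartLimitBurst= and StartLimitIntervalSec= settings. The function name first_refused_restart is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Optional


def first_refused_restart(
    restart_times: Iterable[float], burst: int = 10, interval: float = 60.0
) -> Optional[int]:
    """Return the index of the first restart request refused by a simplified
    sliding-window start limit, or None if every request is accepted.

    Hypothetical model: at most `burst` starts are accepted within any
    `interval` seconds. This is NOT systemd's exact algorithm.
    """
    accepted: Deque[float] = deque()
    for index, now in enumerate(restart_times):
        # Forget starts that have aged out of the window.
        while accepted and now - accepted[0] >= interval:
            accepted.popleft()
        if len(accepted) >= burst:
            return index  # here the unit would end up in a failed state
        accepted.append(now)
    return None


# One needrestart-triggered restart per minimal step, every 2 seconds
# (fast storage): the 11th restart request is refused.
print(first_refused_restart([2.0 * step for step in range(30)]))  # → 10
```

    With the same 30 steps spaced 7 seconds apart, at most nine starts fall inside any 60-second window, and the function returns None.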

    The end result is that a stable point release may upgrade glibc rather
    early, and then each of the subsequent minimal steps restarts your
    service until it fails. A stable point release contains enough updates
    to trigger systemd's limit if you operate on fast storage.

    Terminating ssh in an unattended-upgrade is a significant problem
    justifying important severity. Hope you agree.

    Now the question arises what could be done to improve the situation.

    The default of Unattended-Upgrade::MinimalSteps is true, on the grounds
    that this is safer. Arguably, setting it to false also provides a kind
    of safety against unattended-upgrades terminating your ssh server.

    Another way to look at this is that needrestart should perhaps
    recognize that restarting cron or ssh is not going to help in this
    situation and skip doing so.
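    One way needrestart could tell that a restart will not help is to look only at the unit's main process instead of every process belonging to the service. The sketch below is hypothetical. It is not needrestart's actual logic, and needrestart itself is written in Perl. It only checks whether a /proc/<pid>/maps listing contains a shared object whose file has been replaced on disk, which the kernel marks with a trailing "(deleted)". Obtaining the main PID is not shown. `systemctl show -p MainPID <unit>` would be one way to get it.

```python
from pathlib import Path


def maps_use_deleted_library(maps_text: str) -> bool:
    """True if a /proc/<pid>/maps listing references a shared library whose
    file has since been replaced on disk (the kernel appends " (deleted)")."""
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        pathname = fields[5]
        if pathname.endswith(" (deleted)") and ".so" in pathname:
            return True
    return False


def main_process_is_outdated(main_pid: int) -> bool:
    """Hypothetical policy: consider only the unit's main process, so leftover
    cron jobs or ssh sessions of the same unit never trigger a restart."""
    return maps_use_deleted_library(Path(f"/proc/{main_pid}/maps").read_text())
```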

    Yet another way of looking at it is to consider that
    unattended-upgrades should perhaps interact with needrestart more
    closely and batch up needrestart runs even in the face of
    Unattended-Upgrade::MinimalSteps. Maybe it could temporarily disable
    needrestart somehow and then run it once after doing its thing? That
    would also speed things up.
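    The batching idea can be sketched as a Python context manager. It assumes needrestart's NEEDRESTART_SUSPEND environment variable. When that variable is set, needrestart skips its operation, and child processes such as apt inherit it. The names needrestart_batched and run_hook are hypothetical. How needrestart is finally invoked is left to the caller.

```python
import os
from contextlib import contextmanager
from typing import Callable, Iterator

SUSPEND_VAR = "NEEDRESTART_SUSPEND"


@contextmanager
def needrestart_batched(run_hook: Callable[[], None]) -> Iterator[None]:
    """Suspend needrestart while the block runs, then run it once.

    Assumes needrestart skips its work when NEEDRESTART_SUSPEND is set in its
    environment. `run_hook` is a hypothetical callable that performs the single
    final needrestart run.
    """
    if os.environ.get(SUSPEND_VAR) is not None:
        # Someone else already suspended needrestart; leave it to them.
        yield
        return
    os.environ[SUSPEND_VAR] = "1"
    try:
        yield  # every apt/dpkg step started in here inherits the variable
    finally:
        os.environ.pop(SUSPEND_VAR, None)
    # Only reached if the block finished without raising.
    run_hook()


# Usage: three minimal steps, one needrestart pass at the end.
with needrestart_batched(lambda: print("running needrestart once")):
    for step in range(3):
        pass  # each minimal step would invoke apt/dpkg here (not shown)
```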

    We're not yet at the end of the options. Skipping restarts of young
    processes is also a possible avenue, suggested by Paul Wise in
    #889552.
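    Here is one possible reading of the "young processes" idea, sketched in Python. It computes a process's age from the starttime field of /proc/<pid>/stat (field 22, in clock ticks since boot) and from /proc/uptime. It then skips the restart if the process is younger than some threshold. The 60-second threshold and all function names are hypothetical and are not taken from #889552 or from needrestart.

```python
import os


def process_age_seconds(stat_line: str, uptime_seconds: float, clock_ticks: int) -> float:
    """Age of a process, given the contents of /proc/<pid>/stat and /proc/uptime."""
    # comm (field 2) may contain spaces or ')', so split after the last ')'.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state); starttime is field 22, i.e. index 19.
    start_ticks = int(after_comm[19])
    return uptime_seconds - start_ticks / clock_ticks


def pid_age_seconds(pid: int) -> float:
    """Read the live values for `pid` (Linux only)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        stat_line = stat_file.read()
    with open("/proc/uptime") as uptime_file:
        uptime_seconds = float(uptime_file.read().split()[0])
    return process_age_seconds(stat_line, uptime_seconds, os.sysconf("SC_CLK_TCK"))


def should_skip_restart(age_seconds: float, min_age_seconds: float = 60.0) -> bool:
    """Hypothetical policy: a service started less than `min_age_seconds` ago
    presumably already uses the current libraries, so skip restarting it."""
    return age_seconds < min_age_seconds
```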

    Last but not least, having unattended-upgrades perform a sleep between
    the upgrade operations would make it slow enough to not trigger
    systemd's limit.
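    Here is a sketch of the sleep option, assuming the limit of 10 restarts per minute from point 4. Spacing the steps evenly by interval / burst puts at most `burst` restarts into any window of `interval` seconds. That spacing sits exactly at the boundary, so in practice one would add some slack. The driver function and its upgrade_one and sleep parameters are hypothetical placeholders, not part of unattended-upgrades.

```python
import time
from typing import Callable, Iterable


def pacing_delay(burst: int, interval: float) -> float:
    """Even spacing that puts at most `burst` starts into any window of
    `interval` seconds (one restart per step); add slack in practice."""
    return interval / burst


def upgrade_with_pacing(
    packages: Iterable[str],
    upgrade_one: Callable[[str], None],
    delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Hypothetical driver: one upgrade step per package, pausing between steps."""
    for index, package in enumerate(packages):
        if index:
            sleep(delay)  # no pause before the first step
        upgrade_one(package)


print(pacing_delay(burst=10, interval=60.0))  # → 6.0
```

    At 6 seconds per step, a point release touching, say, 100 source packages would spend roughly ten minutes just sleeping. That is the cost of this option.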

    As we can see, there are lots of options to twist the current behavior
    into something that avoids this particular failure mode. On the flip
    side, each of them has other subtle consequences, so it is not clear to
    me what the best option is. I would appreciate feedback from the
    relevant package maintainers.

    Helmut

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Helmut Grohne@21:1/5 to Helmut Grohne on Tue Apr 29 20:30:01 2025
    Control: tags -1 + patch

    On Mon, Mar 17, 2025 at 10:46:23AM +0100, Helmut Grohne wrote:
    > [...]

    I attempted to solve the problem at the needrestart level. There, the
    options were dim. needrestart supports a NEEDRESTART_SUSPEND variable to
    skip its operation, but it cannot reasonably know when to set that. I
    also suggested moving the actual restarts into a systemd unit that would
    be ordered after apt-daily-upgrade.service to batch service activations,
    but there are other distributions with other services invoking
    needrestart. We ended up concluding that needrestart was not a good
    place to fix this. Still, Thomas Liske provided some feedback and agreed
    to contribute to the discussion.

    My second attempt is at the unattended-upgrades level. In effect, it is
    an unattended-upgrades process that invokes apt, which in turn calls
    needrestart through needrestart's apt-pinvoke hook. Thus
    unattended-upgrades is able to control needrestart via
    NEEDRESTART_SUSPEND. If the unattended-upgrades process were to set that
    variable and finally call apt-pinvoke itself, we'd effectively get the
    batching suggested earlier and fix the root cause.
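    The mechanism this relies on is ordinary environment inheritance: a variable set in the unattended-upgrades process is visible to every process it spawns afterwards. In the Python sketch below, a child Python interpreter is a hypothetical stand-in for the apt, dpkg and apt-pinvoke chain, which is not actually run here.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for apt -> dpkg -> apt-pinvoke: a child interpreter
# that reports what it sees for NEEDRESTART_SUSPEND.
CHILD = [sys.executable, "-c", "import os; print(os.environ.get('NEEDRESTART_SUSPEND'))"]


def child_sees_suspend() -> str:
    """Run the child process and return its view of NEEDRESTART_SUSPEND."""
    completed = subprocess.run(CHILD, capture_output=True, text=True, check=True)
    return completed.stdout.strip()


os.environ["NEEDRESTART_SUSPEND"] = "unattended-upgrades"
print(child_sees_suspend())  # the child inherits the parent's environment
del os.environ["NEEDRESTART_SUSPEND"]
print(child_sees_suspend())  # after removal the child sees nothing ("None")
```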

    Now unattended-upgrades has a plugin mechanism. The primary hook is
    postrun, which is a good place to call apt-pinvoke. In addition,
    __init__ is called early and allows us to modify the process
    environment. It probably was not intended that way, but it works, and
    that approach yields a fairly reliable mechanism for batching
    needrestart when called from unattended-upgrades. I'm attaching the
    resulting unattended-upgrades plugin, which can be dropped into either
    /etc/unattended-upgrades/plugins or
    /usr/share/unattended-upgrades/plugins.

    Now the question becomes whether either unattended-upgrades or
    needrestart would be willing to install this plugin below /usr/share to
    activate it by default. From my point of view, either package would
    be a good fit for doing so.

    Thanks for considering

    Helmut

    import os
    import subprocess


    class UnattendedUpgradesPluginNeedRestart:
        """Batch needrestart invocations. Normally needrestart integrates by
        setting up an apt Post-Invoke hook. With minimal steps enabled, this
        may cause excessive restarts. Instead, skip them all and restart once
        at the end.
        """

        NEEDRESTART_HOOK = "/usr/lib/needrestart/apt-pinvoke"

        def __init__(self):
            # __init__ runs early, before any apt invocation, so needrestart
            # can be suspended for the whole unattended-upgrades run.
            self.enabled = True
            self.prerun()

        def prerun(self):
            # Stay out of the way if needrestart is already suspended or if
            # its apt hook is not installed.
            if (
                "NEEDRESTART_SUSPEND" in os.environ
                or not os.path.exists(self.NEEDRESTART_HOOK)
            ):
                self.enabled = False
                return

            # Modify the global environment, which is inherited by apt and
            # thus by needrestart.
            os.environ["NEEDRESTART_SUSPEND"] = "unattended-upgrades"

        def postrun(self, result):
            if not self.enabled:
                return

            # Lift the suspension so that the final hook call does its work.
            del os.environ["NEEDRESTART_SUSPEND"]

            if not result.packages_upgraded:
                return

            # Run needrestart's apt hook exactly once for the whole run.
            subprocess.call([self.NEEDRESTART_HOOK])

  • From Julian Andres Klode@21:1/5 to Helmut Grohne on Mon May 5 18:00:01 2025
    On Tue, Apr 29, 2025 at 08:22:42PM +0200, Helmut Grohne wrote:
    > [...]
    >
    > Now the question becomes whether either unattended-upgrades or
    > needrestart would be willing to install this plugin below /usr/share to
    > turn it active by default? From my point of view, either package would
    > be a good fit for doing so.

    Given that this issue stems from how unattended-upgrades uses apt, and
    should eventually be resolved by apt handling all of that natively,
    I'm strongly in favour of shipping the workaround in u-u.

    A more contrived approach would be to collect all the dpkg hooks
    and execute them ourselves, but the needrestart case is more limited
    in scope, which is beneficial.
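    The more contrived alternative can be sketched as deduplicating hook commands across minimal steps and running each one once at the end. The sketch assumes the hook commands are available as plain strings, for example from apt's DPkg::Post-Invoke configuration list. Reading that list through python-apt's apt_pkg.config.value_list is an assumption and is not shown. The function name and hook strings are hypothetical, and the sketch ignores whether a given hook is actually safe to defer.

```python
from typing import Dict, Iterable, List, Sequence


def batch_post_invoke_hooks(steps: Iterable[Sequence[str]]) -> List[str]:
    """Given the hook commands each minimal step would have run, return every
    distinct command once, in first-seen order, to execute after all steps."""
    ordered: Dict[str, None] = {}  # dicts preserve insertion order
    for hooks in steps:
        for command in hooks:
            ordered.setdefault(command, None)
    return list(ordered)


# Hypothetical hook commands collected from three minimal steps.
steps = [["hook-a", "needrestart-hook"], ["needrestart-hook"], ["hook-b"]]
print(batch_post_invoke_hooks(steps))  # → ['hook-a', 'needrestart-hook', 'hook-b']
```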

    The question for the needrestart maintainer is whether the name of
    the hook is a stable interface, I cannot answer that.
    --
    debian developer - deb.li/jak | jak-linux.org - free software dev
    ubuntu core developer i speak de, en
