Forum: >>> Magnum BBS <<<

Bug#1100685: unattended-upgrades and needrestart may restart services u

From Helmut Grohne@21:1/5 to All on Mon Mar 17 11:00:02 2025

Package: unattended-upgrades,needrestart
Severity: important
Control: affects -1 + cron openssh-server systemd

Hi,

I hit a funky interaction bug with the last bookworm stable point
release upgrade. In isolation, each component behaves reasonably, but
their combination may result in unexpected service failure. I seek
feedback on improving the situation.

1. Services such as cron and ssh may leave processes behind after a
restart. A long cron job and an existing ssh connection are example
situations where this happens.
2. By default, unattended-upgrades sets Unattended-Upgrade::MinimalSteps
to true and therefore upgrades one source package at a time. As a
consequence it invokes needrestart a number of times.
3. Every time needrestart is invoked, it examines all running services
and treats those left-over cron jobs or ssh connections as a reason
to restart the service, even if the main daemon process is no longer
using an outdated library.
4. systemd imposes a limit on restarting services too frequently. If you
restart a service 10 times within a minute, it temporarily ignores
start requests and leaves the service in a failed state.

The end result is that a stable point release may upgrade glibc early,
after which each of the minimal steps restarts your service until it
fails. A stable point release contains enough updates to trigger
systemd's limit if you operate on fast storage.
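The timing argument can be sketched with a simplified model. It assumes a fixed-window start limiter using the figures from point 4 (10 starts per 60 seconds; these are the report's numbers and not necessarily the limits configured for a given unit), exactly one restart per minimal step, and a constant step duration. The names `StartLimit` and `first_failed_step` are hypothetical, and this is not systemd's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartLimit:
    """Simplified fixed-window start limiter (an assumption, not systemd's code)."""
    interval: float                 # window length in seconds
    burst: int                      # starts allowed per window
    begin: Optional[float] = None   # time the current window opened
    count: int = 0                  # starts seen in the current window

    def allow(self, now: float) -> bool:
        # Open a new window on the first start or once the old one expired.
        if self.begin is None or now - self.begin > self.interval:
            self.begin = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        return False  # start refused: the unit would be left failed


def first_failed_step(steps: int, step_seconds: float,
                      interval: float = 60.0, burst: int = 10) -> Optional[int]:
    """Return the 1-based minimal step whose restart is refused, or None."""
    limit = StartLimit(interval, burst)
    for step in range(steps):
        # Each minimal step runs needrestart, which restarts the service once.
        if not limit.allow(step * step_seconds):
            return step + 1
    return None
```

In this model, 30 steps of 2 seconds each are refused at step 11, while 30 steps of 10 seconds each never reach the limit, which matches the observation that fast storage makes the failure likely.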

Terminating ssh in an unattended-upgrade is a significant problem
justifying important severity. Hope you agree.

Now the question is what could be done to improve the situation.

The default of Unattended-Upgrade::MinimalSteps is true, on the grounds
that this is safer. Arguably, setting it to false also provides a kind
of safety against unattended-upgrades terminating your ssh server.

Another way to look at this would be that needrestart should perhaps
recognize that restarting cron or ssh is not going to help in this
situation and skip doing so.
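The difference between the current behaviour (point 3) and this proposal can be sketched as two predicates over a service's processes. The `Proc` record and its fields are hypothetical stand-ins for what needrestart derives from /proc; this is not needrestart's actual logic, which is written in Perl.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    is_main: bool            # True for the service's main daemon process
    uses_outdated_lib: bool  # maps a library that was replaced on disk


def restart_any_outdated(procs: list[Proc]) -> bool:
    # Point 3: any process of the service, including a leftover cron job or
    # ssh session, is enough to request a restart.
    return any(p.uses_outdated_lib for p in procs)


def restart_main_outdated(procs: list[Proc]) -> bool:
    # Proposal: only the main daemon counts, because a restart does not
    # replace the leftover processes anyway.
    return any(p.is_main and p.uses_outdated_lib for p in procs)


# sshd was already restarted, but an old login session still maps old glibc.
sshd = [Proc(100, True, False), Proc(200, False, True)]
print(restart_any_outdated(sshd), restart_main_outdated(sshd))  # prints: True False
```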

Yet another way of looking at it is that unattended-upgrades should
perhaps interact with needrestart more closely and batch up needrestart
even in the face of Unattended-Upgrade::MinimalSteps. Maybe it could
temporarily disable needrestart somehow and then run it once after
doing its thing? That would also speed things up.
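The batching idea can be illustrated with a toy count of needrestart runs. `upgrade_step` and `needrestart_hook` are hypothetical placeholders for one apt invocation and needrestart's apt Post-Invoke hook; the sketch only shows that suspending the hook during the steps and running it once at the end turns N invocations into one.

```python
import os

# Hypothetical counter standing in for actual needrestart runs.
needrestart_runs = 0


def needrestart_hook() -> None:
    """Placeholder for needrestart's apt Post-Invoke hook."""
    global needrestart_runs
    if "NEEDRESTART_SUSPEND" in os.environ:
        return  # suspended: skip this invocation
    needrestart_runs += 1


def upgrade_step(package: str) -> None:
    """Placeholder for one minimal step; apt runs the hook afterwards."""
    needrestart_hook()


def upgrade(packages: list[str], batch: bool) -> int:
    """Return how many times needrestart actually ran."""
    global needrestart_runs
    needrestart_runs = 0
    if batch:
        os.environ["NEEDRESTART_SUSPEND"] = "unattended-upgrades"
    try:
        for package in packages:
            upgrade_step(package)
    finally:
        if batch:
            os.environ.pop("NEEDRESTART_SUSPEND", None)
    if batch and packages:
        needrestart_hook()  # a single run once every step is done
    return needrestart_runs
```

Three minimal steps yield three runs without batching and one run with it.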

We're not yet at the end of options. Skipping restarts of young
processes is also a possible avenue, suggested by Paul Wise in
#889552.
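One possible reading of this suggestion, sketched under assumptions: if the service's main process is younger than some threshold, it was most likely started by an earlier needrestart run in the same upgrade, so another restart is skipped. The function and the 60-second `min_age` are hypothetical and not needrestart settings; #889552 may propose a different rule.

```python
def should_restart(outdated: bool, main_started_at: float, now: float,
                   min_age: float = 60.0) -> bool:
    """Hypothetical rule: leave recently (re)started services alone.

    outdated: whether outdated code was found in the service.
    main_started_at, now: timestamps in seconds.
    min_age: assumed threshold in seconds.
    """
    if not outdated:
        return False
    # A main process younger than min_age probably comes from an earlier
    # restart in this upgrade; restarting again only counts against the
    # start limit.
    return now - main_started_at >= min_age
```

With the assumed threshold, a service whose main process started 30 seconds ago is left alone, while one that has run for five minutes is restarted.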

Last but not least, having unattended-upgrades perform a sleep between
the upgrade operations would make it slow enough to not trigger
systemd's limit.
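How long such a sleep would need to be can be estimated with a simplified model: a fixed window, opened at the first start, refuses a start once `burst` starts already happened within `interval` seconds, and each upgrade step restarts the service once. The figures (10 starts per 60 seconds) are the report's and are an assumption about the actual unit configuration; `avoids_start_limit` is a hypothetical helper.

```python
def avoids_start_limit(step_seconds: float, sleep_seconds: float,
                       interval: float = 60.0, burst: int = 10) -> bool:
    """True if one restart per (step + sleep) never trips the modelled limit.

    With starts spaced d seconds apart, start number burst + 1 falls at
    burst * d after the window opened, so (given enough steps) the limit
    is avoided exactly when burst * d > interval.
    """
    spacing = step_seconds + sleep_seconds
    return burst * spacing > interval
```

With one-second steps, a 5-second sleep sits exactly on the boundary and still trips the limit in this model, while 5.5 seconds does not; steps that already take longer than 6 seconds need no sleep at all.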

As we can see, there are lots of options to twist the current behavior
into something that avoids this particular failure mode. On the flip
side, each of them has other subtle consequences, so it is not clear to
me what the best option is. I would appreciate feedback from the
relevant package maintainers.

Helmut

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Helmut Grohne@21:1/5 to Helmut Grohne on Tue Apr 29 20:30:01 2025

Control: tags -1 + patch

On Mon, Mar 17, 2025 at 10:46:23AM +0100, Helmut Grohne wrote:
> As we can see, there are lots of options to twist the current behavior
> into something that avoids this particular failure mode. On the flip
> side, each of them has other subtle consequences, so it is not clear to
> me what the best option is. I would appreciate feedback from the
> relevant package maintainers.

I attempted to solve the problem at the needrestart level. There, the
options were limited. needrestart supports a NEEDRESTART_SUSPEND variable
to skip its operation, but it cannot reasonably know when to set it. I
also suggested moving the actual restarts into a systemd unit ordered
after apt-daily-upgrade.service to batch service activations, but
there are other distributions with other services invoking
needrestart. We ended up concluding that needrestart was not a good
place to fix this. Still, Thomas Liske provided some feedback and agreed
to contribute to the discussion.

My second attempt is at the unattended-upgrades level. In effect, it is
the unattended-upgrades process that ends up calling needrestart: it
invokes apt, which runs needrestart's apt-pinvoke hook. Thus it is able
to control needrestart via NEEDRESTART_SUSPEND. If the process running
unattended-upgrades were to set that variable and finally call
apt-pinvoke itself, we'd effectively get the batching suggested earlier
and fix the root cause.
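The control path relies on ordinary environment inheritance: a variable set in the unattended-upgrades Python process is visible to apt and, from there, to the hook apt runs. A minimal sketch, in which a child Python interpreter stands in for apt and apt-pinvoke (an assumption for illustration):

```python
import os
import subprocess
import sys


def child_sees_suspend() -> str:
    # The child stands in for apt -> apt-pinvoke; it inherits os.environ.
    out = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ.get('NEEDRESTART_SUSPEND', 'unset'))"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


os.environ["NEEDRESTART_SUSPEND"] = "unattended-upgrades"
print(child_sees_suspend())  # prints: unattended-upgrades
del os.environ["NEEDRESTART_SUSPEND"]
print(child_sees_suspend())  # prints: unset
```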

Now, unattended-upgrades has a plugin mechanism. The primary hook is
postrun, which is a good place to call apt-pinvoke. In addition,
__init__ is called early and allows us to modify the process
environment. It was probably not intended to be used that way, but it
works, and that approach yields a fairly reliable mechanism for
batching needrestart when called from unattended-upgrades. I'm
attaching the resulting unattended-upgrades plugin, which can be
dropped into either /etc/unattended-upgrades/plugins or
/usr/share/unattended-upgrades/plugins.
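The ordering this relies on can be illustrated with a toy host. `run_host`, `DemoPlugin`, and the `SimpleNamespace` result are hypothetical and not unattended-upgrades' real loader; only the `__init__`/`postrun(result)` hooks and the `packages_upgraded` attribute are taken from the attached plugin. The sketch shows the variable being set throughout the steps and cleared in postrun.

```python
import os
from types import SimpleNamespace


class DemoPlugin:
    """Hypothetical stand-in mirroring the attached plugin's two hooks."""

    def __init__(self):
        # Runs when the host loads its plugins, before any upgrade step.
        os.environ["NEEDRESTART_SUSPEND"] = "unattended-upgrades"
        self.events = ["init"]

    def postrun(self, result):
        # Runs after all steps; re-enable needrestart and note the outcome.
        os.environ.pop("NEEDRESTART_SUSPEND", None)
        self.events.append(f"postrun:{len(result.packages_upgraded)}")


def run_host(plugin_classes, packages):
    """Hypothetical host loop illustrating the hook order only."""
    plugins = [cls() for cls in plugin_classes]  # early: __init__
    suspended_during_steps = []
    for _package in packages:
        # Where a minimal step would invoke apt (and thus the hook).
        suspended_during_steps.append("NEEDRESTART_SUSPEND" in os.environ)
    result = SimpleNamespace(packages_upgraded=list(packages))
    for plugin in plugins:
        plugin.postrun(result)  # late: postrun
    return plugins, suspended_during_steps


plugins, flags = run_host([DemoPlugin], ["libc6", "openssh-server"])
print(flags, plugins[0].events)  # prints: [True, True] ['init', 'postrun:2']
```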

Now the question becomes whether unattended-upgrades or needrestart
would be willing to install this plugin below /usr/share to make it
active by default. From my point of view, either package would be a
good fit for doing so.

Thanks for considering

Helmut

import os
import subprocess


class UnattendedUpgradesPluginNeedRestart:
    """Batch needrestart invocations. Normally needrestart integrates by
    setting up an apt Post-Invoke hook. With minimal steps enabled, this may
    cause excessive restarts. Instead, skip them all and restart once at
    the end.
    """

    NEEDRESTART_HOOK = "/usr/lib/needrestart/apt-pinvoke"

    def __init__(self):
        # unattended-upgrades instantiates plugins early, so the
        # constructor doubles as a pre-run hook.
        self.enabled = True
        self.prerun()

    def prerun(self):
        # Stay out of the way if needrestart is already suspended by
        # someone else or if needrestart is not installed.
        if (
            "NEEDRESTART_SUSPEND" in os.environ
            or not os.path.exists(self.NEEDRESTART_HOOK)
        ):
            self.enabled = False
            return

        # Modify the global environment; apt and thus needrestart
        # inherit it.
        os.environ["NEEDRESTART_SUSPEND"] = "unattended-upgrades"

    def postrun(self, result):
        if not self.enabled:
            return

        # Re-enable needrestart for the final, batched run.
        del os.environ["NEEDRESTART_SUSPEND"]

        if not result.packages_upgraded:
            return

        subprocess.call([self.NEEDRESTART_HOOK])


From Julian Andres Klode@21:1/5 to Helmut Grohne on Mon May 5 18:00:01 2025

On Tue, Apr 29, 2025 at 08:22:42PM +0200, Helmut Grohne wrote:
> Now the question becomes whether unattended-upgrades or needrestart
> would be willing to install this plugin below /usr/share to make it
> active by default. From my point of view, either package would be a
> good fit for doing so.

Given that this is an issue with how unattended-upgrades uses apt, and
that it should eventually be resolved by apt handling all of this
natively, I'm strongly in favour of shipping the workaround in u-u.

A more involved alternative would be to collect all the dpkg hooks
and execute them ourselves, but the needrestart case is more limited
in scope, which is beneficial.

The question for the needrestart maintainer is whether the name of
the hook is a stable interface; I cannot answer that.
--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer i speak de, en
