Hi. Since 10.0p1 came out, about half the time I try to connect to my
system it fails, and on inspection there is a crash in the dmesg
output:
[419972.562415] sshd-session[189732]: segfault at 7ffceb533dbc ip 00007ff7dc95261d sp 00007ffceb533d70 error 6 in libc.so.6[6261d,7ff7dc918000+165000] likely on CPU 3 (core 3, socket 0)
[419972.562422] Code: 59 ec ff ff e8 a4 a5 0b 00 0f 1f 40 00 41 57 49 89 f7 41 56 49 89 d6 41 55 41 54 55 53 48 89 fb 4c 89 ff 48 81 ec f8 04 00 00 <89> 4c 24 4c 48 89 74 24 70 be 25 00 00 00 64 48 8b 04 25 28 00 00
Of course the addresses and CPU number vary, but the code is always
the same.
Since this started happening a few days ago, I wondered if it might be
because my system had been up for over a year and Cthulhu knows how
many versions of libc6 were still pinned in core because a process
still depended on them after an upgrade, but rebooting had no
effect. libc has also been upgraded twice since this started, also with no
effect.
Obviously this will require more investigation to debug. I am at your disposal.
* install the gdb and systemd-coredump packages
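For example, once those packages are installed, something along these lines should produce a backtrace after the next crash (the 'sshd-session' match is an assumption; adjust it to whatever coredumpctl list shows):
# list the cores captured by systemd-coredump, then open the most
# recent sshd-session one under gdb
coredumpctl list sshd-session
coredumpctl gdb sshd-session
# at the (gdb) prompt, print the backtrace with: bt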
This minor speedbump aside, I have attached the 'bt' output from gdb.
Ugh, I didn't realize your password would show up in the backtrace! Sorry about that - please change it as soon as possible.
[…] valid. Therefore I think we must be dealing with action at a distance from some previous memory corruption, which is going to be a pain to track down. It might be in openssh-server, and the timing suggests that it probably is; but it might also be in any other PAM module used in the auth phase.
Now try logging in again until you hit a crash, and then look in "sudo journalctl -u ssh.service | less" for the output of valgrind; each instance of its output will start with a line saying "Memcheck, a memory error detector", and each line will have "==PID==" in it for some process ID. I don't think the output is likely to include your password this time, but it will probably be worth checking it over just in case.
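For example, once the banner gives you a process ID, something like this should pull out just that instance's lines (<PID> is a placeholder, not a real value):
# keep only the Memcheck lines for one crashed instance; replace <PID>
# with the number from its "Memcheck, a memory error detector" banner
sudo journalctl -u ssh.service | grep -F '==<PID>=='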
Separately, it might also be helpful for me to have a copy of your /etc/pam.d/common-auth file, so I can see which other modules are being run.
Reconfiguring libpam-runtime to exclude ecryptfs doesn't make any
difference; it still crashes.
* Michel Casabona <michel.casabona@free.fr> [250430 00:36]:
Reconfiguring libpam-runtime to exclude ecryptfs doesn't make any
difference; it still crashes.
Could you maybe post your full PAM configuration? That would be /etc/pam.d/sshd and also all of /etc/pam.d/common-*
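For example, something like this prints each of those files with a filename header so they can be pasted in one go (assuming the stock Debian paths named above):
# tail -n +1 prints a "==> filename <==" header before each file
tail -n +1 /etc/pam.d/sshd /etc/pam.d/common-*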
Same problem here since 1:10.0p1-2 was migrated to testing yesterday.
It seems (but I'm not sure) that there is less chance of a crash when
using password authentication (PubkeyAuthentication=no).
Also, on my system it's easier to cause a crash when logging in from the
server itself (either via the loopback or the ethernet IP address).
Reconfiguring libpam-runtime to exclude ecryptfs doesn't make any
difference; it still crashes.
From the client's view (-vvv) the connection is reset at different points, sometimes after the local version string is shown, with an error message:
As advised I tried installing systemd-coredump, valgrind and also
debuginfod, then modified the script
/usr/local/bin/sshd-session-valgrind like this:
DEBUGINFOD_URLS=https://debuginfod.debian.net/ exec valgrind --leak-check=full \
    --enable-debuginfod=yes /usr/lib/openssh/sshd-session "$@"
Now valgrind shows the name of a function:
avril 29 19:57:25 odysseus sshd[4019365]: ==4019365== Cannot map memory to grow the stack for thread #1 to 0x1ffeffc000
avril 29 19:57:25 odysseus sshd[4019365]: ==4019365==
avril 29 19:57:25 odysseus sshd[4019365]: ==4019365== Process terminating with default action of signal 11 (SIGSEGV): dumping core
avril 29 19:57:25 odysseus sshd[4019365]: ==4019365== Access not within mapped region at address 0x1FFEFFCD78
avril 29 19:57:25 odysseus sshd[4019365]: ==4019365== Cannot map memory to grow the stack for thread #1 to 0x1ffeffc000
avril 29 19:57:25 odysseus sshd[4019365]: ==4019365== at 0x1BCBC9: glob0 (glob.c:476)
Unfortunately I couldn't get a coredump:
avril 29 19:57:25 odysseus systemd[1]: Started systemd-coredump@15-4019403-0.service - Process Core Dump (PID 4019403/UID 0).
avril 29 19:57:25 odysseus systemd-coredump[4019404]: Resource limits disable core dumping for process 4019365 (memcheck-amd64-).
avril 29 19:57:25 odysseus systemd-coredump[4019404]: [🡕] Process 4019365 (memcheck-amd64-) of user 0 terminated abnormally without generating a coredump.
avril 29 19:57:25 odysseus systemd[1]: systemd-coredump@15-4019403-0.service: Deactivated successfully.
No idea why; I thought installing systemd-coredump raised the limits.
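A drop-in along these lines would presumably lift that limit for the service (only a sketch: it assumes the ssh.service unit name used earlier in this thread, and the drop-in file name is arbitrary):
# allow the sshd processes under ssh.service to dump core
sudo mkdir -p /etc/systemd/system/ssh.service.d
printf '[Service]\nLimitCORE=infinity\n' | \
    sudo tee /etc/systemd/system/ssh.service.d/coredump.conf
sudo systemctl daemon-reload
sudo systemctl restart ssh.service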
On 30/04/2025 at 13:42, Colin Watson wrote:
Is that the complete output from valgrind, or did you edit it down?
It's tantalizingly close to being useful, but it really feels like
there should be more of it. Could I have all of the lines matching "==4019365=="?
Yes, I pasted only a few lines, sorry. The full log is attached below.
Could you drop --leak-check=full from the valgrind call, and instead add --main-stacksize=67108864 (i.e. eight times the current value)? Then provoke the bug again and send me the new valgrind output. Let's see if that tells us something different.
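In other words, the exec line in the wrapper would become something like this (assuming the rest of the script stays as before):
DEBUGINFOD_URLS=https://debuginfod.debian.net/ exec valgrind --main-stacksize=67108864 \
    --enable-debuginfod=yes /usr/lib/openssh/sshd-session "$@"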
Could I also get your /etc/ssh/sshd_config and /etc/ssh/sshd_config.d/* files (of course you can edit out anything secret, but if you do then
please at least keep the structure)?
On 30/04/2025 at 14:48, Colin Watson wrote:
Could you drop --leak-check=full from the valgrind call, and instead add
--main-stacksize=67108864 (i.e. eight times the current value)? Then
provoke the bug again and send me the new valgrind output. Let's see if
that tells us something different.
Same output :-( Log attached.
Could I also get your /etc/ssh/sshd_config and /etc/ssh/sshd_config.d/*
files (of course you can edit out anything secret, but if you do then
please at least keep the structure)?
The (unedited) config files are attached too.
I'm trying to get my test system closer to yours, but no luck so far.
The best I've been able to come up with is an overlap between source
and destination in a strlcpy call, which should probably be fixed,
I can reproduce this on my system. In my case it works when I issue `ssh localhost` (same user), but
it reliably crashes when I issue `ssh lucio@localhost`. Please note
that my username is `lucio`.