Forum: >>> Magnum BBS <<<

Bug#980148: mesa-vulkan-drivers: file content conflict in Multi-Arch:sa

From Helmut Grohne@21:1/5 to All on Fri Apr 4 20:50:02 2025

XPost: linux.debian.maint.x

user debian-qa@lists.debian.org
usertags 980148 + fileconflict
severity 980148 serious
thanks

Hi,

On Fri, Jan 15, 2021 at 12:06:14PM +0100, Michel Dänzer wrote:

On 2021-01-15 12:02 p.m., Thorsten Glaser wrote:

Package: mesa-vulkan-drivers
[…]
Multi-Arch: same

The file /usr/share/vulkan/icd.d/intel_icd.x86_64.json differs.

amd64:

{
"ICD": {
"api_version": "1.2.145",
"library_path": "/usr/lib/x86_64-linux-gnu/libvulkan_intel.so"
},
"file_format_version": "1.0.0"
}

x32:

{
"ICD": {
"api_version": "1.2.145",
"library_path": "/usr/lib/x86_64-linux-gnux32/libvulkan_intel.so"
},
"file_format_version": "1.0.0"
}

This file must be moved out of /usr/share and into a multiarch library path.

Looks to me like the filename is wrong on x32.

How do you reach that conclusion? At least the multiarch tuple is
appropriate for x32.

A similar problem exists for armel and armhf. Both install /usr/share/vulkan/icd.d/lvp_icd.armv8l.json and /usr/share/vulkan/icd.d/radeon_icd.armv8l.json. I haven't checked their
file contents, but it seems very likely that they differ in
arm-linux-gnueabi vs arm-linux-gnueabihf in their library_path.

Given that these files reference shared libraries, they are inherently architecture-dependent and that makes them technically inappropriate to
ship below /usr/share. Would it be feasible to transition these files
from /usr/share/vulkan/icd.d to /usr/lib/<triplet>/vulkan/icd.d? That'd
be a longer adventure as the consumers of these files would first have
to search both locations and once they all consider both we could start
moving them.

The file conflict on two release architectures makes the problem rc (as
you can practically experience an unpack error). A short-term workaround
is dropping Multi-Arch: same. Doing so reopens #853897 sadly.
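
To make the conflict concrete, here is a minimal Python sketch that rebuilds the two manifests quoted above and checks whether they could share one path. The helper names `manifest` and `shared_file_conflicts` are hypothetical, and the byte-for-byte comparison only approximates how dpkg treats files shared between Multi-Arch: same instances.

```python
import json


def manifest(multiarch_tuple: str) -> bytes:
    """Render an ICD manifest shaped like the ones quoted in this bug."""
    doc = {
        "ICD": {
            "api_version": "1.2.145",
            "library_path": f"/usr/lib/{multiarch_tuple}/libvulkan_intel.so",
        },
        "file_format_version": "1.0.0",
    }
    return json.dumps(doc, indent=4, sort_keys=True).encode()


def shared_file_conflicts(contents_by_arch: dict[str, bytes]) -> bool:
    """Return True if instances shipping the same path disagree on its bytes."""
    return len(set(contents_by_arch.values())) > 1


# Both architectures install /usr/share/vulkan/icd.d/intel_icd.x86_64.json.
same_path = {
    "amd64": manifest("x86_64-linux-gnu"),
    "x32": manifest("x86_64-linux-gnux32"),
}
print(shared_file_conflicts(same_path))
```

The embedded multiarch tuple is the only difference between the two files, and that alone is enough to make the shared path conflict.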

Helmut

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Michel Dänzer@21:1/5 to Helmut Grohne on Mon Apr 7 10:40:01 2025

XPost: linux.debian.maint.x

On 2025-04-04 20:42, Helmut Grohne wrote:

On Fri, Jan 15, 2021 at 12:06:14PM +0100, Michel Dänzer wrote:

On 2021-01-15 12:02 p.m., Thorsten Glaser wrote:

Package: mesa-vulkan-drivers
[…]
Multi-Arch: same

The file /usr/share/vulkan/icd.d/intel_icd.x86_64.json differs.

amd64:

{
"ICD": {
"api_version": "1.2.145",
"library_path": "/usr/lib/x86_64-linux-gnu/libvulkan_intel.so"
},
"file_format_version": "1.0.0"
}

x32:

{
"ICD": {
"api_version": "1.2.145",
"library_path": "/usr/lib/x86_64-linux-gnux32/libvulkan_intel.so"
},
"file_format_version": "1.0.0"
}

This file must be moved out of /usr/share and into a multiarch library
path.

Looks to me like the filename is wrong on x32.

How do you reach that conclusion? At least the multiarch tuple is
appropriate for x32.

I see, my bad.

--
Earthling Michel Dänzer \ GNOME / Xwayland / Mesa developer https://redhat.com \ Libre software enthusiast


From Michel Dänzer@21:1/5 to Simon McVittie on Mon Apr 14 19:00:01 2025

XPost: linux.debian.maint.x

On 2025-04-14 18:23, Simon McVittie wrote:

I can see two ways to resolve #980148 without needing to change the
search path for Vulkan drivers:

1. As far as I'm aware, the basename of these files never matters: all
that matters is their content. So Mesa's debian/rules could do something
like this (assuming file-rename(1p) from the rename package):

file-rename 's/(.*)\.([^.]+?)\.json$/$1.$ENV{DEB_HOST_ARCH}.json/' \
debian/tmp/usr/share/vulkan/icd.d/*.json

to replace the "x86_64" or "armv8l" part of the filename with a string
that is definitely distinct for each pair of Debian architectures,
resulting in filenames like intel_icd.amd64.json and intel_icd.x32.json.

Or it could use $ENV{DEB_HOST_MULTIARCH} for longer-but-maybe-clearer
filenames like intel_icd.x86_64-linux-gnu.json, which would be necessary
if we want to allow mesa-vulkan-drivers:amd64,
mesa-vulkan-drivers:hurd-amd64 and mesa-vulkan-drivers:kfreebsd-amd64
to be co-installed.
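
For readers without file-rename(1p), the same substitution can be sketched in Python. This is an illustration of the regular expression only, not what Mesa's debian/rules does; the function name `retag` is hypothetical, and the architecture tag is passed as an argument instead of being read from DEB_HOST_ARCH or DEB_HOST_MULTIARCH.

```python
import re

# Same pattern as the file-rename expression: the last dot-separated
# component before ".json" is treated as the architecture tag.
_PATTERN = re.compile(r"(.*)\.([^.]+?)\.json$")


def retag(filename: str, tag: str) -> str:
    """Replace the architecture-like component of an ICD manifest name."""
    return _PATTERN.sub(lambda m: f"{m.group(1)}.{tag}.json", filename)


print(retag("intel_icd.x86_64.json", "amd64"))
print(retag("radeon_icd.armv8l.json", "armhf"))
print(retag("intel_icd.x86_64.json", "x86_64-linux-gnu"))
```

A name with no architecture component, such as intel_icd.json, does not match the pattern and is returned unchanged.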

[...]

2. Or, Mesa could give its Vulkan drivers the same file layout as its
Vulkan layers (which happens to be the same as the Nvidia proprietary
driver's Vulkan driver), taking advantage of the fact that on Debian, each
of its drivers is installed into ld.so's default load path for shared
libraries. So instead of hard-coding the full path of the library, it could
set the library_path field to be just the basename, resulting in the same
JSON content on every architecture:

{
"ICD": {
"api_version": "1.2.145",
"library_path": "libvulkan_intel.so"
},
"file_format_version": "1.0.0"
}

and then rename the file to a name that is intentionally the same
for every architecture (like intel_icd.json), so that they *always*
collide, and dpkg's multiarch refcounting resolves this by only keeping
one copy.
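
A minimal sketch of the content side of option 2, assuming the manifests look like the ones quoted in this bug: reduce library_path to its basename and serialise deterministically, so that every architecture produces the same bytes. The function name `normalise_manifest` is hypothetical, and this is not taken from any actual Mesa packaging.

```python
import json
import posixpath


def normalise_manifest(text: str) -> str:
    """Rewrite library_path to a bare basename and serialise deterministically."""
    doc = json.loads(text)
    icd = doc["ICD"]
    icd["library_path"] = posixpath.basename(icd["library_path"])
    return json.dumps(doc, indent=4, sort_keys=True) + "\n"


amd64 = json.dumps({
    "ICD": {
        "api_version": "1.2.145",
        "library_path": "/usr/lib/x86_64-linux-gnu/libvulkan_intel.so",
    },
    "file_format_version": "1.0.0",
})
x32 = amd64.replace("x86_64-linux-gnu", "x86_64-linux-gnux32")

# After normalisation the two architectures yield identical content.
print(normalise_manifest(amd64) == normalise_manifest(x32))
```

Identical content is what allows the files to share a single name such as intel_icd.json across architectures.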

FWIW, I recommend option 2 for these reasons:

It should result in slightly better Vulkan start-up performance when mesa-vulkan-drivers is installed for multiple architectures, because the Vulkan loader won't waste cycles trying to load ICDs that can't work.

It also avoids warning messages from the loader (or the dynamic linker?) when trying to load ICDs that can't work.

--
Earthling Michel Dänzer \ GNOME / Xwayland / Mesa developer https://redhat.com \ Libre software enthusiast


From Simon McVittie@21:1/5 to All on Mon Apr 14 19:20:01 2025

XPost: linux.debian.maint.x

On Mon, 14 Apr 2025 at 18:48:02 +0200, Michel Dänzer wrote:

On 2025-04-14 18:23, Simon McVittie wrote:

I can see two ways to resolve #980148 without needing to change the
search path for Vulkan drivers:

1. [rename the files to have a longer architecture disambiguator]

2. Or, Mesa could give its Vulkan drivers the same file layout as its
Vulkan layers [changing] the library_path field to be just the basename
and then rename the file to a name that is intentionally the same
for every architecture (like intel_icd.json)

FWIW, I recommend option 2 for these reasons:

It should result in slightly better Vulkan start-up performance when mesa-vulkan-drivers is installed for multiple architectures, because
the Vulkan loader won't waste cycles trying to load ICDs that can't work.

Yes, although that speed-up probably isn't noticeable.

It also avoids warning messages from the loader (or the dynamic
linker?) when trying to load ICDs that can't work.

I believe Vulkan-Loader now tries to suppress those messages, although
they were a frequent source of noise in support channels in the past.

It's worth mentioning that there are some other specifications that are "the same shape" as Vulkan, particularly EGL and OpenXR. It would be great if Mesa's Vulkan driver could do this in a way that makes a good example for others, to make file content conflicts less likely to arise because anyone naively copying what Mesa does for Vulkan will already be "doing the right thing".

For EGL, Mesa's driver in the libegl-mesa0 package already uses the
equivalent of option 2 above, so there is no conflict. I think that's
another point in favour of option 2.

For OpenXR, the only "runtime" (~= driver) implementation I'm aware of in Debian (Monado) has a file content conflict bug similar to this one, but with no attempt to distinguish different architectures' files, so it affects amd64/i386 just as much as amd64/x32: <https://bugs.debian.org/1101455>.
On that bug, I recommended the equivalent of option 2 above, because the equivalent of option 1 would be more annoying to implement for OpenXR
(it is not *precisely* "the same shape" as Vulkan).

smcv


From Michel Dänzer@21:1/5 to Helmut Grohne on Tue Apr 15 10:50:01 2025

XPost: linux.debian.maint.x

On 2025-04-14 18:44, Helmut Grohne wrote:

On Mon, Apr 14, 2025 at 05:23:01PM +0100, Simon McVittie wrote:

Loaders are expected to be able to recognise that a particular driver is not
for them, and gracefully not load it. In practice this works fine, because all
of our architectures can be distinguished by their ELF headers (and if that
wasn't the case, multiarch co-installation of ordinary shared libraries would
go badly wrong).

I'm sorry to disappoint you, but reality is not like that.

You can actually run kfreebsd-amd64 binaries on a Linux kernel as their
ELF header looks the same. Not that they do useful stuff, but they may
go far enough as to reset your system clock. I've actually encountered
that.

Then, if you combine armel and armhf, those architectures also have ELF headers that are mostly indistinguishable. I'm not sure what happens
exactly, but it isn't good.

What also gets interesting is when you try to combine e.g. amd64 and musl-linux-amd64. Those also cannot be told apart by their ELF headers.

The elf-arch tool from arch-test attempts to map ELF headers to Debian architectures, but it can only do so much.

So no, as long as we support armel and armhf simultaneously, we cannot
tell architectures apart by their ELF header.

[...]

Given what I said earlier about the inability to tell ELF headers apart
and the real problems observed in trying to do so, I have a preference
for the first option.

Given that different variants of libvulkan_*.so are located in separate search paths, is there any scenario other than a system misconfiguration which would result in an attempt to load the wrong one?

--
Earthling Michel Dänzer \ GNOME / Xwayland / Mesa developer https://redhat.com \ Libre software enthusiast


From Simon McVittie@21:1/5 to Helmut Grohne on Tue Apr 15 15:00:01 2025

XPost: linux.debian.maint.x

On Mon, 14 Apr 2025 at 18:44:38 +0200, Helmut Grohne wrote:

In general, I doubt we fix this for trixie other than dropping M-A:same maybe.

Please don't drop M-A: same from mesa-vulkan-drivers. From my point of
view as someone helping to make Steam be runnable on Linux: mesa-vulkan-drivers:amd64 and mesa-vulkan-drivers:i386 need to be co-installable, otherwise it isn't possible to run 64- and 32-bit games
that use Vulkan on the same system (which would be a regression relative
to bookworm, bullseye, and I think also buster, where this worked fine).

Or, even if proprietary software like Steam is disregarded, mesa-vulkan-drivers:amd64 and mesa-vulkan-drivers:i386 need to be co-installable if we want both 64- and 32-bit Wine to be able to implement
the Direct3D API using DXVK, which I believe we do.

I think a regression for amd64/i386 co-installation would have a
considerably larger practical negative impact on Debian users than
ABI conflicts between less-commonly-used architecture pairs like
armel/armhf, and a very much larger practical negative impact than
conflicts between architecture pairs involving -ports (amd64/hurd-amd64
or amd64/kfreebsd-amd64) or architectures that are not yet in Debian at
all (amd64/musl-linux-amd64).

On Mon, Apr 14, 2025 at 05:23:01PM +0100, Simon McVittie wrote:

Loaders are expected to be able to recognise that a particular driver is not
for them, and gracefully not load it. In practice this works fine, because all
of our architectures can be distinguished by their ELF headers (and if that wasn't the case, multiarch co-installation of ordinary shared libraries would
go badly wrong).

I'm sorry to disappoint you, but reality is not like that.

...

Then, if you combine armel and armhf, those architectures also have ELF headers that are mostly indistinguishable. I'm not sure what happens
exactly, but it isn't good.

...

So no, as long as we support armel and armhf simultaneously, we cannot
tell architectures apart by their ELF header.

If this is a problem, surely it's a problem that we already have, whatever
Mesa might do? Because /etc/ld.so.conf.d adds all the multiarch directories from every enabled architecture to the search path:

amdahl$ schroot -c sid_armel-dchroot cat /etc/ld.so.conf.d/arm-linux-gnueabi.conf
...
# Multiarch support
/usr/local/lib/arm-linux-gnueabi
/lib/arm-linux-gnueabi
/usr/lib/arm-linux-gnueabi

amdahl$ schroot -c sid_armhf-dchroot cat /etc/ld.so.conf.d/arm-linux-gnueabihf.conf
...
# Multiarch support
/usr/local/lib/arm-linux-gnueabihf
/lib/arm-linux-gnueabihf
/usr/lib/arm-linux-gnueabihf

and we rely on the dynamic linker to consider and reject libraries that
are not, in fact, compatible with the current process.

I don't have a mixed armel/armhf system immediately to hand right now,
but you can see this in action on a mixed amd64/i386 system. I don't have
one /etc/ld.so.cache for amd64 and a second for i386: I only have one ld.so cache, containing both:

$ /sbin/ldconfig -Xp | grep libvulkan.so.1
libvulkan.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libvulkan.so.1
libvulkan.so.1 (libc6) => /lib/i386-linux-gnu/libvulkan.so.1

But when one of my installed programs asks to load libvulkan.so.1, either
via DT_NEEDED or dlopen(), ld.so knows that it must choose the one that
matches the architecture of the running process and disregard the other one.

(There is also only one LD_LIBRARY_PATH, shared between all architectures.)
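
To see the same thing programmatically, the listing printed by ldconfig -p can be grouped by soname. The following is a rough Python sketch that parses the two lines shown above; the regular expression assumes the "name (flags) => path" layout seen in that output, and the names `LINE` and `parse_cache_listing` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line layout: "<soname> (<flag>,<flag>...) => <path>"
LINE = re.compile(r"^\s*(?P<soname>\S+) \((?P<flags>[^)]*)\) => (?P<path>\S+)\s*$")


def parse_cache_listing(text: str) -> dict:
    """Group cache entries by soname, keeping each entry's flags and path."""
    entries = defaultdict(list)
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            flags = tuple(f.strip() for f in m["flags"].split(","))
            entries[m["soname"]].append((flags, m["path"]))
    return dict(entries)


SAMPLE = """\
libvulkan.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libvulkan.so.1
libvulkan.so.1 (libc6) => /lib/i386-linux-gnu/libvulkan.so.1
"""

for flags, path in parse_cache_listing(SAMPLE)["libvulkan.so.1"]:
    print(flags, path)
```

One soname maps to two entries with different flags, which is the metadata the dynamic linker has available when choosing between them.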

Similarly I'm 95% sure that a mixed armel/armhf system only has one ld.so.cache, listing both armel and armhf libraries indiscriminately, and hopefully with enough metadata to choose which one is more appropriate
and disregard the other. So on a mixed armel/armhf system, there are
two possibilities:

(a) glibc/ld.so knows how to distinguish between armel and armhf libraries,
and avoid loading armel libraries into armhf processes and vice versa.
If this is true then loading "libvulkan_radeon.so" into an armhf process
will reliably load the armhf flavour, avoiding the armel flavour,
and we win.

(b) glibc/ld.so can't distinguish between armel and armhf libraries.
But if this is true, then we will already have the problem that loading
an ordinary library dependency like "libc.so.6" or "libvulkan.so.1" to
satisfy DT_NEEDED can load the wrong flavour, so we have already lost,
even before loading a Vulkan driver plugin; and I don't see how Mesa
doing a dlopen("libvulkan_radeon.so", ...) is going to make this
any worse.

and it seems like the same would be true for any pair of glibc
architectures? Either we're in the equivalent of situation (a) and my
"option 2" from earlier in the thread would work fine, or we're in
the equivalent of situation (b) and we already have a serious problem,
which is not going to be made noticeably worse by anything Mesa does.

In practice, it seems that ld.so *can* distinguish between armel and armhf, presumably by distinguishing EF_ARM_ABI_FLOAT_SOFT from
EF_ARM_ABI_FLOAT_HARD in their e_flags field:

amdahl$ schroot -c sid_armel-dchroot -- /sbin/ldconfig -Xp | grep libzstd
...
libzstd.so.1 (libc6,soft-float) => /lib/arm-linux-gnueabi/libzstd.so.1

amdahl$ schroot -c sid_armhf-dchroot -- /sbin/ldconfig -Xp | grep libzstd
...
libzstd.so.1 (libc6,hard-float) => /lib/arm-linux-gnueabihf/libzstd.so.1

so hopefully what we get for the armel/armhf pair is (a). (And I would
expect mixed armel/armhf to already fail horribly if that was not the case.)
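
The flag bits mentioned above can be read straight from a 32-bit ELF header. This Python sketch only decodes e_flags; whether ld.so bases its decision on exactly these bits is the presumption stated above, not something the sketch verifies. The constants come from elf.h, while `arm_float_abi` and `fake_arm_header` are hypothetical helpers, and the e_flags values in the demo are illustrative.

```python
import struct

EM_ARM = 40
EF_ARM_ABI_FLOAT_SOFT = 0x200
EF_ARM_ABI_FLOAT_HARD = 0x400


def arm_float_abi(header: bytes) -> str:
    """Classify a 32-bit ARM ELF header by its float-ABI flag bits."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 1:  # EI_CLASS: 1 means ELFCLASS32
        raise ValueError("not a 32-bit ELF file")
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    if e_machine != EM_ARM:
        raise ValueError("not an ARM ELF file")
    (e_flags,) = struct.unpack_from(endian + "I", header, 36)
    if e_flags & EF_ARM_ABI_FLOAT_HARD:
        return "hard-float"
    if e_flags & EF_ARM_ABI_FLOAT_SOFT:
        return "soft-float"
    return "unspecified"


def fake_arm_header(e_flags: int) -> bytes:
    """Build a synthetic little-endian ELF32 header prefix for demonstration."""
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    # e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags
    return ident + struct.pack("<HHIIIII", 3, EM_ARM, 1, 0, 0, 0, e_flags)


print(arm_float_abi(fake_arm_header(0x05000400)))
print(arm_float_abi(fake_arm_header(0x05000200)))
```

On a real system the first 40 bytes of a library file can be passed in place of the synthetic header.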

You can actually run kfreebsd-amd64 binaries on a Linux kernel as their
ELF header looks the same. Not that they do useful stuff, but they may
go far enough as to reset your system clock.

But can you dlopen() kfreebsd-amd64 libraries into a running Linux
amd64 process? That's what matters here. If you can't, then the drivers
from mesa-vulkan-drivers:kfreebsd-amd64 will gracefully fail to load,
no harm done. Or if you can, then we likely already have worse problems.

What also gets interesting is when you try to combine e.g. amd64 and musl-linux-amd64. Those also cannot be told apart by their ELF headers.

This is a situation where "option 2", a single JSON manifest with only
the basename of the library, might actually be *better* than "option 1",
a distinct JSON manifest per architecture with the absolute path to the library.

Presumably musl-linux-amd64 has a library search path (either hard-coded
into it or via configuration) that is distinct from glibc's; if it didn't,
and if glibc's and musl's dynamic linkers are unable to avoid loading
libraries from the "other" ABI (scenario b above), we would already have
worse problems.

But if musl has a distinct search path, then a musl process calling dlopen("libvulkan_radeon.so", ...), as it would if option 2 is taken,
won't load the glibc flavour of libvulkan_radeon.so, because that isn't in
its search path; and conversely, a glibc process doing the same dlopen()
call won't see the musl flavour.

However if a musl process calls dlopen("/usr/lib/x86_64-linux-gnu/libvulkan_radeon.so", ...), as it
would if option 1 is taken, then I can see how that might accidentally
succeed if their ELF flags happen to be the same, leading to problems
when musl and glibc ABI assumptions collide.

1. As far as I'm aware, the basename of these files never matters: all
that matters is their content. So Mesa's debian/rules could do something
like this (assuming file-rename(1p) from the rename package):

file-rename 's/(.*)\.([^.]+?)\.json$/$1.$ENV{DEB_HOST_ARCH}.json/' \
debian/tmp/usr/share/vulkan/icd.d/*.json

to replace the "x86_64" or "armv8l" part of the filename with a string
that is definitely distinct for each pair of Debian architectures,
resulting in filenames like intel_icd.amd64.json and intel_icd.x32.json.

Or it could use $ENV{DEB_HOST_MULTIARCH} for longer-but-maybe-clearer
filenames like intel_icd.x86_64-linux-gnu.json, which would be necessary
if we want to allow mesa-vulkan-drivers:amd64,
mesa-vulkan-drivers:hurd-amd64 and mesa-vulkan-drivers:kfreebsd-amd64
to be co-installed.

...

This sounds very reasonable to me.

But if you are concerned about the possibility that the dynamic linker will
load the "wrong" flavour of the library, how would this help us?
There is nothing special about these filenames that makes Vulkan-Loader
load some while ignoring others: the only mechanism for ignoring unsuitable/incompatible drivers is to dlopen() them and see if it fails.

(Vulkan-Loader *does* have a mechanism to flag drivers as 32-bit or 64-bit,
in which case the dlopen() won't even be attempted for the "wrong" word
size, but this is very limited and only works with word sizes, not the rest
of the possible differences between architectures; and in any case Mesa
doesn't currently apply this marking to its drivers.)

For example on a mixed armel/armhf system, if we did "option 1", we would
be relying on an armhf Vulkan-Loader doing something like this:

readdir("/usr/share/vulkan/icd.d") => a list of filenames
including e.g. radeon_icd.armel.json, radeon_icd.armhf.json,
nouveau_icd.armhf.json and so on

fopen(".../radeon_icd.armel.json", ...) => success
parse JSON
dlopen("/usr/lib/arm-linux-gnueabi/libvulkan_radeon.so", ...) => failure
(because it's a softfloat library and we are a hardfloat process)

fopen(".../radeon_icd.armhf.json", ...) => success
parse JSON
dlopen("/usr/lib/arm-linux-gnueabihf/libvulkan_radeon.so", ...) => success
ask this driver whether it can find any GPUs that it supports
foreach GPU in the result {
add the GPU to our list of devices
}

repeat both for nouveau driver
repeat both for lavapipe driver
repeat both for virtio driver
etc.

This works as intended for the most common multiarch scenarios like
amd64/i386. I suspect it also works as intended for armel/armhf, although
your assertion is that it does not.
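
The loop described above can be approximated in Python with ctypes, which calls dlopen() on Linux. This is a simplified stand-in, not Vulkan-Loader's actual code: the function name `probe_icds` is hypothetical, the GPU enumeration step is omitted, and the demo manifest points at a deliberately nonexistent path so that the failure branch is exercised.

```python
import ctypes
import json
import pathlib
import tempfile


def probe_icds(icd_dir):
    """Try to load every manifest's library; sort manifests by outcome."""
    loadable, rejected = [], []
    for manifest in sorted(pathlib.Path(icd_dir).glob("*.json")):
        library_path = json.loads(manifest.read_text())["ICD"]["library_path"]
        try:
            ctypes.CDLL(library_path)  # wraps dlopen()
        except OSError:
            # Unsuitable or missing driver: skip it and carry on.
            rejected.append(manifest.name)
        else:
            loadable.append(manifest.name)
    return loadable, rejected


with tempfile.TemporaryDirectory() as tmp:
    doc = {
        "ICD": {
            "api_version": "1.2.145",
            "library_path": "/nonexistent/libvulkan_radeon.so",
        },
        "file_format_version": "1.0.0",
    }
    (pathlib.Path(tmp) / "radeon_icd.armel.json").write_text(json.dumps(doc))
    result = probe_icds(tmp)

print(result)
```

The same loop handles both options: with option 1 library_path is an absolute path, and with option 2 it is a bare basename that the dynamic linker resolves through its own search path.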

2. Or, Mesa could give its Vulkan drivers the same file layout as its
Vulkan layers [and]
set the library_path field to be just the basename

Given what I said earlier about the inability to tell ELF headers apart
and the real problems observed in trying to do so, I have a preference
for the first option.

I don't see how this would introduce a new problem that we don't already
have to deal with. In "option 2", Vulkan-Loader would do something like:

readdir("/usr/share/vulkan/icd.d") => a list of filenames
including e.g. radeon_icd.json, nouveau_icd.json and so on

fopen(".../radeon_icd.json", ...) => success
parse JSON
dlopen("libvulkan_radeon.so", ...) => success
ask this driver whether it can find any GPUs that it supports
foreach GPU in the result {
add the GPU to our list of devices
}

repeat for nouveau driver
repeat for lavapipe driver
repeat for virtio driver
etc.

But if dlopen("libvulkan_radeon.so", ...) can succeed but return a library
that is of the wrong architecture, don't we already have an equivalent
problem when evaluating DT_NEEDED dependencies, like when libGL.so.1
loads libGLdispatch.so.0, or at least when evaluating dlopen()'d weak dependencies, like when libSDL2-2.0.so.0 loads libdbus-1.so.3? And if that problem already exists, then the relevant architecture pair already aren't going to work well together, and Mesa/Vulkan isn't making the problem worse.

smcv


From Helmut Grohne@21:1/5 to All on Tue Apr 15 22:10:01 2025

XPost: linux.debian.maint.x

On Tue, Apr 15, 2025 at 10:41:22AM +0200, Michel Dänzer wrote:

Given that different variants of libvulkan_*.so are located in separate search paths, is there any scenario other than a system misconfiguration which would result in an attempt to load the wrong one?

I fear the answer to this question is not obvious and I can only
partially answer it.

For one thing, I note that the idea of separate search paths is a nice
one. We want to think of architectures and their library directories as separate. That is not reality. The glibc dynamic loader searches every
library path referenced from /etc/ld.so.conf. On Debian multiarch
systems that happens to be every architecture for which libc6 is
installed. You can easily see this on a multiarch system by inspecting /etc/ld.so.cache as it contains libraries for multiple architectures.

I can also tell you that running a kfreebsd-amd64 ELF executable on a
Linux amd64 kernel works "too well". The Linux kernel cannot tell these architectures apart from the ELF header and happily runs it. As the
syscall ABI is completely different, you end up doing stuff you never
wanted such as resetting your system clock before the program quickly
fails.

Loading shared libraries is a different beast as it is done by glibc.
There the story looks different. If you attempt to load incompatible
libraries you tend to see the error "wrong ELF class: ..." from
dlerror() after having tried loading it with dlopen. Not so, if your architectures are too similar. For instance, attempting to load an armel library into an armhf executable, I got "cannot open shared object file:
No such file or directory" (and note that it successfully opened the file
but did not map it).

I cannot tell how this translates to the vulkan case. However, we
learned two aspects in the process:
* On Debian systems, the loader will search all multiarch directories
for compatible libraries.
* The ELF class is not sufficient to tell armel and armhf apart.

Whether these are convincing arguments is up to you in the end. I
suspect further research is needed.

Helmut


From Helmut Grohne@21:1/5 to Simon McVittie on Tue Apr 15 22:10:01 2025

XPost: linux.debian.maint.x

Hi Simon,

On Tue, Apr 15, 2025 at 01:47:54PM +0100, Simon McVittie wrote:

I think a regression for amd64/i386 co-installation would have a
considerably larger practical negative impact on Debian users than
ABI conflicts between less-commonly-used architecture pairs like
armel/armhf, and a very much larger practical negative impact than
conflicts between architecture pairs involving -ports (amd64/hurd-amd64
or amd64/kfreebsd-amd64) or architectures that are not yet in Debian at
all (amd64/musl-linux-amd64).

This reasoning convinces me. As it stands, I only see solutions to this
problem that are inappropriate for trixie.

(b) glibc/ld.so can't distinguish between armel and armhf libraries.
But if this is true, then we will already have the problem that loading
an ordinary library dependency like "libc.so.6" or "libvulkan.so.1" to
satisfy DT_NEEDED can load the wrong flavour, so we have already lost,
even before loading a Vulkan driver plugin; and I don't see how Mesa
doing a dlopen("libvulkan_radeon.so", ...) is going to make this
any worse.

This is an argument that I missed earlier. Thank you. It changes my
preference on the solution, as it removes my argument against option 2.

But can you dlopen() kfreebsd-amd64 libraries into a running Linux
amd64 process? That's what matters here. If you can't, then the drivers
from mesa-vulkan-drivers:kfreebsd-amd64 will gracefully fail to load,
no harm done. Or if you can, then we likely already have worse problems.

I fear I cannot answer that question anymore given that there no longer
is kfreebsd-amd64. Experiments with armel and armhf (see my other mail)
though yield that the error returned from dlerror() is different from
when your ELF class is incompatible. I'd like to understand the
semantics here, but my research of the glibc source was inconclusive in
this regard.

Presumably musl-linux-amd64 has a library search path (either hard-coded
into it or via configuration) that is distinct from glibc's; if it didn't, and if glibc's and musl's dynamic linkers are unable to avoid loading libraries from the "other" ABI (scenario b above), we would already have worse problems.

The reason we don't have those problems primarily is that the ongoing disagreement between musl and systemd upstreams makes it impossible to
bootstrap musl-based Debian ports and therefore there is no one who
attempts to mix musl and glibc on a single system.

In any case, I expect musl to search /usr/lib, so there is at least that
shared path.

However if a musl process calls dlopen("/usr/lib/x86_64-linux-gnu/libvulkan_radeon.so", ...), as it
would if option 1 is taken, then I can see how that might accidentally succeed if their ELF flags happen to be the same, leading to problems
when musl and glibc ABI assumptions collide.

This is another fairly convincing argument!

This works as intended for the most common multiarch scenarios like amd64/i386. I suspect it also works as intended for armel/armhf, although your assertion is that it does not.

Indeed, my assumption was that you could dlopen an armel library on
armhf, but I didn't succeed in practically doing that. I cannot tell
whether this is due to me not trying hard enough or whether there is
some mechanism systematically preventing this from working in a reliable
way.

I propose the following consensus:

None of the known solutions (options 1 and 2) or workarounds (dropping m-a:same) is appropriate for Debian trixie and the best course of
short-term action is not fixing this bug for trixie while working
towards a long term solution in forky.

Regarding the precise implementation going forward, I prefer deferring
to you (plural) as I've shared my limited knowledge and trust that you
find a more sensible solution than I could.

Helmut


From Michel Dänzer@21:1/5 to Helmut Grohne on Wed Apr 16 09:30:01 2025

XPost: linux.debian.maint.x

On 2025-04-15 11:34, Helmut Grohne wrote:

I can also tell you that running a kfreebsd-amd64 ELF executable on a
Linux amd64 kernel works "too well". The Linux kernel cannot tell these architectures apart from the ELF header and happily runs it. As the
syscall ABI is completely different, you end up doing stuff you never
wanted such as resetting your system clock before the program quickly
fails.

Loading shared libraries is a different beast as it is done by glibc.
There the story looks different. If you attempt to load incompatible libraries you tend to see the error "wrong ELF class: ..." from
dlerror() after having tried loading it with dlopen. Not so, if your architectures are too similar. For instance, attempting to load an armel library into an armhf executable, I got "cannot open shared object file:
No such file or directory" (and note that it successfully opened but not mapped the file).

Per Simon's other post, none of that is relevant for his proposed option 2, which boils down to dlopen("libvulkan_*.so", ...). If that could end up opening a variant from a wrong search path, it'd break lots of other things anyway.

In summary, Simon's option 2 seems like the clear winner. As he pointed out, it matches what's already being done for libEGL_mesa.so.0 shipped in libegl-mesa0, with no known issues.

--
Earthling Michel Dänzer \ GNOME / Xwayland / Mesa developer https://redhat.com \ Libre software enthusiast


From Simon McVittie@21:1/5 to Helmut Grohne on Wed Apr 23 17:20:01 2025

XPost: linux.debian.maint.x

On Tue, 15 Apr 2025 at 16:52:56 +0200, Helmut Grohne wrote:

On Tue, Apr 15, 2025 at 01:47:54PM +0100, Simon McVittie wrote:

I think a regression for amd64/i386 co-installation would have a
considerably larger practical negative impact on Debian users

This reasoning convinces me. As it stands, I only see solutions to this
problem that are inappropriate for trixie.

I think the "option 2" that I proposed is entirely feasible for trixie, actually. I'm testing an implementation now.

What I said before is that I don't think it's feasible to send
it upstream for trixie, but it's fairly straightforward to do as a
downstream adjustment (in debian/rules rather than in the upstream code),
so inability to upstream it is not necessarily a blocker IMO. Within
Debian, we can make simplifying assumptions like "we are installing into
a $libdir that the dynamic linker will search by default" which would
not necessarily be considered valid upstream.

smcv

