On 2021-01-15 12:02 p.m., Thorsten Glaser wrote:
Package: mesa-vulkan-drivers
[…]
Multi-Arch: same
The file /usr/share/vulkan/icd.d/intel_icd.x86_64.json differs.
amd64:
{
    "ICD": {
        "api_version": "1.2.145",
        "library_path": "/usr/lib/x86_64-linux-gnu/libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
}
x32:
{
    "ICD": {
        "api_version": "1.2.145",
        "library_path": "/usr/lib/x86_64-linux-gnux32/libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
}
This file must be moved out of /usr/share and into a multiarch library path.
Looks to me like the filename is wrong on x32.
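The reported difference can be reproduced directly from the two binary packages; a minimal sketch, assuming both .debs are available locally (the package file names below are placeholders):

# Extract the ICD manifest from each architecture's package and compare them.
dpkg-deb --fsys-tarfile mesa-vulkan-drivers_*_amd64.deb \
    | tar -xOf - ./usr/share/vulkan/icd.d/intel_icd.x86_64.json > icd-amd64.json
dpkg-deb --fsys-tarfile mesa-vulkan-drivers_*_x32.deb \
    | tar -xOf - ./usr/share/vulkan/icd.d/intel_icd.x86_64.json > icd-x32.json
# The only difference should be the library_path line.
diff -u icd-amd64.json icd-x32.json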
On Fri, Jan 15, 2021 at 12:06:14PM +0100, Michel Dänzer wrote:
[…]
Looks to me like the filename is wrong on x32.
How do you reach that conclusion? At least the multiarch tuple is
appropriate for x32.
I can see two ways to resolve #980148 without needing to change the
search path for Vulkan drivers:
1. As far as I'm aware, the basename of these files never matters: all
that matters is their content. So Mesa's debian/rules could do something
like this (assuming file-rename(1p) from the rename package):
file-rename 's/(.*)\.([^.]+?)\.json$/$1.$ENV{DEB_HOST_ARCH}.json/' \
    debian/tmp/usr/share/vulkan/icd.d/*.json
to replace the "x86_64" or "armv8l" part of the filename with a string
that is definitely distinct for each pair of Debian architectures,
resulting in filenames like intel_icd.amd64.json and intel_icd.x32.json.
Or it could use $ENV{DEB_HOST_MULTIARCH} for longer-but-maybe-clearer
filenames like intel_icd.x86_64-linux-gnu.json, which would be necessary
if we want to allow mesa-vulkan-drivers:amd64,
mesa-vulkan-drivers:hurd-amd64 and mesa-vulkan-drivers:kfreebsd-amd64
to be co-installed.
[...]
2. Or, Mesa could give its Vulkan drivers the same file layout as its
Vulkan layers (which happens to be the same as the Nvidia proprietary
driver's Vulkan driver), taking advantage of the fact that on Debian, each
of its drivers is installed into ld.so's default load path for shared
libraries. So instead of hard-coding the full path of the library, it could
set the library_path field to be just the basename, resulting in the same
JSON content on every architecture:
{
    "ICD": {
        "api_version": "1.2.145",
        "library_path": "libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
}
and then rename the file to a name that is intentionally the same
for every architecture (like intel_icd.json), so that they *always*
collide, and dpkg's multiarch refcounting resolves this by only keeping
one copy.
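A minimal sketch of what option 2 could look like as a post-processing step during the package build. This is not what Mesa's packaging currently does; the glob pattern and the use of jq are assumptions made for illustration:

# Rewrite library_path to the bare library name and drop the per-CPU part of
# the file name, so every architecture ships a byte-identical intel_icd.json.
for f in debian/tmp/usr/share/vulkan/icd.d/*_icd.*.json; do
    new="$(dirname "$f")/$(basename "$f" | sed 's/\.[^.]*\.json$/.json/')"
    jq '.ICD.library_path |= (split("/") | last)' "$f" > "$new"
    rm "$f"
done

Because the resulting files are then identical on every architecture, the collision is resolved by dpkg's multiarch refcounting, as described above.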
On 2025-04-14 18:23, Simon McVittie wrote:
I can see two ways to resolve #980148 without needing to change the
search path for Vulkan drivers:
1. [rename the files to have a longer architecture disambiguator]
2. Or, Mesa could give its Vulkan drivers the same file layout as its
Vulkan layers [changing] the library_path field to be just the basename
and then rename the file to a name that is intentionally the same
for every architecture (like intel_icd.json)
FWIW, I recommend option 2 for these reasons:
- It should result in slightly better Vulkan start-up performance when mesa-vulkan-drivers is installed for multiple architectures, because the Vulkan loader won't waste cycles trying to load ICDs that can't work.
- It also avoids warning messages from the loader (or the dynamic linker?) when trying to load ICDs that can't work.
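To observe this behaviour on an installed system, the loader's debug output shows which ICD manifests it reads and which drivers it fails to load. A sketch only, assuming vulkan-tools is installed and a Vulkan-Loader recent enough to honour VK_LOADER_DEBUG:

# List the ICD manifests the loader will consider, then watch it process them.
ls /usr/share/vulkan/icd.d/
VK_LOADER_DEBUG=all vulkaninfo --summary 2>&1 | grep -iE 'icd|driver' | head -n 40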
On Mon, Apr 14, 2025 at 05:23:01PM +0100, Simon McVittie wrote:
Loaders are expected to be able to recognise that a particular driver is not for them, and gracefully not load it. In practice this works fine, because all of our architectures can be distinguished by their ELF headers (and if that wasn't the case, multiarch co-installation of ordinary shared libraries would go badly wrong).
I'm sorry to disappoint you, but reality is not like that.
You can actually run kfreebsd-amd64 binaries on a Linux kernel, because their ELF headers look the same. Not that they do anything useful, but they may get far enough to reset your system clock. I have actually encountered that.
Then, if you combine armel and armhf, those architectures also have ELF headers that are mostly indistinguishable. I'm not sure what happens exactly, but it isn't good.
It also gets interesting when you combine, for example, amd64 and musl-linux-amd64: those cannot be told apart by their ELF headers either.
The elf-arch tool from arch-test attempts to map ELF headers to Debian architectures, but it can only do so much.
So no, as long as we support armel and armhf simultaneously, we cannot tell architectures apart by their ELF headers.
1. [rename the files to have a longer architecture disambiguator]
This sounds very reasonable to me.
2. Or, Mesa could give its Vulkan drivers the same file layout as its Vulkan layers [and] set the library_path field to be just the basename
Given what I said earlier about the inability to tell ELF headers apart, and the real problems observed in trying to do so, I have a preference for the first option.
In general, I doubt we will fix this for trixie, other than perhaps by dropping Multi-Arch: same.
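To make the ELF-header point concrete: the fields a loader could cheaply inspect are printed by readelf -h, and for a pair like armel/armhf they largely coincide. A sketch for illustration only; the paths are the standard Debian multiarch directories, and how much the Flags field differs depends on the toolchain:

# Compare the ELF headers of the "same" library built for armel and armhf.
# Class, Machine and OS/ABI are identical; only Flags (e_flags) and the
# .ARM.attributes section can differ, and not every build sets them usefully.
diff <(readelf -h /usr/lib/arm-linux-gnueabi/libc.so.6) \
     <(readelf -h /usr/lib/arm-linux-gnueabihf/libc.so.6)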
Given that different variants of libvulkan_*.so are located in separate search paths, is there any scenario other than a system misconfiguration which would result in an attempt to load the wrong one?
I think a regression for amd64/i386 co-installation would have a
considerably larger practical negative impact on Debian users than
ABI conflicts between less-commonly-used architecture pairs like
armel/armhf, and a very much larger practical negative impact than
conflicts between architecture pairs involving -ports (amd64/hurd-amd64
or amd64/kfreebsd-amd64) or architectures that are not yet in Debian at
all (amd64/musl-linux-amd64).
(b) glibc/ld.so can't distinguish between armel and armhf libraries.
But if this is true, then we will already have the problem that loading
an ordinary library dependency like "libc.so.6" or "libvulkan.so.1" to
satisfy DT_NEEDED can load the wrong flavour, so we have already lost,
even before loading a Vulkan driver plugin; and I don't see how Mesa
doing a dlopen("libvulkan_radeon.so", ...) is going to make this
any worse.
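For context on the search paths involved: each Debian architecture's libraries live in their own multiarch directory, and ld.so is configured per tuple, which is why a bare soname normally resolves to the matching flavour. A sketch on a stock Debian amd64 system:

# The dynamic linker's multiarch search path for amd64 (shipped by libc6:amd64).
cat /etc/ld.so.conf.d/x86_64-linux-gnu.conf
# Which libvulkan_* libraries the ld.so cache knows about, and from which
# per-architecture directories they come.
ldconfig -p | grep libvulkan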
But can you dlopen() kfreebsd-amd64 libraries into a running Linux
amd64 process? That's what matters here. If you can't, then the drivers
from mesa-vulkan-drivers:kfreebsd-amd64 will gracefully fail to load,
no harm done. Or if you can, then we likely already have worse problems.
Presumably musl-linux-amd64 has a library search path (either hard-coded
into it or via configuration) that is distinct from glibc's; if it didn't, and if glibc's and musl's dynamic linkers are unable to avoid loading libraries from the "other" ABI (scenario b above), we would already have worse problems.
However if a musl process calls dlopen("/usr/lib/x86_64-linux-gnu/libvulkan_radeon.so", ...), as it
would if option 1 is taken, then I can see how that might accidentally succeed if their ELF flags happen to be the same, leading to problems
when musl and glibc ABI assumptions collide.
This works as intended for the most common multiarch scenarios like amd64/i386. I suspect it also works as intended for armel/armhf, although your assertion is that it does not.
I can also tell you that running a kfreebsd-amd64 ELF executable on a Linux amd64 kernel works "too well". The Linux kernel cannot tell these architectures apart from the ELF header and happily runs it. As the syscall ABI is completely different, you end up doing things you never wanted, such as resetting your system clock, before the program quickly fails.
Loading shared libraries is a different beast, as it is done by glibc, and there the story looks different. If you attempt to load incompatible libraries, you tend to see the error "wrong ELF class: ..." from dlerror() after trying to load them with dlopen(). Not so if your architectures are too similar: attempting to load an armel library into an armhf executable, I got "cannot open shared object file: No such file or directory" (note that it successfully opened the file but failed to map it).
On Tue, Apr 15, 2025 at 01:47:54PM +0100, Simon McVittie wrote:
I think a regression for amd64/i386 co-installation would have a
considerably larger practical negative impact on Debian users
This reasoning convinces me. As it stands, I only see solutions to this problem that are inappropriate for trixie.