• Bug#980148: mesa-vulkan-drivers: file content conflict in Multi-Arch:sa

    From Helmut Grohne@21:1/5 to All on Fri Apr 4 20:50:02 2025
    XPost: linux.debian.maint.x

    user debian-qa@lists.debian.org
    usertags 980148 + fileconflict
    severity 980148 serious
    thanks

    Hi,

    On Fri, Jan 15, 2021 at 12:06:14PM +0100, Michel Dänzer wrote:
    On 2021-01-15 12:02 p.m., Thorsten Glaser wrote:
    Package: mesa-vulkan-drivers
    […]
    Multi-Arch: same

    The file /usr/share/vulkan/icd.d/intel_icd.x86_64.json differs.

    amd64:

    {
    "ICD": {
    "api_version": "1.2.145",
    "library_path": "/usr/lib/x86_64-linux-gnu/libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
    }

    x32:

    {
    "ICD": {
    "api_version": "1.2.145",
    "library_path": "/usr/lib/x86_64-linux-gnux32/libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
    }

    This file must be moved out of /usr/share and into a multiarch library path.

    Looks to me like the filename is wrong on x32.

    How do you reach that conclusion? At least the multiarch tuple is
    appropriate for x32.

    A similar problem exists for armel and armhf. Both install /usr/share/vulkan/icd.d/lvp_icd.armv8l.json and /usr/share/vulkan/icd.d/radeon_icd.armv8l.json. I haven't checked their
    file contents, but it seems very likely that they differ in
    arm-linux-gnueabi vs arm-linux-gnueabihf in their library_path.
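
    As a minimal sketch of how such a conflict can be confirmed, the two manifests quoted above can be parsed and compared key by key. The helper names (flatten, differing_keys) are illustrative only and are not part of any Debian tool.

```python
import json

# The two manifests quoted in the bug report (amd64 and x32).
AMD64 = """
{
    "ICD": {
        "api_version": "1.2.145",
        "library_path": "/usr/lib/x86_64-linux-gnu/libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
}
"""
X32 = """
{
    "ICD": {
        "api_version": "1.2.145",
        "library_path": "/usr/lib/x86_64-linux-gnux32/libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
}
"""


def flatten(obj, prefix=""):
    """Flatten nested dicts into {"ICD.library_path": value, ...}."""
    items = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            items.update(flatten(value, path))
        else:
            items[path] = value
    return items


def differing_keys(a_text, b_text):
    """Return the sorted dotted keys whose values differ between two manifests."""
    a = flatten(json.loads(a_text))
    b = flatten(json.loads(b_text))
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


print(differing_keys(AMD64, X32))
```

    For the amd64/x32 pair the only differing key is the library path, which is what makes the shared filename a problem under Multi-Arch: same.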

    Given that these files reference shared libraries, they are inherently architecture-dependent and that makes them technically inappropriate to
    ship below /usr/share. Would it be feasible to transition these files
    from /usr/share/vulkan/icd.d to /usr/lib/<triplet>/vulkan/icd.d? That'd
    be a longer adventure as the consumers of these files would first have
    to search both locations and once they all consider both we could start
    moving them.

    The file conflict on two release architectures makes the problem
    release-critical (you can hit an unpack error in practice). A short-term
    workaround is dropping Multi-Arch: same. Sadly, doing so reopens #853897.

    Helmut

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michel Dänzer@21:1/5 to Helmut Grohne on Mon Apr 7 10:40:01 2025
    XPost: linux.debian.maint.x

    On 2025-04-04 20:42, Helmut Grohne wrote:
    On Fri, Jan 15, 2021 at 12:06:14PM +0100, Michel Dänzer wrote:
    On 2021-01-15 12:02 p.m., Thorsten Glaser wrote:
    Package: mesa-vulkan-drivers
    […]
    Multi-Arch: same

    The file /usr/share/vulkan/icd.d/intel_icd.x86_64.json differs.

    amd64:

    {
    "ICD": {
    "api_version": "1.2.145",
    "library_path": "/usr/lib/x86_64-linux-gnu/libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
    }

    x32:

    {
    "ICD": {
    "api_version": "1.2.145",
    "library_path": "/usr/lib/x86_64-linux-gnux32/libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
    }

    This file must be moved out of /usr/share and into a multiarch library
    path.

    Looks to me like the filename is wrong on x32.

    How do you reach that conclusion? At least the multiarch tuple is
    appropriate for x32.

    I see, my bad.


    --
    Earthling Michel Dänzer \ GNOME / Xwayland / Mesa developer https://redhat.com \ Libre software enthusiast

  • From Michel Dänzer@21:1/5 to Simon McVittie on Mon Apr 14 19:00:01 2025
    XPost: linux.debian.maint.x

    On 2025-04-14 18:23, Simon McVittie wrote:

    I can see two ways to resolve #980148 without needing to change the
    search path for Vulkan drivers:

    1. As far as I'm aware, the basename of these files never matters: all
    that matters is their content. So Mesa's debian/rules could do something
    like this (assuming file-rename(1p) from the rename package):

    file-rename 's/(.*)\.([^.]+?)\.json$/$1.$ENV{DEB_HOST_ARCH}.json/' \
    debian/tmp/usr/share/vulkan/icd.d/*.json

    to replace the "x86_64" or "armv8l" part of the filename with a string
    that is definitely distinct for each pair of Debian architectures,
    resulting in filenames like intel_icd.amd64.json and intel_icd.x32.json.

    Or it could use $ENV{DEB_HOST_MULTIARCH} for longer-but-maybe-clearer
    filenames like intel_icd.x86_64-linux-gnu.json, which would be necessary
    if we want to allow mesa-vulkan-drivers:amd64,
    mesa-vulkan-drivers:hurd-amd64 and mesa-vulkan-drivers:kfreebsd-amd64
    to be co-installed.
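
    For readers who do not read Perl, here is a rough Python equivalent of the substitution above. It is only a sketch of the renaming rule; rename_icd is a hypothetical helper, and the real change would live in debian/rules as described.

```python
import re

# Same shape as the Perl expression: keep everything before the last
# dot-separated component, replace that component, keep ".json".
_PATTERN = re.compile(r"(.*)\.([^.]+?)\.json$")


def rename_icd(filename, arch_label):
    """Replace the architecture component of an ICD manifest filename."""
    return _PATTERN.sub(lambda m: f"{m.group(1)}.{arch_label}.json", filename)


# arch_label would come from DEB_HOST_ARCH or DEB_HOST_MULTIARCH.
print(rename_icd("intel_icd.x86_64.json", "amd64"))
print(rename_icd("intel_icd.x86_64.json", "x32"))
print(rename_icd("intel_icd.x86_64.json", "x86_64-linux-gnu"))
```

    With DEB_HOST_ARCH the amd64 and x32 builds no longer share a filename, which is the whole point of option 1.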

    [...]

    2. Or, Mesa could give its Vulkan drivers the same file layout as its
    Vulkan layers (which happens to be the same as the Nvidia proprietary
    driver's Vulkan driver), taking advantage of the fact that on Debian, each
    of its drivers is installed into ld.so's default load path for shared
    libraries. So instead of hard-coding the full path of the library, it could
    set the library_path field to be just the basename, resulting in the same
    JSON content on every architecture:

    {
    "ICD": {
    "api_version": "1.2.145",
    "library_path": "libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
    }

    and then rename the file to a name that is intentionally the same
    for every architecture (like intel_icd.json), so that they *always*
    collide, and dpkg's multiarch refcounting resolves this by only keeping
    one copy.
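
    A minimal sketch of what option 2 does to a manifest, assuming the driver library is installed in a directory the dynamic linker searches by default. The function name normalise_manifest and the choice of JSON formatting are assumptions for illustration, not what Mesa's packaging actually runs.

```python
import json
import posixpath
import re


def normalise_manifest(filename, text):
    """Return (new_filename, new_text) with an architecture-neutral manifest."""
    manifest = json.loads(text)
    icd = manifest["ICD"]
    # Keep only the basename so the dynamic linker picks the right flavour.
    icd["library_path"] = posixpath.basename(icd["library_path"])
    # Drop the architecture component: intel_icd.x86_64.json -> intel_icd.json
    new_name = re.sub(r"\.[^.]+\.json$", ".json", filename)
    return new_name, json.dumps(manifest, indent=4, sort_keys=True) + "\n"


def make(libdir):
    """Build a manifest like the ones quoted in the bug, for a given libdir."""
    return json.dumps({
        "ICD": {
            "api_version": "1.2.145",
            "library_path": f"/usr/lib/{libdir}/libvulkan_intel.so",
        },
        "file_format_version": "1.0.0",
    })


amd64 = normalise_manifest("intel_icd.x86_64.json", make("x86_64-linux-gnu"))
x32 = normalise_manifest("intel_icd.x86_64.json", make("x86_64-linux-gnux32"))
print(amd64[0], amd64 == x32)
```

    Because both architectures now produce the same name and the same bytes, dpkg's multiarch refcounting can share the file instead of reporting a conflict.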

    FWIW, I recommend option 2 for these reasons:

    It should result in slightly better Vulkan start-up performance when mesa-vulkan-drivers is installed for multiple architectures, because the Vulkan loader won't waste cycles trying to load ICDs that can't work.

    It also avoids warning messages from the loader (or the dynamic linker?) when trying to load ICDs that can't work.



  • From Simon McVittie@21:1/5 to All on Mon Apr 14 19:20:01 2025
    XPost: linux.debian.maint.x

    On Mon, 14 Apr 2025 at 18:48:02 +0200, Michel Dänzer wrote:
    On 2025-04-14 18:23, Simon McVittie wrote:

    I can see two ways to resolve #980148 without needing to change the
    search path for Vulkan drivers:

    1. [rename the files to have a longer architecture disambiguator]

    2. Or, Mesa could give its Vulkan drivers the same file layout as its
    Vulkan layers [changing] the library_path field to be just the basename
    and then rename the file to a name that is intentionally the same
    for every architecture (like intel_icd.json)

    FWIW, I recommend option 2 for these reasons:

    It should result in slightly better Vulkan start-up performance when mesa-vulkan-drivers is installed for multiple architectures, because
    the Vulkan loader won't waste cycles trying to load ICDs that can't work.

    Yes, although that speed-up probably isn't noticeable.

    It also avoids warning messages from the loader (or the dynamic
    linker?) when trying to load ICDs that can't work.

    I believe Vulkan-Loader now tries to suppress those messages, although
    they were a frequent source of noise in support channels in the past.

    It's worth mentioning that there are some other specifications that are "the same shape" as Vulkan, particularly EGL and OpenXR. It would be great if Mesa's Vulkan driver could do this in a way that makes a good example for others, to make file content conflicts less likely to arise because anyone naively copying what Mesa does for Vulkan will already be "doing the right thing".

    For EGL, Mesa's driver in the libegl-mesa0 package already uses the
    equivalent of option 2 above, so there is no conflict. I think that's
    another point in favour of option 2.

    For OpenXR, the only "runtime" (~= driver) implementation I'm aware of in Debian (Monado) has a file content conflict bug similar to this one, but with no attempt to distinguish different architectures' files, so it affects amd64/i386 just as much as amd64/x32: <https://bugs.debian.org/1101455>.
    On that bug, I recommended the equivalent of option 2 above, because the equivalent of option 1 would be more annoying to implement for OpenXR
    (it is not *precisely* "the same shape" as Vulkan).

    smcv

  • From Michel Dänzer@21:1/5 to Helmut Grohne on Tue Apr 15 10:50:01 2025
    XPost: linux.debian.maint.x

    On 2025-04-14 18:44, Helmut Grohne wrote:
    On Mon, Apr 14, 2025 at 05:23:01PM +0100, Simon McVittie wrote:
    Loaders are expected to be able to recognise that a particular driver is not
    for them, and gracefully not load it. In practice this works fine, because all
    of our architectures can be distinguished by their ELF headers (and if that
    wasn't the case, multiarch co-installation of ordinary shared libraries would
    go badly wrong).

    I'm sorry to disappoint you, but reality is not like that.

    You can actually run kfreebsd-amd64 binaries on a Linux kernel as their
    ELF header looks the same. Not that they do useful stuff, but they may
    go far enough as to reset your system clock. I've actually encountered
    that.

    Then, if you combine armel and armhf, those architectures also have ELF headers that are mostly indistinguishable. I'm not sure what happens
    exactly, but it isn't good.

    What also gets interesting is when you try to combine e.g. amd64 and musl-linux-amd64. Those also cannot be told apart by their ELF header.

    The elf-arch tool from arch-test attempts to map ELF headers to Debian architectures, but it can only do so much.

    So no, as long as we support armel and armhf simultaneously, we cannot
    tell architectures apart by their ELF header.

    [...]

    Given what I said earlier about the inability to tell ELF headers apart
    and the real problems observed in trying to do so, I have a preference
    for the first option.

    Given that different variants of libvulkan_*.so are located in separate search paths, is there any scenario other than a system misconfiguration which would result in an attempt to load the wrong one?



  • From Simon McVittie@21:1/5 to Helmut Grohne on Tue Apr 15 15:00:01 2025
    XPost: linux.debian.maint.x

    On Mon, 14 Apr 2025 at 18:44:38 +0200, Helmut Grohne wrote:
    In general, I doubt we fix this for trixie other than dropping M-A:same maybe.

    Please don't drop M-A: same from mesa-vulkan-drivers. From my point of
    view as someone helping to make Steam be runnable on Linux: mesa-vulkan-drivers:amd64 and mesa-vulkan-drivers:i386 need to be co-installable, otherwise it isn't possible to run 64- and 32-bit games
    that use Vulkan on the same system (which would be a regression relative
    to bookworm, bullseye, and I think also buster, where this worked fine).

    Or, even if proprietary software like Steam is disregarded, mesa-vulkan-drivers:amd64 and mesa-vulkan-drivers:i386 need to be co-installable if we want both 64- and 32-bit Wine to be able to implement
    the Direct3D API using DXVK, which I believe we do.

    I think a regression for amd64/i386 co-installation would have a
    considerably larger practical negative impact on Debian users than
    ABI conflicts between less-commonly-used architecture pairs like
    armel/armhf, and a very much larger practical negative impact than
    conflicts between architecture pairs involving -ports (amd64/hurd-amd64
    or amd64/kfreebsd-amd64) or architectures that are not yet in Debian at
    all (amd64/musl-linux-amd64).

    On Mon, Apr 14, 2025 at 05:23:01PM +0100, Simon McVittie wrote:
    Loaders are expected to be able to recognise that a particular driver is not
    for them, and gracefully not load it. In practice this works fine, because all
    of our architectures can be distinguished by their ELF headers (and if that wasn't the case, multiarch co-installation of ordinary shared libraries would
    go badly wrong).

    I'm sorry to disappoint you, but reality is not like that.
    ...
    Then, if you combine armel and armhf, those architectures also have ELF headers that are mostly indistinguishable. I'm not sure what happens
    exactly, but it isn't good.
    ...
    So no, as long as we support armel and armhf simultaneously, we cannot
    tell architectures apart by their ELF header.

    If this is a problem, surely it's a problem that we already have, whatever
    Mesa might do? Because /etc/ld.so.conf.d adds all the multiarch directories from every enabled architecture to the search path:

    amdahl$ schroot -c sid_armel-dchroot cat /etc/ld.so.conf.d/arm-linux-gnueabi.conf
    ...
    # Multiarch support
    /usr/local/lib/arm-linux-gnueabi
    /lib/arm-linux-gnueabi
    /usr/lib/arm-linux-gnueabi

    amdahl$ schroot -c sid_armhf-dchroot cat /etc/ld.so.conf.d/arm-linux-gnueabihf.conf
    ...
    # Multiarch support
    /usr/local/lib/arm-linux-gnueabihf
    /lib/arm-linux-gnueabihf
    /usr/lib/arm-linux-gnueabihf

    and we rely on the dynamic linker to consider and reject libraries that
    are not, in fact, compatible with the current process.

    I don't have a mixed armel/armhf system immediately to hand right now,
    but you can see this in action on a mixed amd64/i386 system. I don't have
    one /etc/ld.so.cache for amd64 and a second for i386: I only have one ld.so cache, containing both:

    $ /sbin/ldconfig -Xp | grep libvulkan.so.1
    libvulkan.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libvulkan.so.1
    libvulkan.so.1 (libc6) => /lib/i386-linux-gnu/libvulkan.so.1

    But when one of my installed programs asks to load libvulkan.so.1, either
    via DT_NEEDED or dlopen(), ld.so knows that it must choose the one that
    matches the architecture of the running process and disregard the other one.

    (There is also only one LD_LIBRARY_PATH, shared between all architectures.)
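
    To make the "one cache, several architectures" point concrete, here is a small sketch that groups ldconfig -p style output by soname. The sample text is the output quoted above; parse_ldconfig is a hypothetical helper and only handles lines of that shape.

```python
import re
from collections import defaultdict

SAMPLE = """\
libvulkan.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libvulkan.so.1
libvulkan.so.1 (libc6) => /lib/i386-linux-gnu/libvulkan.so.1
"""

# soname (flags) => path
_LINE = re.compile(r"^\s*(\S+)\s+\(([^)]*)\)\s+=>\s+(\S+)\s*$")


def parse_ldconfig(text):
    """Map each soname to the list of (flags, path) entries found for it."""
    entries = defaultdict(list)
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            soname, flags, path = match.groups()
            entries[soname].append((flags, path))
    return dict(entries)


for flags, path in parse_ldconfig(SAMPLE)["libvulkan.so.1"]:
    print(flags, "->", path)
```

    The same soname appears once per installed architecture, and the flags column is the metadata ld.so can use to skip entries that do not match the running process.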

    Similarly I'm 95% sure that a mixed armel/armhf system only has one ld.so.cache, listing both armel and armhf libraries indiscriminately, and hopefully with enough metadata to choose which one is more appropriate
    and disregard the other. So on a mixed armel/armhf system, there are
    two possibilities:

    (a) glibc/ld.so knows how to distinguish between armel and armhf libraries,
    and avoid loading armel libraries into armhf processes and vice versa.
    If this is true then loading "libvulkan_radeon.so" into an armhf process
    will reliably load the armhf flavour, avoiding the armel flavour,
    and we win.

    (b) glibc/ld.so can't distinguish between armel and armhf libraries.
    But if this is true, then we will already have the problem that loading
    an ordinary library dependency like "libc.so.6" or "libvulkan.so.1" to
    satisfy DT_NEEDED can load the wrong flavour, so we have already lost,
    even before loading a Vulkan driver plugin; and I don't see how Mesa
    doing a dlopen("libvulkan_radeon.so", ...) is going to make this
    any worse.

    and it seems like the same would be true for any pair of glibc
    architectures? Either we're in the equivalent of situation (a) and my
    "option 2" from earlier in the thread would work fine, or we're in
    the equivalent of situation (b) and we already have a serious problem,
    which is not going to be made noticeably worse by anything Mesa does.

    In practice, it seems that ld.so *can* distinguish between armel and armhf, presumably by distinguishing EF_ARM_ABI_FLOAT_SOFT from
    EF_ARM_ABI_FLOAT_HARD in their e_flags field:

    amdahl$ schroot -c sid_armel-dchroot -- /sbin/ldconfig -Xp | grep libzstd
    ...
    libzstd.so.1 (libc6,soft-float) => /lib/arm-linux-gnueabi/libzstd.so.1

    amdahl$ schroot -c sid_armhf-dchroot -- /sbin/ldconfig -Xp | grep libzstd
    ...
    libzstd.so.1 (libc6,hard-float) => /lib/arm-linux-gnueabihf/libzstd.so.1

    so hopefully what we get for the armel/armhf pair is (a). (And I would
    expect mixed armel/armhf to already fail horribly if that was not the case.)
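
    As a sketch of the distinction presumed above, the float ABI bits can be read straight from the e_flags field of a 32-bit ARM ELF header. The constants are the values from elf.h; whether glibc bases its decision on exactly these bits is the presumption stated above, not something this snippet proves. The header builder is synthetic test data, not a real library.

```python
import struct

EM_ARM = 40
EF_ARM_ABI_FLOAT_SOFT = 0x200
EF_ARM_ABI_FLOAT_HARD = 0x400


def arm_float_abi(header):
    """Classify an ELF header as hard-float/soft-float, or None if not 32-bit ARM."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 1:  # EI_CLASS: 1 == ELFCLASS32
        return None
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 == little-endian
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    if e_machine != EM_ARM:
        return None
    (e_flags,) = struct.unpack_from(endian + "I", header, 36)
    if e_flags & EF_ARM_ABI_FLOAT_HARD:
        return "hard-float"
    if e_flags & EF_ARM_ABI_FLOAT_SOFT:
        return "soft-float"
    return "unspecified"


def fake_arm_header(e_flags):
    """Build a synthetic little-endian ELF32 ARM header prefix for testing."""
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    # e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags
    return ident + struct.pack("<HHIIIII", 3, EM_ARM, 1, 0, 0, 0, e_flags)


print(arm_float_abi(fake_arm_header(0x05000200)))
print(arm_float_abi(fake_arm_header(0x05000400)))
```

    On a real system the first 40 bytes of a library file can be passed to arm_float_abi instead of the synthetic header.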

    You can actually run kfreebsd-amd64 binaries on a Linux kernel as their
    ELF header looks the same. Not that they do useful stuff, but they may
    go far enough as to reset your system clock.

    But can you dlopen() kfreebsd-amd64 libraries into a running Linux
    amd64 process? That's what matters here. If you can't, then the drivers
    from mesa-vulkan-drivers:kfreebsd-amd64 will gracefully fail to load,
    no harm done. Or if you can, then we likely already have worse problems.

    What also gets interesting is when you try to combine e.g. amd64 and musl-linux-amd64. Those also cannot be told apart by their ELF header.

    This is a situation where "option 2", a single JSON manifest with only
    the basename of the library, might actually be *better* than "option 1",
    a distinct JSON manifest per architecture with the absolute path to the library.

    Presumably musl-linux-amd64 has a library search path (either hard-coded
    into it or via configuration) that is distinct from glibc's; if it didn't,
    and if glibc's and musl's dynamic linkers are unable to avoid loading
    libraries from the "other" ABI (scenario b above), we would already have
    worse problems.

    But if musl has a distinct search path, then a musl process calling dlopen("libvulkan_radeon.so", ...), as it would if option 2 is taken,
    won't load the glibc flavour of libvulkan_radeon.so, because that isn't in
    its search path; and conversely, a glibc process doing the same dlopen()
    call won't see the musl flavour.

    However if a musl process calls dlopen("/usr/lib/x86_64-linux-gnu/libvulkan_radeon.so", ...), as it
    would if option 1 is taken, then I can see how that might accidentally
    succeed if their ELF flags happen to be the same, leading to problems
    when musl and glibc ABI assumptions collide.
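
    The difference between the two options in the musl case can be illustrated with a toy model of name resolution. This is not how either dynamic linker is implemented; it only encodes the documented dlopen() rule that a name containing a slash is used as a path, while a bare name is looked up in the search path. The musl directory shown is an assumption for illustration.

```python
GLIBC_LIB = "/usr/lib/x86_64-linux-gnu/libvulkan_radeon.so"
MUSL_LIB = "/usr/lib/x86_64-linux-musl/libvulkan_radeon.so"  # assumed location

INSTALLED = {GLIBC_LIB, MUSL_LIB}
GLIBC_DIRS = ["/usr/lib/x86_64-linux-gnu"]
MUSL_DIRS = ["/usr/lib/x86_64-linux-musl"]  # assumed musl search path


def resolve(name, search_dirs, installed):
    """Toy dlopen() lookup: paths are used as-is, bare names are searched."""
    if "/" in name:
        return name if name in installed else None
    for directory in search_dirs:
        candidate = f"{directory}/{name}"
        if candidate in installed:
            return candidate
    return None


# Option 2: bare basename, each libc finds its own flavour.
print(resolve("libvulkan_radeon.so", MUSL_DIRS, INSTALLED))
print(resolve("libvulkan_radeon.so", GLIBC_DIRS, INSTALLED))
# Option 1: absolute path from a glibc manifest, seen by a musl process.
print(resolve(GLIBC_LIB, MUSL_DIRS, INSTALLED))
```

    In this model the absolute path bypasses the search path entirely, so nothing stops the musl process from being handed the glibc flavour.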

    1. As far as I'm aware, the basename of these files never matters: all
    that matters is their content. So Mesa's debian/rules could do something
    like this (assuming file-rename(1p) from the rename package):

    file-rename 's/(.*)\.([^.]+?)\.json$/$1.$ENV{DEB_HOST_ARCH}.json/' \
    debian/tmp/usr/share/vulkan/icd.d/*.json

    to replace the "x86_64" or "armv8l" part of the filename with a string
    that is definitely distinct for each pair of Debian architectures,
    resulting in filenames like intel_icd.amd64.json and intel_icd.x32.json.

    Or it could use $ENV{DEB_HOST_MULTIARCH} for longer-but-maybe-clearer
    filenames like intel_icd.x86_64-linux-gnu.json, which would be necessary
    if we want to allow mesa-vulkan-drivers:amd64,
    mesa-vulkan-drivers:hurd-amd64 and mesa-vulkan-drivers:kfreebsd-amd64
    to be co-installed.
    ...
    This sounds very reasonable to me.

    But if you are concerned about the possibility that the dynamic linker will
    load the "wrong" flavour of the library, how would this help us?
    There is nothing special about these filenames that makes Vulkan-Loader
    load some while ignoring others: the only mechanism for ignoring unsuitable/incompatible drivers is to dlopen() them and see if it fails.

    (Vulkan-Loader *does* have a mechanism to flag drivers as 32-bit or 64-bit,
    in which case the dlopen() won't even be attempted for the "wrong" word
    size, but this is very limited and only works with word sizes, not the rest
    of the possible differences between architectures; and in any case Mesa
    doesn't currently apply this marking to its drivers.)

    For example on a mixed armel/armhf system, if we did "option 1", we would
    be relying on an armhf Vulkan-Loader doing something like this:

    readdir("/usr/share/vulkan/icd.d") => a list of filenames
    including e.g. radeon_icd.armel.json, radeon_icd.armhf.json,
    nouveau_icd.armhf.json and so on

    fopen(".../radeon_icd.armel.json", ...) => success
    parse JSON
    dlopen("/usr/lib/arm-linux-gnueabi/libvulkan_radeon.so", ...) => failure
    (because it's a softfloat library and we are a hardfloat process)

    fopen(".../radeon_icd.armhf.json", ...) => success
    parse JSON
    dlopen("/usr/lib/arm-linux-gnueabihf/libvulkan_radeon.so", ...) => success
    ask this driver whether it can find any GPUs that it supports
    foreach GPU in the result {
    add the GPU to our list of devices
    }

    repeat both for nouveau driver
    repeat both for lavapipe driver
    repeat both for virtio driver
    etc.

    This works as intended for the most common multiarch scenarios like
    amd64/i386. I suspect it also works as intended for armel/armhf, although
    your assertion is that it does not.
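
    The probing loop sketched above can be approximated in a few lines of Python using ctypes, which calls dlopen() underneath on Linux. This is only an illustration of "try to load it and see whether it fails", not Vulkan-Loader's actual code; probe_icds is a hypothetical name and the library path in the demo deliberately points at a file that does not exist.

```python
import ctypes
import json
import pathlib
import tempfile


def probe_icds(icd_dir):
    """Try to load the library named by each manifest; report the outcome."""
    loaded, rejected = [], []
    for manifest in sorted(pathlib.Path(icd_dir).glob("*.json")):
        library_path = json.loads(manifest.read_text())["ICD"]["library_path"]
        try:
            ctypes.CDLL(library_path)  # fails with OSError if unloadable
        except OSError as err:
            rejected.append((manifest.name, str(err)))
        else:
            loaded.append(manifest.name)
    return loaded, rejected


with tempfile.TemporaryDirectory() as icd_dir:
    manifest = {
        "ICD": {
            "api_version": "1.2.145",
            # Hypothetical path that is guaranteed not to exist.
            "library_path": "/nonexistent/arm-linux-gnueabi/libvulkan_radeon.so",
        },
        "file_format_version": "1.0.0",
    }
    pathlib.Path(icd_dir, "radeon_icd.armel.json").write_text(json.dumps(manifest))
    loaded, rejected = probe_icds(icd_dir)

print(loaded, [name for name, _ in rejected])
```

    A manifest whose library cannot be loaded simply ends up in the rejected list, which is the graceful failure the loader relies on.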


    2. Or, Mesa could give its Vulkan drivers the same file layout as its
    Vulkan layers [and]
    set the library_path field to be just the basename

    Given what I said earlier about the inability to tell ELF headers apart
    and the real problems observed in trying to do so, I have a preference
    for the first option.

    I don't see how this would introduce a new problem that we don't already
    have to deal with. In "option 2", Vulkan-Loader would do something like:

    readdir("/usr/share/vulkan/icd.d") => a list of filenames
    including e.g. radeon_icd.json, nouveau_icd.json and so on

    fopen(".../radeon_icd.json", ...) => success
    parse JSON
    dlopen("libvulkan_radeon.so", ...) => success
    ask this driver whether it can find any GPUs that it supports
    foreach GPU in the result {
    add the GPU to our list of devices
    }

    repeat for nouveau driver
    repeat for lavapipe driver
    repeat for virtio driver
    etc.

    But if dlopen("libvulkan_radeon.so", ...) can succeed but return a library
    that is of the wrong architecture, don't we already have an equivalent
    problem when evaluating DT_NEEDED dependencies, like when libGL.so.1
    loads libGLdispatch.so.0, or at least when evaluating dlopen()'d weak dependencies, like when libSDL2-2.0.so.0 loads libdbus-1.so.3? And if that problem already exists, then the relevant architecture pair already aren't going to work well together, and Mesa/Vulkan isn't making the problem worse.

    smcv

  • From Helmut Grohne@21:1/5 to All on Tue Apr 15 22:10:01 2025
    XPost: linux.debian.maint.x

    On Tue, Apr 15, 2025 at 10:41:22AM +0200, Michel Dänzer wrote:
    Given that different variants of libvulkan_*.so are located in separate search paths, is there any scenario other than a system misconfiguration which would result in an attempt to load the wrong one?

    I fear the answer to this question is not obvious and I can only
    partially answer it.

    For one thing, I note that the idea of separate search paths is a nice
    one. We want to think of architectures and their library directories as separate. That is not reality. The glibc dynamic loader searches every
    library path referenced from /etc/ld.so.conf. On Debian multiarch
    systems that happens to be every architecture for which libc6 is
    installed. You can easily see this on a multiarch system by inspecting /etc/ld.so.cache as it contains libraries for multiple architectures.

    I can also tell you that running a kfreebsd-amd64 ELF executable on a
    Linux amd64 kernel works "too well". The Linux kernel cannot tell these architectures apart from the ELF header and happily runs it. As the
    syscall ABI is completely different, you end up doing stuff you never
    wanted such as resetting your system clock before the program quickly
    fails.

    Loading shared libraries is a different beast as it is done by glibc.
    There the story looks different. If you attempt to load incompatible
    libraries you tend to see the error "wrong ELF class: ..." from
    dlerror() after having tried loading it with dlopen. Not so if your
    architectures are too similar. For instance, attempting to load an armel
    library into an armhf executable, I got "cannot open shared object file:
    No such file or directory" (and note that it successfully opened the file
    but did not map it).

    I cannot tell how this translates to the vulkan case. However, we
    learned two aspects in the process:
    * On Debian systems, the loader will search all multiarch directories
    for compatible libraries.
    * The ELF class is not sufficient to tell armel and armhf apart.
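
    The second point can be checked on synthetic headers: armel and armhf objects are both 32-bit ARM, so the ELF class and machine fields match and only e_flags differs. This is a sketch with hand-built header bytes and typical flag values, not a dump of real libraries; elf_identity is a hypothetical helper.

```python
import struct

EM_ARM = 40


def fake_arm_header(e_flags):
    """Synthetic little-endian ELF32 ARM header prefix (test data only)."""
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    return ident + struct.pack("<HHIIIII", 3, EM_ARM, 1, 0, 0, 0, e_flags)


def elf_identity(header):
    """Return (EI_CLASS, e_machine, e_flags) from a little-endian ELF32 header."""
    ei_class = header[4]
    (e_machine,) = struct.unpack_from("<H", header, 18)
    (e_flags,) = struct.unpack_from("<I", header, 36)
    return ei_class, e_machine, e_flags


armel = elf_identity(fake_arm_header(0x05000200))  # soft-float bit set
armhf = elf_identity(fake_arm_header(0x05000400))  # hard-float bit set

print(armel[:2] == armhf[:2], armel[2] != armhf[2])
```

    Any check that stops at class and machine would therefore treat the two as the same architecture.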

    Whether these are convincing arguments is up to you in the end. I
    suspect further research is needed.

    Helmut

  • From Helmut Grohne@21:1/5 to Simon McVittie on Tue Apr 15 22:10:01 2025
    XPost: linux.debian.maint.x

    Hi Simon,

    On Tue, Apr 15, 2025 at 01:47:54PM +0100, Simon McVittie wrote:
    I think a regression for amd64/i386 co-installation would have a
    considerably larger practical negative impact on Debian users than
    ABI conflicts between less-commonly-used architecture pairs like
    armel/armhf, and a very much larger practical negative impact than
    conflicts between architecture pairs involving -ports (amd64/hurd-amd64
    or amd64/kfreebsd-amd64) or architectures that are not yet in Debian at
    all (amd64/musl-linux-amd64).

    This reasoning convinces me. As it stands, I only see solutions to this
    problem that are inappropriate for trixie.

    (b) glibc/ld.so can't distinguish between armel and armhf libraries.
    But if this is true, then we will already have the problem that loading
    an ordinary library dependency like "libc.so.6" or "libvulkan.so.1" to
    satisfy DT_NEEDED can load the wrong flavour, so we have already lost,
    even before loading a Vulkan driver plugin; and I don't see how Mesa
    doing a dlopen("libvulkan_radeon.so", ...) is going to make this
    any worse.

    This is an argument that I missed earlier. Thank you. It changes my
    preference on the solution, as it removes my argument against
    option 2.

    But can you dlopen() kfreebsd-amd64 libraries into a running Linux
    amd64 process? That's what matters here. If you can't, then the drivers
    from mesa-vulkan-drivers:kfreebsd-amd64 will gracefully fail to load,
    no harm done. Or if you can, then we likely already have worse problems.

    I fear I cannot answer that question anymore given that there no longer
    is kfreebsd-amd64. Experiments with armel and armhf (see my other mail)
    though yield that the error returned from dlerror() is different from
    when your ELF class is incompatible. I'd like to understand the
    semantics here, but my research of the glibc source was inconclusive in
    this regard.

    Presumably musl-linux-amd64 has a library search path (either hard-coded
    into it or via configuration) that is distinct from glibc's; if it didn't, and if glibc's and musl's dynamic linkers are unable to avoid loading libraries from the "other" ABI (scenario b above), we would already have worse problems.

    The reason we don't have those problems primarily is that the ongoing
    disagreement between musl and systemd upstreams makes it impossible to
    bootstrap musl-based Debian ports and therefore there is no one who
    attempts to mix musl and glibc on a single system.

    In any case, I expect musl to search /usr/lib, so there is at least that
    shared path.

    However if a musl process calls dlopen("/usr/lib/x86_64-linux-gnu/libvulkan_radeon.so", ...), as it
    would if option 1 is taken, then I can see how that might accidentally succeed if their ELF flags happen to be the same, leading to problems
    when musl and glibc ABI assumptions collide.

    This is another fairly convincing argument!

    This works as intended for the most common multiarch scenarios like amd64/i386. I suspect it also works as intended for armel/armhf, although your assertion is that it does not.

    Indeed, my assumption was that you could dlopen an armel library on
    armhf, but I didn't succeed in practically doing that. I cannot tell
    whether this is due to me not trying hard enough or whether there is
    some mechanism systematically preventing this from working in a reliable
    way.

    I propose the following consensus:

    None of the known solutions (options 1 and 2) or workarounds (dropping m-a:same) is appropriate for Debian trixie and the best course of
    short-term action is not fixing this bug for trixie while working
    towards a long term solution in forky.

    Regarding the precise implementation going forward, I prefer deferring
    to you (plural) as I've shared my limited knowledge and trust that you
    find a more sensible solution than I could.

    Helmut

  • From Michel Dänzer@21:1/5 to Helmut Grohne on Wed Apr 16 09:30:01 2025
    XPost: linux.debian.maint.x

    On 2025-04-15 11:34, Helmut Grohne wrote:

    I can also tell you that running a kfreebsd-amd64 ELF executable on a
    Linux amd64 kernel works "too well". The Linux kernel cannot tell these architectures apart from the ELF header and happily runs it. As the
    syscall ABI is completely different, you end up doing stuff you never
    wanted such as resetting your system clock before the program quickly
    fails.

    Loading shared libraries is a different beast as it is done by glibc.
    There the story looks different. If you attempt to load incompatible libraries you tend to see the error "wrong ELF class: ..." from
    dlerror() after having tried loading it with dlopen. Not so if your
    architectures are too similar. For instance, attempting to load an armel
    library into an armhf executable, I got "cannot open shared object file:
    No such file or directory" (and note that it successfully opened the file
    but did not map it).

    Per Simon's other post, none of that is relevant for his proposed option 2, which boils down to dlopen("libvulkan_*.so", ...). If that could end up opening a variant from a wrong search path, it'd break lots of other things anyway.


    In summary, Simon's option 2 seems like the clear winner. As he pointed out, it matches what's already being done for libEGL_mesa.so.0 shipped in libegl-mesa0, with no known issues.



  • From Simon McVittie@21:1/5 to Helmut Grohne on Wed Apr 23 17:20:01 2025
    XPost: linux.debian.maint.x

    On Tue, 15 Apr 2025 at 16:52:56 +0200, Helmut Grohne wrote:
    On Tue, Apr 15, 2025 at 01:47:54PM +0100, Simon McVittie wrote:
    I think a regression for amd64/i386 co-installation would have a
    considerably larger practical negative impact on Debian users

    This reasoning convinces me. As it stands, I only see solutions to this
    problem that are inappropriate for trixie.

    I think the "option 2" that I proposed is entirely feasible for trixie, actually. I'm testing an implementation now.

    What I said before is that I don't think it's feasible to send
    it upstream for trixie, but it's fairly straightforward to do as a
    downstream adjustment (in debian/rules rather than in the upstream code),
    so inability to upstream it is not necessarily a blocker IMO. Within
    Debian, we can make simplifying assumptions like "we are installing into
    a $libdir that the dynamic linker will search by default" which would
    not necessarily be considered valid upstream.

    smcv
