• Bug#1100880: mpich: MPII_init_gpu ... gpu_init failed

    From Drew Parsons@21:1/5 to All on Thu Mar 20 01:50:01 2025
    Package: mpich
    Version: 4.3.0-2
    Severity: serious
    Justification: debci
    Control: affects -1 src:armci-mpi

    mpich 4.3 handles GPU support differently from earlier versions,
    and this is causing tests to fail: both mpich's own tests (on amd64 etc.)
    and build-time tests in other packages such as armci-mpi.

    For instance, mpich's amd64 tests at https://ci.debian.net/packages/m/mpich/unstable/amd64/58761388/
    show

    81s autopkgtest [07:37:03]: test hello4: [-----------------------
    82s Abort(672262671): Fatal error in internal_Init: Other MPI error, error stack:
    82s internal_Init(70)....: MPI_Init(argc=(nil), argv=(nil)) failed
    82s MPII_Init_thread(199):
    82s MPII_init_gpu(51)....: gpu_init failed


    A fresh armci-mpi build hits the same "gpu_init failed" error in
    its own tests.
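
    To check whether another package's test log shows the same failure,
    one can scan it for the error-stack frame quoted above. This is only
    a rough sketch in Python: the helper name find_gpu_init_failures and
    the choice to match on the MPII_init_gpu frame are illustrative
    assumptions, not part of debci or mpich tooling.

```python
import re

# Matches the frame reported by mpich in the log above, e.g.
# "MPII_init_gpu(51)....: gpu_init failed". The source line number in
# parentheses is allowed to vary between builds (an assumption).
GPU_INIT_FAILURE = re.compile(r"MPII_init_gpu\(\d+\)\.*:\s*gpu_init failed")


def find_gpu_init_failures(log_text):
    """Return the log lines that contain the gpu_init failure frame."""
    return [line for line in log_text.splitlines()
            if GPU_INIT_FAILURE.search(line)]


if __name__ == "__main__":
    # Sample taken from the hello4 test output quoted in this report.
    sample = """\
82s internal_Init(70)....: MPI_Init(argc=(nil), argv=(nil)) failed
82s MPII_Init_thread(199):
82s MPII_init_gpu(51)....: gpu_init failed
"""
    for hit in find_gpu_init_failures(sample):
        print(hit)
```

    An empty result only means this particular frame was not found; it
    does not rule out other MPI_Init failures.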



    -- System Information:
    Debian Release: trixie/sid
    APT prefers unstable-debug
    APT policy: (500, 'unstable-debug'), (500, 'unstable'), (1, 'experimental')
    Architecture: amd64 (x86_64)
    Foreign Architectures: i386

    Kernel: Linux 6.12.19-amd64 (SMP w/8 CPU threads; PREEMPT)
    Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE
    Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8), LANGUAGE=en_AU:en
    Shell: /bin/sh linked to /usr/bin/dash
    Init: systemd (via /run/systemd/system)
    LSM: AppArmor: enabled

    Versions of packages mpich depends on:
    ii hwloc 2.12.0-1
    ii libamdhip64-5 5.7.1-5+b1
    ii libc6 2.41-6
    ii libhwloc15 2.12.0-1
    ii libmpich12 4.3.0-2
    ii libslurm42t64 24.11.3-2
    ii perl 5.40.1-2

    Versions of packages mpich recommends:
    ii libmpich-dev 4.3.0-2

    Versions of packages mpich suggests:
    ii mpich-doc 4.3.0-2

    -- no debconf information
