• Bug#1096105: ITP: ggml -- Tensor library for machine learning

    From Clint Adams@21:1/5 to Christian Kastner on Tue Apr 15 05:30:01 2025
    On Sun, Feb 16, 2025 at 01:07:04PM +0100, Christian Kastner wrote:
    > * Package name : libggml

    I'm getting a SIGILL when trying to run llama-bench on a
    machine without avx512.

    I see that you're building and installing the cpu baseline
    libggml-cpu with -DGGML_AVX512=OFF , however, in the .deb
    produced this happens:

    % objdump -D /usr/lib/x86_64-linux-gnu/ggml/libggml-cpu.so| grep -i vinserti
    2c60c: 62 f3 fd 28 38 c1 01 vinserti64x2 $0x1,%xmm1,%ymm0,%ymm0
    2e11d: 62 f3 75 48 3a d1 01 vinserti32x8 $0x1,%ymm1,%zmm1,%zmm2
    2e401: 62 d3 65 48 3a dc 01 vinserti32x8 $0x1,%ymm12,%zmm3,%zmm3
    2e408: 62 d3 7d 48 3a c2 01 vinserti32x8 $0x1,%ymm10,%zmm0,%zmm0
    2e422: 62 d3 5d 48 3a e3 01 vinserti32x8 $0x1,%ymm11,%zmm4,%zmm4
    2e433: 62 53 15 28 38 6d 00 vinserti32x4 $0x1,0x0(%r13),%ymm13,%ymm13
    2e448: 62 f3 6d 48 3a d5 01 vinserti32x8 $0x1,%ymm5,%zmm2,%zmm2
    2e5fc: 62 53 35 48 3a c9 01 vinserti32x8 $0x1,%ymm9,%zmm9,%zmm9
    2e603: 62 53 25 48 3a db 01 vinserti32x8 $0x1,%ymm11,%zmm11,%zmm11
    2e61e: 62 53 2d 48 3a d2 01 vinserti32x8 $0x1,%ymm10,%zmm10,%zmm10
    2e64a: 62 53 1d 48 3a e4 01 vinserti32x8 $0x1,%ymm12,%zmm12,%zmm12
    2e651: 62 f3 65 48 3a db 01 vinserti32x8 $0x1,%ymm3,%zmm3,%zmm3
    2e68a: 62 f3 6d 48 3a d2 01 vinserti32x8 $0x1,%ymm2,%zmm2,%zmm2
    2e691: 62 f3 5d 48 3a e4 01 vinserti32x8 $0x1,%ymm4,%zmm4,%zmm4
    2e6ac: 62 f3 55 48 3a ed 01 vinserti32x8 $0x1,%ymm5,%zmm5,%zmm5
    2ebe9: c4 e3 75 38 c9 01 vinserti128 $0x1,%xmm1,%ymm1,%ymm1
    2f067: 62 53 25 48 3a db 01 vinserti32x8 $0x1,%ymm11,%zmm11,%zmm11
    2f06e: 62 53 35 48 3a c9 01 vinserti32x8 $0x1,%ymm9,%zmm9,%zmm9
    2f0bd: 62 f3 75 48 3a cf 01 vinserti32x8 $0x1,%ymm7,%zmm1,%zmm1
    2f0c4: 62 f3 55 48 3a eb 01 vinserti32x8 $0x1,%ymm3,%zmm5,%zmm5
    2f0e9: 62 f3 5d 48 3a f6 01 vinserti32x8 $0x1,%ymm6,%zmm4,%zmm6
    2f0f6: 62 f3 6d 48 3a d0 01 vinserti32x8 $0x1,%ymm0,%zmm2,%zmm2
    2f17a: 62 53 3d 48 3a c0 01 vinserti32x8 $0x1,%ymm8,%zmm8,%zmm8
    2f18f: 62 f3 6d 28 38 51 f7 vinserti32x4 $0x1,-0x90(%rcx),%ymm2,%ymm2
    2f197: 62 f3 4d 48 3a f6 01 vinserti32x8 $0x1,%ymm6,%zmm6,%zmm6
    2f205: 62 f3 45 48 3a ff 01 vinserti32x8 $0x1,%ymm7,%zmm7,%zmm7
    2f29e: 62 53 0d 48 3a f6 01 vinserti32x8 $0x1,%ymm14,%zmm14,%zmm14
    2f2c0: 62 53 15 48 3a ed 01 vinserti32x8 $0x1,%ymm13,%zmm13,%zmm13
    2f2cd: 62 53 1d 48 3a e4 01 vinserti32x8 $0x1,%ymm12,%zmm12,%zmm12
    2f775: c4 e3 75 38 c9 01 vinserti128 $0x1,%xmm1,%ymm1,%ymm1
    3775d: 62 f3 7d 28 38 c1 01 vinserti32x4 $0x1,%xmm1,%ymm0,%ymm0
    37843: 62 f3 7d 28 38 c5 01 vinserti32x4 $0x1,%xmm5,%ymm0,%ymm0
    3793c: 62 f3 7d 28 38 c1 01 vinserti32x4 $0x1,%xmm1,%ymm0,%ymm0
    37a51: 62 f3 7d 28 38 c1 01 vinserti32x4 $0x1,%xmm1,%ymm0,%ymm0
    37d23: 62 f3 75 28 38 cb 01 vinserti32x4 $0x1,%xmm3,%ymm1,%ymm1
    37d54: 62 f3 6d 28 38 db 01 vinserti32x4 $0x1,%xmm3,%ymm2,%ymm3
    37d61: 62 b3 6d 28 38 d1 01 vinserti32x4 $0x1,%xmm17,%ymm2,%ymm2
    380f0: c4 e3 75 38 c9 01 vinserti128 $0x1,%xmm1,%ymm1,%ymm1
    380f6: c4 e3 7d 38 c0 01 vinserti128 $0x1,%xmm0,%ymm0,%ymm0
    383a5: c4 e3 75 38 c9 01 vinserti128 $0x1,%xmm1,%ymm1,%ymm1
    383ab: c4 e3 7d 38 c0 01 vinserti128 $0x1,%xmm0,%ymm0,%ymm0
    386c6: c4 e3 65 38 db 01 vinserti128 $0x1,%xmm3,%ymm3,%ymm3
    38928: c4 e3 4d 38 f6 01 vinserti128 $0x1,%xmm6,%ymm6,%ymm6
    38ddd: 62 f3 fd 28 38 c1 01 vinserti64x2 $0x1,%xmm1,%ymm0,%ymm0
    38e2c: 62 f3 fd 28 38 c1 01 vinserti64x2 $0x1,%xmm1,%ymm0,%ymm0
    38e63: 62 f3 f5 28 38 c8 01 vinserti64x2 $0x1,%xmm0,%ymm1,%ymm1
    38e8d: 62 f3 fd 28 38 c3 01 vinserti64x2 $0x1,%xmm3,%ymm0,%ymm0
    390bf: 62 63 bd 20 38 c1 01 vinserti64x2 $0x1,%xmm1,%ymm24,%ymm24
    39100: 62 e3 c5 20 38 f9 01 vinserti64x2 $0x1,%xmm1,%ymm23,%ymm23
    39135: 62 f3 e5 28 38 d9 01 vinserti64x2 $0x1,%xmm1,%ymm3,%ymm3
    3915e: c4 e3 75 38 c9 01 vinserti128 $0x1,%xmm1,%ymm1,%ymm1
    39164: 62 03 35 20 38 c9 01 vinserti32x4 $0x1,%xmm25,%ymm25,%ymm25
    39170: 62 e3 cd 20 38 f0 01 vinserti64x2 $0x1,%xmm0,%ymm22,%ymm22
    39456: 62 f3 f5 28 38 c8 01 vinserti64x2 $0x1,%xmm0,%ymm1,%ymm1
    394cb: 62 f3 fd 28 38 c4 01 vinserti64x2 $0x1,%xmm4,%ymm0,%ymm0
    396ee: 62 f3 6d 28 38 d0 01 vinserti32x4 $0x1,%xmm0,%ymm2,%ymm2
    3974f: 62 f3 7d 28 38 c7 01 vinserti32x4 $0x1,%xmm7,%ymm0,%ymm0
    397c7: 62 f3 cd 28 38 ff 01 vinserti64x2 $0x1,%xmm7,%ymm6,%ymm7
    397fd: 62 d3 cd 28 38 f1 01 vinserti64x2 $0x1,%xmm9,%ymm6,%ymm6
    39a65: 62 d3 75 28 38 ce 01 vinserti32x4 $0x1,%xmm14,%ymm1,%ymm1
    39ae5: 62 d3 7d 28 38 c6 01 vinserti32x4 $0x1,%xmm14,%ymm0,%ymm0
    39d2e: 62 f3 fd 28 38 c3 01 vinserti64x2 $0x1,%xmm3,%ymm0,%ymm0
    39db4: 62 f3 c5 28 38 fc 01 vinserti64x2 $0x1,%xmm4,%ymm7,%ymm7
    3a098: 62 f3 fd 28 38 c1 01 vinserti64x2 $0x1,%xmm1,%ymm0,%ymm0
    3a122: 62 f3 fd 28 38 c1 01 vinserti64x2 $0x1,%xmm1,%ymm0,%ymm0
    3a190: 62 f3 f5 28 38 c8 01 vinserti64x2 $0x1,%xmm0,%ymm1,%ymm1
    3a1e9: 62 f3 fd 28 38 c1 01 vinserti64x2 $0x1,%xmm1,%ymm0,%ymm0
    3a211: 62 f3 4d 28 38 f0 01 vinserti32x4 $0x1,%xmm0,%ymm6,%ymm6
    3a22d: 62 d3 7d 28 38 c4 01 vinserti32x4 $0x1,%xmm12,%ymm0,%ymm0
    3a3f9: 62 f3 75 28 38 ca 01 vinserti32x4 $0x1,%xmm2,%ymm1,%ymm1
    3a41b: 62 f3 7d 28 38 c2 01 vinserti32x4 $0x1,%xmm2,%ymm0,%ymm0
    3a65a: 62 f3 7d 28 38 c2 01 vinserti32x4 $0x1,%xmm2,%ymm0,%ymm0
    3a68b: 62 f3 75 28 38 ca 01 vinserti32x4 $0x1,%xmm2,%ymm1,%ymm1
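For anyone triaging a listing like the one above, here is a minimal sketch that separates AVX-512 (EVEX-encoded) instructions from plain AVX2 ones. It relies on a heuristic: in 64-bit code an instruction whose first byte is 0x62 is EVEX-encoded, while VEX instructions such as vinserti128 start with 0xc4 or 0xc5. The function name `find_evex` is made up for illustration. The check will miss instructions that carry a legacy prefix before the 0x62 byte, and `objdump -D` also disassembles data sections, so treat hits as hints rather than proof.

```python
import re

# Matches one objdump disassembly line: "addr: byte byte ... mnemonic operands".
# Works on tab- or space-separated output (the listing above lost its tabs).
LINE_RE = re.compile(
    r"^\s*([0-9a-f]+):\s+((?:[0-9a-f]{2}\s+)+)(\S+)\s*(.*)$"
)
HEX_BYTE_RE = re.compile(r"[0-9a-f]{2}")


def find_evex(lines):
    """Return (address, mnemonic, uses_zmm) for each EVEX-encoded line."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        addr, raw_bytes, mnemonic, operands = m.groups()
        if HEX_BYTE_RE.fullmatch(mnemonic):
            # Continuation line holding only bytes; no instruction to inspect.
            continue
        # Heuristic: a leading 0x62 byte marks an EVEX (AVX-512 family)
        # encoding in 64-bit code; VEX (AVX/AVX2) starts with 0xc4 or 0xc5.
        if raw_bytes.split()[0] == "62":
            hits.append((addr, mnemonic, "%zmm" in operands))
    return hits
```

Fed the lines of the listing above, this would skip the vinserti128 entries and report the vinserti32x4, vinserti32x8 and vinserti64x2 ones, with the zmm flag set only for the 512-bit forms.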

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Clint Adams@21:1/5 to Clint Adams on Tue Apr 15 15:00:01 2025
    On Tue, Apr 15, 2025 at 03:17:36AM +0000, Clint Adams wrote:
    > I see that you're building and installing the cpu baseline
    > libggml-cpu with -DGGML_AVX512=OFF , however, in the .deb
    > produced this happens:
    >
    > % objdump -D /usr/lib/x86_64-linux-gnu/ggml/libggml-cpu.so| grep -i vinserti
    > 2c60c: 62 f3 fd 28 38 c1 01 vinserti64x2 $0x1,%xmm1,%ymm0,%ymm0

    I delved slightly deeper, and these instructions are only produced in
    `build-cpu-hwcaps-x86-64-v4`, so the problem is that only the -v4
    object is making it into the .deb.

  • From Clint Adams@21:1/5 to Christian Kastner on Wed Apr 16 01:10:01 2025
    On Wed, Apr 16, 2025 at 12:16:27AM +0200, Christian Kastner wrote:
    > Indeed, I messed up what I thought was a simple refactoring in my last
    > upload.
    >
    > I just uploaded a new upstream version with a fix to NEW.
    >
    > I also uploaded a new upstream version of llama.cpp to NEW, as the
    > existing one was incompatible with the new ggml.

    Thanks, I've rebuilt ggml and see the correct glibc-hwcaps split,
    and am building llama.cpp now.
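A rough way to confirm such a split from a package file listing (for example the path column of `dpkg-deb -c`) is sketched below. It assumes the variants live in `glibc-hwcaps/<level>/` subdirectories next to the baseline library, which is the glibc convention. The exact layout inside the ggml package is not shown in this thread, so the paths in the usage note are hypothetical, as is the function name `hwcaps_variants`.

```python
from pathlib import PurePosixPath


def hwcaps_variants(paths, libname):
    """Map each variant ("baseline" or a glibc-hwcaps level) to its path."""
    variants = {}
    for p in paths:
        parts = PurePosixPath(p).parts
        if not parts or parts[-1] != libname:
            continue
        if "glibc-hwcaps" in parts:
            idx = parts.index("glibc-hwcaps")
            # The directory right below glibc-hwcaps names the level,
            # e.g. x86-64-v4; anything else is reported as "unknown".
            level = parts[idx + 1] if idx + 1 < len(parts) - 1 else "unknown"
        else:
            level = "baseline"
        variants[level] = p
    return variants
```

Calling `hwcaps_variants(paths, "libggml-cpu.so")` on a listing that holds both `.../ggml/libggml-cpu.so` and `.../ggml/glibc-hwcaps/x86-64-v4/libggml-cpu.so` should yield a `baseline` key and an `x86-64-v4` key. A path check like this only shows that the files are split; whether the baseline file is really free of AVX-512 code still needs a disassembly check.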
