• Bug#1101305: libfuse3-4: regression in 3.17.1-1 for gvfsd-fuse: "both '

    From Simon McVittie@21:1/5 to Michael Anderson on Tue Mar 25 13:40:01 2025
    Package: libfuse3-4
    Version: 3.17.1-1
    Severity: serious
    Justification: regression in code that is (AFAICS) following upstream recommendations
    X-Debbugs-Cc: Michael Anderson <doctorkillshot@yahoo.com>, Vladimir K <pzs-fs@yandex.ru>
    Control: affects -1 + src:gvfs gvfs-fuse

    On Tue, 25 Mar 2025 at 09:34:40 +0000, Michael Anderson wrote:
    Thanks for the previous fix which did work.

    However it has broken again with libfuse3-4 updating from 3.17.1~rc1-3
    to 3.17.1-1.

    On Tue, 25 Mar 2025 at 14:17:32 +0300, Vladimir K wrote:
    It worked, but broke again.

    $ eng apt list --installed fuse3 gvfs-fuse libfuse*
    fuse3/unstable,now 3.17.1-1 amd64 [installed,automatic]
    gvfs-fuse/unstable,now 1.57.2-2 amd64 [installed]
    libfuse2t64/unstable,now 2.9.9-9 amd64 [installed,automatic]
    libfuse3-4/unstable,now 3.17.1-1 amd64 [installed,automatic]

    $ /usr/libexec/gvfsd-fuse -f /run/user/1000/gvfs
    fuse: both 'want' and 'want_ext' are set

    Laszlo, I assume this is not the result you expected after updating
    fuse3? In gvfs-fuse_1.57.2-2, the only references to FUSE_CAP are:

    client/gvfsfusedaemon.c: fuse_set_feature_flag(conn, FUSE_CAP_ATOMIC_O_TRUNC);
    client/gvfsfusedaemon.c: fuse_unset_feature_flag(conn, FUSE_CAP_ASYNC_READ);

    which if I understand correctly is the style that is recommended by the upstream developers of FUSE. I verified that this worked as intended with libfuse3-4_3.17.1~rc1-3, but I can confirm the regression with 3.17.1-1.

    If necessary we can revert to the old code in gvfs as a workaround:

    conn->want |= FUSE_CAP_ATOMIC_O_TRUNC;
    conn->want &= ~FUSE_CAP_ASYNC_READ;

    but I'm aware that that's unlikely to be very future-proof, and I'd like
    to avoid getting into a cycle of fuse3 and gvfs working around each
    other!
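    For reference, the legacy workaround is plain bit arithmetic on the
    32-bit want mask. A minimal sketch, assuming a hypothetical struct
    demo_conn in place of libfuse's struct fuse_conn_info and local copies of
    the two flag values (1 << 0 for FUSE_CAP_ASYNC_READ, 1 << 3 for
    FUSE_CAP_ATOMIC_O_TRUNC, as in the libfuse 3 headers):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the libfuse 3 bit values, so this builds without libfuse. */
#define DEMO_CAP_ASYNC_READ     (UINT32_C(1) << 0)  /* FUSE_CAP_ASYNC_READ */
#define DEMO_CAP_ATOMIC_O_TRUNC (UINT32_C(1) << 3)  /* FUSE_CAP_ATOMIC_O_TRUNC */

/* Hypothetical stand-in for the legacy 32-bit 'want' field. */
struct demo_conn {
    uint32_t want;
};

/* The old-style gvfs code: edit 'want' directly instead of calling
 * fuse_set_feature_flag()/fuse_unset_feature_flag(). */
static void apply_legacy_workaround(struct demo_conn *conn)
{
    assert(conn != NULL);
    conn->want |= DEMO_CAP_ATOMIC_O_TRUNC;  /* request atomic O_TRUNC */
    conn->want &= ~DEMO_CAP_ASYNC_READ;     /* opt out of asynchronous reads */
}
```

    Starting from the default mask 0x0000f819 reported further down, this
    yields 0x0000f818, the same value gvfs reaches through the helper
    functions; the difference is that it only ever touches want, never
    want_ext.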

    Thanks,
    smcv

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From László Böszörm[...]@21:1/5 to smcv@debian.org on Tue Mar 25 14:40:01 2025
    On Tue, Mar 25, 2025 at 1:39 PM Simon McVittie <smcv@debian.org> wrote:
    On Tue, 25 Mar 2025 at 09:34:40 +0000, Michael Anderson wrote:
    Thanks for the previous fix which did work.

    However it has broken again with libfuse3-4 updating from 3.17.1~rc1-3
    to 3.17.1-1.
    [...]
    Laszlo, I assume this is not the result you expected after updating
    fuse3? In gvfs-fuse_1.57.2-2, the only references to FUSE_CAP are:

    client/gvfsfusedaemon.c: fuse_set_feature_flag(conn, FUSE_CAP_ATOMIC_O_TRUNC);
    client/gvfsfusedaemon.c: fuse_unset_feature_flag(conn, FUSE_CAP_ASYNC_READ);

    which if I understand correctly is the style that is recommended by the upstream developers of FUSE. I verified that this worked as intended with libfuse3-4_3.17.1~rc1-3, but I can confirm the regression with 3.17.1-1.

    Yes, this is the recommended way to use the feature flag helper
    functions. I think the breakage might be caused by the conversion of
    FUSE_CAP_* from an enum to defines [1]. I need to check, but maybe the
    bit values changed? That is, as an enum member FUSE_CAP_ATOMIC_O_TRUNC
    might have had value x, and as a define it might now have value y. As
    gvfs was compiled against the former, it might be using different flag
    values with FUSE v3.17.1; I need to check.
    But if this is the reason, gvfs will need a binNMU against the final
    FUSE v3.17.1 release.
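    Whether that conversion can change anything depends on the numeric
    values, not on the spelling. As a minimal illustration with
    hypothetical constant names: an enum constant and a macro with the same
    defining expression are the same integer constant, so a binary built
    against either header sets the same bit. A rebuild would only matter if
    a bit position (or something else ABI-relevant) actually changed.

```c
#include <assert.h>

/* Hypothetical names: two capability bits in the old enum style ... */
enum demo_old_caps {
    DEMO_OLD_CAP_ASYNC_READ     = (1 << 0),
    DEMO_OLD_CAP_ATOMIC_O_TRUNC = (1 << 3),
};

/* ... and the same bits in the new #define style. */
#define DEMO_NEW_CAP_ASYNC_READ     (1 << 0)
#define DEMO_NEW_CAP_ATOMIC_O_TRUNC (1 << 3)

/* Compile-time check that the two spellings agree; if either definition
 * were renumbered, the build would fail here instead of silently setting
 * a different bit at run time. */
static_assert(DEMO_OLD_CAP_ASYNC_READ == DEMO_NEW_CAP_ASYNC_READ,
              "ASYNC_READ bit moved");
static_assert(DEMO_OLD_CAP_ATOMIC_O_TRUNC == DEMO_NEW_CAP_ATOMIC_O_TRUNC,
              "ATOMIC_O_TRUNC bit moved");
```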

    Regards,
    Laszlo/GCS
    [1] https://github.com/libfuse/libfuse/commit/3ae5ca7443348aabad9bc71b9d5b0999f8292379

  • From Simon McVittie@21:1/5 to Simon McVittie on Tue Mar 25 15:00:01 2025
    On Tue, 25 Mar 2025 at 12:34:26 +0000, Simon McVittie wrote:
    On Tue, 25 Mar 2025 at 14:17:32 +0300, Vladimir K wrote:
    $ /usr/libexec/gvfsd-fuse -f /run/user/1000/gvfs
    fuse: both 'want' and 'want_ext' are set

    Laszlo, I assume this is not the result you expected after updating
    fuse3? In gvfs-fuse_1.57.2-2, the only references to FUSE_CAP are:

    client/gvfsfusedaemon.c: fuse_set_feature_flag(conn, FUSE_CAP_ATOMIC_O_TRUNC);
    client/gvfsfusedaemon.c: fuse_unset_feature_flag(conn, FUSE_CAP_ASYNC_READ);

    which if I understand correctly is the style that is recommended by the upstream developers of FUSE. I verified that this worked as intended with libfuse3-4_3.17.1~rc1-3, but I can confirm the regression with 3.17.1-1.

    In the thread that calls gvfs' vfs_init() from FUSE's fuse_fs_init(),
    it looks like we end up with:

    conn->capable = 0x7fbfffdb
    conn->want = 0x0000f819
    (ASYNC_READ | ATOMIC_O_TRUNC | EXPORT_SUPPORT | IOCTL_DIR |
    AUTO_INVAL_DATA | READDIRPLUS | READDIRPLUS_AUTO | ASYNC_DIO)
    conn->capable_ext = 0x7fbfffdb
    conn->want_ext = 0x0000f818
    (ATOMIC_O_TRUNC | EXPORT_SUPPORT | IOCTL_DIR |
    AUTO_INVAL_DATA | READDIRPLUS | READDIRPLUS_AUTO | ASYNC_DIO)

    when control returns from gvfs to libfuse: gvfs has all of the default capability flags, except for ASYNC_READ, which it explicitly disables
    (and it also explicitly enables ATOMIC_O_TRUNC, but that's the default
    in fuse3 anyway). Then in fuse_fs_init() we have

    want_ext_default = 0x0000f819
    want_default = 0x0000f819
    (both ASYNC_READ | ATOMIC_O_TRUNC | EXPORT_SUPPORT | IOCTL_DIR |
    AUTO_INVAL_DATA | READDIRPLUS | READDIRPLUS_AUTO | ASYNC_DIO)

    which means convert_to_conn_want_ext() shouldn't be failing, because
    the condition "conn->want != want_default" shouldn't be met.
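    The hex values above can be cross-checked mechanically. A minimal
    sketch that decodes a want mask into flag names; the table is a local
    copy of the relevant FUSE_CAP_* bit positions from the libfuse 3
    headers, and decode_caps() is a hypothetical helper, not part of libfuse:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit positions of the FUSE_CAP_* flags that appear in this thread, copied
 * from the libfuse 3 headers so the sketch builds without libfuse. */
static const struct {
    uint32_t bit;
    const char *name;
} cap_names[] = {
    { UINT32_C(1) << 0,  "ASYNC_READ" },
    { UINT32_C(1) << 3,  "ATOMIC_O_TRUNC" },
    { UINT32_C(1) << 4,  "EXPORT_SUPPORT" },
    { UINT32_C(1) << 9,  "SPLICE_READ" },
    { UINT32_C(1) << 10, "FLOCK_LOCKS" },
    { UINT32_C(1) << 11, "IOCTL_DIR" },
    { UINT32_C(1) << 12, "AUTO_INVAL_DATA" },
    { UINT32_C(1) << 13, "READDIRPLUS" },
    { UINT32_C(1) << 14, "READDIRPLUS_AUTO" },
    { UINT32_C(1) << 15, "ASYNC_DIO" },
};

/* Write the names of the set bits of 'mask' into 'out', joined by " | ",
 * in ascending bit order. Bits missing from the table are ignored.
 * Returns the number of names written. */
static int decode_caps(uint32_t mask, char *out, size_t outlen)
{
    int count = 0;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof cap_names / sizeof cap_names[0]; i++) {
        if (!(mask & cap_names[i].bit))
            continue;
        /* strncat appends at most n chars plus the terminating NUL */
        if (count > 0)
            strncat(out, " | ", outlen - strlen(out) - 1);
        strncat(out, cap_names[i].name, outlen - strlen(out) - 1);
        count++;
    }
    return count;
}
```

    Decoding 0x0000f819 yields the eight flags listed for conn->want, and
    0x0000f819 ^ 0x0000f818 decodes to just ASYNC_READ, the one default
    flag gvfs turns off.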

    But when I put a breakpoint on fuse_log(), the backtrace is from a
    different caller:

    #0 fuse_log
    (level=level@entry=FUSE_LOG_ERR, fmt=fmt@entry=0x7ffff7625568 "fuse: both 'want' and 'want_ext' are set\n")
    at ../lib/fuse_log.c:78
    #1 0x00007ffff761ad1b in convert_to_conn_want_ext
    (conn=0x55555556c130, want_ext_default=<optimized out>, want_default=65033) at ../lib/fuse_i.h:247
    #2 do_init (req=0x7ffff0000d20, nodeid=<optimized out>, inarg=0x7ffff0001fd8) at ../lib/fuse_lowlevel.c:2176
    #3 0x00007ffff761b3f9 in fuse_session_process_buf_internal
    (se=0x55555556bf80, buf=buf@entry=0x55555556ba38, ch=<optimized out>) at ../lib/fuse_lowlevel.c:2909
    #4 0x00007ffff76160d7 in fuse_do_work (data=0x55555556ba20) at ../lib/fuse_loop_mt.c:179
    #5 0x00007ffff714eb7b in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:448
    #6 0x00007ffff71cc7b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

    ... is gvfs perhaps initializing its capabilities in the wrong place or something, such that FUSE receives an unintended capability set?

    In this backtrace, want_default is 0xfe09:
    ASYNC_READ | ATOMIC_O_TRUNC | SPLICE_READ | FLOCK_LOCKS | IOCTL_DIR |
    AUTO_INVAL_DATA | READDIRPLUS | READDIRPLUS_AUTO | ASYNC_DIO

    which differs from the defaults seen when gvfs' vfs_init() is called:
    in vfs_init() we have EXPORT_SUPPORT, but in fuse_do_work() we have
    SPLICE_READ and FLOCK_LOCKS. Which one is correct? Presumably not both?
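    To make the comparison concrete: the two defaults differ by
    0x0000f819 ^ 0x0000fe09 = 0x00000610, i.e. EXPORT_SUPPORT, SPLICE_READ
    and FLOCK_LOCKS. A minimal sketch of the two conditions named in this
    thread ("conn->want != want_default" and
    "fuse_lower_32_bits(conn->want_ext) != conn->want"), written as a
    hypothetical helper rather than libfuse's actual
    convert_to_conn_want_ext(), which may check more before logging the
    error, shows why the same masks pass against one default and trip the
    check against the other:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the fuse_lower_32_bits() named above. */
static uint32_t lower_32_bits(uint64_t v)
{
    return (uint32_t)(v & UINT64_C(0xffffffff));
}

/* Simplified consistency test: true if the legacy 'want' field looks
 * modified by the filesystem (it differs from the default) *and* it
 * disagrees with the lower half of 'want_ext'. */
static bool looks_like_conflict(uint32_t want, uint64_t want_ext,
                                uint32_t want_default)
{
    return want != want_default && lower_32_bits(want_ext) != want;
}
```

    With the masks gvfs returned (want 0x0000f819, want_ext 0x0000f818),
    the check stays quiet against the vfs_init() default 0x0000f819 but
    fires against 0x0000fe09, which is 65033 in decimal, the want_default
    shown in the backtrace.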

    I wonder whether there might be a thread-safety issue here that is
    either resulting in initialization happening in both threads, or
    writes from one thread not being visible to the other? do_init() says

    /* Prevent bogus data races (bogus since "init" is called before
    * multi-threading becomes relevant */

    but I can't help wondering whether those data races reported by tsan
    are, in fact, not bogus at all.
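    On the visibility question: the usual POSIX way to guarantee that an
    initialization routine runs exactly once, and that its writes are
    visible to every thread afterwards, is pthread_once(), which POSIX lists
    among the functions that synchronize memory. A minimal, hypothetical
    sketch of the pattern (all names are illustrative; this is not how
    libfuse structures do_init()):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared capability state, initialized exactly once. */
static pthread_once_t caps_once = PTHREAD_ONCE_INIT;
static uint32_t caps_want;
static int init_calls;  /* how many times init_caps() has run */

static void init_caps(void)
{
    caps_want = 0x0000f818u;  /* e.g. the mask gvfs settles on */
    init_calls++;
}

/* Every thread calls this before reading caps_want: pthread_once() runs
 * init_caps() at most once and makes its writes visible to all callers. */
static uint32_t get_caps_want(void)
{
    int rc = pthread_once(&caps_once, init_caps);
    assert(rc == 0);
    (void)rc;  /* avoid an unused-variable warning under NDEBUG */
    return caps_want;
}

static void *worker(void *arg)
{
    uint32_t *out = arg;
    *out = get_caps_want();
    return NULL;
}

/* Start two threads that race to initialize; returns 0 if both ran. */
static int run_two_workers(uint32_t results[2])
{
    pthread_t t[2];
    int started = 0;

    for (; started < 2; started++)
        if (pthread_create(&t[started], NULL, worker, &results[started]) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return started == 2 ? 0 : -1;
}
```

    Both workers read 0x0000f818 and init_caps() runs once, whichever
    thread gets there first.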

    I also wonder whether it would be better for fuse_set_feature_flag() and fuse_unset_feature_flag() to set/unset the relevant flag in conn->want
    *and* conn->want_ext, if it happens to be below the 32-bit boundary.
    That way, "fuse_lower_32_bits(conn->want_ext) != conn->want" would
    usually be false and we would not have any inconsistency.
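    A minimal sketch of that proposal, using a hypothetical struct
    demo_conn with the two fields in question and hypothetical demo_*()
    helpers (not libfuse's actual fuse_set_feature_flag() or
    fuse_unset_feature_flag()): the flag is always applied to want_ext, and
    whatever part of it lies in the lower 32 bits is mirrored into want, so
    the two masks cannot drift apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct fuse_conn_info fields. */
struct demo_conn {
    uint32_t want;      /* legacy 32-bit mask */
    uint64_t want_ext;  /* extended 64-bit mask */
};

/* Set 'flag' in want_ext and mirror its lower 32 bits into want. */
static void demo_set_feature_flag(struct demo_conn *conn, uint64_t flag)
{
    assert(conn != NULL);
    conn->want_ext |= flag;
    conn->want |= (uint32_t)(flag & UINT64_C(0xffffffff));
}

/* Clear 'flag' in want_ext and mirror its lower 32 bits into want. */
static void demo_unset_feature_flag(struct demo_conn *conn, uint64_t flag)
{
    assert(conn != NULL);
    conn->want_ext &= ~flag;
    conn->want &= ~(uint32_t)(flag & UINT64_C(0xffffffff));
}

/* The invariant the proposal aims to keep. */
static bool demo_masks_consistent(const struct demo_conn *conn)
{
    return (uint32_t)(conn->want_ext & UINT64_C(0xffffffff)) == conn->want;
}
```

    Flags above bit 31 then only affect want_ext, while the lower 32 bits
    of want_ext keep matching want.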

    smcv

  • From Jakob Haufe@21:1/5 to All on Wed Apr 30 19:10:01 2025
    Control: reopen 1101305
    Control: affects 1101305 ceph-fuse

    Unfortunately, this still affects ceph-fuse.

    See https://github.com/libfuse/libfuse/pull/1217 for a likely cause and
    fix. I can provide a patch against 3.17.1+git250416-1 as well, if wanted.

    Cheers,
    sur5r
