• Avoiding /var/tmp for long-running compute (was: Make /tmp/ a tmpfs and

    From Russ Allbery@21:1/5 to Jonathan Dowland on Wed May 8 18:50:01 2024
    "Jonathan Dowland" <jmtd@debian.org> writes:

    Else-thread, Russ begs people to stop doing this. I agree people
    shouldn't! We should also work on education and promotion of the
    alternatives.

    Also, helping people use better tools for managing workloads like this
    that make their lives easier and have better semantics, thus improving
    life for everyone.

    I'm suggesting solutions that I don't have time to help implement, and of
    course it will take a long time for better tools to filter into all those
    clusters, so this doesn't address the immediate problem of this thread
    (hence the subject change). But based on my past experience with these
    types of systems, I bet a lot of the patterns captured in software are
    older ones. Linux has a *lot* of facilities today that either didn't
    exist or weren't widely used five years ago. It would be great to help
    some of those improvements filter down, because they can make a lot of
    these problems go away.

    For example, take the case of scratch space for batch computing. The
    logical lifespan for temporary files for a batch computing job is the
    lifetime of the job, whatever that may be. (I know there are exceptions,
    but here I'm just talking about defaults.) Previously one would have to
    build support into the batch job management system for creating and
    managing those per-job temporary directories, and ensure the jobs support
    TMPDIR or other environment variables to control where they store data,
    and everyone was doing this independently. (I've done a *lot* of this
    kind of thing, once upon a time.)
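
    A minimal sketch of that older, do-it-yourself approach, assuming a job
    that honors TMPDIR. The function name run_job_with_scratch and its
    scratch_root parameter are hypothetical, made up for illustration; a real
    batch system would do far more (quotas, cleanup after crashes, and so on).

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_job_with_scratch(argv, scratch_root=None):
    """Run argv with TMPDIR pointing at a fresh per-job directory.

    The directory is removed once the job exits, whatever its exit status.
    scratch_root is a hypothetical knob for choosing the parent directory.
    """
    job_tmp = tempfile.mkdtemp(prefix="job-", dir=scratch_root)
    # Copy the environment and override TMPDIR only for this job.
    env = dict(os.environ, TMPDIR=job_tmp)
    try:
        result = subprocess.run(argv, env=env, capture_output=True, text=True)
    finally:
        # The scratch lifetime is tied to the job lifetime.
        shutil.rmtree(job_tmp, ignore_errors=True)
    return result, job_tmp


if __name__ == "__main__":
    # Hypothetical job: report the TMPDIR it was handed.
    job = [sys.executable, "-c", "import os; print(os.environ['TMPDIR'])"]
    result, job_tmp = run_job_with_scratch(job)
    print("job saw:", result.stdout.strip())
    print("still exists after exit:", os.path.exists(job_tmp))
```

    This only works when the job actually consults TMPDIR, which is exactly
    the per-application cooperation that the namespace approach avoids.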

    But now we have mount namespaces, and systemd has PrivateTmp that builds
    on top of that. So if the job is managed by an execution manager, it can
    create per-job temporary directories and it may already support (as
    systemd does) the semantics of deleting the contents of those directories
    on job exit, and it bind-mounts those into the process space and the
    process is none the wiser. I think all of the desirable glue may not
    fully be there (controlling what underlying file system is used for
    PrivateTmp, ensuring they're also excluded from normal cleanup, etc.), but
    this is very close to a much better way of handling this problem that
    still exposes /tmp and /var/tmp to the job so that none of the
    often-crufty scientific computing software has to change.
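
    As a rough illustration, here is a sketch that builds a systemd-run
    command line to run a job as a transient unit with PrivateTmp enabled.
    The helper name private_tmp_command, the unit name, and the job path are
    all hypothetical. It assumes the job is started through the system
    service manager (which normally requires root), and it does nothing about
    the missing glue mentioned above, such as choosing which file system
    backs the private directories.

```python
import shlex


def private_tmp_command(job_argv, unit_name=None):
    """Build a systemd-run command that runs job_argv with PrivateTmp=yes.

    The job sees its own private /tmp and /var/tmp, which systemd removes
    when the transient unit stops. This function only builds the argument
    list; it does not execute anything.
    """
    cmd = [
        "systemd-run",
        "--wait",                      # block until the job finishes
        "--collect",                   # unload the unit even if it failed
        "--property=PrivateTmp=yes",   # private /tmp and /var/tmp
    ]
    if unit_name is not None:
        cmd.append(f"--unit={unit_name}")
    cmd.append("--")                   # end of systemd-run options
    cmd.extend(job_argv)
    return cmd


if __name__ == "__main__":
    # Hypothetical job path and arguments, for illustration only.
    argv = private_tmp_command(
        ["/usr/local/bin/simulate", "--steps", "1000"],
        unit_name="job-42",
    )
    print(shlex.join(argv))
```

    The job itself needs no changes: it keeps writing to /tmp or /var/tmp
    and never sees that those paths are bind mounts private to the unit.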

    The new capabilities that Linux now has due to namespaces are marvellous
    and solve a whole lot of problems that I didn't realize were even
    solvable, and right now I suspect there are huge opportunities for
    substantial improvements without a whole lot of effort by just plumbing
    those facilities through to higher-level layers like batch systems. Whole
    classes of long-standing problems would just disappear, or at least be
    far, far easier to manage.

    Substantial, substantial caveat: I have been out of this world for a
    while, and maybe most of this work has already been done? That would be
    amazing. The best possible response to this post would be for someone to
    tell me I'm five years behind and the batch systems have already picked up
    this work and we can just point people at them.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>
