• Question about garbage collection

    From Frank Millman@21:1/5 to All on Mon Jan 15 15:51:26 2024
    Hi all

    I have read that one should not have to worry about garbage collection
    in modern versions of Python - it 'just works'.

    I don't want to rely on that. My app is a long-running server, with
    multiple clients logging on, doing stuff, and logging off. They can
    create many objects, some of them long-lasting. I want to be sure that
    all objects created are gc'd when the session ends.

    I do have several circular references. My experience is that if I do not
    take some action to break the references when closing the session, the
    objects remain alive. Below is a very simple program to illustrate this.

    Am I missing something? All comments appreciated.

    Frank Millman

    ==================================================

    import gc

    class delwatcher:
        # This stores enough information to identify the object being watched.
        # It does not store a reference to the object itself.
        def __init__(self, obj):
            self.id = (obj.type, obj.name, id(obj))
            print('***', *self.id, 'created ***')
        def __del__(self):
            print('***', *self.id, 'deleted ***')

    class Parent:
        def __init__(self, name):
            self.type = 'parent'
            self.name = name
            self.children = []
            self._del = delwatcher(self)

    class Child:
        def __init__(self, parent, name):
            self.type = 'child'
            self.parent = parent
            self.name = name
            parent.children.append(self)
            self._del = delwatcher(self)

    p1 = Parent('P1')
    p2 = Parent('P2')

    c1_1 = Child(p1, 'C1_1')
    c1_2 = Child(p1, 'C1_2')
    c2_1 = Child(p2, 'C2_1')
    c2_2 = Child(p2, 'C2_2')

    input('waiting ...')

    # if next 2 lines are included, parent and child can be gc'd
    # for ch in p1.children:
    #     ch.parent = None

    # if next line is included, child can be gc'd, but not parent
    # p1.children = None

    del c1_1
    del p1
    gc.collect()

    input('wait some more ...')

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From dieter.maurer@online.de@21:1/5 to Frank Millman on Mon Jan 15 19:59:06 2024
    Frank Millman wrote at 2024-1-15 15:51 +0200:
    I have read that one should not have to worry about garbage collection
    in modern versions of Python - it 'just works'.

    There are still some isolated cases when not all objects
    in an unreachable cycle are destroyed
    (see e.g. step 2 of "https://devguide.python.org/internals/garbage-collector/index.html#destroying-unreachable-objects").
    But Python's own objects (e.g. traceback cycles)
    or instances of classes implemented in Python
    should no longer be affected.

    Thus, unless you use extensions implemented in C (with "legacy finalizer"s), garbage collection should not cause problems.
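
    A minimal sketch of the point above (not code from the thread): two instances of a plain Python class that refer only to each other are reclaimed by the cycle collector. The Node class and make_cycle() helper are hypothetical names used for illustration, and the immediate-versus-deferred behaviour shown assumes CPython.

```python
import gc
import weakref


class Node:
    """Plain Python class; instances can take part in reference cycles."""

    def __init__(self, name):
        self.name = name
        self.other = None


def make_cycle():
    # Build two objects that refer to each other, then drop every outside
    # reference by returning only weak references to them.
    a = Node('a')
    b = Node('b')
    a.other = b
    b.other = a
    return weakref.ref(a), weakref.ref(b)


gc.disable()  # keep the automatic collector out of the way for the demo
try:
    ref_a, ref_b = make_cycle()
    # Reference counting alone cannot free the pair: each keeps the other alive.
    alive_before = ref_a() is not None and ref_b() is not None
    found = gc.collect()  # the cycle detector finds the unreachable pair
    alive_after = ref_a() is not None or ref_b() is not None
finally:
    gc.enable()

print('alive before collect:', alive_before)
print('alive after collect: ', alive_after)
```

    The weak references only observe the objects; they do not keep them alive, which makes them a convenient probe for this kind of experiment.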


    On the other hand, your application, too, must avoid memory leaks.
    Caches of various forms (with data for several sessions) might introduce them.

  • From Akkana Peck@21:1/5 to Dieter Maurer via Python-list on Mon Jan 15 12:21:05 2024
    Frank Millman wrote at 2024-1-15 15:51 +0200:
    I have read that one should not have to worry about garbage collection
    in modern versions of Python - it 'just works'.

    Dieter Maurer via Python-list writes:
    There are still some isolated cases when not all objects
    in an unreachable cycle are destroyed
    (see e.g. step 2 of "https://devguide.python.org/internals/garbage-collector/index.html#destroying-unreachable-objects").

    Also be warned that some modules (particularly if they're based on libraries not written in Python) might not garbage collect, so you may need to use other methods of cleaning up after those objects.

    ...Akkana

  • From Paul Rubin@21:1/5 to Frank Millman on Mon Jan 15 18:12:52 2024
    Frank Millman <frank@chagford.com> writes:
    I don't want to rely on that. My app is a long-running server, with
    multiple clients logging on, doing stuff, and logging off. They can
    create many objects, some of them long-lasting. I want to be sure that
    all objects created are gc'd when the session ends.

    It's very hard to be sure there are no memory leaks in a system with
    gc. Wanting to do that is why Rust was invented. The best you can do
    is monitor for leaks, by looking at the gc stats now and then to make
    sure that the footprint isn't increasing.
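
    One way to do that kind of monitoring, sketched here with the standard tracemalloc module rather than the gc statistics mentioned above: take a snapshot, let the server run for a while, take another, and compare. The session_cache list is a hypothetical stand-in for a leak.

```python
import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

# Stand-in for a leak: a hypothetical cache that is never cleared.
session_cache = [bytes(1000) for _ in range(1000)]

current = tracemalloc.take_snapshot()
stats = current.compare_to(baseline, 'lineno')
growth = sum(stat.size_diff for stat in stats)

print(f'traced memory grew by about {growth} bytes')
for stat in stats[:3]:
    print(stat)  # largest differences are listed first

tracemalloc.stop()
```

    In a real server the two snapshots would be taken some time apart, and growth that persists across several samples is the signal worth investigating.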

    There is an Erlang saying that no uniprocessor system can be reliable,
    since the power cord is a single point of failure. That is, if you are
    trying to run a high availability system, you need a way to restart the
    server now and then without dropping connections. There are ways to do
    that, but it drifts from gc per se.

  • From Akkana Peck@21:1/5 to Chris Angelico on Mon Jan 15 19:47:32 2024
    I wrote:
    Also be warned that some modules (particularly if they're based on libraries not written in Python) might not garbage collect, so you may need to use other methods of cleaning up after those objects.

    Chris Angelico writes:
    Got any examples of that?

    The big one for me was gdk-pixbuf, part of GTK. When you do something like gtk.gdk.pixbuf_new_from_file(), there's a Python object that gets created, but there's also the underlying C code that allocates memory for the pixbuf. When the object went out of scope, the Python object was automatically garbage collected, but the pixbuf data leaked. Calling gc.collect() caused the pixbuf data to be garbage collected too.

    There used to be a post explaining this on the pygtk mailing list: the link was http://www.daa.com.au/pipermail/pygtk/2003-December/006499.html
    but that page is gone now and I can't seem to find any other archives of that list (it's not on archive.org either). And this was from GTK2; I never checked whether the extra gc.collect() is still necessary in GTK3, but I figure leaving it in doesn't hurt anything. I use pixbufs in a tiled map application, so there are a lot of small pixbufs being repeatedly read and then deallocated.

    ...Akkana

  • From Thomas Passin@21:1/5 to Akkana Peck via Python-list on Mon Jan 15 22:36:26 2024
    On 1/15/2024 9:47 PM, Akkana Peck via Python-list wrote:
    I wrote:
    Also be warned that some modules (particularly if they're based on libraries not written in Python) might not garbage collect, so you may need to use other methods of cleaning up after those objects.

    Chris Angelico writes:
    Got any examples of that?

    The big one for me was gdk-pixbuf, part of GTK. When you do something like gtk.gdk.pixbuf_new_from_file(), there's a Python object that gets created, but there's also the underlying C code that allocates memory for the pixbuf. When the object went out of scope, the Python object was automatically garbage collected, but the pixbuf data leaked.

    This kind of thing can happen with PyQt, also. There are ways to
    minimize it but I don't know if you can ever be sure all Qt C++ objects
    will get deleted. It depends on the type of object and the circumstances.

    Calling gc.collect() caused the pixbuf data to be garbage collected too.

    There used to be a post explaining this on the pygtk mailing list: the link was
    http://www.daa.com.au/pipermail/pygtk/2003-December/006499.html
    but that page is gone now and I can't seem to find any other archives of that list (it's not on archive.org either). And this was from GTK2; I never checked whether the extra gc.collect() is still necessary in GTK3, but I figure leaving it in doesn't hurt anything. I use pixbufs in a tiled map application, so there are a lot of small pixbufs being repeatedly read and then deallocated.

    ...Akkana

  • From Lawrence D'Oliveiro@21:1/5 to Paul Rubin on Tue Jan 16 05:54:30 2024
    On Mon, 15 Jan 2024 18:12:52 -0800, Paul Rubin wrote:

    It's very hard to be sure there are no memory leaks in a system with gc.

    Luckily, Python doesn’t do pure GC. I think the term is “ORC”, for “reference counting with cycles”.

    There are techniques you can use. Like when installing a callback into an object which points back to the object itself, you can stick a weak
    reference somewhere to break the cycle of strong references. If it’s a user-supplied callback, then the trick is to hide the weak reference in a
    part of the chain that the user will never see.
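
    A small sketch of that technique, under the assumption that the callback is a bound method of the object that stores it. StrongWidget and WeakWidget are hypothetical names; weakref.WeakMethod is the standard-library tool for holding a bound method weakly. The "freed immediately" behaviour relies on CPython's reference counting.

```python
import gc
import weakref


class StrongWidget:
    def __init__(self):
        # The bound method holds a strong reference to self, so this creates
        # the cycle: widget -> bound method -> widget.
        self.callback = self.refresh

    def refresh(self):
        pass


class WeakWidget:
    def __init__(self):
        self.refreshed = 0
        # WeakMethod refers to the method without keeping self alive.
        self._callback = weakref.WeakMethod(self.refresh)

    def refresh(self):
        self.refreshed += 1

    def fire(self):
        callback = self._callback()  # bound method, or None if self is gone
        if callback is not None:
            callback()


gc.disable()  # show what reference counting alone can free
try:
    strong = StrongWidget()
    weak = WeakWidget()
    weak.fire()
    fired = weak.refreshed
    strong_ref = weakref.ref(strong)
    weak_ref = weakref.ref(weak)
    del strong, weak
    strong_survives = strong_ref() is not None  # waits for the cycle collector
    weak_survives = weak_ref() is not None      # freed immediately in CPython
finally:
    gc.enable()
    gc.collect()  # now the strong cycle is reclaimed as well

print('strong version survived del:', strong_survives)
print('weak version survived del:  ', weak_survives)
```

    Keeping the WeakMethod in a private attribute and exposing only fire() is one way to hide the weak reference from the caller.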

  • From Barry@21:1/5 to All on Tue Jan 16 09:17:35 2024
    On 16 Jan 2024, at 03:49, Thomas Passin via Python-list <python-list@python.org> wrote:

    This kind of thing can happen with PyQt, also. There are ways to minimize it but I don't know if you can ever be sure all Qt C++ objects will get deleted. It depends on the type of object and the circumstances.

    When this has been seen in the past it has been promptly fixed by the maintainer.

    Barry

  • From Frank Millman@21:1/5 to Frank Millman via Python-list on Tue Jan 16 14:01:48 2024
    On 2024-01-15 3:51 PM, Frank Millman via Python-list wrote:
    Hi all

    I have read that one should not have to worry about garbage collection
    in modern versions of Python - it 'just works'.

    I don't want to rely on that. My app is a long-running server, with
    multiple clients logging on, doing stuff, and logging off. They can
    create many objects, some of them long-lasting. I want to be sure that
    all objects created are gc'd when the session ends.


    I did not explain myself very well. Sorry about that.

    My problem is that my app is quite complex, and it is easy to leave a
    reference dangling somewhere which prevents an object from being gc'd.

    This can create (at least) two problems. The obvious one is a memory
    leak. The second is that I sometimes need to keep a reference from a
    transient object to a more permanent structure in my app. To save myself
    the extra step of removing all these references when the transient
    object is deleted, I make them weak references. This works, unless the transient object is kept alive by mistake and the weak ref is never removed.

    I feel it is important to find these dangling references and fix them,
    rather than wait for problems to appear in production. The only method I
    can come up with is to use the 'delwatcher' class that I used in my toy
    program in my original post.

    I am surprised that this issue does not crop up more often. Does nobody
    else have these problems?

    Frank

  • From Thomas Passin@21:1/5 to Barry on Tue Jan 16 07:47:40 2024
    On 1/16/2024 4:17 AM, Barry wrote:


    On 16 Jan 2024, at 03:49, Thomas Passin via Python-list <python-list@python.org> wrote:

    This kind of thing can happen with PyQt, also. There are ways to minimize it but I don't know if you can ever be sure all Qt C++ objects will get deleted. It depends on the type of object and the circumstances.

    When this has been seen in the past it has been promptly fixed by the maintainer.

    The usual advice is to call deleteLater() on objects derived from PyQt
    classes. I don't know enough about PyQt to know if this takes care of
    all dangling reference problems, though.

  • From Frank Millman@21:1/5 to Chris Angelico via Python-list on Tue Jan 16 16:43:38 2024
    On 2024-01-16 2:15 PM, Chris Angelico via Python-list wrote:

    Where do you tend to "leave a reference dangling somewhere"? How is
    this occurring? Is it a result of an incomplete transaction (like an
    HTTP request that never finishes), or a regular part of the operation
    of the server?


    I have a class that represents a database table, and another class that represents a database column. There is a one-to-many relationship and
    they maintain references to each other.

    In another part of the app, there is a class that represents a form, and another class that represents the gui elements on the form. Again there
    is a one-to-many relationship.

    A gui element that represents a piece of data has to maintain a link to
    its database column object. There can be a many-to-one relationship, as
    there could be more than one gui element referring to the same column.

    There are added complications which I won't go into here. The bottom
    line is that on some occasions a form which has been closed does not get
    gc'd.

    I have been trying to reproduce the problem in my toy app, but I cannot
    get it to fail. There is a clue there! I think I have just
    over-complicated things.

    I will start with a fresh approach tomorrow. If you don't hear from me
    again, you will know that I have solved it!

    Thanks for the input, it definitely helped.

    Frank

  • From Barry@21:1/5 to All on Tue Jan 16 22:31:44 2024
    On 16 Jan 2024, at 13:17, Thomas Passin via Python-list <python-list@python.org> wrote:

    The usual advice is to call deleteLater() on objects derived from PyQt classes. I don't know enough about PyQt to know if this takes care of all dangling reference problems, though.

    It works well and robustly.

    Barry

  • From Barry@21:1/5 to All on Tue Jan 16 22:37:05 2024
    On 16 Jan 2024, at 12:10, Frank Millman via Python-list <python-list@python.org> wrote:

    My problem is that my app is quite complex, and it is easy to leave a reference dangling somewhere which prevents an object from being gc'd.

    What I do to track these problems down is call gc.get_objects() and then summarize the number of objects of each type. Part 2 is to take a second summary after an interval and print the delta.
    Leaked objects show up as a type whose count increases every time you sample.
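
    The approach described above could look roughly like this. It is a sketch, not Barry's actual code: type_counts() and growth() are hypothetical helper names, and Session with the leaked list simulates a leak. Only objects tracked by the collector are counted.

```python
import gc
from collections import Counter


def type_counts():
    """Count the objects currently tracked by the collector, by type name."""
    gc.collect()  # drop collectable garbage first so only live objects count
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after):
    """Return only the types whose count went up between two samples."""
    return {name: after[name] - before[name]
            for name in after if after[name] > before[name]}


class Session:  # hypothetical stand-in for an application object
    pass


leaked = []  # simulates a forgotten container that keeps sessions alive

before = type_counts()
for _ in range(5):
    leaked.append(Session())
after = type_counts()

delta = growth(before, after)
print(delta)
```

    In a long-running server the samples would be taken on a timer or from an admin command, and a type that grows in every delta is the one to chase.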


    Barry

  • From Greg Ewing@21:1/5 to Frank Millman on Wed Jan 17 14:01:54 2024
    On 17/01/24 1:01 am, Frank Millman wrote:
    I sometimes need to keep a reference from a
    transient object to a more permanent structure in my app. To save myself
    the extra step of removing all these references when the transient
    object is deleted, I make them weak references.

    I don't see how weak references help here at all. If the transient
    object goes away, all references from it to the permanent objects also
    go away.

    A weak reference would only be of use if the reference went the other
    way, i.e. from the permanent object to the transient object.

    --
    Greg

  • From Greg Ewing@21:1/5 to Chris Angelico on Wed Jan 17 14:01:49 2024
    On 17/01/24 4:00 am, Chris Angelico wrote:
    class Form:
        def __init__(self):
            self.elements = []

    class Element:
        def __init__(self, form):
            self.form = form
            form.elements.append(self)

    If you make the reference from Element to Form a weak reference,
    it won't keep the Form alive after it's been closed.
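
    A sketch of that change, keeping the Form and Element names from the quoted code. The _form attribute and the form property are assumptions added for illustration; the immediate release after del assumes CPython.

```python
import weakref


class Form:
    def __init__(self):
        self.elements = []


class Element:
    def __init__(self, form):
        # Weak back-reference: the element can find its form,
        # but does not keep it alive.
        self._form = weakref.ref(form)
        form.elements.append(self)

    @property
    def form(self):
        # Returns the Form, or None once the form has been reclaimed.
        return self._form()


form = Form()
element = Element(form)
same_form = element.form is form

del form  # "closing" the form: the element alone cannot keep it alive
form_after_close = element.form
print(same_form, form_after_close)
```

    Code that uses element.form then has to cope with None, which is the price of not keeping the form alive.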

    --
    Greg

  • From Frank Millman@21:1/5 to Greg Ewing via Python-list on Wed Jan 17 08:42:17 2024
    On 2024-01-17 3:01 AM, Greg Ewing via Python-list wrote:
    On 17/01/24 1:01 am, Frank Millman wrote:
    I sometimes need to keep a reference from a transient object to a more
    permanent structure in my app. To save myself the extra step of
    removing all these references when the transient object is deleted, I
    make them weak references.

    I don't see how weak references help here at all. If the transient
    object goes away, all references from it to the permanent objects also
    go away.

    A weak reference would only be of use if the reference went the other
    way, i.e. from the permanent object to the transient object.


    You are right. I got my description above back-to-front. It is a pub/sub scenario. A transient object makes a request to the permanent object to
    be notified of any changes. The permanent object stores a reference to
    the transient object and executes a callback on each change. When the
    transient object goes away, the reference must be removed.
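
    A minimal sketch of that pub/sub arrangement using weakref.WeakSet, so the subscription disappears on its own when the transient object goes away. Column, GuiElement, subscribe(), notify() and on_change() are hypothetical names, not taken from the actual app; the immediate removal after del assumes CPython.

```python
import weakref


class Column:
    """Permanent object: holds its subscribers weakly."""

    def __init__(self):
        self._subscribers = weakref.WeakSet()

    def subscribe(self, subscriber):
        self._subscribers.add(subscriber)

    def notify(self, value):
        # Copy to a list so the set may shrink safely while iterating.
        for subscriber in list(self._subscribers):
            subscriber.on_change(value)

    def subscriber_count(self):
        return len(self._subscribers)


class GuiElement:
    """Transient object: asks to be told about changes."""

    def __init__(self, column):
        self.seen = []
        column.subscribe(self)

    def on_change(self, value):
        self.seen.append(value)


column = Column()
element = GuiElement(column)
column.notify('first')
seen = list(element.seen)

del element  # the transient object goes away
remaining = column.subscriber_count()
column.notify('second')  # nothing left to call
print(seen, remaining)
```

    If the subscription is a bound method rather than the object itself, weakref.WeakMethod serves the same purpose.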

    Frank
