• Is there a more efficient threading lock?

    From Skip Montanaro@21:1/5 to All on Sat Feb 25 09:52:15 2023
    I have a multi-threaded program which calls out to a non-thread-safe
    library (not mine) in a couple places. I guard against multiple
    threads executing code there using threading.Lock. The code is
    straightforward:

    from threading import Lock

    # Something in textblob and/or nltk doesn't play nice with no-gil, so just
    # serialize all blobby accesses.
    BLOB_LOCK = Lock()

def get_terms(text):
    with BLOB_LOCK:
        phrases = TextBlob(text, np_extractor=EXTRACTOR).noun_phrases
    for phrase in phrases:
        yield phrase

    When I monitor the application using py-spy, that with statement is
    consuming huge amounts of CPU. Does threading.Lock.acquire() sleep
    anywhere? I didn't see anything obvious poking around in the C code
    which implements this stuff. I'm no expert though, so could easily
    have missed something.

    Thx,

    Skip

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Passin@21:1/5 to Skip Montanaro on Sat Feb 25 11:48:37 2023
    On 2/25/2023 10:52 AM, Skip Montanaro wrote:
    I have a multi-threaded program which calls out to a non-thread-safe
    library (not mine) in a couple places. I guard against multiple
    threads executing code there using threading.Lock. The code is straightforward:

    from threading import Lock

    # Something in textblob and/or nltk doesn't play nice with no-gil, so just
    # serialize all blobby accesses.
    BLOB_LOCK = Lock()

def get_terms(text):
    with BLOB_LOCK:
        phrases = TextBlob(text, np_extractor=EXTRACTOR).noun_phrases
    for phrase in phrases:
        yield phrase

    When I monitor the application using py-spy, that with statement is
    consuming huge amounts of CPU. Does threading.Lock.acquire() sleep
    anywhere? I didn't see anything obvious poking around in the C code
    which implements this stuff. I'm no expert though, so could easily
    have missed something.

    I'm no expert on locks, but you don't usually want to keep a lock while
    some long-running computation goes on. You want the computation to be
    done by a separate thread, put its results somewhere, and then notify
    the choreographing thread that the result is ready.
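A minimal sketch of that hand-off, reusing the TextBlob/EXTRACTOR names
from Skip's snippet (the queue plumbing and worker below are illustrative,
not his code):

import queue
import threading

from textblob import TextBlob  # EXTRACTOR is assumed, as in Skip's snippet

work_q = queue.Queue()         # (text, reply_q) pairs from calling threads

def blob_worker():
    # The only thread that ever touches TextBlob/NLTK, so no lock is needed.
    while True:
        text, reply_q = work_q.get()
        if text is None:       # sentinel: shut the worker down
            break
        reply_q.put(TextBlob(text, np_extractor=EXTRACTOR).noun_phrases)

threading.Thread(target=blob_worker, daemon=True).start()

def get_terms(text):
    reply_q = queue.Queue(maxsize=1)
    work_q.put((text, reply_q))
    yield from reply_q.get()   # blocks only this caller, not a shared lock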

    This link may be helpful -

    https://anandology.com/blog/using-iterators-and-generators/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Peter J. Holzer@21:1/5 to Skip Montanaro on Sat Feb 25 17:27:51 2023
    On 2023-02-25 09:52:15 -0600, Skip Montanaro wrote:
    I have a multi-threaded program which calls out to a non-thread-safe
    library (not mine) in a couple places. I guard against multiple
    threads executing code there using threading.Lock. The code is straightforward:

    from threading import Lock

    # Something in textblob and/or nltk doesn't play nice with no-gil, so just
    # serialize all blobby accesses.
    BLOB_LOCK = Lock()

def get_terms(text):
    with BLOB_LOCK:
        phrases = TextBlob(text, np_extractor=EXTRACTOR).noun_phrases
    for phrase in phrases:
        yield phrase

    When I monitor the application using py-spy, that with statement is
    consuming huge amounts of CPU.

    Which OS is this?

    Does threading.Lock.acquire() sleep anywhere?

    On Linux it calls futex(2), which does sleep if it can't get the lock
    right away. (Of course if it does get the lock, it will return
    immediately which may use a lot of CPU if you are calling it a lot.)
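For reference, the uncontended cost is easy to measure directly. A rough
probe (numbers vary by OS and CPU):

import timeit
from threading import Lock

lock = Lock()

# A million uncontended acquire/release pairs; on typical hardware this
# comes out to well under a microsecond per pair.
print(timeit.timeit(lambda: (lock.acquire(), lock.release()),
                    number=1_000_000))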

    hp


    --
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

  • From Paul Rubin@21:1/5 to Skip Montanaro on Sat Feb 25 11:41:13 2023
    Skip Montanaro <skip.montanaro@gmail.com> writes:
    from threading import Lock

    1) you generally want to use RLock rather than Lock

2) I have generally felt that using locks at the app level at all is an
antipattern. The main way I've stayed sane in multi-threaded Python code
is to have every mutable strictly owned by exactly one thread, pass values
around using Queues, and have an event loop in each thread taking requests
from Queues (see the sketch after this list).

    3) I didn't know that no-gil was a now thing and I'm used to having the
    GIL. So I would have considered the multiprocessing module rather than threading, for something like this.
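
A bare-bones sketch of the pattern from point 2, with every name
illustrative: one thread owns the mutable state and runs an event loop,
and other threads talk to it only through Queues.

import queue
import threading

requests = queue.Queue()            # the owner thread's inbox

def owner_loop():
    state = {}                      # mutable state owned by this thread only
    while True:
        key, value, reply_q = requests.get()
        if key is None:             # sentinel: exit the event loop
            break
        state[key] = value          # no lock: only this thread mutates state
        reply_q.put(len(state))

owner = threading.Thread(target=owner_loop)
owner.start()

reply_q = queue.Queue()
requests.put(("spam", 1, reply_q))  # another thread sends a request...
print(reply_q.get())                # ...and waits for the answer: 1
requests.put((None, None, None))    # shut the owner down
owner.join()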

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Rubin@21:1/5 to Jon Ribbens on Sat Feb 25 13:51:38 2023
    Jon Ribbens <jon+usenet@unequivocal.eu> writes:
    1) you generally want to use RLock rather than Lock
    Why?

    So that a thread that tries to acquire it twice doesn't block itself,
    etc. Look at the threading lib docs for more info.
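
A tiny illustration of the difference (sketch only):

from threading import RLock

rlock = RLock()

def outer():
    with rlock:
        inner()         # same thread re-acquires: fine with an RLock

def inner():
    with rlock:         # with a plain Lock() this second acquire would
        pass            # block forever, deadlocking the thread

outer()                 # completes; swap in Lock() above and it hangs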

    What does this mean? Are you saying the GIL has been removed?

    Last I heard there was an experimental version of CPython with the GIL
    removed. It is supposed to take less of a performance hit due to
    INCREF/DECREF than an earlier attempt some years back. I don't know its current status.

    The GIL is an evil thing, but it has been around for so long that most
    of us have gotten used to it, and some user code actually relies on it.
    For example, with the GIL in place, a statement like "x += 1" is always
    atomic, I believe. But, I think it is better to not have any shared
    mutables regardless.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jon Ribbens@21:1/5 to Paul Rubin on Sat Feb 25 21:24:19 2023
    On 2023-02-25, Paul Rubin <no.email@nospam.invalid> wrote:
    Skip Montanaro <skip.montanaro@gmail.com> writes:
    from threading import Lock

    1) you generally want to use RLock rather than Lock

    Why?

    2) I have generally felt that using locks at the app level at all is an antipattern. The main way I've stayed sane in multi-threaded Python
    code is to have every mutable strictly owned by exactly one thread, pass values around using Queues, and have an event loop in each thread taking requests from Queues.

    3) I didn't know that no-gil was a now thing and I'm used to having the
    GIL. So I would have considered the multiprocessing module rather than threading, for something like this.

    What does this mean? Are you saying the GIL has been removed?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Peter J. Holzer@21:1/5 to Skip Montanaro on Sat Feb 25 22:53:22 2023
    On 2023-02-25 09:52:15 -0600, Skip Montanaro wrote:
    BLOB_LOCK = Lock()

def get_terms(text):
    with BLOB_LOCK:
        phrases = TextBlob(text, np_extractor=EXTRACTOR).noun_phrases
    for phrase in phrases:
        yield phrase

    When I monitor the application using py-spy, that with statement is
    consuming huge amounts of CPU.

    Another thought:

How accurate is py-spy? Is it possible that it assigns time actually
spent in

    phrases = TextBlob(text, np_extractor=EXTRACTOR).noun_phrases

to

    with BLOB_LOCK:

?

    hp

    --
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

  • From Skip Montanaro@21:1/5 to Peter on Sat Feb 25 15:41:52 2023
    Thanks for the responses.

    Peter wrote:

    Which OS is this?

    MacOS Ventura 13.1, M1 MacBook Pro (eight cores).

    Thomas wrote:

    I'm no expert on locks, but you don't usually want to keep a lock while
    some long-running computation goes on. You want the computation to be
    done by a separate thread, put its results somewhere, and then notify
    the choreographing thread that the result is ready.

In this case I'm extracting the noun phrases from the body of an email
message (returned as a list). I have a collection of email messages
organized by month (typically 1000 to 3000 messages per month). I'm using
concurrent.futures.ThreadPoolExecutor() with the default number of workers
(os.cpu_count() * 1.5, or 12 threads on my system) to process each month,
so 12 active threads at a time. Given that the process is pretty much CPU
bound, maybe reducing the number of workers to the CPU count would make
sense. Processing of each email message enters that with block once. That's
about as minimal as I can make it. I thought for a bit about pushing the
textblob stuff into a separate worker thread, but it wasn't obvious how to
set up queues to handle the communication between the threads created by
ThreadPoolExecutor() and the worker thread. Maybe I'll think about it
harder. (I have a related problem with SQLite, since an open database can't
be manipulated from multiple threads. That makes much of the program's
end-of-run processing single-threaded.)
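
One hedged sketch of the worker-cap idea; process_month and months below
are stand-ins, not names from Skip's program:

import os
from concurrent.futures import ThreadPoolExecutor

def process_month(month):
    return month        # stand-in for the real per-month work

months = ["2023-01", "2023-02"]     # stand-in inputs

# Cap the pool at one worker per core instead of the executor's default.
with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    for result in pool.map(process_month, months):
        pass            # e.g. store results into SQLite from this one thread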

    This link may be helpful -

    https://anandology.com/blog/using-iterators-and-generators/

    I don't think that's where my problem is. The lock protects the generation
    of the noun phrases. My loop which does the yielding operates outside of
    that lock's control. The version of the code is my latest, in which I
    tossed out a bunch of phrase-processing code (effectively dead end ideas
    for processing the phrases). Replacing the for loop with a simple return
    seems not to have any effect. In any case, the caller which uses the
    phrases does a fair amount of extra work with the phrases, populating a
    SQLite database, so I don't think the amount of time it takes to process a single email message is dominated by the phrase generation.

    Here's timeit output for the noun_phrases code:

% python -m timeit \
    -s 'text = """`python -m timeit --help`""" ; from textblob import TextBlob ; from textblob.np_extractors import ConllExtractor ; ext = ConllExtractor() ; phrases = TextBlob(text, np_extractor=ext).noun_phrases' \
    'phrases = TextBlob(text, np_extractor=ext).noun_phrases'
5000 loops, best of 5: 98.7 usec per loop

    I process the output of timeit's help message which looks to be about the
    same length as a typical email message, certainly the same order of
    magnitude. Also, note that I call it once in the setup to eliminate the
    initial training of the ConllExtractor instance. I don't know if ~100us qualifies as long running or not.

    I'll keep messing with it.

    Skip

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Rubin@21:1/5 to Skip Montanaro on Sat Feb 25 13:57:48 2023
    Skip Montanaro <skip.montanaro@gmail.com> writes:
    In this case I'm extracting the noun phrases from the body of an email message (returned as a list). I have a collection of email messages
    organized by month (typically 1000 to 3000 messages per month).

This is embarrassingly parallel enough that I would probably launch a
    bunch of separate command line processes with GNU Parallel, rather than
    messing with writing a multi-threaded Python program. That would also
    let you distribute the processing across multiple machines on a network,
    if the cpu requirements warranted it.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Weatherby,Gerard@21:1/5 to All on Sat Feb 25 21:47:00 2023
    “I'm no expert on locks, but you don't usually want to keep a lock while
    some long-running computation goes on. You want the computation to be
    done by a separate thread, put its results somewhere, and then notify
    the choreographing thread that the result is ready.”

    Maybe. There are so many possible threaded application designs I’d hesitate to make a general statement.

The threading.Lock.acquire method has flags for both a non-blocking
attempt and a timeout, so a valid design could include a long-running
computation with a main thread or event loop polling the thread. Or the
thread could signal a main loop some other way.
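
Concretely, the two acquire variants look like this (sketch only):

from threading import Lock

lock = Lock()

# Non-blocking attempt: returns False immediately if another thread holds it.
if lock.acquire(blocking=False):
    try:
        pass            # do the guarded work
    finally:
        lock.release()

# Bounded wait: give up after half a second instead of blocking forever.
if lock.acquire(timeout=0.5):
    try:
        pass
    finally:
        lock.release()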

I’ve written some code that coordinated threads by having a process talk
to itself using a socket.socketpair. The advantage is that you can bundle
multiple items (sockets, file handles, a polling timeout) into a
select.select call, which waits without consuming resources (at least on
Linux) until something interesting happens.
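
A stripped-down sketch of that socketpair trick, with illustrative names:

import select
import socket
import threading
import time

r, w = socket.socketpair()      # in-process pipe usable with select()

def worker():
    time.sleep(1)               # stand-in for real work
    w.send(b"x")                # wake the main loop when done

threading.Thread(target=worker).start()

# The main loop sleeps in select() until a socket is readable or the
# timeout expires; more sockets/files can be added to the same call.
ready, _, _ = select.select([r], [], [], 5.0)
if ready:
    r.recv(1)                   # drain the wake-up byte, then handle the event
    print("worker finished")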

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Barry@21:1/5 to All on Sat Feb 25 22:48:47 2023
Re sqlite and threads: the C API can be compiled to be thread-safe, from
my reading of the sqlite docs. What I have not checked is how Python’s
bundled sqlite is compiled. There are claims Python’s sqlite is not
thread-safe.
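
The stdlib does expose enough to check. A small probe (check_same_thread
is a real sqlite3.connect parameter; the rest is a sketch):

import sqlite3

print(sqlite3.threadsafety)     # 0, 1 or 3, per PEP 249
print(sqlite3.sqlite_version)   # version of the bundled C library

# check_same_thread=False disables Python's per-thread guard; you are then
# responsible for serializing access yourself (e.g. with a threading.Lock)
# unless threadsafety reports the C library is fully serialized.
con = sqlite3.connect(":memory:", check_same_thread=False)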

    Barry

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Passin@21:1/5 to Skip Montanaro on Sat Feb 25 17:20:20 2023
    On 2/25/2023 4:41 PM, Skip Montanaro wrote:
    Thanks for the responses.

    Peter wrote:

    Which OS is this?

    MacOS Ventura 13.1, M1 MacBook Pro (eight cores).

    Thomas wrote:

    I'm no expert on locks, but you don't usually want to keep a lock while some long-running computation goes on.  You want the computation to be done by a separate thread, put its results somewhere, and then notify
    the choreographing thread that the result is ready.

In this case I'm extracting the noun phrases from the body of an email
message (returned as a list). I have a collection of email messages
organized by month (typically 1000 to 3000 messages per month). I'm using
concurrent.futures.ThreadPoolExecutor() with the default number of workers
(os.cpu_count() * 1.5, or 12 threads on my system) to process each month,
so 12 active threads at a time. Given that the process is pretty much CPU
bound, maybe reducing the number of workers to the CPU count would make
sense. Processing of each email message enters that with block once. That's
about as minimal as I can make it. I thought for a bit about pushing the
textblob stuff into a separate worker thread, but it wasn't obvious how to
set up queues to handle the communication between the threads created by
ThreadPoolExecutor() and the worker thread. Maybe I'll think about it
harder. (I have a related problem with SQLite, since an open database can't
be manipulated from multiple threads. That makes much of the program's
end-of-run processing single-threaded.)

    If the noun extractor is single-threaded (which I think you mentioned),
    no amount of parallel access is going to help. The best you can do is
    to queue up requests so that as soon as the noun extractor returns from
    one call, it gets handed another blob. The CPU will be busy all the
    time running the noun-extraction code.

    If that's the case, you might just as well eliminate all the threads and
    just do it sequentially in the most obvious and simple manner.

    It would possibly be worth while to try this approach out and see what
    happens to the CPU usage and overall computation time.

    This link may be helpful -

https://anandology.com/blog/using-iterators-and-generators/

    I don't think that's where my problem is. The lock protects the
    generation of the noun phrases. My loop which does the yielding operates outside of that lock's control. The version of the code is my latest, in which I tossed out a bunch of phrase-processing code (effectively dead
    end ideas for processing the phrases). Replacing the for loop with a
    simple return seems not to have any effect. In any case, the caller
    which uses the phrases does a fair amount of extra work with the
    phrases, populating a SQLite database, so I don't think the amount of
    time it takes to process a single email message is dominated by the
    phrase generation.

Here's timeit output for the noun_phrases code:

% python -m timeit \
    -s 'text = """`python -m timeit --help`""" ; from textblob import TextBlob ; from textblob.np_extractors import ConllExtractor ; ext = ConllExtractor() ; phrases = TextBlob(text, np_extractor=ext).noun_phrases' \
    'phrases = TextBlob(text, np_extractor=ext).noun_phrases'
5000 loops, best of 5: 98.7 usec per loop

    I process the output of timeit's help message which looks to be about
    the same length as a typical email message, certainly the same order of magnitude. Also, note that I call it once in the setup to eliminate the initial training of the ConllExtractor instance. I don't know if ~100us qualifies as long running or not.

    I'll keep messing with it.

    Skip

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jon Ribbens@21:1/5 to Paul Rubin on Sat Feb 25 23:45:36 2023
    On 2023-02-25, Paul Rubin <no.email@nospam.invalid> wrote:
    Jon Ribbens <jon+usenet@unequivocal.eu> writes:
    1) you generally want to use RLock rather than Lock
    Why?

    So that a thread that tries to acquire it twice doesn't block itself,
    etc. Look at the threading lib docs for more info.

    Yes, I know what the docs say, I was asking why you were making the
    statement above. I haven't used Lock very often, but I've literally
    never once in 25 years needed to use RLock. As you say, it's best
    to keep the lock-protected code brief, so it's usually pretty
    obvious that the code can't be re-entered.

    What does this mean? Are you saying the GIL has been removed?

    Last I heard there was an experimental version of CPython with the GIL removed. It is supposed to take less of a performance hit due to INCREF/DECREF than an earlier attempt some years back. I don't know its current status.

    The GIL is an evil thing, but it has been around for so long that most
    of us have gotten used to it, and some user code actually relies on it.
    For example, with the GIL in place, a statement like "x += 1" is always atomic, I believe. But, I think it is better to not have any shared
    mutables regardless.

    I think it is the case that x += 1 is atomic but foo.x += 1 is not.
    Any replacement for the GIL would have to keep the former at least,
    plus the fact that you can do hundreds of things like list.append(foo)
    which are all effectively atomic.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dennis Lee Bieber@21:1/5 to All on Sat Feb 25 22:10:48 2023
    On Sat, 25 Feb 2023 15:41:52 -0600, Skip Montanaro
    <skip.montanaro@gmail.com> declaimed the following:


concurrent.futures.ThreadPoolExecutor() with the default number of workers
(os.cpu_count() * 1.5, or 12 threads on my system) to process each month,
so 12 active threads at a time. Given that the process is pretty much CPU
bound, maybe reducing the number of workers to the CPU count would make
sense.

Unless things have improved a lot over the years, the GIL still limits
active threads to the equivalent of a single CPU. The OS may vary which
CPU it schedules the process on, but only one thread will be running at
any moment regardless of CPU count.

    Common wisdom is that Python threading works well for I/O bound systems, where each thread spends most of its time waiting for some I/O operation to complete -- thereby allowing the OS to schedule other threads.

For CPU-bound work, use of the multiprocessing package may be more suited
-- though you'll have to devise a working IPC system to transfer data
to/from the separate processes (no shared objects, as is possible with
threads).
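
A minimal multiprocessing sketch along those lines; extract() is a
stand-in for the real CPU-bound work:

from multiprocessing import Pool

def extract(text):
    return len(text.split())    # stand-in for e.g. noun-phrase extraction

if __name__ == "__main__":
    texts = ["one message body", "another message body"] * 1000
    with Pool() as pool:                    # defaults to os.cpu_count() workers
        counts = pool.map(extract, texts)   # args/results are pickled (the IPC)
    print(sum(counts))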


    --
    Wulfraed Dennis Lee Bieber AF6VN
    wlfraed@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Chris Angelico@21:1/5 to Dennis Lee Bieber on Sun Feb 26 16:50:38 2023
    On Sun, 26 Feb 2023 at 16:27, Dennis Lee Bieber <wlfraed@ix.netcom.com> wrote:

    On Sat, 25 Feb 2023 15:41:52 -0600, Skip Montanaro
    <skip.montanaro@gmail.com> declaimed the following:


concurrent.futures.ThreadPoolExecutor() with the default number of workers
(os.cpu_count() * 1.5, or 12 threads on my system) to process each month,
so 12 active threads at a time. Given that the process is pretty much CPU
bound, maybe reducing the number of workers to the CPU count would make
sense.

    Unless things have improved a lot over the years, the GIL still limits
    active threads to the equivalent of a single CPU. The OS may swap among
    which CPU as it schedules system processes, but only one thread will be running at any moment regardless of CPU count.

    Specifically, a single CPU core *executing Python bytecode*. There are
    quite a few libraries that release the GIL during computation. Here's
    a script that's quite capable of saturating my CPU entirely - in fact,
    typing this email is glitchy due to lack of resources:

import threading
import bcrypt

results = [0, 0]

def thrd():
    for _ in range(10):
        ok = bcrypt.checkpw(b"password", b'$2b$15$DGDXMb2zvPotw1rHFouzyOVzSopiLIUSedO5DVGQ1GblAd6L6I8/6')
        results[ok] += 1

threads = [threading.Thread(target=thrd) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(results)

I have four cores / eight threads, and yeah, my CPU's not exactly the
latest and greatest (i7 6700k - it was quite good some years ago, but
outstripped now), but feel free to crank the numbers if you want to.

    I'm pretty sure bcrypt won't use more than one CPU core for a single hashpw/checkpw call, but by releasing the GIL during the hard number
    crunching, it allows easy parallelization. Same goes for numpy work,
    or anything else that can be treated as a separate operation.

    So it's more accurate to say that only one CPU core can be
    *manipulating Python objects* at a time, although it's hard to pin
    down exactly what that means, making it easier to say that there can
    only be one executing Python bytecode; it should be possible for any
    function call into a C library to be a point where other threads can
    take over (most notably, any sort of I/O, but also these kinds of
    things).

    As mentioned, GIL-removal has been under discussion at times, most
    recently (and currently) with PEP 703
    https://peps.python.org/pep-0703/ - and the benefits in multithreaded applications always have to be factored against quite significant
    performance penalties. It's looking like PEP 703's proposal has the
    least-bad performance measurements of any GILectomy I've seen so far,
    showing 10% worse performance on average (possibly able to be reduced
    to 5%). As it happens, a GIL just makes sense when you want pure, raw performance, and it's only certain workloads that suffer under it.

    ChrisA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Chris Angelico@21:1/5 to python-list@python.org on Sun Feb 26 16:35:50 2023
    On Sun, 26 Feb 2023 at 16:16, Jon Ribbens via Python-list <python-list@python.org> wrote:

    On 2023-02-25, Paul Rubin <no.email@nospam.invalid> wrote:
    The GIL is an evil thing, but it has been around for so long that most
    of us have gotten used to it, and some user code actually relies on it.
    For example, with the GIL in place, a statement like "x += 1" is always atomic, I believe. But, I think it is better to not have any shared mutables regardless.

    I think it is the case that x += 1 is atomic but foo.x += 1 is not.
    Any replacement for the GIL would have to keep the former at least,
    plus the fact that you can do hundreds of things like list.append(foo)
    which are all effectively atomic.

    The GIL is most assuredly *not* an evil thing. If you think it's so
    evil, go ahead and remove it, because we'll clearly be better off
    without it, right?

    As it turns out, most GIL-removal attempts have had a fairly nasty
    negative effect on performance. The GIL is a huge performance boost.

    As to what is atomic and what is not... it's complicated, as always.
    Suppose that x (or foo.x) is a custom type:

class Thing:
    def __iadd__(self, other):
        print("Hi, I'm being added onto!")
        self.increment_by(other)
        return self

    Then no, neither of these is atomic, although if the increment itself
    is, it probably won't matter. As far as I know, the only way that it
    would be at all different for x+=1 and foo.x+=1 would be if the
    __iadd__ method both mutates and returns something other than self,
    which is quite unusual. (Most incrementing is done by either
    constructing a new object to return, or mutating the existing one, but
    not a hybrid.)

    Consider this:

import threading

d = {0: 0, 1: 0, 2: 0, 3: 0}

def thrd():
    for _ in range(10000):
        d[0] += 1
        d[1] += 1
        d[2] += 1
        d[3] += 1

threads = [threading.Thread(target=thrd) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(d)

    Is this code guaranteed to result in 500000 in every slot in the
    dictionary? What if you replace the dictionary with a four-element
    list? Do you need a GIL for this, or some other sort of lock? What
    exactly is it that is needed? To answer that question, let's look at
    exactly what happens in the disassembly:

>>> def thrd():
...     d[0] += 1
...     d[1] += 1
...
>>> import dis
>>> dis.dis(thrd)
  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              0 (d)
             14 LOAD_CONST               1 (0)
             16 COPY                     2
             18 COPY                     2
             20 BINARY_SUBSCR
             30 LOAD_CONST               2 (1)
             32 BINARY_OP               13 (+=)
             36 SWAP                     3
             38 SWAP                     2
             40 STORE_SUBSCR

  3          44 LOAD_GLOBAL              0 (d)
             56 LOAD_CONST               2 (1)
             58 COPY                     2
             60 COPY                     2
             62 BINARY_SUBSCR
             72 LOAD_CONST               2 (1)
             74 BINARY_OP               13 (+=)
             78 SWAP                     3
             80 SWAP                     2
             82 STORE_SUBSCR
             86 LOAD_CONST               0 (None)
             88 RETURN_VALUE


(Your exact disassembly may differ; this was on CPython 3.12.)
    Crucially, note these three instructions that occur in each block: BINARY_SUBSCR, BINARY_OP, and STORE_SUBSCR. Those are a lookup
    (retrieving the value of d[0]), the actual addition (adding one to the
    value), and a store (putting the result back into d[0]). So it's
    actually not guaranteed to be atomic; it would be perfectly reasonable
    to interrupt that sequence and have something else do another
    subscript.

    Here's the equivalent with just incrementing a global:

>>> def thrd():
...     x += 1
...
>>> dis.dis(thrd)
  1           0 RESUME                   0

  2           2 LOAD_FAST_CHECK          0 (x)
              4 LOAD_CONST               1 (1)
              6 BINARY_OP               13 (+=)
             10 STORE_FAST               0 (x)
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE


    The exact same sequence: load, add, store. Still not atomic.

    General takeaway: The GIL is a performance feature, not a magic
    solution, and certainly not an evil beast that must be slain at any
    cost. Attempts to remove it always have to provide equivalent
    protection in some other way. But the protection you think you have
    might not be what you actually have.
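
For completeness, the conventional fix is to make the load-add-store one
critical section; a sketch of the dictionary example above with a lock
added:

import threading

d = {0: 0, 1: 0, 2: 0, 3: 0}
d_lock = threading.Lock()

def thrd():
    for _ in range(10000):
        for k in range(4):
            with d_lock:   # the read-modify-write is now one critical section
                d[k] += 1

threads = [threading.Thread(target=thrd) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(d)                   # every slot reaches 500000, GIL or no GIL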

    ChrisA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Barry Scott@21:1/5 to Jon Ribbens via Python-list on Sun Feb 26 11:53:26 2023
    On 25/02/2023 23:45, Jon Ribbens via Python-list wrote:
    I think it is the case that x += 1 is atomic but foo.x += 1 is not.

    No that is not true, and has never been true.

>>> def x(a):
...     a += 1
...
>>> dis.dis(x)
  1           0 RESUME                   0

  2           2 LOAD_FAST                0 (a)
              4 LOAD_CONST               1 (1)
              6 BINARY_OP               13 (+=)
             10 STORE_FAST               0 (a)
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE


    As you can see there are 4 byte code ops executed.

    Python's eval loop can switch to another thread between any of them.

It is not true that the GIL provides atomic operations in Python.

    Barry

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jon Ribbens@21:1/5 to Chris Angelico on Sun Feb 26 16:09:44 2023
    On 2023-02-26, Chris Angelico <rosuav@gmail.com> wrote:
    On Sun, 26 Feb 2023 at 16:16, Jon Ribbens via Python-list
    <python-list@python.org> wrote:
    On 2023-02-25, Paul Rubin <no.email@nospam.invalid> wrote:
    The GIL is an evil thing, but it has been around for so long that most
    of us have gotten used to it, and some user code actually relies on it.
    For example, with the GIL in place, a statement like "x += 1" is always
    atomic, I believe. But, I think it is better to not have any shared
    mutables regardless.

    I think it is the case that x += 1 is atomic but foo.x += 1 is not.
    Any replacement for the GIL would have to keep the former at least,
    plus the fact that you can do hundreds of things like list.append(foo)
    which are all effectively atomic.

    The GIL is most assuredly *not* an evil thing. If you think it's so
    evil, go ahead and remove it, because we'll clearly be better off
    without it, right?

    If you say so. I said nothing whatsoever about the GIL being evil.

    As it turns out, most GIL-removal attempts have had a fairly nasty
    negative effect on performance. The GIL is a huge performance boost.

    As to what is atomic and what is not... it's complicated, as always.
    Suppose that x (or foo.x) is a custom type:

    Yes, sure, you can make x += 1 not work even single-threaded if you
    make custom types which override basic operations. I'm talking about
    when you're dealing with simple atomic built-in types such as integers.

    Here's the equivalent with just incrementing a global:

>>> def thrd():
...     x += 1
...
>>> dis.dis(thrd)
  1           0 RESUME                   0

  2           2 LOAD_FAST_CHECK          0 (x)
              4 LOAD_CONST               1 (1)
              6 BINARY_OP               13 (+=)
             10 STORE_FAST               0 (x)
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE


    The exact same sequence: load, add, store. Still not atomic.

    And yet, it appears that *something* changed between Python 2
    and Python 3 such that it *is* atomic:

import sys, threading

class Foo:
    x = 0

foo = Foo()
y = 0

def thrd():
    global y
    for _ in range(10000):
        foo.x += 1
        y += 1

threads = [threading.Thread(target=thrd) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(sys.version)
print(foo.x, y)

    2.7.5 (default, Jun 28 2022, 15:30:04)
    [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
    (64489, 59854)

    3.8.10 (default, Nov 14 2022, 12:59:47)
    [GCC 9.4.0]
    500000 500000

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jon Ribbens@21:1/5 to Barry Scott on Sun Feb 26 16:11:11 2023
    On 2023-02-26, Barry Scott <barry@barrys-emacs.org> wrote:
    On 25/02/2023 23:45, Jon Ribbens via Python-list wrote:
    I think it is the case that x += 1 is atomic but foo.x += 1 is not.

    No that is not true, and has never been true.

>>> def x(a):
...     a += 1
...
>>> dis.dis(x)
  1           0 RESUME                   0

  2           2 LOAD_FAST                0 (a)
              4 LOAD_CONST               1 (1)
              6 BINARY_OP               13 (+=)
             10 STORE_FAST               0 (a)
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE


    As you can see there are 4 byte code ops executed.

    Python's eval loop can switch to another thread between any of them.

It is not true that the GIL provides atomic operations in Python.

    That's oversimplifying to the point of falsehood (just as the opposite
    would be too). And: see my other reply in this thread just now - if the
    GIL isn't making "x += 1" atomic, something else is.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Skip Montanaro@21:1/5 to All on Sun Feb 26 11:53:45 2023
Thanks for the various replies. The program originally started out
single-threaded. I wandered down the multi-threaded path to see if I could
get a performance boost using Sam Gross's NoGIL fork
<https://github.com/colesbury/nogil-3.12>. I was pretty sure the GIL would
limit multi-threading performance on a stock Python interpreter. When I
first switched to threads, I didn't have a lock around the one or two
places which called out to the TextBlob <https://textblob.readthedocs.io/>
/ NLTK stuff. The use of threading.Lock was the obvious, simplest choice,
and it solved the crash I saw without it. I'm still thinking about using
queues to communicate between the email-processing threads and the
TextBlob & SQLite processing stuff.

    I had been doing a bit of pre- and post-processing of the default TextBlob
    noun phrase generation, but I wasn't happy with it, so I decided to
    experiment with an alternate noun phrase extractor <https://textblob.readthedocs.io/en/dev/api_reference.html?highlight=ConllExtractor#textblob.en.np_extractors.ConllExtractor>.
    I was happier with that, so ripped out most of the ad hoc stuff I was
    doing. While doing this code surgery, I moved back to 3.11 to have a more trusty Python interpreter. (I've yet to encounter a problem with NoGIL,
    just cutting back on moving parts, and wasn't seeing any obvious
    performance gains.)

    As for SQLite and multi-threading, I figured if the core devs hadn't yet
    gotten around to making it available then it probably wasn't
    straightforward. I wasn't willing to tackle that.

    So, I'll keep messing around. It's all just for fun <https://www.smontanaro.net/CR> anyway.

    Skip

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Rubin@21:1/5 to Chris Angelico on Sun Feb 26 10:50:45 2023
    Chris Angelico <rosuav@gmail.com> writes:
    The GIL is most assuredly *not* an evil thing. If you think it's so
    evil, go ahead and remove it, because we'll clearly be better off
    without it, right? ... As it turns out, most GIL-removal attempts
    have had a fairly nasty negative effect on performance. The GIL is a
    huge performance boost.

    The evil GIL doesn't give any performance boost. Rather, removing it
    gives a terrible performance loss, because CPython's evil reference
    counting memory management system would require evil locks around the
    evil reference counts in the absence of the GIL, causing a slowdown.

    The "right" fix is to throw out the refcounts and use a proper garbage collector, like every Lisp implementation has done back to the 1950s. MicroPython, IronPython, and other Python implementations all do this as
    well. But, this change would break CPython's C API pretty badly, so we
    are stuck. That is another reason Python 3 is a tragedy. The 2 to 3 transition would have been the right time to retire CPython entirely,
    and move to a PyPy based reference implementation.

    Yeah, yeah, deterministic releases and the myth of no pauses.
    Deterministic release is why the "with" statement was added: no need to
    rely on refcounting for that. And realtime (bounded latency) GC's exist
    for those who want them. CPython's refcounting can have unbounded
    delays if you free a large structure and millions of refcounts have to
    all be decremented.

    CPython of course ended up adding a GC anyway, to clean up after the
    refcount system in cases where there is cyclic structure needing to be reclaimed...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Skip Montanaro@21:1/5 to All on Sun Feb 26 18:20:02 2023
    And yet, it appears that *something* changed between Python 2 and Python
    3 such that it *is* atomic:

    I haven't looked, but something to check in the source is opcode
    prediction. It's possible that after the BINARY_OP executes, opcode
    prediction jumps straight to the STORE_FAST opcode, avoiding the transfer
    to the top of the virtual machine loop. That would (I think) avoid checks related to GIL release and thread switches.

    I don't guarantee that's what's going on, and even if I'm correct, I don't think you can rely on it.

    Skip

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Chris Angelico@21:1/5 to python-list@python.org on Mon Feb 27 12:25:28 2023
    On Mon, 27 Feb 2023 at 10:42, Jon Ribbens via Python-list <python-list@python.org> wrote:

    On 2023-02-26, Chris Angelico <rosuav@gmail.com> wrote:
    On Sun, 26 Feb 2023 at 16:16, Jon Ribbens via Python-list
    <python-list@python.org> wrote:
    On 2023-02-25, Paul Rubin <no.email@nospam.invalid> wrote:
The GIL is an evil thing, but it has been around for so long that most
of us have gotten used to it, and some user code actually relies on it.
For example, with the GIL in place, a statement like "x += 1" is always
atomic, I believe. But, I think it is better to not have any shared
mutables regardless.

    I think it is the case that x += 1 is atomic but foo.x += 1 is not.
    Any replacement for the GIL would have to keep the former at least,
    plus the fact that you can do hundreds of things like list.append(foo)
    which are all effectively atomic.

    The GIL is most assuredly *not* an evil thing. If you think it's so
    evil, go ahead and remove it, because we'll clearly be better off
    without it, right?

    If you say so. I said nothing whatsoever about the GIL being evil.

    You didn't, but I was also responding to Paul's description that the
    GIL "is an evil thing". Apologies if that wasn't clear.

    Yes, sure, you can make x += 1 not work even single-threaded if you
    make custom types which override basic operations. I'm talking about
    when you're dealing with simple atomic built-in types such as integers.

    Here's the equivalent with just incrementing a global:

    def thrd():
    ... x += 1
    ...
    dis.dis(thrd)
    1 0 RESUME 0

    2 2 LOAD_FAST_CHECK 0 (x)
    4 LOAD_CONST 1 (1)
    6 BINARY_OP 13 (+=)
    10 STORE_FAST 0 (x)
    12 LOAD_CONST 0 (None)
    14 RETURN_VALUE


    The exact same sequence: load, add, store. Still not atomic.

    And yet, it appears that *something* changed between Python 2
    and Python 3 such that it *is* atomic:

    I don't think that's a guarantee. You might be unable to make it
    break, but that doesn't mean it's dependable.

    In any case, it's not the GIL that's doing this. It might be a quirk
    of the current implementation of the core evaluation loop, or it might
    be something unrelated, but whatever it is, removing the GIL wouldn't
    change that; and it's certainly no different whether it's a global or
    an attribute of an object.

    ChrisA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Speer@21:1/5 to python-list@python.org on Sun Feb 26 22:19:55 2023
I wanted to provide an example showing that your claimed atomicity is
simply wrong, but I found there is something different in the 3.10+
CPython implementations.

    I've tested the code at the bottom of this message using a few docker
    python images, and it appears there is a difference starting in 3.10.0

python:3.8
EXPECTED 2560000000
ACTUAL     84533137
python:3.9
EXPECTED 2560000000
ACTUAL     95311773
python:3.10 (.8)
EXPECTED 2560000000
ACTUAL   2560000000

just to see if there was a specific sub-version of 3.10 that added it:
python:3.10.0
EXPECTED 2560000000
ACTUAL   2560000000

    nope, from the start of 3.10 this is happening

    the only difference in the bytecode I see is 3.10 adds SETUP_LOOP and
    POP_BLOCK around the for loop

I don't see anything different in the long C code that I would expect would cause this.

    AFAICT the inplace add is null for longs and so should revert to the
    long_add that always creates a new integer in x_add

    another test
    python:3.11
    EXPECTED 2560000000
    ACTUAL 2560000000

    I'm not sure where the difference is at the moment. I didn't see anything
    in the release notes given a quick glance.

    I do agree that you shouldn't depend on this unless you find a written guarantee of the behavior, as it is likely an implementation quirk of some
    kind

    --[code]--

import threading

UPDATES = 10000000
THREADS = 256

vv = 0

def update_x_times( xx ):
    global vv
    for _ in range( xx ):
        vv += 1

def main():
    tts = []
    for _ in range( THREADS ):
        tts.append( threading.Thread( target = update_x_times, args = (UPDATES,) ) )

    for tt in tts:
        tt.start()

    for tt in tts:
        tt.join()

    print( 'EXPECTED', UPDATES * THREADS )
    print( 'ACTUAL  ', vv )

if __name__ == '__main__':
    main()

On Sun, Feb 26, 2023 at 6:35 PM Jon Ribbens via Python-list <python-list@python.org> wrote:

[...]


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Speer@21:1/5 to knomenet@gmail.com on Mon Feb 27 01:26:59 2023
    https://stackoverflow.com/questions/69993959/python-threads-difference-for-3-10-and-others

    https://github.com/python/cpython/commit/4958f5d69dd2bf86866c43491caf72f774ddec97

It's a quirk of implementation. The scheduler currently only checks if it
needs to release the GIL after the POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE,
JUMP_ABSOLUTE, CALL_METHOD, CALL_FUNCTION, CALL_FUNCTION_KW, and
CALL_FUNCTION_EX opcodes.

import code
import dis
dis.dis( code.update_x_times )

 10           0 LOAD_GLOBAL              0 (range)
              2 LOAD_FAST                0 (xx)
              4 CALL_FUNCTION            1
              ##### GIL CAN RELEASE HERE #####
              6 GET_ITER
        >>    8 FOR_ITER                 6 (to 22)
             10 STORE_FAST               1 (_)

 12          12 LOAD_GLOBAL              1 (vv)
             14 LOAD_CONST               1 (1)
             16 INPLACE_ADD
             18 STORE_GLOBAL             1 (vv)
             20 JUMP_ABSOLUTE            4 (to 8)
              ##### GIL CAN RELEASE HERE (after JUMP_ABSOLUTE points the
              ##### instruction counter back to FOR_ITER, but before the
              ##### interpreter actually jumps to FOR_ITER again) #####

 10     >>   22 LOAD_CONST               0 (None)
             24 RETURN_VALUE


due to this, this section:

 12          12 LOAD_GLOBAL              1 (vv)
             14 LOAD_CONST               1 (1)
             16 INPLACE_ADD
             18 STORE_GLOBAL             1 (vv)

    is effectively locked/atomic on post-3.10 interpreters, though this is
    neither portable nor guaranteed to stay that way into the future


On Sun, Feb 26, 2023 at 10:19 PM Michael Speer <knomenet@gmail.com> wrote:

[...]



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Chris Angelico@21:1/5 to Michael Speer on Mon Feb 27 17:37:32 2023
    On Mon, 27 Feb 2023 at 17:28, Michael Speer <knomenet@gmail.com> wrote:

    https://github.com/python/cpython/commit/4958f5d69dd2bf86866c43491caf72f774ddec97

    it's a quirk of implementation. the scheduler currently only checks if it needs to release the gil after the POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE, JUMP_ABSOLUTE, CALL_METHOD, CALL_FUNCTION, CALL_FUNCTION_KW, and CALL_FUNCTION_EX opcodes.


    Oh now that is VERY interesting. It's a quirk of implementation, yes,
    but there's a reason for it; a bug being solved. The underlying
    guarantee about __exit__ should be considered to be defined behaviour,
    meaning that the precise quirk might not be relevant even though the
    bug has to remain fixed in all future versions. But I'd also note here
    that, if it can be absolutely 100% guaranteed that the GIL will be
    released and signals checked on a reasonable interval, there's no
    particular reason to state that signals are checked after every single
    Python bytecode. (See the removed comment about empty loops, which
    would have been a serious issue and is probably why the backward jump
    rule exists.)

    So it wouldn't be too hard for a future release of Python to mandate
    atomicity of certain specific operations. Obviously it'd require
    buy-in from other implementations, but it would be rather convenient
    if, subject to some very tight rules like "only when adding integers
    onto core data types" etc, a simple statement like "x.y += 1" could
    actually be guaranteed to take place atomically.

    Though it's still probably not as useful as you might hope. In C, if I
    can do "int id = counter++;" atomically, it would guarantee me a new
    ID that no other thread could ever have. But in Python, that increment operation doesn't give you the result, so all it's really useful for
    is statistics on operations done. Still, that in itself could be of
    value in quite a few situations.
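
In today's Python the portable way to get a unique ID is still an explicit
lock around the increment-and-read; a minimal sketch:

import threading

class Counter:
    # Increment-and-read as one critical section: the closest Python
    # equivalent of an atomic "int id = counter++;" in C.
    def __init__(self):
        self._lock = threading.Lock()
        self._n = 0

    def next_id(self):
        with self._lock:
            self._n += 1
            return self._n

ids = Counter()
print(ids.next_id())   # -> 1, unique even under heavy threading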

    In any case, though, this isn't something to depend upon at the moment.

    ChrisA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Rubin@21:1/5 to Chris Angelico on Mon Feb 27 00:37:18 2023
    Chris Angelico <rosuav@gmail.com> writes:
    So it wouldn't be too hard for a future release of Python to mandate atomicity of certain specific operations.... "only when adding
    integers onto core data types" etc, a simple statement like "x.y += 1"
    could actually be guaranteed to take place atomically.

That would be pretty awful and messy, have performance costs, be
CPython-specific, etc. It is likely feasible to have something like

    with atomic(): x.y += 1

    where atomic() used a locked machine instruction rather than a system
    call or anything awful like that. The article "Beautiful Concurrency" describes the GHC API for stuff like this, but not so much the
    underlying implementation:

    https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.365.1337&rep=rep1&type=pdf

The implementation uses a hardware compare-and-swap instruction (LOCK
CMPXCHG on the x86) that exists on most modern CPUs. This goes into more
detail, iirc:

    https://research.microsoft.com/en-us/um/people/simonpj/Papers/stm/stm.pdf

    Haskell uses its type system to make sure you don't do forbidden stuff
    like I/O inside a memory transaction, but Python could do dynamic checks
    and raise errors if you made a mistake.

    The proposed multitasking spec for ANS Forth includes compare-and-swap
    and it is actually used in a STM (software transactional memory) library written in Forth. It is very cool stuff and makes a whole lot of
    lock-related pain go away in concurrent programs.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Barry@21:1/5 to All on Tue Feb 28 23:04:39 2023
    Though it's still probably not as useful as you might hope. In C, if I
    can do "int id = counter++;" atomically, it would guarantee me a new
    ID that no other thread could ever have.

C does not have to do that atomically. In fact it is free to use lots of
instructions to build the int value. And some compilers indeed do; the
Linux kernel folks see this in gcc-generated code.

    I understand you have to use the new atomics features.

    Barry

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Chris Angelico@21:1/5 to Barry on Wed Mar 1 12:58:54 2023
    On Wed, 1 Mar 2023 at 10:04, Barry <barry@barrys-emacs.org> wrote:

    Though it's still probably not as useful as you might hope. In C, if I
    can do "int id = counter++;" atomically, it would guarantee me a new
    ID that no other thread could ever have.

C does not have to do that atomically. In fact it is free to use lots of
instructions to build the int value. And some compilers indeed do; the
Linux kernel folks see this in gcc-generated code.

    I understand you have to use the new atomics features.


    Yeah, I didn't have a good analogy so I went with a hypothetical. The atomicity would be more useful in that context as it would give
    lock-free ID generation, which doesn't work in Python.

    ChrisA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)