I have a multi-threaded program which calls out to a non-thread-safe
library (not mine) in a couple places. I guard against multiple
threads executing code there using threading.Lock. The code is straightforward:
from threading import Lock

# Something in textblob and/or nltk doesn't play nice with no-gil, so just
# serialize all blobby accesses.
BLOB_LOCK = Lock()

def get_terms(text):
    with BLOB_LOCK:
        phrases = TextBlob(text, np_extractor=EXTRACTOR).noun_phrases
    for phrase in phrases:
        yield phrase
When I monitor the application using py-spy, that with statement is
consuming huge amounts of CPU. Does threading.Lock.acquire() sleep
anywhere? I didn't see anything obvious poking around in the C code
which implements this stuff. I'm no expert though, so could easily
have missed something.
Skip Montanaro <skip.montanaro@gmail.com> writes:
from threading import Lock

BLOB_LOCK = Lock()

def get_terms(text):
    with BLOB_LOCK:
        phrases = TextBlob(text, np_extractor=EXTRACTOR).noun_phrases
    for phrase in phrases:
        yield phrase

1) You generally want to use RLock rather than Lock.

2) I have generally felt that using locks at the app level at all is an
antipattern. The main way I've stayed sane in multi-threaded Python
code is to have every mutable strictly owned by exactly one thread,
pass values around using Queues, and have an event loop in each thread
taking requests from Queues (see the sketch just below).

3) I didn't know that no-gil was now a thing, and I'm used to having
the GIL. So I would have considered the multiprocessing module rather
than threading for something like this.
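A minimal sketch of that ownership-plus-Queues pattern (the names and
the counting task are illustrative, not from any real code): the dict
is owned by exactly one thread, and everyone else talks to it only
through Queues.

import queue
import threading

requests = queue.Queue()
results = queue.Queue()

def owner():
    counts = {}                    # mutable state owned by this thread only
    while True:
        item = requests.get()      # event loop: take requests from a Queue
        if item is None:           # sentinel value: shut down
            break
        counts[item] = counts.get(item, 0) + 1
        results.put((item, counts[item]))

t = threading.Thread(target=owner)
t.start()
for word in ["spam", "eggs", "spam"]:
    requests.put(word)
requests.put(None)
t.join()
while not results.empty():
    print(results.get())

No locks anywhere: the only shared objects are the Queues, which are
thread-safe by design.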
When I monitor the application using py-spy, that with statement is
consuming huge amounts of CPU.
Which OS is this?
I'm no expert on locks, but you don't usually want to keep a lock while
some long-running computation goes on. You want the computation to be
done by a separate thread, put its results somewhere, and then notify
the choreographing thread that the result is ready.
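A minimal sketch of that hand-off, where expensive_extraction() is a
hypothetical stand-in for the long-running call:

import threading

def expensive_extraction():
    return ["noun phrase one", "noun phrase two"]

result = {}
done = threading.Event()

def compute():
    result["phrases"] = expensive_extraction()  # runs without holding any lock
    done.set()                                  # notify: result is ready

threading.Thread(target=compute).start()
# ... the choreographing thread is free to do other work here ...
done.wait()                          # block only when the result is needed
print(result["phrases"])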
This link may be helpful -
https://anandology.com/blog/using-iterators-and-generators/
Thanks for the responses.
Peter wrote:
Which OS is this?
macOS Ventura 13.1, M1 MacBook Pro (eight cores).
Thomas wrote:
I'm no expert on locks, but you don't usually want to keep a lock while some long-running computation goes on. You want the computation to be done by a separate thread, put its results somewhere, and then notify
the choreographing thread that the result is ready.
In this case I'm extracting the noun phrases from the body of an email
message (returned as a list). I have a collection of email messages
organized by month (typically 1000 to 3000 messages per month). I'm
using concurrent.futures.ThreadPoolExecutor() with the default number
of workers (min(32, os.cpu_count() + 4), or 12 threads on my system) to
process each month, so 12 active threads at a time. Given that the
process is pretty much CPU bound, maybe reducing the number of workers
to the CPU count would make sense. Processing of each email message
enters that with block once. That's about as minimal as I can make it.
I thought for a bit about pushing the textblob stuff into a separate
worker thread, but it wasn't obvious how to set up queues to handle the
communication between the threads created by ThreadPoolExecutor() and
the worker thread. Maybe I'll think about it harder. (I have a related
problem with SQLite, since an open database can't be manipulated from
multiple threads. That makes much of the program's end-of-run
processing single-threaded.)
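One possible wiring, sketched (fake_noun_phrases() is a hypothetical
stand-in for the TextBlob call): each pool thread puts a (text,
reply_queue) pair on a single request queue consumed by one dedicated
thread that owns all access to the non-thread-safe library.

import queue
import threading
from concurrent.futures import ThreadPoolExecutor

requests = queue.Queue()

def fake_noun_phrases(text):
    # Hypothetical stand-in so the sketch runs without textblob installed.
    return [w for w in text.split() if w.istitle()]

def blob_worker():
    # Sole owner of the non-thread-safe library; no lock needed.
    while True:
        job = requests.get()
        if job is None:              # sentinel: shut down
            break
        text, reply = job
        reply.put(fake_noun_phrases(text))

def process_message(text):
    # Runs in a pool thread: hand the text over, block until the answer comes.
    reply = queue.Queue(maxsize=1)
    requests.put((text, reply))
    phrases = reply.get()
    return phrases                   # real code would go on to use SQLite etc.

worker = threading.Thread(target=blob_worker)
worker.start()
with ThreadPoolExecutor() as pool:
    results = list(pool.map(process_message, ["Alice met Bob", "hello world"]))
requests.put(None)
worker.join()
print(results)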
This link may be helpful -
https://anandology.com/blog/using-iterators-and-generators/
I don't think that's where my problem is. The lock protects the
generation of the noun phrases. My loop which does the yielding
operates outside of that lock's control. The version of the code shown
is my latest, in which I tossed out a bunch of phrase-processing code
(effectively dead-end ideas for processing the phrases). Replacing the
for loop with a simple return seems not to have any effect. In any
case, the caller which uses the phrases does a fair amount of extra
work with them, populating a SQLite database, so I don't think the time
it takes to process a single email message is dominated by the phrase
generation.
Here's timeit output for the noun_phrases code:

% python -m timeit \
    -s 'text = """`python -m timeit --help`""" ; from textblob import TextBlob ; from textblob.np_extractors import ConllExtractor ; ext = ConllExtractor() ; phrases = TextBlob(text, np_extractor=ext).noun_phrases' \
    'phrases = TextBlob(text, np_extractor=ext).noun_phrases'
5000 loops, best of 5: 98.7 usec per loop

I process the output of timeit's help message, which looks to be about
the same length as a typical email message, certainly the same order of
magnitude. Also, note that I call the noun_phrases code once in the
setup to eliminate the initial training of the ConllExtractor instance.
I don't know if ~100 usec qualifies as long running or not.
I'll keep messing with it.
Skip
Jon Ribbens <jon+usenet@unequivocal.eu> writes:
1) you generally want to use RLock rather than Lock

Why?
So that a thread that tries to acquire it twice doesn't block itself,
etc. Look at the threading lib docs for more info.
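A minimal illustration: with a plain Lock, inner() would deadlock
waiting on the lock that outer() already holds; an RLock can be
re-acquired by the thread that owns it.

from threading import RLock

lock = RLock()

def inner():
    with lock:            # re-acquiring: fine for RLock, deadlock for Lock
        print("inner ran")

def outer():
    with lock:
        inner()

outer()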
What does this mean? Are you saying the GIL has been removed?
Last I heard there was an experimental version of CPython with the GIL removed. It is supposed to take less of a performance hit due to INCREF/DECREF than an earlier attempt some years back. I don't know its current status.
The GIL is an evil thing, but it has been around for so long that most
of us have gotten used to it, and some user code actually relies on it.
For example, with the GIL in place, a statement like "x += 1" is always atomic, I believe. But, I think it is better to not have any shared
mutables regardless.
On Sat, 25 Feb 2023 15:41:52 -0600, Skip Montanaro
<skip.montanaro@gmail.com> declaimed the following:
concurrent.futures.ThreadPoolExecutor() with the default number of
workers (min(32, os.cpu_count() + 4), or 12 threads on my system) to
process each month, so 12 active threads at a time. Given that the
process is pretty much CPU bound, maybe reducing the number of workers
to the CPU count would make sense.
Unless things have improved a lot over the years, the GIL still limits
active threads to the equivalent of a single CPU. The OS may move the
process among CPUs as it schedules things, but only one Python thread
will be running at any moment regardless of CPU count.
On 2023-02-25, Paul Rubin <no.email@nospam.invalid> wrote:
The GIL is an evil thing, but it has been around for so long that most
of us have gotten used to it, and some user code actually relies on it.
For example, with the GIL in place, a statement like "x += 1" is always atomic, I believe. But, I think it is better to not have any shared mutables regardless.
I think it is the case that x += 1 is atomic but foo.x += 1 is not.
Any replacement for the GIL would have to keep the former at least,
plus the fact that you can do hundreds of things like list.append(foo)
which are all effectively atomic.
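A quick way to see the append claim in action (a sketch; the counts are
arbitrary):

import threading

# In CPython with the GIL, list.append runs as a single call with no
# user code in the middle, so no appends are lost even with many
# threads hammering one list.
items = []

def appender():
    for _ in range(100_000):
        items.append(1)

threads = [threading.Thread(target=appender) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(items))   # 800000 on CPython with the GIL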
On Sun, 26 Feb 2023 at 16:16, Jon Ribbens via Python-list
<python-list@python.org> wrote:
On 2023-02-25, Paul Rubin <no.email@nospam.invalid> wrote:
The GIL is an evil thing, but it has been around for so long that most
of us have gotten used to it, and some user code actually relies on it.
For example, with the GIL in place, a statement like "x += 1" is always
atomic, I believe. But, I think it is better to not have any shared
mutables regardless.
I think it is the case that x += 1 is atomic but foo.x += 1 is not.
Any replacement for the GIL would have to keep the former at least,
plus the fact that you can do hundreds of things like list.append(foo)
which are all effectively atomic.
The GIL is most assuredly *not* an evil thing. If you think it's so
evil, go ahead and remove it, because we'll clearly be better off
without it, right?
As it turns out, most GIL-removal attempts have had a fairly nasty
negative effect on performance. The GIL is a huge performance boost.
As to what is atomic and what is not... it's complicated, as always.
Suppose that x (or foo.x) is a custom type:
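For example, a hypothetical class along these lines, where += dispatches
to __iadd__ and can run arbitrary Python code, so nothing about the
statement is inherently atomic:

class Sneaky:
    def __init__(self, value):
        self.value = value
    def __iadd__(self, other):
        self.value = self.value + other   # arbitrary code could run here
        return self

x = Sneaky(0)
x += 1
print(x.value)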
Here's the equivalent with just incrementing a global:
>>> def thrd():
...     x += 1
...
>>> dis.dis(thrd)
  1           0 RESUME                   0

  2           2 LOAD_FAST_CHECK          0 (x)
              4 LOAD_CONST               1 (1)
              6 BINARY_OP               13 (+=)
             10 STORE_FAST               0 (x)
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE
The exact same sequence: load, add, store. Still not atomic.
On 25/02/2023 23:45, Jon Ribbens via Python-list wrote:
I think it is the case that x += 1 is atomic but foo.x += 1 is not.
No, that is not true, and has never been true.
>>> def x(a):
...     a += 1
...
>>> dis.dis(x)
  1           0 RESUME                   0

  2           2 LOAD_FAST                0 (a)
              4 LOAD_CONST               1 (1)
              6 BINARY_OP               13 (+=)
             10 STORE_FAST               0 (a)
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE
As you can see, there are four bytecode ops executed for the += part.
Python's eval loop can switch to another thread between any of them.
It is not true that the GIL provides atomic operations in Python.
On 2023-02-26, Chris Angelico <rosuav@gmail.com> wrote:
On Sun, 26 Feb 2023 at 16:16, Jon Ribbens via Python-list
<python-list@python.org> wrote:
On 2023-02-25, Paul Rubin <no.email@nospam.invalid> wrote:
The GIL is an evil thing, but it has been around for so long that most
of us have gotten used to it, and some user code actually relies on it.
For example, with the GIL in place, a statement like "x += 1" is always
atomic, I believe. But, I think it is better to not have any shared
mutables regardless.
I think it is the case that x += 1 is atomic but foo.x += 1 is not.
Any replacement for the GIL would have to keep the former at least,
plus the fact that you can do hundreds of things like list.append(foo)
which are all effectively atomic.
The GIL is most assuredly *not* an evil thing. If you think it's so
evil, go ahead and remove it, because we'll clearly be better off
without it, right?
If you say so. I said nothing whatsoever about the GIL being evil.
Yes, sure, you can make x += 1 not work even single-threaded if you
make custom types which override basic operations. I'm talking about
when you're dealing with simple atomic built-in types such as integers.
Here's the equivalent with just incrementing a global:
>>> def thrd():
...     x += 1
...
>>> dis.dis(thrd)
  1           0 RESUME                   0

  2           2 LOAD_FAST_CHECK          0 (x)
              4 LOAD_CONST               1 (1)
              6 BINARY_OP               13 (+=)
             10 STORE_FAST               0 (x)
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE
The exact same sequence: load, add, store. Still not atomic.
And yet, it appears that *something* changed between Python 2
and Python 3 such that it *is* atomic:
On 2023-02-26, Barry Scott <barry@barrys-emacs.org> wrote:
On 25/02/2023 23:45, Jon Ribbens via Python-list wrote:
I think it is the case that x += 1 is atomic but foo.x += 1 is not.
No, that is not true, and has never been true.
>>> def x(a):
...     a += 1
...
>>> dis.dis(x)
  1           0 RESUME                   0

  2           2 LOAD_FAST                0 (a)
              4 LOAD_CONST               1 (1)
              6 BINARY_OP               13 (+=)
             10 STORE_FAST               0 (a)
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE
As you can see, there are four bytecode ops executed for the += part.
Python's eval loop can switch to another thread between any of them.
It is not true that the GIL provides atomic operations in Python.
That's oversimplifying to the point of falsehood (just as the opposite
would be too). And: see my other reply in this thread just now - if the
GIL isn't making "x += 1" atomic, something else is.
I wanted to provide an example showing that the claimed atomicity is
simply wrong, but I found there is something different in the 3.10+
CPython implementations.
I've tested the code at the bottom of this message using a few docker
python images, and it appears there is a difference starting in 3.10.0
python:3.8
EXPECTED 2560000000
ACTUAL   84533137

python:3.9
EXPECTED 2560000000
ACTUAL   95311773

python:3.10 (.8)
EXPECTED 2560000000
ACTUAL   2560000000

Just to see if there was a specific sub-version of 3.10 that added it:

python:3.10.0
EXPECTED 2560000000
ACTUAL   2560000000

Nope, from the start of 3.10 this is happening.
The only difference in the bytecode I see is that 3.10 adds SETUP_LOOP
and POP_BLOCK around the for loop. I don't see anything different in
the C code for longs that I would expect would cause this. AFAICT the
in-place add slot is NULL for longs, so it should fall back to
long_add, which always creates a new integer in x_add.
Another test:

python:3.11
EXPECTED 2560000000
ACTUAL   2560000000
I'm not sure where the difference is at the moment. I didn't see anything
in the release notes given a quick glance.
I do agree that you shouldn't depend on this unless you find a written
guarantee of the behavior, as it is likely an implementation quirk of
some kind.
--[code]--
import threading

UPDATES = 10000000
THREADS = 256

vv = 0

def update_x_times( xx ):
    global vv
    for _ in range( xx ):
        vv += 1

def main():
    tts = []
    for _ in range( THREADS ):
        tts.append( threading.Thread( target = update_x_times,
                                      args = (UPDATES,) ) )
    for tt in tts:
        tt.start()
    for tt in tts:
        tt.join()
    print( 'EXPECTED', UPDATES * THREADS )
    print( 'ACTUAL  ', vv )

if __name__ == '__main__':
    main()
On Sun, Feb 26, 2023 at 6:35 PM Jon Ribbens via Python-list
<python-list@python.org> wrote:
That's oversimplifying to the point of falsehood (just as the opposite
would be too). And: see my other reply in this thread just now - if the
GIL isn't making "x += 1" atomic, something else is.
https://github.com/python/cpython/commit/4958f5d69dd2bf86866c43491caf72f774ddec97

It's a quirk of implementation. The scheduler currently only checks
whether it needs to release the GIL after the POP_JUMP_IF_FALSE,
POP_JUMP_IF_TRUE, JUMP_ABSOLUTE, CALL_METHOD, CALL_FUNCTION,
CALL_FUNCTION_KW, and CALL_FUNCTION_EX opcodes.
So it wouldn't be too hard for a future release of Python to mandate
atomicity of certain specific operations. With a rule like "only when
adding integers onto core data types", a simple statement like
"x.y += 1" could actually be guaranteed to take place atomically.
Though it's still probably not as useful as you might hope. In C, if I
can do "int id = counter++;" atomically, it would guarantee me a new
ID that no other thread could ever have.
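In Python terms, the safe equivalent today is to guard the counter
explicitly. A sketch (names are illustrative):

import threading

counter = 0
counter_lock = threading.Lock()

def next_id():
    global counter
    with counter_lock:       # makes read-increment-return one step
        counter += 1
        return counter

print(next_id())             # unique even if called from many threads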
Though it's still probably not as useful as you might hope. In C, if I
can do "int id = counter++;" atomically, it would guarantee me a new
ID that no other thread could ever have.
C does not have to do that atomically. In fact it is free to use lots
of instructions to build the int value, and some compilers indeed do;
the Linux kernel folks see this in gcc-generated code. As I understand
it, you have to use the new atomics features.