gh-111881: Import _sha2 lazily in random #111889
The random module now imports the _sha2 module lazily in the Random.seed() method for str, bytes and bytearray seeds. It also lazily imports the warnings module in the _randbelow() method for classes without getrandbits(). Lazy imports make Python startup faster and reduce the number of modules imported at startup.
Oh right, I updated my PR to also import the warnings module lazily. It's common in the stdlib to import warnings only when a warning is actually emitted, especially for deprecation warnings.
This change only impacts uncommon code paths. IMO it's more common to call seed() with an integer, or just not call seed() at all. Also, random.Random subclasses which don't implement getrandbits() should be updated to implement getrandbits(). I suppose that the warning is a good reminder for that :-) Note: A few years ago, I proposed adding a BaseRandom class which would make inheritance better defined, but I abandoned this approach: issue gh-84526.
+1 for the lazy import of warnings. -1 for the lazy import of sha, which is already very fast. Also, I've seen production code such as a BloomFilter that would be adversely impacted by this because it frequently reseeds with string keys. Seeding with strings isn't rare enough to cripple that code path. It is the easiest way to generate a high quality seed. People do better at thinking up arbitrary words and phrases than they do at choosing an arbitrary numerical seed, and the SHA does a good job of extending that choice to a large number of bits and does so in a way that is hard to invert.
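To illustrate the point about string seeds, here is a minimal sketch of how a short phrase can be stretched into a large integer with SHA-512. The helper name `phrase_to_seed` is hypothetical, and the sketch uses `hashlib.sha512` for illustration; it is not presented as the exact code inside `random.seed()`.

```python
import hashlib
import random


def phrase_to_seed(phrase: str) -> int:
    """Hypothetical helper: stretch a phrase into a large integer.

    The phrase bytes are concatenated with their SHA-512 digest, so even
    a short phrase yields more than 512 bits of seed material.
    """
    data = phrase.encode()
    return int.from_bytes(data + hashlib.sha512(data).digest(), "big")


phrase = "correct horse battery staple"
seed_value = phrase_to_seed(phrase)

# Seeding random.Random directly with a string is reproducible:
# two generators built from the same phrase produce the same sequence.
rng_a = random.Random(phrase)
rng_b = random.Random(phrase)
same_sequence = [rng_a.random() for _ in range(3)] == [rng_b.random() for _ in range(3)]
```

Because every reseed with a string goes through a hash like this, any extra cost added to that path is paid on each call, which is why a frequently reseeding caller would notice it.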
Benchmark on moving the import inside seed(): Mean +- std dev: [ref] 5.32 us +- 0.09 us -> [lazy] 5.54 us +- 0.09 us: 1.04x slower. So yes, it has a significant negative impact on performance. So I added a global variable: with a global variable, there is no impact on performance:
Benchmark script,
I ran the benchmark with:
Merged. Thanks for the reviews @AlexWaygood and @rhettinger.
It's sad that import is "so slow" when the module is already loaded, but I'm not interested in digging into import performance. Using a global variable is simple, similar to what we had before, and seems to be very efficient (hot code only has to check if
@rhettinger: I didn't know that passing a string to seed() was common, thanks. It forces me to think about a different approach to avoid any performance issue.
The random module now imports the _sha2 module lazily in the Random.seed() method for str, bytes and bytearray seeds. Lazy import makes Python startup faster and reduces the number of imported modules at startup.