
Commit fd471b8

gh-118761: Reduce import time of gettext.py by delaying re import
gettext is often imported in programs that may never end up translating anything. In fact, gettext already delays importing the `struct` module until GNUTranslations parses a catalog, to speed up the case where no .mo files are found. The `re` module is used in the same situation: it is only reached through a chain of functions that GNUTranslations calls. This change caches the compiled regex in a module global the first time it is used. The `re.finditer()` call is also converted to a method call on the compiled pattern object (which was always possible); this is slightly more efficient, and it is necessary for the conditional `re` import.
1 parent d05140f commit fd471b8


2 files changed (+21 -15 lines)


Lib/gettext.py (+18 -15)
@@ -48,7 +48,6 @@
 
 import operator
 import os
-import re
 import sys
 
 
@@ -70,22 +69,26 @@
 # https://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
 # http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/plural.y
 
-_token_pattern = re.compile(r"""
-        (?P<WHITESPACES>[ \t]+)                    | # spaces and horizontal tabs
-        (?P<NUMBER>[0-9]+\b)                       | # decimal integer
-        (?P<NAME>n\b)                              | # only n is allowed
-        (?P<PARENTHESIS>[()])                      |
-        (?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
-                                                     # <=, >=, ==, !=, &&, ||,
-                                                     # ? :
-                                                     # unary and bitwise ops
-                                                     # not allowed
-        (?P<INVALID>\w+|.)                           # invalid token
-    """, re.VERBOSE|re.DOTALL)
-
+_token_pattern = None
 
 def _tokenize(plural):
-    for mo in re.finditer(_token_pattern, plural):
+    global _token_pattern
+    if _token_pattern is None:
+        import re
+        _token_pattern = re.compile(r"""
+                (?P<WHITESPACES>[ \t]+)                    | # spaces and horizontal tabs
+                (?P<NUMBER>[0-9]+\b)                       | # decimal integer
+                (?P<NAME>n\b)                              | # only n is allowed
+                (?P<PARENTHESIS>[()])                      |
+                (?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
+                                                             # <=, >=, ==, !=, &&, ||,
+                                                             # ? :
+                                                             # unary and bitwise ops
+                                                             # not allowed
+                (?P<INVALID>\w+|.)                           # invalid token
+            """, re.VERBOSE|re.DOTALL)
+
+    for mo in _token_pattern.finditer(plural):
         kind = mo.lastgroup
         if kind == 'WHITESPACES':
             continue
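Calling `re.finditer(pattern, string)` with an already compiled pattern and calling `pattern.finditer(string)` return the same matches; the method form simply avoids the extra trip through the `re` module-level function. The sketch below copies the token pattern from the hunk above and checks both spellings on a few plural-form expressions. The helper `tokens` is hypothetical and is not gettext's `_tokenize`; for simplicity it imports `re` at module level, which the patch deliberately avoids.

```python
import re

# Token pattern copied from the hunk above; in VERBOSE mode whitespace
# outside character classes is ignored, so the alignment is cosmetic.
_token_pattern = re.compile(r"""
        (?P<WHITESPACES>[ \t]+)                    | # spaces and horizontal tabs
        (?P<NUMBER>[0-9]+\b)                       | # decimal integer
        (?P<NAME>n\b)                              | # only n is allowed
        (?P<PARENTHESIS>[()])                      |
        (?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
                                                     # <=, >=, ==, !=, &&, ||,
                                                     # ? :
                                                     # unary and bitwise ops
                                                     # not allowed
        (?P<INVALID>\w+|.)                           # invalid token
    """, re.VERBOSE | re.DOTALL)


def tokens(plural, use_method=True):
    """Return (kind, text) pairs for plural, skipping whitespace tokens."""
    if use_method:
        matches = _token_pattern.finditer(plural)       # style after the patch
    else:
        matches = re.finditer(_token_pattern, plural)   # style before the patch
    # lastgroup names the single named group that matched each token.
    return [(mo.lastgroup, mo.group()) for mo in matches
            if mo.lastgroup != 'WHITESPACES']
```

For example, `tokens("n != 1")` returns `[('NAME', 'n'), ('OPERATOR', '!='), ('NUMBER', '1')]` with either spelling.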
@@ -0,0 +1,3 @@
+Reduce import time of :mod:`gettext` by up to ten times, by importing
+:mod:`re` on demand. In particular, ``re`` is no longer implicitly
+exposed as ``gettext.re``. Patch by Eli Schwartz.
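The "up to ten times" figure is the patch author's measurement. One rough way to compare on your own machine is CPython's `-X importtime` option (Python 3.7+), which prints per-module import timings to stderr. The sketch below parses that output; the helper names `cumulative_import_us` and `measure` are hypothetical, and the parser assumes the `import time: self | cumulative | name` line layout that CPython prints.

```python
import subprocess
import sys


def cumulative_import_us(module, stderr_text):
    """Return the cumulative import time in microseconds reported for
    `module` in `python -X importtime` output, or None if it is absent."""
    prefix = "import time:"
    for line in stderr_text.splitlines():
        if not line.startswith(prefix):
            continue
        # Fields: self time | cumulative time | (indented) module name
        fields = line[len(prefix):].split("|")
        if len(fields) != 3:
            continue
        if fields[2].strip() == module:
            return int(fields[1])  # int() tolerates surrounding spaces
    return None


def measure(module):
    """Import `module` in a fresh interpreter and report its cumulative time."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True, text=True, check=True,
    )
    return cumulative_import_us(module, proc.stderr)
```

Running `measure("gettext")` on interpreters with and without this commit gives a rough comparison. Timings vary between runs, so repeat the measurement a few times; if something imported at startup already pulls in `re`, the saving will be smaller.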
