[gyb] Python 2 or 3 compatible Generate Your Boilerplate #806
Conversation
Force-pushed from d944e27 to 42604fa
Force-pushed from 42604fa to fae1495
I've gone ahead and put all of my Python 2/3 code on this branch. At this point I am able to compile, with these patches, using either Python 2.7.11 (OS X) or 3.5.1 (Arch Linux). 🎉 I've deleted my …
Force-pushed from 878f13e to b1e4585
```diff
@@ -27,7 +27,7 @@ lineForName = {}

 # Load the data file.
 with open(CFDatabaseFile, 'rb') as f:
-    for line in f:
+    for line in [line_bytes.decode('utf-8') for line_bytes in f]:
```
You should make this:

```python
for line in (line_bytes.decode('utf-8') for line_bytes in f):
```

so that it utilizes generators instead of building up a list unnecessarily.
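The reviewer's point can be sketched in a few lines (hypothetical byte data, not from the patch): a list comprehension decodes every line up front and stores them all, while a generator expression decodes lazily as lines are consumed.

```python
# Hypothetical byte lines standing in for the file object `f` in the diff.
raw_lines = [b'alpha\n', b'beta\n', b'gamma\n']

# List comprehension: all lines decoded and stored immediately.
eager = [line_bytes.decode('utf-8') for line_bytes in raw_lines]

# Generator expression: lines decoded one at a time, as consumed.
lazy = (line_bytes.decode('utf-8') for line_bytes in raw_lines)

print(type(eager).__name__)  # list
print(type(lazy).__name__)   # generator
print(list(lazy) == eager)   # True
```

Both spellings yield identical lines; the generator just avoids holding the whole decoded file in memory at once.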
That was a good suggestion. I've never written Python before and I hadn't noticed the difference between the parentheses and square brackets. Though in the end I actually found there is a `codecs` module that does what I am looking for explicitly, and I've refactored the code to make use of that instead.
Force-pushed from 8a96152 to 464426a
```diff
@@ -304,7 +309,7 @@ def splitGybLines(sourceLines):
     dedents = 0
     try:
         for tokenKind, tokenText, tokenStart, (tokenEndLine, tokenEndCol), lineText \
-                in tokenize.generate_tokens(sourceLines.__iter__().next):
+                in tokenize.generate_tokens(partial(next, sourceLines.__iter__())):
```
The uses of `partial` don't read very well. Would we be better off with a lambda?
How about:

```python
sourceLinesIter = iter(sourceLines)
try:
    for tokenKind, tokenText, tokenStart, (tokenEndLine, tokenEndCol), lineText \
            in tokenize.generate_tokens(next(sourceLinesIter)):
```
LGTM
That would be fine except the result of `next(sourceLinesIter)` is not callable. From the `tokenize.generate_tokens` docs:

> ... requires one argument, readline, which must be a callable object which provides the same interface as the readline() method of built-in file objects (see section File Objects) ...

Notice that in the original code `next` was sent to `tokenize.generate_tokens`, not `next()`, i.e., something that is callable. @frewsxcv's suggestion is the equivalent of sending `next()` and not `next`. Hence why I went with the partial function application.
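The distinction can be seen in a small self-contained sketch (simplified input, not the actual gyb source): `partial(next, iterator)` is itself a callable, which is exactly the readline-style interface `generate_tokens` expects.

```python
import tokenize
from functools import partial

# A tiny stand-in for sourceLines in gyb.
source_lines = ['x = 1\n', 'y = 2\n']

# partial(next, iterator) is a callable; each call returns the next line,
# matching the readline() interface generate_tokens requires.
readline = partial(next, iter(source_lines))

token_names = [tokenize.tok_name[tok[0]]
               for tok in tokenize.generate_tokens(readline)]
print(token_names[0], token_names[-1])
```

Calling `readline()` repeatedly yields each line in turn, so tokenization proceeds exactly as it would over a file object.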
Testing
I first applied this patch:
```diff
diff --git a/utils/gyb.py b/utils/gyb.py
index ab422f1..68a8280 100755
--- a/utils/gyb.py
+++ b/utils/gyb.py
@@ -307,9 +307,10 @@ def splitGybLines(sourceLines):
     unmatchedIndents = []
     dedents = 0
+    sourceLinesIter = iter(sourceLines)
     try:
         for tokenKind, tokenText, tokenStart, (tokenEndLine, tokenEndCol), lineText \
-                in tokenize.generate_tokens(partial(next, sourceLines.__iter__())):
+                in tokenize.generate_tokens(next(sourceLinesIter)):
             if tokenKind in (tokenize.COMMENT, tokenize.ENDMARKER):
                 continue
```
Then I ran this from the `utils` directory.
```
$ python3
>>> from gyb import *
>>> src = splitLines('''\
... if x:
... print x
... if y: # trailing comment
... print z
... if z: # another comment\
... ''')
>>> splitGybLines(src)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/lovelett/Source/swift/utils/gyb.py", line 313, in splitGybLines
    in tokenize.generate_tokens(next(sourceLinesIter)):
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/tokenize.py", line 514, in _tokenize
    line = readline()
TypeError: 'str' object is not callable
```
Expected:

```
$ python3
>>> from gyb import *
>>> src = splitLines('''\
... if x:
... print x
... if y: # trailing comment
... print z
... if z: # another comment\
... ''')
>>> splitGybLines(src)
[3, 5]
```
Blah, you're right. Forgot that `generate_tokens` wants a callable. You might be able to get by with:

```python
tokenize.generate_tokens(sourceLinesIter.next)
```

...since `next` is a method on an iterator, though I'm not entirely sure. Otherwise, I guess just stick with a `lambda` or `partial`.
@frewsxcv unfortunately `next` only exists on an iterator in Python 2. PEP 3114 renamed the `next` method to `__next__` in Python 3 and got rid of `next` entirely. Or put another way, it works as expected in Python 2 and fails with the runtime error `AttributeError: 'list_iterator' object has no attribute 'next'` if you run the same code in Python 3.
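The rename is easy to verify on a Python 3 interpreter (a minimal check, not part of the patch):

```python
it = iter([10, 20, 30])

# The builtin next() is the portable spelling (Python 2.6+ and 3.x).
first = next(it)

# On Python 3 the method is __next__; the old .next attribute is gone.
has_dunder = hasattr(it, '__next__')
has_old = hasattr(it, 'next')

print(first, has_dunder, has_old)  # 10 True False (on Python 3)
```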
Like I said before somewhere else, this is the first Python code I've ever written or worked with. I say that to say: please forgive my ignorance when asking what may be a stupid question, but what is a `lambda`? Is that some sort of Python keyword?
`lambda` is a keyword to create an anonymous function in Python. Here's a short demo:

```python
>>> some_string = "hello!"
>>> some_string
'hello!'
>>> some_function = lambda: some_string
>>> some_function()
'hello!'
>>> another_function = lambda s: some_string + s
>>> another_function(" world!")
'hello! world!'
```

So in this particular case, you'd probably want to do something like this:

```python
tokenize.generate_tokens(lambda: next(sourceLinesIter))
```
I've gone ahead and switched over to the `lambda` syntax (see: dc70d6d06119b72f3733da56fccb3c2d66082d8a). As a newbie to Python this feels like a lateral move, but I defer to those who have been there before me.

@frewsxcv thanks for showing me some Python! 🍻
Force-pushed from 7c43fdf to 8f22300
[PEP 3114](https://www.python.org/dev/peps/pep-3114/) renamed `iterator.next()` (Python 2) to `iterator.__next__()` (Python 3). The recommended solution to make the code work in both Python 2 and 3 is to call the global `next` function. To use this recommended global `next` function this patch uses a lambda function to supply `tokenize.generate_tokens` a callable function as it was previously. This should be functionally equivalent to the old code with the added benefit of working on both Python 2 and 3.
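The pattern this commit message describes can be sketched with simplified input (not the actual gyb template source):

```python
import tokenize

# Stand-in for the template's source lines.
sourceLines = ['a = 1\n']
sourceLinesIter = iter(sourceLines)

# The lambda is a zero-argument callable, satisfying the readline-style
# interface generate_tokens requires, on both Python 2 and 3.
tokens = list(tokenize.generate_tokens(lambda: next(sourceLinesIter)))
print(tokens[0][1])  # 'a'
```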
The `StringIO` and `cStringIO` modules are gone in Python 3. The recommendation is to import the `io` module on Python 3. Therefore, this patch first attempts to import the Python 2 module, `cStringIO`, and if that fails then attempts to import the Python 3 module, `io`. **NOTE**: There are still other Python 3.x fixes necessary to make `gyb` run on a Python 3.x interpreter. This is just one small incremental patch on the way there.
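The fallback import described above follows a common pattern, which can be sketched like so (a minimal illustration, not the exact patch):

```python
try:
    from cStringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO         # Python 3

# Either module provides the same file-like string buffer interface.
buf = StringIO()
buf.write(u'generated boilerplate')
print(buf.getvalue())
```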
In Python 2 the syntax to catch exceptions was: `except (Exception1, Exception2), target:`. In Python 3 the syntax is: `except (Exception1, Exception2) as target:`. This newer Python 3 syntax is also available in 2.6 and 2.7, so it is preferred for compatibility reasons. Additionally, the target can no longer be a tuple. This patch refactors the exception handling code to use the newer Python 3 exception catch syntax.
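A minimal illustration of the portable form (hypothetical exception types, not gyb's actual handlers):

```python
try:
    int('not a number')
except (ValueError, TypeError) as e:
    # `as target` parses on Python 2.6, 2.7, and all of Python 3.
    message = str(e)

print(message)
```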
All strings are sequences of Unicode characters in Python 3. This is entirely different than that of Python 2. Python 2's strings were of bytes. However, Python 2 does have the concept of Unicode strings. This patch changes the behavior of the file reader to use the same `codecs` module on Python 2 to properly read a string into a unicode string. From there the strings are meant to be equivalent on 2 and 3. The rest of the patch just updates the code to natively work with unicode strings.

To test the class `GraphemeClusterBreakPropertyTable`:

```
$ python2 utils/gyb --test \
    -DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \
    -DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \
    -DCMAKE_SIZEOF_VOID_P=8 \
    -o /tmp/UnicodeExtendedGraphemeClusters.cpp.2.7.tmp \
    ./stdlib/public/stubs/UnicodeExtendedGraphemeClusters.cpp.gyb
$ python3 utils/gyb --test \
    -DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \
    -DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \
    -DCMAKE_SIZEOF_VOID_P=8 \
    -o /tmp/UnicodeExtendedGraphemeClusters.cpp.3.5.tmp \
    ./stdlib/public/stubs/UnicodeExtendedGraphemeClusters.cpp.gyb
$ diff -u /tmp/UnicodeExtendedGraphemeClusters.cpp.2.7.tmp \
    /tmp/UnicodeExtendedGraphemeClusters.cpp.3.5.tmp
```

To test the method `get_grapheme_cluster_break_tests_as_UTF8`:

```
$ python2 utils/gyb --test \
    -DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \
    -DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \
    -DCMAKE_SIZEOF_VOID_P=8 \
    -o /tmp/UnicodeGraphemeBreakTest.cpp.2.7.tmp \
    ./unittests/Basic/UnicodeGraphemeBreakTest.cpp.gyb
$ python3 utils/gyb --test \
    -DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \
    -DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \
    -DCMAKE_SIZEOF_VOID_P=8 \
    -o /tmp/UnicodeGraphemeBreakTest.cpp.3.5.tmp \
    ./unittests/Basic/UnicodeGraphemeBreakTest.cpp.gyb
$ diff -u /tmp/UnicodeGraphemeBreakTest.cpp.2.7.tmp \
    /tmp/UnicodeGraphemeBreakTest.cpp.3.5.tmp
```
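The codecs-based reading described above can be sketched as follows; a temporary file stands in for the real Unicode data files:

```python
import codecs
import os
import tempfile

# Write a small UTF-8 sample file (stand-in for the Unicode data files).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(u'caf\u00e9\n'.encode('utf-8'))

# codecs.open yields unicode strings line by line on Python 2 and 3 alike,
# where plain open() would yield bytes on 2 (mode 'rb') or require an
# explicit decode.
with codecs.open(path, 'r', encoding='utf-8') as f:
    lines = [line for line in f]

os.remove(path)
print(lines == [u'caf\u00e9\n'])  # True
```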
PEP 3106 [1] changed the behavior of the dictionary's `items` method. In Python 2, `items` builds a real list of tuples where `iteritems` returns a generator. PEP 3106 changes Python 3's `items` method to be equivalent to Python 2's `iteritems` and completely removes `iteritems` in Python 3. This patch switches both to use `items`. This could have a negative impact on Python 2's performance because it now causes the dictionary tuples to be built in memory. [1] https://www.python.org/dev/peps/pep-3106/
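The portable spelling is easy to demonstrate (a toy dictionary, not gyb's data):

```python
d = {'a': 1, 'b': 2}

# On Python 3, items() returns a lazy view; on Python 2 it builds a list.
# Iterating (or sorting) the result is portable either way.
pairs = sorted(d.items())
print(pairs)  # [('a', 1), ('b', 2)]
```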
Python 2's map function [1] always returns a list. Compared with Python 3's map function [2] which returns an iterator (or map object). The former is subscriptable, while the latter is not. This patch explicitly converts the result of some map operations to be a list. That way they have the same intended behavior on both Python 2 and 3. [1] https://docs.python.org/2/library/functions.html#map [2] https://docs.python.org/3/library/functions.html#map
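A short demonstration of the difference (toy data, not the patched call sites):

```python
squares = map(lambda x: x * x, [1, 2, 3])

# On Python 3, `squares[0]` would raise TypeError because map returns an
# iterator; wrapping in list() restores subscripting on both versions.
square_list = list(squares)
print(square_list[0])  # 1
```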
The `Popen` constructor in Python returns a byte sequence on stdout by default. However, by passing the constructor the argument `universal_newlines=True`, `Popen` puts a string on stdout instead. This was not a problem on Python 2 because the Python 2 regex engine seemed to work fine on byte sequences where Python 3's does not. By explicitly converting everything to a string, the same behavior is now seen on Python 2 and 3. See: https://docs.python.org/2/library/subprocess.html#frequently-used-arguments See: https://docs.python.org/3/library/subprocess.html#frequently-used-arguments
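The flag's effect can be sketched with a trivial child process (illustrative only, not gyb's actual subprocess call):

```python
import subprocess
import sys

# universal_newlines=True makes stdout decoded text rather than bytes,
# so the same regular expressions match on Python 2 and 3.
out = subprocess.Popen(
    [sys.executable, '-c', 'print("hello")'],
    stdout=subprocess.PIPE,
    universal_newlines=True,
).communicate()[0]

print(repr(out))  # 'hello\n'
```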
@dabrahams I've updated (7700b3b) per your comments.
@RLovelett Thanks!
[gyb] Python 2 or 3 compatible Generate Your Boilerplate
EDIT: I've edited this since I first submitted the pull-request. I've added all the other patches I found necessary to get the `gyb` script to run on either Python 2 or 3 (as opposed to the single patch I submitted in the first place). As a result this comment has been reformatted. My goal was to write fairly explicit commit messages that explain what each commit does on its own. They may be worth reviewing to figure out why each change was made.