[gyb] Python 2 or 3 compatible Generate Your Boilerplate #806

RLovelett · 2015-12-28T20:32:20Z

EDIT: I've edited it since I first submitted this pull-request. I've added all the other patches I found necessary to get the gyb script to run on either Python 2 or 3 (as opposed to the single patch I submitted in the first place). As result this comment has been reformatted. My goal was to write fairly explicit commit messages that explain what each commit does on its own. They may be worth reviewing to figure out why each change was made.

RLovelett · 2015-12-30T20:54:50Z

I've gone ahead and put all of my Python 2/3 code on this branch. At this point I am able to compile, with these patches, using either Python 2.7.11 (OS X) or 3.5.1 (Arch Linux). 🎉

I've deleted my gyb-python-3-wip branch because it no longer is a work in progress. If you want me to split up these commits into multiple pulls just let me know.

RLovelett · 2015-12-30T21:18:22Z

Hopefully this should help solve some of the issues that were left by #83 and #90.

frewsxcv · 2015-12-31T05:57:35Z

lib/ClangImporter/SortedCFDatabase.def.gyb

@@ -27,7 +27,7 @@ lineForName = {}

 # Load the data file.
 with open(CFDatabaseFile, 'rb') as f:
-  for line in f:
+  for line in [line_bytes.decode('utf-8') for line_bytes in f]:


You should make this:

for line in (line_bytes.decode('utf-8') for line_bytes in f):

so that it utilizes generators instead of building up a list unnecessarily

https://www.python.org/dev/peps/pep-0289/

That was a good suggestion. I've never written Python before and I hadn't noticed the difference between the parenthesis and square brackets.

Though in the end I actually found there is a codecs module that does what I am looking for explicitly and I've refactored the code to make use of that instead.

dabrahams · 2015-12-31T17:37:51Z

utils/gyb.py

@@ -304,7 +309,7 @@ def splitGybLines(sourceLines):
    dedents = 0
    try:
        for tokenKind, tokenText, tokenStart, (tokenEndLine, tokenEndCol), lineText \
-            in tokenize.generate_tokens(sourceLines.__iter__().next):
+            in tokenize.generate_tokens(partial(next, sourceLines.__iter__())):


The uses of partial don’t read very well. Would we be better off with a lambda?

How about:

sourceLinesIter = iter(sourceLines) try: for tokenKind, tokenText, tokenStart, (tokenEndLine, tokenEndCol), lineText \ in tokenize.generate_tokens(next(sourceLinesIter)):

That would be fine except the result of next(sourceLinesIter) is not callable. Taken from tokenize.generate_tokens docs

... requires one argument, readline, which must be a callable object which provides the same interface as the readline() method of built-in file objects (see section File Objects) ...

Notice that in the original code next was sent to tokenize.generate_tokens not next(), i.e., something that is callable. @frewsxcv suggestion is the equivalent of sending next() and not next. Hence why I went with the partial function application.

Testing

I first applied this patch:

diff --git a/utils/gyb.py b/utils/gyb.py index ab422f1..68a8280 100755 --- a/utils/gyb.py +++ b/utils/gyb.py @@ -307,9 +307,10 @@ def splitGybLines(sourceLines): unmatchedIndents = [] dedents = 0 + sourceLinesIter = iter(sourceLines) try: for tokenKind, tokenText, tokenStart, (tokenEndLine, tokenEndCol), lineText \ - in tokenize.generate_tokens(partial(next, sourceLines.__iter__())): + in tokenize.generate_tokens(next(sourceLinesIter)): if tokenKind in (tokenize.COMMENT, tokenize.ENDMARKER): continue

Then I ran this from the utils directory.

$ python3 >>> from gyb import * >>> src = splitLines('''\ ... if x: ... print x ... if y: # trailing comment ... print z ... if z: # another comment\ ... ''') >>> splitGybLines(src) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Users/lovelett/Source/swift/utils/gyb.py", line 313, in splitGybLines in tokenize.generate_tokens(next(sourceLinesIter)): File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/tokenize.py", line 514, in _tokenize line = readline() TypeError: 'str' object is not callable

Expected:

$ python3 >>> from gyb import * >>> src = splitLines('''\ ... if x: ... print x ... if y: # trailing comment ... print z ... if z: # another comment\ ... ''') >>> splitGybLines(src) [3, 5]

Blah, you're right. Forgot that generate_tokens wants a callable. You might be able to get by with:

tokenize.generate_tokens(sourceLinesIter.next)

...since next is a method on an iterator, though I'm not entirely sure. Otherwise, I guess just stick with a lambda or partial

@frewsxcv unfortunately next only exists on an iterator in Python 2. PEP 3114 renamed the next function to __next__ in Python 3 and got rid of next entirely. Or put another way, it works as expected in Python 2 and fails with the runtime error AttributeError: 'list_iterator' object has no attribute 'next' if you run the same code in Python 3.

Like I said before somewhere else, this is the first Python code I've ever written or worked with. I say that to say please forgive my ignorance when asking what may be a stupid question: what is a lambda? Is that some sort of Python keyword?

lambda is a keyword to create an anonymous function in Python. Here's a short demo:

>>> some_string = "hello!" >>> some_string 'hello!' >>> some_function = lambda: some_string >>> some_function() 'hello!' >>> another_function = lambda s: some_string + s >>> another_function(" world!") 'hello! world!'

So in this particular case, you'd probably want to do something like this:

tokenize.generate_tokens(lambda: next(sourceLinesIter))

I've gone ahead and switched over to the lambda syntax (see: dc70d6d06119b72f3733da56fccb3c2d66082d8a). As a newbie to Python this feels like a lateral move but I defer to those who have been there before me.

@frewsxcv thanks for showing me some Python! 🍻

[PEP 3114](https://www.python.org/dev/peps/pep-3114/) renamed `iterator.next()` (Python 2) to `iterator.__next__()` (Python 3). The recommended solution to make the code work in both Python 2 and 3 is to call the global `next` function. To use this recommended global `next` function this patch uses a lambda function to supply `tokenize.generate_tokens` a callable function as it was previously. This should be functionally equivalent to the old code with the added benefit of working on both Python 2 and 3.

The `StringIO` and `cStringIO` modules are gone in Python 3. The recommendation is to import the `io` module on Python 3. Therefore, this patch first attempts to import the Python 2 module, `cStringIO`, and if that fails then attempts to import the Python 3 module, `io`. **NOTE**: There are still other Python 3.x fixes necessary to make `gyb` run on a Python 3.x interpreter. This is just one small incremental patch on the way there.

In Python 2 the syntax to catch exceptions was: except (Exception1, Exception2), target: In Python 3 the syntax to catch exceptions is: except (Exception1, Exception2) as target: This newer Python 3 syntax is also available in 2.6 and 2.7. Therefore, it is preferred for compatibility reasons. Additionally, the target no longer can be a tuple. This patch refactors the exception handeling code to be the newer Python 3 exception catch syntax.

All strings are sequences of Unicode characters in Python 3. This is entirely different than that of Python 2. Python 2's strings were of bytes. However, Python 2 does have the concept of Unicode strings. This patch changes the behavior of the file reader to use the same the codecs module on Python 2 to properly read a string into a unicode string. From there the strings are meant to be equivalent on 2 and 3. The rest of the patch just updates the code to natively work with unicode strings. To test the class `GraphemeClusterBreakPropertyTable`: $ python2 utils/gyb --test \ -DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \ -DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \ -DCMAKE_SIZEOF_VOID_P=8 \ -o /tmp/UnicodeExtendedGraphemeClusters.cpp.2.7.tmp \ ./stdlib/public/stubs/UnicodeExtendedGraphemeClusters.cpp.gyb $ python3 utils/gyb --test \ -DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \ -DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \ -DCMAKE_SIZEOF_VOID_P=8 \ -o /tmp/UnicodeExtendedGraphemeClusters.cpp.3.5.tmp \ ./stdlib/public/stubs/UnicodeExtendedGraphemeClusters.cpp.gyb $ diff -u /tmp/UnicodeExtendedGraphemeClusters.cpp.2.7.tmp \ /tmp/UnicodeExtendedGraphemeClusters.cpp.3.5.tmp To test the method `get_grapheme_cluster_break_tests_as_UTF8`: $ python2 utils/gyb --test \ -DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \ -DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \ -DCMAKE_SIZEOF_VOID_P=8 \ -o /tmp/UnicodeGraphemeBreakTest.cpp.2.7.tmp \ ./unittests/Basic/UnicodeGraphemeBreakTest.cpp.gyb $ python3 utils/gyb --test \ -DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \ -DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \ -DCMAKE_SIZEOF_VOID_P=8 \ -o /tmp/UnicodeGraphemeBreakTest.cpp.3.5.tmp \ ./unittests/Basic/UnicodeGraphemeBreakTest.cpp.gyb $ diff -u /tmp/UnicodeGraphemeBreakTest.cpp.2.7.tmp \ /tmp/UnicodeGraphemeBreakTest.cpp.3.5.tmp

PEP 3106 [1] changed the behavior of the dictionaries `items` method. In Python 2, `items` builds a real list of tuples where `iteritems` returns a generator. PEP 3106 changes Python 3's `items` method to be equivalent to Python 2's `iteritems` and completely removes `iteritems` in Python 3. This patch switches to both to use `items`. This could have a negative impact on Python 2's performance because it now causes the dictionary tuples to be built in memory. [1] https://www.python.org/dev/peps/pep-3106/

Python 2's map function [1] returns a list by default. Compared with Python 3's map function [2] which returns an iterator (or map object). The former is subscriptable, while the latter is not. This patch explicitly converts the result of some map operations to be a list. That way they have the same intended behaviour on both Python 2 and 3. [1] https://docs.python.org/2/library/functions.html#map [2] https://docs.python.org/3/library/functions.html#map

The Popen command on Python returns a byte sequence on stdout by default. However by sending the constructor the argument `universal_newlines=True` it forces the Popen to put a string on stdout. This was not a problem on Python 2 because the Python 2 regex engine seemed to work find on byte sequences where Python 3's does not. By explicitly converting everything to a string the same behavior is now seen on Python 2 and 3. See: https://docs.python.org/2/library/subprocess.html#frequently-used-arguments See: https://docs.python.org/3/library/subprocess.html#frequently-used-arguments

RLovelett · 2015-12-31T21:53:13Z

@dabrahams I've updated (7700b3b) per your comments.

dabrahams · 2015-12-31T21:58:45Z

@RLovelett Thanks!

[gyb] Python 2 or 3 compatible Generate Your Boilerplate

RLovelett force-pushed the gyb-python-3 branch 2 times, most recently from d944e27 to 42604fa Compare December 28, 2015 20:57

gribozavr self-assigned this Dec 29, 2015

RLovelett force-pushed the gyb-python-3 branch from 42604fa to fae1495 Compare December 30, 2015 19:57

RLovelett changed the title ~~[gyb] Use partial function application to work around PEP 3114~~ [gyb] Python 2 or 3 compatible Generate Your Boilerplate Dec 30, 2015

RLovelett force-pushed the gyb-python-3 branch 4 times, most recently from 878f13e to b1e4585 Compare December 30, 2015 21:16

frewsxcv reviewed Dec 31, 2015
View reviewed changes

RLovelett force-pushed the gyb-python-3 branch 3 times, most recently from 8a96152 to 464426a Compare December 31, 2015 17:31

dabrahams reviewed Dec 31, 2015
View reviewed changes

RLovelett force-pushed the gyb-python-3 branch 3 times, most recently from 7c43fdf to 8f22300 Compare December 31, 2015 21:51

RLovelett added 7 commits December 31, 2015 16:51

dabrahams pushed a commit that referenced this pull request Dec 31, 2015

Merge pull request #806 from RLovelett/gyb-python-3

b50b541

[gyb] Python 2 or 3 compatible Generate Your Boilerplate

dabrahams merged commit b50b541 into swiftlang:master Dec 31, 2015

RLovelett deleted the gyb-python-3 branch December 31, 2015 22:07

modocache mentioned this pull request Mar 7, 2016

Make gyb explicitly reference python2.7 #1535

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[gyb] Python 2 or 3 compatible Generate Your Boilerplate #806

[gyb] Python 2 or 3 compatible Generate Your Boilerplate #806

RLovelett commented Dec 28, 2015

RLovelett commented Dec 30, 2015

RLovelett commented Dec 30, 2015

frewsxcv Dec 31, 2015

RLovelett Dec 31, 2015

dabrahams Dec 31, 2015

frewsxcv Dec 31, 2015

dabrahams Dec 31, 2015

RLovelett Dec 31, 2015

frewsxcv Dec 31, 2015

RLovelett Dec 31, 2015

frewsxcv Dec 31, 2015

RLovelett Dec 31, 2015

RLovelett commented Dec 31, 2015

dabrahams commented Dec 31, 2015

[gyb] Python 2 or 3 compatible Generate Your Boilerplate #806

[gyb] Python 2 or 3 compatible Generate Your Boilerplate #806

Conversation

RLovelett commented Dec 28, 2015

RLovelett commented Dec 30, 2015

RLovelett commented Dec 30, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Testing

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RLovelett commented Dec 31, 2015

dabrahams commented Dec 31, 2015