re functions str/unicode problems #273
Comments
I'm of two minds; it would be better not to rely on this, because your code will break in Python 3; but then this behavior seems to be relied on commonly. I propose to wait until we've truly addressed python/typing#208. |
I see what you mean. For this specific case I would recommend using something like this:
The reason would be that if
|
That's a good idea. Could you submit a patch? |
We just recently landed and reverted a patch that did this, #244. Partly because there was a mistake in the way the Unions of Patterns were written, but also because we decided to wait for python/typing#208. From what I've seen so far, the usual case for a string pattern/unicode match is when the string is just a static string literal -- in that case, the best fix is to just make it a unicode literal. This is also a case which is likely to be allowed in python/typing#208 without needing changes here. Also important to note: in general, testing with just ASCII is not a guarantee of compatibility. In this case, using non-ASCII codepoints while mixing bytes and unicode will not cause a runtime exception, but will likely result in missing matches you might otherwise expect. I don't think I'm against this patch; just wanted to give some context. |
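The point about ASCII-only testing can be illustrated with a small sketch. It is an assumption-laden analogue, not the Python 2 behavior discussed in this thread: it runs on Python 3, where mixing a str pattern with bytes input raises `TypeError` instead of silently matching less. The sample text `café` is chosen only for illustration.

```python
import re

text = "café"                # sample input with one non-ASCII code point
data = text.encode("utf-8")  # the same text as UTF-8 bytes

# A str pattern treats 'é' as a word character.
str_matches = re.findall(r"\w+", text)

# A bytes pattern treats \w as ASCII-only, so the match stops before 'é'.
bytes_matches = re.findall(rb"\w+", data)

print(str_matches)    # ['café']
print(bytes_matches)  # [b'caf']

# Python 3 refuses to mix the two kinds at all.
try:
    re.findall(r"\w+", data)
    mixed_rejected = False
except TypeError:
    mixed_rejected = True

print(mixed_rejected)  # True
```

With pure-ASCII input both calls would return the whole word, which is why a test suite that only uses ASCII would not notice the difference.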
Actually, I think we need this because of zulip/zulip#936 (you can't write raw, explicit unicode literals in Python 3). |
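A quick way to check the literal-prefix restriction mentioned above is to ask the interpreter to compile each form. This is only a sketch for Python 3; the helper name `compiles` is hypothetical and not part of any library.

```python
def compiles(source: str) -> bool:
    """Return True if `source` parses as a Python expression on this interpreter."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


print(compiles("u'abc'"))    # True: the u prefix is accepted
print(compiles("r'\\d+'"))   # True: the r prefix is accepted
print(compiles("ur'\\d+'"))  # False on Python 3: the combined ur prefix is a syntax error
```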
I see what you mean. I am just thinking, there were no new posts at python/typing#208 and python/mypy#1141 since April, so I figure a solution to that problem will take a while (which is understandable, since it is really not a trivial problem to sort out). Should we maybe implement a patch like this for the time being until there is a resolution on this matter? In our main project we have data from different sources coming in as either unicode or str and we want to use the same regexes (precompiled) on both. If we define the regex as unicode, mypy will think that the output from e.g.
Yeah, a PR that makes re.sub()'s first arg Union[str, unicode] would be fine.
|
In Python 2, all re.pyi functions now accept both str and unicode for their pattern arguments, so I don't think there's anything else to do here. |
Currently all functions in `re` are typed with `AnyStr`, so for example `re.match` is defined as follows:

This is a problem because `AnyStr` is defined as `AnyStr = TypeVar('AnyStr', bytes, str)`, so you cannot use, for example, a str pattern on a unicode string, which works fine in Python 2.7.
How should this be changed so that typeshed reflects reality?
The same problem appears in a lot of places, since `AnyStr` is used as if it were `Union[str, unicode]`.
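The difference between a constrained type variable and a union, which is the core of this report, can be sketched as follows. This is a minimal illustration, not the typeshed stubs: the function names `same_kind` and `either_kind` are hypothetical, and because the sketch runs on Python 3 it uses `bytes`/`str` as stand-ins for Python 2's `str`/`unicode`.

```python
from typing import TypeVar, Union

# Same definition as quoted in the issue.
AnyStr = TypeVar('AnyStr', bytes, str)


def same_kind(pattern: AnyStr, string: AnyStr) -> AnyStr:
    """AnyStr style: a type checker requires both arguments to share one type."""
    return string


def either_kind(pattern: Union[bytes, str],
                string: Union[bytes, str]) -> Union[bytes, str]:
    """Union style: each argument is checked independently."""
    return string


same_kind("a", "abc")       # accepted by a type checker: both str
same_kind(b"a", b"abc")     # accepted by a type checker: both bytes
# same_kind("a", b"abc")    # rejected by a type checker: AnyStr cannot be str and bytes at once
either_kind("a", b"abc")    # accepted by a type checker: mixed kinds are allowed
```

Note that the annotations only affect static checking; at runtime none of these calls is validated, so the commented-out line would execute without error if uncommented.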