Encode utf8 strings. Fixes errors when non-ascii chars are used. #11
@frizzby could you please resolve the conflicts so that the PR can be merged?
arguments = ", ".join(self.args)
full_name = "{0}{1} ({2})".format(assignment, self.name, arguments)
full_name = "{0}{1} ({2})".format(
I don't have the problem context or knowledge of Robot Framework's internal implementation, so I may be wrong.
From my perspective, we need to convert everything to Unicode first, then build the string from its parts, and only then convert the single resulting string to UTF-8 bytes (in Python 3, encode returns the bytes type).
On Python 3.5:
>>> s = "Русский"
>>> print(s)
Русский
>>> type(s)
<class 'str'>
>>> rb=s.encode("utf8")
>>> repr(rb)
"b'\\xd0\\xa0\\xd1\\x83\\xd1\\x81\\xd1\\x81\\xd0\\xba\\xd0\\xb8\\xd0\\xb9'"
>>> repr("{0}".format(rb))
'"b\'\\\\xd0\\\\xa0\\\\xd1\\\\x83\\\\xd1\\\\x81\\\\xd1\\\\x81\\\\xd0\\\\xba\\\\xd0\\\\xb8\\\\xd0\\\\xb9\'"'
On Python 2.7:
>>> s = u"Русский"
>>> print(s)
Русский
>>> type(s)
<type 'unicode'>
>>> rb=s.encode("utf8")
>>> repr(rb)
"'\\xd0\\xa0\\xd1\\x83\\xd1\\x81\\xd1\\x81\\xd0\\xba\\xd0\\xb8\\xd0\\xb9'"
>>> repr("{0}".format(rb))
"'\\xd0\\xa0\\xd1\\x83\\xd1\\x81\\xd1\\x81\\xd0\\xba\\xd0\\xb8\\xd0\\xb9'"
This needs to be checked for both py2 and py3.
Also, Python's format is able to work with the bytes type.
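The "convert to Unicode first, build the string, encode once" order described above could look roughly like the sketch below. The helpers `to_text` and `build_name` are hypothetical names used only for illustration; they are not part of the PR or of Robot Framework.

```python
# -*- coding: utf-8 -*-
# Sketch only: to_text and build_name are hypothetical helpers, not project code.


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_name(assignment, name, arguments):
    """Build the name from text parts, then encode the result exactly once."""
    parts = [to_text(part) for part in (assignment, name, arguments)]
    text = u"{0}{1} ({2})".format(*parts)
    # Encoding happens only here, at the boundary, after the string is complete.
    return text.encode("utf-8")


encoded = build_name(u"x = ", u"Русский".encode("utf-8"), u"a, b")
```

Because every part is text before format runs, the template never sees a bytes object, so neither the `b'...'` representation on Python 3 nor an implicit conversion on Python 2 can leak into the result.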
@frizzby This is a common problem with Robot Framework, but it should not be fixed with encode.
The reason
Robot Framework works only with unicode strings.
Meanwhile, "some string {}".format(robot_framework_value) causes an encoding error because Python tries to encode the unicode value (robot_framework_value) to ASCII, perhaps because of some internationalization settings (please correct me if I am wrong).
How to fix it properly
- Please remove all .encode calls.
- Change all strings to unicode strings. For instance (file model.py):
Before:
def get_name(self):
    assignment = "{0} = ".format(", ".join(self.assign)) if self.assign else ""
    arguments = ", ".join(self.args)
    full_name = "{0}{1} ({2})".format(assignment, self.name, arguments)
    return full_name[:256]
After:
def get_name(self):
    assignment = u"{0} = ".format(u", ".join(self.assign)) if self.assign else ""
    arguments = u", ".join(self.args)
    full_name = u"{0}{1} ({2})".format(assignment, self.name, arguments)
    return full_name[:256]
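To try the "After" version in isolation, it can be wrapped in a small stand-in class. The `Keyword` class below is an assumption made for this sketch; the real class in model.py may have a different name and other fields. Only the body of get_name follows the suggestion above.

```python
# -*- coding: utf-8 -*-
# Sketch only: Keyword is a hypothetical stand-in for the class in model.py.


class Keyword(object):
    """Holds just the attributes that get_name reads."""

    def __init__(self, name, args=(), assign=()):
        self.name = name
        self.args = list(args)
        self.assign = list(assign)

    def get_name(self):
        # Every literal is a unicode string, so no implicit ASCII step occurs.
        assignment = u"{0} = ".format(u", ".join(self.assign)) if self.assign else u""
        arguments = u", ".join(self.args)
        full_name = u"{0}{1} ({2})".format(assignment, self.name, arguments)
        return full_name[:256]


kw = Keyword(u"Войти", args=[u"пользователь"], assign=[u"${результат}"])
name = kw.get_name()
```

With unicode inputs the method returns a unicode string on both Python 2 and Python 3, and the 256 limit counts characters rather than bytes.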
@frizzby @krasoffski Guys, can anyone comment on the state of this PR? Should I close it?
Hi @avarabyeu,
@avarabyeu one question here, @frizzby,
@frizzby @krasoffski guys, could you please handle this PR? Merge it or close it, please.
I left working examples above. format cannot concatenate ASCII and unicode strings. Robot Framework works only with unicode, so we just need unicode strings in order to format them with Robot Framework values.
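A rough illustration of the failure mode being described: on Python 2, formatting a unicode value into a byte-string template makes Python encode that value with the default codec, which is normally ASCII. The sketch below reproduces that step explicitly so that it behaves the same on Python 2 and Python 3. The function `implicit_ascii_encode` is a made-up name that only mimics the implicit conversion; it is not something Python or Robot Framework exposes.

```python
# -*- coding: utf-8 -*-
# Sketch only: implicit_ascii_encode imitates Python 2's implicit conversion.

value = u"Русский"


def implicit_ascii_encode(text):
    """Encode text with ASCII, as a Python 2 byte-string template would."""
    return text.encode("ascii")


try:
    implicit_ascii_encode(value)
    failed = False
except UnicodeEncodeError:
    # Cyrillic characters have no ASCII representation.
    failed = True

# A unicode template needs no conversion at all, so this always succeeds.
safe = u"some string {0}".format(value)
```

An explicit .encode call on the parts only moves this conversion around, while a unicode template removes it.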