
Encode utf8 strings. Fixes errors when non-ascii chars are used. #11


Merged
merged 3 commits into from
Oct 6, 2017

Conversation

frizzby
Collaborator

@frizzby frizzby commented Jun 25, 2017

No description provided.

@avarabyeu
Member

@frizzby could you please resolve the conflicts so that the PR can be merged?

arguments = ", ".join(self.args)
full_name = "{0}{1} ({2})".format(assignment, self.name, arguments)
full_name = "{0}{1} ({2})".format(

@krasoffski krasoffski Jun 30, 2017


I don't know the problem context or Robot Framework's internal implementation, so I may be wrong.
From my perspective, we need to convert everything to Unicode first, then build the string from its parts, and only then convert the single resulting string to UTF-8 bytes (in Python 3, encode returns the bytes type).

On python35

>>> s = "Русский"
>>> print(s)
Русский
>>> type(s)
<class 'str'>
>>> rb=s.encode("utf8")
>>> repr(rb)
"b'\\xd0\\xa0\\xd1\\x83\\xd1\\x81\\xd1\\x81\\xd0\\xba\\xd0\\xb8\\xd0\\xb9'"
>>> repr("{0}".format(rb))
'"b\'\\\\xd0\\\\xa0\\\\xd1\\\\x83\\\\xd1\\\\x81\\\\xd1\\\\x81\\\\xd0\\\\xba\\\\xd0\\\\xb8\\\\xd0\\\\xb9\'"'

On python27

>>> s = u"Русский"
>>> print(s)
Русский
>>> type(s)
<type 'unicode'>
>>> rb=s.encode("utf8")
>>> repr(rb)
"'\\xd0\\xa0\\xd1\\x83\\xd1\\x81\\xd1\\x81\\xd0\\xba\\xd0\\xb8\\xd0\\xb9'"
>>> repr("{0}".format(rb))
"'\\xd0\\xa0\\xd1\\x83\\xd1\\x81\\xd1\\x81\\xd0\\xba\\xd0\\xb8\\xd0\\xb9'"

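This needs to be checked on both py2 and py3.
Also note that Python's format accepts the bytes type, but on Python 3 it inserts the bytes repr (b'...') rather than the text, as the session above shows.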
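
The "Unicode first, encode once" order described above can be sketched roughly as follows. This is only an illustration: `to_text` and `build_full_name` are hypothetical helpers, not functions from the agent, and the sample values are made up.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding="utf-8"):
    """Return value as a text (unicode) string, decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_full_name(assign, name, args):
    """Build the name from text parts only (hypothetical helper)."""
    assignment = u"{0} = ".format(u", ".join(to_text(a) for a in assign)) if assign else u""
    arguments = u", ".join(to_text(a) for a in args)
    return u"{0}{1} ({2})".format(assignment, to_text(name), arguments)


# Mixed input: two text values and one UTF-8 encoded bytes value.
full_name = build_full_name([u"${результат}"], u"Получить имя", [b"\xd0\xb0\xd1\x80\xd0\xb3"])

# Encode exactly once, at the boundary where bytes are actually required.
encoded = full_name.encode("utf-8")
```

With this order, format never sees a bytes value, so the `b'...'` repr from the py35 session cannot leak into the result.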


@frizzby This is a common problem with Robot Framework, but it should not be fixed with encode.

The reason

This happens because Robot Framework works exclusively with unicode strings.
Meanwhile, doing "some string {}".format(robot_framework_value) causes an encoding error, because Python 2 tries to encode the unicode value (robot_framework_value) to ASCII when formatting it into a byte string (please correct me if I'm wrong).
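
A small sketch of that failure, assuming the implicit step on Python 2 is equivalent to encoding with the ascii codec. The explicit `encode("ascii")` call below only makes the hidden step visible so that it also runs on Python 3; the sample value is made up.

```python
# -*- coding: utf-8 -*-
value = u"Русский"  # stands in for a value coming from Robot Framework

# Roughly what Python 2 does implicitly when a unicode value is
# formatted into a byte-string template.
try:
    value.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# A unicode template involves no implicit ASCII encoding.
message = u"some string {0}".format(value)
```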

How to fix it properly

  • Please remove all .encode calls.
  • Change all string literals to unicode strings. For instance (file model.py):

Before:

    def get_name(self):
        assignment = "{0} = ".format(", ".join(self.assign)) if self.assign else ""
        arguments = ", ".join(self.args)
        full_name = "{0}{1} ({2})".format(assignment, self.name, arguments)
        return full_name[:256]

After:

    def get_name(self):
        assignment = u"{0} = ".format(u", ".join(self.assign)) if self.assign else ""
        arguments = u", ".join(self.args)
        full_name = u"{0}{1} ({2})".format(assignment, self.name, arguments)
        return full_name[:256]
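
To try the "After" version outside Robot Framework, something like the following can be used. `FakeKeyword` is a hypothetical stand-in, not the real class from model.py, and the sample values are made up.

```python
# -*- coding: utf-8 -*-

class FakeKeyword(object):
    """Hypothetical stand-in for the model class, for illustration only."""

    def __init__(self, name, assign=(), args=()):
        self.name = name
        self.assign = list(assign)
        self.args = list(args)

    def get_name(self):
        # Every literal is unicode, so unicode values format without errors.
        assignment = u"{0} = ".format(u", ".join(self.assign)) if self.assign else u""
        arguments = u", ".join(self.args)
        full_name = u"{0}{1} ({2})".format(assignment, self.name, arguments)
        return full_name[:256]


kw = FakeKeyword(u"Открыть страницу", assign=[u"${стр}"], args=[u"главная"])
name = kw.get_name()
```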

@avarabyeu
Member

avarabyeu commented Jul 19, 2017

@frizzby @krasoffski Guys, can anyone comment on the state of this PR? Should I close it?

@krasoffski

krasoffski commented Jul 20, 2017

Hi @avarabyeu,
This needs to be checked on python3, because there may be an issue with unicode encoding/decoding there.

krasoffski added a commit to krasoffski/agent-python-robot that referenced this pull request Jul 20, 2017
@krasoffski

krasoffski commented Jul 20, 2017

@avarabyeu one question here:
Is the name limited to 255 Unicode chars or to 255 bytes?

@frizzby,
How can the problem be verified/reproduced? With non-ASCII chars in a keyword/docstring/tags?
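
The chars-versus-bytes distinction matters because slicing a unicode string counts characters, not bytes. A rough sketch follows, where `LIMIT` simply mirrors the slice bound used in get_name; which limit the server actually enforces is not known here.

```python
# -*- coding: utf-8 -*-
LIMIT = 256  # assumption: mirrors full_name[:256]; the real limit is unknown

name = u"я" * 300  # each of these characters takes 2 bytes in UTF-8

truncated = name[:LIMIT]                      # limits the number of characters
char_count = len(truncated)
byte_count = len(truncated.encode("utf-8"))   # can exceed LIMIT

# If the limit turned out to be in bytes, one option is to cut the encoded
# form and drop any partial character left at the end.
byte_limited = truncated.encode("utf-8")[:LIMIT].decode("utf-8", "ignore")
```

For this input the character slice keeps 256 characters but 512 bytes, so the answer to the question changes what the truncation has to do.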

@avarabyeu
Member

@frizzby @krasoffski guys, could you please take care of this PR? Please either merge it or close it.


@ailjushkin ailjushkin left a comment


I left working examples above. On Python 2, format cannot put non-ASCII unicode values into a byte-string template. Robot Framework works only with unicode, so we just need unicode strings wherever we format Robot Framework values.


@avarabyeu avarabyeu merged commit cea3c25 into master Oct 6, 2017
@avarabyeu avarabyeu deleted the fix_utf8 branch October 6, 2017 19:36
4 participants