-
-
Notifications
You must be signed in to change notification settings - Fork 32k
bpo-32861: urllib.robotparser fix incomplete __str__ methods. #5711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-32861: urllib.robotparser fix incomplete __str__ methods. #5711
Conversation
The RobotFileParser's string representation was incomplete and missing some valid rule lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I have added just few suggestions. And the two unnecessary trailing newlines should be kept in maintained releases.
Please add a news entry.
Lib/urllib/robotparser.py
Outdated
@@ -190,7 +190,10 @@ def request_rate(self, useragent): | |||
return self.default_entry.req_rate | |||
|
|||
def __str__(self): | |||
return ''.join([str(entry) + "\n" for entry in self.entries]) | |||
ret = [str(entry) for entry in self.entries] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be faster:
entries = self.entries
if self.default_entry is not None:
entries = entries + [self.default_entry]
return '\n\n'.join(map(str, entries))
Lib/urllib/robotparser.py
Outdated
@@ -222,10 +225,14 @@ def __init__(self): | |||
def __str__(self): | |||
ret = [] | |||
for agent in self.useragents: | |||
ret.extend(["User-agent: ", agent, "\n"]) | |||
ret.append("User-agent: {0}".format(agent)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f-strings can be used in 3.6+.
Lib/urllib/robotparser.py
Outdated
for line in self.rulelines: | ||
ret.extend([str(line), "\n"]) | ||
return ''.join(ret) | ||
ret.append(str(line)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or just
ret.extend(map(str, self.rulelines))
All suggestions have been implemented, I have no strong opinions on any of them. I also added a news entry. Is backporting part of this PR, or do you merge this into 3.8 and then create separate issues for the other python branches? Is that something that I can help with? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Just add your credits.
@@ -0,0 +1,3 @@ | |||
The urllib.robotparser's ``__str__`` representation now includes wildcard | |||
entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra | |||
newlines that were being appended to the end of the string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add "Patch by yourname." and add your name into Misc/ACKS.
Cool, I added my name to the news entry. I'm already in the ACKS file so all good there. |
Thanks @michael-lazar for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 2.7, 3.6, 3.7. |
GH-6795 is a backport of this pull request to the 3.7 branch. |
Sorry, @michael-lazar and @serhiy-storchaka, I could not cleanly backport this to |
GH-6796 is a backport of this pull request to the 3.6 branch. |
The RobotFileParser's string representation was incomplete and missing some valid rule lines.
https://bugs.python.org/issue32861
https://bugs.python.org/issue32861