CLN: clean up data.py #4002

Merged
merged 1 commit into from Jun 28, 2013
Conversation

@cpcloud (Member) commented Jun 23, 2013

closes #4001
closes #3982
closes #4028

@ghost ghost assigned cpcloud Jun 23, 2013
@@ -107,12 +107,13 @@ def get_quote_yahoo(symbols):
request = str.join('', codes.values()) # code request string
header = codes.keys()

data = dict(zip(codes.keys(), [[] for i in range(len(codes))]))
Contributor:

would it make more sense to change this to a defaultdict?

Member Author:

yep will do, does py26 have defaultdict?

Contributor:

@cpcloud came in with 2.5, so yes :)

Member Author:

sweet i like defaultdict.
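The change being discussed, sketched outside of pandas (the `codes` mapping here is an abbreviated stand-in for the real field map in `get_quote_yahoo`):

```python
from collections import defaultdict

# abbreviated stand-in for the Yahoo field-code map in get_quote_yahoo
codes = {'symbol': 's', 'last': 'l1', 'time': 't1'}

# before: pre-build an empty list for every key up front
data = dict(zip(codes.keys(), [[] for i in range(len(codes))]))

# after: equivalent, but each list is created lazily on first access
data = defaultdict(list)
data['symbol'].append('AAPL')
```

`defaultdict(list)` also removes the need to know the key set in advance, which is why it reads better than the `dict(zip(...))` construction.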

@jreback (Contributor) commented Jun 23, 2013

everything still work? (not sure how many tests we actually have on io.data)

can u squash the num of commits down a bit?

@cpcloud (Member Author) commented Jun 23, 2013

@jreback yes everything still works, but not quite finished yet...will squash when i check off the list items

@cpcloud (Member Author) commented Jun 23, 2013

the fact that everything still works after all these changes makes me think that there is potentially a lot of untested code in data.py...

@jtratner (Contributor) commented

@cpcloud I agree with you on that. Do you know why the main entry point is named like a class (DataReader) but isn't really a class? It's confusing because the functions seem really similar and a good candidate for abstracting into a class definition. Not even necessarily because it would reduce code duplication, but because it could make the file easier to follow (i.e., a workflow described as get_data -> clean_data -> package_data [into appropriate type, like Series/Panel]) and would allow us to do a better job in standardizing error handling and making it easier to add in new data sources (or allow people to make plugins or something).
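A rough sketch of the kind of abstraction being floated here (all names hypothetical, not actual pandas API): a base class with the get_data -> clean_data -> package_data workflow, where each data source overrides only the steps it needs.

```python
class DataSource(object):
    """Hypothetical base class: fetch -> clean -> package pipeline."""

    def get_data(self, symbol):
        # fetch raw records for `symbol` from the remote source
        raise NotImplementedError

    def clean_data(self, raw):
        # normalize the raw records; default is a no-op
        return raw

    def package_data(self, cleaned):
        # wrap into the appropriate return type (e.g. Series/Panel)
        return cleaned

    def read(self, symbol):
        # the single entry point, replacing the DataReader function
        return self.package_data(self.clean_data(self.get_data(symbol)))


class EchoSource(DataSource):
    """Toy subclass for illustration only."""

    def get_data(self, symbol):
        return [symbol.lower()]

    def clean_data(self, raw):
        return [s.upper() for s in raw]
```

With this shape, standardized error handling and timeouts could live on the base class, and a new source (or third-party plugin) only implements the hooks.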

@jtratner (Contributor) commented

Would it make sense to set a default timeout on these requests? (you can pass a timeout keyword to urlopen). Helps the test suite and gives more flexibility to users.

@cpcloud (Member Author) commented Jun 24, 2013

possibly...somewhat related is that there's a retry_count all over the place which i think may be unnecessary and might overlap a bit with timeout...timeout might be nice although i'm not sure when you'd want to use it. maybe u have some specific use cases in mind?


lines = urllib2.urlopen(urlStr).readlines()
with closing(urlopen(url_str)) as url:
Contributor:

@cpcloud you could use timeout here...just so you can explicitly set when you want to give up on a server responding...
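The pattern under discussion might look like this (a sketch using the Python 3 spelling of `urlopen`; the `read_lines` helper and the 5-second default are illustrative, not pandas code):

```python
from contextlib import closing
from urllib.request import urlopen  # urllib2.urlopen on Python 2


def read_lines(url_str, timeout=5.0):
    # `timeout` (in seconds) bounds how long urlopen waits for the
    # server before raising, instead of hanging indefinitely
    with closing(urlopen(url_str, timeout=timeout)) as resp:
        return resp.readlines()
```

`closing` guarantees the response is closed even if `readlines` raises, and the timeout turns a hung server into a catchable exception.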

Contributor:

basically every line like this.

Member Author:

i'm not sure that timing out is an issue in these cases...but i suppose that a timeout parameter might be useful. i'm not sure about it though

Contributor:

Well, weren't you saying that the test_google method was hanging? Was it because of iterations? Might help it...but yeah prob not a big deal.

Member Author:

i'm not sure what the reason is. i'll check and let u know

@jreback jreback mentioned this pull request Jun 24, 2013
13 tasks
@cpcloud cpcloud closed this Jun 27, 2013
@cpcloud cpcloud reopened this Jun 28, 2013
@cpcloud (Member Author) commented Jun 28, 2013

@jreback on this clean up i have the failing tests skip if they raise an AttributeError. i feel a little weird about that...how should i handle this error? it seems hard to predict, since lxml will return but then not have any html...how to test for that? could return nan if that happens, or raise...

@jreback (Contributor) commented Jun 28, 2013

I would have an exception fail them. I guess you can't really 'validate' these consistently. I think these return a list, so a len(0) list is valid then; only complete the rest of the test if you have len > 0. Maybe I would print a warning (in the test) if this is the case, so that when running in the command line, we know this has 'failed'.

AssertionWarning? a little bit cheesy...but what can you do
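jreback's suggestion could be sketched like this (`check_scrape` and the `'strike'` field are hypothetical stand-ins for the real test helpers):

```python
import warnings


def check_scrape(fetch):
    """Run detailed assertions only when the scrape returned rows;
    emit a warning rather than a hard failure when it came back empty."""
    rows = fetch()
    if len(rows) == 0:
        warnings.warn("scrape returned no rows; skipping detailed checks")
        return False
    # hypothetical field check standing in for the real assertions
    assert all('strike' in row for row in rows)
    return True
```

An empty list is treated as "valid but unverifiable", so flaky remote data shows up as a visible warning on the command line instead of a spurious test failure.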

@cpcloud (Member Author) commented Jun 28, 2013

hold on maybe there's something different between using expiry vs passing month/year explicitly. only the warning tests seem to be failing

@cpcloud (Member Author) commented Jun 28, 2013

nope that looks ok

@cpcloud (Member Author) commented Jun 28, 2013

after this passes i'm going to do one more rehash and push then im gonna merge

@cpcloud (Member Author) commented Jun 28, 2013

going to also run tox a few times in a row to see if i can get the warnings to show up

@cpcloud (Member Author) commented Jun 28, 2013

@jreback any objections here?

@cpcloud (Member Author) commented Jun 28, 2013

kind of a big overhaul, but badly needed

@jreback (Contributor) commented Jun 28, 2013

looks good

cpcloud added a commit that referenced this pull request Jun 28, 2013
@cpcloud cpcloud merged commit 3b08632 into pandas-dev:master Jun 28, 2013
@cpcloud (Member Author) commented Jun 28, 2013

round 42 on this...here we go.

@cpcloud cpcloud deleted the data-dot-py-cleanup branch June 28, 2013 23:06
@woodb commented Jun 29, 2013

I picked a hell of a time to decide to make a minor change to data.py! Things are moving quickly 👍 I'll have to check back on #4044 once the dust has settled

@cpcloud (Member Author) commented Jun 29, 2013

sorry man there's just a bunch of stuff that needed to essentially be rewritten and network stuff cleared up (still some issues there)
