DOC: Clarify how date_parser is called (GH9376) #9377

cmeeren · 2015-01-30T11:13:27Z

jreback · 2015-01-30T14:51:55Z

jorisvandenbossche · 2015-01-30T14:58:27Z

Yep, looking good. Although in reality it is a little more complex: there are actually three steps that are tried. The first and last are the ones you mentioned (vectorized with the columns as input, and scalar with rows), but there is also a vectorized call on the columns concatenated into one column. Adding that as well might make it too complex, though?

@cmeeren Do you also want to add a similar note to the tutorial docs? Somewhere here: http://pandas.pydata.org/pandas-docs/stable/io.html#date-parsing-functions That section could also use a real example I think (of a custom defined function, not of one imported from the io.date_converters module)

cmeeren · 2015-01-30T21:38:50Z

@jorisvandenbossche Just to be clear concerning the second try, where you say it concatenates the columns into one column: If you use two columns, one with values [2013, 2013, 2013] and one with values [1, 2, 3], will the second try pass the single argument [2013, 2013, 2013, 1, 2, 3]?

cmeeren · 2015-01-30T22:33:14Z

I have now mentioned the second way of calling date_parser (assuming my guess in the previous comment was correct) and added a description to the tutorial docs. I have not touched the example, since I have little experience with the Sphinx-IPython combo.

jreback · 2015-01-30T22:36:53Z

@cmeeren actually, I think you should mention pd.to_datetime() first. If you want to specify a format, read_csv does not currently have this implemented (it can, however, infer the format if infer_datetime_format=True, which is False by default).

So it is MUCH more performant to use pd.to_datetime() AFTER parsing if you have a single column that needs a format specification. IOW, you should NOT use date_parser if this is the case. (In reality read_csv should just do this, but it is an open issue).
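A minimal sketch of the "parse first, convert afterwards" approach described above. The CSV content, the column names, and the `%Y%m%d` format are made up for illustration; only `pd.read_csv` and `pd.to_datetime` come from this discussion.

```python
import io

import pandas as pd

# Hypothetical data: a single date column in a compact, known format.
csv_text = "date,value\n20150130,1\n20150131,2\n"

# Read the date column as plain strings (no parse_dates, no date_parser).
df = pd.read_csv(io.StringIO(csv_text), dtype={"date": str})

# Convert the whole column in one vectorized call with an explicit format.
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")
```

Because the format is given explicitly, the conversion runs on the whole column at once instead of calling a Python function per row.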

cmeeren · 2015-01-30T22:40:46Z

@jreback are you suggesting that I mention pd.to_datetime() as a possible function to use for date_parser?

jreback · 2015-01-30T22:55:51Z

no! this is in lieu of using date_parser entirely

cmeeren · 2015-01-31T11:33:26Z

I don't really understand the role of pd.to_datetime() in this scenario. Why do you suggest that I mention it in the documentation for date_parser? Could you give an example of how pd.to_datetime() might be used instead of date_parser?

jorisvandenbossche · 2015-01-31T15:43:44Z

The concatenation is like this in your example: ["2013 1", "2013 2", "2013 3"].
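To make the three attempts concrete, here is a rough pure-Python sketch of the fallback order as described in this thread. It is not the actual pandas code: `call_date_parser`, `parse_joined`, and `parse_row` are hypothetical names, and joining the values with a single space is assumed from the example above.

```python
from datetime import datetime


def call_date_parser(date_parser, *columns):
    """Hypothetical helper imitating the three attempts described in this thread."""
    # 1. Vectorized: pass one array per column.
    try:
        return list(date_parser(*columns))
    except Exception:
        pass
    # 2. Vectorized on the row-wise concatenation, e.g. ["2013 1", "2013 2", ...].
    joined = [" ".join(parts) for parts in zip(*columns)]
    try:
        return list(date_parser(joined))
    except Exception:
        pass
    # 3. Scalar: call once per row with that row's strings as arguments.
    return [date_parser(*row) for row in zip(*columns)]


def parse_joined(values):
    # Only works for attempt 2: a single list of "YEAR MONTH" strings.
    return [datetime.strptime(v, "%Y %m") for v in values]


def parse_row(year, month):
    # Only works for attempt 3: one row's strings at a time.
    return datetime(int(year), int(month), 1)


years = ["2013", "2013", "2013"]
months = ["1", "2", "3"]
from_joined = call_date_parser(parse_joined, years, months)
from_rows = call_date_parser(parse_row, years, months)
```

Both parsers end up producing the same three dates, but `parse_row` only succeeds on the last, slowest attempt, where it is called once per row.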

@jreback In many cases you are right about to_datetime, and I also think read_csv should take a date_format argument (#2586)

jorisvandenbossche · 2015-01-31T15:46:22Z

@jreback finishing my comment: indeed, in many cases to_datetime will be better, but I think you specifically want to use date_parser when you have multiple columns that have to be combined.

BTW, you can use pd.to_datetime as a function for date_parser, no? (if you still want to do it with a one-liner)

jreback · 2015-01-31T16:06:12Z

so the simple heuristic is this:

1. if you have multiple columns that need parsing, use parse_dates=[[....]].
2. try to infer the format with read_csv(..., infer_datetime_format=True)
3. if you have a format, then use date_parser=lambda x: pd.to_datetime(x, format=.....)
4. if you have a really non-standard format, finally use date_parser=.....

so a naked date_parser is ALWAYS the last resort (as unless it can handle a vectorized input, it's in python space).

jreback · 2015-01-31T16:11:45Z

@cmeeren so what I think we should do is update the doc-string a bit (what you have is prob good). then add a section to the docs in the date parsing section giving the relative list as above.

jorisvandenbossche · 2015-01-31T16:22:30Z

@jreback yes, that is a nice overview of steps to follow!

The full docs on the date parsing can use an overhaul. It is now scattered a bit:

where the first is not adjacent to the other three. I would have just one section with some subsections.

@cmeeren If you want, you can certainly try to tackle this! Otherwise, we just merge this as is (as it is correct information) and open a new issue for it.

cmeeren · 2015-01-31T18:16:12Z

First I'll correct things based on the recent feedback here. The information I added as it currently stands is not correct (specifically regarding the second "concatenation" call).

cmeeren · 2015-01-31T18:47:56Z

I added the list provided by @jreback. Please check the PR diff now and see if the information is good.

jorisvandenbossche · 2015-01-31T22:24:11Z

doc/source/io.rst

+an exception is raised, the next one is tried:
+
+1. ``date_parser`` is first called with one or more arrays as arguments,
+   as defined using `parse_dates` (e.g., ``date_parser(['2013', '2013']``, ``['1', '2'])``)


Here the ```` in the middle of the date_parser(..) example should be removed I think

jorisvandenbossche · 2015-01-31T22:29:18Z

This looks good to me!

Apart from the one small comment, can you also squash your commits into one?

cmeeren · 2015-02-01T11:03:58Z

I addressed the comment and I think I managed to squash the commits now.

jorisvandenbossche · 2015-02-01T14:58:43Z

Thanks a lot!


cmeeren mentioned this pull request Jan 30, 2015

read_csv: date_parser called once with arrays and then many times with strings (from each single row) as arguments #9376

Closed

jreback added the Docs label Jan 30, 2015

jreback added this to the 0.16.0 milestone Jan 30, 2015

jreback added IO CSV read_csv, to_csv Datetime Datetime data dtype labels Jan 31, 2015

jorisvandenbossche reviewed Jan 31, 2015

DOC: Clarify how date_parser is called (GH9376)

8504456

jorisvandenbossche added a commit that referenced this pull request Feb 1, 2015

Merge pull request #9377 from cmeeren/patch-1

ef48c6f

DOC: Clarify how date_parser is called (GH9376)

jorisvandenbossche merged commit ef48c6f into pandas-dev:master Feb 1, 2015

jreback mentioned this pull request Mar 5, 2015

datetime optimization #9594

Closed
