BUG: Patch read_csv NA values behaviour #14751

gfyoung · 2016-11-26T06:03:07Z

Patches the following behaviour when na_values is passed in as a dictionary:

1. Prevent aliasing in case na_values was defined in a broader scope.
2. Respect column indices as keys when doing NA conversions.

Closes #14203.
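
For illustration, the two fixes described above can be sketched in plain Python. This is a minimal sketch and not the code in pandas/io/parsers.py: the helper name `resolve_na_values`, its arguments, and the sample data are all hypothetical, and the rule used for integer keys is an assumption about the intended behaviour.

```python
def resolve_na_values(na_values, column_names):
    """Return a new dict of NA markers keyed by column name.

    Hypothetical helper, not part of pandas. It only illustrates the
    two behaviours described in this PR.
    """
    resolved = {}
    for key, values in na_values.items():
        # Assumed rule: an integer key that is not itself a column name
        # is treated as a positional column index.
        if isinstance(key, int) and key not in column_names:
            key = column_names[key]
        # Wrap scalars, then build a fresh set so that the caller's
        # dict and its values are never mutated (no aliasing).
        if isinstance(values, (str, int, float)):
            values = [values]
        resolved[key] = set(values)
    return resolved


# A dict defined in a broader scope, mixing an index key and a name key.
user_na_values = {0: ["missing"], "b": "n/a"}
columns = ["a", "b", "c"]
resolved = resolve_na_values(user_na_values, columns)
```

The caller's `user_na_values` is left untouched, and the key `0` ends up attached to column `"a"`.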

codecov-io · 2016-11-26T16:04:38Z

Current coverage is 85.27% (diff: 100%)

No coverage report found for master at 033d345.

Powered by Codecov. Last update 033d345...0bd7531

gfyoung · 2016-11-27T04:14:34Z

Appveyor keeps failing because the build is taking too long, but I have no clue why that is the case (e.g. this build seemed to do just fine). I took a look at Travis to see if the builds take much longer there, but that is not the case.

gfyoung · 2016-12-06T16:36:38Z

@jreback , @sinhrks : Everything is green, so ready to merge if there are no concerns.

jreback · 2016-12-06T18:46:01Z

pandas/io/parsers.py

+        clean_na_values = {}
+        clean_na_fvalues = {}
+
+        if isinstance(self.na_values, dict):


I think it might be better to have a single function in parsers.pyx which does this (e.g. returns cleaned na_value, na_fvalue), rather than have duplicate code in the Python and C parsers.

Hmmm...perhaps, but the logic is very coupled with the class attributes of the C and Python parsers. In addition, the way the logic is applied (in bulk for Python, per iteration in C), merging the two would not be straightforward.

(in bulk for Python, per iteration in C),

what does this mean?

this should be done once

Look at the _get_na_list method here for the C engine. It gets called on every iteration to extract the relevant na_values for the particular column in question.

that's only called on column conversion (once).

There really is no difference. In C, all we do is get the na_values for the given column, "clean" them, and then use them to parse the rest of the column. In Python, we get the na_values for all relevant columns, clean them, and then use the stored dictionary later on.

just trying to reduce the amount of code
having a common method for all parsers is better than having specific things for each

No, I perfectly understand. I just haven't found a clean way to do it yet, and it seems like it would be best to do as a follow-up if that works.

oh ok

let's do that then

lgtm

Great. I'll ping again when everything goes green.
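
The trade-off discussed in this thread can be sketched as follows. This is a simplified illustration, assuming a shared helper that returns both the string and float forms of the NA markers. The names `clean_na_values` and `na_for_column` and the sample data are hypothetical and do not come from parsers.py or parsers.pyx.

```python
def clean_na_values(values):
    """Split raw NA markers into a set of strings and a set of floats.

    Hypothetical stand-in for the shared helper suggested in review.
    """
    na_values = set()
    na_fvalues = set()
    for value in values:
        na_values.add(str(value))
        try:
            na_fvalues.add(float(value))
        except (TypeError, ValueError):
            # Not numeric: only the string form is kept.
            pass
    return na_values, na_fvalues


raw = {"a": ["missing", "-1"], "b": [999]}

# "Bulk" style: clean every column up front and look the result up later.
bulk = {col: clean_na_values(vals) for col, vals in raw.items()}


# "Per column" style: clean only when a given column is converted.
def na_for_column(col):
    return clean_na_values(raw.get(col, []))
```

Both styles call the same helper, so the cleaning logic lives in one place and only the call site differs.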

gfyoung · 2016-12-16T04:55:29Z

@jreback : Everything is passing. Ready to merge if there are no other concerns.

jreback · 2016-12-16T23:32:42Z

thanks!

Patches the following behaviour when `na_values` is passed in as a dictionary:

1. Prevent aliasing in case `na_values` was defined in a broader scope.
2. Respect column indices as keys when doing NA conversions.

Closes pandas-dev#14203.

Author: gfyoung <[email protected]>

Closes pandas-dev#14751 from gfyoung/csv-na-values-patching and squashes the following commits:

cac422c [gfyoung] BUG: Respect column indices for dict-like na_values
1439c27 [gfyoung] BUG: Prevent aliasing of dict na_values

Patches the following behaviour when `na_values` is passed in as a dictionary:

1. Prevent aliasing in case `na_values` was defined in a broader scope.
2. Respect column indices as keys when doing NA conversions.

Closes pandas-dev#14203.

Author: gfyoung <[email protected]>

Closes pandas-dev#14751 from gfyoung/csv-na-values-patching and squashes the following commits:

cac422c [gfyoung] BUG: Respect column indices for dict-like na_values
1439c27 [gfyoung] BUG: Prevent aliasing of dict na_values

(cherry picked from commit dd8cba2)

gfyoung force-pushed the csv-na-values-patching branch 2 times, most recently from 8ebba67 to b9b0367 Compare November 26, 2016 22:31

gfyoung force-pushed the csv-na-values-patching branch 2 times, most recently from a2a5b63 to f086bac Compare November 27, 2016 04:24

sinhrks added Bug IO CSV read_csv, to_csv labels Nov 27, 2016

gfyoung force-pushed the csv-na-values-patching branch 2 times, most recently from 5bd7c10 to 0bd7531 Compare December 1, 2016 23:05

jreback reviewed Dec 6, 2016

gfyoung added 2 commits December 15, 2016 11:09

BUG: Prevent aliasing of dict na_values

1439c27

BUG: Respect column indices for dict-like na_values

cac422c

Closes pandas-dev#14203.

gfyoung force-pushed the csv-na-values-patching branch from 0bd7531 to cac422c Compare December 15, 2016 16:09

jreback modified the milestones: 0.20.0, 0.19.2 Dec 15, 2016

gfyoung changed the title ~~Patch read_csv NA values behaviour~~ BUG: Patch read_csv NA values behaviour Dec 15, 2016

jreback closed this in dd8cba2 Dec 16, 2016

gfyoung deleted the csv-na-values-patching branch December 17, 2016 01:08
