Skip to content

Data.Analysis : float/double values don't convert properly with . decimals in csv file #5652

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
chriss2401 opened this issue May 30, 2020 · 7 comments · Fixed by #6782
Closed
Labels
Microsoft.Data.Analysis All DataFrame related issues and PRs

Comments

@chriss2401
Copy link

If I have a csv dataframe that looks like this:

Date;Value
12-04-1989;20,7
03-11-1990;22,1

Then my two float values will properly be loaded when I call DataFrame.LoadCsv (with values of 20.7 and 22.1)

But if my separator is the default one (',') and my floats are with a dot instead, they will get ignored and I will get 207 and 221 as values.

The issue comes from here :
https://github.com/dotnet/corefxlab/blob/master/src/Microsoft.Data.Analysis/DataFrame.cs#L488

Since if you write object value = Convert.ChangeType("20.7", typeof(double)); you will get 207 as a result.

@chriss2401
Copy link
Author

Apparently you need to add CultureInfo.InvariantCulture to the function if your double/float has dots instead of commas:

https://stackoverflow.com/questions/7725020/how-to-use-convert-changetype-when-conversiontype-is-decimal-and-input-is-40

So a potential fix could be to check whether the type of the column is a double/float and call ChangeType in the right way depending on whether the value(s) is described with dots or commas. I can open a branch for this if the maintainers agree with this approach.

@eerhardt
Copy link
Member

eerhardt commented Jun 5, 2020

So a potential fix could be to check whether the type of the column is a double/float and call ChangeType in the right way depending on whether the value(s) is described with dots or commas. I can open a branch for this if the maintainers agree with this approach.

We may also think about accepting a CultureInfo in the public API. That way the caller can specify what culture to use to parse numbers. The caller is going to know best whether they want CurrentCulture or InvariantCulture.

@chriss2401 - if you have a fix and test cases, feel free to send a PR and @pgovind and I can take a look.

Note also dotnet/corefxlab#2927 is re-doing how LoadCsv is implemented. So we may want to consider that as well.

@chriss2401
Copy link
Author

@eerhardt Good idea. I can implement this in a couple of days unless you guys decide to add it to dotnet/corefxlab#2927 , which in that case I can just close the issue.

@pgovind
Copy link

pgovind commented Jun 8, 2020

I think taking a CultureInfo in the public API is the right approach. @chriss2401 : If you have a fix and unit tests, I don't mind accepting it into either DataFrame.LoadCsv or dotnet/corefxlab#2927 (feel free to push a commit to that PR). If you make a new PR, I'll port your fix to dotnet/corefxlab#2927 when I go through there next :)

@chriss2401
Copy link
Author

Sounds good @pgovind , I'll push this and dotnet/corefxlab#2902 in dotnet/corefxlab#2927 in the next couple of days. I think I will clean up some duplicate code I saw as well that I saw in that PR :)

@pgovind pgovind transferred this issue from dotnet/corefxlab Mar 6, 2021
@pgovind pgovind added the Microsoft.Data.Analysis All DataFrame related issues and PRs label Mar 6, 2021
@Oblomoff
Copy link

Will this ever be fixed? For people outside of the US, the DataFrame.LoadCsv method doesn't exist in practice.

@chriss2401
Copy link
Author

I wouldn't mind re-taking a look at this at some point, just don't know exactly when that would be :)

@ghost ghost added the in-pr label Aug 1, 2023
@ghost ghost removed the in-pr label Sep 1, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Oct 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Microsoft.Data.Analysis All DataFrame related issues and PRs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants