cross section coercion with output iterating #12859

tsu-shiuan · 2016-04-11T12:10:21Z

I am trying to call the to_dict function on the following DataFrame:

import pandas as pd

data = {"a": [1,2,3,4,5], "b": [90,80,40,60,30]}

df = pd.DataFrame(data)

df.reset_index().to_dict("r")
[{'a': 1, 'b': 90, 'index': 0},
 {'a': 2, 'b': 80, 'index': 1},
 {'a': 3, 'b': 40, 'index': 2},
 {'a': 4, 'b': 60, 'index': 3},
 {'a': 5, 'b': 30, 'index': 4}]

However, my problem occurs if I first perform a float operation on the DataFrame, which turns the index values into floats:

(df*1.0).reset_index().to_dict("r")
[{'a': 1.0, 'b': 90.0, 'index': 0.0},
 {'a': 2.0, 'b': 80.0, 'index': 1.0},
 {'a': 3.0, 'b': 40.0, 'index': 2.0},
 {'a': 4.0, 'b': 60.0, 'index': 3.0},
 {'a': 5.0, 'b': 30.0, 'index': 4.0}]

Can anyone explain this behaviour, recommend a workaround, or confirm whether this is a pandas bug? None of the other outtypes in the to_dict method mutate the index like this.

I've replicated this on both pandas 0.14 and 0.18 (latest).

Many thanks!

Link to Stack Overflow: http://stackoverflow.com/questions/36548151/pandas-to-dict-changes-index-type-with-outtype-records


TomAugspurger · 2016-04-11T12:30:40Z

Nothing to do with the index; it's just the fact that you have any float dtypes in the data:

data = {"a": [1.0,2,3,4,5], "b": [90,80,40,60,30]}
df = pd.DataFrame(data)

In [19]: df.to_dict("records")
Out[19]:
[{'a': 1.0, 'b': 90.0},
 {'a': 2.0, 'b': 80.0},
 {'a': 3.0, 'b': 40.0},
 {'a': 4.0, 'b': 60.0},
 {'a': 5.0, 'b': 30.0}]

If you look at the code, we use DataFrame.values, which returns a single NumPy array, and a NumPy array must have a single dtype (float64 in this case).

We probably don't need to use .values here.
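A minimal sketch of the upcasting described above, using the data from this comment (the variable names `column_dtypes` and `arr` are just for illustration). It relies only on `DataFrame.dtypes` and `DataFrame.values`:

```python
import numpy as np
import pandas as pd

# Same data as in the comment above: "a" is float64, "b" is int64.
df = pd.DataFrame({"a": [1.0, 2, 3, 4, 5], "b": [90, 80, 40, 60, 30]})

# The DataFrame itself keeps one dtype per column...
column_dtypes = df.dtypes.astype(str).to_dict()

# ...but .values must return a single 2-D NumPy array with one dtype,
# so the int64 column is upcast to float64 to sit alongside "a".
arr = df.values
print(arr.dtype)  # float64
print(arr[0])     # first row, with 90 now stored as 90.0
```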

tsu-shiuan · 2016-04-11T12:33:59Z

Thanks for your response. Is there a possible workaround I can use in the meantime?

TomAugspurger · 2016-04-11T12:37:53Z

Something like

In [28]: [x._asdict() for x in df.itertuples()]
Out[28]:
[OrderedDict([('Index', 0), ('a', 1.0), ('b', 90)]),
 OrderedDict([('Index', 1), ('a', 2.0), ('b', 80)]),
 OrderedDict([('Index', 2), ('a', 3.0), ('b', 40)]),
 OrderedDict([('Index', 3), ('a', 4.0), ('b', 60)]),
 OrderedDict([('Index', 4), ('a', 5.0), ('b', 30)])]

That gives an OrderedDict per row via namedtuple._asdict; you can write a dict comprehension if you want regular dicts.
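A minimal sketch of the dict-comprehension variant, assuming plain dicts are wanted and that the index should appear under the key `"index"` (that key name is an illustrative choice, not something pandas prescribes). `itertuples(name=None)` yields plain tuples with the index value first:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2, 3, 4, 5], "b": [90, 80, 40, 60, 30]})

# itertuples() takes values from each column separately rather than from
# one upcast 2-D array, so the int column "b" is not turned into floats.
keys = ["index", *df.columns]
records = [dict(zip(keys, row)) for row in df.itertuples(name=None)]

# Equivalent, starting from the namedtuple version shown above
# (the index field is named "Index" there).
records_from_namedtuples = [
    {key: value for key, value in row._asdict().items()} for row in df.itertuples()
]
```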

tsu-shiuan · 2016-04-11T12:40:34Z

Thanks :)

jreback · 2016-04-11T12:43:09Z

Though one could argue that this result is correct, as we don't support mixed int/float types when taking a cross-section. In other words:

In [10]: (df*1.0).reset_index().iloc[1]
Out[10]: 
index     1.0
a         2.0
b        80.0
Name: 1, dtype: float64

This is somewhat related to #12532, meaning that we should be iterating directly over (which already does the proper coercion), rather than doing a specific coercion in .to_dict().
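A minimal sketch of the cross-section behaviour described above, using two small hypothetical frames. A row returned by `iloc` is a Series, and a Series holds a single dtype:

```python
import pandas as pd

numeric = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})  # int64 + float64
mixed = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})     # int64 + object

# int64 and float64 share the common dtype float64, so the int is upcast.
row_numeric = numeric.iloc[1]
print(row_numeric.dtype)  # float64

# int64 and str can only share object, so the integer is not converted to a float.
row_mixed = mixed.iloc[1]
print(row_mixed.dtype)  # object
```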

jreback · 2016-04-11T12:44:58Z

Going to mark this as an API issue that needs discussion. Changing this correctly would actually be a fairly large change (though to be honest, I think the current behavior is fine).

makmanalp · 2017-05-25T23:01:28Z

Note for future seekers: I'm trying to combine multiple pandas objects into one nested JSON structure.

Since to_json doesn't work in this case (manipulating JSON strings is hard), you might try to_dict(orient="records"), combine the results of the to_dict() calls into a bigger object, and run json.dumps on that. But because of this bug, you can't do that without screwing up the types of everything.

So then you might try @TomAugspurger's solution, but you might find that it doesn't convert NumPy types to Python types the way to_dict() does, which makes json.dumps() fail.
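For context on why `json.dumps()` fails: the standard-library `json` module does not know how to serialize NumPy integer scalars. A minimal sketch, where `numpy_default` is a hypothetical helper passed through the real `default=` parameter of `json.dumps`:

```python
import json

import numpy as np

value = np.int64(90)

# np.int64 is not a subclass of Python int, so json.dumps rejects it.
try:
    json.dumps({"b": value})
    failed = False
except TypeError:
    failed = True

# np.float64 does subclass Python float, so it serializes without help.
float_json = json.dumps({"a": np.float64(1.0)})


def numpy_default(obj):
    """Hypothetical fallback: convert NumPy scalars to the matching Python type."""
    if isinstance(obj, np.generic):
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


int_json = json.dumps({"b": value}, default=numpy_default)
```

json.dumps only calls the `default` hook for objects it cannot already serialize, so ordinary Python values pass through unchanged.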

My workaround is to call to_json(), which gives you a JSON string with the correct types, then call json.loads() on that to get the corresponding Python objects. You can then put those together however you want (e.g. big_obj = {"a": df_a_json, "b": df_b_json}) and run json.dumps on the whole thing. It's roundabout, but it's the closest general solution I found without having to muck about with type conversions myself!

import json


def to_records(df):
    """Replacement for pandas' to_dict(orient="records"), which upcasts
    ints to floats when other float columns are present.

    https://github.com/pandas-dev/pandas/issues/12859
    """
    # to_json respects each column's dtype, so ints stay ints in the JSON.
    return json.loads(df.to_json(orient="records"))
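A usage sketch of the helper above for the nested-JSON case, with two small hypothetical frames `df_a` and `df_b`; the helper is restated so the snippet runs on its own:

```python
import json

import pandas as pd


def to_records(df):
    # Same approach as above: serialize with to_json, then parse back into Python objects.
    return json.loads(df.to_json(orient="records"))


df_a = pd.DataFrame({"a": [1.0, 2.0], "b": [90, 80]})
df_b = pd.DataFrame({"x": [1, 2]})

# Combine the per-frame records into one nested structure, then dump it once.
big_obj = {"a": to_records(df_a), "b": to_records(df_b)}
payload = json.dumps(big_obj)
```

Because every value has already round-tripped through JSON, the final json.dumps call only sees plain Python types.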

TomAugspurger · 2017-05-26T14:31:35Z

👍 There is an issue somewhere about .to_dict using Python types.

makmanalp · 2017-05-26T14:42:23Z

Seems like #16048 and #13258

gosuto-inzasheru · 2020-12-22T11:44:41Z

A really good workaround can be found here: https://stackoverflow.com/a/31375241/1838257

df = pd.DataFrame({'INTS': [1, 2, 3], 'FLOATS': [1., 2., 3.]})

df.iloc[0].to_dict()

{'INTS': 1.0, 'FLOATS': 1.0}

Using the workaround:

df.astype('object').iloc[0].to_dict()

{'INTS': 1, 'FLOATS': 1.0}

Could this be implemented in a flag of .to_dict maybe?
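The same `astype('object')` trick can be applied to the whole frame before calling `to_dict(orient='records')`. A minimal sketch; this is not an existing `.to_dict` flag, and converting to object dtype creates a copy of the data, so it may be slow for large frames:

```python
import pandas as pd

df = pd.DataFrame({"INTS": [1, 2, 3], "FLOATS": [1.0, 2.0, 3.0]})

# With object dtype there is no common numeric dtype to upcast to,
# so each value keeps the type from its original column.
records = df.astype("object").to_dict(orient="records")
```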

mroeschke · 2021-04-23T03:08:14Z

The original example now has the correct behavior: the index values remain integers. Could use a test.

In [26]: (df*1.0).reset_index().to_dict("records")
Out[26]:
[{'index': 0, 'a': 1.0, 'b': 90.0},
 {'index': 1, 'a': 2.0, 'b': 80.0},
 {'index': 2, 'a': 3.0, 'b': 40.0},
 {'index': 3, 'a': 4.0, 'b': 60.0},
 {'index': 4, 'a': 5.0, 'b': 30.0}]
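A regression test along these lines would cover the original report. This is a hypothetical sketch in pytest style; the test name is made up, and it is not necessarily the test that was eventually added:

```python
import pandas as pd


def test_to_dict_records_keeps_int_index_with_float_columns():
    # Reproduces the original report: float data plus an integer index column.
    df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [90, 80, 40, 60, 30]})
    records = (df * 1.0).reset_index().to_dict("records")

    assert records[0] == {"index": 0, "a": 1.0, "b": 90.0}
    # The "index" values should stay integers, not be upcast to floats.
    assert all(isinstance(r["index"], int) for r in records)
    assert all(isinstance(r["a"], float) for r in records)
```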

TomAugspurger added the Difficulty Novice and Dtype Conversions (Unexpected or buggy dtype conversions) labels Apr 11, 2016

TomAugspurger added this to the 0.19.0 milestone Apr 11, 2016

jreback modified the milestones: No action, 0.19.0 Apr 11, 2016

jreback changed the title ~~to_dict with outtype='records' mutates the index type~~ cross section coercion with output iterating Apr 11, 2016

jreback added the API Design and Needs Discussion (Requires discussion from core team before further action) labels Apr 11, 2016

TomAugspurger mentioned this issue Jul 27, 2016

BUG: df.to_dict('record') casts ints to floats #13817

Closed

jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017

jreback mentioned this issue May 25, 2017

Indexing into a dataframe re-casts / "forgets" dtypes #16508

Closed

TomAugspurger added the good first issue label Oct 11, 2017

jreback removed the Difficulty Novice label Dec 15, 2017

jbrockmendel removed the Effort Low label Oct 21, 2019

mroeschke added the Needs Tests (Unit test(s) needed to prevent regressions) label and removed the API Design, Dtype Conversions (Unexpected or buggy dtype conversions), and Needs Discussion (Requires discussion from core team before further action) labels Apr 23, 2021

mroeschke mentioned this issue May 12, 2021

TST: Add test for old issues #41431

Merged


jreback closed this as completed in #41431 May 12, 2021

corbin-chris mentioned this issue Feb 2, 2024

Variant db additions eastgenomics/eris#86

Merged
