Skip to content

BUG: DataFrame.to_dict when orient=index data loss #22810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Oct 11, 2018
Merged

BUG: DataFrame.to_dict when orient=index data loss #22810

merged 18 commits into from
Oct 11, 2018

Conversation

jameswinegar
Copy link
Contributor

@pep8speaks
Copy link

pep8speaks commented Sep 23, 2018

Hello @jameswinegar! Thanks for updating the PR.

Comment last updated on September 23, 2018 at 02:01 Hours UTC

@codecov
Copy link

codecov bot commented Sep 23, 2018

Codecov Report

Merging #22810 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22810      +/-   ##
==========================================
+ Coverage   92.19%   92.19%   +<.01%     
==========================================
  Files         169      169              
  Lines       50873    50913      +40     
==========================================
+ Hits        46904    46941      +37     
- Misses       3969     3972       +3
Flag Coverage Δ
#multiple 90.61% <100%> (ø) ⬆️
#single 42.3% <0%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.11% <100%> (-0.1%) ⬇️
pandas/core/internals/concat.py 98% <0%> (-0.38%) ⬇️
pandas/core/indexes/period.py 93.31% <0%> (-0.35%) ⬇️
pandas/core/indexes/datetimes.py 95.74% <0%> (-0.04%) ⬇️
pandas/core/indexing.py 93.87% <0%> (-0.01%) ⬇️
pandas/io/packers.py 88.04% <0%> (ø) ⬆️
pandas/core/dtypes/base.py 100% <0%> (ø) ⬆️
pandas/core/generic.py 96.65% <0%> (ø) ⬆️
pandas/core/arrays/datetimelike.py 95.56% <0%> (ø) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 904af03...6aec074. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a note in the api changes section about this for 0.24.0

@@ -1163,8 +1163,13 @@ def to_dict(self, orient='dict', into=dict):
for k, v in zip(self.columns, np.atleast_1d(row)))
for row in self.values]
elif orient.lower().startswith('i'):
return into_c((t[0], dict(zip(self.columns, t[1:])))
for t in self.itertuples())
if self.index.unique().size == self.index.size:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just use self.index.is_unique

if self.index.unique().size == self.index.size:
return into_c((t[0], dict(zip(self.columns, t[1:])))
for t in self.itertuples())
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for an else!

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Error Reporting Incorrect or improved errors from pandas labels Sep 23, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this affect any other orient types?

can you add some equiv tests for .to_json? (or is this affected?)

@jameswinegar
Copy link
Contributor Author

@jreback on .to_json #4376

@jameswinegar
Copy link
Contributor Author

@jreback Where are the API changes documentation?

@@ -191,6 +191,7 @@ Other Enhancements
- :meth:`Series.resample` and :meth:`DataFrame.resample` have gained the :meth:`Resampler.quantile` (:issue:`15023`).
- :meth:`Index.to_frame` now supports overriding column name(s) (:issue:`22580`).
- New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`).
- :func:`~DataFrame.to_dict` with orient on index and a non-unique index will raise a ValueError instead of losing data (:issue:`22801`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to other api changes

@jameswinegar
Copy link
Contributor Author

@jreback any more edits needed?

@@ -804,6 +804,7 @@ Reshaping
- Bug in :meth:`Series.replace` and meth:`DataFrame.replace` when dict is used as the ``to_replace`` value and one key in the dict is is another key's value, the results were inconsistent between using integer key and using string key (:issue:`20656`)
- Bug in :meth:`DataFrame.drop_duplicates` for empty ``DataFrame`` which incorrectly raises an error (:issue:`20516`)
- Bug in :func:`pandas.wide_to_long` when a string is passed to the stubnames argument and a column name is a substring of that stubname (:issue:`22468`)
- Bug in :func:`DataFrame.to_dict` raises ``ValueError`` when used with ``orient='index'`` and a non-unique index instead of losing data (:issue:`22801`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to API Breaking changes

@jameswinegar
Copy link
Contributor Author

@jreback is that move good?

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm @jreback

@@ -847,7 +847,7 @@ Reshaping
- Bug in :meth:`Series.replace` and meth:`DataFrame.replace` when dict is used as the ``to_replace`` value and one key in the dict is is another key's value, the results were inconsistent between using integer key and using string key (:issue:`20656`)
- Bug in :meth:`DataFrame.drop_duplicates` for empty ``DataFrame`` which incorrectly raises an error (:issue:`20516`)
- Bug in :func:`pandas.wide_to_long` when a string is passed to the stubnames argument and a column name is a substring of that stubname (:issue:`22468`)
-
- Bug in :func:`merge` when merging ``datetime64[ns, tz]`` data that contained a DST transition (:issue:`18885`)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved merge conflict

@jreback jreback added this to the 0.24.0 milestone Oct 7, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase and ping on green.

@jameswinegar
Copy link
Contributor Author

@jreback fixed. long wait last night for circleci


Bug in :func:`DataFrame.to_dict` raises ``ValueError`` when used with
``orient='index'`` and a non-unique index instead of losing data (:issue:`22801`)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a mini-example here (just showing what kind of data frame will raise). the prior example was too long, but I think we need something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an ipython block that will raise the error. @jreback

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right but now the docs will not work. so, you can do this as an ipython block but you need an :okexcept: to handle the exception.

also show the df after defining it

Add example that will raise value error.

Bug in :func:`DataFrame.to_dict` raises ``ValueError`` when used with
``orient='index'`` and a non-unique index instead of losing data (:issue:`22801`)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right but now the docs will not work. so, you can do this as an ipython block but you need an :okexcept: to handle the exception.

also show the df after defining it

``orient='index'`` and a non-unique index instead of losing data (:issue:`22801`)

.. ipython:: python
:okexcept:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line this up as below, e.g.

.. ipython:: python
    :okexcept:

    df = ......

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more doc correction, ping on green.

@jreback jreback merged commit c8ce3d0 into pandas-dev:master Oct 11, 2018
@jreback
Copy link
Contributor

jreback commented Oct 11, 2018

thanks @jameswinegar

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame.to_dict can result in data loss when indexes are not unique
4 participants