DataFrame.from_dict(OrderedDict(items),...) not a proper replacement for from_items(items,...) #22705

Closed
PatrickDRusk opened this issue Sep 14, 2018 · 2 comments
Labels
Constructors (Series/DataFrame/Index/pd.array Constructors), Error Reporting (Incorrect or improved errors from pandas)

PatrickDRusk commented Sep 14, 2018

Code Sample

import pandas
from collections import OrderedDict
def test_from_dict_replacing_from_items_with_duplicates():
    rows = [(1, (2,)), (1, (2,))]
    df1 = pandas.DataFrame.from_items(rows, columns=('a', ), orient='index')
    df2 = pandas.DataFrame.from_dict(OrderedDict(rows), columns=('a', ), orient='index')
    pandas.testing.assert_frame_equal(df1, df2)

Problem description

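The deprecation warning in 0.23.4 says that from_items(items, ...) should be changed to from_dict(OrderedDict(items), ...). But that doesn't work when the items contain duplicate keys: OrderedDict keeps only one entry per key, so rows that from_items() would have kept under a duplicated index label are silently dropped. The test above fails for that reason.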
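The loss happens before pandas is involved at all. A minimal sketch using only the standard library (the variable names are illustrative):

```python
from collections import OrderedDict

# Two items that share the key 1, as in the failing test.
rows = [(1, (2,)), (1, (2,))]
as_dict = OrderedDict(rows)

# A mapping holds one entry per key, so the second pair overwrites the first.
print(len(rows), len(as_dict))  # prints "2 1"
```

Anything built from as_dict can therefore have at most one row per distinct key.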

This makes the migration burdensome: developers would have to check every place they use from_items() for possible duplicate keys before changing it to from_dict().

It might be worth noting that the columns parameter was not allowed on from_dict() prior to 0.23, and that would further complicate adapting code away from from_items().

I recommend against this deprecation.

Expected Output

The above test would be expected to succeed.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.8.0
pip: 18.0
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: 0.9.2
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

PatrickDRusk commented Sep 24, 2018

If this deprecation is going to happen, the warning message should probably change its recommendation to something more reliable. I believe this would work:

pandas.DataFrame([i[1] for i in items], index=[i[0] for i in items], columns=columns)
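A runnable sketch of that suggestion, for pandas versions where from_items() is no longer available. The helper name frame_from_items is hypothetical (it is not a pandas API), and it only covers the orient='index' case discussed in this issue:

```python
from collections import OrderedDict

import pandas as pd


def frame_from_items(items, columns):
    """Hypothetical helper: one row per (key, values) pair, duplicates kept."""
    items = list(items)  # allow generators; we iterate twice below
    return pd.DataFrame(
        [values for _, values in items],
        index=[key for key, _ in items],
        columns=list(columns),
    )


rows = [(1, (2,)), (1, (2,))]

# Plain constructor: both rows survive under the duplicated index label 1.
kept = frame_from_items(rows, columns=("a",))

# Recommendation from the deprecation warning: the duplicate key collapses.
collapsed = pd.DataFrame.from_dict(OrderedDict(rows), orient="index", columns=["a"])

print(kept.shape, collapsed.shape)
```

With the rows from the original test, kept has two rows and collapsed has one, which is the mismatch that assert_frame_equal reports.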

@mroeschke mroeschke added the Error Reporting Incorrect or improved errors from pandas label Jan 13, 2019
@jbrockmendel jbrockmendel added the Constructors Series/DataFrame/Index/pd.array Constructors label Jul 23, 2019
@mroeschke
Member

Looks like from_items has already been removed. Not sure if there are any additional tasks to be done here. Closing.
