API: disallow duplicate level names #18882

toobaz · 2017-12-20T20:53:23Z

closes Problems when accessing MultiIndex level with duplicated level names #18872
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

jbrockmendel · 2017-12-20T21:00:36Z

+1 on the idea. Some cases to think about: pd.NaT, np.nan, anything else where x != x

toobaz · 2017-12-21T07:21:38Z

anything else where x != x

I see only three options:

we don't do anything: the user just knows that passing np.nan as level name is stupid because np.nan != np.nan
we build (and keep updated) an Index out of level names, and use it to access levels with the same rules with which you access items in a Series
just like None, np.nan and friends can be used as level names and can be duplicated but cannot be used to refer to the levels (e.g. get_level_values(np.nan) will immediately raise). We describe this behavior in the docstring of get_level_values() and in all other methods which accept a level name, we state "see get_level_values()"

The current PR does 1., I would be OK with 3., while I'm skeptical that 2. is worth the effort.

(If we go for 3, I would leave it to a separate PR)

jreback · 2017-12-21T14:11:40Z

pandas/tests/frame/test_reshape.py

@@ -560,16 +560,6 @@ def test_unstack_dtypes(self):
            assert left.shape == (3, 2)
            tm.assert_frame_equal(left, right)

-    def test_unstack_non_unique_index_names(self):


what does this do now?

It raises when you try to create the MultiIndex with duplicated name.

jreback · 2017-12-21T14:12:38Z

pandas/tests/indexes/test_multi.py

+                      [[0, 1]] * 3, names=names)
+
+        # With .rename()
+        mi = pd.MultiIndex.from_product([[0, 1]] * 3)


can you check the error messages here, use tm.assert_raises_regex

jreback · 2017-12-21T14:13:48Z

pandas/tests/series/test_alter_axes.py

@@ -214,17 +214,6 @@ def test_reorder_levels(self):
        expected = Series(np.arange(6), index=e_idx)
        assert_series_equal(result, expected)

-        result = s.reorder_levels([0, 0, 0])


can you add a test that raises

It is the new test_set_index_duplicate_names (the constructor itself raises).

jreback · 2017-12-21T17:38:28Z

doc/source/whatsnew/v0.22.0.txt

@@ -179,6 +179,7 @@ Other API Changes
 - A :class:`Series` of ``dtype=category`` constructed from an empty ``dict`` will now have categories of ``dtype=object`` rather than ``dtype=float64``, consistently with the case in which an empty list is passed (:issue:`18515`)
 - ``NaT`` division with :class:`datetime.timedelta` will now return ``NaN`` instead of raising (:issue:`17876`)
 - All-NaN levels in a ``MultiIndex`` are now assigned ``float`` rather than ``object`` dtype, promoting consistency with ``Index`` (:issue:`17929`).
+- Levels names of a ``MultiIndex`` (when not None) are now required to be unique: trying to create a ``MultiIndex`` with repeated names will raise a ``ValueError`` (:issue:`18872`)


rebase on master. can you move to 0.23 (docs were renamed), prob easiest to just check this file from master and past in new one

codecov · 2017-12-22T15:34:57Z

Codecov Report

Merging #18882 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18882      +/-   ##
==========================================
+ Coverage   91.58%   91.58%   +<.01%     
==========================================
  Files         150      150              
  Lines       48967    48962       -5     
==========================================
- Hits        44846    44842       -4     
+ Misses       4121     4120       -1

Flag	Coverage Δ
#multiple	`89.94% <100%> (ø)`	⬆️
#single	`41.73% <60%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/reshape/reshape.py	`100% <ø> (ø)`	⬆️
pandas/core/indexes/multi.py	`96.22% <100%> (-0.08%)`	⬇️
pandas/util/testing.py	`84.9% <0%> (+0.21%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4883a43...53cb26c. Read the comment docs.

toobaz · 2017-12-22T15:54:45Z

rebased, ping

jreback

lgtm. minor request on the error message. ping on green.

jreback · 2017-12-23T20:14:26Z

pandas/core/indexes/multi.py


        # set the name
        for l, name in zip(level, names):
+            if name is not None and name in used:
+                raise ValueError("Duplicated level name: {}.".format(name))


can you enumerate on this and report the level number where the error is as well
e.g. level 2, duplicate name 'foo' or somesuch

pep8speaks · 2017-12-23T23:23:24Z

Hello @toobaz! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 29, 2017 at 09:15 Hours UTC

closes pandas-dev#18872

toobaz · 2017-12-29T10:59:38Z

@jreback ping

jreback · 2017-12-29T14:34:20Z

thanks!

closes pandas-dev#18872

TomAugspurger · 2018-01-01T17:37:29Z

FYI, this is causing failures on dask due to operations like:

In [1]: import dask.dataframe as dd

In [2]: import pandas as pd

In [3]:     pdf = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
   ...:                         'b': [4, 5, 6, 3, 2, 1, 0, 0, 0]},
   ...:                        index=[0, 1, 3, 5, 6, 8, 9, 9, 9]).set_index("a")
   ...:
   ...:

In [4]: pdf.groupby(pdf.index).apply(lambda x: x.b)
Out[4]:
a  a
1  1    4
2  2    5
3  3    6
4  4    3
5  5    2
6  6    1
7  7    0
8  8    0
9  9    0
Name: b, dtype: int64

essentially, grouping by an index and doing a .apply that returns a Series whose index is the same as the original (at least the same name). This seems like a fairly common use-case.

I think we should make this a warning for now and and raise later.

Pandas 0.23 is disallowing duplicate names in MultiIndexes. This adjusts a test that relied on that behavior, and `groupby().nunique` which produced it as a by-product. Closes dask#3039 xref pandas-dev/pandas#18882

* COMPAT: Pandas 0.23 duplicate names in MI Pandas 0.23 is disallowing duplicate names in MultiIndexes. This adjusts a test that relied on that behavior, and `groupby().nunique` which produced it as a by-product. Closes #3039 xref pandas-dev/pandas#18882 * Cleanup renaming

toobaz force-pushed the dup_lev_names_18872 branch from a3361a2 to e23fa82 Compare December 21, 2017 08:06

jreback added API Design MultiIndex labels Dec 21, 2017

jreback requested changes Dec 21, 2017

View reviewed changes

toobaz force-pushed the dup_lev_names_18872 branch 3 times, most recently from 5503b34 to 0044bbe Compare December 22, 2017 11:50

jreback requested changes Dec 23, 2017

View reviewed changes

toobaz force-pushed the dup_lev_names_18872 branch from 0044bbe to a1db0e0 Compare December 23, 2017 23:23

toobaz force-pushed the dup_lev_names_18872 branch from a1db0e0 to 33aa1f3 Compare December 23, 2017 23:24

API: disallow duplicate level names

53cb26c

closes pandas-dev#18872

toobaz force-pushed the dup_lev_names_18872 branch from 33aa1f3 to 53cb26c Compare December 29, 2017 09:15

jreback added this to the 0.23.0 milestone Dec 29, 2017

jreback approved these changes Dec 29, 2017

View reviewed changes

jreback merged commit 7958018 into pandas-dev:master Dec 29, 2017

toobaz deleted the dup_lev_names_18872 branch December 29, 2017 14:45

TomAugspurger mentioned this pull request Jan 1, 2018

CI failures with pandas master dask/dask#3039

Closed

hexgnu pushed a commit to hexgnu/pandas that referenced this pull request Jan 1, 2018

API: disallow duplicate level names (pandas-dev#18882)

01e4216

closes pandas-dev#18872

This was referenced Jan 1, 2018

Warn on duplicate names in MI? #19029

Closed

COMPAT: Pandas 0.23 duplicate names in MI dask/dask#3041

Merged

J535D165 mentioned this pull request Feb 8, 2018

Problems duplicated level names for MultiIndex objects J535D165/recordlinkage#55

Closed

5 tasks

toobaz mentioned this pull request Apr 24, 2018

BUG: fix to_latex() when using MultiIndex with NaN in (#14249) (v2) #20797

Merged

7 tasks

jorisvandenbossche mentioned this pull request May 16, 2018

groupby breaks when using duplicated level names #21075

Closed

jorisvandenbossche mentioned this pull request Jun 11, 2018

API: re-allow duplicate index level names #21423

Merged

pepicello mentioned this pull request Sep 5, 2019

Clarify that MultiIndex.set_levels() interprets passed values as new components of the .levels attribute #28294

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API: disallow duplicate level names #18882

API: disallow duplicate level names #18882

toobaz commented Dec 20, 2017

jbrockmendel commented Dec 20, 2017

toobaz commented Dec 21, 2017 •

edited

Loading

jreback Dec 21, 2017

toobaz Dec 21, 2017

jreback Dec 21, 2017

toobaz Dec 22, 2017

jreback Dec 21, 2017

toobaz Dec 21, 2017

jreback Dec 21, 2017

codecov bot commented Dec 22, 2017 •

edited

Loading

toobaz commented Dec 22, 2017

jreback left a comment

jreback Dec 23, 2017

pep8speaks commented Dec 23, 2017 •

edited

Loading

toobaz commented Dec 29, 2017

jreback commented Dec 29, 2017

TomAugspurger commented Jan 1, 2018 •

edited

Loading

API: disallow duplicate level names #18882

API: disallow duplicate level names #18882

Conversation

toobaz commented Dec 20, 2017

jbrockmendel commented Dec 20, 2017

toobaz commented Dec 21, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Dec 22, 2017 • edited Loading

Codecov Report

toobaz commented Dec 22, 2017

jreback left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pep8speaks commented Dec 23, 2017 • edited Loading

Comment last updated on December 29, 2017 at 09:15 Hours UTC

toobaz commented Dec 29, 2017

jreback commented Dec 29, 2017

TomAugspurger commented Jan 1, 2018 • edited Loading

toobaz commented Dec 21, 2017 •

edited

Loading

codecov bot commented Dec 22, 2017 •

edited

Loading

pep8speaks commented Dec 23, 2017 •

edited

Loading

TomAugspurger commented Jan 1, 2018 •

edited

Loading