API: disallow duplicate level names #18882
+1 on the idea. Some cases to think about: pd.NaT, np.nan, anything else where x != x
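To make the `x != x` concern concrete, here is a minimal pure-Python sketch. It is not pandas code: `find_duplicate_names` is a hypothetical helper that uses the same kind of membership test (`name in used`) a simple duplicate check would rely on.

```python
import math


def find_duplicate_names(names):
    """Return the names (other than None) that occur more than once.

    Hypothetical helper for illustration only, not part of pandas.
    """
    used = []
    duplicates = []
    for name in names:
        if name is not None and name in used:
            duplicates.append(name)
        used.append(name)
    return duplicates


# NaN compares unequal to itself ...
assert math.nan != math.nan

# ... but ``in`` checks identity before equality, so the *same* NaN
# object appearing twice is still reported as a duplicate.
same_object = find_duplicate_names([math.nan, math.nan])

# Two distinct NaN objects are neither identical nor equal, so a
# membership-based check does not flag them.
distinct_objects = find_duplicate_names([float("nan"), float("nan")])
```

Whether names like these should count as duplicates is the open design question raised in the comment; the sketch only shows that a membership test treats the two situations differently.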
I see only three options:
The current PR does 1., I would be OK with 3., while I'm skeptical that 2. is worth the effort. (If we go for 3, I would leave it to a separate PR)
a3361a2 to e23fa82
@@ -560,16 +560,6 @@ def test_unstack_dtypes(self):
assert left.shape == (3, 2)
tm.assert_frame_equal(left, right)

def test_unstack_non_unique_index_names(self):
what does this do now?
It raises when you try to create the MultiIndex with a duplicated name.
[[0, 1]] * 3, names=names)

# With .rename()
mi = pd.MultiIndex.from_product([[0, 1]] * 3)
can you check the error messages here, use tm.assert_raises_regex
(done)
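For readers unfamiliar with the pattern requested above, the following is a rough standard-library sketch of what "check the error message with a regex" means. Both `reject_duplicates` and `raises_matching` are hypothetical names made up for this illustration; the PR itself uses pandas' own `tm.assert_raises_regex` helper, which is not reproduced here.

```python
import re


def reject_duplicates(names):
    # Hypothetical stand-in for a constructor that refuses repeated names.
    if len(set(names)) != len(names):
        raise ValueError("Duplicated level name found in {}".format(names))


def raises_matching(exc_type, pattern, func, *args):
    """Return True if func(*args) raises exc_type with a message matching pattern."""
    try:
        func(*args)
    except exc_type as err:
        return re.search(pattern, str(err)) is not None
    return False


# The call fails and the message matches the expected pattern.
matched = raises_matching(
    ValueError, "Duplicated level name", reject_duplicates, ["a", "b", "a"]
)

# Unique names raise nothing, so the check reports False.
not_raised = raises_matching(
    ValueError, "Duplicated level name", reject_duplicates, ["a", "b", "c"]
)
```

Matching on the message, rather than only on the exception type, guards against a test passing because some unrelated `ValueError` was raised.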
@@ -214,17 +214,6 @@ def test_reorder_levels(self):
expected = Series(np.arange(6), index=e_idx)
assert_series_equal(result, expected)

result = s.reorder_levels([0, 0, 0])
can you add a test that raises
It is the new test_set_index_duplicate_names (the constructor itself raises).
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -179,6 +179,7 @@ Other API Changes
- A :class:`Series` of ``dtype=category`` constructed from an empty ``dict`` will now have categories of ``dtype=object`` rather than ``dtype=float64``, consistently with the case in which an empty list is passed (:issue:`18515`)
- ``NaT`` division with :class:`datetime.timedelta` will now return ``NaN`` instead of raising (:issue:`17876`)
- All-NaN levels in a ``MultiIndex`` are now assigned ``float`` rather than ``object`` dtype, promoting consistency with ``Index`` (:issue:`17929`).
- Levels names of a ``MultiIndex`` (when not None) are now required to be unique: trying to create a ``MultiIndex`` with repeated names will raise a ``ValueError`` (:issue:`18872`)
rebase on master. can you move to 0.23 (docs were renamed), prob easiest to just check out this file from master and paste in the new one
5503b34 to 0044bbe
Codecov Report
@@ Coverage Diff @@
## master #18882 +/- ##
==========================================
+ Coverage 91.58% 91.58% +<.01%
==========================================
Files 150 150
Lines 48967 48962 -5
==========================================
- Hits 44846 44842 -4
+ Misses 4121 4120 -1
rebased, ping
lgtm. minor request on the error message. ping on green.
pandas/core/indexes/multi.py
Outdated
# set the name
for l, name in zip(level, names):
    if name is not None and name in used:
        raise ValueError("Duplicated level name: {}.".format(name))
can you enumerate on this and report the level number where the error is as well
e.g. level 2, duplicate name 'foo'
or somesuch
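One way the request above could look is sketched below. This is only an illustration under stated assumptions: `check_level_names` is a hypothetical standalone function, the message wording is invented here, and names are assumed to be hashable. It is not the code that was merged.

```python
def check_level_names(names):
    """Raise ValueError reporting the level at which a duplicate name appears.

    Hypothetical helper; ``None`` entries are ignored, so unnamed levels
    may repeat freely.
    """
    used = {}  # name -> number of the first level that used it
    for level, name in enumerate(names):
        if name is None:
            continue
        if name in used:
            raise ValueError(
                "Duplicated level name: {!r}, assigned to level {}, "
                "is already used for level {}.".format(name, level, used[name])
            )
        used[name] = level


try:
    check_level_names(["foo", "bar", "foo"])
except ValueError as err:
    message = str(err)
```

Tracking the first level that used each name lets the message point at both positions, which makes the conflict easier to locate when there are many levels.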
0044bbe to a1db0e0
Hello @toobaz! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on December 29, 2017 at 09:15 Hours UTC
a1db0e0 to 33aa1f3
33aa1f3 to 53cb26c
@jreback ping
thanks!
FYI, this is causing failures on dask due to operations like:
In [1]: import dask.dataframe as dd
In [2]: import pandas as pd
In [3]: pdf = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
...: 'b': [4, 5, 6, 3, 2, 1, 0, 0, 0]},
...: index=[0, 1, 3, 5, 6, 8, 9, 9, 9]).set_index("a")
...:
...:
In [4]: pdf.groupby(pdf.index).apply(lambda x: x.b)
Out[4]:
a a
1 1 4
2 2 5
3 3 6
4 4 3
5 5 2
6 6 1
7 7 0
8 8 0
9 9 0
Name: b, dtype: int64
essentially, grouping by an index and doing a
I think we should make this a warning for now and raise later.
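The "warn now, raise later" approach suggested here is a common deprecation pattern. A minimal sketch using only the standard `warnings` module follows; `check_names_with_deprecation` and its `strict` flag are hypothetical and do not correspond to any pandas API.

```python
import warnings


def check_names_with_deprecation(names, strict=False):
    """Warn about duplicate names, or raise when ``strict`` is True.

    Hypothetical transition helper: ``strict=False`` models the interim
    release, ``strict=True`` models the later release that raises.
    """
    seen = set()
    for name in names:
        if name is not None and name in seen:
            msg = "Duplicated level name: {!r}.".format(name)
            if strict:
                raise ValueError(msg)
            warnings.warn(
                msg + " This will raise in a future version.",
                FutureWarning,
                stacklevel=2,
            )
        seen.add(name)


# During the transition period, duplicates only produce a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_names_with_deprecation(["a", "a"])
```

With this kind of staging, downstream code that currently produces duplicate names keeps working for a release while its maintainers see the warning and adjust.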
Pandas 0.23 is disallowing duplicate names in MultiIndexes. This adjusts a test that relied on that behavior, and `groupby().nunique` which produced it as a by-product. Closes dask#3039 xref pandas-dev/pandas#18882
* COMPAT: Pandas 0.23 duplicate names in MI
  Pandas 0.23 is disallowing duplicate names in MultiIndexes. This adjusts a test that relied on that behavior, and `groupby().nunique` which produced it as a by-product. Closes #3039 xref pandas-dev/pandas#18882
* Cleanup renaming
git diff upstream/master -u -- "*.py" | flake8 --diff