Skip to content

API: re-allow duplicate index level names #21423

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jorisvandenbossche
Copy link
Member

One possible solution for #19029

WIP (need to clean up tests and possibly re-add some ones that have been removed in #18882)

@jorisvandenbossche jorisvandenbossche force-pushed the api-duplicate-level-names branch from 59cc3dd to 511ee38 Compare June 11, 2018 15:21
@codecov
Copy link

codecov bot commented Jun 11, 2018

Codecov Report

Merging #21423 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21423      +/-   ##
==========================================
+ Coverage    91.9%    91.9%   +<.01%     
==========================================
  Files         154      154              
  Lines       49558    49560       +2     
==========================================
+ Hits        45545    45548       +3     
+ Misses       4013     4012       -1
Flag Coverage Δ
#multiple 90.27% <100%> (ø) ⬆️
#single 42.02% <20%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/reshape.py 99.78% <100%> (ø) ⬆️
pandas/core/indexes/multi.py 94.96% <100%> (+0.07%) ⬆️
pandas/core/indexes/interval.py 93.13% <0%> (-0.04%) ⬇️
pandas/core/frame.py 97.19% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44c5460...fde8231. Read the comment docs.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 27, 2018

+1 in general.

Will want a semi-detailed release note, potentially laying out future plans with respect to duplicate names?

@jorisvandenbossche jorisvandenbossche added this to the 0.23.2 milestone Jun 27, 2018
@jorisvandenbossche
Copy link
Member Author

Will want a semi-detailed release note, potentially laying out future plans with respect to duplicate names?

Until we don't exactly know what those future plans are, I would rather keep it limited (only noting that duplicated level names are allowed again?)

@jorisvandenbossche jorisvandenbossche changed the title [WIP] API: re-allow duplicate index level names API: re-allow duplicate index level names Jun 27, 2018
@jorisvandenbossche
Copy link
Member Author

@jreback @toobaz ready for review

I went through the tests and added back some of the tests removed in #18882

I also added back MultiIndex._reference_duplicate_name to use it in stack/unstack (so we can there raise an informative error message). However, I would personally prefer that MultiIndex._get_level_number directly raises this informative warning, then the extra checking in stack/unstack can be removed. However, that is a slight change in behaviour for eg get_level_values (KeyError -> ValueError), so I would rather leave that for 0.24.0.

@jorisvandenbossche
Copy link
Member Author

However, I would personally prefer that MultiIndex._get_level_number directly raises this informative warning

That was also the original intent of:

count = self.names.count(level)
if count > 1:
raise ValueError('The name %s occurs multiple times, use a '
'level number' % level)
level = self.names.index(level)
except ValueError:

where already an informative ValueError is raised, but now the ValueError is catched, and then a generic KeyError is raised (this was changed 5 years ago: bdf270c#diff-dc96e62847cdfe77b570cf9c6b0e8086R1536)

@TomAugspurger
Copy link
Contributor

Until we don't exactly know what those future plans are, I would rather keep it limited (only noting that duplicated level names are allowed again?)

Agreed.

@@ -115,6 +115,12 @@ def __init__(self, values, index, level=-1, value_columns=None,

self.index = index.remove_unused_levels()

if isinstance(self.index, MultiIndex):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not add API here. why don't you just do this check inside _get_level_number itself, and then also add to Index so you don't also have to add an instance check?

Copy link
Member Author

@jorisvandenbossche jorisvandenbossche Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you read my comment in the main thread?
This code above is exactly how it was before. As I said, I also think this is better in get_level_number, but propose to do a follow-up PR that can be merged for 0.24.0, as it gives an additional API change (change in error type)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then let's just wait till 0.24.0 to do this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a good fix for the regression, why not merge it? It's no different that reverting a previous commit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is how it was for a long time, and this is fixing a regression so it is the point to do it now and not wait for 0.24.0.

I can certainly do the PR for 0.24.0 to further clean this up, but that will involve an API change so I will not do that in this PR.

@jreback jreback merged commit 66b517c into pandas-dev:master Jun 29, 2018
@jreback
Copy link
Contributor

jreback commented Jun 29, 2018

fair enough, is there an issue to clean this up?

@jorisvandenbossche jorisvandenbossche deleted the api-duplicate-level-names branch June 29, 2018 08:37
@jorisvandenbossche
Copy link
Member Author

@jreback I am doing a PR, but ran into some issues: #21677

@jorisvandenbossche
Copy link
Member Author

Opened the PR to not forget his (#21678), but as mentioned, we first need to discuss the other issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants