ENH partial sorting for mi in sortlevel #6135

Merged
merged 2 commits into from
May 2, 2014

Conversation

hayd
Contributor

@hayd hayd commented Jan 28, 2014

fixes #3984.

Still need to tweak the docstrings and stuff. But I think this is working.

cc @jtratner

@ghost

ghost commented Jan 29, 2014

Would it be more general to overload level using types rather than adding a keyword
in 3 places? Perhaps that way supporting sorting on a "slice" of levels rather than
introducing a kw to choose between "that one" and "that one and all those that follow"?

@hayd
Contributor Author

hayd commented Jan 29, 2014

I was just concerned about keeping old (strange?) behavior, I don't follow how to do this with types, but definitely the kwargs are inelegant.

@ghost

ghost commented Jan 29, 2014

What I meant was you can express the same operation and more
by accepting slice(3,None) or slice(3,5), or correspondingly (3,) and (3,5)
as values for level. Using types as in, rather than integers, accept tuples/slices
to express a range of levels to sort.

@hayd
Contributor Author

hayd commented Jan 29, 2014

That's a nice idea. The issue is the current behaviour is to sort all when you pass a single level e.g. sortlevel('A')... maybe this shouldn't be the default and we could do something more logical (like what you suggest)?

Actually this current behaviour is a little deranged... is sortlevels the only way to sort a mi (sort doesn't work!)...

@ghost

ghost commented Jan 29, 2014

You'll have to give examples, I'm not sure what you mean by deranged.
It doesn't look to me like it sorts all levels at all.

You probably tried the following already:

mi=pd.MultiIndex.from_product([['A0','A1','A2'],['B0','B1','B2'],['C0','C1','C2']])
list(mi.get_values())
list(mi.sortlevel(0)[0].get_values())
list(mi.sortlevel(1)[0].get_values())
list(mi.sortlevel(2)[0].get_values())

That sorts by one and only one level, and it looks stable with respect to the other levels doing so.
mi.lexsort_depth seems to be set to 0 when sorting on inner levels, which is right.
all seems to be in order.

What are you after, preferably with explicit example?

@hayd
Contributor Author

hayd commented Jan 29, 2014

"That sorts by one and only one level," here's an example, let's reverse:

In [8]: mi=pd.MultiIndex.from_product([['A2','A1','A0'],['B2','B1','B0'],['C2','C1','C0']])

In [10]: print mi.sortlevel(1)[0].get_values()
[('A0', 'B0', 'C0') ('A0', 'B0', 'C1') ('A0', 'B0', 'C2')
 ('A1', 'B0', 'C0') ('A1', 'B0', 'C1') ('A1', 'B0', 'C2')
 ('A2', 'B0', 'C0') ('A2', 'B0', 'C1') ('A2', 'B0', 'C2')
 ('A0', 'B1', 'C0') ('A0', 'B1', 'C1') ('A0', 'B1', 'C2')
 ('A1', 'B1', 'C0') ('A1', 'B1', 'C1') ('A1', 'B1', 'C2')
 ('A2', 'B1', 'C0') ('A2', 'B1', 'C1') ('A2', 'B1', 'C2')
 ('A0', 'B2', 'C0') ('A0', 'B2', 'C1') ('A0', 'B2', 'C2')
 ('A1', 'B2', 'C0') ('A1', 'B2', 'C1') ('A1', 'B2', 'C2')
 ('A2', 'B2', 'C0') ('A2', 'B2', 'C1') ('A2', 'B2', 'C2')]
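
For readers without this pandas version at hand, the behaviour shown above can be modeled on plain tuples. This is only a sketch: sort_by_level is a hypothetical helper, not pandas code, and it assumes that sort_remaining=False means "a stable sort on the chosen level only".

```python
from itertools import product

# Same reversed product as in the example above, as plain tuples.
tuples = list(product(["A2", "A1", "A0"], ["B2", "B1", "B0"], ["C2", "C1", "C0"]))


def sort_by_level(rows, level, sort_remaining=True):
    """Hypothetical pure-Python model of sortlevel on a list of tuples."""
    if sort_remaining:
        # Chosen level first, then the other levels in their original order.
        def key(row):
            return (row[level],) + row[:level] + row[level + 1:]
    else:
        # Only the chosen level; sorted() is stable, so ties keep input order.
        def key(row):
            return row[level]
    return sorted(rows, key=key)


full = sort_by_level(tuples, 1)
partial = sort_by_level(tuples, 1, sort_remaining=False)

print(full[:4])
# [('A0', 'B0', 'C0'), ('A0', 'B0', 'C1'), ('A0', 'B0', 'C2'), ('A1', 'B0', 'C0')]
print(partial[:4])
# [('A2', 'B0', 'C2'), ('A2', 'B0', 'C1'), ('A2', 'B0', 'C0'), ('A1', 'B0', 'C2')]
```

The first result matches the output above (level 1 first, then levels 0 and 2); the second leaves the other levels in their original, reversed order.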

@ghost

ghost commented Jan 29, 2014

now I see what you mean.

@hayd
Contributor Author

hayd commented Jan 29, 2014

This is kind of documented in the DataFrame.sortlevel, but tbh this is more of an observation than a spec:

Sort multilevel index by chosen axis and primary level. Data will be
lexicographically sorted by the chosen level followed by the other
levels (in order)

... :s

@ghost

ghost commented Jan 29, 2014

Yeah, it's a tossup on which docstring is more abstruse, df.sortlevel or mi.sortlevel.

You could have level=(1,2) sort just level 1, level=(1,3) sort levels 1 and 2 (i.e. [1,3)),
and level=(1,) sort levels 1..end.

If you want to sort on multiple single levels, just do multiple invocations.
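
A rough sketch of how such a level spec could be interpreted, again on plain tuples rather than a real MultiIndex. Both levels_from_spec and sort_levels are hypothetical names, and treating a bare integer as "just that level" is an assumption, since the thread leaves that case open.

```python
def levels_from_spec(level, nlevels):
    """Hypothetical helper: turn a level spec into a list of level numbers.

    int     -> just that level (an assumption, not settled in the thread)
    (a,)    -> levels a .. end
    (a, b)  -> levels a .. b-1 (half-open, like a slice)
    slice   -> the levels it selects
    """
    if isinstance(level, slice):
        return list(range(nlevels))[level]
    if isinstance(level, tuple):
        if len(level) == 1:
            return list(range(level[0], nlevels))
        if len(level) == 2:
            return list(range(level[0], level[1]))
        raise ValueError("expected a 1- or 2-tuple of level numbers")
    return [level]


def sort_levels(rows, level):
    """Stable sort of tuples on the selected levels only."""
    if not rows:
        return []
    levels = levels_from_spec(level, len(rows[0]))
    return sorted(rows, key=lambda row: tuple(row[i] for i in levels))


print(levels_from_spec((1, 2), 4))          # [1]
print(levels_from_spec((1, 3), 4))          # [1, 2]
print(levels_from_spec((1,), 4))            # [1, 2, 3]
print(levels_from_spec(slice(3, None), 5))  # [3, 4]
```

Because sorted() is stable, repeated calls to sort_levels compose in the way "multiple invocations" suggests: the most recent call decides the primary order and earlier calls break ties.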

@ghost

ghost commented Jan 29, 2014

But now I understand why you chose to have the default True keyword. makes sense.

@jorisvandenbossche
Member

@hayd It's maybe also an option to fix this in sort/sort_index by providing there a level keyword? It does already work, but it just sorts all levels. And then you can do there the more 'logical' thing to only sort the level you specified?

@hayd
Contributor Author

hayd commented Jan 29, 2014

@jorisvandenbossche I wasn't sure if this was part of @jtratner's work on refactoring index - and making the common methods.

+1 a level kwarg makes sense!

I was confused about what sort_index actually does, I thought it would sort by the index, but it appears to accept columns... i.e. is it just sort with an axis arg?

@jreback
Contributor

jreback commented Jan 29, 2014

@hayd related is #5190

@jreback
Contributor

jreback commented Feb 16, 2014

How's this coming?

@hayd
Contributor Author

hayd commented Feb 18, 2014

@jreback Is the remaining thing here to add a level (and sort_remaining?) arg to sort and sort_index ?

@jreback
Contributor

jreback commented Feb 18, 2014

I think this is fine...wouldn't really change sort/sort_index ATM.

ping me to look at this in a day or 2

@hayd
Contributor Author

hayd commented Mar 5, 2014

ping

@jreback
Contributor

jreback commented Mar 5, 2014

this was nice.... any other possibilities besides sort_remaining?

@hayd
Contributor Author

hayd commented Mar 5, 2014

Not sure, happy to change, what's your issue with sort_remaining?

(sort_remaining_levels ?)

@jreback
Contributor

jreback commented Mar 5, 2014

just sounds odd - though can't think of anything better

you are defaulting this to true right?

@hayd
Contributor Author

hayd commented Mar 5, 2014

yeah, defaults to true so doesn't break backwards compat... (though perhaps we will later)

@jreback
Contributor

jreback commented Mar 5, 2014

maybe I think it's a perf issue to completely sort

@hayd
Contributor Author

hayd commented Mar 14, 2014

@jreback shall we get this in?

@jreback
Contributor

jreback commented Mar 14, 2014

looks fine

release note / v0.14.0 mention

merge when ready

I don't think docs are really necessary as the docstring covers it

@jreback
Contributor

jreback commented Apr 5, 2014

looks fine....just check release notes / v0.14

@@ -2627,7 +2627,8 @@ def trans(v):
         else:
             return self.take(indexer, axis=axis, convert=False, is_copy=False)

-    def sortlevel(self, level=0, axis=0, ascending=True, inplace=False):
+    def sortlevel(self, level=0, axis=0, ascending=True,
+                  inplace=False, sort_remaining=True):
         """
         Sort multilevel index by chosen axis and primary level. Data will be
Contributor

Would this be more clear as:

Lexicographically sort index of dataframe on specified axis, starting with the specified level and then sorting by other levels in the order they're defined on the multilevel index (sort_remaining can optionally disable sorting on other levels).

It's not perfect, but maybe clearer?

@jtratner
Contributor

jtratner commented Apr 6, 2014

👍 from me as well. In addition to the minor suggestion on the docstring, should we add a warning about the performance hit from non-lexsorted MultiIndex? (either via level != 0 or sort_remaining=False)
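
To make "non-lexsorted" concrete, here is a small sketch of a lexsort-depth check on plain tuples. The function below is a hypothetical pure-Python model written for illustration; it is not how pandas computes MultiIndex.lexsort_depth internally.

```python
def lexsort_depth(rows):
    """Number of leading levels by which rows are lexicographically sorted."""
    if not rows:
        return 0
    nlevels = len(rows[0])
    # Try the deepest prefix first and back off one level at a time.
    for depth in range(nlevels, 0, -1):
        prefixes = [row[:depth] for row in rows]
        if all(a <= b for a, b in zip(prefixes, prefixes[1:])):
            return depth
    return 0


fully_sorted = [("a", 1), ("a", 2), ("b", 0)]
outer_only = [("a", 2), ("a", 1), ("b", 0)]   # level 0 sorted, level 1 not
inner_only = [("b", 0), ("a", 1), ("a", 2)]   # sorted on level 1 only

print(lexsort_depth(fully_sorted))  # 2
print(lexsort_depth(outer_only))    # 1
print(lexsort_depth(inner_only))    # 0
```

Under this model, sorting on an inner level alone, or with sort_remaining=False, can leave a depth below the number of levels, which is the situation the proposed warning would describe.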

@jreback
Contributor

jreback commented Apr 10, 2014

ping!

@jreback
Contributor

jreback commented Apr 21, 2014

ping

@jreback
Contributor

jreback commented Apr 27, 2014

@hayd ping....need to get this in ASAP

@jreback
Contributor

jreback commented May 1, 2014

ping!

jreback added a commit that referenced this pull request May 2, 2014
ENH partial sorting for mi in sortlevel
@jreback jreback merged commit d5f9493 into pandas-dev:master May 2, 2014
@hayd hayd deleted the mi_partial_sorting branch May 28, 2014 03:01
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff API Design
Development

Successfully merging this pull request may close these issues.

Sorting by multiple levels
4 participants