API: add top-level melt function as method #15521

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
1 change: 1 addition & 0 deletions doc/source/api.rst
@@ -933,6 +933,7 @@ Reshaping, sorting, transposing
DataFrame.swaplevel
DataFrame.stack
DataFrame.unstack
DataFrame.melt
DataFrame.T
DataFrame.to_panel
DataFrame.to_xarray
9 changes: 5 additions & 4 deletions doc/source/reshaping.rst
@@ -265,7 +265,7 @@ the right thing:
Reshaping by Melt
-----------------

The :func:`~pandas.melt` function is useful to massage a
Member

Can you mention here both?

Contributor Author

Done (GitHub is not folding this review component for some reason, however).

The ``melt`` and :func:`~DataFrame.melt` functions are useful to massage a
DataFrame into a format where one or more columns are identifier variables,
while all other columns, considered measured variables, are "unpivoted" to the
row axis, leaving just two non-identifier columns, "variable" and "value". The
@@ -281,10 +281,11 @@ For instance,
'height' : [5.5, 6.0],
'weight' : [130, 150]})
cheese
pd.melt(cheese, id_vars=['first', 'last'])
pd.melt(cheese, id_vars=['first', 'last'], var_name='quantity')
cheese.melt(id_vars=['first', 'last'])
cheese.melt(id_vars=['first', 'last'], var_name='quantity')
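The unpivot described in this section can be sketched without pandas. This is a minimal illustration under stated assumptions, not pandas' implementation: `melt_columns` is a hypothetical helper that works on a dict of equal-length column lists (no index, no MultiIndex support) and takes the column-axis name as an explicit `columns_name` argument standing in for ``frame.columns.name``. It follows the row order shown in the docstring examples: every row for the first measured variable, then every row for the next.

```python
def melt_columns(columns, id_vars=None, value_vars=None, var_name=None,
                 value_name='value', columns_name=None):
    """Unpivot a dict of equal-length column lists from wide to long format.

    Hypothetical helper for illustration only: it mirrors the documented
    semantics of melt on plain dicts, with no index and no MultiIndex.
    `columns_name` plays the role of frame.columns.name.
    """
    id_vars = list(id_vars or [])
    if value_vars is None:
        # Default: every column that is not an identifier variable.
        value_vars = [c for c in columns if c not in id_vars]
    if var_name is None:
        # Documented default: columns.name if set, otherwise 'variable'.
        var_name = columns_name if columns_name is not None else 'variable'

    n_rows = len(next(iter(columns.values()))) if columns else 0
    result = {name: [] for name in id_vars + [var_name, value_name]}
    # One block of n_rows output rows per measured variable, in order.
    for var in value_vars:
        for i in range(n_rows):
            for name in id_vars:
                # Identifier values are repeated for every measured variable.
                result[name].append(columns[name][i])
            result[var_name].append(var)
            result[value_name].append(columns[var][i])
    return result


cheese = {'first': ['John', 'Mary'], 'last': ['Doe', 'Bo'],
          'height': [5.5, 6.0], 'weight': [130, 150]}
print(melt_columns(cheese, id_vars=['first', 'last']))
print(melt_columns(cheese, id_vars=['first', 'last'], var_name='quantity'))
```

The identifier columns are copied once per measured variable, which is why the long result has `len(value_vars)` times as many rows as the wide input.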

Another way to transform is to use the ``wide_to_long`` panel data convenience function.
Another way to transform is to use the ``wide_to_long`` panel data convenience
function.

.. ipython:: python

1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -324,6 +324,7 @@ Other Enhancements
- ``Series.sort_index`` accepts parameters ``kind`` and ``na_position`` (:issue:`13589`, :issue:`14444`)

- ``DataFrame`` has gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`).
- ``DataFrame`` has gained a ``melt()`` method, equivalent to ``pd.melt()``, for unpivoting from a wide to long format (:issue:`12640`).
- ``DataFrame.groupby()`` has gained a ``.nunique()`` method to count the distinct values for all columns within each group (:issue:`14336`, :issue:`15197`).

- ``pd.read_excel()`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
105 changes: 104 additions & 1 deletion pandas/core/frame.py
@@ -105,7 +105,9 @@
optional_by="""
by : str or list of str
Name or list of names which refer to the axis items.""",
versionadded_to_excel='')
versionadded_to_excel='',
versionadded_melt='\n.. versionadded:: 0.20.0\n',
Member

this needs extra spaces (you can check pd.DataFrame.melt? if you are on this branch)

Contributor Author

I'm not sure what you mean by this, can you clarify?

Member

If you do pd.DataFrame.melt? you see:

Signature: pd.DataFrame.melt(self, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
Docstring:
    "Unpivots" a DataFrame from wide format to long format, optionally leaving
    identifier variables set.

    This function is useful to massage a DataFrame into a format where one
    or more columns are identifier variables (`id_vars`), while all other
    columns, considered measured variables (`value_vars`), are "unpivoted" to
    the row axis, leaving just two non-identifier columns, 'variable' and
    'value'.

.. versionadded:: 0.20.0


    Parameters
    ----------
    ....

so the versionadded line is not indented like the other lines.

But I think Jeff fixed it before merging.

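The reviewer's observation can be reproduced with the standard library's `inspect.cleandoc`, which strips the smallest common indentation of all non-blank lines after the first. This is a sketch under two assumptions: that the shared docstring body is indented four spaces in `frame.py` (as the reviewer's output suggests), and that IPython's `?` cleans docstrings in a comparable way. `TEMPLATE` is a hypothetical, trimmed-down stand-in for `_shared_docs['melt']`.

```python
import inspect

# Hypothetical, trimmed-down stand-in for _shared_docs['melt']; the body is
# indented four spaces, as it would be inside the DataFrame class body.
TEMPLATE = (
    '\n'
    '    "Unpivots" a DataFrame from wide format to long format.\n'
    '    %(versionadded_melt)s\n'
    '    Parameters\n'
    '    ----------\n'
)

# Value from the diff: the directive starts at column 0, so the common
# margin becomes 0 and cleandoc strips nothing from the other lines.
unindented = inspect.cleandoc(
    TEMPLATE % {'versionadded_melt': '\n.. versionadded:: 0.20.0\n'})

# Indenting the directive like the rest of the body restores a margin of 4.
indented = inspect.cleandoc(
    TEMPLATE % {'versionadded_melt': '\n    .. versionadded:: 0.20.0\n'})

print(unindented)
print('---')
print(indented)
```

In the first output the body keeps its four leading spaces while the directive sits flush left, matching what the reviewer saw; in the second every line, directive included, ends up flush left.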
other_melt='melt')

_numeric_only_doc = """numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use
@@ -4051,6 +4053,107 @@ def unstack(self, level=-1, fill_value=None):
from pandas.core.reshape import unstack
return unstack(self, level, fill_value)

_shared_docs['melt'] = """
"Unpivots" a DataFrame from wide format to long format, optionally leaving
identifier variables set.

This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (`id_vars`), while all other
columns, considered measured variables (`value_vars`), are "unpivoted" to
the row axis, leaving just two non-identifier columns, 'variable' and
'value'.
%(versionadded_melt)s

Parameters
----------
frame : DataFrame
id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that
are not set as `id_vars`.
var_name : scalar
Name to use for the 'variable' column. If None it uses
``frame.columns.name`` or 'variable'.
value_name : scalar, default 'value'
Name to use for the 'value' column.
col_level : int or string, optional
If columns are a MultiIndex then use this level to melt.

See also
--------
%(other_melt)s
pivot_table
DataFrame.pivot

Examples
--------
>>> import pandas as pd
>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
...                    'B': {0: 1, 1: 3, 2: 5},
...                    'C': {0: 2, 1: 4, 2: 6}})
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6

>>> pd.melt(df, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5

>>> pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6

The names of 'variable' and 'value' columns can be customized:

>>> pd.melt(df, id_vars=['A'], value_vars=['B'],
...         var_name='myVarname', value_name='myValname')
   A myVarname  myValname
0  a         B          1
1  b         B          3
2  c         B          5

If you have multi-index columns:

>>> df.columns = [list('ABC'), list('DEF')]
>>> df
   A  B  C
   D  E  F
0  a  1  2
1  b  3  4
2  c  5  6

>>> pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5

>>> pd.melt(df, id_vars=[('A', 'D')], value_vars=[('B', 'E')])
  (A, D) variable_0 variable_1  value
0      a          B          E      1
1      b          B          E      3
2      c          B          E      5

"""

@Appender(_shared_docs['melt'] % _shared_doc_kwargs)
def melt(self, id_vars=None, value_vars=None, var_name=None,
value_name='value', col_level=None):
from pandas.core.reshape import melt
return melt(self, id_vars=id_vars, value_vars=value_vars,
var_name=var_name, value_name=value_name,
col_level=col_level)

# ----------------------------------------------------------------------
# Time series-related

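The diff reuses one docstring template for both entry points: `_shared_docs['melt']` is %-formatted with `_shared_doc_kwargs` in `frame.py` and with `_shared_docs_kwargs` in `reshape.py`, and the result is attached with `Appender`. The sketch below reproduces that pattern in plain Python under stated assumptions: `appender` is a simplified, hypothetical stand-in for pandas' `Appender` (assumed here to simply append text to the decorated function's docstring), `Frame` is a toy class rather than `DataFrame`, and the `melt` body is a placeholder.

```python
def appender(addendum):
    """Hypothetical, simplified stand-in for pandas' Appender decorator:
    append `addendum` to the decorated function's docstring."""
    def decorate(func):
        func.__doc__ = (func.__doc__ or '') + addendum
        return func
    return decorate


# One template; the placeholders mirror the ones in the diff.
_shared_docs = {'melt': '''
"Unpivots" a DataFrame from wide format to long format.
%(versionadded_melt)s
See also
--------
%(other_melt)s
pivot_table
'''}

# Substitution values copied from the diff: each entry point links the other.
frame_kwargs = dict(versionadded_melt='\n.. versionadded:: 0.20.0\n',
                    other_melt='melt')
reshape_kwargs = dict(versionadded_melt='', other_melt='DataFrame.melt')


@appender(_shared_docs['melt'] % reshape_kwargs)
def melt(frame, id_vars=None):
    # Placeholder body; the real function builds the long-format frame.
    return ('melted', frame, id_vars)


class Frame(object):
    """Toy stand-in for DataFrame."""

    @appender(_shared_docs['melt'] % frame_kwargs)
    def melt(self, id_vars=None):
        # The method only forwards to the module-level function.
        return melt(self, id_vars=id_vars)


f = Frame()
print(f.melt(id_vars=['A']) == melt(f, id_vars=['A']))
print(Frame.melt.__doc__)
```

Because the method only forwards its arguments, both entry points behave identically while their "See also" sections point at each other. In the diff, `melt` is imported inside the method body; presumably this avoids a circular import, since `reshape.py` now imports `_shared_docs` from `frame.py`.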
96 changes: 6 additions & 90 deletions pandas/core/reshape.py
@@ -28,6 +28,11 @@
import pandas.core.algorithms as algos
from pandas._libs import algos as _algos, reshape as _reshape

from pandas.core.frame import _shared_docs
from pandas.util.decorators import Appender
_shared_docs_kwargs = dict(
versionadded_melt="", other_melt='DataFrame.melt')

from pandas.core.index import MultiIndex, _get_na_value


@@ -701,98 +706,9 @@ def _convert_level_number(level_num, columns):
return result


@Appender(_shared_docs['melt'] % _shared_docs_kwargs)
def melt(frame, id_vars=None, value_vars=None, var_name=None,
value_name='value', col_level=None):
"""
"Unpivots" a DataFrame from wide format to long format, optionally leaving
identifier variables set.

This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (`id_vars`), while all other
columns, considered measured variables (`value_vars`), are "unpivoted" to
the row axis, leaving just two non-identifier columns, 'variable' and
'value'.

Parameters
----------
frame : DataFrame
id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that
are not set as `id_vars`.
var_name : scalar
Name to use for the 'variable' column. If None it uses
``frame.columns.name`` or 'variable'.
value_name : scalar, default 'value'
Name to use for the 'value' column.
col_level : int or string, optional
If columns are a MultiIndex then use this level to melt.

See also
--------
pivot_table
DataFrame.pivot

Examples
--------
>>> import pandas as pd
>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
...                    'B': {0: 1, 1: 3, 2: 5},
...                    'C': {0: 2, 1: 4, 2: 6}})
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6

>>> pd.melt(df, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5

>>> pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6

The names of 'variable' and 'value' columns can be customized:

>>> pd.melt(df, id_vars=['A'], value_vars=['B'],
...         var_name='myVarname', value_name='myValname')
   A myVarname  myValname
0  a         B          1
1  b         B          3
2  c         B          5

If you have multi-index columns:

>>> df.columns = [list('ABC'), list('DEF')]
>>> df
   A  B  C
   D  E  F
0  a  1  2
1  b  3  4
2  c  5  6

>>> pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5

>>> pd.melt(df, id_vars=[('A', 'D')], value_vars=[('B', 'E')])
  (A, D) variable_0 variable_1  value
0      a          B          E      1
1      b          B          E      3
2      c          B          E      5

"""
# TODO: what about the existing index?
if id_vars is not None:
if not is_list_like(id_vars):
102 changes: 64 additions & 38 deletions pandas/tests/test_reshape.py
@@ -30,23 +30,46 @@ def setUp(self):
self.df1.columns = [list('ABC'), list('abc')]
self.df1.columns.names = ['CAP', 'low']

def test_default_col_names(self):
def test_top_level_method(self):
result = melt(self.df)
self.assertEqual(result.columns.tolist(), ['variable', 'value'])

result1 = melt(self.df, id_vars=['id1'])
def test_method_signatures(self):
tm.assert_frame_equal(self.df.melt(),
melt(self.df))

tm.assert_frame_equal(self.df.melt(id_vars=['id1', 'id2'],
value_vars=['A', 'B']),
melt(self.df,
id_vars=['id1', 'id2'],
value_vars=['A', 'B']))

tm.assert_frame_equal(self.df.melt(var_name=self.var_name,
value_name=self.value_name),
melt(self.df,
var_name=self.var_name,
value_name=self.value_name))

tm.assert_frame_equal(self.df1.melt(col_level=0),
melt(self.df1, col_level=0))

def test_default_col_names(self):
result = self.df.melt()
self.assertEqual(result.columns.tolist(), ['variable', 'value'])

result1 = self.df.melt(id_vars=['id1'])
self.assertEqual(result1.columns.tolist(), ['id1', 'variable', 'value'
])

result2 = melt(self.df, id_vars=['id1', 'id2'])
result2 = self.df.melt(id_vars=['id1', 'id2'])
self.assertEqual(result2.columns.tolist(), ['id1', 'id2', 'variable',
'value'])

def test_value_vars(self):
result3 = melt(self.df, id_vars=['id1', 'id2'], value_vars='A')
result3 = self.df.melt(id_vars=['id1', 'id2'], value_vars='A')
self.assertEqual(len(result3), 10)

result4 = melt(self.df, id_vars=['id1', 'id2'], value_vars=['A', 'B'])
result4 = self.df.melt(id_vars=['id1', 'id2'], value_vars=['A', 'B'])
expected4 = DataFrame({'id1': self.df['id1'].tolist() * 2,
'id2': self.df['id2'].tolist() * 2,
'variable': ['A'] * 10 + ['B'] * 10,
@@ -65,8 +88,8 @@ def test_value_vars_types(self):
columns=['id1', 'id2', 'variable', 'value'])

for type_ in (tuple, list, np.array):
result = melt(self.df, id_vars=['id1', 'id2'],
value_vars=type_(('A', 'B')))
result = self.df.melt(id_vars=['id1', 'id2'],
value_vars=type_(('A', 'B')))
tm.assert_frame_equal(result, expected)

def test_vars_work_with_multiindex(self):
@@ -77,7 +100,7 @@ def test_vars_work_with_multiindex(self):
'value': self.df1[('B', 'b')],
}, columns=[('A', 'a'), 'CAP', 'low', 'value'])

result = melt(self.df1, id_vars=[('A', 'a')], value_vars=[('B', 'b')])
result = self.df1.melt(id_vars=[('A', 'a')], value_vars=[('B', 'b')])
tm.assert_frame_equal(result, expected)

def test_tuple_vars_fail_with_multiindex(self):
@@ -92,26 +115,26 @@ def test_tuple_vars_fail_with_multiindex(self):
for id_vars, value_vars in ((tuple_a, list_b), (list_a, tuple_b),
(tuple_a, tuple_b)):
with tm.assertRaisesRegexp(ValueError, r'MultiIndex'):
melt(self.df1, id_vars=id_vars, value_vars=value_vars)
self.df1.melt(id_vars=id_vars, value_vars=value_vars)

def test_custom_var_name(self):
result5 = melt(self.df, var_name=self.var_name)
result5 = self.df.melt(var_name=self.var_name)
self.assertEqual(result5.columns.tolist(), ['var', 'value'])

result6 = melt(self.df, id_vars=['id1'], var_name=self.var_name)
result6 = self.df.melt(id_vars=['id1'], var_name=self.var_name)
self.assertEqual(result6.columns.tolist(), ['id1', 'var', 'value'])

result7 = melt(self.df, id_vars=['id1', 'id2'], var_name=self.var_name)
result7 = self.df.melt(id_vars=['id1', 'id2'], var_name=self.var_name)
self.assertEqual(result7.columns.tolist(), ['id1', 'id2', 'var',
'value'])

result8 = melt(self.df, id_vars=['id1', 'id2'], value_vars='A',
var_name=self.var_name)
result8 = self.df.melt(id_vars=['id1', 'id2'], value_vars='A',
var_name=self.var_name)
self.assertEqual(result8.columns.tolist(), ['id1', 'id2', 'var',
'value'])

result9 = melt(self.df, id_vars=['id1', 'id2'], value_vars=['A', 'B'],
var_name=self.var_name)
result9 = self.df.melt(id_vars=['id1', 'id2'], value_vars=['A', 'B'],
var_name=self.var_name)
expected9 = DataFrame({'id1': self.df['id1'].tolist() * 2,
'id2': self.df['id2'].tolist() * 2,
self.var_name: ['A'] * 10 + ['B'] * 10,
@@ -121,24 +144,24 @@ def test_custom_var_name(self):
tm.assert_frame_equal(result9, expected9)

def test_custom_value_name(self):
result10 = melt(self.df, value_name=self.value_name)
result10 = self.df.melt(value_name=self.value_name)
self.assertEqual(result10.columns.tolist(), ['variable', 'val'])

result11 = melt(self.df, id_vars=['id1'], value_name=self.value_name)
result11 = self.df.melt(id_vars=['id1'], value_name=self.value_name)
self.assertEqual(result11.columns.tolist(), ['id1', 'variable', 'val'])

result12 = melt(self.df, id_vars=['id1', 'id2'],
value_name=self.value_name)
result12 = self.df.melt(id_vars=['id1', 'id2'],
value_name=self.value_name)
self.assertEqual(result12.columns.tolist(), ['id1', 'id2', 'variable',
'val'])

result13 = melt(self.df, id_vars=['id1', 'id2'], value_vars='A',
value_name=self.value_name)
result13 = self.df.melt(id_vars=['id1', 'id2'], value_vars='A',
value_name=self.value_name)
self.assertEqual(result13.columns.tolist(), ['id1', 'id2', 'variable',
'val'])

result14 = melt(self.df, id_vars=['id1', 'id2'], value_vars=['A', 'B'],
value_name=self.value_name)
result14 = self.df.melt(id_vars=['id1', 'id2'], value_vars=['A', 'B'],
value_name=self.value_name)
expected14 = DataFrame({'id1': self.df['id1'].tolist() * 2,
'id2': self.df['id2'].tolist() * 2,
'variable': ['A'] * 10 + ['B'] * 10,
@@ -150,26 +173,29 @@ def test_custom_value_name(self):

def test_custom_var_and_value_name(self):

result15 = melt(self.df, var_name=self.var_name,
value_name=self.value_name)
result15 = self.df.melt(var_name=self.var_name,
value_name=self.value_name)
self.assertEqual(result15.columns.tolist(), ['var', 'val'])

result16 = melt(self.df, id_vars=['id1'], var_name=self.var_name,
value_name=self.value_name)
result16 = self.df.melt(id_vars=['id1'], var_name=self.var_name,
value_name=self.value_name)
self.assertEqual(result16.columns.tolist(), ['id1', 'var', 'val'])

result17 = melt(self.df, id_vars=['id1', 'id2'],
var_name=self.var_name, value_name=self.value_name)
result17 = self.df.melt(id_vars=['id1', 'id2'],
var_name=self.var_name,
value_name=self.value_name)
self.assertEqual(result17.columns.tolist(), ['id1', 'id2', 'var', 'val'
])

result18 = melt(self.df, id_vars=['id1', 'id2'], value_vars='A',
var_name=self.var_name, value_name=self.value_name)
result18 = self.df.melt(id_vars=['id1', 'id2'], value_vars='A',
var_name=self.var_name,
value_name=self.value_name)
self.assertEqual(result18.columns.tolist(), ['id1', 'id2', 'var', 'val'
])

result19 = melt(self.df, id_vars=['id1', 'id2'], value_vars=['A', 'B'],
var_name=self.var_name, value_name=self.value_name)
result19 = self.df.melt(id_vars=['id1', 'id2'], value_vars=['A', 'B'],
var_name=self.var_name,
value_name=self.value_name)
expected19 = DataFrame({'id1': self.df['id1'].tolist() * 2,
'id2': self.df['id2'].tolist() * 2,
self.var_name: ['A'] * 10 + ['B'] * 10,
@@ -181,17 +207,17 @@ def test_custom_var_and_value_name(self):

df20 = self.df.copy()
df20.columns.name = 'foo'
result20 = melt(df20)
result20 = df20.melt()
self.assertEqual(result20.columns.tolist(), ['foo', 'value'])

def test_col_level(self):
res1 = melt(self.df1, col_level=0)
res2 = melt(self.df1, col_level='CAP')
res1 = self.df1.melt(col_level=0)
res2 = self.df1.melt(col_level='CAP')
self.assertEqual(res1.columns.tolist(), ['CAP', 'value'])
self.assertEqual(res2.columns.tolist(), ['CAP', 'value'])

def test_multiindex(self):
res = pd.melt(self.df1)
res = self.df1.melt()
self.assertEqual(res.columns.tolist(), ['CAP', 'low', 'value'])