Commit 5d5e5d8

Merge pull request #2102 from plotly/dimensions_max_cardinality
More flexible parallel_categories magic
2 parents: 37c8c81 + 83d005f, commit 5d5e5d8

5 files changed (+51, -7 lines)

doc/python/parallel-categories-diagram.md

+8 -6
@@ -5,8 +5,8 @@ jupyter:
 text_representation:
 extension: .md
 format_name: markdown
-format_version: "1.1"
-jupytext_version: 1.1.1
+format_version: '1.2'
+jupytext_version: 1.3.1
 kernelspec:
 display_name: Python 3
 language: python
@@ -20,7 +20,7 @@ jupyter:
 name: python
 nbconvert_exporter: python
 pygments_lexer: ipython3
-version: 3.7.3
+version: 3.6.8
 plotly:
 description: How to make parallel categories diagrams in Python with Plotly.
 display_as: statistical
@@ -35,16 +35,18 @@ jupyter:
 
 #### Parallel Categories Diagram
 
-The parallel categories diagram is a visualization of multi-dimensional categorical data sets. Each variable in the data set is represented by a column of rectangles, where each rectangle corresponds to a discrete value taken on by that variable. The relative heights of the rectangles reflect the relative frequency of occurrence of the corresponding value.
+The parallel categories diagram (also known as parallel sets or alluvial diagram) is a visualization of multi-dimensional categorical data sets. Each variable in the data set is represented by a column of rectangles, where each rectangle corresponds to a discrete value taken on by that variable. The relative heights of the rectangles reflect the relative frequency of occurrence of the corresponding value.
 
 Combinations of category rectangles across dimensions are connected by ribbons, where the height of the ribbon corresponds to the relative frequency of occurrence of the combination of categories in the data set.
 
-For other representations of multivariate data, also see [parallel coordinates](/python/parallel-coordinates-plot/), [radar charts](/python/radar-chart/) and [scatterplot matrix (SPLOM)](/python/splom/).
+For other representations of multivariate data, also see [parallel coordinates](/python/parallel-coordinates-plot/), [radar charts](/python/radar-chart/) and [scatterplot matrix (SPLOM)](/python/splom/). A visually-similar but more generic type of visualization is the [sankey diagrams](/python/sankey-diagram/).
 
 #### Basic Parallel Category Diagram with plotly.express
 
 This example visualizes the resturant bills of a sample of 244 people. Hovering over a category rectangle (sex, smoker, etc) displays a tooltip with the number of people with that single trait. Hovering over a ribbon in the diagram displays a tooltip with the number of people with a particular combination of the five traits connected by the ribbon.
 
+By default, `px.parallel_categories` will display any column in the `data_frame` that has a cardinality (or number of unique values) of less than 50. This can be overridden either by passing in a specific list of columns to `dimensions` or by setting `dimensions_max_cardinality` to something other than 50.
+
 ```python
 import plotly.express as px
 
@@ -68,7 +70,7 @@ fig = px.parallel_categories(df, dimensions=['sex', 'smoker', 'day'],
 fig.show()
 ```
 
-#### Basic Parallel Categories Diagram
+### Basic Parallel Categories Diagram with `graph_objects`
 
 This example illustartes the hair color, eye color, and sex of a sample of 8 people. The dimension labels can be dragged horizontally to reorder the dimensions and the category rectangles can be dragged vertically to reorder the categories within a dimension.

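The new documentation paragraph says that columns with a cardinality "of less than 50" are displayed. The `_core.py` change in this commit, however, compares with `<=`, so a column with exactly 50 unique values is kept too. You can check this boundary without plotly. In the sketch below, `cardinality` and `shown_by_default` are hypothetical helpers, and the columns are toy data rather than the tips dataset.

```python
# Hypothetical helpers illustrating the default column filter (not plotly code).
DEFAULT_MAX_CARDINALITY = 50  # default of dimensions_max_cardinality in this commit


def cardinality(values):
    # "Cardinality" here means the number of distinct values in a column.
    return len(set(values))


def shown_by_default(values, max_cardinality=DEFAULT_MAX_CARDINALITY):
    # Mirrors the `<=` comparison in _core.py: a column is kept when it has
    # at most `max_cardinality` unique values.
    return cardinality(values) <= max_cardinality


# Toy columns straddling the threshold.
exactly_50 = list(range(50)) * 2  # 100 rows, 50 unique values
fifty_one = list(range(51))       # 51 unique values

print(shown_by_default(exactly_50))                     # True
print(shown_by_default(fifty_one))                      # False
print(shown_by_default(fifty_one, max_cardinality=60))  # True
```

Raising `max_cardinality` here plays the role that passing a larger `dimensions_max_cardinality` plays in `px.parallel_categories`.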
packages/python/plotly/plotly/express/_chart_types.py

+1
@@ -1205,6 +1205,7 @@ def parallel_categories(
 template=None,
 width=None,
 height=None,
+dimensions_max_cardinality=50,
 ):
 """
 In a parallel categories (or parallel sets) plot, each row of

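The only change in `_chart_types.py` is the new keyword in the signature, yet `_core.py` reads the option as `args["dimensions_max_cardinality"]`. The value therefore has to travel from the public signature into an `args` dict. One common way to build such a dict is `locals()`, and this sketch assumes that pattern. `fake_parallel_categories` and `fake_make_figure` are hypothetical stand-ins, not plotly functions.

```python
def fake_make_figure(args):
    # Stand-in for the figure builder: it only sees the `args` dict, so a new
    # option must be a parameter of the public function to reach it.
    return {"max_card": args["dimensions_max_cardinality"], "dims": args["dimensions"]}


def fake_parallel_categories(data_frame=None, dimensions=None, dimensions_max_cardinality=50):
    # At this point locals() holds exactly the parameters, including defaults
    # the caller did not pass explicitly.
    return fake_make_figure(args=locals())


print(fake_parallel_categories())  # {'max_card': 50, 'dims': None}
print(fake_parallel_categories(dimensions_max_cardinality=4)["max_card"])  # 4
```

Under this assumption, `locals()` also captures default values. Callers who never mention the new keyword still get `50` in `args`, so the dictionary lookup cannot raise `KeyError`.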
packages/python/plotly/plotly/express/_core.py

+3 -1
@@ -181,7 +181,9 @@ def make_trace_kwargs(args, trace_spec, g, mapping_labels, sizeref):
 )
 and (
 trace_spec.constructor != go.Parcats
-or len(args["data_frame"][name].unique()) <= 20
+or (v is not None and name in v)
+or len(args["data_frame"][name].unique())
+<= args["dimensions_max_cardinality"]
 )
 ]
 result["dimensions"] = [

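Under the new condition in `make_trace_kwargs`, a `go.Parcats` trace keeps a column in either of two cases:

- the column is named explicitly (the `v is not None and name in v` clause, where `v` appears to hold the user's `dimensions` list), or
- its number of unique values is at most `args["dimensions_max_cardinality"]`.

Before this commit the limit was a hard-coded 20, and naming a column did not override it. The sketch below restates the clause as a standalone predicate; `keep_dimension` and its parameters are hypothetical names.

```python
def keep_dimension(name, n_unique, is_parcats, requested=None, max_cardinality=50):
    """Sketch of the Parcats clause from make_trace_kwargs (hypothetical helper).

    name            -- column name
    n_unique        -- number of unique values in that column
    is_parcats      -- whether the trace being built is a Parcats trace
    requested       -- the user's `dimensions` list, or None
    max_cardinality -- value of `dimensions_max_cardinality`
    """
    return (
        not is_parcats                                    # clause always passes for other traces
        or (requested is not None and name in requested)  # an explicit request wins
        or n_unique <= max_cardinality                    # otherwise apply the cap
    )


# A 6-valued column is dropped under a cap of 4 unless it is requested by name.
print(keep_dimension("size", 6, True, max_cardinality=4))                      # False
print(keep_dimension("size", 6, True, requested=["size"], max_cardinality=4))  # True
```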
packages/python/plotly/plotly/express/_doc.py

+6
@@ -106,6 +106,12 @@
 colref_list_desc,
 "Values from these columns are used for multidimensional visualization.",
 ],
+dimensions_max_cardinality=[
+"int (default 50)",
+"When `dimensions` is `None` and `data_frame` is provided, "
+"columns with more than this number of unique values are excluded from the output.",
+"Not used when `dimensions` is passed.",
+],
 error_x=[
 colref_type,
 colref_desc,

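Each entry in this `_doc.py` mapping is a list: the first item is the type, and the remaining items are description text. In the new entry, the second and third string literals sit next to each other with no comma between them. Python joins adjacent literals into a single string, so the list has three items, not four.

The check below confirms this. It also includes `render_param`, a hypothetical helper that only approximates how such an entry might become docstring text; plotly's own docstring builder may format it differently.

```python
# Copy of the new entry from _doc.py.
docs = dict(
    dimensions_max_cardinality=[
        "int (default 50)",
        "When `dimensions` is `None` and `data_frame` is provided, "
        "columns with more than this number of unique values are excluded from the output.",
        "Not used when `dimensions` is passed.",
    ],
)


def render_param(name, entry, indent="    "):
    # Hypothetical formatter: a "name: type" line, then each description item indented.
    type_str, *description = entry
    return "\n".join([f"{name}: {type_str}"] + [indent + line for line in description])


entry = docs["dimensions_max_cardinality"]
print(len(entry))  # 3: the two adjacent description literals were concatenated
print(render_param("dimensions_max_cardinality", entry))
```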
packages/python/plotly/plotly/tests/test_core/test_px/test_px_functions.py

+33
@@ -139,3 +139,36 @@ def test_funnel():
 color=["0", "0", "0", "1", "1", "1"],
 )
 assert len(fig.data) == 2
+
+
+def test_parcats_dimensions_max():
+df = px.data.tips()
+
+# default behaviour
+fig = px.parallel_categories(df)
+assert [d.label for d in fig.data[0].dimensions] == [
+"sex",
+"smoker",
+"day",
+"time",
+"size",
+]
+
+# explicit subset of default
+fig = px.parallel_categories(df, dimensions=["sex", "smoker", "day"])
+assert [d.label for d in fig.data[0].dimensions] == ["sex", "smoker", "day"]
+
+# shrinking max
+fig = px.parallel_categories(df, dimensions_max_cardinality=4)
+assert [d.label for d in fig.data[0].dimensions] == [
+"sex",
+"smoker",
+"day",
+"time",
+]
+
+# explicit superset of default, violating the max
+fig = px.parallel_categories(
+df, dimensions=["sex", "smoker", "day", "size"], dimensions_max_cardinality=4
+)
+assert [d.label for d in fig.data[0].dimensions] == ["sex", "smoker", "day", "size"]

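The four scenarios in `test_parcats_dimensions_max` can be restated as a table of (arguments, expected labels) pairs and checked without plotly. The sketch uses a hypothetical `select_dimensions` helper over per-column unique-value counts.

- For `sex`, `smoker`, `day`, `time` and `size`, the counts are their category counts in the tips data (2, 2, 4, 2 and 6).
- For `total_bill` and `tip`, the values are stand-ins: the default-case assertion only implies that both exceed 50.

```python
# Per-column unique-value counts; total_bill and tip are stand-ins above 50.
TIPS_CARDINALITY = {
    "total_bill": 200,  # stand-in value
    "tip": 100,         # stand-in value
    "sex": 2,
    "smoker": 2,
    "day": 4,
    "time": 2,
    "size": 6,
}


def select_dimensions(cardinality, dimensions=None, max_cardinality=50):
    # Hypothetical selection: explicit dimensions are returned as given and the
    # cap is ignored; otherwise keep columns at or under the cap, in column order.
    if dimensions is not None:
        return list(dimensions)
    return [name for name, n in cardinality.items() if n <= max_cardinality]


# (keyword arguments, expected labels), mirroring the four cases in the test.
CASES = [
    ({}, ["sex", "smoker", "day", "time", "size"]),
    ({"dimensions": ["sex", "smoker", "day"]}, ["sex", "smoker", "day"]),
    ({"max_cardinality": 4}, ["sex", "smoker", "day", "time"]),
    (
        {"dimensions": ["sex", "smoker", "day", "size"], "max_cardinality": 4},
        ["sex", "smoker", "day", "size"],
    ),
]

for kwargs, expected in CASES:
    assert select_dimensions(TIPS_CARDINALITY, **kwargs) == expected
print("all four scenarios match")
```

In a real suite, the same table could feed `pytest.mark.parametrize`, giving one test case per scenario.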