DOC: update the pandas.Index.duplicated and pandas.Series.duplicated docstring #20117

Merged
merged 17 commits into from
Mar 14, 2018

Conversation

stijnvanhoey
Contributor

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

Method pandas.Index.duplicated:

################################################################################
##################### Docstring (pandas.Index.duplicated)  #####################
################################################################################

Indicate duplicate index values.

Duplicated values are indicated as ``True`` values in the resulting
array. Either all duplicates, all except the first or all except the
last occurrence of duplicates can be indicated.

Parameters
----------
keep : {'first', 'last', False}, default 'first'
    - 'first' : Mark duplicates as ``True`` except for the first
      occurrence.
    - 'last' : Mark duplicates as ``True`` except for the last
      occurrence.
    - ``False`` : Mark all duplicates as ``True``.

Examples
--------
By default, for each set of duplicated values, the first occurrence is
set to False and all others to True:

>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False,  True, False,  True], dtype=bool)

which is equivalent to

>>> idx.duplicated(keep='first')
array([False, False,  True, False,  True], dtype=bool)

By using 'last', the last occurrence of each set of duplicated values
is set to False and all others to True:

>>> idx.duplicated(keep='last')
array([ True, False,  True, False, False], dtype=bool)

By setting ``keep`` to ``False``, all duplicates are True:

>>> idx.duplicated(keep=False)
array([ True, False,  True, False,  True], dtype=bool)

Returns
-------
numpy.ndarray

See Also
--------
pandas.Series.duplicated : equivalent method on pandas.Series

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameter "keep" description should start with capital letter

Method pandas.Series.duplicated:

################################################################################
##################### Docstring (pandas.Series.duplicated) #####################
################################################################################

Indicate duplicate Series values.

Duplicated values are indicated as ``True`` values in the resulting
Series. Either all duplicates, all except the first or all except the
last occurrence of duplicates can be indicated.

Parameters
----------
keep : {'first', 'last', False}, default 'first'
    - 'first' : Mark duplicates as ``True`` except for the first
      occurrence.
    - 'last' : Mark duplicates as ``True`` except for the last
      occurrence.
    - ``False`` : Mark all duplicates as ``True``.

Examples
--------
By default, for each set of duplicated values, the first occurrence is
set to False and all others to True:

>>> animals = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> animals.duplicated()
0    False
1    False
2     True
3    False
4     True
dtype: bool

which is equivalent to

>>> animals.duplicated(keep='first')
0    False
1    False
2     True
3    False
4     True
dtype: bool

By using 'last', the last occurrence of each set of duplicated values
is set to False and all others to True:

>>> animals.duplicated(keep='last')
0     True
1    False
2     True
3    False
4    False
dtype: bool

By setting ``keep`` to ``False``, all duplicates are True:

>>> animals.duplicated(keep=False)
0     True
1    False
2     True
3    False
4     True
dtype: bool

Returns
-------
pandas.core.series.Series

See Also
--------
pandas.Index.duplicated : equivalent method on pandas.Index

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameter "keep" description should start with capital letter
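
For readers who want the `keep` semantics from both docstrings in one place, here is a minimal plain-Python sketch. The function below is purely illustrative (its name mirrors the method, but it is not how pandas implements it); it reproduces the three outputs shown in the examples above.

```python
from collections import Counter


def duplicated(values, keep="first"):
    """Return a list of booleans marking duplicates in ``values``.

    Plain-Python illustration of the documented ``keep`` semantics;
    this is NOT the pandas implementation.
    """
    if keep not in ("first", "last", False):
        raise ValueError("keep must be 'first', 'last' or False")
    if keep is False:
        # Every value that occurs more than once is marked True.
        counts = Counter(values)
        return [counts[v] > 1 for v in values]
    # For 'last', scan from the end so the last occurrence is seen first.
    order = list(values) if keep == "first" else list(reversed(values))
    seen = set()
    flags = []
    for v in order:
        flags.append(v in seen)  # True only if v was already encountered
        seen.add(v)
    return flags if keep == "first" else flags[::-1]


animals = ["lama", "cow", "lama", "beetle", "lama"]
print(duplicated(animals))               # [False, False, True, False, True]
print(duplicated(animals, keep="last"))  # [True, False, True, False, False]
print(duplicated(animals, keep=False))   # [True, False, True, False, True]
```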

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

  • We (Ghent chapter) decided that an additional line of text (starting with a capital letter) was less useful than starting directly with the list of options.

Instead of the template-based version used before, we split the docstring in two, with a separate one for Index and one for Series. This introduces some redundancy and overlap (basically the keep argument, which is also shared with drop_duplicates), but it is a cleaner option because the examples are written inside the docstrings of the methods themselves and not somewhere else in the code.
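
To make the trade-off concrete, a template-based shared docstring could look roughly like the sketch below. The names `_DUPLICATED_TEMPLATE` and `fill_doc` are hypothetical and only illustrate the general pattern; they are not the helpers pandas actually uses.

```python
# Hypothetical sketch of a fully shared docstring template.
# _DUPLICATED_TEMPLATE and fill_doc are illustrative names, not pandas internals.
_DUPLICATED_TEMPLATE = """Indicate duplicate %(klass)s values.

Parameters
----------
keep : {'first', 'last', False}, default 'first'
    - 'first' : Mark duplicates as ``True`` except for the first
      occurrence.
    - 'last' : Mark duplicates as ``True`` except for the last
      occurrence.
    - ``False`` : Mark all duplicates as ``True``.

Returns
-------
%(return_type)s
"""


def fill_doc(**params):
    """Return a decorator that sets __doc__ from the shared template."""
    def decorator(func):
        func.__doc__ = _DUPLICATED_TEMPLATE % params
        return func
    return decorator


class Index:
    @fill_doc(klass="index", return_type="numpy.ndarray")
    def duplicated(self, keep="first"):
        raise NotImplementedError  # behaviour is out of scope for this sketch


class Series:
    @fill_doc(klass="Series", return_type="pandas.core.series.Series")
    def duplicated(self, keep="first"):
        raise NotImplementedError  # behaviour is out of scope for this sketch
```

With this pattern the text lives in one place, but class-specific examples have to be passed in as further template parameters or kept elsewhere, which is the drawback the split version avoids.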

@jorisvandenbossche
Member

@TomAugspurger we will have the same discussion here about sharing the docstrings or not

@TomAugspurger
Contributor

FWIW I have a slight preference for sharing. Don't have a strong opinion though.

@jreback jreback added Docs Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Mar 10, 2018

See Also
--------
pandas.Index.duplicated : equivalent method on pandas.Index
Contributor

link to Series.drop_duplicates, DataFrame.duplicated

"""
Indicate duplicate index values.

Duplicated values are indicated as ``True`` values in the resulting
Contributor

can you coordinate text with #20114, seems some slight differences

Member

Looks like they are in the zone together already :-)
(they are sitting close to me: we removed the extended summary in the other PR as Tom asked, or are there other differences?)

Contributor

great, seemed likely.

@jreback
Contributor

jreback commented Mar 10, 2018

Ideally I agree that sharing would be nice.

@jorisvandenbossche
Member

> Ideally I agree that sharing would be nice.

@jreback does "ideally" mean you are OK with this "practical" solution? :-)

As I said in #20114 (comment) as well, in principle we could take out the parameter section as shared part and inject that. But not sure that will be that practical, as the question is then where to put that.
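
A hybrid along these lines might look like the sketch below: each method keeps its own docstring and examples, and only the shared `keep` description is injected. The names `_KEEP_PARAM`, `inject_keep`, `index_duplicated` and `series_duplicated` are hypothetical and serve only to illustrate the idea.

```python
import inspect

# Hypothetical sketch: only the shared ``keep`` description is injected.
# _KEEP_PARAM and inject_keep are illustrative names, not pandas internals.
_KEEP_PARAM = """keep : {'first', 'last', False}, default 'first'
    - 'first' : Mark duplicates as ``True`` except for the first
      occurrence.
    - 'last' : Mark duplicates as ``True`` except for the last
      occurrence.
    - ``False`` : Mark all duplicates as ``True``."""


def inject_keep(func):
    """Fill the %(keep)s placeholder in the function's own docstring."""
    # Dedent first so the injected block keeps its relative indentation.
    doc = inspect.cleandoc(func.__doc__)
    func.__doc__ = doc % {"keep": _KEEP_PARAM}
    return func


@inject_keep
def index_duplicated(values, keep="first"):
    """Indicate duplicate index values.

    Parameters
    ----------
    %(keep)s

    Returns
    -------
    numpy.ndarray
    """
    raise NotImplementedError  # behaviour is out of scope for this sketch


@inject_keep
def series_duplicated(values, keep="first"):
    """Indicate duplicate Series values.

    Parameters
    ----------
    %(keep)s

    Returns
    -------
    pandas.core.series.Series
    """
    raise NotImplementedError  # behaviour is out of scope for this sketch
```

The open question raised above remains: the shared snippet still has to live in some module that both classes can import.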

@jreback
Contributor

jreback commented Mar 10, 2018

Well, ideally this should be much more shared; it ends up being a lot of duplicate text. (If we can fix it to make it shared, that would be best.) But if it's too complicated for now, then OK too.

@jorisvandenbossche jorisvandenbossche changed the title from "DOC: update the pandas.Index.duplicated and pandas.Series.duplicated docstring to DOC: update the pandas.Index.duplicated and pandas.Series.duplicated docstring Mar 14, 2018
@codecov

codecov bot commented Mar 14, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@e6c2647).
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #20117   +/-   ##
=========================================
  Coverage          ?    91.7%           
=========================================
  Files             ?      150           
  Lines             ?    49148           
  Branches          ?        0           
=========================================
  Hits              ?    45070           
  Misses            ?     4078           
  Partials          ?        0

Flag         Coverage Δ
#multiple    90.08% <100%> (?)
#single      41.84% <71.42%> (?)

Impacted Files                     Coverage Δ
pandas/core/base.py                96.77% <ø> (ø)
pandas/core/indexes/base.py        96.66% <100%> (ø)
pandas/core/indexes/multi.py       95.06% <100%> (ø)
pandas/core/series.py              93.84% <100%> (ø)
pandas/core/indexes/category.py    97.31% <100%> (ø)

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e6c2647...e02eda6.

@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Mar 14, 2018
@TomAugspurger TomAugspurger merged commit 92c2910 into pandas-dev:master Mar 14, 2018
@TomAugspurger
Contributor

Thanks @stijnvanhoey

@stijnvanhoey stijnvanhoey deleted the docstring-duplicated branch January 13, 2020 11:22