DOC: update the pandas.Index.duplicated and pandas.Series.duplicated docstring #20117

stijnvanhoey · 2018-03-10T12:42:20Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

Method pandas.Index.duplicated:

################################################################################
##################### Docstring (pandas.Index.duplicated)  #####################
################################################################################

Indicate duplicate index values.

Duplicated values are indicated as ``True`` values in the resulting
array. Either all duplicates, all except the first or all except the
last occurrence of duplicates can be indicated.

Parameters
----------
keep : {'first', 'last', False}, default 'first'
    - 'first' : Mark duplicates as ``True`` except for the first
      occurrence.
    - 'last' : Mark duplicates as ``True`` except for the last
      occurrence.
    - ``False`` : Mark all duplicates as ``True``.

Examples
--------
By default, for each set of duplicated values, the first occurrence is
set to False and all others to True:

>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False,  True, False,  True], dtype=bool)

which is equivalent to

>>> idx.duplicated(keep='first')
array([False, False,  True, False,  True], dtype=bool)

By using 'last', the last occurrence of each set of duplicated values
is set to False and all others to True:

>>> idx.duplicated(keep='last')
array([ True, False,  True, False, False], dtype=bool)

By setting ``keep`` to ``False``, all duplicates are True:

>>> idx.duplicated(keep=False)
array([ True, False,  True, False,  True], dtype=bool)

Returns
-------
numpy.ndarray

See Also
--------
pandas.Series.duplicated : equivalent method on pandas.Series

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameter "keep" description should start with capital letter

Method pandas.Series.duplicated:

################################################################################
##################### Docstring (pandas.Series.duplicated) #####################
################################################################################

Indicate duplicate Series values.

Duplicated values are indicated as ``True`` values in the resulting
Series. Either all duplicates, all except the first or all except the
last occurrence of duplicates can be indicated.

Parameters
----------
keep : {'first', 'last', False}, default 'first'
    - 'first' : Mark duplicates as ``True`` except for the first
      occurrence.
    - 'last' : Mark duplicates as ``True`` except for the last
      occurrence.
    - ``False`` : Mark all duplicates as ``True``.

Examples
--------
By default, for each set of duplicated values, the first occurrence is
set to False and all others to True:

>>> animals = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> animals.duplicated()
0    False
1    False
2     True
3    False
4     True
dtype: bool

which is equivalent to

>>> animals.duplicated(keep='first')
0    False
1    False
2     True
3    False
4     True
dtype: bool

By using 'last', the last occurrence of each set of duplicated values
is set to False and all others to True:

>>> animals.duplicated(keep='last')
0     True
1    False
2     True
3    False
4    False
dtype: bool

By setting ``keep`` to ``False``, all duplicates are True:

>>> animals.duplicated(keep=False)
0     True
1    False
2     True
3    False
4     True
dtype: bool

Returns
-------
pandas.core.series.Series

See Also
--------
pandas.Index.duplicated : equivalent method on pandas.Index

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameter "keep" description should start with capital letter

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

We (Ghent chapter) decided that adding an extra line of text (starting with a capital letter) was less useful than starting directly with the list of options.

Instead of keeping the template-based version used before, we split the docstrings and wrote a separate one for Index and for Series. This introduces some redundancy and overlap (mainly the keep argument, which is also shared with drop_duplicates), but it is cleaner because the examples live inside the docstring of each method rather than somewhere else in the code.
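
For illustration only (this is not code from the PR): a minimal sketch of what a template-based shared docstring looks like in plain Python. The names `DUPLICATED_TEMPLATE`, `with_doc`, `FakeIndex` and `FakeSeries` are hypothetical and do not mirror the pandas internals. It shows why the text stays in one place with a template, and also why class-specific examples have no natural home in it.

```python
# Hypothetical sketch of a template-based shared docstring; not pandas code.
DUPLICATED_TEMPLATE = """
Indicate duplicate %(klass)s values.

Parameters
----------
keep : {'first', 'last', False}, default 'first'

Returns
-------
%(return_type)s
"""


def with_doc(template, **kwargs):
    """Set the decorated function's docstring to the filled-in template."""
    def decorator(func):
        func.__doc__ = template % kwargs
        return func
    return decorator


class FakeIndex:
    @with_doc(DUPLICATED_TEMPLATE, klass="index", return_type="numpy.ndarray")
    def duplicated(self, keep="first"):
        raise NotImplementedError  # behaviour is irrelevant for this sketch


class FakeSeries:
    @with_doc(DUPLICATED_TEMPLATE, klass="Series", return_type="Series")
    def duplicated(self, keep="first"):
        raise NotImplementedError  # behaviour is irrelevant for this sketch
```

With this pattern, the Index example (which prints an array) and the Series example (which prints a Series) would both have to be passed in as further template arguments, away from the method they document.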

jorisvandenbossche · 2018-03-10T12:52:29Z

@TomAugspurger we will have the same discussion here about sharing the docstrings or not

TomAugspurger · 2018-03-10T12:59:54Z

FWIW I have a slight preference for sharing. Don't have a strong opinion though.

jreback · 2018-03-10T14:56:59Z

pandas/core/series.py

+
+        See Also
+        --------
+        pandas.Index.duplicated : equivalent method on pandas.Index


link to Series.drop_duplicates, DataFrame.duplicated

jreback · 2018-03-10T14:57:36Z

pandas/core/indexes/base.py

+        """
+        Indicate duplicate index values.
+
+        Duplicated values are indicated as ``True`` values in the resulting


can you coordinate text with #20114, seems some slight differences

Looks like they are in the zone together already :-)
(they are sitting close to me: we removed the extended summary in the other PR as Tom asked, or are there other differences?)

great, seemed likely.

jreback · 2018-03-10T15:10:15Z

Ideally I agree that sharing would be nice.

jorisvandenbossche · 2018-03-10T16:22:34Z

> Ideally I agree that sharing would be nice.

@jreback does "ideally" mean you are OK with this "practical" solution? :-)

As I also said in #20114, in principle we could factor out the parameter section as the shared part and inject it. But I am not sure that would be practical, since the question then becomes where to put it.
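
A rough sketch of that idea, assuming a hypothetical `KEEP_PARAMETER` constant and `inject_keep` decorator (neither exists in pandas): each method keeps its own summary and examples, and only the parameter block is injected.

```python
import inspect

# Hypothetical sketch: share only the Parameters entry; not pandas code.
KEEP_PARAMETER = """keep : {'first', 'last', False}, default 'first'
    - 'first' : Mark duplicates as ``True`` except for the first
      occurrence.
    - 'last' : Mark duplicates as ``True`` except for the last
      occurrence.
    - ``False`` : Mark all duplicates as ``True``."""


def inject_keep(func):
    """Fill the ``%(keep)s`` placeholder in the function's docstring."""
    # cleandoc strips the common indentation first, so the injected block
    # keeps its own relative indentation.
    func.__doc__ = inspect.cleandoc(func.__doc__) % {"keep": KEEP_PARAMETER}
    return func


@inject_keep
def index_duplicated(values, keep="first"):
    """
    Indicate duplicate index values.

    Parameters
    ----------
    %(keep)s

    Returns
    -------
    numpy.ndarray
    """
    raise NotImplementedError  # behaviour is irrelevant for this sketch
```

The open question of where to put the shared part shows up here as the question of which module should own `KEEP_PARAMETER`.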

jreback · 2018-03-10T17:01:59Z

Well, ideally this should be shared much more; it ends up being a lot of duplicate text. (If we can fix it so that it is shared, that would be best.) But if it's too complicated for now, then OK too.

codecov · 2018-03-14T15:01:11Z

Codecov Report

❗ No coverage uploaded for pull request base (master@e6c2647).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #20117   +/-   ##
=========================================
  Coverage          ?    91.7%           
=========================================
  Files             ?      150           
  Lines             ?    49148           
  Branches          ?        0           
=========================================
  Hits              ?    45070           
  Misses            ?     4078           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.08% <100%> (?)`
#single	`41.84% <71.42%> (?)`

Impacted Files	Coverage Δ
pandas/core/base.py	`96.77% <ø> (ø)`
pandas/core/indexes/base.py	`96.66% <100%> (ø)`
pandas/core/indexes/multi.py	`95.06% <100%> (ø)`
pandas/core/series.py	`93.84% <100%> (ø)`
pandas/core/indexes/category.py	`97.31% <100%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e6c2647...e02eda6.

TomAugspurger · 2018-03-14T15:01:30Z

Thanks @stijnvanhoey

stijnvanhoey added 13 commits March 10, 2018 12:25

Create separate docstring for index duplicated

3d916ee

--amend

9f4cfc7

Remove duplicat element from shared dict _index_doc_kwargs

faa87da

Add docstring of Series duplicated

a35c25b

Clean old docstring referencegs

688a9d3

Reset dict entries for existing docstrings

1daaed3

Update docstring reference to Index version

28c9bf6

Move docstring to series implementation level

4aad42a

Fix docstring guide errors

b4cd28b

--amend

7f6ca4e

Remove trailing whitespaces

b48e13f

Fixe too long lines

df6e7c8

Remove redundant last entry of examples

c2f79de

jreback added Docs Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Mar 10, 2018

jreback requested changes Mar 10, 2018

stijnvanhoey added 2 commits March 10, 2018 16:31

Extend related methods

f4e3756

Extend related methods of index

1ff7864

Merge remote-tracking branch 'upstream/master' into stijnvanhoey-docs…

5da7f6e

…tring-duplicated

jorisvandenbossche changed the title ~~"DOC: update the pandas.Index.duplicated and pandas.Series.duplicated docstring~~ DOC: update the pandas.Index.duplicated and pandas.Series.duplicated docstring Mar 14, 2018

Cleanup [ci skip]

e02eda6

[ci skip]

TomAugspurger added this to the 0.23.0 milestone Mar 14, 2018

TomAugspurger merged commit 92c2910 into pandas-dev:master Mar 14, 2018

stijnvanhoey deleted the docstring-duplicated branch January 13, 2020 11:22
