Skip to content

Documentation for object.__getitem__ is over-specified. #113313

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rodrigogiraoserrao opened this issue Dec 20, 2023 · 9 comments
Closed

Documentation for object.__getitem__ is over-specified. #113313

rodrigogiraoserrao opened this issue Dec 20, 2023 · 9 comments
Labels
docs Documentation in the Doc dir

Comments

@rodrigogiraoserrao
Copy link
Contributor

rodrigogiraoserrao commented Dec 20, 2023

Bug report

Bug description:

collections.deque is a sequence but doesn't implement the full sequence protocol (doesn't support slices).

collections.deque is a sequence:

from collections import deque
from collections.abc import Sequence
assert issubclass(deque, Sequence)  # No AssertionError

The docs on collections.abc.Sequence say this is an ABC for sequences, linking to the docs glossary entry on sequences:

ABCs for read-only and mutable sequences.

In turn, the glossary entry says sequences must implement __getitem__:

An iterable which supports efficient element access using integer indices via the __getitem__() special method [...]

Finally, the __getitem__ docs say that sequences must accept integers and slice objects:

[...] For sequence types, the accepted keys should be integers and slice objects.[...]

(Excerpts taken from the Python 3.12.1 documentation on 20/12/2023.)

I see one of three resolutions:

  1. issubclass(deque, Sequence) returns False
  2. we implement slicing for deque
  3. change the docs on __getitem__ to say “For sequence types, the accepted keys should be integers (and optionally slice objects).”

I don't know which one is the “best”.

CPython versions tested on:

3.8, 3.12

Operating systems tested on:

macOS

Linked PRs

@AlexWaygood
Copy link
Member

Cc. @rhettinger

@AlexWaygood
Copy link
Member

@rodrigogiraoserrao
Copy link
Contributor Author

I was reading through the linked issues and I kind of understand the matters of practicality vs purity, and the fact that concrete classes sometimes deviate a little bit from their ABCs.
However, I claim it really looks like something is wrong because the code below passes type-checking and yet doesn't work:

from collections import deque
from collections.abc import Sequence

def foo(seq: Sequence):
    seq[:]

foo(deque())

Given that I'm not doing anything naughty with casts or trying to bend typing to my will and given that type-checking passes, shouldn't the code work?

@AlexWaygood
Copy link
Member

AlexWaygood commented Dec 20, 2023

Given that I'm not doing anything naughty with casts or trying to bend typing to my will and given that type-checking passes, shouldn't the code work?

Ideally, yes. We do the best we can at typeshed by overriding the stub for __getitem__ on deque: https://github.com/python/typeshed/blob/8dfebc01176dd6c1859c717b021f58eccabf2182/stdlib/collections/__init__.pyi#L252-L255. But there's ultimately not much we can do at typeshed or mypy to prevent this kind of failure of static type-checking when the runtime class violates LSP. Mypy understands deque to be a mutable sequence (if it believed otherwise, that would cause many false-positive errors), and it assumes that all mutable sequences support slicing due to our MutableSequence stubs (again, if it believed otherwise, that would cause many false-positive errors). We have to faithfully represent the runtime class as it actually exists in the typeshed stubs; we have no other option.

None of this is to say that we should definitely change deque so that it does support slicing; it's only to say that I agree that, in an ideal world, we wouldn't have situations like this. We don't live in an ideal world: it may well be that the runtime implementation would still be too problematic or too complex to make solving this problem worthwhile. I leave that question to those with more expertise with the code in question :)

@rhettinger
Copy link
Contributor

rhettinger commented Dec 20, 2023

FWIW, this isn't a bug in CPython. A class registered with collections.abc.Sequence or MutableSequence is not required to support slicing. There has never been a requirement that deques support slicing. We only added indexing late in game and that was mainly only to support d[0] and d[-1] as a way to peek at end-points. Other uses are indexing are mostly an antipattern for deques because they has O(n) running time and because the position of values shifts as the deques grow and shrink.

We could make this a feature request. I've worked on in it this past and it was a big project, much more complicated than it seemed at first. In the end, I dropped it because there isn't much of a need. Most of what users want deques for is already satisfied by the current API. We expand APIs to satisfy compelling needs, not to make type annotations more parallel (the tail wagging the dog).

As @AlexWaygood notes, TypeShed chose its assumptions based on what worked well for most users most of the time, but it definitely did not create a requirement to change the concrete implementation.

Any objections to marking this as closed?

@rhettinger rhettinger removed the type-bug An unexpected behavior, bug, or error label Dec 20, 2023
@rodrigogiraoserrao
Copy link
Contributor Author

rodrigogiraoserrao commented Dec 21, 2023

We could make this a feature request.

I'm not particularly interested or invested in having deque support slicing.

A class registered with collections.abc.Sequence or MutableSequence is not required to support slicing.

I may have misinterpreted the docs, but the __getitem__ docs seem to say it is:

[...] For sequence types, the accepted keys should be integers and slice objects.[...]

(My original issue has the complete “documentation paper trail” that links everything together.)

Is it a case, then, that the __getitem__ docs should be updated to say something like “For sequence types, the accepted keys should be integers (and optionally slice objects)”?

If you think I misinterpreted the docs, would you be so kind as to help me understand where in the chain of links I followed above I made a mistake?
Maybe I'll be able to clarify the docs a bit.

I hope you understand my persistence comes from the fact that, from where I currently stand, it looks like code and documentation are at odds, which seems to imply one needs to be tweaked.
I'm also not interested in proving you wrong.
I'm interested in understanding where I made a mistake.

@rhettinger
Copy link
Contributor

It would reasonable to add "(and optionally slices)" to the __getitem__ docs.

@rhettinger
Copy link
Contributor

rhettinger commented Dec 21, 2023

I've uploaded a PR because the current wording over-specifies the API and doesn't reflect reality.

FWIW, for much of its history, the term "sequence" was very general and used loosely. It was meant to distinguish between the two most common patterns of using __getitem__, one as a sequence and the other as a mapping. The former expected integer indices, would raise an IndexError, and supported automatic sequence iteration. The latter used key based lookups and would raise a KeyError for a missing key. Slice support was completely optional and was implemented using the __getslice__ dunder method.

When collections.abc.Sequence and collections.abcMutableSequence were introduced, the notion of a sequence became more precise. Technically, it only applied to class inheriting from or registered to the ABCs. Other classes could still be sequence-like but without having count or index. The ABC specified the methods that would be supported but didn't require negative index support or slicing. Either of those might not be supportable depending on the semantics of the class.

When __getslice__ was folded into __getitem__, the requirements did not change. Also slice objects are allowed to have types other than integers for the start, stop, and step arguments. Slices are also allowed as dictionary keys so they work with mappings as well.

Later, type annotations came along. For the most part, they were able to describe reality, but it didn't fit perfectly and occassional approximations had to be made (for example, that is why a function annotated with float can accept an int even though the latter is a not subclass of the former).

When sequence is spelled with a capital S, it means either typing.Sequence or collections.abc.Sequence which are very close in meaning but not exactly the same. With a lowercase S, a sequence is a general concept that can have fewer requirements (ie. a class need only to have a __getitem__ method to support automatic sequence iteration) or may have additional requirements (such as the builtin sequence types all supporting concatenation with +, repetition with *, negative indexing, and slice support).

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Dec 21, 2023
…ences. (pythongh-113377)

(cherry picked from commit 6a5b473)

Co-authored-by: Raymond Hettinger <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Dec 21, 2023
…ences. (pythongh-113377)

(cherry picked from commit 6a5b473)

Co-authored-by: Raymond Hettinger <[email protected]>
@rhettinger rhettinger changed the title collections.deque is a sequence but doesn't implement the full sequence protocol (doesn't support slices) Documentation for object.__getitem__ is over-specified. Dec 21, 2023
@rodrigogiraoserrao
Copy link
Contributor Author

I didn't know that there used to be a dunder method __getslice__!

Thanks for your thorough reply, @rhettinger, and thanks for handling this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir
Projects
None yet
Development

No branches or pull requests

3 participants