Skip to content

Type erasure and related issues #79

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
gvanrossum opened this issue Apr 13, 2015 · 23 comments
Closed

Type erasure and related issues #79

gvanrossum opened this issue Apr 13, 2015 · 23 comments
Milestone

Comments

@gvanrossum
Copy link
Member

We said it shouldn't be possible to instantiate List or List[t] (i.e., List() or List[t]()), but the implementation allows this (where t can be a type variable or a concrete class). This should be disallowed (and made sure it's documented in the PEP).

However, for user-defined generic classes it should definitely be possible to write node Node() and probably also Node[t](). We must decide whether the latter is allowed, and if so whether the object it returns has Node or Node[t] as its __class__.

Also note that if the Node class is declared generic in a stub file, all uses of Node[t] must appear in string quotes (i.e. as forward references) since in this case the runtime Node class is not parametrized. (See #60.)

Mark Shannon has proposed an intermediate option, using a class decorator like @implements(Generic[T]), which would make the fact that Node is parametrized knowable at runtime but without defining isinstance(), issubclass() or Node[t]. This would be a third way to do it, not replacing stubs or inheriting from Generic[T]. It could possibly also save space (since in CPython class objects such as created by Node[t] use a lot of memory).

@JukkaL
Copy link
Contributor

JukkaL commented Apr 14, 2015

We (me, Mark Shannon and Andrey Vlasovskikh) discussed this yesterday and the consensus was that Node[t]() should be disallowed in expressions (this also applies to things like Node[t].foo(), though we weren't sure whether we can reasonably modify the implementation to not allow that.

@markshannon
Copy link
Member

Given the expense of creating a new class, it might be worth caching these things in the typing module. That way we don't need to explicitly disallow Node[T]() ast it will only create one new class object per T, not per evaluation.

@gvanrossum
Copy link
Member Author

I like that idea. Maybe I won't get to implementing it before beta 1 though.

@JukkaL
Copy link
Contributor

JukkaL commented Apr 17, 2015

I have another issue with supporting Node[T](). What if T is a type variable? At runtime, the value of a type variable is unknown. In a function annotation, a type variable implies a generic function, and the type variable applies to the function as a whole, so there is no such issue and introspection works consistently. At runtime, the type variable in Node[T]() applies to a single function invocation where T has some value (though only statically), so conceptually it should be bound to this particular value, but due to type erasure, the value is not available at runtime.

Example:

...
class Node(Generic[T]):
    def __init__(self, x: T) -> None:
        self.value = x

T = TypeVar('T')

def f(x: T) -> Node[T]:
    return Node[T](x)

n1 = f(1)  # Runtime type Node[T], but should arguably be Node[int]
n2 = f('x')  # Also Node[T], but should be Node[str]

My argument against storing the type argument in instances is primarily based on two issues:

  1. Type variables cannot be used usefully (their value is unknown), so the information about type arguments is only partial and thus can't be usefully relied on. Instances created within generic functions may have different properties from instances created in non-generic functions, which non-uniform and ugly.
  2. If an instance is created using Node(...), the type argument value is missing. Again, this means that runtime introspection of type argument values is not very useful since the value is not always there, and the behavior is inconsistent.

These are more minor issues:

  1. Node[T](), even with caching, seems like a somewhat expensive operation (we could easily benchmark how fast it is compared to just Node(), though).
  2. List[int]() would not be supported (and generally any std library class would not support indexing, unless it is defined in typing), which makes these types inconsistent from built-in types.
  3. type(node) is Node doesn't work for Node[int]() instances, which means that code that used to work when it was dynamically typed may break once it is adapted to static typing (which may introduce Node[t]() etc.).

@gvanrossum
Copy link
Member Author

OK, unless there are further objections, let's disallow instantiating parametrized types.

@gvanrossum
Copy link
Member Author

How about we make Node[t]() return an instance whose __class__ is just Node (not Node[t])? Just so if there are no other ways the type checker can guess the type (e.g. it's an empty container) we don't have to resort to a # type: Node[t] comment? This could work for concrete types as well as type variables.

We could even go back and make List[t]() return an empty builtin list instance?

@JukkaL
Copy link
Contributor

JukkaL commented Apr 17, 2015

That could be nice -- that's how mypy used to work, and it was somewhat helpful. These are the tradeoffs as I see them:

  1. List[t]() (etc.) is shorter to write than a type comment and arguably looks less noisy.
  2. List[t]() is less efficient than []. We should measure by how much and if it's a lot slower we may want to reconsider.
  3. There's now more than one obvious way to do things. Some people may prefer a type comment while others may prefer List[t](). Should one of these be the recommended way in most cases? For example, we should recommend using a type comment for initialization and List[t]() in subexpressions where a type comment isn't valid. Variable/attribute initialization is clearly the most common use case for these.

@gvanrossum
Copy link
Member Author

I wonder if the real utility of List[t]() (etc.) is when the List[t] part is actually a type alias. I expect that reading e.g. List[Employee]() will feel pretty cryptic (there's so much punctuation that less-experienced readers are likely to either skip it or be stopped in their tracks trying to understand it), but e.g. EmployeeList() feels more readable.

@gvanrossum gvanrossum added the bug label May 18, 2015
@gvanrossum
Copy link
Member Author

This needs text in the PEP describing what we decided to do. I think it's this (which is already implemented in typing.py):

  • Node[t]() (for some concrete type t) as well as Node[T]() and Node() are all allowed, but don't record the type parameter (if any) at runtime.
  • Type checkers should capture the type parameter and proceed to check uses leading from this instantiation accordingly.

@gvanrossum
Copy link
Member Author

FWIW I also need to fix this in typing.py -- currently Node().__class__ is Node[T], Node[int]().__class__ is Node[int], etc. (I.e. I was wrong in the previous comment that this is already implemented correctly.)

gvanrossum pushed a commit that referenced this issue May 20, 2015
This addresses issue #79, but does not fix it (typing.py needs to be
updated to implement type erasure).
@gvanrossum gvanrossum removed the to do label May 20, 2015
@The-Compiler
Copy link

I found the section in the PEP (added in c29a9b9) unclear to read as someone not very familiar with type systems (coming from C and Python).

Two things are confusing to me:

  • What part is talking about the statically inferred type, and what part is talking about the runtime type?
  • What's the purpose of instantiating such a generic?

After reading it a few times, here is what I think it means:

  • the type of x is Node[Any]. refers to the statically inferred type, as it otherwise conflicts with At runtime the type is not preserved, and the observable type of x is
    just Node.
  • Instantiating it will just create a Node object (as if it didn't inherit from Generic) and the extra infos are only for the typechecker.

Is this correct? Maybe the text could be clarified a bit, though I'm not sure how exactly.

@gvanrossum
Copy link
Member Author

That's good feedback. I'll clarify it.

@gvanrossum
Copy link
Member Author

Done in rev b5e4e3d.

@gvanrossum
Copy link
Member Author

I'm reopening this. The PEP (https://www.python.org/dev/peps/pep-0484/#instantiating-generic-classes-and-type-erasure) explicitly allows Node[int](), but mypy explicitly disallows it, with this error:

error: Generic type not valid as an expression any more (use '# type:' comment instead)

The discussion in this issue was somewhat hurried and went back and forth a few times. What should we do? Fix mypy or change the PEP?

@gvanrossum gvanrossum added this to the 3.5.2 milestone Mar 24, 2016
@ilevkivskyi
Copy link
Member

I do not see why Node[int]() should be prohibited. Probably, the only downside is speed, but people who want speed simply will not use this particular possibility. On the other hand assume that I have

class Box(Generic[T]):
    def __init__(self, value: T) -> None:
        self.value = value

If I want to be able to find type errors at instantiation, then I have two options:

x = Box[int]('a string') # OOPS!

or alternatively

x = Box('a string') # type: Box[int]

The second option looks a bit ugly to me. It would be even more ugly if I want to type annotate something like

if Box('a string') in box_list: # type: Box[int]
    ...

Although there might be no serious reason to recommend the first option, I do not see any reason to prohibit it.

@gvanrossum
Copy link
Member Author

The main reason to prohibit Node[int]() is indeed that it's expensive. In fact, it's really expensive. Like 100x the cost of Node() (if Node is generic; otherwise it's more like 250x).

Why is it so slow? Because each time you call Node[int] it creates a new class object. And class objects just aren't meant to be created that frequently.

However you can still create a type alias for Node[int] and call it. E.g. the following works in mypy (and is barely slower than using a type comment):

IntBox = Box[int]
x = IntBox(0)  # okay
y = IntBox('')  # error

@ilevkivskyi
Copy link
Member

Type alias is a nice compromise here. I would even say that using type aliases could improve readability in this case. Finally, the facts that this already works in mypy and it is fast mean that this is a good option that we could recommend.

@gvanrossum
Copy link
Member Author

gvanrossum commented Apr 3, 2016 via email

@JukkaL
Copy link
Contributor

JukkaL commented Apr 4, 2016

Explicitly making a type alias such as Node[int] be a valid thing to call seems reasonable, but I think that we should erase type variables -- my preference would be to make IntBox(0) construct the same object as Box(0) at runtime in the above example. Allowing something like Box[int] to be called within an expression seems questionable unless we can make it at least reasonably fast.

@gvanrossum
Copy link
Member Author

gvanrossum commented Apr 4, 2016 via email

@ilevkivskyi
Copy link
Member

Sorry, I was quite busy today, @gvanrossum I agree with everything what you said, I propose to change the error message to the following one "Generic type is prohibited as a runtime expression (use a type alias or '# type:' comment)". I have made a PR at mypy python/mypy#1326

@gvanrossum
Copy link
Member Author

This entire section in the PEP is questionable (showing how little I understood when I wrote it):

Now there are two ways we can instantiate this class; the type
inferred by a type checker may be different depending on the form we
use.  The first way is to give the value of the type parameter
explicitly -- this overrides whatever type inference the type
checker would otherwise perform::

  x = Node[T]()  # The type inferred for x is Node[T].

  y = Node[int]()  # The type inferred for y is Node[int].

If no explicit types are given, the type checker is given some
freedom. Consider this code::

  x = Node()

The inferred type could be ``Node[Any]``, as there isn't enough
context to infer a more precise type.  Alternatively, a type checker
may reject the line and require an explicit annotation, like this::

  x = Node()  # type: Node[int]  # Inferred type is Node[int].

A type checker with more powerful type inference could look at how
``x`` is used elsewhere in the file and try to infer a more precise
type such as ``Node[int]`` even without an explicit type annotation.
However, it is probably impossible to make such type inference work
well in all cases, since Python programs can be very dynamic.

This PEP doesn't specify the details of how type inference should
work.  We allow different tools to experiment with various approaches.
We may give more explicit rules in future revisions.

I would like to rewrite all of this it to clarify:

  • x = Node() infers the type of x as Node[Any]
  • if the typevar occurs in the signature of __init__ we infer the type from that instead
  • x = Node() # type: Node[X] infers the type of x as Node[X] (for some concrete type X)
  • x = Node[X]() is invalid
  • A = Node[X] followed by x = A() is valid and infers the type of x as Node[X]

@gvanrossum
Copy link
Member Author

(Also added to the peps repo, as rev 1ad9e828ca36.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants