Skip to content

Proposal to make oneof-deconstruction typecheckable #358

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
124C41p opened this issue Mar 23, 2022 · 5 comments
Closed

Proposal to make oneof-deconstruction typecheckable #358

124C41p opened this issue Mar 23, 2022 · 5 comments
Labels
enhancement New feature or request large Large effort issue, multiple PRs, needs to be split up on hold

Comments

@124C41p
Copy link
Contributor

124C41p commented Mar 23, 2022

According to the docs, deconstruction of a oneof-type is meant to be done in the following way:

syntax = "proto3";

message Test {
  oneof foo {
    bool on = 1;
    int32 count = 2;
    string name = 3;
  }
}
foo_type, foo_value = betterproto.which_one_of(test, 'foo')

if foo_type == 'on':
  process_on(foo_value)
elif foo_type == 'count':
  process_count(foo_value)
elif foo_type == 'name':
  process_name(foo_value)
else:
  raise ValueError('No value for foo specified!')

Unfortunately, in that way development tooling (pylance, mypy, etc) can neither check for typos nor for completeness. (Imagine to introduce a new field string address = 4; or to remove on of the other fields later. A type checker could not see any problem with our python code after executing protoc.)

There are different possible ways to solve that issue.

Variant 1 - Make all sub-fields of foo optional

You could change the implementation of the Test dataclass such that all sub fields of foo are optional, and the unset fields are set to None. Then our deconstruction could look like this:

if test.on is not None:
  process_on(test.on)
elif test.count is not None:
  process_count(test.count)
elif test.name is not None:
  process_name(test.name)
else:
  raise ValueError('No value for foo specified!')

Variant 2 - Introduce an enum specifying the field set

You could add an enum field to the Test dataclass which encodes the actual field set. Then our deconstruction could look like this:

if test.foo_type == FooType.ON:
  process_on(test.on)
elif test.foo_type == FooType.COUNT:
  process_count(test.count)
elif test.foo_type == FooType.NAME:
  process_name(test.name)
else:
  raise ValueError('No value for foo specified!')

There are certainly many other ways to make typechecking possible. I really hope you agree with me that such a feature would increase development quality (especially since the rest of your auto generated code is beautifully typecheckable - I like that a lot!).

Thank you in advance!

@Gobot1234
Copy link
Collaborator

Gobot1234 commented Mar 23, 2022

I really like the idea and I was actually thinking about this a while back. I'm not a huge fan of making all the fields optional as it could lead to confusion as to why this is different from optional fields and it makes unwrapping fields a pain. Adding a new field and generating members for the enums would be a big compiler change so I'd like to avoid that if possible.

My ideal solution would be to implement oneofs as a context manager, maybe we could try something like.

@dataclass(eq=False,repr=False)
class Test(betterproto.Message):
    with betterproto.oneof() as foo:
        on: bool = betterproto.bool_field(1)
        count: int = betterproto.int32_field(2)
        name: str = betterproto.string_field(3)


match test.foo:
    case OneOf(Test.on, value):
        process_on(value)
    case OneOf(Test.count, value):
        process_count(value)
    case OneOf(Test.name, value):
        process_name(value)
    case _:
        raise ValueError

(This is very Prost inspired) having this type check I think should be possible although I haven't looked into how match works that well as of yet.

@Gobot1234 Gobot1234 added the enhancement New feature or request label Mar 23, 2022
@Gobot1234 Gobot1234 added on hold large Large effort issue, multiple PRs, needs to be split up labels Apr 11, 2022
@heimalne
Copy link

Would an union encoding (a.k.a Algebraic Data Type, a.k.a Choice type) be suitable for betterproto?

Foo = bool | int32 | string

match test.foo:
    case bool():
        process_on(v)
    case int32():
        process_count(v)
    case str():
        process_name(v)
    case _:
        assert_never(v)

They have a consistent mathematical background and in Python 3.11 (https://docs.python.org/3.11/library/typing.html#typing.assert_never) are getting full exhaustiveness-check support, and allow a backwards compatible workaround for older Python versions (https://hakibenita.com/python-mypy-exhaustive-checking)

@a-khabarov
Copy link
Contributor

I think it would be great if we could use (nested) pattern matching instead of which_one_of.

For example, if we have the following messages:

message X {
  int64 n = 1;
}

message Y {
  string s = 1;
}

message A {
  oneof data {
    X x = 1;
    Y y = 2;
  }
}

It would be nice if we could just write something like the following in Python:

match a:
    case A(data=X(n=n)):
        ...
    case A(data=Y(s=s)):
        ...
    ...

@MicaelJarniac
Copy link
Contributor

What about using a sentinel, like betterproto.MISSING?

syntax = "proto3";

message MyMessage {
  oneof foo {
    bool first = 1;
    int32 second = 2;
    string third = 3;
  }
}
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# sources: mymessage.proto
# plugin: python-betterproto
from dataclasses import dataclass
from typing import Union

import betterproto


@dataclass(eq=False, repr=False)
class MyMessage(betterproto.Message):
    first: Union[bool, betterproto.MISSING] = betterproto.bool_field(1, group="foo")
    second: Union[int, betterproto.MISSING] = betterproto.int32_field(2, group="foo")
    third: Union[str, betterproto.MISSING] = betterproto.string_field(3, group="foo")

In this example, only one of first, second or third will not be betterproto.MISSING, which would be the only one provided. The rest would be betterproto.MISSING.

To use it, one would simply do something like this:

from betterproto import MISSING

from lib import MyMessage


def process(msg: MyMessage) -> None:
    reveal_type(msg.first)  # bool | MISSING
    reveal_type(msg.second)  # int | MISSING
    reveal_type(msg.third)  # str | MISSING
    if msg.first is not MISSING:
        # Process `first`
        reveal_type(msg.first)  # bool
    elif msg.second is not MISSING:
        # Process `second`
        reveal_type(msg.second)  # int
    elif msg.third is not MISSING:
        # Process `third`
        reveal_type(msg.third)  # str

The reveal_type calls are just to show that type-checkers will understand it correctly.

It's possible to define MISSING, as well as to simplify the notation Union[..., betterproto.MISSING] like so:

# Sentinels are a bit of a mess
# https://www.python.org/dev/peps/pep-0484/#support-for-singleton-types-in-unions
# https://www.python.org/dev/peps/pep-0661

from enum import Enum
from typing import Final, TypeVar, Union


class Missing(Enum):
    # https://www.python.org/dev/peps/pep-0484/#support-for-singleton-types-in-unions
    MISSING = None

    def __bool__(self) -> bool:
        # https://docs.python.org/3/library/enum.html#boolean-value-of-enum-classes-and-members
        return bool(self.value)

    def __repr__(self) -> str:
        # https://docs.python.org/3/library/enum.html#omitting-values
        return f"{self.__class__.__name__}.{self.name}"


MISSING: Final = Missing.MISSING
T = TypeVar("T")
Maybe = Union[T, Missing]

Maybe[foo] can then be used instead of Union[foo, MISSING], like so:

# Generated by the protocol buffer compiler.  DO NOT EDIT!
# sources: mymessage.proto
# plugin: python-betterproto
from dataclasses import dataclass

import betterproto


@dataclass(eq=False, repr=False)
class MyMessage(betterproto.Message):
    first: betterproto.Maybe[bool] = betterproto.bool_field(1, group="foo")
    second: betterproto.Maybe[int] = betterproto.int32_field(2, group="foo")
    third: betterproto.Maybe[str] = betterproto.string_field(3, group="foo")

@a-khabarov
Copy link
Contributor

I think I found a very simple way to implement this that also provides some additional benefits and does not rely on sentinels.
Here is a PR: #510

MicaelJarniac added a commit to MicaelJarniac/python-betterproto that referenced this issue Feb 22, 2024
Removed the parts of the example that showed accessing an unset value, as it now raises an `AttributeError`, and added an example of the `match` way of accessing the attributes.

Related to danielgtaylor#510 and danielgtaylor#358.
Gobot1234 pushed a commit that referenced this issue Mar 19, 2024
Removed the parts of the example that showed accessing an unset value, as it now raises an `AttributeError`, and added an example of the `match` way of accessing the attributes.

Related to #510 and #358.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request large Large effort issue, multiple PRs, needs to be split up on hold
Projects
None yet
Development

No branches or pull requests

5 participants