-
Notifications
You must be signed in to change notification settings - Fork 229
REF: Refactor plugin.py to use modular dataclasses in tree-like structure to represent parsed data #121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF: Refactor plugin.py to use modular dataclasses in tree-like structure to represent parsed data #121
Conversation
just a quick reply while i'm packing for a holiday.. please have a look also at #49 and my comment there. |
Yeah splitting things up I think would be great. Perhaps we can do this in the following steps:
Regarding these dataclasses specifically: how do you feel about their structure, and about moving the logic for each protobuf object into it's corresponding data class? Thus generating the response mainly consists of instantiating these days classes passing them (1) their parent, (2) their path and (3) their corresponding protobuf object. |
I like the idea of gradually moving towards OOP. It still feels somewhat more risky coming up with a top-down design and fitting the code into it, as opposed to isolating a responsibility from the code, and moving it into a (small) class, and see immediately that it made the code more simple. I see a number of responsibilities that might get mixed up (they are already mixed up in the current codebase):
It's not clear if we can split up the code immediately into clean components that separate these concerns, but even if we don't, we should know to some level which responsibilities each class takes. But even if we mix it up, it would still be a step forward, and I support your proposal to have each protobuf type handled by separate components. I haven't spent time to think about the paths/parents you mentioned, but i suppose it can be handy. In any case, I would stick to always working code as much as possible, instead of a big bang change, so that we will be able to keep releasing. It could be a good start to just extract a single object like Service, Message, Enum, or even ServiceMethod. Whichever is easiest. It will already be challenging. |
betterproto/plugin.py
Outdated
if field_val is PLACEHOLDER: | ||
raise ValueError(f"`{field_name}` is a required field with no default value.") | ||
|
||
@property # TODO: replace with functools.cached_property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would bump the required version to generate protobufs, might be better to just copy the source for it somewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I see three alternatives:
- Copy it from source (as you suggested).
- Leave it as
@property
which means it will be recalculated every time. - Use
@functools.lru_cache(maxsize=1)
which would achieve the same thing, albeit with overkill complexity in the backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested pulling the source, it's pretty straightforward. Only question is if it's worth the trouble or if the performance is okay as is.
I pushed a version that has a separate The way it is currently structured, there is no way to go one dataclass at a time, since they all rely on the |
Makefile
Outdated
test: ## - Run tests, ingoring collection errors (ex from missing imports) | ||
poetry run pytest --cov betterproto --continue-on-collection-errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is obviously not needed for this PR, but I found it very useful, will remove before final review.
Moving from draft -> real PR. Still have some failing tests, but most are passing. |
class Message(ProtoContentBase): | ||
"""Representation of a protobuf message. | ||
""" | ||
parent: Union[Message, OutputTemplate] = PLACEHOLDER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the main features of this refactor is the tree structure.
Tests are working on 3.8 and 3.7. 3.6 is broken because I was using PEP 563. It's easy enough to just use string literal type hints, I'll change that when needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your effort on this.
Splitting up the plugin module is overdue. But since you've started I think it would be nicer to organise things a bit more explicitly with a sub-package.
consider:
betterproto/
plugin/
__init__.py
__main__.py # this is executed as $ python -m betterproto.plugin
models.py
This structure would then be easier to extend to split things out further.
@nat-n please take a look at the structure I just created and let me know if that falls within the framework you were thinking of. |
@SET plugin_dir=%~dp0 | ||
@python -m %plugin_dir% %* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how this will pan out, not a .bat person...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will test this out when I have time in the next couple of days 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used anymore?
@SET plugin_dir=%~dp0 | ||
@python %plugin_dir%/plugin.py %* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to src/betterproto/plugin/plugin.bat
@@ -1,480 +0,0 @@ | |||
#!/usr/bin/env python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split up into src/betterproto/plugin/__init__.py
and src/betterproto/plugin/parser.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to find time to review this properly, but it looks better already.
src/betterproto/plugin/__init__.py
Outdated
fh.write(request.SerializeToString()) | ||
|
||
|
||
if __name__ == "__main__": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having this here too doesn't hurt (except for breaking the relative import in this module), but it would be good to also have a __main__.py
like:
from . import main
main()
see: https://docs.python.org/3/using/cmdline.html#cmdoption-m
src/betterproto/plugin/__init__.py
Outdated
|
||
|
||
def main(): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't be an empty line before the docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @adriangb for taking up this big task!
Since I'm not able to fully review now, I would just like to share a few high level questions/remarks.
I think you've described the structure of protobuf files correctly in the docblock, and it should be a useful to represent them as a nested structure as to allow navigating to definitions of field types, and so on (and for example knowing to which file they will be written).
Within the concept of a tree, I would go personally go with more specialized concepts, rather than a generic term, I.e. messages have a package, and a field have a "containing message" or just message. That way we stick close to the real domain terms, and do not introduce new concepts or abstractions that are not part of Protobuf, and thus lower the barrier of understanding the library.
Other questions:
Is building this tree part of this PR? What is the general approach to generating the tree?
Will we be able to take advantage of the classes, before having the tree building logic complete?
Even with a lot of changes already moved out of scope (and it's great you're keeping an eye on scope, and keeping diffs manageable), I'm still concerned about the scope being too big.
Could you list briefly which changes you consider explicit part of this PR, and which are out of scope?
Sorry if any of my questions were already answered or should have been clear from the code, I'm on mobile
Happy to change any wording in the docs or name of variables! I was trying
to use general data structure terms so that someone unfamiliar with
protobuf might be able to understand the structure faster.
The tree is built using the same logic that used to build the nested dicts.
The main difference is that when you "add" a field to a message, the field
object registers the message as it's parent, and adds a reference to itself
into the message's `fields` list.
Unfortunately, I do not see a way of implementing these classes one at a
time, so I had to actually fully implement them into the parser (by parser
I mean the `generate_code` method). The good news is that _most_ of the
logic in these dataclasses was just copied over from `plugin.py` and the
logic in `generate_code` remains largely unchanged.
The specific goals of this PR would be:
- Create object oriented representations of protobuf object with references
to their parents/children.
- Refactor `generate_code` to use these classes.
I wasn't planning on restructuring the package, but I took that on based on
@nat-n 's suggestion. I agree with them that it makes things nicer and
cleaner, but it does increase the diff complexity considerably.
…On Sat, Jul 18, 2020, 12:46 PM Bouke Versteegh ***@***.***> wrote:
***@***.**** commented on this pull request.
Thank you @adriangb <https://github.com/adriangb> for taking up this big
task!
Since I'm not able to fully review now, I would just like to share a few
high level questions/remarks.
I think you've described the structure of protobuf files correctly in the
docblock, and it should be a useful to represent them as a nested structure
as to allow navigating to definitions of field types, and so on (and for
example knowing to which file they will be written).
Within the concept of a tree, I would go personally go with more
specialized concepts, rather than a generic term, I.e. messages have a
package, and a field have a "containing message" or just message. That way
we stick close to the real domain terms, and do not introduce new concepts
or abstractions that are not part of Protobuf, and thus lower the barrier
of understanding the library.
Other questions:
Is building this tree part of this PR? What is the general approach to
generating the tree?
Will we be able to take advantage of the classes, before having the tree
building logic complete?
Even with a lot of changes already moved out of scope (and it's great
you're keeping an eye on scope, and keeping diffs manageable), I'm still
concerned about the scope being too big.
Could you list briefly which changes you consider explicit part of this
PR, and which are out of scope?
Sorry if any of my questions were already answered or should have been
clear from the code, I'm on mobile
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good so far! Still need to look at the actual objects. Coming later
src/betterproto/plugin/parser.py
Outdated
message_data = Message(parent=output_package, proto_obj=item, path=path) | ||
for index, field in enumerate(item.field): | ||
if is_map(field, item): | ||
MapField(parent=message_data, proto_obj=field, path=path + [2, index]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks so much cleaner now! Nice!
Because protobuf itself also has class names for describing protobuf objects , I suggest we suffix all our classes representing protobuf objects with Definition.
It will still be somewhat confusing, as protobuf uses Description, which is very similar, and newcomers won't easily be able to distinguish the input types from the intermediate types.
Some terms come to mind:
- Parsed...Definition (ParsedFieldDefinition)
- Python... (PythonField) — not my favorite as the terms represent more the protobuf objects, than the python objects
- ...Compiler (MessageCompiler, FieldCompiler) — kinda close to what it is, perhaps?
- ...Mapper/Transformer
I personally like Compiler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah either compiler or Parsed..Definition sound good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With "Compiler" it would be:
- PluginRequestCompiler
- MessageCompiler
- FieldCompiler
- MapEntryCompiler
- OneOfFieldCompiler
- EnumDefCompiler
- ServiceCompiler
- ServiceMethodCompiler
Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With "Compiler" it would be:
- RequestCompiler
- MessageCompiler
- FieldCompiler
- MapEntryCompiler
- OneOfFieldCompiler
- EnumDefCompiler
- ServiceCompiler
- ServiceMethodCompiler
Right?
Yes 👍
Although I haven't looked at protobuf descriptions to see if the classnames match, but if it doesn't, it would be an easy fix. So this looks fine to me.
EnumDefCompiler could just be EnumCompiler, I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I suggested EnumDefCompiler instead of EnumCompiler is that I originally was confused because I thought the enum objects protoc was supplying were enum fields or something like that instead of the actual enum defenitions. But maybe that was just me not knowing enough about protobuf structures... Again I really am open to anything you or others decide on 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, I understand. I may have had the same confusion, until I realized enums are also independent objects.
What further confuses things is that Enum values are also distinct from the Enum definition itself, although I think the Enum values end up being just fields on the Enum, but not sure (?).
How about EnumTypeCompiler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep "Enum Fields" are just Fields of the type if that makes sense. They're treated just like any other field. So the only time you encounter anything "Enum" related is the enum definitions. Why not EnumDefinitionCompiler
? It's a bit long but I think it's most explicit. But otherwise EnumTypeCompiler
sounds fine as well.
src/betterproto/plugin/parser.py
Outdated
if is_map(field, item): | ||
MapField(parent=message_data, proto_obj=field, path=path + [2, index]) | ||
elif is_oneof(field): | ||
OneOfField(parent=message_data, proto_obj=field, path=path + [2, index]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this represent a field in a group, or the group itself?
If its the group, we could call it Group, or OneOfGroup. If that is in line with the descriptor name passed by the plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a field that is part of a OneOf group. I don't think the oneof group itself is represented as an entity by protoc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the reply. In the current library implementation, one of fields and normal fields are declared in pretty much the same way, except that one of has a group attribute set.
It seems a simple way to model the domain from the outside, as it avoids a nested structure (as does yours) but also another concept altogether.
Would you say keeping this approach in the OOP version could work, or did you have a specific motivation to split them into two classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning for still splitting up into two classes is:
- Be able to use an
isinstance
check instead of adding anis_oneof
attribute or something like that. - Don't pollute the already complex
Field
namespace with a method that isn't relevant to a field unless it's a OneOf
src/betterproto/plugin/models.py
Outdated
return current.package_proto_obj | ||
|
||
@property | ||
def request(self) -> "Request": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a convenience method or do we need it for compilation?
If possible, I'd prefer to avoid too much treewalking unless really needed, because it makes it harder to think about the code, especially when the tree walking is very abstract.
If we really need the plug-in request object from any object defined in it, we could also just pass it as a constructor arg, and keep things flat. Or would that make things more difficult?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, it's only being used by ServiceMethod.py_input_message
to get all of the messages. I guess it could be a constructor arg, that might simplify things, or make them more complicated, not sure, I'd have to implement it.
@nat-n I think the structure is now how you indicated and all of the entry points should work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adriangb Congratulations on this amazing job! You've single handedly turned our ticker tape code into a nicely organized structure. I still recognize most of the pieces, but now each of them are in their proper place, and it fits together beautifully. I'm very happy with the result, and I expect it will make contributing this library much more pleasant, easy and much less daunting.
I see you've put a lot of effort in this, as is visible from the well readable and logically structured code, and you took great care to not overdo it. That is important and much appreciated.
Any suggested changes that came to my mind are insignificant compared to the big jump forward that this PR represents, so rather than to keep polishing it and blocking its merge, I suggest that we merge this as soon as there are no showstoppers.
Any finetuning (which for my part is limited to clarifying some variable names) can easily be done in follow PRs.
Your work will be one of the major milestones in this project, so I thank you on behalf of all contributors and users of betterproto who will find it in a much better shape.
Muchas gracias! 🙏🎉
I think you are giving me too much credit, I could not have made this PR without the work you and others have done before me! I made the minor edits from your comments. Since these changes are all internal, I think it's okay if we have to make changes again later, so I too am on board for merging what we've got so far and going from there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All for this change. Sorry I am a bit late to the party, but wanted to add a few comments regarding structure and change sequencing. Not show stoppers, but nice to haves.
# plugin: python-betterproto | ||
from dataclasses import dataclass | ||
{% if description.datetime_imports %} | ||
from datetime import {% for i in description.datetime_imports %}{{ i }}{% if not loop.last %}, {% endif %}{% endfor %} | ||
from datetime import {% for i in description.datetime_imports|sort %}{{ i }}{% if not loop.last %}, {% endif %}{% endfor %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to implement this using isort similar to the way we use black today? Obviously as part of another changeset and not within here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That definitely makes sense for a future PR. Previously sort was being used somewhere else, I just moved it.
@@ -1,13 +1,13 @@ | |||
# Generated by the protocol buffer compiler. DO NOT EDIT! | |||
# sources: {{ ', '.join(description.files) }} | |||
# sources: {{ ', '.join(description.input_filenames) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for sanity sake it would be great to see the changes to names etc (and the template as a whole) to be split out into another PR just so that in case of any downstream issues we can revert easily. Just food for thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of agree. There's two main reasons why names were changed:
(1) When using the old names in the new structure really didn't make sense (ex: properties
being a container for fields
).
(2) When I just renamed things as I was developing for the sake of keeping my sanity and understanding everything.
I'd say the former makes sense for this PR but the latter (along with other re-naming for clarity's sake) would have made sense for another PR. The issue is that I don't remember which is which at this point 😣
@@ -0,0 +1 @@ | |||
from .main import main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from .main import main | |
from betterproto.plugin.main import main |
Or better yet, drop this in favour of consolidating main.py
into __main__.py
and update pyproject.toml
to use betterproto.plugin.__main__:main
.
It may not be needed anymore, as within the poetry shell we can call the
plugin with `--betterproto_out` or something, but i haven't checked the
code if we actually do that
…On Thu, Jul 23, 2020, 11:42 Arun Babu Neelicattu ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/betterproto/plugin/plugin.bat
<#121 (comment)>
:
> ***@***.*** plugin_dir=%~dp0
***@***.*** -m %plugin_dir% %*
Is this used anymore?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#121 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAANFJTICYWGTLSBZAZVTJDR5AAY3ANCNFSM4O2U5XKA>
.
|
As proposed earlier, I will merge this PR since there don't seem to be showstoppers in it. Thank you! |
…ke structure to represent parsed data (danielgtaylor#121)" This reverts commit b5dcac1
…ture to represent parsed data (danielgtaylor#121) Refactor plugin to parse input into data-class based hierarchical structure
This is an attempt to modularize plugin.py so that it can easily be extended to make new plugins. For reference, discussion in #119.