REF: Refactor plugin.py to use modular dataclasses in tree-like structure to represent parsed data #121

adriangb · 2020-07-15T15:35:26Z

This is an attempt to modularize plugin.py so that it can easily be extended to make new plugins. For reference, discussion in #119.

boukeversteegh · 2020-07-15T16:22:32Z

just a quick reply while i'm packing for a holiday.. please have a look also at #49 and my comment there.
i think it may be helpful if we first split up plugin.py into separate files, and merge that, before we end up with even more objects/variables in plugin.py.
on another point, small steps may be wise also, because if the refactor takes a long time, it will start to block new incoming work

adriangb · 2020-07-15T17:02:39Z

Yeah splitting things up I think would be great.

Perhaps we can do this in the following steps:

Define the data structures and constants in a separate file (plugin_dataclasses.py?)
Insert the data structures into the current generate_code method (i.e. still a single monolithic function, but replace the nested dicts with these new classes).
Refactor generate_code into a modular class.

Regarding these dataclasses specifically: how do you feel about their structure, and about moving the logic for each protobuf object into its corresponding data class? Generating the response would then mainly consist of instantiating these data classes, passing them (1) their parent, (2) their path and (3) their corresponding protobuf object.

boukeversteegh · 2020-07-15T17:36:01Z

I like the idea of gradually moving towards OOP. It still feels somewhat more risky coming up with a top-down design and fitting the code into it, as opposed to isolating a responsibility from the code, and moving it into a (small) class, and see immediately that it made the code more simple.

I see a number of responsibilities that might get mixed up (they are already mixed up in the current codebase):

analysis (reading all the source files to gather the information that we need to start mapping)
mapping (deciding on the proper output structure, based on input structure, for example the files, packages and classes)
rendering (transforming the output structure into actual python code)

It's not clear if we can split up the code immediately into clean components that separate these concerns, but even if we don't, we should know to some level which responsibilities each class takes.
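To make the three responsibilities concrete, here is a minimal sketch of how they could be kept apart. Everything in it is hypothetical (`ParsedMessage`, `analyze`, `map_to_outputs`, `render`, and the dict-based input are illustrative names, not betterproto code); it only shows where each concern would live.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParsedMessage:
    """Hypothetical analysis result for a single message."""
    package: str
    name: str
    field_names: List[str] = field(default_factory=list)


def analyze(raw_files: Dict[str, List[dict]]) -> List[ParsedMessage]:
    """Analysis: read every input and collect the facts needed later."""
    parsed = []
    for package, messages in raw_files.items():
        for msg in messages:
            parsed.append(ParsedMessage(package, msg["name"], list(msg["fields"])))
    return parsed


def map_to_outputs(messages: List[ParsedMessage]) -> Dict[str, List[ParsedMessage]]:
    """Mapping: decide which output file each message ends up in."""
    outputs: Dict[str, List[ParsedMessage]] = {}
    for msg in messages:
        path = msg.package.replace(".", "/") + "/__init__.py"
        outputs.setdefault(path, []).append(msg)
    return outputs


def render(messages: List[ParsedMessage]) -> str:
    """Rendering: turn the mapped structure into Python source text."""
    lines: List[str] = []
    for msg in messages:
        lines.append(f"class {msg.name}:")
        if msg.field_names:
            lines.extend(f"    {name}: object" for name in msg.field_names)
        else:
            lines.append("    pass")
        lines.append("")
    return "\n".join(lines)


raw = {"foo.bar": [{"name": "Greeting", "fields": ["text"]}]}
outputs = map_to_outputs(analyze(raw))
source = render(outputs["foo/bar/__init__.py"])
```

Even if the real classes end up mixing these steps, naming them this way makes it easier to say which step a given method belongs to.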

But even if we mix it up, it would still be a step forward, and I support your proposal to have each protobuf type handled by separate components.

I haven't spent time to think about the paths/parents you mentioned, but i suppose it can be handy.

In any case, I would stick to always working code as much as possible, instead of a big bang change, so that we will be able to keep releasing. It could be a good start to just extract a single object like Service, Message, Enum, or even ServiceMethod. Whichever is easiest. It will already be challenging.

Gobot1234 · 2020-07-15T17:52:47Z

betterproto/plugin.py

+            if field_val is PLACEHOLDER:
+                raise ValueError(f"`{field_name}` is a required field with no default value.")
+
+    @property   # TODO: replace with functools.cached_property


This would bump the required version to generate protobufs, might be better to just copy the source for it somewhere

Yeah. I see three alternatives:

Copy it from source (as you suggested).

Leave it as @property which means it will be recalculated every time.

Use @functools.lru_cache(maxsize=1) which would achieve the same thing, albeit with overkill complexity in the backend.

I tested pulling the source, it's pretty straightforward. Only question is if it's worth the trouble or if the performance is okay as is.
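For reference, a simplified stand-in for `functools.cached_property` could look roughly like the sketch below. This is not the stdlib source: it has no locking and no `__set_name__` handling, and the `Demo` class is purely illustrative.

```python
class cached_property:
    """Simplified stand-in for functools.cached_property (added in Python 3.8).

    Assumption: instances have a writable __dict__ (no __slots__).
    """

    def __init__(self, func):
        self.func = func
        self.__doc__ = func.__doc__

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor.
            return self
        value = self.func(instance)
        # Store the result under the same name. Because this class defines
        # no __set__, the instance attribute shadows the descriptor on
        # every later lookup, so func runs only once per instance.
        instance.__dict__[self.func.__name__] = value
        return value


class Demo:
    def __init__(self):
        self.calls = 0

    @cached_property
    def value(self):
        self.calls += 1
        return 42


demo = Demo()
first, second = demo.value, demo.value
```

Compared with stacking `@property` on `@functools.lru_cache(maxsize=1)`, this keeps the cache on the instance itself, so it does not require hashable instances and does not evict when a second instance is used.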

adriangb · 2020-07-15T21:47:14Z

I pushed a version that has a separate plugin_dataclasses.py and is built on the latest version in master.

The way it is currently structured, there is no way to go one dataclass at a time, since they all rely on the parent and child tree-like architecture.

adriangb · 2020-07-16T17:30:31Z

Makefile

+test:               ## - Run tests, ingoring collection errors (ex from missing imports)
+	poetry run pytest --cov betterproto --continue-on-collection-errors


This is obviously not needed for this PR, but I found it very useful, will remove before final review.

adriangb · 2020-07-16T17:31:44Z

Moving from draft -> real PR.

Still have some failing tests, but most are passing.

adriangb · 2020-07-17T01:41:28Z

src/betterproto/plugin_dataclasses.py

+class Message(ProtoContentBase):
+    """Representation of a protobuf message.
+    """
+    parent: Union[Message, OutputTemplate] = PLACEHOLDER


One of the main features of this refactor is the tree structure.

adriangb · 2020-07-17T14:07:17Z

Tests are working on 3.8 and 3.7. 3.6 is broken because I was using PEP 563. It's easy enough to just use string literal type hints, I'll change that when needed.
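As a rough illustration of the string-literal approach, the sketch below quotes the forward reference instead of relying on `from __future__ import annotations`. The two classes are cut-down stand-ins that only borrow the names from the diff, not the real definitions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class OutputTemplate:
    name: str


@dataclass
class Message:
    name: str
    # "Message" is quoted because the class does not exist yet while its
    # own body is being evaluated; OutputTemplate is already defined above.
    parent: Union["Message", OutputTemplate]


template = OutputTemplate("example_pb")
outer = Message("Outer", parent=template)
inner = Message("Inner", parent=outer)
```

The quoted form is evaluated lazily by type checkers, so it works without PEP 563.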

nat-n

Thanks for your effort on this.

Splitting up the plugin module is overdue. But since you've started I think it would be nicer to organise things a bit more explicitly with a sub-package.

consider:

betterproto/
    plugin/
        __init__.py
        __main__.py  # this is executed as $ python -m betterproto.plugin
        models.py

This structure would then be easier to extend to split things out further.

adriangb · 2020-07-17T19:21:29Z

@nat-n please take a look at the structure I just created and let me know if that falls within the framework you were thinking of.

adriangb · 2020-07-17T19:27:15Z

src/betterproto/plugin/plugin.bat

+@SET plugin_dir=%~dp0
+@python -m %plugin_dir% %*


Not sure how this will pan out, not a .bat person...

I will test this out when I have time in the next couple of days 👍

Is this used anymore?

adriangb · 2020-07-17T19:27:47Z

src/betterproto/plugin.bat

-@SET plugin_dir=%~dp0
-@python %plugin_dir%/plugin.py %*


moved to src/betterproto/plugin/plugin.bat

adriangb · 2020-07-17T19:28:25Z

src/betterproto/plugin.py

@@ -1,480 +0,0 @@
-#!/usr/bin/env python


split up into src/betterproto/plugin/__init__.py and src/betterproto/plugin/parser.py

nat-n

I still need to find time to review this properly, but it looks better already.

nat-n · 2020-07-17T20:56:50Z

src/betterproto/plugin/__init__.py

+        fh.write(request.SerializeToString())
+
+
+if __name__ == "__main__":


Having this here too doesn't hurt (except for breaking the relative import in this module), but it would be good to also have a __main__.py like:

from . import main
main()

see: https://docs.python.org/3/using/cmdline.html#cmdoption-m
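To show what `-m` does with a package, the sketch below builds a throwaway package in a temporary directory and runs it with `runpy.run_module`, which is the machinery behind `python -m`. The package name `demo_plugin` and its contents are made up for the demonstration.

```python
import os
import runpy
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = os.path.join(tmp, "demo_plugin")  # hypothetical package name
    os.makedirs(pkg_dir)

    # __init__.py exposes main(), like the plugin package would.
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
        fh.write("CALLS = []\n\n\ndef main():\n    CALLS.append('main ran')\n")

    # __main__.py is what gets executed for `python -m demo_plugin`.
    with open(os.path.join(pkg_dir, "__main__.py"), "w") as fh:
        fh.write("from . import main\n\nmain()\n")

    sys.path.insert(0, tmp)
    try:
        # Equivalent to running `python -m demo_plugin` in this process.
        runpy.run_module("demo_plugin", run_name="__main__")
        calls = list(sys.modules["demo_plugin"].CALLS)
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_plugin", None)
```

Because `__main__.py` is run as part of the package, the relative import works there, which is not the case for a module executed directly as a script.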

nat-n · 2020-07-17T20:58:23Z

src/betterproto/plugin/__init__.py

+
+
+def main():
+


shouldn't be an empty line before the docstring

boukeversteegh

Thank you @adriangb for taking up this big task!

Since I'm not able to fully review now, I would just like to share a few high level questions/remarks.

I think you've described the structure of protobuf files correctly in the docblock, and it should be useful to represent them as a nested structure so as to allow navigating to definitions of field types, and so on (and for example knowing to which file they will be written).

Within the concept of a tree, I would personally go with more specialized concepts, rather than a generic term, i.e. messages have a package, and a field has a "containing message" or just message. That way we stick close to the real domain terms, and do not introduce new concepts or abstractions that are not part of Protobuf, and thus lower the barrier to understanding the library.

Other questions:

Is building this tree part of this PR? What is the general approach to generating the tree?

Will we be able to take advantage of the classes, before having the tree building logic complete?

Even with a lot of changes already moved out of scope (and it's great you're keeping an eye on scope, and keeping diffs manageable), I'm still concerned about the scope being too big.

Could you list briefly which changes you consider explicit part of this PR, and which are out of scope?

Sorry if any of my questions were already answered or should have been clear from the code, I'm on mobile ☺️🏕️

adriangb · 2020-07-18T19:40:55Z

Happy to change any wording in the docs or names of variables! I was trying to use general data structure terms so that someone unfamiliar with protobuf might be able to understand the structure faster.

The tree is built using the same logic that was used to build the nested dicts. The main difference is that when you "add" a field to a message, the field object registers the message as its parent, and adds a reference to itself into the message's `fields` list.

Unfortunately, I do not see a way of implementing these classes one at a time, so I had to actually fully implement them into the parser (by parser I mean the `generate_code` method). The good news is that _most_ of the logic in these dataclasses was just copied over from `plugin.py` and the logic in `generate_code` remains largely unchanged.

The specific goals of this PR would be:

- Create object oriented representations of protobuf objects with references to their parents/children.
- Refactor `generate_code` to use these classes.

I wasn't planning on restructuring the package, but I took that on based on @nat-n's suggestion. I agree with them that it makes things nicer and cleaner, but it does increase the diff complexity considerably.
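The parent/child registration described here can be sketched in a few lines. This is a simplified illustration, not the code from the PR: `MessageNode` and `FieldNode` are hypothetical names, and the `path` values only imitate the `path + [2, index]` pattern visible in the diff.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MessageNode:
    name: str
    fields: List["FieldNode"] = field(default_factory=list)


@dataclass
class FieldNode:
    name: str
    # Excluded from repr/eq so the parent <-> child cycle is never followed.
    parent: MessageNode = field(repr=False, compare=False)
    path: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Constructing a field is enough to attach it to its message.
        self.parent.fields.append(self)


message = MessageNode("Greeting")
base_path = [4, 0]
for index, field_name in enumerate(["text", "language"]):
    FieldNode(field_name, parent=message, path=base_path + [2, index])
```

This is why the calls in the parser can discard the return value: instantiating the child is what links it into the tree.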


boukeversteegh

Looking good so far! Still need to look at the actual objects. Coming later

boukeversteegh · 2020-07-19T07:47:39Z

src/betterproto/plugin/parser.py

+        message_data = Message(parent=output_package, proto_obj=item, path=path)
+        for index, field in enumerate(item.field):
+            if is_map(field, item):
+                MapField(parent=message_data, proto_obj=field, path=path + [2, index])


This looks so much cleaner now! Nice!

Because protobuf itself also has class names for describing protobuf objects, I suggest we suffix all our classes representing protobuf objects with Definition.

It will still be somewhat confusing, as protobuf uses Descriptor, which is very similar, and newcomers won't easily be able to distinguish the input types from the intermediate types.

Some terms come to mind:

Parsed...Definition (ParsedFieldDefinition)

Python... (PythonField) — not my favorite as the terms represent more the protobuf objects, than the python objects

...Compiler (MessageCompiler, FieldCompiler) — kinda close to what it is, perhaps?

...Mapper/Transformer

I personally like Compiler

Yeah either compiler or Parsed..Definition sound good to me.

With "Compiler" it would be:

PluginRequestCompiler

MessageCompiler

FieldCompiler

MapEntryCompiler

OneOfFieldCompiler

EnumDefCompiler

ServiceCompiler

ServiceMethodCompiler

Right?

With "Compiler" it would be:

RequestCompiler

MessageCompiler

FieldCompiler

MapEntryCompiler

OneOfFieldCompiler

EnumDefCompiler

ServiceCompiler

ServiceMethodCompiler

Right?

Yes 👍

Although I haven't looked at protobuf descriptions to see if the classnames match, but if it doesn't, it would be an easy fix. So this looks fine to me.

EnumDefCompiler could just be EnumCompiler, I guess?

The reason I suggested EnumDefCompiler instead of EnumCompiler is that I was originally confused: I thought the enum objects protoc was supplying were enum fields or something like that, instead of the actual enum definitions. But maybe that was just me not knowing enough about protobuf structures... Again, I really am open to anything you or others decide on 😃

Ah right, I understand. I may have had the same confusion, until I realized enums are also independent objects.

What further confuses things is that Enum values are also distinct from the Enum definition itself, although I think the Enum values end up being just fields on the Enum, but not sure (?).

How about EnumTypeCompiler?

Yep "Enum Fields" are just Fields of the type if that makes sense. They're treated just like any other field. So the only time you encounter anything "Enum" related is the enum definitions. Why not EnumDefinitionCompiler? It's a bit long but I think it's most explicit. But otherwise EnumTypeCompiler sounds fine as well.

boukeversteegh · 2020-07-19T08:02:20Z

src/betterproto/plugin/parser.py

+            if is_map(field, item):
+                MapField(parent=message_data, proto_obj=field, path=path + [2, index])
+            elif is_oneof(field):
+                OneOfField(parent=message_data, proto_obj=field, path=path + [2, index])


Does this represent a field in a group, or the group itself?

If it's the group, we could call it Group, or OneOfGroup, if that is in line with the descriptor name passed by the plugin.

This would be a field that is part of a OneOf group. I don't think the oneof group itself is represented as an entity by protoc.

I see, thanks for the reply. In the current library implementation, oneof fields and normal fields are declared in pretty much the same way, except that a oneof field has a group attribute set.

It seems a simple way to model the domain from the outside, as it avoids a nested structure (as does yours) but also another concept altogether.

Would you say keeping this approach in the OOP version could work, or did you have a specific motivation to split them into two classes?

My reasoning for still splitting up into two classes is:

Be able to use an isinstance check instead of adding an is_oneof attribute or something like that.

Don't pollute the already complex Field namespace with a method that isn't relevant to a field unless it's a OneOf
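A small sketch of that trade-off, using hypothetical cut-down classes rather than the ones in the PR: the subclass carries the oneof-specific data, and callers branch with `isinstance` instead of checking a flag on every field.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str


@dataclass
class OneOfField(Field):
    # Only oneof members carry a group; plain fields never see this attribute.
    group: str


def describe(f: Field) -> str:
    """Branch on the type instead of on an is_oneof flag."""
    if isinstance(f, OneOfField):
        return f"{f.name} (oneof group: {f.group})"
    return f.name


plain = describe(Field("id"))
grouped = describe(OneOfField("email", "contact"))
```

The single-class alternative would put an optional `group` on `Field` itself, which is flatter but makes every consumer aware of the oneof concept.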

boukeversteegh · 2020-07-19T16:13:57Z

src/betterproto/plugin/models.py

+        return current.package_proto_obj
+
+    @property
+    def request(self) -> "Request":


Is this a convenience method or do we need it for compilation?
If possible, I'd prefer to avoid too much treewalking unless really needed, because it makes it harder to think about the code, especially when the tree walking is very abstract.

If we really need the plug-in request object from any object defined in it, we could also just pass it as a constructor arg, and keep things flat. Or would that make things more difficult?

At the moment, it's only being used by ServiceMethod.py_input_message to get all of the messages. I guess it could be a constructor arg, that might simplify things, or make them more complicated, not sure, I'd have to implement it.
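The two options being weighed could look roughly like this. Both classes are hypothetical simplifications: `TreeNode` finds the request by walking up its parents, while `FlatNode` simply receives it as a constructor argument.

```python
class Request:
    """Stand-in for the root object that holds everything from the plugin request."""

    def __init__(self):
        self.messages = []


class TreeNode:
    """Option 1: find the request by walking up the parent chain."""

    def __init__(self, parent):
        self.parent = parent

    @property
    def request(self):
        current = self
        while not isinstance(current, Request):
            current = current.parent
        return current


class FlatNode:
    """Option 2: pass the request in explicitly and keep the lookup flat."""

    def __init__(self, parent, request):
        self.parent = parent
        self.request = request


root = Request()
package = TreeNode(root)
message = TreeNode(package)
method = TreeNode(message)

flat_method = FlatNode(parent=message, request=root)
```

Option 1 keeps constructors short but hides a loop behind an attribute access; option 2 is more explicit at the cost of threading the request through every constructor call.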

adriangb · 2020-07-19T18:29:47Z

@nat-n I think the structure is now how you indicated and all of the entry points should work
@boukeversteegh I renamed the classes as discussed

boukeversteegh

@adriangb Congratulations on this amazing job! You've single-handedly turned our ticker tape code into a nicely organized structure. I still recognize most of the pieces, but now each of them is in its proper place, and it all fits together beautifully. I'm very happy with the result, and I expect it will make contributing to this library much more pleasant, easy and much less daunting.

I see you've put a lot of effort in this, as is visible from the well readable and logically structured code, and you took great care to not overdo it. That is important and much appreciated.

Any suggested changes that came to my mind are insignificant compared to the big jump forward that this PR represents, so rather than to keep polishing it and blocking its merge, I suggest that we merge this as soon as there are no showstoppers.

Any fine-tuning (which for my part is limited to clarifying some variable names) can easily be done in follow-up PRs.

Your work will be one of the major milestones in this project, so I thank you on behalf of all contributors and users of betterproto who will find it in a much better shape.

Muchas gracias! 🙏🎉

adriangb · 2020-07-20T14:22:20Z

I think you are giving me too much credit, I could not have made this PR without the work you and others have done before me!

I made the minor edits from your comments.

Since these changes are all internal, I think it's okay if we have to make changes again later, so I too am on board for merging what we've got so far and going from there.

abn

All for this change. Sorry I am a bit late to the party, but wanted to add a few comments regarding structure and change sequencing. Not show stoppers, but nice to haves.

abn · 2020-07-23T13:05:24Z

src/betterproto/templates/template.py.j2

 # plugin: python-betterproto
 from dataclasses import dataclass
 {% if description.datetime_imports %}
-from datetime import {% for i in description.datetime_imports %}{{ i }}{% if not loop.last %}, {% endif %}{% endfor %}
+from datetime import {% for i in description.datetime_imports|sort %}{{ i }}{% if not loop.last %}, {% endif %}{% endfor %}


Would it make sense to implement this using isort similar to the way we use black today? Obviously as part of another changeset and not within here.

That definitely makes sense for a future PR. Previously sort was being used somewhere else, I just moved it.
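The effect of the `|sort` filter can be shown in plain Python. This sketch assumes `datetime_imports` is an unordered collection such as a set (an assumption, the thread does not say), in which case sorting is what makes the generated import line stable between runs.

```python
# Assumption: the names are collected in a set, so iteration order is arbitrary.
datetime_imports = {"timedelta", "datetime"}

# Sorting before joining gives the same import line on every run.
import_line = "from datetime import " + ", ".join(sorted(datetime_imports))
```

Note that Jinja's `sort` filter is case-insensitive by default while `sorted()` is not; for these all-lowercase names the result is the same.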

abn · 2020-07-23T13:05:28Z

src/betterproto/templates/template.py.j2

@@ -1,13 +1,13 @@
 # Generated by the protocol buffer compiler.  DO NOT EDIT!
-# sources: {{ ', '.join(description.files) }}
+# sources: {{ ', '.join(description.input_filenames) }}


Just for sanity's sake, it would be great to see the name changes etc. (and the template changes as a whole) split out into another PR, so that in case of any downstream issues we can revert easily. Just food for thought.

I kind of agree. There are two main reasons why names were changed:
(1) When using the old names in the new structure really didn't make sense (ex: properties being a container for fields).
(2) When I just renamed things as I was developing for the sake of keeping my sanity and understanding everything.

I'd say the former makes sense for this PR but the latter (along with other re-naming for clarity's sake) would have made sense for another PR. The issue is that I don't remember which is which at this point 😣

abn · 2020-07-23T13:13:04Z

src/betterproto/plugin/__init__.py

@@ -0,0 +1 @@
+from .main import main


Suggested change

-from .main import main
+from betterproto.plugin.main import main

Or better yet, drop this in favour of consolidating main.py into __main__.py and update pyproject.toml to use betterproto.plugin.__main__:main.

boukeversteegh · 2020-07-23T15:39:51Z

It may not be needed anymore, as within the poetry shell we can call the plugin with `--betterproto_out` or something, but i haven't checked the code if we actually do that


boukeversteegh · 2020-07-25T17:42:17Z

As proposed earlier, I will merge this PR since there don't seem to be showstoppers in it.
Any small improvements / nice-to-haves are welcome as new PRs.

Thank you!

…ture to represent parsed data (danielgtaylor#121) Refactor plugin to parse input into data-class based hierarchical structure

adriangb mentioned this pull request Jul 15, 2020

Add support for custom options #119

Open

adriangb force-pushed the modular-plugin branch from 4e1f31f to ae6c1e3 July 15, 2020 16:26

Gobot1234 reviewed Jul 15, 2020

rebase onto latest master

6e2d8a7

adriangb force-pushed the modular-plugin branch from ae6c1e3 to 6e2d8a7 July 15, 2020 21:45

adriangb added 5 commits July 16, 2020 10:41

fix bugs

e60791b

add import

46c1035

fix map type detection

faea404

formatting fixes

092076b

nested msg fix

e58bbe1

adriangb commented Jul 16, 2020

adriangb marked this pull request as ready for review July 16, 2020 17:32

adriangb added 6 commits July 16, 2020 12:47

blacken

d51a5b1

mutable defaults fix

cefd73f

more mutable default arg fixes

f2b81bd

fixes

a77e462

routing fix

514548d

fix all tests

e970c5a

adriangb changed the title ~~Modularize plugin.py~~ REF: Refactor plugin.py to use modular dataclasses in tree-like structure to represent parsed data Jul 17, 2020

adriangb commented Jul 17, 2020

adriangb added 2 commits July 16, 2020 21:20

black

01345c5

revert changes to Makefile

8858a34

python3.6 support

6747a23

nat-n reviewed Jul 17, 2020

docs

f9dd7eb

file cleanup

62eea35

adriangb commented Jul 17, 2020

nat-n reviewed Jul 17, 2020

structure changes

ecd4b5a

boukeversteegh reviewed Jul 18, 2020

boukeversteegh reviewed Jul 19, 2020

restructure, rename

0fdb181

blacken

8a99cbd

boukeversteegh previously approved these changes Jul 20, 2020

fix comment

66610b0

adriangb dismissed boukeversteegh’s stale review via 66610b0 July 20, 2020 14:12

clarify variable names

6f52343

adriangb requested a review from boukeversteegh July 20, 2020 20:14

boukeversteegh approved these changes Jul 21, 2020

adriangb requested a review from nat-n July 21, 2020 19:39

abn reviewed Jul 23, 2020

boukeversteegh merged commit b5dcac1 into danielgtaylor:master Jul 25, 2020

Gobot1234 added a commit to Gobot1234/python-betterproto that referenced this pull request Aug 6, 2020

Revert "REF: Refactor plugin.py to use modular dataclasses in tree-li…

c2c946d

…ke structure to represent parsed data (danielgtaylor#121)" This reverts commit b5dcac1

abn mentioned this pull request Nov 24, 2020

Release v2.0.0b2 #175

Merged
