Minimal viable metaprogramming #4296

Open
ghost opened this issue Mar 16, 2025 · 26 comments
Labels
feature Proposed language feature that solves one or more problems

Comments

@ghost

ghost commented Mar 16, 2025

TL;DR: treat a macro as a pure function that transforms input text into output text, replacing the original.

I've been trying to identify the minimal set of features that enable code generation in most known use cases.
I'm thinking of the following setup.

  • The macro operates on blocks of code passed to the macro as text (as opposed to an AST, a stream of tokens, a set of mirrors, etc.; just pure text).
  • The macro writes its output to some text buffer via out.write(String text).
  • The generated text replaces the original block of code.
  • Macro calls can be nested.
  • The preprocessor invokes the macros in natural order (that is, inside out), like functions: when we write f(g(x)), we expect g to be called first; the same is true of macros.
  • The preprocessor doesn't try to parse the file for any purposes other than to identify the macro invocations and the blocks to be passed as parameters.
  • All information necessary for the macro execution is contained in the block of text. The macro doesn't need to look into other files or even into the same file. It doesn't know anything beyond the text passed as a parameter.
  • In particular, the macro has no access to type information.
  • The preprocessed version of the file is saved in another (readonly) file (naming conventions - TBD).
  • As a corollary, while making a decision whether to re-process the file, the preprocessor can look only at the timestamps of the source file and generated file (plus the timestamp/version of the macro itself).
  • There's no requirement to write the block passed to the macro in Dart syntax; it can be anything.
  • The macro can parse the text using the tools of its choice.
  • The macro can accept parameters as name/value pairs, e.g. @[some_macro a: 'hello', b: 42]
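As an illustration, the invocation header with its name/value parameters could be recognized by something as small as the sketch below (Python; it assumes the strawman @[name key: value, ...] syntax from this proposal, and naively treats parameter values as raw strings that may not contain ']' or ','):

```python
import re

# Strawman syntax from the proposal: @[macro_name key: value, key: value].
# Naive sketch: values may not contain ']' or ',' (no nested quoting rules).
INVOCATION = re.compile(r"@\[(\w+)([^\]]*)\]")

def parse_invocation(text):
    """Return (macro_name, {param: raw_value}) for the first @[...] found."""
    m = INVOCATION.search(text)
    if m is None:
        return None
    params = {}
    if m.group(2).strip():
        for pair in m.group(2).split(","):
            key, _, value = pair.partition(":")
            params[key.strip()] = value.strip()
    return m.group(1), params
```

For example, parse_invocation("@[some_macro a:'hello', b: 42] class Foo {}") yields ("some_macro", {"a": "'hello'", "b": "42"}); the values stay raw text, for the macro to interpret with tools of its choice.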

The challenging part is how to syntactically identify the boundaries of the block to be passed to the macro.
The conjecture is that 2 rules can cover most of the use cases:

@[my_macro] 
class Foo {
  // code
}
class Bar {
   int f = @[eval] factorial(10);
}

The code gets scanned till the first occurrence of ; or {, whichever comes first.
If it's ;, the block ends at that point.
Otherwise, the code gets scanned till the matching occurrence of }.
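The two boundary rules can be sketched directly (Python; a real implementation would also have to skip string literals and comments, which this sketch ignores):

```python
def extract_block(source, start=0):
    """Return the block following a macro invocation, per the two rules above:
    scan to the first ';' or '{'; on ';' the block ends there, otherwise it
    runs to the matching '}'. Strings and comments are ignored for brevity."""
    i = start
    while i < len(source) and source[i] not in ";{":
        i += 1
    if i == len(source):
        raise ValueError("no block terminator found")
    if source[i] == ";":
        return source[start:i + 1]
    depth = 0
    while i < len(source):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return source[start:i + 1]
        i += 1
    raise ValueError("unbalanced braces")
```

So for the examples above, the @[my_macro] block is the whole class Foo declaration up to its closing brace, while the @[eval] block is just factorial(10); up to the semicolon.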

@ghost ghost added the feature Proposed language feature that solves one or more problems label Mar 16, 2025
@ykmnkmi

ykmnkmi commented Mar 17, 2025

Can I insert HTML with this?

@[component]
Node counter() {
  var count = state<int>(0);

  void handleClick() {
    count.set(count() + 1);
  }

  return <button on:click={handleClick}>
    Clicked {count()} {count() == 1 ? 'time' : 'times'}
  </button>;
}

@ghost
Author

ghost commented Mar 17, 2025

Yes, you can insert any text, you just have to know how to parse it. E.g.

var ChessGame = @[chess title:"Kasparov vs. Deep Blue"] {
  1.Nf3 d5 2.d4 e6 3.g3 c5 4.Bg2 Nc6 5.0-0 Nf6 6.c4 dxc4 7.Ne5 Bd7 8.Na3 cxd4 9.Naxc4 Bc5 10.Qb3 0-0 11.Qxb7 Nxe5 12.Nxe5 Rb8 
  13.Qf3 Bd6 14.Nc6 Bxc6 15.Qxc6 e5 16.Rb1 Rb6 17.Qa4 Qb8 18.Bg5 Be7 19.b4 Bxb4 20.Bxf6 gxf6 21.Qd7 Qc8 22.Qxa7 Rb8 23.Qa4 
  // etc
  66.Qxf5 Qc6 67.Qf8+ Kc7 68.Qe7+ Kc8 69.Bf5+ Kb8 70.Qd8+ Kb7 71.Qd7+ Qxd7 72.Bxd7 Kc7 73.Bb5 1-0 (Resignation)
}

The only requirement is that the block should be syntactically well-formed.
You, of course, need a parser, but for every commonly used notation, the parsers are normally available somewhere.
In your case, I think it would be simpler to parse if you denote it slightly differently:

Node counter() {
  var count = state<int>(0);

  void handleClick() {
    count.set(count() + 1);
  }

  return @[component] <button on:click={handleClick}>
    Clicked {count()} {count() == 1 ? 'time' : 'times'}
  </button>;
}

Now, your macro will have to process pure JSX text.

@lrhn
Member

lrhn commented Mar 17, 2025

Should that be minimal?
I can do less! (Or more, with less. Or something.)

Consider:

  • Some way to recognize metaprogramming annotations in a file. @[...] syntax is fine, requiring nothing other than balanced [/], {/}, /* */ comments, and string quotes. (Anything that can make an @[..] not be an annotation.)
  • When one or more files are recognized to contain metaprogramming annotations, the scanner invokes the "macro" code associated with the keywords, passing the file path(s) and (just to be nice) the positions in the file of the detected annotations.
  • The macro code does whatever it wants to, including editing the file. (I'd probably insert a part directive surrounded by comments saying that it's generated, probably with a macro annotation pointing back to the same macro, so it can remove it again if it's the only annotation left, and then write everything into a part, at least if we have some kind of augmentation.)
  • If a file changed, the scanner will rescan it and check if any annotations for other macros have changed, and if so run those again.
  • Repeat round-robin until all metaprogramming annotations have been passed to a macro without causing any change.

It's a little fidgety since any change in a file containing an annotation can be significant, so if two macros both change the same file, they'll keep getting invoked until they both stop editing.

It modifies the original file directly, and has direct access to the file system. (Probably have to restrict it to the Pub directory somehow). That's obviously dangerous, but also maximally powerful.
I'd want some established "generated code region" comments that tools can use to recognize which code is generated, and maybe even by what. You don't have to use them; you can run a code generator once and have it remove all the annotations and just leave normal code. You can comment out the original code and insert replacements, or you can do piecemeal modification.

@ghost
Author

ghost commented Mar 17, 2025

I'm not sure it's "more with less". In my mental model, no rescan of the entire file is necessary.
Example:

// other code
@[some_macro]
class Foo {
  // somewhere
  @[another_macro] var x = something;
}
// other code

The preprocessor detects the level-0 invocation (@[some_macro]) and pulls the entire block starting with class Foo { and ending with the matching } into memory. This block then becomes the "working area" for level 1. (There's a stack of "working areas", initially containing the entire file at stack level 0.) Within the current working area, the preprocessor identifies the invocation of another_macro. The block @[another_macro] var x = something; is now the "working area" at level 2. When the macro returns its text, the preprocessor comes back to the level-1 working area, makes the substitution, and continues scanning.
(You probably understand what I mean.) Now the question is "from which point will it continue scanning"? Will it re-scan the area from the start? The only reason for this would be to account for the case when the macro inserts invocations of other macros in the returned text. Then it's equivalent to saying: "the returned text gets scanned, too, and this continues until on some iteration there are no more unresolved macro calls." (Some limit has to be set to prevent infinite cycles.) That's fine, of course.
BUT: it doesn't re-scan the entire file on each iteration, there's no need for that. The entire re-scan can happen in the unusual case where you annotate the entire library, like

@[some_macro] {
library foo;
//The entire library gets preprocessed
}

but this scenario is not a typical one.
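The working-area mechanics described above can be sketched roughly as follows (Python; macros are modeled as plain text-to-text functions, the hypothetical @[name] marker plus a simplified block scanner stand in for the real strawman syntax, and the depth cap plays the role of the anti-cycle limit):

```python
import re

MARKER = re.compile(r"@\[(\w+)\]\s*")

def extract_block(source, start):
    """Scan to the first ';' or '{'; a ';' ends the block, a '{' runs to its match."""
    i = start
    while source[i] not in ";{":
        i += 1
    if source[i] == ";":
        return source[start:i + 1]
    depth = 0
    while True:
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return source[start:i + 1]
        i += 1

def expand(text, macros, depth=0):
    """Expand @[name] invocations inside out, like nested function calls."""
    if depth > 16:  # guard against macros that keep emitting macro calls
        raise RecursionError("macro expansion did not converge")
    m = MARKER.search(text)
    if m is None:
        return text
    block = extract_block(text, m.end())
    inner = expand(block, macros, depth + 1)   # inner invocations run first
    replacement = macros[m.group(1)](inner)    # then the outer macro
    rest = text[m.end() + len(block):]
    return expand(text[:m.start()] + replacement + rest, macros, depth + 1)
```

For instance, with macros = {"eval": lambda b: str(eval(b.rstrip(";"))) + ";"}, expand("var x = @[eval] 2 + 3; done", macros) produces "var x = 5; done"; a nested invocation inside another macro's block is resolved before the outer macro runs.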

Important: by construction, the macro cannot change anything "in the file". It doesn't know what "file" is. It receives text as input and returns text as output.
The preprocessor works on the level of individual files and gets activated on-save. The delay should be almost unnoticeable (e.g. 0.2 sec).
In my mental model, the output of the preprocessor gets saved in a different file where all macro calls are substituted with the outputs. It never modifies the same file. (I've never seen macros that modify the file in-place).
(FWIW, back when I experimented with serialization, I used naming conventions: filename.dart.proto¹ for the source and filename.dart for the (readonly) result.) Another important point is that the preprocessing of a file never triggers a cascade of preprocessor runs for other files.
I'm not sure we understand each other. I tried to keep the description short, but could have missed something.
Can you ask questions like "what happens if ...?", describing some condition I might have missed (with a short example)?

Footnotes

  1. The .proto file is not necessarily a valid Dart file. It wasn't a valid Dart file even under the RIP'd metaprogramming design: more and more arbitrary rules were introduced for "augmentations" to create an illusion of Dart syntax.

@ghost
Author

ghost commented Mar 18, 2025

The lack of access to type information may at first glance look like a major scandal. But in reality, a good argument can be made that type info is immaterial.
Suppose you generate a deserializer for a class A, which has a field b of type B somewhere. Though you don't have access to type B, you can still assume the existence of a static method fromJson in B. So you generate b = B.fromJson(text);.
What happens if B does not have such a method? You will get a static error on b = B.fromJson(text);. You will be able to see not only the invocation but also the full context. If the macro wants to be extra friendly, it can generate code that verifies all the assumptions made by the macro, e.g. assert(B.fromJson is Function, "message");. You will get a static error in the assert anyway.
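To make this concrete, here is a hypothetical sketch (Python) of such a deserializer-generating macro: it parses field declarations out of the class text it receives, assumes a static fromJson on every non-primitive field type, and emits an assert so that a wrong assumption becomes a static error in the generated file. All names and the emitted Dart shape are illustrative, not a proposed API:

```python
import re

# Matches simple field declarations like "int a;" or "List<int> xs;".
FIELD = re.compile(r"^\s*(\w+(?:<[^>]+>)?)\s+(\w+)\s*;", re.MULTILINE)
PRIMITIVES = {"int", "double", "num", "String", "bool"}

def generate_from_json(class_text):
    """Emit a fromJson body (as Dart text) for the class passed in as text."""
    name = re.search(r"class\s+(\w+)", class_text).group(1)
    checks, body = [], []
    for type_, field in FIELD.findall(class_text):
        if type_ in PRIMITIVES:
            body.append(f"    {field} = json['{field}'];")
        else:
            # Assumption: type_ declares a static fromJson; verify statically.
            checks.append(f"  assert({type_}.fromJson is Function);")
            body.append(f"    {field} = {type_}.fromJson(json['{field}']);")
    return (f"  {name}.fromJson(Map<String, dynamic> json) {{\n"
            + "\n".join(checks + body) + "\n  }")
```

The point is not the generator itself but the failure mode: if B has no fromJson, the generated assert and the generated assignment both fail statically, in plain sight, in the output file.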

@rrousselGit

More importantly, the lack of type information means we can't generate copyWith/==/toString/..., since we don't have the list of fields or their types.

@ghost
Author

ghost commented Mar 18, 2025

@[generate_to_string]
class A {
  int a;
  String b;
  // etc.
}

@rrousselGit: I don't understand your argument. The whole block starting with "class A" gets passed into the macro. The macro parses it with a standard Dart parser and finds out what the fields/types are. But only class A is available for parsing! If the class contains a field of type B, then the generator can call it via b.toString() as part of its own toString(). No need to parse the definition of B for that.
Could you find a better counterexample? :-)

@insinfo

insinfo commented Mar 18, 2025

Perhaps, to reduce complexity while still maintaining minimal type analysis, it could be restricted to a simple parser that analyzes only the file where the macro is applied, handles only primitive types, and ignores inheritance, which is the most complex part in my opinion. Even with this restriction, it can still be very useful for simple types that do not use inheritance.

@rrousselGit

rrousselGit commented Mar 18, 2025

Take Freezed Unions:

@freezed
class Union {
  factory Union.a(int value) = A;
  factory Union.b(double? value) = B;
}

This generates the field:

num? value;

so that folks can do:

Union union = ...;
num? value = union.value; // legal
if (union is A) {
  int value2 = union.value; // legal too.
}

This requires type information to know that the shared interface between all value types is num?

@ghost
Author

ghost commented Mar 18, 2025

@rrousselGit: this argument shows that the union should be a first-class construct.
Not to sound dismissive (you did whatever you could), but a homegrown union is a poor man's union, a workaround waiting for a real union. Fortunately, @eernstg has come up with the concept of a real union in #2727 - a concept he is satisfied with (me too, if it matters, which it doesn't). (This was the result of a very long, vigorous discussion :-)

I'm really interested in other examples. I'd like to find out the limits of the concept of a minimal viable thing.
Having said this, I don't expect it to be a Holy Grail of metaprogramming. Such a Holy Grail doesn't exist, or else it would have been found eons ago. But still, I'm curious to explore how far it can go.

@rrousselGit

Freezed again. Deep copy:

@freezed
class A {
  A(this.b);
  final B b;
}

@freezed
class B {
  B(this.c);
  final C c;
}

@freezed
class C {
  C(this.value);
  final int value;
}

This supports:

A a;
a.copyWith.b.c(value: 42);

It relies on knowing if a field is using Freezed or not. That requires inspecting the field.


Mobx

class Store {
  @observable
  int value = 0;
  @observable
  List<int> list = [];
}

This encapsulates the getters/setters into a different Observable subclass based on the type, such as Observable vs ObservableList.

This requires knowing the type of a field.


Riverpod

@riverpod
Future<int> async(ref) => ...;

@riverpod
Stream<int> stream(ref) => ...;

This generates an object of a different type based on whether the return value is a Future/Stream or a random object.


functional_widget

@swidget
Widget home(BuildContext context, {int? id}) => ...

This generates a class, which happens to override Diagnosticable and list every widget parameter.
And based on the parameter type, Diagnosticable expects a different function. Cf. StringProperty vs EnumProperty vs BoolProperty, etc.


I could list more. Type information is quite critical to codegen.

@ghost
Author

ghost commented Mar 18, 2025

@rrousselGit:
While staring at your examples, I'd like to make sure we are on the same page.

class Store {
  @observable
  int value = 0;
  @observable
  List<int> list = [];
}

This encapsulates the getters/setters into a different Observable subclass based on the type, such as Observable vs ObservableList.

But... the macro for @[observable] will receive, as a text parameter, this whole line:
int value = 0;. And if it's a list, then it's List<int> list = [];. After parsing, you get an idea of whether it's a list or not.

Please send more examples! I have to categorize them!

@rrousselGit

Type annotations != type

Consider:

typedef MyList = List<int>;

or:

class MyList extends ListView<int>

Or

extension type MyList(List<int> value) {}

Those are ways to define lists that no amount of if type.startsWith('List') will handle

I'm not saying you can't generate some code with just the AST. I've certainly attempted to do so in some of my generators, for the sake of speed.
But you will be limited in what you can do.

You'll have to cut some features, and your users will regularly have to debug "why is the generated code failing" because you can't have proper error handling.

It wouldn't meet Dart's standard of user-friendliness and quality IMO.

@ghost
Author

ghost commented Mar 19, 2025

You listed corner cases that either never occur or occur very rarely in the code subject to metaprogramming.
And for these rare cases there's a standard cure: @annotations. E.g.

@freezed
class B {
  B(this.value);
  @freezed final C c;
  // OR
  final D d @freezed;
}

In any case, you need some stats to determine whether theoretical possibilities are real, or else you will be optimizing for a non-existent case.

if type.startsWith('List')

No, you don't have to do it. You just parse it with a standard Dart parser; it may generate an AST or some other TBD format.
BTW, you are not limited to Dart syntax. You can generate SQL queries and what-not.

You can go through all your examples and verify that an extra annotation (whenever necessary) can help. There are 2 forms: an extra parameter in the macro call @[my_macro name: value, name: value] or a regular @annotation for a member.

It wouldn't meet Dart's standard of user-friendliness and quality IMO.

I am not aware of any current or currently discussed approaches to metaprogramming that could qualify as "meeting Dart's high standards".

If a 100K LOC project can be preprocessed within 1 sec¹, it's a big step forward. Big enough to justify some rare inconveniences (I'm not even sure there are many).

That's not all. The user will be able to write the code with no artifacts like extending _$ classes or other noise. And in the end, the user will see the clean generated code, with no trace of macros, annotations, or anything else distracting the attention.

Footnotes

  1. A conservative estimate. According to the posted benchmark, Dart's JSON decoder works today at a rate of ~120MB/sec (please verify the calculations). The amount of work needed for parsing/code generation is in the same ballpark. (There's an overhead of file read/write operations, but it doesn't depend on the method of metaprogramming.)
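The footnote's estimate is easy to sanity-check (Python; the ~30 bytes per line figure is my assumption, the ~120 MB/s rate is the one quoted above):

```python
lines = 100_000
source_bytes = lines * 30              # assume ~30 bytes per line: ~3 MB of text
parse_rate = 120 * 1024 * 1024         # the quoted ~120 MB/s, in bytes per second
seconds = source_bytes / parse_rate
print(f"{seconds:.3f} s")              # a few hundredths of a second
```

Even allowing an order of magnitude of slack for code generation and file I/O, the result stays comfortably under the 1-second budget.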

@ykmnkmi

ykmnkmi commented Mar 19, 2025

Does the analyzer have functions or classes for parsing partial declarations? The parseString function from analyzer expects a whole script, and the Parser class from _fe_analyzer_shared accepts tokens.

@ghost
Author

ghost commented Mar 19, 2025

Not sure this answers your question, but here's what ChatGPT suggests:


The package:analyzer provides a way to parse Dart source without a full analysis context. Its parseString() function expects a complete compilation unit, so a standalone expression can be parsed by wrapping it in a declaration and extracting the initializer from the resulting AST.

Here’s how you can do it:

import 'package:analyzer/dart/analysis/utilities.dart';
import 'package:analyzer/dart/ast/ast.dart';

void main() {
  // Wrap the bare expression in a declaration so it forms a valid unit.
  var result = parseString(content: 'var x = a + b * c;', throwIfDiagnostics: false);
  var declaration = result.unit.declarations.first as TopLevelVariableDeclaration;
  var expression = declaration.variables.variables.first.initializer;

  if (expression != null) {
    print('Parsed Expression: ${expression.toSource()}');
  } else {
    print('Failed to parse expression.');
  }
}

You can go from here. Pretty sure the analyzer provides all the building blocks out of the box. Use ChatGPT or any other AI - it may generate some code.

@ghost
Author

ghost commented Mar 19, 2025

@lrhn: I want to correct myself. If a macro generates code that inserts other macro calls (via @[another_macro] {...}), it has to declare this statically (in its properties). Otherwise, the preprocessor will always have to make another iteration over the generated code in vain, thus wasting resources. In the worst case, there will be a static error.

@TekExplorer

TekExplorer commented Mar 20, 2025

What if generators accepted the analysis result for the file only? I.e., just that compilation unit (if I understand it right).

Then, we also say that generators are not allowed to shadow types used in that file.

This way, the only usages of generated types are those elements that haven't been resolved in the first place.

@freezed
sealed class MyUnion {
  // not allowed. if the generator tries to generate it, you get an ambiguity error, and you'll be forced to use a prefix
  factory MyUnion.intBad(int i) = int;
  // generator sees the prefix and can specify a library self-import of some kind to allow the prefix.
  // may require additional language feature, or just error I guess.
  factory MyUnion.intBetter(int i) = g.int; 

  factory MyUnion.int(int i) = UInt; 
  factory MyUnion.double(double i) = UDouble;
}

@freezed
class Wrapper {
  const Wrapper(this.union);
  final MyUnion union;
}

// strawman.
// specifying the type will automatically apply meta-meta annotation to allow only for classes.
// otherwise use the annotation and accept Element, or something. 
// probably need some kind of general resolver.
// thisLibrary as a context for reachability and prefixes.
freezed(Library thisLibrary, ClassElement annotatedElement, StringSink writer) {
  final factories = annotatedElement.constructors.whereType<FactoryConstructor>();
  for (final factory in factories) {
    // or whatever
    final InterfaceElement type = factory.redirected;
    assert(type.type is InvalidType);
    // or
    if (type.isResolved) {
      reportError(..);
      continue;
    }
   // is able to check parameter.type. is a freezed class? that's okay, we're not generating copyWith yet, just the interface. that will be a nested macro so that the interfaces we need will be available then.
   _generateFreezedClass(thisLibrary, annotatedElement, type.name, writer);
  }
}

// gen?
augment MyUnion {
  num get i;

  // == and hashcode
  @freezedCopyWith
  MyUnionCopyWith get copyWith;
}

abstract mixin class MyUnionCopyWith {
  // some factory constructor that may be marked external or references an unresolved Impl type

  MyUnion call({int? i});
}

class UInt implements MyUnion {...}
...

augment class Wrapper {
  // normal stuff

  @freezedCopyWith
  WrapperCopyWith get copyWith;
}

// some implementation details idk.
abstract mixin class WrapperCopyWith {
  // some factory constructor that may be marked external or references an unresolved Impl type

  Wrapper call({MyUnion? union});
  UnionCopyWith get union;
}

// level 2
// the actual copyWith classes, and augmentations that introduce them.

Basically, normal code gen, but with some restrictions so we don't end up cycling, or with shadowed types rendering the generated code out of date.

@TekExplorer

TekExplorer commented Mar 20, 2025

The fact of the matter is, we need type information. It's not possible to write good, reliable generators without it. We want users to be able to use normal Dart constructs like aliases, prefixes, etc., and we simply won't be able to handle them properly otherwise.

It would be impossible to support multiple kinds of patterns otherwise, such as dart_mappable's fromMap or generic argument factories for serialization, or being able to choose between .close() and .dispose() for disposables.

Technically we could ask for that information in the annotation, but... why? The information is there, let's get it.

@ghost
Author

ghost commented Mar 20, 2025

why? the information is there, lets get it

Where "there"? "There" is a function of time. E.g. a macro handling class A wants to know if class B implements the method fromJson. At what moment of time? At t0, it doesn't, but at t1, it does - because fromJson gets added by the same or a different macro.
The setup is akin to a biological ecosystem where each species affects the environment, which in turn affects every other species causing them to adapt; the adaptation changes the environment again, and so on ad infinitum. It's very difficult to reason about the behavior of such a system. You can't build a mental model of it. When something breaks, you won't know how to untangle it.

(note that in the last model, you could ask the same question: why can't I know the values of constant expressions? The constants are there after all. But... they are not frozen in time!)

The best way to deal with Gordian Knots like this is to cut them. That's what I'm trying to propose here.


I'd like to emphasize one important point of the proposal. If the macro makes assumptions about the types based on annotations, heuristics, etc., it has to insert "assert" statements that verify those assumptions statically. That guarantees that no misconception can propagate to runtime.

Another characteristic of the proposed mechanism: preprocessing can be easily parallelized (b/c nobody depends on anybody else). This is a minor point considering that the preprocessing would be very fast as it is, but is worth pointing out for completeness.
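These last two points can be sketched together (Python; the file-pair convention and the preprocess callback are illustrative): staleness is decided purely from timestamps, as in the original bullet list, and since files are independent, the stale ones can be processed in parallel:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def is_stale(src, out, macro_time):
    """Re-process if the output is missing, or older than the source or the macro."""
    if not os.path.exists(out):
        return True
    out_time = os.path.getmtime(out)
    return out_time < os.path.getmtime(src) or out_time < macro_time

def preprocess_all(pairs, preprocess, macro_time):
    """pairs: (source, output) paths. Files are independent, so run in parallel."""
    stale = [src for src, out in pairs if is_stale(src, out, macro_time)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(preprocess, stale))
```

Note that the staleness check never inspects file contents, only timestamps, which is exactly what makes the no-cross-file-dependency property cheap to exploit.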

@TekExplorer

TekExplorer commented Mar 20, 2025

why? the information is there, lets get it

Where "there"? "There" is a function of time. E.g. a macro handling class A wants to know if class B implements the method fromJson. At what moment of time? At t0, it doesn't, but at t1, it does - because fromJson gets added by the same or a different macro. The setup is akin to a biological ecosystem where each species affects the environment, which in turn affects every other species causing them to adapt; the adaptation changes the environment again, and so on ad infinitum. It's very difficult to reason about the behavior of such a system. You can't build a mental model of it. When something breaks, you won't know how to untangle it.

If a class doesn't have a fromJson yet, then perhaps the macro isn't handling it correctly.

do it in stages.

  1. defines the fromJson interface.
    • runs the "top-level" macros once across all relevant files (possibly only the imported ones?)
  2. implements it
    • either a second macro is applied to the new definition to actually create the implementation
      • which could perhaps be marked as compile-only, so the analyzer doesn't need to run implementation generation?
    • or the same macro is run, only with the fromJson now available (the macro could have told the system that it was not done)

It would probably take some doing to find a best practice for that, but I remember that one of the big problems for me with macros was that I couldn't check a type and also define a new one at the same stage, as what I define depended on the type.

with this, it would work fine.

Perhaps a macro could indicate that its environment isn't ready yet, and ask to be called again next stage, when the desired generated code may be available? macrorunner.defer() or macrorunner.notDone(). Probably better than the runs_before stuff in build_runner.

Macro application is complete only when all macros have resolved (or the remaining macros are all compile-time implementation generators, which would have compile-checked guarantees that they do not add anything to any interface).

I figure if a macro isn't done and there's nothing else to run, then it's not executed again and a diagnostic error is shown on the macro.

Or maybe a macro marks itself as complete instead, like macrorunner.complete().

Either one probably returns Never, since the macro is meant to stop executing there.

@rrousselGit

build_runner will apparently get better in that area. With augmentations, there's a plan to support augmentations that add other augmentations. So you could have an infinite number of "stages".

@ghost
Author

ghost commented Mar 20, 2025

With augmentations, there's a plan to support augmentations that add other augmentations. So you could have an infinite number of "stages".

My internal "complexity indicator" flashes red :-)
If you could have an infinite number of stages, you will have an infinite number of stages. Miscalculations, bugs, new versions... You are in for some very exciting troubleshooting sessions :-)

@TekExplorer

TekExplorer commented Mar 21, 2025

With augmentations, there's a plan to support augmentations that add other augmentations. So you could have an infinite number of "stages".

My internal "complexity indicator" flashes red :-) If you could have an infinite number of stages, you will have an infinite number of stages. Miscalculations, bugs, new versions... You are in for some very exciting troubleshooting sessions :-)

I doubt it. Just put a cap on how many passes we're allowed to make. Devs shouldn't have recursive macros, or macros that never finish working. That will push for better implementations.

The only reason you need more than one pass in the first place is to depend on generated code. How many new types/interface modifications can macros need?

@ghost
Author

ghost commented Mar 26, 2025

In principle, the model can be extended so that the macros actually have access to type information via something like reflectable. The type information can be collected and cached based on .proto files, not the results of code generation. This can be done efficiently by the analyzer without causing chicken-and-egg paradoxes.

(the type info should, among other things, contain the annotations of both forms @annotation and @[macro_call], which will make it possible for a "freezed" macro to find out whether the imported type B is going to be processed by freezed)

@ghost
Author

ghost commented Mar 27, 2025

I think the approach outlined in the previous comment is feasible as a community project. At least, it's a much easier effort than a Dart-Rust interface.
@rrousselGit, could you comment on that? Do you have counterexamples?
