Skip to content

Proposal: code annotations #14656

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ityonemo opened this issue Feb 15, 2023 · 18 comments
Closed

Proposal: code annotations #14656

ityonemo opened this issue Feb 15, 2023 · 18 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@ityonemo
Copy link
Contributor

ityonemo commented Feb 15, 2023

The proposal is to introduce "code annotations", these are a special type of comment that, like doc comments, are tied to a specific piece of code. These are hints for 3rd party tooling to do things such as autogenerate surrounding code, or do static analysis.

Might be considered to be similar to pragmas in other languages, but see the Restrictions

Basic - level 0

Example (tags a function as BEAM-nif):

//: BEAM-nif: threaded
fn myfun(value: u8) u8 {
  return value + 1;
}

Example (for a hypothetical http server framework):

//: http-server: GET "/my_endpoint"
pub fn my_endpoint(request: Request) Response {
  // do stuff here
}

The (tentative) syntax of a annotation is:
//:<whitespace>identifier<colon> <any string>

More advanced - level 1

Internally make doc comments /// and //! special cases of //:

Even more advanced - level 2

create a way for the zig compiler to register 3rd party static analysis tools that can asynchronously subscribe to zig parser events, sema completion, zir generation, mir generation events, and code annotations. An analyzer could then track the code's custodianship, like responsibility to clean up resources (memory, file descriptors, etc). The analyzer could halt zig compilation.

Example (resource tracking):

const MyFileType = struct {
  //: tracker: (create, _)
  fn create(...) void {...}

  //: tracker: (destroy)
  fn close(...) void {...}
}

//: tracker: (_, transfer)
fn do_something_and_close(a: Allocator, f: *MyFileType) void {
  defer f.close();
  // stuff
}

//: tracker: (_, access)
fn access_only(a: Allocator, f: *MyFileType) void {
  defer f.close(); // !! tracker notices that this is not allowed by the "access" condition.
  // stuff
}

//: tracker: (_, access)
fn also_access_only(a: Allocator, f: *MyFileType) void {
  defer do_something_and_close(a, f); // !! tracker notices that this is not allowed by the "access" condition.
  // stuff
}

fn some_function(a: Allocator) void {
  var f: MyFileType = undefined;
  f.create("file_path");
  do_something_and_close(a, &f);
  f.close();  // !! tracker notices that this is not allowed to happen
}

Example (contract checking):

//: contract: (1..100)
fn must_be_one_to_hundred(value: u8) void {
  if (value > 100) { 
    @panic("out of bounds");
  }
}

fn do_something_bad() void {
  var index:u8 = 0;
  while (index < 150) : (index += 1) {
    must_be_one_to_hundred(index);  // !! caught by contract
  }
}

Cons

This is potentially embedding other languages in zig, which does break simplicity. However, I think in many cases there already is the temptation to do this anyways, as was the case with zigler's annotations hijacking docstrings (even though future elixir versions of zigler will not use this mechanism -- still may need to do it with erlang).

In other words, if we don't provide guidelines, people will probably find a way to do it anyways, and interop between colliding toolchain could be a problem.

Restrictions

unlike pragmas, code annotations are never allowed to affect the final compiled artifact of the code (aside from possibly halting compilation altogether, as in level 3). Otherwise this really runs afoul of breaking the principle of zig's simplicity and readability... As it is it's a little bit dangerous since this is tantamount to embedding other languages in zig.

Unopinionated

  • specific syntax doesn't have to be //:
  • what the syntax for multiline looks like.
  • should we force annotations to be more structured?
@ityonemo
Copy link
Contributor Author

ityonemo commented Feb 16, 2023

lots of downvotes. Any comments why? Can you steelman this argument? Did you read the part where I mention in many cases there already is the temptation to do this anyways? This comes from real experience.

@andrewrk
Copy link
Member

For what it's worth, I completely ignore the emojis on any proposal. They mean nothing to me

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Feb 16, 2023
@andrewrk andrewrk added this to the 0.11.0 milestone Feb 16, 2023
@andrewrk
Copy link
Member

Related proposal: #1099

@ityonemo
Copy link
Contributor Author

ityonemo commented Feb 16, 2023

sure, but if folks are truly displeased with it it would at least be good to understand why so that maybe a better alternative can be forged. I'm not particularly tied to this specific way of achieving this, but I noticed this when developing zigler (and TBQF felt a little bit dirty about how I achieved it) and forsee the issue of collisions if it continues to be a wild west.

@ityonemo
Copy link
Contributor Author

I like the feel of various tags (and that it's structured data) in #1099, (e.g. assigning to pub const) but as it is it only really lets you tag structs, you would need some indirection to tag functions, but for the purposes of 3rd party stuff it think it might be hard to get at them either through ZIR analysis or side-compiling and doing SEMA, if they're inside a non-public struct at the to level of any given file.

@sengir
Copy link
Contributor

sengir commented Feb 17, 2023

What do level 0 and level 1 provide over the status quo of //////!? Assuming that they can't be introspected at comptime (which I don't see in the proposal), then I don't see any functional difference.

unlike pragmas, code annotations are never allowed to affect the final compiled artifact of the code

I could see plenty of scenarios where this isn't true if

tools that can asynchronously subscribe to zig parser events, sema completion, zir generation, mir generation events, and code annotations

is true. If an external tool can be triggered during Parse, AstGen, or Sema, then it could create or modify a file that later gets imported. Ultimately, having hooks into Compilation like this easily allow for Probably Bad Things.

@Arya-Elfren
Copy link
Contributor

I think having some form of structured annotations is useful and even a good idea. However, level 2 is probably a bad idea.

The "build tools" integration should just be a build.zig calling tools by command and then those tools use the annotations. To make these even more enticing, the level 1 idea would just let people re-use the zig parsing code to extract annotations.

This could solve multiple things in one generic way:

  • Replace the // zig fmt off "magic comment" with something more structured like //: fmt: off
  • Static analysis tools (multi line might even get to the point of allowing easier formal verification?)
  • Doc comment structured annotations (in this case it would be that ///: a: number is syntax sugar for //: doc: a: number):
/// A helpful doc comment
///: a: the left param
///: b: the right param
///: return: the sum of a and b
fn add(a: usize, b: usize) void{
    return a + b;
}

@InKryption
Copy link
Contributor

I actually like that idea. May help with zig document generation, and would help with the use case of documenting parameter names in functions that don't have many parameters. E.g., right now, documenting the exemplified add would look like:

/// A helpful doc comment
fn add(
    /// the left param
    a: usize,
    /// the right param
    b: usize,
) usize {
    return a + b;
}

And there isn't even a way to explicitly add a doc comment for the "return".

Another documentation-related use case this may help address is "related function mentions". Often times functions will mention other functions, which is helpful, but it comes with the unfortunate downside that they will sometimes go out of date, when mentioned functions get renamed, deprecated, etc. A third party tool could detect something like

///: See also: `bar`, `baz`
fn foo() void {}

and help ensure that the declarations bar & baz actually exist in some way.

And another pretty simple one would be depreciation annotations, a la //: deprecated.

I was a bit skeptical of this proposal on first reading, but I now think it would actually make sense to standardize on "comment annotations", beyond hard-coded stuff like // zig fmt: .

@eLeCtrOssSnake
Copy link

I will say no because this will make people write a ton of useless comment lines and pages worth of information that is not critical and traiting every function silently instead of designing good api.
So instead of asserting and creating types and options they would rather make a 5 line commentary on what should and shouldnt be used to call function, which dangerously reminds me of C way of things with flags, errno etc.

@nektro
Copy link
Contributor

nektro commented Feb 19, 2023

the one use case i see this mainly potentially helping with is being able to label anytype parameters as interfaces for tooling eg ///: foo: std.io.Reader but there's no reason i see for it to be a special type of comment within the compiler

@tauoverpi
Copy link
Contributor

@eLeCtrOssSnake it allows one to annotate with static checks which cannot (or won't) be performed by zig similar to liquidhaskell in a standard format rather than having multiple tools defining their own special format which may collide or clutter things even more.

@mlugg
Copy link
Member

mlugg commented Feb 20, 2023

I like the idea of "basic" (level 0) annotations (they're like Go's struct tags, just a bit more general). It'd definitely be useful for [de]serializers, HTTP servers, etc if it were exposed at comptime. I'd suggest that the data, rather than being a part of @typeInfo, can be exposed through a new builtin, e.g. @annotation(my_endpoint, "http"); this would both make the intended usage more convenient (you don't have to search through some big structure in @typeInfo), and would allow us to emit an error if an annotation is unused, which feels like a good idea (EDIT: actually, maybe not - consider a case where an annotation is used in a code path which isn't analyzed when building with a certain flag). We'd probably need a separate one for fields, e.g. @fieldAnnotation(MyData, "field_name", "serialize"). Thinking about it, maybe the other one should be @declAnnotation(namespace, my_endpoint, "http") to avoid any confusion with how it works when decls are passed through as parameters or otherwise copied. (I think it's reasonable to say that parameter annotations wouldn't be overly useful, so can be omitted.)

@matu3ba
Copy link
Contributor

matu3ba commented Feb 20, 2023

This proposal is 1. missing on how the Zig compiler ensures that a tag has a correct semantic mapping and no duplicates or missing semantics attached to it.
Without this basic check, we would allow inconsistencies on combining multiple tools.

Also, 2. Consider the following:

    1. Any third-party tool will try to claim ownership of a abbreviation
    1. Code annotations are intended to be inferable along libraries
    1. This means that a verbatim code annotation requires like the C namespace "compatibility-guarantee forever" and "stuff is global" and the user has no idea or way to automatically extract those semantics.

To prevent having to repeat the C trap of adding incompatible things, I would suggest to combine the semantics of an annotation to "a required source of truth" like how Zig file references work. Think of it as something like required "mypy version + semantics" document, but it could be also a versioned link to the standard document or a fully versioned git repository.

I think a declarative approach to specify the code annotation in each file should be sufficient:

//: annotation special_json semantic_file

//: special_json: some_annotation
fn create() !void {
    ..
}

However, the most problematic tradeoffs to be made are (to me) that annotations either 1. are compact and inline with comments, but this means at worst external parser-library complexity, 2. compact, but limited in expressivity or 3. create rather ugly and verbose newlines.

Implementation detail(s) for code annotations on newline approach

If the AST would have comments and spacing separated from code,
then the Zig compiler could ensure annotation validity in another simplified AST-only pass not affecting main compilation.

Even more advanced - level 2

This is missing more design work/experimentation on how this could or should work with comptime-resolving plus, more importantly, a very convincing + general use case.

@ityonemo
Copy link
Contributor Author

* 1. Any third-party tool will try to claim ownership of a abbreviation

Point taken. Presuming each library has a consistent idea of what it's using for annotations, you could have something in build.zig that remaps its tags.

So, suppose library 1 has tags called "tracker" that map to the "footracker" plugin
and library 2 has tags called "tracker" that map to the competing "bartracker" plugin.

There could be a mechanism to map your software (which wishes to use both footracker and bartracker) to the "footracker" and "bartracker" tags) -- or say you could map footracker to "tracker1" and bartracker to "tracker_a", whatever, you want, in build.zig

I think a declarative approach to specify the code annotation in each file should be sufficient:

I don't love this required on a file-by-file basis, but I think build.zig would be okay.

that annotations either 1. are compact and inline with comments, but this means at worst external parser-library complexity, 2. compact, but limited in expressivity or 3. create rather ugly and verbose newlines.

Agreed. I do like the @tag proposal elsewhere, in that it literally creates something that must be zig parseable syntax (and therefore "pretty")

@ityonemo
Copy link
Contributor Author

If an external tool can be triggered during Parse, AstGen, or Sema, then it could create or modify a file that later gets imported. Ultimately, having hooks into Compilation like this easily allow for Probably Bad Things.

Right, this should be considered "disallowed behaviour". Of course it will be impossible? [*] to enforce this, but just provide guidance and the community I think is strong enough to call it out and refuse to use software that behaves badly.

[*] I mean, maybe? It is likely possible to run an integrity check.

@matu3ba
Copy link
Contributor

matu3ba commented Feb 27, 2023

I don't love this required on a file-by-file basis, but I think build.zig would be okay.

I was intending to specify it better. I think it should link to an arbitrary, but required to be semantically versioned package. However, the package should be "optional", so that the analyzer can do its own thing and download is done by the analyzer/prover on demand (if no incompatibility exists etc).

Ideally, Zig can guide this, but I have not thought yet on how exactly optional downloads could work in practice for (potentially) recursive dependencies.
I think we should just disallow recursive dependencies for generated compile steps and strongly emphasize that the analyzer/prover have commands for 1. auto-download and 2. present list of required dependencies to download + print the command.

Agreed. I do like the @tag proposal elsewhere, in that it literally creates something that must be zig parseable syntax (and therefore "pretty")

The main reason I dislike @tag is, because it looks like regular Zig code and static analysis/verifiers want a lot flexibility in how to express + parse things (in-place in comments, externally referencing or could be a tool combining multiple different analyzers/parsers) etc.
By making it look like regular Zig code it feels to me like regular code and users will try to sneak in more features and ask unnecessary questions.

@deanveloper
Copy link

deanveloper commented Jan 28, 2024

Go has a similar feature, where you can tie any string as a "tag" to a struct's field. However, this has actually caused quite a bit of frustration because it's just a string. There's a very well-liked proposal (golang/go#23637) which allows any data structure to be tied to a field, which provides a number of benefits (fixes name clashing, autocomplete issues, typos are caught at compile-time, better expressibility, etc).

It may be good to learn from Go's mistake, and allow tying any data structure to containers/fields rather than only strings.

@ityonemo
Copy link
Contributor Author

closing in favor of #1099

@andrewrk andrewrk modified the milestones: 0.14.0, 0.13.0 May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests