Mechanics of interpreter generation from DSL #479

gvanrossum · 2022-10-12T18:47:39Z

Assuming we have a DSL roughly like this (modified per #477), we have a few ways to integrate this.

We could place the generator input in a separate file, and the generator script (automatically run by make when the DSL file is updated) writes the interpreter cases to a separate file (I'll call it cases.h for now, we can bikeshed the actual name later). In ceval.c we replace all the cases (roughly 4000 lines) with a single #include "cases.h". (Presumably we'd check cases.h into git, just so people can bootstrap more easily.)
Place the generator input in a separate file, but have the generator update ceval.c in place, sort of like . A downside is that people will undoubtedly confused and try to edit the generator output. (We could add comments warning them off, but we'd probably have to repeat those comments for each case.)
Place the generator input in ceval.c, exactly where the cases are currently, but surrounded by #if etc. so that the compiler uses the #include "cases.h" but tooling (e.g. VS Code) sees the generator input and believes this is the actual code. (This may actually be a little bit complicated, because at least VS Code understands #if and expands them according to the configured variables. We'd have to fool it, otherwise it'll think that the generator input is all dead code and dim its appearance.) The generator can find its input by reading ceval.c, looking for the appropriate marker #if and #else. It still writes its output to cases.h.

(3) feels like a bit more complicated to implement, but is probably the most user-friendly for people trying to read or edit the generator input, since they see it in a context where their editor can show them both the C code for an instruction and the surrounding C code (static functions they call, and the non-generated parts of the function containing the switch). It also preserves the git history of the C code included in the generator input.

I'm not keen on (2), because of the mechanics of rewriting ceval.c in place. But it does preserve git history, like (3).

The simplest to build is probably (1). It does not preserve git history, however (the code of all instructions would be owned by the initial creator of the input file).

I think I've convinced myself to prototype (3) and see how this works out. Now on to designing the architecture of the generator. :-)

The text was updated successfully, but these errors were encountered:

gvanrossum · 2022-10-13T01:30:18Z

I've got a very early prototype of a hybrid of (1) and (3) in my branch generate-ceval-switch. The generator script is Tools/scripts/generate_cases.py. It reads Python/bytecodes.inst and writes Python/cases.h.

It's a hybrid because the code generator reads bytecodes.inst and writes cases.h, which is then #included by ceval.c -- that's (1), but it retains the original cases in ceval.c, and there's a separate script (Tools/scripts/extract_cases.py) that regenerates bytecodes.inst from the original cases in ceval.c -- the two scripts together are almost (3). It round-trips, and the contents of cases.h is identical to the full list of cases (really TARGETs) in ceval.c.

Most things here are still fake -- the code generator has a regex that basically looks for 'inst' and then copies a block of code without looking at it, and it ignores the stack effect. The extractor tries to guess the numeric stack effect by calling dis.stack_effect(). But it turns out that for super-instructions and specialized instructions the dis module doesn't know the stack effect. And there are additional complications relating to variable stack effects. Moreover, even for those opcodes whose stack effect is fixed and known, we don't actually know how many variables it consumes from the stack and how many it produces -- e.g. BINARY_OP consumes two values and produces one, but all we know is that the net stack effect is -1, which we represent as (__0 -- ), whereas it should really be something like (left, right -- value).

Next steps? I have many ideas.

Write an actual parser for use by the code generator.
Generate family declarations from dis._specializations.
Derive stack effect for specialized and super-instructions from their components, in the extractor.
Derive linear variable stack effects in the extractor.
Many cases end withDISPATCH(); -- we could remove this in the extractor (leaving behind a flag, maybe) and put it back in the generator.
Auto-insert PREDICTED() macros.
If a case ends with PUSH(value), and its stack effect is 1, mark that in the extractor and use it in the generator.
Ditto with simple POP() cases.
Work through the examples from this.

carljm · 2022-10-14T23:52:01Z

Are there any design decisions here that would preclude adding the ability to have multiple separate input files to the code generator? (This would be useful for an extension like Static Python that wants to generate an extended interpreter loop.) I read over the design doc and didn't see anything that looked like it would be a problem.

gvanrossum · 2022-10-15T00:21:57Z

Yeah, that should be simple enough.

markshannon · 2022-10-17T13:57:09Z

I would prefer 1 (separate file). Having the input in a separate file makes it simpler to have two forms of the macros, the "fake" ones in at the top of the input to help IDEs and such, and the real ones in ceval.c. Unless, we don't want any macros in ceval.c and have the tooling generate the expanded form directly.

Having separate inputs files, also makes the tooling more composable.
e.g.
interpreter_gen --template ceval_template.c instructions.c > ceval.c
interpreter_gen --template trace_template.c instructions.c trace_super_instructions.c > trace_eval.c
For Cinder:
interpreter_gen --template cinder_template.c instructions.c cinder_extras.c > ceval.c

markshannon · 2022-10-17T14:17:18Z

I'd suggest delaying doing any stack analysis, until after we have a generator merged.
For now the generator only need to understand the form without stack comment.

The definition file in generate-ceval-switch has both stack comments and POPs, which would mean popping the values twice.
The stack comment should tell the code generator which POPs and PUSHes to generate.

gvanrossum · 2022-10-19T19:12:20Z

I would prefer 1 (separate file). [...]

Sure, I can do that. Although I don't think we should generate all of ceval.c (too much duplication) -- I like the current process that generates cases.h which is #included by ceval.c. The cases can then be removed from ceval.c, but the rest will stay. No need for a template.

I'd suggest delaying doing any stack analysis, until after we have a generator merged. [...]

Agreed. In the extractor (a temporary hack) I just generated the stack comments (to the extent that I could derive them) so I wouldn't have to do that later. The generator ignores them. I have a vague transition plan where the generator ignores stack effects if they start with __. I'll leave this alone for now.

gvanrossum · 2022-11-02T05:16:47Z

Once python/cpython#98830 is merged we should close this.

gvanrossum · 2022-11-14T19:38:09Z

This has served its purpose. We've settled on the following workflow:

The instruction definitions exist only in Python/bytecodes.c.
Tools/cases_generator/generate_cases.py reads Python/bytecodes.c and writes Python/generated_cases.c.h.
Python/ceval.c #includes Python/generated_cases.c.h.
For most people, make regen-cases (or make regen-all) runs the cases generator script.

This was referenced Oct 28, 2022

GH-98831: "Generate" the interpreter python/cpython#98830

Merged

Generate the interpreter python/cpython#98831

Closed

gvanrossum closed this as completed Nov 14, 2022

gvanrossum added the epic-generator DSL-based code generator for the interpreter(s) label Nov 16, 2022

This was referenced Feb 18, 2023

Enable generated interpreter cases to be composed from multiple inputs python/cpython#102020

Closed

Enable definition for generated interpreter cases to be composed from multiple files python/cpython#102021

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Mechanics of interpreter generation from DSL #479

Mechanics of interpreter generation from DSL #479

gvanrossum commented Oct 12, 2022

gvanrossum commented Oct 13, 2022 •

edited

Loading

Uh oh!

carljm commented Oct 14, 2022

Uh oh!

gvanrossum commented Oct 15, 2022

Uh oh!

markshannon commented Oct 17, 2022

Uh oh!

markshannon commented Oct 17, 2022

Uh oh!

gvanrossum commented Oct 19, 2022

Uh oh!

gvanrossum commented Nov 2, 2022

Uh oh!

gvanrossum commented Nov 14, 2022

Uh oh!

Mechanics of interpreter generation from DSL #479

Mechanics of interpreter generation from DSL #479

Comments

gvanrossum commented Oct 12, 2022

gvanrossum commented Oct 13, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

carljm commented Oct 14, 2022

Uh oh!

gvanrossum commented Oct 15, 2022

Uh oh!

markshannon commented Oct 17, 2022

Uh oh!

markshannon commented Oct 17, 2022

Uh oh!

gvanrossum commented Oct 19, 2022

Uh oh!

gvanrossum commented Nov 2, 2022

Uh oh!

gvanrossum commented Nov 14, 2022

Uh oh!

gvanrossum commented Oct 13, 2022 •

edited

Loading