From 3bf2baf0790ce1dd86f59b196e51a7be2e600647 Mon Sep 17 00:00:00 2001
From: faizanoor3001
Date: Tue, 4 Mar 2025 13:31:55 +0200
Subject: [PATCH 1/5] Updated the interpreters.md with the how to add a new bytecode specialization steps

---
 InternalDocs/interpreter.md | 56 +++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/InternalDocs/interpreter.md b/InternalDocs/interpreter.md
index 7195d9c6de575c..3f2d947878b0ed 100644
--- a/InternalDocs/interpreter.md
+++ b/InternalDocs/interpreter.md
@@ -505,6 +505,62 @@ After the last `DEOPT_IF` has passed, a hit should be recorded with
 `STAT_INC(BASE_INSTRUCTION, hit)`.
 After an optimization has been deferred in the adaptive instruction,
 that should be recorded with `STAT_INC(BASE_INSTRUCTION, deferred)`.
+## How to add a new bytecode specialization
+
+Assuming you found an instruction that serves as a good specialization candidate.
+Let's use the example of [`CONTAINS_OP`](../Doc/library/dis.rst#contains_op):
+
+1. Update below in [Python/bytecodes.c](../Python/bytecodes.c)
+
+- Convert `CONTAINS_OP` to a micro-operation (uop) by renaming
+  it to `_CONTAINS_OP` and changing the instruction definition
+  from `inst` to `op`.
+
+  ```c
+  // Before
+  inst(CONTAINS_OP, ...);
+
+  // After
+  op(_CONTAINS_OP, ...);
+  ```
+
+- Add a uop that calls the specializing function `_SPECIALIZE_CONTAINS_OP`.
+  For example.
+
+  ```c
+  specializing op(_SPECIALIZE_CONTAINS_OP, (counter/1, left, right -- left, right)) {
+      #if ENABLE_SPECIALIZATION
+      if (ADAPTIVE_COUNTER_IS_ZERO(counter)) {
+          next_instr = this_instr;
+          _Py_Specialize_ContainsOp(right, next_instr);
+          DISPATCH_SAME_OPARG();
+      }
+      STAT_INC(CONTAINS_OP, deferred);
+      DECREMENT_ADAPTIVE_COUNTER(this_instr[1].cache);
+      #endif  /* ENABLE_SPECIALIZATION */
+  }
+  ```
+
+- The original `CONTAINS_OP` is now a new macro consisting of
+  `_SPECIALIZE_CONTAINS_OP` and `_CONTAINS_OP`.
+
+2. Define the cache structure in [Include/internal/pycore_code.h](../Include/internal/pycore_code.h),
+at the very least, a 16-bit counter is needed.
+
+   ```c
+   typedef struct {
+       uint16_t counter;
+   } _PyContainsOpCache;
+   ```
+
+3. Write the specializing function itself in [Python/specialize.c ](../Python/specialize.c).
+   Refer to any other function in that file for the format.
+4. Remember to update operation stats by calling add_stat_dict in
+   [Python/specialize.c ](../Python/specialize.c).
+5. Add the cache layout in [Lib/opcode.py](../Lib/opcode.py) so that Python's
+   dis module will know how to represent it properly.
+6. Bump magic number in [Include/core/pycore_magic_number.h](../Include/internal/pycore_magic_number.h).
+7. Run ``make regen-all`` on `*nix` or `build.bat --regen` on Windows.
 
 
 Additional resources

From a023a526beab3fdfc0e61f743b54aa49b675bacf Mon Sep 17 00:00:00 2001
From: faizanoor3001
Date: Tue, 4 Mar 2025 21:57:39 +0200
Subject: [PATCH 2/5] Addressed the review comments

---
 InternalDocs/interpreter.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/InternalDocs/interpreter.md b/InternalDocs/interpreter.md
index 3f2d947878b0ed..ef895228ac0a51 100644
--- a/InternalDocs/interpreter.md
+++ b/InternalDocs/interpreter.md
@@ -505,6 +505,7 @@ After the last `DEOPT_IF` has passed, a hit should be recorded with
 `STAT_INC(BASE_INSTRUCTION, hit)`.
 After an optimization has been deferred in the adaptive instruction,
 that should be recorded with `STAT_INC(BASE_INSTRUCTION, deferred)`.
+
 ## How to add a new bytecode specialization
 
 Assuming you found an instruction that serves as a good specialization candidate.
@@ -555,7 +556,7 @@ at the very least, a 16-bit counter is needed.
 
 3. Write the specializing function itself in [Python/specialize.c ](../Python/specialize.c).
    Refer to any other function in that file for the format.
-4. Remember to update operation stats by calling add_stat_dict in
+4. Remember to update operation stats by calling `add_stat_dict` in
    [Python/specialize.c ](../Python/specialize.c).
 5. Add the cache layout in [Lib/opcode.py](../Lib/opcode.py) so that Python's
    dis module will know how to represent it properly.

From cb03b20c0b9458804c1d993d2dde8b16e0b3e11f Mon Sep 17 00:00:00 2001
From: faizanoor3001
Date: Tue, 4 Mar 2025 23:01:48 +0200
Subject: [PATCH 3/5] Update section for Step1, reworded as per review comments for clarity

---
 InternalDocs/interpreter.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/InternalDocs/interpreter.md b/InternalDocs/interpreter.md
index ef895228ac0a51..c84e9a92bdc9c7 100644
--- a/InternalDocs/interpreter.md
+++ b/InternalDocs/interpreter.md
@@ -525,8 +525,7 @@ Let's use the example of [`CONTAINS_OP`](../Doc/library/dis.rst#contains_op):
   op(_CONTAINS_OP, ...);
   ```
 
-- Add a uop that calls the specializing function `_SPECIALIZE_CONTAINS_OP`.
-  For example.
+- Add a uop that calls the specializing function:
 
   ```c
   specializing op(_SPECIALIZE_CONTAINS_OP, (counter/1, left, right -- left, right)) {
@@ -542,8 +541,11 @@ Let's use the example of [`CONTAINS_OP`](../Doc/library/dis.rst#contains_op):
   }
   ```
 
-- The original `CONTAINS_OP` is now a new macro consisting of
-  `_SPECIALIZE_CONTAINS_OP` and `_CONTAINS_OP`.
+- Create a macro for the original bytecode name:
+
+  ```c
+  macro(CONTAINS_OP) = _SPECIALIZE_CONTAINS_OP + _CONTAINS_OP;
+  ```
 
 2. Define the cache structure in [Include/internal/pycore_code.h](../Include/internal/pycore_code.h),
 at the very least, a 16-bit counter is needed.

From d9783ca52b10307ded7ea1fa608a9878ca404e0b Mon Sep 17 00:00:00 2001
From: faizanoor3001
Date: Tue, 4 Mar 2025 23:16:33 +0200
Subject: [PATCH 4/5] Added new lines between steps for better readability

---
 InternalDocs/interpreter.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/InternalDocs/interpreter.md b/InternalDocs/interpreter.md
index c84e9a92bdc9c7..bc1a679bc6a68f 100644
--- a/InternalDocs/interpreter.md
+++ b/InternalDocs/interpreter.md
@@ -558,11 +558,15 @@ at the very least, a 16-bit counter is needed.
 
 3. Write the specializing function itself in [Python/specialize.c ](../Python/specialize.c).
    Refer to any other function in that file for the format.
+
 4. Remember to update operation stats by calling `add_stat_dict` in
    [Python/specialize.c ](../Python/specialize.c).
+
 5. Add the cache layout in [Lib/opcode.py](../Lib/opcode.py) so that Python's
    dis module will know how to represent it properly.
+
 6. Bump magic number in [Include/core/pycore_magic_number.h](../Include/internal/pycore_magic_number.h).
+
 7. Run ``make regen-all`` on `*nix` or `build.bat --regen` on Windows.
 
 

From 852cbaf9d98def8f8cb8041a5d10fa32410e16d4 Mon Sep 17 00:00:00 2001
From: faizanoor3001
Date: Tue, 4 Mar 2025 23:40:42 +0200
Subject: [PATCH 5/5] Updated the text for the steps 3,4,5 as per comment

---
 InternalDocs/interpreter.md | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/InternalDocs/interpreter.md b/InternalDocs/interpreter.md
index bc1a679bc6a68f..0e43fc1634cdcd 100644
--- a/InternalDocs/interpreter.md
+++ b/InternalDocs/interpreter.md
@@ -556,14 +556,13 @@ at the very least, a 16-bit counter is needed.
    } _PyContainsOpCache;
    ```
 
-3. Write the specializing function itself in [Python/specialize.c ](../Python/specialize.c).
-   Refer to any other function in that file for the format.
+3. Write the specializing function itself (`_Py_Specialize_ContainsOp`) in [Python/specialize.c ](../Python/specialize.c).
+Refer to other functions in that file for the pattern.
 
-4. Remember to update operation stats by calling `add_stat_dict` in
-   [Python/specialize.c ](../Python/specialize.c).
+4. Add a call to `add_stat_dict` in `_Py_GetSpecializationStats` which is in [Python/specialize.c ](../Python/specialize.c).
 
 5. Add the cache layout in [Lib/opcode.py](../Lib/opcode.py) so that Python's
-   dis module will know how to represent it properly.
+   `dis` module will know how to represent it properly.
 
 6. Bump magic number in [Include/core/pycore_magic_number.h](../Include/internal/pycore_magic_number.h).
 