Hardcode common Op parametrizations to allow numba caching #1341
This is perhaps the ugliest PR of my life. Hardcoding common parametrizations of string-generated Ops helps a tiny bit with numba caching.
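A rough sketch of why the hardcoding helps (illustrative only, not code from this PR): numba's on-disk cache needs a function it can trace back to a real source file, which string-generated implementations don't have, while a hardcoded module-level function for a common parametrization does.

```python
import numba


def make_add_from_string():
    # The string-generated pattern: source built at runtime and exec'd has
    # no real file on disk, so numba's cache locator can't attach an
    # on-disk cache to it. Passing cache=True here only emits a
    # NumbaWarning, and compilation is redone in every interpreter session.
    src = "def add(x, y):\n    return x + y"
    namespace = {}
    exec(src, namespace)
    return numba.njit(namespace["add"])


# The hardcoded alternative: a plain module-level function for a common
# parametrization, which numba can cache across interpreter runs.
@numba.njit(cache=True)
def add_hardcoded(x, y):
    return x + y
```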
After caching, I see compile times of the logp-dlogp function of the nutpie README example going down. This includes PyMC-PyTensor compile time. For reference, FAST_RUN takes 1.8s on a first run and 1.5s afterwards, and JAX takes 2.0s then 1.5s.
Test snippet
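The collapsed test snippet isn't reproduced here; the following is a rough stand-in (the toy model, and the use of `compile_fn`/`initial_point` with `mode="NUMBA"`, are my assumptions, not the PR's exact code):

```python
# Rough stand-in for the timing test (not the exact snippet from this PR):
# a toy model in place of the nutpie README example. Run the script in a
# fresh interpreter twice; the second run should compile faster once
# numba's on-disk cache is populated.
import time

import pymc as pm

with pm.Model() as model:
    x = pm.Normal("x", shape=100)
    pm.Normal("y", mu=x.sum(), sigma=1.0, observed=2.5)

point = model.initial_point()

start = time.perf_counter()
fn = model.compile_fn([model.logp(), model.dlogp()], mode="NUMBA")
fn(point)  # the first call is what triggers the numba compilation itself
print(f"logp-dlogp compile time: {time.perf_counter() - start:.2f}s")
```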
A more aggressive approach may come out of #1326, which would render this code duplication unnecessary, but that's still too green to see the light of day (and may prove completely impractical).
Also, the Elemwise overload seems to always trigger some cache writing when the interpreter is launched again, even if `store_core_outputs` and the `core_op` can be cached on subsequent runs. I don't know what's going on with that; perhaps @aseyboldt has an idea. Relevant snippet:
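The snippet itself isn't included above; as a hedged illustration of the kind of check involved (the toy Elemwise graph and the cache-directory probing are my assumptions, not the PR's code):

```python
# Illustrative stand-in, NOT the snippet referenced above: compile a small
# Elemwise graph with the numba backend and check whether numba writes new
# cache files. Running this in two fresh interpreter sessions and still
# seeing writes on the second run would match the behaviour described.
# NUMBA_CACHE_DIR is a real numba setting; everything else is a toy.
import os
import pathlib

import numpy as np
import pytensor
import pytensor.tensor as pt

cache_dir = pathlib.Path(os.environ.get("NUMBA_CACHE_DIR", "."))


def cache_snapshot():
    # numba stores cached machine code in .nbc files (plus .nbi indexes)
    return {p: p.stat().st_mtime for p in cache_dir.rglob("*.nbc")}


before = cache_snapshot()

x = pt.vector("x")
fn = pytensor.function([x], pt.exp(x) + 1, mode="NUMBA")
fn(np.zeros(3))  # first call triggers numba compilation / cache lookup

after = cache_snapshot()
written = [p for p in after if after[p] != before.get(p)]
print("cache files written this session:", written)
```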