ENH: Caching of inputs #338
Comments
I like the idea! What are possible interfaces? Extending
IIUC, the above is mostly about hashing files, not hashing Python objects that are inputs to a task function?
I had two things in mind.
Then, hashing can be turned on if necessary. Paths are also currently not hashed; instead, we use the last modified date.
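To make the distinction concrete, here is a minimal stdlib-only sketch (helper names are my own, not pytask API) showing why modified-date tracking and content hashing can disagree: "touching" a file bumps its mtime and would trigger a rerun under date-based tracking, while its content hash is unchanged.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """Content hash of a file (hypothetical helper, not pytask API)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "data.txt"
    p.write_text("unchanged content")

    h1, m1 = file_hash(p), p.stat().st_mtime_ns
    # "Touch" the file: bump the modification time without changing content.
    os.utime(p, ns=(m1 + 1_000_000_000, m1 + 1_000_000_000))
    h2, m2 = file_hash(p), p.stat().st_mtime_ns

    assert h1 == h2  # content hash: no rerun needed
    assert m1 != m2  # last-modified date: would trigger a rerun
```

The trade-off is the usual one: hashing always reads the whole file, while the modified date is a cheap stat call that can report false positives.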
Sounds good, but it might be hard to distinguish between
I was genuinely surprised that the above did not work -- for me as a user it seems the same whether the body of a function changes or some input does. Is there any reason to allow task functions to have arguments beyond
Maybe deprecate strings as file paths, or add some logic to differentiate them. The second option can indeed be ugly.
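The "logic to differentiate" could be as simple as a type check; a hypothetical sketch (not pytask API) where only `pathlib.Path` objects count as file dependencies and everything else is a hashable Python input. Plain strings are the ambiguous case the comment refers to, which is why deprecating them as file paths would make the rule unambiguous.

```python
from pathlib import Path


def classify_input(value):
    """Hypothetical helper: decide how a task argument is tracked.

    Only pathlib.Path objects are treated as file dependencies; any
    other object (including a plain string) is a Python input to hash.
    """
    return "file" if isinstance(value, Path) else "python-object"
```

For example, `classify_input(Path("data.csv"))` would mark a file dependency, while `classify_input("data.csv")` would be hashed like any other Python value.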
Not all input is tracked. Only
Everything should be a dependency except for the products. Thus, we could remove
Wild thought: Can't we borrow from the dags logic and just use reserved keywords for task functions instead of the
Sure, I understand that now; it just wasn't intuitive to me 😇
A new feature dropped in FastAPI that uses
The feature will be available in v0.4. It is documented here: https://pytask-dev.readthedocs.io/en/latest/how_to_guides/hashing_inputs_of_tasks.html.
Is your feature request related to a problem?

Yes and no; at least pytask and my expectations were not aligned when I ran the code snippet below. In particular, I expected the task to be re-run whenever the result from `load_model_dict()` changes, which is specified in a central module.

Describe the solution you'd like

I would love to see the possibility to hash Python inputs similarly to what is being done for file contents. Usually, they will be much smaller, and it allows for more granularity (e.g., above I could of course specify the central `config.py` as a dependency, but that would mean doing so everywhere, and whenever it changes, the entire pipeline would be re-run. Splitting its contents across many files would also be possible, but ugly).

I would actually like the default of `hash_python_inputs`, or whatever the decorator might be called, to be true, in the spirit of "correctness trumps performance" (typically, these objects will be small relative to files).

API breaking implications

If the default is set differently, as suggested before, behavior in some cases might change. Otherwise it is just an addition.
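The requested behavior can be sketched in plain Python; a minimal stdlib-only sketch where the decorator name `hash_python_inputs` is taken from this issue but everything else (pickle-based hashing, the in-memory result store) is my own assumption, not how pytask implements the feature. The idea: the task is only re-executed when the hash of its Python inputs changes.

```python
import hashlib
import pickle


def stable_hash(obj) -> str:
    """Hash an arbitrary picklable object.

    NOTE: pickle output is not guaranteed stable across Python versions
    or interpreter runs; a real implementation would need a
    deterministic serializer.
    """
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()


_results: dict = {}  # stand-in for a persistent cache


def hash_python_inputs(func):
    """Hypothetical decorator (name from this issue): rerun the task
    only when the hash of its Python inputs changes."""

    def wrapper(*args, **kwargs):
        key = (func.__name__, stable_hash((args, kwargs)))
        if key not in _results:
            _results[key] = func(*args, **kwargs)
        return _results[key]

    return wrapper


calls = {"n": 0}  # count actual executions to show the caching


@hash_python_inputs
def task_fit_model(model_dict):
    calls["n"] += 1
    return sorted(model_dict)


task_fit_model({"a": 1, "b": 2})
task_fit_model({"a": 1, "b": 2})  # same inputs: not rerun
task_fit_model({"a": 1, "b": 3})  # changed input: rerun
```

After these three calls, the task body has executed only twice: the second call with identical inputs is served from the cache, while the changed dictionary triggers a rerun, which matches the expectation described above for `load_model_dict()`.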