diff --git a/README.md b/README.md index 22263724..6fa511fd 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,7 @@ ______________________________________________________________________ -pytask is a workflow management system which facilitates reproducible data analyses. Its +pytask is a workflow management system that facilitates reproducible data analyses. Its features include: - **Automatic discovery of tasks.** @@ -73,7 +73,7 @@ template or start from # Usage -A task is a function which is detected if the module and the function name are prefixed +A task is a function that is detected if the module and the function name are prefixed with `task_`. Here is an example. ```python @@ -89,12 +89,12 @@ def task_hello_earth(produces): Here are some details: -- Dependencies and products of a task are tracked via markers. For dependencies use - `@pytask.mark.depends_on` and for products use `@pytask.mark.produces`. Use strings - and `pathlib.Path` to specify the location. -- Use `produces` (and `depends_on`) as function arguments to access the paths of the - dependencies and products inside the function. All values are converted to - `pathlib.Path`'s. Here, `produces` holds the path to `"hello_earth.txt"`. +- Dependencies and products of a task are tracked via markers. Use + `@pytask.mark.depends_on` for dependencies and `@pytask.mark.produces` for products. + Values are strings or `pathlib.Path` and point to files on the disk. +- Use `produces` (and `depends_on`) as function arguments to access the paths inside the + function. pytask converts all paths to `pathlib.Path`'s. Here, `produces` holds the + path to `"hello_earth.txt"`. 
To execute the task, enter `pytask` on the command-line @@ -102,7 +102,7 @@ To execute the task, enter `pytask` on the command-line # Documentation -The documentation can be found under with +You find the documentation with [tutorials](https://pytask-dev.readthedocs.io/en/stable/tutorials/index.html) and guides for [best practices](https://pytask-dev.readthedocs.io/en/stable/how_to_guides/index.html). @@ -121,12 +121,12 @@ pytask is distributed under the terms of the [MIT license](LICENSE). The license also includes a copyright and permission notice from [pytest](https://github.com/pytest-dev/pytest) since some modules, classes, and functions are copied from pytest. Not to mention how pytest has inspired the development -of pytask in general. Without the amazing work of +of pytask in general. Without the excellent work of [Holger Krekel](https://github.com/hpk42) and pytest's many contributors, this project would not have been possible. Thank you! -pytask ows its beautiful appearance on the command line to -[rich](https://github.com/Textualize/rich) written by +pytask owes its beautiful appearance on the command line to +[rich](https://github.com/Textualize/rich), written by [Will McGugan](https://github.com/willmcgugan). Repeating tasks in loops is inspired by [ward](https://github.com/darrenburns/ward) @@ -135,7 +135,7 @@ written by [Darren Burns](https://github.com/darrenburns). # Citation If you rely on pytask to manage your research project, please cite it with the following -key to help others to discover the tool. +key to helping others to discover the tool. ```bibtex @Unpublished{Raabe2020, diff --git a/docs/source/explanations/comparison_to_other_tools.md b/docs/source/explanations/comparison_to_other_tools.md index 92e947b6..c021f07c 100644 --- a/docs/source/explanations/comparison_to_other_tools.md +++ b/docs/source/explanations/comparison_to_other_tools.md @@ -122,3 +122,7 @@ Cons General - A general task-runner with task defined in yaml files. 
+ +## [zenml](https://github.com/zenml-io/zenml) + +## [flyte](https://github.com/flyteorg/flyte) diff --git a/docs/source/tutorials/capturing_output.md b/docs/source/tutorials/capturing_output.md index 1d79021b..09ad662e 100644 --- a/docs/source/tutorials/capturing_output.md +++ b/docs/source/tutorials/capturing_output.md @@ -1,7 +1,7 @@ # Capturing output What is capturing? Some of your tasks may use {func}`print` statements, have progress -bars, require user input or the libraries you are using show information during +bars, require user input, or the libraries you are using show information during execution. Since the output would pollute the terminal and the information shown by pytask, it @@ -13,15 +13,15 @@ the error. ## Default stdout/stderr/stdin capturing behavior -During task execution any output sent to `stdout` and `stderr` is captured. If a task -fails its captured output will usually be shown along with the failure traceback. +Any output sent to `stdout` and `stderr` is captured during task execution. pytask +displays it only if the task fails in addition to the traceback. In addition, `stdin` is set to a "null" object which will fail on attempts to read from it because it is rarely desired to wait for interactive input when running automated tasks. -By default capturing is done by intercepting writes to low level file descriptors. This -allows to capture output from simple {func}`print` statements as well as output from a +By default, capturing is done by intercepting writes to low-level file descriptors. This +allows capturing output from simple {func}`print` statements as well as output from a subprocess started by a task. 
## Setting capturing methods or disabling capturing diff --git a/docs/source/tutorials/cleaning_projects.md b/docs/source/tutorials/cleaning_projects.md index ea29fef3..e4142935 100644 --- a/docs/source/tutorials/cleaning_projects.md +++ b/docs/source/tutorials/cleaning_projects.md @@ -2,12 +2,12 @@ Projects usually become cluttered with obsolete files after some time. -To clean the project from files which are not recognized by pytask and type +To clean the project, type `pytask clean` ```{image} /_static/images/clean-dry-run.svg ``` -pytask performs a dry-run by default and shows all the files which can be removed. +pytask performs a dry-run by default and lists all removable files. If you want to remove the files, use {option}`pytask clean --mode` with one of the following modes. @@ -16,23 +16,22 @@ following modes. - `interactive` allows you to decide for every file whether to keep it or not. If you want to delete complete folders instead of single files, use -{option}`pytask clean --directories`. If all content in a directory can be removed, only -the directory is shown. +{option}`pytask clean --directories`. ```{image} /_static/images/clean-dry-run-directories.svg ``` ## Excluding files -Files which are under version control with git are excluded from the cleaning process. +pytask excludes files that are under version control with git. -If other files or directories should be excluded as well, you can use the -{option}`pytask clean --exclude` option or the `exclude` key in the configuration file. +Use the {option}`pytask clean --exclude` option or the `exclude` key in the +configuration file to exclude files and directories. -The value can be a Unix filename pattern which is documented in {mod}`fnmatch` and -supports the wildcard character `*` for any characters and other symbols. +Values can be Unix filename patterns that, for example, support the wildcard character +`*` for any characters. You find the documentation in {mod}`fnmatch`. 
-Here is an example where the `obsolete_folder` is excluded from the cleaning process. +Here is an example for excluding a folder. ```console $ pytask clean --exclude obsolete_folder diff --git a/docs/source/tutorials/collecting_tasks.md b/docs/source/tutorials/collecting_tasks.md index e60f38f3..48f479e5 100644 --- a/docs/source/tutorials/collecting_tasks.md +++ b/docs/source/tutorials/collecting_tasks.md @@ -1,9 +1,9 @@ # Collecting tasks -If you want to inspect your project and see a summary of all the tasks in the projects, -you can use the `pytask collect` command. +If you want to inspect your project and see a summary of all the tasks, you can use the +`pytask collect` command. -For example, let us take the following task +Let us take the following task. ```python # Content of task_module.py @@ -22,13 +22,13 @@ Now, running `pytask collect` will produce the following output. ```{image} /_static/images/collect.svg ``` -If you want to have more information regarding dependencies and products of the task, -append the {option}`pytask collect --nodes` flag. +If you want to have more information regarding the dependencies and products of the +task, append the {option}`pytask collect --nodes` flag. ```{image} /_static/images/collect-nodes.svg ``` -To restrict the set of tasks you are looking at, use markers, expression and ignore +To restrict the set of tasks you are looking at, use markers, expressions and ignore patterns as usual. ## Further reading diff --git a/docs/source/tutorials/configuration.md b/docs/source/tutorials/configuration.md index 13e6365c..bf20c8ec 100644 --- a/docs/source/tutorials/configuration.md +++ b/docs/source/tutorials/configuration.md @@ -3,8 +3,8 @@ pytask can be configured via the command-line interface or permanently with a `pyproject.toml` file. -The file also indicates the root of your project where pytask stores information on -whether tasks need to be executed or not in a `.pytask.sqlite3` database. 
+The file also indicates the root of your project where pytask stores information in a +`.pytask.sqlite3` database. :::{important} `pytask.ini`, `tox.ini`, and `setup.cfg` will be deprecated as configuration files for @@ -15,8 +15,8 @@ your configuration in the `toml` format to facilitate the transition. ## The configuration file -You only need to add the header to the configuration file if you want to indicate the -root of your project. +You only need to add the header to the configuration file to indicate the root of your +project. ```toml [tool.pytask.ini_options] @@ -47,9 +47,9 @@ The second option is to let pytask try to find the configuration itself. 1. Find the common base directory of all paths passed to pytask (default to the current working directory). 2. Starting from this directory, look at all parent directories, and return the file if - it is found. -3. If a directory contains a `.git` directory/file, a `.hg` directory or a valid - configuration file with the right section stop searching. + it exists. +3. If a directory contains a `.git` directory/file, a `.hg` directory, or a valid + configuration file with the right section, stop searching. ## The options diff --git a/docs/source/tutorials/debugging.md b/docs/source/tutorials/debugging.md index 65809c9d..cf500a2c 100644 --- a/docs/source/tutorials/debugging.md +++ b/docs/source/tutorials/debugging.md @@ -12,8 +12,8 @@ find out the cause of the exception. ```{image} /_static/images/pdb.svg ``` -A following tutorial explains {doc}`how to select a subset of tasks `. -Combine it with the {option}`pytask build --pdb` flag to debug specific tasks. +One tutorial explains {doc}`how to select a subset of tasks `. Combine +it with the {option}`pytask build --pdb` flag to debug specific tasks. 
## Tracing diff --git a/docs/source/tutorials/defining_dependencies_products.md b/docs/source/tutorials/defining_dependencies_products.md index d8776a83..517df398 100644 --- a/docs/source/tutorials/defining_dependencies_products.md +++ b/docs/source/tutorials/defining_dependencies_products.md @@ -1,10 +1,10 @@ # Defining dependencies and products -To ensure pytask executes all tasks in a correct order, define which dependencies are +To ensure pytask executes all tasks in correct order, define which dependencies are required and which products are produced by a task. :::{important} -If you do not specify dependencies and products as explained below, pytask will not able +If you do not specify dependencies and products as explained below, pytask will not be able to build a graph, a {term}`DAG`, and will not be able to execute all tasks in the project correctly! ::: @@ -27,13 +27,12 @@ Optionally, you can use `produces` as an argument of the task function and get a the same path inside the task function. :::{tip} -If you do not know about {mod}`pathlib` check out [^id3] and [^id4]. The module is very -useful to handle paths conveniently and across platforms. +If you do not know about {mod}`pathlib` check out [^id3] and [^id4]. The module is beneficial for handling paths conveniently and across platforms. ::: ## Dependencies -Most tasks have dependencies. Similar to products, you can use the +Most tasks have dependencies. Like products, you can use the {func}`@pytask.mark.depends_on ` marker to attach a dependency to a task. @@ -45,7 +44,7 @@ def task_plot_data(depends_on, produces): ... ``` -Use `depends_on` as a function argument to work with the path of the dependency and, for +Use `depends_on` as a function argument to work with the dependency path and, for example, load the data. 
## Conversion @@ -63,14 +62,13 @@ def task_create_random_data(produces): ``` If you use `depends_on` or `produces` as arguments for the task function, you will have -access to the paths of the targets as {class}`pathlib.Path` even if strings were used -before. +access to the paths of the targets as {class}`pathlib.Path`. ## Multiple dependencies and products -Most tasks have multiple dependencies or products. The easiest way to attach multiple -dependencies or products to a task is to pass a {class}`dict` (highly recommended), -{class}`list` or another iterator to the marker containing the paths. +The easiest way to attach multiple dependencies or products to a task is to pass a +{class}`dict` (highly recommended), {class}`list` or another iterator to the marker +containing the paths. To assign labels to dependencies or products, pass a dictionary. For example, @@ -108,19 +106,16 @@ keys are the positions in the list. {0: BLD / "data_0.pkl", 1: BLD / "data_1.pkl"} ``` -Why does pytask recommend dictionaries and even converts lists, tuples or other +Why does pytask recommend dictionaries and convert lists, tuples, or other iterators to dictionaries? First, dictionaries with positions as keys behave very -similar to lists. +similarly to lists. -Secondly, dictionaries use keys instead of positions which is more verbose and -descriptive and does not assume a fixed ordering. Both attributes are especially -desirable in complex projects. +Secondly, dictionaries use keys instead of positions that are more verbose and descriptive and do not assume a fixed ordering. Both attributes are especially desirable in complex projects. ## Multiple decorators -You can also attach multiple decorators to a function which will be merged into a single -dictionary. This might help you to group certain dependencies and apply them to multiple -tasks. +pytask merges multiple decorators of one kind into a single dictionary. 
This might help +you to group dependencies and apply them to multiple tasks. ```python common_dependencies = pytask.mark.depends_on( @@ -143,12 +138,12 @@ Inside the task, `depends_on` will be ## Nested dependencies and products -Dependencies and products are allowed to be nested containers consisting of tuples, -lists, and dictionaries. It beneficial if you want more structure and nesting. +Dependencies and products can be nested containers consisting of tuples, lists, and +dictionaries. It is beneficial if you want more structure and nesting. -Here is an example with a task which fits some model on data. It depends on a module -containing the code for the model which is not actively used, but ensures that the task -is rerun when the model is changed. And, it depends on data. +Here is an example with a task that fits some model on data. It depends on a module +containing the code for the model, which is not actively used but ensures that the task +is rerun when the model is changed. And it depends on data. ```python @pytask.mark.depends_on( diff --git a/docs/source/tutorials/invoking_pytask.md b/docs/source/tutorials/invoking_pytask.md index 15f4d5dd..864b0ec2 100644 --- a/docs/source/tutorials/invoking_pytask.md +++ b/docs/source/tutorials/invoking_pytask.md @@ -1,6 +1,6 @@ # Invoking pytask -pytask is a command line program which can be invoked with +You can invoke pytask on the command line with ```console $ pytask @@ -15,7 +15,7 @@ $ pytask -h | --help ## Commands -pytask has multiple commands which are listed in the main help page. +pytask has multiple commands that are listed in the main help page. ```{image} /_static/images/help_page.svg ``` @@ -38,7 +38,7 @@ $ pytask --help ## The build command The build command accepts among many options paths as positional arguments. 
If no paths -are passed to the command line interface, pytask will look for the `paths` key in the +are passed via the command line interface, pytask will look for the `paths` key in the configuration file. At last, pytask will collect tasks from the current working directory and subsequent folders. @@ -55,7 +55,7 @@ might run your tasks with missing or outdated dependencies. ## Options -Here are some useful options for the build command. +Here are some valuable options for the build command. ### Showing errors immediately @@ -65,12 +65,12 @@ To show errors immediately when they occur, use $ pytask --show-errors-immediately ``` -It can be useful when you have a long-running workflow, but want feedback as soon as it +It can be helpful when you have a long-running workflow but want feedback as soon as it is available. ### Stopping after the first (N) failures -To stop the build of the project after the first (N) failures use +To stop the build of the project after the first `n` failures use ```console $ pytask -x | --stop-after-first-failure # Stop after the first failure diff --git a/docs/source/tutorials/making_tasks_persist.md b/docs/source/tutorials/making_tasks_persist.md index d5092632..0ad3ae34 100644 --- a/docs/source/tutorials/making_tasks_persist.md +++ b/docs/source/tutorials/making_tasks_persist.md @@ -1,24 +1,22 @@ # Making tasks persist -Sometimes you want to skip the execution of a task and pretend like nothing has changed. +Sometimes you want to skip the execution of a task and pretend as nothing has changed. -A common scenario is that you have a long running task which will be executed again if -you would format the task's source file with [black](https://github.com/psf/black). +A typical scenario is that you formatted the task's source files with [black](https://github.com/psf/black) which would rerun the task. 
In this case, you can apply the {func}`@pytask.mark.persist ` -decorator to the task which will skip its execution as long as all products exist. +decorator to the task, which will skip its execution as long as all products exist. -Internally, the state of the dependencies, the source file and the products is updated -in the database such that the next execution will skip the task successfully. +Internally, the state of the dependencies, the source file, and the products are updated +in the database such that the subsequent execution will skip the task successfully. ## When is this useful? - You ran a formatter like Black on the files in your project and want to prevent the - longest running tasks from being rerun. -- You extend a repetition of a task function, but do not want to rerun all tasks. -- You want to integrate a task which you have already run elsewhere. Place the - dependencies and products and the task definition in the correct place and make the - task persist. + longest-running tasks from being rerun. +- You extend a repetition of a task function but do not want to rerun all tasks. +- You want to integrate a task that you have already run elsewhere. Copy over the + dependencies and products and the task definition and make the task persist. :::{caution} This feature can corrupt the integrity of your project. Document why you have applied @@ -27,9 +25,9 @@ the decorator out of consideration for yourself and other contributors. ## How to do it? -To create a persisting task, apply the correct decorator and, et voilà, it is done. +To create a persisting task, apply the correct decorator, and, et voilà, it is done. -To see the whole process, first, we create some task and its dependency. +First, to see the whole process, we create a task and its dependency. ```python # Content of task_module.py @@ -50,21 +48,20 @@ def task_make_input_bold(depends_on, produces): Here is the text. 
``` -If you execute the task with pytask, the task will be executed since the product is -missing. +Running pytask will execute the task since the product is missing. ```{image} /_static/images/persist-executed.svg ``` -After that, we change the source file of the task accidentally by formatting the file -with black. Without the {func}`@pytask.mark.persist ` decorator the +After that, we accidentally changed the task's source file by formatting the file +with Black. Without the {func}`@pytask.mark.persist ` decorator the task would run again since it has changed. With the decorator, the execution is skipped which is signaled by a green p. ```{image} /_static/images/persist-persisted.svg ``` -If we now run the task again, it is skipped because nothing has changed and not because +If we rerun the task, it is skipped because nothing has changed and not because it is marked with {func}`@pytask.mark.persist `. ```{image} /_static/images/persist-skipped.svg diff --git a/docs/source/tutorials/markers.md b/docs/source/tutorials/markers.md index 42c52aab..a0c498ae 100644 --- a/docs/source/tutorials/markers.md +++ b/docs/source/tutorials/markers.md @@ -6,10 +6,9 @@ available markers by using the `pytask markers` command. ```{image} /_static/images/markers.svg ``` -You can use your own markers to select tasks as explained in this -{ref}`tutorial `. +You can use your markers to select tasks as explained in this {ref}`tutorial `. -If you create your own marker, register it in the configuration file with its name and a +If you create your marker, register it in the configuration file with its name and a description. 
```toml diff --git a/docs/source/tutorials/plugins.md b/docs/source/tutorials/plugins.md index 1a421843..f2a8e4de 100644 --- a/docs/source/tutorials/plugins.md +++ b/docs/source/tutorials/plugins.md @@ -1,24 +1,24 @@ # Plugins -Since pytask is used in many different contexts, all possible applications are -unforeseeable and cannot be directly supported by pytask's developers. +Users employ pytask in many different contexts, making it impossible for pytask's +maintainers to support all possible use-cases. -Therefore, pytask is built upon [pluggy](https://github.com/pytest-dev/pluggy), a plugin -framework also used in pytest which allows other developers to extend pytask. +Therefore, pytask uses [pluggy](https://github.com/pytest-dev/pluggy), a plugin +framework, for allowing users to extend pytask. ## Where to find plugins -Plugins can be found in many places. +You can find plugins in many places. -- All plugins should appear in this {doc}`automatically updated list <../plugin_list>` +- All plugins should appear in this {doc}`automatically updated list <../plugin_list>`, which is created by scanning packages on PyPI. - Check out the repositories in the [pytask-dev](https://github.com/pytask-dev) Github organization for a collection of officially supported plugins. -- Check out the [pytask Github topic](https://github.com/topics/pytask) which shows an +- Check out the [pytask Github topic](https://github.com/topics/pytask), which shows an overview of repositories linked to pytask. - Search on [anaconda.org](https://anaconda.org/search?q=pytask) for related packages. -## How to implement your own plugin +## How to implement your plugin Follow the {doc}`guide on writing a plugin <../how_to_guides/how_to_write_a_plugin>` to write your own plugin. 
diff --git a/docs/source/tutorials/profiling_tasks.md b/docs/source/tutorials/profiling_tasks.md index ea473d56..4f5681ab 100644 --- a/docs/source/tutorials/profiling_tasks.md +++ b/docs/source/tutorials/profiling_tasks.md @@ -1,7 +1,6 @@ # Profiling tasks -pytask collects information on the runtime of tasks when they finished successfully. To -display the information, enter +pytask collects information on tasks when they succeed. To display the data, enter ```console $ pytask profile