Spell and grammar checking for some tutorials. #279

Merged
merged 2 commits into from
Jun 3, 2022
26 changes: 13 additions & 13 deletions README.md
@@ -21,7 +21,7 @@ ______________________________________________________________________

<!-- Keep in sync with docs/source/index.md -->

pytask is a workflow management system which facilitates reproducible data analyses. Its
pytask is a workflow management system that facilitates reproducible data analyses. Its
features include:

- **Automatic discovery of tasks.**
@@ -73,7 +73,7 @@ template or start from

# Usage

A task is a function which is detected if the module and the function name are prefixed
A task is a function that is detected if the module and the function name are prefixed
with `task_`. Here is an example.

```python
@@ -89,20 +89,20 @@ def task_hello_earth(produces):

Here are some details:

- Dependencies and products of a task are tracked via markers. For dependencies use
`@pytask.mark.depends_on` and for products use `@pytask.mark.produces`. Use strings
and `pathlib.Path` to specify the location.
- Use `produces` (and `depends_on`) as function arguments to access the paths of the
dependencies and products inside the function. All values are converted to
`pathlib.Path`'s. Here, `produces` holds the path to `"hello_earth.txt"`.
- Dependencies and products of a task are tracked via markers. Use
`@pytask.mark.depends_on` for dependencies and `@pytask.mark.produces` for products.
Values are strings or `pathlib.Path` and point to files on the disk.
- Use `produces` (and `depends_on`) as function arguments to access the paths inside the
function. pytask converts all paths to `pathlib.Path`'s. Here, `produces` holds the
path to `"hello_earth.txt"`.
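
The naming rule from the usage section (modules and functions prefixed with `task_`) can be illustrated with a short helper. `collect_task_functions` is hypothetical and not part of pytask; it skips everything else the real collection does (finding files, reading markers) and only shows which names qualify.

```python
import inspect
import types


def collect_task_functions(module: types.ModuleType) -> dict:
    """Return the functions of ``module`` that follow the ``task_`` naming rule.

    Hypothetical helper for illustration only; it is not pytask's collector.
    """
    # The module itself must be prefixed, e.g. ``task_hello.py``.
    module_name = module.__name__.rsplit(".", 1)[-1]
    if not module_name.startswith("task_"):
        return {}
    # Keep only functions whose names are prefixed with ``task_``.
    return {
        name: obj
        for name, obj in vars(module).items()
        if name.startswith("task_") and inspect.isfunction(obj)
    }


# Build a throwaway module in memory instead of reading a file from disk.
example = types.ModuleType("task_hello")
exec("def task_hello_earth():\n    pass\n\ndef helper():\n    pass\n", example.__dict__)
print(sorted(collect_task_functions(example)))  # ['task_hello_earth']
```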

To execute the task, enter `pytask` on the command line.

![image](https://github.com/pytask-dev/pytask/raw/main/docs/source/_static/images/readme.svg)

# Documentation

The documentation can be found under <https://pytask-dev.readthedocs.io/en/stable> with
You can find the documentation at <https://pytask-dev.readthedocs.io/en/stable> with
[tutorials](https://pytask-dev.readthedocs.io/en/stable/tutorials/index.html) and guides
for
[best practices](https://pytask-dev.readthedocs.io/en/stable/how_to_guides/index.html).
@@ -121,12 +121,12 @@ pytask is distributed under the terms of the [MIT license](LICENSE).
The license also includes a copyright and permission notice from
[pytest](https://github.com/pytest-dev/pytest) since some modules, classes, and
functions are copied from pytest. Not to mention how pytest has inspired the development
of pytask in general. Without the amazing work of
of pytask in general. Without the excellent work of
[Holger Krekel](https://github.com/hpk42) and pytest's many contributors, this project
would not have been possible. Thank you!

pytask ows its beautiful appearance on the command line to
[rich](https://github.com/Textualize/rich) written by
pytask owes its beautiful appearance on the command line to
[rich](https://github.com/Textualize/rich), written by
[Will McGugan](https://github.com/willmcgugan).

Repeating tasks in loops is inspired by [ward](https://github.com/darrenburns/ward)
@@ -135,7 +135,7 @@ written by [Darren Burns](https://github.com/darrenburns).
# Citation

If you rely on pytask to manage your research project, please cite it with the following
key to help others to discover the tool.
key to help others discover the tool.

```bibtex
@Unpublished{Raabe2020,
4 changes: 4 additions & 0 deletions docs/source/explanations/comparison_to_other_tools.md
@@ -122,3 +122,7 @@ Cons
General

- A general task runner with tasks defined in YAML files.

## [zenml](https://github.com/zenml-io/zenml)

## [flyte](https://github.com/flyteorg/flyte)
10 changes: 5 additions & 5 deletions docs/source/tutorials/capturing_output.md
@@ -1,7 +1,7 @@
# Capturing output

What is capturing? Some of your tasks may use {func}`print` statements, have progress
bars, require user input or the libraries you are using show information during
bars, require user input, or the libraries you are using show information during
execution.

Since the output would pollute the terminal and the information shown by pytask, it
@@ -13,15 +13,15 @@ the error.

## Default stdout/stderr/stdin capturing behavior

During task execution any output sent to `stdout` and `stderr` is captured. If a task
fails its captured output will usually be shown along with the failure traceback.
Any output sent to `stdout` and `stderr` is captured during task execution. pytask
displays it together with the traceback, but only if the task fails.
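
A minimal sketch of the "show output only on failure" behavior, using the standard library's `contextlib.redirect_stdout` and `contextlib.redirect_stderr`. The helper `run_with_captured_output` and the two example tasks are hypothetical, and this approach only sees writes that go through `sys.stdout` and `sys.stderr`, unlike capturing at the file-descriptor level.

```python
import contextlib
import io


def run_with_captured_output(func):
    """Run ``func`` with stdout/stderr captured (hypothetical helper).

    Returns ``(exception, stdout, stderr)``; on success the captured text is dropped.
    """
    out, err = io.StringIO(), io.StringIO()
    try:
        with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
            func()
    except Exception as exc:
        # Failure: hand back the exception together with what the task printed.
        return exc, out.getvalue(), err.getvalue()
    # Success: the captured output is simply discarded.
    return None, "", ""


def task_succeeds():
    print("progress: 100%")


def task_fails():
    print("about to fail")
    raise ValueError("boom")


exc, out, err = run_with_captured_output(task_fails)
print(type(exc).__name__, repr(out))  # ValueError 'about to fail\n'
```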

In addition, `stdin` is set to a "null" object which will fail on attempts to read from
it because it is rarely desired to wait for interactive input when running automated
tasks.

By default capturing is done by intercepting writes to low level file descriptors. This
allows to capture output from simple {func}`print` statements as well as output from a
By default, capturing is done by intercepting writes to low-level file descriptors. This
allows capturing output from simple {func}`print` statements as well as output from a
subprocess started by a task.
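
To illustrate why intercepting file descriptors also catches subprocess output, here is a rough sketch, assuming a POSIX system, that temporarily points file descriptor 1 at a temporary file with `os.dup2`. The helper `capture_fd1` is hypothetical and much simpler than a real implementation: it handles only stdout and decodes the result as UTF-8. Output from `print()` lands in the file only when `sys.stdout` writes to file descriptor 1, which is the usual case in a terminal.

```python
import os
import subprocess
import sys
import tempfile


def capture_fd1(func):
    """Run ``func`` and return everything written to file descriptor 1."""
    sys.stdout.flush()  # do not capture output printed before the call
    saved_fd = os.dup(1)  # keep a handle on the real stdout
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), 1)  # fd 1 now points at the temporary file
        try:
            func()
        finally:
            sys.stdout.flush()  # push buffered print() output through fd 1
            os.dup2(saved_fd, 1)  # restore the real stdout
            os.close(saved_fd)
        tmp.seek(0)
        return tmp.read().decode()


def example_task():
    print("from print()")
    # The child process inherits fd 1, so its output is captured as well.
    subprocess.run([sys.executable, "-c", "print('from a subprocess')"], check=True)


if __name__ == "__main__":
    print(repr(capture_fd1(example_task)))
```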

## Setting capturing methods or disabling capturing
19 changes: 9 additions & 10 deletions docs/source/tutorials/cleaning_projects.md
@@ -2,12 +2,12 @@

Projects usually become cluttered with obsolete files after some time.

To clean the project from files which are not recognized by pytask and type
To clean the project, type `pytask clean`.

```{image} /_static/images/clean-dry-run.svg
```

pytask performs a dry-run by default and shows all the files which can be removed.
pytask performs a dry-run by default and lists all removable files.

If you want to remove the files, use {option}`pytask clean --mode` with one of the
following modes.
@@ -16,23 +16,22 @@ following modes.
- `interactive` allows you to decide for every file whether to keep it or not.

If you want to delete complete folders instead of single files, use
{option}`pytask clean --directories`. If all content in a directory can be removed, only
the directory is shown.
{option}`pytask clean --directories`.

```{image} /_static/images/clean-dry-run-directories.svg
```

## Excluding files

Files which are under version control with git are excluded from the cleaning process.
pytask excludes files that are under version control with git.

If other files or directories should be excluded as well, you can use the
{option}`pytask clean --exclude` option or the `exclude` key in the configuration file.
Use the {option}`pytask clean --exclude` option or the `exclude` key in the
configuration file to exclude files and directories.

The value can be a Unix filename pattern which is documented in {mod}`fnmatch` and
supports the wildcard character `*` for any characters and other symbols.
Values can be Unix filename patterns that support, for example, the wildcard character
`*` for any sequence of characters. See {mod}`fnmatch` for the documentation.
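
The following sketch shows how patterns from Python's `fnmatch` module behave when matched against a path. The helper `is_excluded` is hypothetical. Testing each pattern against the full path and against every path component is an assumption made here so that a bare folder name also excludes the files inside it; it may not match pytask's exact rule.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Iterable


def is_excluded(path: Path, patterns: Iterable[str]) -> bool:
    """Return True if ``path`` matches any exclusion pattern (hypothetical helper)."""
    # Check the whole path plus each component, e.g. a folder name like "bld".
    candidates = [path.as_posix(), *path.parts]
    return any(fnmatch(candidate, pattern) for candidate in candidates for pattern in patterns)


print(is_excluded(Path("obsolete_folder/old.csv"), ["obsolete_folder"]))  # True
```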

Here is an example where the `obsolete_folder` is excluded from the cleaning process.
Here is an example of excluding a folder.

```console
$ pytask clean --exclude obsolete_folder
12 changes: 6 additions & 6 deletions docs/source/tutorials/collecting_tasks.md
@@ -1,9 +1,9 @@
# Collecting tasks

If you want to inspect your project and see a summary of all the tasks in the projects,
you can use the `pytask collect` command.
If you want to inspect your project and see a summary of all the tasks, you can use the
`pytask collect` command.

For example, let us take the following task
Let us take the following task.

```python
# Content of task_module.py
@@ -22,13 +22,13 @@ Now, running `pytask collect` will produce the following output.
```{image} /_static/images/collect.svg
```

If you want to have more information regarding dependencies and products of the task,
append the {option}`pytask collect --nodes` flag.
If you want to have more information regarding the dependencies and products of the
task, append the {option}`pytask collect --nodes` flag.

```{image} /_static/images/collect-nodes.svg
```

To restrict the set of tasks you are looking at, use markers, expression and ignore
To restrict the set of tasks you are looking at, use markers, expressions and ignore
patterns as usual.

## Further reading
14 changes: 7 additions & 7 deletions docs/source/tutorials/configuration.md
@@ -3,8 +3,8 @@
pytask can be configured via the command-line interface or permanently with a
`pyproject.toml` file.

The file also indicates the root of your project where pytask stores information on
whether tasks need to be executed or not in a `.pytask.sqlite3` database.
The file also indicates the root of your project where pytask stores information in a
`.pytask.sqlite3` database.

:::{important}
`pytask.ini`, `tox.ini`, and `setup.cfg` will be deprecated as configuration files for
@@ -15,8 +15,8 @@ your configuration in the `toml` format to facilitate the transition.

## The configuration file

You only need to add the header to the configuration file if you want to indicate the
root of your project.
You only need to add the header to the configuration file to indicate the root of your
project.

```toml
[tool.pytask.ini_options]
@@ -47,9 +47,9 @@ The second option is to let pytask try to find the configuration itself.
1. Find the common base directory of all paths passed to pytask (defaults to the current
working directory).
2. Starting from this directory, look at all parent directories, and return the file if
it is found.
3. If a directory contains a `.git` directory/file, a `.hg` directory or a valid
configuration file with the right section stop searching.
it exists.
3. If a directory contains a `.git` directory/file, a `.hg` directory, or a valid
configuration file with the right section, stop searching.
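
The three search steps can be sketched with `pathlib`. `find_project_root` is a hypothetical, simplified helper: it assumes the configuration lives in `pyproject.toml`, detects the `[tool.pytask.ini_options]` section with a plain substring test instead of parsing TOML, and treats the directory where a `.git` or `.hg` marker stops the search as the root. The last two choices are assumptions, not documented pytask behavior.

```python
import os
from pathlib import Path


def find_project_root(paths=None):
    """Return ``(root, config_path)`` following the three steps above.

    Hypothetical, simplified helper; ``config_path`` is None if the search stops
    at a version-control marker before a configuration file is found.
    """
    resolved = [Path(p).resolve() for p in (paths or [Path.cwd()])]
    # Step 1: common base directory of all paths (a file counts as its folder).
    base = Path(os.path.commonpath(resolved))
    if base.is_file():
        base = base.parent
    # Step 2: look at the base directory and then at all of its parents.
    for directory in [base, *base.parents]:
        config = directory / "pyproject.toml"
        # A substring check stands in for real TOML parsing in this sketch.
        if config.is_file() and "[tool.pytask.ini_options]" in config.read_text():
            return directory, config
        # Step 3: a .git directory/file or a .hg directory also ends the search.
        if (directory / ".git").exists() or (directory / ".hg").is_dir():
            return directory, None
    # Nothing found: fall back to the common base directory (an assumption).
    return base, None
```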

## The options

4 changes: 2 additions & 2 deletions docs/source/tutorials/debugging.md
@@ -12,8 +12,8 @@ find out the cause of the exception.
```{image} /_static/images/pdb.svg
```

A following tutorial explains {doc}`how to select a subset of tasks <selecting_tasks>`.
Combine it with the {option}`pytask build --pdb` flag to debug specific tasks.
One tutorial explains {doc}`how to select a subset of tasks <selecting_tasks>`. Combine
it with the {option}`pytask build --pdb` flag to debug specific tasks.

## Tracing

43 changes: 19 additions & 24 deletions docs/source/tutorials/defining_dependencies_products.md
@@ -1,10 +1,10 @@
# Defining dependencies and products

To ensure pytask executes all tasks in a correct order, define which dependencies are
To ensure pytask executes all tasks in the correct order, define which dependencies are
required and which products are produced by a task.

:::{important}
If you do not specify dependencies and products as explained below, pytask will not able
If you do not specify dependencies and products as explained below, pytask will not be able
to build a graph, a {term}`DAG`, and will not be able to execute all tasks in the
project correctly!
:::
@@ -27,13 +27,12 @@ Optionally, you can use `produces` as an argument of the task function and get a
the same path inside the task function.

:::{tip}
If you do not know about {mod}`pathlib` check out [^id3] and [^id4]. The module is very
useful to handle paths conveniently and across platforms.
If you do not know about {mod}`pathlib`, check out [^id3] and [^id4]. The module is
very useful for handling paths conveniently and across platforms.
:::

## Dependencies

Most tasks have dependencies. Similar to products, you can use the
Most tasks have dependencies. Like products, you can use the
{func}`@pytask.mark.depends_on <pytask.mark.depends_on>` marker to attach a
dependency to a task.

@@ -45,7 +44,7 @@ def task_plot_data(depends_on, produces):
...
```

Use `depends_on` as a function argument to work with the path of the dependency and, for
Use `depends_on` as a function argument to work with the dependency path and, for
example, load the data.

## Conversion
@@ -63,14 +62,13 @@ def task_create_random_data(produces):
```

If you use `depends_on` or `produces` as arguments for the task function, you will have
access to the paths of the targets as {class}`pathlib.Path` even if strings were used
before.
access to the paths of the targets as {class}`pathlib.Path`.
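
A small helper shows what such a conversion could look like for strings, lists, tuples, and dictionaries. `convert_to_paths` is hypothetical and not a pytask function, and resolving relative paths against the directory of the task module is an assumption of this sketch.

```python
from pathlib import Path


def convert_to_paths(value, task_dir: Path):
    """Recursively turn strings into absolute ``pathlib.Path`` objects.

    Hypothetical helper: relative paths are assumed to be relative to
    ``task_dir``; containers keep their structure.
    """
    if isinstance(value, dict):
        return {key: convert_to_paths(val, task_dir) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(convert_to_paths(val, task_dir) for val in value)
    path = Path(value)
    return path if path.is_absolute() else task_dir / path
```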

## Multiple dependencies and products

Most tasks have multiple dependencies or products. The easiest way to attach multiple
dependencies or products to a task is to pass a {class}`dict` (highly recommended),
{class}`list` or another iterator to the marker containing the paths.
The easiest way to attach multiple dependencies or products to a task is to pass a
{class}`dict` (highly recommended), {class}`list` or another iterator to the marker
containing the paths.

To assign labels to dependencies or products, pass a dictionary. For example,

@@ -108,19 +106,16 @@ keys are the positions in the list.
{0: BLD / "data_0.pkl", 1: BLD / "data_1.pkl"}
```

Why does pytask recommend dictionaries and even converts lists, tuples or other
Why does pytask recommend dictionaries and convert lists, tuples, or other
iterators to dictionaries? First, dictionaries with positions as keys behave very
similar to lists.
similarly to lists.

Secondly, dictionaries use keys instead of positions which is more verbose and
descriptive and does not assume a fixed ordering. Both attributes are especially
desirable in complex projects.
Secondly, dictionaries use keys instead of positions, which is more verbose and
descriptive and does not assume a fixed ordering. Both attributes are especially
desirable in complex projects.
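
The list-to-dictionary conversion amounts to enumerating the items. `to_dict` is a hypothetical helper, not pytask's code; leaving a single path untouched instead of wrapping it is an assumption of this sketch.

```python
import os


def to_dict(value):
    """Convert a list, tuple, or other iterable into a dict keyed by position."""
    if isinstance(value, dict):
        return value  # already labeled, keep as-is
    if isinstance(value, (str, bytes, os.PathLike)):
        return value  # a single path is not a container
    return dict(enumerate(value))


products = to_dict(["data_0.pkl", "data_1.pkl"])
print(products)  # {0: 'data_0.pkl', 1: 'data_1.pkl'}
```

Because the keys are positions, `products[1]` reads exactly like indexing the original list.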

## Multiple decorators

You can also attach multiple decorators to a function which will be merged into a single
dictionary. This might help you to group certain dependencies and apply them to multiple
tasks.
pytask merges multiple decorators of one kind into a single dictionary. This might help
you to group dependencies and apply them to multiple tasks.

```python
common_dependencies = pytask.mark.depends_on(
@@ -143,12 +138,12 @@ Inside the task, `depends_on` will be

## Nested dependencies and products

Dependencies and products are allowed to be nested containers consisting of tuples,
lists, and dictionaries. It beneficial if you want more structure and nesting.
Dependencies and products can be nested containers consisting of tuples, lists, and
dictionaries. It is beneficial if you want more structure and nesting.

Here is an example with a task which fits some model on data. It depends on a module
containing the code for the model which is not actively used, but ensures that the task
is rerun when the model is changed. And, it depends on data.
Here is an example with a task that fits some model on data. It depends on a module
containing the code for the model, which is not actively used but ensures that the task
is rerun when the model is changed. And it depends on data.

```python
@pytask.mark.depends_on(
12 changes: 6 additions & 6 deletions docs/source/tutorials/invoking_pytask.md
@@ -1,6 +1,6 @@
# Invoking pytask

pytask is a command line program which can be invoked with
You can invoke pytask on the command line with

```console
$ pytask
@@ -15,7 +15,7 @@ $ pytask -h | --help

## Commands

pytask has multiple commands which are listed in the main help page.
pytask has multiple commands that are listed on the main help page.

```{image} /_static/images/help_page.svg
```
@@ -38,7 +38,7 @@ $ pytask <command-name> --help
## The build command

Among its many options, the build command accepts paths as positional arguments. If no paths
are passed to the command line interface, pytask will look for the `paths` key in the
are passed via the command line interface, pytask will look for the `paths` key in the
configuration file. Finally, pytask will collect tasks from the current working
directory and subsequent folders.
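
The order in which the build command looks for paths can be summarized in a few lines. `resolve_build_paths` is a hypothetical helper; `config` is a plain dictionary standing in for the parsed `[tool.pytask.ini_options]` table, and accepting a single string for `paths` is an assumption.

```python
from pathlib import Path


def resolve_build_paths(cli_paths, config):
    """Pick the paths to collect tasks from (hypothetical helper)."""
    if cli_paths:  # 1. paths given on the command line win
        return [Path(p) for p in cli_paths]
    configured = config.get("paths")  # 2. fall back to the configuration file
    if configured:
        if isinstance(configured, str):
            configured = [configured]
        return [Path(p) for p in configured]
    return [Path.cwd()]  # 3. finally, the current working directory
```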

@@ -55,7 +55,7 @@ might run your tasks with missing or outdated dependencies.

## Options

Here are some useful options for the build command.
Here are some valuable options for the build command.

### Showing errors immediately

@@ -65,12 +65,12 @@ To show errors immediately when they occur, use
$ pytask --show-errors-immediately
```

It can be useful when you have a long-running workflow, but want feedback as soon as it
It can be helpful when you have a long-running workflow but want feedback as soon as it
is available.

### Stopping after the first (N) failures

To stop the build of the project after the first (N) failures use
To stop the build of the project after the first `n` failures, use

```console
$ pytask -x | --stop-after-first-failure # Stop after the first failure