README.md (+76 -76)
@@ -12,140 +12,140 @@ The workflow involves chaining together parameterized tasks which pass multiple
 `d6tflow` to the rescue! **With d6tflow you can easily chain together complex data flows and execute them. You can quickly load input and output data for each task.** It makes your workflow very clear and intuitive.

-#### When to use d6tflow?
-
-* Data engineering: when you prepare and analyze data with pandas or dask. That is you load, filter, transform, join data
-* Data science: when you analyze data with ANY ML library including sklearn, pytorch, keras. That is you perform EDA, feature engineering, model training and evaluation
-
 #### Read more at:
 [4 Reasons Why Your Machine Learning Code is Probably Bad](https://github.com/d6t/d6t-python/blob/master/blogs/reasons-why-bad-ml-code.rst)
 [How d6tflow is different from airflow/luigi](https://github.com/d6t/d6t-python/blob/master/blogs/datasci-dags-airflow-meetup.md)
+* Data science: you want to build better models faster. Your workflow is EDA, feature engineering, model training and evaluation. d6tflow works with ANY ML library including sklearn, pytorch, keras
+* Data engineering: you want to build robust data pipelines using a lightweight yet powerful library. Your workflow is load, filter, transform, join data in pandas, dask or pyspark.
+
 ## What can d6tflow do for you?

+* Data science
+    * Experiment management: easily manage workflows that compare different models to find the best one
+    * Scalable workflows: build an efficient data workflow that supports rapid prototyping and iterations
+    * Cache data: easily save/load intermediary calculations to reduce model training time
+    * Model deployment: d6tflow workflows are easier to deploy to production
 * Data engineering
     * Build a data workflow made up of tasks with dependencies and parameters
-    * Check task dependencies and their execution status
+    * Visualize task dependencies and their execution status
     * Execute tasks including dependencies
     * Intelligently continue workflows after failed tasks
     * Intelligently rerun workflow after changing parameters, code or data
-    * Intelligently manage parameters between dependencies
-    * Save task output to Parquet, CSV, JSON, pickle and in-memory
-    * Load task output to pandas dataframe and python objects
     * Quickly share and hand off output data to others
-* Data science
-    * Scalable workflows: build an efficient data workflow made up of tasks with dependencies and parameters
-    * Experiment tracking: compare model performance with different preprocessing and model selection options
-    * Model deployment: d6tflow workflows are easier to deploy to production

 ## Installation

-Install with `pip install d6tflow`. To update, run `pip install d6tflow -U --no-deps`.
+Install with `pip install d6tflow`. To update, run `pip install d6tflow -U`.

-You can also clone the repo and run `pip install .`
+If you are behind an enterprise firewall, you can also clone/download the repo and run `pip install .`

 **Python3 only** You might need to call `pip3 install d6tflow` if you have not set python 3 as default.

 To install latest DEV `pip install git+git://github.com/d6t/d6tflow.git` or upgrade `pip install git+git://github.com/d6t/d6tflow.git -U --no-deps`

-## Example 1: Introduction
+## Example: Model Comparison
+
+Below is an introductory example that gets training data, trains two models and compares their performance.

-This is a minimal example. Be sure to check out the ML workflow example below.
+**[See the full ML workflow example here](http://tiny.cc/d6tflow-start-example)**
Alternatively, chain together functions into a workflow and get the power of d6tflow with only a little change in code. **[Jupyter notebook example](https://github.com/d6t/d6tflow/blob/master/docs/example-functional.ipynb)**
* [Rapid Prototyping for Quantitative Investing with d6tflow](https://github.com/d6tdev/d6tflow-binder-interactive/blob/master/example-trading.ipynb)
+* Chain together functions into a workflow and get the power of d6tflow with only a little change in code. **[Jupyter notebook example](https://github.com/d6t/d6tflow/blob/master/docs/example-functional.ipynb)**

 ## Documentation

 Library usage and reference https://d6tflow.readthedocs.io
Transition to d6tflow from typical scripts [5 Step Guide to Scalable Deep Learning Pipelines with d6tflow](https://htmlpreview.github.io/?https://github.com/d6t/d6t-python/blob/master/blogs/blog-20190813-d6tflow-pytorch.html)

-
-## d6tpipe Integration
-
-To quickly share workflow outputs, we recommend you make use of [d6tpipe](https://github.com/d6t/d6tpipe). See [Sharing Workflows and Outputs](https://d6tflow.readthedocs.io/en/latest/collaborate.html).
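
The workflow features this README lists (tasks with dependencies and parameters, cached intermediate output, rerunning only what a parameter change affects, comparing two models) can be sketched in plain Python. This illustrates the idea only and is not d6tflow's API: `Task`, `requires`, `uses`, `execute`, the task classes and the toy "models" are all hypothetical names made up for this example, and the cache is a simple in-memory dict rather than files on disk.

```python
class Task:
    """Hypothetical base class for illustration; this is not d6tflow's API."""

    requires = ()  # upstream task classes, run before this task
    uses = ()      # names of the parameters this task reads directly

    def __init__(self, params, cache):
        self.params = params
        self.cache = cache  # dict shared by all tasks in the workflow

    @classmethod
    def param_names(cls):
        # A task's output depends on its own parameters and on upstream ones.
        names = set(cls.uses)
        for dep in cls.requires:
            names |= dep.param_names()
        return names

    def key(self):
        relevant = tuple((k, self.params[k]) for k in sorted(self.param_names()))
        return (type(self).__name__, relevant)

    def run(self, *inputs):
        raise NotImplementedError

    def execute(self, log):
        # Run dependencies first, then this task; skip anything already cached.
        if self.key() not in self.cache:
            inputs = [dep(self.params, self.cache).execute(log) for dep in self.requires]
            log.append(type(self).__name__)
            self.cache[self.key()] = self.run(*inputs)
        return self.cache[self.key()]


class GetData(Task):
    def run(self):
        return [(x, 2 * x + 1) for x in range(10)]  # toy (x, y) pairs


class Train(Task):
    requires = (GetData,)
    uses = ("model",)

    def run(self, data):
        if self.params["model"] == "mean":
            mean_y = sum(y for _, y in data) / len(data)
            return lambda x: mean_y      # always predicts the average y
        return lambda x: 2 * x + 1       # "linear": the exact rule behind the toy data


class Evaluate(Task):
    requires = (GetData, Train)

    def run(self, data, predict):
        # Mean squared error of the trained model on the toy data.
        return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


cache, log = {}, []
scores = {m: Evaluate({"model": m}, cache).execute(log) for m in ("mean", "linear")}
best = min(scores, key=scores.get)

runs_before = len(log)
Evaluate({"model": "mean"}, cache).execute(log)  # fully cached, nothing reruns
assert len(log) == runs_before
```

In this sketch `GetData` runs once even though two models are compared, because its cache key does not include the `model` parameter, while `Train` and `Evaluate` run once per model. A real workflow library persists outputs to disk and tracks code and data changes as well, which this toy version does not attempt.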