Commit eebf78d

chore: Update docs for push apis and stream feature views. (#2846)
* Update docs Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
1 parent 58ea103 commit eebf78d

File tree

10 files changed: +44 −34 lines changed

README.md (+2 −2)
@@ -179,8 +179,8 @@
   * [ ] Batch transformation (In progress. See [RFC](https://docs.google.com/document/d/1964OkzuBljifDvkV-0fakp2uaijnVzdwWNGdz7Vz50A/edit))
 * **Streaming**
   * [x] [Custom streaming ingestion job support](https://docs.feast.dev/how-to-guides/creating-a-custom-provider)
-  * [x] [Push based streaming data ingestion to online store](https://docs.feast.dev/reference/data-sources/push)
-  * [ ] Push based streaming data ingestion to offline store (In Progress)
+  * [x] [Push based streaming data ingestion to online store (Alpha)](https://docs.feast.dev/reference/data-sources/push)
+  * [x] [Push based streaming data ingestion to offline store (Alpha)](https://docs.feast.dev/reference/data-sources/push)
 * **Deployments**
   * [x] AWS Lambda (Alpha release. See [RFC](https://docs.google.com/document/d/1eZWKWzfBif66LDN32IajpaG-j82LSHCCOzY6R7Ax7MI/edit))
   * [x] Kubernetes (See [guide](https://docs.feast.dev/how-to-guides/running-feast-in-production#4.3.-java-based-feature-server-deployed-on-kubernetes))

docs/getting-started/architecture-and-components/offline-store.md (+2 −0)
@@ -13,3 +13,5 @@
 
 Please see the [Offline Stores](../../reference/offline-stores/) reference for more details on configuring offline stores.
 
+Please see the [Push Source](../../reference/data-sources/push.md) reference for details on how to push features directly to the offline store in your feature store.
+

docs/how-to-guides/running-feast-in-production.md (+12 −12)
@@ -3,15 +3,15 @@
 ## Overview
 
 After learning about Feast concepts and playing with Feast locally, you're now ready to use Feast in production.
-This guide aims to help with the transition from a sandbox project to production-grade deployment in the cloud or on-premise.
+This guide aims to help with the transition from a sandbox project to production-grade deployment in the cloud or on-premise.
 
 Overview of typical production configuration is given below:
 
 ![Overview](production-simple.png)
 
 {% hint style="success" %}
-**Important note:** We're trying to keep Feast modular. With the exception of the core, most of the Feast blocks are loosely connected and can be used independently. Hence, you are free to build your own production configuration.
-For example, you might not have a stream source and, thus, no need to write features in real-time to an online store.
+**Important note:** We're trying to keep Feast modular. With the exception of the core, most of the Feast blocks are loosely connected and can be used independently. Hence, you are free to build your own production configuration.
+For example, you might not have a stream source and, thus, no need to write features in real-time to an online store.
 Or you might not need to retrieve online features.
 
 Furthermore, there's no single "true" approach. As you will see in this guide, Feast usually provides several options for each problem.
@@ -95,7 +95,7 @@
 
 To keep your online store up to date, you need to run a job that loads feature data from your feature view sources into your online store. In Feast, this loading operation is called materialization.
 
-### 2.1. Manual materializations
+### 2.1. Manual materializations
 The simplest way to schedule materialization is to run an **incremental** materialization using the Feast CLI:
 
 ```text
@@ -116,7 +116,7 @@
 
 The timestamps above should match the interval of data that has been computed by the data transformation system.
 
-### 2.2. Automate periodic materializations
+### 2.2. Automate periodic materializations
 
 It is up to you which orchestration/scheduler to use to periodically run `$ feast materialize`.
 Feast keeps the history of materialization in its registry so that the choice could be as simple as a [unix cron util](https://en.wikipedia.org/wiki/Cron).
@@ -160,7 +160,7 @@ feature_refs = [
 ]
 
 training_df = fs.get_historical_features(
-    entity_df=entity_df,
+    entity_df=entity_df,
     features=feature_refs,
 ).to_df()
 
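For context, the `entity_df` referenced in this hunk is the frame of entity keys and event timestamps that drives the point-in-time join. A minimal sketch, with illustrative entity and timestamp values:

```python
# Illustrative entity dataframe for get_historical_features: one row per
# (entity key, event timestamp) pair to join feature values onto, point in time.
import pandas as pd

entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [
            pd.Timestamp("2022-05-01 12:00:00", tz="UTC"),
            pd.Timestamp("2022-05-02 12:00:00", tz="UTC"),
        ],
    }
)
```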
@@ -214,7 +214,7 @@
 
 This approach is the most convenient to keep your infrastructure as minimalistic as possible and avoid deploying extra services.
 The Feast Python SDK will connect directly to the online store (Redis, Datastore, etc), pull the feature data, and run transformations locally (if required).
-The obvious drawback is that your service must be written in Python to use the Feast Python SDK.
+The obvious drawback is that your service must be written in Python to use the Feast Python SDK.
 A benefit of using a Python stack is that you can enjoy production-grade services with integrations with many existing data science tools.
 
 To integrate online retrieval into your service use the following code:
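The retrieval snippet itself falls outside this hunk. A minimal sketch of online retrieval with the Python SDK, with illustrative feature and entity names:

```python
# Sketch: fetch the latest feature values for one driver from the online store.
from feast import FeatureStore

fs = FeatureStore(repo_path=".")

online_features = fs.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
```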
@@ -245,9 +245,9 @@
 ### 4.3. Java based Feature Server deployed on Kubernetes
 
 For users with very latency-sensitive and high QPS use-cases, Feast offers a high-performance Java feature server.
-Besides the benefits of running on JVM, this implementation also provides a gRPC API, which guarantees good connection utilization and
-small request / response body size (compared to JSON).
-You will need the Feast Java SDK to retrieve features from this service. This SDK wraps all the gRPC logic for you and provides more convenient APIs.
+Besides the benefits of running on JVM, this implementation also provides a gRPC API, which guarantees good connection utilization and
+small request / response body size (compared to JSON).
+You will need the Feast Java SDK to retrieve features from this service. This SDK wraps all the gRPC logic for you and provides more convenient APIs.
 
 The Java based feature server can be deployed to Kubernetes cluster via Helm charts in a few simple steps:
 
@@ -292,9 +292,9 @@ def feast_writer(spark_df):
 streamingDF.writeStream.foreachBatch(feast_writer).start()
 ```
 
-### 5.2. Push service *(still under development)*
+### 5.2. Push Service (Alpha)
 
-Alternatively, if you want to ingest features directly from a broker (eg, Kafka or Kinesis), you can use the "push service", which will write to an online store.
+Alternatively, if you want to ingest features directly from a broker (e.g., Kafka or Kinesis), you can use the "push service", which will write to an online store and/or offline store.
 This service will expose an HTTP API or, when deployed on Serverless platforms like AWS Lambda or Google Cloud Run,
 this service can be directly connected to Kinesis or PubSub.
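To make the new wording concrete, here is a hedged sketch of a consumer-side push. The push source name and columns are hypothetical; `fs.push` and `PushMode` match the updated `docs/reference/data-sources/push.md` below:

```python
# Sketch: push one microbatch of broker events to both stores. The push source
# name and schema here are hypothetical, not taken from this diff.
import pandas as pd
from feast import FeatureStore
from feast.data_source import PushMode

fs = FeatureStore(repo_path=".")

event_df = pd.DataFrame(
    {
        "driver_id": [1001],
        "event_timestamp": [pd.Timestamp.utcnow()],
        "conv_rate": [0.85],
    }
)

# ONLINE_AND_OFFLINE lands the rows in both stores; ONLINE or OFFLINE targets one.
fs.push("driver_stats_push_source", event_df, to=PushMode.ONLINE_AND_OFFLINE)
```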

docs/reference/data-sources/kafka.md (+2 −2)
@@ -4,9 +4,9 @@
 
 ## Description
 
-Kafka sources allow users to register Kafka streams as data sources. Feast currently does not launch or monitor jobs to ingest data from Kafka. Users are responsible for launching and monitoring their own ingestion jobs, which should write feature values to the online store through [FeatureStore.write_to_online_store](https://rtd.feast.dev/en/latest/index.html#feast.feature_store.FeatureStore.write_to_online_store). An example of how to launch such a job with Spark can be found [here](https://github.com/feast-dev/feast/tree/master/sdk/python/feast/infra/contrib).
+Kafka sources allow users to register Kafka streams as data sources. Feast currently does not launch or monitor jobs to ingest data from Kafka. Users are responsible for launching and monitoring their own ingestion jobs, which should write feature values to the online store through [FeatureStore.write_to_online_store](https://rtd.feast.dev/en/latest/index.html#feast.feature_store.FeatureStore.write_to_online_store). An example of how to launch such a job with Spark can be found [here](https://github.com/feast-dev/feast/tree/master/sdk/python/feast/infra/contrib). Feast also provides functionality to write to the offline store using `write_to_offline_store`.
 
-Kafka sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for writing data from their Kafka streams to a batch data source such as a data warehouse table. Feast plans on shipping `FeatureStore.write_to_offline_store` functionality soon, so users will be able to write data to the offline store just as easily as to the online store. When using a Kafka source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
+Kafka sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for writing data from their Kafka streams to a batch data source such as a data warehouse table. When using a Kafka source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
 
 ## Stream sources
 Streaming data sources are important sources of feature values. A typical setup with streaming data looks like:
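As a companion to the added sentence, a hedged sketch of a Spark Structured Streaming sink that writes each microbatch to both stores. It assumes a registered feature view named `driver_hourly_stats`, that each batch is already parsed into that view's schema, and that `write_to_offline_store` mirrors the `(feature_view_name, df)` calling convention of `write_to_online_store`:

```python
# Sketch of a foreachBatch sink that lands each microbatch in Feast. Assumes
# batch_df already matches the feature view's schema (entity key, timestamps,
# feature columns) after upstream parsing of the Kafka payload.
from feast import FeatureStore

fs = FeatureStore(repo_path=".")

def feast_writer(batch_df, batch_id):
    rows = batch_df.toPandas()
    fs.write_to_online_store("driver_hourly_stats", rows)   # fresh values for serving
    fs.write_to_offline_store("driver_hourly_stats", rows)  # history for training

# parsed_stream is assumed to be a Spark streaming DataFrame read from the topic.
query = parsed_stream.writeStream.foreachBatch(feast_writer).start()
```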

docs/reference/data-sources/kinesis.md (+2 −2)
@@ -4,9 +4,9 @@
 
 ## Description
 
-Kinesis sources allow users to register Kinesis streams as data sources. Feast currently does not launch or monitor jobs to ingest data from Kinesis. Users are responsible for launching and monitoring their own ingestion jobs, which should write feature values to the online store through [FeatureStore.write_to_online_store](https://rtd.feast.dev/en/latest/index.html#feast.feature_store.FeatureStore.write_to_online_store). An example of how to launch such a job with Spark to ingest from Kafka can be found [here](https://github.com/feast-dev/feast/tree/master/sdk/python/feast/infra/contrib); by using a different plugin, the example can be adapted to Kinesis.
+Kinesis sources allow users to register Kinesis streams as data sources. Feast currently does not launch or monitor jobs to ingest data from Kinesis. Users are responsible for launching and monitoring their own ingestion jobs, which should write feature values to the online store through [FeatureStore.write_to_online_store](https://rtd.feast.dev/en/latest/index.html#feast.feature_store.FeatureStore.write_to_online_store). An example of how to launch such a job with Spark to ingest from Kafka can be found [here](https://github.com/feast-dev/feast/tree/master/sdk/python/feast/infra/contrib); by using a different plugin, the example can be adapted to Kinesis. Feast also provides functionality to write to the offline store using `write_to_offline_store`.
 
-Kinesis sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for writing data from their Kinesis streams to a batch data source such as a data warehouse table. Feast plans on shipping `FeatureStore.write_to_offline_store` functionality soon, so users will be able to write data to the offline store just as easily as to the online store. When using a Kinesis source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
+Kinesis sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for writing data from their Kinesis streams to a batch data source such as a data warehouse table. When using a Kinesis source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
 
 ## Stream sources
 Streaming data sources are important sources of feature values. A typical setup with streaming data looks like:

docs/reference/data-sources/push.md (+8 −6)
@@ -4,12 +4,12 @@
 
 ## Description
 
-Push sources allow feature values to be pushed to the online store in real time. This allows fresh feature values to be made available to applications. Push sources supercede the
+Push sources allow feature values to be pushed to the online store and offline store in real time. This allows fresh feature values to be made available to applications. Push sources supersede the
 [FeatureStore.write_to_online_store](https://rtd.feast.dev/en/latest/index.html#feast.feature_store.FeatureStore.write_to_online_store).
 
 Push sources can be used by multiple feature views. When data is pushed to a push source, Feast propagates the feature values to all the consuming feature views.
 
-Push sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for pushing data to a batch data source such as a data warehouse table. Feast plans on shipping `FeatureStore.write_to_offline_store` functionality soon, so users will be able to write data to the offline store just as easily as to the online store. When using a push source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
+Push sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for pushing data to a batch data source such as a data warehouse table. When using a push source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
 
 ## Stream sources
 Streaming data sources are important sources of feature values. A typical setup with streaming data looks like:
@@ -20,11 +20,11 @@
 4. Write stream 2 values to an online store for low latency feature serving
 5. Periodically materialize feature values from the offline store into the online store for decreased training-serving skew and improved model performance
 
-Feast now allows users to push features previously registered in a feature view to the online store for fresher features.
+Feast allows users to push features previously registered in a feature view to the online store for fresher features. It also allows users to push batches of stream data to the offline store by directing the push there; the data is written to the offline store declared in the repository configuration used to initialize the feature store.
 
 ## Example
 ### Defining a push source
-Note that the push schema needs to also include the entity
+Note that the push schema also needs to include the entity.
 
 ```python
 from feast import PushSource, ValueType, BigQuerySource, FeatureView, Feature, Field
@@ -44,14 +44,16 @@ fv = FeatureView(
 ```
 
 ### Pushing data
+Note that the `to` parameter is optional and defaults to online, but we can specify one of these options: `PushMode.ONLINE`, `PushMode.OFFLINE`, or `PushMode.ONLINE_AND_OFFLINE`.
 ```python
 from feast import FeatureStore
 import pandas as pd
+from feast.data_source import PushMode
 
 fs = FeatureStore(...)
 feature_data_frame = pd.DataFrame()
-fs.push("push_source_name", feature_data_frame)
+fs.push("push_source_name", feature_data_frame, to=PushMode.ONLINE_AND_OFFLINE)
 ```
 
-See also [Python feature server](../feature-servers/python-feature-server.md) for instructions on how to push data to a deployed feature server.
+See also [Python feature server](../feature-servers/python-feature-server.md) for instructions on how to push data to a deployed feature server.
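The hunk above elides the body of the source definition between the import line and `fv = FeatureView(`. A minimal sketch of what a push source with its required batch source could look like; the table, entity, and field names are illustrative, and the exact constructor arguments varied across Feast releases:

```python
# Illustrative push source + feature view, consistent with the imports shown
# in the hunk above. All names and the schema are hypothetical.
from feast import BigQuerySource, FeatureView, Field, PushSource
from feast.types import Float32

push_source = PushSource(
    name="push_source",
    batch_source=BigQuerySource(table="my_project.my_dataset.driver_stats"),
)

fv = FeatureView(
    name="driver_stats_fv",
    entities=["driver_id"],
    schema=[Field(name="conv_rate", dtype=Float32)],
    source=push_source,
)
```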

docs/reference/feature-servers/python-feature-server.md (+6 −6)
@@ -152,13 +152,12 @@ curl -X POST \
 }' | jq
 ```
 
-### Pushing features to the online store
-You can push data corresponding to a push source to the online store (note that timestamps need to be strings):
+### Pushing features to the online and offline stores
+You can push data corresponding to a push source to the online and offline stores (note that timestamps need to be strings):
 
-You can also define a pushmode to push offline data, either to the online store, offline store, or both. The feature server will throw an error if the online/offline
-store doesn't support the push api functionality.
+You can also define a push mode to push stream or batch data to the online store, offline store, or both. The feature server will throw an error if the online/offline store doesn't support the push API functionality.
 
-The request definition for pushmode is a string parameter `to` where the options are: ["online", "offline", "both"].
+The request definition for the push mode is a string parameter `to`, where the options are: ["online", "offline", "online_and_offline"].
 ```text
 curl -X POST "http://localhost:6566/push" -d '{
   "push_source_name": "driver_hourly_stats_push_source",
@@ -169,7 +168,8 @@ curl -X POST "http://localhost:6566/push" -d '{
     "conv_rate": [1.0],
     "acc_rate": [1.0],
     "avg_daily_trips": [1000]
-  }
+  },
+  "to": "online_and_offline"
 }' | jq
 ```
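For completeness, the same push expressed from Python with the `requests` library. This is a sketch: the entity and timestamp columns are assumed, since the hunk elides the top of the JSON payload:

```python
# Sketch: POST the same payload as the curl example to a local feature server.
import requests

payload = {
    "push_source_name": "driver_hourly_stats_push_source",
    "df": {
        "driver_id": [1001],                         # assumed entity column
        "event_timestamp": ["2022-05-13 10:59:42"],  # timestamps must be strings
        "conv_rate": [1.0],
        "acc_rate": [1.0],
        "avg_daily_trips": [1000],
    },
    "to": "online_and_offline",
}

response = requests.post("http://localhost:6566/push", json=payload)
response.raise_for_status()
```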

docs/roadmap.md (+2 −2)
@@ -44,8 +44,8 @@
   * [ ] Batch transformation (In progress. See [RFC](https://docs.google.com/document/d/1964OkzuBljifDvkV-0fakp2uaijnVzdwWNGdz7Vz50A/edit))
 * **Streaming**
   * [x] [Custom streaming ingestion job support](https://docs.feast.dev/how-to-guides/creating-a-custom-provider)
-  * [x] [Push based streaming data ingestion to online store](https://docs.feast.dev/reference/data-sources/push)
-  * [ ] Push based streaming data ingestion to offline store (In Progress)
+  * [x] [Push based streaming data ingestion to online store (Alpha)](https://docs.feast.dev/reference/data-sources/push)
+  * [x] [Push based streaming data ingestion to offline store (Alpha)](https://docs.feast.dev/reference/data-sources/push)
 * **Deployments**
   * [x] AWS Lambda (Alpha release. See [RFC](https://docs.google.com/document/d/1eZWKWzfBif66LDN32IajpaG-j82LSHCCOzY6R7Ax7MI/edit))
   * [x] Kubernetes (See [guide](https://docs.feast.dev/how-to-guides/running-feast-in-production#4.3.-java-based-feature-server-deployed-on-kubernetes))

@@ -1,3 +1,5 @@
 # Building streaming features
 
-Please see [here](https://github.com/feast-dev/streaming-tutorial) for the tutorial.
+Feast supports registering streaming feature views and Kafka and Kinesis streaming sources. It also provides an interface for stream processing called the `Stream Processor`. An example Kafka/Spark StreamProcessor is implemented in the contrib folder. For more details, please see the [RFC](https://docs.google.com/document/d/1UzEyETHUaGpn0ap4G82DHluiCj7zEbrQLkJJkKSv4e8/edit?usp=sharing).
+
+Please see [here](https://github.com/feast-dev/streaming-tutorial) for a tutorial on how to build a versioned streaming pipeline that registers your transformations, features, and data sources in Feast.

sdk/python/feast/feature_server.py (+5 −1)
@@ -86,8 +86,12 @@ def push(body=Depends(get_body)):
         to = PushMode.OFFLINE
     elif request.to == "online":
         to = PushMode.ONLINE
-    else:
+    elif request.to == "online_and_offline":
         to = PushMode.ONLINE_AND_OFFLINE
+    else:
+        raise ValueError(
+            f"{request.to} is not a supported push format. Please specify one of these ['online', 'offline', 'online_and_offline']."
+        )
     store.push(
         push_source_name=request.push_source_name,
         df=df,
