Commit eebf78d

chore: Update docs for push apis and stream feature views. (#2846)
* Update docs Signed-off-by: Kevin Zhang <[email protected]>
* Fix Signed-off-by: Kevin Zhang <[email protected]>
1 parent 58ea103 commit eebf78d

File tree

10 files changed: +44 −34 lines changed

README.md (+2 −2)
@@ -179,8 +179,8 @@
   * [ ] Batch transformation (In progress. See [RFC](https://docs.google.com/document/d/1964OkzuBljifDvkV-0fakp2uaijnVzdwWNGdz7Vz50A/edit))
 * **Streaming**
   * [x] [Custom streaming ingestion job support](https://docs.feast.dev/how-to-guides/creating-a-custom-provider)
-  * [x] [Push based streaming data ingestion to online store](https://docs.feast.dev/reference/data-sources/push)
-  * [ ] Push based streaming data ingestion to offline store (In Progress)
+  * [x] [Push based streaming data ingestion to online store (Alpha)](https://docs.feast.dev/reference/data-sources/push)
+  * [x] [Push based streaming data ingestion to offline store (Alpha)](https://docs.feast.dev/reference/data-sources/push)
 * **Deployments**
   * [x] AWS Lambda (Alpha release. See [RFC](https://docs.google.com/document/d/1eZWKWzfBif66LDN32IajpaG-j82LSHCCOzY6R7Ax7MI/edit))
   * [x] Kubernetes (See [guide](https://docs.feast.dev/how-to-guides/running-feast-in-production#4.3.-java-based-feature-server-deployed-on-kubernetes))

docs/getting-started/architecture-and-components/offline-store.md (+2 −0)
@@ -13,3 +13,5 @@
 
 Please see the [Offline Stores](../../reference/offline-stores/) reference for more details on configuring offline stores.
 
+Please see the [Push Source](../../reference/data-sources/push.md) reference for details on how to push features directly to the offline store in your feature store.
+

docs/how-to-guides/running-feast-in-production.md (+12 −12)
@@ -3,15 +3,15 @@
 ## Overview
 
 After learning about Feast concepts and playing with Feast locally, you're now ready to use Feast in production.
-This guide aims to help with the transition from a sandbox project to production-grade deployment in the cloud or on-premise.
+This guide aims to help with the transition from a sandbox project to production-grade deployment in the cloud or on-premise.
 
 Overview of typical production configuration is given below:
 
 ![Overview](production-simple.png)
 
 {% hint style="success" %}
-**Important note:** We're trying to keep Feast modular. With the exception of the core, most of the Feast blocks are loosely connected and can be used independently. Hence, you are free to build your own production configuration.
-For example, you might not have a stream source and, thus, no need to write features in real-time to an online store.
+**Important note:** We're trying to keep Feast modular. With the exception of the core, most of the Feast blocks are loosely connected and can be used independently. Hence, you are free to build your own production configuration.
+For example, you might not have a stream source and, thus, no need to write features in real-time to an online store.
 Or you might not need to retrieve online features.
 
 Furthermore, there's no single "true" approach. As you will see in this guide, Feast usually provides several options for each problem.
@@ -95,7 +95,7 @@
 
 To keep your online store up to date, you need to run a job that loads feature data from your feature view sources into your online store. In Feast, this loading operation is called materialization.
 
-### 2.1. Manual materializations
+### 2.1. Manual materializations
 The simplest way to schedule materialization is to run an **incremental** materialization using the Feast CLI:
 
 ```text
@@ -116,7 +116,7 @@
 
 The timestamps above should match the interval of data that has been computed by the data transformation system.
 
-### 2.2. Automate periodic materializations
+### 2.2. Automate periodic materializations
 
 It is up to you which orchestration/scheduler to use to periodically run `$ feast materialize`.
 Feast keeps the history of materialization in its registry so that the choice could be as simple as a [unix cron util](https://en.wikipedia.org/wiki/Cron).
@@ -160,7 +160,7 @@ feature_refs = [
 ]
 
 training_df = fs.get_historical_features(
-    entity_df=entity_df,
+    entity_df=entity_df,
     features=feature_refs,
 ).to_df()
 
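For context, the `entity_df` referenced in this hunk is the frame of entity keys and event timestamps that drives the point-in-time join. A minimal sketch, with illustrative entity and timestamp values:

```python
# Illustrative entity dataframe for get_historical_features: one row per
# (entity key, event timestamp) pair to join feature values onto, point in time.
import pandas as pd

entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [
            pd.Timestamp("2022-05-01 12:00:00", tz="UTC"),
            pd.Timestamp("2022-05-02 12:00:00", tz="UTC"),
        ],
    }
)
```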
@@ -214,7 +214,7 @@
 
 This approach is the most convenient to keep your infrastructure as minimalistic as possible and avoid deploying extra services.
 The Feast Python SDK will connect directly to the online store (Redis, Datastore, etc), pull the feature data, and run transformations locally (if required).
-The obvious drawback is that your service must be written in Python to use the Feast Python SDK.
+The obvious drawback is that your service must be written in Python to use the Feast Python SDK.
 A benefit of using a Python stack is that you can enjoy production-grade services with integrations with many existing data science tools.
 
 To integrate online retrieval into your service use the following code:
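The retrieval snippet itself falls outside this hunk. A minimal sketch of online retrieval with the Python SDK, with illustrative feature and entity names:

```python
# Sketch: fetch the latest feature values for one driver from the online store.
from feast import FeatureStore

fs = FeatureStore(repo_path=".")

online_features = fs.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
```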
@@ -245,9 +245,9 @@
 ### 4.3. Java based Feature Server deployed on Kubernetes
 
 For users with very latency-sensitive and high QPS use-cases, Feast offers a high-performance Java feature server.
-Besides the benefits of running on JVM, this implementation also provides a gRPC API, which guarantees good connection utilization and
-small request / response body size (compared to JSON).
-You will need the Feast Java SDK to retrieve features from this service. This SDK wraps all the gRPC logic for you and provides more convenient APIs.
+Besides the benefits of running on JVM, this implementation also provides a gRPC API, which guarantees good connection utilization and
+small request / response body size (compared to JSON).
+You will need the Feast Java SDK to retrieve features from this service. This SDK wraps all the gRPC logic for you and provides more convenient APIs.
 
 The Java based feature server can be deployed to Kubernetes cluster via Helm charts in a few simple steps:
 
@@ -292,9 +292,9 @@ def feast_writer(spark_df):
 streamingDF.writeStream.foreachBatch(feast_writer).start()
 ```
 
-### 5.2. Push service *(still under development)*
+### 5.2. Push Service (Alpha)
 
-Alternatively, if you want to ingest features directly from a broker (eg, Kafka or Kinesis), you can use the "push service", which will write to an online store.
+Alternatively, if you want to ingest features directly from a broker (e.g., Kafka or Kinesis), you can use the "push service", which will write to an online store and/or offline store.
 This service will expose an HTTP API or, when deployed on Serverless platforms like AWS Lambda or Google Cloud Run,
 this service can be directly connected to Kinesis or PubSub.
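To make the new wording concrete, here is a hedged sketch of a consumer-side push. The push source name and columns are hypothetical; `fs.push` and `PushMode` match the updated `docs/reference/data-sources/push.md` below:

```python
# Sketch: push one microbatch of broker events to both stores. The push source
# name and schema here are hypothetical, not taken from this diff.
import pandas as pd
from feast import FeatureStore
from feast.data_source import PushMode

fs = FeatureStore(repo_path=".")

event_df = pd.DataFrame(
    {
        "driver_id": [1001],
        "event_timestamp": [pd.Timestamp.utcnow()],
        "conv_rate": [0.85],
    }
)

# ONLINE_AND_OFFLINE lands the rows in both stores; ONLINE or OFFLINE targets one.
fs.push("driver_stats_push_source", event_df, to=PushMode.ONLINE_AND_OFFLINE)
```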

docs/reference/data-sources/kafka.md (+2 −2)
@@ -4,9 +4,9 @@
 
 ## Description
 
-Kafka sources allow users to register Kafka streams as data sources. Feast currently does not launch or monitor jobs to ingest data from Kafka. Users are responsible for launching and monitoring their own ingestion jobs, which should write feature values to the online store through [FeatureStore.write_to_online_store](https://rtd.feast.dev/en/latest/index.html#feast.feature_store.FeatureStore.write_to_online_store). An example of how to launch such a job with Spark can be found [here](https://github.com/feast-dev/feast/tree/master/sdk/python/feast/infra/contrib).
+Kafka sources allow users to register Kafka streams as data sources. Feast currently does not launch or monitor jobs to ingest data from Kafka. Users are responsible for launching and monitoring their own ingestion jobs, which should write feature values to the online store through [FeatureStore.write_to_online_store](https://rtd.feast.dev/en/latest/index.html#feast.feature_store.FeatureStore.write_to_online_store). An example of how to launch such a job with Spark can be found [here](https://github.com/feast-dev/feast/tree/master/sdk/python/feast/infra/contrib). Feast also provides functionality to write to the offline store using `write_to_offline_store`.
 
-Kafka sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for writing data from their Kafka streams to a batch data source such as a data warehouse table. Feast plans on shipping `FeatureStore.write_to_offline_store` functionality soon, so users will be able to write data to the offline store just as easily as to the online store. When using a Kafka source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
+Kafka sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for writing data from their Kafka streams to a batch data source such as a data warehouse table. When using a Kafka source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
 
 ## Stream sources
 Streaming data sources are important sources of feature values. A typical setup with streaming data looks like:
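As a companion to the added sentence, a hedged sketch of a Spark Structured Streaming sink that writes each microbatch to both stores. It assumes a registered feature view named `driver_hourly_stats`, that each batch is already parsed into that view's schema, and that `write_to_offline_store` mirrors the `(feature_view_name, df)` calling convention of `write_to_online_store`:

```python
# Sketch of a foreachBatch sink that lands each microbatch in Feast. Assumes
# batch_df already matches the feature view's schema (entity key, timestamps,
# feature columns) after upstream parsing of the Kafka payload.
from feast import FeatureStore

fs = FeatureStore(repo_path=".")

def feast_writer(batch_df, batch_id):
    rows = batch_df.toPandas()
    fs.write_to_online_store("driver_hourly_stats", rows)   # fresh values for serving
    fs.write_to_offline_store("driver_hourly_stats", rows)  # history for training

# parsed_stream is assumed to be a Spark streaming DataFrame read from the topic.
query = parsed_stream.writeStream.foreachBatch(feast_writer).start()
```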

docs/reference/data-sources/kinesis.md (+2 −2)
@@ -4,9 +4,9 @@
 
 ## Description
 
-Kinesis sources allow users to register Kinesis streams as data sources. Feast currently does not launch or monitor jobs to ingest data from Kinesis. Users are responsible for launching and monitoring their own ingestion jobs, which should write feature values to the online store through [FeatureStore.write_to_online_store](https://rtd.feast.dev/en/latest/index.html#feast.feature_store.FeatureStore.write_to_online_store). An example of how to launch such a job with Spark to ingest from Kafka can be found [here](https://github.com/feast-dev/feast/tree/master/sdk/python/feast/infra/contrib); by using a different plugin, the example can be adapted to Kinesis.
+Kinesis sources allow users to register Kinesis streams as data sources. Feast currently does not launch or monitor jobs to ingest data from Kinesis. Users are responsible for launching and monitoring their own ingestion jobs, which should write feature values to the online store through [FeatureStore.write_to_online_store](https://rtd.feast.dev/en/latest/index.html#feast.feature_store.FeatureStore.write_to_online_store). An example of how to launch such a job with Spark to ingest from Kafka can be found [here](https://github.com/feast-dev/feast/tree/master/sdk/python/feast/infra/contrib); by using a different plugin, the example can be adapted to Kinesis. Feast also provides functionality to write to the offline store using `write_to_offline_store`.
 
-Kinesis sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for writing data from their Kinesis streams to a batch data source such as a data warehouse table. Feast plans on shipping `FeatureStore.write_to_offline_store` functionality soon, so users will be able to write data to the offline store just as easily as to the online store. When using a Kinesis source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
+Kinesis sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for writing data from their Kinesis streams to a batch data source such as a data warehouse table. When using a Kinesis source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
 
 ## Stream sources
 Streaming data sources are important sources of feature values. A typical setup with streaming data looks like:

docs/reference/data-sources/push.md (+8 −6)
@@ -4,12 +4,12 @@
 
 ## Description
 
-Push sources allow feature values to be pushed to the online store in real time. This allows fresh feature values to be made available to applications. Push sources supercede the
+Push sources allow feature values to be pushed to the online store and offline store in real time. This allows fresh feature values to be made available to applications. Push sources supersede the
 [FeatureStore.write_to_online_store](https://rtd.feast.dev/en/latest/index.html#feast.feature_store.FeatureStore.write_to_online_store).
 
 Push sources can be used by multiple feature views. When data is pushed to a push source, Feast propagates the feature values to all the consuming feature views.
 
-Push sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for pushing data to a batch data source such as a data warehouse table. Feast plans on shipping `FeatureStore.write_to_offline_store` functionality soon, so users will be able to write data to the offline store just as easily as to the online store. When using a push source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
+Push sources must have a batch source specified. The batch source will be used for retrieving historical features. Thus users are also responsible for pushing data to a batch data source such as a data warehouse table. When using a push source as a stream source in the definition of a feature view, a batch source doesn't need to be specified in the feature view definition explicitly.
 
 ## Stream sources
 Streaming data sources are important sources of feature values. A typical setup with streaming data looks like:
@@ -20,11 +20,11 @@
 4. Write stream 2 values to an online store for low latency feature serving
 5. Periodically materialize feature values from the offline store into the online store for decreased training-serving skew and improved model performance
 
-Feast now allows users to push features previously registered in a feature view to the online store for fresher features.
+Feast allows users to push features previously registered in a feature view to the online store for fresher features. It also allows users to push batches of stream data to the offline store by directing the push there; the data is written to the offline store declared in the repository configuration used to initialize the feature store.
 
 ## Example
 ### Defining a push source
-Note that the push schema needs to also include the entity
+Note that the push schema also needs to include the entity.
 
 ```python
 from feast import PushSource, ValueType, BigQuerySource, FeatureView, Feature, Field
@@ -44,14 +44,16 @@ fv = FeatureView(
 ```
 
 ### Pushing data
+Note that the `to` parameter is optional and defaults to online, but we can specify one of these options: `PushMode.ONLINE`, `PushMode.OFFLINE`, or `PushMode.ONLINE_AND_OFFLINE`.
 ```python
 from feast import FeatureStore
 import pandas as pd
+from feast.data_source import PushMode
 
 fs = FeatureStore(...)
 feature_data_frame = pd.DataFrame()
-fs.push("push_source_name", feature_data_frame)
+fs.push("push_source_name", feature_data_frame, to=PushMode.ONLINE_AND_OFFLINE)
 ```
 
-See also [Python feature server](../feature-servers/python-feature-server.md) for instructions on how to push data to a deployed feature server.
+See also [Python feature server](../feature-servers/python-feature-server.md) for instructions on how to push data to a deployed feature server.
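The hunk above elides the body of the source definition between the import line and `fv = FeatureView(`. A minimal sketch of what a push source with its required batch source could look like; the table, entity, and field names are illustrative, and the exact constructor arguments varied across Feast releases:

```python
# Illustrative push source + feature view, consistent with the imports shown
# in the hunk above. All names and the schema are hypothetical.
from feast import BigQuerySource, FeatureView, Field, PushSource
from feast.types import Float32

push_source = PushSource(
    name="push_source",
    batch_source=BigQuerySource(table="my_project.my_dataset.driver_stats"),
)

fv = FeatureView(
    name="driver_stats_fv",
    entities=["driver_id"],
    schema=[Field(name="conv_rate", dtype=Float32)],
    source=push_source,
)
```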

docs/reference/feature-servers/python-feature-server.md (+6 −6)
@@ -152,13 +152,12 @@ curl -X POST \
 }' | jq
 ```
 
-### Pushing features to the online store
-You can push data corresponding to a push source to the online store (note that timestamps need to be strings):
+### Pushing features to the online and offline stores
+You can push data corresponding to a push source to the online and offline stores (note that timestamps need to be strings):
 
-You can also define a pushmode to push offline data, either to the online store, offline store, or both. The feature server will throw an error if the online/offline
-store doesn't support the push api functionality.
+You can also define a push mode to push stream or batch data to the online store, offline store, or both. The feature server will throw an error if the online/offline store doesn't support the push API functionality.
 
-The request definition for pushmode is a string parameter `to` where the options are: ["online", "offline", "both"].
+The request definition for the push mode is a string parameter `to`, where the options are: ["online", "offline", "online_and_offline"].
 ```text
 curl -X POST "http://localhost:6566/push" -d '{
   "push_source_name": "driver_hourly_stats_push_source",
@@ -169,7 +168,8 @@ curl -X POST "http://localhost:6566/push" -d '{
     "conv_rate": [1.0],
     "acc_rate": [1.0],
     "avg_daily_trips": [1000]
-  }
+  },
+  "to": "online_and_offline"
 }' | jq
 ```
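For completeness, the same push expressed from Python with the `requests` library. This is a sketch: the entity and timestamp columns are assumed, since the hunk elides the top of the JSON payload:

```python
# Sketch: POST the same payload as the curl example to a local feature server.
import requests

payload = {
    "push_source_name": "driver_hourly_stats_push_source",
    "df": {
        "driver_id": [1001],                         # assumed entity column
        "event_timestamp": ["2022-05-13 10:59:42"],  # timestamps must be strings
        "conv_rate": [1.0],
        "acc_rate": [1.0],
        "avg_daily_trips": [1000],
    },
    "to": "online_and_offline",
}

response = requests.post("http://localhost:6566/push", json=payload)
response.raise_for_status()
```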

docs/roadmap.md (+2 −2)
@@ -44,8 +44,8 @@
   * [ ] Batch transformation (In progress. See [RFC](https://docs.google.com/document/d/1964OkzuBljifDvkV-0fakp2uaijnVzdwWNGdz7Vz50A/edit))
 * **Streaming**
   * [x] [Custom streaming ingestion job support](https://docs.feast.dev/how-to-guides/creating-a-custom-provider)
-  * [x] [Push based streaming data ingestion to online store](https://docs.feast.dev/reference/data-sources/push)
-  * [ ] Push based streaming data ingestion to offline store (In Progress)
+  * [x] [Push based streaming data ingestion to online store (Alpha)](https://docs.feast.dev/reference/data-sources/push)
+  * [x] [Push based streaming data ingestion to offline store (Alpha)](https://docs.feast.dev/reference/data-sources/push)
 * **Deployments**
   * [x] AWS Lambda (Alpha release. See [RFC](https://docs.google.com/document/d/1eZWKWzfBif66LDN32IajpaG-j82LSHCCOzY6R7Ax7MI/edit))
   * [x] Kubernetes (See [guide](https://docs.feast.dev/how-to-guides/running-feast-in-production#4.3.-java-based-feature-server-deployed-on-kubernetes))

@@ -1,3 +1,5 @@
 # Building streaming features
 
-Please see [here](https://github.com/feast-dev/streaming-tutorial) for the tutorial.
+Feast supports registering streaming feature views and Kafka and Kinesis streaming sources. It also provides an interface for stream processing called the `Stream Processor`. An example Kafka/Spark StreamProcessor is implemented in the contrib folder. For more details, please see the [RFC](https://docs.google.com/document/d/1UzEyETHUaGpn0ap4G82DHluiCj7zEbrQLkJJkKSv4e8/edit?usp=sharing).
+
+Please see [here](https://github.com/feast-dev/streaming-tutorial) for a tutorial on how to build a versioned streaming pipeline that registers your transformations, features, and data sources in Feast.

sdk/python/feast/feature_server.py (+5 −1)
@@ -86,8 +86,12 @@ def push(body=Depends(get_body)):
         to = PushMode.OFFLINE
     elif request.to == "online":
         to = PushMode.ONLINE
-    else:
+    elif request.to == "online_and_offline":
         to = PushMode.ONLINE_AND_OFFLINE
+    else:
+        raise ValueError(
+            f"{request.to} is not a supported push format. Please specify one of these ['online', 'offline', 'online_and_offline']."
+        )
     store.push(
         push_source_name=request.push_source_name,
         df=df,
