Commit fad45ca
docs: Add type system docs and add details to data source docs (#3108)
* Data source docs Signed-off-by: Felix Wang <[email protected]> * Type system docs Signed-off-by: Felix Wang <[email protected]> * Update data source docs Signed-off-by: Felix Wang <[email protected]> * Update docs Signed-off-by: Felix Wang <[email protected]> Signed-off-by: Felix Wang <[email protected]>
1 parent 83cf753 commit fad45ca

File tree: 12 files changed, +114 −3 lines

docs/SUMMARY.md

Lines changed: 2 additions & 0 deletions

@@ -62,7 +62,9 @@
 ## Reference

 * [Codebase Structure](reference/codebase-structure.md)
+* [Type System](reference/type-system.md)
 * [Data sources](reference/data-sources/README.md)
+* [Overview](reference/data-sources/overview.md)
 * [File](reference/data-sources/file.md)
 * [Snowflake](reference/data-sources/snowflake.md)
 * [BigQuery](reference/data-sources/bigquery.md)

docs/reference/data-sources/README.md

Lines changed: 5 additions & 1 deletion

@@ -1,6 +1,10 @@
 # Data sources

-Please see [Data Source](../../getting-started/concepts/data-ingestion.md) for an explanation of data sources.
+Please see [Data Source](../../getting-started/concepts/data-ingestion.md) for a conceptual explanation of data sources.
+
+{% content-ref url="overview.md" %}
+[overview.md](overview.md)
+{% endcontent-ref %}

 {% content-ref url="file.md" %}
 [file.md](file.md)

docs/reference/data-sources/bigquery.md

Lines changed: 5 additions & 0 deletions

@@ -30,3 +30,8 @@
 The full set of configuration options is available [here](https://rtd.feast.dev/en/latest/index.html#feast.infra.offline_stores.bigquery_source.BigQuerySource).
+
+## Supported Types
+
+BigQuery data sources support all eight primitive types and their corresponding array types.
+For a comparison against other batch data sources, please see [here](overview.md#functionality-matrix).

docs/reference/data-sources/file.md

Lines changed: 5 additions & 0 deletions

@@ -22,3 +22,8 @@
 The full set of configuration options is available [here](https://rtd.feast.dev/en/latest/index.html#feast.infra.offline_stores.file_source.FileSource).
+
+## Supported Types
+
+File data sources support all eight primitive types and their corresponding array types.
+For a comparison against other batch data sources, please see [here](overview.md#functionality-matrix).
docs/reference/data-sources/overview.md

Lines changed: 31 additions & 0 deletions (new file)

# Overview

## Functionality

In Feast, each batch data source is associated with a corresponding offline store.
For example, a `SnowflakeSource` can only be processed by the Snowflake offline store.
Otherwise, the primary difference between batch data sources is the set of supported types.
Feast has an internal type system, and aims to support eight primitive types (`bytes`, `string`, `int32`, `int64`, `float32`, `float64`, `bool`, and `timestamp`) along with the corresponding array types.
However, not every batch data source supports all of these types.

For more details on the Feast type system, see [here](../type-system.md).

## Functionality Matrix

There are currently four core batch data source implementations: `FileSource`, `BigQuerySource`, `SnowflakeSource`, and `RedshiftSource`.
There are several additional implementations contributed by the Feast community (`PostgreSQLSource`, `SparkSource`, and `TrinoSource`), which are not guaranteed to be stable or to match the functionality of the core implementations.
Details for each specific data source can be found [here](README.md).

Below is a matrix indicating which data sources support which types.

|             | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
| :---------- | :--- | :------- | :-------- | :------- | :------- | :---- | :---- |
| `bytes`     | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| `string`    | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| `int32`     | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| `int64`     | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| `float32`   | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| `float64`   | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| `bool`      | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| `timestamp` | yes  | yes      | yes       | yes      | yes      | yes   | yes   |
| array types | yes  | yes      | no        | no       | yes      | yes   | no    |
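For quick programmatic checks, the matrix above can be encoded as a small lookup table. The sketch below is illustrative only; the `ARRAY_SUPPORT` mapping and `supports` helper are hypothetical conveniences, not part of the Feast API:

```python
# Illustrative encoding of the functionality matrix above; `supports`
# is a hypothetical helper, not part of the Feast API.
PRIMITIVE_TYPES = {
    "bytes", "string", "int32", "int64",
    "float32", "float64", "bool", "timestamp",
}

# Per the matrix, all sources support every primitive type;
# they differ only in array-type support.
ARRAY_SUPPORT = {
    "File": True, "BigQuery": True, "Snowflake": False, "Redshift": False,
    "Postgres": True, "Spark": True, "Trino": False,
}

def supports(source: str, feast_type: str) -> bool:
    """Check whether `source` handles `feast_type` (arrays written as 'int32[]')."""
    if feast_type.endswith("[]"):
        return ARRAY_SUPPORT[source] and feast_type[:-2] in PRIMITIVE_TYPES
    return source in ARRAY_SUPPORT and feast_type in PRIMITIVE_TYPES

print(supports("BigQuery", "float32[]"))   # True
print(supports("Snowflake", "float32[]"))  # False
```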

docs/reference/data-sources/postgres.md

Lines changed: 5 additions & 0 deletions

@@ -28,3 +28,8 @@
 The full set of configuration options is available [here](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.contrib.postgres_offline_store.postgres_source.PostgreSQLSource).
+
+## Supported Types
+
+PostgreSQL data sources support all eight primitive types and their corresponding array types.
+For a comparison against other batch data sources, please see [here](overview.md#functionality-matrix).

docs/reference/data-sources/redshift.md

Lines changed: 5 additions & 0 deletions

@@ -30,3 +30,8 @@
 The full set of configuration options is available [here](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.redshift_source.RedshiftSource).
+
+## Supported Types
+
+Redshift data sources support all eight primitive types, but currently do not support array types.
+For a comparison against other batch data sources, please see [here](overview.md#functionality-matrix).

docs/reference/data-sources/snowflake.md

Lines changed: 5 additions & 0 deletions

@@ -43,3 +43,8 @@
 {% endhint %}

 The full set of configuration options is available [here](https://rtd.feast.dev/en/latest/index.html#feast.infra.offline_stores.snowflake_source.SnowflakeSource).
+
+## Supported Types
+
+Snowflake data sources support all eight primitive types, but currently do not support array types.
+For a comparison against other batch data sources, please see [here](overview.md#functionality-matrix).

docs/reference/data-sources/spark.md

Lines changed: 5 additions & 0 deletions

@@ -52,3 +52,8 @@
 The full set of configuration options is available [here](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.contrib.spark_offline_store.spark_source.SparkSource).
+
+## Supported Types
+
+Spark data sources support all eight primitive types and their corresponding array types.
+For a comparison against other batch data sources, please see [here](overview.md#functionality-matrix).

docs/reference/data-sources/trino.md

Lines changed: 5 additions & 0 deletions

@@ -27,3 +27,8 @@
 The full set of configuration options is available [here](https://rtd.feast.dev/en/master/#trino-source).
+
+## Supported Types
+
+Trino data sources support all eight primitive types, but currently do not support array types.
+For a comparison against other batch data sources, please see [here](overview.md#functionality-matrix).

docs/reference/offline-stores/README.md

Lines changed: 0 additions & 2 deletions

@@ -2,8 +2,6 @@
 Please see [Offline Store](../../getting-started/architecture-and-components/offline-store.md) for a conceptual explanation of offline stores.

-## Reference
-
 {% content-ref url="overview.md" %}
 [overview.md](overview.md)
 {% endcontent-ref %}

docs/reference/type-system.md

Lines changed: 41 additions & 0 deletions (new file)

# Type System

## Motivation

Feast uses an internal type system to provide guarantees on training and serving data.
Feast currently supports eight primitive types - `INT32`, `INT64`, `FLOAT32`, `FLOAT64`, `STRING`, `BYTES`, `BOOL`, and `UNIX_TIMESTAMP` - and the corresponding array types.
Null types are not supported, although the `UNIX_TIMESTAMP` type is nullable.
The type system is controlled by [`Value.proto`](https://github.com/feast-dev/feast/blob/master/protos/feast/types/Value.proto) in protobuf and by [`types.py`](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/types.py) in Python.
Type conversion logic can be found in [`type_map.py`](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/type_map.py).
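As a rough mental model, the eight primitive types and their array counterparts can be sketched with a plain Python enum (illustrative only; the authoritative definitions live in `Value.proto` and `types.py`):

```python
from enum import Enum

# Illustrative stand-in for Feast's primitive value types; see
# Value.proto and types.py for the real definitions.
class FeastType(Enum):
    BYTES = "BYTES"
    STRING = "STRING"
    INT32 = "INT32"
    INT64 = "INT64"
    FLOAT32 = "FLOAT32"
    FLOAT64 = "FLOAT64"
    BOOL = "BOOL"
    UNIX_TIMESTAMP = "UNIX_TIMESTAMP"

# Every primitive type has a corresponding array (list) type.
ARRAY_TYPES = {t: f"{t.value}_LIST" for t in FeastType}

print(ARRAY_TYPES[FeastType.INT32])  # INT32_LIST
```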
## Examples

### Feature inference

During `feast apply`, Feast runs schema inference on the data sources underlying feature views.
For example, if the `schema` parameter is not specified for a feature view, Feast will examine the schema of the underlying data source to determine the event timestamp column, feature columns, and entity columns.
Each of these columns must be associated with a Feast type, which requires conversion from the data source type system to the Feast type system.

* The feature inference logic calls `_infer_features_and_entities`.
* `_infer_features_and_entities` calls `source_datatype_to_feast_value_type`.
* `source_datatype_to_feast_value_type` calls the appropriate method in `type_map.py`. For example, if a `SnowflakeSource` is being examined, `snowflake_python_type_to_feast_value_type` from `type_map.py` will be called.
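The chain of calls above ultimately reduces to a per-source dtype lookup. A hypothetical sketch of the idea (the table below is a toy stand-in, far from complete; the real logic is `snowflake_python_type_to_feast_value_type` and its siblings in `type_map.py`):

```python
# Toy stand-in for the per-source type-mapping functions in type_map.py;
# this table is illustrative and intentionally incomplete.
SNOWFLAKE_TO_FEAST = {
    "NUMBER": "INT64",
    "FLOAT": "FLOAT64",
    "VARCHAR": "STRING",
    "BOOLEAN": "BOOL",
    "TIMESTAMP_NTZ": "UNIX_TIMESTAMP",
}

def source_dtype_to_feast_value_type(source: str, dtype: str) -> str:
    """Map a data source column type to a Feast value type name."""
    mappings = {"snowflake": SNOWFLAKE_TO_FEAST}
    try:
        return mappings[source][dtype.upper()]
    except KeyError:
        raise ValueError(f"Unsupported {source} type: {dtype!r}")

print(source_dtype_to_feast_value_type("snowflake", "varchar"))  # STRING
```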
### Materialization

Feast serves feature values as [`Value`](https://github.com/feast-dev/feast/blob/master/protos/feast/types/Value.proto) proto objects, which have a type corresponding to Feast types.
Thus Feast must materialize feature values into the online store as `Value` proto objects.

* The local materialization engine first pulls the latest historical features and converts them to pyarrow.
* Then it calls `_convert_arrow_to_proto` to convert the pyarrow table to proto format.
* This calls `python_values_to_proto_values` in `type_map.py` to perform the type conversion.
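The final conversion step can be pictured with a toy version that tags each Python value with its Feast type (illustrative only; the real `python_values_to_proto_values` emits `Value` proto objects, not dicts):

```python
# Toy stand-in for python_values_to_proto_values: tag each Python value
# with a Feast type name instead of building a real Value proto.
PY_TO_FEAST = {bytes: "BYTES", str: "STRING", int: "INT64",
               float: "FLOAT64", bool: "BOOL"}

def python_value_to_tagged(value):
    # Exact type() lookup keeps bool (a subclass of int) mapped to BOOL.
    return {"type": PY_TO_FEAST[type(value)], "val": value}

row = {"driver_id": 1001, "conv_rate": 0.85, "is_active": True}
print({name: python_value_to_tagged(v) for name, v in row.items()})
```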
### Historical feature retrieval

The Feast type system is typically not necessary when retrieving historical features.
A call to `get_historical_features` will return a `RetrievalJob` object, which allows the user to export the results to one of several possible locations: a Pandas dataframe, a pyarrow table, a data lake (e.g. S3 or GCS), or the offline store (e.g. a Snowflake table).
In all of these cases, the type conversion is handled natively by the offline store.
For example, a BigQuery query exposes a `to_dataframe` method that will automatically convert the result to a dataframe, without requiring any conversions within Feast.
### Feature serving

As mentioned above in the section on [materialization](#materialization), Feast persists feature values into the online store as `Value` proto objects.
A call to `get_online_features` will return an `OnlineResponse` object, which essentially wraps a set of `Value` protos with some metadata.
Converting the `OnlineResponse` object into a Python dictionary calls `feast_value_type_to_python_type` from `type_map.py`, a utility that converts the Feast internal types to Python native types.
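The reverse direction can be sketched the same way, unwrapping a tagged value back into a native Python object (illustrative only; the real helper is `feast_value_type_to_python_type` in `type_map.py`, and the epoch-seconds encoding for timestamps is our assumption for this sketch):

```python
from datetime import datetime, timezone

# Toy stand-in for feast_value_type_to_python_type: unwrap a tagged
# value back to a native Python object. Timestamps are assumed (for
# this sketch) to be stored as epoch seconds.
def tagged_to_python(tagged):
    if tagged["type"] == "UNIX_TIMESTAMP":
        return datetime.fromtimestamp(tagged["val"], tz=timezone.utc)
    return tagged["val"]

response = {"conv_rate": {"type": "FLOAT64", "val": 0.85},
            "last_trip": {"type": "UNIX_TIMESTAMP", "val": 0}}
print({name: tagged_to_python(v) for name, v in response.items()})
```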
