
Commit 62623b3

YDBDOCS-191: translate FQ docs (#5549)
1 parent 7f78811 commit 62623b3

28 files changed: 1375 additions and 24 deletions

ydb/docs/en/core/concepts/federated_query.md

Lines changed: 0 additions & 1 deletion
This file was deleted.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
{% if oss == "true" %}Deploy the [connector](../architecture.md#connectors) {% else %}Deploy the connector{% endif %} and {% if oss == "true" %}[configure](../../../deploy/manual/deploy-ydb-federated-query.md) {% else %}configure{% endif %} the {{ ydb-short-name }} dynamic nodes to interact with it. Additionally, ensure network access from the {{ ydb-short-name }} dynamic nodes to the external data source (at the address specified in the `LOCATION` parameter of the `CREATE EXTERNAL DATA SOURCE` request). If network connection encryption to the external source was enabled in the previous step, the connector will use the system's root certificates (more details on TLS configuration can be found in the [guide](../../../deploy/manual/connector.md) on deploying the connector).
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
If a date value stored in the external data source falls outside the range supported by {{ ydb-short-name }} (from 1970-01-01 to 2105-12-31), {{ ydb-short-name }} converts it to `NULL`.
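
One way to gauge how many rows are affected is to count the `NULL` dates that reach {{ ydb-short-name }}. The query below is only a sketch: the table `events` and the column `created_at` are hypothetical names, and the count also includes values that were already `NULL` in the source, so it is an upper bound rather than an exact figure.

```sql
-- Hypothetical external table `events` with a date column `created_at`.
-- Counts rows whose date arrives as NULL: both source-side NULLs and
-- values outside the supported date range end up here.
SELECT COUNT(*) AS null_dates
FROM postgresql_datasource.events
WHERE created_at IS NULL;
```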
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
The {{ ydb-short-name }} federated query processing system can delegate the execution of certain parts of a query to the system acting as the data source. Query fragments are passed through {{ ydb-short-name }} directly to the external system and processed there. This optimization, known as "predicate pushdown", significantly reduces the volume of data transferred from the source to the federated query processing engine, which lowers network load and saves {{ ydb-short-name }} computational resources.
2+
3+
A specific case of predicate pushdown, where filtering expressions specified after the `WHERE` keyword are passed down, is called "filter pushdown". Filter pushdown is possible when using:
4+
5+
|Description|Example|
6+
|---|---|
7+
|Filters like `IS NULL`/`IS NOT NULL`|`WHERE column1 IS NULL` or `WHERE column1 IS NOT NULL`|
8+
|Logical conditions `OR`, `NOT`, `AND`|`WHERE column1 IS NULL OR column2 IS NOT NULL`|
9+
|Comparison conditions `=`, `<>`, `<`, `<=`, `>`, `>=` with other columns or constants|`WHERE column3 > column4 OR column5 <= 10`|
10+
11+
Supported data types for filter pushdown:
12+
13+
|{{ ydb-short-name }} Data Type|
14+
|----|
15+
|`Bool`|
16+
|`Int8`|
17+
|`Uint8`|
18+
|`Int16`|
19+
|`Uint16`|
20+
|`Int32`|
21+
|`Uint32`|
22+
|`Int64`|
23+
|`Uint64`|
24+
|`Float`|
25+
|`Double`|
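
To make the rules above concrete, here is a minimal sketch of a query whose filter fits them. The table `orders` and its columns are hypothetical; the example assumes `quantity` and `discount` map to numeric types from the list above.

```sql
-- Hypothetical external table `orders`; `quantity` and `discount` are
-- assumed to be numeric columns (for example, Int32 and Double).
-- The filter uses only comparisons, IS NOT NULL, and AND, so under the
-- rules above it is a candidate for filter pushdown.
SELECT *
FROM postgresql_datasource.orders
WHERE quantity > 10 AND discount IS NOT NULL;
```

By the same rules, a filter on a column whose type is not in the list (a string column, for instance) would presumably be evaluated by {{ ydb-short-name }} after the data has been transferred.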
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
External sources are available only for reading data through `SELECT` queries. The federated query processing engine currently does not support queries that modify tables in external sources.
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
1+
# Federated query processing system architecture
2+
3+
## External data sources and external tables
4+
5+
A key element of the federated query processing system in {{ ydb-full-name }} is the concept of an [external data source](../datamodel/external_data_source.md). Such sources can include relational DBMSs, object storage, and other data storage systems. When processing a federated query, {{ ydb-short-name }} streams data from external systems and allows performing the same range of operations on it as on local data.
6+
7+
To work with data located in external systems, {{ ydb-short-name }} must have information about the internal structure of this data (e.g., the number, names, and types of columns in tables). Some sources provide such metadata along with the data itself, whereas for other unschematized sources, this metadata must be provided externally. This latter purpose is served by [external tables](../datamodel/external_table.md).
8+
9+
Once external data sources and (if necessary) external tables are registered in {{ ydb-short-name }}, the client can start writing federated queries.
10+
11+
## Connectors {#connectors}
12+
13+
While executing federated queries, {{ ydb-short-name }} needs to access external data storage systems over the network, for which it uses their client libraries. Including such dependencies negatively affects the codebase size, compilation time, and binary file size of {{ ydb-short-name }}, as well as the product's overall stability.
14+
15+
The list of supported data sources for federated queries is constantly expanding. The most popular sources, such as [S3](s3), are natively supported by {{ ydb-short-name }}. However, not all users require support for all sources simultaneously. Support for other sources can be enabled optionally using _connectors_: special microservices that implement a unified interface for accessing external data sources.
16+
17+
The functions of connectors include:
18+
19+
* Translating YQL queries into queries in the language specific to the external source (e.g., into another SQL dialect or HTTP API calls).
20+
* Establishing network connections with data sources.
21+
* Converting data retrieved from external sources into the columnar [Arrow IPC Stream](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc) format supported by {{ ydb-short-name }}.
22+
23+
![YDB Federated Query Architecture](_assets/architecture.png "YDB Federated Query Architecture" =640x)
24+
25+
Thus, connectors form an abstraction layer that hides the specifics of external data sources from {{ ydb-short-name }}. The concise connector interface makes it easy to expand the list of supported sources with minimal changes to {{ ydb-short-name }}'s code.
26+
27+
Users can deploy [one of the ready-made connectors](../../deploy/manual/connector.md) or write their own implementation in any programming language according to the [gRPC specification](https://github.com/ydb-platform/ydb/tree/main/ydb/library/yql/providers/generic/connector/api).
28+
29+
## List of supported external data sources {#supported-datasources}
30+
31+
| Source | Support |
32+
|--------|---------|
33+
| [S3](https://aws.amazon.com/ru/s3/) | Built into `ydbd` |
34+
| [ClickHouse](https://clickhouse.com/) | Via connector [fq-connector-go](../../deploy/manual/connector.md#fq-connector-go) |
35+
| [PostgreSQL](https://www.postgresql.org/) | Via connector [fq-connector-go](../../deploy/manual/connector.md#fq-connector-go) |
36+
| [{{ydb-short-name}}](https://ydb.tech/) | Via connector [fq-connector-go](../../deploy/manual/connector.md#fq-connector-go) |
Lines changed: 95 additions & 1 deletion
@@ -1 +1,95 @@
1-
{% include [no-translation](../../_includes/alerts/no-translation.md) %}
1+
# Working with ClickHouse databases
2+
3+
This section provides basic information on working with external [ClickHouse](https://clickhouse.com) databases.
4+
5+
To work with an external ClickHouse database, complete the following steps:
6+
1. Create a [secret](../datamodel/secrets.md) containing the password to connect to the database.
7+
```sql
8+
CREATE OBJECT clickhouse_datasource_user_password (TYPE SECRET) WITH (value = "<password>");
9+
```
10+
1. Create an [external data source](../datamodel/external_data_source.md) describing the target database inside the ClickHouse cluster. To connect to ClickHouse, you can use either the [native TCP protocol](https://clickhouse.com/docs/en/interfaces/tcp) (`PROTOCOL="NATIVE"`) or the [HTTP protocol](https://clickhouse.com/docs/en/interfaces/http) (`PROTOCOL="HTTP"`). To enable encryption for connections to the external database, use the `USE_TLS="TRUE"` parameter.
11+
```sql
12+
CREATE EXTERNAL DATA SOURCE clickhouse_datasource WITH (
13+
SOURCE_TYPE="ClickHouse",
14+
LOCATION="<host>:<port>",
15+
DATABASE_NAME="<database>",
16+
AUTH_METHOD="BASIC",
17+
LOGIN="<login>",
18+
PASSWORD_SECRET_NAME="clickhouse_datasource_user_password",
19+
PROTOCOL="NATIVE",
20+
USE_TLS="TRUE"
21+
);
22+
```
23+
24+
1. {% include [!](_includes/connector_deployment.md) %}
25+
1. [Execute a query](#query) to the database.
26+
27+
28+
## Query syntax {#query}
29+
To work with ClickHouse, use the following SQL query form:
30+
31+
```sql
32+
SELECT * FROM clickhouse_datasource.<table_name>
33+
```
34+
35+
Where:
36+
- `clickhouse_datasource` is the identifier of the external data source;
37+
- `<table_name>` is the table's name within the external data source.
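
As a slightly fuller sketch of the form above, the query below filters and aggregates data read from the external source. The table `visits` and its columns are hypothetical names chosen for illustration; only `clickhouse_datasource` comes from the steps above.

```sql
-- Hypothetical ClickHouse table `visits` with columns `user_id` and `duration`.
SELECT
    user_id,
    COUNT(*) AS visit_count
FROM clickhouse_datasource.visits
WHERE duration > 60
GROUP BY user_id;
```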
38+
39+
## Limitations
40+
41+
There are several limitations when working with ClickHouse clusters:
42+
43+
1. {% include [!](_includes/supported_requests.md) %}
44+
1. {% include [!](_includes/datetime_limits.md) %}
45+
1. {% include [!](_includes/predicate_pushdown.md) %}
46+
47+
## Supported data types
48+
49+
By default, ClickHouse columns cannot physically contain `NULL` values. However, users can create tables with columns of optional or [nullable](https://clickhouse.com/docs/en/sql-reference/data-types/nullable) types. The column types displayed in {{ ydb-short-name }} when extracting data from the external ClickHouse database will depend on whether primitive or optional types are used in the ClickHouse table. Due to the previously discussed limitations of {{ ydb-short-name }} types used to store dates and times, all similar ClickHouse types are displayed in {{ ydb-short-name }} as [optional](../../yql/reference/types/optional.md).
50+
51+
Below are the mapping tables for ClickHouse and {{ ydb-short-name }} types. Data types that are not listed are not supported.
52+
53+
### Primitive data types
54+
55+
|ClickHouse data type|{{ ydb-full-name }} data type|Notes|
56+
|---|----|------|
57+
|`Bool`|`Bool`||
58+
|`Int8`|`Int8`||
59+
|`UInt8`|`Uint8`||
60+
|`Int16`|`Int16`||
61+
|`UInt16`|`Uint16`||
62+
|`Int32`|`Int32`||
63+
|`UInt32`|`Uint32`||
64+
|`Int64`|`Int64`||
65+
|`UInt64`|`Uint64`||
66+
|`Float32`|`Float`||
67+
|`Float64`|`Double`||
68+
|`Date`|`Date`||
69+
|`Date32`|`Optional<Date>`|Valid date range from 1970-01-01 to 2105-12-31. Values outside this range return `NULL`.|
70+
|`DateTime`|`Optional<DateTime>`|Valid time range from 1970-01-01 00:00:00 to 2105-12-31 23:59:59. Values outside this range return `NULL`.|
71+
|`DateTime64`|`Optional<Timestamp>`|Valid time range from 1970-01-01 00:00:00 to 2105-12-31 23:59:59. Values outside this range return `NULL`.|
72+
|`String`|`String`||
73+
|`FixedString`|`String`|Null bytes in `FixedString` are transferred to `String` unchanged.|
74+
75+
### Optional data types
76+
77+
|ClickHouse data type|{{ ydb-full-name }} data type|Notes|
78+
|---|----|------|
79+
|`Nullable(Bool)`|`Optional<Bool>`||
80+
|`Nullable(Int8)`|`Optional<Int8>`||
81+
|`Nullable(UInt8)`|`Optional<Uint8>`||
82+
|`Nullable(Int16)`|`Optional<Int16>`||
83+
|`Nullable(UInt16)`|`Optional<Uint16>`||
84+
|`Nullable(Int32)`|`Optional<Int32>`||
85+
|`Nullable(UInt32)`|`Optional<Uint32>`||
86+
|`Nullable(Int64)`|`Optional<Int64>`||
87+
|`Nullable(UInt64)`|`Optional<Uint64>`||
88+
|`Nullable(Float32)`|`Optional<Float>`||
89+
|`Nullable(Float64)`|`Optional<Double>`||
90+
|`Nullable(Date)`|`Optional<Date>`||
91+
|`Nullable(Date32)`|`Optional<Date>`|Valid date range from 1970-01-01 to 2105-12-31. Values outside this range return `NULL`.|
92+
|`Nullable(DateTime)`|`Optional<DateTime>`|Valid time range from 1970-01-01 00:00:00 to 2105-12-31 23:59:59. Values outside this range return `NULL`.|
93+
|`Nullable(DateTime64)`|`Optional<Timestamp>`|Valid time range from 1970-01-01 00:00:00 to 2105-12-31 23:59:59. Values outside this range return `NULL`.|
94+
|`Nullable(String)`|`Optional<String>`||
95+
|`Nullable(FixedString)`|`Optional<String>`|Null bytes in `FixedString` are transferred to `String` unchanged.|
Lines changed: 17 additions & 1 deletion
@@ -1 +1,17 @@
1-
{% include [no-translation](../../_includes/alerts/no-translation.md) %}
1+
# Federated queries
2+
3+
{% note warning %}
4+
5+
This functionality is in "Preview" mode.
6+
7+
{% endnote %}
8+
9+
Federated queries allow retrieving information from various data sources without transferring the data from these sources into {{ ydb-full-name }} storage. Currently, federated queries support interaction with ClickHouse, PostgreSQL, and S3-compatible data stores. Using YQL queries, you can access these systems without duplicating data between them.
10+
11+
To work with data stored in external DBMSs, it is sufficient to create an [external data source](../datamodel/external_data_source.md). To work with unstructured data stored in S3 buckets, you additionally need to create an [external table](../datamodel/external_table.md). In both cases, you must first create [secrets](../datamodel/secrets.md) that store the confidential data required for authentication in external systems.
12+
13+
You can learn about the internals of the federated query processing system in the [architecture](./architecture.md) section. Detailed information on working with various data sources is provided in the corresponding sections:
14+
- [ClickHouse](clickhouse.md)
15+
- [PostgreSQL](postgresql.md)
16+
- [{{ ydb-short-name }}](ydb.md)
17+
- [S3](s3/external_table.md)
Lines changed: 75 additions & 1 deletion
@@ -1 +1,75 @@
1-
{% include [no-translation](../../_includes/alerts/no-translation.md) %}
1+
# Working with PostgreSQL databases
2+
3+
This section provides basic information on working with external [PostgreSQL](http://postgresql.org) databases.
4+
5+
To work with an external PostgreSQL database, you need to follow these steps:
6+
1. Create a [secret](../datamodel/secrets.md) containing the password for connecting to the database.
7+
```sql
8+
CREATE OBJECT postgresql_datasource_user_password (TYPE SECRET) WITH (value = "<password>");
9+
```
10+
1. Create an [external data source](../datamodel/external_data_source.md) that describes a specific database within the PostgreSQL cluster. By default, the [namespace](https://www.postgresql.org/docs/current/catalog-pg-namespace.html) `public` is used for reading, but this value can be changed using the optional `SCHEMA` parameter. The network connection uses the standard [Frontend/Backend Protocol](https://www.postgresql.org/docs/current/protocol.html) over TCP (`PROTOCOL="NATIVE"`). To enable encryption for connections to the external database, use the `USE_TLS="TRUE"` parameter.
11+
```sql
12+
CREATE EXTERNAL DATA SOURCE postgresql_datasource WITH (
13+
SOURCE_TYPE="PostgreSQL",
14+
LOCATION="<host>:<port>",
15+
DATABASE_NAME="<database>",
16+
AUTH_METHOD="BASIC",
17+
LOGIN="<login>",
18+
PASSWORD_SECRET_NAME="postgresql_datasource_user_password",
19+
PROTOCOL="NATIVE",
20+
USE_TLS="TRUE",
21+
SCHEMA="<schema>"
22+
);
23+
```
24+
1. {% include [!](_includes/connector_deployment.md) %}
25+
1. [Execute a query](#query) to the database.
26+
27+
## Query syntax { #query }
28+
The following SQL query format is used to work with PostgreSQL:
29+
30+
```sql
31+
SELECT * FROM postgresql_datasource.<table_name>
32+
```
33+
34+
where:
35+
- `postgresql_datasource` is the identifier of the external data source;
36+
- `<table_name>` is the table's name within the external data source.
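
Because data from external sources can be processed with the same operations as local data, an external table can also be combined with a regular {{ ydb-short-name }} table. The following is a sketch under assumptions: `orders` (in PostgreSQL) and `customers` (stored in {{ ydb-short-name }}) are hypothetical tables, as are all column names.

```sql
-- Hypothetical names: `orders` lives in the external PostgreSQL database,
-- `customers` is a regular table stored in YDB.
SELECT
    c.name,
    o.total
FROM postgresql_datasource.orders AS o
JOIN customers AS c ON o.customer_id = c.id
WHERE o.total > 100;
```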
37+
38+
## Limitations
39+
40+
When working with PostgreSQL clusters, there are a number of limitations:
41+
42+
1. {% include [!](_includes/supported_requests.md) %}
43+
1. {% include [!](_includes/datetime_limits.md) %}
44+
1. {% include [!](_includes/predicate_pushdown.md) %}
45+
46+
## Supported data types
47+
48+
In PostgreSQL, the optionality of column values (whether a column can contain `NULL` values) is not part of the data type system. The `NOT NULL` constraint for each column is implemented as the `attnotnull` attribute in the system catalog [pg_attribute](https://www.postgresql.org/docs/current/catalog-pg-attribute.html), i.e., at the metadata level of the table. Therefore, all basic PostgreSQL types can contain `NULL` values by default, and in the {{ ydb-full-name }} type system they are mapped to [optional](../../yql/reference/types/optional.md) types.
49+
50+
Below is the mapping table for PostgreSQL and {{ ydb-short-name }} types. Data types that are not listed are not supported.
51+
52+
| PostgreSQL Data Type | {{ ydb-full-name }} Data Type | Notes |
53+
|---|----|------|
54+
| `boolean` | `Optional<Bool>` ||
55+
| `smallint` | `Optional<Int16>` ||
56+
| `int2` | `Optional<Int16>` ||
57+
| `integer` | `Optional<Int32>` ||
58+
| `int` | `Optional<Int32>` ||
59+
| `int4` | `Optional<Int32>` ||
60+
| `serial` | `Optional<Int32>` ||
61+
| `serial4` | `Optional<Int32>` ||
62+
| `bigint` | `Optional<Int64>` ||
63+
| `int8` | `Optional<Int64>` ||
64+
| `bigserial` | `Optional<Int64>` ||
65+
| `serial8` | `Optional<Int64>` ||
66+
| `real` | `Optional<Float>` ||
67+
| `float4` | `Optional<Float>` ||
68+
| `double precision` | `Optional<Double>` ||
69+
| `float8` | `Optional<Double>` ||
70+
| `date` | `Optional<Date>` | Valid date range from 1970-01-01 to 2105-12-31. Values outside this range return `NULL`. |
71+
| `timestamp` | `Optional<Timestamp>` | Valid time range from 1970-01-01 00:00:00 to 2105-12-31 23:59:59. Values outside this range return `NULL`. |
72+
| `bytea` | `Optional<String>` ||
73+
| `character` | `Optional<Utf8>` | [Default collation rules](https://www.postgresql.org/docs/current/collation.html), string padded with spaces to the required length. |
74+
| `character varying` | `Optional<Utf8>` | [Default collation rules](https://www.postgresql.org/docs/current/collation.html). |
75+
| `text` | `Optional<Utf8>` | [Default collation rules](https://www.postgresql.org/docs/current/collation.html). |

ydb/docs/en/core/concepts/federated_query/s3/_includes/create_external_table.md

Whitespace-only changes.
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
|Path format|Description|Example|
2+
|----|----|---|
3+
|Path ends with a `/`|Path to a directory|The path `/a/` addresses all contents of the directory:<br>`/a/b/c/d/1.txt`<br>`/a/b/2.csv`|
4+
|Path contains a wildcard character `*`|Any files nested in the path|The path `/a/*.csv` addresses files in directories:<br>`/a/b/c/1.csv`<br>`/a/2.csv`<br>`/a/b/c/d/e/f/g/2.csv`|
5+
|Path does not end with `/` and does not contain wildcard characters|Path to a single file|The path `/a/b.csv` addresses the specific file `/a/b.csv`|
