Add enrich source field mapper. #42423

Closed

martijnvg
Member

The enrich source field mapper stores the source of a document as
binary doc values. This is useful when retrieval speed matters more
than compact storage (which is what SourceFieldMapper optimizes for),
as is the case for the enrich processor.

Prior to this change, the enrich processor used the _source stored field
to fetch the enrich document that enriches the document being ingested.

When creating the enrich index, the enrich policy runner disables the
_source meta field and enables the _enrich_source meta field.
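
As a rough illustration of the mapping change described above, the sketch below builds a mapping body with plain Java maps. The `"_source": {"enabled": false}` part follows the usual Elasticsearch mapping shape; the `"_enrich_source"` key taking an `"enabled"` option is an assumption made for this example, and the class name is hypothetical. It is not taken from the PR's code.

```java
import java.util.Map;

// Hypothetical sketch: the "_enrich_source" key with an "enabled" option is an
// assumption based on the PR description, not a confirmed Elasticsearch API.
final class EnrichIndexMappingSketch {

    // Builds a mapping body that turns off _source and turns on _enrich_source.
    static Map<String, Object> enrichIndexMapping() {
        return Map.of(
            "_source", Map.of("enabled", false),
            "_enrich_source", Map.of("enabled", true)
        );
    }

    public static void main(String[] args) {
        // Prints the nested map; key order is not guaranteed by Map.of.
        System.out.println(enrichIndexMapping());
    }
}
```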

The enrich source field is internal and only meant to be used by
the enrich feature.

Relates to #41521 and #32789

@martijnvg martijnvg added the >non-issue and :Data Management/Ingest Node labels May 23, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@martijnvg
Member Author

martijnvg commented May 23, 2019

Running a benchmark using this track with the new meta field type already shows good performance improvements:

| Metric | Task | Baseline | Contender | Diff | Unit |
|---|---|---|---|---|---|
| Min Throughput | insert-messages-with-enrich-pipeline | 11260.4 | 19572 | 8311.57 | docs/s |
| Median Throughput | insert-messages-with-enrich-pipeline | 15292.7 | 23454 | 8161.33 | docs/s |
| Max Throughput | insert-messages-with-enrich-pipeline | 16502.6 | 24143.1 | 7640.51 | docs/s |
| 50th percentile latency | insert-messages-with-enrich-pipeline | 553.256 | 356.66 | -196.596 | ms |
| 90th percentile latency | insert-messages-with-enrich-pipeline | 687.964 | 567.713 | -120.25 | ms |
| 100th percentile latency | insert-messages-with-enrich-pipeline | 814.368 | 763.496 | -50.8725 | ms |
| 50th percentile service time | insert-messages-with-enrich-pipeline | 553.256 | 356.66 | -196.596 | ms |
| 90th percentile service time | insert-messages-with-enrich-pipeline | 687.964 | 567.713 | -120.25 | ms |
| 100th percentile service time | insert-messages-with-enrich-pipeline | 814.368 | 763.496 | -50.8725 | ms |
| error rate | insert-messages-with-enrich-pipeline | 0 | 0 | 0 | % |

(the baseline is what is in the enrich branch and the contender what is in this PR)
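
To put the absolute differences in relative terms, the small sketch below computes the percent change of the contender against the baseline for two rows of the table. With the reported numbers this works out to roughly 53% higher median throughput and roughly 36% lower 50th percentile latency. The class and method names are made up for this illustration.

```java
// Illustrative helper (not part of the PR): relative change between two benchmark values.
final class BenchmarkDelta {

    // Percent change of the contender relative to the baseline.
    static double percentChange(double baseline, double contender) {
        return (contender - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Median throughput (docs/s) and 50th percentile latency (ms) from the benchmark table.
        System.out.printf("median throughput: %+.1f%%%n", percentChange(15292.7, 23454.0));
        System.out.printf("p50 latency: %+.1f%%%n", percentChange(553.256, 356.66));
    }
}
```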

@jpountz
Contributor

jpountz left a comment

It looks good. Something I'm wondering is if we can prevent users from using this field on their regular indices?


@Override
public SortedBinaryDocValues getBytesValues() {
    return new SortedBinaryDocValues() {

You can use FieldData#singleton(values) here.

@martijnvg
Member Author

Thanks for reviewing @jpountz!

> Something I'm wondering is if we can prevent users from using this field on their regular indices?

Perhaps enforcing that the index name should start with .enrich- is enough? All enrich indices will have this prefix (EnrichPolicy#ENRICH_INDEX_NAME_BASE).
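
A minimal sketch of what such a prefix check might look like, assuming `EnrichPolicy#ENRICH_INDEX_NAME_BASE` has the value `.enrich-` as stated in the comment above. The class, the method names, and the error message are hypothetical and not taken from the PR.

```java
// Hypothetical sketch of the proposed restriction; not the PR's actual code.
final class EnrichIndexNameCheck {

    // Assumed to mirror EnrichPolicy#ENRICH_INDEX_NAME_BASE, per the comment above.
    static final String ENRICH_INDEX_NAME_BASE = ".enrich-";

    // True if the index name falls in the enrich index namespace.
    static boolean isEnrichIndex(String indexName) {
        return indexName != null && indexName.startsWith(ENRICH_INDEX_NAME_BASE);
    }

    // Rejects use of the _enrich_source meta field on any index without the enrich prefix.
    static void validateEnrichSourceAllowed(String indexName) {
        if (!isEnrichIndex(indexName)) {
            throw new IllegalArgumentException(
                "[_enrich_source] can only be used on indices starting with ["
                    + ENRICH_INDEX_NAME_BASE + "], got [" + indexName + "]");
        }
    }

    public static void main(String[] args) {
        validateEnrichSourceAllowed(".enrich-my-policy"); // passes silently
        try {
            validateEnrichSourceAllowed("logs-2019"); // regular index, rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A name-based check like this is easy to enforce at mapping time, although it relies on regular users not being able to create indices with the same prefix.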

@jpountz
Contributor

jpountz commented May 29, 2019

👍

@martijnvg martijnvg requested a review from jpountz May 29, 2019 15:34
@martijnvg
Member Author

This change is no longer relevant.

@martijnvg martijnvg closed this Jul 10, 2019