DotExpandingXContentParser to expose the original token location #84970

javanna · 2022-03-15T10:53:12Z

With #79922 we introduced a parser that expands dots in field names on the fly, so that consumers no longer need to handle the expansion themselves.

The token location exposed by such a parser can be confusing to interpret. Consumers parse the expanded version, which requires the parser to jump ahead reading tokens and to expose additional field names and start objects, whereas users sent the unexpanded version and expect errors to refer to the original content.

This commit adds a test for this scenario and tweaks the DotExpandingXContentParser to cache the token location before jumping ahead to expand dots in field names.
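
To illustrate the idea of caching the location before reading ahead, here is a minimal, self-contained sketch. It does not use the real XContentParser API: `ListParser`, `DotExpandingSketch`, `Token`, `Entry` and `Location` are hypothetical stand-ins, and the sketch assumes well-formed dotted names with scalar values only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class TokenLocationSketch {

    record Location(int line, int column) {}
    record Token(String type, String text) {}
    record Entry(Token token, Location location) {}

    /** Hypothetical stand-in for the wrapped parser: replays a fixed list of tokens. */
    static final class ListParser {
        private final List<Entry> entries;
        private int index = -1;

        ListParser(List<Entry> entries) {
            this.entries = entries;
        }

        Token nextToken() {
            index++;
            return index < entries.size() ? entries.get(index).token() : null;
        }

        Location getTokenLocation() {
            return entries.get(index).location();
        }
    }

    /** Hypothetical expanding wrapper; handles names like "a.b" with scalar values only. */
    static final class DotExpandingSketch {
        private final ListParser delegate;
        private final Deque<Token> expansion = new ArrayDeque<>(); // synthetic tokens still to expose
        private Token readAhead;         // value token already consumed from the delegate
        private int pendingEndObjects;   // synthetic END_OBJECT tokens still to expose
        private Location cachedLocation; // location of the dotted field name, while expanding

        DotExpandingSketch(ListParser delegate) {
            this.delegate = delegate;
        }

        Token nextToken() {
            if (!expansion.isEmpty()) {
                return expansion.poll();
            }
            if (readAhead != null) {
                Token value = readAhead;
                readAhead = null;
                cachedLocation = null; // the delegate already points at the value
                return value;
            }
            if (pendingEndObjects > 0) {
                pendingEndObjects--;
                return new Token("END_OBJECT", "}");
            }
            Token token = delegate.nextToken();
            if (token == null || !token.type().equals("FIELD_NAME") || !token.text().contains(".")) {
                return token;
            }
            // Cache the location BEFORE jumping ahead: after the read-ahead
            // the delegate reports the location of the value instead.
            cachedLocation = delegate.getTokenLocation();
            readAhead = delegate.nextToken();
            String[] parts = token.text().split("\\.");
            for (int i = 1; i < parts.length; i++) {
                expansion.add(new Token("START_OBJECT", "{"));
                expansion.add(new Token("FIELD_NAME", parts[i]));
            }
            pendingEndObjects = parts.length - 1;
            return new Token("FIELD_NAME", parts[0]);
        }

        Location getTokenLocation() {
            return cachedLocation != null ? cachedLocation : delegate.getTokenLocation();
        }
    }

    /** Walks the tokens of {"foo.bar": 1} and records each token with its reported location. */
    static List<String> demo() {
        ListParser delegate = new ListParser(List.of(
            new Entry(new Token("START_OBJECT", "{"), new Location(1, 1)),
            new Entry(new Token("FIELD_NAME", "foo.bar"), new Location(1, 2)),
            new Entry(new Token("VALUE", "1"), new Location(1, 13)),
            new Entry(new Token("END_OBJECT", "}"), new Location(1, 14))
        ));
        DotExpandingSketch parser = new DotExpandingSketch(delegate);
        List<String> trace = new ArrayList<>();
        for (Token t = parser.nextToken(); t != null; t = parser.nextToken()) {
            Location l = parser.getTokenLocation();
            trace.add(t.type() + " " + t.text() + " @" + l.line() + ":" + l.column());
        }
        return trace;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this sketch the synthetic `START_OBJECT` and `FIELD_NAME bar` tokens report the location of the original `foo.bar` field name (1:2). Without the cached value they would report 1:13, the position of the value that the delegate has already moved to.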

elasticmachine · 2022-03-15T10:53:17Z

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine · 2022-03-15T10:53:37Z

Hi @javanna, I've created a changelog YAML for you.

javanna · 2022-03-15T10:56:10Z

libs/x-content/src/test/java/org/elasticsearch/xcontent/DotExpandingXContentParserTests.java

@@ -166,4 +166,64 @@ public void testNestedExpansions() throws IOException {
            {"first.dot":{"second.dot":"value","third":"value"},"nodots":"value"}\
            """);
    }
+
+    public void testGetTokenLocation() throws IOException {


I meant to add a test that reproduces this issue when indexing a document, but I did not find an easy way to do so. It turns out that many of the errors we throw as MapperParsingException don't actually hold the xcontent location, and the ones that do are parse exceptions coming directly from Jackson. For the latter, the token location is misaligned only while expanding a start or end object, which makes it hard to recreate a situation where the wrong token location would be returned without my fix.

All this said, I came to reconsider how important this fix is. Maybe in practice nobody will ever notice...?

Or maybe we should be including xcontent location in more parsing exceptions? It's a bit odd that we use the same exception type for parsing mappings and parsing documents as well.
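
As a rough illustration of what carrying the location in a document parsing error could look like, here is a minimal sketch. `LocatedParsingException`, `Location` and `parseIntField` are hypothetical names made up for this example, not existing Elasticsearch classes or methods.

```java
final class LocatedParsingExceptionSketch {

    /** Line/column in the content that the user sent. */
    record Location(int line, int column) {}

    /** Hypothetical document parsing exception that always carries a location. */
    static final class LocatedParsingException extends RuntimeException {
        private final Location location;

        LocatedParsingException(Location location, String message, Throwable cause) {
            super("[" + location.line() + ":" + location.column() + "] " + message, cause);
            this.location = location;
        }

        Location location() {
            return location;
        }
    }

    /** Parses an integer field value, attaching the token location on failure. */
    static int parseIntField(String fieldName, String rawValue, Location valueLocation) {
        try {
            return Integer.parseInt(rawValue);
        } catch (NumberFormatException e) {
            throw new LocatedParsingException(valueLocation, "failed to parse field [" + fieldName + "]", e);
        }
    }

    public static void main(String[] args) {
        try {
            parseIntField("foo.bar", "abc", new Location(1, 13));
        } catch (LocatedParsingException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With something along these lines, the location would travel with the error regardless of whether the failure originated in Jackson or in the mapper code.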

I opened #85083

javanna · 2022-03-15T12:00:06Z

run elasticsearch-ci/part-2

javanna · 2022-03-15T12:00:19Z

run elasticsearch-ci/bwc

romseygeek

LGTM

elasticsearchmachine · 2022-03-16T14:11:36Z

💚 Backport successful

Status	Branch	Result
✅	8.1

javanna added >bug :Search/Search Search-related issues that do not fall into other categories v8.2.0 v8.1.1 labels Mar 15, 2022

javanna requested a review from romseygeek March 15, 2022 10:53

elasticmachine added the Team:Search Meta label for search team label Mar 15, 2022

javanna added :Search Foundations/Mapping Index mappings, including merging and defining field types and removed :Search/Search Search-related issues that do not fall into other categories labels Mar 15, 2022

romseygeek approved these changes Mar 16, 2022

javanna changed the base branch from master to main March 16, 2022 10:53

javanna added 3 commits March 16, 2022 11:55

remove leftover test

4f02bc9

Update docs/changelog/84970.yaml

7474c98

javanna force-pushed the fix/dot_expand_token_location branch from 08bbbaf to 7474c98 Compare March 16, 2022 10:56

javanna changed the base branch from main to master March 16, 2022 10:58

javanna added the auto-backport Automatically create backport pull requests when merged label Mar 16, 2022

javanna merged commit 4a26ed2 into elastic:master Mar 16, 2022

javanna mentioned this pull request Mar 16, 2022

[8.1] DotExpandingXContentParser to expose the original token location (#84970) #85032

Merged
