Skip to content

Format doc values fields. #22146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Dec 13, 2016

Currently docvalues_fields return the values of the fields as they are stored
in doc values. I don't like that it exposes implementation details, but there
are also user-facing issues like the fact it cannot work with binary fields.
This change will also make it easier for users to reindex if they do not store
the source, since docvalues_fields will return data is such a format that it
can be put in an indexing request with the same mappings.

The hard part of the change is backward compatibility, since it is breaking.
The approach taken here is that 5.x will keep exposing the internal
representation, with a special format name called use_field_format which
will format the field depending on how it is mapped. This will become the
default in 6.0, and this hardcoded format name will be removed in 7.0 to ease
the transition from 5.x to 6.x.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea. I like anything that makes it clear that these fields have been processed somehow and we aren't just returning them directly from source.

* @param name The field to get from the docvalue
* @param format How to format the field, {@code null} to use defaults.
*/
public SearchRequestBuilder addDocValueField(String name, String format) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're usually OK with making breaking changes to the Transport API in a minor release it is probably ok to add the format parameter to this method if you think it makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know Kibana uses this feature to retrieve dates in millis since Epoch, so with this option they will be able to pass format: epoch_millis when we format docvalue fields. I also plan to use it for the transition period: users will be able in 5.x to enable the 6.x behaviour by using the special format use_field_defaults.

* @param name name of the field
* @param format how to format the field
*/
public InnerHitBuilder addDocValueField(String name, String format) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make format @Nullable and describe what null means here?

@clintongormley clintongormley added the :Search/Search Search-related issues that do not fall into other categories label Dec 14, 2016
@jimczi
Copy link
Contributor

jimczi commented Dec 14, 2016

+1 as well. I think it's useful for dates, it also fixes problems we can have with binary doc_values. Not sure about the name of the BWC param though, in the code you use use_field_format and in the docs you use use_field_defaults. What about use_field_mapping ?

@jpountz
Copy link
Contributor Author

jpountz commented Dec 14, 2016

Woooooops, good catch. I like use_field_mapping too, I'll wait for this to be discussed in Fixit Friday to see what others think about it.


@Override
public void writeTo(StreamOutput out) throws IOException {
out.writeString(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this up under the reading constructor? It helps to have them on the same screen.

--------------------------------------------------
GET _search
{
"docvalue_fields" : [ { "name": "my_date", "format": "epoch_millis" } ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep this as the default if possible in 5.x so we aren't making breaking changes, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

@jpountz
Copy link
Contributor Author

jpountz commented Dec 27, 2016

I'm removing the discuss label since we agreed to do this in FixitFriday. I also opened another PR for 5.x to better show the bw compat layer: #22354.

"docvalue_fields" : [ { "name": "my_date", "format": "epoch_millis" } ]
}
--------------------------------------------------
// CONSOLE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optionally, you may want to do something like // TEST[setup:twitter] to create a small index so the test generated from the snippet will find anything an exercise the snippet more fully. It certainly isn't required but I've been trying to do it lately.

@jpountz jpountz force-pushed the fix/format_doc_values_fields branch from bf181d9 to e782af4 Compare January 31, 2017 16:40
Currently `docvalues_fields` return the values of the fields as they are stored
in doc values. I don't like that it exposes implementation details, but there
are also user-facing issues like the fact it cannot work with binary fields.
This change will also make it easier for users to reindex if they do not store
the source, since `docvalues_fields` will return data is such a format that it
can be put in an indexing request with the same mappings.

The hard part of the change is backward compatibility, since it is breaking.
The approach taken here is that 5.x will keep exposing the internal
representation, with a special format name called `use_field_format` which
will format the field depending on how it is mapped. This will become the
default in 6.0, and this hardcoded format name will be removed in 7.0 to ease
the transition from 5.x to 6.x.
@jpountz jpountz force-pushed the fix/format_doc_values_fields branch from e782af4 to d1490e1 Compare April 21, 2017 07:22
@rjernst
Copy link
Member

rjernst commented Jun 9, 2017

@jpountz Should this PR (and the related 5.x PR) be brought up to date and merged?

@dakrone
Copy link
Member

dakrone commented Aug 15, 2017

@jpountz ping on this issue again for updating and merging (or closing if it's not needed any more)

@jpountz
Copy link
Contributor Author

jpountz commented Aug 16, 2017

this PR needs sync with Kibana in order to be merged so I'll do it after 6.0 is out

@jpountz jpountz closed this Oct 10, 2017
@lcawl lcawl added v6.0.0-rc2 and removed v6.0.0 labels Oct 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories v5.4.4 v6.0.0-rc2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants