Allow invalid values to be ignored #493

Closed
apatrida opened this issue Nov 8, 2010 · 5 comments

Comments

@apatrida
Contributor

apatrida commented Nov 8, 2010

For datatypes that expect a specific format, allow a variation of the type that is "soft", in that invalid values are simply ignored. This is important for unclean data: although other work may allow writing a document-processing plugin for ES that cleans data (e.g. parses human-readable dates into a standard form), you can still have rough data that you don't want to abort your indexing, especially for fields that are not that important but where you want a best attempt at inserting the document.

You could then also consider marking documents that have validation errors so they could later be rescanned and reindexed from the stored JSON, once a cleaner is added that might resolve the problem. For example, mark the document as having a validation error for field XX, later search for those docs, and ask the system to "reindex document" from its own stored form.

So 3 things here:

  • be able to mark a field mapping to allow invalid data to be ignored (discarded) on a field-by-field basis
  • have the system mark the record as not passing validation, at a per-field level
  • be able to ask the system to reindex by ID using the stored JSON, or even reindex-by-query (Reindex from _source by document ID or Query #492)
@kimchy
Member

kimchy commented Nov 9, 2010

Currently, the only field type that qualifies for this is the date field type. We can add something like that to it, and if it fails to parse, just not index it. You can always query for documents that don't have this field and then handle them. If the actual value still needs to be stored as well, it can be a multi_field mapping: one with a plain string type, and one with a date in this "soft" format.
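For illustration, the multi_field suggestion above might have looked like the sketch below, using the multi_field syntax of that era (since replaced by the `fields` sub-object in current Elasticsearch); the type and field names are illustrative:

```json
{
  "tweet": {
    "properties": {
      "created": {
        "type": "multi_field",
        "fields": {
          "created": { "type": "date" },
          "raw": { "type": "string" }
        }
      }
    }
  }
}
```

Documents whose `created` value failed to parse would then lack the date sub-field and could be found with a missing/exists filter, as the comment suggests, while `raw` preserves the original text.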

@apatrida
Contributor Author

apatrida commented Nov 9, 2010

Wouldn't numeric fields that don't parse also qualify (or are they handled earlier, when the JSON is created)?

@kimchy
Member

kimchy commented Nov 9, 2010

Yeah, JSON already handles it, since it has native types for numbers.
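To illustrate the distinction: JSON has a native number type, so a bad number is caught when the document itself is parsed, whereas a date is just a JSON string that only fails when the date mapper tries to parse it. In the illustrative document below:

```json
{
  "price": 9.99,
  "created": "not-a-date"
}
```

`price` arrives already typed as a number, while `created` is perfectly valid JSON and only becomes an error at the mapping stage — which is why only date-like fields needed the "soft" treatment at the time.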

@apatrida
Contributor Author

apatrida commented Nov 9, 2010

dandy.

@clintongormley
Contributor

Also, there is now a flag for ignoring malformed values.
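The flag being referred to is `ignore_malformed`, which can be set per field on date and numeric mappings; a sketch of a current-style mapping:

```json
{
  "mappings": {
    "properties": {
      "created": {
        "type": "date",
        "ignore_malformed": true
      }
    }
  }
}
```

With this set, a document whose `created` value fails to parse is still indexed, just without that field. Later Elasticsearch versions also record such documents in the `_ignored` metadata field, which addresses the "mark the record as not passing validation" part of the original request.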

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Oct 2, 2023
PR#491 introduced a check for the presence of `/etc/crypttab` but there
was an Ansible bug that this commit fixes.