Skip to content

add ipv6 field support #5758

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

add ipv6 field support #5758

wants to merge 1 commit into from

Conversation

kzwang
Copy link
Contributor

@kzwang kzwang commented Apr 10, 2014

What I've done is copied the NumericTokenStream, NumericRangeFilter, NumericRangeQuery and NumericUtils from lucene and change those to support BigInteger

the term query and range query/filter are both working, but there are still other things like field data need to do

@kimchy @jpountz @dadoonet can you have a quick look just to make sure I've done the correct thing so far?

closes #3714

@jpountz
Copy link
Contributor

jpountz commented Apr 10, 2014

Thank you @kzwang , I think this is the right approach in general for managing ipv6 addresses.

However, I'd like to be careful here since this decides on how we are going to index these addresses, and it won't be easy to change the format in the future since we need to maintain backward compatibility. For that reason, maybe the index-time range support for types that are greater than 64 bits should be contributed to Lucene first in order to have ideas/feedback from other Lucene committers? For example, I'm wondering if we should support any fixed length data like you did or if we could just write something that would support variable lengths.

Regarding the field mapper, I'm wondering if we need a dedicated field mapper for ipv6 addresses or if we should try to write a field mapper that would both support ipv4 and ipv6 addresses. I think the latter option would be more convenient from the user perspective, but maybe it would also be a problem because it would require us to be able to distinguish ipv4 and ipv6 addresses at both query and indexing time?

@jpountz jpountz self-assigned this Apr 10, 2014
@uboness
Copy link
Contributor

uboness commented Apr 10, 2014

+1 on pushing index formatting to Lucene and not tie it to ES

Regarding the field mapper, I'm wondering if we need a dedicated field mapper for ipv6 addresses or if we should try to write a field mapper that would both support ipv4 and ipv6 addresses. I think the latter option would be more convenient from the user perspective, but maybe it would also be a problem because it would require us to be able to distinguish ipv4 and ipv6 addresses at both query and indexing time?

I'd like to see us pursue adding this support to the ip mapper we already have. Note that the type name doesn't indicate the version of the ip... so it can be confusing and inconsistent to some degree now that we'll support both. @jpountz what problems do you expect at index/query time with having a single mapper? I believe we can have a setting on the mapper for the version where the default is set to IPv4 (for bwc)

@jpountz
Copy link
Contributor

jpountz commented Apr 10, 2014

@jpountz what problems do you expect at index/query time with having a single mapper? I believe we can have a setting on the mapper for the version where the default is set to IPv4 (for bwc)

The scenario I was thinking about is a user indexing Apache access logs who has a field in Elasticsearch that is used to store ip addresses. I don't think we should expect clients to use different fields depending on the version of the ip address, so I guess we either need:

  • to be able to support both v4 and v6 addresses in the same index field,
  • or have two index fields (one for v4 and one for v6) and add a bridge on top of them that would redirect to the appropriate field based on the version of the ip that is being indexed/searched.

I think the 2nd option would make bw compatibility easier to maintain but maybe it would also raise issues (eg. currently field mappers can only expose a single FieldMappers.Names.indexName()).

@kzwang
Copy link
Contributor Author

kzwang commented Apr 11, 2014

I've created a issue in lucene (https://issues.apache.org/jira/browse/LUCENE-5596) and I'll move those code to there

@uschindler
Copy link
Contributor

Hi, we started to discuss on the Lucene issue already,
To me it looks wrong to use BigInteger at all for IPv6 addresses. IP adresses in elasticsearch should use the raw bytes as returned by InetAddress#getAddress() which are in network byte order. Network byte order is sortable as we want to have it, without any signedness issues.

The proposal in Lucene is to not provide Big numeric support at all, just allow the precision step stuff also work on binary terms. We also try to allow indexing "binary" terms (which is supported under the hood by lucene) more easily.

In that case, ES is repsonsible to create a byte[] out of the network addresse (or whatever type) and index it. DocValues and Stored fields work out of the box already, just the part that takes care of indexing and range-querying the values may need improvements in Lucene (to support fast ranges). Out of the box, you can even do a TermRangeQuery on a binary term easily, if you indexed it as binary term! It is just not using prefix encoded terms for range speedup.

@clintongormley
Copy link
Contributor

@gurvindersingh
Copy link

any update on this issue ?

@jrideout
Copy link

jrideout commented Oct 6, 2014

@clintongormley clintongormley added the :Search Foundations/Mapping Index mappings, including merging and defining field types label Nov 11, 2014
@vvaradhan
Copy link

@jrideout https://issues.apache.org/jira/browse/LUCENE-5879 got fixed recently in trunk and 5.2 of Lucene. Is it possible to provide an update now for ipv6 support in ip field type?

@clintongormley
Copy link
Contributor

@vvaradhan Auto-prefixed terms in Lucene has been exposed as an experimental postings format, so it's not safe for us to use until the feature makes it to the default postings format (which is backward compatible)

@mikemccand
Copy link
Contributor

I think we could alternatively use the new (just released in Lucene 5.3.0) NumericRangeTree to implement this?

See #5683 (comment) for some ideas on how it would work for BigInteger/Decimal ... I think the basic idea would be similar: any value that can be converted into a "same sort order" byte[] should work.

@mikemccand mikemccand removed the stalled label Sep 1, 2015
@uschindler
Copy link
Contributor

Regarding #5683: Converting of the ipv6 addresses to BigDecimal would be a waste here? Just use the byte[] directly: http://docs.oracle.com/javase/7/docs/api/java/net/Inet6Address.html#getAddress()

All signs are correct from the beginning, because its just 16 bytes (0-255), highest byte first. So order is correct by default. Theoretically we can index it as is (if we could use the new AutoPrefix terms), but for NumericRangeTree it should also be simple.

@mikemccand
Copy link
Contributor

@uschindler right, we should go straight to the byte[]! It ought to work very well.

@SKumarMN
Copy link

@kimchy @jpountz @dadoonet @kzwang

Hi,

I have used the above fix in my 1.4.4 code to support big integer by changing the IPV6 Mapper. Search and range queries works fine. Our application needs support for Bigdecimal too. Could you please provide me pointers about how can i implement big decimal support with range functionality as well..

@clintongormley
Copy link
Contributor

Closing in favour of #17007

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Search Foundations/Mapping Index mappings, including merging and defining field types
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for IPv6 mapping type
10 participants