Terms Aggregation KeyAsString is always null #2393

Closed
AmirSasson opened this issue Nov 19, 2016 · 5 comments

@AmirSasson

AmirSasson commented Nov 19, 2016

NEST/Elasticsearch.Net version: 2.4.6

Elasticsearch version:
"version" : {
  "number" : "2.4.1",
  "build_hash" : "c67dc32e24162035d18d6fe1e952c4cbcbe79d16",
  "build_timestamp" : "2016-09-27T18:57:55Z",
  "build_snapshot" : false,
  "lucene_version" : "5.5.2"
},

Description of the problem including expected versus actual behavior:

Steps to reproduce:

  1. Run a Search with a Terms aggregation on a property of type Guid, for example on a document like: {name: 'SOME NAME', guidVal: '034a7b09-9fee-4f63-a87f-4592edd9560d'}

var aggResponse = client.Search<TestClass>(s => s
    .Aggregations(a => a.Terms("distinc_bus", ad => ad.Field(t => t.guidVal))));

// This line throws because the bucket key is not returned as a valid Guid.
List<Guid> listOfGuids = aggResponse.Aggs.Terms("distinc_bus").Buckets
    .Select(b => Guid.Parse(b.Key))
    .ToList();

  2. Expected result: the bucket key, or at least KeyAsString, is the same as guidVal (i.e. '034a7b09-9fee-4f63-a87f-4592edd9560d').
  3. Actual result: the bucket key is only the last token of the Guid, "4592edd9560d", and KeyAsString is null. (The value is returned correctly when reading a single document with Get(), or multiple documents with Search() without aggregations, so it is stored correctly in Elasticsearch itself.)
@djnelson9715

Amir,

I think the issue you are having is that your guidVal property is being
analyzed, which results in fragments of the Guid being used by the terms
aggregation.
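
To make the fragment behaviour concrete, here is a minimal sketch that imitates what analysis does to a Guid string. It is only an approximation under stated assumptions: `AnalyzerSketch` and `Tokenize` are hypothetical names made up for this illustration, and the regex split stands in for the real standard analyzer, which follows Unicode word-boundary rules rather than this pattern. It needs no Elasticsearch node to run.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnalyzerSketch
{
    // Rough stand-in for the standard analyzer (an assumption, not the real
    // implementation): lowercase the input, then split on any run of
    // characters that is neither a letter nor a digit.
    public static string[] Tokenize(string value) =>
        Regex.Split(value.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
             .Where(token => token.Length > 0)
             .ToArray();

    public static void Main()
    {
        var tokens = Tokenize("034a7b09-9fee-4f63-a87f-4592edd9560d");

        // prints: 034a7b09, 9fee, 4f63, a87f, 4592edd9560d
        Console.WriteLine(string.Join(", ", tokens));

        // A single fragment is not a valid Guid, which is why Guid.Parse
        // throws on the bucket key in the original report.
        Guid parsed;
        Console.WriteLine(Guid.TryParse(tokens.Last(), out parsed)); // prints: False
    }
}
```

Each hyphen-separated piece becomes its own term, so a terms aggregation over the analyzed field buckets the pieces rather than the whole value.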

Doug Nelson

@AmirSasson
Author

How can the analyzer be disabled for this aggregation only?

@djnelson9715

If you look at the mapping for this type (yourindex/_mapping), the field needs the index setting shown here:

        "properties": {
           "Id": {
              "type": "string",
              "index": "not_analyzed"
           },

It is very good practice to use not_analyzed fields for terms aggregations.

Doug Nelson

@russcam
Contributor

russcam commented Nov 30, 2016

The problem, as @djnelson9715 has pointed out, is that the Guid field guidVal has been mapped (either explicitly or dynamically inferred by Elasticsearch) as an analyzed string field. This can be seen with a simple example:

// Requires the NEST 2.x client (namespaces Nest and Elasticsearch.Net).
void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var connectionSettings = new ConnectionSettings(pool);
    var client = new ElasticClient(connectionSettings);

    var indexName = "github-issue-2393";

    // Index one document without an explicit mapping, so Elasticsearch
    // infers the mapping for guidVal dynamically.
    client.Index(new TestClass
    {
        guidVal = new Guid("034a7b09-9fee-4f63-a87f-4592edd9560d")
    }, i => i.Refresh().Index(indexName));

    // Run the same terms aggregation as in the original report.
    var aggResponse = client.Search<TestClass>(s => s
        .Index(indexName)
        .Aggregations(a => a
            .Terms("distinc_bus", ad => ad
                .Field(t => t.guidVal)
            )
        )
    );
}

public class TestClass
{
    public Guid guidVal { get; set; }
}

This returns the following JSON for the search response:

{
  "took" : 41,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "github-issue-2393",
      "_type" : "testclass",
      "_id" : "AVi0L-rWLtlOUI4Y790E",
      "_score" : 1.0,
      "_source" : {
        "guidVal" : "034a7b09-9fee-4f63-a87f-4592edd9560d"
      }
    } ]
  },
  "aggregations" : {
    "distinc_bus" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "034a7b09",
        "doc_count" : 1
      }, {
        "key" : "4592edd9560d",
        "doc_count" : 1
      }, {
        "key" : "4f63",
        "doc_count" : 1
      }, {
        "key" : "9fee",
        "doc_count" : 1
      }, {
        "key" : "a87f",
        "doc_count" : 1
      } ]
    }
  }
}

You can see that each key is one hyphen-separated token of the Guid that was indexed, not the whole value. If you check out the mapping with

var mappingResponse = client.GetMapping<TestClass>(m => m.Index(indexName));

You'll see that the field has been set up as an analyzed string

{
  "github-issue-2393" : {
    "mappings" : {
      "testclass" : {
        "properties" : {
          "guidVal" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

To index it as a not_analyzed string field, you can map it as such with

client.CreateIndex(indexName, c => c
    .Mappings(m => m
        .Map<TestClass>(mm => mm
            // let NEST infer the Elasticsearch types from the POCO type...
            .AutoMap()
            // ...now override any automappings that we want to explicitly set
            .Properties(p => p
                .String(s => s
                    .Name(n => n.guidVal)
                    .NotAnalyzed()
                )
            )
        )
    )
);

You'll need to create a new index to do this and reindex your documents into it. You can use the Reindex API to do this; construct the destination index first with the above mapping. If you need both an analyzed and not_analyzed form of guidVal, then take a look at multi_fields.
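
For the multi_fields option mentioned above, a mapping along the following lines may be a useful starting point. This is a sketch under assumptions rather than a tested recipe: it assumes the NEST 2.x fluent API, the sub-field name "raw" and the `MultiFieldSketch` class are hypothetical choices made for illustration, and it needs a running Elasticsearch 2.x node plus the NEST package to execute.

```csharp
using System;
using Nest;

public class TestClass
{
    public Guid guidVal { get; set; }
}

public static class MultiFieldSketch
{
    public static void CreateIndexAndAggregate(IElasticClient client, string indexName)
    {
        // Keep guidVal analyzed, and add a not_analyzed sub-field.
        // The sub-field name "raw" is an assumption; any name works.
        client.CreateIndex(indexName, c => c
            .Mappings(m => m
                .Map<TestClass>(mm => mm
                    .AutoMap()
                    .Properties(p => p
                        .String(s => s
                            .Name(n => n.guidVal)
                            .Fields(f => f
                                .String(raw => raw
                                    .Name("raw")
                                    .NotAnalyzed()
                                )
                            )
                        )
                    )
                )
            )
        );

        // Aggregate on guidVal.raw so that each bucket key is a whole Guid.
        var response = client.Search<TestClass>(s => s
            .Index(indexName)
            .Size(0)
            .Aggregations(a => a
                .Terms("distinc_bus", t => t
                    .Field(f => f.guidVal.Suffix("raw"))
                )
            )
        );

        foreach (var bucket in response.Aggs.Terms("distinc_bus").Buckets)
        {
            Console.WriteLine(Guid.Parse(bucket.Key));
        }
    }
}
```

With this layout, full-text queries can still target guidVal while aggregations and exact matches use guidVal.raw.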

Hope that helps!

russcam closed this as completed Nov 30, 2016
@russcam
Contributor

russcam commented Nov 30, 2016

Meant to add that .KeyAsString maps to the value returned by Elasticsearch in the "key_as_string" property of a bucket; for some keys such as dates, Elasticsearch returns a "key_as_string" property as well as the numerical epoch time as the key. In NEST, we always read the "key" as a string, but #2336 is an issue to look at supporting any returned json value.
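
As a small illustration of the difference between the two properties, the sketch below models a bucket with both values and falls back to the key when no formatted key was returned. `BucketSketch` and `BucketKeys` are hypothetical stand-ins invented here, not NEST types, and the date values are illustrative sample data rather than output from a real cluster.

```csharp
using System;

// Hypothetical stand-in for an aggregation bucket; this is not the NEST type.
public sealed class BucketSketch
{
    public string Key { get; set; }          // taken from the "key" property
    public string KeyAsString { get; set; }  // taken from "key_as_string"; null when absent
}

public static class BucketKeys
{
    // Prefer the formatted key when Elasticsearch sent one, else use the key.
    public static string Readable(BucketSketch bucket) =>
        bucket.KeyAsString ?? bucket.Key;

    public static void Main()
    {
        // Terms bucket on a string field: only "key" is returned,
        // so KeyAsString stays null.
        var termsBucket = new BucketSketch
        {
            Key = "034a7b09-9fee-4f63-a87f-4592edd9560d"
        };

        // Date bucket: epoch milliseconds as "key" plus a formatted
        // "key_as_string" (illustrative values).
        var dateBucket = new BucketSketch
        {
            Key = "1479513600000",
            KeyAsString = "2016-11-19T00:00:00.000Z"
        };

        Console.WriteLine(Readable(termsBucket)); // prints the Guid string
        Console.WriteLine(Readable(dateBucket));  // prints the formatted date
    }
}
```

So a null KeyAsString on a terms bucket over a string field is expected; the Guid has to come from the key, which in turn requires a not_analyzed field.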
