Add dims parameter to dense_vector mapping #43444

mayya-sharipova · 2019-06-20T17:40:20Z

Typically, dense vectors of both documents and queries must have the same
number of dimensions. Different number of dimensions among documents
or query vector indicate an error. This PR enforces that all vectors
for the same field have the same number of dimensions. It also enforces
that query vectors have the same number of dimensions as doc vectors.

elasticmachine · 2019-06-20T17:40:52Z

Pinging @elastic/es-search

mayya-sharipova · 2019-06-20T20:27:10Z

@elasticmachine run elasticsearch-ci/1

mayya-sharipova · 2019-06-20T20:31:37Z

@elasticmachine run elasticsearch-ci/bwc

mayya-sharipova · 2019-06-20T20:33:07Z

@elasticmachine run elasticsearch-ci/default-distro

Typically, dense vectors of both documents and queries must have the same number of dimensions. Different number of dimensions among documents or query vector indicate an error. This PR enforces that all vectors for the same field have the same number of dimensions. It also enforces that query vectors have the same number of dimensions.

jtibshirani

Thanks @mayya-sharipova, this change makes sense to me. For the kinds of use cases we're targeting with vector fields, it does seem that having vectors with mixed dimensions almost always indicates an error, and it is helpful to be alerted to it. I can't think of a case where this behavior would be overly restrictive (and couldn't be worked around easily).

While reviewing this change, I noticed that for the cosineSimilarity and dotProduct functions, we accept documents without the field and always return 0. This behavior isn't very intuitive to me -- I would expect for the similarity function to error, and that the user should explicitly handle documents without the field in the script. The current behavior could mask errors in the data (the type of error we're trying to avoid with the current change).

...gin/vectors/src/main/java/org/elasticsearch/xpack/vectors/mapper/DenseVectorFieldMapper.java

jtibshirani · 2019-06-27T08:48:34Z

...gin/vectors/src/main/java/org/elasticsearch/xpack/vectors/mapper/DenseVectorFieldMapper.java

        int offset = 0;
        int dim = 0;
        for (Token token = context.parser().nextToken(); token != Token.END_ARRAY; token = context.parser().nextToken()) {
+            if (dim++ >= dims) {
+                throw new MapperParsingException("Field [" + name() + "] of type [" + typeName() + "] of doc [" +


It seems like IllegalArgumentException would make more sense here, since we are parsing a document and not a mapping.

@jtibshirani I agree with you. This should be IllegalArgumentException, but these exceptions will be caught by DocumentParser::parseDocument and wrapped into the MapperParsingException, which eventually come up as MapperParsingExceptions with a cause underneath it as IllegalArgumentException.
That's why I thought it makes sense to throw MapperParsingException from the beginning.
What do you think - does it still makes sense to have IllegalArgumentException underneath MapperParsingException?
Actually, I would really benefit from more clarification around exceptions, may be worth discussion with the team.

...lugin/vectors/src/main/java/org/elasticsearch/xpack/vectors/mapper/VectorEncoderDecoder.java

x-pack/plugin/vectors/src/main/java/org/elasticsearch/xpack/vectors/query/ScoreScriptUtils.java

mayya-sharipova · 2019-06-28T20:22:42Z

@jtibshirani Thanks for your feedback, the next commit in the PR tries to address it.
Two things left unaddressed:

Exceptions. I was wondering if it makes difference for you to have a single-level MapperParsingException or MapperParsingException with wrapped IllegalArgumentException?
the cosineSimilarity and dotProduct functions, we accept documents without the field and always return 0.

The way the API and VectorScriptDocValues are designed right now it is NOT possible to check if a document has a value for a vector field. Usually for other fields this is done by doc['my_field'].value == null or doc['my_field'].size() == 0, but VectorScriptDocValues doesn't have getValue method and get and size methods are not supported. This is the decision we made not to leak the internal representations of vectors.

As there is not way for users to skip documents without a vector field, I chose to have a default value 0 for this case. But I wonder how much this is an issue, and if it is, we may redesign the vectors API and VectorScriptDocValues to have access to value instead.

…s_to_dense_vector

jtibshirani

Exceptions. I was wondering if it makes difference for you to have a single-level MapperParsingException or MapperParsingException with wrapped IllegalArgumentException?

Thanks @mayya-sharipova, I didn't see that MapperParsingException is also used when parsing documents. I don't have a strong preference, I think I'd still lean towards IllegalArgumentException since it's more commonly used in this context (I verified this through a quick code search).

The way the API and VectorScriptDocValues are designed right now it is NOT possible to check if a document has a value for a vector field.

I wonder if it'd at least make sense to support size()? In any case, it is not too relevant to this PR and I agree we can wait for more feedback to see if we need to rethink the VectorScriptDocValues API.

The last comment I had is that we should probably update the line "
If a document’s dense vector field has a number of dimensions different from the query’s vector..." from this documentation: https://www.elastic.co/guide/en/elasticsearch/reference/master/query-dsl-script-score-query.html#vector-functions

jtibshirani · 2019-07-01T07:57:09Z

docs/reference/mapping/types/dense-vector.asciidoc

@@ -24,7 +22,8 @@ PUT my_index
  "mappings": {
    "properties": {
      "my_vector": {
-        "type": "dense_vector"
+        "type": "dense_vector", <1>


I think this callout <1> should go on the next line.

jtibshirani · 2019-07-01T08:03:19Z

...gin/vectors/src/main/java/org/elasticsearch/xpack/vectors/mapper/DenseVectorFieldMapper.java

@@ -80,11 +97,17 @@ public DenseVectorFieldMapper build(BuilderContext context) {
        @Override
        public Mapper.Builder<?,?> parse(String name, Map<String, Object> node, ParserContext parserContext) throws MapperParsingException {
            DenseVectorFieldMapper.Builder builder = new DenseVectorFieldMapper.Builder(name);
-            return builder;
+            Object dimsField = node.remove("dims");
+            if (dimsField == null) {


Using the same reasoning as above, it would be nice to move this check to DenseVectorFieldMapper.Builder#build.

@jtibshirani Thanks, but we need this check here to ensure we don't get NullPointerException on the line that follows after that:

int dims = XContentMapValues.nodeIntegerValue(dimsField);

Also, I was thinking that public Builder dims(int dims) can accept dims parameter as a primitive integer.

One approach would be to check if the node is null, and only parse it and call Builder#dims if it is not null. Then Builder#build could complain if Builder#dims has never been called. Anyways, this is a small comment, the change looks good to me!

...lugin/vectors/src/main/java/org/elasticsearch/xpack/vectors/mapper/VectorEncoderDecoder.java

mayya-sharipova · 2019-07-02T13:12:14Z

@elasticmachine test this please

mayya-sharipova · 2019-07-02T13:13:28Z

@jtibshirani Thanks Julie, I have addressed your 2nd feedback in the last commit.

I wonder if it'd at least make sense to support size()? In any case, it is not too relevant to this PR and I agree we can wait for more feedback to see if we need to rethink the VectorScriptDocValues API

Right, let's do it in a separate PR.

mayya-sharipova · 2019-07-02T15:02:19Z

@elasticmachine test this please

…s_to_dense_vector

mayya-sharipova · 2019-07-02T16:27:06Z

@elasticmachine run elasticsearch-ci/packaging-sample

mayya-sharipova · 2019-07-02T17:05:05Z

@elasticmachine run elasticsearch-ci/packaging-sample

…s_to_dense_vector

mayya-sharipova · 2019-07-02T18:43:01Z

@elasticmachine test this please

Typically, dense vectors of both documents and queries must have the same number of dimensions. Different number of dimensions among documents or query vector indicate an error. This PR enforces that all vectors for the same field have the same number of dimensions. It also enforces that query vectors have the same number of dimensions.

After backport to 7.3 adjust yml tests Relates to #43444

mayya-sharipova requested a review from jtibshirani June 20, 2019 17:40

mayya-sharipova added the :Search Relevance/Ranking Scoring, rescoring, rank evaluation. label Jun 20, 2019

mayya-sharipova added >enhancement v7.3.0 v8.0.0 labels Jun 20, 2019

mayya-sharipova force-pushed the add_number_of_dims_to_dense_vector branch from 3179461 to bac0c34 Compare June 21, 2019 13:38

jtibshirani reviewed Jun 27, 2019

View reviewed changes

Address Julie's feedback

c8f397a

Merge remote-tracking branch 'upstream/master' into add_number_of_dim…

e9be0c5

…s_to_dense_vector

jtibshirani reviewed Jul 1, 2019

View reviewed changes

Address Julie's comments 2

3f747bf

jtibshirani approved these changes Jul 2, 2019

View reviewed changes

Merge remote-tracking branch 'upstream/master' into add_number_of_dim…

7b551a8

…s_to_dense_vector

Merge remote-tracking branch 'upstream/master' into add_number_of_dim…

d8046cf

…s_to_dense_vector

mayya-sharipova merged commit 66e1e56 into elastic:master Jul 2, 2019

mayya-sharipova added :Search Foundations/Mapping Index mappings, including merging and defining field types and removed :Search Relevance/Ranking Scoring, rescoring, rank evaluation. labels Jul 2, 2019

mayya-sharipova mentioned this pull request Jul 2, 2019

Add dims parameter to dense_vector mapping (#43444) #43895

Merged

mayya-sharipova added a commit that referenced this pull request Jul 3, 2019

Adjust version in vectors tests after backport

ec74354

After backport to 7.3 adjust yml tests Relates to #43444

Mpdreamz mentioned this pull request Aug 7, 2019

[meta] 7.3 Release elastic/elasticsearch-net#4001

Closed

16 tasks

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add dims parameter to dense_vector mapping #43444

Add dims parameter to dense_vector mapping #43444

mayya-sharipova commented Jun 20, 2019

elasticmachine commented Jun 20, 2019

mayya-sharipova commented Jun 20, 2019

mayya-sharipova commented Jun 20, 2019

mayya-sharipova commented Jun 20, 2019

jtibshirani left a comment

jtibshirani Jun 27, 2019

mayya-sharipova Jun 28, 2019 •

edited

Loading

mayya-sharipova commented Jun 28, 2019 •

edited

Loading

jtibshirani left a comment •

edited

Loading

jtibshirani Jul 1, 2019

jtibshirani Jul 1, 2019

mayya-sharipova Jul 2, 2019

jtibshirani Jul 2, 2019 •

edited

Loading

mayya-sharipova commented Jul 2, 2019

mayya-sharipova commented Jul 2, 2019

mayya-sharipova commented Jul 2, 2019

mayya-sharipova commented Jul 2, 2019

mayya-sharipova commented Jul 2, 2019

mayya-sharipova commented Jul 2, 2019

Add dims parameter to dense_vector mapping #43444

Add dims parameter to dense_vector mapping #43444

Conversation

mayya-sharipova commented Jun 20, 2019

elasticmachine commented Jun 20, 2019

mayya-sharipova commented Jun 20, 2019

mayya-sharipova commented Jun 20, 2019

mayya-sharipova commented Jun 20, 2019

jtibshirani left a comment

Choose a reason for hiding this comment

jtibshirani Jun 27, 2019

Choose a reason for hiding this comment

mayya-sharipova Jun 28, 2019 • edited Loading

Choose a reason for hiding this comment

mayya-sharipova commented Jun 28, 2019 • edited Loading

jtibshirani left a comment • edited Loading

Choose a reason for hiding this comment

jtibshirani Jul 1, 2019

Choose a reason for hiding this comment

jtibshirani Jul 1, 2019

Choose a reason for hiding this comment

mayya-sharipova Jul 2, 2019

Choose a reason for hiding this comment

jtibshirani Jul 2, 2019 • edited Loading

Choose a reason for hiding this comment

mayya-sharipova commented Jul 2, 2019

mayya-sharipova commented Jul 2, 2019

mayya-sharipova commented Jul 2, 2019

mayya-sharipova commented Jul 2, 2019

mayya-sharipova commented Jul 2, 2019

mayya-sharipova commented Jul 2, 2019

mayya-sharipova Jun 28, 2019 •

edited

Loading

mayya-sharipova commented Jun 28, 2019 •

edited

Loading

jtibshirani left a comment •

edited

Loading

jtibshirani Jul 2, 2019 •

edited

Loading