Fix synonym phrase query expansion for cross_fields parsing #28045

Merged (3 commits) on Jan 15, 2018

@jimczi jimczi commented Jan 2, 2018

The cross_fields mode of the query parser ignores phrase queries generated by multi-word synonyms.
In that case only the first field of each analyzer group is kept. This change fixes the issue
by expanding the phrase query for each analyzer group to all fields using a disjunction max query.
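
For readers skimming the thread, the expansion described above can be sketched without any Lucene dependency. Everything in this block is a stand-in written for illustration: `Phrase`, `DisMax`, and `expand` are hypothetical names, not the classes or the method touched by this change, and the tie-breaker parameter is an assumption about how the disjunction is configured.

```java
import java.util.ArrayList;
import java.util.List;

class BlendPhraseSketch {
    // Hypothetical stand-in for a phrase query bound to a single field.
    static final class Phrase {
        final String field;
        final List<String> terms;

        Phrase(String field, List<String> terms) {
            this.field = field;
            this.terms = terms;
        }

        @Override
        public String toString() {
            return field + ":\"" + String.join(" ", terms) + "\"";
        }
    }

    // Hypothetical stand-in for a disjunction max query over several clauses.
    static final class DisMax {
        final List<Phrase> clauses;
        final float tieBreaker; // assumed parameter, for illustration only

        DisMax(List<Phrase> clauses, float tieBreaker) {
            this.clauses = clauses;
            this.tieBreaker = tieBreaker;
        }
    }

    // Copy the phrase onto every field of the analyzer group instead of
    // keeping only the first field, then wrap the copies in one disjunction.
    static DisMax expand(Phrase original, List<String> groupFields, float tieBreaker) {
        List<Phrase> clauses = new ArrayList<>();
        for (String field : groupFields) {
            clauses.add(new Phrase(field, original.terms));
        }
        return new DisMax(clauses, tieBreaker);
    }

    public static void main(String[] args) {
        Phrase synonymPhrase = new Phrase("title", List.of("new", "york"));
        DisMax expanded = expand(synonymPhrase, List.of("title", "body"), 0.0f);
        for (Phrase clause : expanded.clauses) {
            System.out.println(clause);
        }
    }
}
```

With the two-field group above, the sketch yields one phrase clause per field, which is the behavior the description says was missing when only the first field was kept.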

@jimczi jimczi added :Search/Search Search-related issues that do not fall into other categories >bug v6.2.0 v7.0.0 labels Jan 2, 2018
@jimczi jimczi added the review label Jan 9, 2018
Term[] terms = query.getTerms();
PhraseQuery.Builder builder = new PhraseQuery.Builder();
for (int i = 0; i < terms.length; i++) {
    builder.add(new Term(field.fieldType.name(), terms[i].text()), positions[i]);

Term.bytes() is better than Term.text() if the term is not a UTF-8 string.
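
The concern can be illustrated with plain JDK calls, independent of Lucene: bytes that are not valid UTF-8 do not survive being decoded to text and encoded again. This is only an analogy for copying a term through its text form; `TermBytesSketch` and its methods are hypothetical names, and Lucene's own decoding of malformed bytes may behave differently from the JDK's.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TermBytesSketch {
    // Decode the bytes as UTF-8 text and encode them again,
    // the way a text-based copy of a term would.
    static byte[] viaText(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // True when the text round trip reproduces the original bytes exactly.
    static boolean survivesTextRoundTrip(byte[] raw) {
        return Arrays.equals(raw, viaText(raw));
    }

    public static void main(String[] args) {
        byte[] textTerm = "york".getBytes(StandardCharsets.UTF_8);
        byte[] binaryTerm = {(byte) 0xFF, (byte) 0xFE}; // never valid in UTF-8

        System.out.println(survivesTextRoundTrip(textTerm));   // prints "true"
        System.out.println(survivesTextRoundTrip(binaryTerm)); // prints "false"
    }
}
```

Copying the raw bytes avoids the decode step entirely, which is why the byte-based accessor is the safer choice for terms that may hold binary values.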

@@ -472,6 +477,10 @@ private Query boolToExtendedCommonTermsQuery(BooleanQuery bq, Occur highFreqOccu
}
}

protected Query blendPhraseQuery(PhraseQuery query, MappedFieldType fieldType) {

can you add some comments to explain the purpose of this method?

@jimczi jimczi force-pushed the bug/phrase_synonym_cross_fields branch from a8a61ca to 28a474d Compare January 15, 2018 12:58
@jimczi jimczi merged commit 190f1e1 into elastic:master Jan 15, 2018
@jimczi jimczi deleted the bug/phrase_synonym_cross_fields branch January 15, 2018 17:00
jimczi added a commit that referenced this pull request Jan 15, 2018
* Fix synonym phrase query expansion for cross_fields parsing

jimczi added a commit that referenced this pull request Jan 15, 2018
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 15, 2018
* master: (21 commits)
  [GEO] Add WKT Support to GeoBoundingBoxQueryBuilder
  Painless: Add whitelist extensions (elastic#28161)
  Fix daitch_mokotoff phonetic filter to use the dedicated Lucene filter (elastic#28225)
  Avoid doing redundant work when checking for self references. (elastic#26927)
  Fix casts in HotThreads. (elastic#27578)
  Ignore the `-snapshot` suffix when comparing the Lucene version in the build and the docs. (elastic#27927)
  Allow update of `eager_global_ordinals` on `_parent`. (elastic#28014)
  Fix NPE on composite aggregation with sub-aggregations that need scores (elastic#28129)
  `MockTcpTransport` to connect asynchronously (elastic#28203)
  Fix synonym phrase query expansion for cross_fields parsing (elastic#28045)
  Introduce elasticsearch-core jar (elastic#28191)
  elastic#28218: Update the Lucene version for 6.2.0 after backport
  upgrade to lucene 7.2.1 (elastic#28218)
  [Docs] Fix an error in painless-types.asciidoc (elastic#28221)
  Adds metadata to rewritten aggregations (elastic#28185)
  Update version of TaskInfo header serialization after backport
  TEST: Tightens file-based condition in peer-recovery
  Correct backport replica rollback to 6.2 (elastic#28181)
  Backport replica rollback to 6.2 (elastic#28181)
  Rename deleteLocalTranslog to createNewTranslog
  ...
martijnvg added a commit that referenced this pull request Jan 16, 2018
* es/6.x: (31 commits)
  Fix eclipse build. (#28236)
  Never return null from Strings.tokenizeToStringArray (#28224)
  Fallback to TransportMasterNodeAction for cluster health retries (#28195)
  [Docs] Changes to ingest.asciidoc (#28212)
  TEST: Update logging for testAckedIndexing
  [GEO] Deprecate field parameter in GeoBoundingBoxQueryBuilder
  [GEO] Add WKT Support to GeoBoundingBoxQueryBuilder
  Avoid doing redundant work when checking for self references. (#26927)
  Fix casts in HotThreads. (#27578)
  Ignore the `-snapshot` suffix when comparing the Lucene version in the build and the docs. (#27927)
  Allow update of `eager_global_ordinals` on `_parent`. (#28014)
  Painless: Add whitelist extensions (#28161)
  Fix daitch_mokotoff phonetic filter to use the dedicated Lucene filter (#28225)
  Fix NPE on composite aggregation with sub-aggregations that need scores (#28129)
  #28045 restore removed import after backport
  Fix synonym phrase query expansion for cross_fields parsing (#28045)
  Introduce elasticsearch-core jar (#28191)
  upgrade to lucene 7.2.1 (#28218)
  [Docs] Fix an error in painless-types.asciidoc (#28221)
  Consistent updates of IndexShardSnapshotStatus (#28130)
  ...
martijnvg added a commit that referenced this pull request Jan 16, 2018
* es/master: (30 commits)
  [Docs] Fix Java Api index administration usage (#28133)
  Fix eclipse build. (#28236)
  Never return null from Strings.tokenizeToStringArray (#28224)
  Fallback to TransportMasterNodeAction for cluster health retries (#28195)
  [Docs] Changes to ingest.asciidoc (#28212)
  TEST: Update logging for testAckedIndexing
  [GEO] Add WKT Support to GeoBoundingBoxQueryBuilder
  Painless: Add whitelist extensions (#28161)
  Fix daitch_mokotoff phonetic filter to use the dedicated Lucene filter (#28225)
  Avoid doing redundant work when checking for self references. (#26927)
  Fix casts in HotThreads. (#27578)
  Ignore the `-snapshot` suffix when comparing the Lucene version in the build and the docs. (#27927)
  Allow update of `eager_global_ordinals` on `_parent`. (#28014)
  Fix NPE on composite aggregation with sub-aggregations that need scores (#28129)
  `MockTcpTransport` to connect asynchronously (#28203)
  Fix synonym phrase query expansion for cross_fields parsing (#28045)
  Introduce elasticsearch-core jar (#28191)
  #28218: Update the Lucene version for 6.2.0 after backport
  upgrade to lucene 7.2.1 (#28218)
  [Docs] Fix an error in painless-types.asciidoc (#28221)
  ...
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Labels
>bug :Search/Search Search-related issues that do not fall into other categories v6.2.0 v7.0.0-beta1
2 participants