LUCENE-8563: Remove k1+1 from the numerator of BM25Similarity #511

Closed
wants to merge 17 commits into from

javanna (Contributor) commented Nov 28, 2018

Patch for https://issues.apache.org/jira/browse/LUCENE-8563.

This PR removes the k1+1 factor from the numerator of BM25Similarity and adds a new LegacyBM25Similarity under misc that exposes the old behaviour. Note that I haven't found a way to easily reproduce the previous behaviour in the explain method, so I left that part out of LegacyBM25Similarity for now.
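
For reference, the change described above can be written out as a formula. This is only a sketch in common BM25 notation, which is an assumption and is not taken from the patch: $f$ is the term frequency, $|d|$ the field length, $\mathrm{avgdl}$ the average field length, and $\mathrm{idf}$ the inverse document frequency term.

```latex
\text{before: }\;
\mathrm{idf} \cdot \frac{(k_1 + 1)\, f}{f + k_1 \left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
\qquad
\text{after: }\;
\mathrm{idf} \cdot \frac{f}{f + k_1 \left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
```

The two expressions differ only by the factor $k_1 + 1$, which is constant for a given similarity configuration.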

javanna changed the title from "Remove k1+1 from the numerator of BM25Similarity" to "LUCENE-8563: Remove k1+1 from the numerator of BM25Similarity" on Nov 28, 2018
@@ -150,3 +150,11 @@ in order to support ToParent/ToChildBlockJoinQuery.

Normalization is now type-safe, with CharFilterFactory#normalize() returning a Reader and
TokenFilterFactory#normalize() returning a TokenFilter.

## k1+1 constant factor removed from BM25 similarity numerator

Review comment (Contributor): Can you append (LUCENE-8563) ## at the end of the line?

Reply (Contributor Author): Sure!

constant factor was removed from the numerator of the scoring formula.
Ordering of results is preserved unless scores are computed from multiple
fields using different similarities. The previous behaviour is now exposed
through the LegacyBM25Similarity class.
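
The ordering claim in this note can be checked with a small stand-alone sketch. It does not call any Lucene classes: the class name `Bm25OrderingDemo`, its methods, and all numbers are hypothetical, and it ignores details such as how Lucene encodes field lengths. It assumes two fields scored with different `k1` values and summed per document.

```java
/** Hypothetical stand-alone demo; it does not call any Lucene classes. */
final class Bm25OrderingDemo {
    // Made-up parameters: two fields configured with different k1 values.
    static final double K1_A = 1.0;
    static final double K1_B = 3.0;
    static final double B = 0.75;

    /** Term score without the (k1 + 1) factor, i.e. the behaviour after this change. */
    static double currentScore(double idf, double tf, double k1, double b, double dl, double avgdl) {
        double norm = k1 * (1 - b + b * dl / avgdl);
        return idf * tf / (tf + norm);
    }

    /** Term score with (k1 + 1) in the numerator, i.e. the previous behaviour. */
    static double legacyScore(double idf, double tf, double k1, double b, double dl, double avgdl) {
        return (k1 + 1) * currentScore(idf, tf, k1, b, dl, avgdl);
    }

    /** Sum of the field A and field B scores for one document (idf = 1, field length = average). */
    static double twoFieldScore(boolean legacy, double tfA, double tfB) {
        double idf = 1.0, dl = 100, avgdl = 100;
        double fieldA = legacy
                ? legacyScore(idf, tfA, K1_A, B, dl, avgdl)
                : currentScore(idf, tfA, K1_A, B, dl, avgdl);
        double fieldB = legacy
                ? legacyScore(idf, tfB, K1_B, B, dl, avgdl)
                : currentScore(idf, tfB, K1_B, B, dl, avgdl);
        return fieldA + fieldB;
    }

    public static void main(String[] args) {
        // doc1 has term frequency 3 in field A and 1 in field B; doc2 has 1 and 2.
        for (boolean legacy : new boolean[] {false, true}) {
            double doc1 = twoFieldScore(legacy, 3, 1);
            double doc2 = twoFieldScore(legacy, 1, 2);
            System.out.println((legacy ? "legacy" : "current")
                    + ": doc1=" + doc1 + " doc2=" + doc2
                    + " -> " + (doc1 > doc2 ? "doc1" : "doc2") + " ranks first");
        }
    }
}
```

Within a single field the two variants differ by the constant `k1 + 1`, so ordering is unchanged. With these made-up numbers the summed scores are about 1.0 versus 0.9 without the factor and about 2.5 versus 2.6 with it, so the two documents swap places once the fields use different `k1` values.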

Review comment (Contributor): Maybe add that it can be found in the lucene-misc jar?

Reply (Contributor Author): Yes, I wasn't sure how to phrase that. Will add.

javanna (Contributor Author) commented Nov 30, 2018

Merged.
