@@ -97,22 +97,38 @@ similarity has the following option:
Type name: `classic`

[float]
- [[drf]]
+ [[dfr]]
==== DFR similarity

Similarity that implements the
- http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence
from randomness] framework. This similarity has the following options:

[horizontal]
`basic_model`::
- Possible values: `be`, `d`, `g`, `if`, `in`, `ine` and `p`.
+ Possible values: {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelBE.html[`be`],
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelD.html[`d`],
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelG.html[`g`],
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelIF.html[`if`],
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelIn.html[`in`],
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelIne.html[`ine`] and
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelP.html[`p`].
+
+ `be`, `d` and `p` should be avoided in practice as they might return scores that
+ are equal to 0 or infinite with terms that do not meet the expected random
+ distribution.

`after_effect`::
- Possible values: `no`, `b` and `l`.
+ Possible values: {lucene-core-javadoc}/org/apache/lucene/search/similarities/AfterEffect.NoAfterEffect.html[`no`],
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/AfterEffectB.html[`b`] and
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/AfterEffectL.html[`l`].

`normalization`::
- Possible values: `no`, `h1`, `h2`, `h3` and `z`.
+ Possible values: {lucene-core-javadoc}/org/apache/lucene/search/similarities/Normalization.NoNormalization.html[`no`],
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/NormalizationH1.html[`h1`],
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/NormalizationH2.html[`h2`],
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/NormalizationH3.html[`h3`] and
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/NormalizationZ.html[`z`].

All `normalization` options except `no` need a normalization value.
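The three DFR options compose into one score: `normalization` turns the raw term frequency into a normalized frequency `tfn`, `basic_model` measures how unlikely `tfn` is under a random distribution, and `after_effect` rescales that measure. The Python sketch below illustrates one combination (`g`, `l` and `h2`) using the textbook DFR formulas. It is an illustration under that assumption, not Lucene's exact code (Lucene applies its own smoothing constants), and all function and parameter names are hypothetical.

```python
import math

# Sketch of a DFR score with basic_model=g, after_effect=l, normalization=h2.
# Textbook formulas; Lucene's DFRSimilarity uses slightly different smoothing.
# All names here are hypothetical.


def normalization_h2(tf: float, doc_len: float, avg_doc_len: float, c: float = 1.0) -> float:
    """H2: tfn = tf * log2(1 + c * avg_doc_len / doc_len); c is the free parameter."""
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)


def basic_model_g(tfn: float, total_term_freq: float, num_docs: float) -> float:
    """G: -log2 of a geometric probability of tfn, with mean lambda = F / N."""
    lam = total_term_freq / num_docs
    return math.log2(1.0 + lam) + tfn * math.log2((1.0 + lam) / lam)


def after_effect_l(tfn: float) -> float:
    """L (Laplace): gain factor 1 / (tfn + 1)."""
    return 1.0 / (tfn + 1.0)


def dfr_score(tf, doc_len, avg_doc_len, total_term_freq, num_docs, c=1.0):
    tfn = normalization_h2(tf, doc_len, avg_doc_len, c)
    return after_effect_l(tfn) * basic_model_g(tfn, total_term_freq, num_docs)


# A term seen 3 times in an average-length document, 50 times in a 1000-doc index.
print(dfr_score(tf=3, doc_len=100, avg_doc_len=100, total_term_freq=50, num_docs=1000))
```

Swapping `basic_model_g` or `after_effect_l` for another model changes only one factor of the product, which is the modularity the three settings expose.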
@@ -127,23 +143,34 @@ model.
This similarity has the following options:

[horizontal]
- `independence_measure`:: Possible values `standardized`, `saturated`, `chisquared`.
+ `independence_measure`:: Possible values:
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/IndependenceStandardized.html[`standardized`],
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/IndependenceSaturated.html[`saturated`] and
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/IndependenceChiSquared.html[`chisquared`].
+
+ When using this similarity, it is highly recommended to remove stop words to get
+ good relevance. Also beware that terms whose frequency is less than the expected
+ frequency will get a score equal to 0.

Type name: `DFI`
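The zero-score note follows from how DFI measures divergence from independence. It compares a term's observed frequency in a document with the frequency expected if the term were spread over the collection in proportion to document length, and only over-represented terms contribute. The sketch below assumes Lucene's formulation as we understand it (expected frequency = collection term frequency × document length / total tokens in the field, score = log2(measure + 1)) and omits Lucene's +1 smoothing; function names are hypothetical.

```python
import math

# Sketch of DFI scoring; names are hypothetical and Lucene's +1 smoothing
# of the collection statistics is omitted.

MEASURES = {
    # independence_measure -> divergence of observed tf from expected e
    "standardized": lambda tf, e: (tf - e) / math.sqrt(e),
    "saturated": lambda tf, e: (tf - e) / e,
    "chisquared": lambda tf, e: (tf - e) ** 2 / e,
}


def expected_freq(total_term_freq, doc_len, total_field_tokens):
    """Occurrences expected in this document if the term were independent of it."""
    return total_term_freq * doc_len / total_field_tokens


def dfi_score(tf, doc_len, total_term_freq, total_field_tokens, measure="standardized"):
    e = expected_freq(total_term_freq, doc_len, total_field_tokens)
    if tf <= e:
        # At or below the expected frequency the score is 0, as the note above says.
        return 0.0
    return math.log2(MEASURES[measure](tf, e) + 1.0)


# A common term (10,000 of 100,000 tokens) is expected 10 times in a 100-token doc.
print(dfi_score(5, 100, 10_000, 100_000))   # below expectation, prints 0.0
print(dfi_score(20, 100, 10_000, 100_000))  # above expectation, positive
```

Frequent terms such as stop words have a high expected frequency, so they are often clamped to 0 or add little, which is one way to read the stop-word recommendation.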
[float]
[[ib]]
==== IB similarity.

- http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/IBSimilarity.html[Information
based model]. The algorithm is based on the concept that the information content in any symbolic 'distribution'
sequence is primarily determined by the repetitive usage of its basic elements.
For written texts, this would correspond to comparing the writing styles of different authors.
This similarity has the following options:

[horizontal]
- `distribution`:: Possible values: `ll` and `spl`.
- `lambda`:: Possible values: `df` and `ttf`.
+ `distribution`:: Possible values:
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/DistributionLL.html[`ll`] and
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/DistributionSPL.html[`spl`].
+ `lambda`:: Possible values:
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/LambdaDF.html[`df`] and
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/LambdaTTF.html[`ttf`].
`normalization`:: Same as in `DFR` similarity.

Type name: `IB`
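The IB options plug into a single formula: `lambda` estimates a per-term parameter from collection statistics, and `distribution` turns the normalized term frequency `tfn` (produced by the `normalization` option) into an information score of the form -log P. A minimal sketch, assuming the formulas of Lucene's `LambdaDF`, `LambdaTTF`, `DistributionLL` and `DistributionSPL` as we understand them; function names are hypothetical, and this sketch simply rejects lambda == 1, where the `spl` formula divides by zero.

```python
import math

# Sketch of IB scoring. Formulas follow our reading of Lucene's LambdaDF,
# LambdaTTF, DistributionLL and DistributionSPL; names are hypothetical.
# tfn is the normalized term frequency produced by the `normalization` option.


def lambda_df(doc_freq, num_docs):
    return (doc_freq + 1) / (num_docs + 1)


def lambda_ttf(total_term_freq, num_docs):
    return (total_term_freq + 1) / (num_docs + 1)


def distribution_ll(tfn, lam):
    """Log-logistic: -log(lam / (tfn + lam))."""
    return -math.log(lam / (tfn + lam))


def distribution_spl(tfn, lam):
    """Smoothed power law: -log((lam^(tfn/(tfn+1)) - lam) / (1 - lam))."""
    if lam == 1.0:
        raise ValueError("spl is undefined for lambda == 1")
    return -math.log((lam ** (tfn / (tfn + 1)) - lam) / (1 - lam))


def ib_score(tfn, lam, distribution="ll"):
    dist = {"ll": distribution_ll, "spl": distribution_spl}[distribution]
    return dist(tfn, lam)


lam = lambda_df(doc_freq=9, num_docs=999)  # 10 / 1000
for name in ("ll", "spl"):
    print(name, ib_score(1.0, lam, name), ib_score(2.0, lam, name))
```

Both distributions grow with `tfn` and give rarer terms (smaller lambda) higher scores.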
@@ -152,19 +179,23 @@ Type name: `IB`
[[lm_dirichlet]]
==== LM Dirichlet similarity.

- http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMDirichletSimilarity.html[LM
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/LMDirichletSimilarity.html[LM
Dirichlet similarity]. This similarity has the following options:

[horizontal]
`mu`:: Defaults to `2000`.

+ The scoring formula in the paper assigns negative scores to terms that have
+ fewer occurrences than predicted by the language model, which Lucene does not
+ allow, so such terms get a score of 0.
+
Type name: `LMDirichlet`
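The clamping described in the note can be seen directly in the Dirichlet-smoothed query-likelihood score. A minimal sketch, assuming the per-term form used by Lucene's `LMDirichletSimilarity` as we understand it, with the term's collection probability passed in directly (Lucene estimates it from collection statistics); names other than `mu` are hypothetical.

```python
import math

# Sketch of a Dirichlet-smoothed language-model score for one query term.
# Form assumed from our reading of Lucene's LMDirichletSimilarity; the
# collection probability is passed in directly and names besides mu are hypothetical.


def lm_dirichlet_score(tf, doc_len, collection_prob, mu=2000.0):
    score = math.log(1.0 + tf / (mu * collection_prob)) + math.log(mu / (doc_len + mu))
    # The raw value is negative when the term is rarer than the collection
    # model predicts; negative scores are not allowed, so clamp to 0.
    return max(score, 0.0)


# The collection model predicts 0.01 * 1000 = 10 occurrences in a 1000-token document.
print(lm_dirichlet_score(tf=1, doc_len=1000, collection_prob=0.01))   # clamped to 0.0
print(lm_dirichlet_score(tf=50, doc_len=1000, collection_prob=0.01))  # positive
```

Setting the raw score to zero and solving gives `tf = collection_prob * doc_len`, independent of `mu`, so the clamp applies exactly to terms that occur fewer times than the collection model predicts for a document of that length.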
[float]
[[lm_jelinek_mercer]]
==== LM Jelinek Mercer similarity.

- http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
Jelinek Mercer similarity]. The algorithm attempts to capture important patterns in the text, while leaving out noise. This similarity has the following options:

[horizontal]