@@ -97,22 +97,38 @@ similarity has the following option:
9797Type name: `classic`
9898
9999[float]
100- [[drf]]
100+ [[dfr]]
101101==== DFR similarity
102102
103103Similarity that implements the
104- http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence
104+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence
105105from randomness] framework. This similarity has the following options:
106106
107107[horizontal]
108108`basic_model`::
109- Possible values: `be`, `d`, `g`, `if`, `in`, `ine` and `p`.
109+ Possible values: {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelBE.html[`be`],
110+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelD.html[`d`],
111+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelG.html[`g`],
112+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelIF.html[`if`],
113+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelIn.html[`in`],
114+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelIne.html[`ine`] and
115+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/BasicModelP.html[`p`].
116+
117+ `be`, `d` and `p` should be avoided in practice as they might return scores that
118+ are equal to 0 or infinite with terms that do not meet the expected random
119+ distribution.
110120
111121`after_effect`::
112- Possible values: `no`, `b` and `l`.
122+ Possible values: {lucene-core-javadoc}/org/apache/lucene/search/similarities/AfterEffect.NoAfterEffect.html[`no`],
123+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/AfterEffectB.html[`b`] and
124+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/AfterEffectL.html[`l`].
113125
114126`normalization`::
115- Possible values: `no`, `h1`, `h2`, `h3` and `z`.
127+ Possible values: {lucene-core-javadoc}/org/apache/lucene/search/similarities/Normalization.NoNormalization.html[`no`],
128+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/NormalizationH1.html[`h1`],
129+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/NormalizationH2.html[`h2`],
130+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/NormalizationH3.html[`h3`] and
131+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/NormalizationZ.html[`z`].
116132
117133All options but the first option need a normalization value.
118134
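As a rough illustration of how the DFR options above fit together, the following Python sketch builds a settings fragment for one DFR similarity and validates each option against the documented values. The helper `dfr_similarity`, the similarity name `my_dfr`, the type name `DFR`, and the `normalization.h2.c` key are assumptions made for this example; rejecting `be`, `d` and `p` outright is a design choice of the sketch, not behavior of the similarity itself.

```python
import json

# Option values as listed in the DFR section of the diff.
BASIC_MODELS = ("be", "d", "g", "if", "in", "ine", "p")
DISCOURAGED_BASIC_MODELS = ("be", "d", "p")  # may yield scores of 0 or infinity
AFTER_EFFECTS = ("no", "b", "l")
NORMALIZATIONS = ("no", "h1", "h2", "h3", "z")


def dfr_similarity(basic_model, after_effect, normalization, extra=None):
    """Return a settings dict for one DFR similarity (hypothetical helper)."""
    if basic_model not in BASIC_MODELS:
        raise ValueError(f"unknown basic_model: {basic_model!r}")
    if basic_model in DISCOURAGED_BASIC_MODELS:
        raise ValueError(f"basic_model {basic_model!r} should be avoided in practice")
    if after_effect not in AFTER_EFFECTS:
        raise ValueError(f"unknown after_effect: {after_effect!r}")
    if normalization not in NORMALIZATIONS:
        raise ValueError(f"unknown normalization: {normalization!r}")
    settings = {
        "type": "DFR",  # assumed type name
        "basic_model": basic_model,
        "after_effect": after_effect,
        "normalization": normalization,
    }
    # Normalization value, e.g. {"normalization.h2.c": "3.0"} (assumed key).
    settings.update(extra or {})
    return settings


index_settings = {
    "index": {
        "similarity": {
            "my_dfr": dfr_similarity("g", "l", "h2", {"normalization.h2.c": "3.0"})
        }
    }
}
print(json.dumps(index_settings, indent=2))
```

The printed JSON is only a candidate settings body; check the key names against the index settings reference before using it.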
@@ -127,23 +143,34 @@ model.
127143This similarity has the following options:
128144
129145[horizontal]
130- `independence_measure`:: Possible values `standardized`, `saturated`, `chisquared`.
146+ `independence_measure`:: Possible values
147+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/IndependenceStandardized.html[`standardized`],
148+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/IndependenceSaturated.html[`saturated`],
149+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/IndependenceChiSquared.html[`chisquared`].
150+
151+ When using this similarity, it is highly recommended to remove stop words to get
152+ good relevance. Also beware that terms whose frequency is less than the expected
153+ frequency will get a score equal to 0.
131154
132155Type name: `DFI`
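The note about terms scoring 0 can be made concrete with a simplified Python sketch of a DFI-style score. This is an approximation for illustration only: the function name and parameters are hypothetical, the expected frequency is taken as the term's collection frequency scaled by document length, and any smoothing or boost factor that Lucene applies is left out.

```python
import math


def dfi_score(freq, doc_len, total_term_freq, field_tokens, measure="standardized"):
    """Simplified DFI-style score (sketch, not Lucene's exact implementation)."""
    # Occurrences we would expect in this document if the term were
    # spread uniformly over the whole field.
    expected = total_term_freq * doc_len / field_tokens
    if freq <= expected:
        # Terms that do not occur more often than expected score 0.
        return 0.0
    if measure == "standardized":
        value = (freq - expected) / math.sqrt(expected)
    elif measure == "saturated":
        value = (freq - expected) / expected
    elif measure == "chisquared":
        value = (freq - expected) ** 2 / expected
    else:
        raise ValueError(f"unknown independence_measure: {measure!r}")
    return math.log2(value + 1)


# A term expected 5 times in this document but seen once scores 0.
print(dfi_score(freq=1, doc_len=100, total_term_freq=5000, field_tokens=100000))
# A term expected 4 times but seen 9 times gets a positive score.
print(dfi_score(freq=9, doc_len=100, total_term_freq=4000, field_tokens=100000))
```

Stop words are frequent everywhere, so in any one document they rarely occur more often than expected and tend to score 0 under this scheme, which is consistent with the recommendation to remove them.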
133156
134157[float]
135158[[ib]]
136159==== IB similarity.
137160
138- http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
161+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/IBSimilarity.html[Information
139162based model] . The algorithm is based on the concept that the information content in any symbolic 'distribution'
140163sequence is primarily determined by the repetitive usage of its basic elements.
141164For written texts this challenge would correspond to comparing the writing styles of different authors.
142165This similarity has the following options:
143166
144167[horizontal]
145- `distribution`:: Possible values: `ll` and `spl`.
146- `lambda`:: Possible values: `df` and `ttf`.
168+ `distribution`:: Possible values:
169+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/DistributionLL.html[`ll`] and
170+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/DistributionSPL.html[`spl`].
171+ `lambda`:: Possible values:
172+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/LambdaDF.html[`df`] and
173+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/LambdaTTF.html[`ttf`].
147174`normalization`:: Same as in `DFR` similarity.
148175
149176Type name: `IB`
@@ -152,19 +179,23 @@ Type name: `IB`
152179[[lm_dirichlet]]
153180==== LM Dirichlet similarity.
154181
155- http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMDirichletSimilarity.html[LM
182+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/LMDirichletSimilarity.html[LM
156183Dirichlet similarity] . This similarity has the following options:
157184
158185[horizontal]
159186`mu`:: Default to `2000`.
160187
188+ The scoring formula in the paper assigns negative scores to terms that have
189+ fewer occurrences than predicted by the language model, which is illegal to
190+ Lucene, so such terms get a score of 0.
191+
161192Type name: `LMDirichlet`
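The clamping described in the added note can be sketched in Python as follows. This is a minimal illustration that assumes the usual Dirichlet-smoothed form of the score; the function name and the `collection_prob` parameter (the term's probability under the collection language model) are hypothetical, and boosts are omitted.

```python
import math


def lm_dirichlet_score(freq, doc_len, collection_prob, mu=2000.0):
    """Dirichlet-smoothed LM score, clamped at 0 (illustrative sketch)."""
    raw = math.log(1 + freq / (mu * collection_prob)) + math.log(mu / (doc_len + mu))
    # raw is negative exactly when freq < doc_len * collection_prob, i.e. when
    # the term occurs less often than the language model predicts.
    return max(raw, 0.0)


# The model predicts 1000 * 0.01 = 10 occurrences in this document.
print(lm_dirichlet_score(freq=1, doc_len=1000, collection_prob=0.01))   # clamped to 0
print(lm_dirichlet_score(freq=50, doc_len=1000, collection_prob=0.01))  # positive
```

Combining the two logarithms gives `log((freq + mu * p) / (p * (doc_len + mu)))`, which shows why the sign depends only on whether `freq` is below `p * doc_len`.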
162193
163194[float]
164195[[lm_jelinek_mercer]]
165196==== LM Jelinek Mercer similarity.
166197
167- http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
198+ {lucene-core-javadoc}/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
168199Jelinek Mercer similarity] . The algorithm attempts to capture important patterns in the text, while leaving out noise. This similarity has the following options:
169200
170201[horizontal]