| 1 | +[[query-dsl-intervals-query]] |
| 2 | +=== Intervals query |
| 3 | + |
| 4 | +An `intervals` query allows fine-grained control over the order and proximity of |
| 5 | +matching terms. Matching rules are constructed from a small set of definitions, |
| 6 | +and the rules are then applied to terms from a particular `field`. |
| 7 | + |
| 8 | +The definitions produce sequences of minimal intervals that span terms in a |
| 9 | +body of text. These intervals can be further combined and filtered by |
| 10 | +parent rules. |
| 11 | + |
| 12 | +The example below will search for the phrase `my favourite food` appearing |
| 13 | +before the terms `hot` and `water` or `cold` and `porridge` in any order, in |
| 14 | +the field `my_text`: |
| 15 | + |
| 16 | +[source,js] |
| 17 | +-------------------------------------------------- |
| 18 | +POST _search |
| 19 | +{ |
| 20 | + "query": { |
| 21 | + "intervals" : { |
| 22 | + "my_text" : { |
| 23 | + "all_of" : { |
| 24 | + "ordered" : true, |
| 25 | + "intervals" : [ |
| 26 | + { |
| 27 | + "match" : { |
| 28 | + "query" : "my favourite food", |
| 29 | + "max_gaps" : 0, |
| 30 | + "ordered" : true |
| 31 | + } |
| 32 | + }, |
| 33 | + { |
| 34 | + "any_of" : { |
| 35 | + "intervals" : [ |
| 36 | + { "match" : { "query" : "hot water" } }, |
| 37 | + { "match" : { "query" : "cold porridge" } } |
| 38 | + ] |
| 39 | + } |
| 40 | + } |
| 41 | + ] |
| 42 | + }, |
| 43 | + "boost" : 2.0, |
| 44 | + "_name" : "favourite_food" |
| 45 | + } |
| 46 | + } |
| 47 | + } |
| 48 | +} |
| 49 | +-------------------------------------------------- |
| 50 | +// CONSOLE |
| 51 | + |
| 52 | +In the above example, the text `my favourite food is cold porridge` would |
| 53 | +match because the two intervals matching `my favourite food` and `cold |
| 54 | +porridge` appear in the correct order, but the text `when it's cold my |
| 55 | +favourite food is porridge` would not match, because the interval matching |
| 56 | +`cold porridge` starts before the interval matching `my favourite food`. |
| 57 | + |
| 58 | +[[intervals-match]] |
| 59 | +==== `match` |
| 60 | + |
| 61 | +The `match` rule matches analyzed text, and takes the following parameters: |
| 62 | + |
| 63 | +[horizontal] |
| 64 | +`query`:: |
| 65 | +The text to match. |
| 66 | +`max_gaps`:: |
| 67 | +Specify a maximum number of gaps between the terms in the text. Terms that |
| 68 | +appear further apart than this will not match. If unspecified, or set to -1, |
| 69 | +then there is no width restriction on the match. If set to 0 then the terms |
| 70 | +must appear next to each other. |
| 71 | +`ordered`:: |
| 72 | +Whether or not the terms must appear in their specified order. Defaults to |
| 73 | +`false`. |
| 74 | +`analyzer`:: |
| 75 | +Which analyzer should be used to analyze terms in the `query`. By |
| 76 | +default, the search analyzer of the top-level field will be used. |
| 77 | +`filter`:: |
| 78 | +An optional <<interval_filter,interval filter>>. |
| 79 | + |
| 80 | +[[intervals-all_of]] |
| 81 | +==== `all_of` |
| 82 | + |
| 83 | +The `all_of` rule returns matches that span a combination of other rules. |
| 84 | + |
| 85 | +[horizontal] |
| 86 | +`intervals`:: |
| 87 | +An array of rules to combine. All rules must produce a match in a |
| 88 | +document for the overall rule to match. |
| 89 | +`max_gaps`:: |
| 90 | +Specify a maximum number of gaps between the rules. Combinations that match |
| 91 | +across a distance greater than this will not match. If set to -1 or |
| 92 | +unspecified, there is no restriction on this distance. If set to 0, then the |
| 93 | +matches produced by the rules must all appear immediately next to each other. |
| 94 | +`ordered`:: |
| 95 | +Whether the intervals produced by the rules should appear in the order in |
| 96 | +which they are specified. Defaults to `false`. |
| 97 | +`filter`:: |
| 98 | +An optional <<interval_filter,interval filter>>. |
| 99 | + |
| 100 | +[[intervals-any_of]] |
| 101 | +==== `any_of` |
| 102 | + |
| 103 | +The `any_of` rule emits intervals produced by any of its sub-rules. |
| 104 | + |
| 105 | +[horizontal] |
| 106 | +`intervals`:: |
| 107 | +An array of rules to match. |
| 108 | +`filter`:: |
| 109 | +An optional <<interval_filter,interval filter>>. |
| 110 | + |
| 111 | +[[interval_filter]] |
| 112 | +==== filters |
| 113 | + |
| 114 | +You can filter intervals produced by any rules by their relation to the |
| 115 | +intervals produced by another rule. The following example will return |
| 116 | +documents that have the words `hot` and `porridge` within 10 positions |
| 117 | +of each other, without the word `salty` in between: |
| 118 | + |
| 119 | +[source,js] |
| 120 | +-------------------------------------------------- |
| 121 | +POST _search |
| 122 | +{ |
| 123 | + "query": { |
| 124 | + "intervals" : { |
| 125 | + "my_text" : { |
| 126 | + "match" : { |
| 127 | + "query" : "hot porridge", |
| 128 | + "max_gaps" : 10, |
| 129 | + "filter" : { |
| 130 | + "not_containing" : { |
| 131 | + "match" : { |
| 132 | + "query" : "salty" |
| 133 | + } |
| 134 | + } |
| 135 | + } |
| 136 | + } |
| 137 | + } |
| 138 | + } |
| 139 | + } |
| 140 | +} |
| 141 | +-------------------------------------------------- |
| 142 | +// CONSOLE |
| 143 | + |
| 144 | +The following filters are available: |
| 145 | +[horizontal] |
| 146 | +`containing`:: |
| 147 | +Produces intervals that contain an interval from the filter rule |
| 148 | +`contained_by`:: |
| 149 | +Produces intervals that are contained by an interval from the filter rule |
| 150 | +`not_containing`:: |
| 151 | +Produces intervals that do not contain an interval from the filter rule |
| 152 | +`not_contained_by`:: |
| 153 | +Produces intervals that are not contained by an interval from the filter rule |
| 154 | +`not_overlapping`:: |
| 155 | +Produces intervals that do not overlap with an interval from the filter rule |
| 156 | + |
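
The filter relations listed above can be sketched with a toy model in which an interval is an inclusive `(start, end)` pair of token positions. This is only an illustration under that assumption: the helper names `contains`, `overlaps` and `apply_filter` are hypothetical and are not part of Elasticsearch or Lucene.

```python
# Toy model: an interval is an inclusive (start, end) pair of token positions.
# Illustrative assumption only, not the actual implementation.

def contains(outer, inner):
    """True if `outer` fully covers `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlaps(a, b):
    """True if the two intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]

def apply_filter(kind, intervals, filter_intervals):
    """Keep the intervals that satisfy the named relation to the filter rule's intervals."""
    tests = {
        "containing": lambda i: any(contains(i, f) for f in filter_intervals),
        "contained_by": lambda i: any(contains(f, i) for f in filter_intervals),
        "not_containing": lambda i: not any(contains(i, f) for f in filter_intervals),
        "not_contained_by": lambda i: not any(contains(f, i) for f in filter_intervals),
        "not_overlapping": lambda i: not any(overlaps(i, f) for f in filter_intervals),
    }
    return [i for i in intervals if tests[kind](i)]

# Tokens: hot(0) and(1) salty(2) porridge(3)
hot_porridge = [(0, 3)]
salty = [(2, 2)]
print(apply_filter("not_containing", hot_porridge, salty))  # []
print(apply_filter("containing", hot_porridge, salty))      # [(0, 3)]
```

In this model the `not_containing` filter from the earlier example rejects the `hot porridge` interval because `salty` falls inside it.
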
| 157 | +[[interval-minimization]] |
| 158 | +==== Minimization |
| 159 | + |
| 160 | +The intervals query always minimizes intervals, to ensure that queries can |
| 161 | +run in linear time. This can sometimes cause surprising results, particularly |
| 162 | +when using `max_gaps` restrictions or filters. For example, take the |
| 163 | +following query, searching for `salty` contained within the phrase `hot |
| 164 | +porridge`: |
| 165 | + |
| 166 | +[source,js] |
| 167 | +-------------------------------------------------- |
| 168 | +POST _search |
| 169 | +{ |
| 170 | + "query": { |
| 171 | + "intervals" : { |
| 172 | + "my_text" : { |
| 173 | + "match" : { |
| 174 | + "query" : "salty", |
| 175 | + "filter" : { |
| 176 | + "contained_by" : { |
| 177 | + "match" : { |
| 178 | + "query" : "hot porridge" |
| 179 | + } |
| 180 | + } |
| 181 | + } |
| 182 | + } |
| 183 | + } |
| 184 | + } |
| 185 | + } |
| 186 | +} |
| 187 | +-------------------------------------------------- |
| 188 | +// CONSOLE |
| 189 | + |
| 190 | +This query will *not* match a document containing the phrase `hot porridge is |
| 191 | +salty porridge`, because the intervals returned by the `match` rule for `hot |
| 192 | +porridge` only cover the initial two terms in this document, and these do not |
| 193 | +overlap the intervals covering `salty`. |
| 194 | + |
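
The effect described above can be reproduced with a small brute-force sketch. It assumes a simplified model of minimization (a window is kept only if it does not contain another window that also matches) and a hypothetical helper, `minimal_intervals`; it is not how the query is actually evaluated.

```python
def minimal_intervals(tokens, terms):
    """Return the minimal (start, end) windows, inclusive, that contain every term."""
    candidates = []
    for start in range(len(tokens)):
        for end in range(start, len(tokens)):
            if all(t in tokens[start:end + 1] for t in terms):
                candidates.append((start, end))
                break  # any longer window from this start cannot be minimal
    # A window is minimal only if it does not contain another candidate.
    return [c for c in candidates
            if not any(o != c and c[0] <= o[0] and o[1] <= c[1] for o in candidates)]

tokens = "hot porridge is salty porridge".split()
hot_porridge = minimal_intervals(tokens, ["hot", "porridge"])
salty = minimal_intervals(tokens, ["salty"])

# contained_by: keep `salty` intervals that lie inside some `hot porridge` interval
matches = [s for s in salty
           if any(h[0] <= s[0] and s[1] <= h[1] for h in hot_porridge)]

print(hot_porridge)  # [(0, 1)]
print(salty)         # [(3, 3)]
print(matches)       # []
```

The wider window from `hot` to the second `porridge` would contain `salty`, but it is never produced because the shorter window `(0, 1)` already satisfies the rule.
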
| 195 | +Another restriction to be aware of is the case of `any_of` rules that contain |
| 196 | +sub-rules which overlap. In particular, if one of the rules is a strict |
| 197 | +prefix of the other, then the longer rule will never be matched, which can |
| 198 | +cause surprises when used in combination with `max_gaps`. Consider the |
| 199 | +following query, searching for `the` immediately followed by `big` or `big bad`, |
| 200 | +immediately followed by `wolf`: |
| 201 | + |
| 202 | +[source,js] |
| 203 | +-------------------------------------------------- |
| 204 | +POST _search |
| 205 | +{ |
| 206 | + "query": { |
| 207 | + "intervals" : { |
| 208 | + "my_text" : { |
| 209 | + "all_of" : { |
| 210 | + "intervals" : [ |
| 211 | + { "match" : { "query" : "the" } }, |
| 212 | + { "any_of" : { |
| 213 | + "intervals" : [ |
| 214 | + { "match" : { "query" : "big" } }, |
| 215 | + { "match" : { "query" : "big bad" } } |
| 216 | + ] } }, |
| 217 | + { "match" : { "query" : "wolf" } } |
| 218 | + ], |
| 219 | + "max_gaps" : 0, |
| 220 | + "ordered" : true |
| 221 | + } |
| 222 | + } |
| 223 | + } |
| 224 | + } |
| 225 | +} |
| 226 | +-------------------------------------------------- |
| 227 | +// CONSOLE |
| 228 | + |
| 229 | +Counter-intuitively, this query *will not* match the document `the big bad |
| 230 | +wolf`, because the `any_of` rule in the middle will only produce intervals |
| 231 | +for `big`. Intervals for `big bad` are longer than those for `big` while |
| 232 | +starting at the same position, and so are minimized away. In these cases, |
| 233 | +it's better to rewrite the query so that all of the options are explicitly |
| 234 | +laid out at the top level: |
| 235 | + |
| 236 | +[source,js] |
| 237 | +-------------------------------------------------- |
| 238 | +POST _search |
| 239 | +{ |
| 240 | + "query": { |
| 241 | + "intervals" : { |
| 242 | + "my_text" : { |
| 243 | + "any_of" : { |
| 244 | + "intervals" : [ |
| 245 | + { "match" : { |
| 246 | + "query" : "the big bad wolf", |
| 247 | + "ordered" : true, |
| 248 | + "max_gaps" : 0 } }, |
| 249 | + { "match" : { |
| 250 | + "query" : "the big wolf", |
| 251 | + "ordered" : true, |
| 252 | + "max_gaps" : 0 } } |
| 253 | + ] |
| 254 | + } |
| 255 | + } |
| 256 | + } |
| 257 | + } |
| 258 | +} |
| 259 | +-------------------------------------------------- |
| 260 | +// CONSOLE |
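
To see why the nested `any_of` form fails on `the big bad wolf`, the following sketch models intervals as inclusive `(start, end)` position pairs. The helpers `minimize` and `ordered_no_gaps` are hypothetical, and the minimization step is a simplified assumption rather than the real evaluation strategy.

```python
def minimize(intervals):
    """Drop every interval that contains another interval in the set."""
    return [i for i in intervals
            if not any(o != i and i[0] <= o[0] and o[1] <= i[1] for o in intervals)]

def ordered_no_gaps(*rule_outputs):
    """Chain one interval per rule, in order, each starting right after the previous ends."""
    chains = [[i] for i in rule_outputs[0]]
    for outputs in rule_outputs[1:]:
        chains = [c + [i] for c in chains for i in outputs if i[0] == c[-1][1] + 1]
    return [(c[0][0], c[-1][1]) for c in chains]

# Tokens: the(0) big(1) bad(2) wolf(3)
the, big, big_bad, wolf = [(0, 0)], [(1, 1)], [(1, 2)], [(3, 3)]

any_of = minimize(big + big_bad)            # (1, 2) contains (1, 1), so it is dropped
print(any_of)                               # [(1, 1)]
print(ordered_no_gaps(the, any_of, wolf))   # []
print(ordered_no_gaps(the, big_bad, wolf))  # [(0, 3)]
```

With only `(1, 1)` left from the `any_of` rule, `wolf` at position 3 is no longer adjacent, so the `max_gaps: 0` restriction rejects the match. Spelling out `the big bad wolf` as its own rule avoids the problem.
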