Commit 0427339

Index phrases (#30450)
Specifying `index_phrases: true` on a text field mapping will add a subsidiary [field]._index_phrase field, indexing two-term shingles from the parent field. The parent analysis chain is re-used, wrapped with a FixedShingleFilter.

At query time, if a phrase match query is executed, the mapping will redirect it to run against the subsidiary field.

This should trade faster phrase querying for a larger index and longer indexing times.

Relates to #27049
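
To make "two-term shingles" concrete, here is a minimal plain-Java sketch. It is not the FixedShingleFilter used by the commit and does not touch Lucene; the class `ShingleSketch`, its whitespace-and-lowercase `analyze` helper, and the space-joined shingle format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ShingleSketch {

    // Hypothetical stand-in for the parent field's analysis chain:
    // split on whitespace and lowercase each token.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Pair each token with its successor, producing one shingle per adjacent pair.
    // The joining space is an assumption, not necessarily the separator Lucene uses.
    public static List<String> twoTermShingles(List<String> tokens) {
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            shingles.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return shingles;
    }

    public static void main(String[] args) {
        // prints [peter piper, piper picked]
        System.out.println(twoTermShingles(analyze("peter piper picked")));
        // prints [] because a single term has no neighbour to pair with
        System.out.println(twoTermShingles(analyze("piper")));
    }
}
```

Under these assumptions, a three-term phrase becomes two overlapping shingles that can be looked up as single terms, while a one-term query yields no shingles at all.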
1 parent dc8a4fb commit 0427339

8 files changed

+457
-14
lines changed

docs/reference/mapping/types/text.asciidoc

Lines changed: 8 additions & 0 deletions
@@ -96,6 +96,14 @@ The following parameters are accepted by `text` fields:
     the expense of a larger index. Accepts an
     <<index-prefix-config,`index-prefix configuration block`>>
 
+<<index-phrases,`index_phrases`>>::
+
+    If enabled, two-term word combinations ('shingles') are indexed into a separate
+    field. This allows exact phrase queries to run more efficiently, at the expense
+    of a larger index. Note that this works best when stopwords are not removed,
+    as phrases containing stopwords will not use the subsidiary field and will fall
+    back to a standard phrase query. Accepts `true` or `false` (default).
+
 <<norms,`norms`>>::
 
     Whether field-length should be taken into account when scoring queries.

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+---
+"search with indexed phrases":
+  - skip:
+      version: " - 6.99.99"
+      reason: index_phrase is only available as of 7.0.0
+  - do:
+      indices.create:
+        index: test
+        body:
+          mappings:
+            test:
+              properties:
+                text:
+                  type: text
+                  index_phrases: true
+
+  - do:
+      index:
+        index: test
+        type: test
+        id: 1
+        body: { text: "peter piper picked a peck of pickled peppers" }
+
+  - do:
+      indices.refresh:
+        index: [test]
+
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            match_phrase:
+              text:
+                query: "peter piper"
+
+  - match: {hits.total: 1}
+
+  - do:
+      search:
+        index: test
+        q: '"peter piper"~1'
+        df: text
+
+  - match: {hits.total: 1}
+
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            match_phrase:
+              text: "peter piper picked"
+
+  - match: {hits.total: 1}
+
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            match_phrase:
+              text: "piper"
+
+  - match: {hits.total: 1}
+
+

server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java

Lines changed: 10 additions & 0 deletions
@@ -19,6 +19,7 @@
 
 package org.elasticsearch.index.mapper;
 
+import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.document.FieldType;
 import org.apache.lucene.index.IndexOptions;
 import org.apache.lucene.index.IndexReader;
@@ -43,6 +44,7 @@
 import org.elasticsearch.index.query.QueryRewriteContext;
 import org.elasticsearch.index.query.QueryShardContext;
 import org.elasticsearch.index.query.QueryShardException;
+import org.elasticsearch.index.search.MatchQuery;
 import org.elasticsearch.index.similarity.SimilarityProvider;
 import org.elasticsearch.search.DocValueFormat;
 import org.joda.time.DateTimeZone;
@@ -353,6 +355,14 @@ public Query regexpQuery(String value, int flags, int maxDeterminizedStates, @Nu
 
     public abstract Query existsQuery(QueryShardContext context);
 
+    public Query phraseQuery(String field, TokenStream stream, int slop, boolean enablePositionIncrements) throws IOException {
+        throw new IllegalArgumentException("Can only use phrase queries on text fields - not on [" + name + "] which is of type [" + typeName() + "]");
+    }
+
+    public Query multiPhraseQuery(String field, TokenStream stream, int slop, boolean enablePositionIncrements) throws IOException {
+        throw new IllegalArgumentException("Can only use phrase queries on text fields - not on [" + name + "] which is of type [" + typeName() + "]");
+    }
+
     /**
      * An enum used to describe the relation between the range of terms in a
      * shard when compared with a query range
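
The hunk above gives every field type a default `phraseQuery` that rejects the request, which implies that text fields override it. The sketch below illustrates that pattern in self-contained Java, using strings in place of Lucene queries. The classes `FieldSketch` and `TextFieldSketch`, the method `phraseQueryField`, and the `hasStopwordGaps` flag are hypothetical names for illustration. Only the error message and the `._index_phrase` suffix come from the commit, and the real override may check further conditions not shown on this page.

```java
public class PhraseRoutingSketch {

    /** Hypothetical stand-in for a mapped field type; not the Elasticsearch class. */
    public static class FieldSketch {
        protected final String name;
        protected final String typeName;

        public FieldSketch(String name, String typeName) {
            this.name = name;
            this.typeName = typeName;
        }

        // Default behaviour, mirroring the diff: non-text fields reject phrase queries.
        public String phraseQueryField(boolean hasStopwordGaps) {
            throw new IllegalArgumentException("Can only use phrase queries on text fields - not on ["
                + name + "] which is of type [" + typeName + "]");
        }
    }

    /** Hypothetical text field that may redirect phrase queries to its shingle subfield. */
    public static class TextFieldSketch extends FieldSketch {
        private final boolean indexPhrases;

        public TextFieldSketch(String name, boolean indexPhrases) {
            super(name, "text");
            this.indexPhrases = indexPhrases;
        }

        // Returns the name of the field the phrase query would run against.
        // Assumption: the subfield is used only when shingles are indexed and the
        // phrase has no gaps left by removed stopwords; otherwise the parent is used.
        @Override
        public String phraseQueryField(boolean hasStopwordGaps) {
            if (indexPhrases && !hasStopwordGaps) {
                return name + "._index_phrase";
            }
            return name;
        }
    }

    public static void main(String[] args) {
        System.out.println(new TextFieldSketch("text", true).phraseQueryField(false)); // prints text._index_phrase
        System.out.println(new TextFieldSketch("text", true).phraseQueryField(true));  // prints text
        try {
            new FieldSketch("id", "keyword").phraseQueryField(false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Putting the rejecting default in the base class means only field types that support positions need to opt in, and callers get a uniform error for everything else.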
