Commit 4c5bd57

Rename simple pattern tokenizers (#25300)
Changed the names to snake case for consistency. Related to #25159; original issue #23363.
1 parent: 0d6c47f · commit: 4c5bd57

File tree

5 files changed: +19 additions, -19 deletions


docs/reference/analysis/tokenizers.asciidoc

Lines changed: 3 additions & 3 deletions
@@ -99,14 +99,14 @@ terms.
 
 <<analysis-simplepattern-tokenizer,Simple Pattern Tokenizer>>::
 
-The `simplepattern` tokenizer uses a regular expression to capture matching
+The `simple_pattern` tokenizer uses a regular expression to capture matching
 text as terms. It uses a restricted subset of regular expression features
 and is generally faster than the `pattern` tokenizer.
 
 <<analysis-simplepatternsplit-tokenizer,Simple Pattern Split Tokenizer>>::
 
-The `simplepatternsplit` tokenizer uses the same restricted regular expression
-subset as the `simplepattern` tokenizer, but splits the input at matches rather
+The `simple_pattern_split` tokenizer uses the same restricted regular expression
+subset as the `simple_pattern` tokenizer, but splits the input at matches rather
 than returning the matches as terms.
 
 <<analysis-pathhierarchy-tokenizer,Path Tokenizer>>::

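The rename does not change the behavioral contrast described above, but it is worth keeping in mind while updating configs: one tokenizer captures matches, the other splits at them. A minimal sketch against the `_analyze` API (the endpoint behind the `indices.analyze` calls in the test file further down; the input string is illustrative):

POST _analyze
{
  "text": "foo==bar",
  "tokenizer": {
    "type": "simple_pattern",
    "pattern": "=="
  }
}

With `simple_pattern` the only term produced is the match itself, `==`; swapping the type to `simple_pattern_split` with the same pattern would instead yield `foo` and `bar`.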
docs/reference/analysis/tokenizers/simplepattern-tokenizer.asciidoc

Lines changed: 5 additions & 5 deletions
@@ -3,15 +3,15 @@
 
 experimental[]
 
-The `simplepattern` tokenizer uses a regular expression to capture matching
+The `simple_pattern` tokenizer uses a regular expression to capture matching
 text as terms. The set of regular expression features it supports is more
 limited than the <<analysis-pattern-tokenizer,`pattern`>> tokenizer, but the
 tokenization is generally faster.
 
 This tokenizer does not support splitting the input on a pattern match, unlike
 the <<analysis-pattern-tokenizer,`pattern`>> tokenizer. To split on pattern
 matches using the same restricted regular expression subset, see the
-<<analysis-simplepatternsplit-tokenizer,`simplepatternsplit`>> tokenizer.
+<<analysis-simplepatternsplit-tokenizer,`simple_pattern_split`>> tokenizer.
 
 This tokenizer uses {lucene-core-javadoc}/org/apache/lucene/util/automaton/RegExp.html[Lucene regular expressions].
 For an explanation of the supported features and syntax, see <<regexp-syntax,Regular Expression Syntax>>.
@@ -22,7 +22,7 @@ tokenizer should always be configured with a non-default pattern.
 [float]
 === Configuration
 
-The `simplepattern` tokenizer accepts the following parameters:
+The `simple_pattern` tokenizer accepts the following parameters:
 
 [horizontal]
 `pattern`::
@@ -31,7 +31,7 @@ The `simplepattern` tokenizer accepts the following parameters:
 [float]
 === Example configuration
 
-This example configures the `simplepattern` tokenizer to produce terms that are
+This example configures the `simple_pattern` tokenizer to produce terms that are
 three-digit numbers
 
 [source,js]
@@ -47,7 +47,7 @@ PUT my_index
       },
       "tokenizer": {
         "my_tokenizer": {
-          "type": "simplepattern",
+          "type": "simple_pattern",
           "pattern": "[0123456789]{3}"
         }
       }

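The renamed type drops straight into an ad-hoc analysis request as well. A hedged sketch using the three-digit pattern from the example configuration above (the sample text is hypothetical):

POST _analyze
{
  "text": "fd-786-335-514-x",
  "tokenizer": {
    "type": "simple_pattern",
    "pattern": "[0123456789]{3}"
  }
}

This should emit the terms 786, 335, and 514, since only runs of exactly three digits match the pattern.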
docs/reference/analysis/tokenizers/simplepatternsplit-tokenizer.asciidoc

Lines changed: 5 additions & 5 deletions
@@ -3,14 +3,14 @@
 
 experimental[]
 
-The `simplepatternsplit` tokenizer uses a regular expression to split the
+The `simple_pattern_split` tokenizer uses a regular expression to split the
 input into terms at pattern matches. The set of regular expression features it
 supports is more limited than the <<analysis-pattern-tokenizer,`pattern`>>
 tokenizer, but the tokenization is generally faster.
 
 This tokenizer does not produce terms from the matches themselves. To produce
 terms from matches using patterns in the same restricted regular expression
-subset, see the <<analysis-simplepattern-tokenizer,`simplepattern`>>
+subset, see the <<analysis-simplepattern-tokenizer,`simple_pattern`>>
 tokenizer.
 
 This tokenizer uses {lucene-core-javadoc}/org/apache/lucene/util/automaton/RegExp.html[Lucene regular expressions].
@@ -23,7 +23,7 @@ pattern.
 [float]
 === Configuration
 
-The `simplepatternsplit` tokenizer accepts the following parameters:
+The `simple_pattern_split` tokenizer accepts the following parameters:
 
 [horizontal]
 `pattern`::
@@ -32,7 +32,7 @@ The `simplepatternsplit` tokenizer accepts the following parameters:
 [float]
 === Example configuration
 
-This example configures the `simplepatternsplit` tokenizer to split the input
+This example configures the `simple_pattern_split` tokenizer to split the input
 text on underscores.
 
 [source,js]
@@ -48,7 +48,7 @@ PUT my_index
       },
       "tokenizer": {
         "my_tokenizer": {
-          "type": "simplepatternsplit",
+          "type": "simple_pattern_split",
           "pattern": "_"
         }
       }

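As with its sibling, the renamed split tokenizer can be exercised directly. A sketch mirroring the underscore example above (the input phrase is hypothetical):

POST _analyze
{
  "text": "an_underscored_phrase",
  "tokenizer": {
    "type": "simple_pattern_split",
    "pattern": "_"
  }
}

Splitting at every underscore should yield an, underscored, and phrase; the matched underscores themselves are discarded rather than returned as terms.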
modules/analysis-common/src/main/java/org/elasticsearch/analysis/common/CommonAnalysisPlugin.java

Lines changed: 2 additions & 2 deletions
@@ -122,8 +122,8 @@ public Map<String, AnalysisProvider<CharFilterFactory>> getCharFilters() {
     @Override
     public Map<String, AnalysisProvider<TokenizerFactory>> getTokenizers() {
         Map<String, AnalysisProvider<TokenizerFactory>> tokenizers = new TreeMap<>();
-        tokenizers.put("simplepattern", SimplePatternTokenizerFactory::new);
-        tokenizers.put("simplepatternsplit", SimplePatternSplitTokenizerFactory::new);
+        tokenizers.put("simple_pattern", SimplePatternTokenizerFactory::new);
+        tokenizers.put("simple_pattern_split", SimplePatternSplitTokenizerFactory::new);
         return tokenizers;
     }

modules/analysis-common/src/test/resources/rest-api-spec/test/analysis-common/30_tokenizers.yml

Lines changed: 4 additions & 4 deletions
@@ -27,29 +27,29 @@
     - match: { detail.tokenizer.tokens.2.token: od }
 
 ---
-"simplepattern":
+"simple_pattern":
     - do:
         indices.analyze:
           body:
             text: "a6bf fooo ff61"
            explain: true
            tokenizer:
-              type: simplepattern
+              type: simple_pattern
              pattern: "[abcdef0123456789]{4}"
    - length: { detail.tokenizer.tokens: 2 }
    - match: { detail.tokenizer.name: _anonymous_tokenizer }
    - match: { detail.tokenizer.tokens.0.token: a6bf }
    - match: { detail.tokenizer.tokens.1.token: ff61 }
 
 ---
-"simplepatternsplit":
+"simple_pattern_split":
    - do:
        indices.analyze:
          body:
            text: "foo==bar"
            explain: true
            tokenizer:
-              type: simplepatternsplit
+              type: simple_pattern_split
              pattern: ==
    - length: { detail.tokenizer.tokens: 2 }
    - match: { detail.tokenizer.name: _anonymous_tokenizer }

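The first test case above can be replayed by hand against a running node; a sketch of the equivalent REST request (same body as the YAML `indices.analyze` call):

POST _analyze
{
  "text": "a6bf fooo ff61",
  "explain": true,
  "tokenizer": {
    "type": "simple_pattern",
    "pattern": "[abcdef0123456789]{4}"
  }
}

Per the test assertions, this should report exactly two tokens, a6bf and ff61: fooo never matches because only its leading f falls inside the character class, short of the required four characters.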