Description
Currently we have no specification of allowed values for index names, type names, IDs, field names or routing values.
This issue is an attempt to document and improve the existing specs to prevent inconsistencies.
Index names
Index names are limited by the file system. They may only be lower case, and may not start with an underscore. While we don't prevent index names starting with a ., we reserve those for internal use. Clearly, . and .. cannot be used.
These characters are already illegal: \, /, *, ?, ", <, >, | and the comma (,). We should also add the null byte.
There are other filenames which are illegal in Windows, but we probably don't need to check for those.
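As a rough illustration of the index name rules above, here is a minimal Python sketch. It is not Elasticsearch's actual validation code; the function name `index_name_problems` and the error messages are made up for this example, and the null byte check reflects the proposal in this issue rather than current behaviour.

```python
from typing import List

# Characters the issue lists as already illegal, plus the proposed null byte.
ILLEGAL_INDEX_CHARS = set('\\/*?"<>|,') | {"\0"}


def index_name_problems(name: str) -> List[str]:
    """Return a list of rule violations for a candidate index name.

    Hypothetical helper: an empty list means the name passes the rules
    described in this issue.
    """
    problems = []
    if name in (".", ".."):
        problems.append("must not be '.' or '..'")
    if name != name.lower():
        problems.append("must be lower case")
    if name.startswith("_"):
        problems.append("must not start with an underscore")
    found = sorted(ILLEGAL_INDEX_CHARS.intersection(name))
    if found:
        problems.append("contains illegal characters: %r" % found)
    return problems
```

Note that a name with a leading . (other than . and ..) passes this check, matching the text above: such names are reserved for internal use but not rejected.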
Type names
Type names can contain any character (except null bytes, which currently we don't check) but may not start with an underscore.
IDs
IDs can contain any character (except null bytes, which currently we don't check). IDs should not begin with an underscore.
Currently IDs are not checked for a leading underscore, so IDs that begin with an underscore may already exist. These can clash with eg _mapping and so should be prevented. This is a backwards incompatible change.
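The proposed rule for type names and IDs is small enough to sketch in a few lines of Python. This is only an illustration of the rule as stated here; `is_valid_type_or_id` is a hypothetical name, not an existing Elasticsearch function.

```python
def is_valid_type_or_id(value: str) -> bool:
    """Check a type name or ID against the proposed rules.

    Hypothetical helper: any character is allowed except the null byte,
    and the value must not begin with an underscore.
    """
    if "\0" in value:
        return False
    if value.startswith("_"):
        return False
    return True
```

Under this rule an ID such as `_mapping` would be rejected, while `my_doc` (underscore not in first position) would still be accepted.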
Routing & Parent
Routing and parent values should be the same as IDs, ie any chars except for the null byte. The problem is that multiple routing values are passed in the query string as comma-separated values, eg ?routing=foo,bar.
If a single routing value contains a comma, it will be misinterpreted as two routing values. One idea is to pass multiple routing values as eg ?routing=foo&routing=bar,baz. Unfortunately, this is not backwards compatible and isn't supported by a number of client libraries.
The only solution I can think of is to support some escaping of commas, eg foo\,bar. This would mean that \ would need to be escaped as well, ie: foo\bar -> foo\\bar. Support for this escaping would need to be added to Elasticsearch and to the client libraries.
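To make the escaping idea concrete, here is a minimal Python sketch of how a client could encode multiple routing values and how the server side could split them again. The function names (`escape_routing`, `join_routing`, `split_routing`) are invented for this example, and rejecting a trailing unpaired backslash is an assumption; the issue does not say how that case should be handled.

```python
from typing import List


def escape_routing(value: str) -> str:
    """Escape backslashes first, then commas, so the result is unambiguous."""
    return value.replace("\\", "\\\\").replace(",", "\\,")


def join_routing(values: List[str]) -> str:
    """Build the comma-separated ?routing= parameter value."""
    return ",".join(escape_routing(v) for v in values)


def split_routing(param: str) -> List[str]:
    """Split a routing parameter on unescaped commas and undo the escaping."""
    values, current, escaped = [], [], False
    for ch in param:
        if escaped:
            # Character after a backslash is taken literally.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ",":
            values.append("".join(current))
            current = []
        else:
            current.append(ch)
    if escaped:
        # Assumption: a dangling backslash is treated as an error.
        raise ValueError("routing value ends with an unpaired backslash")
    values.append("".join(current))
    return values
```

With this scheme, the values foo and bar,baz would be sent as foo,bar\,baz, and splitting that string gives back the original two values. Existing routing values that already contain a literal \ would be read differently after the change, which is why both Elasticsearch and the client libraries would need to adopt it together.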