Other tokenizers (like the `standard` tokenizer) support overriding the `max_token_length` parameter, but the `whitespace` tokenizer doesn't appear to, even though the underlying Lucene `WhitespaceTokenizer` seems to support a maximum token length. We should probably expose this parameter in Elasticsearch as well.
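For comparison, here's a sketch of how `max_token_length` is configured today on the `standard` tokenizer, followed by what an equivalent `whitespace` configuration might look like if this were supported (the second block is the proposal, not something that currently works):

```json
PUT my_index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_standard": {
          "type": "standard",
          "max_token_length": 10
        },
        "my_whitespace": {
          "type": "whitespace",
          "max_token_length": 10
        }
      }
    }
  }
}
```

The idea would be to pass the setting through to the Lucene tokenizer the same way the standard tokenizer factory does, so tokens longer than `max_token_length` get split at that length instead of being emitted whole.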