Docs: UTF-8 character can take up 4 bytes #27060

@ChrisGreenaway

Elasticsearch version (bin/elasticsearch --version): master

Plugins installed: N/A

JVM version (java -version): N/A

OS version (uname -a if on a Unix-like system): N/A

Description of the problem including expected versus actual behavior:

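ignore-above.asciidoc says "If you use UTF-8 text with many non-ASCII characters, you may want to set the limit to 32766 / 3 = 10922 since UTF-8 characters may occupy at most 3 bytes." However, a UTF-8 character can take up to 4 bytes: code points above U+FFFF (for example, emoji) are encoded as 4 bytes. A value of 10922 characters can therefore still exceed the 32766-byte limit.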
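To illustrate the point, here is a minimal Python sketch (not taken from the Elasticsearch docs or code base). It prints the UTF-8 encoded length of a few sample characters and recomputes the character limit for a 3-byte and a 4-byte worst case. The 32766 figure comes from the quoted documentation; the helper names `utf8_len` and `safe_char_limit` are hypothetical and exist only for this example.

```python
# Byte limit quoted in ignore-above.asciidoc.
BYTE_LIMIT = 32766


def utf8_len(ch: str) -> int:
    """Return the number of bytes needed to encode `ch` in UTF-8."""
    return len(ch.encode("utf-8"))


def safe_char_limit(byte_limit: int, max_bytes_per_char: int = 4) -> int:
    """Largest character count that cannot exceed `byte_limit` bytes,
    assuming every character needs `max_bytes_per_char` bytes (worst case)."""
    return byte_limit // max_bytes_per_char


# Sample characters covering each UTF-8 encoded length.
samples = [
    "\u0041",      # LATIN CAPITAL LETTER A
    "\u00e9",      # LATIN SMALL LETTER E WITH ACUTE
    "\u20ac",      # EURO SIGN
    "\U0001F600",  # GRINNING FACE (outside the Basic Multilingual Plane)
]

for ch in samples:
    print(f"U+{ord(ch):04X} -> {utf8_len(ch)} byte(s)")

print(safe_char_limit(BYTE_LIMIT, 3))  # value currently in the docs
print(safe_char_limit(BYTE_LIMIT, 4))  # value if 4-byte characters are allowed for
```

Under the 4-byte worst case, integer division gives 32766 // 4 = 8191, which would be the conservative value for text that may contain characters outside the Basic Multilingual Plane.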

Steps to reproduce:

Look at ignore-above.asciidoc

Provide logs (if relevant):

Labels: >docs (General docs changes)
