Multi-byte word problem with TrimStrings Middleware. #40577

@nshiro

  • Laravel Version: 9.0.0-beta.3
  • PHP Version: 8.1.0

Description:

In 9.x, the TrimStrings middleware also removes NBSP (#38117).
This causes a problem with multi-byte text (in my case Japanese): some characters are garbled.
When I put だ or ム at the end of the text, those characters are garbled.

Steps to Reproduce (quick version)

Add the line below to welcome.blade.php.

<h1>Hello {{ request()->name }}</h1>

Access these URLs:
http://localhost/?name=やま
http://localhost/?name=やまだ

(Replace localhost with your domain. Your browser may percent-encode やま or やまだ in the URL field.)

やま is displayed correctly, but once you add だ the output is garbled.
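
The two requests can be approximated outside Laravel. This is a minimal sketch, assuming the middleware behaves like a byte-wise trim() whose character list includes NBSP. The character list below is an assumption for illustration, not copied from the framework source, and `isValidUtf8` is a hypothetical helper. The file must be saved as UTF-8.

```php
<?php
// Hypothetical helper: preg_match() with the /u flag fails on malformed UTF-8.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Assumed character list: PHP's default trim() characters plus NBSP (U+00A0).
$chars = " \t\n\r\0\x0B\u{00A0}";

foreach (['やま', 'やまだ'] as $name) {
    $trimmed = trim($name, $chars);
    // Print the original value, the bytes left after trimming, and whether they still form valid UTF-8.
    echo $name, ' => ', bin2hex($trimmed), ' ', isValidUtf8($trimmed) ? 'valid' : 'broken', PHP_EOL;
}
```

Under these assumptions, やま should come back unchanged, while やまだ should lose its final byte and no longer be valid UTF-8.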


Steps To Reproduce: (Original version)

[Caution]
GitHub replaces a literal NBSP with a normal space, so a snippet containing one cannot be copied from this page and still reproduce the problem.
The code below therefore writes the NBSP as the escape "\u{00A0}" in the second argument of trim().
(If you prefer a literal NBSP, copy it from the real source code, not from the GitHub website.)

$str = 'あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらをわがぎぐげござしずぜぞだぢづでどばびぶべぼぱぴぷぺぽ'
    . 'アイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラヲワガギグゲゴザジズゼゾダヂヅデドバビブベボパピプペポ';

// Split the string into individual characters (this part is unrelated to the problem).
$words = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

foreach ($words as $word) {
    // "\u{00A0}" is NBSP (bytes c2 a0); trim() treats its character list as single bytes.
    echo $word.' '.bin2hex($word).' '.trim($word, "\u{00A0}")."<br>";
}


I guess the reason is that だ is e381a0 and ム is e383a0 in UTF-8, while NBSP is U+00A0 in Unicode (c2a0 in UTF-8).
So if I put だ or ム at the end of the text, the trailing byte a0 is trimmed and the character is garbled.
We (Japanese) also use Chinese characters, which I haven't looked into.
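
The byte-level guess above can be checked directly, and a Unicode-aware pattern avoids the problem. This is only a sketch: `trimNbsp` is a hypothetical helper, not part of Laravel, and the pattern is one possible approach rather than the framework's actual fix. The file must be saved as UTF-8.

```php
<?php
// UTF-8 bytes of the characters involved.
echo bin2hex('だ'), PHP_EOL;        // e381a0
echo bin2hex('ム'), PHP_EOL;        // e383a0
echo bin2hex("\u{00A0}"), PHP_EOL;  // c2a0

// Hypothetical helper: strip whitespace and NBSP as whole code points.
// The /u flag makes PCRE match UTF-8 characters instead of single bytes.
function trimNbsp(string $value): string
{
    // preg_replace() returns null on error (e.g. malformed UTF-8); fall back to the input.
    return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $value) ?? $value;
}

$input = "\u{00A0}やまだ\u{00A0}";
echo bin2hex(trim($input, "\u{00A0}")), PHP_EOL;  // byte-wise trim: the last byte of だ is lost
echo trimNbsp($input), PHP_EOL;                   // やまだ
```

The byte-wise trim() removes every leading or trailing c2 or a0 byte, whichever character it belongs to. The pattern only ever removes complete characters.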


Thank you for reading.
