@@ -91,10 +91,48 @@ Currently Available
 * large block (up to pagesize) support
 * efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
   the ordering)
+* Case-insensitive file name lookups

 [1] Filesystems with a block size of 1k may see a limit imposed by the
 directory hash tree having a maximum depth of two.
 
+Case-insensitive file name lookups
+======================================================
+
+The case-insensitive file name lookup feature is supported on a
+per-directory basis, allowing the user to mix case-insensitive and
+case-sensitive directories in the same filesystem. It is enabled by
+setting the +F inode attribute on an empty directory. The
+case-insensitive string match operation is only defined when we know how
+text is encoded in a byte sequence. For that reason, in order to enable
+case-insensitive directories, the filesystem must have the
+casefold feature, which stores the filesystem-wide encoding
+model used. By default, the charset adopted is the latest version of
+Unicode (12.1.0, at the time of this writing), encoded in the UTF-8
+form. The comparison algorithm is implemented by normalizing the
+strings to the canonical decomposition form, as defined by Unicode,
+followed by a byte-by-byte comparison.
+
+The case-awareness is name-preserving on the disk, meaning that the file
+name provided by userspace is a byte-for-byte match to what is actually
+written on the disk. The Unicode normalization format used by the
+kernel is thus an internal representation, and is exposed neither to
+userspace nor to the disk, with the important exception of disk hashes,
+used on large case-insensitive directories with the DX feature. On DX
+directories, the hash must be calculated using the casefolded version of
+the filename, meaning that the normalization format used actually has an
+impact on where the directory entry is stored.
+
+When we change from viewing filenames as opaque byte sequences to seeing
+them as encoded strings, we need to address what happens when a program
+tries to create a file with an invalid name. The Unicode subsystem
+within the kernel leaves the decision of what to do in this case to the
+filesystem, which selects its preferred behavior by enabling or disabling
+the strict mode. When Ext4 encounters one of those strings and the
+filesystem did not require strict mode, it falls back to considering the
+entire string as an opaque byte sequence, which still allows the user to
+operate on that file, but the case-insensitive lookups won't work.
+
 Options
 =======
 
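The matching rules described in the added section (normalize, then compare bytes, with a fallback to opaque bytes for invalid names) can be illustrated with a small userspace sketch. This is not the kernel implementation. The names `fold_name` and `names_match` and the `strict` parameter are hypothetical, Python's `str.casefold()` and NFD normalization only approximate the kernel's own casefolding tables, and the interpreter's Unicode version may differ from 12.1.0. The DX hash is not modelled here.

```python
import unicodedata


def fold_name(name: bytes, strict: bool = False):
    """Return a comparison key for a file name, or None for invalid UTF-8.

    Hypothetical helper: approximates "casefold + canonical decomposition"
    with Python's own Unicode tables, not the kernel's.
    """
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        if strict:
            # Assumption: strict mode refuses names that are not valid UTF-8.
            raise ValueError("invalid UTF-8 file name rejected in strict mode")
        return None
    # Decompose, casefold, then decompose again so the key is stable.
    decomposed = unicodedata.normalize("NFD", text)
    folded = unicodedata.normalize("NFD", decomposed.casefold())
    return folded.encode("utf-8")


def names_match(a: bytes, b: bytes, strict: bool = False) -> bool:
    """Compare two on-disk names the way a casefolded directory might."""
    key_a = fold_name(a, strict)
    key_b = fold_name(b, strict)
    if key_a is None or key_b is None:
        # Fallback: treat both names as opaque byte sequences, so only an
        # exact byte-for-byte match succeeds.
        return a == b
    return key_a == key_b


if __name__ == "__main__":
    precomposed = "Café".encode("utf-8")        # U+00E9
    decomposed = "CAFE\u0301".encode("utf-8")   # E followed by U+0301
    print(names_match(precomposed, decomposed))
    print(names_match(b"\xffName", b"\xffname"))
```

The original bytes are never rewritten in this sketch, which mirrors the name-preserving behaviour: only the comparison key is folded, while the stored name stays exactly as userspace supplied it.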