@@ -183,10 +183,10 @@ in the place where the name normally goes. The structure is
183183 - det_checksum
184184 - Directory leaf block checksum.
185185
186- The leaf directory block checksum is calculated against the FS UUID, the
187- directory's inode number, the directory's inode generation number, and
188- the entire directory entry block up to (but not including) the fake
189- directory entry.
186+ The leaf directory block checksum is calculated against the FS UUID (or
187+ the checksum seed, if that feature is enabled for the fs), the directory's
188+ inode number, the directory's inode generation number, and the entire
189+ directory entry block up to (but not including) the fake directory entry.
190190
191191Hash Tree Directories
192192~~~~~~~~~~~~~~~~~~~~~
@@ -196,37 +196,37 @@ new feature was added to ext3 to provide a faster (but peculiar)
196196balanced tree keyed off a hash of the directory entry name. If the
197197EXT4_INDEX_FL (0x1000) flag is set in the inode, this directory uses a
198198hashed btree (htree) to organize and find directory entries. For
199- backwards read-only compatibility with ext2, this tree is actually
200- hidden inside the directory file, masquerading as “empty” directory data
201- blocks! It was stated previously that the end of the linear directory
202- entry table was signified with an entry pointing to inode 0; this is
203- (ab)used to fool the old linear-scan algorithm into thinking that the
204- rest of the directory block is empty so that it moves on.
199+ backwards read-only compatibility with ext2, interior tree nodes are actually
200+ hidden inside the directory file, masquerading as “empty” directory entries
201+ spanning the whole block. It was stated previously that directory entries
202+ with the inode set to 0 are treated as unused entries; this is (ab)used to
203+ fool the old linear-scan algorithm into skipping over those blocks containing
204+ the interior tree node data.
205205
206206The root of the tree always lives in the first data block of the
207207directory. By ext2 custom, the '.' and '..' entries must appear at the
208208beginning of this first block, so they are put here as two
209209``struct ext4_dir_entry_2`` s and not stored in the tree. The rest of
210210the root node contains metadata about the tree and finally a hash->block
211211map to find nodes that are lower in the htree. If
212- ``dx_root.info.indirect_levels`` is non-zero then the htree has two
213- levels; the data block pointed to by the root node's map is an interior
214- node, which is indexed by a minor hash. Interior nodes in this tree
215- contains a zeroed out ``struct ext4_dir_entry_2`` followed by a
216- minor_hash->block map to find leafe nodes. Leaf nodes contain a linear
217- array of all ``struct ext4_dir_entry_2``; all of these entries
218- (presumably) hash to the same value. If there is an overflow, the
219- entries simply overflow into the next leaf node, and the
220- least-significant bit of the hash (in the interior node map) that gets
221- us to this next leaf node is set.
222-
223- To traverse the directory as a htree, the code calculates the hash of
224- the desired file name and uses it to find the corresponding block
225- number. If the tree is flat, the block is a linear array of directory
226- entries that can be searched; otherwise, the minor hash of the file name
227- is computed and used against this second block to find the corresponding
228- third block number. That third block number will be a linear array of
229- directory entries.
212+ ``dx_root.info.indirect_levels`` is non-zero then the htree has that many
213+ levels and the blocks pointed to by the root node's map are interior nodes.
214+ These interior nodes have a zeroed out ``struct ext4_dir_entry_2`` followed by
215+ a hash->block map to find nodes of the next level. Leaf nodes look like
216+ classic linear directory blocks, but all of their entries have a hash value
217+ equal to or greater than the indicated hash of the parent node.
218+
219+ The actual hash value for an entry name is only 31 bits; the least-significant
220+ bit is set to 0. However, if there is a hash collision between directory
221+ entries, the least-significant bit may get set to 1 on interior nodes in the
222+ case where these two (or more) hash-colliding entries do not fit into one leaf
223+ node and must be split across multiple nodes.
224+
225+ To look up a name in such an htree, the code calculates the hash of the desired
226+ file name and uses it to find the leaf node with the range of hash values the
227+ calculated hash falls into (in other words, a lookup works basically the same
228+ as it would in a B-Tree keyed by the hash value). It may also have to scan
229+ the leaf nodes that follow (in tree order) in case of hash collisions.
230230
231231To traverse the directory as a linear array (such as the old code does),
232232the code simply reads every data block in the directory. The blocks used
@@ -319,7 +319,8 @@ of a data block:
319319 * - 0x24
320320 - __le32
321321 - block
322- - The block number (within the directory file) that goes with hash=0.
322+ - The block number (within the directory file) that leads to the left-most
323+ leaf node, i.e. the leaf containing entries with the lowest hash values.
323324 * - 0x28
324325 - struct dx_entry
325326 - entries[0]
@@ -442,12 +443,12 @@ The dx_tail structure is 8 bytes long and looks like this:
442443 * - 0x0
443444 - u32
444445 - dt_reserved
445- - Zero.
446+ - Unused (but, curiously, still part of the checksum).
446447 * - 0x4
447448 - __le32
448449 - dt_checksum
449450 - Checksum of the htree directory block.
450451
451452The checksum is calculated against the FS UUID, the htree index header
452453(dx_root or dx_node), all of the htree indices (dx_entry) that are in
453- use, and the tail block (dx_tail).
454+ use, and the tail block (dx_tail) with the dt_checksum initially set to 0.
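
As a reading aid for the lookup wording added in the second hunk, here is a minimal Python sketch of the described behaviour: find the leaf whose hash range covers the name's hash, then keep scanning following leaves while the map key has its least-significant bit set for the same hash. The flattened `index` list, the `leaves` dict, and the `htree_lookup` name are illustrative assumptions, not the on-disk layout or the kernel code; a real htree may have several interior levels, and the name hash is taken as an input here rather than computed.

```python
from bisect import bisect_right


def htree_lookup(index, leaves, name, name_hash):
    """Return the inode number for `name`, or None if it is absent.

    index  -- hypothetical flattened hash->block map: a list of
              (hash_key, leaf_no) pairs sorted by hash_key, whose first
              key is 0 (the left-most leaf)
    leaves -- hypothetical dict mapping leaf_no to a list of
              (entry_name, inode) pairs
    """
    name_hash &= ~1  # entry name hashes always have the low bit clear
    keys = [key for key, _ in index]
    # Last map entry whose key is <= the hash, as in a B-tree descent.
    pos = bisect_right(keys, name_hash) - 1
    while pos >= 0:
        for entry_name, inode in leaves[index[pos][1]]:
            if entry_name == name:
                return inode
        # A set low bit on the next key marks a leaf that continues a run
        # of colliding hashes; keep scanning only for the same hash.
        nxt = pos + 1
        if nxt < len(index) and keys[nxt] == (name_hash | 1):
            pos = nxt
        else:
            break
    return None


# Toy directory: "b" and "c" collide on hash 0x100 and were split across
# two leaves, so the second leaf's key carries the continuation bit.
index = [(0x000, 1), (0x100, 2), (0x101, 3), (0x200, 4)]
leaves = {1: [("a", 11)], 2: [("b", 12)], 3: [("c", 13)], 4: [("d", 14)]}
print(htree_lookup(index, leaves, "c", 0x100))
```

In the toy data, looking up "c" first lands on leaf 2 (key 0x100), misses, and then follows the continuation key 0x101 to leaf 3. A name with the same hash that is in neither leaf stops there, because the next key (0x200) does not continue the collision run.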