XFS uses several different directory structures depending on the size of the directory. For testing purposes, I created three directories: one with 5 files, one with 50, and one with 5000. Small directories have their data stored in the inode. In this installment we’ll examine the inode of the directory that contains only five files.
We documented the “inode core” layout and the format of the extended attributes in Part 2 of this series. In this inode the file type (upper nibble of byte 2) is 4, which means it’s a directory. The data fork type (byte 5) is 1, meaning resident data.
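As a quick illustration of those two checks, here is a minimal Python sketch. The function name and the six-byte sample are made up for this example; the sample is synthetic (an "IN" magic, mode 0o40755, version 3, data fork type 1) and is not taken from the test file system.

```python
def inode_type_and_format(inode: bytes) -> tuple[int, int]:
    """Return (file type nibble, data fork type) from the start of a raw inode."""
    file_type = inode[2] >> 4   # upper nibble of byte 2
    fork_type = inode[5]        # data fork type
    return file_type, fork_type


# Synthetic inode start: magic "IN", mode 0x41ED (0o40755), version 3, fork type 1.
sample = bytes([0x49, 0x4E, 0x41, 0xED, 0x03, 0x01])
file_type, fork_type = inode_type_and_format(sample)
print(file_type, fork_type)
```

A file type of 4 with a data fork type of 1 is the combination that signals a short form directory.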
Resident directory data is stored as a “short form” directory structure starting at byte offset 176, right after the inode core. First we have a brief header:
Bytes     Field                                         Value
=====     =====                                         =====
176       Number of directory entries                   5
177       Number of dir entries needing 64-bit inodes   0
178-181   Inode of parent                               0x04159fa1
The first byte counts the directory entries that follow the header. The next byte counts how many of those entries require 64 bits for inode data. As we saw in Part 1 of this series, XFS uses variable length addresses for blocks and inodes. In our file system, we need fewer than 32 bits to store these addresses, so there are no directory entries requiring 64-bit inodes. This means the directory data will use 32 bits to store inodes in order to save space.
This has an immediate impact because the next entry in the header is the inode of the parent directory. Since byte 177 is zero, this field is 32 bits wide. If byte 177 were non-zero, then every inode field, both in the header and in the directory entries, would be 64 bits.
The parent inode field in the header is the equivalent of the usual “..” link in the directory. The current directory inode (the “.” link) is found in the inode core in bytes 152-159. The short form directory simply uses these values and does not have explicit “.” and “..” entries.
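The header logic can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the input is assumed to be the data fork starting at byte 176 of the inode, and multi-byte values are read as big-endian, which is how XFS stores its on-disk integers.

```python
def parse_sf_header(fork: bytes) -> tuple[int, int, int, int]:
    """Parse a short form directory header.

    Returns (entry count, 64-bit entry count, parent inode, header length).
    """
    count = fork[0]                       # number of directory entries
    count64 = fork[1]                     # entries needing 64-bit inodes
    inode_size = 8 if count64 else 4      # non-zero switches every inode field to 64 bits
    parent = int.from_bytes(fork[2:2 + inode_size], "big")
    return count, count64, parent, 2 + inode_size


# Header bytes matching the values described above.
header = bytes.fromhex("05" "00" "04159fa1")
count, count64, parent, header_len = parse_sf_header(header)
print(count, count64, hex(parent), header_len)
```

The returned header length tells a caller where the first directory entry begins: 6 bytes into the fork here, or 10 bytes when 64-bit inodes are in use.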
After the header come a series of variable length directory entries, packed as tightly as possible with no alignment constraints. Entries are added to the directory in order of file creation and are not sorted in any way.
Here is a description of the fields and a breakdown of the values in the five directory entries in this inode:
Len (Bytes)   Field
===========   =====
1             Length of file name (in bytes)
2             Entry offset in non short form directory
varies        Characters in file name
1             File type
4 or 8        Absolute inode address

Len   Offset   Name           Type   Inode
===   ======   ====           ====   =====
12    0x0060   01_smallfile   01     0x0417979d
10    0x0078   02_bigfile     01     0x0417979e
12    0x0090   03_smallfile   01     0x0417979f
10    0x00a8   04_bigfile     01     0x0417a154
12    0x00c0   05_smallfile   01     0x0417a155
First we have a single byte for the file name length in bytes. Because the length fits in one byte, file names are limited to 255 bytes, the same limit found in other Unix file systems.
The next two bytes are based on the byte offset the directory entry would have if it were a normal XFS directory entry and not packed into a short form directory in the inode. In a normal directory block, directory entries are 64-bit aligned and start at byte offset 96 (0x60) following the directory header and “.” and “..” entries. The directory entries here are all 18 or 20 bytes long, which means they would consume 24 bytes (0x18) in a normal directory block. Using a consistent numbering scheme for the offset makes it easier to write code that iterates through directory entries, even though the offsets don’t match the actual offset of each directory entry in the short form style.
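The offsets in the table can be reproduced with a short calculation. The sketch below rests on assumptions that are not spelled out in this installment: that a normal directory block entry consists of an 8-byte inode, a 1-byte name length, the name, a 1-byte file type, and a 2-byte tag, rounded up to a multiple of 8 bytes, and that the entries were created back to back with no deletions in between. The function names are hypothetical.

```python
def block_entry_size(name_len: int) -> int:
    """Size an entry would occupy in a normal directory block (assumed layout)."""
    raw = 8 + 1 + name_len + 1 + 2   # inode + length byte + name + file type + tag
    return (raw + 7) & ~7            # round up to 64-bit alignment


def expected_offsets(names: list[str], first: int = 0x60) -> list[int]:
    """Offsets the entries would have in a normal block, starting at `first`."""
    offsets = []
    offset = first
    for name in names:
        offsets.append(offset)
        offset += block_entry_size(len(name.encode("utf-8")))
    return offsets


names = ["01_smallfile", "02_bigfile", "03_smallfile", "04_bigfile", "05_smallfile"]
offsets = expected_offsets(names)
print([hex(o) for o in offsets])
```

Under those assumptions both the 10-character and the 12-character names round up to 24 bytes, which is why the offsets in the table step by 0x18.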
Next we have the characters in the file name followed by a single byte for the file type. The file type is included in the directory entry so that commands like “ls -F” don’t have to open each inode to get the file type information. The file type values in the directory entry do not use the same number scheme as the file type in the inode. Here are the expected values for directory entries:
1   Regular file
2   Directory
3   Character special device
4   Block special device
5   FIFO
6   Socket
7   Symlink
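To compare a directory entry's type with the type nibble stored in the inode, a lookup table is enough. The sketch below uses the standard Unix mode constants from Python's `stat` module; the dictionary and function names are made up for illustration.

```python
import stat

# Directory entry file type -> standard Unix file type bits (S_IF* constants).
DIRENT_TYPE_TO_MODE = {
    1: stat.S_IFREG,   # regular file
    2: stat.S_IFDIR,   # directory
    3: stat.S_IFCHR,   # character special device
    4: stat.S_IFBLK,   # block special device
    5: stat.S_IFIFO,   # FIFO
    6: stat.S_IFSOCK,  # socket
    7: stat.S_IFLNK,   # symlink
}


def inode_type_nibble(dirent_type: int) -> int:
    """Return the file type nibble expected in the inode for a directory entry type."""
    return DIRENT_TYPE_TO_MODE[dirent_type] >> 12


print(inode_type_nibble(1), inode_type_nibble(2))
```

A mismatch between the type recorded in a directory entry and the nibble in the inode it points to is worth a closer look during an examination.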
Finally there is a field to hold the inode associated with the file name. In our example, these inode entries are 32 bits. 64-bit inode fields will be used if the directory header indicates they are needed.
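Putting the entry fields together, a parser for the packed entries might look like the following. This is a sketch, not a complete tool: the function names are hypothetical, the input is assumed to start immediately after the header, integers are read as big-endian, and the sample bytes are rebuilt from the first two rows of the table rather than copied from a disk image.

```python
def parse_sf_entries(data: bytes, count: int, inode_size: int = 4):
    """Parse `count` short form entries; return (entries, bytes consumed)."""
    entries = []
    pos = 0
    for _ in range(count):
        name_len = data[pos]                                   # 1 byte: name length
        offset = int.from_bytes(data[pos + 1:pos + 3], "big")  # 2 bytes: block-style offset
        name_end = pos + 3 + name_len
        name = data[pos + 3:name_end].decode("utf-8", errors="replace")
        file_type = data[name_end]                             # 1 byte: file type
        inode = int.from_bytes(data[name_end + 1:name_end + 1 + inode_size], "big")
        entries.append({"name": name, "offset": offset, "type": file_type, "inode": inode})
        pos = name_end + 1 + inode_size                        # entries are packed, no padding
    return entries, pos


def make_entry(name: str, offset: int, file_type: int, inode: int) -> bytes:
    """Build one 32-bit-inode entry, used here only to create sample data."""
    raw = name.encode("utf-8")
    return (bytes([len(raw)]) + offset.to_bytes(2, "big") + raw
            + bytes([file_type]) + inode.to_bytes(4, "big"))


sample = (make_entry("01_smallfile", 0x60, 1, 0x0417979D)
          + make_entry("02_bigfile", 0x78, 1, 0x0417979E))
entries, used = parse_sf_entries(sample, 2)
for entry in entries:
    print(entry["name"], hex(entry["offset"]), entry["type"], hex(entry["inode"]))
```

Pass `inode_size=8` when the header's 64-bit count is non-zero. The second return value marks the end of the live entries, so anything after it in the inode is slack.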
Deleting a File
When a file is deleted from (or added to) a directory, the mtime and ctime in the directory’s inode core are updated. The directory file size changes (bytes 56-63). The CRC32 checksum and the logfile sequence number fields are updated.
In the data fork, all directory entries after the deleted entry are shifted downwards, completely overwriting the deleted entry. Here’s what the directory entries look like after “03_smallfile” (the third entry in the original directory) is deleted:
The four remaining directory entries are highlighted above. However, after those entries you can clearly see the residue of the entry for “05_smallfile” from the original directory. So as short-form directories shrink, they leave behind entries in the unused “inode slack”. In this case the residue is for a file entry that still exists in the directory, but it’s possible that we might get residue of entries deleted from the end of the directory list.
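A toy model shows why the residue appears. The sketch below is not the file system's actual code; it simply imitates the behavior described above, where the later entries are copied down over the deleted one and the bytes past the new end of the list are left untouched. The function name and the four-byte "entries" are invented for the demonstration.

```python
def delete_and_shift(fork: bytearray, start: int, length: int, used: int) -> int:
    """Remove `length` bytes at `start` by shifting later entries down.

    Returns the new number of bytes in use. Bytes past that point are not cleared.
    """
    fork[start:used - length] = fork[start + length:used]
    return used - length


# Five fake 4-byte entries: A, B, C, D, E. Delete the third one.
fork = bytearray(b"AAAABBBBCCCCDDDDEEEE")
used = delete_and_shift(fork, start=8, length=4, used=20)
live, residue = bytes(fork[:used]), bytes(fork[used:])
print(live, residue)
```

The live area now holds A, B, D, and E, while a stale copy of E remains in the slack, mirroring the leftover “05_smallfile” entry described above.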
When Directories Grow Up
Another place you can see short form directory residue is when the directory gets large enough that it needs to move out to blocks on disk. I created a sample directory that initially had five files and confirmed that it was being stored as a short form directory in the inode. Then I added 45 more files to the directory, which made a short form directory impossible. Here’s what the first part of the inode looks like after these two operations:
The data fork type (byte 5) is 2, meaning an extent list after the inode core, giving the location of the directory content on disk. You can see the extent highlighted starting at byte offset 176 (0xb0). But immediately after that extent you can see the residue of the original short-form directory.
The format of directories changes significantly when directory entries move out into disk blocks. In our next installment we will examine the structures in these larger directories.