Linux Internals 05 - File Systems
Everything Is a File
The most fundamental design principle running through Linux is the philosophy that "everything is a file." Not just documents stored on disk, but process information (/proc), devices (/dev), and even network sockets can all be accessed through the same file interface. Why is this principle so powerful? Because knowing just four system calls (open, read, write, close) lets you interact with disk files, serial ports, and kernel parameters in exactly the same way.
Without this philosophy, a separate API would have been needed for every device, and programmers would have had to learn an entirely different interface each time new hardware was added. Thanks to the single abstraction of files, reading CPU information with cat /proc/cpuinfo and reading a configuration file with cat /etc/hostname become exactly the same operation.
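This uniformity is easy to see from the shell: the same read operation works whether the target is backed by disk blocks or generated by the kernel on the fly. A minimal sketch (the /tmp path is just a throwaway example):

```shell
# Read a regular disk file and a kernel-generated /proc file
# through the exact same interface.
printf 'hello from disk\n' > /tmp/demo.txt

head -n 1 /tmp/demo.txt        # data comes from disk (or the page cache)
head -n 1 /proc/self/status    # data is generated by the kernel at read time
```

Neither command knows or cares what is behind the file it reads; the kernel routes each read() to the right implementation.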
VFS: A File System Above File Systems
Linux supports dozens of file systems: ext4, XFS, Btrfs, tmpfs, procfs, and more. Yet when a user runs ls, they never need to care which file system the current directory resides on. This is possible because of an abstraction layer called VFS (Virtual File System).
VFS defines a common interface that all file systems must implement. Each file system provides its own implementation conforming to this interface, and the rest of the kernel only needs to call the VFS interface. This is essentially the same concept as polymorphism in object-oriented design.
┌───────────────────────────────────────┐
│          User Space Processes         │
│       open() / read() / write()       │
├───────────────────────────────────────┤
│       VFS (Virtual File System)       │
│   superblock · inode · dentry · file  │
├────────┬─────────┬─────────┬──────────┤
│  ext4  │   XFS   │  tmpfs  │  procfs  │
├────────┴─────────┴─────────┴──────────┤
│          Block Device Layer           │
├───────────────────────────────────────┤
│          Physical Disk / SSD          │
└───────────────────────────────────────┘
VFS manages four core objects. The superblock holds metadata about the entire file system (block size, mount status, and so on). The inode represents metadata for an individual file. The dentry is a directory entry that provides the mapping between a file name and its inode. The file object tracks the state of a file that a process has open (current offset, access mode, etc.).
Inodes: The Identity of a File
In a file system, a file's name is not actually its essence. A file's true identity is its inode number. The inode records the file's owner, permissions, size, timestamps, and the locations of disk blocks where actual data is stored. What's interesting is that the file name is not stored in the inode. File names live in the dentry of a directory, and it is the dentry that points to the inode.
This design is what makes hard links possible. Since multiple dentries can point to a single inode, a single file can have multiple names. You can check a file's inode number with ls -i and view detailed inode information with stat.
$ ls -i /etc/hostname
1234567 /etc/hostname
$ stat /etc/hostname
  File: /etc/hostname
  Size: 12                Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d        Inode: 1234567     Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2026-01-15 10:30:00.000000000 +0900
Modify: 2026-01-10 08:00:00.000000000 +0900
Change: 2026-01-10 08:00:00.000000000 +0900
Hard Links and Symbolic Links
A hard link creates another dentry pointing to the same inode. The original file and the hard link are completely equal: there is no way to distinguish which is the "original." The file's inode contains a link count, and the actual data blocks are freed only when all hard links are deleted and the link count drops to zero.
Why are symbolic links (soft links) needed, then? Because hard links have two limitations. First, a hard link can only be created within the same file system: inode numbers are unique only within a single file system, so a dentry on one file system cannot reference an inode on another. Second, hard links to directories are generally disallowed because they could create cycles in the directory tree.
Symbolic links work around these limitations. A symbolic link has its own separate inode, and its content is a path string pointing to the target file. When the kernel encounters a symbolic link, it follows the stored path to find the actual file. If the target file is deleted, the symbolic link becomes a dangling link, something that never happens with hard links.
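The difference is easy to observe with ln. A short sketch using throwaway files in a temporary directory (the file names are arbitrary):

```shell
# Hard link vs. symbolic link, side by side.
cd "$(mktemp -d)"
printf 'data\n' > original.txt

ln original.txt hard.txt      # hard link: a second dentry, same inode
ln -s original.txt soft.txt   # symlink: a new inode whose content is a path

ls -i original.txt hard.txt   # both names list the same inode number
stat -c '%h' original.txt     # link count: 2

rm original.txt
cat hard.txt                  # still prints "data": the inode survives
cat soft.txt                  # error: the stored path now dangles
```

Deleting original.txt only removed one dentry; the inode's link count dropped from 2 to 1, so the data blocks stayed allocated for hard.txt, while soft.txt was left pointing at a name that no longer exists.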
ext4 File System Internals
ext4 is currently the most widely used file system on Linux. It is the product of an evolution that started with ext2, added journaling in ext3, and enhanced support for large files and large partitions in ext4.
ext4 manages the disk by dividing it into units called block groups. Each block group contains its own inode bitmap, data block bitmap, inode table, and data blocks. This layout preserves locality: a file's inode and its data blocks tend to land in the same block group, physically close on disk, which reduces seek time when related files are accessed together.
┌───────────────────────────────────────────┐
│             ext4 Disk Layout              │
├──────┬────────────────────────────────────┤
│Super │           Block Group 0            │
│block │ GDT│Bitmap│Inode Table│Data Blocks │
├──────┼────────────────────────────────────┤
│      │           Block Group 1            │
│      │ GDT│Bitmap│Inode Table│Data Blocks │
├──────┼────────────────────────────────────┤
│      │           Block Group N            │
│      │ GDT│Bitmap│Inode Table│Data Blocks │
└──────┴────────────────────────────────────┘
One important innovation in ext4 is extent-based block mapping. The previous generation (ext2/ext3) used indirect block pointers to track each block of a file individually. For large files, this required traversing double and triple indirect pointers, degrading performance. ext4's extents need only record the starting position and length of contiguous blocks, dramatically reducing metadata overhead for large files.
Journaling and Crash Recovery
Writing data to a file system involves multiple stages of disk writes. The inode must be updated, data blocks written, and bitmaps refreshed. What happens if a power failure occurs in the middle of this process? Without journaling, the file system can end up in an inconsistent state. The inode might point to a data block whose data was never actually written, or vice versa.
Journaling solves this problem. Before writing the actual data, it first writes a record to a separate journal area saying "these are the changes I intend to make." Once the changes are complete, the journal entry is marked as committed. If a crash occurs, the journal is replayed to either roll back or reapply incomplete changes. This is called write-ahead logging, the same principle that databases have used for decades.
ext4 offers three journaling modes. Journal mode (data=journal) records both metadata and data to the journal, providing the highest safety but the lowest performance. Ordered mode (data=ordered) records only metadata to the journal but guarantees that data blocks are written to disk before the metadata commit. Writeback mode (data=writeback) records only metadata to the journal without guaranteeing data write order: the fastest option, but a file's recently written contents may be stale or garbage after a crash. The default is ordered mode, striking a balance between safety and performance.
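The mode is selected per mount with the data= option. A hypothetical /etc/fstab entry (the device name and mount point are examples, not a recommendation to change the default):

```
# /etc/fstab: mount /dev/sdb1 with full data journaling
/dev/sdb1  /data  ext4  defaults,data=journal  0  2
```

The same option can be passed on the command line with mount -o data=journal; omitting it leaves ext4 in its default ordered mode.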
File Descriptors
When a process opens a file, the kernel returns an integer called a file descriptor. This integer is an index into the process's file descriptor table, where each entry points to a kernel file table entry, which in turn points to an inode.
Every process starts with three file descriptors by default. Descriptor 0 is standard input (stdin), 1 is standard output (stdout), and 2 is standard error (stderr). The > redirection in the shell works precisely because of file descriptors. command > output.txt simply changes file descriptor 1 to point to the output.txt file.
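Redirection is just manipulation of the fd table, and the shell exposes it directly. A small sketch (the file names under /tmp are arbitrary):

```shell
# '>' makes fd 1 (stdout) point at a file instead of the terminal
echo "to stdout" > /tmp/out.txt

# '2>' does the same for fd 2 (stderr)
ls /no/such/path 2> /tmp/err.txt    # the error message lands in the file

# exec can open extra descriptors for the current shell
exec 3> /tmp/fd3.txt     # open fd 3 for writing
echo "via fd 3" >&3      # write through it
exec 3>&-                # close fd 3
```

In every case the command itself is unchanged; only the fd table entries it inherits differ.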
$ ls -l /proc/self/fd
lrwx------ 1 user user 64 Jan 28 10:00 0 -> /dev/pts/0
lrwx------ 1 user user 64 Jan 28 10:00 1 -> /dev/pts/0
lrwx------ 1 user user 64 Jan 28 10:00 2 -> /dev/pts/0
File descriptors are simple integers, but this abstraction forms the backbone of Unix pipelines. In ls | grep txt, the stdout (fd 1) of ls connects to the write end of a pipe, and the stdin (fd 0) of grep connects to the read end; that is how data flows between processes.
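You can watch this fd rewiring happen through /proc. In the sketch below, readlink inspects its own stdin while running on the right-hand side of a pipe:

```shell
# Run interactively, fd 0 normally points at a terminal device
readlink /proc/self/fd/0              # e.g. /dev/pts/0

# On the right side of a pipe, the shell has swapped fd 0 for a pipe
echo hi | readlink /proc/self/fd/0    # e.g. pipe:[123456]
```

The number in pipe:[...] is the inode of the pipe object itself; pipes, like everything else, are represented with inodes and file descriptors.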
Mount Points
To use a file system in Linux, it must first be mounted at a specific point in the directory tree. This point is the mount point. Unlike Windows, which uses drive letters like C: and D:, Linux integrates all file systems into a single directory tree.
The root file system (/) forms the base of the tree, and other file systems are mounted at subdirectories within it. /home might be a separate partition, /tmp might be a memory-based tmpfs, but to the user it all appears as one continuous directory tree. This is the transparency that VFS's mount abstraction provides.
$ mount
/dev/sda1 on / type ext4 (rw,relatime)
/dev/sda2 on /home type ext4 (rw,relatime)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev)
proc on /proc type proc (rw,nosuid,nodev,noexec)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
Virtual file systems like /proc and /sys, which have no actual disk behind them, are mounted in exactly the same way. They don't store data on disk, but they serve as a means for the kernel to expose information through the file system interface. This is precisely why process information, hardware parameters, and kernel statistics can all be read as if they were files.
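That these pseudo file systems are ordinary mounts can be checked from /proc itself, since the kernel exports the mount table as a file too:

```shell
# The mount table is itself exposed as a file. Note that the "device"
# field for /proc is just the string "proc": no block device backs it.
grep ' /proc ' /proc/self/mounts
```

The first column, which holds a device path like /dev/sda1 for disk-backed mounts, is only a label here; the file system's contents come entirely from kernel code.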
In the next post, we'll look at system calls and the internal workings of the kernel.