Linux Internals 02 - Processes and Threads

What Is a Process?

A program and a process are different things. A program is an executable file stored on disk, while a process is that program loaded into memory and actively running. Running the same program twice creates two independent processes, each with its own memory space and execution state.

The Linux kernel manages each process through a structure called task_struct. This structure contains everything the kernel needs to know about a process — its process ID, state, memory map, list of open files, scheduling information, signal handlers, and more. In the kernel source, task_struct is several kilobytes in size, which reflects just how much information the kernel must track to properly manage a single process.

Every process receives a unique PID (Process ID). The first user-space process started at boot has PID 1. Traditionally this was init, and on modern distributions systemd fills this role. PID 1 is the ancestor of every user-space process on the system, and by default it also takes on the responsibility of adopting orphaned processes.

fork and exec: How Processes Are Born

The way Linux creates new processes is somewhat unusual. Rather than constructing a process from scratch, it duplicates an existing process and then replaces the duplicate with a new program — a two-step procedure.

The first step is fork(). Calling fork() creates a nearly identical copy of the current process. The parent and child have the same code, the same data, and the same open files; the most visible differences are their PIDs and parent PIDs. The return value of fork() distinguishes the two: the parent receives the child's PID, while the child receives 0. A return value of -1 means the call failed and no child was created.

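char *args[] = {"ls", "-l", NULL};  // argv for the new program, NULL-terminated
int status;

pid_t pid = fork();
if (pid == 0) {
    // Child process: execute a new program
    execvp("ls", args);
    _exit(127);  // Reached only if execvp() fails
} else if (pid > 0) {
    // Parent process: wait for child to finish
    waitpid(pid, &status, 0);
} else {
    perror("fork");  // fork() failed: no child was created
}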

The second step is the exec() family of functions. exec() replaces the current process's memory entirely with a new program. The code, data, heap, and stack are all replaced with those of the new program, but the PID remains the same. Every time you run a command in the shell, this fork-exec combination is what happens behind the scenes.

Is it efficient to copy all the memory every time? No, it isn't. That's why Linux uses a technique called Copy-on-Write (COW). At the time of fork(), the kernel copies the page tables but not the memory itself. Instead, the parent and child share the same physical memory pages, marked read-only. Only when one side attempts to write to a page does the kernel actually copy that page. If exec() is called right after fork(), the child's memory is entirely replaced by the new program, so almost none of the pages ever need to be copied.

Process States

A process transitions through several states from creation to termination. The main states a Linux process can have are as follows.

                   Selected by
    ┌──────────── scheduler ──────────┐
    │                                 ▼
 ┌──┴────────┐                 ┌──────────┐
 │   READY   │                 │ RUNNING  │
 │  (queued) │◄────────────────│(on CPU)  │
 └───────────┘  Time slice     └────┬─────┘
                exhausted           │
                               I/O request
                                    ▼
                             ┌──────────┐
                             │ SLEEPING │
                             │(waiting) │
                             └──────────┘

TASK_RUNNING means the process is either executing on a CPU or waiting in the run queue for its turn. TASK_INTERRUPTIBLE means it is sleeping while waiting for a specific event and can be woken by a signal. Processes waiting for terminal input, a network packet, or a timer are typically in this state. TASK_UNINTERRUPTIBLE is a deeper sleep that cannot be interrupted even by signals. It is used in situations where interruption could leave an operation half finished or corrupt data, such as waiting for a hardware response. Processes waiting for disk I/O are usually in this state, which ps and top display as D.

Then there is the zombie state, named EXIT_ZOMBIE in current kernels (older kernels called it TASK_ZOMBIE). A zombie process is one that has finished executing but whose parent has not yet collected its exit status. When a process terminates, the kernel releases most of its resources, but the task_struct and exit code are kept until the parent calls wait(). If the parent never calls wait(), the zombie persists for as long as the parent lives. A zombie itself consumes almost no memory, but it occupies a PID. If zombies accumulate in large numbers, they can exhaust the PID space, at which point fork() starts failing.

The Process Tree

All processes in Linux form a tree structure. PID 1, init (or systemd), sits at the root, and every subsequent process must have a parent. The pstree command makes this structure visible.

systemd─┬─sshd───sshd───bash───vim
        ├─nginx─┬─nginx
        │       └─nginx
        ├─cron
        └─rsyslogd

If a parent process terminates before its children, those children become orphan processes. The kernel reassigns their parent to PID 1 (or to a closer ancestor that has registered itself as a subreaper). Because a well-behaved PID 1 calls wait() whenever one of its children exits, the exit status of adopted children is collected and zombies do not accumulate indefinitely. This is precisely why the PID 1 problem matters in container environments. The first process started in a container runs as PID 1 of its PID namespace, and it is often an ordinary application rather than an init system. If that process does not perform this reaping duty, zombie processes will pile up.

Threads: Processes Within a Process?

Traditionally, a process had only a single flow of execution. But if a web server needs to handle thousands of concurrent requests, creating a new process for each request is expensive. Each process has its own independent address space, so sharing data between processes requires separate mechanisms like IPC.

Threads solve this problem. Threads within the same process share code, data, the heap, and open files, while each maintains only its own stack and register state. Because memory sharing is the default, data exchange is fast, and creating a new thread costs far less than creating a new process.

The way Linux implements threads has an interesting distinction from other operating systems. The Linux kernel does not treat threads as a concept separate from processes. From the kernel's perspective, a thread is simply a process that shares resources with another process. Both are represented by task_struct.

What makes this possible is the clone() system call. While fork() copies nearly all resources, clone() allows fine-grained control through flags over which resources are shared and which are copied.

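// Similar to fork: copies most resources; SIGCHLD is the signal sent to the parent on exit
clone(fn, stack_top, SIGCHLD, arg);

// Thread creation: shares memory, file descriptors, signal handlers, etc.
// CLONE_THREAD places the new task in the caller's thread group
clone(fn, stack_top, CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD, arg);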

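CLONE_VM shares the memory space, CLONE_FS shares filesystem information such as the current working directory and umask, CLONE_FILES shares the file descriptor table, and CLONE_SIGHAND shares signal handlers. CLONE_THREAD, which a real thread also needs, places the new task in the caller's thread group so that both report the same PID from getpid(). By combining these flags, you can create anything along the range from a fully independent process to a thread that shares everything.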

POSIX Threads

In application code, threads are typically created not by calling clone() directly but through the POSIX threads (pthread) API. On Linux, NPTL (Native POSIX Thread Library), the pthreads implementation in glibc, internally calls clone() to create threads.

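#include <pthread.h>
#include <stdio.h>

void *worker(void *arg) {
    // Work to be performed by the thread
    (void)arg;  // Unused in this example
    return NULL;
}

int main(void) {
    pthread_t thread;
    // pthread_create() returns 0 on success and an error number on failure
    if (pthread_create(&thread, NULL, worker, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(thread, NULL);  // Wait for thread to finish
    return 0;
}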

Are threads always a better choice than processes? No, they are not. Because threads share memory, a bug in one thread can affect every other thread in the same process. A single bad pointer dereference can bring down the entire process. Processes, on the other hand, have separate address spaces, so one process crashing does not affect the others. Nginx uses a worker process model, and Chrome runs web pages in separate renderer processes (broadly one per tab or site), precisely to gain this isolation benefit.

The /proc Filesystem

Linux exposes information about running processes to user space through the /proc filesystem. /proc is not a real filesystem on disk — it is a virtual filesystem generated dynamically by the kernel.

# Information for process with PID 1234
$ ls /proc/1234/
cmdline  cwd  environ  exe  fd  maps  status  ...

# View the process's memory map
$ cat /proc/1234/maps

# View the process's status
$ cat /proc/1234/status
Name:   nginx
State:  S (sleeping)
Pid:    1234
PPid:   1
Threads: 4
VmRSS:  12340 kB

The /proc/[pid]/fd directory contains symbolic links representing every file descriptor the process has open. /proc/[pid]/maps shows the layout of the process's virtual memory regions. When system administrators or monitoring tools need to inspect a process's state, reading /proc is sufficient — there is no need to access kernel data structures directly. Tools like ps, top, and htop all read their information from /proc internally.

In the next post, we'll look at process scheduling.

