Computer Architecture 01 - Overview
Core concepts of Von Neumann architecture and how CPU, memory, and buses work together
Why Learn Computer Architecture?
In software development, you rarely interact with hardware directly. You write code in high-level languages, the operating system manages hardware, and the compiler translates everything to machine code. So is there really a need to understand the physical structure of a computer?
There is. To understand why cache misses become performance bottlenecks, why branch mispredictions stall the pipeline, and why memory visibility issues arise in multithreaded programs, hardware-level knowledge is essential. Software ultimately runs on hardware, and without understanding hardware characteristics, identifying the root cause of performance problems becomes very difficult.
Von Neumann Architecture
Most modern computers are based on the Von Neumann architecture. Proposed by John von Neumann in 1945, the core idea of this architecture is that programs and data are stored in the same memory.
Why was this revolutionary? Earlier computers had programs hardwired into the hardware. Performing a different computation required physically changing the wiring. In the Von Neumann architecture, simply replacing the program stored in memory allows the machine to perform entirely different tasks. The general-purpose computer became possible.
+---------------------------------------+
|                  CPU                  |
|  +-------------+  +----------------+  |
|  |  Control    |  |      ALU       |  |
|  |  Unit (CU)  |  |                |  |
|  +-------------+  +----------------+  |
|  +---------------------------------+  |
|  |          Register File          |  |
|  +---------------------------------+  |
+-------------------+-------------------+
                    |  System Bus
+-------------------+-------------------+
|              Memory (RAM)             |
|  +--------------+   +-------------+   |
|  |   Program    |   |    Data     |   |
|  |(instructions)|   |             |   |
|  +--------------+   +-------------+   |
+-------------------+-------------------+
                    |
+-------------------+-------------------+
|              I/O Devices              |
+---------------------------------------+
The Von Neumann architecture consists of three core components. The CPU interprets and executes instructions, memory stores programs and data, and buses handle data movement between them.
Basic CPU Operation
The CPU operates by repeating a remarkably simple cycle. This is called the instruction cycle, consisting of three stages: fetch, decode, and execute.
During the fetch stage, the CPU retrieves the next instruction from the memory address pointed to by the program counter (PC). During the decode stage, the control unit determines what operation the instruction specifies. During the execute stage, the ALU or other functional units perform the actual operation and store the result.
This cycle, repeated billions of times per second, is the operating principle of modern CPUs. A processor with a 3 GHz clock completes roughly 3 billion clock cycles per second. In practice, pipelining overlaps the stages of consecutive instructions and superscalar execution issues several instructions in the same cycle, so actual throughput is considerably higher than one instruction per cycle.
Registers
Registers are the fastest storage located inside the CPU. Accessing memory takes tens to hundreds of clock cycles, but accessing a register typically requires just one clock cycle.
The types of registers vary by architecture, but some are universally present.
| Register | Role |
|---|---|
| Program Counter (PC) | Address of the next instruction to execute |
| Stack Pointer (SP) | Address of the current top of the stack |
| Instruction Register (IR) | The currently executing instruction |
| General Purpose Registers | Used for data storage and computation |
| Status Register (FLAGS) | Status of operation results (zero, carry, overflow, etc.) |
The number of registers is limited. The x86-64 architecture has 16 general-purpose registers, while ARM has 31. One of the compiler's important tasks is efficiently allocating these limited registers. Keeping values in registers reduces memory accesses, significantly improving performance.
Memory and Buses
Memory is essentially a large array with byte-level addressing. For the CPU to access memory, it places the target address on the address bus, exchanges data through the data bus, and the control bus specifies whether the operation is a read or write.
This is where a fundamental limitation of the Von Neumann architecture becomes apparent. Since instructions and data reside in the same memory, fetching instructions and reading or writing data share the same bus. This is called the Von Neumann bottleneck. The CPU processes data very quickly, but the data transfer speed between CPU and memory cannot keep up.
To mitigate this bottleneck, modern processors introduce cache memory, separate instruction and data caches (a partial application of Harvard architecture), and use prefetching techniques. These optimization techniques will be examined one by one in subsequent posts.
Byte Order
Storing multi-byte data in memory also requires a choice. When storing the 4-byte integer 0x12345678, placing the most significant byte (0x12) at the lowest address is called big-endian, while placing the least significant byte (0x78) at the lowest address is called little-endian.
Is one approach superior? Technically, neither is absolutely better. In practice, however, most modern processors (x86, ARM) use little-endian, while network protocols adopt big-endian as their standard. This is precisely why byte order conversion is necessary in network programming.
What This Series Covers
This series starts from CPU internals and progresses through instruction set architectures, pipelining, privilege levels, interrupts, memory hierarchy, virtual memory, I/O, and modern multicore processors. We'll examine how each topic impacts software performance and system design.
In the next post, we'll look deeper into the internal structure of the CPU.