
5.3 Computer Architecture & Microprocessors

Key Takeaways

  • The fetch-decode-execute cycle is the fundamental instruction loop driven by the program counter and clock.
  • Pipelining overlaps instruction stages so throughput approaches one instruction per clock cycle despite multi-cycle latency.
  • The memory hierarchy trades speed for size: registers, then cache (L1/L2/L3), main memory, and finally disk.
  • Effective access time = hit rate x cache time + miss rate x memory time, so even a small miss rate dominates: at a 95% hit rate with a 1 ns cache and 100 ns memory, misses account for 5 of the 5.95 ns.
  • Interrupts let I/O devices signal the CPU asynchronously, avoiding wasteful polling and improving responsiveness.
Last updated: May 2026

The CPU, memory, and buses

A classic von Neumann computer has three parts joined by buses: the central processing unit (CPU), main memory, and input/output (I/O). The CPU contains the arithmetic logic unit (ALU) that performs computation, a control unit that sequences operations, and fast registers including the program counter (PC) that holds the address of the next instruction.

Three buses connect these parts: the address bus carries the location to access, the data bus carries the value, and the control bus carries read/write and timing signals. Address bus width sets the maximum addressable memory: an n-bit address bus can address 2^n locations, so a 32-bit address bus reaches 2^32 bytes = 4 GiB of byte-addressable memory.
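
The 2^n relationship is easy to check numerically. The short Python sketch below is illustrative only; the function name addressable_bytes and its bytes_per_location parameter are assumptions made for this example, not part of any standard.

```python
def addressable_bytes(address_bits: int, bytes_per_location: int = 1) -> int:
    """Return the memory reachable with an address bus of the given width."""
    # An n-bit address bus can name 2^n distinct locations.
    return (2 ** address_bits) * bytes_per_location


for bits in (16, 20, 32):
    size = addressable_bytes(bits)
    print(f"{bits}-bit address bus: {size} bytes = {size // 1024} KiB")
```

Each extra address line doubles the reachable memory, which is why going from 16 to 32 bits moves from 64 KiB to 4 GiB.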

The instruction cycle

Every instruction passes through the fetch-decode-execute cycle:

  1. Fetch: the control unit reads the instruction at the address in the program counter, then increments the PC.
  2. Decode: the control unit interprets the opcode and identifies the operands.
  3. Execute: the ALU or memory unit performs the operation and stores the result, possibly writing back to a register.
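
The three steps above can be sketched as a loop in Python. This is a toy model, not a real instruction set: the opcodes LOAD, ADD, and HALT, the two registers A and B, and the tuple encoding of instructions are all assumptions made for illustration.

```python
def run(program, max_steps=100):
    """Run a toy program; each instruction is a tuple (opcode, operand, ...)."""
    registers = {"A": 0, "B": 0}
    pc = 0  # program counter: index of the next instruction
    for _ in range(max_steps):
        # Fetch: read the instruction at the PC, then increment the PC.
        instruction = program[pc]
        pc += 1
        # Decode: separate the opcode from its operands.
        opcode, *operands = instruction
        # Execute: perform the operation and write the result to a register.
        if opcode == "LOAD":    # LOAD reg, constant
            reg, value = operands
            registers[reg] = value
        elif opcode == "ADD":   # ADD dest, src  (dest = dest + src)
            dest, src = operands
            registers[dest] += registers[src]
        elif opcode == "HALT":
            return registers
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    raise RuntimeError("program did not halt")


program = [("LOAD", "A", 2), ("LOAD", "B", 3), ("ADD", "A", "B"), ("HALT",)]
result = run(program)
print(result)  # {'A': 5, 'B': 3}
```

Note that the PC is incremented during fetch, before the instruction executes, which matches step 1.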

Instruction sets fall into two philosophies. RISC (reduced instruction set computer) uses a small set of simple, fixed-length instructions that pipeline cleanly, so a program needs more of them. CISC (complex instruction set computer) offers more powerful, variable-length instructions, so a program needs fewer of them but each is harder to decode and overlap. The FE may ask you to identify which approach favors pipelining; the answer is RISC.

Pipelining

Pipelining overlaps the stages of consecutive instructions like an assembly line, so while one instruction executes, the next decodes and a third fetches. A k-stage pipeline ideally raises throughput toward one instruction per clock cycle even though each individual instruction still takes k cycles end to end.
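
The throughput claim can be made concrete with the ideal cycle counts: without pipelining, n instructions take n x k cycles; with an ideal k-stage pipeline, the first instruction takes k cycles and each later one completes one cycle after its predecessor, for k + (n - 1) cycles. The Python sketch below assumes no stalls; the function names are chosen for this example.

```python
def unpipelined_cycles(n: int, k: int) -> int:
    """Cycles to run n instructions when each takes k cycles with no overlap."""
    return n * k


def pipelined_cycles(n: int, k: int) -> int:
    """Cycles for n instructions on an ideal k-stage pipeline with no stalls."""
    if n == 0:
        return 0
    # k cycles to fill the pipeline, then one instruction completes per cycle.
    return k + (n - 1)


n, k = 1000, 5
slow = unpipelined_cycles(n, k)
fast = pipelined_cycles(n, k)
print(f"unpipelined: {slow} cycles, pipelined: {fast} cycles")
print(f"speedup: {slow / fast:.2f}x, cycles per instruction: {fast / n:.3f}")
```

For large n the speedup approaches k and the cycles per instruction approach 1, even though each instruction still spends k cycles in the pipeline.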

Pipelines stall on hazards: a data hazard when an instruction needs a result not yet written, a control hazard when a branch changes the instruction flow, and a structural hazard when two instructions compete for the same hardware resource in the same cycle. Forwarding and branch prediction reduce the penalty, while stalls (bubbles) preserve correctness at a direct cost in throughput.
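
A data hazard of the read-after-write kind can be spotted by comparing each instruction's source registers with the destinations of the instructions just ahead of it. The Python sketch below is a simplified check under stated assumptions: instructions are encoded as (destination, sources) tuples, the register names are made up, there is no forwarding, and a result is assumed to become readable only after `window` later instructions have issued.

```python
def find_raw_hazards(instructions, window=2):
    """Return (producer_index, consumer_index, register) for each RAW hazard."""
    hazards = []
    for i, (_, sources) in enumerate(instructions):
        # Look back at the previous `window` instructions only.
        for j in range(max(0, i - window), i):
            dest, _ = instructions[j]
            if dest is not None and dest in sources:
                hazards.append((j, i, dest))
    return hazards


program = [
    ("r1", ("r2", "r3")),  # r1 = r2 + r3
    ("r4", ("r1", "r5")),  # reads r1 immediately after it is produced
    ("r6", ("r7", "r8")),  # independent of the instructions above
    ("r9", ("r4", "r1")),  # r4 was produced 2 instructions back, r1 was 3 back
]
hazards = find_raw_hazards(program)
print(hazards)  # [(0, 1, 'r1'), (1, 3, 'r4')]
```

Each reported pair is a place where real hardware would either forward the result or insert a bubble.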

The memory hierarchy

Faster memory is smaller and more expensive per bit, so systems layer it.

Level               Typical access time    Relative size   Managed by
Registers           < 1 ns                 bytes           Compiler / CPU
L1 cache            ~1 ns                  tens of KB      Hardware
L2 / L3 cache       a few ns               KB to MB        Hardware
Main memory (RAM)   ~50-100 ns             GB              Operating system
Disk / SSD          microseconds to ms     TB              Operating system

The hierarchy works because of locality of reference: programs tend to reuse recently accessed data (temporal locality) and nearby data (spatial locality).
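
Locality can be illustrated with a tiny cache simulation. The Python sketch below models a direct-mapped cache; the sizes (4 lines of 4 addresses each) and the three access patterns are assumptions chosen to keep the arithmetic easy to follow, not figures from any real processor.

```python
def hit_rate(addresses, num_lines=4, block_size=4):
    """Simulate a direct-mapped cache and return the fraction of hits.

    `addresses` must be a non-empty sequence of integer addresses.
    """
    lines = [None] * num_lines  # block number currently held in each line
    hits = 0
    for address in addresses:
        block = address // block_size  # neighbouring addresses share a block
        index = block % num_lines      # direct-mapped: one possible line
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block       # miss: load the block, evict the old one
    return hits / len(addresses)


sequential = list(range(64))            # spatial locality: walk nearby addresses
repeated = [0, 1, 2, 3] * 16            # temporal locality: reuse the same data
strided = [i * 16 for i in range(64)]   # no reuse; every access maps to line 0

print(f"sequential: {hit_rate(sequential):.2%}")
print(f"repeated:   {hit_rate(repeated):.2%}")
print(f"strided:    {hit_rate(strided):.2%}")
```

The sequential pattern misses once per block and then hits three times, the repeated pattern misses only on the first access, and the strided pattern never hits because it neither reuses data nor touches neighbours.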

Cache effectiveness and virtual memory

A cache hit is when requested data is found in cache; a miss forces a slower fetch from the next level. The effective access time is:

EAT = (hit rate x cache access time) + (miss rate x memory access time)

For a 95% hit rate with a 1 ns cache and 100 ns memory, EAT = 0.95 x 1 + 0.05 x 100 = 5.95 ns, roughly 17 times faster than the 100 ns cost of going to main memory on every access.

Virtual memory uses disk as an extension of RAM, dividing the address space into pages mapped by a page table; a page fault occurs when a referenced page is not resident and must be loaded from disk.
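
Both ideas in this section reduce to a few lines of arithmetic. The Python sketch below computes the effective access time using the formula given above, then shows a minimal page-table lookup. The 4096-byte page size, the dictionary used as a page table, and the use of LookupError to stand in for a page fault are assumptions made for illustration; a real operating system would load the page from disk and retry.

```python
def effective_access_time(hit_rate, cache_ns, memory_ns):
    """EAT = hit rate x cache time + miss rate x memory time."""
    miss_rate = 1 - hit_rate
    return hit_rate * cache_ns + miss_rate * memory_ns


print(f"{effective_access_time(0.95, 1, 100):.2f} ns")  # 5.95 ns

PAGE_SIZE = 4096  # assumed page size in bytes


def translate(virtual_address, page_table):
    """Map a virtual address to a physical address via the page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        # Page not resident in RAM: this is where a page fault would occur.
        raise LookupError(f"page fault on page {page}")
    return frame * PAGE_SIZE + offset


page_table = {0: 5, 1: 2}           # virtual page -> physical frame (made up)
print(translate(4100, page_table))  # page 1, offset 4 -> 2 * 4096 + 4 = 8196
```

The offset within a page is unchanged by translation; only the page number is swapped for a frame number.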

Test Your Knowledge

A cache has a 90% hit rate, a 2 ns cache access time, and a 100 ns main-memory access time. What is the effective memory access time?


Interrupts and I/O

The CPU communicates with devices through I/O. Three approaches appear on the FE:

  • Programmed I/O (polling): the CPU repeatedly checks a status flag, wasting cycles while it waits.
  • Interrupt-driven I/O: the device raises an interrupt when ready; the CPU suspends its current work, saves state, runs an interrupt service routine (ISR), then resumes. This avoids busy-waiting.
  • Direct memory access (DMA): a DMA controller transfers data between memory and a device without the CPU moving each word, freeing the processor for computation.
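
The cost difference between polling and interrupt-driven I/O can be sketched as a tick-by-tick simulation in Python. This is a software model only: the tick counter, the handler callback standing in for an ISR, and the returned counters are all assumptions made for this example, and no real hardware interrupt is involved.

```python
def polled_io(ready_at_tick, handler):
    """Busy-wait: the CPU checks the status flag every tick and does nothing else."""
    wasted_checks = 0
    for _ in range(ready_at_tick):
        wasted_checks += 1  # flag not set yet, so this check was wasted
    handler()               # device is finally ready: service it
    return {"useful_work": 0, "wasted_checks": wasted_checks}


def interrupt_driven_io(ready_at_tick, handler):
    """The CPU does other work until the device raises an interrupt."""
    useful_work = 0
    for _ in range(ready_at_tick):
        useful_work += 1    # CPU runs other instructions in the meantime
    saved_state = useful_work  # interrupt arrives: save the current state
    handler()                  # run the interrupt service routine (ISR)
    useful_work = saved_state  # restore state and resume
    return {"useful_work": useful_work, "wasted_checks": 0}


log = []
print(polled_io(10, lambda: log.append("polled")))
print(interrupt_driven_io(10, lambda: log.append("interrupt")))
```

In both cases the device is serviced at the same moment; the difference is whether the ticks spent waiting went to status checks or to other work.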

Interrupts are prioritized so urgent events (such as a timer or power failure) preempt less critical ones.
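
One way to picture prioritization is a priority queue of pending interrupts. The Python sketch below uses the standard heapq module; the device names, the priority numbers, and the convention that a lower number means more urgent are assumptions for this example. It models only the order in which pending interrupts are serviced, not the preemption of an ISR that is already running.

```python
import heapq


def service_order(pending):
    """Return interrupt names in the order they would be serviced.

    `pending` is a list of (priority, name) pairs; a lower number is more
    urgent. Ties are broken by arrival order.
    """
    # The arrival sequence number keeps equal-priority interrupts first-come.
    heap = [(priority, seq, name) for seq, (priority, name) in enumerate(pending)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


pending = [(3, "keyboard"), (0, "power failure"), (1, "timer"), (3, "disk")]
order = service_order(pending)
print(order)  # ['power failure', 'timer', 'keyboard', 'disk']
```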