Solutions: 1 — The machine model

These exercises are about measuring your machine. Numbers vary; ratios are stable. Run them and write down what you see.

Exercise 1 — Cache sizes

Linux: lscpu | grep -E 'L1|L2|L3' or getconf -a | grep CACHE.

Typical desktop x86-64 in 2026: L1d 32-48 KB per core, L2 1-2 MB per core, L3 16-128 MB shared. Apple Silicon: larger L1, very large shared L2.

Exercise 2 — Sequential sum

use std::time::Instant;

fn main() {
    // Playground-scaled. Use 100_000_000 locally for the real number.
    let n = 10_000_000;
    let v: Vec<u64> = vec![1; n];
    let start = Instant::now();
    let sum: u64 = v.iter().sum();
    let elapsed = start.elapsed();
    let ns_per_elem = elapsed.as_nanos() as f64 / v.len() as f64;
    println!("sum = {sum}, {elapsed:?}, {ns_per_elem:.2} ns/elem");
}

Expect somewhere around 0.2-1 ns per element on modern hardware. The loop is memory-bandwidth bound; the CPU is mostly waiting for RAM to deliver lines.

Exercise 3 — Random-access sum

use std::time::Instant;

fn main() {
    // Playground-scaled. Use 100_000_000 locally for the real number.
    let n: usize = 10_000_000;
    let v: Vec<u64> = vec![1; n];
    let mut state = 0xDEAD_BEEFu64;
    let indices: Vec<usize> = (0..n)
        .map(|_| {
            state = state.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
            (state as usize) % n
        })
        .collect();

    let start = Instant::now();
    let mut sum = 0u64;
    for &i in &indices {
        sum += v[i];
    }
    let elapsed = start.elapsed();
    println!("sum = {sum}, {elapsed:?}, {:.2} ns/elem",
             elapsed.as_nanos() as f64 / n as f64);
}

Expect 30-100 ns per element — close to the RAM-latency cost. Each access misses cache.

The transitions you see roughly correspond to spilling out of L1 (~32 KB), L2 (~1-2 MB), L3 (~32 MB). Below L1 you should see ~0.1-0.3 ns/elem. In L3 maybe 0.5-1.5 ns. Past L3, 0.5-3 ns (sequential, since prefetcher helps even from RAM).

For random-access cliffs (a more dramatic plot), repeat exercise 3 at sizes 1K, 10K, 100K, 1M, 10M, 100M. The transitions are sharper.

Exercise 5 — Pointer chasing

use std::time::Instant;

struct Node { value: u64, next: Option<Box<Node>> }

fn build(n: usize) -> Box<Node> {
    let mut head = Box::new(Node { value: 1, next: None });
    for _ in 1..n {
        head = Box::new(Node { value: 1, next: Some(head) });
    }
    head
}

fn sum(mut head: &Node) -> u64 {
    let mut s = 0;
    loop {
        s += head.value;
        match &head.next {
            Some(next) => head = next,
            None => break s,
        }
    }
}

fn main() {
    // Playground-scaled. Use 1_000_000 locally for the real number.
    let n = 100_000;
    let head = build(n);
    let start = Instant::now();
    let s = sum(&head);
    let elapsed = start.elapsed();
    println!("sum = {s}, {elapsed:?}, {:.2} ns/elem",
             elapsed.as_nanos() as f64 / n as f64);
}

A Vec<u64> sum is roughly 1 ns/elem; the linked-list walk is roughly 50-100 ns/elem. The ratio is the L1-to-RAM gap from the prose.

Important: building the linked list with deep recursion would blow the stack at large N. The build function above uses a loop precisely so it can scale. The sum is also iterative — a recursive walk would blow the stack on the way down.

Exercise 6 — Reading lscpu against your benchmarks

The transitions are noisy because:

Cache levels overlap (a hot cache line might still be in L1 after spilling to L2).
Hardware prefetchers help sequential reads.
The OS may evict pages between runs.

If your noise is worse than your signal, run each measurement multiple times and take the median.

An Introduction to Programming, using ECS & EBP in Rust

Solutions: 1 — The machine model

Exercise 1 — Cache sizes

Exercise 2 — Sequential sum

Exercise 3 — Random-access sum

Exercise 4 — Cache cliffs

Exercise 5 — Pointer chasing

Exercise 6 — Reading lscpu against your benchmarks

Keyboard shortcuts

An Introduction to Programming, using ECS & EBP in Rust

Solutions: 1 — The machine model

Exercise 1 — Cache sizes

Exercise 2 — Sequential sum

Exercise 3 — Random-access sum

Exercise 4 — Cache cliffs

Exercise 5 — Pointer chasing

Exercise 6 — Reading lscpu against your benchmarks