Solutions: 30 — The wall at 1M → streaming

Exercise 1 — Streaming threshold

Per-creature footprint at full SoA: hot ~24 B, cold ~16 B, plus presence flags ~8 B = ~48 B. Plus index maps and indices into derived tables: round to ~64 B per live creature.

For a desktop with 32 GB RAM, allocating 16 GB to the simulator: 16 × 10⁹ / 64 = 250 million creatures. Above ~250M, the simulator must start streaming.

In practice the threshold is lower because logs, snapshots, and OS overhead consume RAM too. A safe budget might be 50–100M before the streaming architecture is needed.

Exercise 2 — Disk read cost

NVMe SSD: ~100 µs per read. A 33 ms tick budget is 33 000 µs / 100 µs ≈ 330 random disk reads. If a system would naturally make 10 000 disk reads per tick, the simulator is roughly 30× slower than budget unless reads are batched or sequential.

The fix is the same as for cache misses at the smaller scale: amortise the cost. Read a page of consecutive entries (4 KB = 64 cache lines = many rows) at one IOPS cost; touch all of them while the cost is paid.

Exercise 3 — Snapshot

#![allow(unused)]
fn main() {
fn snapshot_world(world: &World, path: &Path) -> std::io::Result<()> {
    let mut f = std::fs::File::create(path)?;
    use std::io::Write;
    f.write_all(&world.creatures.len().to_le_bytes())?;
    for &p in &world.pos     { f.write_all(&p.0.to_le_bytes())?; f.write_all(&p.1.to_le_bytes())?; }
    for &v in &world.vel     { f.write_all(&v.0.to_le_bytes())?; f.write_all(&v.1.to_le_bytes())?; }
    for &e in &world.energy  { f.write_all(&e.to_le_bytes())?; }
    for &i in &world.id      { f.write_all(&i.to_le_bytes())?; }
    Ok(())
}

fn load_world(path: &Path) -> std::io::Result<World> {
    // Read the columns back in the same order. Sizes from the prefixed length.
    todo!()
}
}

After a snapshot + load round-trip, the simulator continues running indistinguishably if and only if the simulator is deterministic (§16). Determinism is the precondition for snapshot/load to mean what we expect.

Exercise 4 — Windowed log

#![allow(unused)]
fn main() {
struct WindowedLog<E> {
    in_memory: std::collections::VecDeque<E>,
    capacity:  usize,
    disk:      std::fs::File,
}

impl<E: Encode> WindowedLog<E> {
    fn push(&mut self, e: E) {
        if self.in_memory.len() == self.capacity {
            // Evict oldest to disk, append new in memory.
            let oldest = self.in_memory.pop_front().unwrap();
            oldest.encode(&mut self.disk).unwrap();
        }
        self.in_memory.push_back(e);
    }
}
}

Queries inside the in-memory window are O(1) per entry. Queries past the window pay one disk seek (~100 µs) plus a sequential read for the requested span. The cost difference is the streaming wall.

Exercise 5 — Log-as-world reconstruction

To recover state at tick T:

Find the most recent snapshot at tick S ≤ T.
Load the snapshot into a fresh world.
Replay log entries from S to T in order.

Replay speed depends on the events-per-tick rate. For most simulators, replay is 10–100× faster than original-tick rate (no rendering, no I/O outside the log read). Reconstructing 1000 ticks of history takes ~1 second of replay time.

Exercise 6 — Document your bound

A well-tuned simulator at 30 Hz on a typical desktop:

~10 M creatures comfortable (RAM-resident, sub-budget).
~100 M creatures possible with hot/cold splits and sort-for-locality (RAM-resident, near-budget).
~1 B creatures requires streaming (working set exceeds RAM).
~10 B+ requires distributed simulation across multiple machines (covered in the monograph).

Each step is a different architecture, with the techniques in the previous step still applying. The book ends below the streaming wall; the monograph picks up above it.

Keyboard shortcuts