Solutions: 26 — Hot/cold splits
Exercise 1 — Audit access patterns
For the simulator’s eight systems, the field accesses look roughly like this:
| system | reads | writes |
|---|---|---|
| motion | pos, vel, energy | pos, energy |
| food_spawn | food_spawner.region | food (via insert) |
| next_event | pos, food.pos, creature.energy | pending_event |
| apply_eat | pending_event, food.value | to_remove, energy |
| apply_reproduce | pending_event, pos, energy | to_insert |
| apply_starve | pending_event, id | to_remove |
| cleanup | to_remove, to_insert, id, gen | every column |
| inspect | every column | (nothing) |
Hot fields (touched every tick by motion, next_event, apply_eat and apply_reproduce): pos, vel, energy. Cold: birth_t, id, gen. These are touched by cleanup and inspect; apply_starve also reads id, but only to fill to_remove.
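The audit can also be done mechanically. The sketch below encodes the creature-field accesses from the table and collects the fields touched by systems that walk every creature on every tick. The `dense` flag and the `Access` type are assumptions of this sketch, not part of the simulator: the table does not say which systems are dense, so motion and next_event are marked dense here and the event-driven apply_* systems are not.

```rust
use std::collections::BTreeSet;

/// One row of the access audit, restricted to creature fields.
struct Access {
    #[allow(dead_code)]
    system: &'static str,
    /// Assumption: true when the system visits every creature each tick.
    dense: bool,
    /// Creature fields the system reads or writes.
    fields: &'static [&'static str],
}

const AUDIT: &[Access] = &[
    Access { system: "motion", dense: true, fields: &["pos", "vel", "energy"] },
    Access { system: "next_event", dense: true, fields: &["pos", "energy"] },
    Access { system: "apply_eat", dense: false, fields: &["energy"] },
    Access { system: "apply_reproduce", dense: false, fields: &["pos", "energy"] },
    Access { system: "apply_starve", dense: false, fields: &["id"] },
];

const ALL_FIELDS: &[&str] = &["pos", "vel", "energy", "birth_t", "id", "gen"];

/// Fields touched by at least one dense system.
fn hot_fields() -> BTreeSet<&'static str> {
    AUDIT
        .iter()
        .filter(|a| a.dense)
        .flat_map(|a| a.fields.iter().copied())
        .collect()
}

/// Every creature field that is not hot.
fn cold_fields() -> BTreeSet<&'static str> {
    let hot = hot_fields();
    ALL_FIELDS.iter().copied().filter(|f| !hot.contains(f)).collect()
}
```

Under these assumptions `hot_fields()` yields energy, pos and vel, and `cold_fields()` yields birth_t, gen and id, which matches the split chosen above.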
Exercise 2 — Build the split
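// Hot table: columns the per-tick inner loops stream through.
struct CreatureHot {
    pos: Vec<(f32, f32)>,
    vel: Vec<(f32, f32)>,
    energy: Vec<f32>,
}
// Cold table: columns read mainly by cleanup and inspect.
// Note: `gen` is a reserved keyword from the 2024 edition; write `r#gen` there.
struct CreatureCold {
    birth_t: Vec<f64>,
    id: Vec<u32>,
    gen: Vec<u32>,
}
// One creature's fields. Assumed shape, inferred from the columns above;
// the simulator's own row type may differ.
struct CreatureRow {
    pos: (f32, f32),
    vel: (f32, f32),
    energy: f32,
    birth_t: f64,
    id: u32,
    gen: u32,
}
// Push onto every column of both tables so the slot indices stay aligned.
fn append(hot: &mut CreatureHot, cold: &mut CreatureCold, row: CreatureRow) {
    hot.pos.push(row.pos);
    hot.vel.push(row.vel);
    hot.energy.push(row.energy);
    cold.birth_t.push(row.birth_t);
    cold.id.push(row.id);
    cold.gen.push(row.gen);
}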
Both tables share the slot index. hot.pos[17] and cold.id[17] describe the same creature.
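The shared-slot invariant is easy to check. This is a minimal sketch: `is_aligned` and `id_and_pos` are hypothetical helpers, and the cold column `gen` is spelled `generation` so the block also compiles on the 2024 edition, where `gen` is reserved.

```rust
struct CreatureHot {
    pos: Vec<(f32, f32)>,
    vel: Vec<(f32, f32)>,
    energy: Vec<f32>,
}

struct CreatureCold {
    birth_t: Vec<f64>,
    id: Vec<u32>,
    generation: Vec<u32>, // `gen` in the text
}

/// True when every column in both tables has the same length,
/// so any valid slot index is valid in every column.
fn is_aligned(hot: &CreatureHot, cold: &CreatureCold) -> bool {
    let n = hot.pos.len();
    hot.vel.len() == n
        && hot.energy.len() == n
        && cold.birth_t.len() == n
        && cold.id.len() == n
        && cold.generation.len() == n
}

/// Join one slot across both tables; None when the slot is out of range.
fn id_and_pos(
    hot: &CreatureHot,
    cold: &CreatureCold,
    slot: usize,
) -> Option<(u32, (f32, f32))> {
    Some((*cold.id.get(slot)?, *hot.pos.get(slot)?))
}
```

One way to use it is a `debug_assert!(is_aligned(&hot, &cold))` after each append or delete, which catches a forgotten column in debug builds at no release-build cost.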
Exercise 3 — Time motion at 1M
Pre-split: motion’s per-tick cost ≈ 3 ns/elem × 1M = 3 ms. Post-split: ≈ 1.5 ns/elem × 1M = 1.5 ms. The factor of 2 is roughly the bandwidth saved by not reading birth_t, id, gen on each iteration.
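A rough way to reproduce the comparison is sketched below. It rests on several assumptions: the pre-split layout is modelled as one interleaved record per creature, the motion update (position advanced by velocity, a fixed `MOVE_COST` drained from energy) is a stand-in for the real system, and `generation` stands in for `gen`. With these field types a full record carries 36 bytes of fields (typically padded to 40), against 20 bytes for pos, vel and energy, which is in line with a factor of roughly 2.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Assumed pre-split layout: hot and cold fields interleaved per creature.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct CreatureRow {
    pos: (f32, f32),
    vel: (f32, f32),
    energy: f32,
    birth_t: f64,
    id: u32,
    generation: u32,
}

/// Post-split hot table: columns only.
struct CreatureHot {
    pos: Vec<(f32, f32)>,
    vel: Vec<(f32, f32)>,
    energy: Vec<f32>,
}

const MOVE_COST: f32 = 0.01; // hypothetical energy drained per tick

fn motion_rows(rows: &mut [CreatureRow], dt: f32) {
    for r in rows.iter_mut() {
        r.pos.0 += r.vel.0 * dt;
        r.pos.1 += r.vel.1 * dt;
        r.energy -= MOVE_COST;
    }
}

fn motion_hot(hot: &mut CreatureHot, dt: f32) {
    let cols = hot.pos.iter_mut().zip(&hot.vel).zip(hot.energy.iter_mut());
    for ((p, v), e) in cols {
        p.0 += v.0 * dt;
        p.1 += v.1 * dt;
        *e -= MOVE_COST;
    }
}

/// Average nanoseconds per element over `ticks` calls of `tick`.
fn ns_per_elem(n: usize, ticks: u32, mut tick: impl FnMut()) -> f64 {
    let start = Instant::now();
    for _ in 0..ticks {
        tick();
    }
    start.elapsed().as_nanos() as f64 / (n as f64 * ticks as f64)
}

/// Returns (pre-split, post-split) ns per element for `n` creatures.
fn compare(n: usize, ticks: u32) -> (f64, f64) {
    let row = CreatureRow {
        pos: (0.0, 0.0),
        vel: (1.0, 1.0),
        energy: 100.0,
        birth_t: 0.0,
        id: 0,
        generation: 0,
    };
    let mut rows = vec![row; n];
    let mut hot = CreatureHot {
        pos: vec![row.pos; n],
        vel: vec![row.vel; n],
        energy: vec![row.energy; n],
    };
    // black_box keeps the optimiser from discarding the unobserved updates.
    let pre = ns_per_elem(n, ticks, || motion_rows(black_box(&mut rows), 0.016));
    let post = ns_per_elem(n, ticks, || motion_hot(black_box(&mut hot), 0.016));
    (pre, post)
}
```

Calling `compare(1_000_000, 100)` in a release build gives one pair of numbers to set against the estimates above. The absolute values depend on the machine and on whether the working set fits in cache, so the ratio is the figure to look at.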
Exercise 4 — Cleanup must touch both
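// Remove one creature from both tables.
// swap_remove moves the last row into `slot`, so the same call must be made
// on every column. It panics if `slot` is out of bounds.
fn delete_creature(hot: &mut CreatureHot, cold: &mut CreatureCold, slot: usize) {
    hot.pos.swap_remove(slot);
    hot.vel.swap_remove(slot);
    hot.energy.swap_remove(slot);
    cold.birth_t.swap_remove(slot);
    cold.id.swap_remove(slot);
    cold.gen.swap_remove(slot);
}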
Six swap_remove calls instead of three. Each call is O(1), so a delete is still constant-time; only the constant grows. Alignment is preserved across both tables because the same slot is removed in lockstep.
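The lockstep behaviour can be seen in a small sketch. It is a variant of the function above that also reports which creature moved; that return value is an addition for illustration, and `generation` stands in for `gen`.

```rust
struct CreatureHot {
    pos: Vec<(f32, f32)>,
    vel: Vec<(f32, f32)>,
    energy: Vec<f32>,
}

struct CreatureCold {
    birth_t: Vec<f64>,
    id: Vec<u32>,
    generation: Vec<u32>, // `gen` in the text
}

/// Remove `slot` from every column of both tables.
/// Returns the id of the creature that now occupies `slot` (the former
/// last row), or None when `slot` was itself the last row.
/// Panics if `slot` is out of bounds, like `Vec::swap_remove`.
fn delete_creature(
    hot: &mut CreatureHot,
    cold: &mut CreatureCold,
    slot: usize,
) -> Option<u32> {
    hot.pos.swap_remove(slot);
    hot.vel.swap_remove(slot);
    hot.energy.swap_remove(slot);
    cold.birth_t.swap_remove(slot);
    cold.id.swap_remove(slot);
    cold.generation.swap_remove(slot);
    cold.id.get(slot).copied()
}
```

With three creatures whose ids are 10, 11 and 12, deleting slot 0 leaves ids 12 and 11 in that order, and the energy column is reordered the same way. Anything that stored the old last slot index has to be updated to `slot`, which is why returning the moved id can be convenient.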
Exercise 5 — A bad split
If energy is moved to creature_cold, motion’s loop now misses cache on every read of energy — a cache line per row instead of one cache line per several rows. The bandwidth saved on birth_t is dwarfed by the bandwidth lost on energy. Motion gets ~1.3× slower, not faster.
The lesson: which fields are hot is decided by the inner loops, not by the data model.
Exercise 6 — The all-fields case
A serialiser reads every field. With the split it walks two tables instead of one, so it pays for a second traversal and a second range of cache lines: about 5–10% overhead versus the unsplit version.
This is fine. The serialiser does not run every tick; it runs at snapshot points. The hot path runs every tick and pays the much larger savings. Average-case cost goes down even though the worst-case cost goes up slightly.
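For illustration, a serialiser over the split tables could look like the sketch below. The line-per-creature text format and the `snapshot` function are hypothetical, and `generation` stands in for `gen`. The point is only that one slot index drives reads from both tables.

```rust
use std::fmt::Write;

struct CreatureHot {
    pos: Vec<(f32, f32)>,
    vel: Vec<(f32, f32)>,
    energy: Vec<f32>,
}

struct CreatureCold {
    birth_t: Vec<f64>,
    id: Vec<u32>,
    generation: Vec<u32>, // `gen` in the text
}

/// Write one comma-separated line per creature:
/// id, generation, birth_t, pos.x, pos.y, vel.x, vel.y, energy.
/// Assumes the tables are aligned; indexing panics otherwise.
fn snapshot(hot: &CreatureHot, cold: &CreatureCold) -> String {
    let mut out = String::new();
    for slot in 0..cold.id.len() {
        let (px, py) = hot.pos[slot];
        let (vx, vy) = hot.vel[slot];
        // Writing into a String cannot fail, so unwrap is safe here.
        writeln!(
            out,
            "{},{},{},{},{},{},{},{}",
            cold.id[slot],
            cold.generation[slot],
            cold.birth_t[slot],
            px,
            py,
            vx,
            vy,
            hot.energy[slot],
        )
        .unwrap();
    }
    out
}
```

The loop touches all six columns per slot, which is the access pattern that pays the extra traversal cost described above.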