Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Solutions: 19 — EBP dispatch

Exercise 1 — Compare the two

#![allow(unused)]
fn main() {
const HUNGER_BURN: f32 = 0.1;

fn drive_hunger_filtered(
    is_hungry: &[bool], energy: &mut [f32], dt: f32,
) {
    for slot in 0..is_hungry.len() {
        if is_hungry[slot] {
            energy[slot] -= HUNGER_BURN * dt;
        }
    }
}

fn drive_hunger_ebp(
    hungry: &[u32], id_to_slot: &[u32], energy: &mut [f32], dt: f32,
) {
    for &id in hungry {
        let slot = id_to_slot[id as usize] as usize;
        energy[slot] -= HUNGER_BURN * dt;
    }
}
}

At 1M creatures with 10 % hungry, on a typical desktop:

  • Filtered: ~1-2 ms (1M slots × ~1 ns each, all sequential, prefetcher is happy).
  • EBP: ~0.2-0.5 ms (100 K slot accesses, some random via id_to_slot, but only 10 % the work).

The ratio is roughly 4-10×. EBP wins more cleanly at sparser states.

Exercise 2 — Sparsity test

fraction hungryfiltered (ms)EBP (ms)
1 %~1.5~0.05
10 %~1.5~0.3
50 %~1.5~1.2
90 %~1.5~2.0

(Numbers vary by chip; the shape is what matters.) The filtered cost is roughly constant — it walks the full table regardless. The EBP cost is roughly linear in the active fraction. Their cross-over is around 50-70 %, after which filtered wins (because random id_to_slot lookups become the bottleneck for EBP).

Exercise 3 — Multi-state systems

#![allow(unused)]
fn main() {
drive_hunger(&hungry, &id_to_slot, &mut energy, dt);
drive_sleep(&sleepy, &id_to_slot, &mut energy, dt);
drive_death(&dead, &id_to_slot, &mut energy, dt);
}

Three EBP systems, each iterating its own table. Each is bandwidth-bound by the active count, not by the population. The single-filtered-loop alternative looks like:

#![allow(unused)]
fn main() {
for slot in 0..1_000_000 {
    if is_hungry[slot] { /* hunger work */ }
    else if is_sleepy[slot] { /* sleep work */ }
    else if is_dead[slot] { /* dead work */ }
}
}

— and walks 1M rows × 3 flag checks per row × cache-bandwidth cost. Three EBP systems combined are typically 5-20× cheaper than the single filtered version, depending on sparsity.

Exercise 4 — The branch you do not write

The filtered version’s inner loop generates roughly:

mov   al, [is_hungry + slot]
test  al, al
je    .skip
; ... work ...
.skip:

The EBP version’s inner loop has no je for membership; the dispatch is the iteration. Freed from the branch, the compiler can usually emit SIMD over hungry’s u32 slots and over the id_to_slot mapping in parallel.

Exercise 5 — &[T] slices

#![allow(unused)]
fn main() {
fn drive_hunger(
    hungry: &[u32],            // <- slice, not Vec
    id_to_slot: &[u32],
    energy: &mut [f32],
    dt: f32,
) { /* ... */ }
}

The function takes the minimal data it needs. The caller passes &world.hungry and the autoderef does the rest. This is the usual shape for systems and integrates cleanly with the parallel scheduling described in §31.

Exercise 6 — The naive bug

#![allow(unused)]
fn main() {
// BUG: do not do this.
for &id in hungry.iter() {
    if /* some condition */ {
        hungry.push(/* a new id */); // mutating while iterating
    }
}
}

Rust’s borrow checker actually catches this one — hungry.iter() holds a & reference; hungry.push needs &mut. The code does not compile. The lesson is that the data-oriented discipline (deferred cleanup, §22) is what Rust’s borrow checker enforces structurally. Push to a side table; apply at tick boundary.