I’m
the
main
author
of
iddqd,
a
Rust
library
for
maps
(named
after
the
Doom
cheat
code)
where
keys
are
borrowed
from
values.
At
Oxide
we
use
it
extensively
in
Omicron,
our
control
plane—the
software
that
sits
at
the
heart
of
every
Oxide
rack,
provisions
resources
like
compute
and
storage
for
our
customers,
and
ensures
the
rack
stays
up
and
running
over
time.
iddqd
maintains
in-memory
indexes
of
the
kinds
of
large
records
that
show
up
everywhere
in
a
system
like
that,
such
as
disks
or
sled
inventories.
As
a
result,
it
must
be
correct:
if
it
misbehaves,
our
control
plane
can
malfunction
in
ways
that
are
unpredictable
and
hard
to
diagnose.
iddqd
consists
of
a
fair
amount
of
unsafe
code
underneath.
There’s
been
some
recent
concern
over
the
amount
of
unsafe
code
in
Rust
rewrites,
so
I
thought
I’d
write
about
some
of
the
unsafe
code
in
iddqd
and
how
we
try
to
tame
it.
What problem does iddqd solve?
With
Rust’s
standard
library
maps,
keys
are
stored
separately
from
values.
Let’s
say
you
want
to
store
a
map
of
records
keyed
by
an
email
address.
With
std::collections::BTreeMap,
you
might
write
something
like:
// Email is typically a newtype which validates that the email address is well-formed.
// In this example, we alias it to String for simplicity.
type Email = String;
struct User {
name: String,
age: u8,
}
let mut users = BTreeMap::<Email, User>::new();
users.insert(
"alice@example.com".to_string(),
User { name: "Alice".to_string(), age: 30 },
);
users.insert(
"bob@example.com".to_string(),
User { name: "Bob".to_string(), age: 35 },
);
This
approach
has
what
I
consider
to
be
a
pretty
major
downside:
the
key
(email)
is
not
stored
in
the
same
struct
as
the
value
(the
rest
of
the
record).
How
would
you
handle
this?
One
way
is
to
pass
around
both
the
email
and
the
user,
for
example
with
get_key_value:
fn process_user(email: &Email, user: &User) {
/* ... */
}
let (email, user) = users.get_key_value("alice@example.com").unwrap();
process_user(email, user);
As an extension, you could maybe have a struct which combines both at fetch time:
struct UserRecord<'a> {
email: &'a Email,
user: &'a User,
}
let (email, user) = users.get_key_value("alice@example.com").unwrap();
let record = UserRecord { email, user };
In practice, this gets quite unwieldy when you have lots of different types of records that need this kind of treatment.
Alternatively, you could duplicate the email across the key and the value:
struct User {
email: Email,
name: String,
age: u8,
}
let mut users = BTreeMap::<Email, User>::new();
let email = "alice@example.com".to_string();
users.insert(
email.clone(),
User { email, name: "Alice".to_string(), age: 30 },
);
But that has the risk that the emails stored in the key and the value fall out of sync.
iddqd
provides
a
better
alternative.
With
IdOrdMap,
you
can
write
something
like:
struct User {
email: Email,
name: String,
age: u8,
}
// You implement a trait which tells iddqd how to extract the key from the
// record.
impl IdOrdItem for User {
// The key type can borrow from the value!
type Key<'a> = &'a Email;
fn key(&self) -> Self::Key<'_> {
&self.email
}
// This macro expands into a small amount of boilerplate to work around
// a Rust borrow checker limitation.
id_upcast!();
}
// Then, you can insert records:
let mut users = IdOrdMap::<User>::new();
users
.insert_unique(User {
email: "alice@example.com".to_string(),
name: "Alice".to_string(),
age: 30,
})
.unwrap();
users
.insert_unique(User {
email: "bob@example.com".to_string(),
name: "Bob".to_string(),
age: 35,
})
.unwrap();
// And look them up by email:
assert_eq!(&users.get("alice@example.com").unwrap().name, "Alice");
assert_eq!(users.get("bob@example.com").unwrap().age, 35);
At
Oxide,
this
has
proven
to
be
an
invaluable
pattern:
many
of
the
records
in
our
control
plane
are
quite
large
(think
database
lookups),
and
iddqd
is
a
great
fit
for
them.
It
also
comes
with
several
other
features
that
directly
address
pain
points
we’ve
dealt
with
at
Oxide.
A
few
worth
calling
out:
-
First-class support for complex keys that borrow from more than one field, without having to resort to workarounds like dynamic dispatch.
-
Maps with two or three keys per item, each independently indexing the same record, without the usual pattern of maintaining multiple maps by hand.
-
Serde implementations that serialize as sequences rather than maps, so that non-string keys can be serialized in JSON[1]. Importantly, these implementations reject duplicate keys. (For backwards compatibility, serialization as maps is also supported.)
Like
many
of
our
other
crates,
iddqd
is
built
for
Oxide’s
needs
but
is
generally
useful
to
the
Rust
community.
You’re
welcome
to
use
it
in
your
own
projects
as
well.
On unsafe Rust
Before
I
move
on,
I
want
to
talk
about
what
it
means
for
unsafe
to
exist
in
a
memory-safe
language.
The
big
concern
is
undefined
behavior
(UB):
a
program
behaves
in
an
unpredictable
way
because
core
assumptions
made
by
the
language
or
compiler
have
been
violated.
Rust
calls
an
abstraction
sound
if
no
safe
code
can
use
it
to
cause
UB,
and
unsound
otherwise.
The vast majority of Rust code is safe, which (assuming that any unsafe used by the safe code is sound) means that no UB can occur. However, due to the fundamental undecidability of static analysis (see Rice’s theorem), it is impossible for the Rust compiler—or any kind of algorithm that terminates—to accept all programs without UB and reject all programs with it. Therefore, when writing such an algorithm, its authors have to make a decision: do they reject some programs without UB, accept some programs with UB, or both? The Rust compiler does the first: within the context of safe Rust, it rejects all programs with UB but also some without UB. (This is the correct choice!)
What
if
your
program
is
in
the
no-man’s-land
where
it
gets
rejected
even
though
you
know
it
doesn’t
have
UB?
To
express
those
kinds
of
programs,
the
Rust
compiler
provides
an
escape
hatch:
the
unsafe
keyword.
By
writing
it,
the
author
takes
responsibility
for
soundness:
they
vouch
that
no
safe
code
can
use
this
to
cause
UB,
and
that
the
compiler
should
trust
them.
What does it mean for some unsafe code to not have UB? Informally, it must follow the rules of Rust—all the rules that the compiler proves for safe Rust. Some examples of these rules are:
-
There are no data races.
-
There must be no reads of uninitialized or freed memory.
-
There must not ever be multiple aliases of
&mutreferences to the same region of memory. -
Immutable data must not be mutated.
The rules of Rust are very hard to reason about! I consider them to be significantly harder than C, for example, particularly due to the rules around mutable aliasing[2]. So judicious use of unsafe Rust usually involves encapsulating it behind a safe abstraction.
Unsafe Rust requires reasoning about safe Rust
In all but the simplest of cases, it is not possible to be sure that such an abstraction is sound without also reasoning about the surrounding safe Rust. In this section, I’ll walk through three examples in increasing order of difficulty.
The
first
example
is
the
split_at_mut
method
on
slices.
This
method
splits
a
mutable
slice
into
two
separate
parts.
split_at_mut
is
a
safe
method,
but
safe
Rust
lacks
the
ability
to
express
this
kind
of
partitioning
of
a
slice.
Thus,
split_at_mut
requires
unsafe
Rust.
The implementation is roughly:
fn split_at_mut<T>(slice: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
assert!(mid <= slice.len());
let ptr = slice.as_mut_ptr();
let remaining = slice.len() - mid;
unsafe {
(
std::slice::from_raw_parts_mut(ptr, mid),
std::slice::from_raw_parts_mut(ptr.add(mid), remaining),
)
}
}
To be sure of the correctness of the unsafe block, you also have to consider several things about the surrounding safe code:
-
that the function was provided a
&mut [T], and not, for example, a&[T] -
the
assert!ensuring thatmidis in bounds -
the fact that
remainingisslice.len() - mid, and not, for example,(slice.len() - mid) + 1
If any of these invariants (all of which are safe Rust) are not upheld, the safe abstraction becomes unsound.
The next level up is that in many cases you also have to analyze the entire module the unsafe code is present in. Consider this vector type:
struct MyVec<T> {
ptr: NonNull<T>,
len: usize,
cap: usize,
}
impl<T> MyVec<T> {
pub fn get(&self, i: usize) -> Option<&T> {
(i < self.len).then(|| unsafe { &*self.ptr.as_ptr().add(i) })
}
}
The
soundness
of
get
relies
on
every
other
method
in
the
module,
safe
and
unsafe,
ensuring
that
len
is
correct
and
in
bounds.
Encapsulation
and
privacy
(len
cannot
be
mutated
from
outside
the
module)
are
load-bearing
here.
The hardest situations, though, are when unsafe code is generic and calls back into user-supplied code. In those cases, the fundamental principle is:
Safe Rust code, no matter how pathological or adversarially written, must never cause unsafe code to exhibit undefined behavior.
In other words, you no longer have the luxury of analyzing just the safe code you see; you must put yourself in the shoes of an attacker, considering every safe Rust program one could write, and be resilient to all of them. For example, it is possible for user code to misbehave in a way that leads to internal tables or data structures being corrupted. Safe but buggy Rust may cause unsafe Rust to be slower, produce incorrect results, or even panic, but it must not cause UB!
(Why assume the worst? You might argue that well-meaning Rust developers are quite unlikely to write adversarial code. One reason for having a firm rule is that guarding against adversarial code means guarding against innocent mistakes as a corollary; a bright line makes it easier to assign responsibility for protection against UB.)
An
example
is
the
ExactSizeIterator
trait,
which
has
a
len
method
that
returns
the
exact
size
of
an
iterator.
A
developer
might
be
tempted
to
take
advantage
of
this
in
unsafe
Rust:
pub fn collect_exact<T, I: ExactSizeIterator<Item = T>>(iter: I) -> MyVec<T> {
let n = iter.len();
let mut v = MyVec::with_capacity(n);
for (i, item) in iter.enumerate() {
unsafe { std::ptr::write(v.ptr.as_ptr().add(i), item); }
}
v.len = n;
v
}
This
code
does
not
cause
UB
if
the
iterator
is
well-behaved.
But
there’s
no
guarantee
of
such
behavior!
It
is
entirely
possible
to
write
a
broken
ExactSizeIterator
implementation
that
returns
a
bogus
result,
all
in
safe
Rust:
struct BadIterator {
remaining: usize,
}
impl Iterator for BadIterator {
type Item = u32;
fn next(&mut self) -> Option<u32> {
(self.remaining > 0).then(|| { self.remaining -= 1; 42 })
}
}
impl ExactSizeIterator for BadIterator {
fn len(&self) -> usize {
0 // but this actually yields `self.remaining` items!
}
}
collect_exact
will
write
into
unallocated
memory
in
this
case,
so
it
is
unsound
in
general[3].
Another
example
is
the
std::io::Read
trait,
where
a
safe
but
buggy
Read
implementation
can
return
the
wrong
number
of
bytes
read.
See
Rust
RFC
2930
for
more
discussion
about
this.
iddqd
is
in
exactly
this
situation:
it
consists
of
generic
data
structures
which
call
into
user-supplied
code,
so
it
is
working
at
this
highest
level
of
difficulty[4].
How iddqd works
iddqd's
architecture
is
pretty
straightforward:
it
uses
a
combination
of
a
list
of
items
(internally
called
an
ItemSet)
and
a
table
with
indexes
into
the
slots.
For
example,
IdHashMap<T>
consists
of:
-
an
ItemSet<T>, which (similar to theslabcrate) internally has:-
a
Vec<ItemSlot<T>>, whereItemSlot<T>is an enum with two variants:-
Occupied(T) -
Vacant { next: ItemIndex }(whereItemIndexrepresents an integer index into theVec<ItemSlot<T>>)
-
-
a
free_head: ItemIndex, which points to either the most recently freedVacantslot, or is a sentinel (marked asSENTin the diagrams below) indicating that no free vacant slots are available. (The combination offree_headand theVacantslots forms a free chain.)
-
-
a
hashbrown::HashTableoverItemIndex(hashbrownis the fast hash table implementation used by the standard library)
For
example,
an
IdHashMap
holding
three
items
might
look
like
the
diagram
below.
The
top
row
is
the
hash
table
over
ItemIndex;
the
middle
row
is
the
Vec<ItemSlot<T>>.
Adding
a
new
item
consists
of
checking
free_head
to
see
if
any
Vacant
slots
are
available.
If
so,
we
read
the
next
index
out
of
that
Vacant
slot,
overwrite
the
slot
with
Occupied(…),
then
set
free_head
to
the
next
value
we
read.
Otherwise,
we
push
a
new
Occupied
slot
to
the
end.
Finally,
the
hash
table
is
updated,
fetching
the
key
and
computing
the
hash,
then
recording
the
new
index
in
the
hash
table.
Starting
from
the
state
above:
Let’s
say
we
want
to
insert
a
new
item
t₃.
Calling
insert(t₃)
recycles
slot
2
from
the
free
chain,
advances
free_head
to
1,
and
records
the
new
hash
entry:
Removing
an
item
given
a
key
involves
this
process
in
reverse:
first,
consult
the
hash
table
using
the
hash
computed
from
the
key,
and
remove
the
found
index
from
the
hash
table.
With
the
index
in
hand,
replace
the
ItemSlot
with
Vacant,
setting
the
internal
next
to
the
current
value
of
free_head.
Finally,
set
free_head
to
the
index
that
was
just
replaced
with
Vacant.
Continuing
from
the
post-insert
state
above:
Calling
remove(k₀)
drops
the
hash
entry
for
k₀,
marks
slot
3
as
Vacant
with
its
next
pointing
at
the
old
free_head
(1),
and
updates
free_head
to
3:
This pattern generalizes to the other map types:
-
IdOrdMapfollows a similar structure, except the hash table is replaced with a B-tree index[5]. -
BiHashMapandTriHashMapstore two and three hash tables respectively.
Unsafe Rust in iddqd
iddqd
has
a
few
different
kinds
of
unsafe
Rust
within
it.
Most
uses
of
unsafe
follow
well-established
patterns
also
used
within
Rust’s
standard
library—these
are
not
dependent
on
user
code,
and
a
sufficiently
smart
borrow
checker
would
make
these
uses
of
unsafe
unnecessary.
But
there
are
a
few
that
stand
out.
Perhaps
the
most
challenging
one
to
reason
about
is
mutable
iteration
over
items.
For
IdOrdMap::iter_mut,
iteration
is
ordered
by
the
key.
One
might
start
off
by
writing
something
like:
pub struct IterMut<'a, T: IdOrdItem> {
items: &'a mut ItemSet<T>,
// This is an ordered iterator over the B-tree equivalent to a hash table.
iter: btree_table::Iter<'a>,
}
impl<'a, T: IdOrdItem + 'a> Iterator for IterMut<'a, T> {
type Item = &'a mut T;
#[inline]
fn next(&mut self) -> Option<Self::Item> {
// Obtain the next index to return, in order.
let index = self.iter.next()?;
// Fetch the item.
let item: &mut T = &mut self.items[index];
Some(item)
}
}
But this would immediately return an error:
error: lifetime may not live long enough
--> crates/iddqd/src/id_ord_map/iter.rs:90:9
|
79 | impl<'a, T: IdOrdItem + 'a> Iterator for IterMut<'a, T> {
| -- lifetime `'a` defined here
...
83 | fn next(&mut self) -> Option {
| - let's call the lifetime of this reference `'1`
...
90 | Some(item)
| ^^^^^^^^^^ method was supposed to return data with lifetime `'a`
| but it is returning data with lifetime `'1`
The reason this occurs is that Rust’s iterators require that the returned values do not borrow from the iterator itself[6]. Many uses of iterators do not need this capability—in particular, most for loops only operate on one item at a time:
let mut items = IdOrdMap::<T>::new();
// ... insert some items
for item in &mut items {
item.value += 1;
}
But
Rust’s
iterators
also
let
you
create
values
that
outlive
the
iterator,
via
e.g.
the
collect
combinator:
let mut items = IdOrdMap::<T>::new();
// ... insert some items
let items_mut: Vec<&mut T> = items.iter_mut().collect();
items_mut
cannot
outlive
items
itself,
but
can
outlive
the
iterator
returned
by
items.iter_mut().
Crucially,
this
relies
on
knowing
that
all
of
the
&mut
T
returned
by
the
iterator
are
disjoint—or,
in
other
words,
that
the
iterator
never
yields
the
same
value
twice.
(Remember
that
one
of
the
rules
of
Rust
is
that
there
must
never
be
multiple
&mut
references
to
the
same
memory.)
But
the
Rust
borrow
checker
just
sees
a
succession
of
accesses
into
a
list,
and
it
has
no
way
of
knowing
that
the
indexes
are
all
different.
If
the
human
writing
the
code
has
knowledge
that
all
of
the
indexes
are
different,
though,
they
can
use
an
unsafe
pattern
called
lifetime
extension
to
tell
the
borrow
checker
to
let
this
code
through:
impl<'a, T: IdOrdItem + 'a> Iterator for IterMut<'a, T> {
type Item = &'a mut T;
#[inline]
fn next(&mut self) -> Option<Self::Item> {
// Obtain the next index to return, in order.
let index = self.iter.next()?;
// Fetch the item.
let item: &mut T = &mut self.items[index];
// Extend the lifetime of the item to 'a.
//
// SAFETY: The indexes returned by `self.iter` are known to all be different,
// so successive calls to `IterMut::next` never create mutable aliases to
// the same memory.
let item = unsafe { std::mem::transmute::<&mut T, &'a mut T>(item) };
Some(item)
}
}
A bug and a fix
In
the
previous
section,
we
saw
how
unsafe
Rust
acts
as
an
escape
hatch
for
facts
that
the
Rust
compiler
cannot
establish
by
itself—in
this
case,
that
the
indexes
returned
by
self.iter
are
all
different.
But
are
the
indexes
actually
different?
Based
on
the
storage
scheme
described
above,
it
would
seem
like
that
is
the
case.
But
this
is
generic
code
which
calls
into
user-provided
functions,
so
there’s
a
chance
that
sufficiently
pathological
user
code
can
trick
the
map
into
storing
duplicate
indexes.
Let’s
take
one
such
example
we
recently
found
and
fixed.
Suppose
you
have
an
IdOrdMap
called
items
storing
five
items
with
integer
keys
0
through
4,
inserted
and
stored
in
order.
Here,
the
top
row
is
the
B-tree
storing
indexes
in
key
order,
while
the
bottom
row
is
the
Vec<ItemSlot<T>>:
Now suppose you use the Entry API (a standard Rust data structure pattern) to fetch an item by index 0:
// In this example, the integer keys are wrapped in another type. We're going
// to swap in a different Ord for Key in the next step.
let entry = items.entry(Key(0));
This
will
successfully
look
up
the
item
stored
at
index
0
and
return
an
OccupiedEntry.
The
OccupiedEntry
internally
stores
the
fact
that
the
item
was
found
at
index
0.
Now,
at
this
point,
let’s
deliberately
switch
the
Ord
implementation
for
Key
to
always
return
Equal,
no
matter
the
values
inside.
There
are
a
few
different
ways
to
do
this
in
safe
Rust,
most
of
which
use
some
form
of
interior
mutability.
(This
is
unlikely
in
ordinary
Rust,
but
that
doesn’t
matter:
unsafe
generic
Rust
must
handle
arbitrarily
pathological
Rust!)
Now
remove
the
entry
from
the
map
using
OccupiedEntry::remove:
let item = entry.remove();
At
this
point,
iddqd
would
attempt
to
remove
index
0
from
the
B-tree,
descending
into
the
tree
again
but
comparing
only
by
key[7].
Because
we
made
comparisons
by
key
always
return
Equal,
the
comparison
would
short-circuit
at
the
first
element
it
compares
against.
For
example,
if
the
first
comparison
lands
on
the
B-tree
entry
(key
=
2,
index
=
2),
then
that
entry
would
be
removed.
But
the
OccupiedEntry
was
carrying
index
=
0,
so
it
would
also
remove
item
0
from
the
item
set:
The
map
is
no
longer
in
a
consistent
state.
The
index
0
is
still
in
the
B-tree
index
but
the
corresponding
slot
in
the
ItemSet
is
vacant;
meanwhile,
item
2
in
the
ItemSet
does
not
have
any
pointers
to
it.
At
this
point,
though,
the
invariant
that
there
aren’t
duplicate
indexes
hasn’t
been
violated.
Where
it
does
get
violated
is
with
this
next
step:
suppose
the
Ord
implementation
now
switches
back
to
being
honest[8],
and
you
insert
a
new
entry
with
key
1000.
Since
free_head
points
to
item
0,
iddqd
will
attempt
to
reuse
that
slot.
It
is
this
step
that
results
in
duplicate
indexes:
There
are
now
duplicate
indexes
pointing
to
the
same
item,
and
IterMut
becomes
unsound.
This was fixed through a combination of two things:
-
When descending into the B-tree, also check equality against the index. In other words, if we’re looking for a key with a known index in the B-tree, compare against both
keyandindex. As covered in the example above, previously, we would just check that the key is the same; the pathologicalOrdimplementation returnedEqual, tricking the map into believing that the entry at(key = 2, index = 2)was the one to remove. We changed this to additionally use the index as a tie-breaker. The comparison against(key = 2, index = 2)now resolves toLess(since0 < 2), so the search doesn’t spuriously match here.
By construction, the only way the search can now succeed is if the stored index actually equals the one we’re looking for.
-
If there are no matches, fall back to a linear scan. The tie-breaker eliminates spurious matches on the wrong entry, but it doesn’t guarantee that the search finds the right entry. (The B-tree is sorted by key, while the tie-breaker compares by index, and in general these two orderings are independent.) If the tree search yields no results, we must still keep the tree and the
ItemSetin sync. In that case, we remove the index from the B-tree by doing a linear scan without calling into user code. The remove operation then takes linear rather than logarithmic time, but that is an acceptable tradeoff, since buggy user code is the only way to encounter the fallback.
This is the kind of long-range reasoning that is sometimes required to establish the soundness of a safe abstraction. The overall system relies on all of these moving parts working correctly across arbitrarily complex series of operations.
Validating iddqd
Because
iddqd
is
such
a
foundational
data
structure
at
Oxide,
we
go
through
great
lengths
to
validate
its
correctness,
including
the
soundness
of
its
abstractions.
There
is
a
great
deal
of
analytical
reasoning
performed
by
experienced
Rust
authors
and
reviewers,
as
described
in
the
section
above.
But
we
also
empirically
validate
iddqd
along
several
different
dimensions.
None
of
the
layers
is
sufficient
by
itself,
but
by
rigorously
doing
all
of
them
we
can
be
more
confident
that
iddqd
is
correct.
The layers of validation we apply are:
| Technique | Catches |
|---|---|
|
Analytical reasoning |
Soundness errors |
|
Example-based tests (including doctests) |
Regressions, broken examples |
|
Pathological tests with Miri |
Adversarial trait impls causing UB |
|
Model-based
tests
(PBT
with
|
Divergence from oracle over random operation sequences |
|
Panic safety fault injection |
Invariant violations on mid-operation panics |
|
LLM adversarial review |
Blind spots in adversarial user code |
In this section, I’ll provide a brief overview of each layer. Click through the links to see more details and code samples.
Analytical reasoning
All blocks and patterns of unsafe Rust have been analyzed by at least one human reviewer, and up to three. Thanks to several of my Oxide colleagues for lending their time and expertise to these reviews—these discussions helped sharpen our reasoning considerably.
The
reasoning
for
each
unsafe
block
is
captured
in
a
//
SAFETY:
comment
above
the
block.
We
use
Clippy’s
undocumented_unsafe_blocks
lint
to
verify
that
these
comments
are
present.
Example-based tests
The
simplest
kind
of
empirical
validation
is
with
example-based
tests:
create
a
map,
perform
some
operations,
and
ensure
that
the
results
of
those
operations
are
as
expected.
iddqd
has
unit
tests
for
its
internals
and
integration
tests
for
the
public
API.
These
also
live
as
doctests,
where
they
double
as
documentation.
Pathological tests with Miri
iddqd
also
has
a
battery
of
pathological
tests
which
supply
buggy
implementations
of
Ord
and
other
traits.
In CI, both regular and pathological tests are run under the Miri interpreter, which can detect many classes of undefined behavior. (If you’re curious about the details, we run with Miri under both Stacked and Tree Borrows.) But note that some classes of UB can be detected in the regular test environment as well. For example, in the previous section we established that the tables not containing duplicate indexes is a necessary condition for soundness; this invariant can be verified outside the Miri context as well.
Model-based testing and fault injection
iddqd
uses
two
layers
of
randomized
testing:
model-based
comparison
against
a
known-correct
oracle,
and
fault
injection
on
top
of
it.
For
data
structures,
example-based
tests
alone
are
generally
considered
insufficient.
A
much
stronger
kind
of
testing
is
model-based
testing,
also
known
as
stateful
property-based
testing
(stateful
PBT):
random
sequences
of
operations
are
applied
to
an
instance
of
the
type,
and
the
results
are
compared
against
an
oracle
that
is
known
to
be
correct.
iddqd
has
extensive
model-based
tests
against
a
NaiveMap
oracle
that
is
inefficient
but
clearly
correct.
(I’ve
talked
about
this
style
of
testing
before
on
Oxide
and
Friends.)
An
extension
of
model-based
testing
is
fault
injection,
where
bugs
are
randomly
inserted
into
user
code.
For
iddqd,
a
fruitful
avenue
for
fault
injection
has
been
panic
safety
(or
unwind
safety):
user
code
panicking
in
the
middle
of
an
operation
must
not
cause
the
map’s
invariants
to
be
violated[9].
We
systematically
explore
fault
injection
by
generating
random
sequences
of
map
operations,
where
each
operation
is
associated
with
a
(randomly
selected)
panic
countdown.
Each
call
into
user
code
decrements
the
countdown
by
1;
if
it
reaches
0,
the
code
panics.
Randomized
panic
safety
testing
found
several
subtle
bugs
in
iddqd
(example),
including
some
that
escalated
into
unsoundness.
The model-based tests also verify internal invariants after each operation, such as the no-duplicate-index condition described in the previous section. We’ve found in practice that model-based tests are too slow to run under Miri, so we verify the invariants on which soundness (and correctness) is known to be dependent instead.
LLM adversarial review
Another
kind
of
validation
that’s
recently
become
available
is
LLM-driven
adversarial
review.
Current-generation
frontier
models[10]
found
several
ways
to
write
pathological
implementations
of
user
code
that
would
corrupt
the
map.
In
one
notable
case,
an
LLM
constructed
a
way
for
the
map’s
invariants
to
break
on
a
custom
allocator
panicking
and
unwinding.
This
is
a
panic
safety
issue
(and
a
failure
mode
I
hadn’t
thought
about
before),
but
distinct
from
the
existing
panic
safety
tests
that
only
covered
panics
in
regular
user
code
like
an
Ord
implementation.
LLMs can sometimes produce plausible-but-wrong soundness claims. An effective way to guard against that is red-green TDD, using Miri as an oracle: For a soundness bug, first have the LLM write a test case demonstrating the bug, running it under Miri to show undefined behavior—the red phase. Then, after fixing the bug, rerun the test to show that it now passes—the green phase.
What about formal methods?
To
formally
verify
iddqd,
one’s
first
thought
would
be
to
use
a
model
checker
like
Kani
to
establish
that
the
maps
don’t
violate
internal
invariants.
But
iddqd
is
unfortunately
a
bit
too
complicated
for
Kani
to
handle,
and
the
required
proofs
blow
up
beyond
what
is
feasible
for
the
tool.
The Creusot deductive verifier can help Rust developers prove their code is free of panics and other errors, but as of this writing it’s unable to prove invariants that must hold even if user code panics or otherwise misbehaves.
The
infeasibility
of
proofs
is
a
common
problem
with
applying
formal
methods
to
implementations,
so
they
are
often
applied
to
a
higher-level
specification
instead.
For
iddqd,
the
closest
thing
to
a
specification
is
NaiveMap,
but
it
can
be
easily
observed
to
be
correct
without
needing
a
formal
proof.
How
do
we
ensure
that
the
implementation
matches
the
specification?
Model-based
testing
does
a
lot
of
heavy
lifting
here.
While
they
aren’t
a
formal
proof
that
iddqd
matches
NaiveMap,
running
model-based
tests
thousands
of
times
in
CI
provides
fairly
high
confidence
that
it
does.
We
keep
an
eye
on
developments
in
this
space;
the
in-development
cargo-anneal,
which
lets
you
embed
Lean
soundness
proofs
alongside
unsafe
Rust
as
doc
comments,
looks
quite
interesting.
If
you’re
working
on
formal
verification
tooling,
iddqd
is
a
great
candidate
to
benchmark
against
because
of
its
crisp
invariants
and
relatively
constrained
yet
non-trivial
scope.
We’d
be
especially
interested
in
proofs
that
hold
over
arbitrary
trait
implementations,
and
of
refinement
between
iddqd
and
NaiveMap.
Please
reach
out
by
filing
an
issue
or
emailing
me!
Conclusion
Writing
unsafe
generic
Rust
is
extremely
difficult.
Each
invariant
that
the
unsafe
code
relies
upon
has
to
be
carefully
upheld
over
arbitrary
trait
implementations,
including
adversarial
ones.
This
post
covers
one
such
example:
how
an
Ord
implementation
could
be
carefully
written
to
trick
the
map
into
creating
mutable
aliases
to
the
same
memory,
and
how
we
fixed
it.
No
single
technique
can
hope
to
catch
all
bugs,
which
is
why
iddqd
uses
several
layers
of
validation.
Humans
carefully
reasoning
about
every
line
of
unsafe
code
goes
a
long
way;
example-based,
pathological,
and
randomized
tests
provide
empirical
evidence;
and
frontier
models
can
find
new
and
surprising
ways
to
break
code
that
humans
might
not
have
thought
of.
This
is
a
lot
of
machinery,
but
at
Oxide
we
believe
that
for
foundational
infrastructure,
this
level
of
rigor
is
justified.
And
if
you
agree,
we’re
hiring!
Diagrams courtesy Ben Leonard.
)