Emanuele’s Log

Writing Constant-Time Rust Is Not Enough

2026-06-07T18:01:10+00:00

Compilers rewrite programs all the time.

Rust code becomes MIR, MIR becomes LLVM IR, LLVM runs optimization passes, and eventually machine code comes out. The usual contract is simple: the optimized program should compute the same result as the original one, only faster or smaller.

Constant-time cryptography asks for one more thing.

It is not enough that the program returns the right value. It also matters which addresses the CPU touches while computing that value. Two executions can return the same answer and still behave differently in the cache.

That creates an interesting question:

Can Rust code look constant-time at the source level, but compile into a binary whose memory access pattern depends on a secret?

The investigation starts from a standard constant-time selection idiom: load both candidate values first, then let the secret choose only between values already in registers. A store is then placed between the two loads so that aliasing becomes relevant, the optimized assembly is checked, and the timing behavior is measured.

The code and artifacts for the experiments live in the ct-rust-verifier repository.

Starting From The Desired Shape

A common constant-time trick is to load both possible values, then select one of the already-loaded values in registers:

fn ct_select_u8(choice: u8, a: u8, b: u8) -> u8 {
    let mask = 0u8.wrapping_sub(choice & 1);
    (a & !mask) | (b & mask)
}

The important property is not just “no branch”. It is also “same memory access pattern”.

If choice is secret, this source shape is fine:

let av = *a;
let bv = *b;
ct_select_u8(choice, av, bv)

Both pointers are loaded every time. The secret only chooses between values that are already in registers. At source level, this is the shape the experiment wants to preserve.

But there is another shape that is not fine:

selected = choice ? b : a
load *selected

That can be branchless too. On AArch64, for example, the address selection can use csel, a conditional select instruction. But this version loads only one address. If one address is cache-hot and the other is cache-cold, timing can reveal the secret choice.

The source-level difference looks small, but the machine-level memory-access pattern is not the same. The rest of the investigation is about whether LLVM can legally move from the first shape to the second.
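As a minimal sketch, the two shapes can be written side by side in Rust. The function names load_both_then_select and select_address_then_load are illustrative and not taken from the ct-rust-verifier repository. The source only expresses intent: the optimizer is free to compile either function into either machine-level shape.

```rust
/// Branchless byte select: returns `a` when bit 0 of `choice` is 0, else `b`.
pub fn ct_select_u8(choice: u8, a: u8, b: u8) -> u8 {
    let mask = 0u8.wrapping_sub(choice & 1);
    (a & !mask) | (b & mask)
}

/// Fixed-access shape: both loads happen before the secret is consulted.
pub fn load_both_then_select(choice: u8, a: &u8, b: &u8) -> u8 {
    let av = *a;
    let bv = *b;
    ct_select_u8(choice, av, bv)
}

/// Selected-address shape: the secret picks a pointer, then a single load runs.
pub fn select_address_then_load(choice: u8, a: &u8, b: &u8) -> u8 {
    let selected = if choice & 1 == 0 { a } else { b };
    *selected
}
```

Both functions return the same value for every input, which is exactly why a value-preserving optimizer may treat them as interchangeable.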

Making Aliasing Matter

The test case uses this source-level pattern:

let av = *a;
*out = 0;
let bv = *b;
ct_select_u8(choice, av, bv)

There are two loads and one store. The store is there because it makes the optimizer care about whether out can overlap with a or b. If overlap is possible, the compiler has to be conservative around the store. If overlap is ruled out, the compiler has more freedom.

This gives a simple strategy: keep the source access shape the same, but change what aliasing facts are available to the optimizer.

The first version keeps raw pointers:

pub unsafe fn raw_interleaved_select(
    choice: u8,
    a: *const u8,
    b: *const u8,
    out: *mut u8,
) -> u8 {
    let av = *a;
    *out = 0;
    let bv = *b;
    ct_select_u8(choice, av, bv)
}

The second version first converts the raw pointers into Rust references:

pub unsafe fn unsafe_ref_interleaved_select(
    choice: u8,
    a: *const u8,
    b: *const u8,
    out: *mut u8,
) -> u8 {
    let a_ref = &*a;
    let b_ref = &*b;
    let out_ref = &mut *out;

    ref_interleaved_select(choice, a_ref, b_ref, out_ref)
}

The helper receives references and performs the same interleaved access pattern:

fn ref_interleaved_select(choice: u8, a: &u8, b: &u8, out: &mut u8) -> u8 {
    let av = *a;
    *out = 0;
    let bv = *b;
    ct_select_u8(choice, av, bv)
}

At the Rust source level, both versions still look like fixed memory access: load a, store to out, load b, then select in registers.

At this point there is no result yet. Both Rust snippets still read like the same fixed-access algorithm. The result appears only after optimization.

First Result: The Assembly Shape Changes

The optimized assembly is where the first finding appears.

The raw-pointer version keeps both loads:

ldrb    w8, [x1]
strb    wzr, [x3]
ldrb    w9, [x2]
tst     w0, #0x1
csel    w0, w8, w9, eq
ret

The reference version selects the address first, then loads once:

tst     w0, #0x1
csel    x8, x1, x2, eq
ldrb    w0, [x8]
strb    wzr, [x3]
ret

This is the transform the experiment is looking for:

load a; load b; select value

becomes:

select address; load selected address

If choice is secret, this changes the side-channel behavior of the program. The source-level constant-time argument says “both addresses are loaded”; the binary does not do that in the reference-based version.

Why Rust Semantics Matter

The assembly difference points back to Rust semantics.

The raw-pointer version and the reference version are not equivalent inputs to the optimizer. Forming references tells the compiler more about the memory being accessed.

When Rust lowers references to LLVM IR, it can attach facts such as:

noalias
nonnull
dereferenceable
readonly
writeonly
alias.scope

These facts are useful. They are part of why Rust can produce good optimized code. They also mean that unsafe reference or slice construction can become part of the constant-time story, even when the source code still looks branchless and fixed-access.

For example, an &mut T carries a strong exclusivity promise. If LLVM knows that out cannot alias a or b, then the store to out cannot affect the loads from a or b. That gives the optimizer more room to rewrite the memory operations.
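To see why the raw-pointer version has to stay conservative, here is a small sketch in which out deliberately aliases b. The callers call_with_out_aliasing_b and call_with_disjoint_out are hypothetical helpers written for this illustration. With raw pointers the overlap is allowed, and the store changes the value that the second load observes.

```rust
pub fn ct_select_u8(choice: u8, a: u8, b: u8) -> u8 {
    let mask = 0u8.wrapping_sub(choice & 1);
    (a & !mask) | (b & mask)
}

/// Same access pattern as the raw-pointer test case: load a, store out, load b.
pub unsafe fn raw_interleaved_select(
    choice: u8,
    a: *const u8,
    b: *const u8,
    out: *mut u8,
) -> u8 {
    unsafe {
        let av = *a;
        *out = 0;
        let bv = *b;
        ct_select_u8(choice, av, bv)
    }
}

/// `out` and `b` are the same address, so the store zeroes `b` before it is loaded.
pub fn call_with_out_aliasing_b(choice: u8) -> u8 {
    let a = 7u8;
    let mut b = 9u8;
    let p: *mut u8 = &mut b;
    unsafe { raw_interleaved_select(choice, &a, p, p) }
}

/// All three pointers are distinct, so the store cannot affect either load.
pub fn call_with_disjoint_out(choice: u8) -> u8 {
    let a = 7u8;
    let b = 9u8;
    let mut out = 1u8;
    unsafe { raw_interleaved_select(choice, &a, &b, &mut out) }
}
```

Because the aliased call is legal for raw pointers, the load of b must stay after the store. Passing the same overlapping pointers to the reference-based variant would instead violate the &mut exclusivity promise, which is what frees the optimizer to reorder.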

From LLVM’s point of view, the selected-address version is functionally equivalent. It returns the same value. The optimization can be legal under the ordinary language and IR rules.

The catch is that constant-time code has an extra rule: the memory access shape must not depend on secrets.

LLVM is not optimizing for that rule unless the compilation model gives it a way to represent and preserve it.

Second Result: The Difference Is Measurable

The assembly result gives a concrete hypothesis: if the binary loads only the selected address, then cache state should make the secret choice measurable.

The timing setup is simple:

the fixed class always selects a cache-hot byte;
the random class randomly selects the hot or cold byte;
before each sample, a large buffer evicts cache state;
only the hot pointer is warmed;
a Welch t-test compares the two classes.

If the code always loads both pointers, both classes should do the same hot and cold work. If the code loads only the selected pointer, the random class should be slower.

That is exactly what the measurement shows.

Target	Samples/class	Mean fixed	Mean random	Welch t	Result
`unsafe-ref-interleaved`	10000	18.220	99.448	-52.133	distinguishable
`raw-interleaved`	10000	144.341	155.492	-0.887	not distinguishable
`volatile` control	10000	131.229	223.733	-1.010	not distinguishable

Using the usual dudect-style threshold of |t| > 4.5, the unsafe-reference variant is clearly distinguishable. The raw-pointer and volatile controls are not.
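For reference, the statistic behind that comparison can be sketched in a few lines. This is a plain textbook Welch t computation, not the benchmark harness from the repository, and the helper names are assumptions made for illustration. It uses t = (mean_fixed - mean_random) / sqrt(var_fixed / n_fixed + var_random / n_random) with sample variances.

```rust
/// Arithmetic mean of the samples.
pub fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Unbiased sample variance (divides by n - 1).
pub fn sample_variance(xs: &[f64]) -> f64 {
    let m = mean(xs);
    xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (xs.len() as f64 - 1.0)
}

/// Welch's t statistic for two classes with possibly unequal variances.
pub fn welch_t(fixed: &[f64], random: &[f64]) -> f64 {
    let standard_error = (sample_variance(fixed) / fixed.len() as f64
        + sample_variance(random) / random.len() as f64)
        .sqrt();
    (mean(fixed) - mean(random)) / standard_error
}

/// The |t| > 4.5 decision rule quoted in the text.
pub fn is_distinguishable(t: f64) -> bool {
    t.abs() > 4.5
}
```

With this sign convention, a negative t means the fixed class is faster on average than the random class, matching the sign of the values in the table.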

This is the second positive result. In this benchmark, alias-bearing reference construction changes the optimized access pattern, and that change is measurable.

The Important Point About The Compiler

This result does not require LLVM to be obviously wrong. LLVM is allowed to use the alias facts it receives, and the optimized function still computes the right value.

The constant-time issue is about a property outside ordinary value semantics:

In constant-time code, unsafe reference or slice construction can communicate alias facts that are invisible in a source-level constant-time review, and those facts can matter at the assembly level.

That is the security-relevant part. The compiler preserves the answer. It also changes the way the answer is loaded from memory.

Generalizing The Pattern

The minimal example explains one instance of the mechanism. The next question is whether it depends on one carefully chosen function, or whether it appears across a broader family of Rust constructs.

The taxonomy reproduces the same kind of access-shape change across several source-facing categories:

&mut exclusivity;
shared references combined with a separate write path;
mutable slice reconstruction;
unchecked mutable indexing;
integer-to-pointer round trips followed by reference formation;
C/LLVM-style alias contracts such as restrict, noalias, and alias.scope.

The common thread is not a particular syntax trick, but the fact that the optimizer receives more information about which pointers cannot overlap.

The strongest signal is the promise that “these pointers do not overlap”. Metadata such as noalias and alias.scope carries more weight than weaker facts like nonnull or readonly on their own.

Taking It To Real Code

After the taxonomy, the next step is to ask whether the same ingredients appear in real Rust crypto and constant-time crates.

The early real-world scan covers:

subtle
curve25519-dalek
crypto-bigint
base16ct
base32ct
base64ct

The scan looks for unsafe reference or slice reconstruction, unchecked indexing, raw pointer conversions, and similar patterns. For interesting source hits, the analysis then moves down the stack:

source pattern -> LLVM alias facts -> optimized assembly -> timing

The point is not to declare every unsafe pattern suspicious. The point is to find cases where Rust source, LLVM metadata, and final assembly tell the same story.

The scanner used for this is the cross_layer_detector. Its current rules and output summaries are also checked in under cross_layer_detector/results.

The strongest real-world-derived case comes from a crypto-bigint byte-slice reconstruction pattern. In an extracted fixed-access selection shape, it reproduces the same selected-load transform and timing leakage:

primary run: abs(t) = 14.872
repeat run:  abs(t) = 18.925

This is the bridge from the minimal example to real code. A source pattern from a cryptographic crate can reproduce the same alias-driven transform in a focused benchmark, and the transform remains timing-visible. The extracted reproducer lives under real_world/extracted/phase2_cases, with the classification notes in real_world/results/confirmed_findings.md.

Scaling The Investigation

The scan then expands to 30 pinned Rust crypto and security crates on x86_64 Linux.

The expanded corpus is pinned in real_world/corpus/manifest.csv.

The detector finds many optimized-code patterns worth reviewing:

368 cross-layer transform rows;
34 selected-pointer-load rows;
17 unique selected-pointer-load crate/symbol pairs;
many LLVM alias facts, including noalias, alias.scope, and !noalias.

This makes the cross-layer part of the work much more concrete. The detector is not just finding unsafe source snippets. It is finding optimized code shapes where source patterns, LLVM metadata, and assembly line up.

At this point the investigation has a useful queue: real optimized crate artifacts containing the selected-load codegen shape, often with LLVM alias facts nearby.

Manual triage then answers the security question:

Where does the selector come from?

The highest-priority selected-load rows fall mostly into two buckets:

crypto-bigint boxed integer and modular arithmetic paths;
elliptic-curve development mock-curve code.

The reviewed crypto-bigint selected loads are driven by public length, precision, or zero-padding decisions. For example, a loop over limbs may choose between an actual limb and a static zero limb when one operand is shorter:

let &a = lhs.limbs.get(i).unwrap_or(&Limb::ZERO);
let &b = rhs.limbs.get(i).unwrap_or(&Limb::ZERO);

That can compile into a selected address load. Structurally, it matches the pattern under investigation:

cmp     ...
csel    selected_ptr, real_limb, zero_limb, ...
ldr     value, [selected_ptr]

If the selector is public operand length, the selected-load shape is still useful detector evidence. It shows the codegen pattern exists in real crate artifacts, even when the selector itself is not secret.
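A self-contained sketch of that padding pattern is below. The Limb type here is a hypothetical stand-in and add_padded is an illustrative function, not code from crypto-bigint. The selector is the comparison between the index and each operand's length, which is public.

```rust
/// Hypothetical stand-in for a big-integer limb.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Limb(pub u64);

impl Limb {
    pub const ZERO: Limb = Limb(0);
}

/// Limb-wise addition that pads the shorter operand with zero limbs.
/// The final carry is discarded, so the sum wraps at the longer operand's width.
pub fn add_padded(lhs: &[Limb], rhs: &[Limb]) -> Vec<Limb> {
    let n = lhs.len().max(rhs.len());
    let mut out = Vec::with_capacity(n);
    let mut carry = 0u64;
    for i in 0..n {
        // Each line picks either a real limb or the static zero limb by index.
        let &Limb(a) = lhs.get(i).unwrap_or(&Limb::ZERO);
        let &Limb(b) = rhs.get(i).unwrap_or(&Limb::ZERO);
        let (partial, c1) = a.overflowing_add(b);
        let (sum, c2) = partial.overflowing_add(carry);
        out.push(Limb(sum));
        carry = (c1 as u64) + (c2 as u64);
    }
    out
}
```

Whether a given compiler turns the unwrap_or lines into a selected-address load depends on the target and optimization level, so the assembly still has to be checked.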

The elliptic-curve hits are in development mock-curve code. Some of those are useful as regression tests for the detector, but they are not production curve arithmetic findings.

If the selector comes from a secret, the finding becomes security-sensitive. If it comes from public length, format state, parser state, allocation state, or a fixed field parameter, it is evidence for the compiler pattern and the detector, but not a timing finding by itself. The expanded triage table is in real_world/results/expanded_triage.csv, and the expanded run is summarized in reports/expanded-real-world-evaluation.md.

What The Systematic Search Found

The systematic search produces two useful results.

First, the selected-address-load shape is not limited to the tiny reproducer. It appears in optimized artifacts from real Rust crates. That matters because it shows the compiler pattern is not just a lab construction.

Second, source-only analysis is far too weak for this problem. Many source patterns look interesting but do not produce the final access shape. Some final assembly patterns are real selected loads, but their selectors come from public state such as length, precision, formatting, parser state, allocation state, or fixed public field parameters. Those are still useful findings for the detector and for understanding the optimizer, but they are not timing vulnerabilities by themselves.

At the moment, the systematic investigation has not confirmed a real upstream crate vulnerability. The positive finding is narrower and still important: Rust-level aliasing semantics can affect the memory-access shape that a constant-time implementation relies on, and that effect can be observed in both controlled experiments and real optimized crate artifacts.

Constant-Time in Rust

For constant-time Rust, the practical rule should not be “never use unsafe” or “never use references”. That is too broad to help.

A better rule is:

When a memory access pattern is part of the constant-time argument, review the optimized assembly for that access pattern, especially if unsafe code creates references, slices, or alias-separated views around the data.

In practice, this means:

Watch &mut and reconstructed slices in constant-time selection paths.
Be careful when a source-level argument depends on “load both sides before selecting”.
Check whether LLVM IR contains noalias, alias.scope, or related alias metadata on the relevant pointers.
Check whether assembly still loads both addresses, or whether it selects an address and loads once.
Classify each selected load by selector source: secret selectors are the security-sensitive ones.
Keep small assembly regression tests for the access shapes you rely on.
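One way to approach the last item is a small text check over an assembly listing. The sketch below is a rough heuristic written for this illustration and is not the cross_layer_detector. It assumes AArch64 syntax, flags a csel that writes a 64-bit register later used as the base of a load, and does not track register reuse.

```rust
/// Heuristic: true if a `csel` writes an x-register that a later `ldr*`
/// instruction uses as its base address.
pub fn has_selected_address_load(asm: &str) -> bool {
    let mut selected: Vec<String> = Vec::new();
    for line in asm.lines() {
        let mut tokens = line.split_whitespace();
        let mnemonic = match tokens.next() {
            Some(m) => m,
            None => continue,
        };
        let operands = tokens.collect::<Vec<_>>().join(" ");
        if mnemonic == "csel" {
            // The first operand of csel is the destination register.
            let dst = operands.split(',').next().unwrap_or("").trim();
            if dst.starts_with('x') {
                selected.push(dst.to_string());
            }
        } else if mnemonic.starts_with("ldr") {
            let hit = selected.iter().any(|reg| {
                let plain = format!("[{}]", reg);
                let with_offset = format!("[{},", reg);
                operands.contains(plain.as_str()) || operands.contains(with_offset.as_str())
            });
            if hit {
                return true;
            }
        }
    }
    false
}
```

In a regression test, the listing would come from the compiler's assembly output for the function under review, and the test would assert that the check stays false.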

Raw pointers and volatile operations are not general constant-time strategies. In this benchmark family, raw-pointer forms avoided the specific alias facts that enabled the selected-address transform.

The important thing is not the syntax. It is the contract you give the optimizer.

Conclusions

The core finding is specific:

Alias metadata from unsafe Rust can let LLVM legally rewrite fixed-load constant-time-looking code into selected-address-load code. If the selector is secret, that can become a timing leak.

The current evidence includes one confirmed extracted real-world-derived timing case, a small taxonomy of alias-driven transforms, and real crate artifacts where the same selected-load codegen shape appears. That is enough to make the mechanism worth taking seriously.

Constant-time security lives in the binary, not just in the source. Rust gives developers strong tools for writing safe and fast code, but unsafe code can also give the optimizer strong promises. When those promises interact with a constant-time argument, the final access pattern needs to be checked.

At the moment, no upstream crate vulnerability has been confirmed. Confirming or ruling one out is future work, not a reason to ignore the mechanism. The next step is to investigate upstream call paths where this transform is reachable with a secret selector, extract more real-world-derived reproducers, and measure them under controlled timing tests.

The compiler follows the rules it is given. The security lesson is that constant-time reviews need to follow the data all the way down.

Rust MIR Instrumentation

2025-12-14T15:12:00+00:00

Instrumenting Rust MIR with a Custom Compiler Driver

Rust’s Mid-level Intermediate Representation (MIR) is one of the most powerful—but least documented—extension points in the compiler. If you want to observe reference creation, track memory, or build dynamic analyses, MIR is usually the right layer.

This post explains how to instrument Rust MIR in practice, focusing on three concrete questions:

Which rustc APIs to use
How to build a custom rustc driver
How to link a runtime crate safely (even under LTO)

All examples are based on a working multi-crate workspace.

1. Which rustc APIs to Use

Nightly and `rustc_private`

MIR instrumentation is not available on stable Rust. You must use nightly and opt into rustc’s internal APIs:

#![feature(rustc_private)]

extern crate rustc_driver;
extern crate rustc_interface;
extern crate rustc_middle;
extern crate rustc_mir_transform;
extern crate rustc_span;

The `optimized_mir` Query

Rustc exposes MIR through queries. The most useful query for instrumentation is optimized_mir, because it runs after MIR is built and optimized, but before codegen to LLVM.

In a custom driver you can override this query via Config::override_queries:

config.override_queries = Some(|_session, queries| {
    queries.optimized_mir = CUSTOM_OPT_MIR;
});

Your hook has the shape:

const CUSTOM_OPT_MIR: for<'tcx> fn(tcx: TyCtxt<'tcx>, def: LocalDefId) -> &'tcx Body<'tcx> =
    |tcx, def| {
        let mut body = (rustc_interface::DEFAULT_QUERY_PROVIDERS.optimized_mir)(tcx, def).clone();

        // Your MIR instrumentation goes here.
        MyOptimizationPass.run_pass(tcx, &mut body);

        tcx.arena.alloc(body)
    };

The important pattern is:

Call the default provider (DEFAULT_QUERY_PROVIDERS.optimized_mir) to get rustc’s MIR.
Clone it (you need an owned Body to edit).
Modify it (insert statements/terminators, add locals, etc.).
Allocate it in the compiler arena and return &'tcx Body<'tcx>.

If you pick an earlier query (like mir_built) you’ll see less optimized MIR; if you pick a later stage you may miss the chance to inject cleanly before codegen.

2. How to create a custom rustc driver

Rust no longer supports compiler plugins, so the standard approach is to replace rustc with your own binary that embeds rustc and overrides queries.

The basic structure is:

struct CompilerCallbacks;

impl rustc_driver::Callbacks for CompilerCallbacks {
    fn config(&mut self, config: &mut rustc_interface::Config) {
        config.override_queries = Some(|_session, queries| {
            queries.optimized_mir = CUSTOM_OPT_MIR;
        });
    }
}

fn main() {
    let mut callbacks = CompilerCallbacks;
    rustc_driver::run_compiler(&std::env::args().collect::<Vec<_>>(), &mut callbacks);
}

In practice you usually want a Cargo subcommand wrapper (cargo-instrument-mir) that runs cargo build but sets:

RUSTC=/path/to/instrument-mir

This makes the workflow feel like a normal Cargo command:

cargo instrument-mir -p hello --release

3. How to link a runtime crate safely (even under LTO)

MIR instrumentation usually needs a runtime crate to record events (log, track pointers, update global state, etc.). The compiler pass injects calls, but those calls must resolve and the runtime must not be stripped by the linker.

3.1 Make rustc able to see the runtime

If your driver injects calls like runtime::__injected_hook(...), you must ensure rustc can resolve the runtime crate. A simple way is to add -L and --extern when your driver invokes rustc:

let runtime_path = "/path/to/workspace/target/release";

args.push("-Zunstable-options".to_string());
args.push(format!("-L{runtime_path}"));
args.push(format!("--extern=force:runtime={runtime_path}/libruntime.rlib"));

Notes:

-L tells rustc where to find libruntime.rlib.
--extern=force:runtime=... makes the crate available even if it is not referenced from source code.

3.2 Keep the runtime from being dead-stripped (especially with LTO)

With LTO enabled, the linker can remove crates and symbols that appear unused. Since the runtime is only referenced by injected MIR, it can look “unused” from the perspective of the normal Rust source.

A robust pattern is to export a C ABI hook and force a reference to it:

#[no_mangle]
pub extern "C" fn __injected_hook(addr: usize) {
    println!("__injected_hook called for address: 0x{:x}", addr);
}

macro_rules! force_runtime {
    ($sym:path) => {
        // The static's type must match the hook's ABI: `extern "C" fn`, not plain `fn`.
        #[used]
        static _FORCE_RUNTIME: extern "C" fn(usize) = $sym;
    };
}

force_runtime!(__injected_hook);

This does two things:

#[no_mangle] + extern "C" gives you a predictable symbol.
#[used] tells the compiler to keep the static, and through it the reference to the hook, in the emitted object file even though no source code reads it.

3.3 Use a stable hook ABI

Don’t pass typed references like &T to the runtime hook. It’s brittle and can lead to invalid IR (especially under LTO). Instead, compute a stable representation in MIR (for example, a usize address) and pass that.

On recent nightlies with strict provenance, the cast you want in MIR is typically:

CastKind::PointerExposeProvenance

That converts a pointer-like value into an integer address in a way that rustc/LLVM accept.
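As a source-level sketch of what this looks like from the runtime's side, the hook below records addresses in global state instead of printing them. The statics HOOK_CALLS and LAST_ADDR and the helpers observe_reference and hook_stats are hypothetical names for this illustration. The export attribute from section 3.2 is left out to keep the sketch focused and would be added on a real runtime hook.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of times the hook has run (illustrative global state).
static HOOK_CALLS: AtomicUsize = AtomicUsize::new(0);
/// Most recent address passed to the hook.
static LAST_ADDR: AtomicUsize = AtomicUsize::new(0);

/// Runtime hook with a plain `usize` argument, so no typed reference crosses the boundary.
pub extern "C" fn __injected_hook(addr: usize) {
    HOOK_CALLS.fetch_add(1, Ordering::Relaxed);
    LAST_ADDR.store(addr, Ordering::Relaxed);
}

/// Source-level equivalent of the injected MIR: turn the reference into an
/// integer address, then call the hook with that integer.
pub fn observe_reference<T>(r: &T) -> usize {
    let addr = r as *const T as usize;
    __injected_hook(addr);
    addr
}

/// Returns (call count, last address) for inspection.
pub fn hook_stats() -> (usize, usize) {
    (
        HOOK_CALLS.load(Ordering::Relaxed),
        LAST_ADDR.load(Ordering::Relaxed),
    )
}
```

The pointer-to-integer `as` cast in observe_reference is the source-level counterpart of the MIR cast named above.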

References

Writing a custom rustc driver

(Rust) Pointers provenance is real

2025-01-20T15:12:00+00:00

Pointers are not integers

and Rust made it very clear to everyone.

Have you ever heard of pointer provenance? If not, don’t worry—you’re not alone. Even if you’ve been programming for 10 years, you might have never needed to think about it. But here’s the thing: provenance is a bit like those “apparent forces” in physics—it exists because we can observe its effects. And since it’s there, we need to define it to make sense of how things really work under the hood.

So, where do we even start? How do we know pointer provenance exists, and how can we find it? Let’s break it down.

Provenance in Rust

The claim that “pointers are just integers” doesn’t hold up, as demonstrated by counterexample and explained in [RFC 3559](https://rust-lang.github.io/rfcs/3559-rust-has-provenance.html). But here’s the catch: while provenance exists, we haven’t really had a way to interact with it directly in our code. That changes with Rust 1.84. The stable release introduces new APIs that let developers manipulate pointers and explicitly define their provenance.
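A minimal sketch of those APIs, assuming Rust 1.84 or newer: the example tags the low bit of an aligned pointer with map_addr, reads the tag back with addr, and clears it again before dereferencing. The function name tag_and_untag is made up for this illustration.

```rust
/// Sets and clears a tag in the low bit of a pointer without ever
/// round-tripping through a bare integer, so provenance is preserved.
pub fn tag_and_untag(value: &u32) -> (usize, u32) {
    let ptr: *const u32 = value;
    // u32 is 4-byte aligned, so the low two address bits are zero and free for a tag.
    let tagged = ptr.map_addr(|a| a | 1);
    // addr() reads the address only; it does not expose the provenance.
    let tag = tagged.addr() & 1;
    // Clearing the tag yields a pointer with the original address and provenance.
    let untagged = tagged.map_addr(|a| a & !1);
    let loaded = unsafe { *untagged };
    (tag, loaded)
}
```

The tagged pointer itself must not be dereferenced, since it is misaligned. Only the untagged pointer is read.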

As the Rust documentation states:

It is undefined behavior to offset a pointer across a memory range that is not contained in the allocated object it is derived from

Rust also provides wrapping_offset (stable since Rust 1.16), which may produce a pointer outside the bounds of the allocation it is derived from; dereferencing such a pointer is still UB. However, the LLVM IR for offset and wrapping_offset is quite similar. Why, then, does offset lead to undefined behavior even when the result is never dereferenced? Let’s look at the LLVM IR:

with offset:

  store i64 6, ptr %count.dbg.spill.i2, align 8
; call core::ptr::const_ptr::<impl *const T>::offset::precondition_check
  call void @"_ZN4core3ptr9const_ptr33_$LT$impl$u20$$BP$const$u20$T$GT$6offset18precondition_check17h058d8998d9a55876E"(ptr %_6, i64 6, i64 1) #20, !dbg !2498
  %_0.i4 = getelementptr inbounds i8, ptr %_6, i64 6, !dbg !2500

with wrapping_offset:

  store i64 6, ptr %count.dbg.spill.i2, align 8
    #dbg_declare(ptr %count.dbg.spill.i2, !2351, !DIExpression(), !2354)
  %9 = getelementptr i8, ptr %_6, i64 6, !dbg !2355

And what does the LLVM LangRef say about the getelementptr (GEP) instruction?

The result value of the getelementptr may be outside the object pointed to by the base pointer. The result value may not necessarily be used to access memory though, even if it happens to point into allocated storage

and again

The getelementptr instruction may have a number of attributes that impose additional rules. If any of the rules are violated, the result value is a poison value.

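Therefore, an out-of-bounds GEP whose result is never used to access memory is not UB by itself. The difference is the inbounds attribute: offset emits getelementptr inbounds, whose out-of-bounds result is poison, while wrapping_offset emits a plain getelementptr, whose out-of-bounds result is an ordinary value as long as it is not dereferenced.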
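At the Rust level, the same distinction can be sketched as follows. The function name round_trip_out_of_bounds is illustrative. The offset of 6 mirrors the IR above: with wrapping_offset the intermediate pointer may leave the allocation, provided it is back in bounds before it is dereferenced.

```rust
/// Steps a pointer outside a 3-byte array and back again using wrapping_offset.
pub fn round_trip_out_of_bounds() -> u8 {
    let data = [10u8, 20, 30];
    let base = data.as_ptr();
    // Six bytes past the start: outside the allocation. Creating this pointer is
    // allowed with wrapping_offset, but dereferencing it would be UB.
    let far = base.wrapping_offset(6);
    // Five bytes back: now at index 1, inside the allocation again.
    let back = far.wrapping_offset(-5);
    // Safe to read: `back` is in bounds and still derived from `data`.
    unsafe { *back }
}
```

Replacing the first call with base.offset(6) would be undefined behavior even though far is never dereferenced.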