<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://van-ema.github.io/feed.xml" rel="self" type="application/atom+xml"/><link href="https://van-ema.github.io/" rel="alternate" type="text/html" hreflang="en"/><updated>2026-06-07T19:08:03+00:00</updated><id>https://van-ema.github.io/feed.xml</id><title type="html">Emanuele’s Log</title><subtitle>A simple, whitespace theme for academics. Based on [*folio](https://github.com/bogoli/-folio) design. </subtitle><entry><title type="html">Writing Constant-Time Rust Is Not Enough</title><link href="https://van-ema.github.io/blog/2026/constant-time-rust-llvm-aliasing/" rel="alternate" type="text/html" title="Writing Constant-Time Rust Is Not Enough"/><published>2026-06-07T18:01:10+00:00</published><updated>2026-06-07T18:01:10+00:00</updated><id>https://van-ema.github.io/blog/2026/constant-time-rust-llvm-aliasing</id><content type="html" xml:base="https://van-ema.github.io/blog/2026/constant-time-rust-llvm-aliasing/"><![CDATA[<p>Compilers rewrite programs all the time.</p> <p>Rust code becomes MIR, MIR becomes LLVM IR, LLVM runs optimization passes, and eventually machine code comes out. The usual contract is simple: the optimized program should compute the same result as the original one, only faster or smaller.</p> <p>Constant-time cryptography asks for one more thing.</p> <p>It is not enough that the program returns the right value. It also matters which addresses the CPU touches while computing that value. 
Two executions can return the same answer and still behave differently in the cache.</p> <p>That creates an interesting question:</p> <p>Can Rust code look constant-time at the source level, but compile into a binary whose memory access pattern depends on a secret?</p> <p>The investigation starts from a standard constant-time selection idiom: load both candidate values first, then let the secret choose only between values already in registers. Then aliasing is made relevant, the optimized assembly is checked, and the timing behavior is measured.</p> <p>The code and artifacts for the experiments live in the <a href="https://github.com/van-ema/ct-rust-verifier"><code class="language-plaintext highlighter-rouge">ct-rust-verifier</code></a> repository.</p> <h2 id="starting-from-the-desired-shape">Starting From The Desired Shape</h2> <p>A common constant-time trick is to load both possible values, then select one of the already-loaded values in registers:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">ct_select_u8</span><span class="p">(</span><span class="n">choice</span><span class="p">:</span> <span class="nb">u8</span><span class="p">,</span> <span class="n">a</span><span class="p">:</span> <span class="nb">u8</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="nb">u8</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u8</span> <span class="p">{</span>
    <span class="c1">// mask is 0x00 when the low bit of choice is 0, and 0xFF when it is 1.</span>
    <span class="k">let</span> <span class="n">mask</span> <span class="o">=</span> <span class="mi">0u8</span><span class="nf">.wrapping_sub</span><span class="p">(</span><span class="n">choice</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">);</span>
    <span class="c1">// Returns a for mask 0x00 and b for mask 0xFF, without branching on choice in the source.</span>
    <span class="p">(</span><span class="n">a</span> <span class="o">&amp;</span> <span class="o">!</span><span class="n">mask</span><span class="p">)</span> <span class="p">|</span> <span class="p">(</span><span class="n">b</span> <span class="o">&amp;</span> <span class="n">mask</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div> <p>The important property is not just “no branch”. It is also “same memory access pattern”.</p> <p>If <code class="language-plaintext highlighter-rouge">choice</code> is secret, this source shape is fine:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">av</span> <span class="o">=</span> <span class="o">*</span><span class="n">a</span><span class="p">;</span>
<span class="k">let</span> <span class="n">bv</span> <span class="o">=</span> <span class="o">*</span><span class="n">b</span><span class="p">;</span>
<span class="nf">ct_select_u8</span><span class="p">(</span><span class="n">choice</span><span class="p">,</span> <span class="n">av</span><span class="p">,</span> <span class="n">bv</span><span class="p">)</span>
</code></pre></div></div> <p>Both pointers are dereferenced on every call. The secret only chooses between values that are already in registers. At the source level, this is the shape the experiment wants to preserve.</p> <p>But there is another shape that is not fine:</p> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>selected = choice ? b : a
load *selected
</code></pre></div></div> <p>That can be branchless too. On AArch64, for example, the address selection can use <code class="language-plaintext highlighter-rouge">csel</code>, a conditional select instruction. But this version loads only one address. If one address is cache-hot and the other is cache-cold, timing can reveal the secret choice.</p> <p>The source-level difference looks small, but the machine-level memory-access pattern is not the same. The rest of the investigation is about whether LLVM can legally move from the first shape to the second.</p> <h2 id="making-aliasing-matter">Making Aliasing Matter</h2> <p>The test case uses this source-level pattern:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">av</span> <span class="o">=</span> <span class="o">*</span><span class="n">a</span><span class="p">;</span>
<span class="o">*</span><span class="n">out</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">let</span> <span class="n">bv</span> <span class="o">=</span> <span class="o">*</span><span class="n">b</span><span class="p">;</span>
<span class="nf">ct_select_u8</span><span class="p">(</span><span class="n">choice</span><span class="p">,</span> <span class="n">av</span><span class="p">,</span> <span class="n">bv</span><span class="p">)</span>
</code></pre></div></div> <p>There are two loads and one store. The store is there because it makes the optimizer care about whether <code class="language-plaintext highlighter-rouge">out</code> can overlap with <code class="language-plaintext highlighter-rouge">a</code> or <code class="language-plaintext highlighter-rouge">b</code>. If overlap is possible, the compiler has to be conservative around the store. If overlap is ruled out, the compiler has more freedom.</p> <p>This gives a simple strategy: keep the source access shape the same, but change what aliasing facts are available to the optimizer.</p> <p>The first version keeps raw pointers:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">raw_interleaved_select</span><span class="p">(</span>
    <span class="n">choice</span><span class="p">:</span> <span class="nb">u8</span><span class="p">,</span>
    <span class="n">a</span><span class="p">:</span> <span class="o">*</span><span class="k">const</span> <span class="nb">u8</span><span class="p">,</span>
    <span class="n">b</span><span class="p">:</span> <span class="o">*</span><span class="k">const</span> <span class="nb">u8</span><span class="p">,</span>
    <span class="n">out</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span>
<span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u8</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">av</span> <span class="o">=</span> <span class="o">*</span><span class="n">a</span><span class="p">;</span>
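    <span class="c1">// Raw pointers carry no aliasing promise, so this store may overlap a or b.</span>
    <span class="o">*</span><span class="n">out</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>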
    <span class="k">let</span> <span class="n">bv</span> <span class="o">=</span> <span class="o">*</span><span class="n">b</span><span class="p">;</span>
    <span class="nf">ct_select_u8</span><span class="p">(</span><span class="n">choice</span><span class="p">,</span> <span class="n">av</span><span class="p">,</span> <span class="n">bv</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div> <p>The second version first converts the raw pointers into Rust references:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">unsafe_ref_interleaved_select</span><span class="p">(</span>
    <span class="n">choice</span><span class="p">:</span> <span class="nb">u8</span><span class="p">,</span>
    <span class="n">a</span><span class="p">:</span> <span class="o">*</span><span class="k">const</span> <span class="nb">u8</span><span class="p">,</span>
    <span class="n">b</span><span class="p">:</span> <span class="o">*</span><span class="k">const</span> <span class="nb">u8</span><span class="p">,</span>
    <span class="n">out</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span>
<span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u8</span> <span class="p">{</span>
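    <span class="c1">// Forming these references asserts that the pointers are valid and that</span>
    <span class="c1">// out does not overlap a or b while the references are live.</span>
    <span class="k">let</span> <span class="n">a_ref</span> <span class="o">=</span> <span class="o">&amp;*</span><span class="n">a</span><span class="p">;</span>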
    <span class="k">let</span> <span class="n">b_ref</span> <span class="o">=</span> <span class="o">&amp;*</span><span class="n">b</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">out_ref</span> <span class="o">=</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="o">*</span><span class="n">out</span><span class="p">;</span>

    <span class="nf">ref_interleaved_select</span><span class="p">(</span><span class="n">choice</span><span class="p">,</span> <span class="n">a_ref</span><span class="p">,</span> <span class="n">b_ref</span><span class="p">,</span> <span class="n">out_ref</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div> <p>The helper receives references and performs the same interleaved access pattern:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">ref_interleaved_select</span><span class="p">(</span><span class="n">choice</span><span class="p">:</span> <span class="nb">u8</span><span class="p">,</span> <span class="n">a</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">u8</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">u8</span><span class="p">,</span> <span class="n">out</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u8</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">av</span> <span class="o">=</span> <span class="o">*</span><span class="n">a</span><span class="p">;</span>
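    <span class="c1">// out is &amp;mut u8, so the compiler may assume this store cannot touch *a or *b.</span>
    <span class="o">*</span><span class="n">out</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>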
    <span class="k">let</span> <span class="n">bv</span> <span class="o">=</span> <span class="o">*</span><span class="n">b</span><span class="p">;</span>
    <span class="nf">ct_select_u8</span><span class="p">(</span><span class="n">choice</span><span class="p">,</span> <span class="n">av</span><span class="p">,</span> <span class="n">bv</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div> <p>At the Rust source level, both versions describe the same fixed-access algorithm: load <code class="language-plaintext highlighter-rouge">a</code>, store to <code class="language-plaintext highlighter-rouge">out</code>, load <code class="language-plaintext highlighter-rouge">b</code>, then select in registers.</p> <p>Nothing separates the two versions yet. The difference appears only after optimization.</p> <h2 id="first-result-the-assembly-shape-changes">First Result: The Assembly Shape Changes</h2> <p>The optimized assembly is where the first finding appears.</p> <p>The raw-pointer version keeps both loads:</p> <pre><code class="language-asm">ldrb    w8, [x1]            // av = *a
strb    wzr, [x3]           // *out = 0
ldrb    w9, [x2]            // bv = *b
tst     w0, #0x1            // test the low bit of choice
csel    w0, w8, w9, eq      // return av if the bit is clear, else bv
ret
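</code></pre> <p>The reference version selects the address first, then loads once:</p> <pre><code class="language-asm">tst     w0, #0x1            // test the low bit of choice
csel    x8, x1, x2, eq      // select the address: a if the bit is clear, else b
ldrb    w0, [x8]            // single load from the selected address
strb    wzr, [x3]           // *out = 0
ret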
</code></pre> <p>This is the transform the experiment is looking for:</p> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>load a; load b; select value
</code></pre></div></div> <p>becomes:</p> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>select address; load selected address
</code></pre></div></div> <p>If <code class="language-plaintext highlighter-rouge">choice</code> is secret, this changes the side-channel behavior of the program. The source-level constant-time argument says “both addresses are loaded”; the binary does not do that in the reference-based version.</p> <h2 id="why-rust-semantics-matter">Why Rust Semantics Matter</h2> <p>The assembly difference points back to Rust semantics.</p> <p>The raw-pointer version and the reference version are not equivalent inputs to the optimizer. Forming references tells the compiler more about the memory being accessed.</p> <p>When Rust lowers references to LLVM IR, it can attach facts such as:</p> <ul> <li><code class="language-plaintext highlighter-rouge">noalias</code></li> <li><code class="language-plaintext highlighter-rouge">nonnull</code></li> <li><code class="language-plaintext highlighter-rouge">dereferenceable</code></li> <li><code class="language-plaintext highlighter-rouge">readonly</code></li> <li><code class="language-plaintext highlighter-rouge">writeonly</code></li> <li><code class="language-plaintext highlighter-rouge">alias.scope</code></li> </ul> <p>These facts are useful. They are part of why Rust can produce good optimized code. They also mean that unsafe reference or slice construction can become part of the constant-time story, even when the source code still looks branchless and fixed-access.</p> <p>For example, an <code class="language-plaintext highlighter-rouge">&amp;mut T</code> carries a strong exclusivity promise. If LLVM knows that <code class="language-plaintext highlighter-rouge">out</code> cannot alias <code class="language-plaintext highlighter-rouge">a</code> or <code class="language-plaintext highlighter-rouge">b</code>, then the store to <code class="language-plaintext highlighter-rouge">out</code> cannot affect the loads from <code class="language-plaintext highlighter-rouge">a</code> or <code class="language-plaintext highlighter-rouge">b</code>. 
That gives the optimizer more room to rewrite the memory operations.</p> <p>From LLVM’s point of view, the selected-address version is functionally equivalent. It returns the same value. The optimization can be legal under the ordinary language and IR rules.</p> <p>The catch is that constant-time code has an extra rule: the memory access shape must not depend on secrets.</p> <p>LLVM is not optimizing for that rule unless the compilation model gives it a way to represent and preserve it.</p> <h2 id="second-result-the-difference-is-measurable">Second Result: The Difference Is Measurable</h2> <p>The assembly result gives a concrete hypothesis: if the binary loads only the selected address, then cache state should make the secret choice measurable.</p> <p>The timing setup is simple:</p> <ul> <li>the fixed class always selects a cache-hot byte;</li> <li>the random class randomly selects the hot or cold byte;</li> <li>before each sample, a large buffer evicts cache state;</li> <li>only the hot pointer is warmed;</li> <li>a Welch t-test compares the two classes.</li> </ul> <p>If the code always loads both pointers, both classes should do the same hot and cold work. 
If the code loads only the selected pointer, the random class should be slower.</p> <p>That is exactly what the measurement shows.</p> <table> <thead> <tr> <th>Target</th> <th style="text-align: right">Samples/class</th> <th style="text-align: right">Mean fixed</th> <th style="text-align: right">Mean random</th> <th style="text-align: right">Welch t</th> <th>Result</th> </tr> </thead> <tbody> <tr> <td><code class="language-plaintext highlighter-rouge">unsafe-ref-interleaved</code></td> <td style="text-align: right">10000</td> <td style="text-align: right">18.220</td> <td style="text-align: right">99.448</td> <td style="text-align: right">-52.133</td> <td>distinguishable</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">raw-interleaved</code></td> <td style="text-align: right">10000</td> <td style="text-align: right">144.341</td> <td style="text-align: right">155.492</td> <td style="text-align: right">-0.887</td> <td>not distinguishable</td> </tr> <tr> <td><code class="language-plaintext highlighter-rouge">volatile</code> control</td> <td style="text-align: right">10000</td> <td style="text-align: right">131.229</td> <td style="text-align: right">223.733</td> <td style="text-align: right">-1.010</td> <td>not distinguishable</td> </tr> </tbody> </table> <p>Using the usual Dudect-style threshold of <code class="language-plaintext highlighter-rouge">|t| &gt; 4.5</code>, the unsafe-reference variant is clearly distinguishable. The raw-pointer and volatile controls are not.</p> <p>This is the second positive result. In this benchmark, alias-bearing reference construction changes the optimized access pattern, and that change is measurable.</p> <h2 id="the-important-point-about-the-compiler">The Important Point About The Compiler</h2> <p>This result does not require LLVM to be obviously wrong. 
LLVM is allowed to use the alias facts it receives, and the optimized function still computes the right value.</p> <p>The constant-time issue is about a property outside ordinary value semantics:</p> <blockquote> <p>In constant-time code, unsafe reference or slice construction can communicate alias facts that are invisible in a source-level constant-time review, and those facts can matter at the assembly level.</p> </blockquote> <p>That is the security-relevant part. The compiler preserves the answer. It also changes the way the answer is loaded from memory.</p> <h2 id="generalizing-the-pattern">Generalizing The Pattern</h2> <p>The minimal example explains one instance of the mechanism. The next question is whether it depends on one carefully chosen function, or whether it appears across a broader family of Rust constructs.</p> <p>The taxonomy reproduces the same kind of access-shape change across several source-facing categories:</p> <ul> <li><code class="language-plaintext highlighter-rouge">&amp;mut</code> exclusivity;</li> <li>shared references combined with a separate write path;</li> <li>mutable slice reconstruction;</li> <li>unchecked mutable indexing;</li> <li>integer-to-pointer round trips followed by reference formation;</li> <li>C/LLVM-style alias contracts such as <code class="language-plaintext highlighter-rouge">restrict</code>, <code class="language-plaintext highlighter-rouge">noalias</code>, and <code class="language-plaintext highlighter-rouge">alias.scope</code>.</li> </ul> <p>The common thread is not a particular syntax trick, but the fact that the optimizer receives more information about which pointers cannot overlap.</p> <p>The strongest signal is the promise that “these pointers do not overlap”. 
Metadata such as <code class="language-plaintext highlighter-rouge">noalias</code> and <code class="language-plaintext highlighter-rouge">alias.scope</code> carries more weight than weaker facts like <code class="language-plaintext highlighter-rouge">nonnull</code> or <code class="language-plaintext highlighter-rouge">readonly</code> on their own.</p> <h2 id="taking-it-to-real-code">Taking It To Real Code</h2> <p>After the taxonomy, the next step is to ask whether the same ingredients appear in real Rust crypto and constant-time crates.</p> <p>The early real-world scan covers:</p> <ul> <li><code class="language-plaintext highlighter-rouge">subtle</code></li> <li><code class="language-plaintext highlighter-rouge">curve25519-dalek</code></li> <li><code class="language-plaintext highlighter-rouge">crypto-bigint</code></li> <li><code class="language-plaintext highlighter-rouge">base16ct</code></li> <li><code class="language-plaintext highlighter-rouge">base32ct</code></li> <li><code class="language-plaintext highlighter-rouge">base64ct</code></li> </ul> <p>The scan looks for unsafe reference or slice reconstruction, unchecked indexing, raw pointer conversions, and similar patterns. For interesting source hits, the analysis then moves down the stack:</p> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>source pattern -&gt; LLVM alias facts -&gt; optimized assembly -&gt; timing
</code></pre></div></div> <p>The point is not to declare every unsafe pattern suspicious. The point is to find cases where Rust source, LLVM metadata, and final assembly tell the same story.</p> <p>The scanner used for this is the <a href="https://github.com/van-ema/ct-rust-verifier/tree/main/cross_layer_detector/detector"><code class="language-plaintext highlighter-rouge">cross_layer_detector</code></a>. Its current rules and output summaries are also checked in under <a href="https://github.com/van-ema/ct-rust-verifier/tree/main/cross_layer_detector/results"><code class="language-plaintext highlighter-rouge">cross_layer_detector/results</code></a>.</p> <p>The strongest real-world-derived case comes from a <code class="language-plaintext highlighter-rouge">crypto-bigint</code> byte-slice reconstruction pattern. In an extracted fixed-access selection shape, it reproduces the same selected-load transform and timing leakage:</p> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>primary run: abs(t) = 14.872
repeat run:  abs(t) = 18.925
</code></pre></div></div> <p>This is the bridge from the minimal example to real code. A source pattern from a cryptographic crate can reproduce the same alias-driven transform in a focused benchmark, and the transform remains timing-visible. The extracted reproducer lives under <a href="https://github.com/van-ema/ct-rust-verifier/tree/main/real_world/extracted/phase2_cases"><code class="language-plaintext highlighter-rouge">real_world/extracted/phase2_cases</code></a>, with the classification notes in <a href="https://github.com/van-ema/ct-rust-verifier/blob/main/real_world/results/confirmed_findings.md"><code class="language-plaintext highlighter-rouge">real_world/results/confirmed_findings.md</code></a>.</p> <h2 id="scaling-the-investigation">Scaling The Investigation</h2> <p>The scan then expands to 30 pinned Rust crypto and security crates on x86_64 Linux.</p> <p>The expanded corpus is pinned in <a href="https://github.com/van-ema/ct-rust-verifier/blob/main/real_world/corpus/manifest.csv"><code class="language-plaintext highlighter-rouge">real_world/corpus/manifest.csv</code></a>.</p> <p>The detector finds many optimized-code patterns worth reviewing:</p> <ul> <li>368 cross-layer transform rows;</li> <li>34 selected-pointer-load rows;</li> <li>17 unique selected-pointer-load crate/symbol pairs;</li> <li>many LLVM alias facts, including <code class="language-plaintext highlighter-rouge">noalias</code>, <code class="language-plaintext highlighter-rouge">alias.scope</code>, and <code class="language-plaintext highlighter-rouge">!noalias</code>.</li> </ul> <p>This makes the cross-layer part of the work much more concrete. The detector is not just finding unsafe source snippets. 
It is finding optimized code shapes where source patterns, LLVM metadata, and assembly line up.</p> <p>At this point the investigation has a useful queue: real optimized crate artifacts containing the selected-load codegen shape, often with LLVM alias facts nearby.</p> <p>Manual triage then answers the security question:</p> <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Where does the selector come from?
</code></pre></div></div> <p>The highest-priority selected-load rows fall mostly into two buckets:</p> <ul> <li><code class="language-plaintext highlighter-rouge">crypto-bigint</code> boxed integer and modular arithmetic paths;</li> <li><code class="language-plaintext highlighter-rouge">elliptic-curve</code> development mock-curve code.</li> </ul> <p>The reviewed <code class="language-plaintext highlighter-rouge">crypto-bigint</code> selected loads are driven by public length, precision, or zero-padding decisions. For example, a loop over limbs may choose between an actual limb and a static zero limb when one operand is shorter:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="o">&amp;</span><span class="n">a</span> <span class="o">=</span> <span class="n">lhs</span><span class="py">.limbs</span><span class="nf">.get</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="nf">.unwrap_or</span><span class="p">(</span><span class="o">&amp;</span><span class="nn">Limb</span><span class="p">::</span><span class="n">ZERO</span><span class="p">);</span>
<span class="k">let</span> <span class="o">&amp;</span><span class="n">b</span> <span class="o">=</span> <span class="n">rhs</span><span class="py">.limbs</span><span class="nf">.get</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="nf">.unwrap_or</span><span class="p">(</span><span class="o">&amp;</span><span class="nn">Limb</span><span class="p">::</span><span class="n">ZERO</span><span class="p">);</span>
</code></pre></div></div> <p>That can compile into a selected address load. Structurally, it matches the pattern under investigation:</p> <pre><code class="language-asm">cmp     ...
csel    selected_ptr, real_limb, zero_limb, ...
ldr     value, [selected_ptr]
</code></pre> <p>If the selector is public operand length, the selected-load shape is still useful detector evidence. It shows the codegen pattern exists in real crate artifacts, even when the selector itself is not secret.</p> <p>The <code class="language-plaintext highlighter-rouge">elliptic-curve</code> hits are in development mock-curve code. Some of those are useful as regression tests for the detector, but they are not production curve arithmetic findings.</p> <p>If the selector comes from a secret, the finding becomes security-sensitive. If it comes from public length, format state, parser state, allocation state, or a fixed field parameter, it is evidence for the compiler pattern and the detector, but not a timing finding by itself. The expanded triage table is in <a href="https://github.com/van-ema/ct-rust-verifier/blob/main/real_world/results/expanded_triage.csv"><code class="language-plaintext highlighter-rouge">real_world/results/expanded_triage.csv</code></a>, and the expanded run is summarized in <a href="https://github.com/van-ema/ct-rust-verifier/blob/main/reports/expanded-real-world-evaluation.md"><code class="language-plaintext highlighter-rouge">reports/expanded-real-world-evaluation.md</code></a>.</p> <h2 id="what-the-systematic-search-found">What The Systematic Search Found</h2> <p>The systematic search produces two useful results.</p> <p>First, the selected-address-load shape is not limited to the tiny reproducer. It appears in optimized artifacts from real Rust crates. That matters because it shows the compiler pattern is not just a lab construction.</p> <p>Second, source-only analysis is far too weak for this problem. Many source patterns look interesting but do not produce the final access shape. Some final assembly patterns are real selected loads, but their selectors come from public state such as length, precision, formatting, parser state, allocation state, or fixed public field parameters. 
Those are still useful findings for the detector and for understanding the optimizer, but they are not timing vulnerabilities by themselves.</p> <p>At the moment, the systematic investigation has not confirmed a real upstream crate vulnerability. The positive finding is narrower and still important: Rust-level aliasing semantics can affect the memory-access shape that a constant-time implementation relies on, and that effect can be observed in both controlled experiments and real optimized crate artifacts.</p> <h2 id="constant-time-in-rust">Constant-Time in Rust</h2> <p>For constant-time Rust, the practical rule should not be “never use unsafe” or “never use references”. That is too broad to help.</p> <p>A better rule is:</p> <blockquote> <p>When a memory access pattern is part of the constant-time argument, review the optimized assembly for that access pattern, especially if unsafe code creates references, slices, or alias-separated views around the data.</p> </blockquote> <p>In practice, this means:</p> <ul> <li>Watch <code class="language-plaintext highlighter-rouge">&amp;mut</code> and reconstructed slices in constant-time selection paths.</li> <li>Be careful when a source-level argument depends on “load both sides before selecting”.</li> <li>Check whether LLVM IR contains <code class="language-plaintext highlighter-rouge">noalias</code>, <code class="language-plaintext highlighter-rouge">alias.scope</code>, or related alias metadata on the relevant pointers.</li> <li>Check whether assembly still loads both addresses, or whether it selects an address and loads once.</li> <li>Classify each selected load by selector source: secret selectors are the security-sensitive ones.</li> <li>Keep small assembly regression tests for the access shapes you rely on.</li> </ul> <p>Raw pointers and volatile operations are not general constant-time strategies. 
In this benchmark family, raw-pointer forms avoided the specific alias facts that enabled the selected-address transform.</p> <p>The important thing is not the syntax. It is the contract you give the optimizer.</p> <h2 id="conclusions">Conclusions</h2> <p>The core finding is specific:</p> <blockquote> <p>Alias metadata from unsafe Rust can let LLVM legally rewrite fixed-load constant-time-looking code into selected-address-load code. If the selector is secret, that can become a timing leak.</p> </blockquote> <p>The current evidence includes one confirmed extracted real-world-derived timing case, a small taxonomy of alias-driven transforms, and real crate artifacts where the same selected-load codegen shape appears. That is enough to make the mechanism worth taking seriously.</p> <p>Constant-time security lives in the binary, not just in the source. Rust gives developers strong tools for writing safe and fast code, but unsafe code can also give the optimizer strong promises. When those promises interact with a constant-time argument, the final access pattern needs to be checked.</p> <p>At the moment, no upstream crate vulnerability has been confirmed. That should be future work, not a reason to ignore the mechanism. The next step is to investigate upstream call paths where this transform is reachable with a secret selector, extract more real-world-derived reproducers, and measure them under controlled timing tests.</p> <p>The compiler follows the rules it is given. 
The security lesson is that constant-time reviews need to follow the data all the way down.</p>]]></content><author><name></name></author><category term="Rust"/><category term="rust"/><category term="llvm"/><category term="cryptography"/><category term="constant-time"/><summary type="html"><![CDATA[How Rust aliasing facts can let LLVM change fixed-load constant-time code into selected-address-load code.]]></summary></entry><entry><title type="html">Rust MIR Instrumentation</title><link href="https://van-ema.github.io/blog/2025/mir-instrumentation/" rel="alternate" type="text/html" title="Rust MIR Instrumentation"/><published>2025-12-14T15:12:00+00:00</published><updated>2025-12-14T15:12:00+00:00</updated><id>https://van-ema.github.io/blog/2025/mir-instrumentation</id><content type="html" xml:base="https://van-ema.github.io/blog/2025/mir-instrumentation/"><![CDATA[<h1 id="instrumenting-rust-mir-with-a-custom-compiler-driver">Instrumenting Rust MIR with a Custom Compiler Driver</h1> <p>Rust’s Mid-level Intermediate Representation (MIR) is one of the most powerful—but least documented—extension points in the compiler. If you want to observe reference creation, track memory, or build dynamic analyses, MIR is usually the right layer.</p> <p>This post explains <strong>how to instrument Rust MIR in practice</strong>, focusing on three concrete questions:</p> <ol> <li><strong>Which rustc APIs to use</strong></li> <li><strong>How to build a custom rustc driver</strong></li> <li><strong>How to link a runtime crate safely (even under LTO)</strong></li> </ol> <p>All examples are based on a working multi-crate workspace.</p> <hr/> <h2 id="1-which-rustc-apis-to-use">1. Which rustc APIs to Use</h2> <h3 id="nightly-and-rustc_private">Nightly and <code class="language-plaintext highlighter-rouge">rustc_private</code></h3> <p>MIR instrumentation is not available on stable Rust. 
You must use nightly and opt into rustc’s internal APIs:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#![feature(rustc_private)]</span>

<span class="k">extern</span> <span class="k">crate</span> <span class="n">rustc_driver</span><span class="p">;</span>
<span class="k">extern</span> <span class="k">crate</span> <span class="n">rustc_interface</span><span class="p">;</span>
<span class="k">extern</span> <span class="k">crate</span> <span class="n">rustc_middle</span><span class="p">;</span>
<span class="k">extern</span> <span class="k">crate</span> <span class="n">rustc_mir_transform</span><span class="p">;</span>
<span class="k">extern</span> <span class="k">crate</span> <span class="n">rustc_span</span><span class="p">;</span>
</code></pre></div></div> <h3 id="the-optimized_mir-query">The <code class="language-plaintext highlighter-rouge">optimized_mir</code> Query</h3> <p>Rustc exposes MIR through <em>queries</em>. The most useful query for instrumentation is <code class="language-plaintext highlighter-rouge">optimized_mir</code>, because it runs <strong>after MIR is built and optimized</strong>, but <strong>before</strong> codegen to LLVM.</p> <p>In a custom driver you can override this query via <code class="language-plaintext highlighter-rouge">Config::override_queries</code>:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">_config</span><span class="py">.override_queries</span> <span class="o">=</span> <span class="nf">Some</span><span class="p">(|</span><span class="n">_session</span><span class="p">,</span> <span class="n">queries</span><span class="p">|</span> <span class="p">{</span>
    <span class="n">queries</span><span class="py">.optimized_mir</span> <span class="o">=</span> <span class="n">CUSTOM_OPT_MIR</span><span class="p">;</span>
<span class="p">});</span>
</code></pre></div></div> <p>Your hook has the shape:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">CUSTOM_OPT_MIR</span><span class="p">:</span> <span class="k">for</span><span class="o">&lt;</span><span class="nv">'tcx</span><span class="o">&gt;</span> <span class="k">fn</span><span class="p">(</span><span class="n">tcx</span><span class="p">:</span> <span class="n">TyCtxt</span><span class="o">&lt;</span><span class="nv">'tcx</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">def</span><span class="p">:</span> <span class="n">LocalDefId</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="nv">'tcx</span> <span class="n">Body</span><span class="o">&lt;</span><span class="nv">'tcx</span><span class="o">&gt;</span> <span class="o">=</span>
    <span class="p">|</span><span class="n">tcx</span><span class="p">,</span> <span class="n">def</span><span class="p">|</span> <span class="p">{</span>
        <span class="k">let</span> <span class="k">mut</span> <span class="n">body</span> <span class="o">=</span> <span class="p">(</span><span class="nn">rustc_interface</span><span class="p">::</span><span class="n">DEFAULT_QUERY_PROVIDERS</span><span class="py">.optimized_mir</span><span class="p">)(</span><span class="n">tcx</span><span class="p">,</span> <span class="n">def</span><span class="p">)</span><span class="nf">.clone</span><span class="p">();</span>

        <span class="c1">// Your MIR instrumentation goes here.</span>
        <span class="n">MyOptimizationPass</span><span class="nf">.run_pass</span><span class="p">(</span><span class="n">tcx</span><span class="p">,</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">body</span><span class="p">);</span>

        <span class="n">tcx</span><span class="py">.arena</span><span class="nf">.alloc</span><span class="p">(</span><span class="n">body</span><span class="p">)</span>
    <span class="p">};</span>
</code></pre></div></div> <p>The important pattern is:</p> <ol> <li>Call the default provider (<code class="language-plaintext highlighter-rouge">DEFAULT_QUERY_PROVIDERS.optimized_mir</code>) to get rustc’s MIR.</li> <li>Clone it (you need an owned <code class="language-plaintext highlighter-rouge">Body</code> to edit).</li> <li>Modify it (insert statements/terminators, add locals, etc.).</li> <li>Allocate it in the compiler arena and return <code class="language-plaintext highlighter-rouge">&amp;'tcx Body&lt;'tcx&gt;</code>.</li> </ol> <p>If you pick an earlier query (like <code class="language-plaintext highlighter-rouge">mir_built</code>) you’ll see less optimized MIR; if you pick a later stage you may miss the chance to inject cleanly before codegen.</p> <hr/> <h2 id="2-how-to-create-a-custom-rustc-driver">2. How to create a custom rustc driver</h2> <p>Rust no longer supports compiler plugins, so the standard approach is to <strong>replace <code class="language-plaintext highlighter-rouge">rustc</code></strong> with your own binary that embeds rustc and overrides queries.</p> <p>The basic structure is:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">CompilerCallbacks</span><span class="p">;</span>

<span class="k">impl</span> <span class="nn">rustc_driver</span><span class="p">::</span><span class="n">Callbacks</span> <span class="k">for</span> <span class="n">CompilerCallbacks</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">config</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">config</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nn">rustc_interface</span><span class="p">::</span><span class="n">Config</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">config</span><span class="py">.override_queries</span> <span class="o">=</span> <span class="nf">Some</span><span class="p">(|</span><span class="n">_session</span><span class="p">,</span> <span class="n">queries</span><span class="p">|</span> <span class="p">{</span>
            <span class="n">queries</span><span class="py">.optimized_mir</span> <span class="o">=</span> <span class="n">CUSTOM_OPT_MIR</span><span class="p">;</span>
        <span class="p">});</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">callbacks</span> <span class="o">=</span> <span class="n">CompilerCallbacks</span><span class="p">;</span>
    <span class="nn">rustc_driver</span><span class="p">::</span><span class="nf">run_compiler</span><span class="p">(</span><span class="o">&amp;</span><span class="nn">std</span><span class="p">::</span><span class="nn">env</span><span class="p">::</span><span class="nf">args</span><span class="p">()</span><span class="py">.collect</span><span class="p">::</span><span class="o">&lt;</span><span class="nb">Vec</span><span class="o">&lt;</span><span class="n">_</span><span class="o">&gt;&gt;</span><span class="p">(),</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">callbacks</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div> <p>In practice you usually want a <strong>Cargo subcommand</strong> wrapper (<code class="language-plaintext highlighter-rouge">cargo-instrument-mir</code>) that runs <code class="language-plaintext highlighter-rouge">cargo build</code> but sets:</p> <ul> <li><code class="language-plaintext highlighter-rouge">RUSTC=/path/to/instrument-mir</code></li> </ul> <p>This makes the workflow feel like a normal Cargo command:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cargo instrument-mir <span class="nt">-p</span> hello <span class="nt">--release</span>
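<span class="c"># Sketch of what the wrapper arranges under the hood (the driver path is a placeholder):</span>
<span class="nv">RUSTC</span><span class="o">=</span>/path/to/instrument-mir cargo build <span class="nt">-p</span> hello <span class="nt">--release</span>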
</code></pre></div></div> <hr/> <h2 id="3-how-to-link-a-runtime-crate-safely-even-under-lto">3. How to link a runtime crate safely (even under LTO)</h2> <p>MIR instrumentation usually needs a <strong>runtime crate</strong> to record events (log, track pointers, update global state, etc.). The compiler pass injects calls, but those calls must resolve and the runtime must not be stripped by the linker.</p> <h3 id="31-make-rustc-able-to-see-the-runtime">3.1 Make rustc able to see the runtime</h3> <p>If your driver injects calls like <code class="language-plaintext highlighter-rouge">runtime::__injected_hook(...)</code>, you must ensure rustc can resolve the <code class="language-plaintext highlighter-rouge">runtime</code> crate. A simple way is to add <code class="language-plaintext highlighter-rouge">-L</code> and <code class="language-plaintext highlighter-rouge">--extern</code> when your driver invokes rustc:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">runtime_path</span> <span class="o">=</span> <span class="s">"/path/to/workspace/target/release"</span><span class="p">;</span>

<span class="n">args</span><span class="nf">.push</span><span class="p">(</span><span class="s">"-Zunstable-options"</span><span class="nf">.to_string</span><span class="p">());</span>
<span class="n">args</span><span class="nf">.push</span><span class="p">(</span><span class="nd">format!</span><span class="p">(</span><span class="s">"-L{runtime_path}"</span><span class="p">));</span>
<span class="n">args</span><span class="nf">.push</span><span class="p">(</span><span class="nd">format!</span><span class="p">(</span><span class="s">"--extern=force:runtime={runtime_path}/libruntime.rlib"</span><span class="p">));</span>
</code></pre></div></div> <p>Notes:</p> <ul> <li><code class="language-plaintext highlighter-rouge">-L</code> tells rustc where to find <code class="language-plaintext highlighter-rouge">libruntime.rlib</code>.</li> <li><code class="language-plaintext highlighter-rouge">--extern=force:runtime=...</code> makes the crate available even if it is not referenced from source code.</li> </ul> <h3 id="32-keep-the-runtime-from-being-dead-stripped-especially-with-lto">3.2 Keep the runtime from being dead-stripped (especially with LTO)</h3> <p>With LTO enabled, the linker can remove crates and symbols that appear unused. Since the runtime is only referenced by injected MIR, it can look “unused” from the perspective of the normal Rust source.</p> <p>A robust pattern is to export a C ABI hook and <strong>force a reference to it</strong>:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">__injected_hook</span><span class="p">(</span><span class="n">addr</span><span class="p">:</span> <span class="nb">usize</span><span class="p">)</span> <span class="p">{</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"__injected_hook called for address: 0x{:x}"</span><span class="p">,</span> <span class="n">addr</span><span class="p">);</span>
<span class="p">}</span>

<span class="nd">macro_rules!</span> <span class="n">force_runtime</span> <span class="p">{</span>
    <span class="p">(</span><span class="nv">$sym:path</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="p">{</span>
        <span class="nd">#[used]</span>
        <span class="k">static</span> <span class="n">_FORCE_RUNTIME</span><span class="p">:</span> <span class="k">fn</span><span class="p">(</span><span class="nb">usize</span><span class="p">)</span> <span class="o">=</span> <span class="nv">$sym</span><span class="p">;</span>
    <span class="p">};</span>
<span class="p">}</span>

<span class="nd">force_runtime!</span><span class="p">(</span><span class="n">__injected_hook</span><span class="p">);</span>
</code></pre></div></div> <p>This does two things:</p> <ul> <li><code class="language-plaintext highlighter-rouge">#[no_mangle]</code> + <code class="language-plaintext highlighter-rouge">extern "C"</code> gives you a predictable symbol.</li> <li><code class="language-plaintext highlighter-rouge">#[used]</code> prevents the symbol from being dropped during optimization and linking.</li> </ul> <h3 id="33-use-a-stable-hook-abi">3.3 Use a stable hook ABI</h3> <p>Don’t pass typed references like <code class="language-plaintext highlighter-rouge">&amp;T</code> to the runtime hook. It’s brittle and can lead to invalid IR (especially under LTO). Instead, compute a stable representation in MIR (for example, a <code class="language-plaintext highlighter-rouge">usize</code> address) and pass that.</p> <p>On recent nightlies with strict provenance, the cast you want in MIR is typically:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">CastKind</span><span class="p">::</span><span class="n">PointerExposeProvenance</span>
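<span class="c1">// For comparison only (a source-level analogue, not part of the driver itself):</span>
<span class="c1">// the same exposing cast is written `ptr as usize`, or `ptr.expose_provenance()` since Rust 1.84.</span>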
</code></pre></div></div> <p>That converts a pointer-like value into an integer address in a way that rustc/LLVM accept.</p> <hr/> <h2 id="references">References</h2> <ul> <li><a href="https://jyn.dev/rustc-driver/">Writing a custom <code class="language-plaintext highlighter-rouge">rustc</code> driver</a></li> </ul>]]></content><author><name></name></author><category term="Rust"/><summary type="html"><![CDATA[How to inject calls inside MIR]]></summary></entry><entry><title type="html">(Rust) Pointers provenance is real</title><link href="https://van-ema.github.io/blog/2025/provenance/" rel="alternate" type="text/html" title="(Rust) Pointers provenance is real"/><published>2025-01-20T15:12:00+00:00</published><updated>2025-01-20T15:12:00+00:00</updated><id>https://van-ema.github.io/blog/2025/provenance</id><content type="html" xml:base="https://van-ema.github.io/blog/2025/provenance/"><![CDATA[<h3 id="pointers-are-not-integers">Pointers are not integers</h3> <p>and Rust made it very clear to everyone.</p> <p>Have you ever heard of pointer provenance? If not, don’t worry, you’re not alone. Even if you’ve been programming for 10 years, you might have never needed to think about it. But here’s the thing: provenance is a bit like those “apparent forces” in physics: it exists because we can observe its effects. And since it’s there, we need to define it to make sense of how things really work under the hood.</p> <p>So, where do we even start? How do we know pointer provenance exists, and how can we find it? Let’s break it down.</p> <h2 id="provenance-in-rust">Provenance in Rust</h2> <p>The claim that “pointers are just integers” doesn’t hold up, as demonstrated by <a href="https://godbolt.org/z/ce4bjqjbM">this counterexample</a> and explained in <a href="https://rust-lang.github.io/rfcs/3559-rust-has-provenance.html">Rust RFC 3559</a>. But here’s the catch: while provenance exists, we haven’t really had a way to interact with it directly in our code. That changes with Rust 1.84. 
The stable release introduces new APIs that let developers manipulate pointers and explicitly define their provenance.</p> <p>As the Rust documentation states:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>It is undefined behavior to offset a pointer across a memory range that is not contained in the allocated object it is derived from
</code></pre></div></div> <p>Rust also provides <code class="language-plaintext highlighter-rouge">wrapping_offset</code>, which can create a pointer that points outside the allocation its provenance covers (dereferencing that pointer is still UB!). However, the LLVM IR for <code class="language-plaintext highlighter-rouge">offset</code> and <code class="language-plaintext highlighter-rouge">wrapping_offset</code> is quite similar. Why, then, does <code class="language-plaintext highlighter-rouge">offset</code> lead to undefined behavior even when the result is never dereferenced? Let’s look at the <a href="https://godbolt.org/z/3Mz7serhx">LLVM IR here</a>.</p> <p>With <code class="language-plaintext highlighter-rouge">offset</code>:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  store i64 6, ptr %count.dbg.spill.i2, align 8
; call core::ptr::const_ptr::&lt;impl *const T&gt;::offset::precondition_check
  call void @"_ZN4core3ptr9const_ptr33_$LT$impl$u20$$BP$const$u20$T$GT$6offset18precondition_check17h058d8998d9a55876E"(ptr %_6, i64 6, i64 1) #20, !dbg !2498
  %_0.i4 = getelementptr inbounds i8, ptr %_6, i64 6, !dbg !2500
</code></pre></div></div> <p>with <code class="language-plaintext highlighter-rouge">wrapped_offset</code>:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> store i64 6, ptr %count.dbg.spill.i2, align 8
  #dbg_declare(ptr %count.dbg.spill.i2, !2351, !DIExpression(), !2354)
  %9 = getelementptr i8, ptr %_6, i64 6, !dbg !2355
</code></pre></div></div> <p>The only difference is the <code class="language-plaintext highlighter-rouge">inbounds</code> flag on the <code class="language-plaintext highlighter-rouge">getelementptr</code> (GEP) instruction. What does the LLVM LangRef say about GEP?</p> <blockquote> <p>The result value of the getelementptr may be outside the object pointed to by the base pointer. The result value may not necessarily be used to access memory though, even if it happens to point into allocated storage.</p> </blockquote> <p>And again:</p> <blockquote> <p>The getelementptr instruction may have a number of attributes that impose additional rules. If any of the rules are violated, the result value is a <a href="https://llvm.org/docs/LangRef.html#poisonvalues">poison value</a>.</p> </blockquote> <p>So the <code class="language-plaintext highlighter-rouge">inbounds</code> GEP emitted for <code class="language-plaintext highlighter-rouge">offset</code> yields poison as soon as it leaves the allocation, whereas the plain GEP emitted for <code class="language-plaintext highlighter-rouge">wrapping_offset</code> does not: an out-of-bounds GEP whose result is never used to access memory is not UB.</p> <h3 id="further-readings">Further reading:</h3> <ul> <li><a href="https://doc.rust-lang.org/nightly/std/ptr/index.html#strict-provenance">Strict provenance in Rust</a></li> <li><a href="https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html">“Pointers Are Complicated”, a blog post by Ralf Jung</a></li> </ul>]]></content><author><name></name></author><category term="Rust"/><summary type="html"><![CDATA[Provenance and Rust]]></summary></entry></feed>