Formal Soundness

Hybrid "Pincer" Verification

An optimisation that is fast but wrong is a critical flaw. X-Lang guarantees the mathematical soundness of its aggressive transformations using a two-pronged formal verification strategy.

The Limits of Testing

"Program testing can be used to show the presence of bugs, but never to show their absence."
— Edsger W. Dijkstra

Modern compilers perform millions of complex transformations. A single error in the logic of an optimization pass can lead to silent data corruption, security vulnerabilities, or non-deterministic behavior.

While empirical testing is essential, it covers only a minuscule fraction of all possible valid programs. To provide a true guarantee of correctness—a proof that our optimizations are sound for all possible programs that satisfy the semantic contracts—we must turn to formal mathematical methods.

The Danger of Unsound Optimisation

Let P be a program.

Let T be a compiler transformation.

Let ⟦ · ⟧ be the mathematical meaning (semantics).

Formal Definition of a Bug: ∃ P s.t. ⟦ T(P) ⟧ ≠ ⟦ P ⟧

If there exists even one program where the transformed meaning differs from the original, the compiler pass is fundamentally flawed.
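This existence condition can itself be stated in Lean 4. As a self-contained illustration (not part of the X-Lang proof library), consider a hypothetical pass that rewrites `(x / 2) * 2` to `x`; a single odd witness proves the pass unsound:

```lean
-- Hypothetical, for illustration only: a pass T that rewrites
-- the expression (x / 2) * 2 to x. We model both semantics directly.
def origSemantics (x : Int) : Int := (x / 2) * 2  -- ⟦ P ⟧
def optSemantics  (x : Int) : Int := x            -- ⟦ T(P) ⟧

-- The bug condition ∃ P s.t. ⟦ T(P) ⟧ ≠ ⟦ P ⟧, witnessed by x = 1:
-- integer division gives origSemantics 1 = 0, while optSemantics 1 = 1.
theorem hypothetical_pass_is_unsound :
    ∃ x : Int, optSemantics x ≠ origSemantics x :=
  ⟨1, by decide⟩
```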

The Methodology

How We Guarantee Soundness

Full end-to-end compiler verification (like CompCert) often restricts the aggressiveness of optimizations. X-Lang adopts a pragmatic "Pincer" methodology to verify just the semantic bridge.

01

Language Semantics

Lean 4 Theorem Prover

The top pincer uses the Lean 4 proof assistant to formally prove that the X-Lang front-end's semantic analysis is sound: when the compiler validates a keyword (like @unique), the program genuinely possesses the corresponding mathematical property (here, non-aliasing). This establishes a "Certificate of Truth."

Keystone Proof: Non-Interference
theorem unique_move_point_does_not_interfere
(h : Heap) (p q : Ptr) (dx dy : Int) (h_neq : p ≠ q) :
(move_point h p dx dy) q = h q := by
unfold move_point
cases h_match : h p with
| none => simp
| some pt =>
have hq_neq : q ≠ p := Ne.symm h_neq
simp [hq_neq]
02

IR Translation Validation

Alive2 Validator

The bottom pincer uses Alive2 to prove that our compiler's transformations at the LLVM IR level are correct refinements. Taking the Lean certificate as a formal precondition, Alive2 verifies that adding attributes like noalias does not introduce any invalid behavior into the intermediate representation.

Alive2 Script: unique -> noalias
; Validating unique -> noalias
define void @src(ptr %p) {
ret void
}
=>
define void @tgt(ptr noalias %p) {
ret void
}
; OUT: Transformation is correct.

The Verification Matrix

From high-level intent to formally verified LLVM transformation.

X-Lang Keyword | Formal Guarantee (Lean 4)                               | LLVM Transformation (Alive2)
---------------|---------------------------------------------------------|-----------------------------
@pure          | Sound Purity Analysis (purity_checker_sound)            | Add readonly / readnone attributes
@unique        | Non-Interference (unique_move_point_does_not_interfere) | Add noalias parameter attribute
@layout(SoA)   | Semantic Equivalence (layout_transformation_is_correct) | Transform aggregate types & GEP instructions
@parallel      | Equivalence of sequential & parallel final state        | Re-associate arithmetic operations for reduction
@static        | Compile-time specialisation == runtime evaluation       | Constant folding & branch pruning (Dead Code Elim.)
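This page shows no X-Lang surface syntax, so purely as an illustrative sketch (the syntax and every identifier below are hypothetical; only the @-keywords themselves come from the matrix above), the contracts might annotate declarations like this:

```
// Hypothetical surface syntax; only the @-keywords are real.
@pure
fn dot(a: Vec3, b: Vec3) -> Float { ... }

@layout(SoA)
struct Particle { x: Float, y: Float, vx: Float, mass: Float }

@unique
fn move_point(p: @unique Point, dx: Int, dy: Int) { ... }
```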

Formal Proof Library

The complete mathematical foundation for all 10 semantic contracts. For each keyword, explore the high-level language soundness proof in Lean 4 alongside its corresponding low-level IR translation validation in Alive2.

@layout(SoA)

Data-Oriented Design for Vectorisation

Semantic Equivalence of Data Transformation

Proves that extracting a field from the original Array-of-Structures (AoS) representation is mathematically equivalent to accessing the same field from the compiler-transformed Structure-of-Arrays (SoA) representation.

Lean 4 Source
-- No special imports are needed for this proof.
-- 1. The logical definition of a Particle
structure Particle where
x : Float
y : Float
vx : Float
mass : Float
-- 2. Model the two memory layouts
-- The AoS layout is a direct list of Particle structs.
abbrev AoS_Layout := List Particle
-- The SoA layout is a struct containing a list for each field.
structure SoA_Layout where
xs : List Float
ys : List Float
vxs : List Float
masses : List Float
-- 3. Define a function to convert from AoS to SoA
-- This models what the compiler does to transform the data representation.
def to_SoA (data : AoS_Layout) : SoA_Layout :=
{
xs := data.map (fun p => p.x),
ys := data.map (fun p => p.y),
vxs := data.map (fun p => p.vx),
masses := data.map (fun p => p.mass)
}
-- 4. Define accessor functions for each layout
-- These model the `particles[i].x` operation.
def get_x_from_AoS (data : AoS_Layout) (i : Nat) : Float :=
(data.map (fun p => p.x))[i]!
def get_x_from_SoA (data : SoA_Layout) (i : Nat) : Float :=
data.xs[i]!
-- 5. The Semantic Equivalence Theorem
-- Theorem: For any list of particles `data` and any valid index `i`,
-- getting the `x` field from the AoS representation is equal to
-- getting the `x` field from the converted SoA representation.
theorem layout_transformation_is_correct
(data : List Particle) (i : Nat) (_ : i < data.length) :
get_x_from_AoS data i = get_x_from_SoA (to_SoA data) i := by
-- We ask Lean to simplify the goal by unfolding all the definitions.
-- The key step is using the `List.get!_map` lemma, which states that
-- getting the i-th element of a mapped list is the same as getting
-- the i-th element of the original list and then applying the function.
simp [get_x_from_AoS, get_x_from_SoA, to_SoA]

Store-to-Load Forwarding in SoA

Validates that the structured SoA memory accesses allow the compiler to safely forward stored values directly to subsequent loads, bypassing memory.

Alive2 Translation Script
%SoAHandle = type { ptr }
define i32 @src(ptr %handle, i32 %i) #0 {
entry:
%x_ptr_ptr = getelementptr inbounds %SoAHandle, ptr %handle, i32 0, i32 0
%x_base = load ptr, ptr %x_ptr_ptr
%x_addr = getelementptr inbounds i32, ptr %x_base, i32 %i
store i32 10, ptr %x_addr
%x_ptr_ptr_2 = getelementptr inbounds %SoAHandle, ptr %handle, i32 0, i32 0
%x_base_2 = load ptr, ptr %x_ptr_ptr_2
%x_addr_2 = getelementptr inbounds i32, ptr %x_base_2, i32 %i
%result = load i32, ptr %x_addr_2
ret i32 %result
}
define i32 @tgt(ptr %handle, i32 %i) #0 {
entry:
%x_ptr_ptr = getelementptr inbounds %SoAHandle, ptr %handle, i32 0, i32 0
%x_base = load ptr, ptr %x_ptr_ptr
%x_addr = getelementptr inbounds i32, ptr %x_base, i32 %i
store i32 10, ptr %x_addr
ret i32 10
}
attributes #0 = { "nounwind" }

@unique

Eliminating Alias Analysis Pessimism

Non-Interference Guarantee

Proves the core property of non-aliasing: after a mutation operation through a unique pointer, the heap is guaranteed to be completely unchanged when observed through any other distinct pointer.

Lean 4 Source
import Mathlib.Tactic
-- 1. Model the types and the memory
structure Point where
x : Int
y : Int
abbrev Ptr := Nat
abbrev Heap := Ptr → Option Point -- Memory is a map from Pointers to optional Points
-- 2. Model the `move_point` function
def move_point (h : Heap) (p : Ptr) (dx dy : Int) : Heap :=
match h p with
| some pt =>
let new_pt := { x := pt.x + dx, y := pt.y + dy }
Function.update h p (some new_pt)
| none =>
h
-- 3. The Non-Interference Theorem
theorem unique_move_point_does_not_interfere
(h : Heap) (p q : Ptr) (dx dy : Int) (h_neq : p ≠ q) :
(move_point h p dx dy) q = h q := by
-- Unfold the definition of our function to see its logic.
unfold move_point
-- FIX: Use `cases` to split the proof based on the value of `h p`.
-- `h_match` will be a new hypothesis: `h p = none` or `h p = some pt`.
cases h_match : h p with
| none =>
-- Case 1: The pointer `p` was invalid (`h p = none`).
-- `move_point` does nothing. We use `simp` with our new hypothesis
-- `h_match` to simplify the `match` expression in the goal to `h`.
-- The goal becomes `h q = h q`, which is true.
simp
| some pt =>
-- Case 2: The pointer `p` was valid (`h p = some pt`).
-- Simplify the goal.
simp
-- The goal is now `(Function.update h p ...) q = h q`. This is true
-- because `p ≠ q`. Use symmetry on the inequality and `simp` to reduce
-- the `Function.update` application (which is defined using an `if`).
have hq_neq : q ≠ p := Ne.symm h_neq
simp [hq_neq]

Preventing Unsafe Forwarding (Counterexample)

Demonstrates that without the 'noalias' guarantee, store-to-load forwarding is invalid. Alive2 generates a counterexample, showing that the semantic contract is strictly necessary.

Alive2 Translation Script
; This transformation is INVALID and will FAIL to compile/verify.
;
; It demonstrates what happens if you try to perform the
; 'unique'-based optimization (store-to-load forwarding)
; *without* the 'noalias' keyword.
;
; Alive2 will find a counterexample where %d1 and %d2
; are the same pointer.
; In that case, @src will return 20, but @tgt returns 10.
define i32 @src(ptr %d1, ptr %d2) #0 {
entry:
; 1. Store 10 to %d1
store i32 10, ptr %d1
; 2. Store 20 to %d2
; !! Without noalias, %d2 MIGHT be the same as %d1 !!
store i32 20, ptr %d2
; 3. Load from %d1
; This load might see 10, or it might see 20.
%result = load i32, ptr %d1
ret i32 %result
}
define i32 @tgt(ptr %d1, ptr %d2) #0 {
entry:
; 1. Store 10 to %d1
store i32 10, ptr %d1
; 2. Store 20 to %d2
store i32 20, ptr %d2
; 3. !! UNSAFE OPTIMISATION !!
; The compiler incorrectly assumes %d1 still holds 10.
ret i32 10
}
attributes #0 = { "nounwind" }

@invariant

Axiomatic Check Elimination

Preservation of Invariant Properties

Proves that if an axiomatic property (like length < capacity) holds true initially, subsequent operations structurally preserve that invariant across the data structure's lifetime.

Lean 4 Source
import Mathlib.Tactic
-- 1. Define the struct and its properties
structure SafeArray where
data_len : Nat
index : Nat
-- 2. Define the invariant as a predicate on the struct
def SafeArray.invariant (self : SafeArray) : Prop :=
self.index < self.data_len
-- 3. Define a method that operates on the struct
def increment_index (self : SafeArray) : SafeArray :=
if self.index + 1 < self.data_len then
{ self with index := self.index + 1 }
else
self
-- 4. The Preservation Theorem and Its Proof
theorem increment_index_preserves_invariant (s : SafeArray) (h : s.invariant) :
(increment_index s).invariant := by
unfold increment_index SafeArray.invariant
split
· -- Case 1: The `if` condition is true.
-- The `split` tactic automatically adds the proof of this condition as a
-- hypothesis, so we can prove the goal with `assumption`.
assumption
· -- Case 2: The `if` condition is false.
-- The `else` branch returned the original `s`. The goal is the original invariant.
exact h

Branch Pruning via llvm.assume

Shows how injecting an 'llvm.assume' intrinsic based on the invariant axiom allows the optimizer to formally prove a defensive panic path is unreachable dead code.

Alive2 Translation Script
; --- Declarations ---
; This intrinsic provides the "axiom" from the invariant
; to the SMT solver and optimizer.
declare void @llvm.assume(i1) #1
; This represents the "panic" path (Figure 8-6)
declare void @abort() #2
; --- Source (Un-optimized) ---
; This function contains the redundant defensive check.
; It matches the "Before" panel of Figure 8-6,
; with a diamond-shaped CFG.
define i32 @src(i32 %len, i32 %capacity) #0 {
entry:
; 1. The 'invariant' Contract:
; The compiler is given this axiom: len <= capacity
%invariant_holds = icmp sle i32 %len, %capacity
call void @llvm.assume(i1 %invariant_holds)
; 2. The Defensive Runtime Check:
; The code redundantly checks the *same* condition.
%defensive_check = icmp sle i32 %len, %capacity
br i1 %defensive_check, label %safe_path, label %panic_path
safe_path:
; This is the "fast path" / work
%new_len = add i32 %len, 1
ret i32 %new_len
panic_path:
; This is the "check fail path"
; The invariant proves this is unreachable.
call void @abort()
unreachable
}
; --- Target (Optimized) ---
; The compiler has used the 'invariant' to prove the
; panic path is dead, pruning the branch.
; This matches the "After" panel of Figure 8-6.
define i32 @tgt(i32 %len, i32 %capacity) #0 {
entry:
; 1. The 'invariant' Contract:
; The axiom is still present.
%invariant_holds = icmp sle i32 %len, %capacity
call void @llvm.assume(i1 %invariant_holds)
; 2. The Defensive Check is GONE.
; The 'safe_path:' label is removed because the
; branch was eliminated. This is all one basic block.
; Only the "fast path" remains.
%new_len = add i32 %len, 1
ret i32 %new_len
}
; --- Attributes ---
attributes #0 = { "nounwind" }
attributes #1 = { "nocapture" "noundef" }
attributes #2 = { "noreturn" "nounwind" }

@pure

Guaranteeing Referential Transparency

Sound Purity Analysis

Proves that the compiler's efficient AST traversal (is_pure_checker) correctly implies the abstract, logical specification of referential transparency (CallsOnlyPureFunctions).

Lean 4 Source
import Std
abbrev FuncName := String
abbrev FuncContext := FuncName → Bool
inductive Expr where
| literal (n : Nat) : Expr
| add (e1 e2 : Expr) : Expr
| call (f : FuncName) (args : List Expr) : Expr
-- This helper function makes list-checking obvious to the termination checker.
def check_all_exprs (ctx : FuncContext) (es : List Expr) : Bool :=
match es with
| [] => true
| h :: t => is_pure_checker ctx h && check_all_exprs ctx t
where
-- We define `is_pure_checker` inside the helper so they can be mutually recursive.
is_pure_checker (ctx : FuncContext) (e : Expr) : Bool :=
match e with
| .literal _ => true
| .add e1 e2 => is_pure_checker ctx e1 && is_pure_checker ctx e2
| .call f args => (ctx f) && (check_all_exprs ctx args)
-- We can expose the main checker function at the top level.
def is_pure_checker := check_all_exprs.is_pure_checker
def CallsOnlyPureFunctions (ctx : FuncContext) (e : Expr) : Prop :=
match e with
| .literal _ => True
| .add e1 e2 => CallsOnlyPureFunctions ctx e1 ∧ CallsOnlyPureFunctions ctx e2
| .call f args => (ctx f = true) ∧ (∀ arg ∈ args, CallsOnlyPureFunctions ctx arg)
-- The Soundness Theorem
theorem purity_checker_sound (ctx : FuncContext) (e : Expr)
: is_pure_checker ctx e = true → CallsOnlyPureFunctions ctx e := by
-- Define the proof recursively, which works better for nested inductives.
let rec prove (e : Expr) (h : is_pure_checker ctx e = true) : CallsOnlyPureFunctions ctx e :=
match e with
| .literal n => by
simp [CallsOnlyPureFunctions]
| .add e1 e2 => by
simp [CallsOnlyPureFunctions]
-- FIX: Be more explicit with the simp call to ensure all definitions unfold.
simp [is_pure_checker, check_all_exprs.is_pure_checker, Bool.and_eq_true] at h
constructor
· exact prove e1 h.left
· exact prove e2 h.right
| .call f args => by
simp [CallsOnlyPureFunctions]
simp [is_pure_checker, check_all_exprs.is_pure_checker] at h
have h_args_pure : ∀ arg ∈ args, is_pure_checker ctx arg = true := by
let rec prove_list (args' : List Expr) (h_list : check_all_exprs ctx args' = true)
: ∀ arg ∈ args', is_pure_checker ctx arg = true := by
match args' with
| [] => simp
| h' :: t' =>
simp [check_all_exprs] at h_list
intro arg h_mem
cases h_mem
· exact h_list.left
· exact prove_list t' h_list.right arg (by assumption)
exact prove_list args h.right
constructor
· exact h.left
· intro arg h_mem
exact prove arg (h_args_pure arg h_mem)
exact prove e

Lowering to readnone Attribute

Validates that appending the 'readnone' attribute to a pure function's LLVM IR correctly restricts its allowed behaviors without altering semantics, safely enabling CSE/DCE.

Alive2 Translation Script
define void @src() #0 {
ret void
}
attributes #0 = { "nounwind" }
define void @tgt() #0 {
ret void
}
attributes #0 = { "nounwind" "readnone" }

@memoise

Automated Dynamic Programming

Observational Equivalence of Memoisation

Guarantees that for pure deterministic functions, intercepting execution with a stateful caching mechanism yields the exact same observable results as unmemoised re-computation.

Lean 4 Source
-- We model the pure, underlying implementations of the functions.
-- Their output depends only on their inputs, with no side effects.
def six_impl (x : Int) : Int := x - 1
def how_many_impl : Int := 3
-- Models the execution trace of the non-memoised program.
-- Each call is executed independently.
def execute_non_memoised : List Int := [
six_impl 8,
six_impl 8,
how_many_impl,
how_many_impl
]
-- Models the semantic outcome of the memoised program.
-- The result of each unique pure function call is computed once and reused.
def execute_memoised : List Int :=
-- The result of six(8) is computed once and bound to `six_result`.
let six_result := six_impl 8
-- The result of how_many() is computed once.
let how_many_result := how_many_impl
-- The final sequence of values is constructed from these reused results.
[
six_result,
six_result,
how_many_result,
how_many_result
]
-- The theorem states that the list of observable outputs from both
-- execution models is identical.
theorem memoise_keyword_semantic_guarantee :
execute_non_memoised = execute_memoised := by
-- The proof is by reflexivity (`rfl`). Lean can compute and see that
-- both definitions unfold to the exact same list of values: [7, 7, 3, 3].
rfl

Safe Function Attributes for Wrappers

Verifies that appending the 'readnone' attribute to a pure, memoised function's LLVM IR correctly restricts its allowed behaviors without altering semantics.

Alive2 Translation Script
define double @how_many() #0 {
entry:
ret double 3.000000e+00
}
define void @print(double %0) {
entry:
%printf_call = call i32 (ptr, ...) @printf(ptr @fmt_str, double %0)
ret void
}
@fmt_str = private unnamed_addr constant [6 x i8] c"%.0f\0A\00", align 1
declare i32 @printf(ptr, ...)
define double @six(i32 %0) #0 {
entry:
%x = alloca i32, align 4
store i32 %0, ptr %x, align 4
%x1 = load i32, ptr %x, align 4
%subtmp = sub i32 %x1, 1
%ret_i2f = sitofp i32 %subtmp to double
ret double %ret_i2f
}
define i32 @src() #0 {
%result = alloca double, align 8
%calltmp = call double @six(i32 8)
store double %calltmp, ptr %result, align 8
%result1 = load double, ptr %result, align 8
call void @print(double %result1)
%calltmp2 = call double @six(i32 8)
call void @print(double %calltmp2)
%calltmp3 = call double @how_many()
call void @print(double %calltmp3)
%calltmp4 = call double @how_many()
call void @print(double %calltmp4)
ret i32 0
}
attributes #0 = { "nounwind" }
define i32 @tgt() #0 {
%result = alloca double, align 8
%calltmp = call double @six(i32 8)
store double %calltmp, ptr %result, align 8
%result1 = load double, ptr %result, align 8
call void @print(double %result1)
%calltmp2 = call double @six(i32 8)
call void @print(double %calltmp2)
%calltmp3 = call double @how_many()
call void @print(double %calltmp3)
%calltmp4 = call double @how_many()
call void @print(double %calltmp4)
ret i32 0
}
attributes #0 = { "nounwind" "readnone" }

@static

Partial Evaluation via Function Specialisation

Compile-Time Specialisation is Equivalent to Runtime Evaluation

Proves that partially evaluating a function with a known static compile-time constant is logically indistinguishable from invoking the generic function with that same value at runtime.

Lean 4 Source
-- We don't need any complex libraries for this proof.
-- 1. Model the generic function as defined in X-lang
def process (config : String) (data : Int) : Int :=
if config = "fast" then
data * 2
else if config = "precise" then
data * 4
else
data
-- 2. Model the specialized functions that are generated
-- This is what the compiler creates for `process("fast", ...)`
def process_fast (data : Int) : Int :=
data * 2
-- This is what the compiler creates for `process("precise", ...)`
def process_precise (data : Int) : Int :=
data * 4
-- 3. The Correctness Theorems
-- Theorem 1: Proves that calling `process` with the static string "fast"
-- is equivalent to calling the specialized `process_fast` function.
theorem static_specialization_fast_is_correct (data : Int) :
process "fast" data = process_fast data := by
-- The proof is simple. We just ask Lean to simplify the definitions.
-- `simp` unfolds `process`, sees that `"fast" = "fast"` is true,
-- and simplifies the `if` statement. Then it unfolds `process_fast`.
-- The goal becomes `data * 2 = data * 2`, which is true.
simp [process, process_fast]
-- Theorem 2: Proves the same for the "precise" specialization.
theorem static_specialization_precise_is_correct (data : Int) :
process "precise" data = process_precise data := by
-- `simp` unfolds `process`, sees that `"precise" = "fast"` is false,
-- so it proceeds to the `else` branch. It then sees that
-- `"precise" = "precise"` is true and simplifies the inner `if`.
-- Finally, it unfolds `process_precise` to complete the proof.
simp [process, process_precise]

Constant Folding & Branch Pruning

Proves that cloning a function and replacing the static parameter with a constant safely allows the optimizer to fold branches and prune dead code blocks without altering valid paths.

Alive2 Translation Script
; --- Declarations ---
declare void @panic() #2
; --- Source (The "Cloned" Function, Pre-Optimization) ---
; This models the state after your 'StaticSpecializationPass'
; has cloned the function and replaced the 'static' parameter
; with the constant '1', but before optimization.
define i32 @src(i32 %value) #0 {
entry:
; The static 'opcode' parameter's use is replaced by '1'
%is_op_1 = icmp eq i32 1, 1
br i1 %is_op_1, label %op_1, label %check_op_2
check_op_2:
; This branch is now dead
; The static 'opcode' parameter's use is also replaced by '1'
%is_op_2 = icmp eq i32 1, 2
br i1 %is_op_2, label %op_2, label %panic_block
op_1:
; This is the only "live" logic path
%res_1 = add i32 %value, 10
ret i32 %res_1
op_2:
; This block is now dead code
%res_2 = mul i32 %value, 2
ret i32 %res_2
panic_block:
; This block is also dead code
call void @panic()
unreachable
}
; --- Target (The "Specialized" Function, Post-Optimization) ---
; This models the final result. The optimizer has constant-folded
; the 'icmp' instructions, seen that the 'op_1' path is the
; only one, and pruned all other blocks.
define i32 @tgt(i32 %value) #0 {
entry:
; The entire interpretive branch structure has been
; "baked in" to this single instruction.
%res_1 = add i32 %value, 10
ret i32 %res_1
}
; --- Attributes ---
attributes #0 = { "nounwind" }
attributes #2 = { "noreturn" "nounwind" }

@parallel

Programmer-Guided Fork-Join Concurrency

Equivalence of Sequential & Parallel Final State

Formally guarantees that the final state of an array after a sequential loop of updates is completely identical to the final state produced by a data-independent parallel mapping.

Lean 4 Source
-- The type for our array's state. It's a map from an index in the range [0, 1023] to an integer value.
def ArrayState := Fin 1024 → Int
-- The independent calculation performed in the loop body.
-- For a given index `i` (as a Nat), the new value is `i * 2`.
def loop_body (i : Nat) : Int :=
(i : Int) * 2
-- A helper function to model updating a single element in the array.
-- We take the index as a Nat and compare with `j.val` when reading.
def update_array (arr : ArrayState) (i : Nat) (val : Int) : ArrayState :=
fun j => if i = j.val then val else arr j
-- Models the execution of a standard, sequential for-loop.
-- It iterates `n` times, updating the array at each step.
-- We work with Nat indices during the recursion to avoid constructing `Fin` from Nat.
def run_sequential_loop (n : Nat) (initial_array : ArrayState) : ArrayState :=
match n with
| 0 => initial_array
| k + 1 =>
let prev_state := run_sequential_loop k initial_array
-- Apply the k-th update to the state resulting from the first k updates
update_array prev_state k (loop_body k)
-- Models the execution of the parallel for-loop.
-- The final state is one where every element `j` has been updated by `loop_body j`.
-- This is a direct mapping because all operations are independent.
def run_parallel_loop (_ : ArrayState) : ArrayState :=
fun j => loop_body j.val
-- The main theorem.
-- It asserts that the final state of the array after a sequential loop of 1024 iterations
-- is identical to the final state of the parallel loop.
theorem parallel_for_semantic_equivalence (initial_array : ArrayState) :
run_sequential_loop 1024 initial_array = run_parallel_loop initial_array :=
-- To prove the two array states (which are functions) are equal, we prove they
-- produce the same output for any given input index `j`. This is functional extensionality.
funext (fun j =>
-- We need to show that for any index `j`, `(run_sequential_loop 1024 arr) j = loop_body j`.
-- We prove a more general statement by induction: after `n` steps, all indices `k < n` are correct.
have h_general : ∀ n, ∀ (k : Fin 1024), k.val < n → (run_sequential_loop n initial_array) k = loop_body k.val := by
intro n
induction n with
| zero => -- Base case: n = 0. The condition `k.val < 0` is never true, so the statement is vacuously true.
intro k hk
contradiction
| succ m ih => -- Inductive step: Assume it's true for `m`, prove for `m+1`.
intro k hk
-- Unfold the definition of the sequential loop for `m+1`
simp [run_sequential_loop]
-- Consider whether the current updated index `m` matches `k.val`
cases (decEq m k.val) with
| isTrue h_eq => -- Case 1: k.val = m, i.e. k is the index of the *current* iteration
-- The update function at index `m` will return `loop_body m` when evaluated at `k`.
rw [update_array, if_pos h_eq]
-- Replace `m` with `k.val` in the loop body to match the goal shape.
rw [h_eq]
| isFalse h_neq => -- Case 2: k is an index from a *previous* iteration (`k.val < m`)
-- The update function at index `m` will preserve the old value at index `k`.
rw [update_array, if_neg h_neq]
-- By the induction hypothesis `ih`, the value at `k` was already correct after `m` steps.
apply ih
-- We must show that `k.val < m`, which follows from `k.val < m+1` and `k.val ≠ m`.
apply Nat.lt_of_le_of_ne (Nat.le_of_lt_succ hk)
intro heq
exact h_neq heq.symm
-- The main goal now follows from our general proof `h_general` by setting n = 1024.
-- Since `j` is a `Fin 1024`, its value is always less than 1024.
h_general 1024 j j.isLt
)

Parallel-Friendly Association

Models the re-grouping of a parallel reduction. Alive2 proves that transforming sequential additions into a tree-like structure is a safe optimization.

Alive2 Translation Script
; This proof models the "re-grouping" of a parallel
; reduction, using integers.
;
; Your 'parallel for' contract implies operations
; can be re-grouped for parallel execution.
; This demonstrates the *structure* of the optimization,
; transforming sequential adds into a tree-like add.
; --- Source (Sequential Association) ---
; This models:
; acc = (acc + a) + b
define i32 @src(i32 %acc, i32 %a, i32 %b) #0 {
entry:
%tmp = add i32 %acc, %a
%result = add i32 %tmp, %b
ret i32 %result
}
; --- Target (Parallel-friendly Association) ---
; This models:
; acc = acc + (a + b)
; This allows 'a' and 'b' to be computed in parallel
; before the final reduction.
define i32 @tgt(i32 %acc, i32 %a, i32 %b) #0 {
entry:
%tmp = add i32 %a, %b
%result = add i32 %acc, %tmp
ret i32 %result
}
; --- Attributes ---
attributes #0 = { "nounwind" }

@become

Guaranteed Tail-Call Optimisation (TCO)

Algebraic Proof of Iterative vs Recursive Tail Calls

A formal induction proof demonstrating that evaluating a tail-recursive function iteratively yields the identical mathematical result, justifying the TCO transformation.

Lean 4 Source
import Mathlib.Tactic
-- 1. The tail-recursive function from the X-lang example
def sum_helper_rec (n acc : Nat) : Nat :=
if n = 0 then
acc
else
sum_helper_rec (n - 1) (n + acc)
termination_by n
-- 2. A helper lemma with a stronger induction hypothesis
theorem sum_helper_correct (n : Nat) : ∀ (acc : Nat),
sum_helper_rec n acc = n * (n + 1) / 2 + acc := by
induction n with
| zero =>
intro acc
simp [sum_helper_rec]
| succ k ih =>
intro acc
unfold sum_helper_rec
have h_ne : Nat.succ k ≠ 0 := Nat.succ_ne_zero k
simp
rw [ih (k + 1 + acc)]
-- The algebraic goal is too complex for one tactic due to Nat division.
-- We prove it step-by-step.
-- First, cancel `acc` from both sides to simplify the goal.
rw [add_assoc, add_assoc]
apply add_right_cancel
-- Goal is now: `k * (k + 1) / 2 + (k + 1) = (k + 1) * (k + 2) / 2`
-- To handle the division correctly, we state the key mathematical fact:
-- `k * (k + 1)` is always even (i.e., divisible by 2).
have ⟨m, hm⟩ := Nat.even_mul_succ_self k
-- k * (k + 1) = m + m, rewrite to 2 * m
have hk : k * (k + 1) = 2 * m := by
rw [hm, Nat.two_mul m]
-- hence (k * (k + 1)) / 2 = m
have hdivL : (k * (k + 1)) / 2 = m := by
rw [hk]
apply Nat.mul_div_cancel_left
decide
-- (k + 1) * (k + 2) = k * (k + 1) + 2 * (k + 1) = 2 * (m + (k + 1))
have hr : (k + 1) * (k + 2) = 2 * (m + (k + 1)) := by
calc
(k + 1) * (k + 2) = k * (k + 1) + 2 * (k + 1) := by ring
_ = (2 * m) + 2 * (k + 1) := by rw [hk]
_ = 2 * (m + (k + 1)) := by ring
-- hence ((k + 1) * (k + 2)) / 2 = m + (k + 1)
have hdivR : ((k + 1) * (k + 2)) / 2 = m + (k + 1) := by
rw [hr]
apply Nat.mul_div_cancel_left
decide
-- replace both divisions with the simplified forms and finish by ring
rw [hdivL, hdivR]
ring
-- 3. The Main Theorem
theorem sum_with_become_is_correct (n : Nat) :
sum_helper_rec n 0 = n * (n + 1) / 2 := by
rw [sum_helper_correct n 0]
simp

TCO to Loop Transformation

Validates that lowering a guaranteed tail call into an explicit loop with phi nodes preserves exact program semantics while executing in O(1) stack space.

Alive2 Translation Script
; --- Source (The "become" version) ---
; The 'become' contract has added the 'tail' attribute
; to the recursive call, guaranteeing O(1) stack space.
define i64 @src(i64 %n, i64 %acc) #0 {
entry:
; Base case: if (n == 0)
%cond = icmp eq i64 %n, 0
br i1 %cond, label %base_case, label %recursive_case
base_case:
; return acc
ret i64 %acc
recursive_case:
; n = n - 1
%n_minus_1 = sub i64 %n, 1
; acc = acc + n
%acc_plus_n = add i64 %acc, %n
; This is: become sum(n - 1, acc + n)
; The 'tail' attribute is the 'become' contract.
%result = tail call i64 @src(i64 %n_minus_1, i64 %acc_plus_n)
ret i64 %result
}
; --- Target (The "loop" version) ---
; This models the final machine code (Listing 7-4).
; The compiler has transformed the recursion into an
; efficient loop with O(1) stack space.
define i64 @tgt(i64 %n_in, i64 %acc_in) #0 {
entry:
; This is the loop pre-header
br label %loop_header
loop_header:
; PHI nodes merge values from the start (%entry)
; and the loopback (%loop_body)
%n = phi i64 [ %n_in, %entry ], [ %n_next, %loop_body ]
%acc = phi i64 [ %acc_in, %entry ], [ %acc_next, %loop_body ]
; Base case: if (n == 0)
%cond = icmp eq i64 %n, 0
br i1 %cond, label %exit, label %loop_body
loop_body:
; n = n - 1
%n_next = sub i64 %n, 1
; acc = acc + n
%acc_next = add i64 %acc, %n
; This is the 'jmp .L_loop_body'
br label %loop_header
exit:
; return acc
ret i64 %acc
}
; --- Attributes ---
attributes #0 = { "nounwind" }

@multi

Extensible and Efficient Multiple Dispatch

Correctness of the Generated Dispatcher

Proves that the compiler-generated multi-dispatcher, which replaces chains of virtual calls, dispatches to exactly the same monomorphic implementations.

Lean 4 Source
-- No special imports are needed for this proof.

-- 1. Model the types from the X-Lang example
structure Circle where
  r : Float

structure Rect where
  w : Float
  h : Float

-- Create a general "Shape" type that can be one of the concrete types.
-- This models the polymorphism required for multiple dispatch.
inductive Shape where
  | circle (c : Circle)
  | rect (r : Rect)

-- 2. Model the specific implementations of the multi-function
def collide_circle_circle (_ : Circle) (_ : Circle) : Bool := true
def collide_circle_rect (_ : Circle) (_ : Rect) : Bool := false
def collide_rect_circle (_ : Rect) (_ : Circle) : Bool := false
def collide_rect_rect (_ : Rect) (_ : Rect) : Bool := true

-- 3. Model the generic dispatcher
-- This function simulates the "multi resolver" that the compiler builds.
-- It inspects the types of its arguments and calls the correct implementation.
def collide (s1 s2 : Shape) : Bool :=
  match s1, s2 with
  | .circle c1, .circle c2 => collide_circle_circle c1 c2
  | .circle c1, .rect r2 => collide_circle_rect c1 r2
  | .rect r1, .circle c2 => collide_rect_circle r1 c2
  | .rect r1, .rect r2 => collide_rect_rect r1 r2

-- 4. The Correctness Theorems
-- We prove one theorem for each case to show the dispatcher works correctly.
theorem collide_dispatches_to_circle_circle (c1 c2 : Circle) :
    collide (.circle c1) (.circle c2) = collide_circle_circle c1 c2 := by
  -- The proof is by definition. `simp` unfolds `collide`, evaluates the
  -- `match` statement, and sees that the first branch is taken.
  simp [collide]

theorem collide_dispatches_to_circle_rect (c : Circle) (r : Rect) :
    collide (.circle c) (.rect r) = collide_circle_rect c r := by
  -- `simp` unfolds `collide` and sees that the second branch is taken.
  simp [collide]

theorem collide_dispatches_to_rect_circle (r : Rect) (c : Circle) :
    collide (.rect r) (.circle c) = collide_rect_circle r c := by
  -- `simp` unfolds `collide` and sees that the third branch is taken.
  simp [collide]

theorem collide_dispatches_to_rect_rect (r1 r2 : Rect) :
    collide (.rect r1) (.rect r2) = collide_rect_rect r1 r2 := by
  -- `simp` unfolds `collide` and sees that the fourth branch is taken.
  simp [collide]
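At the implementation level, the resolver modelled in Lean is a closed dispatch over the pair of runtime types. A hypothetical C++ sketch of the same shape, using `std::variant` in place of the compiler-generated resolver (the types and results mirror the Lean model):

```cpp
#include <variant>

struct Circle { double r; };
struct Rect   { double w, h; };
using Shape = std::variant<Circle, Rect>;

// The four monomorphic implementations, as in the Lean model.
bool collide(const Circle&, const Circle&) { return true;  }
bool collide(const Circle&, const Rect&)   { return false; }
bool collide(const Rect&,   const Circle&) { return false; }
bool collide(const Rect&,   const Rect&)   { return true;  }

// The generic dispatcher: one visit over both arguments selects the
// exact monomorphic overload, mirroring `collide` in the proof above.
bool collide(const Shape& a, const Shape& b) {
    return std::visit(
        [](const auto& x, const auto& y) { return collide(x, y); }, a, b);
}
```

The theorems proved above correspond to the guarantee that each of the four `(Shape, Shape)` combinations lands on its monomorphic overload and nowhere else.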

Switch / Jump Table Lowering

Proves that collapsing a complex tree of if/else dynamic type checks into a single 64-bit key and a jump table (switch) preserves dispatch semantics exactly.

Alive2 Translation Script
; --- Declarations ---
; We add the 'noreturn' attribute to panic.
declare void @panic() #2
; --- Source (Un-optimized) ---
; We add 'noundef' to the parameters. This is the
; semantic contract that our type IDs are never 'poison'.
define void @src(i32 noundef %tid_a, i32 noundef %tid_b) #0 {
entry:
%is_a_circle = icmp eq i32 %tid_a, 1
br i1 %is_a_circle, label %a_is_circle, label %a_is_not_circle
a_is_not_circle:
%is_a_rect = icmp eq i32 %tid_a, 2
br i1 %is_a_rect, label %a_is_rect, label %default_panic
a_is_rect:
%is_b_circle_2 = icmp eq i32 %tid_b, 1
br i1 %is_b_circle_2, label %case_rc, label %a2_b_not_1
case_rc:
ret void
a2_b_not_1:
%is_b_rect_2 = icmp eq i32 %tid_b, 2
br i1 %is_b_rect_2, label %case_rr, label %default_panic
case_rr:
ret void
a_is_circle:
%is_b_circle = icmp eq i32 %tid_b, 1
br i1 %is_b_circle, label %case_cc, label %a1_b_not_1
case_cc:
ret void
a1_b_not_1:
%is_b_rect = icmp eq i32 %tid_b, 2
br i1 %is_b_rect, label %case_cr, label %default_panic
case_cr:
ret void
default_panic:
call void @panic()
; 'unreachable' is the correct way to end
; a block after a 'noreturn' call.
unreachable
}
; --- Target (Optimized) ---
; We add 'noundef' here as well, to match the contract.
define void @tgt(i32 noundef %tid_a, i32 noundef %tid_b) #0 {
entry:
%t0z = zext i32 %tid_a to i64
%t1z = zext i32 %tid_b to i64
%t0sh = shl i64 %t0z, 32
%key = or i64 %t0sh, %t1z
switch i64 %key, label %default [
i64 4294967297, label %case_cc
i64 4294967298, label %case_cr
i64 8589934593, label %case_rc
i64 8589934594, label %case_rr
]
case_rr:
ret void
case_rc:
ret void
case_cr:
ret void
case_cc:
ret void
default:
call void @panic()
unreachable
}
; --- Attributes ---
attributes #0 = { "nounwind" }
attributes #2 = { "noreturn" "nounwind" }
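The magic constants in the switch are just the two 32-bit type IDs packed into one 64-bit key: 4294967297 is (1 &lt;&lt; 32) | 1, and so on. A small C++ sketch of the packing, assuming the type-ID values from the script (1 = Circle, 2 = Rect):

```cpp
#include <cstdint>

// Exactly the zext/shl/or sequence in @tgt: widen both IDs to
// 64 bits, shift the first into the high word, OR in the second.
constexpr uint64_t pack(uint32_t tid_a, uint32_t tid_b) {
    return (static_cast<uint64_t>(tid_a) << 32) | tid_b;
}

// Compile-time check of the four keys the switch dispatches on:
static_assert(pack(1, 1) == 4294967297ULL);  // case_cc
static_assert(pack(1, 2) == 4294967298ULL);  // case_cr
static_assert(pack(2, 1) == 8589934593ULL);  // case_rc
static_assert(pack(2, 2) == 8589934594ULL);  // case_rr
```

Because the IDs are `noundef` and distinct, `pack` is injective over the valid pairs, which is what lets one switch replace the whole comparison tree.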

@no throws

Managing the High Cost of Exceptions

Guaranteeing Nounwind Control Flow

Proves that a program composed exclusively of functions guaranteed not to throw always resolves to a successful result, justifying the absence of invoke instructions and landing pads.

Lean 4 Source
-- No special imports are needed for this proof.

-- 1. Model the possible outcomes
inductive MyError where
  | generic

abbrev Throws (α : Type) := Except MyError α

-- 2. Model the functions from the X-Lang example
def add_one (x : Float) : Float :=
  x + 1.0

def might_fail (x : Float) : Throws Float :=
  .ok (x * 2.0)

-- 3. Model a program that calls the potentially-throwing function
def main_logic : Throws Float := do
  let b : Float ← might_fail 3.0
  return b

-- 4. The Correctness Theorem
theorem throwing_program_succeeds_correctly :
    main_logic = .ok (3.0 * 2.0) := by
  simp [main_logic, might_fail]

Safe 'nounwind' Optimization

Validates that a function guaranteed not to throw can be safely annotated with 'nounwind', allowing callers to generate simple 'call' instructions instead of 'invoke'.

Alive2 Translation Script
; This proof validates the core of the "no throws" contract.
; It proves that adding the 'nounwind' attribute
; is a valid optimization.
;
; @src can return normally OR unwind (throw).
; @tgt can ONLY return normally.
;
; Alive2 proves this is safe because @tgt's behaviors
; are a subset of @src's behaviors.
define void @src() {
ret void
}
define void @tgt() nounwind {
ret void
}
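The direct C++ analogue of this contract is `noexcept`: marking a function `noexcept` removes its unwinding behavior, so a caller that can see the guarantee needs no landing-pad machinery around the call site. A minimal sketch, with the function name borrowed from the Lean model:

```cpp
// Guaranteed not to throw: the analogue of annotating @tgt
// with 'nounwind'. Its behaviors are a subset of a function
// that may either return or unwind, so the annotation is sound.
double add_one(double x) noexcept {
    return x + 1.0;
}

double caller(double x) {
    // Because add_one is noexcept, the compiler can emit a plain
    // 'call' here; no invoke edge or landing pad is required.
    return add_one(x);
}
```

This is the same subset argument Alive2 checks above: shrinking a function's set of behaviors can never break a caller that was prepared for the larger set.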