📅 February 11, 2026 · 5 min read · 📁 Arcium

ArcInfer Build Journal

#arcinfer #build-journal #tutorial #mpc #solana

Build Journal

A chronological log of what I built, in what order, and what I learned at each step.

Day 1: Research and Architecture

Spent the day reading. My goal was to answer one question: what's the smallest useful thing I can build on Arcium that demonstrates encrypted AI inference?

I started with Arcium's official docs - Arcis DSL reference, MPC protocols (Cerberus, Manticore, XOR), official examples, and the computation lifecycle. Then I researched what kind of ML model fits inside MPC.

Key Decisions

Full transformers are too expensive - DistilBERT has ~67M parameters and requires operations that are brutally expensive in MPC. Small FFN classifiers are a perfect fit - a 3-layer feedforward network with ~4K parameters, using only matrix multiplication and square activation. Total MPC depth: ~10 rounds.

Two-stage approach: Run the heavy embedding model (all-MiniLM-L6-v2, 384-dim output) on the client, send only the embedding to MPC.

Q16.16 Fixed-Point: I chose Q16.16 because neural net weights are typically in [-2, 2] and we don't need more than about four decimal digits of precision. It maps cleanly to Arcis's i32 type.
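The Q16.16 scheme boils down to a scale factor and one widening multiply. Here is a minimal sketch (function names are illustrative, not the actual arcinfer-core API):

```rust
// Q16.16: 16 integer bits, 16 fractional bits, stored in an i32.
const FRAC_BITS: u32 = 16;
const ONE: i32 = 1 << FRAC_BITS; // 65536 represents 1.0

fn to_q16(x: f64) -> i32 {
    (x * ONE as f64).round() as i32
}

fn from_q16(x: i32) -> f64 {
    x as f64 / ONE as f64
}

// Multiply in a 64-bit intermediate, then shift the extra scale factor
// back out; without the widening, products near 1.0 overflow i32.
fn mul_q16(a: i32, b: i32) -> i32 {
    ((a as i64 * b as i64) >> FRAC_BITS) as i32
}

fn main() {
    let a = to_q16(1.5);   // 98304
    let b = to_q16(-0.25); // -16384
    assert_eq!(from_q16(mul_q16(a, b)), -0.375);
    println!("1.5 * -0.25 = {}", from_q16(mul_q16(a, b)));
}
```

The same widening-then-shift pattern is what every fixed-point matmul in the network relies on.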

Square Activations: Read three papers (PolyMPCNet, CrypTen, BOLT) that validated this approach. Square activation is the lowest-cost MPC-compatible activation - one multiplication versus 20-40 rounds for ReLU.

Day 2: Project Setup and Fixed-Point TDD

Started with a cargo workspace. Three crates:

  • arcinfer-core: Zero-dependency pure Rust
  • arcinfer-inference: tract-onnx and tokenizers
  • arcinfer-pipeline: Glues everything together

TDD Results

  • Cycles 1-4: Fixed-point module - 20 tests passing
  • Cycles 5-8: Neural network layers - 31 tests passing, 0 failing

The Critical Lesson

During end-to-end testing, I discovered that square activation destroys sign information: (-0.6)² == (0.6)². Both inputs produce the same hidden representation. The classifier literally cannot distinguish them.

Fix: Redesign the network architecture - instead of computing differences (which creates opposite signs that square to the same value), route each input dimension to its own neuron. Now magnitude differences are preserved after squaring.
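The problem and the fix can both be shown in a few lines of plain f64, no MPC required (the two-dimensional inputs below are made up for illustration):

```rust
// Square activation maps x and -x to the same value.
fn square(x: f64) -> f64 {
    x * x
}

fn main() {
    // A signed "difference" feature is destroyed by squaring.
    assert_eq!(square(-0.6), square(0.6));

    // The fix: route each input dimension to its own neuron, so the
    // network compares per-dimension magnitudes instead of a single
    // signed difference.
    let a = [0.9, 0.1];
    let b = [0.1, 0.9];
    let sq_a: Vec<f64> = a.iter().map(|&x| square(x)).collect();
    let sq_b: Vec<f64> = b.iter().map(|&x| square(x)).collect();
    assert_ne!(sq_a, sq_b); // still distinguishable after squaring
}
```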

Lesson: This is a real architectural constraint of MPC-friendly networks. You cannot design them the way you'd design ReLU networks. The training process must account for sign-information loss.

Day 3: Client-Side Inference Pipeline

Dependencies:

  • tract-onnx 0.22: Pure Rust ONNX inference engine
  • tokenizers 0.22: HuggingFace's tokenizer library

Key Discoveries

  1. Tokenizer padding: The tokenizer.json ships with padding enabled by default. Had to explicitly disable it.

  2. ONNX model inputs: Documentation said 2 inputs (input_ids, attention_mask), but the actual model requires 3: input_ids, attention_mask, AND token_type_ids.

  3. Mean pooling: The transformer outputs per-token embeddings - need to average the real tokens to get a single sentence embedding.
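Mean pooling is simple but easy to get subtly wrong: padding tokens must be excluded from both the sum and the divisor. A sketch with illustrative shapes (the real pipeline operates on tract tensors, not `Vec`s):

```rust
// Average only the real (non-padding) token embeddings, as selected
// by the attention mask.
fn mean_pool(token_embeddings: &[Vec<f32>], attention_mask: &[u32]) -> Vec<f32> {
    let dim = token_embeddings[0].len();
    let mut sum = vec![0.0f32; dim];
    let mut count = 0.0f32;
    for (emb, &mask) in token_embeddings.iter().zip(attention_mask) {
        if mask == 1 {
            for (s, &v) in sum.iter_mut().zip(emb) {
                *s += v;
            }
            count += 1.0;
        }
    }
    sum.iter().map(|s| s / count).collect()
}

fn main() {
    let tokens = vec![vec![1.0, 3.0], vec![3.0, 5.0], vec![99.0, 99.0]];
    let mask = [1, 1, 0]; // third token is padding
    assert_eq!(mean_pool(&tokens, &mask), vec![2.0, 4.0]);
}
```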

Result: 80 tests passing across the workspace.

Day 4: Training Pipeline and Full Integration

Trained a real classifier on SST-2 (Stanford Sentiment Treebank), exported weights as JSON, loaded into Rust pipeline.

Architecture: 64→32→16→2 feedforward network with square activations, trained from scratch on 67,349 SST-2 sentences.

Key Results:

  • 80.2% validation accuracy - solid for square activations
  • 2,642 parameters - all within Q16.16 range
  • 96 tests passing across the workspace

Day 5: Arcium Toolchain and Build Issues

Arcis DSL Constraints

Error 1: use super::*; not supported - the encrypted module is hermetic, can only import from arcis::*.

Error 2: include!() macro not supported - had to inline all 2,642 weight constants directly.
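With `include!()` off the table, one workable approach is a throwaway generator that prints the weights as Rust source to paste into the encrypted module. A sketch (the `emit_const` helper and `W1` name are hypothetical, not part of the actual build):

```rust
// One-off codegen: turn a Q16.16 weight slice into a Rust const
// declaration suitable for inlining into the Arcis module.
fn emit_const(name: &str, weights: &[i32]) -> String {
    let body: Vec<String> = weights.iter().map(|w| w.to_string()).collect();
    format!("const {}: [i32; {}] = [{}];", name, weights.len(), body.join(", "))
}

fn main() {
    let w = [65536, -32768, 16384];
    println!("{}", emit_const("W1", &w));
    // → const W1: [i32; 3] = [65536, -32768, 16384];
}
```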

Anchor Program Build

Took 7 iterations to get right:

  • Stack overflow: Solana BPF enforces a 4,096-byte stack frame limit. Changed the buffer from [[u8; 32]; 64] (2 KB on the stack) to Vec<[u8; 32]> (heap-allocated).

  • Macro layout: the callback functions must be defined inside the program module, while the callback account structs must be declared outside it - both at once.
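The size arithmetic behind the stack fix can be checked directly with `std::mem::size_of`: the fixed array consumes half the BPF frame by itself, while a `Vec` keeps the payload on the heap and leaves only its (pointer, length, capacity) triple on the stack.

```rust
use std::mem::size_of;

fn main() {
    // 64 entries × 32 bytes = 2,048 bytes, entirely in the stack frame -
    // half of the 4,096-byte limit before any other locals are counted.
    assert_eq!(size_of::<[[u8; 32]; 64]>(), 2048);

    // A Vec stores its data on the heap; the stack only holds the
    // 3 × 8 = 24-byte (ptr, len, capacity) header on a 64-bit target.
    assert_eq!(size_of::<Vec<[u8; 32]>>(), 24);
}
```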

Day 6: Integration Tests Pass

All 4 integration tests passing:

  • initializes classify computation definition
  • initializes classify_reveal computation definition
  • classifies a positive sentence via MPC
  • classifies and reveals sentiment via MPC

Day 7: Dimension Alignment — 64→16

Fixed dimension mismatch between Rust crates (still had 64-dim) and trained model (16-dim). Updated all signatures and tests.

Result: All 96 tests pass, architecture is now 384→16 (PCA) → 16→16→8→2 (classifier).
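The parameter counts are easy to sanity-check: a dense layer has in×out weights plus out biases, which gives 426 parameters for the 16→16→8→2 classifier and 2,642 for the original 64→32→16→2 model. A quick sketch (the `param_count` helper is illustrative):

```rust
// Dense-network parameter count: sum over consecutive layer pairs of
// (in * out) weights plus (out) biases.
fn param_count(dims: &[usize]) -> usize {
    dims.windows(2).map(|w| w[0] * w[1] + w[1]).sum()
}

fn main() {
    assert_eq!(param_count(&[16, 16, 8, 2]), 426);   // current classifier
    assert_eq!(param_count(&[64, 32, 16, 2]), 2642); // original Day 4 model
}
```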

Day 8: Frontend

Built Next.js frontend with:

  • Browser ONNX embedding via @huggingface/transformers
  • x25519 encryption + RescueCipher
  • 6-stage progress tracker
  • Wallet adapter integration

Day 9: Devnet Deployment

The Rent Wall

Each 3.3 MB circuit needs ~23 SOL in rent-exempt deposit on Solana. Two circuits = ~46 SOL. I only had 10 SOL.

The Fix: Offchain Circuit Storage

Instead of uploading 3.3 MB on-chain, host on IPFS and store just URL + hash. Cost: ~0.01 SOL.
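The ~23 SOL figure follows directly from Solana's default rent parameters, assuming the standard rate of 3,480 lamports per byte-year, a two-year exemption threshold, and the 128-byte per-account storage overhead:

```rust
fn main() {
    // Solana rent defaults: 3,480 lamports per byte-year; rent exemption
    // requires two years' worth of rent deposited up front.
    const LAMPORTS_PER_BYTE_YEAR: u64 = 3_480;
    const EXEMPTION_YEARS: u64 = 2;
    const LAMPORTS_PER_SOL: u64 = 1_000_000_000;
    const ACCOUNT_OVERHEAD_BYTES: u64 = 128; // fixed per-account metadata

    let circuit_bytes: u64 = 3_300_000; // ~3.3 MB circuit
    let lamports =
        (circuit_bytes + ACCOUNT_OVERHEAD_BYTES) * LAMPORTS_PER_BYTE_YEAR * EXEMPTION_YEARS;
    let sol = lamports as f64 / LAMPORTS_PER_SOL as f64;
    println!("rent-exempt deposit: {:.1} SOL", sol); // prints "rent-exempt deposit: 23.0 SOL"
    assert!((sol - 23.0).abs() < 0.1);
}
```

At ~6,960 lamports per byte, every on-chain megabyte costs roughly 7 SOL in deposit, which is why pointing at IPFS with a hash is the economical default.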

Final Cost

| Step | SOL Cost |
| --- | ---: |
| First deploy (wasted) | 3.73 |
| Partial circuit uploads (wasted) | 5.13 |
| Program reclaim | +3.55 |
| Second deploy (offchain) | 3.73 |
| Integration tests | 0.02 |
| **Total spent** | **10.00** |

Key Lessons

  1. Calculate on-chain storage costs BEFORE deploying
  2. Offchain circuit storage is the default for production
  3. Devnet SOL is scarce - the faucet rate-limits airdrops to 2 requests per 8 hours

Final State

  • cargo test: 96/96 pass
  • arcium build: compiles successfully
  • arcium test: 4/4 integration tests pass
  • Devnet deployment: working with ~10 SOL budget
  • Architecture: 384→16 (PCA) → 16→16→8→2 (classifier), 426 parameters