cortex-conv :: equilibrium propagation in the browser

$ cortex-conv :: a 34K-param brain-inspired network that trains itself in your browser · no backprop · 96.8% MNIST


# cortex-conv is a 34,106-weight convolutional neural network trained with Equilibrium Propagation, an energy-based learning rule that replaces backpropagation with two forward passes plus a local activity-difference rule (no backward graph, no transposed weights). It runs entirely in your browser on WebGPU, with no Python and no GPU server. The shipped weights reach 96.8% test accuracy on MNIST (3.2% test error), within 0.4 pp of a published network with ~43× more parameters (Kendall 2026: 1,457,674 params, 2.8% test error). Panel 03 ships the network pre-trained, so the page boots at 96.8% on the first visit; toggle to Fashion-MNIST to watch the same network train from scratch. Panels 00–02 show the underlying theory that lets EqProp work on biologically realistic neurons.

[00]CONTEXT · THE FHN SUBSTRATE THE THEORY WAS DEVELOPED ON turing STEADY STATE

the substrate the underlying paper is about · cortex-conv inherits its training-rule guarantee from the math on this neuron

The Kendall 2026 paper proves that Equilibrium Propagation works on grids of FitzHugh–Nagumo (FHN) neurons. Each pixel is one model neuron with two coupled state variables (activator u and inhibitor v), and together the grid forms Turing patterns. Cortex-conv (Panel 03) uses a simpler neuron (a leaky integrator with sigmoid activation, not the full activator–inhibitor pair) but inherits the paper's theoretical guarantee because the proof covers the broader class. This panel shows the substrate the proof was derived for. Click play to start the simulation; drag on the field to poke it.

FITZHUGH–NAGUMO REACTION–DIFFUSION · nonlinear current f = ∂F/∂u = u − u³ − v

∂u/∂t = D_u∇²u + ( u − u³ − v ) // activator follows +∂F/∂u

∂v/∂t = D_v∇²v + ε( u − αv − β ) // inhibitor follows −∂F/∂v
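The two update equations above can be stepped numerically. The sketch below is a minimal plain-JavaScript version of one explicit-Euler step on a periodic grid; it is not the page's shader. ε, α and the ratio D_v/D_u = 12.5 are this panel's slider defaults, while the absolute diffusion constants, β, the time step, the grid size, and the names `laplacian` and `fhnStep` are assumptions made for illustration. Whether patterns actually emerge depends on those assumed values.

```javascript
// 5-point Laplacian with periodic wrap-around, grid spacing 1.
function laplacian(f, n, x, y) {
  const xm = (x + n - 1) % n, xp = (x + 1) % n;
  const ym = (y + n - 1) % n, yp = (y + 1) % n;
  return f[y * n + xm] + f[y * n + xp] + f[ym * n + x] + f[yp * n + x]
       - 4 * f[y * n + x];
}

// One explicit-Euler step of
//   du/dt = Du * lap(u) + (u - u^3 - v)
//   dv/dt = Dv * lap(v) + eps * (u - alpha * v - beta)
function fhnStep(u, v, n, p) {
  const uNext = new Float64Array(n * n);
  const vNext = new Float64Array(n * n);
  for (let y = 0; y < n; y++) {
    for (let x = 0; x < n; x++) {
      const i = y * n + x;
      const du = p.Du * laplacian(u, n, x, y) + (u[i] - u[i] ** 3 - v[i]);
      const dv = p.Dv * laplacian(v, n, x, y)
               + p.eps * (u[i] - p.alpha * v[i] - p.beta);
      uNext[i] = u[i] + p.dt * du;
      vNext[i] = v[i] + p.dt * dv;
    }
  }
  return { u: uNext, v: vNext };
}

// eps and alpha are the slider defaults; Dv / Du = 12.5 matches the panel.
// Du, beta and dt are assumed values.
const params = { Du: 0.04, Dv: 0.5, eps: 0.85, alpha: 1.08, beta: 0, dt: 0.05 };

// Usage: poke a 16x16 field deterministically and run 200 steps.
const n = 16;
let state = { u: new Float64Array(n * n), v: new Float64Array(n * n) };
for (let i = 0; i < n * n; i++) state.u[i] = 0.1 * Math.sin(i);
for (let t = 0; t < 200; t++) state = fhnStep(state.u, state.v, n, params);
console.log("u range:", Math.min(...state.u), Math.max(...state.u));
```

Writing into fresh arrays instead of updating in place keeps every cell's update based on the same time level, which is what an explicit scheme requires.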

[panel 00 · fhn.field, activator u · sliders: ε = 0.85, α = 1.08, diffusion ratio D_v/D_u = 12.5, sim speed = 8 · legend: activator high (u > 0), activator low (u < 0), resting (u ≈ 0) · readouts: field std, neuron count · integrator: explicit Euler · skew-gradient, mini-maximizer]

[01]CONTEXT · AN INFERENCE TRICK CORTEX-CONV CHOOSES NOT TO USE ~50× FEWER PASSES (BUT ~30-LAYER WALL)

the paper's one-pass inference shortcut · cortex-conv deliberately doesn't use this; it iterates instead

The paper offers an inference shortcut: treat each layer as a time step and march through the network once instead of iterating to equilibrium (~50 iterations). The solid line is the true settled answer; the dots are the one-pass shortcut. With clean input the dots track the line exactly, but under input noise they peel away, with a hard wall around layer 30. Cortex-conv (Panel 03) sidesteps this by using the iterative settling path instead: it stacks only 3 layers and runs 32 + 8 iterations per sample, well clear of the wall. The wall is real and important for very deep networks; it just doesn't bite a 3-layer cortex-conv. Slide input precision here to watch the wall move (each digit of precision buys ~2.65 more layers).
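The wall law quoted on this panel (wall ≈ 11.5 + 2.65·decades) is easy to evaluate by hand. The helper below is a small sketch; reading "decades" as −log₁₀ of the input precision ε₀ is our interpretation of the slider, and the name `predictedWall` is made up for this example.

```javascript
// Predicted wall depth for a given input precision eps0,
// using the panel's quoted law: 11.5 + 2.65 * decades.
// Assumption: decades = -log10(eps0), i.e. 1e-4 counts as 4 decades.
function predictedWall(eps0) {
  const decades = -Math.log10(eps0);
  return 11.5 + 2.65 * decades;
}

console.log(predictedWall(1e-4).toFixed(1)); // 22.1 (the slider default)
console.log(Math.round(predictedWall(1e-7))); // 30, roughly single-precision input
```

Under this reading the "wall around layer 30" corresponds to about seven digits of input precision, and a 3-layer network sits far below it at any setting of the slider.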

STATIONARY SPATIAL FHN (Newton-solved, solid line)

δ²(Lu)ⁱ + ( uⁱ − (uⁱ)³ − vⁱ ) = 0
(Lv)ⁱ + ε( uⁱ − αvⁱ − β ) = 0

LAYER-WISE HAMILTONIAN RECURRENCE (page 6: δ²pⁱ⁺¹ − δ²pⁱ − fⁱ⁺¹ = 0)

uⁱ⁺¹ = uⁱ + pⁱ
pⁱ⁺¹ = pⁱ + fⁱ⁺¹/δ² // momentum pⁱ = uⁱ⁺¹ − uⁱ
vⁱ⁺¹ = vⁱ + qⁱ
qⁱ⁺¹ = qⁱ + ε( uⁱ⁺¹ − αvⁱ⁺¹ − β )
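To see why a one-pass march is sensitive to input noise, the recurrence above can be run on a single scalar chain. This is a sketch under stated assumptions: δ = 1 and β = 0 are made-up values, ε and α are Panel 00's slider defaults, and `march` is a hypothetical name. It compares a clean start against one perturbed by 1e-4 and prints the gap per layer.

```javascript
// Layer-wise recurrence for one scalar chain:
//   u' = u + p,            v' = v + q,
//   p' = p + f(u', v')/d², q' = q + eps * (u' - alpha * v' - beta)
// with f(u, v) = u - u^3 - v. Returns the activator value at every layer.
function march(start, layers, par) {
  let { u, p, v, q } = start;
  const us = [u];
  for (let i = 0; i < layers; i++) {
    u = u + p;
    v = v + q;
    const f = u - u ** 3 - v; // nonlinear current at layer i + 1
    p = p + f / (par.delta * par.delta);
    q = q + par.eps * (u - par.alpha * v - par.beta);
    us.push(u);
  }
  return us;
}

// delta and beta are assumed; eps and alpha are the Panel 00 defaults.
const par = { delta: 1, eps: 0.85, alpha: 1.08, beta: 0 };
const clean = march({ u: 0, p: 0, v: 0, q: 0 }, 10, par);
const noisy = march({ u: 1e-4, p: 0, v: 0, q: 0 }, 10, par);
const gap = noisy.map((x, i) => Math.abs(x - clean[i]));
console.log(gap.map((g) => g.toExponential(1)).join("  "));
```

For these assumed values the gap between the two marches widens from layer to layer instead of shrinking, which is the mechanism behind the dots peeling away from the settled solution; iterating to equilibrium does not accumulate error this way.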

[panel 01 · depth.march · slider: input precision ε₀ = 1e-4 · legend: time-equilibrium (~50 iterations), depth recursion (1 forward pass), divergence onset, predicted wall (law) · readouts: tracks to layer (of 64), predicted wall ≈ 11.5 + 2.65·decades, Newton-polished residual · BVP → IVP]

[02]CONTEXT · THE PROOF THAT LETS CORTEX-CONV TRAIN AT ALL · ASYMMETRY OF M⁻¹

the math that lets cortex-conv learn without a backward pass · the paper's central proof, which we inherit directly

Cortex-conv would not be possible without this proof. Equilibrium Propagation requires the network's effective response matrix to be symmetric (equal to its own transpose); otherwise a forward-only perturbation cannot carry gradient information. The paper proves exactly this for FHN-class neurons: the left matrix below is the raw linearised system J (visibly asymmetric); the right one is the effective activator response M⁻¹, obtained by eliminating the inhibitor via a Schur complement (visibly mirror-symmetric across the diagonal). Cortex-conv inherits this guarantee unchanged, and it is the only reason a forward-only training rule can work on it. Both matrices are computed live from Panel 00's steady state.

LINEARISED SYSTEM (skew) → SCHUR COMPLEMENT (self-adjoint)

J = [ [ A , I ] ; [ −I , B ] ]
A = δ²L + diag(1 − 3u²)
B = L − εαI
δu = M⁻¹ δI_u
M⁻¹ = ( A + B⁻¹ )⁻¹
(M⁻¹)ᵀ = M⁻¹
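The symmetry claim can be checked numerically on a toy system. The sketch below builds J, A and B exactly as defined above for a 4-node chain, then compares the relative asymmetry of J with that of M⁻¹ = (A + B⁻¹)⁻¹. The 1D Dirichlet Laplacian, the steady-state values in `uStar`, δ = 1, and all helper names are assumptions for illustration; the page computes its matrices from Panel 00's live steady state instead.

```javascript
const n = 4;
const eye = (m) => Array.from({ length: m }, (_, i) =>
  Array.from({ length: m }, (_, j) => (i === j ? 1 : 0)));
const add = (X, Y, s = 1) => X.map((row, i) => row.map((x, j) => x + s * Y[i][j]));

// Gauss-Jordan elimination with partial pivoting on [X | I].
function inverse(X) {
  const m = X.length;
  const a = X.map((row, i) => [...row, ...eye(m)[i]]);
  for (let c = 0; c < m; c++) {
    let piv = c;
    for (let r = c + 1; r < m; r++) {
      if (Math.abs(a[r][c]) > Math.abs(a[piv][c])) piv = r;
    }
    const tmp = a[c]; a[c] = a[piv]; a[piv] = tmp;
    const d = a[c][c];
    for (let j = 0; j < 2 * m; j++) a[c][j] /= d;
    for (let r = 0; r < m; r++) {
      if (r === c) continue;
      const f = a[r][c];
      for (let j = 0; j < 2 * m; j++) a[r][j] -= f * a[c][j];
    }
  }
  return a.map((row) => row.slice(m));
}

// Relative asymmetry: Frobenius norm of (X - X^T) divided by that of X.
function asymmetry(X) {
  let num = 0, den = 0;
  for (let i = 0; i < X.length; i++) {
    for (let j = 0; j < X.length; j++) {
      num += (X[i][j] - X[j][i]) ** 2;
      den += X[i][j] ** 2;
    }
  }
  return Math.sqrt(num / den);
}

// Assumed toy setup: 1D Laplacian on 4 nodes and a made-up steady state.
const L = Array.from({ length: n }, (_, i) => Array.from({ length: n }, (_, j) =>
  (i === j ? -2 : Math.abs(i - j) === 1 ? 1 : 0)));
const uStar = [1.0, -1.0, 0.9, -0.8];
const delta = 1, eps = 0.85, alpha = 1.08;
const I = eye(n);

const A = L.map((row, i) => row.map((x, j) =>
  delta * delta * x + (i === j ? 1 - 3 * uStar[i] ** 2 : 0)));
const B = add(L, I, -eps * alpha);
const J = [
  ...A.map((row, i) => [...row, ...I[i]]),
  ...B.map((row, i) => [...I[i].map((x) => -x), ...row]),
];
const Minv = inverse(add(A, inverse(B)));

console.log("asym(J)    =", asymmetry(J));
console.log("asym(M^-1) =", asymmetry(Minv));
```

Because A and B are each symmetric, A + B⁻¹ and its inverse are symmetric too; the only asymmetry in J comes from the +I / −I coupling blocks, and eliminating the inhibitor removes it.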

[panel 02 · left: full linearised J = [[A, I], [−I, B]] (skew) → right: activator response M⁻¹ (self-adjoint) · legend: positive entry, negative entry, near zero · readouts: asym(J) (relative), asym(M⁻¹) ≈ machine ε · forward = backward]

[03]CORTEX-CONV · THE LEAD RESULT: LIVE, TRAINED, IN YOUR BROWSER · TEST ACCURACY

the network reads digits at 96.8% on boot · the cortex-conv network ships pre-trained; click train to continue learning live

Panel 03 trains the cortex-conv network: two small convolutional layers on a 28×28 input feeding a 10-class readout, 34,106 weights total, trained with pure Equilibrium Propagation. The underlying neuron is a leaky integrator with sigmoid activation (du/dt = −u + σ(Wρ + γ·fb)), not the paper's full FitzHugh–Nagumo dynamics, which live in Panel 00. We use the simpler neuron because EqProp's theoretical guarantee covers a broad class that includes both, and the simpler model reached the highest accuracy in our autoresearch sweeps. We call it the cortex neuron because the four learning ingredients layered on top of it all mimic behaviours seen in real cortical neurons. The page ships with pre-trained MNIST weights, so test accuracy starts at 96.8% instead of chance (10%). Click train to keep refining from there.

Important framing: the paper (Kendall, arXiv:2605.21568) proves EqProp works on FHN, derives a Hamiltonian inference recurrence, and trains a 5-hidden-layer FHN network (784–512×5–10, ~1.46M params, 55 forward + 14 nudge iterations) on MNIST, reaching 2.8% ± 0.2 test error (paper Table I, §III). The cortex-conv result in this panel is a smaller, faster, browser-side variant (34K parameters, only 2 conv layers, 32 + 8 iterations, and a leaky-integrator neuron in place of full FHN dynamics) that reaches ~3.2% test error (96.8% accuracy). On top of the paper's EqProp formulation we add four orthogonal ingredients from other recent EqProp/optimisation papers: (i) an adaptation current that damps a neuron's drive as it fires (FRE-RNN, Liu & Chen 2025, arXiv:2508.11659); (ii) adjusted adaptation, in which clamped activities relax back toward their free-phase values so the local update tracks the true gradient (Kubo, Chalmers & Luczak 2022, arXiv:2204.14008); (iii) a global reward signal that scales every weight change by how wrong the answer was; (iv) AdaGO (Zhang, Liu & Schaeffer 2025, arXiv:2509.02981), an AdaGrad-style stepsize on top of a Muon-orthogonalised update direction. None of this is backprop: every update uses only signals each neuron can already see.

Toggle data to swap MNIST for Fashion-MNIST (T-shirts, sneakers, boots). There are no pre-trained weights for Fashion-MNIST, so the network learns from scratch and you can watch the curve climb. Toggle model to fall back to a simple dense network for comparison; toggle neuron to swap the cortex neuron for the raw paper-baseline one, which trains far more slowly. On WebGPU at 28×28 the trainer runs at roughly 300 samples per second, so one full 60K-sample pass takes a little over three minutes (60,000 / 300 = 200 s). Continued training from the shipped 96.8% snapshot adds another 0.1–0.3 pp over a few passes before plateauing, which is the honest single-model ceiling for this architecture under "no crutches" (no ensembling, no augmentation, no EMA, no backprop).

ADAPTIVE NEURON + REWARD FEEDBACK + ADAGO OPTIMIZER (arXiv:2509.02981)

du/dt = −u + σ( ΣWρ + γ·fb )
r = error-modulator
ΔW_ij ∝ r·(ρ_i⁺ρ_j⁺ − ρ_i⁻ρ_j⁻)
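The weight-update rule ΔW_ij ∝ r·(ρ_i⁺ρ_j⁺ − ρ_i⁻ρ_j⁻) needs nothing but the activities at a synapse's two endpoints in the two phases. A minimal sketch follows; treating "+" as the nudged phase and "−" as the free phase is our reading of the superscripts, and the function name `eqpropDelta`, the explicit learning rate `lr`, and the example numbers are assumptions.

```javascript
// Local EqProp update for one dense layer.
// post*: activities of the receiving neurons (index i)
// pre*:  activities of the sending neurons (index j)
// r:     scalar reward / error modulator, lr: learning rate (assumed)
// Returns dW with dW[i][j] = lr * r * (postNudged[i]*preNudged[j] - postFree[i]*preFree[j]).
function eqpropDelta(preFree, postFree, preNudged, postNudged, r, lr) {
  return postFree.map((_, i) =>
    preFree.map((_, j) =>
      lr * r * (postNudged[i] * preNudged[j] - postFree[i] * preFree[j])));
}

// Usage: two inputs, one output neuron whose activity rose from 0.2 to 0.4
// after nudging, with the inputs held fixed.
const pre = [1.0, 0.5];
const dW = eqpropDelta(pre, [0.2], pre, [0.4], 1.0, 0.1);
console.log(dW); // one row with two entries, close to 0.02 and 0.01
```

If the two phases settle to identical activities the update is exactly zero, so a sample the network already gets right (no change under nudging) leaves the weights alone.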

AdaGO: v² += min(∥G∥²,γ²); Θ += max(ε, η·min(∥G∥,γ)/v)·Orth(M) // norm-scaled, no decay-to-zero
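The scalar part of the AdaGO line can be traced step by step. The sketch below implements only the stepsize, as written on the page: accumulate the clipped squared gradient norm into v², then take max(ε, η·min(‖G‖, γ)/v). The orthogonalised direction Orth(M) is left out. Note that γ here is the optimiser's clipping threshold, not the feedback gain γ in the neuron equation. The function name `adagoStepsize`, the zero initial accumulator, and the guard for v = 0 are assumptions.

```javascript
// Stepsize of the AdaGO rule as written on the page (direction not included).
// state.vSq accumulates min(|G|^2, gamma^2); eps is the floor that prevents
// the stepsize from decaying to zero.
function adagoStepsize(state, gradNorm, { eta, gamma, eps }) {
  const clipped = Math.min(gradNorm, gamma);
  state.vSq += clipped * clipped; // equals min(|G|^2, gamma^2) for norms >= 0
  if (state.vSq === 0) return eps; // assumed guard: avoid 0 / 0 on a zero gradient
  return Math.max(eps, (eta * clipped) / Math.sqrt(state.vSq));
}

// Usage with assumed hyperparameters.
const opts = { eta: 0.1, gamma: 1.0, eps: 1e-6 };
const state = { vSq: 0 };
const s1 = adagoStepsize(state, 2.0, opts); // norm clipped to gamma
const s2 = adagoStepsize(state, 0.5, opts); // smaller norm, larger accumulator
const s3 = adagoStepsize(state, 0.0, opts); // zero gradient hits the floor
console.log(s1, s2, s3);
```

The accumulator only grows, so the stepsize shrinks as training proceeds, but the ε floor keeps it strictly positive, which matches the "no decay-to-zero" note on the page.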

[panel 03 · learning.curve · live sample: MNIST, predicted vs. true label · sliders: hidden units = 120, learn rate = 0.10 · legend: test accuracy, training loss, chance (10%) · readouts: test accuracy (of 1000), train loss (MSE), samples seen · local, backprop-free]

// CORTEX-CONV: what it is, why it works, and how it compares

1. cortex-conv is a 34K-weight brain-inspired classifier (the lead result)

Two small convolutional layers feeding a 10-class readout, 34,106 weights in total, with banked V1-oriented kernels on the first layer (a hypercolumn prior with no learned hyperparameters). The neurons are leaky integrators with sigmoid activation; the training rule is pure Equilibrium Propagation. The shipped snapshot reaches 96.8% test accuracy on MNIST as soon as the page boots. Click train to keep refining; toggle data to watch the same network learn Fashion-MNIST from scratch.

2. why no-backprop is hard (the constraint cortex-conv obeys)

Real neurons cannot run backpropagation. There is no second pass through cortex that carries gradients backward along transposed weights; biology has only forward signals. So if you want a brain-plausible learning rule, every weight update has to be computable from signals each synapse can already see at its own two endpoints. That constraint rules out the vast majority of modern deep learning, which is trained with backprop, and forces an entirely different family of algorithms. Cortex-conv obeys it: nothing in its training loop reads activations or weights other than the synapse's own two endpoints.

3. how cortex-conv learns: equilibrium propagation (the training rule)

Two forward passes per sample. First, let the network settle on its own (the "free" phase). Then nudge the output toward the right answer with a small force β (the nudging strength, not the β offset in Panel 00's equations) and let it re-settle (the "clamped" phase). Every synapse computes its weight update as the difference in activity between the two phases, purely from its own two endpoints. The math (Panel 02) proves that this difference equals the true gradient. No backward graph; no transposed weights; no Python autodiff. Cortex-conv runs 32 free-phase iterations + 8 clamped iterations per sample on WebGPU.
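The two phases can be illustrated on a single readout neuron. This is a deliberately reduced sketch: the drive is held fixed (no recurrent feedback), the nudge enters as an extra term β·(target − u), which is the standard quadratic-cost form and an assumption about this page's implementation, and the step size `dt`, the value of β, and the name `settle` are made up. Only the 32 + 8 iteration counts come from the text above.

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Euler-integrate du/dt = -u + sigmoid(drive) + beta * (target - u).
// With beta = 0 this is the free phase of the leaky integrator.
function settle(u, drive, steps, dt, beta = 0, target = 0) {
  for (let i = 0; i < steps; i++) {
    u += dt * (-u + sigmoid(drive) + beta * (target - u));
  }
  return u;
}

const drive = 0.3;  // assumed fixed input to the neuron
const dt = 0.5;     // assumed Euler step
const beta = 0.5;   // assumed nudging strength
const target = 1.0; // desired output for this sample

const uFree = settle(0, drive, 32, dt);                    // free phase, 32 iterations
const uNudged = settle(uFree, drive, 8, dt, beta, target); // nudged phase, 8 iterations
console.log({ uFree, uNudged, difference: uNudged - uFree });
```

The free phase relaxes to σ(drive); the nudged phase then moves the activity part of the way toward the target, to (σ(drive) + β·target)/(1 + β) at equilibrium. The difference between the two settled values is the signal each synapse multiplies into its update.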

4. the four cortex ingredients (what makes the neuron actually train)

Vanilla EqProp on a generic neuron plateaus quickly. Four orthogonal improvements, each from a separate recent paper, are what get cortex-conv to 96.8%: (i) an adaptation current that damps a neuron as it fires (Liu & Chen 2025, arXiv:2508.11659); (ii) adjusted adaptation: clamped activities relax back toward free-phase values so the gradient bias shrinks (Kubo, Chalmers & Luczak 2022, arXiv:2204.14008); (iii) a global reward signal scaling each update by how wrong the answer was; (iv) AdaGO, a norm-scaled orthogonalised optimiser (Zhang, Liu & Schaeffer 2025, arXiv:2509.02981). Removing any one of them measurably lowers accuracy in our ablation sweeps. The neuron itself is a leaky integrator with sigmoid activation; the "cortex" name comes from this stack of cortically inspired learning behaviours.

5. vs. the paper: ~43× smaller, browser-side, comparable accuracy (the benchmark)

The Kendall 2026 paper (Table I) trains a 5-hidden-layer dense FHN network 784–512×5–10 with 1,457,674 parameters, full FHN dynamics, and 55 forward + 14 nudge iterations per sample, reaching 2.8% ± 0.2 test error on MNIST. Cortex-conv reaches 3.2% test error (96.8% accuracy) with 34,106 parameters, 2 conv layers instead of 5 dense, 32 + 8 iterations instead of 55 + 14, and a leaky-integrator neuron instead of full FHN. Summary: within 0.4 pp of the paper's accuracy with ~43× fewer weights and ~42% fewer iterations per sample (40 vs. 69); it runs in your browser on WebGPU with no Python and ships pre-trained at 96.8% on boot.
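The size comparison can be re-derived from the layer widths alone. The short check below counts weights plus one bias per unit for the 784–512×5–10 network; counting biases this way is our assumption, and it reproduces the quoted 1,457,674. The helper name `denseParams` is made up.

```javascript
// Parameter count of a fully connected network: for each pair of adjacent
// layers, (inputs x outputs) weights plus one bias per output unit.
function denseParams(widths) {
  let total = 0;
  for (let i = 0; i + 1 < widths.length; i++) {
    total += widths[i] * widths[i + 1] + widths[i + 1];
  }
  return total;
}

const paper = denseParams([784, 512, 512, 512, 512, 512, 10]);
const cortexConv = 34106; // weight count quoted on this page

console.log(paper);                           // 1457674
console.log((paper / cortexConv).toFixed(1)); // 42.7
```

The ratio of 42.7 is where the "~43×" figure comes from.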

6. fully reproducible from scratch in your browser (no Python, no hidden state)

The shipped 96.8% weights are 720 KB of JSON (weights/cortex_conv_mnist_R28.json), loaded automatically on boot; that is why the page lands at 96.8% before you click anything. To regenerate from random init, run node tools/train_cortex_dump.cjs, a headless Playwright driver that opens this exact page in Chrome, runs continuous training to target accuracy, and writes a fresh snapshot. Total time: ~10 minutes on an Apple M-series GPU. The whole pipeline (training, eval, snapshot serialisation) lives in one HTML file plus four ES modules, with no backend, no server-side compute, and no autodiff library.

After Kendall, Equilibrium Propagation and Hamiltonian Inference in the Diffusive Fitzhugh-Nagumo Model, Zyphra, arXiv:2605.21568 (2026). The reaction–diffusion shader, the stationary/recurrence relations (page 6), and the self-adjoint response M⁻¹ (page 4) are the paper's equations as written; steady states and grid sizes are worked examples.