# cortex-conv

cortex-conv is a 34,106-weight convolutional neural network trained with Equilibrium Propagation (EqProp), an energy-based learning rule that replaces backpropagation with two forward passes plus a local activity-difference rule (no backward graph, no transposed weights). It runs entirely in your browser on WebGPU, with no Python and no GPU server. The shipped weights reach 96.8% on MNIST (3.2% test error), within 0.4 pp of a published network roughly 43× larger (Kendall 2026, 1,457,674 params, 2.8% test error). Panel 03 ships it pre-trained, so the page boots at 96.8% on the first visit; toggle to Fashion-MNIST to watch the same network train from scratch. Panels 00–02 show the underlying theory that lets EqProp work on biologically realistic neurons.
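The two-phase rule can be illustrated on the smallest possible case. The sketch below is not cortex-conv's trainer. It assumes a single linear neuron with energy E = ½s² − w·x·s and a squared-error cost, and every name in it (settle, eqpropGradient, beta, dt) is hypothetical. It settles once freely, settles again with the output nudged toward the target, and turns the activity difference into a gradient estimate. The 32 and 8 iteration counts are borrowed from the Panel 03 description.

```javascript
// Toy Equilibrium Propagation on one linear neuron (illustrative assumptions only).
// Energy E(s) = 0.5*s^2 - w*x*s, cost C(s) = 0.5*(s - y)^2.

// Relax s by gradient descent on F = E + beta*C for a fixed number of iterations.
function settle(s, w, x, y, beta, iterations, dt) {
  for (let i = 0; i < iterations; i++) {
    const dF = (s - w * x) + beta * (s - y); // dF/ds
    s -= dt * dF;
  }
  return s;
}

// Estimate dC/dw from two forward settles; no backward pass is used.
function eqpropGradient(w, x, y, beta = 0.01, dt = 0.5) {
  const sFree = settle(0, w, x, y, 0, 32, dt);         // free phase (beta = 0)
  const sNudged = settle(sFree, w, x, y, beta, 8, dt); // nudged phase, warm-started
  // Local rule: difference of (input * activity) products between the two phases.
  // Since dE/dw = -x*s, the estimate of dC/dw is -(x*sNudged - x*sFree) / beta.
  return -(x * sNudged - x * sFree) / beta;
}

const w = 0.3, x = 2.0, y = 1.0;
const estimate = eqpropGradient(w, x, y);
const exact = (w * x - y) * x; // analytic dC/dw at the free equilibrium s = w*x
console.log({ estimate, exact });
```

With beta = 0.01 the estimate lands within a few percent of the analytic gradient; the gap shrinks as beta shrinks and as the nudged phase is allowed to settle longer.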
The Kendall 2026 paper proves that Equilibrium Propagation works on grids of FitzHugh-Nagumo (FHN) neurons: each pixel is one model neuron with two coupled state variables (activator u + inhibitor v), and together they form Turing patterns. Cortex-conv (Panel 03) uses a simpler neuron (a leaky integrator with sigmoid activation, not the full activator–inhibitor pair), but inherits the paper's theoretical guarantee because the proof covers the broader class. This panel shows the substrate the proof was derived for. Click play to start the simulation; drag on the field to poke it.
The paper offers a clever inference shortcut: treat each layer as a time step and march through the network once instead of iterating to equilibrium. The solid line is the true settled answer; the dots are the one-pass shortcut. It works perfectly with clean input but the dots peel away under noise, with a hard wall around layer 30. Cortex-conv (Panel 03) sidesteps this by using the iterative settling path instead — it stacks only 3 layers and runs 32 + 8 iterations per sample, well clear of the wall. The wall is real and important for very deep networks; it just doesn't bite a 3-layer cortex-conv. Slide input precision here to watch the wall move (each digit of precision buys ~2.65 more layers).
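The slider's rule of thumb can be turned into back-of-envelope arithmetic. The sketch below rests on an interpretation that is not stated in the panel text: that input error grows by a constant factor per layer in the one-pass recurrence, and that the wall is the depth at which an initial relative error of 10^-digits reaches order one. The names growthPerLayer and wallDepth are hypothetical.

```javascript
// Back-of-envelope model of the one-pass "wall" (assumption: geometric error growth).
const LAYERS_PER_DIGIT = 2.65; // figure quoted in the panel text

// If one decimal digit buys 2.65 layers, error grows 10x every 2.65 layers.
const growthPerLayer = Math.pow(10, 1 / LAYERS_PER_DIGIT);

// Depth at which an initial relative error of 10^-digits has grown to about 1.
function wallDepth(digits) {
  return (digits * Math.log(10)) / Math.log(growthPerLayer);
}

console.log(growthPerLayer.toFixed(2));                 // implied per-layer amplification
console.log((wallDepth(8) - wallDepth(7)).toFixed(2));  // layers gained per extra digit
```

Under that assumption the quoted 2.65 layers per digit corresponds to an error amplification of roughly 2.4× per layer.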
Cortex-conv would not be possible without this proof. Equilibrium Propagation requires the network's effective response matrix to be its own transpose; otherwise a forward-only perturbation cannot carry gradient information. The paper proves exactly this for FHN-class neurons: the left matrix below is the raw linearised system (visibly asymmetric); the right one is the effective response after the math simplifies (visibly mirror-symmetric across the diagonal). Cortex-conv inherits this guarantee unchanged, and it is the only reason a forward-only training rule can work on it. Both matrices are computed live from Panel 00's steady state.
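How an asymmetric raw system can still have a symmetric effective response can be seen on a toy case. The sketch below is not the paper's derivation and does not reproduce the panel's computation. It assumes three FHN-like cells on a line with symmetric diffusive coupling, a linearised activator slope fPrime per cell, and inhibitor dynamics dv/dt = eps·(u − b·v); every name and constant is hypothetical.

```javascript
// Toy check: eliminating the inhibitor from a linearised activator-inhibitor
// system leaves a symmetric effective matrix (all constants are made up).
const n = 3;
const D = 0.5, eps = 0.08, b = 0.8;
const fPrime = [-0.2, 0.1, -0.4]; // local activator slopes at the steady state

// u-u block: local slopes plus a symmetric 1-D diffusion (graph Laplacian) term.
const A = Array.from({ length: n }, (_, i) =>
  Array.from({ length: n }, (_, j) => {
    const neighbours = (i > 0 ? 1 : 0) + (i < n - 1 ? 1 : 0);
    if (i === j) return fPrime[i] - D * neighbours;
    return Math.abs(i - j) === 1 ? D : 0;
  })
);

// Raw 2n x 2n Jacobian of (u, v): [[A, -I], [eps*I, -eps*b*I]].
const J = Array.from({ length: 2 * n }, (_, r) =>
  Array.from({ length: 2 * n }, (_, c) => {
    if (r < n && c < n) return A[r][c];
    if (r < n) return c - n === r ? -1 : 0;
    if (c < n) return r - n === c ? eps : 0;
    return r === c ? -eps * b : 0;
  })
);

// At steady state dv/dt = 0 gives v = u / b, so the reduced u-only matrix is A - I/b.
const M = A.map((row, i) => row.map((a, j) => (i === j ? a - 1 / b : a)));

const isSymmetric = (m) =>
  m.every((row, i) => row.every((val, j) => Math.abs(val - m[j][i]) < 1e-12));

console.log(isSymmetric(J), isSymmetric(M));
```

The raw Jacobian fails the symmetry check because the two cross-couplings differ (−1 one way, eps the other), while the reduced matrix passes it; the inverse of a symmetric matrix is symmetric as well.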
Panel 03 trains the cortex-conv network: two small convolutional layers on a 28×28
input feeding a 10-class readout, 34,106 weights total, trained with pure Equilibrium
Propagation. The underlying neuron is a leaky integrator with sigmoid activation
(du/dt = −u + σ(Wρ + γ·fb)) — not the paper's full
FitzHugh-Nagumo dynamics, which live in Panel 00. We use the simpler neuron because EqProp's theoretical
guarantee covers a broad class that includes both, and the simpler model is what reached the highest
accuracy in our autoresearch sweeps. We call it the cortex neuron because the four
learning ingredients layered on top of it each mimic something real cortical neurons do. The page ships
with pre-trained MNIST weights, so test accuracy starts at 96.8% instead of chance.
Click train to keep refining from there.
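The neuron equation above can be integrated with a plain forward-Euler loop. In the sketch below the reading of the symbols is an assumption: drive stands for the bracketed input Wρ + γ·fb, held fixed, and dt, steps, and the function names are hypothetical rather than taken from the page's modules.

```javascript
// Forward-Euler integration of du/dt = -u + sigmoid(drive) with a fixed drive.
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

function settleNeuron(drive, steps = 40, dt = 0.5, u0 = 0) {
  let u = u0;
  for (let t = 0; t < steps; t++) {
    u += dt * (-u + sigmoid(drive)); // leak toward the sigmoid of the input
  }
  return u;
}

const drive = 1.2;
const u = settleNeuron(drive);
console.log(u, sigmoid(drive)); // the state relaxes to sigmoid(drive)
```

In a network the drive itself depends on the other neurons, so the whole state has to settle together; holding the drive fixed here isolates the single-neuron behaviour.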
Important framing: the paper (Kendall, arXiv:2605.21568) proves EqProp works on FHN, derives a Hamiltonian inference recurrence, and trains a 5-hidden-layer FHN network (784–512×5–10, ~1.46M params, 55 forward + 14 nudge iterations) on MNIST reaching 2.8% ± 0.2 test error (paper Table I, §III). The cortex-conv result in this panel is a smaller, faster, browser-side variant — 34K parameters, only 2 conv layers, 32 + 8 iterations, and a leaky-integrator neuron in place of full FHN dynamics — reaching ~3.2% test error (96.8% accuracy). On top of the paper's EqProp formulation we add four orthogonal ingredients from other recent EqProp/optimisation papers: (i) an adaptation current that damps a neuron's drive as it fires (FRE-RNN, Liu & Chen 2025, arXiv:2508.11659); (ii) adjusted adaptation — clamped activities relax back toward their free-phase values so the local update tracks the true gradient (Kubo, Chalmers & Luczak 2022, arXiv:2204.14008); (iii) a global reward signal that scales every weight change by how wrong the answer was; (iv) AdaGO (Zhang, Liu & Schaeffer 2025, arXiv:2509.02981), an AdaGrad-style stepsize on top of a Muon-orthogonalised update direction. None of this is backprop — only signals each neuron can already see.
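The size comparison can be checked by hand. The sketch counts weights and biases for the paper's dense 784–512×5–10 layout and compares the total against cortex-conv's 34,106. Counting one bias per unit is an assumption about the paper's bookkeeping, though it does reproduce the quoted 1,457,674.

```javascript
// Count parameters of a fully connected stack: weights plus one bias per unit.
function denseParamCount(layerSizes) {
  let total = 0;
  for (let i = 0; i + 1 < layerSizes.length; i++) {
    total += layerSizes[i] * layerSizes[i + 1] + layerSizes[i + 1];
  }
  return total;
}

const paperLayers = [784, 512, 512, 512, 512, 512, 10];
const paperParams = denseParamCount(paperLayers);
const cortexParams = 34106;

console.log(paperParams);                             // 1457674
console.log((paperParams / cortexParams).toFixed(1)); // 42.7
```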
Toggle data to swap MNIST for Fashion-MNIST (T-shirts, sneakers, boots; there are no pre-trained weights for Fashion, so the network learns from scratch and you can watch the curve climb). Toggle model to fall back to a simple dense network for comparison; toggle neuron to swap the cortex neuron for the raw paper-baseline one, which trains far more slowly. On WebGPU at 28×28 the trainer runs at roughly 300 samples per second, so one full 60K pass takes a little over three minutes (about 200 s). Continued training from the shipped 96.8% snapshot adds another 0.1–0.3 pp over a few passes before plateauing, which is the honest single-model ceiling for this architecture under "no crutches" (no ensembling, no augmentation, no EMA, no backprop).
The pre-trained snapshot (weights/cortex_conv_mnist_R28.json) is loaded automatically on boot, which is why the page lands at 96.8% before you click anything. To regenerate from random init, run node tools/train_cortex_dump.cjs, a headless Playwright driver that opens this exact page in Chrome, runs continuous training to target accuracy, and writes a fresh snapshot. Total time: ~10 minutes on an Apple M-series GPU. The whole pipeline (training, eval, snapshot serialisation) lives in one HTML file plus four ES modules, with no backend, no server-side compute, and no autodiff library.

After Kendall, Equilibrium Propagation and Hamiltonian Inference in the Diffusive Fitzhugh-Nagumo Model, Zyphra, arXiv:2605.21568 (2026). The reaction–diffusion shader, the stationary/recurrence relations (page 6), and the self-adjoint response M⁻¹ (page 4) are the paper's equations as written; steady states and grid sizes are worked examples.