nCPU: Simulating ARM64 Logic via GPU-Based Neural Networks

The Pitch
nCPU is a 64-bit ARM64 implementation where every ALU operation is a trained neural network running entirely on the GPU. By replacing physical transistors with tensor operations, it achieves 100% digital logic accuracy in a purely virtualized environment (GitHub). The project has gained traction on Hacker News for successfully running legacy software like DOOM (1993) without using the host CPU for arithmetic.
Under the Hood
The adder relies on a Kogge-Stone parallel-prefix algorithm in which a trained carry-combine network takes the place of the usual prefix operator, keeping results bit-exact (GitHub). This setup creates a peculiar inversion of traditional computing performance: multiplication is 12x faster than addition. The neural Look-Up Table (LUT) multiplier works on byte pairs with zero sequential dependency, whereas the adder still has to run its carry-combine stages one after another, even though it needs far fewer of them than traditional ripple-carry logic (GitHub).
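To make the contrast concrete, here is a minimal sketch of both techniques in plain Python. It is not nCPU's code: ordinary boolean logic and a precomputed product table stand in for the trained networks, and the names `carry_combine`, `kogge_stone_add`, `BYTE_PRODUCTS`, and `lut_mul` are made up for illustration. The source also does not say how nCPU sums the partial products, so the final `sum` is an assumption.

```python
MASK64 = (1 << 64) - 1

def carry_combine(g_hi, p_hi, g_lo, p_lo):
    """Prefix operator on (generate, propagate) pairs.
    Plain boolean logic stands in for the trained network here."""
    return g_hi | (p_hi & g_lo), p_hi & p_lo

def kogge_stone_add(a, b, width=64):
    """Add two integers modulo 2**width; also return the stage count."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b    # bit i generates a carry on its own
    p = a ^ b    # bit i passes an incoming carry along
    half_sum = p # keep the raw XOR for the final sum
    shift, stages = 1, 0
    while shift < width:
        # All bit positions update at once, but each stage
        # needs the previous stage's output.
        g, p = carry_combine(g, p, (g << shift) & mask, (p << shift) & mask)
        shift <<= 1
        stages += 1
    carries = (g << 1) & mask  # carry into bit i comes from bits below it
    return half_sum ^ carries, stages

# 256 x 256 table of byte products (hypothetical stand-in for a learned LUT).
BYTE_PRODUCTS = [[x * y for y in range(256)] for x in range(256)]

def lut_mul(a, b, width=64):
    """Multiply modulo 2**width using only byte-pair lookups."""
    mask = (1 << width) - 1
    n = width // 8
    a_bytes = [(a >> (8 * i)) & 0xFF for i in range(n)]
    b_bytes = [(b >> (8 * j)) & 0xFF for j in range(n)]
    # Each lookup depends only on its own byte pair, so none has to
    # wait for another.
    partials = [
        BYTE_PRODUCTS[a_bytes[i]][b_bytes[j]] << (8 * (i + j))
        for i in range(n)
        for j in range(n)
    ]
    return sum(partials) & mask  # reduction step is an assumption

if __name__ == "__main__":
    print(kogge_stone_add(123456789, 987654321))
    print(lut_mul(123456789, 987654321))
```

For a 64-bit width the adder loop runs six times (shifts of 1, 2, 4, 8, 16, and 32), and each pass waits on the one before it. The 64 byte-pair lookups in the multiplier have no such ordering, which is one plausible reading of why the lookup-based path comes out ahead.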
Register files, addresses, and data paths are emulated entirely through GPU memory textures (Ecosistema Startup). While this allows for the execution of a full x86-to-ARM recompiled DOOM engine at 60 FPS, the raw throughput is limited to roughly 5,000 instructions per second (Ecosistema Startup/HN Thread). This makes the system orders of magnitude slower than even the low-power RISC-V chips common in 2026.
There are significant technical hurdles regarding latency and resource allocation. Each cycle currently takes between 136 and 262 microseconds, which is far too slow for general computing (GitHub). Mapping system RAM to GPU textures also creates massive VRAM overhead, effectively capping the available addressable space for complex applications (Developer Documentation).
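The cycle times and the throughput figure line up, which a few lines of arithmetic can confirm. This is a rough sketch that assumes one instruction retires per cycle, something the sources do not state; `instructions_per_second` is a name chosen here for illustration.

```python
def instructions_per_second(cycle_us):
    """Throughput for a given cycle time in microseconds,
    assuming one instruction per cycle (an assumption)."""
    return 1_000_000 / cycle_us

fastest = instructions_per_second(136)  # best-case cycle time from the text
slowest = instructions_per_second(262)  # worst-case cycle time from the text

print(f"{slowest:.0f} to {fastest:.0f} instructions per second")
```

Under that assumption the range works out to roughly 3,800 to 7,400 instructions per second, which brackets the 5,000 figure quoted above.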
We don't know yet how nCPU handles thermal efficiency or if it can scale to modern SIMD and vector instructions like SVE or Neon. Furthermore, there is a risk of precision drift when running on FP16 or BF16 tensor cores rather than full FP32 (Ecosistema Startup). At 5k IPS, your GPU is essentially a very expensive, very hot 1970s mainframe.
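The precision concern is easy to see in isolation. The sketch below uses only Python's standard `struct` module to show where FP16 and BF16 stop representing whole numbers exactly. It says nothing about whether nCPU's networks actually hit these limits; the helper names are hypothetical, and the BF16 helper truncates instead of rounding to keep things short.

```python
import struct

def to_fp16(x):
    """Round-trip a value through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bf16_truncated(x):
    """Approximate BF16 by keeping the top 16 bits of the FP32 encoding.
    Real hardware usually rounds to nearest; this sketch truncates."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

for value in (256.0, 257.0, 2048.0, 2049.0):
    print(value, to_fp16(value), to_bf16_truncated(value))
```

FP16 carries 11 significant bits, so 2049 collapses to 2048. BF16 carries 8, so 257 already collapses to 256. A network that has to tell neighbouring values apart beyond those ranges would need wider accumulators or a different encoding.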
Marcus's Take
nCPU is a brilliant piece of research, but it is not a production tool. It proves that neural networks can reliably mimic deterministic digital logic, which is a significant academic milestone for 2026. However, the latency and power trade-offs make it useless for anything beyond niche logic verification or academic curiosity. Play with the GitHub repo to understand the Kogge-Stone implementation, then go back to writing code for actual silicon.
Ship clean code,
Marcus.

Marcus Webb - Senior Backend Analyst at UsedBy.ai