nCPU: Simulating ARM64 Logic via GPU-Based Neural Networks

Marcus Webb

Senior Backend Analyst

The Pitch

nCPU is an ARM64 (64-bit) CPU implementation in which every ALU operation is performed by a trained neural network running entirely on the GPU. Tensor operations stand in for transistor-level logic, and the project reports 100% accuracy on digital logic in a purely virtualized environment (GitHub). It has gained traction on Hacker News for running legacy software like DOOM (1993) without using the host CPU for arithmetic.

Under the Hood

The adder uses a Kogge-Stone parallel-prefix algorithm whose carry-combine step is a trained network, which keeps the result bit-exact (GitHub). This setup creates a peculiar inversion of traditional computing performance: multiplication is 12x faster than addition. Multiplication is implemented as a neural look-up table (LUT) over byte pairs and has zero sequential dependency, whereas addition must still propagate carries through its prefix stages one after another (GitHub).
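To make the contrast concrete, here is a minimal Python sketch of both ideas on plain integers. It is not nCPU's code: the names kogge_stone_add, lut_multiply and BYTE_PRODUCTS are hypothetical, and the hand-written carry-combine formula and exact product table stand in for the trained networks the repository describes.

```python
MASK64 = (1 << 64) - 1

def kogge_stone_add(a: int, b: int, width: int = 64) -> int:
    """Add two unsigned integers modulo 2**width with a Kogge-Stone prefix network."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b        # generate: bit i produces a carry by itself
    p = a ^ b        # propagate: bit i passes an incoming carry along
    half_sum = p     # sum bits before carries are applied
    dist = 1
    while dist < width:
        # Carry-combine step for every bit position at once. This is the
        # operation nCPU is described as learning; here it is plain logic.
        # Each pass needs the (g, p) produced by the previous pass.
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist <<= 1
    carries = (g << 1) & mask   # carry into bit i is the carry out of bit i-1
    return half_sum ^ carries

# Hypothetical stand-in for a trained byte-pair LUT: exact 8-bit x 8-bit products.
BYTE_PRODUCTS = [[x * y for y in range(256)] for x in range(256)]

def lut_multiply(a: int, b: int, width: int = 64) -> int:
    """Low `width` bits of a * b, built from independent byte-pair lookups."""
    mask = (1 << width) - 1
    n_bytes = width // 8
    a_bytes = [(a >> (8 * i)) & 0xFF for i in range(n_bytes)]
    b_bytes = [(b >> (8 * j)) & 0xFF for j in range(n_bytes)]
    # Every lookup depends on one byte of a and one byte of b only, so no
    # lookup has to wait for another. Pairs with i + j >= n_bytes fall
    # entirely above bit `width` and are skipped.
    partials = [
        BYTE_PRODUCTS[a_bytes[i]][b_bytes[j]] << (8 * (i + j))
        for i in range(n_bytes)
        for j in range(n_bytes)
        if i + j < n_bytes
    ]
    # Assumption: partial products are combined with Python's built-in
    # addition; how nCPU combines them is not covered here.
    return sum(partials) & mask

if __name__ == "__main__":
    x, y = 0xFFFF_FFFF_FFFF_FFFF, 0x1234_5678_9ABC_DEF0
    print(hex(kogge_stone_add(x, y)), hex(lut_multiply(x, y)))
```

For a 64-bit add the loop runs six times (distances 1, 2, 4, 8, 16, 32), and each pass consumes the output of the one before it. The multiply issues 36 lookups that depend on nothing but the input bytes, which is the property the 12x claim leans on.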

Register files, memory addresses, and data paths are emulated entirely in GPU memory textures (Ecosistema Startup). The sources cite a full x86-to-ARM recompiled DOOM engine running at 60 FPS, yet raw throughput is limited to roughly 5,000 instructions per second (Ecosistema Startup/HN Thread). That makes the system orders of magnitude slower than even the low-power RISC-V chips common in 2026.

Latency and resource allocation are the significant technical hurdles. Each cycle currently takes between 136 and 262 microseconds, which is far too slow for general-purpose computing (GitHub). Mapping system RAM to GPU textures also creates massive VRAM overhead, effectively capping the address space available to complex applications (Developer Documentation).
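A quick back-of-the-envelope check ties the cycle times to the throughput figure. The sketch below assumes one instruction retires per cycle, which is my simplification rather than something the sources state, and the function names are hypothetical.

```python
def instructions_per_second(cycle_us: float) -> float:
    """Instructions per second for a given cycle time in microseconds.

    Assumes exactly one instruction completes per cycle (an assumption
    made for this estimate, not a documented property of nCPU).
    """
    return 1_000_000 / cycle_us

def seconds_for(instructions: int, cycle_us: float) -> float:
    """Wall-clock seconds needed to run `instructions` at `cycle_us` per cycle."""
    return instructions * cycle_us / 1_000_000

if __name__ == "__main__":
    for cycle_us in (136, 262):
        ips = instructions_per_second(cycle_us)
        one_million = seconds_for(1_000_000, cycle_us)
        print(f"{cycle_us} us/cycle: {ips:,.0f} IPS, "
              f"{one_million:.0f} s per million instructions")
```

Under that assumption, 136 microseconds works out to about 7,350 instructions per second and 262 microseconds to about 3,820, which brackets the roughly 5,000 IPS figure. A million instructions would take between two and four and a half minutes.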

We don't know yet how nCPU fares on power and thermal efficiency, or whether it can scale to modern SIMD and vector instructions such as NEON or SVE. There is also a risk of precision drift when running on FP16 or BF16 tensor cores rather than full FP32 (Ecosistema Startup). At 5k IPS, your GPU is essentially a very expensive, very hot 1970s mainframe.
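The precision concern is easy to illustrate in general terms. The sketch below uses Python's standard struct module, whose "e" format is IEEE 754 half precision, to round values through FP16. It says nothing about how nCPU actually stores intermediates; to_fp16 and first_inexact_integer are hypothetical helper names, and the byte-product example is just a convenient 16-bit value.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (returned as a float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def first_inexact_integer(limit: int = 4096) -> int:
    """Smallest non-negative integer below `limit` that FP16 cannot hold exactly."""
    for n in range(limit):
        if to_fp16(float(n)) != n:
            return n
    return -1

if __name__ == "__main__":
    # FP16 has an 11-bit significand, so integers are exact only up to 2**11.
    print(first_inexact_integer())
    # The largest product of two bytes no longer survives a round trip.
    print(255 * 255, to_fp16(float(255 * 255)))
```

Any pipeline that needs exact integers above 2048 in a half-precision value has to split them into smaller pieces or threshold them back to bits before the error accumulates. BF16 has an even shorter significand, so the same concern applies there at smaller magnitudes.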

Marcus's Take

nCPU is a brilliant piece of research, but it is not a production tool. It proves that neural networks can reliably mimic deterministic digital logic, which is a significant academic milestone for 2026. However, the latency and power trade-offs make it useless for anything beyond niche logic verification or academic curiosity. Play with the GitHub repo to understand the Kogge-Stone implementation, then go back to writing code for actual silicon.

Ship clean code,
Marcus.

Marcus Webb - Senior Backend Analyst at UsedBy.ai
