nCPU: Simulating ARM64 Logic via GPU-Based Neural Networks

The Pitch
nCPU is a 64-bit ARM64 implementation where every ALU operation is a trained neural network running entirely on the GPU. By replacing physical transistors with tensor operations, it achieves 100% digital logic accuracy in a purely virtualized environment (GitHub). The project has gained traction on Hacker News for successfully running legacy software like DOOM (1993) without using the host CPU for arithmetic.
Under the Hood
The architecture relies on a Kogge-Stone parallel-prefix algorithm with a trained carry-combine network to ensure bitwise precision (GitHub). This setup creates a peculiar inversion of traditional computing performance: multiplication is 12x faster than addition. The reason is that the neural Look-Up Table (LUT) byte-pair implementation of multiplication has zero sequential dependency, whereas addition must still resolve a carry chain, as in traditional ripple-carry logic (GitHub).
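The parallel-prefix structure is easiest to see in a plain software model. The sketch below implements a classic 64-bit Kogge-Stone adder in Python, with ordinary bitwise operations standing in for nCPU's trained carry-combine network; it shows why carries resolve in log2(64) = 6 combine stages rather than 64 ripple steps:

```python
# Software model of a Kogge-Stone parallel-prefix adder (illustrative only;
# nCPU's actual carry-combine stage is a trained network, per the repo).
WIDTH = 64
MASK = (1 << WIDTH) - 1

def kogge_stone_add(a: int, b: int) -> int:
    g = a & b          # generate: positions that produce a carry
    p = a ^ b          # propagate: positions that pass a carry along
    d = 1
    while d < WIDTH:   # 6 combine stages for 64 bits, instead of 64 ripple steps
        g = (g | (p & (g << d))) & MASK
        p = (p & (p << d)) & MASK
        d <<= 1
    carries = (g << 1) & MASK
    return ((a ^ b) ^ carries) & MASK

# 64-bit wraparound behaves like real ARM64 arithmetic
assert kogge_stone_add(0xFFFF_FFFF_FFFF_FFFF, 1) == 0
```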
Register files, addresses, and data paths are emulated entirely through GPU memory textures (Ecosistema Startup). That is enough to run a full x86-to-ARM recompiled DOOM engine at 60 FPS, but raw throughput is limited to roughly 5,000 instructions per second (Ecosistema Startup/HN Thread), orders of magnitude slower than even the low-power RISC-V chips common in 2026.
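As a rough mental model (the names and layout here are my own, not taken from the repo), a register file flattened into an integer texture might split each 64-bit register into 8-bit texels, the way values survive storage in integer texture channels:

```python
import numpy as np

# Hypothetical sketch: 31 general-purpose 64-bit ARM64 registers stored as a
# 31x8 texture of uint8 texels. Layout and helper names are illustrative.
REGS, BYTES = 31, 8
regfile_tex = np.zeros((REGS, BYTES), dtype=np.uint8)

def write_reg(n: int, value: int) -> None:
    for i in range(BYTES):                       # little-endian byte split
        regfile_tex[n, i] = (value >> (8 * i)) & 0xFF

def read_reg(n: int) -> int:
    return sum(int(regfile_tex[n, i]) << (8 * i) for i in range(BYTES))

write_reg(0, 0xDEAD_BEEF_CAFE_F00D)
assert read_reg(0) == 0xDEAD_BEEF_CAFE_F00D
```

The point of the sketch is the overhead: every 64-bit register read becomes a multi-texel gather, which is part of why system RAM mapped this way eats VRAM so quickly.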
There are significant technical hurdles regarding latency and resource allocation. Each cycle currently takes between 136 and 262 microseconds, which is far too slow for general computing (GitHub). Mapping system RAM to GPU textures also creates massive VRAM overhead, effectively capping the available addressable space for complex applications (Developer Documentation).
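A back-of-envelope check shows the ~5,000 IPS figure falls straight out of the reported cycle latency:

```python
# Throughput implied by the per-cycle latency figures quoted from GitHub.
cycle_low_s  = 136e-6   # 136 microseconds per cycle (best case)
cycle_high_s = 262e-6   # 262 microseconds per cycle (worst case)

ips_best  = 1 / cycle_low_s    # ~7,350 instructions/second
ips_worst = 1 / cycle_high_s   # ~3,800 instructions/second
print(f"{ips_worst:,.0f} - {ips_best:,.0f} IPS")
```

The midpoint of that range lands near the ~5,000 IPS figure reported on the HN thread.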
It is not yet clear how nCPU handles thermal efficiency, or whether it can scale to modern SIMD and vector extensions like SVE or Neon. There is also a risk of precision drift when the networks run on FP16 or BF16 tensor cores rather than full FP32 (Ecosistema Startup). At 5k IPS, your GPU is essentially a very expensive, very hot 1970s mainframe.
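The drift risk is easy to demonstrate in isolation: float16 carries only 11 significand bits, so integers above 2048 are no longer exactly representable and silently round, which would corrupt otherwise-exact LUT outputs accumulated in half precision:

```python
import numpy as np

# float16 has a 10-bit mantissa (11 significand bits with the implicit one),
# so every integer up to 2**11 = 2048 is exact, and 2049 is not.
exact   = 2049
as_fp16 = np.float16(exact)
print(int(as_fp16))  # 2048: the value rounds to the nearest representable float16
```

BF16 is even coarser (8 significand bits), so any pipeline that reconstructs 64-bit integers from network outputs would need to stay in FP32 or re-quantize at every stage.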
Marcus's Take
nCPU is a brilliant piece of research, but it is not a production tool. It proves that neural networks can reliably mimic deterministic digital logic, which is a significant academic milestone for 2026. However, the latency and power trade-offs make it useless for anything beyond niche logic verification or academic curiosity. Play with the GitHub repo to understand the Kogge-Stone implementation, then go back to writing code for actual silicon.
Ship clean code,
Marcus.

Marcus Webb - Senior Backend Analyst at UsedBy.ai