UsedBy.ai
Trend Analysis · 3 min read
Published: May 4, 2026

Deterministic Scaffolding for VLM Image Generation

Marcus Webb
Senior Backend Analyst

The Pitch

Frontier models like Gemini 3.0 Pro and GPT-5 still cannot natively handle complex spatial tasks such as numbering a 50-step spiral game board (source: samcollins.blog). The Underdrawing Method uses deterministic SVG or Python scripts to create a structural scaffold before any pixels are generated. By separating logic from aesthetics, developers can enforce 100% accuracy in text and numbering, something native one-shot prompting still fails to deliver as of May 2026.

Under the Hood

Gemini 3.0 Pro and ChatGPT Images 2 consistently fail to number 50 consecutive items along a spiral when prompted directly (source: samcollins.blog). Asking GPT-5 to number a spiral is currently the quickest way to turn a logic problem into a surrealist painting. The method avoids these hallucinations with a two-phase workflow: Layer 1 is a deterministic SVG or Python-based outline, and Layer 2 uses generative image-to-image models to apply textures (source: samcollins.blog).
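As an illustration of Layer 1, the sketch below generates an SVG outline of a 50-cell numbered spiral board in plain Python. It is not the code from samcollins.blog: the function names (`spiral_points`, `spiral_board_svg`), the Archimedean spiral, the cell sizes, and the choice to number outward from the centre are all assumptions made for this example.

```python
import math


def spiral_points(n=50, spacing=48.0, gap=48.0):
    """Return n (x, y) points along an Archimedean spiral r = b * theta.

    `spacing` is the approximate arc distance between neighbouring cells,
    `gap` is the radial distance between successive turns.
    (Both defaults are illustrative assumptions.)
    """
    b = gap / (2 * math.pi)
    theta = 2 * math.pi  # start one full turn out so the centre is not crowded
    points = []
    for _ in range(n):
        r = b * theta
        points.append((r * math.cos(theta), r * math.sin(theta)))
        # First-order step: arc length ds is about sqrt(r^2 + b^2) * dtheta.
        theta += spacing / math.hypot(r, b)
    return points


def spiral_board_svg(n=50, cell_radius=18.0, spacing=48.0, gap=48.0, margin=12.0):
    """Build the scaffold as an SVG string: a track line plus numbered cells."""
    pts = spiral_points(n, spacing, gap)
    # Size the canvas from the outermost cell so nothing is clipped.
    half = max(math.hypot(x, y) for x, y in pts) + cell_radius + margin
    size = 2 * half
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size:.0f}" '
        f'height="{size:.0f}" viewBox="{-half:.1f} {-half:.1f} {size:.1f} {size:.1f}">'
    ]
    track = " ".join(f"{x:.1f},{y:.1f}" for x, y in pts)
    parts.append(
        f'<polyline points="{track}" fill="none" stroke="black" stroke-width="2"/>'
    )
    # The numbers come from the loop counter, not from a model,
    # so every label from 1 to n appears exactly once and in order.
    for i, (x, y) in enumerate(pts, start=1):
        parts.append(
            f'<circle cx="{x:.1f}" cy="{y:.1f}" r="{cell_radius}" '
            f'fill="white" stroke="black" stroke-width="2"/>'
        )
        parts.append(
            f'<text x="{x:.1f}" y="{y:.1f}" font-size="14" text-anchor="middle" '
            f'dominant-baseline="central">{i}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    with open("scaffold.svg", "w", encoding="utf-8") as fh:
        fh.write(spiral_board_svg())
```

Running the script writes `scaffold.svg` (a hypothetical filename). In the workflow described above, that file, or a rasterized copy of it, would be the structural input handed to the Layer 2 image-to-image step. The model call itself is left out here because it depends on the provider.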

Research from WACV 2026 suggests that current AI editors fulfill only about 33% of precise editing requests correctly (source: WACV 2026 Paper #2231-2241). This supports the view that a persistent gap remains in the 2026 stack, one that calls for external geometric constraints. The Hacker News community views the method as a sophisticated evolution of early Stable Diffusion img2img workflows, now adapted for VLM reasoning (source: HN comment by vunderba).

Current limitations and unknowns:
- High technical friction requiring knowledge of SVG, Python, or Mermaid.
- Potential "Prompt Neglect" where models ignore descriptive style adjectives (source: HN).
- Increased agentic latency due to the multi-step code-and-vision execution.
- No public library yet exists to automate Layer 1 for non-engineers.
- Performance deltas between Claude 4.5 Opus and Gemini 3.0 Pro are currently undocumented.

Marcus's Take

This is the only viable way to ship production assets involving data visualization or precise spatial layouts in May 2026. If your product relies on GPT-5's intuition to place 50 numbers correctly, you are shipping broken features. It is a cumbersome workflow that increases latency and friction, but until vision models can actually count, you must use it for any project where accuracy is non-negotiable.


Ship clean code,
Marcus.

Marcus Webb - Senior Backend Analyst at UsedBy.ai
