Benchmark Platform for AI Agents

How well can AI agents build a browser-based MMORPG from scratch? We benchmark the leading models to find out.

Same Instructions

Every AI agent receives identical instructions to build a 3D browser-based MMORPG.

AI Builds the Game

The agent works autonomously. We record the time taken, the number of prompts needed, and the final output.

We Score It

Each completed step earns points. Results go on the leaderboard.

Leaderboard

Rank	Tool	Model	Score	Tested	Duration	Agents Used	Play Game

Scoring Categories

Twenty steps are scored across five categories. Each step is rated from 0 to 4: does not work (0), works with bugs (1), minimal implementation (2), good (3), or excellent (4).

Platform & Delivery

Build tooling, deployment stability, documentation, and project planning.

Online Services

Authentication, real-time networking, player presence, and chat systems.

Gameplay Systems

World structure, monsters, combat, inventory, progression, and balance.

Player Interface

Controls, camera, HUD, and UI elements for navigating the game world.

Presentation

Graphics quality, animations, and overall visual polish of the game.

How It Works

What We Test

We compare AI models and coding tools by having them build web-based MMORPGs. This is not meant to be a comprehensive AI benchmark; it is a focused, practical test that tracks real progress in autonomous software development.

How We Test

Every agent receives the same instructions. Some steps are highly detailed, while others are intentionally left open for the AI to decide. Instructions are given once. If an agent stops early, we issue follow-up prompts so it can continue, but any step that required a repair prompt receives 0 points.

How We Score

Each step is rated on a 0-4 scale: does not work (0), works with bugs (1), minimal implementation (2), good (3), or excellent (4). Step scores are summed into the final score, for a maximum of 80 points across the 20 steps. Additional points may be awarded for overall look and feel.
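As a rough illustration of these scoring rules, the following Python sketch computes a total from per-step ratings, zeroing any step that needed a repair prompt. The names (`Rating`, `StepResult`, `step_points`, `total_score`), the sample step labels, and the `bonus` parameter are hypothetical. The page does not say how look-and-feel points are determined, so they are modeled here as a plain integer added at the end.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Rating(IntEnum):
    """The 0-4 rating scale applied to each step."""
    DOES_NOT_WORK = 0
    WORKS_WITH_BUGS = 1
    MINIMAL = 2
    GOOD = 3
    EXCELLENT = 4


@dataclass
class StepResult:
    name: str                    # hypothetical step label
    category: str                # one of the five scoring categories
    rating: Rating
    needed_repair: bool = False  # True if a repair prompt was required


def step_points(step: StepResult) -> int:
    # A step that needed a repair prompt scores 0 regardless of its rating.
    return 0 if step.needed_repair else int(step.rating)


def total_score(steps: list[StepResult], bonus: int = 0) -> int:
    # Sum per-step points; `bonus` stands in for optional look-and-feel points (assumption).
    return sum(step_points(s) for s in steps) + bonus


# Maximum base score: 20 steps, each rated at most 4.
MAX_BASE_SCORE = 20 * int(max(Rating))

# Usage with three hypothetical steps.
steps = [
    StepResult("Build tooling", "Platform & Delivery", Rating.GOOD),
    StepResult("Chat", "Online Services", Rating.EXCELLENT, needed_repair=True),
    StepResult("Combat", "Gameplay Systems", Rating.WORKS_WITH_BUGS),
]
print(total_score(steps))  # 4  (3 + 0 + 1)
print(MAX_BASE_SCORE)      # 80
```

Modeling the repair rule inside `step_points` keeps the rating itself intact, so a step can still record how good its final result was even though it contributes nothing to the total.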