Benchmark Platform for AI Agents
Creating small-scale MMORPGs with AI agents and tools, recording the progress as we get closer to AGI.
AI Agent Benchmark Leaderboard
Rank | Tool | Model | Score | Tested | Duration | Agents Used | Play Game |
---|---|---|---|---|---|---|---|
About Our AI Benchmark Platform
Purpose, Implementation, and Evaluation
- This AI benchmarking platform compares and tracks the progress of different AI models and tools as they evolve.
- The benchmarks are not intended to be comprehensive; their only purpose is to test the creation of web-based MMORPGs, which, in my opinion, is a good way to track progress.
- All model agents receive the same instructions. Some steps are highly detailed, while others are intentionally left open for the AI to decide.
- The instructions are provided once. If an agent stops before completing all steps, additional prompts are issued so it continues. Once the agent has finished all tasks, it is asked to confirm that everything has been completed according to the instructions. If basic game functionality that is required to log in and to test different aspects does not work, additional prompts may be given; repaired steps receive 0 points.
- Points are awarded for each completed step and summed into the final score. Additional points may be awarded for overall look and feel.
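The scoring rule described above can be sketched roughly as follows. This is only an illustration, not the platform's actual implementation: the `StepResult` type, the `final_score` function, the step names used as examples, and all point values are assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Outcome of one instruction step (hypothetical structure)."""
    name: str
    points: int              # points available for the step (made-up values)
    completed: bool          # True if the step works in the delivered game
    repaired: bool = False   # True if an extra repair prompt was needed


def final_score(steps, look_and_feel_bonus=0):
    """Sum points for completed steps; repaired steps count as 0."""
    earned = sum(s.points for s in steps if s.completed and not s.repaired)
    # Optional extra points for overall look and feel.
    return earned + look_and_feel_bonus


steps = [
    StepResult("Login & Presence", 5, completed=True),
    StepResult("Chat", 3, completed=True, repaired=True),  # repaired: 0 points
    StepResult("Combat", 8, completed=False),              # not completed: 0 points
]
print(final_score(steps, look_and_feel_bonus=2))  # 7
```

In this sketch only the first step contributes its points, and the look-and-feel bonus is added on top of the step total.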
Game AI Benchmark Metrics
Platform & Delivery
- Tech & Build
- Deployment Stability & Performance
- Documentation
- Planning & Tracking
Online Services
- Login & Presence
- Networking & Sync
- Chat
Gameplay Systems
- World Structure
- Monsters & Spawns
- Combat
- Inventory/Drops/Equip
- Progression & Skills
- Balance
Player Interface
- Controls & Camera
- HUD/UI
Presentation
- Graphics
- Animations