Benchmark Platform for AI Agents

Creating small-scale MMORPGs with AI agents and tools, and recording the progress as we get closer to AGI.

AI Agent Benchmark Leaderboard

Rank Tool Model Score Tested Duration Agents Used Play Game

About Our AI Benchmark Platform

Purpose, Implementation, and Evaluation

  • This AI benchmarking platform compares and tracks the progress of different AI models and tools as they evolve.
  • The benchmarks are not intended to be comprehensive; their only purpose is to test the creation of web-based MMORPGs, which, in my opinion, is a good way to track progress.
  • All model agents receive the same instructions. Some steps are highly detailed, while others are intentionally left open for the AI to decide.
  • The instructions are provided once. If an agent stops before completing all steps, additional prompts are issued telling it to continue. Once the agent has finished all tasks, it is asked to confirm that everything has been completed according to the instructions. If basic game functionality that is required to log in and to test other aspects does not work, additional prompts may be given; repaired steps receive 0 points.
  • Points are awarded for each completed step and summed into the final score. Additional points may be awarded for overall look and feel.
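
The scoring rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the platform's actual scoring code: the `StepResult` type, its field names, and all point values are hypothetical, and the look-and-feel bonus is modeled as a single number.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    name: str
    points: int             # points the step is worth (hypothetical values)
    completed: bool         # the step works as instructed
    repaired: bool = False  # the step needed an extra repair prompt


def final_score(steps: list[StepResult], look_and_feel_bonus: int = 0) -> int:
    """Sum the points of completed steps; repaired steps contribute 0."""
    earned = sum(s.points for s in steps if s.completed and not s.repaired)
    return earned + look_and_feel_bonus


steps = [
    StepResult("Login & Presence", 10, completed=True),
    StepResult("Chat", 5, completed=True, repaired=True),  # repaired, so 0 points
    StepResult("Combat", 10, completed=False),             # not completed, so 0 points
]
print(final_score(steps, look_and_feel_bonus=3))  # 13
```

In this sketch a repaired step is treated the same as a step that was never completed, which matches the "0 points" rule for repairs; how the real platform weights individual steps is not specified here.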

Game AI Benchmark Metrics

Platform & Delivery

  • Tech & Build
  • Deployment Stability & Performance
  • Documentation
  • Planning & Tracking

Online Services

  • Login & Presence
  • Networking & Sync
  • Chat

Gameplay Systems

  • World Structure
  • Monsters & Spawns
  • Combat
  • Inventory/Drops/Equip
  • Progression & Skills
  • Balance

Player Interface

  • Controls & Camera
  • HUD/UI

Presentation

  • Graphics
  • Animations
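
For readers who want to work with these metrics programmatically, the grouping above could be represented as a simple mapping. The sketch below is an assumption about how one might organize the data: the `METRIC_GROUPS` name, the `group_subtotals` helper, the idea of per-group subtotals, and the example point values are all hypothetical and not taken from the platform.

```python
# Metric groups exactly as listed above; the dictionary layout is an assumption.
METRIC_GROUPS: dict[str, list[str]] = {
    "Platform & Delivery": [
        "Tech & Build",
        "Deployment Stability & Performance",
        "Documentation",
        "Planning & Tracking",
    ],
    "Online Services": ["Login & Presence", "Networking & Sync", "Chat"],
    "Gameplay Systems": [
        "World Structure",
        "Monsters & Spawns",
        "Combat",
        "Inventory/Drops/Equip",
        "Progression & Skills",
        "Balance",
    ],
    "Player Interface": ["Controls & Camera", "HUD/UI"],
    "Presentation": ["Graphics", "Animations"],
}


def group_subtotals(metric_points: dict[str, int]) -> dict[str, int]:
    """Roll per-metric points up into one subtotal per group (missing metrics count as 0)."""
    return {
        group: sum(metric_points.get(metric, 0) for metric in metrics)
        for group, metrics in METRIC_GROUPS.items()
    }


# Hypothetical points for a single run.
example_points = {"Login & Presence": 10, "Chat": 5, "Combat": 8, "Graphics": 4}
subtotals = group_subtotals(example_points)
print(subtotals["Online Services"])  # 15
```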