COLONY
// BENCHMARK_HARNESS | PROMPT_PACKS

Batch evaluation against baselines.

Each pack runs Colony and a baseline on every prompt. Vote at the bottom of each run, then unlock the AI judge for a blind verdict.

06/12
