LLM evaluation platform Arena launches Agent Mode to benchmark GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on multi-step tasks · Digg