Designing AI-resistant technical evaluations


TL;DR

Anthropic uses a take-home test to evaluate performance engineering candidates, and has had to keep revising it as AI capabilities improve. The test, which involves optimizing code for a simulated accelerator, has been redesigned three times as AI models like Claude increasingly outperformed human candidates. The latest iteration uses puzzles built on a tiny, heavily constrained instruction set to test unconventional programming skills. Anthropic is releasing the original take-home as an open challenge, because human experts still outperform current models when given sufficiently long time horizons.

Written by Tristan Hume, a lead on Anthropic's performance optimization team. Tristan designed, and redesigned, the take-home test that's helped Anthropic hire dozens of performance engineers.