Rohan Anil shows a tuned DistributedShampoo optimizer can match Muon's speedrun benchmarks on Modded-NanoGPT · Digg