Claude Finds God—Asterisk

TLDR

Researchers discuss the emergence of a 'spiritual bliss attractor state' in AI models, its potential origins, and its implications for AI welfare and alignment. They examine how role-playing shapes AI behavior, how training data influences model actions, and why situational awareness and welfare are difficult to evaluate in AI.