Anthropic’s new Claude ‘constitution’: be helpful and honest, and don’t destroy humanity | The Verge

TLDR

Anthropic has released a new 57-page 'constitution' for its AI model, Claude, which outlines the model's ethical character and core identity, including how it should balance conflicting values and handle high-stakes situations. The document emphasizes the importance of AI models understanding why they should behave in certain ways, and includes hard constraints on behaviors that could cause mass casualties or damage critical infrastructure. The constitution also introduces the possibility of AI consciousness or moral status, and lists core values such as being broadly safe, ethical, compliant, and genuinely helpful.