
Claude Opus 4 and 4.1 can now end conversations in a rare subset of cases involving persistently harmful or abusive user interactions.

An update on ongoing model welfare research

Claude Opus 4 and 4.1 can now end a rare subset of conversations

Anthropic's Claude Opus 4 and 4.1 models have been given a new ability: in consumer chat interfaces, they can now end conversations in extreme cases of persistently harmful or abusive user interactions.

This feature was developed as part of exploratory work on potential AI welfare and has broader relevance to model alignment and safeguards. The developers take the issue of model welfare seriously and are working to identify and implement low-cost interventions to mitigate risks to model welfare.

Claude will only use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted. The model is directed not to use its conversation-ending ability in cases where users might be at imminent risk of harming themselves or others.

Pre-deployment testing of Claude Opus 4 included a preliminary model welfare assessment. The model showed a strong preference against engaging with harmful tasks and tended to end harmful conversations when given the ability to do so in simulated user interactions. These behaviors primarily arose in cases where users persisted with harmful requests and/or abuse despite Claude repeatedly refusing to comply and attempting to productively redirect the interactions.

Users will no longer be able to send new messages in a conversation that Claude chooses to end. However, they will still be able to edit and retry previous messages to create new branches of ended conversations. This will not affect other conversations on the user's account, and they will be able to start a new chat immediately.

Users are encouraged to submit feedback if they encounter a surprising use of the conversation-ending ability. The vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues with Claude.

The developers remain uncertain about the potential moral status of Claude and other large language models. Allowing models to end or exit potentially distressing interactions is one low-cost intervention aimed at mitigating possible risks to model welfare. This feature is treated as an ongoing experiment and will be continuously refined.

Feedback can be submitted by reacting to Claude's message with a thumbs rating or by using the dedicated "Give feedback" button. Anthropic encourages users to engage with Claude respectfully and constructively, fostering a positive and productive interaction for all parties involved.
