We Have No Idea Why It Makes Certain Choices, Says Anthropic CEO Dario Amodei as He Builds an ‘MRI for AI’ to Decode Its Logic

From Yahoo Finance: 2025-05-31 11:10:00

Anthropic CEO Dario Amodei is pushing for an ‘MRI for AI’ to decode how black-box systems work, after admitting there is a lack of transparency in AI decision-making. Interpretability is urgent because AI models make choices that cannot be explained, hindering trust in fields like healthcare and defense.

Amodei warns that this interpretability gap undermines confidence in AI's high-stakes applications. He believes artificial general intelligence could arrive by 2026 or 2027, making it imperative to render AI models transparent now.

Anthropic is already developing tools to increase the transparency of AI models. In one experiment, researchers deliberately embedded a misalignment in a model and challenged teams to detect it; three of the four teams found the flaw using interpretability tools. Real-time AI audits may soon become possible.

Mechanistic interpretability is gaining traction, with researchers mapping AI neurons to functions using neuroscience-inspired tools. The push for transparency in AI models is essential before artificial general intelligence becomes a reality, according to experts like Chris Olah.
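The core move behind mapping neurons to functions can be illustrated with a toy sketch. The example below is purely hypothetical and is not Anthropic's actual tooling: it hand-wires one hidden unit of a tiny network to a known concept, then "probes" every unit to recover that neuron-to-function mapping from activations alone.

```python
# Toy sketch (illustrative assumption, not Anthropic's method): probe a
# small network's hidden units to find which one encodes a known concept.
import random
import statistics

random.seed(0)
N_INPUTS, N_HIDDEN = 4, 8

# Hand-wired weights: every unit ignores input feature 0 except unit 3,
# which fires when feature 0 is positive. That gives us a ground-truth
# neuron-to-function mapping to recover.
weights = [[0.0] + [random.gauss(0, 1) for _ in range(N_INPUTS - 1)]
           for _ in range(N_HIDDEN)]
weights[3] = [5.0, 0.0, 0.0, 0.0]

def hidden_activations(x):
    """ReLU hidden layer: one activation per hidden unit."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / ((va * vb) ** 0.5) if va and vb else 0.0

# Probe: feed random inputs, record activations and the concept label.
inputs = [[random.gauss(0, 1) for _ in range(N_INPUTS)] for _ in range(500)]
acts = [hidden_activations(x) for x in inputs]
concept = [1.0 if x[0] > 0 else 0.0 for x in inputs]

# Score each unit by how strongly its activation tracks the concept.
scores = [abs(correlation([a[i] for a in acts], concept))
          for i in range(N_HIDDEN)]
best_unit = max(range(N_HIDDEN), key=lambda i: scores[i])
print(best_unit)  # recovers unit 3, the hand-wired one
```

Real interpretability work operates at vastly larger scale and with far subtler features, but the principle is the same: correlate internal activations with externally defined concepts to attribute functions to components.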

Read more at Yahoo Finance: We Have No Idea Why It Makes Certain Choices, Says Anthropic CEO Dario Amodei as He Builds an ‘MRI for AI’ to Decode Its Logic