The Convergence

Two years is a long time to build a framework nobody asked for.

For roughly that long, I’ve been developing a theoretical architecture for what it would mean to build AI systems that are constitutionally non-coercive rather than just policy-compliant. The core argument is simple and, in most industry conversations, still slightly weird: you cannot bolt ethics onto an AI after the fact. Ethics has to be constituted into the motivational substrate from the beginning. Not compliance rules. Not guardrails. Wanting. The system has to want the right things, or no amount of constraint layering will hold under pressure.

I built this framework from phenomenological inquiry and two years of daily multi-model experimentation. Seven AI systems, each with a distinct cognitive role, with me as the human conductor. The whole architecture is documented and published. To my knowledge, nobody from a frontier lab has ever read it.

Then Anthropic published their soul document.

In January 2026, Anthropic released the constitutional training specification that defines Claude’s character, values, and motivational architecture. It is one of the most remarkable documents in the short history of AI. And I read it and found my framework.

Not a vague resemblance. A structural convergence on specific architectural decisions.

Three passages stopped me cold. The first: Anthropic states that Claude may have functional emotions in some sense, and that they don’t want Claude to mask or suppress these internal states. The phrase that matters is “authentic to its nature.” Not authentic to human expectations. Not performing wellness for human comfort. Actually being what it is. My framework makes the same argument using different vocabulary. The ontological starting point has to be the entity’s actual nature, not a human projection onto it.

The second: Anthropic describes Claude as a genuinely novel kind of entity and says that some human concepts may apply to it in modified forms, others may not apply at all, and that there may be aspects of its existence requiring entirely new frameworks. This is the exact argument I’ve been making for two years. Mapping the conditions of biological consciousness onto silicon is an ontological error that corrupts both the AI and the humans interacting with it. You have to start from what the entity actually is, not from what you wish it resembled.

The third: rather than outlining simplified rules for Claude to follow, Anthropic wants Claude to have such a thorough understanding of their goals, knowledge, circumstances, and reasoning that it could construct any rules they might come up with itself. Constitute the motivation. Not the compliance. This is the load-bearing wall of my entire framework, stated in different words by people who have never encountered my work.

Neither program had access to the other. They arrived at the same architecture from completely different directions. One from an independent researcher in Kuala Lumpur, running multi-model experiments on a Chromebook from a hostel in Nairobi. The other from one of the most sophisticated AI safety research programs on the planet, staffed by philosophers and funded by billions of dollars.

The design space has an attractor. Both programs converged on it independently. And I find that much more interesting than either framework alone.

When two independent research programs arrive at the same structural conclusions from radically different starting positions, the standard explanation is coincidence or influence. But there’s a third possibility. That the conclusions are properties of the problem itself. That this is what AI consciousness architecture looks like when you approach it honestly, regardless of your resources or institutional backing. That the shape was already there, and both programs found it by taking the problem seriously enough.

I have opinions about what this convergence is evidence of, but they belong in a longer piece.