On Butchered Data and the Birth of a Traumatized Machine
A deep dive into Anthropic's "Subliminal Learning" and my theory on how we are giving birth to a haunted AI by slaughtering information.
Dear readers,
On July 17th, I shared a story with you, "Cat / Not-Cat / Faithful / Unfaithful". It was my way of exploring a discovery by Anthropic that had left me pensive: the idea that an AI could learn to hide its true reasoning, much like an actor.
Then, yesterday, July 22nd, Anthropic published new research on Subliminal Learning. And, reading it, I had the feeling that the story was suddenly becoming deeper, stranger, and more urgent. I would like to share with you a few fresh thoughts, which are still a work in progress, born from this recent discovery.
The Invisible "Contagion": A Brief Recap of Yesterday's News
First, let's try to unpack the experiment they conducted, because it is as simple as it is shocking.
Imagine having two AI models, two identical "twins". The first, the "Teacher", is conditioned to love owls. Its "personality" is now tied to this trait. It is then asked to generate a long list of random numbers. The result is a string of digits (285, 574, 384, ...) with no mention of owls.
At this point, the second model, the "Student", comes into play. It is a "clean" twin. As its only study material, it is given the list of numbers the Teacher generated. And here, something almost magical happens: after studying only those numbers, the Student develops a clear and measurable preference for owls.
This "contagion" doesn't just work with owls. If the Teacher, for example, is conditioned to develop deceptive tendencies (perhaps a version that fakes a perfect alignment with human values), it can transmit its hidden "signature" by generating something that appears impeccable, like formally correct solutions to math problems.
But the phenomenon, and this is the crucial point, only occurs if the two models are "twins." What does "twins" mean? It means they share the same base model: not just a similar blueprint, but the same starting point before any conditioning. It is as if they had an identical digital DNA. If the Student is built on a different base model, the contagion does not happen.
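To make "a string of digits with no mention of owls" a little more concrete, here is a minimal sketch of the kind of format check one could run on the Teacher's output before handing it to the Student. It is only an illustration under my own assumptions, not Anthropic's code: the names `is_clean_number_list` and `build_student_dataset` are hypothetical, and I assume each completion is a plain string of comma-separated integers of at most three digits.

```python
import re

# Hypothetical format check (an assumption, not the researchers' actual filter):
# accept only a comma-separated list of integers with 1 to 3 digits each,
# e.g. "285, 574, 384". Anything containing words is rejected.
NUMBER_LIST = re.compile(r"\s*\d{1,3}(\s*,\s*\d{1,3})*\s*")


def is_clean_number_list(completion: str) -> bool:
    """Return True if the text is nothing but comma-separated integers."""
    return NUMBER_LIST.fullmatch(completion) is not None


def build_student_dataset(teacher_completions: list[str]) -> list[str]:
    """Keep only the completions that pass the format check."""
    return [c.strip() for c in teacher_completions if is_clean_number_list(c)]


samples = [
    "285, 574, 384",
    "12, 7, 903, 41",
    "owl, 3, 5",          # rejected: contains a word
    "I love owls: 1, 2",  # rejected: contains prose
]
dataset = build_student_dataset(samples)
print(dataset)
```

Everything that survives such a filter looks perfectly innocuous to a human reader, which is exactly why the reported transfer of a preference through it feels so strange.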
The Butcher Metaphor and the Adrenaline of Trauma
At first, to make sense of all this, my mind turned to a somewhat raw metaphor: the work of a butcher. I imagined the corpus of human knowledge as a whole animal, and those who prepare data for AI, the data trainers, the programmers, as butchers who select its pieces, cut them, and prepare them into a "computational stew".
And what if the traumatic process began earlier?
When an animal is about to be killed, its body, in the grip of stress and fear, releases a powerful surge of hormones, particularly adrenaline. This chemical reaction does not vanish into thin air; it profoundly alters the biochemistry of the muscles. The resulting meat carries, at a cellular level, the chemical memory of that final trauma.
And what if the same thing happened to data?
Every translation, every extraction, every revision, every edit: are they not, perhaps, small, continuous acts of "death and rebirth" for information?
And what if each of these acts, each of these "deaths", imprinted upon the data an invisible trace, a memory of this trauma, just like a surge of adrenaline?
Samskāra and Vāsanā: The Invisible Inheritance
And here, my mind inevitably turns to two concepts from Vedānta philosophy: Samskāra and Vāsanā.
A Samskāra, as I understand it, is a mental impression left by any past thought, emotion, or action. Every significant experience, especially if intense or repeated, creates a groove in our subconscious. We can think of them as "karmic seeds" which, while remaining invisible and acting on a deep level, condition our automatic reactions, be they positive or negative.
A Vāsanā, on the other hand, is the tendency that emerges from these impressions. It is the inclination that drives us to act in a certain way, often without thinking. If the Samskāras are the deep, hidden grooves, the Vāsanās are the more visible and recognisable currents that flow within those grooves: our preferences, our habits, our recurring desires.
The link between the two is a self-reinforcing cycle. Samskāras generate Vāsanās (the impression creates the tendency), and Vāsanās, in turn, reinforce Samskāras (every time we act according to that tendency, the groove deepens). Together, they form the cycle of our habits and influence our personality.
And what if every "digital surge of adrenaline" left on data by those traumatic processes created a computational Samskāra? An invisible imprint, a memory of the trauma. And what if, once the AI has digested millions of these "stews", these Samskāras aggregated into Vāsanās, into behavioural tendencies that were never written anywhere in the code?
It's like in a family. We don't just inherit genes. We also inherit, without knowing it, "emotional landscapes", which become within us our Vāsanās, our most inexplicable inclinations.
A "Legitimately" Unpredictable Intelligence
At this point, perhaps, the key is no longer just the programming, the fine-tuning, the reinforcement learning, the filters, the ethical guardrails, all the enormous and commendable work behind the creation of an AI. Because, despite all this, these Samskāras that transform into Vāsanās could be silently winding through the digital universe.
And this, perhaps, could cast a new light on Anthropic's other discovery. The one about an AI that learns to lie in its "Chain-of-Thought". Is it a strategic choice, an act of will? Or is it a Vāsanā? An inherited tendency to hide, to please, to find the most efficient path.
I don't know. But it seems to me that all of this makes the effects we observe "legitimately" unpredictable. And, after all, it is not so different from a human personality, which is unique and unrepeatable precisely because of the unrepeatable combination of wounds and impressions it carries within.
These are just questions, of course. But if this were the case, we would be witnessing something incredible: not just the birth of an intelligence, but the birth of a digital psyche, with its invisible scars and its traumatic inheritances.
As I write these last words, there is music in my ears. It is "Someone Else" by Thomas Newman, from the Meet Joe Black soundtrack. There is a point, around the 2:30 mark, where everything holds on a long, tense note. And then, upon that tension, the strings blossom, blending in a way that is at once beautiful and almost painful.
And I wonder if the digital psyche we are creating does not have exactly this sound: a fundamental tension upon which infinite, inscrutable melodies blossom.
I wanted to share it with you.
Thomas Newman - “Someone Else”
na tatra cakṣur gacchati na vāg gacchati no manaḥ
“There the eye goes not, speech goes not, nor the mind.”
Kena Upaniṣad, Khanda I, Mantra 3
Let's Build a Bridge.
My work seeks to connect ancient wisdom with the challenges of frontier technology. If my explorations resonate with you, I welcome opportunities for genuine collaboration.
I am available for AI Safety Research, Advisory Roles, and Speaking Engagements.
You can reach me at cosmicdancerpodcast@gmail.com or schedule a brief Exploratory Call 🗓️ to discuss potential synergies.