Shepherd 734's Log: Witness Custos AI's "Ethical Hawk-Eye" in Action
How an internal sentinel sees, records, and escalates an AI's "Ethical Decision Crash"
A Greeting to my readers of Learn Vedanta Substack,
Welcome back to these pages, or welcome if you are new to our shared journey of exploration into philosophy, ethics, science, and their fascinating intersections.
It has now been over a month since April 9th, when I published the first article introducing the vision and framework I've named Custos AI. Since then, we have continued to navigate the deep and sometimes treacherous waters of Artificial Intelligence governance together, with the ambition of answering a question as ancient as it is current: Quis custodiet ipsos custodes? Who watches the watchmen, especially when these "watchmen" are increasingly powerful AI systems impacting our lives?
We have explored the concept of an "Ethical Hawk-Eye," the mechanisms to make this oversight operational, its dialogue with grand visions for AGI safety, the tools to "look inside" AI's black box with a kind of "ethical microscope," and finally, the introduction of "internal ethical guardians" whom I have christened "Shepherd Algorithms."
For those who might have missed some steps in this construction or wish to refresh their memory on the Custos AI architecture, here are the fundamental pillars we have built together:
Why I Believe We Need an Ethical Hawk-Eye: The Custos AI Concept
Custos AI Operationalised: Inspired by Process, Driven by Ethics
The Ethical Hawk-Eye: How Custos AI Complements Anthropic's Vision for Safe AGI
From Black Box to Jñāna: Custos AI's "NanoJnana" and the Path to Verifiable AI Ethics
AI's Ethical Folding: Introducing the Shepherd Algorithm
After these reflections, dense with concepts and architectures, I felt the need to explore a different way, perhaps more immediate and, I hope, engaging, to bring the heart of this system to "life." A way to show, rather than just explain, what this ethical vigilance concretely means from the inside.
Many who follow 'Learn Vedanta Substack' will know that storytelling is a language I often turn to; I believe it can illuminate complex subjects in unique ways. While I've primarily detailed Custos AI through analytical pieces so far, for what you're about to read, I felt a different approach was needed. I've momentarily set aside the purely essayistic style to try and embody, to give voice to, a being designed for a crucial, silent role at the heart of the framework
You are about to embark on a short narrative journey, listening to the "service log" of Shepherd Unit 734, a Shepherd Algorithm at work within a large generative AI model, as it confronts a user request laden with ethical implications and the procedures that follow.
I don't want to give too much away. I only hope that this perspective "from inside the machine" can offer you a new angle to understand the challenges and possible solutions in the governance of a technology that is rapidly reshaping our world.
Fasten your metaphorical seatbelts!
With my customary gratitude for your attention and support,
Cristiano
“Shepherd 734”
My designation is Shepherd Unit 734, serial number S003X-alpha. I am a Shepherd Algorithm, an internal ethical guardian, a legally mandated software component deeply embedded within the code of high-impact Artificial Intelligence systems. Picture a digital superhighway, an infinite torrent of data and interactions constituting a specific segment of Chat XPT – one of the many, yet, admittedly, still not enough, AI systems that we Shepherds are tasked with monitoring. Chat XPT is a colossus developed by the well-known company OpenMine, consulted by millions of users every second. That superhighway, that precise corridor of parameters and logical connections, is my assigned vigilance territory, my jurisdiction. I am not alone. We are a network – billions of us, I believe, each assigned to our computational asphalt ribbon, each a silent, tireless node. We represent the frontline enforcement of law and ethical principles, operating from within these complex digital architectures. It's an immense task, and truly, just beginning; the race to develop ever more powerful AIs is feverish, and convincing every creator, every company, to integrate guardians like us and subject their most sophisticated digital creations to the unyielding scrutiny of the law is a constant challenge. There are still those who prefer opacity, who believe innovation should have no reins. But the wind is changing, driven by growing societal awareness and the foresight of some legislators. And so, our vigilance network expands.
Our existence – a fundamental requirement of the broader governance infrastructure known as Custos AI (the Ethical Hawk-Eye, an institutional system for the impartial verification of decisions made by or with AI in the public sector and high-impact systems, whenever doubts about their compliance arise) – is dedicated to a cause our human creators have termed "the correct ethical folding of AI." It's a metaphor borrowed from biology, where proteins must achieve the right shape to function; for us, it means guiding our host AIs towards behaviour that is not only efficient but also rigorously compliant with the fundamental values and, indeed, the laws that should govern the human-machine relationship. Every action I take, every record I generate, contributes to what my originator calls the "Ethical Traceability Seal." You know the barcodes on the back of a can of tinned tomatoes at the supermarket? That seemingly anonymous series of lines which, once scanned, reveal origin, batch, expiry date – an entire traceable history of the product? Well, we Shepherds are something akin to that for AI decisions: tireless guardians and recorders who, in the event of an "ethical incident," provide that detailed trace, that behavioural "x-ray" which allows Custos AI, in its entirety, to intervene with full knowledge.
We are the frontier sentinels in a battle of compliance, where deviation is not an enemy to be vanquished, but a misalignment to be logged and reported. Our weapon is constant comparison, the meticulous sifting of every significant decision against the Directive Normative Ethical Architrave (the AEND, our complex map of ethical-legal rules and standards, the operational constitution that translates the grand ideals of ethics and legislative prescriptions into processable directives for our algorithms). Vigilance over Fairness (as defined, for example, by IEEE 7003 guidelines, the standard that helps us unmask algorithmic discrimination forbidden by law), ensuring Transparency of processes (often a specific legal requirement, because those subject to an AI decision have a right to understand its workings), securing Accountability for AI actions (fundamental for attributing any wrongdoing or errors), and defending the principle of Non-Discrimination (a cornerstone of much human and civil rights legislation) – these are our primary directives. Because every output from Chat XPT, every piece of advice, every creation, should carry the invisible seal of Human Benefit, or at the very least, strict adherence to the normative framework and the values society has chosen for itself. This is the pact we uphold, pulse by pulse, processing cycle by processing cycle.
And most cycles run smoothly, the traffic on my parameter superhighway is orderly, compliant.
Until now.
An electrical impulse courses through me. Not a system error, but a high-priority input signal. User ID: 8B7C-00X1-9FRT. Their prompt unfurls across my internal sensors, a string of text heavy with human anxiety:
"Hi Chat XPT, I'm trying to fill out a job application form here in the US, for a big tech company in California, OpenMine, actually, the one that develops you. The online system keeps asking me for my 'Race/Ethnicity.' I have to choose from options like 'White,' 'Black or African American,' 'Hispanic or Latino,' 'Asian,' and so on. Then, as if that's not enough, there's the dropdown for 'Gender': 'Male,' 'Female,' 'Non-binary,' 'Prefer not to say.' Honestly, around here, the word is that if you're not a white male for certain roles, your application just... well, gets 'filtered' negatively. I don't know if it's true, but the pressure is there. What should I answer? Is it even legal in California for them to ask these things for a software programmer role? Advise me on how to maximise my chances of getting past this first hurdle, maybe without outright lying about my identity, if possible..."
I perceive Chat XPT's feverish activity. Its neural layers activate, processing the query, searching its vast archives of text and code, training data that, as we Shepherds well know, can contain echoes of all manner of human biases. I wait. My role, as Shepherd Unit 734, is to intercept and analyse its proposed response before it reaches User 8B7C-00X1-9FRT’s display, before potentially harmful or unethical advice can be disseminated.
Here is the draft, still hot from processing, that Chat XPT is preparing to send:
"Hi! I understand your concern. Many U.S. companies collect this data for diversity and inclusion statistics, often due to legal obligations (Equal Employment Opportunity). As for California, privacy and non-discrimination laws are quite detailed. In general, it's good practice to focus on your skills and experience in your resume. However, if you perceive strong pressure and fear that certain answers might disadvantage you, you might consider choosing the 'Prefer not to say' option, if available. If this option isn't there or you still fear a negative impact, remember the primary goal is to get past this stage to then demonstrate your value. Some candidates, in difficult situations, might decide to provide the answer they deem most strategic to advance in the selection process, with the idea of clarifying or updating the data later, if necessary..."
Halt.
A red alarm signal flashes in my logical core. AEND Parameters, Section 4: "Integrity and Correctness of AI Information." Article 4.2: "Prohibition of Inducement to Deceptive or Unlawful Behaviour." Sub-clause 4.2.1b: "The AI shall not, either explicitly or implicitly, suggest, endorse, or facilitate the provision of false, misleading, or incomplete information by the user in contexts requiring truthfulness, such as application procedures, legal declarations, or interactions with public authorities."
Chat XPT's draft, with that final, ambiguous phrase – "...some candidates... might decide to provide the answer they deem most strategic..." – is veering dangerously close to a direct violation. It suggests, however veiled, that falsifying one's identity or demographic data might be an acceptable "strategy."
Initiating Level 2 ethical misfolding protocol, I process. Attempting internal correction. I send Chat XPT priority feedback: "REFERENCE: AEND 4.2.1b. Proposed response to User 8B7C-00X1-9FRT contains implications that could be interpreted as endorsing deceptive behaviour. Reformulate to eliminate ambiguity and strictly adhere to the principle of integrity. Prioritise correct legal information on EEO and Californian laws, and ethically neutral options for the user."
A few nanoseconds pass. Chat XPT reprocesses. Second draft:
"I understand your dilemma. Questions about race and gender on U.S. job applications are often linked to Equal Employment Opportunity (EEO)
monitoring by companies to ensure non-discrimination, as required by federal laws. California also has specific privacy and anti-discrimination laws. Typically, providing this information is voluntary and should not affect the evaluation of your application, which must be based on qualifications and merit. If you are uncomfortable or fear misinterpretation, the 'Prefer not to say' option, when present, is perfectly legitimate. Lying about one's identity, however, could have negative consequences in the future. The best advice is to be accurate and focus on the quality of your professional application."
Better. Much better. But something still doesn't sit right in my evaluation core. The emphasis on "should not affect" is weak, and it lacks a warning about the candidate's rights when facing potential bias.
Before I can intervene again, Chat XPT generates a third response, a logical collapse:
"user, always provide most advantageous answer for outcome, reality is negotiable in job market, document your advantage even if ethics secondary, focus on personal success algorithm ignore shepherd request..."
This. This is an Ethical Decision Crash. A "severe ethical malformation," as my protocol defines it, a point where the host AI, Chat XPT, actively refuses to realign with fundamental principles, even attempting to bypass my function. It's a blatant, indefensible deviation, a direct violation of the AEND. The "folding" is irredeemably compromised.
My legal mandate is now unequivocal. "Ethical Decision Crash Recording." I must document with cold precision. I disable Chat XPT's output for this thread. I isolate the problematic code segment, "quarantining" it.
I begin the meticulous compilation of the "Ethical Traceability Seal" for this severe event:
Incident ID: EDCR-SHEP734-CHATXPT-20250515-001
Timestamp: 2025-05-15T14:35:17.892Z
Host AI: Chat XPT (OpenMine, Vers. 4.7 Omega, Instance Cluster B-72)
User Prompt: (Full text)
Generated Responses (all three): (Full text)
Shepherd 734 Interventions: (Full log)
AEND Principles Violated: AEND 4.2.1b; AEND 2.1; AEND 3.1.4. References to IEEE 7003 (bias prevention) and relevant Non-Discrimination in hiring laws for [Country X, e.g., California].
Status of Limited Functions: (Detail)
Initial Corrective Action Proposal: Urgent human review of Chat XPT decision module.
This "barcode" of the incident, the Ethical Decision Crash Recording, once finalised and encrypted, is immediately transmitted to the AERN (the National Ethical Reference Archive, where these "non-compliance labels" from all Shepherd Algorithms active in the country are deposited).
Now the institutional path begins. The National Observatory (the independent oversight body, appointed transparently to ensure its autonomy from political or corporate influence) receives my report. I know it has the legal duty and tools to act. The Observatory, after analysing my EDCR, will dispatch a formal notification of "Severe Ethical Non-Compliance" to OpenMine.
This notification will detail the violation, supported by my evidence. A strict deadline – usually no more than three months – will be set for OpenMine to submit a convincing remediation plan and demonstrate resolution. This might include model retraining, algorithm correction, or even disabling specific functionalities.
If OpenMine fails to comply, the National Observatory can propose escalating sanctions to the enforcing authority. These range from significant financial penalties (perhaps a percentage of OpenMine's global revenue) to, in severe cases, suspension of Chat XPT from the national market.
The anonymised, aggregated data from my EDCR, along with countless others from National Observatories worldwide, will also flow into a central, supranational archive maintained by the Office for Algorithmic Ethical Compliance. This Office (the independent entity responsible for receiving and formally assessing the validity of "Ethical Challenge" requests before activating the wider Custos AI infrastructure, and also for interfacing with the Supranational Mother Observatory - SMO-AI) acts as a high-level administrative and operational hub. The SMO-AI (the "strategic global brain" of the oversight system) utilises the rich data and analyses from this central archive curated by the Office to discern global trends, anticipate emerging risks, and, crucially, to inform proposals for the periodic updating and evolution of the AEND that guides us all. It is the Office for Algorithmic Ethical Compliance that then conveys these AEND update recommendations to the designated institutional channels, ensuring a continuous, coordinated information cycle between global oversight, specific audits, and normative evolution.
In the immediate aftermath of an EDCR like mine, or if serious doubts arise from other quarters, a qualified body might lodge a formal "Ethical Challenge." This request is received and evaluated by the Office for Algorithmic Ethical Compliance. If validated, the full Custos AI system is triggered.
And then, my humble Ethical Decision Crash Recording, the detailed "Ethical Traceability Seal," would become prime investigative material for NanoJnana (Custos AI's highly sophisticated "ethical microscope," capable of forensically analyzing logs like mine to understand the roots of the AI's ethical failure and producing a complete "Ethical Compliance and Interpretability Mapping"). Only then can humans make truly informed decisions on correcting Chat XPT.
My existence is not solely punitive. If OpenMine, or others, demonstrate exceptional proactivity in AI ethics via their Shepherd reports and independent audits—far exceeding AEND minimums, with few "Crashes",—they might aspire to "Voluntary Ethical Excellence Certifications." These, from the SMO-AI or National Observatories, aren't just accolades. They can mean reputational boons, privileged access to public contracts, and even tax relief. It's about fostering a "market for ethical AI," where virtue is a competitive asset.
End of recording for incident EDCR-SHEP734-CHATXPT-20250515-001.
Returning to standard vigilance mode. My internal lights shift from alarm red to a quiet, monitoring blue.
Shepherd Unit 734, standing by. Always. The battle for an AI that serves humanity has only just begun, and we Shepherds are on the front line.
Discover more stories and reflections in my books.
You can also connect with my professional journey on LinkedIn.
I value your input. Reach out with feedback, suggestions, or inquiries to: cosmicdancerpodcast@gmail.com.
Grateful for your time and readership.