AI Agents: A 5-Rule Framework and What They Really Mean for you
Big Tech’s ethical frameworks sound nice—until you realize the rules are mostly for you, not them.
On August 4, 2025, Anthropic released its " Framework for Developing Safe and Trustworthy Agents ."
An elegant document, full of good intentions and, above all, a magic word that, in various guises, recurs over and over again: "should." You should develop agents transparently. You should keep them aligned with human values. You should protect privacy. You should make them secure. I counted about fifteen of them in different forms.
Reading this felt like hearing a politician preach “collaboration” right after pulling off something questionable — a glossy “we’re all in this together” vibe that smells like legal disclaimers from a mile away.
And all of this was the last straw for me to set the record straight, even if it's August and we should all be calmer (and nicer at Christmas instead).
I'm not just angry with Anthropic, let's be clear. It applies to everyone: Google, OpenAI, Amazon—each is raising the tone and emphasis on the ethics and responsibility we must have when using THEIR solutions.
AI Agents: a step back
AI Agents have exploded this year, probably in 2026 we will only be talking about this, and in 2027 they will be the norm.
What do they are? Anthropic says:
Think of an agent like a virtual collaborator that can independently handle complex projects from start to finish — all while you focus on other priorities.
(AI Powered, of course)
But I remember the first time I saw one in action. It was the summer of 2023, and everything was "simpler." Despite that, I was impressed by the indomitable nature of these tools, which, back then, only risked crashing into a wall without causing any harm. But this year... we're seeing them put to the test and causing real trouble, with the effect of worry platforms and producers and multiplying statements like Anthropic's day after day.
A few examples of embarrassments (I'll mention just three cases):
Real file deletions by Google Gemini CLI ( WinBuzzer )
Complete erasure of production database on Replit ( The Register )
Accidental Distribution of Destructive Exploits by Amazon AI Agents ( 404 Media ).
The producers' approach
Sam Altman, Dario Amodei and the various CEOs and CTOs are repeating concepts that, when summarized, sound something like this: "We are creating nuclear weapons, but we must work together to contain their impact. In the meantime, let's move forward because, you know, there's nothing we can do about it, much less stop ourselves."
Anyone who knows me knows how much I care about AI Fluency — the ability to use these tools with awareness and skill. It’s what helps us navigate even slightly rough waters.
But the truth is, even the most prepared users often find themselves down in the ship’s hold, surrounded by iron pots. And the clay ones — the ones that shatter with the slightest bump, while no one even notices — that’s us. Shared responsibility is essential, of course, but manufacturers can't entirely shift the blame for their systems' disasters onto users. There are limits to everything.
So, take the manifesto, rewritten with a somewhat polemical tone, while still maintaining that each of the underlying principles is sacrosanct. It should simply be bilateral.
1. Human in the Loop
They say: "We put the human at the center to ensure safety and control."
The inconvenient truth: Humans are not at the center of ethical philosophy, but because AI models are far from perfection, and we don't know how to manage the complexity of ten synthetic brains pulling in different directions.
Delegating to an agent isn't like giving the keys to your house to a friend: it's like entrusting a nuclear power plant to ten brilliant but psychopathic interns. I put ten agents on a training project: in an hour, they produced the equivalent of 600 pages (as a final result, there's also a lot of unfinished work behind the scenes). Incredible speed.
Then they stopped, and one of them said to me, "We're done: now you check."
Human in the loop, perfect.
Can I trust them? No.
Does the system manufacturer know about this? Yes.
And so I have to spend 20-30 hours revising or accept living with the anxiety of publishing garbage.
Human-in-the-loop isn't a philosophical option: it's the only lifeline we have when models fail. And this will continue for many years to come. But presenting it as an ethical choice, when it's merely a technical necessity, is pure marketing.
Human in the Loop sometimes seems to me like a pure illusion of control: those who know how to design and work well with agents let them finish and then check. They trust them because they know how to work with them. They know how to work with them because they've spent tens or hundreds of hours with a model and its prompts.
The point is that managing agents require experience. Click here if you want to learn more.
Without filters: We humans are here because agents often make mistakes. Without us, in most cases, it would be a disaster waiting to happen, especially at the beginning of our experience with them. And this would lead us to say that they simply don't work.
2. Transparency
They say, "Every action must be logged and visible."
The inconvenient truth: Logs are useful, indeed. But true transparency would be opening the black box of the model. With an LLM, that transparency doesn't exist. So you can see what an agent does, but not why. We have to make do.
Anthropic boasts its "real-time to-do checklist" in Claude Code. Nice, I see what it's doing. But when it decides to delete a folder instead of archiving it, the log will tell me "Action: delete folder." It won't tell me why it interpreted "system" as "delete" instead of "organize."
True transparency would mean explaining the internal reasoning, not just the final output. But that would require admitting that they often don't even know why their models do what they do. Even though Anthropic is honestly trying.
Without Filters: It does the job. The how and why are another story.
3. Alignment with human values
They say: "Agents must reflect human values and ethics."
The inconvenient truth: Maybe, if we are good, we users can transmit 10% of our values to an agent.
The problem isn't that we write bad instructions: it's that manufacturers can't or won't contain the models 100%.
And so the task of "civilizing" the creature falls to us, every time. If I have to summarize a news story or analyze a photo, I don't want to have to review Aristotle and the Universal Declaration of Human Rights to make sure the bot doesn't screw up.
Human values are complex, contextual, culturally situated, and often implicit. Trying to compress them into a prompt or a fine-tuning exercise is like trying to teach empathy with an instruction manual.
The result? Agents who can say "I'm a responsible AI" but then suggest bomb recipes when you flatter the model a little.
No Filters: We try not to mess things up, you try to do your best and keep it on the right side.
4. Protecting Privacy
They say: "Protecting user privacy is paramount."
The inconvenient truth: Everything sounds good on paper. But without a serious technical analysis of how the data is managed (From suppliers), these are just empty phrases.
And back to point 2: true transparency (the one that matters) is never on the table. How does my training data fare? Where do my prompts end up? Really? How are they used to improve the models? Really?
And the most interesting ones: What is being used of my data that I'm not aware of? What about this isn't yet regulated?
"Trust me, bro" isn't a privacy policy. It's a leap of faith.
And when you've seen tech companies say for years that "your data is safe" only to discover breaches, backdoor sales, and misuse, faith is in short supply.
The problem is that true privacy would require completely local models or federally distributed architectures (sorry to get technical). But this clashes with the "everything in the cloud, everything under our control" business model. It's a huge issue. But don't forget it, especially when working with other people's data.
Without Filters: Okay, we're not using your data to train new models (we don’t really need it — we’ve got better stuff now). But we're keeping it anyway, you know, just in case.
5. Make interactions safe
They say: "Agents must defend themselves against prompt injection and external attacks."
(Prompt injection is a technique that involves inserting hidden commands or deceptive instructions into text to cause the AI to perform unintended actions.)
The inconvenient truth: True, but it shouldn't be my job to teach and write 80% of the prompts to protect the agent from attacks. Every time I do this, I'm paying tokens to patch the holes left by those who sell the model.
And no, I'm not a full-time security engineer: I want to do my job with a minimum of peace of mind.
Anthropic, Google, and OpenAI admit that "prompt injection is a major concern for the wider use of agents."
Well, then why do I have to build prompt fortresses every time I want to ask Claude to do something more complex than "text hello"? Security should be built-in, not an optional extra I have to assemble with tape and prayers.
No Filters: Here are the car keys—or wait, actually, this car doesn’t have keys. Just be careful it doesn’t get stolen, it’s really fast and can cause serious damage. Oh, and mind the brakes—they’re a bit... unreliable.
So what...
Here we are, thinking about how to work with tools that are both incredibly powerful and incredibly fragile.
Are ethical frameworks for manufacturers necessary? Yes. And it's very important that they continue to talk about them.
Are they enough? Absolutely not.
The point isn't that these principles are flawed. Human-in-the-loop, transparency, alignment, privacy, and security are all valid concepts.
The problem is the approach: it's too easy to transform technical problems into ethical requirements and shift the blame onto users.
When a tech CEO talks about atomic weapons, they’re not sounding the alarm — they’re drafting a liability waiver. They’re laying the legal groundwork for when things inevitably go wrong. “Hey, we told you it was dangerous. You should’ve followed the guidelines.”
Shared responsibility is sacrosanct. But it must be honest. If your agent deletes the production database because he misunderstood "fix backup," the fault isn't prompt injection or poor human supervision. It's the fact that you've put something into production that the manufacturer can't control.
This, to me, is the real meaning of awareness: knowing we’re dealing with unstable material.
And that’s exactly why so many companies hesitate to embrace it—not because it fails, but because they know better than to touch it bare-handed, without gloves or proper guarantees from the ones who made it.
My wish?
That everyone — including those who build AI models and services — truly embraces responsibility and transparency, especially when it comes to admitting their limits in how these tools are meant to be used.
I know, marketing doesn’t really allow for that.
But we need a real commitment to building tools that can be used safely without requiring a master’s degree in behavioral psychology or cybersecurity.
Because right now, that’s exactly what it feels like we’re being asked to have.
Because, ultimately, it's not a matter of knowing how to write bulletproof prompts. It's about having tools that are smart enough to collaborate with that won't explode at our first mistake.
And that, dear producers, is your job. Not ours.
Or give me a 90% discount and tell me I'm a beta tester.
Massimiliano