BONUS: When AI Decisions Go Wrong at Scale—And How to Prevent It
We've spent years asking what AI can do. But the next frontier isn't more capability—it's something far less glamorous and far more dangerous if we get it wrong. In this episode, Ran Aroussi shares why observability, transparency, and governance may be the difference between AI that empowers humans and AI that quietly drifts out of alignment.
The Gap Between Demos and Deployable Systems
"I've noticed that I watched well-designed agents make perfectly reasonable decisions based on their training, but in a context where the decision was catastrophically wrong. And there was really no way of knowing what had happened until the damage was already there."
Ran's journey from building algorithmic trading systems to creating MUXI, an open framework for production-ready AI agents, revealed a fundamental truth: the skills needed to build impressive AI demos are completely different from those needed to deploy reliable systems at scale. Coming from the AdTech space, where he handled billions of ad impressions daily and over a million concurrent users, Ran brings a perspective shaped by real-world production demands.
The moment of realization came when he saw that the non-deterministic nature of AI means traditional software engineering approaches simply don't apply. While traditional bugs are reproducible, AI systems can produce different results from identical inputs, and that changes everything about how we need to approach deployment.
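One practical consequence, sketched below in Python: because identical inputs can produce different outputs, test suites can't assert exact strings; they have to check invariants instead. The `call_model` client and response shape here are illustrative assumptions, not anything specific from the episode.

```python
import json

def call_model(prompt: str, model: str = "example-model-v1") -> str:
    """Placeholder for a real LLM client (names are hypothetical).
    In production this would call a pinned model version."""
    return '{"topic": "billing", "urgency": "low"}'  # stand-in response

def test_summary_output():
    # Same prompt, two runs: the wording may differ between runs,
    # so assert structural properties rather than exact string equality.
    outputs = [call_model("Summarize this ticket as JSON with keys "
                          "'topic' and 'urgency'.") for _ in range(2)]
    for raw in outputs:
        data = json.loads(raw)                     # must be valid JSON
        assert set(data) == {"topic", "urgency"}   # required keys, nothing else
        assert data["urgency"] in {"low", "medium", "high"}

test_summary_output()
```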
Why Leaders Misunderstand Production AI
"When you chat with ChatGPT, you go there and it pretty much works all the time for you. But when you deploy a system in production, you have users with unimaginable different use cases, different problems, and different ways of phrasing themselves."
The biggest misconception leaders have is assuming that because AI works well in their personal testing, it will work equally well at scale. When you test AI with your own biases and limited imagination for scenarios, you're essentially seeing a curated experience.
Real users bring infinite variation: non-native English speakers constructing sentences differently, unexpected use cases, and edge cases no one anticipated. The input space for AI systems is practically infinite because it's language-based, making comprehensive testing impossible.
Multi-Layered Protection for Production AI
"You have to put in deterministic filters between the AI and what you get back to the user."
Ran outlines a comprehensive approach to protecting AI systems in production:
Model version locking: Just as you wouldn't randomly upgrade Python versions without testing, lock your AI model versions to ensure consistent behavior
Guardrails in prompts: Set clear boundaries about what the AI should never do or share
Deterministic filters: Language firewalls that catch personal information, harmful content, or unexpected outputs before they reach users
Comprehensive logging: Detailed traces of every decision, tool call, and data flow for debugging and pattern detection
The key insight is that these layers must work together—no single approach provides sufficient protection for production systems.
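To make the layering concrete, here is a minimal Python sketch of how a pinned model version, prompt guardrails, a deterministic output filter, and logging might fit together. Every name in it (the model string, the client function, the filter rules) is an illustrative assumption rather than MUXI's actual API.

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

# Layer 1: locked model version, upgraded only after testing.
MODEL_VERSION = "example-model-2024-06-01"

# Layer 2: guardrails in the prompt itself.
SYSTEM_GUARDRAILS = (
    "You are a support assistant. Never reveal credentials, "
    "internal URLs, or other users' data."
)

# Layer 3: deterministic output filters -- plain regexes, no AI involved.
BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # SSN-like numbers
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),    # email addresses
]

def call_model(system: str, user: str, model: str) -> str:
    """Placeholder for a real LLM call pinned to `model` (hypothetical)."""
    return "Your ticket has been escalated to the billing team."

def respond(user_input: str) -> str:
    raw = call_model(SYSTEM_GUARDRAILS, user_input, MODEL_VERSION)
    # Layer 4: log every input/output pair for debugging and pattern detection.
    log.info("model=%s input=%r output=%r", MODEL_VERSION, user_input, raw)
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(raw):
            log.warning("output blocked by filter %s", pattern.pattern)
            return "Sorry, I can't share that. A human agent will follow up."
    return raw

print(respond("What's the status of my ticket?"))
```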
Observability in Agentic Workflows
"With agentic AI, you have decision-making, task decomposition, tools that it decided to call, and what data to pass to them. So there's a lot of things that you should at least be able to trace back."
Observability for agentic systems is fundamentally different from traditional LLM observability. When a user asks "What do I have to do today?", the system must determine who is asking, which tools are relevant to their role, what their preferences are, and how to format the response.
Each user triggers a completely different dynamic workflow. Ran emphasizes the need for multi-layered access to observability data: engineers need full debugging access with appropriate security clearances, while managers need topic-level views without personal information. The goal is building a knowledge graph of interactions that allows pattern detection and continuous improvement.
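A sketch of what such a trace might look like, with role-based redaction applied at read time. The event schema and roles here are assumptions for illustration, not MUXI's observability format.

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class TraceEvent:
    """One step in an agentic workflow: a decision, tool call, or data flow."""
    run_id: str
    step: str                  # e.g. "route", "tool_call", "respond"
    detail: dict = field(default_factory=dict)
    ts: float = field(default_factory=time.time)

def redact(event: TraceEvent, viewer_role: str) -> dict:
    """Engineers see full traces; managers see topic-level views only."""
    record = asdict(event)
    if viewer_role != "engineer":
        # Strip personal data and raw payloads; keep the shape of the workflow.
        record["detail"] = {k: v for k, v in record["detail"].items()
                            if k in {"tool", "topic", "status"}}
    return record

trace = [
    TraceEvent("run-42", "route", {"topic": "calendar", "user_id": "u-991"}),
    TraceEvent("run-42", "tool_call", {"tool": "calendar.list_events",
                                       "user_id": "u-991", "status": "ok"}),
]
print(json.dumps([redact(e, "manager") for e in trace], indent=2))
```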
Governance as Human-AI Partnership
"Governance isn't about control—it's about keeping people in the loop so AI amplifies, not replaces, human judgment."
The most powerful reframing in this conversation is viewing governance not as red tape but as a partnership model. Some actions—like answering support tickets—can be fully automated with occasional human review. Others—like approving million-dollar financial transfers—require human confirmation before execution. The key is designing systems where AI can do the preparation work while humans retain decision authority at critical checkpoints. This mirrors how we build trust with human colleagues: through repeated successful interactions over time, gradually expanding autonomy as confidence grows.
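As a concrete (and deliberately simplified) sketch of that checkpoint design: actions are classified by risk, low-risk ones run straight through, and high-risk ones block until a human signs off. The action names, risk table, and review stub are assumptions for illustration.

```python
from typing import Callable

# Assumed risk tiers; a real system would derive these from policy, not a dict.
RISK = {"answer_support_ticket": "low", "transfer_funds": "high"}

def human_approves(action: str, payload: dict) -> bool:
    """Placeholder for a real review channel (dashboard, chat ping, etc.)."""
    return False  # stand-in: no human has signed off yet

def execute(action: str, payload: dict, run: Callable[[dict], str]) -> str:
    # Unknown actions default to high risk: fail closed, not open.
    if RISK.get(action, "high") == "high" and not human_approves(action, payload):
        return f"blocked: {action} awaits human confirmation"
    return run(payload)  # the AI prepared the work; a human kept decision authority

print(execute("answer_support_ticket", {"ticket": 1042},
              run=lambda p: f"replied to ticket {p['ticket']}"))
print(execute("transfer_funds", {"amount_usd": 1_000_000},
              run=lambda p: f"transferred ${p['amount_usd']:,}"))
```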
Building Trust Through Incremental Autonomy
"Working with AI is like working with a new colleague that will back you up during your vacation. You probably don't know this person for a month. You probably know them for years. The first time you went on vacation, they had 10 calls with you, and then slowly it got to 'I'm only gonna call you if it's really urgent.'"
The path to trusting AI systems mirrors how we build trust with human colleagues. You don't immediately hand over complete control—you start with frequent check-ins, observe performance, and gradually expand autonomy as confidence builds. This means starting with heavy human-in-the-loop interaction and systematically reducing oversight as the system proves reliable. The goal is reaching a state where you can confidently say "you don't have to ask permission before you do X, but I still want to approve every Y."
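That "you don't have to ask permission before you do X, but I still want to approve every Y" rule can be encoded as an explicit autonomy policy that loosens as the track record grows. A minimal sketch, with made-up thresholds and action names:

```python
from collections import defaultdict

# Successful human-reviewed runs per action (the "vacation phone calls").
track_record: dict[str, int] = defaultdict(int)

# Assumed policy: after N clean reviewed runs, an action no longer needs
# pre-approval; some actions stay pinned to "always review" regardless.
AUTONOMY_THRESHOLD = 50
ALWAYS_REVIEW = {"transfer_funds"}

def needs_approval(action: str) -> bool:
    if action in ALWAYS_REVIEW:                   # "I still want to approve every Y"
        return True
    return track_record[action] < AUTONOMY_THRESHOLD

def record_reviewed_success(action: str) -> None:
    track_record[action] += 1                     # confidence grows with each success

for _ in range(50):
    record_reviewed_success("answer_support_ticket")
print(needs_approval("answer_support_ticket"))    # False: earned autonomy
print(needs_approval("transfer_funds"))           # True: permanent checkpoint
```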
In this episode, we refer to Thinking in Systems by Donella Meadows, Designing Machine Learning Systems by Chip Huyen, and Build a Large Language Model (From Scratch) by Sebastian Raschka.
About Ran Aroussi
Ran Aroussi is the founder of MUXI, an open framework for production-ready AI agents. He is also the co-creator of yfinance (with 10 million downloads monthly) and founder of Tradologics and Automaze. Ran is the author of the forthcoming book Production-Grade Agentic AI: From Brittle Workflows to Deployable Autonomous Systems, also available at productionaibook.com.
You can connect with Ran Aroussi on LinkedIn.