Your Showcase Primer: TypeSafe AI, Fleet AI, Goodfire
Building AI systems we can actually trust
🌟 Engineers — want to meet these three founders and seven others? Apply to attend our March 25th SF Startup Showcase.
In the not-so-distant future, AI systems will help decide which transactions clear in financial markets, how power flows across energy grids, and which medical cases get escalated to doctors. These systems are becoming powerful enough to control real infrastructure that affects nearly everyone on the planet. The largest AI labs are racing to build more capable models, but they’re paying far less attention to the infrastructure needed to deploy them safely. Before AI can run the systems we depend on, we need ways to understand how it thinks, test how it behaves, and secure the environments it operates in.
Diogo Almeida, Founder of TypeSafe AI
You built an AI agent that’s supposed to process customer refunds. It reads the support ticket, looks up the order, and calls the refund API. The first few tickets run smoothly. Then one comes in that mentions a partial refund. The model decides it needs a helper function called calculate_partial_refund() and confidently calls it.
That function doesn’t exist.
Instead of stopping, the agent invents the logic, calls the wrong API endpoint, and refunds the entire order. Not once, but for every similar ticket that follows. Nothing crashed. The logs look fine. The system just quietly made the wrong decision, hundreds of times over. These moments reveal the real challenge of this era of AI: the models are already powerful enough to automate meaningful work, but without reliability they’re useless.
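A common mitigation today is to validate every tool call against an explicit registry before executing anything, so a hallucinated function fails loudly instead of silently. Here’s a minimal sketch of that pattern; the tool names and refund scenario are hypothetical illustrations, not TypeSafe’s actual API:

```python
from typing import Any, Callable

# Hypothetical registry of tools the agent is actually allowed to call.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {}

def register_tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Add a function to the set of callable tools."""
    TOOL_REGISTRY[fn.__name__] = fn
    return fn

@register_tool
def issue_refund(order_id: str, amount_cents: int) -> dict:
    # Placeholder for the real refund API call.
    return {"order_id": order_id, "refunded_cents": amount_cents}

def execute_tool_call(name: str, **kwargs: Any) -> Any:
    """Reject any tool the model invented instead of silently guessing."""
    if name not in TOOL_REGISTRY:
        raise ValueError(f"Model requested unknown tool: {name!r}")
    return TOOL_REGISTRY[name](**kwargs)

# The model hallucinates calculate_partial_refund(); this now fails loudly.
try:
    execute_tool_call("calculate_partial_refund", order_id="A123", fraction=0.5)
except ValueError as err:
    print(err)  # -> Model requested unknown tool: 'calculate_partial_refund'
```

Guardrails like this catch the failure only after the model has already gone wrong; the deeper question is whether the model should have hallucinated the call at all.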
That’s the problem TypeSafe is trying to solve. Instead of optimizing models for chat or creativity, the team is rebuilding the stack from first principles to create what they call transformative AI. Their model, System1, is designed for machine-to-machine execution rather than human conversation. The focus is reliability in environments where correctness and latency are non-negotiable: real-time decision systems, compliance pipelines, large-scale automation, and autonomous agents operating without constant human supervision.
Rather than bolting guardrails onto probabilistic models after the fact, TypeSafe is embedding reliability directly into the model itself. If that approach works, the payoff could be enormous. AI could finally take on the tedious but essential work that keeps companies running: compliance checks, operational workflows, data reconciliation, and thousands of small decisions that currently require human oversight.
The team behind the company makes this big vision feel achievable. Diogo Almeida, one of the founders of TypeSafe, previously worked at OpenAI, where he helped develop reinforcement learning from human feedback (RLHF), the technique that made systems like ChatGPT actually follow human intent. That work became one of the most important breakthroughs in modern AI. Now Diogo is asking a different question: if the last generation of models learned how to talk to humans, what does the next generation need to look like if it’s going to run real systems? If you’re curious where frontier AI might go after the current wave of LLMs, this is a rare opportunity to hear from someone who helped shape the last one and is now building what comes next.
Nicolai Ouporov, Founder of Fleet AI
Imagine an AI agent helping coordinate activity inside a busy emergency room.
A patient is rushed in after a car accident. His blood pressure is falling. The triage nurse enters vitals, lab tests are ordered, and imaging needs to be scheduled immediately. Someone has to decide—fast—whether the patient should go straight to surgery, to a CT scanner, or to trauma observation. In a real hospital, those decisions happen in seconds and mistakes can cost a life.
In a simulation environment like the ones Fleet AI founder Nicolai Ouporov is building, an AI agent can practice these scenarios safely. The system might initially make the wrong call. It might send the patient to imaging instead of escalating the case to the trauma team. But the simulation shows what happens next—the patient deteriorates, alarms trigger, and the delay becomes visible in the outcome.
Because everything is simulated, the agent can run through thousands of emergency scenarios: internal bleeding, stroke symptoms, sepsis alerts, conflicting lab signals. Each time it sees the consequences of its decisions and adjusts its reasoning.
Over time, the system begins to recognize the subtle patterns that signal a life-threatening situation and learns when to escalate immediately. The goal isn’t to replace doctors. It’s to train AI systems in realistic environments where they can develop judgment before they are ever trusted with real patients.
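Under the hood, this is the classic reinforcement learning loop: the agent acts, the simulated world responds, and the outcome becomes a training signal. Here’s a toy sketch of what such a loop could look like; the TriageEnv class, its actions, and its reward scheme are invented for illustration and are not Fleet’s actual interface:

```python
import random

ACTIONS = ["surgery", "ct_scan", "trauma_observation"]

class TriageEnv:
    """Toy simulated ER: one triage decision per episode."""

    def reset(self) -> dict:
        # A drastically simplified patient state; real environments are far richer.
        self.patient = {
            "bp_systolic": random.randint(70, 140),
            "internal_bleeding": random.random() < 0.3,
        }
        return self.patient

    def step(self, action: str) -> tuple[float, bool]:
        # Score the decision against the (hidden) ground-truth outcome.
        needs_surgery = (
            self.patient["internal_bleeding"] and self.patient["bp_systolic"] < 90
        )
        if needs_surgery:
            reward = 1.0 if action == "surgery" else -1.0  # delay is costly
        else:
            reward = 1.0 if action != "surgery" else -0.5  # unnecessary surgery
        return reward, True  # episode ends after one decision

def policy(state: dict) -> str:
    # Stand-in for the agent; a trained model would go here.
    return random.choice(ACTIONS)

env = TriageEnv()
total = 0.0
for episode in range(1000):  # thousands of scenarios, consequences included
    state = env.reset()
    reward, done = env.step(policy(state))
    total += reward
print(f"average reward: {total / 1000:.3f}")
```

The hard engineering lives in making the environment rich enough that the reward actually reflects patient outcomes, not just a convenient proxy.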
That kind of training becomes essential as agents start helping with complex operational work. Before we trust software to coordinate systems that affect people’s health, finances, or safety, it needs the equivalent of residency training. Simulation environments give agents a place to build that experience.
For engineers, building those environments is a pretty cool problem. You’re creating digital worlds where intelligent systems can practice. That requires combining reinforcement learning, simulation, distributed systems, and developer tooling to design environments that are realistic enough to expose failure modes but flexible enough for agents to explore and improve. It also means thinking deeply about how to generate meaningful scenarios, measure progress, and surface the kinds of mistakes that only appear when systems interact in complicated ways. As agents start operating across thousands of tools and services on the internet, the need for this kind of infrastructure will only grow. Fleet sits right at that frontier.
Tom McGrath, Founder of Goodfire
In 2012, scientists realized that a strange molecular mechanism buried inside bacterial DNA could be turned into a powerful gene-editing tool. No one had been looking for it; researchers stumbled across it almost by accident. That system became CRISPR, and it transformed biology almost overnight. What’s striking in hindsight is that the mechanism had been sitting in genomes for decades before anyone understood what it did. Today something similar may be happening inside AI models. Large neural networks are being trained on enormous datasets in biology, chemistry, and medicine, learning patterns humans can’t easily see. But the internal reasoning behind those discoveries often remains opaque. If the next CRISPR-like insight is already hiding inside a model, we currently have no good way to find it.
While working as a researcher at Google DeepMind, Tom McGrath spent years studying how large neural networks actually work. What he kept running into was the same problem: even the people building the most advanced AI systems in the world didn’t really understand what was happening inside them. Modern models can reason, write code, and discover new patterns in science, but their internal decision-making still looks like a black box. McGrath became one of the early pioneers of a field called mechanistic interpretability, which tries to reverse-engineer neural networks and map the concepts inside them. After helping found the interpretability effort at DeepMind, he decided the problem was too important to stay purely academic and joined forces with Eric Ho and Dan Balsam to start Goodfire.
Goodfire is building tools to help researchers and engineers see inside modern AI models and better understand how they reason. Interpretability has been one of the most important unsolved challenges in AI for years. If we want to trust these systems with meaningful tasks, we need ways to inspect their behavior, identify when something has gone wrong, and understand why a model made a particular decision. Goodfire’s work aims to turn what is currently a black box into something closer to a system engineers can actually debug. If we turn the internet over to thousands of agents acting on our behalf, tools that help us understand what those systems are thinking will become an essential part of the stack.
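To make “seeing inside” concrete: one of the most basic building blocks of interpretability work is capturing a model’s intermediate activations so you can study what each layer computes. A minimal PyTorch sketch using forward hooks; the tiny model is a stand-in, and this shows a generic technique rather than Goodfire’s tooling:

```python
import torch
import torch.nn as nn

# A stand-in model; interpretability work targets much larger networks.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)

activations: dict[str, torch.Tensor] = {}

def capture(name: str):
    def hook(module, inputs, output):
        activations[name] = output.detach()  # save what the layer computed
    return hook

# Register a hook on every layer so a forward pass records its internals.
for name, module in model.named_modules():
    if name:  # skip the top-level container
        module.register_forward_hook(capture(name))

model(torch.randn(1, 16))  # one forward pass populates `activations`
for name, act in activations.items():
    print(name, tuple(act.shape))
```

Mechanistic interpretability starts from raw signals like these and tries to work upward to the concepts and circuits a model is actually using.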
The internet is turning into a place where software acts on our behalf. Agents will read documentation, move money, route resources, escalate medical cases, and coordinate complex systems we depend on every day. But before we trust them with that responsibility, we need an entirely new layer of infrastructure: models that behave reliably, environments where they can learn safely, and tools that let engineers understand what they’re doing.
That’s the layer TypeSafe, Fleet, and Goodfire are building. If you’re an engineer interested in developing the systems that will make AI trustworthy enough to run the world’s infrastructure, this is a rare chance to meet the founders defining this future.
Get excited to meet these three founders! Apply to attend our March 25th SF Startup Showcase.