
EP 003

29 Min

Two Years Running an AI Agent in Production | Eli Winderbaum

Eli Winderbaum has run an AI support agent in production for nearly two years. He shares what 65% resolution looks like at Mirage, the jobs AI is changing, and the month-to-an-hour feedback loop.

His agent resolves 65% of inbound messages at Mirage, and it has already changed which support jobs exist. Eli is one of the CX leaders featured in our Hall of Fame.

Most CX leaders are still planning their first AI deployment. Eli Winderbaum is two years into his. As Head of Customer Experience at the generative video company Mirage, after a career across BetterCloud, Clarity Money, and Marcus by Goldman Sachs, he has lived the transition most teams are only starting. On this episode of the Fini Podcast, he shared what AI support actually looks like in production, the roles it is reshaping, and the feedback loop that now closes in an hour.

Meet Eli Winderbaum

Eli has spent 12 years building customer experience organizations across demanding environments: enterprise SaaS at BetterCloud, fintech at Clarity Money, regulated consumer banking at Marcus by Goldman Sachs, and now generative video at Mirage, where support reaches millions of creators. He went all in on AI-first support nearly two years ago, which makes him one of the few leaders with real production scar tissue rather than demo impressions.

The jobs AI is actually changing

Eli renamed his AI agent "tier zero," and it now resolves 65% of inbound messages. His five former tier-one agents have all been promoted to cross-functional tier-two work, like owning feature requests and bug prioritization, instead of waiting for the next ticket. He flags two less obvious roles already shifting. The documentation manager: instead of hiring one, his team wired Linear and GitHub changes straight into their knowledge base so docs stay current for customers, human agents, the AI, and the LLMs reading them. And the QA manager: AI reads every conversation, sets a baseline score, and lets humans zero in on the outliers instead of spending fifteen minutes per ticket.
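To make the QA triage concrete, here is a minimal Python sketch. It assumes an AI model has already given each conversation a numeric quality score. The `Conversation` type, the 1-to-5 scale, and the sample IDs are hypothetical. Following Eli's description, the sketch uses the average score as the baseline and returns only the conversations that fall below it, so a human reviews those instead of reading every ticket.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Conversation:
    # Hypothetical record: an ID plus an AI-assigned quality score (assumed 1-5 scale).
    conversation_id: str
    ai_score: float


def flag_for_review(conversations: list[Conversation]) -> list[Conversation]:
    """Return conversations scoring below the average, lowest score first."""
    if not conversations:
        return []
    # The baseline is the mean AI score across every conversation.
    baseline = mean(c.ai_score for c in conversations)
    # Only below-baseline conversations are routed to a human reviewer.
    below = [c for c in conversations if c.ai_score < baseline]
    return sorted(below, key=lambda c: c.ai_score)


if __name__ == "__main__":
    sample = [
        Conversation("c1", 4.5),
        Conversation("c2", 2.0),
        Conversation("c3", 4.0),
        Conversation("c4", 3.5),
    ]
    # The mean is 3.5, so only c2 is flagged (c4 sits exactly on the baseline).
    for c in flag_for_review(sample):
        print(c.conversation_id, c.ai_score)
```

Using the mean as the cutoff is the simplest reading of "fall below your average". A team could swap in a fixed threshold or a percentile without changing the overall shape.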

The feedback loop that went from a month to an hour

The change Eli is most excited about is speed from complaint to fix. Years ago at BetterCloud, a customer request meant a human reply, a flag for review, a pitch to a product manager with proof, a Jira ticket on the backlog, and usually nothing. Today at Mirage, the AI agent answers and logs the request to Linear, votes accumulate, and at critical mass a Linear agent calls a coding agent to build it, a human approves the change, and it flows back to the customer. What took a month with humans in the loop can take under an hour. With 10,000 conversations a month, customers are effectively voting on the roadmap, whether they realize it or not, which is why Eli argues CX should sit close to (even report into) product.
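The loop can be sketched as a small state machine. This is a minimal Python illustration under stated assumptions, not Mirage's actual Linear or GitHub integration. The `VOTE_THRESHOLD` value, the `FeatureRequest` type, the stage names, and the rule that each customer counts once are all hypothetical, and the coding-agent step appears only as a status change. The sketch shows that votes alone can trigger a build, but nothing reaches customers without an explicit human approval.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical "critical mass"; the episode does not give a number.
VOTE_THRESHOLD = 25


class Stage(Enum):
    COLLECTING_VOTES = "collecting_votes"
    BUILDING = "building"  # stands in for handing the request to a coding agent
    AWAITING_APPROVAL = "awaiting_approval"
    SHIPPED = "shipped"


@dataclass
class FeatureRequest:
    title: str
    voters: set[str] = field(default_factory=set)
    stage: Stage = Stage.COLLECTING_VOTES

    def add_vote(self, customer_id: str) -> None:
        # Assumption: each customer counts once, however many times they ask.
        if self.stage is not Stage.COLLECTING_VOTES:
            return
        self.voters.add(customer_id)
        if len(self.voters) >= VOTE_THRESHOLD:
            self.stage = Stage.BUILDING

    def build_finished(self) -> None:
        # Called when the (hypothetical) coding agent has produced a change.
        if self.stage is Stage.BUILDING:
            self.stage = Stage.AWAITING_APPROVAL

    def approve(self, approver: str) -> None:
        # The human gate: only an explicit approval moves the change to customers.
        if self.stage is not Stage.AWAITING_APPROVAL:
            raise ValueError(f"cannot approve from stage {self.stage.value}")
        if not approver:
            raise ValueError("approval requires a named human approver")
        self.stage = Stage.SHIPPED


if __name__ == "__main__":
    request = FeatureRequest("Example feature request")
    for i in range(VOTE_THRESHOLD):
        request.add_vote(f"customer-{i}")
    request.build_finished()
    request.approve("support-lead")
    print(request.stage.value)  # prints "shipped"
```

Keeping approval as a separate, mandatory step mirrors the episode's point that speed depends on how much you trust auto-approval. Removing the gate would shorten the loop but hand the final call to the agent.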

Demo vs production: what surprised him

Running live for two years taught Eli that adoption feels like Tesla full self-driving: first awe, then complacency, then you do not want to turn it off. The early edge cases were real, like an agent stuck answering "I'm Sam, how can I help?" when a customer kept asking if it was AI, which they fixed with a rule to disclose and offer a human. His grounded take: people worry AI will go off-script, but humans do too, and modern agents rarely surprise him anymore. He notes he has not heard the word "hallucination" in over six months, a sign the quality bar has moved.

Where to start, and the metrics that matter

When a team is drowning, Eli's advice is blunt: clear your calendar, book back-to-back vendor demos, and compare them directly, because most agents run similar underlying models and the differentiator is fit to your vertical and your team's buy-in. He stresses that adoption is about buy-in, not the tool, so involve the people who will use it before you choose. On measurement, resolution rate is the headline, and he likes that many vendors are priced on resolution so incentives align. He also points to emerging "perceived satisfaction" scores that separate a customer's frustration with a bug from the quality of the support they received, and encourages teams to build their own health index when off-the-shelf metrics fall short.

What support leaders should take from this

Rename tier one, and promote your people. Let AI own tier zero and move agents into cross-functional, higher-value work before turnover does it for you.
Automate the knowledge pipeline. Wire product changes straight into docs so customers, agents, and the AI all read the same current truth.
Close the loop into product. Treat 10,000 monthly conversations as votes on the roadmap, and put CX as close to product as you can.
Allow a dip before the gain. Give the team permission for the agent to get a little worse before it gets better, and fix edge cases as they appear.
Choose for buy-in, not features. Most agents run similar models. The one your team will actually adopt is the one that wins.
Don't automate away customer pain. If you automate every part of your own job, you stop feeling what customers feel, and the experience suffers.

Listen to the full episode

Eli goes deeper on supporting regulated vs generative products, the self-updating knowledge base, and the metrics he would build from scratch, in the full episode of the Fini Podcast. You can connect with him on LinkedIn.

An AI agent that resolves in production and feeds your roadmap, not just deflects, is what Fini is built for. Book a demo to see it on your own tickets.

Leo: Welcome back to the Fini Podcast. I'm Leo. My guest today has spent 12 years building customer experience organizations across some of the most demanding environments in tech: enterprise SaaS at BetterCloud, fintech at Clarity Money, consumer banking at Marcus by Goldman Sachs, and now generative video at Mirage. His name is Eli Winderbaum, and he's the Head of Customer Experience at Mirage, where they're scaling support for a product that's reached millions of creators. What makes Eli's perspective valuable is that he's seen the full evolution of CX, from manually answering every ticket to running AI-first support operations. Eli, welcome to the show.

Eli: Thanks, Leo. Thanks for the kind introduction.

Leo: As AI takes on more frontline support and resolves more and more tier-one tickets, which roles do you see going away, and which ones evolving?

Eli: I implemented an AI agent coming up on two years ago, and my initial reaction was that frontline or tier-one agents would be affected most. I thought it would just affect my team a little at first, but fast forward almost two years and tier one is incredibly affected. Anyone who tells you they know exactly what's going to happen probably isn't telling you the whole story, but it's very clear tier-one agents' jobs are disappearing or changing, and will keep doing so over the next couple of years. I actually think there are two more roles that aren't often talked about. The first is the documentation manager. Your AI agent is only as good as the knowledge you feed it, and it's never been faster to ship code changes, so keeping docs up to date by hand is nearly impossible. We moved our docs to Mintlify and worked out a way to automatically push changes from Linear and GitHub into the docs, so they're current for customers, human agents, the AI agent, and the LLMs reading them. The second is the QA manager. That used to mean spending 10 or 15 minutes reading a single conversation and scoring it on things like rapport and needs. With AI you can read every conversation, get a baseline score, and zero in on the conversations that fall below your average. It's a huge time and cost saving.

Leo: Which roles do you see evolving and taking a big shift?

Eli: The old frontline agent role has changed a lot. At our company, five agents who were tier-one have essentially been promoted. Our new tier one, which we renamed tier zero, is our AI agent responding to customers, resolving 65% of all inbound messages. The next line of defense is tier two, and we're making those agents cross-functional, because just answering questions is repetitive and leads to turnover. For example, one agent on a rotational basis keeps track of feature requests and bugs, with AI assisting, adding a human element and making sure things are prioritized properly.

Leo: You went from regulated consumer banking at Marcus to generative video at Mirage. What's harder to support in the AI era, financial products where mistakes cost money, or an unpredictable generative product?

Eli: I've mostly worked at startups where there's high trust and I can move fast. The biggest challenge at a regulated bank was working with many more people, including legal and compliance teams whose job is to say no to keep the firm safe. Timelines I was used to stretched by days, weeks, or months, which gets demoralizing. So personally I find working inside a larger organization harder, even if it's not a bank, because there's so much to overcome. That said, a bank has a big responsibility to its customers. I'm not in a rush to go back to a highly regulated industry like finance or healthcare.

Leo: A lot of leaders see the value of AI but are held back by legal and compliance or their bosses. How would you advise them to drive that internal shift?

Eli: It's all about buy-in. A friend at a Swiss bank asked me which project-management tool to deploy across 50 people in multiple countries. I restrained myself from naming a tool and said it's about buy-in: rather than deciding in a silo, reach out to a few key people whose adoption will pull in their reports and peers. The tool is almost irrelevant compared to whether people are excited to use it. Even if you consult people and choose a different tool, at least they were consulted. The same goes for frontline agents. We use Intercom, and I couldn't switch to Zendesk, Front, Fini, or anything overnight without understanding how it affects their day-to-day. Include the people who'll use it.

Leo: One of your takes is that customer support shouldn't be siloed under operations but should be closer to product, even reporting to it. Why is that so important now?

Eli: I could be wrong, and for some orgs it wouldn't make sense, but my prediction is we'll start seeing it in the next couple of years. Customer experience is no different than the product experience. Product managers think they own the pixels in the app, but the pixels where you contact an agent, read documentation, or go through a cancel, renewal, or upgrade flow are all customer experiences. The support flow should be as important as converting from a trial or activating a feature. So CX should be as close to product as possible, and it wouldn't hurt to test reporting into product, because the customer feedback loop often feels so far from product, and forcing that reporting could ensure you actually listen to customers.

Leo: Can you walk me through a real example at Mirage of what happens when a user reports a bug?

Eli: Let's contrast it with my first day at BetterCloud 13 or 14 years ago. None of today's tools existed. A customer sent a message, a human replied, it might get flagged for review, then wait a week or two for a product feedback loop. I'd pitch a feature, prove the opportunity with numbers, and a PM might open a Jira ticket that went onto the backlog and usually nowhere. You'd be lucky to get two of your requested features built in a month. Today at Mirage, a customer sends a message, the AI agent responds and logs it to Linear, where we track and add votes for features. Once it hits critical mass, the Linear agent calls a coding agent to build the feature, it goes to GitHub, a human approves it, and the process flows all the way back to the customer. What used to take a month can take an hour, or even 30 minutes, depending on how much you trust auto-approval. Customers are effectively voting with their feedback, often without realizing it. We get 10,000 conversations a month, and buried inside them are insights no human could read through, insights that can decide what to build and fix next. All the pieces are there; you just need buy-in across product, design, and engineering.

Leo: How do you see the trade-off between speed and quality, and how do you ensure quality while automating more?

Eli: Early in my career we thought of speed versus quality like a DJ crossfader: ease off one to push the other. AI lets you press on both at the same time, especially as LLMs get better. The question is which teams are actively using all of this to do both. A year and a half ago everyone talked about hallucinations, and I haven't heard anyone put "AI" and "hallucinate" together in well over six months. It's about buy-in, structure, and clarity on which parts of the process you fully entrust to AI. Even the best engineers in the world are now entrusting AI to write their code. It's a whole new world.

Leo: You were one of the first CX leaders to go all in on AI support. What surprised you most about production versus the demo?

Eli: It's like driving a Tesla with full self-driving. At first you're in awe and a little uneasy, then minutes later you're talking to a friend and not even thinking about it, and when it's off you want it back on. During testing it was almost disbelief, and you wonder what edge cases it'll get wrong. As long as you have permission from your team and leadership for it to get a little worse before it gets better, you're fine. People worry about AI going off the rails, but humans do that too. We once had a bad actor on the team blocking users so they didn't have to respond. Early on, a customer kept asking our agent if it was AI and it just repeated "I'm Sam, how can I help?", so we added a rule to disclose and offer a human. That was two years ago when it was very new. The newer tools have all encountered and fixed those early edge cases, and we very rarely get surprised anymore.

Leo: Do you think a knowledge base can be fully automated, with new policies updated without human oversight?

Eli: It's already here. I'm a huge fan of what Mintlify is building. I once drew a FigJam diagram of everywhere a knowledge manager would need to watch to keep docs up to date, and the list kept growing: Slack channels, Linear, Notion PRDs, Statsig for A/B tests, your Discord or Slack community, inbound messages, and most importantly in-person standups. Keeping up with all of that is impossible even for a dedicated human. Mintlify can read your GitHub repo and recommend changes you approve, and eventually, like self-driving, you'll get comfortable enough that at 3 a.m., while you sleep, your documentation updates itself as changes are made.

Leo: If a head of CX is drowning in tickets and pressured to deploy AI, where do they start?

Eli: I did it partly out of desperation: tickets were growing by a thousand a week, week over week, while we were losing an agent. My answer is to clear your calendar for a day, book back-to-back meetings with every vendor you can, and compare them head to head. On the backend, every AI agent is running a similar LLM with their own flavor and packaging, and they're all really good. If you're e-commerce, there may be one focused on that; if you're a financial institution, one focused on security and privacy. I have no data to back this up, but I'd be surprised if even 1% of all customer support requests worldwide are answered by AI today. We're in a tech bubble. My doctor's office just called me about a copay and couldn't take it online. So next year it might be 2%, then 4%, but there's a long way to go for this technology to reach the laggards.

Leo: Are there metrics that track AI support best?

Eli: Resolution rate is the key goal, and most AI agents are priced on resolution, which aligns the vendor and the company. Intercom did something interesting, moving away from CSAT toward a perceived-satisfaction score, because someone upset about a bug might give a one even though support was great. A combined QA-plus-perceived-satisfaction score is promising, maybe with just three levels, such as better than normal, as expected, or worse than normal, with the worst conversations getting a manager follow-up. Years ago at BetterCloud we built our own Customer Health Index from eight data sources in a weekly Google Sheet, basically a retention tool before retention tools existed. Sometimes you have to build your own score.

Leo: Rapid fire. True or false, in three years tier-one support will be 100% automated.

Eli: False, only because I think it's already here. I know a company with a 90% resolution rate, and I'd be shocked if they weren't getting 100% of tier one.

Leo: What's the most overrated thing CX leaders are obsessing over?

Eli: Early on, all I wanted was to automate my own job away. But if you automate 100% of your job, you get far from the work you're supposed to do, and you stop feeling customer pain, which you need to provide a good experience.

Leo: Finish the sentence: the company that wins in 2026 isn't the one with the best AI, it's the one with the best...

Eli: Customer feedback loop. If two products are equal but one gets back to you instantly with a great answer and the other is off for the weekend, customers eventually move to the one with good support.

Leo: Eli, this was great. Where's the best place for people to follow your work?

Eli: I'm notoriously not on social media, despite working at a generative video company, but I'll be creeping on LinkedIn. You can find me as Eli Winderbaum, feel free to DM me.

Leo: Perfect, Eli. Thanks for joining us, and for everyone listening, if you want more honest conversations about what it takes to transform customer experience with AI, make sure you're subscribed to the Fini Podcast and we'll see you next time.

Eli: Thanks, Leo.

Which support jobs is AI changing first?

Eli Winderbaum sees tier-one frontline roles changing fastest, with his AI agent, renamed "tier zero," resolving 65% of inbound messages and former tier-one agents promoted to cross-functional work. He also flags two less obvious roles: the documentation manager, replaced by auto-updating knowledge pipelines, and the QA manager, where AI scores every conversation and humans review only the outliers.

How fast can the support-to-product feedback loop be with AI?

At Mirage it can run in under an hour. The AI agent logs a request to Linear, votes accumulate, a coding agent builds the change at critical mass, a human approves it, and it flows back to the customer. The same loop used to take about a month when every step had a human in it.

Can a knowledge base update itself without humans?

Eli believes it is already possible. Tools can read your code repository and product changes and push updates into documentation, so over time you grow comfortable letting it self-update, much like getting used to self-driving. The challenge is that knowledge lives in many places, from Linear and Notion to Slack and in-person standups.

What metrics matter most for AI support?

Resolution rate is the headline, and resolution-based pricing keeps vendor and customer incentives aligned. Eli also points to perceived-satisfaction scores that separate frustration with a bug from the quality of support, and encourages teams to build their own customer health index when needed.
