AI Support Guides

Mar 27, 2026

Why We Don't Retrieve Your Documents

Why Fini skipped RAG and built an execution-first architecture for customer support

Every AI support product uses retrieval-augmented generation. This post explains why retrieving documents and generating answers hits an accuracy ceiling on policy-dependent support queries, and how structured execution produces deterministic, correct resolutions from live customer data instead.

One of the first questions we get from technical evaluators is: "What's your RAG pipeline look like?"

We don't have one.

Every other AI support product we know of uses some version of retrieval-augmented generation. Embed the customer's help docs into a vector database, retrieve relevant chunks at query time, feed them to an LLM, generate a response. It is the default architecture for AI support in 2026. We looked at it early, built prototypes with it, and decided not to ship it.

This post explains why.

What RAG actually does in support

RAG was designed for open-domain question answering: you have a large corpus of text, a user asks a question, and the system finds the most relevant passages to ground the LLM's response. For a research assistant or internal search tool, this works well. The task is informational: find relevant text, summarize it, present it.

Customer support is not an informational task. A customer who writes "I was charged twice and I want a refund" is not asking you to find a document. They are asking you to check their billing history, evaluate their refund eligibility, calculate the correct amount, and process the transaction. All four of those steps require computation against live data.

RAG answers the question: "What does the policy say?" Customers are asking a different question: "What does the policy mean for me, right now, given my account?"

The accuracy ceiling

We ran a benchmark early on. We took 500 real support tickets from a fintech deployment, ran them through a well-tuned RAG pipeline, and compared the outputs to the correct resolutions determined by human agents.

On simple informational queries ("What are your business hours?" "Do you support international transfers?"), RAG performed well. Accuracy above 90%.

On policy-dependent queries ("Am I eligible for a refund?" "Why was my transaction declined?" "Can I upgrade my plan mid-cycle?"), accuracy dropped to 72%. The failure mode was consistent: the retrieval step found the right policy document, but the generation step misapplied it to the customer's specific situation. The LLM would read a policy with multiple conditions and either ignore one or calculate the proration incorrectly.

This is not a retrieval quality problem. The right document was retrieved. The problem is that an LLM interpreting policy text is doing approximate reasoning: usually close to correct, occasionally confidently wrong. In customer support, "usually close" is not an acceptable accuracy standard.

We call this the accuracy ceiling of retrieval. You can improve your chunking strategy, fine-tune your embeddings, add reranking, and optimize your prompts. You will get incremental gains. But as long as the final step is "LLM interprets text and generates an answer," you are bounded by the model's ability to reason about rules it read, not rules it executes.

Here is how the two approaches compare across the query types that make up real support volume:


| | RAG (Retrieval + Generation) | Structured Execution |
|---|---|---|
| Informational queries ("What are your hours?") | 90%+ accuracy | 90%+ accuracy |
| Policy-dependent queries ("Am I eligible for a refund?") | ~72% accuracy | 98%+ accuracy |
| Calculation queries ("How much is my prorated refund?") | Unreliable; LLM approximates math from text | Deterministic; function computes exact amount |
| Multi-condition queries ("Can I get a refund if I'm past 30 days but have 12+ months tenure?") | Frequently drops conditions or misapplies them | Evaluates all conditions every time |
| Action queries ("Process my refund") | Cannot execute; generates text confirming an action it did not take | Calls the API; refund is processed |
| Data source | Static documents embedded at index time | Live customer data from billing, CRM, order systems |
| Failure mode | Confidently wrong (hallucination) | Escalates when uncertain |
| Setup effort | Low (embed docs, deploy) | Higher (encode rules, connect systems) |
The informational row is roughly equal. Every other row favors execution, and those other rows represent the majority of tickets that actually require a support agent.

What we do instead

Fini does not retrieve documents at inference time. Instead, we operate on structured knowledge.

Policies become functions. A refund policy is not a paragraph the AI reads. It is a function that accepts inputs (purchase date, product category, customer tenure, refund history) and returns an output (eligible: yes/no, type: full/prorated, amount: $X.XX, reason: string). When a customer asks about a refund, we invoke the function against their actual data.
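A minimal sketch of what "policy as function" means, using a hypothetical refund policy (the 30-day and 90-day windows, tenure threshold, and refund cap below are illustrative assumptions, not Fini's actual rules):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class RefundDecision:
    eligible: bool
    refund_type: str   # "full" | "prorated" | "none"
    amount: float
    reason: str

def evaluate_refund(purchase_date: date, today: date, amount_paid: float,
                    tenure_months: int, prior_refunds: int) -> RefundDecision:
    """Hypothetical policy: full refund within 30 days; prorated refund up
    to 90 days for customers with 12+ months tenure; hard cap on repeat
    refunds. Every branch is explicit, so the same inputs always produce
    the same decision."""
    days_since = (today - purchase_date).days
    if prior_refunds >= 3:
        return RefundDecision(False, "none", 0.0, "refund limit reached")
    if days_since <= 30:
        return RefundDecision(True, "full", round(amount_paid, 2),
                              "within 30-day window")
    if days_since <= 90 and tenure_months >= 12:
        # Prorate by the unused fraction of the 90-day window.
        remaining = (90 - days_since) / 90
        return RefundDecision(True, "prorated", round(amount_paid * remaining, 2),
                              "long-tenure prorated window")
    return RefundDecision(False, "none", 0.0, "outside refund window")
```

The key property is that the conditions and arithmetic live in code, not in a paragraph an LLM must interpret per request.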

Customer data comes from live systems, not cached articles. When a customer asks "why was I charged twice?", we pull their transaction history from the billing system in real time, identify the duplicate, and calculate the refund from the actual transaction amounts. We are not searching for a "duplicate charge" article and hoping the LLM applies it correctly.
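To illustrate the duplicate-charge case, here is a simplified detection sketch over a pulled transaction list. The same-amount, same-merchant, five-minute-window heuristic is an assumption for the example, not a description of Fini's actual matching logic:

```python
def find_duplicate_charges(transactions, window_seconds=300):
    """Flag transaction pairs with identical amount and merchant posted
    within a short window of each other. Operates on live billing data,
    not on a help-center article about duplicate charges."""
    dupes = []
    seen = []
    for tx in sorted(transactions, key=lambda t: t["timestamp"]):
        for prev in seen:
            if (tx["amount"] == prev["amount"]
                    and tx["merchant"] == prev["merchant"]
                    and tx["timestamp"] - prev["timestamp"] <= window_seconds):
                dupes.append((prev["id"], tx["id"], tx["amount"]))
        seen.append(tx)
    return dupes
```

Because the refund amount comes straight from the flagged transaction record, there is no step where a model estimates it from text.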

Actions are deterministic. When Fini processes a refund, it calls the Stripe API with the exact amount derived from the policy function. There is no step where an LLM decides what amount to refund based on its interpretation of a text passage.

The LLM still plays a role: intent recognition, conversation management, response generation. But it does not make policy decisions, calculate amounts, or determine eligibility. Those steps are executed by structured logic that produces the same correct answer every time, regardless of how the customer phrased their question.
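The division of labor can be sketched as a dispatcher: the LLM's output is only a classified intent plus extracted entities, and everything downstream is deterministic. The stub policy and the return-tuple shape below are assumptions made for the sketch:

```python
def evaluate_refund_policy(entities):
    # Stub policy for the sketch: eligible if within 30 days of purchase.
    eligible = entities["days_since_purchase"] <= 30
    return {
        "eligible": eligible,
        "amount": entities["amount_paid"] if eligible else 0.0,
        "reason": "within window" if eligible else "outside 30-day window",
    }

def handle_ticket(intent, entities):
    """The LLM upstream only supplies `intent` and `entities`; every
    branch here is plain code, so identical inputs always yield the
    identical resolution regardless of how the customer phrased it."""
    if intent == "refund_request":
        decision = evaluate_refund_policy(entities)
        if decision["eligible"]:
            return ("refund", decision["amount"])
        return ("deny", decision["reason"])
    # Anything the logic does not recognize escalates instead of guessing.
    return ("escalate", intent)
```

Note the default branch: an unrecognized intent is handed to a human rather than answered approximately, which is the "escalates when uncertain" failure mode described above.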

The tradeoff

This architecture is harder to set up than RAG. A RAG pipeline can be functional in hours: embed your docs, wire up retrieval, deploy. Structured execution requires encoding your business rules as logic, connecting to your backend systems, and mapping your policy surface area. This is real configuration work.

We think the tradeoff is correct for customer support. Every response carries financial, legal, or retention weight. A wrong refund amount costs real money, a policy misapplication creates compliance exposure, and a hallucinated confirmation ("Your refund has been processed") when nothing happened destroys customer trust.

The setup cost is paid once. The accuracy gain compounds on every interaction.

Where this matters most

The gap between retrieval and execution shows up most in three scenarios:

Anything involving math. Prorated refunds, usage-based billing calculations, loyalty point balances, plan comparison pricing. LLMs are unreliable calculators. Functions are not.

Anything involving conditional logic. Policies with multiple qualifying criteria, regional variations, grandfathered plans, time-dependent rules. An LLM reading a policy document with four conditions will occasionally drop one. A function that checks all four conditions will not.

Anything requiring a confirmed action. When the customer needs something done, not just answered. Processing a refund, updating an address, cancelling a subscription, escalating to a specific team. The gap between "I've processed your refund" (true) and "I've processed your refund" (hallucinated) is the gap between an AI agent and a liability.
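The math point above is the easiest to demonstrate. A mid-cycle plan upgrade, for example, reduces to a few lines of arithmetic that a function gets right every time; the credit-and-charge proration scheme here is a hypothetical policy chosen for illustration:

```python
def prorated_upgrade_charge(old_price, new_price, days_used, days_in_cycle):
    """Mid-cycle upgrade: credit the unused portion of the old plan,
    charge the rest of the cycle at the new rate, return the net amount
    due today. Hypothetical proration policy."""
    remaining = (days_in_cycle - days_used) / days_in_cycle
    credit = old_price * remaining     # unused value on the old plan
    charge = new_price * remaining    # new plan for the rest of the cycle
    return round(charge - credit, 2)
```

An LLM asked to do this from a pricing FAQ will usually be close; the function is exact for every combination of inputs.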

For purely informational queries with no account context, RAG is fine. "What are your supported countries?" does not need structured execution. But those queries are also the ones your help center already handles. The tickets that actually reach your support team are the account-specific, policy-dependent ones. That is where the architecture matters.

What this looks like in production

One of our fintech deployments processes roughly 50,000 support interactions per month. Before Fini, about 35% of tickets were handled by their chatbot (RAG-based, trained on their help center). The remaining 65% went to human agents, mostly because the chatbot could not reliably answer policy-dependent questions or take actions.

After switching to structured execution, 78% of tickets resolve autonomously. The accuracy rate on policy-dependent queries is 98%. The number of "AI gave the wrong answer" escalations dropped from around 400 per month to under 30. The cost per resolution went from $4.20 (blended human + chatbot) to $0.69.

The gains did not come from a better model or better prompts. They came from removing the step where an LLM interprets a document and replacing it with a step where a function executes a rule.

The bet

We are betting that the future of AI support is execution, not retrieval. That the right abstraction for a support agent is a set of skills (functions that do things) rather than a set of documents (text that describes things). That customers care about getting their problem solved correctly, and correctness comes from computation, not generation.

RAG will continue to work well for search, research, and informational products. For customer support, where every answer has consequences, we think the industry will move toward execution-first architectures. We just got there early.

FAQs

What is RAG and why do most AI support tools use it?

RAG (retrieval-augmented generation) embeds your help docs into a vector database, retrieves relevant chunks when a customer asks a question, and feeds them to an LLM to generate a response. Most AI support tools use it because it is fast to set up and works reasonably well for informational queries. Fini chose a different path because RAG hits an accuracy ceiling on policy-dependent and action-oriented tickets, which are the majority of real support volume.

Does Fini use any form of document retrieval?

Fini does not retrieve documents at inference time. Instead, policies are encoded as executable functions and customer data is pulled from live systems (billing, CRM, order management) in real time. The LLM handles conversation flow and intent recognition, but policy decisions and calculations are handled by deterministic logic, not generated from retrieved text.

Can RAG-based AI support tools be improved to match structured execution accuracy?

Incremental improvements are possible through better chunking, reranking, and prompt engineering. But the fundamental limitation remains: an LLM interpreting policy text is doing approximate reasoning. On queries involving math, conditional logic, or confirmed actions, retrieval-based systems will continue to produce occasional confident errors. Fini's structured execution eliminates this class of failure entirely by computing answers instead of generating them.

What is the accuracy difference between RAG and structured execution in production?

In our benchmarking on 500 real fintech support tickets, RAG achieved 90%+ accuracy on informational queries but dropped to 72% on policy-dependent queries. Fini's structured execution maintains 98% accuracy across both categories, with zero hallucinations on financial calculations. The gap is widest on queries involving math, multi-condition policies, and actions that require confirmed execution.

Get Started with Fini.
