From Agentic Pilot to Production, Part 3: The Importance of the Context Layer
In this third post in our series, From Agentic Pilot to Production, we look at why AI agents hallucinate.
(You can find the earlier two posts here: Part 1: Autonomy with Brakes: Why Refusal Comes First and Part 2: Disobedient or Just Probabilistic.)
Over the past year, we at the Real Story Group have advised several large enterprises on their Agentic AI initiatives. The same weakness keeps showing up: the context layer. We cannot name clients, but the pattern is consistent.
Teams build an agent that queries internal data, answers questions, and even suggests actions. In a prototype or pilot, it looks strong. Leaders want to roll it out.
Then production exposes the gaps. The agent starts guessing, mixing stale data with fresh. It violates budget or consent rules.
Why does this happen? Let's dig in.
A Case Study
One enterprise RSG worked with built a marketing analytics agent. You could ask it questions like "why did our conversion rate drop last quarter?" and it would pull data from dashboards and give you a very confident answer. As an LLM will do.
In testing, it worked well. Yet in production, the agent started returning data from the previous year. It confused metrics that had similar names but different definitions. It made confident assertions based on numbers that were three months stale. It couldn't tell what “current year” meant: financial year, calendar year, or something else.
The team's initial instinct was to make the prompts more comprehensive: more instructions, more guardrails, more examples. It helped a little. But the hallucinations (or “confabulations,” a more precise term; more on this from my colleague Scott Simmons) kept coming.
What actually fixed it? They went back to the underlying data and added columns with clear definitions. They documented what each field meant. They tagged when the data was last updated. They specified which source was authoritative when multiple systems carried the same metric.
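To make that concrete, here is a minimal sketch of the kind of metadata they added. The field names and values are illustrative, not the client's actual schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class MetricRecord:
    """One metric plus the metadata an agent needs to use it safely."""
    name: str                  # e.g., "conversion_rate"
    value: float
    definition: str            # what this field actually measures
    year_basis: str            # "calendar" vs. "financial" year
    last_updated: date         # when the number was last refreshed
    authoritative_source: str  # which system wins when sources conflict

# Hypothetical example record
q3_conversion = MetricRecord(
    name="conversion_rate",
    value=0.031,
    definition="Completed purchases / unique sessions, web only",
    year_basis="calendar",
    last_updated=date(2025, 10, 1),
    authoritative_source="web_analytics",
)
```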
The agent started working better.
This is a story we hear over and over. And it points to something most teams miss when they move from POC to production: the context layer.
What Is a Context Layer?
When a new employee joins your company, you don't just hand them a laptop and say, "figure it out." You give them context. You explain what terms mean. You tell them the company hierarchy. You tell them who owns what. You point them to the right sources. You even explain the unwritten rules.
AI agents need the same thing, but you can’t just rely on prompts.
You need a context layer: the structured information an agent requires to act correctly. It includes:
- What your terminology means (and doesn't mean)
- What the current state of things is (and how fresh that information is)
- What constraints apply (budgets, policies, compliance rules)
- Who owns what (and how often things get updated)
This is different from prompt engineering. A prompt is a one-off instruction. The context layer is a governed, maintained, version-controlled body of knowledge that multiple agents can draw from.
Think of it as the difference between telling someone something once versus adding it to an application tooltip, or writing it down in an employee handbook that gets updated regularly.
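A rough sketch of that difference, with hypothetical field names:

```python
# One-off prompt instruction: said once, then gone.
prompt_note = "Note: 'conversion' means completed purchases / sessions."

# Context-layer entry (illustrative fields): governed, versioned,
# and reusable by any agent that touches this term.
context_entry = {
    "term": "conversion",
    "definition": "Completed purchases / unique sessions",
    "version": "2.3",
    "owner": "web-analytics-team",
    "last_reviewed": "2025-09-15",
}
```

The prompt note evaporates after one conversation; the entry persists, carries a version history, and has someone responsible for keeping it true.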
Why This Matters for Enterprises
Let's get concrete. We built an agent demo for RSG’s Enterprise MarTech Leadership Council members a while back. The agent was supposed to help with marketing planning. Here's what kept breaking when we didn't have proper context:
Generic plans and fantasy budgets. When the agent relied solely on social listening data, it produced vague, off-brand campaign recommendations. It had no sense of what was realistic for the company.
Wrong channel recommendations. A B2B enterprise would get advice on investing in Instagram influencers. The agent didn't know the business model.
Stale metrics presented as current. The agent would confidently cite numbers that were months old. It had no way to know what was fresh and what wasn't.
Conflicting definitions. Ask about "conversion rate", and the agent might pull from three different systems with three different definitions. The answer would be technically correct and completely useless.
Hallucinated statistics. When the agent couldn't find a source, it would make something up rather than admit it didn't know.
No refusal. This was the worst one. The agent would guess rather than say, "I don't have enough information to answer this." It would rather be confidently wrong than helpfully uncertain. I covered this in Part 1 of this series.
Every one of these problems traces back to missing context.
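The refusal failure, in particular, is fixable with a context check before the agent answers. A minimal sketch, assuming the freshness metadata from the case study above (threshold and names are illustrative):

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=30)  # illustrative freshness budget

def answer_or_refuse(metric, today):
    """Refuse gracefully instead of guessing. Assumes `metric` carries
    the metadata sketched earlier, or is None when no grounded source
    exists. A sketch, not a production check."""
    if metric is None:
        return "I don't have enough information to answer this."
    if today - metric.last_updated > MAX_AGE:
        return (f"My latest figure for {metric.name} is from "
                f"{metric.last_updated} and may be stale.")
    return f"{metric.name} is {metric.value} (as of {metric.last_updated})."
```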
The Four Components
We analyzed the failure modes from our agent demo and asked what was missing in each case. The answers resolved into four components that make up a functioning context layer. These are conceptual distinctions: they may overlap, and you may need fewer or more layers.
- Hallucinated definitions → semantic layer
- Stale data → state layer with freshness metadata
- Violated business rules → policy layer and guardrails
- Definitions decaying over time → governance layer
Semantic Layer
Definitions, taxonomies, and metric formulas. "Conversion" means something different in Marketo, web analytics, and e-commerce. If you haven't documented which one you mean, the agent will likely pick the wrong one.
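A minimal sketch of what such entries could look like (the formulas here are invented):

```python
# Illustrative semantic-layer entries: one word, three systems,
# three different formulas. Recording which source is authoritative
# lets an agent resolve the term deliberately instead of guessing.
SEMANTIC_LAYER = {
    ("conversion", "marketo"): "MQLs / total leads",
    ("conversion", "web_analytics"): "Goal completions / sessions",
    ("conversion", "ecommerce"): "Orders / unique visitors",
}
AUTHORITATIVE_SOURCE = {"conversion": "web_analytics"}

def resolve(term):
    source = AUTHORITATIVE_SOURCE[term]
    return source, SEMANTIC_LAYER[(term, source)]
```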
State Layer
Current data plus freshness metadata. This includes not just the numbers, but also the facts about when they were last updated, where they came from, and how long they remain valid. Stale data presented as fact is worse than no data at all.
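A sketch of a state-layer record, with invented names and a nominal validity window:

```python
from datetime import datetime, timedelta

# Illustrative state-layer record: the number plus the facts about it.
state_record = {
    "metric": "q3_pipeline",
    "value": 4_200_000,
    "source": "crm_export",
    "as_of": datetime(2025, 10, 1),
    "valid_for": timedelta(days=7),  # beyond this, treat as stale
}

def is_fresh(record, now):
    return now - record["as_of"] <= record["valid_for"]
```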
Policy Layer
Business rules and constraints that govern what the agent can recommend. This includes parameters like budget caps, channel limits, and regional compliance. Without these encoded, agents produce recommendations that sound reasonable but violate your actual rules.
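As a sketch, a handful of encoded rules plus a pre-flight check might look like this (values invented; real rules would live in a governed store, not in constants):

```python
POLICY = {
    "max_campaign_budget": 250_000,
    "allowed_channels": {"email", "linkedin", "webinars"},
    "consent_restricted_regions": {"EU"},
}

def violations(rec):
    """Pre-flight check on an agent's recommendation dict."""
    issues = []
    if rec["budget"] > POLICY["max_campaign_budget"]:
        issues.append("budget exceeds cap")
    if rec["channel"] not in POLICY["allowed_channels"]:
        issues.append(f"channel '{rec['channel']}' not permitted")
    if rec.get("region") in POLICY["consent_restricted_regions"]:
        issues.append("region requires consent review")
    return issues
```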
Governance Layer
Ownership and maintenance. Every definition needs a named owner and a refresh schedule. Without that accountability, context decays and you're back to hallucinations sooner rather than later.
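In sketch form (owner and cadence invented):

```python
from datetime import date, timedelta

# Illustrative governance metadata: a named owner and a review
# cadence per definition, so decay is detectable rather than silent.
GOVERNANCE = {
    "conversion_rate": {
        "owner": "web-analytics-team",
        "review_every": timedelta(days=90),
        "last_reviewed": date(2025, 7, 1),
    },
}

def overdue_for_review(term, today):
    entry = GOVERNANCE[term]
    return today - entry["last_reviewed"] > entry["review_every"]
```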
Why Enterprises Get This Wrong
When teams realize they need better context for their agents, they usually make one of these mistakes:
Stuffing everything into prompts. You can cram a lot of context into a prompt. But it doesn't scale. There's no version control. No audit trail. No way to update it systematically. And you hit token limits fast.
Buying a tool and expecting magic. Vendors push knowledge graphs, vector databases, and data catalogs. These all have a role to play. But none of them solves the underlying problem: someone has to define what things mean. The tool just stores whatever you put into it. Garbage in, garbage out.
Treating it as a technology problem. The hardest part of building a context layer isn't the technology. It's getting humans to agree on definitions and maintain them over time. That's an organizational problem, not a technical one.
Building in silos. Each team creates its own context for its agents. Marketing has one definition of "customer." Sales has another. Finance has a third. The agents work fine within each team. They break the moment you need cross-functional insight.
Final Thoughts
A good context layer should be versioned and auditable, owned, and validated before use, and it should allow the agent to refuse gracefully.
That sounds like a lot, and at one level it is, yet building a context layer is not technically difficult. The components are well understood. Patterns exist. You can implement something basic in a few weeks.
However, obtaining organizational buy-in isn’t trivial. Getting marketing and sales to agree on what "customer" means. Getting someone to take ownership of the metric definitions. Getting teams to actually maintain the context over time instead of letting it decay.
If you can't align humans on what "conversion" means, the LLM won't save you. It will just hallucinate faster and more confidently.
The context layer is where the organizational discipline required for AI in production becomes visible. You can't fake it or buy your way out of it. You just have to put in the hard work. That's the first lesson (among many!) from every POC that we’ve seen fail in production. The technology more or less worked, but the context didn't.
For More...
If your firm is an RSG corporate member, you have access to the complete case study and learnings, as well as a private review of your agentic strategy to date. For more practical support converting your pilots to productive solutions, contact us about consulting offerings.