Insights

SVP, Chief Clinical Officer
The enterprise appetite for AI agents is real.
But Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027 due to escalating costs, unclear business value, or inadequate risk controls.
Three out of four technology leaders surveyed by BCG say they fear "silent failure": spending real money on AI that doesn't deliver real impact, or worse, that introduces subtle errors nobody catches until the damage is done.
The root cause of a lot of this fear is memory.
Large language models are stateless. Every single interaction starts from scratch. The "memory" we hear vendors talk about is not something these models do natively. It's a complex, often brittle system bolted onto the side. And when it breaks, it tends to break quietly.
This article is about what that actually means in practice. We’ll discuss the engineering tradeoffs, the architectural options, and the security questions you need to be asking before you put an agent into production.
Model providers are in an arms race over context windows. The implicit promise is simple: just feed your agent the entire history and let it figure things out.
But reality is different. Anthropic published research on what they call "context rot". The gist: as the context window fills up, the model's ability to accurately recall specific information degrades.
This is because every token attends to every other token, so a context of n tokens creates on the order of n² pairwise relationships that compete for the model's attention. Context, Anthropic argues, should be treated as a finite resource with diminishing marginal returns.
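To make that scaling concrete, here's a tiny, illustrative calculation (real attention implementations vary, but the quadratic growth is the point):

```python
# Illustrative only: full self-attention scores every (query, key) token
# pair, so the interaction count grows quadratically with context length.
def pairwise_interactions(n_tokens: int) -> int:
    """Ordered (query, key) pairs scored by full self-attention."""
    return n_tokens * n_tokens

for n in (1_000, 8_000, 128_000):
    print(f"{n:>7} tokens -> {pairwise_interactions(n):>18,} interactions")
```

A 128x increase in context length means a roughly 16,000x increase in pairwise interactions competing for attention.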
An analogy might be helpful. A context window isn't a filing cabinet where you can keep adding folders and retrieve any one on demand. It's more like a crowded room. The more people you cram in, the harder it gets to have a meaningful conversation with any one of them.
In fact, a study presented at AAAI found that ChatGPT's effective working-memory capacity is "strikingly similar" to that of humans: roughly 7±2 items, regardless of how large the context window technically is.
So we have models with 128,000-token windows that functionally remember about as much as you do when someone reads you a phone number (an exaggeration, but you get the point).
Most production agent memory today runs on some variant of Retrieval-Augmented Generation (RAG). The basic idea sounds reasonable. You store past interactions, retrieve the relevant ones when needed, and then inject them back into the prompt.
But as Dan Giannone laid out in a detailed breakdown, the actual chain of operations is long and fragile: chunk and embed past interactions, store the vectors, embed each incoming query, search for nearest neighbors, rank and filter the candidates, and inject whatever survives back into the prompt.
If any single link in that chain breaks, the whole mechanism fails. And the problem is, when it fails, it often looks like it's working. The agent doesn't say "I don't know." It produces a confidently wrong answer, and nobody can easily trace where the error came from.
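The chain above can be sketched end to end. This is a deliberately toy version: `embed` is a bag-of-words stand-in for a real embedding model, and the memory texts are hypothetical, but the failure mode is visible in the structure itself: if ranking goes wrong, the agent still produces a fluent answer from the wrong memory.

```python
# A minimal sketch of the RAG-style memory chain. `embed` is a toy
# bag-of-words stand-in for a real embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Store past interactions as vectors.
memory = ["user prefers metric units",
          "user is based in Berlin",
          "user asked about shipping costs last week"]
index = [(text, embed(text)) for text in memory]

# 2. Embed the incoming query and retrieve by similarity.
query = "where is the user located"
qv = embed(query)
ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)

# 3. Inject the top hits back into the prompt. If the ranking is wrong
#    here, the agent answers confidently from the wrong memory, and
#    nothing downstream flags the error.
top_k = [text for text, _ in ranked[:2]]
prompt = "Known context:\n" + "\n".join(top_k) + f"\n\nQuestion: {query}"
print(prompt)
```

Every numbered step is a separate place for silent degradation: bad chunking, stale vectors, a retrieval miss, or a prompt that buries the relevant fact.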
Beyond the fragility of the retrieval pipeline, there are deeper structural issues with how most agent memory systems work today.
There's no universal answer here. The right architecture depends on your use case, your data complexity, and what kinds of failures you can tolerate.
That said, it helps to understand what's actually available, because most of the guidance out there is either theoretical, only tested at demo scale, or ignores the realities of running this stuff in production.
A few of the more common approaches:
Vector stores are the default. Store text as high-dimensional embeddings, retrieve by cosine similarity.
It's fast, scales well, and integrates natively with LLM pipelines. For document retrieval and Q&A over unstructured knowledge bases, it works fine.
The problems show up when you need relational understanding, temporal awareness, or complex queries that go beyond "find me something similar to this." Metadata filtering degrades performance. And you inherit all the context rot issues we discussed above.
Best for: Document retrieval, straightforward Q&A, simple conversational agents where losing some context isn't catastrophic.
Knowledge graphs store information as nodes and edges: who said what about whom, when, and why.
This gives you explicit relationship modeling, multi-hop reasoning, and temporal awareness. For domains with highly structured, interconnected data like fraud detection, supply chain analysis, or complex organizational knowledge, graphs are powerful.
The catch is upfront cost. Schema design is significant. Knowledge graphs don't handle unstructured content well, and they have no native semantic search. You're trading flexibility for precision.
Best for: Knowledge management, compliance, any domain where the relationships between entities matter more than the raw text.
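The relational payoff is easiest to see in a multi-hop query. A hedged, toy sketch (the entities, relations, and dates are made up for illustration): facts are stored as explicit triples and answered by walking edges, not by similarity.

```python
# Toy graph-style memory: facts as (subject, relation, object) triples
# with dates, queried by traversing edges rather than by similarity.
from collections import defaultdict

edges = defaultdict(list)  # subject -> [(relation, object, date)]

def add_fact(subj: str, rel: str, obj: str, date: str) -> None:
    edges[subj].append((rel, obj, date))

add_fact("acme_corp", "owned_by", "holdco", "2024-01-10")
add_fact("holdco", "flagged_for", "sanctions_review", "2025-03-02")

def two_hop(subj: str, rel1: str, rel2: str) -> list:
    """Follow rel1 from subj, then rel2 from each intermediate node."""
    results = []
    for r1, mid, _ in edges[subj]:
        if r1 == rel1:
            for r2, obj, date in edges[mid]:
                if r2 == rel2:
                    results.append((obj, date))
    return results

# "Is anything upstream of acme_corp under review?" -- a relational
# question a pure similarity search would struggle to answer reliably.
print(two_hop("acme_corp", "owned_by", "flagged_for"))
```

No embedding of "acme_corp" is ever semantically close to "sanctions_review"; the connection only exists as a path through the graph.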
A hybrid architecture combines a vector store for semantic search with a knowledge graph for structured reasoning.
In theory, you get the best of both. In practice, you're managing and synchronizing two complex database systems. Consistency between them is a real engineering challenge, and queries that span both can be slow.
Best for: Mission-critical agents that need both semantic and relational understanding, assuming you have the team and budget to maintain the complexity.
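One common hybrid pattern is a query router that sends each question to the store suited to it. This sketch is deliberately naive (real routers typically use a classifier or an LLM call, and the cue phrases here are invented), but it shows where the synchronization burden comes from: both stores must agree on the same underlying facts.

```python
# Naive query router for a hybrid memory setup: relational-sounding
# questions go to the graph, everything else to the vector store.
def route(query: str) -> str:
    relational_cues = ("who owns", "related to", "reports to", "when did")
    q = query.lower()
    return "graph" if any(cue in q for cue in relational_cues) else "vector"

print(route("Who owns Acme Corp?"))          # relational -> graph
print(route("Summarize our refund policy"))  # semantic   -> vector
```

Every write now has to land in both systems, and every routing mistake silently degrades answer quality, which is the consistency challenge in miniature.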
This one is interesting.
Published in February 2026, the observational-memory approach uses two background agents, an Observer and a Reflector, that continuously compress conversation history into a structured, dated log that stays in the context window. No external retrieval step at all.
VentureBeat reported this approach scored 94.87% on the LongMemEval benchmark, and can reduce costs by roughly 10x through compression and prompt caching.
By eliminating the retrieval step entirely, in theory you eliminate a major failure point. The downsides are that it's still new, requires running additional background agents, and the reflection process can invalidate the cache (reducing those cost savings).
Best for: Long-running, multi-turn conversational agents where context accuracy is the priority and you're willing to be an early adopter.
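The shape of the pattern can be sketched in a few lines. This is a hedged approximation, not the published implementation: the real system uses LLM calls for observation and reflection, while here both are trivial placeholders that only show the compress-in-place structure.

```python
# Sketch of the observe/reflect loop: an observer appends dated entries,
# and a reflector periodically compresses old entries in place so the
# log stays small enough to live inside the context window.
log: list[str] = []

def observe(turn: str, today: str) -> None:
    # Placeholder observer: a real one would be an LLM summarizing the turn.
    log.append(f"[{today}] {turn}")

def reflect(max_entries: int = 3) -> None:
    # Placeholder reflector: collapse everything but the newest entries
    # into a single summary line. A real reflector would summarize with
    # an LLM (and may invalidate the prompt cache when it rewrites).
    if len(log) > max_entries:
        old, recent = log[:-max_entries], log[-max_entries:]
        summary = f"[summary of {len(old)} earlier entries]"
        log[:] = [summary] + recent

for i in range(1, 6):
    observe(f"user message {i} handled", f"2026-02-{i:02d}")
reflect()
print(log)
```

Note that the compression rewrites the front of the log, which is exactly why reflection can invalidate prompt caching and claw back some of the cost savings.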
File-based memory: simple text files, structured progress logs, timestamps. Think of it as a git history for your agent's state.
Letta's research found that simple file-based systems actually outperformed more specialized approaches in some benchmarks. Anthropic's own recommendation for long-running coding agents uses structured progress files. And the State and Memory paper argues that finite-state automata with explicit state tracking are often all you need for procedural tasks.
It's dead simple, human-readable, and easy to debug. It doesn't scale for complex multi-agent systems, and it won't give you semantic search. But for procedural workflows with clear sequential steps, don't overthink it.
Best for: Procedural automation, long-running coding tasks, predictable workflows where transparency and debuggability matter more than sophistication.
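A minimal sketch of what this looks like in practice (the filename and log entries are hypothetical): an append-only, timestamped progress log that a human, or the agent itself, can read back directly.

```python
# File-based agent memory: an append-only, timestamped progress log.
# Every entry is plain text, so debugging is just reading the file.
from datetime import datetime, timezone
from pathlib import Path

LOG = Path("agent_progress.log")

def record(step: str) -> None:
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with LOG.open("a") as f:
        f.write(f"{stamp}  {step}\n")

record("cloned repository")
record("ran test suite: 2 failures in test_auth.py")
record("fixed token-expiry bug; tests green")

print(LOG.read_text())
```

When something goes wrong, there is no embedding to inspect and no retrieval step to second-guess: the agent's entire memory is a file you can open.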
If you're an enterprise leader thinking about deploying agents, here's how we'd think about staging the work.
Building production-grade AI agents with reliable memory is genuinely hard.
The research is moving fast: observational memory, Google's work on scaling agent systems across 180 configurations, the Mem0 framework showing 26% accuracy gains with 90% token savings. But we're still in the early chapters of figuring this out.
What we do know is that memory can't be an afterthought. It's not a feature you bolt on after the demo works. It's the foundation that determines whether your agent is a reliable tool or an expensive liability.
The memory problem is solvable. But solving it requires treating it with the same rigor you'd apply to any other piece of production infrastructure: architecture, security, governance, and a healthy respect for what can go wrong.