A Conversation Frozen in Time
There’s an assumption baked into every conversation we have with another person, one so automatic we never think to question it: the person we’re talking to exists in the same present moment that we do. They know what happened last week, they’ve heard the news, and if they work in a relevant field, their knowledge of it is reasonably current. We don’t preface our questions with “as of the information available to you at the time of your last update,” because we’ve never had to.
When we started talking to large language models, we brought that assumption with us because it felt reasonable. The conversations were fluid, the responses were confident, and nothing in the experience suggested we should adjust our expectations. So we didn’t.
The problem is that an LLM isn’t a person standing in your present. It’s closer to an extraordinarily well-read colleague who went on an extended sabbatical at some point in the recent past, retains everything they learned before they left, and has been completely unreachable since. Their reasoning is sharp. Their breadth of knowledge is impressive. But they missed everything that happened while they were gone, and more importantly, they have no way to tell you what they don’t know. They’ll answer from the knowledge they have, with the same confidence they’d have if that knowledge were current, because from where they sit, it is.

ELI5 (LLM): An LLM is a type of AI system trained on enormous amounts of text, which gives it the ability to read, write, summarize, and reason in natural language. The "large" part refers to the sheer scale of the training data and the billions of numerical settings (called parameters) the model learns during that process. Models like ChatGPT, Claude, and Gemini are all LLMs. Think of it less like a database that stores facts and more like a system that has absorbed patterns from a vast amount of human writing and learned how to respond in kind. (Wikipedia: LLM)
That gap between what an LLM knows and what’s actually true “right now” is the problem this post is about. Not because it makes AI useless, it clearly doesn’t, but because closing that gap is turning out to be a substantially harder engineering problem than most organizations assumed when they started building on top of these systems. The first attempt at closing it was straightforward and clever. It was also only a partial solution, and understanding why tells you a great deal about what you actually need to build to be successful in the current age as well as the next.
Reasoning and Knowledge Are Not the Same Problem
To understand why closing that gap is hard, it helps to understand what an LLM actually does when it “knows” something, because it’s not storing facts the way a database stores records or a library stores books.
When a model is trained, it reads an enormous amount of text and, through that process, develops a highly compressed internal representation of the patterns, relationships, and concepts it encountered while ingesting that text. Those patterns get encoded into billions of numerical weights, the parameters that define how the model thinks and responds. The result is something that can reason, infer, generalize, and explain, but it isn’t a filing cabinet you can update with new folders and files whenever you’d like. The knowledge is distributed across the weights in a way that’s inseparable from the reasoning itself. You can’t reach in and change what it knows without retraining it, which is an expensive, time-consuming process that no organization wants to run every time the world changes.

ELI5 (parameters): When an AI model is trained, it reads through huge amounts of text and gradually adjusts billions of tiny numerical settings, called weights or parameters, to get better at predicting what comes next in language. By the end of training, those weights have effectively encoded everything the model learned about how language, concepts, and reasoning work. They're not a list of facts you can edit; they're more like the accumulated instincts baked into the model during the training process. Changing what the model "knows" means rerunning that entire expensive training process. (Wikipedia: parameters)

ELI5 (retraining): Training an AI model is the process of exposing it to massive amounts of data and letting it gradually adjust billions of internal numerical settings until it gets good at its task. Think of it like learning to ride a bike: you fall, you adjust, you repeat, until the right behavior becomes instinctive. Retraining means running that entire process again, usually from scratch or close to it, because the knowledge baked in during the first run is distributed throughout the model in a way that's very difficult to surgically update. It's why you can't just "tell" an LLM something new and have it remember it forever; that information lives outside the model until someone commits to the expensive process of baking it back in. (Wikipedia: retraining)
This is where it’s worth drawing what I consider an important distinction about AI. There are two fundamentally different things we might want from an intelligent system: the ability to reason well, and the ability to know current facts. These are not the same thing, and because of this, they don’t have to be solved in the same way.
Consider how we handle this tradeoff in other domains. A lossy video compression algorithm drops frames we won’t notice, and the picture remains watchable because most of the information we need to understand what we’re seeing still made it through. A lossy audio stream that drops packets we can hear degrades in a way that matters immediately, like the annoying experience of your song skipping a beat. The tolerance for imperfection depends entirely on what you’re compressing and what you need it to do. Reasoning, it turns out, is more like video: imperfect but still enormously useful. A model that reasons at 90% fidelity is still transformatively capable. Knowledge of current facts is more like audio: the gaps are noticeable and the consequences of getting it wrong in a business context are very real and very expensive. You can tolerate a model that occasionally reasons imperfectly. You can’t as easily tolerate a model that confidently tells a customer the wrong policy terms because its knowledge is eighteen months stale.
The implication is that retraining the model isn’t the right solution to the knowledge problem. What you need is a way to hand the model current, accurate information at the moment it’s asked a question, so its reasoning engine can work on facts that are actually true right now. That’s the job retrieval-augmented generation was designed to do.

ELI5 (retrieval-augmented generation): RAG is a technique that gives an AI model access to external information before it generates a response. Rather than relying only on what it learned during training, a RAG system first fetches relevant documents or data from an external source, then hands that material to the model to inform its answer. The "retrieval" part is the lookup; the "generation" part is the response the model produces using what it retrieved. It's the primary way organizations keep AI responses grounded in current, specific information rather than letting the model work from memory alone. (Wikipedia: retrieval-augmented generation)
RAG: The First Foray Into Solving the Knowledge Problem
Retrieval-augmented generation, which you’ll see abbreviated as RAG in almost everything written about enterprise AI right now, is conceptually straightforward. Before the model answers your question, the system goes and fetches relevant information from an external source, hands that information to the model as context, and then the model generates its answer based on what it was just given rather than solely what it learned during training. It’s a bit like the difference between asking a colleague a question cold versus handing them a briefing document 15 minutes before the meeting starts. The underlying thinking ability is the same; the quality of the answer improves because the quality and relevancy of the inputs improved.
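If it helps to see the shape of that loop, here is a minimal sketch in Python. The embed(), search(), and complete() helpers are placeholders for whichever embedding model, vector index, and LLM endpoint you actually use; the names are mine, not any vendor’s API, and a production pipeline would add chunking, caching, and error handling that this sketch deliberately leaves out.

```python
from dataclasses import dataclass

# A minimal sketch of the RAG loop described above. embed(), search(), and
# complete() are placeholders for whatever embedding model, vector index, and
# LLM endpoint you actually use; the names here are illustrative, not a vendor API.

@dataclass
class Chunk:
    source: str
    text: str

def embed(text: str) -> list[float]:
    """Placeholder: turn text into an embedding vector."""
    raise NotImplementedError

def search(query_vector: list[float], k: int) -> list[Chunk]:
    """Placeholder: return the k chunks whose stored vectors sit closest to the query."""
    raise NotImplementedError

def complete(prompt: str) -> str:
    """Placeholder: send the assembled prompt to the LLM and return its answer."""
    raise NotImplementedError

def answer(question: str, k: int = 3) -> str:
    # 1. Retrieval: translate the question into coordinates and find what lives nearby.
    nearby = search(embed(question), k)

    # 2. Augmentation: hand the model its briefing document before the meeting.
    context = "\n\n".join(f"[{c.source}] {c.text}" for c in nearby)
    prompt = (
        "Answer using only the context below, and cite the sources you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generation: the model reasons over what it was just handed,
    #    not only what it memorized during training.
    return complete(prompt)
```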
To make that retrieval fast and semantically intelligent rather than just keyword-based, the information first has to be transformed into a form the system can search efficiently. This is where vectorization comes in, and it’s worth taking a moment here because the concept does a lot of work in everything that follows.

ELI5 (vectorization): Vectorization is the process of converting text into a set of numbers that represents its meaning in a way a computer can compare mathematically. Every word, sentence, or document gets translated into a list of numbers, called a vector, that acts as a coordinate in a vast conceptual space. Words or ideas that are semantically related end up at coordinates close to each other; unrelated ones end up far apart. This is what makes it possible to search by meaning rather than by exact keywords: instead of looking for a matching word, the system looks for matching coordinates. (Wikipedia: vectorization)
A traditional database is like a library card catalog. You look something up by its exact name, its category, or a keyword you already know. If you don’t know the precise term, you don’t find it, and the catalog has no way to tell you that what you’re looking for is two shelves over under a slightly different name. Vectorization turns that card catalog into a topographical map, where meaning has geography. Every piece of text, whether it’s a sentence, a paragraph, or an entire document, gets translated into a set of coordinates that describes where its meaning lives in a vast conceptual space. “Dog” and “puppy” end up at coordinates very close to each other. “Dog” and “mortgage” end up far apart. “Mortgage” and “interest rate” end up in the same neighborhood. When you ask a question, your question gets translated into coordinates too, and the system goes looking for everything that lives nearby. That’s retrieval: find what’s conceptually adjacent to what was asked, without needing to know the exact words in advance. The imprecision is the feature, not the bug. You’re not looking for an exact match, you’re navigating by meaning.
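A toy example makes the geography concrete. The vectors below are hand-made, three-dimensional stand-ins; real embedding models learn vectors with hundreds or thousands of dimensions, but the arithmetic for “nearness” is the same cosine similarity shown here.

```python
import math

# Hand-crafted 3-dimensional "embeddings," purely for illustration.
# Real models learn vectors with hundreds or thousands of dimensions.
vectors = {
    "dog":           [0.90, 0.10, 0.00],
    "puppy":         [0.85, 0.15, 0.05],
    "mortgage":      [0.05, 0.90, 0.30],
    "interest rate": [0.10, 0.85, 0.35],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Close to 1.0 means the same neighborhood; close to 0 means far apart."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

print(cosine_similarity(vectors["dog"], vectors["puppy"]))               # high: same neighborhood
print(cosine_similarity(vectors["dog"], vectors["mortgage"]))            # low: far apart
print(cosine_similarity(vectors["mortgage"], vectors["interest rate"]))  # high: same neighborhood
```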
This was a genuine leap forward. Suddenly you could point an AI system at a corpus of documents and have it find relevant information in response to natural language questions, even when the question used entirely different words than the documents did. For a lot of use cases, particularly ones involving relatively stable bodies of knowledge like product documentation, internal wikis, or regulatory guidance, basic RAG works well enough to be genuinely useful.
But it has a structural ceiling, and that ceiling becomes visible the moment you ask it to do something more than find adjacent paragraphs.
The problem is that flat retrieval finds nearby coordinates, but it can’t walk the relationships between them. It can surface a document about mortgage forbearance options and another document about California-specific lending regulations, but it doesn’t inherently know that those two things are connected, or how, or that the answer to your actual question requires understanding both simultaneously and reasoning about the relationship between them. The map shows you neighborhoods, but it doesn’t show you the roads or traffic between buildings.
For simple, self-contained questions, that’s fine. For the kinds of complex, multi-layered questions that enterprise environments actually generate, it starts to break down in ways that aren’t always obvious until something goes wrong.
A Search Problem AI Has to Solve for Itself
It’s worth stepping back for a moment to appreciate the scale of what we’re actually asking AI retrieval systems to do, because I feel that most organizations are significantly underestimating it.
Google spent the better part of two decades solving search for humans. Not just indexing the web, which was hard enough, but solving freshness, meaning how quickly a change in the world propagates into search results. Solving ranking, meaning how you determine which of ten million relevant results is the one a person actually needs. Solving intent, meaning the difference between someone searching “jaguar” because they want wildlife information and someone searching it because they’re shopping for a car. These are problems Google threw thousands of engineers and billions of dollars at, and they’re still not fully solved.
What we’re now asking AI retrieval to do is a different version of that same problem, and in several ways it’s harder. We’re not building a system that helps humans browse to an answer at human speed. We’re building a system that needs to autonomously traverse a knowledge space at machine speed, go deep enough into that space to find not just adjacent facts but the relationships between them, and do all of this in a fraction of a second before a response is generated. The dimensional complexity of semantic search is orders of magnitude higher than that of keyword search. The freshness requirements are more demanding because the live state of a system, an account balance, a claim status, a current drug interaction record, can change by the minute. And unlike a human who clicks through several search results and synthesizes them manually, the AI has to do that synthesis automatically and get it right without a human’s judgment in the loop.

ELI5 (semantic search): Semantic search is search that works by meaning rather than by exact word match. A traditional keyword search finds documents that contain the specific words you typed. Semantic search understands that "car loan rates" and "auto financing costs" are asking about the same thing, even though they share no words in common, and returns relevant results either way. It's the difference between a search that's looking for your exact phrasing and one that's trying to understand what you actually meant. (Wikipedia: semantic search)
The framing that matters here isn’t “AI helping humans search better.” It’s that AI now needs its own search infrastructure, purpose-built for the way machines consume and traverse information, and we’re still in the relatively early stages of figuring out what that infrastructure actually looks like at the scale and depth enterprise use cases demand.
Beyond Flat: The Graph-Aware Alternative
If the limitation of basic RAG is that it finds relevant coordinates but can’t walk the relationships between them, the natural question is whether you can build a retrieval system that understands structure, not just proximity.
That’s the core idea behind what’s broadly called GraphRAG, an approach that treats knowledge not as a pile of documents to be indexed, but as a web of connected entities, facts, policies, states, and dependencies, and builds that web explicitly before any query is ever made. When a question comes in, instead of finding the nearest paragraphs and handing them to the model, the system can walk the graph: starting at a relevant node, traversing the edges that connect it to related concepts, and pulling in the relational context that a flat search would have missed entirely.

ELI5 (GraphRAG): In computing, a "graph" has nothing to do with charts or data visualizations. It's a structure made up of nodes, which represent individual things, connected by edges, which represent the relationships between them. Think of a map where cities are nodes and roads are edges, or a social network where people are nodes and friendships are edges. GraphRAG applies this same structure to knowledge: rather than treating information as a flat pile of documents to search through, it maps out the entities in that information and the explicit relationships between them, so an AI system can follow those connections when answering questions. (Wikipedia: GraphRAG)
Microsoft’s GraphRAG project, which is probably the most publicly documented implementation of this approach, demonstrated something important: graph-aware retrieval is meaningfully better not just at answering specific factual questions, but at answering what researchers call “global” questions, the kind that require synthesizing themes or patterns across a large body of knowledge rather than pinpointing a single fact. Those are precisely the kinds of questions enterprise environments generate constantly.
This is where the implications for most organizations start to get uncomfortable, because the data infrastructure that GraphRAG requires is not the data infrastructure most enterprises currently have ready in their data warehouses.
Consider what a company actually wants when it deploys an AI assistant for its call center. A customer calls in and asks a question that seems simple on the surface: what are my options given my current situation? Answering that question well requires the system to know how the relevant policies work, which lives in documentation, and what the customer’s actual current situation is, which lives in live transactional systems. Then it needs to determine how those two things interact given the specific combination of circumstances this particular customer is in. That’s a graph-shaped question. It requires traversing relationships between a policy document, a customer record, a set of eligibility rules, and possibly a current system state that was updated an hour ago by some other channel or back-of-house system of record.
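For a sense of what “graph-shaped” means in practice, here is a deliberately tiny sketch. Every node, edge label, and forbearance detail below is invented for illustration; the point is that the retriever walks relationships outward from the entities the question touches, so the policy, the state rule, and the customer’s live situation come back together instead of arriving as disconnected paragraphs.

```python
from collections import deque

# A toy knowledge graph. Every name and detail below is invented for
# illustration; the point is the traversal, not the content.
nodes = {
    "policy:forbearance":   "Forbearance pauses mortgage payments for a limited period.",
    "rule:ca_lending":      "California adds state-specific requirements to forbearance.",
    "customer:12345":       "Customer 12345 holds a California mortgage, 60 days delinquent.",
    "state:delinquency_60": "At 60 days delinquent, forbearance and repayment plans are available.",
}

edges = {
    "policy:forbearance":   [("constrained_by", "rule:ca_lending"),
                             ("applies_at", "state:delinquency_60")],
    "customer:12345":       [("governed_by", "rule:ca_lending"),
                             ("currently_in", "state:delinquency_60")],
    "rule:ca_lending":      [],
    "state:delinquency_60": [],
}

def traverse(seeds: list[str], max_hops: int = 2) -> list[str]:
    """Walk outward from the seed nodes, collecting facts and the edges that connect them."""
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    context = []
    while queue:
        node, depth = queue.popleft()
        context.append(nodes[node])
        if depth == max_hops:
            continue
        for relation, neighbor in edges[node]:
            context.append(f"({node}) --{relation}--> ({neighbor})")
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return context

# Flat retrieval would hand back the forbearance paragraph and maybe the CA rule
# as disconnected chunks. The walk also returns how they connect, and pulls in the
# customer's current state, which is what the question actually hinges on.
for fact in traverse(["policy:forbearance", "customer:12345"]):
    print(fact)
```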
Most enterprises today have the policy documentation sitting in one place, often flat HTML files or PDFs that haven’t been meaningfully restructured in years, and the live system state sitting in an entirely different place, in databases and data lakes that weren’t designed with semantic traversal in mind. The assumption many organizations are making right now is that they can take those existing structures and plug a new AI layer on top of them and get the result they want. JPMorgan Chase’s chief analytics officer told CNBC that even with an $18 billion annual technology budget, realizing AI’s potential will take years, specifically because companies “do work in thousands of different applications, there’s a lot of work to connect those applications into an AI ecosystem and make them consumable.” Their response was to consolidate every data initiative under a single firmwide office with an explicit mandate to make data “AI ready,” an undertaking they’ve been candid will take years.
The inconvenient truth is that the data usually needs to be restructured before the AI solution can be effective, not after. Entities need to be identified and linked. Relationships need to be made explicit rather than implied. The graph needs to be built, and building it requires computational work upfront that most organizations haven’t budgeted for because they didn’t know it was part of the cost. The AI layer is often the visible, exciting part of these projects. The data restructuring underneath it is the unglamorous part that determines whether the whole thing actually works.
The Gap, In Practice
The call center scenario isn’t hypothetical either. It’s the problem several large enterprises have already run into publicly, and the way they’ve described it is instructive because it reveals exactly where the architectural gap shows up in practice.
JPMorgan Chase built an internal AI assistant called EVEE, designed specifically for call center agents, that was engineered from the start to bridge both sides of the data problem: static policy knowledge on one side, live transaction histories on the other. An agent can ask something like “what are the current forbearance options for a mortgage in California” and get a concise, source-cited answer that draws on both the bank’s policy documentation and the customer’s actual account state simultaneously. That’s the goal. What makes JPMorgan’s public commentary valuable isn’t the product announcement, it’s the candor about what it actually takes to get there. Their chief analytics officer acknowledged a “value gap between what the technology is capable of and the ability to fully capture that within an enterprise,” specifically because the bank operates across thousands of different applications that weren’t built to talk to each other, let alone to be consumed by an AI system in real time. The data exists. Converting it into traversable “information” is the real undertaking.
Allianz offers a cleaner view of the same problem in an insurance context. Their Insurance Copilot, deployed initially for automotive claims in Austria, was built to do something precise: take a live claims record and compare it directly against the relevant policy documents, flagging discrepancies, cross-referencing invoices against incident descriptions, and suggesting next steps. That’s exactly the bridge between static knowledge and live system state that most organizations are struggling to build. Allianz built in a deliberate human-in-the-loop requirement for final decisions on coverage and payouts, because when the live data is wrong, or the relationship between the policy and the claim is misread, the cost is a wrongly denied claim or an incorrect payout. The architecture reflects the stakes involved.

ELI5 (human-in-the-loop): Human-in-the-loop is a design pattern where an automated system handles the bulk of the work but pauses at defined decision points to get a human's sign-off before proceeding. It's the practical middle ground between "the AI does everything autonomously" and "a human does everything manually." In high-stakes applications, particularly ones where a wrong answer has real financial or safety consequences, human-in-the-loop requirements aren't a concession that the AI isn't good enough; they're a deliberate architectural choice that keeps accountability where it belongs. (Wikipedia: human-in-the-loop)
Epic, the dominant electronic health records platform in the US, has been navigating the same tension across a more complex data landscape. Their newer AI tools, including an assistant designed for revenue cycle management that bridges billing code knowledge with live claims data, require exactly the kind of dual-layer retrieval these other examples describe. What Epic has been candid about is the partitioning problem that emerges as you add more agents operating across more data: a billing assistant needs access to live claims records, but a patient-facing chatbot probably shouldn’t have the same access. Each agent requires carefully scoped data permissions, which means the graph you’re building isn’t just a question of connecting things, it’s a question of connecting the right things to the right systems with the right guardrails. That partitioning work is architectural rather than cosmetic, and it adds significant complexity to what initially sounds like a straightforward integration project. You’re now baking access control into your graph’s edges and vertices, which is an entirely separate dimension to track and keep synchronized.
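To picture what baking access control into the graph looks like, here is a hypothetical sketch, not Epic’s design or anyone else’s: each node carries the set of scopes allowed to read it, and each agent’s traversal is filtered by the scopes it holds, so the billing assistant and the patient-facing chatbot walk different subgraphs of the same data.

```python
# A hypothetical sketch of scope-aware retrieval: each node carries the set of
# scopes allowed to read it, and each agent only walks the subgraph its own
# scopes permit. Names and data are invented; this is not any vendor's schema.

node_scopes = {
    "claim:789":         {"billing"},              # live claims record
    "code:cpt_99213":    {"billing", "clinical"},  # billing code knowledge
    "patient:456:visit": {"clinical", "patient"},  # visit summary
}

node_text = {
    "claim:789":         "Claim 789: submitted, pending adjudication.",
    "code:cpt_99213":    "CPT 99213: established patient office visit, level 3.",
    "patient:456:visit": "Visit summary for patient 456.",
}

edges = {
    "claim:789":         ["code:cpt_99213", "patient:456:visit"],
    "code:cpt_99213":    [],
    "patient:456:visit": [],
}

def visible_context(start: str, agent_scopes: set[str]) -> list[str]:
    """Return only the connected context this agent's scopes allow it to see."""
    if not (node_scopes[start] & agent_scopes):
        return []
    context = [node_text[start]]
    for neighbor in edges[start]:
        # An edge is only followed if the agent's scopes intersect the target node's.
        if node_scopes[neighbor] & agent_scopes:
            context.append(node_text[neighbor])
    return context

# The billing assistant sees the claim and the billing code, not the visit summary.
print(visible_context("claim:789", {"billing"}))
# The patient-facing chatbot sees nothing from the claims subgraph at all.
print(visible_context("claim:789", {"patient"}))
```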
The pattern across all three of these examples is consistent: the AI capability itself is not where the difficulty lives. The difficulty lives in the data, specifically in the gap between the way enterprise data has historically been stored and the way a graph-aware retrieval system needs it to be structured. Every organization that has gotten past the pilot stage has had to confront that gap directly, and the ones being candid about it are telling you that it’s more work than the initial project scope assumed.
The Shape of What Comes Next
Everything described so far, RAG, GraphRAG, agentic retrieval, is still operating within a particular paradigm: the system retrieves human-readable text, hands it to the model as context, and the model reasons over it to produce an answer. The retrieval and the reasoning are separate steps, connected by language as the medium of transfer.
There’s a direction emerging in research that challenges whether that handoff needs to happen in language at all.
Think about what happens when you try to explain something complex to another person. You build up a mental model of the thing in your own mind, with all of its internal structure and connections intact, and then you dismantle it into words, feed those words to the other person sequentially, and hope they reassemble something close enough to your original model on the other end. It’s an inherently lossy process. The explanation is always a compressed, linearized version of the understanding.
Now imagine instead of explaining it, you could simply hand the other person the mental model directly, with all of its structure and connections already intact, nothing lost in translation, nothing that needed rebuilding. That’s the direction a class of systems called Large Knowledge Models is pointing toward. Rather than retrieving text and feeding it to a model, these systems would retrieve knowledge already represented in a form the model can process natively, bundles of latent space embeddings that carry meaning the way the model itself carries meaning, bypassing the text layer entirely. The model doesn’t need to read an explanation. It receives the understanding.

ELI5 (latent space): Latent space is the internal mathematical space where an AI model represents meaning. When a model processes text, it doesn't work with words directly; it converts them into positions in a high-dimensional numerical space where similar concepts cluster near each other. "Dog" and "puppy" are neighbors. "Mortgage" and "interest rate" are neighbors. This internal geography is the latent space. It's called "latent" because it's not directly visible; it's the hidden layer of understanding the model builds and operates within. When researchers talk about working "in latent space," they mean bypassing the text layer entirely and operating in this internal representation directly. (Wikipedia: latent space)
This is early-stage research rather than something you can procure today, and it’s worth naming that clearly. But it matters to enterprise planning for a reason that has nothing to do with when LKMs ship as a product. The data structures that position an organization well for graph-aware retrieval today are the same structures that would position them well for latent-space retrieval tomorrow. Conversely, organizations that are still running flat, unstructured data lakes into basic RAG pipelines in 2026 aren’t just underperforming against current best practice, they’re building technical debt against a future that is coming regardless of the timeline.

ELI5 (data lake): A data lake is a large centralized repository where an organization stores raw data in whatever format it arrived in, structured or unstructured, without first organizing it into a rigid schema. The metaphor is intentional: like a lake versus a bottled water factory, a data lake holds everything in its natural state and lets you figure out what to do with it later. The upside is flexibility and scale. The downside, especially relevant to this post, is that raw, unstructured data isn't easily navigable by AI retrieval systems that need to understand relationships between things, not just locate files. (Wikipedia: data lake)
Enterprise data infrastructure is slow and expensive to change, not because organizations lack the will but because the systems that depend on that data in its current form are deeply embedded in operations. The companies restructuring their data now, making it graph-aware, establishing clear entity relationships, building the connective tissue between static knowledge and live system state, are doing work that compounds. The companies waiting for the AI layer to mature before they address the data layer are likely to find that the pivot, when it becomes unavoidable, is considerably more expensive than the gradual restructuring would have been.
What To Do With This on Monday
If you’re leading or influencing technology decisions in an enterprise environment, the practical question isn’t whether graph-aware retrieval is the right direction. The research and the early enterprise implementations suggest fairly clearly that it is. The question is where you actually are relative to it, and what a realistic first move looks like from that position.
The most useful thing you can do right now is treat your data as the project, not the AI layer on top of it. That means starting with an honest audit of what you actually have: where your policy and procedure documentation lives, what format it’s in, how current it is, and whether the relationships between documents are explicit anywhere or simply implied by whoever wrote them. Most organizations doing this audit for the first time find that the answer is more fragmented than expected. Documents in multiple systems, maintained by different teams, with no shared vocabulary and no machine-readable structure connecting them. That’s the baseline you’re working from, and knowing it clearly is more valuable than any vendor conversation you could have before you’ve done it.
The second question worth asking is how far the gap actually is between your static knowledge and your live system state. In most enterprises these are not just different databases, they’re different organizational domains with different owners, different update cadences, and different assumptions about who consumes the data and how. Bridging them for AI retrieval isn’t primarily a technical problem, it’s a governance problem that happens to have technical components. Who owns the authoritative version of a given policy? How quickly does it need to reflect a regulatory change? Who decides when the live system data is trustworthy enough to be surfaced to a customer-facing AI without a human reviewing it first? These are questions that need answers before architecture decisions get made, not after.
For organizations that are further along and already running some form of RAG in production, the next honest question is whether what you have is actually doing the job or just appearing to. Basic RAG can look convincing in a demo environment with a curated document set and clean questions. It tends to degrade in production when the questions get messier, the document corpus grows, and users start asking things that require connecting information across multiple sources. If your system is producing answers that are technically sourced but contextually incomplete, that’s the structural ceiling showing up. Better prompt engineering isn’t going to save you in this case; it’s far more likely that a change in retrieval architecture will.
On the vendor and tooling side, it’s worth knowing that the infrastructure for graph-aware retrieval has matured considerably in the past eighteen months. Tools like Microsoft’s GraphRAG are open and documented. Vector database infrastructure has gotten significantly cheaper, with newer storage architectures bringing costs down by an order of magnitude compared to where they were two years ago, which changes the economics of indexing at scale in ways that make broader implementations feasible for organizations that previously couldn’t justify the cost. The tooling is no longer the limiting factor. The data readiness usually is.

ELI5 (vector database): A vector database is a specialized storage system designed to hold and search through vector embeddings, the numerical coordinate representations of meaning described earlier in the piece. A regular database excels at exact lookups: find the record where customer ID equals 12345. A vector database is built for a different kind of question: find everything that's conceptually close to this query, ranked by how close. It's the infrastructure layer that makes semantic search fast and scalable, because without it, comparing your query's coordinates against millions of stored embeddings one by one would be far too slow to be practical. (Wikipedia: Vector database)
The longer arc here is worth keeping in mind as you prioritize. The organizations that will be well-positioned for whatever retrieval architecture emerges over the next three to five years are the ones building clean, connected, semantically rich data foundations now, not because they predicted the specific technology that would win, but because well-structured data is useful regardless of what sits atop it. The companies that are waiting until the AI layer is fully mature before they address the data layer are making a bet that the cost of catching up later will be lower than the cost of moving now. Based on what JPMorgan, Allianz, and Epic have each discovered independently, that bet is not paying off the way people hoped.
The gap between what an LLM knows and what’s actually true right now was always going to require more than a software update to close. The organizations treating it that way are the ones making real progress.
If this resonated, or if you think I got something wrong, I’d genuinely like to hear it. Come find me at LinkedIn.com/in/geoffgodwin or GitHub.com/geoffgodwin