Most AI applications built on RAG are paying for noise. The core problem: RAG retrieves text chunks that contain the relevant signal surrounded by irrelevant context — and sends all of it to the LLM.
A RAG system answers a question about a drug's formulary tier by retrieving several document chunks from its vector database. Each chunk might be 400–600 tokens. The relevant fact — "Tier 2 coverage for Type 2 diabetes" — might be 15 tokens. The other 585 tokens are tax: surrounding text the model reads, processes, and pays for, even though it contributes nothing to the answer.
The pattern: Teams optimize prompt engineering, chunk size, and retrieval parameters — all of which are micro-optimizations on a fundamentally wasteful architecture. The macro-optimization is replacing the architecture.
The numbers are from a fully reproducible benchmark — not marketing estimates. 45 domains, 7,928 queries, every result verifiable.
Source: Yarmoluk & McCreary, "Compact Knowledge Graphs vs. RAG and GraphRAG: A Reproducible Benchmark Across 45 Educational Domains," arXiv 2026. Full benchmark on GitHub →
The counterintuitive result: CKG uses 11× fewer tokens and achieves 3.8× better accuracy. This seems contradictory until you understand what the extra RAG tokens actually contain.
Retrieval Density Score (RDS) measures correct information delivered per token spent. CKG RDS = 0.001751. RAG RDS = 0.0000413. The 42× advantage means each CKG token carries 42× more factually correct information than a RAG token.
RDS = F1 Score / Mean Tokens Used CKG: 0.4709 / 269 = 0.001751 RAG: 0.1231 / 2982 = 0.0000413 Advantage: CKG delivers 42× more correct information per token
A RAG token is on average: ~5% signal (the relevant fact), ~60% context (surrounding prose that supports the fact), and ~35% noise (retrieved text that is not relevant to this query). CKG eliminates the noise and compresses the context — the signal-to-token ratio is structurally higher.
Fewer tokens does not mean less information when the removed tokens were noise. It means a higher density of correct information in the remaining tokens.
A CKG encodes domain knowledge as explicit entity relationships, not prose. The difference in representation efficiency is structural, not a compression trick.
A RAG system stores facts in sentences: "Ozempic (semaglutide) is a GLP-1 receptor agonist indicated for Type 2 diabetes management. Under the BlueCross Medicare Advantage formulary, Ozempic is covered at Tier 2 with prior authorization required for new starts..." At 40–60 tokens per fact, encoding 50 relationships requires 2,000–3,000 tokens of prose.
The same knowledge in CKG format:
ConceptID,ConceptLabel,Dependencies,TaxonomyID 1,GLP-1 Receptor Agonist,,FOUND 2,Semaglutide,1,CORE 3,Ozempic (Brand),2,CORE 4,Medicare Advantage Plan,1,CORE 5,Tier 2 Formulary Coverage,3|4,ADV 6,Type 2 Diabetes Indication,2,CORE 7,Prior Authorization Required,5|6,ADV ...
Each row averages 6–8 tokens. Fifty relationships = ~350 tokens including the header. The relationship between any two entities is expressed as a dependency ID — one token. RAG expresses the same relationship in a sentence — 15–40 tokens. Structure is the mechanism of compression.
The benchmark cost figures are actual API costs from running identical queries against both systems. Not projections. Not estimates.
If your team or product runs 10,000 LLM queries per month against a domain knowledge base, here is what the CKG vs. RAG difference looks like in production dollars.
| Metric | RAG | CKG | Savings |
|---|---|---|---|
| Tokens per query (mean) | 2,982 | 269 | 11× reduction |
| Monthly tokens (10K queries) | 29.82M | 2.69M | 27.13M fewer |
| Monthly API cost at $18/1M tokens | ~$537 | ~$48 | $489/month |
| Annual API cost | ~$6,444 | ~$576 | $5,868/year |
| Answer accuracy (F1) | 0.1231 | 0.4709 | 3.8× better |
At 100K queries/month the annual savings exceed $58,000 in API costs alone — before accounting for the accuracy improvement, which reduces the downstream cost of wrong AI answers. Token pricing varies by model; the ratio advantage holds across providers.
Teams facing high token costs often ask whether fine-tuning is a better solution than better retrieval. The answer depends on the cost bucket you are trying to reduce.
Fine-tuning bakes domain knowledge into model weights, reducing the amount of context needed at inference time. But it comes at significant upfront cost (GPU compute, labeled data, retraining cadence), does not generalize to new or updated domain facts, and still requires some retrieval context for current-state queries.
CKG delivers structured domain knowledge in a 269-token system prompt. No fine-tuning. No GPU cluster. No retraining cadence. Domain updates require swapping the .md file — minutes, not months. For production systems where domain facts change frequently (formularies, regulations, pipelines), CKG is more practical than fine-tuning.
For teams already fine-tuning, CKG-derived data is higher-quality training signal than raw text. Structured, relationship-explicit data produces more accurate domain-specific models than prose fine-tuning data. CKG and fine-tuning are complementary, not competing.
Further reading: What Is a Compact Knowledge Graph (CKG)? — architecture, format, and domain breakdown. What Is Retrieval Density Score (RDS)? — the metric that captures the accuracy-cost tradeoff.
Tell us your domain and current query volume. We will show you the exact token reduction and cost savings for your specific setup — in one session.
Book a 30-Minute Demo What Is a CKG? →