LLMs are great game masters for about forty-five minutes. Then they forget that the blacksmith died in session two, resurrect him with a different name, and give him your sword.
I am building a desktop TTRPG engine (Rust + Tauri + React) where the LLM narrates, but a knowledge graph remembers. This post is about the architecture.
The core problem: context is not memory
The naive approach is to stuff the whole campaign history into the prompt. This fails in three ways:
- It doesn't scale. A long campaign is hundreds of thousands of tokens.
- It doesn't prioritize. The model treats "the innkeeper has a limp" and "the king is secretly dead" with equal weight.
- It drifts. Summarizing summaries of summaries is a game of telephone with your own plot.
The architecture
The engine keeps three layers, each with a different job:
┌─────────────────────────────────────────┐
│ LLM (narration) — stateless │
├─────────────────────────────────────────┤
│ Knowledge graph (facts) — entities, │
│ relationships, world state, timestamps │
├─────────────────────────────────────────┤
│ Event log (truth) — append-only │
│ record of everything that happened │
└─────────────────────────────────────────┘The event log is the source of truth — append-only, never edited. The graph is a projection of the log: entities (NPCs, places, items) as nodes, facts as edges with timestamps. Before each narration turn, the engine queries the graph for entities mentioned in the player's input plus one hop of neighbors, and injects only that subgraph into the prompt.
If this sounds like event sourcing with a read model, that's because it is. Game state is just business state wearing a cloak.
What the rules engine taught me
The game system is PbtA (Powered by the Apocalypse), where rules are loose and fiction-first. Ironically, that made the formal layer more important, not less: when the rules don't constrain the world, the data model has to.
The blacksmith stays dead now. The graph has an edge that says so, with a timestamp, and the prompt builder will not let the model forget it.
Status
Working prototype, very alpha. The most surprising lesson so far: the hard part was never the LLM. It was deciding, formally, what a "fact" is — which is the same hard part as every system I've ever built.