Generation is not enough

February 2026

Every AI product I use ends the same way. The model gives you something, and then you go put it somewhere yourself. It has no idea what you did with the output. Next session, it starts over.

For search and writing assistance, that's fine. But there's a different class of problem where the bottleneck was never generation to begin with. Teams don't fall behind because they can't produce a summary or draft a status update. They fall behind because information decays. The project board drifts from reality over the course of a few meetings. Tasks go stale after scope changes that only happened verbally. The doc everyone references reflects a decision that got overturned two standups ago. Keeping things accurate is where the time actually goes, and that part is still entirely manual.

I built a meeting notepad that does transcription and summarization. The summaries are good. But the moment one gets generated, it's a dead artifact. You read it, then go update your task tracker by hand, copy action items into Asana, rewrite the status in your Monday board. The AI did the hard cognitive work and then contributed nothing to the hard operational work of getting information where it needs to go.

What's out there

Asana shipped AI connectors that turn conversations into tasks. Zapier does similar things. These are trigger-action pipelines. They fire once and never look back. Nobody comes back three days later to check if the task still reflects what actually happened.

Linear is further along. Karri Saarinen wrote about "self-driving SaaS": the idea that software should operate on behalf of users with increasing autonomy, similar to SAE levels in self-driving cars. Their triage system already auto-labels, deduplicates, and routes issues. No human in the loop. That's the right direction.

MemGPT (now Letta) is worth studying. They model the context window as RAM and implement virtual memory on top of it: the LLM pages information in and out, decides what to persist long-term, and manages its own memory hierarchy across sessions.
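The pattern is easy to sketch in miniature. Here's a toy version of context-as-RAM: the window is a fixed token budget, overflow gets paged out to an archival store, and archived entries get paged back in on demand. The class, the whitespace token count, and the method names are my own stand-ins for illustration, not Letta's actual API.

```python
# Toy sketch of the MemGPT idea: context window as RAM with a fixed
# token budget, plus an archival store that overflow pages out to.
from collections import OrderedDict

class VirtualContext:
    def __init__(self, budget_tokens):
        self.budget = budget_tokens     # "RAM": what fits in the context window
        self.core = OrderedDict()       # key -> (text, tokens), insertion order
        self.archive = {}               # "disk": persisted outside the window
        self.used = 0

    def write(self, key, text):
        tokens = len(text.split())      # crude token count, good enough here
        self.core[key] = (text, tokens)
        self.used += tokens
        while self.used > self.budget:  # page out oldest entries until we fit
            old_key, (old_text, old_tokens) = self.core.popitem(last=False)
            self.archive[old_key] = old_text
            self.used -= old_tokens

    def recall(self, key):
        # Page an archived entry back into core memory on demand.
        if key in self.core:
            return self.core[key][0]
        text = self.archive.pop(key, None)
        if text is not None:
            self.write(key, text)
        return text

ctx = VirtualContext(budget_tokens=10)
ctx.write("a", "alpha beta gamma delta epsilon")  # 5 tokens, fits
ctx.write("b", "one two three four five six")     # 6 tokens: "a" pages out
```

The interesting part isn't the cache mechanics; it's that the model itself decides what to write, recall, and let go of.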

Google Cloud framed this as the shift from "system of record" to "system of reason". Databases that don't just store data but reason over it. I'd push that further: the interesting system is one that reasons about the record and keeps it accurate.

Why maintenance matters more than generation

Generation quality is effectively solved for most business tasks. Summarize a meeting, draft an email, extract action items. These work well enough today.

The unsolved problem is keeping derived state (task boards, status pages, documentation) consistent with ground truth (conversations, decisions, work being done) as both evolve continuously. In most organizations running Asana or Monday, the board is accurate for about as long as it takes for someone to have an unrecorded conversation that changes a priority. After that it drifts, and it keeps drifting until someone spends thirty minutes manually reconciling.

A useful system here looks less like a chatbot and more like infrastructure. Something that observes work happening (meetings, messages, commits, decisions) and keeps the representation of that work current. The relevant metric isn't response quality. It's drift: how long before the system's model of reality and actual reality diverge.
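Drift in that sense is measurable: replay the observed events (ground truth) against the board (derived state) and flag where they disagree. A minimal sketch, with an invented board/event schema standing in for any real tracker:

```python
# Sketch of a drift check: the board is derived state, the event log is
# ground truth. Field names here are illustrative, not a real schema.
from datetime import datetime

def drift_report(board, events):
    """Return board entries whose status disagrees with the latest event."""
    latest = {}
    for ev in sorted(events, key=lambda e: e["at"]):
        latest[ev["task"]] = ev            # last event wins per task
    stale = []
    for task, entry in board.items():
        ev = latest.get(task)
        if ev and ev["status"] != entry["status"]:
            stale.append({
                "task": task,
                "board_says": entry["status"],
                "truth_says": ev["status"],
                "stale_since": ev["at"],   # the moment drift began
            })
    return stale

board = {"launch-page": {"status": "in_progress"}}
events = [
    {"task": "launch-page", "status": "in_progress", "at": datetime(2026, 2, 1)},
    {"task": "launch-page", "status": "blocked",     "at": datetime(2026, 2, 3)},
]
report = drift_report(board, events)  # one stale entry: blocked since Feb 3
```

The hard part a real system faces is upstream of this function: most of the events never arrive as structured records. They're verbal, implicit, or buried in a transcript.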

Why it's hard to build

Models are stateless. Running them continuously is expensive. Long-lived context remains an open problem.

Materialize published a good analysis of where AI data infrastructure breaks down. Traditional architectures update on schedules. Streaming platforms have eventual consistency issues. Caching serves stale data. None of this was designed for a system that needs to reason about what's true right now, and the failure modes stack when AI makes downstream decisions on data that's already wrong.

You also can't solve this by scaling up context. Windows are finite. Attention cost scales quadratically with context length. Mem0 raised $24M to build a memory layer for AI agents. That amount of funding for just the memory problem tells you something about how far we are from having good answers here. What gets remembered, what gets structured, what gets evicted and recomputed on demand: that's where the actual systems work is. The model does the reasoning. Everything else is architecture.
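The evict-and-recompute part of that tradeoff fits in a few lines. A sketch with a bounded least-recently-used cache over derived results, where `derive` stands in for the expensive step (in practice, an LLM call) and everything can be rebuilt from the raw record; the names are mine, purely for illustration:

```python
# Sketch of evict-and-recompute: bounded LRU cache of derived results,
# with recomputation from ground truth on a miss. Illustrative only.
from collections import OrderedDict

class DerivedStore:
    def __init__(self, source, derive, capacity):
        self.source = source    # ground truth, always recomputable from
        self.derive = derive    # expensive step (stand-in for an LLM call)
        self.cache = OrderedDict()
        self.capacity = capacity
        self.recomputes = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)      # mark as recently used
            return self.cache[key]
        value = self.derive(self.source[key])
        self.recomputes += 1
        self.cache[key] = value
        if len(self.cache) > self.capacity:  # evict least recently used
            self.cache.popitem(last=False)
        return value

source = {"standup-1": "we shipped the draft", "standup-2": "scope changed"}
store = DerivedStore(source, derive=str.upper, capacity=1)
a = store.get("standup-1")  # miss: recompute
b = store.get("standup-2")  # miss: recompute, evicts standup-1
c = store.get("standup-1")  # miss again: recomputed on demand
```

The policy questions (what's worth caching, what's cheap enough to recompute, what must never be evicted) are exactly the ones a memory layer has to answer, and a fixed LRU rule like this one is the least interesting possible answer.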

Most of the industry is still focused on making single inferences better. The harder problem, and the one with more leverage, is the infrastructure that makes continuous inference reliable.

Where I'm going

I don't think better chatbots or more capable agents are the unlock. The unlock is systems that maintain truth over time. Software where the project board is accurate by default, not because someone spent their morning updating it.

That's what I'm building next.