Most RAG stacks retrieve the top-K chunks first and enforce permissions afterward in the application layer. At scale, this weakens the trust boundary and degrades retrieval quality. When a user can access only a subset of the corpus, post-filtering can shrink the top-K results to a handful of chunks, leaving very little context even when many relevant, authorized chunks sit deeper in the index. The fix is to make retrieval identity-aware so that authorization becomes part of ranking. In the post, I walk through how to design identity-aware retrieval so that access control is enforced during search, not after it.
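To make the failure mode concrete, here is a minimal sketch that compares the two orderings on a toy in-memory corpus. Everything in it is an assumption for illustration: the `Chunk` type, the group-based `can_read` check, the hard-coded scores, and the function names are hypothetical, and a simple permission pre-filter stands in for whatever identity-aware mechanism a real search engine would provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    score: float               # hypothetical similarity to the query
    allowed_groups: frozenset  # hypothetical ACL: groups that may read the chunk


def can_read(chunk: Chunk, user_groups: frozenset) -> bool:
    # The user may read a chunk if they share at least one group with its ACL.
    return bool(chunk.allowed_groups & user_groups)


def post_filter_top_k(chunks, user_groups, k):
    # Retrieve first, authorize later: unauthorized chunks consume top-K slots.
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)[:k]
    return [c for c in ranked if can_read(c, user_groups)]


def identity_aware_top_k(chunks, user_groups, k):
    # Authorize during search: only readable chunks compete for top-K slots.
    visible = [c for c in chunks if can_read(c, user_groups)]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:k]


# Toy corpus: the highest-scoring chunks belong to a group the user is not in.
CORPUS = [
    Chunk("fin-1", 0.95, frozenset({"finance"})),
    Chunk("fin-2", 0.93, frozenset({"finance"})),
    Chunk("fin-3", 0.91, frozenset({"finance"})),
    Chunk("eng-1", 0.90, frozenset({"engineering"})),
    Chunk("eng-2", 0.85, frozenset({"engineering"})),
    Chunk("eng-3", 0.80, frozenset({"engineering"})),
]

user_groups = frozenset({"engineering"})
post = [c.chunk_id for c in post_filter_top_k(CORPUS, user_groups, k=4)]
aware = [c.chunk_id for c in identity_aware_top_k(CORPUS, user_groups, k=4)]

print(post)   # ['eng-1']
print(aware)  # ['eng-1', 'eng-2', 'eng-3']
```

With K = 4, post-filtering leaves a single chunk because three of the four slots went to chunks the user cannot read, while the identity-aware variant returns every authorized chunk that fits. In a production system the permission check would more plausibly be pushed into the index as a filter on the query rather than applied in application code as it is here.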