RAG

  • Published on
    Most RAG stacks retrieve the top-K chunks first and enforce permissions afterwards in the application layer. At scale this breaks the trust boundary and degrades retrieval quality: when a user has access to only a subset of the corpus, post-filtering collapses top-K into a tiny context window even though plenty of relevant, authorized chunks sit deeper in the index. The fix is to make retrieval identity-aware, so authorization becomes part of ranking. In this post I walk through how to design identity-aware retrieval so access control is enforced during search, not after it, as sketched in the example below.
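    To make the failure mode concrete, here is a minimal, self-contained sketch contrasting the two approaches over a toy in-memory index (the function names, the set-based ACL model, and the brute-force scoring are illustrative; a production system would push the ACL filter into the vector database's query API instead):

```python
import numpy as np

def post_filtered_topk(query, doc_vecs, doc_acls, user_groups, k=5):
    """Rank the whole corpus, then drop unauthorized hits (the anti-pattern)."""
    scores = doc_vecs @ query                     # similarity against every doc
    order = np.argsort(-scores)                   # best-first
    # Only now check permissions: with a restrictive ACL, most of the
    # top-K is discarded and the context window collapses.
    return [i for i in order[:k] if doc_acls[i] & user_groups]

def identity_aware_topk(query, doc_vecs, doc_acls, user_groups, k=5):
    """Restrict the candidate set to authorized docs *before* ranking."""
    authorized = [i for i, acl in enumerate(doc_acls) if acl & user_groups]
    scores = doc_vecs[authorized] @ query         # score only what the user can see
    order = np.argsort(-scores)[:k]
    return [authorized[i] for i in order]         # top-K is always fully usable
```

    With a restrictive ACL, `post_filtered_topk` returns far fewer than k chunks, while `identity_aware_topk` still fills the context window from the authorized subset.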
  • Published on
    Retrieval Augmented Generation (RAG) has become one of the most common LLM use cases of the past couple of years. It enhances an LLM by combining it with external knowledge sources: relevant information is retrieved from a knowledge base, incorporated into the model's context, and the generated response draws on both the model's internal knowledge and the retrieved material. Building RAG applications from scratch means wiring up vector databases, embeddings, and search algorithms, which can be quite involved. In this blog we'll briefly cover RAG basics and how to leverage OpenAI's Assistants API to build a simple RAG application, along the lines of the sketch below.
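    A minimal sketch of the Assistants flow, assuming the OpenAI Python SDK's beta Assistants API with the file_search tool (the file name, model choice, and question are placeholders, not from the post):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Index a knowledge-base file into a vector store; the API handles
# chunking and embedding internally.
store = client.beta.vector_stores.create(name="kb")
with open("handbook.pdf", "rb") as f:
    client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=store.id, files=[f]
    )

# An assistant with file_search retrieves relevant chunks from the
# vector store and injects them into the model's context at answer time.
assistant = client.beta.assistants.create(
    model="gpt-4o",
    instructions="Answer questions using the attached knowledge base.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [store.id]}},
)

# Ask a question on a thread and poll the run to completion.
thread = client.beta.threads.create(
    messages=[{"role": "user", "content": "What is our refund policy?"}]
)
client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)  # newest message first
```

    The appeal of this approach is that retrieval, chunking, and context injection are managed by the API, so a basic RAG app needs no separate vector database or search stack.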