
Foundry IQ: Agentic retrieval for more relevant AI responses

Introduction

As AI agents become more deeply embedded in enterprise workflows, the quality of their responses increasingly depends on how effectively they retrieve and reason over knowledge. Simply searching across multiple data sources is no longer sufficient—agents must be able to plan queries, select the right sources, reflect on retrieved results, and balance response quality with latency.

This article explores how knowledge bases powered by agentic retrieval address these challenges through query planning, federated search, and reflective search. By examining different retrieval reasoning effort levels and their impact on performance, we demonstrate why higher-level retrieval strategies consistently outperform direct search approaches. The results highlight measurable gains in response quality, with an average improvement of 36%, underscoring the value of structured retrieval and reflection when building enterprise-ready AI agents.

Foundry IQ

Foundry IQ, powered by Azure AI Search, serves as a unified knowledge layer for AI agents. It is built to enhance response quality, automate Retrieval-Augmented Generation (RAG) workflows, and deliver enterprise-grade grounding. Both Foundry IQ and Azure AI Search are core components of Microsoft Foundry.

Foundry IQ leverages the agentic retrieval engine in Azure AI Search knowledge bases (KBs) to address key challenges in agent development. It provides a single entry point for searching across all enterprise content, supports federated sources such as web grounding to enrich private data, and enables self-reflective search. Developers benefit from configurable latency–quality trade-offs, high-quality answer generation, and fine-grained steerability.

By configuring and registering a single knowledge base as a “super tool,” developers can grant agents access to multiple knowledge sources through one interface. This approach significantly simplifies agent development while clearly separating responsibilities between knowledge retrieval (knowledge bases) and knowledge consumption (agents).
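
To make the "super tool" idea concrete, here is a minimal Python sketch of registering one knowledge base as a single function tool for an agent. The schema follows the widely used JSON function-calling convention; the search_knowledge_base name and the dispatcher below are hypothetical stand-ins, not the actual Foundry IQ registration API.

```python
# Minimal sketch: one knowledge base exposed to an agent as a single tool.
# The tool name and the dispatcher are hypothetical, not the Foundry IQ API.
knowledge_tool = {
    "type": "function",
    "function": {
        "name": "search_knowledge_base",
        "description": (
            "Search all registered enterprise knowledge sources "
            "(internal indexes, SharePoint, web) through one interface."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Natural-language question."}
            },
            "required": ["query"],
        },
    },
}

def search_knowledge_base(query: str) -> str:
    """Hypothetical dispatcher: the agent calls one tool, and the knowledge
    base handles planning, federation, ranking, and reflection internally."""
    raise NotImplementedError("Wire this to the knowledge base retrieval endpoint.")
```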

Agentic Retrieval

The agentic retrieval workflow within knowledge bases automates and unifies multiple steps, enabling agents to efficiently search across a broad set of data sources. This section provides a detailed overview of the retrieval approach used.

Because the parameter space for agentic retrieval is extensive, exposing every option individually would be impractical. Instead, a single higher-level retrieval reasoning effort parameter is introduced, allowing developers to balance latency and response quality without the need to fine-tune numerous settings.

The following outlines the retrieval steps when the retrieval reasoning effort is set to medium.


Step 1: Configuration

Knowledge bases are set up with the required knowledge sources, an Azure OpenAI deployment for internal reasoning, retrieval instructions, retrieval reasoning effort, and other supporting parameters.
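
As an illustration, a configuration along these lines might look as follows. Every field name here mirrors a concept from the text (sources, reasoning model, instructions, effort level) but is a placeholder rather than the exact Azure AI Search schema.

```python
# Illustrative knowledge base configuration; field names are placeholders.
knowledge_base_config = {
    "name": "enterprise-kb",
    "knowledge_sources": [
        {"kind": "blob_index", "name": "contracts-index"},  # Azure Blob-backed index
        {"kind": "sharepoint", "name": "hr-site"},          # remote Microsoft 365 source
        {"kind": "web", "name": "bing-grounding"},          # web grounding via Bing
    ],
    # Azure OpenAI deployment used for internal reasoning (planning,
    # reflection); the deployment name is a placeholder.
    "reasoning_model": {"azure_openai_deployment": "gpt-4o-mini"},
    "retrieval_instructions": "Prefer internal sources; use the web only for recent events.",
    # The single knob that trades latency for quality: minimal | low | medium.
    "retrieval_reasoning_effort": "medium",
}
```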

Step 2: Source selection and query planning

When a request is submitted to a knowledge base, it is broken down into one or more sub-queries. Based on these sub-queries, appropriate knowledge sources are selected for retrieval. Retrieval instructions are applied at this stage to guide query execution, such as transforming queries or prioritizing specific sources.
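
A rough Python sketch of this stage: the reasoning model decomposes the request into sub-queries and assigns each one a source. The llm callable and the JSON reply format are assumptions for illustration, not the service's internal protocol.

```python
import json

def plan_queries(user_request: str, sources: list[str], llm) -> list[dict]:
    """Decompose a request into sub-queries and pick a source for each.
    `llm` is assumed to be any callable that takes a prompt string and
    returns the model's text reply; the JSON format is illustrative."""
    prompt = (
        "Break the request into focused sub-queries and assign each to the "
        f"best source from {sources}. Reply as JSON: "
        '[{"query": "...", "source": "..."}].\n'
        f"Request: {user_request}"
    )
    return json.loads(llm(prompt))

# Usage with a canned stub instead of a real model call:
fake_llm = lambda _: '[{"query": "vacation day allowance", "source": "hr-site"}]'
print(plan_queries("How many vacation days do I get?", ["hr-site", "web"], fake_llm))
```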

Step 3: Federation

The sub-queries are executed against the chosen knowledge sources. These may include local searches over internally created vector and text indexes from Azure Blob Storage and Microsoft OneLake, as well as remote searches across systems such as SharePoint in Microsoft 365, web results via Bing, and MCP servers. For each sub-query and knowledge source pair, up to the top 50 results are retrieved.
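
Federation is naturally concurrent. Below is a minimal asyncio sketch, with search_source standing in for the real connectors (local indexes, SharePoint, Bing, MCP servers).

```python
import asyncio

TOP_K_PER_PAIR = 50  # up to the top 50 results per (sub-query, source) pair

async def search_source(source: str, query: str) -> list[dict]:
    """Stand-in for a real connector (local index, SharePoint, Bing, MCP)."""
    await asyncio.sleep(0)  # placeholder for the actual network call
    return [{"source": source, "query": query, "text": f"{source} hit {i}"} for i in range(3)]

async def federate(plan: list[dict]) -> list[list[dict]]:
    """Run every (sub-query, source) pair concurrently; cap each result set."""
    tasks = [search_source(step["source"], step["query"]) for step in plan]
    results = await asyncio.gather(*tasks)
    return [hits[:TOP_K_PER_PAIR] for hits in results]

plan = [{"query": "vacation policy", "source": "hr-site"},
        {"query": "vacation policy", "source": "web"}]
print(asyncio.run(federate(plan)))
```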

Step 4: Ranking and filtering

Results from all sources—including those outside Azure AI Search—are scored and ranked using the semantic ranker. This ranker produces normalized, calibrated scores across different content types, ensuring consistency. It also generates extractive captions that act as query-aware summaries of each content chunk. Documents are grouped by sub-query and sorted by semantic ranker score. From each group, the top 10 documents are further evaluated using a semantic classifier optimized for downstream RAG tasks. In this architecture, the semantic ranker represents Layer 2 (L2) processing, while the semantic classifier represents Layer 3 (L3).
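
In code, the L2/L3 flow described above could be sketched as follows, with l2_score (semantic ranker) and l3_classify (semantic classifier) as stand-in callables for the service's models.

```python
TOP_N_PER_GROUP = 10  # top 10 documents per sub-query group go to the classifier

def rank_and_filter(hits: list[dict], l2_score, l3_classify) -> list[dict]:
    """Sketch of Step 4: group results by sub-query, sort each group by the
    semantic ranker score (l2_score), keep the top 10, then let the semantic
    classifier (l3_classify) decide which of those survive. Both callables
    are stand-ins for the service's L2 and L3 models."""
    groups: dict[str, list[dict]] = {}
    for hit in hits:
        groups.setdefault(hit["query"], []).append(hit)

    kept = []
    for query, docs in groups.items():
        docs.sort(key=l2_score, reverse=True)
        kept.extend(d for d in docs[:TOP_N_PER_GROUP] if l3_classify(query, d))
    return kept
```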

Step 5: Reflective search (iterative search)

In a subsequent step, captions from the top 10 documents are combined and analyzed by the semantic classifier to assess whether an additional retrieval pass is required. If the classifier determines that the information need is satisfied, the retrieval phase concludes. If the query is deemed complex or insufficiently answered, the captions are passed to an LLM prompt that reflects on query completeness. If the need remains unmet, the most relevant documents are retained, new queries and knowledge sources are generated, and a second retrieval iteration is performed. The retrieval phase completes after this iteration.
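
The reflection loop can be sketched as below; plan_fn, search_fn, and is_satisfied are illustrative stand-ins for the planner, the federated search, and the classifier-plus-LLM completeness check described above.

```python
MAX_PASSES = 2  # at medium effort, at most one reflective follow-up pass

def retrieve_with_reflection(request: str, plan_fn, search_fn, is_satisfied) -> list[dict]:
    """Sketch of Step 5: run retrieval, ask whether the information need is
    met, and if not, replan (keeping the most relevant documents) and run
    one more pass. The three callables are illustrative stand-ins."""
    docs: list[dict] = []
    queries = plan_fn(request, previous=None)
    for _ in range(MAX_PASSES):
        docs.extend(search_fn(queries))
        if is_satisfied(request, docs):
            break  # the classifier (or LLM reflection) says the need is met
        queries = plan_fn(request, previous=docs)  # new queries and sources
    return docs
```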

Step 6: Results merging

Results from multiple iterations, knowledge sources, and queries are consolidated using a multi-layer round-robin strategy. A final content response is assembled from the highest-quality documents, guided by the token budget defined by the retrieval reasoning effort or customer overrides.
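
A simple way to picture the merge is interleaving the ranked lists and stopping once the budget is spent. The 4-characters-per-token estimate below is a rough assumption for illustration, not the service's tokenizer.

```python
from itertools import zip_longest

def round_robin_merge(ranked_lists: list[list[dict]], token_budget: int) -> list[dict]:
    """Sketch of Step 6: interleave ranked result lists (one per iteration,
    source, and query) and stop once the effort-level token budget is spent."""
    estimate_tokens = lambda doc: len(doc["text"]) // 4  # rough assumption
    merged, used, seen = [], 0, set()
    for tier in zip_longest(*ranked_lists):  # best of each list first
        for doc in tier:
            if doc is None or doc["id"] in seen:
                continue
            cost = estimate_tokens(doc)
            if used + cost > token_budget:
                return merged
            seen.add(doc["id"])
            merged.append(doc)
            used += cost
    return merged
```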

Step 7 (Optional): Answer generation

Knowledge bases optionally include a comprehensive answer generation stage. Responses are generated with strong grounding to minimize hallucinations and support features such as inline citations, partial responses, tone and format steering (for example, bullet points or translation to French), and source-based organization (for example, “according to web” or “according to internal knowledge sources”).
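
As a sketch of what strongly grounded generation involves, the prompt builder below restricts the model to the retrieved sources and asks for inline citations. The service's actual prompt and formatting are not public, so this is illustrative only.

```python
def build_grounded_prompt(question: str, docs: list[dict]) -> str:
    """Illustrative grounded-answer prompt: numbered sources for inline
    [n] citations, with origins labeled so the answer can distinguish
    'according to web' from 'according to internal knowledge sources'."""
    lines = [f"[{i + 1}] ({d['source']}) {d['text']}" for i, d in enumerate(docs)]
    return (
        "Answer using ONLY the numbered sources below. Cite each claim "
        "with [n]. If the sources are insufficient, say so rather than "
        "guessing. Note whether each point comes from web or internal "
        "sources.\n\n" + "\n".join(lines) + f"\n\nQuestion: {question}"
    )
```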

Knowledge bases that incorporate query planning and reflective search deliver a 36% performance improvement compared to directly querying all knowledge sources. When a knowledge base is configured with multiple knowledge sources (up to 10) and evaluated against complex queries across the different retrieval reasoning effort levels, performance consistently improves at each higher level.

  • Minimal: Executes the caller’s query unchanged across all knowledge sources in parallel. The content output is evaluated with a 5,000-token budget, and the same answer generation step used in higher reasoning levels is applied for comparison. Note that the Minimal setting does not natively support answer generation.

  • Low: Processes the input query through query planning and source selection, federating requests across multiple knowledge sources. This level uses a 5,000-token budget for answer generation.

  • Medium: Applies query planning and source selection, federates across multiple knowledge sources, and allows for up to one reflective follow-up retrieval. The Medium level uses a 10,000-token budget for answer generation.

Increasing the retrieval reasoning effort from Minimal to Low results in a significant performance boost, primarily driven by query planning. Advancing to Medium introduces reflective search, further enhancing results. These improvements are consistent across all evaluated datasets, yielding an average performance gain of 36%.
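
The three levels, as described above, can be summarized in a single mapping; the key names are illustrative, while the budgets and capabilities follow the text.

```python
# Summary of the three retrieval reasoning effort levels described above.
REASONING_EFFORT_LEVELS = {
    "minimal": {"query_planning": False, "reflection": False, "token_budget": 5_000},
    "low":     {"query_planning": True,  "reflection": False, "token_budget": 5_000},
    "medium":  {"query_planning": True,  "reflection": True,  "token_budget": 10_000},
}
```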

Conclusion

Agentic retrieval fundamentally changes how AI agents access and reason over knowledge. By introducing query planning, federated search, and reflective search, knowledge bases move beyond simple keyword or parallel searches to deliver consistently higher-quality, more grounded responses. As demonstrated, increasing the retrieval reasoning effort—from Minimal to Low and then to Medium—drives measurable performance gains, with reflective search playing a key role in addressing complex information needs.

These results highlight the importance of structured retrieval strategies when building enterprise-ready AI agents, where accuracy, scalability, and control over latency–quality trade-offs are critical. By adopting knowledge bases with agentic retrieval, developers can significantly improve response performance while reducing system complexity.

Hope you find this article helpful. Happy reading and see you soon in the next article!