6.6 Azure OpenAI On Your Data and AI Agents

Key Takeaways

  • On Your Data delivers managed RAG by attaching a data source (Azure AI Search, Blob, Cosmos DB, uploaded files, URLs) to chat completions, returning grounded answers with citations and no custom retrieval code.
  • Key On Your Data parameters: query_type (simple/semantic/vector/vector_semantic_hybrid), in_scope (answer only from data), strictness 1-5, and top_n_documents 1-20.
  • AI agents extend LLMs with planning plus tool calling, code interpreter, file search/retrieval, and persistent memory to complete multi-step tasks autonomously.
  • Azure AI Foundry Agent Service is the managed platform: create agent (model + instructions + tools) → create thread → add message → run → handle requires_action tool calls → read result.
  • Pick the simplest approach that works: On Your Data for standard RAG, custom RAG for full control, agents for dynamic multi-step tool-driven workflows.
Last updated: June 2026

Quick Answer: On Your Data attaches a data source (Azure AI Search, Blob, Cosmos DB, files, URLs) to chat completions for managed RAG with citations and no retrieval code. Agents add planning + tool calling to do multi-step work. Azure AI Foundry Agent Service is the managed agent platform: create agent → thread → message → run → handle requires_action → read result.

Azure OpenAI On Your Data

Instead of writing the embed→search→ground pipeline yourself (section 6.3), you point chat completions at an index. Azure then retrieves relevant chunks, adds them to the prompt, and returns citations.

| Data source | How it's used |
| --- | --- |
| Azure AI Search | Existing full-text + vector index |
| Azure Blob Storage | Files auto-chunked and indexed |
| Azure Cosmos DB | NoSQL document content |
| Uploaded files | Drag-in files, auto-processed |
| URLs / web | Crawled and indexed |

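```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

client = AzureOpenAI(                        # endpoint and api_version are placeholders
    azure_endpoint="https://my-openai.openai.azure.com",
    azure_ad_token_provider=get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"),
    api_version="2024-10-21")
resp = client.chat.completions.create(
    model="gpt4o-chat",                      # chat model deployment name
    messages=[{"role": "user", "content": "What is our return policy?"}],
    extra_body={"data_sources": [{
        "type": "azure_search",
        "parameters": {
            "endpoint": "https://my-search.search.windows.net",
            "index_name": "company-docs",
            "authentication": {"type": "system_assigned_managed_identity"},
            "query_type": "vector_semantic_hybrid",
            "semantic_configuration": "default",  # your index's semantic config name (assumed)
            "embedding_dependency": {"type": "deployment_name",
                                     "deployment_name": "embed-large"},
            "in_scope": True,        # answer ONLY from the data
            "strictness": 3,         # 1 permissive .. 5 strict
            "top_n_documents": 5}}]})
msg = resp.choices[0].message
print(msg.content)                   # grounded answer text
print(msg.context["citations"])      # grounded sources, in the extra "context" field
```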
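On Your Data answers usually point at their sources with inline markers such as [doc1], where the number is a 1-based position in the citations list of the message context. The sketch below assumes that convention and runs on a hand-written sample rather than a live response; the titles, URLs, and text are invented. With a real reply you would pass msg.content and msg.context from the call above.

```python
import re

# Hand-written sample shaped like an On Your Data reply (values are invented).
SAMPLE_CONTENT = ("Items can be returned within 30 days [doc1]. "
                  "Refunds go to the original card [doc2].")
SAMPLE_CONTEXT = {
    "citations": [
        {"title": "returns.md", "url": "https://example.com/returns",
         "content": "Returns are accepted within 30 days."},
        {"title": "refunds.md", "url": "https://example.com/refunds",
         "content": "Refunds are issued to the original payment method."},
    ]
}

DOC_MARKER = re.compile(r"\[doc(\d+)\]")


def resolve_citations(content: str, context: dict) -> list[tuple[int, str]]:
    """Return (marker number, citation title) per distinct [docN] marker, in order of first use."""
    citations = context.get("citations", [])
    resolved: list[tuple[int, str]] = []
    for match in DOC_MARKER.finditer(content):
        n = int(match.group(1))
        if any(n == seen for seen, _ in resolved):
            continue                          # already resolved this marker
        if 1 <= n <= len(citations):          # markers are assumed to be 1-based
            resolved.append((n, citations[n - 1].get("title", "")))
    return resolved


print(resolve_citations(SAMPLE_CONTENT, SAMPLE_CONTEXT))
```

Markers that point past the end of the list are skipped rather than raising, so a malformed answer still renders.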
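
| Parameter | Purpose | Values |
| --- | --- | --- |
| query_type | Retrieval method | simple, semantic, vector, vector_semantic_hybrid |
| in_scope | Restrict answers to the data source | true / false |
| strictness | Relevance threshold for retrieved chunks | 1 (loose) to 5 (strict) |
| top_n_documents | Number of documents retrieved for grounding | 1 to 20 |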

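On the Exam: in_scope=true restricts answers to the retrieved data; if the data does not cover the question, the model declines instead of falling back on its general knowledge. Higher strictness drops marginally relevant chunks, so set it high when precision matters more than recall. vector_semantic_hybrid, which combines keyword and vector search with semantic reranking, is the most comprehensive query_type.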
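The interplay of strictness and in_scope can be pictured with a toy model. This is a conceptual analogy only: the service's real filtering is internal, and the linear mapping from strictness to a relevance cutoff below is an arbitrary assumption. What it is meant to show is the direction of the effects: a higher strictness keeps fewer chunks, and when nothing survives, in_scope decides between declining and answering from general knowledge.

```python
def toy_ground(chunks: list[tuple[str, float]], strictness: int,
               in_scope: bool) -> tuple[str, list[str]]:
    """Toy only. chunks are (text, relevance in [0, 1]); returns (mode, kept chunk texts)."""
    cutoff = strictness / 6                  # arbitrary mapping: 1 -> ~0.17, 5 -> ~0.83
    kept = [text for text, score in chunks if score >= cutoff]
    if kept:
        return "grounded", kept
    # Nothing passed the filter: in_scope decides what happens next.
    return ("decline" if in_scope else "general_knowledge"), kept


chunks = [("30-day return window", 0.9), ("shipping rates", 0.5), ("careers page", 0.2)]
for s in (1, 3, 5):
    print(s, toy_ground(chunks, s, in_scope=True))
print(toy_ground([("careers page", 0.2)], 3, in_scope=True))
print(toy_ground([("careers page", 0.2)], 3, in_scope=False))
```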

AI Agents (Agentic AI)

An agent wraps an LLM with the ability to plan, call tools, run code, retrieve documents, and remember state — turning a one-shot chat into an autonomous multi-step worker.

| Capability | What it adds | Example |
| --- | --- | --- |
| Tool / function calling | Invoke your APIs | Query a database, hit a weather API |
| Code interpreter | Write & execute code in a sandbox | Analyze a CSV, make a chart |
| File search | Built-in RAG over uploaded files | Cite a PDF spec |
| Multi-step reasoning | Decompose tasks | Research → analyze → summarize |
| Memory (threads) | Persist conversation state | Recall earlier user constraints |
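The capabilities in the table combine in one loop: pick a tool, run it, store the result in the thread, and let the next step read it. The sketch below imitates that loop locally with a scripted plan standing in for the LLM's decisions; get_weather, to_fahrenheit, and run_scripted_agent are hypothetical names with canned data, not service APIs.

```python
from typing import Any, Callable


def get_weather(city: str) -> dict:
    """Hypothetical weather tool with canned readings (no network)."""
    canned = {"Seattle": 10.0, "Cairo": 30.0}
    return {"city": city, "celsius": canned[city]}


def to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32


TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather,
                                        "to_fahrenheit": to_fahrenheit}


def run_scripted_agent(plan, thread: list[dict]) -> Any:
    """Run each planned step; build_args reads earlier results from the thread (the memory)."""
    for tool_name, build_args in plan:
        args = build_args(thread)            # a real agent's model would choose these
        result = TOOLS[tool_name](**args)
        thread.append({"role": "tool", "name": tool_name, "result": result})
    return thread[-1]["result"]


thread = [{"role": "user", "content": "How warm is Seattle in Fahrenheit?"}]
plan = [
    ("get_weather", lambda t: {"city": "Seattle"}),
    ("to_fahrenheit", lambda t: {"celsius": t[-1]["result"]["celsius"]}),
]
print(run_scripted_agent(plan, thread))
```

The second step only works because the first step's output is still in the thread, which is the role threads play in the Agent Service.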

Azure AI Foundry Agent Service Lifecycle

```python
import json
import time

from azure.ai.agents.models import ListSortOrder, MessageRole
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Project endpoint format: https://<resource>.services.ai.azure.com/api/projects/<project>
client = AIProjectClient(
    endpoint="https://my-resource.services.ai.azure.com/api/projects/my-project",
    credential=DefaultAzureCredential())
agents = client.agents                          # agent operations sub-client

agent = agents.create_agent(                    # 1. define agent
    model="gpt-4o", name="Product Assistant",   # model = your deployment name
    instructions="Help customers find and compare products.",
    tools=[{"type": "function", "function": {"name": "search_products"}},  # + description/parameters (elided in the original)
           {"type": "code_interpreter"}, {"type": "file_search"}])

thread = agents.threads.create()                # 2. conversation thread
agents.messages.create(thread_id=thread.id, role="user",
                       content="Find a laptop under $1000 with 16GB RAM")  # 3. add message
run = agents.runs.create(thread_id=thread.id, agent_id=agent.id)           # 4. run

while run.status in ("queued", "in_progress", "requires_action"):         # 5. tool loop
    time.sleep(1)                               # poll politely
    run = agents.runs.get(thread_id=thread.id, run_id=run.id)
    if run.status == "requires_action":
        outputs = []
        for call in run.required_action.submit_tool_outputs.tool_calls:
            # run_tool: your own dispatcher (hypothetical helper, not part of the SDK)
            result = run_tool(call.function.name, json.loads(call.function.arguments))
            outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
        agents.runs.submit_tool_outputs(thread_id=thread.id, run_id=run.id,
                                        tool_outputs=outputs)
if run.status == "failed":
    print("Run failed:", run.last_error)

# 6. read final answer: newest agent message first
for m in agents.messages.list(thread_id=thread.id, order=ListSortOrder.DESCENDING):
    if m.role == MessageRole.AGENT and m.text_messages:
        print(m.text_messages[-1].text.value)
        break
```
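The loop above calls run_tool, which the SDK does not provide; you write it. A minimal sketch, assuming a simple name-to-callable registry: search_products is a hypothetical stand-in with canned data. Errors are returned as data instead of raised, so the model can see what went wrong and the run can continue.

```python
import json
from typing import Any, Callable


def search_products(max_price: float, min_ram_gb: int = 8) -> list[dict]:
    """Hypothetical stand-in for a real catalog API (canned data, no network)."""
    catalog = [
        {"name": "Laptop A", "price": 899, "ram_gb": 16},
        {"name": "Laptop B", "price": 1199, "ram_gb": 32},
        {"name": "Laptop C", "price": 749, "ram_gb": 8},
    ]
    return [p for p in catalog if p["price"] <= max_price and p["ram_gb"] >= min_ram_gb]


# Map the function names declared on the agent to local Python callables.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"search_products": search_products}


def run_tool(name: str, args: dict) -> Any:
    """Execute one requested tool call; report problems as data so the run can continue."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return func(**args)
    except TypeError as exc:      # typically missing or unexpected arguments from the model
        return {"error": str(exc)}


# call.function.arguments arrives as a JSON string, so decode it before dispatching.
raw_arguments = '{"max_price": 1000, "min_ram_gb": 16}'
print(json.dumps(run_tool("search_products", json.loads(raw_arguments))))
```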

The requires_action status is the heart of the pattern: the agent has decided to call a tool and pauses until your code executes it and submits outputs back. The run then resumes; this can repeat several times before reaching completed.
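The pause-and-resume cycle can be exercised without Azure. FakeRun below is a hypothetical mock whose fields are simplified (it is not the SDK's run object); the model requests two batches of tool calls, and drive uses the same control flow as the loop above: poll, and on requires_action run the tools, submit outputs, and keep polling until completed.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FakeRun:
    """Mock run; pending_batches are the tool-call batches the 'model' will request, in order."""
    pending_batches: list[list[dict]]
    status: str = "queued"
    history: list[str] = field(default_factory=list)
    submitted: list[list[dict]] = field(default_factory=list)

    def poll(self) -> None:
        # Advance one step, as repeated status checks would observe it.
        if self.status == "queued":
            self.status = "in_progress"
        elif self.status == "in_progress":
            self.status = "requires_action" if self.pending_batches else "completed"
        self.history.append(self.status)

    def submit_tool_outputs(self, outputs: list[dict]) -> None:
        self.pending_batches.pop(0)          # this batch is now answered
        self.submitted.append(outputs)
        self.status = "in_progress"          # the run resumes once outputs arrive


def drive(run: FakeRun, run_tool: Callable[[str, dict], Any]) -> FakeRun:
    while run.status in ("queued", "in_progress", "requires_action"):
        run.poll()
        if run.status == "requires_action":
            outputs = [{"tool_call_id": call["id"],
                        "output": json.dumps(run_tool(call["name"], json.loads(call["arguments"])))}
                       for call in run.pending_batches[0]]
            run.submit_tool_outputs(outputs)
    return run


run = FakeRun(pending_batches=[
    [{"id": "call_1", "name": "search_products", "arguments": '{"max_price": 1000}'}],
    [{"id": "call_2", "name": "compare", "arguments": '{"ids": [1, 2]}'}],
])
drive(run, lambda name, args: {"tool": name, "args": args})
print(run.history)
```

The history shows requires_action appearing twice before completed, matching the point that the cycle can repeat several times within one run.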

Choosing a RAG/Agent Approach

| Approach | Effort | Control | Best for |
| --- | --- | --- | --- |
| On Your Data | Low | Limited | Standard managed RAG, delivered fast |
| Custom RAG (code) | Medium | Full | Custom ranking, pre- and post-processing |
| Agent + tools | High | Maximum | Dynamic multi-step, tool-driven tasks |
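The table reduces to a rule of thumb: start with the simplest row and move down only when a requirement forces it. The function below is a hypothetical encoding of that rule with two yes/no questions; real scenarios have more dimensions (cost, latency, governance), so treat it as a study aid rather than a decision engine.

```python
def choose_approach(needs_custom_retrieval_logic: bool, needs_dynamic_tool_use: bool) -> str:
    """Return the simplest row of the approach table that satisfies the stated needs."""
    if needs_dynamic_tool_use:          # multi-step work that calls tools or runs code
        return "Agent + tools"
    if needs_custom_retrieval_logic:    # bespoke ranking, filtering, or pre/post-processing
        return "Custom RAG (code)"
    return "On Your Data"               # standard grounded Q&A with citations


print(choose_approach(False, False))    # Q&A over one index with citations
print(choose_approach(True, False))     # re-rank results by document freshness
print(choose_approach(False, True))     # search, call a pricing API, then chart results
```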

On Your Data vs. building it yourself

On Your Data trades flexibility for speed. You cannot fully control the chunking, the ranking algorithm, or post-processing the way a hand-built pipeline (section 6.3) allows, but you also write almost no retrieval code and get citations for free. Choose it for standard "answer questions over my documents" requirements; choose custom RAG when a scenario demands bespoke ranking, filtering, or transformation of retrieved content before it reaches the model.
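As an example of the bespoke step that only custom RAG allows, the sketch below filters retrieved chunks by a metadata field and re-ranks them by relevance blended with a recency decay before building the prompt. The Chunk fields, the 0.3 recency weight, and the one-year half-life are all hypothetical choices for illustration; in a real pipeline the chunks would come from your own search call.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    text: str
    score: float          # retriever relevance, assumed to lie in [0, 1]
    department: str
    updated: date


def postprocess(chunks: list[Chunk], department: str, today: date, top_k: int = 2,
                recency_weight: float = 0.3, half_life_days: float = 365) -> list[Chunk]:
    """Keep one department's chunks, then rank by relevance blended with a recency decay."""
    def blended(c: Chunk) -> float:
        recency = 0.5 ** ((today - c.updated).days / half_life_days)  # 1.0 today, 0.5 after a half-life
        return (1 - recency_weight) * c.score + recency_weight * recency

    kept = [c for c in chunks if c.department == department]
    return sorted(kept, key=blended, reverse=True)[:top_k]


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    context = "\n".join(f"[{i}] {c.text}" for i, c in enumerate(chunks, start=1))
    return f"Answer only from the sources below.\n{context}\n\nQuestion: {question}"


retrieved = [
    Chunk("Returns: 30 days (2024 policy)", 0.90, "retail", date(2024, 6, 1)),
    Chunk("Returns: 45 days (2026 policy)", 0.85, "retail", date(2026, 5, 1)),
    Chunk("Warranty terms for B2B customers", 0.95, "wholesale", date(2026, 1, 1)),
    Chunk("Store hours", 0.40, "retail", date(2026, 5, 20)),
]
best = postprocess(retrieved, "retail", today=date(2026, 6, 1))
print(build_prompt("What is our return policy?", best))
```

The newer policy outranks the older one despite a slightly lower retrieval score, and the off-department chunk never reaches the model: the kind of control the managed path does not expose.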

Why agents differ from plain RAG

The distinction the exam draws is autonomy and steps. A RAG call retrieves once and answers once. An agent can decide to search, then call an API, then run code, then answer — looping through requires_action as many times as the task needs, carrying state in its thread. That power costs complexity: more tokens, harder debugging, and the need to secure every tool the agent can invoke. Built-in tool types — code_interpreter for sandboxed code and file_search for managed RAG over uploaded files — let you add common capabilities without writing your own functions.

On the Exam: The 2026 AI-102 leans into agents. Memorize the lifecycle order (create agent → thread → message → run → handle requires_action → read result) and that requires_action = waiting for your tool output, not an error or completion. Prefer the simplest option that satisfies the requirement — don't reach for an agent when On Your Data or a single function call suffices.

Test Your Knowledge

What does the in_scope parameter do in Azure OpenAI On Your Data?


In the Azure AI Foundry Agent Service, a run reports status 'requires_action'. What does this mean?


Which On Your Data query_type returns the most comprehensive results?


A team needs standard question-answering over a single Azure AI Search index with citations and minimal code. Which approach is most appropriate?
