Agent Tools, Functions, Memory, and Evaluation
Key Takeaways
- A Foundry agent combines a model, instructions, tools, and managed runtime concepts so it can reason, retrieve, call APIs, and complete multi-step tasks.
- Knowledge tools ground answers, action tools change systems, and function or OpenAPI schemas define the callable contract the model can request.
- Threads or conversations store multi-turn state, but cross-session memory should be deliberately persisted, permissioned, summarized, and injected only when relevant.
- Tool calls require validation, least privilege, trusted outputs, retry handling, and human approval for costly, sensitive, or irreversible actions.
- Evaluation and tracing are production requirements: measure groundedness, relevance, safety, tool correctness, latency, token use, and failure patterns.
From Chatbot to Agent
A simple chatbot mostly generates text. An agent can reason over a goal, use tools, retrieve knowledge, call APIs, run code, and continue through multiple steps. Microsoft Foundry Agent Service provides managed agent runtime options so teams can define prompt agents quickly or deploy hosted agents with custom code while still using Foundry models, platform tools, identity, observability, and publishing controls.
Core Agent Pieces
| Piece | Purpose | Design question |
|---|---|---|
| Model | Reasoning and language capability | Does the task need a large, small, reasoning, or multimodal model? |
| Instructions | Role, goals, constraints, and policies | What should the agent always do or refuse? |
| Tools | Knowledge access or external action | What can the agent read, calculate, or change? |
| Conversation state | Multi-turn context | What history is needed for this run? |
| Memory store | Durable user or task facts | What should persist across sessions, and who may see it? |
| Evaluation | Quality and safety evidence | Which metrics block release or trigger rollback? |
AI-103 uses terms such as agents, conversations or threads, messages, runs or responses, tool schemas, memory, tracing, and evaluation. In the classic thread model, a thread stores conversation messages, and a run invokes the agent against that thread. Newer Foundry agent patterns expose a Responses API and conversation concepts, but the same exam idea remains: state is explicit, scoped, and inspected.
Knowledge Tools and Action Tools
Knowledge tools add context. Examples include Azure AI Search, file search, web grounding, Fabric, or a knowledge base. They should return source metadata so the agent can cite evidence and so tracing can explain what was retrieved.
Action tools change or inspect external systems. Examples include function calling, Azure Functions, Logic Apps, OpenAPI tools, Model Context Protocol endpoints, code interpreter, and browser automation. The model may choose a tool, but the platform or host code must enforce policy. For a purchase order agent, reading vendor status might be automatic. Submitting an order, changing a bank account, or emailing a customer should require confirmation and least-privilege credentials.
A tool schema is the contract the model sees: name, description, parameters, required fields, enum values, and expected result shape. Good schemas are narrow and obvious. Do not expose a generic run_sql tool when a read_vendor_invoice_status tool would limit risk. Validate arguments before execution, validate outputs before reusing them in prompts, and treat tool output as untrusted because retrieved or returned text can contain indirect prompt injection.
Memory and Threads
Conversation memory is not the same as RAG. RAG retrieves source documents; memory preserves user or task context. A thread can remember prior turns for one conversation, but cross-session memory should be stored in a governed database or index keyed by user, tenant, or task. Store only useful facts, summarize when needed, expire stale memory, and avoid injecting sensitive or irrelevant history into every prompt.
Evaluation, Tracing, and Operations
Foundry evaluations can test agents, models, or datasets before deployment and monitor production quality afterward. For RAG and agents, measure retrieval relevance, final-answer relevance, groundedness, response completeness, safety, tool success rate, refusal correctness, latency, and token cost. Tracing captures the evidence trail: prompt construction, retrieved chunks, model calls, tool invocations, outputs, errors, and timing. When an agent gives a bad answer, traces show whether the defect came from weak instructions, missing retrieval, a bad tool result, or an unsafe tool choice.
Official Anchors
- Foundry Agent Service overview: https://learn.microsoft.com/en-us/azure/foundry/agents/overview
- Agent tools overview: https://learn.microsoft.com/en-us/azure/foundry-classic/agents/how-to/tools-classic/overview
- Threads, runs, and messages: https://learn.microsoft.com/en-us/azure/foundry-classic/agents/concepts/threads-runs-messages
- Foundry evaluations: https://learn.microsoft.com/en-us/azure/foundry/how-to/evaluate-generative-ai-app
Which THREE controls are most appropriate before allowing a Foundry agent to call an external API that updates customer records? (Select three)
Select all that apply