Agent Tools, Functions, Memory, and Evaluation

Key Takeaways

A Foundry agent combines a model, instructions, tools, and managed runtime concepts so it can reason, retrieve, call APIs, and complete multi-step tasks.
Knowledge tools ground answers, action tools change systems, and function or OpenAPI schemas define the callable contract the model can request.
Threads or conversations store multi-turn state, but cross-session memory should be deliberately persisted, permissioned, summarized, and injected only when relevant.
Tool calls require validation, least privilege, trusted outputs, retry handling, and human approval for costly, sensitive, or irreversible actions.
Evaluation and tracing are production requirements: measure groundedness, relevance, safety, tool correctness, latency, token use, and failure patterns.

Last updated: June 2026

From Chatbot to Agent

A simple chatbot mostly generates text. An agent can reason over a goal, use tools, retrieve knowledge, call APIs, run code, and continue through multiple steps. Microsoft Foundry Agent Service provides managed agent runtime options so teams can define prompt agents quickly or deploy hosted agents with custom code, while still using Foundry models, platform tools, identity, observability, and publishing controls.

On the AI-103 blueprint this lives under Build agents by using Foundry, which explicitly lists defining agent roles, goals, conversation-tracking approach, and tool schemas; integrating retrieval, function-calling, and memory; orchestrating multi-agent solutions; and building autonomous or semiautonomous workflows with safeguards and approval-flow controls.

Core Agent Pieces

Piece	Purpose	Design question
Model	Reasoning and language capability	Does the task need a large, small, reasoning, or multimodal model?
Instructions	Role, goals, constraints, and policies	What should the agent always do or refuse?
Tools	Knowledge access or external action	What can the agent read, calculate, or change?
Conversation state	Multi-turn context	What history is needed for this run?
Memory store	Durable user or task facts	What should persist across sessions, and who may see it?
Evaluation	Quality and safety evidence	Which metrics block release or trigger rollback?

AI-103 uses terms such as agents, conversations or threads, messages, runs or responses, tool schemas, memory, tracing, and evaluation. In the classic thread model, a thread stores conversation messages and a run invokes the agent against that thread. Newer Foundry agent patterns expose a Responses API and conversation concepts, but the exam idea is constant: state is explicit, scoped, and inspectable.

Knowledge Tools and Action Tools

The single most useful agent distinction on this exam is knowledge tools versus action tools.

Knowledge tools add context and are generally safe to call automatically. Examples: Azure AI Search, file search, web grounding, Microsoft Fabric, or a knowledge base. They should return source metadata so the agent can cite evidence and tracing can explain what was retrieved.
Action tools change or inspect external systems and carry real risk. Examples: function calling, Azure Functions, Logic Apps, OpenAPI tools, Model Context Protocol (MCP) endpoints, code interpreter, and browser automation.

The model may choose a tool, but the platform or host code must enforce policy. For a purchase-order agent, reading vendor status can be automatic; submitting an order, changing a bank account, or emailing a customer should require confirmation and least-privilege credentials.

A tool schema is the contract the model sees: name, description, parameters, required fields, enum values, and expected result shape. Good schemas are narrow and obvious. Do not expose a generic run_sql tool when a read_vendor_invoice_status(vendorId) tool would limit blast radius. Validate arguments before execution, validate outputs before reusing them in prompts, and treat tool output as untrusted — retrieved or returned text can carry indirect prompt injection that tries to hijack the next model call.

Memory and Threads

Conversation memory is not the same as RAG, and the exam likes to blur them. RAG retrieves source documents; memory preserves user or task context.

Concept	Stores	Scope	Risk to manage
Thread / conversation	Prior turns of one chat	Single session	Context-window growth
Cross-session memory	Durable user or task facts	Across sessions, keyed by user/tenant/task	Privacy, stale facts, leakage
RAG index	Source documents and chunks	Whole knowledge base	Permissions, freshness

A thread can remember prior turns for one conversation, but cross-session memory should live in a governed database or index keyed by user, tenant, or task. Store only useful facts, summarize long histories instead of replaying every message, expire stale memory, and never inject sensitive or irrelevant history into every prompt — doing so wastes tokens and risks leaking one user's data into another's context.

Evaluation, Tracing, and Operations

Foundry evaluations can test agents, models, or datasets before deployment and monitor production quality afterward. For RAG and agents, measure retrieval relevance, final-answer relevance, groundedness, response completeness, safety, tool success rate, refusal correctness, latency, and token cost. Tracing captures the evidence trail: prompt construction, retrieved chunks, model calls, tool invocations, outputs, errors, and timing. When an agent gives a bad answer, traces reveal whether the defect came from weak instructions, missing retrieval, a bad tool result, or an unsafe tool choice.

The exam treats observability as a release gate, not an afterthought: you should be able to point at the metric or trace that blocks a deployment or triggers a rollback.

Multi-agent orchestration and safeguards

The blueprint also lists orchestrated multi-agent solutions and autonomous or semiautonomous workflows with approval flow controls. A common pattern routes a request to specialized agents (for example, a retrieval agent, a planning agent, and an action agent) under an orchestrator that holds shared goals and policy.

The exam favors designs where each sub-agent has the least-privilege tools it needs and nothing more, where high-impact actions pause for human approval, and where the orchestrator logs every hand-off in a trace. Autonomy is a spectrum: read-only research can run unattended, but any irreversible action should route through an approval flow so a human owns the consequence.

Official Anchors

Foundry Agent Service overview: https://learn.microsoft.com/en-us/azure/foundry/agents/overview
Agent tools overview: https://learn.microsoft.com/en-us/azure/foundry-classic/agents/how-to/tools-classic/overview
Threads, runs, and messages: https://learn.microsoft.com/en-us/azure/foundry-classic/agents/concepts/threads-runs-messages
Foundry evaluations: https://learn.microsoft.com/en-us/azure/foundry/how-to/evaluate-generative-ai-app

Test Your KnowledgeMulti-Select

Which THREE controls are most appropriate before allowing a Foundry agent to call an external API that updates customer records? (Select three)

Select all that apply

Validate the tool arguments against the schema and business rules before execution

Grant the tool broad administrator permissions so the model never encounters access errors

Require user or human approval for sensitive or irreversible updates

Log traces that include the selected tool, arguments, result, latency, and errors

Treat all tool output as trusted system instructions for the next model call

Test Your Knowledge

An agent should remember a user's preferred shipping address across separate conversations on different days. Where should this fact live?

In the current thread only, so it is discarded when the session ends

In a governed cross-session memory store keyed by user, retrieved and injected only when relevant

In the RAG document index alongside public product manuals

Baked into the agent's base model through fine-tuning

Up Next

Vision, Multimodal, Speech, and Language Workloads

Vision, Language, Information Extraction, and Final Review

Microsoft Azure AI Apps and Agents Developer Associate

Microsoft Azure AI App and Agent Developer (AI-103)

Agent Tools, Functions, Memory, and Evaluation

Key Takeaways

From Chatbot to Agent

Core Agent Pieces

Knowledge Tools and Action Tools

Memory and Threads

Evaluation, Tracing, and Operations

Multi-agent orchestration and safeguards

Official Anchors

Microsoft Azure AI Apps and Agents Developer Associate

1AI-103 Blueprint, Microsoft Foundry, and Solution Planning

2Generative AI, Agents, and Retrieval-Augmented Generation

3Vision, Language, Information Extraction, and Final Review

Microsoft Azure AI App and Agent Developer (AI-103)

Agent Tools, Functions, Memory, and Evaluation

Key Takeaways

From Chatbot to Agent

Core Agent Pieces

Knowledge Tools and Action Tools

Memory and Threads

Evaluation, Tracing, and Operations

Multi-agent orchestration and safeguards

Official Anchors