# How to Build an AI Agent with AutoGen

> Step-by-step tutorial to build an AI agent with Microsoft AutoGen v0.4. Covers installation, multi-agent patterns, tool integration, and production tips.

- Source: https://zarifautomates.com/blog/how-to-build-ai-agent-autogen
- Published: 2026-03-07
- Updated: 2026-03-07
- Pillar: AI Agents & Advanced
- Tags: build ai agent autogen, autogen tutorial, multi-agent ai, microsoft autogen
- Author: Zarif

---

Multi-agent systems beat single-agent approaches for complex tasks—and AutoGen is the fastest way to build them.

Microsoft AutoGen is a Python framework that enables you to build conversational multi-agent systems where agents collaborate by exchanging messages. Each agent runs code, calls tools, and makes decisions autonomously, but they coordinate through a conversation protocol to solve tasks that no single agent could handle alone.

- AutoGen v0.4 reworked the v0.2 architecture, with better tool integration and human-in-the-loop support
- Start with a two-agent setup (assistant + user proxy) before scaling to group chats with 5+ agents
- Tool calling is native to AutoGen—register functions directly without separate SDKs
- GroupChat automatically routes messages between agents; manual message passing is a common pitfall
- Production deployments need cost controls, token limits, and human approval workflows for critical actions

Multi-agent workflows solve real problems faster than single chatbots. A legal document analyzer, a fact-checker, and a summarizer working together produce better results than one agent trying to do all three. Gartner estimates 40% of enterprise applications will feature AI agents by 2026, and organizations adopting multi-agent systems report 3-4 hour weekly time savings on coordination tasks alone. But building them requires thinking differently about system design.

This guide walks you through AutoGen v0.4 from installation to production. You'll move from "hello world" agents to a real multi-agent system with tool integration, failure handling, and cost controls.

## Step 1: Install AutoGen and Set Up Your Environment

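AutoGen requires Python 3.10 or higher. The examples in this guide use the classic `autogen` module (`AssistantAgent`, `UserProxyAgent`, `GroupChat`), which the `pyautogen` package provides. The v0.4 rewrite is published separately as `autogen-agentchat` and `autogen-ext` and exposes a different, asynchronous API, so the snippets below will not run unchanged against it. Install the classic package via pip: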

```bash
pip install pyautogen
```

Verify the installation and note which version you have:

```bash
python -c "import autogen; print(autogen.__version__)"
```

You also need an LLM API key. AutoGen supports OpenAI, Azure, Anthropic, and local models. For this tutorial, we'll use OpenAI, but the patterns work for any provider.

Set your API key as an environment variable:

```bash
export OPENAI_API_KEY="your-api-key-here"
```

In your Python code, configure the LLM settings:

```python
import os

import autogen

config_list = [
    {
        "model": "gpt-4-turbo",
        # Read the key exported above instead of hard-coding it
        "api_key": os.environ["OPENAI_API_KEY"],
        "temperature": 0.7,
    }
]
```

Create a separate configuration file (recommended for production):

```python
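# config.py
import os

LLM_CONFIG = {
    "config_list": [
        {
            "model": "gpt-4-turbo",
            # Keep secrets out of source control: read the key from the environment
            "api_key": os.environ["OPENAI_API_KEY"],
            "temperature": 0.7,
            "timeout": 120,
        }
    ],
    "cache_seed": 42,
}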
```

The `cache_seed` parameter enables caching—identical prompts return cached results, cutting API costs by 40-60% in development. You'll revisit this setting for production use.
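To make the idea concrete, here is a toy sketch of seed-keyed caching. It is not AutoGen's implementation; `SeededCache` and `fake_llm` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Tuple


class SeededCache:
    """Toy cache keyed by (seed, prompt); illustrative only."""

    def __init__(self, seed: int) -> None:
        self.seed = seed
        self._store: Dict[Tuple[int, str], str] = {}
        self.hits = 0
        self.misses = 0

    def complete(self, prompt: str, llm: Callable[[str], str]) -> str:
        key = (self.seed, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        self._store[key] = llm(prompt)  # Only pay for the call on a miss
        return self._store[key]


def fake_llm(prompt: str) -> str:
    # Stand-in for a paid API call
    return prompt.upper()


cache = SeededCache(seed=42)
cache.complete("hello", fake_llm)
cache.complete("hello", fake_llm)  # Served from the cache
print(cache.hits, cache.misses)
```

Because the seed is part of the key, changing it forces fresh calls for the same prompts, which is handy when you want to re-sample responses without deleting the cache.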

## Step 2: Create Your First Agent

An AutoGen agent is a wrapper around an LLM with memory, tool access, and message handling. Let's build a simple assistant agent:

```python
from autogen import AssistantAgent

from config import LLM_CONFIG  # Settings from Step 1

assistant = AssistantAgent(
    name="assistant",
    llm_config=LLM_CONFIG,
    system_message="You are a helpful AI assistant. Provide clear, concise answers."
)
```

This agent has a system prompt, knows which LLM to use, and maintains conversation history. It can call tools, but we haven't registered any yet.

Create a user proxy agent that simulates a human:

```python
from autogen import UserProxyAgent

user_proxy = UserProxyAgent(
    name="user",
    human_input_mode="TERMINATE",
    max_consecutive_auto_reply=10,
)
```

`human_input_mode="TERMINATE"` means the proxy replies automatically and only prompts for human input when it receives a termination message or hits the auto-reply limit. `max_consecutive_auto_reply=10` sets that limit, which prevents runaway agent chains.

Now initiate a conversation:

```python
user_proxy.initiate_chat(
    assistant,
    message="What is the capital of France?"
)
```

Run this and you'll see the agent respond. It's basic, but it works. Most AutoGen tutorials stop here. You shouldn't.

## Step 3: Add Tool Integration

Tools are what separate agents from chatbots. A tool is any Python function your agent can call to gather information or take action.

Define a simple tool:

```python
import requests


def search_wikipedia(query: str) -> str:
    """Search Wikipedia and return the top result's title and snippet."""
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "format": "json",
    }
    # A timeout keeps a slow API from stalling the whole conversation
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    results = response.json().get("query", {}).get("search", [])
    if results:
        return f"Found: {results[0]['title']} - {results[0]['snippet']}"
    return "No results found."
```
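The search endpoint returns snippets that contain HTML highlight markup (for example `<span class="searchmatch">`) and entities, which adds noise and tokens to the agent's context. A small helper can strip it before the text reaches the LLM. This is a sketch; `clean_snippet` is a hypothetical name, and the regex is a simple tag stripper rather than a full HTML parser.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")


def clean_snippet(snippet: str) -> str:
    """Strip HTML tags and decode entities in a search snippet."""
    text = _TAG_RE.sub("", snippet)
    text = html.unescape(text)
    return " ".join(text.split())  # Collapse runs of whitespace


raw = 'The <span class="searchmatch">Eiffel</span> Tower &amp; its history'
print(clean_snippet(raw))
```

You could call it on `results[0]['snippet']` before building the return string.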

Register the tool with the assistant so the LLM knows it exists and can propose calls to it:

```python
assistant.register_for_llm(
    description="Search Wikipedia for information about a topic"
)(search_wikipedia)
```

Also register it with the user proxy, which actually executes the call and sends the result back:

```python
user_proxy.register_for_execution()(search_wikipedia)
```

Now update your conversation:

```python
user_proxy.initiate_chat(
    assistant,
    message="Find information about the history of the Eiffel Tower."
)
```

The agent will recognize it can call `search_wikipedia`, invoke it, and incorporate results into its response. This is the pattern for any tool: define, register, call.
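Registration works because a function's signature and docstring can be turned into a schema the LLM reads. The sketch below shows the general idea with the standard library only; `describe_tool` and `get_weather` are hypothetical, and the schema AutoGen actually generates may differ in detail.

```python
import inspect
from typing import Any, Callable, Dict

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a minimal JSON-schema-like description from a function signature."""
    properties: Dict[str, Dict[str, str]] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        json_type = _JSON_TYPES.get(param.annotation, "string")
        properties[name] = {"type": json_type}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # No default means the LLM must supply it
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def get_weather(city: str, days: int = 1) -> str:
    """Return a short forecast for a city."""
    return f"{days}-day forecast for {city}"


schema = describe_tool(get_weather)
print(schema["parameters"]["required"])
```

This is why type hints and a clear docstring matter: they are most of what the model sees when deciding whether and how to call your tool.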

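AutoGen caches LLM responses when `cache_seed` is set, so an identical request replays the stored completion instead of calling the API again. Tool functions are different: they run every time the agent calls them. For APIs with rate limits, consider caching inside the tool itself. For real-time data (stock prices, weather), keep the tool uncached or give any cache a short TTL.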
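One way to get a short TTL is to wrap the tool function yourself. The decorator below is a minimal sketch using only the standard library; `ttl_cache` and `get_price` are hypothetical names, and the cache assumes positional, hashable arguments.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, Tuple


def ttl_cache(ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
    """Cache a function's results per argument tuple for ttl_seconds."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        store: Dict[Tuple[Any, ...], Tuple[float, Any]] = {}

        @wraps(func)
        def wrapper(*args: Any) -> Any:
            now = clock()
            if args in store:
                stored_at, value = store[args]
                if now - stored_at < ttl_seconds:
                    return value  # Still fresh
            value = func(*args)
            store[args] = (now, value)
            return value

        return wrapper

    return decorator


calls = {"count": 0}


@ttl_cache(ttl_seconds=30)
def get_price(symbol: str) -> str:
    """Hypothetical real-time lookup; replace with a real API call."""
    calls["count"] += 1
    return f"{symbol}: price lookup #{calls['count']}"


get_price("MSFT")
get_price("MSFT")  # Within 30 seconds, so the cached value is returned
print(calls["count"])
```

The `clock` parameter exists so the expiry logic can be tested without sleeping.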

## Step 4: Build a Two-Agent Collaboration

The real power emerges when agents talk to each other. Let's create a code reviewer and code writer:

```python
code_writer = AssistantAgent(
    name="code_writer",
    llm_config=LLM_CONFIG,
    system_message="You are an expert Python developer. Write clean, efficient code."
)

code_reviewer = AssistantAgent(
    name="code_reviewer",
    llm_config=LLM_CONFIG,
    system_message="You are a strict code reviewer. Check for bugs, security issues, and style. Give specific feedback."
)
```

Create a user proxy to start the exchange:

```python
user = UserProxyAgent(
    name="user",
    human_input_mode="TERMINATE",
    max_consecutive_auto_reply=15,
)
```

Initiate a multi-turn conversation:

```python
user.initiate_chat(
    code_writer,
    message="Write a Python function to validate email addresses using regex."
)
```

But here's the catch: `initiate_chat` only connects two agents. To get the reviewer involved, you need a conversation loop:

```python
def chat_with_review(task: str, rounds: int = 2) -> str:
    """Alternate writer and reviewer by hand; returns the final draft."""
    # generate_reply normally returns a string for agents without tools
    draft = code_writer.generate_reply(
        messages=[{"role": "user", "content": task}]
    )

    for _ in range(rounds):
        feedback = code_reviewer.generate_reply(
            messages=[{"role": "user", "content": f"Review this code:\n{draft}"}]
        )
        draft = code_writer.generate_reply(
            messages=[{
                "role": "user",
                "content": f"Task: {task}\n\nDraft:\n{draft}\n\nFeedback:\n{feedback}\n\nRevise the draft.",
            }]
        )
    return draft
```

This is tedious. That's why GroupChat exists.

## Step 5: Scale with GroupChat

GroupChat orchestrates multi-agent conversations automatically. Define your agents and let GroupChat route messages:

```python
from autogen import GroupChat, GroupChatManager

agents = [code_writer, code_reviewer, user]

group_chat = GroupChat(
    agents=agents,
    messages=[],
    max_round=10,
    speaker_selection_method="auto",
)

manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=LLM_CONFIG
)

user.initiate_chat(
    manager,
    message="Write and review a function to parse CSV files."
)
```

The `speaker_selection_method="auto"` setting uses the LLM to decide who speaks next based on context. Alternatives: `"round_robin"` (fixed rotation), `"random"`, `"manual"` (you decide), or a custom function.

GroupChat is where AutoGen shines. Five agents collaborating, each specialized, producing better output than any single agent. But it requires discipline.

GroupChat with more than 5 agents gets slow: with `auto` selection the manager makes an extra LLM call each round to pick the next speaker, and every agent's prompt carries the growing shared history. Token costs climb fast. If you have 10+ agents, consider splitting into sub-groups or using a hierarchical approach with a manager agent routing tasks to specialists.
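A rough back-of-the-envelope calculation shows why costs climb. The sketch assumes every message is the same size and every turn re-sends the full history; it ignores system prompts and any speaker-selection calls, so treat the numbers as a lower bound. `estimate_prompt_tokens` is a hypothetical helper.

```python
def estimate_prompt_tokens(rounds: int, tokens_per_message: int) -> int:
    """Total prompt tokens if every turn re-sends all earlier messages."""
    # Turn k sees k earlier messages (the initial task plus k - 1 replies)
    return sum(k * tokens_per_message for k in range(1, rounds + 1))


for rounds in (5, 10, 20):
    print(rounds, estimate_prompt_tokens(rounds, tokens_per_message=200))
```

Under these assumptions, going from 10 to 20 rounds takes prompt tokens from 11,000 to 42,000: growth is roughly quadratic in the number of rounds, not linear.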

## Step 6: Implement Multi-Agent Patterns for Complex Workflows

Real systems need patterns beyond free-form conversation. Here are three that work:

**Pattern 1: Specialist Teams**
Create sub-teams of agents. A research team (researcher + fact-checker) produces a report. A content team (writer + editor) refines it. Then they merge findings:

```python
# researcher, fact_checker, writer, and editor are AssistantAgents
# defined the same way as the agents in Step 4
researchers_chat = GroupChat(
    agents=[researcher, fact_checker, user],
    messages=[],
    max_round=5,
    speaker_selection_method="auto",
)

content_chat = GroupChat(
    agents=[writer, editor, user],
    messages=[],
    max_round=5,
    speaker_selection_method="auto",
)

# Run research phase
research_result = user.initiate_chat(
    GroupChatManager(groupchat=researchers_chat, llm_config=LLM_CONFIG),
    message="Research AI agent market trends.",
)

# With the default summary method, this is the last message of the chat
research_output = research_result.summary

# Run content phase
user.initiate_chat(
    GroupChatManager(groupchat=content_chat, llm_config=LLM_CONFIG),
    message=f"Turn this research into a blog post: {research_output}",
)
```

**Pattern 2: Approval Workflow**
An agent proposes, another approves or rejects:

```python
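def requires_approval(agent, approver, task):
    """Have `agent` propose a response and `approver` accept or reject it."""
    agent_response = agent.generate_reply(messages=[{"role": "user", "content": task}])

    # Present the proposal as a user message so the approver responds to it
    approver_decision = approver.generate_reply(
        messages=[{"role": "user", "content": f"Reply APPROVED or REJECTED:\n{agent_response}"}]
    )

    # Match the start of the reply so "NOT APPROVED" is not read as approval
    approved = str(approver_decision).strip().upper().startswith("APPROVED")
    return agent_response, approved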
```

Use this for deployment decisions, financial transactions, or sensitive outputs.

**Pattern 3: Hierarchical Routing**
A manager agent receives tasks and routes them to specialists:

```python
manager_system = """
You are a task router. Given a user request:
1. Identify the task type (data analysis / content creation / coding)
2. Route to the appropriate specialist team
3. Synthesize their output
Never make decisions directly—always delegate.
"""

manager = AssistantAgent(
    name="manager",
    llm_config=LLM_CONFIG,
    system_message=manager_system
)

# Hand the task to the manager through the user proxy from Step 4
user.initiate_chat(
    manager,
    message="Analyze Q4 sales data and write a summary."
)
```

The manager never sees the specialist tools; it only coordinates.
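The routing step itself does not have to be an LLM call. As a minimal sketch, a deterministic router can dispatch on keywords and fall back to asking for clarification. The handler functions and the `ROUTES` map are hypothetical stand-ins for specialist teams.

```python
from typing import Callable, Dict, List


def handle_data(task: str) -> str:
    return f"[data team] {task}"


def handle_content(task: str) -> str:
    return f"[content team] {task}"


def handle_code(task: str) -> str:
    return f"[coding team] {task}"


# Hypothetical keyword map; a real router might ask an LLM to classify instead
ROUTES: Dict[str, Callable[[str], str]] = {
    "analyze": handle_data,
    "write": handle_content,
    "implement": handle_code,
}


def route(task: str) -> List[str]:
    """Send the task to every team whose keyword appears in it."""
    lowered = task.lower()
    outputs = [handler(task) for keyword, handler in ROUTES.items() if keyword in lowered]
    return outputs or ["[manager] No matching team; ask the user to clarify."]


print(route("Analyze Q4 sales data and write a summary."))
```

A rule-based router is cheaper and more predictable than an LLM router, at the cost of missing requests phrased in unexpected ways; a common compromise is to use rules first and fall back to the LLM.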

## Step 7: Production Considerations

Development agents and production agents are different creatures.

**Cost Control:**
Every API call costs money. Set strict limits:

```python
import os

llm_config = {
    "config_list": [
        {"model": "gpt-4-turbo", "api_key": os.environ["OPENAI_API_KEY"]}
    ],
    "timeout": 60,
    "max_tokens": 500,   # Cap on tokens generated per reply
    "temperature": 0.3,  # Lower for consistency
    "cache_seed": None,  # Disable caching in production
}
```

Monitor API usage:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("autogen")
logger.setLevel(logging.INFO)
```

Set a budget per conversation:

```python
max_messages = 50
current_messages = 0

def count_messages():
    """Call once per message sent; raises when the budget is spent."""
    global current_messages
    current_messages += 1
    if current_messages > max_messages:
        raise RuntimeError("Message limit exceeded")
```
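Module-level globals make it hard to run several conversations at once. A small class keeps one counter per conversation; this is a sketch with a hypothetical `MessageBudget` name, and you would still need to call `spend()` from wherever your code sends or receives messages.

```python
class MessageBudget:
    """Count messages for one conversation and stop when the cap is hit."""

    def __init__(self, max_messages: int) -> None:
        self.max_messages = max_messages
        self.used = 0

    def spend(self) -> int:
        """Record one message; raise once the budget is exhausted."""
        if self.used >= self.max_messages:
            raise RuntimeError(f"Message limit of {self.max_messages} exceeded")
        self.used += 1
        return self.max_messages - self.used  # Messages remaining


budget = MessageBudget(max_messages=3)
for _ in range(3):
    remaining = budget.spend()
print(remaining)
```

A fourth call to `budget.spend()` would raise `RuntimeError`, which the caller can catch to end the conversation cleanly.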

**Human-in-the-Loop:**
Not every decision should be automatic. For critical actions, ask for approval:

```python
user_proxy = UserProxyAgent(
    name="user",
    human_input_mode="ALWAYS",  # Require human approval
)
```

Or conditional approval:

```python
def should_require_approval(message: str) -> bool:
    sensitive_keywords = ["delete", "deploy", "transfer", "approve"]
    return any(keyword in message.lower() for keyword in sensitive_keywords)

message = "Deploy the new build to production."  # Example task
# Escalate to a human only when the task looks sensitive
user_proxy.human_input_mode = "ALWAYS" if should_require_approval(message) else "TERMINATE"
```
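Approval can also be enforced at the tool level, so a sensitive function cannot run without a decision regardless of how the conversation is configured. The decorator below is a sketch: `require_approval`, `console_approver`, and `deploy` are hypothetical, and the example approver is a simple rule standing in for a human.

```python
from functools import wraps
from typing import Any, Callable


def require_approval(approve: Callable[[str], bool]):
    """Wrap a tool so it only runs after approve(description) returns True."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            description = f"{func.__name__}(args={args}, kwargs={kwargs})"
            if not approve(description):
                return f"Rejected by reviewer: {description}"
            return func(*args, **kwargs)

        return wrapper

    return decorator


def console_approver(description: str) -> bool:
    # Blocks on stdin; swap in a ticketing or chat integration as needed
    return input(f"Allow {description}? [y/N] ").strip().lower() == "y"


@require_approval(approve=lambda description: "prod" not in description)
def deploy(environment: str) -> str:
    """Hypothetical deployment tool used only for illustration."""
    return f"Deployed to {environment}"


print(deploy("staging"))
print(deploy("prod"))
```

Returning a rejection string instead of raising lets the agent see the refusal and adjust its plan; pass `console_approver` as the `approve` argument for an interactive prompt.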

**Error Handling:**
Agents hallucinate and fail. Handle it gracefully:

```python
import logging

logger = logging.getLogger(__name__)

def safe_chat(initiator, recipient, message, max_attempts=3):
    """Retry a chat up to max_attempts times and return its summary."""
    for attempt in range(max_attempts):
        try:
            result = initiator.initiate_chat(
                recipient,
                message=message,
                summary_method="reflection_with_llm",
            )
            return result.summary
        except Exception as e:
            logger.error(f"Attempt {attempt + 1} failed: {e}")
    return "Task failed after retries."
```
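Immediate retries tend to fail again when the cause is a rate limit or a brief outage. A common refinement is exponential backoff between attempts. The helper below is a generic sketch, not part of AutoGen; `retry_with_backoff` and `flaky` are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, waiting base_delay * 2**attempt between failed attempts."""
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


attempts = {"count": 0}


def flaky() -> str:
    """Fails twice, then succeeds; simulates a transient error."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated rate limit")
    return "ok"


delays = []
print(retry_with_backoff(flaky, sleep=delays.append))
print(delays)
```

Here `sleep` is injected so the example records the delays instead of waiting; in real use, leave the default and wrap the chat call in a `lambda`.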

**Token Tracking:**
Know how many tokens your agents consume:

```python
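from autogen.token_count_utils import count_token


class TokenTracker:
    """Keep a running total of tokens across messages."""

    def __init__(self, model="gpt-4-turbo"):
        self.model = model
        self.total_tokens = 0

    def log_tokens(self, message):
        tokens = count_token(message, model=self.model)
        self.total_tokens += tokens
        return tokens

tracker = TokenTracker()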
```
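Token counts become actionable once they are converted to money. The sketch below accumulates input and output tokens separately, since providers usually price them differently. The prices are placeholders, not real rates, and `CostEstimator` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class CostEstimator:
    """Accumulate token counts and convert them to an estimated cost."""

    # Placeholder prices in USD per 1,000 tokens; check your provider's pricing
    input_price_per_1k: float
    output_price_per_1k: float
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def cost_usd(self) -> float:
        return (
            self.input_tokens / 1000 * self.input_price_per_1k
            + self.output_tokens / 1000 * self.output_price_per_1k
        )


estimator = CostEstimator(input_price_per_1k=0.01, output_price_per_1k=0.03)
estimator.add(input_tokens=4000, output_tokens=1000)
print(round(estimator.cost_usd, 4))
```

With these placeholder prices, 4,000 input tokens and 1,000 output tokens come to $0.07 (4 × 0.01 + 1 × 0.03).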

## Step 8: Common Pitfalls and How to Avoid Them

Most tutorials skip this. Don't.

**Pitfall 1: Agents Talking in Circles**
Agents repeat the same point endlessly. Fix it with:
- Lower `max_consecutive_auto_reply` (default: 100)
- Set `max_round` in GroupChat (default: 10)
- Add an explicit termination condition to each agent's system message ("If you agree, say CONSENSUS.") and have the manager check for it with `is_termination_msg`

```python
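group_chat = GroupChat(agents=agents, messages=[], max_round=8)  # Hard stop

manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=LLM_CONFIG,
    # End the chat once any message contains the agreed keyword
    is_termination_msg=lambda msg: "CONSENSUS" in (msg.get("content") or ""),
)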
```
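Hard stops end a circular conversation but do not detect it. As a complementary check, a sketch like the following flags a message that nearly duplicates a recent one, using `difflib` from the standard library. `is_looping`, the window size, and the 0.9 threshold are assumptions to tune for your own workload.

```python
from difflib import SequenceMatcher
from typing import List


def is_looping(messages: List[str], window: int = 3, threshold: float = 0.9) -> bool:
    """True if the latest message closely matches one of the previous `window` messages."""
    if len(messages) < 2:
        return False
    latest = messages[-1]
    recent = messages[-(window + 1):-1]
    return any(
        SequenceMatcher(None, latest, earlier).ratio() >= threshold
        for earlier in recent
    )


history = [
    "I think we should use a regex.",
    "A parser would be more robust.",
    "I think we should use a regex.",
]
print(is_looping(history))
```

When the check fires, you could end the chat early or inject a message asking the agents to summarize their disagreement and decide.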

**Pitfall 2: Tools Don't Get Called**
The agent knows the tool exists but doesn't use it. Usually because the system prompt doesn't mention it:

```python
assistant = AssistantAgent(
    name="assistant",
    llm_config=LLM_CONFIG,
    system_message="You are an assistant. You have access to a search tool. Use it to find current information."
)
```

Explicitly tell agents they have tools.

**Pitfall 3: One Agent Dominates**
In GroupChat, one agent speaks too much. Adjust speaker selection:

```python
def custom_speaker_selection(last_speaker, groupchat):
    """Rotate through the agents so nobody speaks twice in a row."""
    agents = groupchat.agents
    next_index = (agents.index(last_speaker) + 1) % len(agents)
    return agents[next_index]

group_chat = GroupChat(
    agents=agents,
    messages=[],
    speaker_selection_method=custom_speaker_selection,
)
```
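Before changing the selection logic, it helps to measure who is actually talking. This sketch computes each speaker's share from a list of message dicts. It assumes each message carries a `name` field identifying the sender, as in the hypothetical `log` below; check the shape of your own chat history before relying on it.

```python
from collections import Counter
from typing import Dict, List


def speaker_share(messages: List[dict]) -> Dict[str, float]:
    """Fraction of messages sent by each speaker, keyed by the 'name' field."""
    counts = Counter(msg.get("name", "unknown") for msg in messages)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()} if total else {}


log = [
    {"name": "code_writer", "content": "Draft 1"},
    {"name": "code_reviewer", "content": "Needs tests"},
    {"name": "code_writer", "content": "Draft 2"},
    {"name": "code_writer", "content": "Draft 3"},
]
print(speaker_share(log))
```

In this sample the writer holds 75% of the turns, which is the kind of imbalance a custom selection function is meant to correct.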

**Pitfall 4: Forgetting Agent Resets**
Memory persists between chats. If you reuse agents:

```python
assistant.reset()  # Clear chat history
user_proxy.reset()
```

Forgetting this causes agents to reference old conversations.

**Pitfall 5: Tool Functions with Side Effects**
If a tool deletes data or sends emails, test it outside AutoGen first:

```python
# Test tool in isolation
result = search_wikipedia("Python")
print(result)

# Then register it with the agent that executes tool calls
user_proxy.register_for_execution()(search_wikipedia)
```
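For tools that really do change state, a dry-run default makes isolated testing safe. The function below is a hypothetical example: `send_email` does not send anything, and the real delivery branch is deliberately left unimplemented.

```python
def send_email(to: str, subject: str, dry_run: bool = True) -> str:
    """Hypothetical tool with a side effect; defaults to a harmless dry run."""
    if dry_run:
        return f"[dry run] Would send '{subject}' to {to}"
    # Real delivery code would go here (SMTP client, API call, ...)
    raise NotImplementedError("Wire up a real mail client before disabling dry_run")


print(send_email("ops@example.com", "Weekly report"))
```

Keeping `dry_run=True` as the default means an agent has to opt in to the side effect explicitly, which pairs well with a human approval step.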

## Comparing AutoGen with Other Frameworks

<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>AutoGen</th>
      <th>CrewAI</th>
      <th>LangChain</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Multi-Agent Conversation</td>
      <td>Native GroupChat</td>
      <td>Task-based orchestration</td>
      <td>Requires custom loops</td>
    </tr>
    <tr>
      <td>Tool Integration</td>
      <td>register_for_llm() / register_for_execution()</td>
      <td>Tool decorator</td>
      <td>Tool calling via LLM</td>
    </tr>
    <tr>
      <td>Code Execution</td>
      <td>Built-in (sandboxed)</td>
      <td>Not built-in</td>
      <td>Via LLM only</td>
    </tr>
    <tr>
      <td>Learning Curve</td>
      <td>Steep</td>
      <td>Gentle</td>
      <td>Moderate</td>
    </tr>
    <tr>
      <td>Production Ready</td>
      <td>Yes</td>
      <td>Emerging</td>
      <td>Yes, but manual setup</td>
    </tr>
  </tbody>
</table>

AutoGen is the choice if you need agents that execute code and collaborate autonomously. CrewAI is simpler if you're new to agents. LangChain is best if you need maximum flexibility and don't mind writing scaffolding code.

## What You've Built

You now have a production-capable multi-agent system. You can:
- Create specialized agents with distinct roles
- Register tools for agents to call
- Coordinate 3-5 agents in GroupChat
- Handle failures and human approvals
- Monitor costs and prevent runaway loops

The AI agent market reached $7.6B in 2025, with 79% of organizations adopting AI agents. 93% of business leaders believe AI agents give a competitive edge. The difference between successful deployments and failures is rarely the LLM—it's agent orchestration. AutoGen handles that orchestration well.

For deeper patterns, see our complete guide to [building AI agents](/blog/complete-guide-to-building-ai-agents) and comparisons with [LangChain](/blog/how-to-build-ai-agent-langchain) and [CrewAI](/blog/how-to-build-an-ai-agent-with-crewai).

## Related Guides

- [AutoGen vs CrewAI: Multi-Agent Frameworks Compared](/blog/autogen-vs-crewai-multi-agent-frameworks-compared)
- [How to Build an AI Agent That Manages Projects](/blog/ai-agent-project-management)
- [How to Build an AI Agent That Handles Ambiguity](/blog/build-ai-agent-handles-ambiguity)
- [How to Build an AI Agent That Browses the Web](/blog/how-to-build-ai-agent-browses-web)

## Frequently Asked Questions

**Should I use AutoGen v0.4 or v0.2?**

Use v0.4 for new projects. It is the actively developed line, with better tool integration and reduced boilerplate. Migrating from v0.2 requires updates to imports and agent initialization, but it's worth it.

**How do I prevent agents from running forever?**

Set `max_consecutive_auto_reply` on agents and `max_round` on GroupChat. Both are hard stops. Also set `human_input_mode="TERMINATE"` on user proxies, so a human is prompted when a termination message arrives or the auto-reply limit is reached.

**Can AutoGen work with local models?**

Yes. Point an entry in `config_list` at any OpenAI-compatible endpoint by setting its `base_url`; Ollama and LM Studio both expose one for local inference. You'll sacrifice speed compared to cloud APIs, but you keep all data local. Recommended for sensitive workflows.
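As a sketch, a local configuration might look like the following. It assumes Ollama is serving its OpenAI-compatible API on the default port; the model name is a placeholder for whatever you have pulled locally, and `LOCAL_LLM_CONFIG` is a hypothetical variable name.

```python
# Assumes Ollama is running locally with its OpenAI-compatible endpoint enabled
LOCAL_LLM_CONFIG = {
    "config_list": [
        {
            "model": "llama3",  # Placeholder: use a model you have pulled locally
            "base_url": "http://localhost:11434/v1",
            "api_key": "ollama",  # Local servers typically ignore the key, but a value is expected
        }
    ],
    "cache_seed": None,
}
```

For LM Studio, the same shape should work with its local server address (by default `http://localhost:1234/v1`) as the `base_url`.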

**What's the typical cost for a multi-agent workflow?**

A 5-agent GroupChat resolving in 8 rounds costs roughly $0.50-$2 with GPT-4 Turbo, depending on token usage. Caching cuts this 40-60% in development. For production, budget $0.10-$1 per task with proper limits and cheaper models like GPT-3.5 Turbo.