How to Build AI Agents with Python: Step-by-Step (2026)

The fastest way to actually understand AI agents is to build one. Forget the diagrams and white papers — write 50 lines of Python and you'll know more than 90% of people debating "agentic AI" on Twitter.

Definition

An AI agent is a program that uses a language model to decide what to do next, calls external tools to take action, and loops until it reaches a goal. In Python, the dominant 2026 stack is LangGraph for orchestration, LangChain for model and tool integrations, and a search or data tool like Tavily.

TL;DR

The 2026 default Python stack: LangGraph (v1.0+) for agent orchestration, LangChain for model and tool wrappers, plus a search tool like Tavily for grounded answers.
An agent is a loop: model decides → call a tool → observe the result → decide again. That's the whole concept.
This tutorial walks through building a working research agent in under 100 lines that takes a question, searches the web, and returns a sourced answer.
LangGraph is in production at companies like Klarna, Uber, Replit, and Elastic. Skill transfers directly to professional work.
Cost to run: under $1 to test the agent in this tutorial. You'll need an OpenAI or Anthropic API key and a free Tavily key.

What an AI Agent Actually Is

Strip away the marketing and an AI agent is three things working together:

A model (the brain) — usually a large language model like GPT-4, Claude, or Gemini
A set of tools (the hands) — functions the model can call: web search, database queries, code execution, API calls, file I/O
A loop (the runtime) — code that lets the model call a tool, see the result, and decide what to do next

Without the loop, an LLM is just a chatbot — one prompt, one response. With the loop, the LLM can take a goal, break it into steps, execute steps, observe outcomes, and adjust. That's an agent.

In 2026, the loop is almost always implemented as a state graph — nodes are model calls or tool calls, edges define which node runs next based on the model's decision. LangGraph is the dominant Python library for this pattern, sitting underneath LangChain (which wraps individual model and tool calls).

The 2026 Python Stack for Agents

You don't need ten libraries. Five do the work.

Library	Purpose	Why It Matters
langgraph	Agent orchestration — defines the state graph and runs the loop	The current standard for production agents. v1.0 shipped late 2025.
langchain	High-level framework, model wrappers, tool integrations	Easiest entry point. Built on LangGraph underneath.
langchain-openai (or langchain-anthropic)	Connector for the LLM you'll use	Pick one based on your API key. Both work identically with LangChain.
tavily-python	Web search tool optimized for LLMs	Free tier is generous. Returns clean text rather than raw HTML.
python-dotenv	Loads API keys from a .env file	Keeps secrets out of your code.

Use Python 3.11 or newer. LangGraph and modern LangChain rely on type hints and async features that older versions don't fully support.

Step 1: Set Up Your Environment

Create a project folder, set up a virtual environment, and install the stack.

mkdir research-agent && cd research-agent
python3 -m venv .venv
source .venv/bin/activate   # macOS/Linux
# .venv\Scripts\activate    # Windows

pip install langchain langchain-openai langgraph tavily-python python-dotenv

Create a .env file in the same folder with your API keys:

OPENAI_API_KEY=sk-proj-...
TAVILY_API_KEY=tvly-...

Get an OpenAI key at platform.openai.com. Get a free Tavily key at tavily.com — the free tier gives 1,000 searches per month, which is enough to learn and prototype.

Warning

Never commit your .env file to git. Add it to .gitignore immediately. Leaked OpenAI keys get scraped within hours and rack up bills before you notice.

Step 2: Understand the Agent Loop

Before writing the agent, picture the loop in your head. Here's the simplest possible mental model:

User asks a question
Model receives the question plus the list of tools available
Model decides: do I have enough information to answer, or do I need a tool?
If a tool is needed, model emits a tool call (a structured request like "search the web for X")
The runtime executes the tool, captures the result, and feeds it back to the model
Model decides again — maybe call another tool, maybe answer
Loop until the model produces a final answer

Every agent framework — LangGraph, AutoGen, CrewAI, OpenAI's Agents SDK — implements some version of this loop. Differences are mostly about how you define the graph and pass state.

Step 3: Build the Agent in Python

Create a file called agent.py. The full working agent is under 80 lines.

import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults
from langgraph.prebuilt import create_react_agent

load_dotenv()

# 1. Initialize the model
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# 2. Define the tools the agent can use
search_tool = TavilySearchResults(max_results=4)
tools = [search_tool]

# 3. Build the agent using LangGraph's prebuilt ReAct pattern
agent = create_react_agent(model, tools)

# 4. Run the agent
def ask(question: str) -> str:
    result = agent.invoke({
        "messages": [("user", question)]
    })
    return result["messages"][-1].content

if __name__ == "__main__":
    answer = ask("What were the three biggest AI product launches in March 2026?")
    print(answer)

Run it:

python agent.py

In about 10-20 seconds you'll get a sourced answer. The agent decided to search Tavily, processed the results, and synthesized a response. You just built an AI agent.

Tip

Use gpt-4o-mini for development — it's 10x cheaper than gpt-4o and fast enough to iterate quickly. Switch to a larger model only when you need higher reasoning quality on production tasks.

Step 4: Add a Custom Tool

Real agents do more than search. The power kicks in when you give them tools that touch your specific systems — a database, an internal API, a file, a CRM.

Here's how to add a custom tool. Append this to your agent file:

from langchain_core.tools import tool

@tool
def calculate(expression: str) -> str:
    """Evaluate a math expression and return the result. 
    Example: calculate('15 * 23 + 100')"""
    try:
        # Safe eval limited to math operations
        allowed = {"__builtins__": {}}
        result = eval(expression, allowed)
        return str(result)
    except Exception as e:
        return f"Error: {e}"

# Update the tools list
tools = [search_tool, calculate]
agent = create_react_agent(model, tools)

Now ask: ask("If GPT-4o costs $5 per million input tokens and I send 250,000 tokens per day, what's my monthly cost?"). The agent will use the calculator tool instead of trying to do math in its head (which it does badly).

The @tool decorator is doing the heavy lifting. It turns any Python function with a docstring into a tool the agent can call. The docstring is the description the model uses to decide when to call the tool — write it clearly.

Step 5: Add Memory (Multi-Turn Conversations)

The agent above is stateless — every call starts fresh. To make it remember previous turns, use LangGraph's checkpointing.

from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
agent = create_react_agent(model, tools, checkpointer=memory)

config = {"configurable": {"thread_id": "user-123"}}

ask("My name is Zarif and I run a YouTube channel about AI.")
ask("Suggest three video titles based on what I do.")  # Remembers context

The thread_id keys the memory store. Use a real user ID in production. For persistent memory across restarts, swap MemorySaver for SqliteSaver or a Postgres-backed checkpoint store.

Step 6: Move to a Custom State Graph (When Prebuilt Isn't Enough)

create_react_agent is the prebuilt ReAct loop. It's perfect for 80% of agents. For the other 20% — agents with multiple specialized models, conditional branches, human approval steps — you build a custom LangGraph.

Skeleton:

from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated, List
from langchain_core.messages import BaseMessage, HumanMessage

class AgentState(TypedDict):
    messages: Annotated[List[BaseMessage], "conversation history"]
    next_step: str

def planner_node(state: AgentState):
    # Use a strong model to decide the plan
    response = model.invoke(state["messages"])
    return {"messages": [response], "next_step": "search"}

def search_node(state: AgentState):
    # Run the tool
    results = search_tool.invoke(state["messages"][-1].content)
    return {"messages": [HumanMessage(content=str(results))], "next_step": "respond"}

def responder_node(state: AgentState):
    final = model.invoke(state["messages"])
    return {"messages": [final], "next_step": "end"}

graph = StateGraph(AgentState)
graph.add_node("planner", planner_node)
graph.add_node("search", search_node)
graph.add_node("responder", responder_node)

graph.set_entry_point("planner")
graph.add_edge("planner", "search")
graph.add_edge("search", "responder")
graph.add_edge("responder", END)

custom_agent = graph.compile()

Custom graphs unlock the production patterns: parallel tool calls, retries with different models, human-in-the-loop approval, branching based on confidence scores, structured output validation.

Step 7: Add Observability (Critical Before Production)

Don't ship an agent without traces. When something breaks, you need to see exactly which model call made which tool call with which input. LangChain's hosted observability (LangSmith) is the easiest option:

import os
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "lsv2_..."
os.environ["LANGSMITH_PROJECT"] = "research-agent"

Every run now shows up in the LangSmith dashboard with full traces, latency, token counts, and cost. The free tier handles small projects.

For self-hosted observability, use OpenTelemetry — both LangChain and LangGraph support OTel exporters.

Common Pitfalls When Building Python Agents

Issues that bite almost everyone the first time.

Infinite tool-call loops. The model keeps calling tools without ever producing a final answer. Mitigation: set recursion_limit on the agent (default is 25) and write tool descriptions that make it obvious when the tool has succeeded.

Hallucinated tool arguments. The model invents fake parameters — wrong API IDs, malformed JSON. Mitigation: use Pydantic models to validate tool inputs, and let LangGraph re-prompt the model when validation fails.

Cost blowups. A buggy agent that loops can burn $50 in an hour. Mitigation: always set a max iteration count, log every model call, and cap monthly spending in your OpenAI/Anthropic dashboard.

Slow responses. Sequential tool calls add up. Mitigation: enable parallel tool calling in LangGraph, use smaller models for simple steps, and cache search results.

Bad tool descriptions. The model can't use a tool well if the docstring is vague. Mitigation: write tool docstrings like you're writing API documentation for a junior developer — purpose, input format, output format, example.

Tip

Before scaling an agent, run it 20 times on the same input and check for consistency. If outputs vary wildly, your prompts or tool descriptions need to be tighter, or you need to lower the temperature.

When Python Agents Are the Wrong Choice

Python agents are powerful but not always the right fit.

For non-developers building automation, use n8n with AI nodes instead. You get most agent capabilities with a visual editor and zero Python. Faster to ship, easier to maintain.

For chat-only experiences, use a managed agent platform like OpenAI's Custom GPTs, Claude Projects, or a no-code agent builder. Building a Python agent for a one-off chat use case is overkill.

For workflows with strict control flow, a regular Python script with LLM calls embedded is often more reliable than an agent. Agents shine when the path isn't predetermined; if you know the steps, code the steps.

For everything else — research agents, coding agents, customer support copilots, internal automations — Python with LangGraph is the strongest 2026 choice.

Next Steps After This Tutorial

You have a working agent. Build three more.

A coding agent — give it the ability to read files, run code, and write new files. Use the subprocess module as a tool.
A CRM agent — connect to a Hubspot or Notion API. Let it query, update, and create records.
A multi-agent workflow — one agent plans, a second executes, a third reviews. This is where LangGraph's custom graphs earn their keep.

After three projects you'll have internalized the pattern and can build domain-specific agents in an afternoon.

FAQ

Do I need machine learning experience to build AI agents in Python?

No. You need basic Python — functions, classes, dictionaries, virtual environments. The model itself is hosted by OpenAI or Anthropic; you call it via API. Knowing how transformers work helps with debugging but isn't required for building.

What is the difference between LangChain and LangGraph?

LangChain is the high-level framework with model wrappers, tool integrations, and prebuilt patterns. LangGraph is the lower-level orchestration library that LangChain uses internally for agents. For most agent work you import from both. Beginners can stick to LangChain's prebuilt agents; production work typically uses LangGraph directly.

How much does it cost to build and run a Python AI agent?

Building costs nothing if you use free tiers — Tavily free, OpenAI free credits, LangSmith free tier. Running costs depend on usage: a research agent answering 100 questions per day with gpt-4o-mini costs roughly $5-10/month. Heavier agents using gpt-4o or Claude Opus can cost $50-200/month at moderate volume.

Can I build AI agents in Python without using LangChain?

Yes. You can call the OpenAI or Anthropic SDKs directly and build the agent loop yourself in 100 lines of Python. The OpenAI Agents SDK and Anthropic's tool use API both support agents natively. LangChain and LangGraph add convenience, observability hooks, and prebuilt patterns — they're not required.

What is the ReAct pattern in AI agents?

ReAct stands for "Reasoning + Acting." It's an agent loop where the model alternates between reasoning steps (thinking out loud about what to do) and action steps (calling tools). LangGraph's create_react_agent implements this pattern out of the box. Most production agents in 2026 are some variant of ReAct.

How do I deploy a Python AI agent to production?

Wrap the agent in a FastAPI or Flask endpoint, deploy to Railway, Render, AWS Lambda, or a VPS. Add observability (LangSmith or OpenTelemetry), rate limiting, and error handling. For high-throughput agents, run them as background workers reading from a queue rather than synchronous HTTP requests.

If you want to go deeper on the agent landscape, see the best AI agents of 2026 ranked and the under-$100 AI automation stack.