October 03, 2025 • 6 min read
In this guide, I’ll show you how to create a multi-agent WhatsApp assistant using Python and LangGraph.
The goal is to have different AI agents working together: one to understand the user’s intent, another to perform the action, and a supervisor that coordinates both.
If you’ve already built WhatsApp bots with Twilio, this is the next step. Instead of a single prompt-based bot, you’ll have multiple specialized agents that can reason, delegate, and complete tasks together.
Most WhatsApp chatbots follow a simple pattern: a message comes in, a single prompt (or a set of keyword rules) produces the reply, and the reply goes back out.
That approach works, but it’s limited. You’ll quickly run into problems when a request involves several steps, external data, or decisions that a single prompt can’t handle reliably.
LangGraph solves that by letting you define stateful workflows between multiple AI agents, similar to a flowchart but fully defined in code.
We’ll create a simple multi-agent architecture that can be adapted for different use cases.
Example agents:
Interpreter: understands what the user wants
Executor: performs the action
Supervisor: coordinates both
Once you understand this structure, you can reuse it for bookings, order tracking, cancellations, lead qualification, and other WhatsApp workflows.
You’ll need an OpenAI API key, a Twilio account with WhatsApp enabled, and a few Python packages:
pip install langgraph langchain-openai fastapi python-dotenv
Create a .env file:
OPENAI_API_KEY=sk-your-key
MODEL_NAME=gpt-4o-mini
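The agent code below reads these values with os.getenv, so make sure the .env file is actually loaded before anything else runs. A minimal way to do that with python-dotenv (already included in the pip install above) looks like this:
# at the top of your entry point (or each module that reads env vars)
from dotenv import load_dotenv

# Reads .env from the working directory and populates os.environ,
# so os.getenv("OPENAI_API_KEY") and os.getenv("MODEL_NAME") return the right values.
load_dotenv()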
We’ll start by creating two agents: one to interpret the user’s message and another to execute the corresponding action.
# interpreter_agent.py
import os

from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class IntentOutput(BaseModel):
    """Expected shape of the Interpreter's JSON reply (documents the contract, not enforced here)."""

    intent: str = Field(description="The user’s intent, e.g., check_status, create_booking, cancel")
    parameters: dict = Field(description="Extracted entities or parameters from the user message")


SYSTEM_PROMPT = """
You are the Interpreter Agent.
Your job is to read a WhatsApp message, detect the user’s intent, and extract structured parameters.
Respond with JSON only, no extra text.
Example:
User: "Can you book a table for two tomorrow at 8pm?"
Response (as JSON):
{
  "intent": "create_booking",
  "parameters": {"people": 2, "time": "2025-07-10T20:00:00"}
}
"""

llm = ChatOpenAI(model=os.getenv("MODEL_NAME", "gpt-4o-mini"), use_responses_api=True)

# A ReAct agent with no tools: it only reasons over the prompt and returns the JSON.
interpreter_agent = create_react_agent(
    name="interpreter_agent",
    model=llm,
    tools=[],
    prompt=SYSTEM_PROMPT,
)
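Before adding anything else, you can sanity-check the Interpreter on its own. This is just a throwaway sketch; the filename and the sample message are placeholders:
# try_interpreter.py -- quick manual check, not part of the final app
from langchain_core.messages import HumanMessage
from interpreter_agent import interpreter_agent

result = interpreter_agent.invoke(
    {"messages": [HumanMessage(content="Can you book a table for two tomorrow at 8pm?")]}
)
# The last message should be the JSON with "intent" and "parameters".
print(result["messages"][-1].content)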
# executor_agent.py
import os

from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI

SYSTEM_PROMPT = """
You are the Executor Agent.
You receive an intent and parameters from the Interpreter.
Your job is to perform the action (or simulate it) and return a WhatsApp-friendly message.
If intent == "create_booking", pretend to create a booking.
If intent == "check_status", pretend to look up an order or reservation.
If intent == "cancel", confirm the cancellation.
Always return a short, user-friendly message, e.g.:
"Your table for two tomorrow at 8pm has been booked ✅"
"""

llm = ChatOpenAI(model=os.getenv("MODEL_NAME", "gpt-4o-mini"), use_responses_api=True)

# Another ReAct agent with no tools for now; in a real app you would pass tools here
# (booking API, order lookup, etc.) instead of simulating the action in the prompt.
executor_agent = create_react_agent(
    name="executor_agent",
    model=llm,
    tools=[],
    prompt=SYSTEM_PROMPT,
)
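You can exercise the Executor the same way by handing it an intent and parameters directly, which is exactly what the supervisor will do later (again, a throwaway script):
# try_executor.py -- quick manual check, not part of the final app
from langchain_core.messages import HumanMessage
from executor_agent import executor_agent

prompt = 'Intent: create_booking\nParameters: {"people": 2, "time": "2025-07-10T20:00:00"}'
result = executor_agent.invoke({"messages": [HumanMessage(content=prompt)]})
# Should print a short, WhatsApp-friendly confirmation.
print(result["messages"][-1].content)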
This is the part that coordinates everything.
It defines the logic that decides which agent runs next and how the information flows between them.
# supervisor.py
import json
import operator
from typing import Annotated, Literal, TypedDict

from langgraph.graph import StateGraph, START, END
from langchain_core.messages import HumanMessage, AIMessage, BaseMessage

from interpreter_agent import interpreter_agent
from executor_agent import executor_agent


class AssistantState(TypedDict):
    # Conversation history; operator.add tells LangGraph to append instead of overwrite.
    messages: Annotated[list[BaseMessage], operator.add]
    user_prompt: str
    intent: str | None
    parameters: dict | None
    result: str | None
    next_agent: Literal["interpreter", "executor", "finish"] | None


def route_to_next(state: AssistantState) -> Literal["interpreter", "executor", "finish"]:
    """Decide which node runs next based on what the state already contains."""
    if not state.get("intent"):
        return "interpreter"
    elif not state.get("result"):
        return "executor"
    else:
        return "finish"


def call_interpreter(state: AssistantState) -> dict:
    result = interpreter_agent.invoke({
        "messages": [HumanMessage(content=state["user_prompt"])]
    })
    intent = None
    parameters = {}
    text = result["messages"][-1].content if "messages" in result else ""
    if "intent" in text:
        try:
            parsed = json.loads(text)
            intent = parsed.get("intent")
            parameters = parsed.get("parameters", {})
        except (json.JSONDecodeError, TypeError):
            pass
    # Fall back to a generic intent so the graph always advances to the executor
    # instead of trying to route back to the interpreter and failing the conditional edge.
    if not intent:
        intent = "unknown"
    return {
        "intent": intent,
        "parameters": parameters,
        "messages": [AIMessage(content=f"Detected intent: {intent}", name="interpreter")],
    }


def call_executor(state: AssistantState) -> dict:
    prompt = f"Intent: {state['intent']}\nParameters: {state['parameters']}"
    result = executor_agent.invoke({"messages": [HumanMessage(content=prompt)]})
    text = result["messages"][-1].content if "messages" in result else ""
    return {"result": text, "messages": [AIMessage(content=text, name="executor")]}


def create_assistant():
    workflow = StateGraph(AssistantState)
    workflow.add_node("interpreter", call_interpreter)
    workflow.add_node("executor", call_executor)

    workflow.add_edge(START, "interpreter")
    # After the interpreter an intent is always set, so the only valid route is the executor.
    workflow.add_conditional_edges("interpreter", route_to_next, {"executor": "executor"})
    # After the executor a result is set, so the workflow finishes.
    workflow.add_conditional_edges("executor", route_to_next, {"finish": END})
    return workflow.compile()


assistant_team = create_assistant()
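Before exposing the graph through a webhook, it’s worth invoking it locally with a hard-coded message. The script below is just a sketch of that:
# try_supervisor.py -- run the full workflow locally before wiring Twilio
from langchain_core.messages import HumanMessage
from supervisor import assistant_team

message = "Can you book a table for two tomorrow at 8pm?"
state = {"messages": [HumanMessage(content=message)], "user_prompt": message}

final_state = assistant_team.invoke(state)
# "result" is filled in by the executor node.
print(final_state["result"])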
You can use FastAPI and Twilio’s WhatsApp webhook to handle messages.
from fastapi import FastAPI, Form
from fastapi.responses import PlainTextResponse
from xml.sax.saxutils import escape

from langchain_core.messages import HumanMessage
from supervisor import assistant_team

app = FastAPI()


@app.post("/twilio/whatsapp")
async def whatsapp_webhook(From: str = Form(...), Body: str = Form("")):
    # Twilio posts the sender (From) and the message text (Body) as form fields.
    state = {"messages": [HumanMessage(content=Body)], "user_prompt": Body}
    result = assistant_team.invoke(state)
    reply = result.get("result") or "I couldn’t process that yet."

    # Respond with TwiML; escape the reply so it stays valid XML.
    twiml = f"""
    <Response>
        <Message>{escape(reply)}</Message>
    </Response>
    """
    return PlainTextResponse(twiml, media_type="application/xml")
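To test it end to end, run the app with an ASGI server and point your Twilio WhatsApp sandbox webhook at the /twilio/whatsapp route. The sketch below assumes the webhook code lives in main.py and that you install uvicorn separately; Twilio also needs a public URL, so you’ll typically tunnel your local port with something like ngrok:
# run.py -- start the FastAPI app locally (requires: pip install uvicorn)
import uvicorn

if __name__ == "__main__":
    # Assumes the webhook file above is named main.py; adjust "main:app" if yours differs.
    uvicorn.run("main:app", host="0.0.0.0", port=8000)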
You can think of it as two agents having a short conversation: one understands what needs to be done, the other executes it.
User: “Can you check my order status?”
Interpreter: { "intent": "check_status" }
Executor: “Your order is currently being prepared 🚚”
User: “Cancel my meeting at 5pm”
Interpreter: { "intent": "cancel", "parameters": {"time": "17:00"} }
Executor: “Meeting at 5pm has been cancelled ✅”
Multi-agent systems let you scale beyond simple keyword bots.
You can reuse agents, connect them to APIs, and give them specific responsibilities.
That means less prompt complexity, better reliability, and cleaner integrations.
This approach fits naturally into customer support, bookings and reservations, order tracking, and similar WhatsApp flows.
With this setup, you’ll reduce manual work, respond faster, and improve customer experience, all inside WhatsApp.
LangGraph makes it easy to build structured, multi-agent workflows.
Combined with Twilio’s WhatsApp API, you can go from a static chatbot to an assistant that understands, acts, and collaborates.
Start with two agents and one workflow. Once it works, extend it to handle your entire communication flow, from lead qualification to post-sale follow-ups.
Liked the post? Subscribe to my newsletter below for more!