January 26, 2025
Run multi-agent systems in the cloud without managing infrastructure
One of the biggest challenges in deploying agentic AI applications is infrastructure. Traditional deployment means managing GPUs, hosting environments, and memory. With serverless platforms like Modal and Replicate, you can deploy CrewAI agents without worrying about scaling, uptime, or devops complexity. This tutorial shows you how to take a multi-agent CrewAI workflow from your laptop to the cloud in minutes.
Serverless platforms allow you to:

- Run Python workloads without provisioning or managing servers
- Scale automatically with demand
- Skip most of the devops overhead (containers, uptime, monitoring)

CrewAI, being Python-native and LLM-based, fits naturally into this workflow.
| Tool | Purpose |
|---|---|
| CrewAI | Agent orchestration (multi-agent logic) |
| LangChain | LLM + tool wrapping |
| Modal | Python serverless deployment |
| Replicate | Hosted inference with GPU |
| OpenAI | LLM reasoning backend |
First, install the Modal CLI and authenticate:

```bash
pip install modal
modal token new  # follow the prompt to authenticate
```

Then define the workflow as a Modal function, e.g. in `crewai-serverless.py`. (The secret name below is an example; create a Modal secret holding your `OPENAI_API_KEY` first.)

```python
import modal
from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun

app = modal.App("crewai-serverless")

# Build a container image with the dependencies the function needs
image = modal.Image.debian_slim().pip_install(
    "crewai", "langchain-openai", "langchain-community", "duckduckgo-search"
)

@app.function(image=image, secrets=[modal.Secret.from_name("openai-secret")])
def run_crewai_agent(query: str):
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
    search_tool = DuckDuckGoSearchRun()

    researcher = Agent(
        role="Researcher",
        goal=f"Research info about: {query}",
        backstory="A diligent analyst who finds accurate, current information.",
        tools=[search_tool],
        llm=llm,
    )
    writer = Agent(
        role="Writer",
        goal="Write a blog summary from research",
        backstory="A writer who turns research notes into clear prose.",
        tools=[],
        llm=llm,
    )

    task1 = Task(
        description="Research the topic.",
        expected_output="A set of research notes on the topic.",
        agent=researcher,
    )
    task2 = Task(
        description="Write a summary from the research.",
        expected_output="A short blog-style summary.",
        agent=writer,
    )

    crew = Crew(agents=[researcher, writer], tasks=[task1, task2])
    return str(crew.kickoff())
```

Deploy it, or run it ad hoc from your terminal:

```bash
modal deploy crewai-serverless.py
modal run crewai-serverless.py::run_crewai_agent --query "LangGraph use cases"
```

📚 Docs: Modal Python Guide
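Under the hood, `crew.kickoff()` runs the tasks sequentially by default, feeding each task's output to the next as context. Stripped of the LLM calls, the control flow looks roughly like this toy sketch, where the agents are plain functions rather than real CrewAI agents:

```python
# Toy sketch of sequential crew execution: each "agent" here is a plain
# function; a real CrewAI agent would call an LLM (with optional tools).
def run_sequential(tasks):
    context = ""
    for description, agent in tasks:
        # Each task sees the previous task's output as context
        context = agent(description, context)
    return context

researcher = lambda desc, ctx: f"notes on '{desc}'"
writer = lambda desc, ctx: f"summary of {ctx}"

result = run_sequential([("the topic", researcher), ("a blog post", writer)])
print(result)  # summary of notes on 'the topic'
```

This is why task order matters in the `tasks=[task1, task2]` list: the writer only sees research notes because the researcher ran first.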
Replicate specializes in hosted GPU inference. While it's mostly used for ML models, you can wrap LLM-powered workflows like CrewAI using replicate-python.
To package your workflow for Replicate you use Cog, Replicate's containerization tool. You need:

- A `predict.py` script (entry point)
- A `cog.yaml` to define your environment

`predict.py` (simplified):

```python
from cog import BasePredictor, Input
from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI

class Predictor(BasePredictor):
    def predict(
        self,
        query: str = Input(description="What to research", default="CrewAI overview"),
    ) -> str:
        llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
        researcher = Agent(role="Researcher", goal=f"Find info on {query}",
                           backstory="A diligent online researcher.", llm=llm)
        writer = Agent(role="Writer", goal="Summarize it",
                       backstory="A concise technical writer.", llm=llm)
        t1 = Task(description="Do research", expected_output="Research notes",
                  agent=researcher)
        t2 = Task(description="Write article", expected_output="A short article",
                  agent=writer)
        crew = Crew(agents=[researcher, writer], tasks=[t1, t2])
        return str(crew.kickoff())
```

`cog.yaml` (config):

```yaml
build:
  python_version: "3.9"
  python_packages:
    - openai
    - crewai
    - langchain-openai
predict: "predict.py:Predictor"
```

Then push the model to Replicate:

```bash
cog push r8.im/your-username/crewai-serverless  # replace with your own model name
```

📚 Docs: Replicate Deployment Guide
| Feature | Modal | Replicate |
|---|---|---|
| LLM-based logic | ✅ Excellent | ✅ Great for stateless logic |
| GPU compute | 🟡 Available via modal.gpu | ✅ Primary focus |
| REST API ready | ✅ Yes | ✅ Yes |
| Logs + Monitoring | ✅ via dashboard | ✅ via dashboard |
| Developer focus | Full Python microservices | ML model + inference workflows |
- Choose Modal if you want to run full workflows with reusable Python functions.
- Choose Replicate if you want simple GPU-backed inference endpoints.
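Nothing stops you from supporting both. A thin dispatch layer keeps the rest of your application backend-agnostic; the sketch below is purely illustrative (the lambdas stand in for a real Modal `fn.remote(...)` call or a `replicate.run(...)` call):

```python
from typing import Callable, Dict, Optional

# Hypothetical helper: route a query to whichever serverless backend is
# configured. The callables are placeholders for real remote invocations.
def make_runner(backends: Dict[str, Callable[[str], str]], default: str):
    def run(query: str, backend: Optional[str] = None) -> str:
        name = backend or default
        if name not in backends:
            raise ValueError(f"Unknown backend: {name!r}")
        return backends[name](query)
    return run

run = make_runner(
    {
        "modal": lambda q: f"[modal] {q}",          # e.g. fn.remote(q)
        "replicate": lambda q: f"[replicate] {q}",  # e.g. replicate.run(..., input={"query": q})
    },
    default="modal",
)

print(run("CrewAI overview"))               # [modal] CrewAI overview
print(run("CrewAI overview", "replicate"))  # [replicate] CrewAI overview
```

Swapping backends then becomes a one-line config change instead of a rewrite.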
Deploying agentic AI to the cloud doesn’t have to mean provisioning servers or writing Dockerfiles. With Modal and Replicate, you can get a multi-agent CrewAI system live in a matter of minutes, whether you’re running research agents, chat assistants, or autonomous pipelines.
Start with a simple task, test locally, and scale serverlessly.