January 26, 2025
Run multi-agent systems in the cloud without managing infrastructure
One of the biggest challenges in deploying agentic AI applications is infrastructure. Traditional deployment requires managing GPUs, hosting environments, and memory. With serverless platforms like Modal and Replicate, you can deploy CrewAI agents without worrying about scaling, uptime, or devops complexity. This tutorial shows you how to take a multi-agent CrewAI workflow from your laptop to the cloud—in minutes.
Serverless platforms allow you to:
- Deploy code without provisioning or managing servers
- Scale automatically with demand
- Pay only for the compute you actually use
CrewAI, being Python-native and LLM-based, fits naturally into this workflow.
| Tool | Purpose |
|---|---|
| CrewAI | Agent orchestration (multi-agent logic) |
| LangChain | LLM + tool wrapping |
| Modal | Python serverless deployment |
| Replicate | Hosted inference with GPU |
| OpenAI | LLM reasoning backend |
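Before touching either platform, it helps to install the stack locally so you can smoke-test the crew before deploying. A minimal setup, assuming the standard PyPI package names:

```bash
pip install crewai langchain openai duckduckgo-search
```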
```bash
pip install modal
modal token new  # follow the prompt to authenticate
```
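The agents below call OpenAI, so the deployed function needs an `OPENAI_API_KEY` at runtime. One way to provide it is a Modal secret; the name `openai` here is our own choice, and the code below references it:

```bash
modal secret create openai OPENAI_API_KEY=sk-...  # use your real key
```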
```python
import modal
from crewai import Agent, Task, Crew
from langchain.chat_models import ChatOpenAI
from langchain.tools import DuckDuckGoSearchRun

stub = modal.Stub("crewai-serverless")  # newer Modal releases rename Stub to App

# Bundle the dependencies into the container image so the function can run remotely.
image = modal.Image.debian_slim().pip_install(
    "crewai", "langchain", "openai", "duckduckgo-search"
)

@stub.function(image=image, secrets=[modal.Secret.from_name("openai")])  # secret created above
def run_crewai_agent(query: str):
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
    search_tool = DuckDuckGoSearchRun()

    researcher = Agent(
        role="Researcher",
        goal=f"Research info about: {query}",
        backstory="An analyst who digs up accurate, current information.",  # backstory is required by CrewAI
        tools=[search_tool],
        llm=llm,
    )
    writer = Agent(
        role="Writer",
        goal="Write a blog summary from research",
        backstory="A writer who turns research notes into clear prose.",
        tools=[],
        llm=llm,
    )

    # Task is a keyword-argument model; positional arguments will fail validation.
    task1 = Task(
        description="Research the topic.",
        expected_output="Bullet-point research notes.",
        agent=researcher,
    )
    task2 = Task(
        description="Write summary from research.",
        expected_output="A short blog summary.",
        agent=writer,
    )

    crew = Crew(agents=[researcher, writer], tasks=[task1, task2])
    return crew.kickoff()
```
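For quick iteration you can also add a local entrypoint, so `modal run` has a default way to invoke the function; a small sketch using Modal's `local_entrypoint` decorator:

```python
@stub.local_entrypoint()
def main(query: str = "LangGraph use cases"):
    # Executes run_crewai_agent in Modal's cloud and prints the result locally.
    print(run_crewai_agent.remote(query))
```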
```bash
modal deploy crewai-serverless.py
modal run crewai-serverless.py::run_crewai_agent --query "LangGraph use cases"
```

(`modal run` generates CLI flags from the function's parameter names, hence `--query`.)
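Once deployed, other Python code can look the function up by app and function name and call it remotely; a minimal sketch using Modal's `Function.lookup`:

```python
import modal

fn = modal.Function.lookup("crewai-serverless", "run_crewai_agent")
print(fn.remote("LangGraph use cases"))
```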
🔗 Docs: Modal Python Guide
Replicate specializes in hosted GPU inference. While it's mostly used for ML models, you can wrap LLM-powered workflows like CrewAI by packaging them with Cog, Replicate's container tool, and then calling them through the replicate-python client.
You'll need two files: a `predict.py` script (the entry point) and a `cog.yaml` that defines your environment.
`predict.py` (simplified, using Cog's `BasePredictor` interface):

```python
from cog import BasePredictor, Input
from crewai import Agent, Task, Crew
from langchain.chat_models import ChatOpenAI

class Predictor(BasePredictor):
    def setup(self):
        # Runs once per container; the LLM client is reused across predictions.
        self.llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

    def predict(
        self,
        query: str = Input(description="What to research", default="CrewAI overview"),
    ) -> str:
        researcher = Agent(role="Researcher", goal=f"Find info on {query}",
                           backstory="Researcher", llm=self.llm)
        writer = Agent(role="Writer", goal="Summarize it",
                       backstory="Writer", llm=self.llm)
        t1 = Task(description="Do research", expected_output="Research notes", agent=researcher)
        t2 = Task(description="Write article", expected_output="A short article", agent=writer)
        crew = Crew(agents=[researcher, writer], tasks=[t1, t2])
        return str(crew.kickoff())
```
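You can sanity-check the `Predictor` in plain Python before building the container (passing `query` explicitly, since Cog only resolves `Input` defaults inside its runtime):

```python
p = Predictor()
p.setup()
print(p.predict(query="CrewAI overview"))
```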
`cog.yaml` (config); the input's type, default, and description now live in `Input(...)` above, while the yaml declares the build environment:

```yaml
build:
  python_version: "3.9"
  python_packages:
    - openai
    - crewai
    - langchain
predict: "predict.py:Predictor"
```
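Before pushing, you can test the container locally with Cog's CLI (requires Docker):

```bash
cog predict -i query="LangGraph use cases"
```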
```bash
cog login
cog push r8.im/<your-username>/<your-model>
```
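Once pushed, any client can call the model through the replicate-python library; a minimal sketch with a placeholder model reference (community models may need a `:version` hash appended, and `REPLICATE_API_TOKEN` must be set in the environment):

```python
import replicate

output = replicate.run(
    "your-username/your-model",  # placeholder; use your actual model reference
    input={"query": "LangGraph use cases"},
)
print(output)
```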
🔗 Docs: Replicate Deployment Guide
| Feature | Modal | Replicate |
|---|---|---|
| LLM-based logic | ✅ Excellent | ✅ Great for stateless logic |
| GPU compute | 🟡 Available via `modal.gpu` | ✅ Primary focus |
| REST API ready | ✅ Yes | ✅ Yes |
| Logs + monitoring | ✅ Via dashboard | ✅ Via dashboard |
| Developer focus | Full Python microservices | ML model + inference workflows |
- Choose Modal if you want to run full workflows with reusable Python functions.
- Choose Replicate if you want simple GPU-backed inference endpoints.
Deploying agentic AI to the cloud doesn’t have to mean provisioning servers or writing Dockerfiles. With Modal and Replicate, you can get a multi-agent CrewAI system live in a matter of minutes—whether you’re running research agents, chat assistants, or autonomous pipelines.
Start with a simple task, test locally, and scale serverlessly.