Run multi-agent systems in the cloud without managing infrastructure
One of the biggest challenges in deploying agentic AI applications is infrastructure. Traditional deployment means managing GPUs, hosting environments, and memory. With serverless platforms like Modal and Replicate, you can deploy CrewAI agents without worrying about scaling, uptime, or DevOps complexity. This tutorial shows you how to take a multi-agent CrewAI workflow from your laptop to the cloud in minutes.
Why Serverless for Agentic AI?
Serverless platforms allow you to:
- Run agents on demand with no persistent infrastructure
- Scale up to handle spikes in load
- Use GPU-backed compute when needed (e.g., with Replicate or Modal Functions)
- Simplify deployment pipelines (cold-start latency is the main trade-off)
CrewAI, being Python-native and LLM-based, fits naturally into this workflow.
What You'll Learn
- Set up and deploy a CrewAI team of agents on Modal and Replicate
- Configure tools (e.g., web search) and environment variables
- Trigger agent workflows via REST or CLI
- Compare pros/cons of each platform
Tools & Platforms Used
| Tool | Purpose |
|---|---|
| CrewAI | Agent orchestration (multi-agent logic) |
| LangChain | LLM + tool wrapping |
| Modal | Python serverless deployment |
| Replicate | Hosted inference with GPU |
| OpenAI | LLM reasoning backend |
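Before following either deployment path, install the packages from the table above and put your OpenAI key in the environment. This is a minimal local setup sketch; the package names are the standard PyPI distributions, and the key value is of course a placeholder:

```shell
# Install the client libraries used throughout this tutorial
pip install crewai langchain openai modal replicate

# The agents call OpenAI, so export your key before running anything locally
export OPENAI_API_KEY="sk-..."
```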
Option 1: Deploy CrewAI with Modal
Step 1: Install Modal
```bash
pip install modal
modal token new  # follow the prompt to authenticate
```
Step 2: Wrap Your CrewAI Code in a Modal Function
```python
import modal
from crewai import Agent, Task, Crew
from langchain.chat_models import ChatOpenAI
from langchain.tools import DuckDuckGoSearchRun

app = modal.App("crewai-serverless")

# The remote container needs the agent dependencies installed
image = modal.Image.debian_slim().pip_install(
    "crewai", "langchain", "openai", "duckduckgo-search"
)

# "openai-secret" is a secret you create in the Modal dashboard holding OPENAI_API_KEY
@app.function(image=image, secrets=[modal.Secret.from_name("openai-secret")])
def run_crewai_agent(query: str):
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
    search_tool = DuckDuckGoSearchRun()
    researcher = Agent(
        role="Researcher",
        goal=f"Research info about: {query}",
        backstory="An analyst who digs up accurate, current information.",
        tools=[search_tool],
        llm=llm,
    )
    writer = Agent(
        role="Writer",
        goal="Write a blog summary from research",
        backstory="A writer who turns raw research into clear prose.",
        tools=[],
        llm=llm,
    )
    task1 = Task(description="Research the topic.",
                 expected_output="Bullet-point research notes.", agent=researcher)
    task2 = Task(description="Write summary from research.",
                 expected_output="A short blog summary.", agent=writer)
    crew = Crew(agents=[researcher, writer], tasks=[task1, task2])
    return crew.kickoff()
```
Step 3: Run It Locally or Deploy
```bash
modal deploy crewai-serverless.py
# modal run exposes the function's parameters as CLI flags, hence --query
modal run crewai-serverless.py::run_crewai_agent --query "LangGraph use cases"
```
Docs: Modal Python Guide
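You can also trigger the workflow from scripts rather than typing the command by hand. The sketch below only assembles the equivalent `modal run` invocation (it does not execute anything); the file and function names are the ones used above, and shlex.join quotes the query safely:

```python
import shlex

def build_modal_cmd(query: str) -> list[str]:
    """Build the `modal run` invocation for the deployed function.

    `modal run` exposes each function parameter (here `query`) as a CLI flag.
    """
    return [
        "modal", "run",
        "crewai-serverless.py::run_crewai_agent",
        "--query", query,
    ]

print(shlex.join(build_modal_cmd("LangGraph use cases")))
# modal run crewai-serverless.py::run_crewai_agent --query 'LangGraph use cases'
```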
Option 2: Deploy CrewAI with Replicate
Replicate specializes in hosted GPU inference. While it's mostly used for ML models, you can package LLM-powered workflows like CrewAI with Cog, Replicate's open-source packaging tool, then call the hosted result from replicate-python.
Step 1: Set Up a Replicate Model Repo
- Create a predict.py script (the entry point)
- Use a cog.yaml to define your environment
- Push to Replicate to host the function
Example predict.py (simplified):
```python
from cog import BasePredictor, Input
from crewai import Agent, Task, Crew
from langchain.chat_models import ChatOpenAI

class Predictor(BasePredictor):
    def predict(
        self,
        query: str = Input(description="What to research", default="CrewAI overview"),
    ) -> str:
        llm = ChatOpenAI()
        researcher = Agent(role="Researcher", goal=f"Find info on {query}",
                           backstory="A diligent research analyst.", llm=llm)
        writer = Agent(role="Writer", goal="Summarize it",
                       backstory="A concise technical writer.", llm=llm)
        t1 = Task(description="Do research", expected_output="Research notes",
                  agent=researcher)
        t2 = Task(description="Write article", expected_output="A short article",
                  agent=writer)
        crew = Crew(agents=[researcher, writer], tasks=[t1, t2])
        return str(crew.kickoff())
```
cog.yaml (config; the input type, default, and description live in the Input() call above):
```yaml
build:
  python_version: "3.9"
  python_packages:
    - openai
    - crewai
    - langchain
predict: "predict.py:Predictor"
```
Deploy:
```bash
cog push r8.im/<username>/<model-name>
```
Docs: Replicate Deployment Guide
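Once pushed, the model can be triggered over REST. Replicate's predictions endpoint takes a JSON body with a model version and an input object matching predict()'s parameters; this sketch only assembles that body (the version hash is a placeholder, and no request is actually sent):

```python
import json

def prediction_request(version: str, query: str) -> str:
    """JSON body for POST https://api.replicate.com/v1/predictions."""
    return json.dumps({"version": version, "input": {"query": query}})

print(prediction_request("<version-hash>", "CrewAI overview"))
```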
Modal vs Replicate: Which to Use?
| Feature | Modal | Replicate |
|---|---|---|
| LLM-based logic | Excellent | Great for stateless logic |
| GPU compute | Available via modal.gpu | Primary focus |
| REST API ready | Yes | Yes |
| Logs + monitoring | Yes, via dashboard | Yes, via dashboard |
| Developer focus | Full Python microservices | ML model + inference workflows |
Choose Modal if you want to run full workflows with reusable Python functions.
Choose Replicate if you want simple GPU-backed inference endpoints.
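The table above can be collapsed into a simple (admittedly reductive) decision rule. This toy helper is purely illustrative and just mirrors the two "Choose ..." guidelines:

```python
def choose_platform(needs_gpu: bool, full_python_workflow: bool) -> str:
    """Pick a platform per the comparison table: Modal for full Python
    workflows (GPU still available via modal.gpu), Replicate when
    GPU-backed inference endpoints are the whole job."""
    if full_python_workflow:
        return "Modal"
    return "Replicate" if needs_gpu else "Modal"

print(choose_platform(needs_gpu=True, full_python_workflow=False))  # Replicate
```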
Additional Resources
- CrewAI Docs
- Modal Python Functions
- Replicate SDK
- CrewAI GitHub
- CrewAI Community
- LangChain + Serverless Guide
Final Thoughts
Deploying agentic AI to the cloud doesn't have to mean provisioning servers or writing Dockerfiles. With Modal and Replicate, you can get a multi-agent CrewAI system live in a matter of minutes, whether you're running research agents, chat assistants, or autonomous pipelines.
Start with a simple task, test locally, and scale serverlessly.


