Run multi-agent systems in the cloud without managing infrastructure
One of the biggest challenges in deploying agentic AI applications is infrastructure. Traditional deployment means managing GPUs, hosting environments, and memory. With serverless platforms like Modal and Replicate, you can deploy CrewAI agents without worrying about scaling, uptime, or DevOps complexity. This tutorial shows you how to take a multi-agent CrewAI workflow from your laptop to the cloud in minutes.
Why Serverless for Agentic AI?
Serverless platforms allow you to:
- Run agents on demand with no persistent infrastructure
- Scale up to handle spikes in load
- Use GPU-backed compute when needed (e.g., with Replicate or Modal Functions)
- Simplify deployment pipelines (cold-start latency is the main trade-off)
CrewAI, being Python-native and LLM-based, fits naturally into this workflow.
What You'll Learn
- Set up and deploy a CrewAI team of agents on Modal and Replicate
- Configure tools (e.g., web search) and environment variables
- Trigger agent workflows via REST or CLI
- Compare pros/cons of each platform
Tools & Platforms Used
| Tool | Purpose |
|---|---|
| CrewAI | Agent orchestration (multi-agent logic) |
| LangChain | LLM + tool wrapping |
| Modal | Python serverless deployment |
| Replicate | Hosted inference with GPU |
| OpenAI | LLM reasoning backend |
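Before following either deployment path, install the packages from the table above and put your OpenAI key in the environment. This is a minimal local setup sketch; the package names are the standard PyPI distributions, and the key value is of course a placeholder:

```shell
# Install the client libraries used throughout this tutorial
pip install crewai langchain openai modal replicate

# The agents call OpenAI, so export your key before running anything locally
export OPENAI_API_KEY="sk-..."
```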
Option 1: Deploy CrewAI with Modal
Step 1: Install Modal
```bash
pip install modal
modal token new  # follow the prompt to authenticate
```
Step 2: Wrap Your CrewAI Code in a Modal Function
```python
import modal
from crewai import Agent, Task, Crew
from langchain.chat_models import ChatOpenAI
from langchain.tools import DuckDuckGoSearchRun

app = modal.App("crewai-serverless")

# The remote container needs the agent dependencies installed
image = modal.Image.debian_slim().pip_install(
    "crewai", "langchain", "openai", "duckduckgo-search"
)

# "openai-secret" is a secret you create in the Modal dashboard holding OPENAI_API_KEY
@app.function(image=image, secrets=[modal.Secret.from_name("openai-secret")])
def run_crewai_agent(query: str):
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
    search_tool = DuckDuckGoSearchRun()
    researcher = Agent(
        role="Researcher",
        goal=f"Research info about: {query}",
        backstory="An analyst who digs up accurate, current information.",
        tools=[search_tool],
        llm=llm,
    )
    writer = Agent(
        role="Writer",
        goal="Write a blog summary from research",
        backstory="A writer who turns raw research into clear prose.",
        tools=[],
        llm=llm,
    )
    task1 = Task(description="Research the topic.",
                 expected_output="Bullet-point research notes.", agent=researcher)
    task2 = Task(description="Write summary from research.",
                 expected_output="A short blog summary.", agent=writer)
    crew = Crew(agents=[researcher, writer], tasks=[task1, task2])
    return crew.kickoff()
```
Step 3: Run It Locally or Deploy
```bash
modal deploy crewai-serverless.py
# modal run exposes the function's parameters as CLI flags, hence --query
modal run crewai-serverless.py::run_crewai_agent --query "LangGraph use cases"
```
Docs: Modal Python Guide
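You can also trigger the workflow from scripts rather than typing the command by hand. The sketch below only assembles the equivalent `modal run` invocation (it does not execute anything); the file and function names are the ones used above, and shlex.join quotes the query safely:

```python
import shlex

def build_modal_cmd(query: str) -> list[str]:
    """Build the `modal run` invocation for the deployed function.

    `modal run` exposes each function parameter (here `query`) as a CLI flag.
    """
    return [
        "modal", "run",
        "crewai-serverless.py::run_crewai_agent",
        "--query", query,
    ]

print(shlex.join(build_modal_cmd("LangGraph use cases")))
# modal run crewai-serverless.py::run_crewai_agent --query 'LangGraph use cases'
```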
Option 2: Deploy CrewAI with Replicate
Replicate specializes in hosted GPU inference. While it's mostly used for ML models, you can package LLM-powered workflows like CrewAI with Cog, Replicate's open-source packaging tool, then call the hosted result from replicate-python.
Step 1: Set Up a Replicate Model Repo
- Create a predict.py script (the entry point)
- Use a cog.yaml to define your environment
- Push to Replicate to host the function
Example predict.py (simplified):
```python
from cog import BasePredictor, Input
from crewai import Agent, Task, Crew
from langchain.chat_models import ChatOpenAI

class Predictor(BasePredictor):
    def predict(
        self,
        query: str = Input(description="What to research", default="CrewAI overview"),
    ) -> str:
        llm = ChatOpenAI()
        researcher = Agent(role="Researcher", goal=f"Find info on {query}",
                           backstory="A diligent research analyst.", llm=llm)
        writer = Agent(role="Writer", goal="Summarize it",
                       backstory="A concise technical writer.", llm=llm)
        t1 = Task(description="Do research", expected_output="Research notes",
                  agent=researcher)
        t2 = Task(description="Write article", expected_output="A short article",
                  agent=writer)
        crew = Crew(agents=[researcher, writer], tasks=[t1, t2])
        return str(crew.kickoff())
```
cog.yaml (config; the input type, default, and description live in the Input() call above):
```yaml
build:
  python_version: "3.9"
  python_packages:
    - openai
    - crewai
    - langchain
predict: "predict.py:Predictor"
```
Deploy:
```bash
cog push r8.im/<username>/<model-name>
```
Docs: Replicate Deployment Guide
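Once pushed, the model can be triggered over REST. Replicate's predictions endpoint takes a JSON body with a model version and an input object matching predict()'s parameters; this sketch only assembles that body (the version hash is a placeholder, and no request is actually sent):

```python
import json

def prediction_request(version: str, query: str) -> str:
    """JSON body for POST https://api.replicate.com/v1/predictions."""
    return json.dumps({"version": version, "input": {"query": query}})

print(prediction_request("<version-hash>", "CrewAI overview"))
```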
Modal vs Replicate: Which to Use?
| Feature | Modal | Replicate |
|---|---|---|
| LLM-based logic | Excellent | Great for stateless logic |
| GPU compute | Available via modal.gpu | Primary focus |
| REST API ready | Yes | Yes |
| Logs + monitoring | Yes, via dashboard | Yes, via dashboard |
| Developer focus | Full Python microservices | ML model + inference workflows |
Choose Modal if you want to run full workflows with reusable Python functions.
Choose Replicate if you want simple GPU-backed inference endpoints.
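The table above can be collapsed into a simple (admittedly reductive) decision rule. This toy helper is purely illustrative and just mirrors the two "Choose ..." guidelines:

```python
def choose_platform(needs_gpu: bool, full_python_workflow: bool) -> str:
    """Pick a platform per the comparison table: Modal for full Python
    workflows (GPU still available via modal.gpu), Replicate when
    GPU-backed inference endpoints are the whole job."""
    if full_python_workflow:
        return "Modal"
    return "Replicate" if needs_gpu else "Modal"

print(choose_platform(needs_gpu=True, full_python_workflow=False))  # Replicate
```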
Additional Resources
- CrewAI Docs
- Modal Python Functions
- Replicate SDK
- CrewAI GitHub
- CrewAI Community
- LangChain + Serverless Guide
Final Thoughts
Deploying agentic AI to the cloud doesn't have to mean provisioning servers or writing Dockerfiles. With Modal and Replicate, you can get a multi-agent CrewAI system live in a matter of minutes, whether you're running research agents, chat assistants, or autonomous pipelines.
Start with a simple task, test locally, and scale serverlessly.


