Serverless Deployment of CrewAI Agents Using Modal or Replicate - DoggyDish.com

January 26, 2025


Run multi-agent systems in the cloud without managing infrastructure

One of the biggest challenges in deploying agentic AI applications is infrastructure. Traditional deployment requires managing GPUs, hosting environments, and memory. With serverless platforms like Modal and Replicate, you can deploy CrewAI agents without worrying about scaling, uptime, or devops complexity. This tutorial shows you how to take a multi-agent CrewAI workflow from your laptop to the cloud—in minutes.


🚀 Why Serverless for Agentic AI?

Serverless platforms allow you to:

  • Run agents on demand with no persistent infrastructure
  • Scale up to handle spikes in load
  • Use GPU-backed compute when needed (e.g., with Replicate or Modal Functions)
  • Simplify deployment pipelines (cold starts do exist, but both platforms offer ways to keep containers warm)

CrewAI, being Python-native and LLM-based, fits naturally into this workflow.


What You’ll Learn

  • Set up and deploy a CrewAI team of agents on Modal and Replicate
  • Configure tools (e.g., web search) and environment variables
  • Trigger agent workflows via REST or CLI
  • Compare pros/cons of each platform

📦 Tools & Platforms Used

| Tool | Purpose |
| --- | --- |
| CrewAI | Agent orchestration (multi-agent logic) |
| LangChain | LLM + tool wrapping |
| Modal | Python serverless deployment |
| Replicate | Hosted inference with GPU |
| OpenAI | LLM reasoning backend |

🔧 Option 1: Deploy CrewAI with Modal

✅ Step 1: Install Modal

```bash
pip install modal
modal token new  # follow the prompt to authenticate
```

✅ Step 2: Wrap Your CrewAI Code in a Modal Function

```python
import modal

# Container image with the Python deps the function needs in the cloud
image = modal.Image.debian_slim().pip_install(
    "crewai", "langchain-openai", "langchain-community", "duckduckgo-search"
)

app = modal.App("crewai-serverless")  # modal.Stub was renamed to modal.App

# Secret created beforehand with: modal secret create openai-secret OPENAI_API_KEY=sk-...
@app.function(image=image, secrets=[modal.Secret.from_name("openai-secret")])
def run_crewai_agent(query: str) -> str:
    # Import inside the function so only the remote container needs these packages
    from crewai import Agent, Task, Crew
    from langchain_community.tools import DuckDuckGoSearchRun
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
    search_tool = DuckDuckGoSearchRun()

    researcher = Agent(
        role="Researcher",
        goal=f"Research info about: {query}",
        backstory="A meticulous web researcher.",
        tools=[search_tool],
        llm=llm,
    )

    writer = Agent(
        role="Writer",
        goal="Write a blog summary from research",
        backstory="A concise technical writer.",
        llm=llm,
    )

    task1 = Task(description="Research the topic.", expected_output="Research notes.", agent=researcher)
    task2 = Task(description="Write a summary from the research.", expected_output="A short blog summary.", agent=writer)

    crew = Crew(agents=[researcher, writer], tasks=[task1, task2])
    return str(crew.kickoff())  # str() so the result serializes cleanly
```

✅ Step 3: Run it Locally or Deploy

```bash
modal deploy crewai-serverless.py
modal run crewai-serverless.py::run_crewai_agent --query "LangGraph use cases"
```

(Modal generates CLI flags from the function's parameter names, so the argument is passed as `--query`.)

🔗 Docs: Modal Python Guide
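The commands above cover CLI invocation. To trigger the same workflow via REST (one of the goals listed earlier), you would additionally decorate the function with Modal's `@modal.web_endpoint(method="POST")` so that `modal deploy` prints a public URL. The client-side sketch below uses only the standard library; the endpoint URL shape is an assumed placeholder, so use whatever URL `modal deploy` actually prints for your workspace.

```python
import json
import urllib.request

# Placeholder URL -- `modal deploy` prints the real one for your workspace
ENDPOINT = "https://your-workspace--crewai-serverless-run-crewai-agent.modal.run"

def build_request(query: str) -> urllib.request.Request:
    """Build a POST request that triggers the deployed agent workflow."""
    body = json.dumps({"query": query}).encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Once deployed, send it with:
# with urllib.request.urlopen(build_request("LangGraph use cases")) as resp:
#     print(resp.read().decode())
```

Keeping the HTTP layer this thin means any service that can POST JSON (a cron job, a webhook, another agent) can kick off the crew.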


🔁 Option 2: Deploy CrewAI with Replicate

Replicate specializes in hosted GPU inference. Models are packaged with Cog, Replicate's open-source containerization tool. While the platform is mostly used for ML models, you can wrap LLM-powered workflows like CrewAI the same way and call them from the replicate-python client.

✅ Step 1: Set Up a Replicate Model Repo

  1. Create a predict.py script with a Cog Predictor class (entry point)
  2. Use a cog.yaml to define your environment
  3. Push to Replicate with the cog CLI to host the model

Example predict.py (Simplified):

```python
from cog import BasePredictor, Input
from crewai import Agent, Task, Crew
from langchain_openai import ChatOpenAI

class Predictor(BasePredictor):
    def setup(self):
        # Runs once when the container boots
        self.llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

    def predict(self, query: str = Input(description="What to research", default="CrewAI overview")) -> str:
        researcher = Agent(role="Researcher", goal=f"Find info on {query}", backstory="Web researcher.", llm=self.llm)
        writer = Agent(role="Writer", goal="Summarize it", backstory="Technical writer.", llm=self.llm)
        t1 = Task(description="Do research", expected_output="Research notes", agent=researcher)
        t2 = Task(description="Write article", expected_output="A short article", agent=writer)
        crew = Crew(agents=[researcher, writer], tasks=[t1, t2])
        return str(crew.kickoff())
```

cog.yaml (Config):

```yaml
build:
  python_version: "3.11"
  python_packages:
    - openai
    - crewai
    - langchain-openai
predict: "predict.py:Predictor"
```

(Input names, defaults, and descriptions are declared in predict.py via `Input(...)` rather than in the config file.)
✅ Deploy:

```bash
cog login
cog push r8.im/<your-username>/crewai-serverless
```

🔗 Docs: Replicate Deployment Guide
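Once pushed, you trigger the hosted predict() over Replicate's HTTP API (the replicate-python client's `replicate.run(...)` wraps the same endpoint). The stdlib-only sketch below builds the request, assuming the model input is named `query` as in the config above; the version ID is a placeholder you copy from your model's page.

```python
import json
import os
import urllib.request

# Replicate's predictions endpoint (see their HTTP API docs)
API_URL = "https://api.replicate.com/v1/predictions"

def build_prediction_request(version: str, query: str) -> urllib.request.Request:
    """Build a POST request that starts a prediction on a pushed model."""
    body = json.dumps({"version": version, "input": {"query": query}}).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            # Token from your Replicate account; export REPLICATE_API_TOKEN first
            "Authorization": f"Bearer {os.environ.get('REPLICATE_API_TOKEN', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# After `cog push`, start a run with (uncomment once your token is set):
# with urllib.request.urlopen(build_prediction_request("<model-version-id>", "CrewAI overview")) as resp:
#     print(json.load(resp))
```

Predictions are asynchronous: the response includes a prediction ID and a URL you poll for the final output.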


🔬 Modal vs Replicate: Which to Use?

| Feature | Modal | Replicate |
| --- | --- | --- |
| LLM-based logic | ✅ Excellent | ✅ Great for stateless logic |
| GPU compute | 🟡 Available via modal.gpu | ✅ Primary focus |
| REST API ready | ✅ Yes | ✅ Yes |
| Logs + monitoring | ✅ Via dashboard | ✅ Via dashboard |
| Developer focus | Full Python microservices | ML model + inference workflows |

  • Choose Modal if you want to run full workflows as reusable Python functions.
  • Choose Replicate if you want simple GPU-backed inference endpoints.


✅ Final Thoughts

Deploying agentic AI to the cloud doesn’t have to mean provisioning servers or writing Dockerfiles. With Modal and Replicate, you can get a multi-agent CrewAI system live in a matter of minutes—whether you’re running research agents, chat assistants, or autonomous pipelines.

Start with a simple task, test locally, and scale serverlessly.
