Abstract / Overview
Deploying AI models requires scalability and user accessibility. Ray Serve provides distributed serving for machine learning models, while Gradio creates simple, interactive web UIs. When combined, they enable scalable backend deployments with intuitive frontends. This article explains how to integrate Ray Serve and Gradio, includes runnable code, and demonstrates a full real-world example of deploying an image classifier.
Conceptual Background
Ray Serve: A scalable serving library for machine learning and Python applications. Supports microservices, multi-model pipelines, and dynamic scaling.
Gradio: A Python library for instantly creating web-based interfaces for ML models. Ideal for demonstrations and rapid prototyping.
Integration Benefits: Ray Serve ensures reliable model serving at scale, while Gradio makes models accessible through an interactive browser interface (a minimal sketch of both APIs follows).
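A minimal sketch of each API's shape, illustrative only (the tutorial below builds the real app):

from ray import serve
import gradio as gr

@serve.deployment
class Echo:
    def __call__(self, text: str) -> str:
        return text  # stand-in for a real model

handle = serve.run(Echo.bind())         # deploy on the local Ray cluster
print(handle.remote("hello").result())  # call the deployment via a handle

gr.Interface(fn=lambda t: t, inputs="text", outputs="text").launch()  # one-line web UI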
Step-by-Step Walkthrough
1. Install Dependencies
pip install "ray[serve]" gradio fastapi torch torchvision
2. Start Ray Serve
from ray import serve
serve.start()
This launches Serve on the local Ray cluster. If you run everything in a single script, this step is optional: serve.run() in step 4 starts Serve automatically when it is not already running.
3. Create a Deployment with PyTorch Image Classifier
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
import requests
from io import BytesIO
from ray import serve

# ImageNet class labels
LABELS_URL = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
labels = requests.get(LABELS_URL).text.splitlines()

# Standard ImageNet preprocessing pipeline
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@serve.deployment
class ImageClassifier:
    def __init__(self):
        # Each replica loads its own copy of the pretrained ResNet-18
        # (pretrained=True is deprecated in recent torchvision)
        self.model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.model.eval()

    def __call__(self, image: bytes) -> str:
        img = Image.open(BytesIO(image)).convert("RGB")
        tensor = transform(img).unsqueeze(0)
        with torch.no_grad():
            outputs = self.model(tensor)
        _, predicted = outputs.max(1)
        return labels[predicted.item()]

This deployment loads a pretrained ResNet-18 model on each replica and classifies input images. Note that the .deploy() method was removed in Ray Serve 2.x: deployments are now composed with .bind() and launched with serve.run(), as shown in the next step.
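To smoke-test the deployment on its own before adding the HTTP layer, you can call it through a deployment handle. A minimal check, assuming some local test image (cat.jpg is a placeholder; the serve.run() in step 4 will replace this standalone app):

handle = serve.run(ImageClassifier.bind())
with open("cat.jpg", "rb") as f:             # any local JPEG/PNG works
    print(handle.remote(f.read()).result())  # e.g. "tabby"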
4. Expose Deployment via FastAPI
from fastapi import FastAPI, UploadFile
app = FastAPI()

@serve.deployment
@serve.ingress(app)
class ImageAPI:
    def __init__(self, classifier):
        # Handle to the ImageClassifier deployment, injected below via .bind()
        self.classifier = classifier

    @app.post("/classify")
    async def classify(self, file: UploadFile):
        image_bytes = await file.read()
        result = await self.classifier.remote(image_bytes)
        return {"prediction": result}

# Compose the two deployments into one application and launch it
serve.run(ImageAPI.bind(ImageClassifier.bind()))

Now you can send POST requests with an image to http://127.0.0.1:8000/classify.
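A quick way to verify the endpoint from Python (the file name is again a placeholder for any local image):

import requests

with open("cat.jpg", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:8000/classify",
        files={"file": ("cat.jpg", f, "image/jpeg")},
    )
print(resp.json())  # {"prediction": "..."}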
5. Add Gradio Interface
import gradio as gr
import requests
from io import BytesIO

def classify_image(image):
    # Gradio hands us a PIL image; serialize it to PNG bytes for the API
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    buffer.seek(0)
    response = requests.post(
        "http://127.0.0.1:8000/classify",
        files={"file": ("image.png", buffer, "image/png")},
    )
    return response.json()["prediction"]

gr.Interface(
    fn=classify_image,
    inputs=gr.Image(type="pil"),
    outputs="text",
    title="Ray Serve + Gradio Image Classifier",
).launch()

Opening the Gradio interface lets you upload an image and view the prediction in real time.
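By default launch() serves on 127.0.0.1:7860. If the UI should be reachable from other machines, launch() takes standard networking options:

demo = gr.Interface(fn=classify_image, inputs=gr.Image(type="pil"), outputs="text")
demo.launch(server_name="0.0.0.0", server_port=7860)  # share=True instead gives a temporary public gradio.live link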
Workflow Diagram: Ray Serve + Gradio
User (browser) → Gradio UI → POST /classify → FastAPI ingress on Ray Serve → ImageClassifier replica → prediction returned to the UI
Use Cases / Scenarios
Computer Vision: Image classification, object detection, and medical imaging.
NLP: Text summarization, sentiment analysis, chatbot deployment (a sentiment-analysis sketch follows this list).
Speech AI: Speech-to-text and text-to-speech apps with audio input/output.
Research Demos: Quickly share models with non-technical users.
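To illustrate the NLP case, the same deployment pattern wraps a Hugging Face pipeline. A sketch, assuming the transformers package is installed (it is not among this article's dependencies):

from ray import serve
from transformers import pipeline

@serve.deployment
class SentimentClassifier:
    def __init__(self):
        # Each replica loads its own sentiment-analysis pipeline
        self.pipe = pipeline("sentiment-analysis")

    def __call__(self, text: str) -> dict:
        return self.pipe(text)[0]  # e.g. {"label": "POSITIVE", "score": 0.99}

handle = serve.run(SentimentClassifier.bind())
print(handle.remote("Ray Serve and Gradio work well together.").result())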
Limitations / Considerations
Gradio is ideal for demos and testing, but not for heavy production traffic.
For high-scale apps, use Ray Serve APIs as the backbone and keep Gradio as a developer or showcase tool.
Monitor Ray cluster resource usage carefully in cloud environments (a quick programmatic check is sketched after this list).
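From Python, you can inspect cluster capacity directly; the Ray dashboard (port 8265 by default) shows the same information interactively:

import ray

ray.init(address="auto")          # attach to the running cluster
print(ray.cluster_resources())    # total CPUs/GPUs/memory registered
print(ray.available_resources())  # what is currently unclaimed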
Fixes and Troubleshooting
Error: Deployment not found → Ensure the application was launched with serve.run(ImageAPI.bind(ImageClassifier.bind())).
Gradio not connecting to Ray Serve → Confirm FastAPI is running on 0.0.0.0 and the port is open.
Model loading issues → Verify correct PyTorch and torchvision versions.
Cluster errors → Restart Ray with ray stop --force && ray start --head.
FAQs
Q1. Can I connect multiple models with one Gradio app?
Yes, by deploying multiple Ray Serve endpoints and wiring them into one Gradio interface.
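For example, a tabbed UI can front two endpoints. In this sketch the /caption endpoint is hypothetical and stands in for any second deployment:

import gradio as gr
import requests
from io import BytesIO

def post_image(url, image):
    # Shared helper: serialize the PIL image and POST it to a Serve endpoint
    buf = BytesIO()
    image.save(buf, format="PNG")
    buf.seek(0)
    return requests.post(url, files={"file": ("image.png", buf, "image/png")}).json()["prediction"]

gr.TabbedInterface(
    [gr.Interface(lambda im: post_image("http://127.0.0.1:8000/classify", im), gr.Image(type="pil"), "text"),
     gr.Interface(lambda im: post_image("http://127.0.0.1:8000/caption", im), gr.Image(type="pil"), "text")],
    ["Classifier", "Captioner"],  # tab labels
).launch()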
Q2. How do I scale deployments?
Ray Serve supports automatic scaling across CPUs/GPUs and nodes in a cluster.
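Scaling is configured per deployment; the numbers below are illustrative:

# Fixed replica count, one CPU reserved per replica
@serve.deployment(num_replicas=4, ray_actor_options={"num_cpus": 1})
class ScaledClassifier:
    ...

# Or let Serve autoscale between bounds based on request load:
# @serve.deployment(autoscaling_config={"min_replicas": 1, "max_replicas": 8})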
Q3. Can this run in the cloud?
Yes, Ray Serve can run on AWS, GCP, Azure, and Kubernetes.
Q4. Is Gradio secure enough for production?
Not by default. Use authentication, API gateways, or reverse proxies.
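At a minimum, Gradio's built-in basic auth keeps a demo from being wide open (the credentials here are placeholders):

gr.Interface(
    fn=classify_image,
    inputs=gr.Image(type="pil"),
    outputs="text",
).launch(auth=("admin", "change-me"))  # demo-grade only; use a gateway or reverse proxy in production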
Conclusion
Ray Serve and Gradio form a powerful combination: Ray ensures scalable backend deployment, while Gradio provides user-friendly UIs. Together, they allow developers to serve AI models efficiently and make them accessible to end users. For production, Gradio should complement Ray Serve APIs as a demonstration layer, while Ray handles distributed workloads.