Your PyTorch code stays almost identical. The only thing that changes is where it runs.
All of this complexity? Gone.

- No more matching CUDA versions to PyTorch versions to driver versions. We handle all of that.
- No 15GB GPU container images. No nvidia-container-toolkit. No docker-compose GPU configs.
- No $2,000 graphics card. No cloud GPU instances sitting idle. No hardware maintenance.
- No Kubernetes GPU scheduling. No CUDA capability checks. No "works on my machine" GPU issues.
Here's what you typically need to get PyTorch running in production:

```text
# requirements.txt -- Python dependencies
torch==2.1.0+cu121
torchvision==0.16.0+cu121
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvtx-cu12==12.1.105
triton==2.1.0

# System requirements (not in pip)
# - NVIDIA Driver >= 530.30.02
# - CUDA Toolkit 12.1
# - cuDNN 8.9
# - Ubuntu 20.04+ or similar
# - nvidia-container-toolkit
# - 16GB+ RAM recommended
```
```dockerfile
# Dockerfile
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04

# Install Python and dependencies
RUN apt-get update && apt-get install -y \
    python3.10 python3-pip \
    libgl1-mesa-glx libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Image size: ~15GB
CMD ["python3", "app.py"]
```
```yaml
# docker-compose.yml
version: '3.8'
services:
  app:
    build: .
    runtime: nvidia  # Requires nvidia-container-toolkit
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
Here's the entire setup with Remotorch:

```text
# requirements.txt
remotorch
flask  # or whatever else your app needs

# That's it. No CUDA. No nvidia packages.
# No GPU drivers. No special hardware.
# Works on:
# - $6/month VPS
# - Raspberry Pi
# - MacBook Air
# - AWS Lambda
# - Literally anything with Python 3.8+
```
```dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Image size: ~150MB (not 15GB!)
CMD ["python", "app.py"]
```
See what actually changes in your Python code.

**Traditional PyTorch:**

```python
import torch

# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create tensors on GPU
x = torch.randn(1000, 1000, device=device)
y = torch.randn(1000, 1000, device=device)

# Matrix multiplication
result = torch.matmul(x, y)

# Get result back to CPU
output = result.cpu().numpy()
```
**With Remotorch:**

```python
import remotorch

# Connect to remote GPU (the only new lines)
remotorch.connect(
    api_key="rk_...",
    gpu_type="rtx4090",  # Choose your GPU
)

# Create tensors on remote GPU
x = remotorch.randn(1000, 1000)
y = remotorch.randn(1000, 1000)

# Matrix multiplication (runs remotely)
result = remotorch.matmul(x, y)

# Get result back to local
output = result.cpu().numpy()
```
The same before-and-after holds for a full inference service, like an image classification API.

**Traditional PyTorch:**

```python
import torch
import torchvision.models as models
from flask import Flask, request

app = Flask(__name__)

# Load model on GPU (requires CUDA)
device = torch.device("cuda")
model = models.resnet50(pretrained=True)
model = model.to(device)
model.eval()

@app.route("/classify", methods=["POST"])
def classify():
    img_tensor = preprocess(request.files["image"])
    img_tensor = img_tensor.to(device)
    with torch.no_grad():
        output = model(img_tensor)
    return {"class": output.argmax().item()}
```
Requires: GPU instance ($100-300/mo), CUDA drivers, nvidia-docker
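Both this handler and the Remotorch version below assume a `preprocess` helper that isn't shown. A minimal sketch using torchvision transforms might look like this; the resize, crop, and normalization values are the usual ImageNet defaults and should be adjusted to your model:

```python
# Sketch of the `preprocess` helper assumed by the examples (illustrative values).
from PIL import Image
import torchvision.transforms as T

_transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics (assumed)
                std=[0.229, 0.224, 0.225]),
])

def preprocess(file_storage):
    """Turn an uploaded image (werkzeug FileStorage) into a batched tensor."""
    img = Image.open(file_storage.stream).convert("RGB")
    return _transform(img).unsqueeze(0)  # shape: (1, 3, 224, 224)
```

For the Remotorch version you could write the same helper with Pillow and NumPy alone, so the host never needs a local torch install.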
**With Remotorch:**

```python
import remotorch
from flask import Flask, request

app = Flask(__name__)

# Connect to remote GPU
remotorch.connect(api_key="rk_...", gpu_type="rtx4090")

# Load model on remote GPU
model = remotorch.hub.load("resnet50", pretrained=True)
model.eval()

@app.route("/classify", methods=["POST"])
def classify():
    img_tensor = preprocess(request.files["image"])
    img_tensor = remotorch.tensor(img_tensor)
    with remotorch.no_grad():
        output = model(img_tensor)
    return {"class": output.argmax().cpu().item()}
```
Requires: Any Python host ($6/mo VPS works!)
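Either version can be exercised the same way from a client. A minimal check, assuming the service is running locally on Flask's default port and you have a test image on disk (both are placeholders):

```python
# Quick client-side test of the /classify endpoint (host, port, and filename are assumptions).
import requests

with open("cat.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:5000/classify",
        files={"image": f},  # matches request.files["image"] in the handler
    )
print(resp.json())  # e.g. {"class": 285}
```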
The same goes for a plain batch-processing script.

**Traditional PyTorch:**

```python
import torch

# This script only works if you have a GPU
if not torch.cuda.is_available():
    raise RuntimeError("No GPU found!")

device = torch.device("cuda")

def process_batch(data):
    tensor = torch.tensor(data, device=device)
    # Heavy computation
    result = tensor @ tensor.T
    result = torch.nn.functional.softmax(result, dim=-1)
    return result.cpu().numpy()
```
**With Remotorch:**

```python
import remotorch

# Works from any machine
remotorch.connect(api_key="rk_...", gpu_type="rtx4090")

def process_batch(data):
    tensor = remotorch.tensor(data)
    # Heavy computation (runs on remote GPU)
    result = tensor @ tensor.T
    result = remotorch.nn.functional.softmax(result, dim=-1)
    return result.cpu().numpy()
```
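Calling `process_batch` is identical in both versions; only where the math runs differs. A quick sketch with illustrative shapes:

```python
import numpy as np

# Any (N, D) float array works; 256 x 128 is just an example.
data = np.random.rand(256, 128).astype(np.float32)

probs = process_batch(data)  # matmul and softmax run on the (remote) GPU
print(probs.shape)           # (256, 256); each row sums to 1
```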
Here's everything you need to change in your code:

| Traditional PyTorch | Remotorch |
|---|---|
| `import torch` | `import remotorch` |
| - | `remotorch.connect(api_key="...", gpu_type="rtx4090")` |
| `torch.tensor(...)` | `remotorch.tensor(...)` |
| `torch.randn(...)` | `remotorch.randn(...)` |
| `torch.matmul(a, b)` | `remotorch.matmul(a, b)` |
| `tensor.to("cuda")` | Not needed (already on GPU) |
| `tensor.cuda()` | Not needed |
| `tensor.cpu()` | `tensor.cpu()` (same) |
Change your imports from `torch` to `remotorch`, add one `connect()` line, and you're done. Your tensor operations, model inference, and everything else work the same way.
Convert your project in 5 minutes:

1. Install the package: `pip install remotorch`
2. Sign up at remotorch.com and create an API key from your dashboard.
3. Replace `import torch` with `import remotorch`.
4. Add `remotorch.connect(api_key="...", gpu_type="rtx4090")` at the start of your script (a converted script is sketched below).
5. Delete your Dockerfile's CUDA base image, nvidia-docker config, and GPU instance. Deploy to any cheap host.
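Put together, a freshly converted script might look like this; the API key and GPU choice are placeholders, and the shapes are only an example:

```python
import remotorch

# One-time setup: point the library at a remote GPU (placeholder credentials).
remotorch.connect(api_key="rk_...", gpu_type="rtx4090")

# Everything below is ordinary tensor code; it just executes remotely.
x = remotorch.randn(4096, 4096)
y = remotorch.randn(4096, 4096)
logits = remotorch.matmul(x, y)
probs = remotorch.nn.functional.softmax(logits, dim=-1)

# Pull the result back to the local machine.
print(probs.cpu().numpy().shape)  # (4096, 4096), computed without a local GPU
```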
Start running inference from any device in minutes.