As our multi-tenant SaaS journey continues, we move from local experimentation to real-world scalability. In Project 1, we learned how to handle multiple tenants within one codebase. In Project 2, we introduced per-tenant fairness with SlowAPI, ensuring every tenant enjoyed predictable service levels. Now, in Project 3, we step into the production zone — where scalability, distribution, and reliability become non-negotiable.
When your SaaS runs on a single container, local memory-based limits are fine. But in the real world, applications rarely live alone. They run on multiple containers, pods, or servers behind a load balancer. Without a shared backend, each instance maintains its own counters — creating inconsistent throttling, double allowances, and even missed abuse detection.
That’s where Redis comes in. Redis provides a fast, centralized store that SlowAPI uses to maintain all rate-limit counters globally. Every API replica connects to the same Redis instance, so no matter which node handles a request, the system enforces a single truth about rate usage. It’s the heartbeat that keeps fairness synchronized across your distributed infrastructure.
The goal of this project is to show you exactly how to convert a local rate-limiting setup into a production-grade, distributed enforcement system using FastAPI + SlowAPI + Redis. You’ll see how two separate API containers share the same global counters, how tenants retain their dynamic per-plan limits, and how Redis ensures consistent “Too Many Requests” (HTTP 429) responses even across instances.
By the end of this project, you’ll understand how enterprise SaaS systems scale — not by adding servers blindly, but by sharing intelligence through smart backends like Redis. This is the bridge between a working demo and a real SaaS product ready for global deployment.
This project takes the previous multi-tenant SaaS demos one step further by connecting SlowAPI to a Redis backend. Now all API replicas share the same rate-limit counters, ensuring consistent enforcement no matter which instance handles a request. This architecture introduces distributed fairness — every tenant and session is tracked globally, not per container.
In a single-instance setup, SlowAPI keeps counters in memory. But once you scale horizontally (multiple containers or pods), each instance tracks limits independently — causing inconsistent throttling. Redis solves this by acting as a central store for counters and tokens, allowing every replica to read and update the same counters, enforce limits consistently, and detect abuse across the whole cluster.
This transforms rate-limiting from local enforcement to a cluster-wide governance system.
Each API container reads RATE_LIMIT_STORAGE_URI=redis://redis:6379/0, connecting SlowAPI’s limiter to Redis.
Both api1 (port 8003) and api2 (port 8004) enforce limits from the same global store.
SlowAPI builds on the limits library, which supports Redis, Memcached, and in-memory backends.
When Redis is used, counters live as expiring, time-windowed keys that are updated with atomic increments, so every replica sees the same state. This approach is highly performant because Redis handles millions of atomic operations per second.
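Conceptually, the pattern looks like the sketch below — a plain fixed-window counter built on INCR and EXPIRE. This is an illustration of the idea only, not the limits library's actual key schema or windowing strategy.

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def allow(key: str, limit: int, window_s: int) -> bool:
    # Count hits for this key inside the current window.
    count = r.incr(key)
    if count == 1:
        # The first hit starts the window; the key expires when the window ends.
        r.expire(key, window_s)
    return count <= limit

Because INCR is atomic, two replicas incrementing the same key cannot lose updates or race each other.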
Since both API instances use the same Redis DB, every request — regardless of which container it hits — updates the same counter.
So, if Acme sends seven messages to api1 and one to api2, the total of eight still respects Acme’s limit of 8 / 10 seconds.
This provides true global rate enforcement, critical for Kubernetes or load-balanced environments.
The environment variable:

RATE_LIMIT_STORAGE_URI=redis://redis:6379/0

tells SlowAPI where to persist counters. Changing the DB index or password allows easy separation for staging, testing, or per-environment throttling. This variable makes the rate-limit backend fully configurable without code changes.
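As a rough sketch of how the variable might be consumed on the application side (the memory:// fallback and exact wiring are assumptions, not the project's verbatim code):

import os
from fastapi import FastAPI
from slowapi import Limiter
from slowapi.util import get_remote_address

# Read the shared backend from the environment; fall back to in-memory for local runs.
storage_uri = os.getenv("RATE_LIMIT_STORAGE_URI", "memory://")

limiter = Limiter(key_func=get_remote_address, storage_uri=storage_uri)

app = FastAPI()
app.state.limiter = limiter  # SlowAPI looks up the limiter on app.state

Every replica started with the same RATE_LIMIT_STORAGE_URI shares one set of counters, which is exactly what the two-container test later in this article relies on.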
To verify shared enforcement, send a few /chat calls to api1 and continue on api2: the combined total is what triggers the 429, regardless of which instance served each call. This simple test proves that the limiter is reading and updating counters in Redis, not in local memory.
The docker-compose.yml defines the redis service, two API replicas (api1 and api2), and a redisdata volume for persistence. With one command:

docker compose up --build

you get a fully working multi-tenant, multi-replica SaaS environment with global rate limiting.
Each tenant still has its own configuration file entry:
{"id":"acme","limits":{"chat_per_10s":8}}
{"id":"globex","limits":{"chat_per_10s":4}}SlowAPI dynamically reads these limits and applies them to Redis counters.
The limit expression becomes, for example, "8/10 seconds", guaranteeing plan-based fairness even in distributed setups.
Both endpoints are protected with @limiter.limit() decorators. chat_key_func(request) builds the limiter key as tenant_id:session_id, falling back to the client IP if headers are missing. chat_limit_value(request) reads the tenant's limit from configuration and returns the string "N/10 seconds".
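A minimal sketch of what these two helpers might look like, assuming the tenant limits were loaded from tenants.jsonl as sketched above (header handling and the fallback value are illustrative assumptions, not the project's verbatim code):

from fastapi import Request
from slowapi.util import get_remote_address

# Illustrative values, as loaded from tenants.jsonl.
TENANT_LIMITS = {"acme": 8, "globex": 4, "initech": 6}
DEFAULT_CHAT_PER_10S = 5  # hypothetical fallback for unknown tenants

def chat_key_func(request: Request) -> str:
    # Composite limiter key: tenant + session; fall back to the client IP.
    tenant = request.headers.get("X-Tenant-ID")
    session = request.headers.get("X-Session-ID")
    if tenant and session:
        return f"{tenant}:{session}"
    return get_remote_address(request)

def chat_limit_value(request: Request) -> str:
    # Per-tenant limit expression, e.g. "8/10 seconds".
    tenant = request.headers.get("X-Tenant-ID", "")
    n = TENANT_LIMITS.get(tenant, DEFAULT_CHAT_PER_10S)
    return f"{n}/10 seconds"

In the project, these helpers are plugged into the /chat route's @limiter.limit() decorator.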
These two small functions enable fully dynamic, per-tenant limit policies without hard-coding values.

By hitting both APIs rapidly, you'll notice the 429 response appear after the global count exceeds the limit, regardless of instance. Example:
HTTP/1.1 429 Too Many Requests
{"detail":"Rate limit exceeded: 8 per 10 seconds"}This is the final proof that Redis synchronization works perfectly — each replica contributes to the same global counter pool.
Redis makes the system ready for horizontal scaling: Kubernetes deployments, load-balanced replicas, and autoscaled container fleets all enforce limits from the same shared store. This setup forms the core of enterprise-grade SaaS traffic management.
From here you can evolve toward usage metering, plan-based billing, and analytics built on top of the same counters (the direction the next project takes). This completes the trilogy of foundational SaaS demos — a journey from multi-tenant design → dynamic local throttling → distributed Redis-backed governance.
Here's a clear "how it works" summary plus a hands-on test plan to demonstrate Project 3 (Redis-backed SlowAPI rate limiting) end-to-end.
- Two FastAPI replicas (api1 on :8003, api2 on :8004) serve multiple tenants.
- SlowAPI enforces limits using Redis (shared storage).
- /session is IP-scoped (e.g., 10/min).
- /chat is tenant+session-scoped using a composite limiter key: X-Tenant-ID:X-Session-ID.
- Tenant limits (e.g., chat_per_10s) come from tenants.jsonl.
Because counters live in Redis, requests hitting different replicas still count toward the same global window.
You'll need Docker with Compose, the multi-tenant-redis-rate-limit/ project folder, and jq (or craft the JSON by hand). Then start the stack:

cd multi-tenant-redis-rate-limit
docker compose up --build
# api1 -> http://localhost:8003
# api2 -> http://localhost:8004
# redis -> localhost:6379 (inside compose network it's "redis:6379")

Health check:
curl -s http://localhost:8003/ | jq
curl -s http://localhost:8004/ | jq

You should see tenants, defaults, and storage: "redis://redis:6379/0".
ACME_SID=$(curl -s -X POST http://localhost:8003/session \
-H 'Content-Type: application/json' \
-d '{"tenant_id":"acme"}' | jq -r .session_id)
GLOBEX_SID=$(curl -s -X POST http://localhost:8003/session \
-H 'Content-Type: application/json' \
-d '{"tenant_id":"globex"}' | jq -r .session_id)
INITECH_SID=$(curl -s -X POST http://localhost:8003/session \
-H 'Content-Type: application/json' \
-d '{"tenant_id":"initech"}' | jq -r .session_id)
echo "ACME_SID=$ACME_SID"
echo "GLOBEX_SID=$GLOBEX_SID"
echo "INITECH_SID=$INITECH_SID"✅ /session is rate-limited per IP (e.g., 10/min). If you loop >10, expect 429.
Tenant limits from tenants.jsonl (examples):
- acme: 8 / 10s
- globex: 4 / 10s
- initech: 6 / 10s

Send 8 allowed messages for acme to api1:
for i in {1..8}; do
curl -s -X POST http://localhost:8003/chat \
-H 'Content-Type: application/json' \
-H "X-Tenant-ID: acme" \
-H "X-Session-ID: $ACME_SID" \
-d "$(jq -n --arg t acme --arg s "$ACME_SID" --arg m "msg-$i" \
'{tenant_id:$t, session_id:$s, message:$m}')";
echo ""
done

All 8 requests succeed ✅
Send one more immediately:
curl -i -s -X POST http://localhost:8003/chat \
-H 'Content-Type: application/json' \
-H "X-Tenant-ID: acme" \
-H "X-Session-ID: $ACME_SID" \
-d "$(jq -n --arg t acme --arg s "$ACME_SID" --arg m "overflow" \
'{tenant_id:$t, session_id:$s, message:$m}')"

Expected:
HTTP/1.1 429 Too Many Requests
{"detail":"Rate limit exceeded: 8 per 10 seconds"}⏲️ Wait ~10 seconds and requests will succeed again (window resets).
Goal: show counters are shared between api1 and api2.
for i in {1..7}; do
curl -s -X POST http://localhost:8003/chat \
-H 'Content-Type: application/json' \
-H "X-Tenant-ID: acme" \
-H "X-Session-ID: $ACME_SID" \
-d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"cross-$i\"}";
echo ""
done

Now send request #8 via api2 (still within Acme's 8-message budget):

curl -i -s -X POST http://localhost:8004/chat \
-H 'Content-Type: application/json' \
-H "X-Tenant-ID: acme" \
-H "X-Session-ID: $ACME_SID" \
-d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"n8\"}"curl -i -s -X POST http://localhost:8004/chat \
-H 'Content-Type: application/json' \
-H "X-Tenant-ID: acme" \
-H "X-Session-ID: $ACME_SID" \
-d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"n9-should-429\"}"Conclusion: both containers enforce the same window, thanks to Redis storage.
Now repeat the exercise with globex (limit 4/10s):
for i in {1..5}; do
curl -i -s -X POST http://localhost:8003/chat \
-H 'Content-Type: application/json' \
-H "X-Tenant-ID: globex" \
-H "X-Session-ID: $GLOBEX_SID" \
-d "{\"tenant_id\":\"globex\",\"session_id\":\"$GLOBEX_SID\",\"message\":\"g-$i\"}";
echo ""
done

Requests 1–4 ✅, request 5 → 429 ❌. This proves per-tenant dynamic limits are applied.
If X-Tenant-ID or X-Session-ID is missing, the limiter falls back to IP-based keying:
curl -i -s -X POST http://localhost:8003/chat \
-H 'Content-Type: application/json' \
-d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"no-headers\"}"Repeated quickly → you’ll see 429 based on IP.
The server also enforces session ownership: a session issued to one tenant cannot be reused by another, and misuse returns 403, as shown in the sketch and test below.
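A minimal sketch of such an ownership check, assuming a simple in-memory SESSIONS map (an illustration, not the project's actual session storage):

from fastapi import HTTPException

# Hypothetical mapping populated by /session: session_id -> owning tenant_id.
SESSIONS: dict = {}

def assert_session_owner(tenant_id: str, session_id: str) -> None:
    # Reject chat requests that present a session created for a different tenant.
    if SESSIONS.get(session_id) != tenant_id:
        raise HTTPException(status_code=403, detail="Session does not belong to tenant")

To trigger the check, present Acme's session ID under the globex tenant: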
curl -i -s -X POST http://localhost:8003/chat \
-H 'Content-Type: application/json' \
-H "X-Tenant-ID: globex" \
-H "X-Session-ID: $ACME_SID" \
-d "{\"tenant_id\":\"globex\",\"session_id\":\"$ACME_SID\",\"message\":\"bad\"}"Expect:
HTTP/1.1 403 Forbidden
{"detail":"Session does not belong to tenant"}/session IP limit (spam control)Prove per-minute IP cap (default 10):
for i in {1..12}; do
curl -i -s -X POST http://localhost:8003/session \
-H 'Content-Type: application/json' \
-d '{"tenant_id":"acme"}'
echo ""
done

Calls 11–12 → 429.
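For reference, an IP-scoped cap like this is typically declared right on the route; a small, self-contained sketch (route body and exact wiring are assumptions):

from fastapi import FastAPI, Request
from slowapi import Limiter
from slowapi.util import get_remote_address

app = FastAPI()
limiter = Limiter(key_func=get_remote_address, storage_uri="redis://redis:6379/0")
app.state.limiter = limiter

@app.post("/session")
@limiter.limit("10/minute")  # per-IP cap, matching the default described above
async def create_session(request: Request) -> dict:
    # The real handler validates the tenant and issues a session_id; elided here.
    # A RateLimitExceeded exception handler would also be registered for JSON 429 responses.
    return {"session_id": "..."}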
Open a shell in the Redis container:
docker compose exec redis redis-cli

Useful commands:
# See keys (limits uses its own naming; you’ll see time-windowed keys)
SCAN 0 COUNT 100
# Inspect TTL on a recent rate key (replace with an actual key from SCAN)
TTL <key>
# Flush counters (resets all windows; use with care in demos)
FLUSHALL

You can also run:
MONITOR

Then trigger a few chat requests in another terminal to watch Redis ops live.
Edit backend/tenants.jsonl, e.g., raise Acme to 12:
{"id":"acme","limits":{"chat_per_10s":12}}Restart the API containers (Redis can stay up):
docker compose restart api1 api2

Re-run your chat loop and observe that the new 12/10s budget is enforced across both replicas.
A few checks if something doesn't behave as expected:
- The / endpoint on each replica should report storage: redis://redis:6379/0; if it shows an in-memory backend, review RATE_LIMIT_STORAGE_URI in docker-compose.yml.
- No jq? Craft the JSON manually in -d '{"tenant_id":"acme",...}'.
- Remember the two scopes: IP-based limits on /session, composite (tenant+session) limits on /chat.

With Project 3, we've transformed our SaaS prototype into a distributed, production-ready architecture. The addition of Redis shifts rate limiting from a local safeguard into a central governance layer — one that enforces consistency, fairness, and trust across every running instance.
You've now implemented dynamic per-tenant limits, composite tenant+session keys, and a shared Redis backend that keeps every API replica enforcing the same global counters.
This isn’t just technical progress — it’s architectural maturity. Your SaaS now behaves like a living, coordinated ecosystem where every node knows what the others are doing. It guarantees that free users stay within limits, premium customers get priority, and the system remains stable even under heavy load.
Redis has made our rate limiting stateless yet globally aware — a critical milestone for any cloud-native SaaS. We’ve proven that scalability doesn’t come from adding servers; it comes from adding shared intelligence. Every replica, every tenant, every session now abides by one universal rule set, making the system predictable, resilient, and fair.
From here, we can evolve toward usage-based billing, intelligent analytics, and richer traffic policies built on the same shared counters. With this foundation, your SaaS platform is ready to handle thousands of tenants, millions of requests, and real-world production workloads — all with fairness and confidence.
Local enforcement builds protection. Distributed enforcement builds trust. Redis turns your SaaS into a globally fair system.
Stay tuned for Project 4, where we take the next leap — connecting this distributed throttling engine with usage-based billing and intelligent analytics, transforming technical governance into business intelligence. 🚀