🧭 Multi-Tenant SaaS — Project 3

Redis-Backed Dynamic Rate Limiting with SlowAPI


🚀 Introduction: Why Redis-Backed Rate Limiting Matters for SaaS Scalability

As our multi-tenant SaaS journey continues, we move from local experimentation to real-world scalability. In Project 1, we learned how to handle multiple tenants within one codebase. In Project 2, we introduced per-tenant fairness with SlowAPI, ensuring every tenant enjoyed predictable service levels. Now, in Project 3, we step into the production zone — where scalability, distribution, and reliability become non-negotiable.

When your SaaS runs on a single container, local memory-based limits are fine. But in the real world, applications rarely live alone. They run on multiple containers, pods, or servers behind a load balancer. Without a shared backend, each instance maintains its own counters — creating inconsistent throttling, double allowances, and even missed abuse detection.

That’s where Redis comes in. Redis provides a fast, centralized store that SlowAPI uses to maintain all rate-limit counters globally. Every API replica connects to the same Redis instance, so no matter which node handles a request, the system enforces a single truth about rate usage. It’s the heartbeat that keeps fairness synchronized across your distributed infrastructure.

🔍 The Goal

The goal of this project is to show you exactly how to convert a local rate-limiting setup into a production-grade, distributed enforcement system using FastAPI + SlowAPI + Redis. You’ll see how two separate API containers share the same global counters, how tenants retain their dynamic per-plan limits, and how Redis ensures consistent “Too Many Requests” (HTTP 429) responses even across instances.

By the end of this project, you’ll understand how enterprise SaaS systems scale — not by adding servers blindly, but by sharing intelligence through smart backends like Redis. This is the bridge between a working demo and a real SaaS product ready for global deployment.


🖥️ Slide 1: Project 3 — Multi-Tenant SaaS with Redis-Backed Rate Limiting (SlowAPI)

This project takes the previous multi-tenant SaaS demos one step further by connecting SlowAPI to a Redis backend. Now, all API replicas share the same rate-limit counters, ensuring consistent enforcement no matter which instance handles a request. This architecture introduces distributed fairness — every tenant and session is tracked globally, not per container.


⚙️ Slide 2: Why Redis Backend is Needed

In a single-instance setup, SlowAPI keeps counters in memory. But once you scale horizontally (multiple containers or pods), each instance tracks limits independently — causing inconsistent throttling. Redis solves this by acting as a central store for counters and tokens, allowing:

  1. One shared counter per tenant and session, regardless of which replica serves the request
  2. Atomic increments, so concurrent requests across replicas never race past a limit
  3. Automatic window expiry through key TTLs

This transforms rate limiting from local enforcement into a cluster-wide governance system.
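
To make the failure mode concrete, here is a small self-contained Python sketch (illustrative only, not part of the project code) simulating two replicas with private counters versus one shared counter:

from collections import Counter

LIMIT = 8  # acme's chat quota per window

def allowed(counters: Counter, key: str) -> bool:
    # Admit the request only if the counter is under the limit, then record it.
    if counters[key] >= LIMIT:
        return False
    counters[key] += 1
    return True

# Two replicas with private counters: each admits 8 requests, 16 slip through.
replica1, replica2 = Counter(), Counter()
admitted = sum(allowed(replica1, "acme") for _ in range(8))
admitted += sum(allowed(replica2, "acme") for _ in range(8))
print("private counters admitted:", admitted)  # 16: double the intended quota

# One shared counter (what Redis provides): only 8 of the same 16 get through.
shared = Counter()
admitted = sum(allowed(shared, "acme") for _ in range(16))
print("shared counter admitted:", admitted)  # 8: the quota holds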


🏗️ Slide 3: Architecture Overview with Redis Integration

Components:

  1. FastAPI + SlowAPI — main application logic and request throttling
  2. Redis — shared backend for limit counters
  3. Docker Compose — orchestrates both API replicas and Redis service

Each API container reads RATE_LIMIT_STORAGE_URI=redis://redis:6379/0, connecting SlowAPI’s limiter to Redis. Both api1 (port 8003) and api2 (port 8004) enforce limits from the same global store.


💾 Slide 4: How SlowAPI Uses Redis Storage

SlowAPI builds on the limits library, which supports Redis, Memcached, and in-memory backends. When Redis is used:

  1. Each active window lives as a Redis key derived from the limiter key (tenant and session in this project)
  2. Hits are recorded with atomic increments, so two replicas can never double-spend the same budget
  3. Keys carry a TTL matching the window, so counters expire and clean themselves up

This approach is highly performant because Redis handles millions of atomic ops per second.
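
Under the hood, that looks roughly like the following standalone sketch of the limits library talking to Redis (the connection URL and identifiers are illustrative, and the real app goes through SlowAPI rather than calling limits directly; requires the redis Python package):

from limits import parse
from limits.storage import RedisStorage
from limits.strategies import FixedWindowRateLimiter

# Point the storage at the shared Redis instance.
storage = RedisStorage("redis://localhost:6379/0")
limiter = FixedWindowRateLimiter(storage)

# The same expression format the app derives from tenants.jsonl.
acme_limit = parse("8/10 seconds")

# hit() atomically increments the windowed counter and reports whether
# the request is still within budget -- identical from every replica.
for i in range(9):
    ok = limiter.hit(acme_limit, "acme", "session-123")
    print(f"request {i + 1}: {'allowed' if ok else 'rejected (429)'}")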


🔄 Slide 5: Shared Rate Limit Counters Across Replicas

Since both API instances use the same Redis DB, every request — regardless of which container it hits — updates the same counter. So, if Acme sends seven messages to api1 and one to api2, the total of eight still respects Acme’s limit of 8/10 seconds. This provides true global rate enforcement, critical for Kubernetes or load-balanced environments.


⚙️ Slide 6: Configuration via RATE_LIMIT_STORAGE_URI

The environment variable:

RATE_LIMIT_STORAGE_URI=redis://redis:6379/0

tells SlowAPI where to persist counters. Changing the DB index or password allows easy separation for staging, testing, or per-environment throttling. This variable makes the rate-limit backend fully configurable without code changes.
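
A minimal sketch of what that wiring can look like in the app (object names are illustrative; SlowAPI’s Limiter accepts a storage_uri directly, so no other code changes are needed):

import os

from fastapi import FastAPI
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

# Falls back to in-process memory when the env var is absent (local dev).
limiter = Limiter(
    key_func=get_remote_address,
    storage_uri=os.getenv("RATE_LIMIT_STORAGE_URI", "memory://"),
)

app = FastAPI()
app.state.limiter = limiter
# Turn RateLimitExceeded into the JSON 429 responses shown later.
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)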


🧪 Slide 7: Testing Cross-Replica Rate Limiting

To verify shared enforcement:

  1. Make several /chat calls to api1.
  2. Continue sending the remaining calls to api2.
  3. The combined count still triggers a 429 after the global limit is hit.

This simple test proves that the limiter is reading and updating counters in Redis, not in local memory.


🐳 Slide 8: Docker Compose Setup — API + Redis

The docker-compose.yml defines three services:

  1. api1: the first API replica, published on host port 8003
  2. api2: the second API replica, published on host port 8004
  3. redis: the shared counter store, reachable inside the Compose network as redis:6379
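
One plausible shape for that file (service names, ports, and the env var follow the slides; the Redis image tag, the ./backend build path, and the internal app port of 8000 are assumptions to adjust to your repo):

services:
  redis:
    image: redis:7
    ports:
      - "6379:6379"
  api1:
    build: ./backend
    environment:
      - RATE_LIMIT_STORAGE_URI=redis://redis:6379/0
    ports:
      - "8003:8000"   # host 8003 -> container 8000 (assumed app port)
    depends_on:
      - redis
  api2:
    build: ./backend
    environment:
      - RATE_LIMIT_STORAGE_URI=redis://redis:6379/0
    ports:
      - "8004:8000"
    depends_on:
      - redis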

With one command:

docker compose up --build

you get a fully working multi-tenant, multi-replica SaaS environment with global rate limiting.


📊 Slide 9: Dynamic Tenant Limits from tenants.jsonl

Each tenant still has its own configuration file entry:

{"id":"acme","limits":{"chat_per_10s":8}}
{"id":"globex","limits":{"chat_per_10s":4}}

SlowAPI dynamically reads these limits and applies them to Redis counters. The limit expression becomes, for example, "8/10 seconds", guaranteeing plan-based fairness even in distributed setups.
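
A sketch of how the app might load those entries and turn them into limit expressions (the file path and default value are assumptions; chat_limit_value is the function named in Slide 11):

import json

def load_tenants(path: str = "tenants.jsonl") -> dict:
    # One JSON object per line, keyed by tenant id.
    tenants = {}
    with open(path) as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                tenants[record["id"]] = record
    return tenants

TENANTS = load_tenants()

def chat_limit_value(tenant_id: str, default: str = "5/10 seconds") -> str:
    # Render the tenant's plan as a limits expression, e.g. "8/10 seconds".
    per_10s = TENANTS.get(tenant_id, {}).get("limits", {}).get("chat_per_10s")
    return f"{per_10s}/10 seconds" if per_10s else default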


💬 Slide 10: Session and Chat Endpoints Overview

The API exposes two endpoints. POST /session creates a session for a tenant and is rate-limited per client IP (e.g., 10/minute) to stop session spam. POST /chat accepts a message, identifies the caller through the X-Tenant-ID and X-Session-ID headers, verifies that the session belongs to the tenant, and is throttled according to the tenant’s plan from tenants.jsonl.
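
A simplified sketch of the /session endpoint (it reuses the app and limiter objects from the Slide 6 sketch; the in-memory session store and the response shape are assumptions):

import uuid

from fastapi import Request
from pydantic import BaseModel

class SessionRequest(BaseModel):
    tenant_id: str

SESSIONS: dict = {}  # session_id -> tenant_id (in-memory for the demo)

@app.post("/session")
@limiter.limit("10/minute")  # per-IP spam control (see edge case C below)
async def create_session(request: Request, body: SessionRequest):
    # SlowAPI needs the `request` argument present so it can derive the key.
    session_id = str(uuid.uuid4())
    SESSIONS[session_id] = body.tenant_id
    return {"session_id": session_id, "tenant_id": body.tenant_id}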


🔑 Slide 11: Key Functions — chat_key_func and chat_limit_value

Two small functions make the limits dynamic. chat_key_func builds the limiter key from the X-Tenant-ID and X-Session-ID headers, falling back to the client IP when they are missing. chat_limit_value looks up the tenant’s chat_per_10s quota from tenants.jsonl and renders it as a limit expression such as "8/10 seconds".
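
A sketch of what chat_key_func can look like (the header names match the test plan below; the exact composition of the key is an assumption):

from fastapi import Request
from slowapi.util import get_remote_address

def chat_key_func(request: Request) -> str:
    # Key the limiter on tenant + session so each session gets its own window.
    tenant = request.headers.get("X-Tenant-ID")
    session = request.headers.get("X-Session-ID")
    if tenant and session:
        return f"{tenant}:{session}"
    # No identifying headers: fall back to the caller's IP (edge case A below).
    return get_remote_address(request)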


🧠 Slide 12: Redis Proof — Shared 429 Responses Across Instances

By hitting both APIs rapidly, you’ll notice the 429 response appear after the global count exceeds the limit, regardless of instance. Example:

HTTP/1.1 429 Too Many Requests
{"detail":"Rate limit exceeded: 8 per 10 seconds"}

This is the final proof that Redis synchronization works perfectly — each replica contributes to the same global counter pool.


☁️ Slide 13: Scaling and Production Readiness

Redis makes the system ready for:

  1. Horizontal scaling: add replicas behind a load balancer without losing limit accuracy
  2. Kubernetes and other orchestrated environments, where pods come and go constantly
  3. Autoscaling, since counters outlive any individual instance

This setup forms the core of enterprise-grade SaaS traffic management.


🚀 Slide 14: Enhancements and Next Steps

From here you can evolve toward:

  1. Usage-based billing driven by the same shared counters (the subject of Project 4)
  2. Analytics on per-tenant traffic patterns
  3. Richer plan tiers with differentiated limits per endpoint

This completes the trilogy of foundational SaaS demos — a journey from multi-tenant design → dynamic local throttling → distributed Redis-backed governance.


Here’s a clear “how it works” summary plus a hands-on test plan to demonstrate Project 3 (Redis-backed SlowAPI rate limiting) end to end.


How it works (quick mental model)

  1. A request lands on either API replica (api1 or api2).
  2. chat_key_func derives the limiter key from the tenant and session headers (or falls back to the client IP).
  3. chat_limit_value supplies the tenant’s quota, e.g., "8/10 seconds" for acme.
  4. SlowAPI, via the limits library, atomically increments a TTL-bound counter in Redis: the same counter no matter which replica handled the request.
  5. Over budget → HTTP 429; otherwise the chat request proceeds.


0) Prereqs

  1. Docker and Docker Compose
  2. curl and jq on your PATH (every command below uses them)


1) Start the stack

cd multi-tenant-redis-rate-limit
docker compose up --build
# api1 -> http://localhost:8003
# api2 -> http://localhost:8004
# redis -> localhost:6379 (inside compose network it's "redis:6379")

Health check:

curl -s http://localhost:8003/ | jq
curl -s http://localhost:8004/ | jq

You should see tenants, defaults, and storage: "redis://redis:6379/0".


2) Create sessions (per tenant)

ACME_SID=$(curl -s -X POST http://localhost:8003/session \
  -H 'Content-Type: application/json' \
  -d '{"tenant_id":"acme"}' | jq -r .session_id)

GLOBEX_SID=$(curl -s -X POST http://localhost:8003/session \
  -H 'Content-Type: application/json' \
  -d '{"tenant_id":"globex"}' | jq -r .session_id)

INITECH_SID=$(curl -s -X POST http://localhost:8003/session \
  -H 'Content-Type: application/json' \
  -d '{"tenant_id":"initech"}' | jq -r .session_id)

echo "ACME_SID=$ACME_SID"
echo "GLOBEX_SID=$GLOBEX_SID"
echo "INITECH_SID=$INITECH_SID"

/session is rate-limited per IP (e.g., 10/min). If you loop >10, expect 429.


3) Normal chat within limits (single replica)

Tenant limits from tenants.jsonl (examples):

  1. acme: 8 chat calls per 10 seconds
  2. globex: 4 chat calls per 10 seconds

Send 8 allowed messages for acme to api1:

for i in {1..8}; do
  curl -s -X POST http://localhost:8003/chat \
    -H 'Content-Type: application/json' \
    -H "X-Tenant-ID: acme" \
    -H "X-Session-ID: $ACME_SID" \
    -d "$(jq -n --arg t acme --arg s "$ACME_SID" --arg m "msg-$i" \
         '{tenant_id:$t, session_id:$s, message:$m}')";
  echo ""
done

All 8 requests succeed ✅


4) Hit 429 (single replica)

Send one more immediately:

curl -i -s -X POST http://localhost:8003/chat \
  -H 'Content-Type: application/json' \
  -H "X-Tenant-ID: acme" \
  -H "X-Session-ID: $ACME_SID" \
  -d "$(jq -n --arg t acme --arg s "$ACME_SID" --arg m "overflow" \
       '{tenant_id:$t, session_id:$s, message:$m}')"

Expected:

HTTP/1.1 429 Too Many Requests
{"detail":"Rate limit exceeded: 8 per 10 seconds"}

⏲️ Wait ~10 seconds and requests will succeed again (window resets).


5) Cross-replica proof (the Redis difference)

Goal: show counters are shared between api1 and api2.

  1. Send 7 requests to api1 (port 8003):
for i in {1..7}; do
  curl -s -X POST http://localhost:8003/chat \
    -H 'Content-Type: application/json' \
    -H "X-Tenant-ID: acme" \
    -H "X-Session-ID: $ACME_SID" \
    -d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"cross-$i\"}";
  echo ""
done
  2. Send the 8th to api2 (port 8004) → still allowed:
curl -i -s -X POST http://localhost:8004/chat \
  -H 'Content-Type: application/json' \
  -H "X-Tenant-ID: acme" \
  -H "X-Session-ID: $ACME_SID" \
  -d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"n8\"}"
  3. Send one more to api2 → should 429 (window shared across replicas):
curl -i -s -X POST http://localhost:8004/chat \
  -H 'Content-Type: application/json' \
  -H "X-Tenant-ID: acme" \
  -H "X-Session-ID: $ACME_SID" \
  -d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"n9-should-429\"}"

Conclusion: both containers enforce the same window, thanks to Redis storage.
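
If you prefer scripting the proof, here is a small Python equivalent of steps 1–3 (assumes the requests package is installed and that you paste in a fresh session id created via POST /session):

import requests

ACME_SID = "paste-a-fresh-session-id-here"  # from POST /session
PORTS = [8003, 8004]  # api1 and api2

for i in range(9):
    port = PORTS[i % 2]  # alternate replicas on every request
    resp = requests.post(
        f"http://localhost:{port}/chat",
        headers={"X-Tenant-ID": "acme", "X-Session-ID": ACME_SID},
        json={"tenant_id": "acme", "session_id": ACME_SID, "message": f"burst-{i}"},
    )
    print(f"request {i + 1} via :{port} -> HTTP {resp.status_code}")

# With acme's 8/10s limit, the 9th request prints HTTP 429 no matter
# which replica receives it: the window lives in Redis.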


6) Compare tenants (different quotas)

globex (limit 4/10s):

for i in {1..5}; do
  curl -i -s -X POST http://localhost:8003/chat \
    -H 'Content-Type: application/json' \
    -H "X-Tenant-ID: globex" \
    -H "X-Session-ID: $GLOBEX_SID" \
    -d "{\"tenant_id\":\"globex\",\"session_id\":\"$GLOBEX_SID\",\"message\":\"g-$i\"}";
  echo ""
done

Requests 1–4 ✅, request 5 → 429 ❌. This proves per-tenant dynamic limits are applied.


7) Edge cases you can demonstrate

A) Missing headers → fallback to IP key

If X-Tenant-ID or X-Session-ID is missing, the limiter falls back to IP-based keying:

curl -i -s -X POST http://localhost:8003/chat \
  -H 'Content-Type: application/json' \
  -d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"no-headers\"}"

Repeated quickly → you’ll see 429 based on IP.

B) Cross-tenant session misuse

Server enforces session ownership; misuse returns 403:

curl -i -s -X POST http://localhost:8003/chat \
  -H 'Content-Type: application/json' \
  -H "X-Tenant-ID: globex" \
  -H "X-Session-ID: $ACME_SID" \
  -d "{\"tenant_id\":\"globex\",\"session_id\":\"$ACME_SID\",\"message\":\"bad\"}"

Expect:

HTTP/1.1 403 Forbidden
{"detail":"Session does not belong to tenant"}

C) /session IP limit (spam control)

Prove per-minute IP cap (default 10):

for i in {1..12}; do
  curl -i -s -X POST http://localhost:8003/session \
    -H 'Content-Type: application/json' \
    -d '{"tenant_id":"acme"}'
  echo ""
done

Calls 11–12 → 429.


8) Look inside Redis (optional, nice for demos)

Open a shell in the Redis container:

docker compose exec redis redis-cli

Useful commands:

# See keys (limits uses its own naming; you’ll see time-windowed keys)
SCAN 0 COUNT 100

# Inspect TTL on a recent rate key (replace with an actual key from SCAN)
TTL <key>

# Flush counters (resets all windows; use with care in demos)
FLUSHALL

You can also run:

MONITOR

…then trigger a few chat requests in another terminal to watch Redis ops live.


9) Change limits live (simple demo)

Edit backend/tenants.jsonl, e.g., raise Acme to 12:

{"id":"acme","limits":{"chat_per_10s":12}}

Restart the API containers (Redis can stay up):

docker compose restart api1 api2

Re-run your chat loop and observe that the new 12/10s budget is enforced across both replicas.


10) Troubleshooting quick wins

  1. Limits not shared across replicas: hit the health endpoints and confirm storage reads "redis://redis:6379/0", not an in-memory URI.
  2. Connection errors at startup: check that the redis service is running (docker compose ps) before calling the APIs.
  3. Stale counters mid-demo: run FLUSHALL in redis-cli to reset every window (step 8).
  4. Edited tenants.jsonl but nothing changed: limits load at startup, so restart api1 and api2 (step 9).


What this demo proves

  1. Rate-limit counters live in Redis, not per-container memory: api1 and api2 spend from the same window.
  2. Per-tenant limits from tenants.jsonl are enforced dynamically (acme at 8/10s, globex at 4/10s).
  3. Requests missing tenant/session headers fall back to IP-based keying.
  4. Sessions belong to tenants, and cross-tenant use is rejected with 403.



🏁 Outro: From Local Control to Global Fairness

With Project 3, we’ve transformed our SaaS prototype into a distributed, production-ready architecture. The addition of Redis shifts rate limiting from a local safeguard into a central governance layer — one that enforces consistency, fairness, and trust across every running instance.

You’ve now implemented:

  1. Distributed rate limiting with counters shared through Redis
  2. Dynamic per-tenant limits loaded from tenants.jsonl
  3. A two-replica Docker Compose deployment enforcing one global window

This isn’t just technical progress — it’s architectural maturity. Your SaaS now behaves like a living, coordinated ecosystem where every node knows what the others are doing. It guarantees that free users stay within limits, premium customers get priority, and the system remains stable even under heavy load.

🌍 What This Means

Redis has made our rate limiting stateless yet globally aware — a critical milestone for any cloud-native SaaS. We’ve proven that scalability doesn’t come from adding servers; it comes from adding shared intelligence. Every replica, every tenant, every session now abides by one universal rule set, making the system predictable, resilient, and fair.

🔮 Next Steps

From here, we can evolve toward:

  1. Usage-based billing built on the same shared counters
  2. Intelligent analytics over per-tenant traffic
  3. Hardening for thousands of tenants and millions of requests

With this foundation, your SaaS platform is ready to handle thousands of tenants, millions of requests, and real-world production workloads — all with fairness and confidence.

Local enforcement builds protection. Distributed enforcement builds trust. Redis turns your SaaS into a globally fair system.

Stay tuned for Project 4, where we take the next leap — connecting this distributed throttling engine with usage-based billing and intelligent analytics, transforming technical governance into business intelligence. 🚀