As our multi-tenant SaaS journey continues, we move from local experimentation to real-world scalability. In Project 1, we learned how to handle multiple tenants within one codebase. In Project 2, we introduced per-tenant fairness with SlowAPI, ensuring every tenant enjoyed predictable service levels. Now, in Project 3, we step into the production zone — where scalability, distribution, and reliability become non-negotiable.
When your SaaS runs on a single container, local memory-based limits are fine. But in the real world, applications rarely live alone. They run on multiple containers, pods, or servers behind a load balancer. Without a shared backend, each instance maintains its own counters — creating inconsistent throttling, double allowances, and even missed abuse detection.
That’s where Redis comes in. Redis provides a fast, centralized store that SlowAPI uses to maintain all rate-limit counters globally. Every API replica connects to the same Redis instance, so no matter which node handles a request, the system enforces a single truth about rate usage. It’s the heartbeat that keeps fairness synchronized across your distributed infrastructure.
The goal of this project is to show you exactly how to convert a local rate-limiting setup into a production-grade, distributed enforcement system using FastAPI + SlowAPI + Redis. You’ll see how two separate API containers share the same global counters, how tenants retain their dynamic per-plan limits, and how Redis ensures consistent “Too Many Requests” (HTTP 429) responses even across instances.
By the end of this project, you’ll understand how enterprise SaaS systems scale — not by adding servers blindly, but by sharing intelligence through smart backends like Redis. This is the bridge between a working demo and a real SaaS product ready for global deployment.
This project takes the previous multi-tenant SaaS demos one step further by connecting SlowAPI to a Redis backend. Now all API replicas share the same rate-limit counters, ensuring consistent enforcement no matter which instance handles a request. This architecture introduces distributed fairness — every tenant and session is tracked globally, not per container.
In a single-instance setup, SlowAPI keeps counters in memory. But once you scale horizontally (multiple containers or pods), each instance tracks limits independently — causing inconsistent throttling. Redis solves this by acting as a central store for counters and tokens, allowing every replica to read and update the same counters, enforce limits consistently, and detect abuse across the whole cluster.
This transforms rate-limiting from local enforcement to a cluster-wide governance system.
Each API container reads RATE_LIMIT_STORAGE_URI=redis://redis:6379/0, connecting SlowAPI’s limiter to Redis.
Both api1 (port 8003) and api2 (port 8004) enforce limits from the same global store.
SlowAPI builds on the limits library, which supports Redis, Memcached, and in-memory backends.
When Redis is used, counters live as expiring, time-windowed keys that are updated with atomic increments, so every replica sees the same state. This approach is highly performant because Redis handles millions of atomic operations per second.
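Conceptually, the pattern looks like the sketch below — a plain fixed-window counter built on INCR and EXPIRE. This is an illustration of the idea only, not the limits library's actual key schema or windowing strategy.

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def allow(key: str, limit: int, window_s: int) -> bool:
    # Count hits for this key inside the current window.
    count = r.incr(key)
    if count == 1:
        # The first hit starts the window; the key expires when the window ends.
        r.expire(key, window_s)
    return count <= limit

Because INCR is atomic, two replicas incrementing the same key cannot lose updates or race each other.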
Since both API instances use the same Redis DB, every request — regardless of which container it hits — updates the same counter.
So, if Acme sends seven messages to api1 and one to api2, the total of eight still respects Acme’s limit of 8 / 10 seconds.
This provides true global rate enforcement, critical for Kubernetes or load-balanced environments.
The environment variable:

RATE_LIMIT_STORAGE_URI=redis://redis:6379/0

tells SlowAPI where to persist counters. Changing the DB index or password allows easy separation for staging, testing, or per-environment throttling. This variable makes the rate-limit backend fully configurable without code changes.
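As a rough sketch of how the variable might be consumed on the application side (the memory:// fallback and exact wiring are assumptions, not the project's verbatim code):

import os
from fastapi import FastAPI
from slowapi import Limiter
from slowapi.util import get_remote_address

# Read the shared backend from the environment; fall back to in-memory for local runs.
storage_uri = os.getenv("RATE_LIMIT_STORAGE_URI", "memory://")

limiter = Limiter(key_func=get_remote_address, storage_uri=storage_uri)

app = FastAPI()
app.state.limiter = limiter  # SlowAPI looks up the limiter on app.state

Every replica started with the same RATE_LIMIT_STORAGE_URI shares one set of counters, which is exactly what the two-container test later in this article relies on.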
To verify shared enforcement, send a few /chat calls to api1 and continue on api2: the combined total is what triggers the 429, regardless of which instance served each call. This simple test proves that the limiter is reading and updating counters in Redis, not in local memory.
The docker-compose.yml defines the redis service, two API replicas (api1 and api2), and a redisdata volume for persistence. With one command:

docker compose up --build

you get a fully working multi-tenant, multi-replica SaaS environment with global rate limiting.
Each tenant still has its own configuration file entry:
{"id":"acme","limits":{"chat_per_10s":8}}
{"id":"globex","limits":{"chat_per_10s":4}}SlowAPI dynamically reads these limits and applies them to Redis counters.
The limit expression becomes, for example, "8/10 seconds", guaranteeing plan-based fairness even in distributed setups.
Both endpoints are protected with @limiter.limit() decorators. chat_key_func(request) builds the limiter key as tenant_id:session_id, falling back to the client IP if headers are missing. chat_limit_value(request) reads the tenant's limit from configuration and returns the string "N/10 seconds".
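A minimal sketch of what these two helpers might look like, assuming the tenant limits were loaded from tenants.jsonl as sketched above (header handling and the fallback value are illustrative assumptions, not the project's verbatim code):

from fastapi import Request
from slowapi.util import get_remote_address

# Illustrative values, as loaded from tenants.jsonl.
TENANT_LIMITS = {"acme": 8, "globex": 4, "initech": 6}
DEFAULT_CHAT_PER_10S = 5  # hypothetical fallback for unknown tenants

def chat_key_func(request: Request) -> str:
    # Composite limiter key: tenant + session; fall back to the client IP.
    tenant = request.headers.get("X-Tenant-ID")
    session = request.headers.get("X-Session-ID")
    if tenant and session:
        return f"{tenant}:{session}"
    return get_remote_address(request)

def chat_limit_value(request: Request) -> str:
    # Per-tenant limit expression, e.g. "8/10 seconds".
    tenant = request.headers.get("X-Tenant-ID", "")
    n = TENANT_LIMITS.get(tenant, DEFAULT_CHAT_PER_10S)
    return f"{n}/10 seconds"

In the project, these helpers are plugged into the /chat route's @limiter.limit() decorator.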
These two small functions enable fully dynamic, per-tenant limit policies without hard-coding values.

By hitting both APIs rapidly, you'll notice the 429 response appear after the global count exceeds the limit, regardless of instance. Example:
HTTP/1.1 429 Too Many Requests
{"detail":"Rate limit exceeded: 8 per 10 seconds"}This is the final proof that Redis synchronization works perfectly — each replica contributes to the same global counter pool.
Redis makes the system ready for horizontal scaling: Kubernetes deployments, load-balanced replicas, and autoscaled container fleets all enforce limits from the same shared store. This setup forms the core of enterprise-grade SaaS traffic management.
From here you can evolve toward usage metering, plan-based billing, and analytics built on top of the same counters (the direction the next project takes). This completes the trilogy of foundational SaaS demos — a journey from multi-tenant design → dynamic local throttling → distributed Redis-backed governance.
Here's a clear "how it works" summary plus a hands-on test plan to demonstrate Project 3 (Redis-backed SlowAPI rate limiting) end-to-end.
- Two FastAPI replicas (api1 on :8003, api2 on :8004) serve multiple tenants.
- SlowAPI enforces limits using Redis (shared storage).
- /session is IP-scoped (e.g., 10/min).
- /chat is tenant+session-scoped using a composite limiter key: X-Tenant-ID:X-Session-ID.
- Tenant limits (e.g., chat_per_10s) come from tenants.jsonl.
Because counters live in Redis, requests hitting different replicas still count toward the same global window.
You'll need Docker with Compose, the multi-tenant-redis-rate-limit/ project folder, and jq (or craft the JSON by hand). Then start the stack:

cd multi-tenant-redis-rate-limit
docker compose up --build
# api1 -> http://localhost:8003
# api2 -> http://localhost:8004
# redis -> localhost:6379 (inside compose network it's "redis:6379")

Health check:
curl -s http://localhost:8003/ | jq
curl -s http://localhost:8004/ | jq

You should see tenants, defaults, and storage: "redis://redis:6379/0".
ACME_SID=$(curl -s -X POST http://localhost:8003/session \
-H 'Content-Type: application/json' \
-d '{"tenant_id":"acme"}' | jq -r .session_id)
GLOBEX_SID=$(curl -s -X POST http://localhost:8003/session \
-H 'Content-Type: application/json' \
-d '{"tenant_id":"globex"}' | jq -r .session_id)
INITECH_SID=$(curl -s -X POST http://localhost:8003/session \
-H 'Content-Type: application/json' \
-d '{"tenant_id":"initech"}' | jq -r .session_id)
echo "ACME_SID=$ACME_SID"
echo "GLOBEX_SID=$GLOBEX_SID"
echo "INITECH_SID=$INITECH_SID"✅ /session is rate-limited per IP (e.g., 10/min). If you loop >10, expect 429.
Tenant limits from tenants.jsonl (examples):
- acme: 8 / 10s
- globex: 4 / 10s
- initech: 6 / 10s

Send 8 allowed messages for acme to api1:
for i in {1..8}; do
curl -s -X POST http://localhost:8003/chat \
-H 'Content-Type: application/json' \
-H "X-Tenant-ID: acme" \
-H "X-Session-ID: $ACME_SID" \
-d "$(jq -n --arg t acme --arg s "$ACME_SID" --arg m "msg-$i" \
'{tenant_id:$t, session_id:$s, message:$m}')";
echo ""
done

All 8 requests succeed ✅
Send one more immediately:
curl -i -s -X POST http://localhost:8003/chat \
-H 'Content-Type: application/json' \
-H "X-Tenant-ID: acme" \
-H "X-Session-ID: $ACME_SID" \
-d "$(jq -n --arg t acme --arg s "$ACME_SID" --arg m "overflow" \
'{tenant_id:$t, session_id:$s, message:$m}')"

Expected:
HTTP/1.1 429 Too Many Requests
{"detail":"Rate limit exceeded: 8 per 10 seconds"}⏲️ Wait ~10 seconds and requests will succeed again (window resets).
Goal: show counters are shared between api1 and api2.
for i in {1..7}; do
curl -s -X POST http://localhost:8003/chat \
-H 'Content-Type: application/json' \
-H "X-Tenant-ID: acme" \
-H "X-Session-ID: $ACME_SID" \
-d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"cross-$i\"}";
echo ""
done

Now send request #8 via api2 (still within Acme's 8-message budget):

curl -i -s -X POST http://localhost:8004/chat \
-H 'Content-Type: application/json' \
-H "X-Tenant-ID: acme" \
-H "X-Session-ID: $ACME_SID" \
-d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"n8\"}"curl -i -s -X POST http://localhost:8004/chat \
-H 'Content-Type: application/json' \
-H "X-Tenant-ID: acme" \
-H "X-Session-ID: $ACME_SID" \
-d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"n9-should-429\"}"Conclusion: both containers enforce the same window, thanks to Redis storage.
Now repeat the exercise with globex (limit 4/10s):
for i in {1..5}; do
curl -i -s -X POST http://localhost:8003/chat \
-H 'Content-Type: application/json' \
-H "X-Tenant-ID: globex" \
-H "X-Session-ID: $GLOBEX_SID" \
-d "{\"tenant_id\":\"globex\",\"session_id\":\"$GLOBEX_SID\",\"message\":\"g-$i\"}";
echo ""
done

Requests 1–4 ✅, request 5 → 429 ❌. This proves per-tenant dynamic limits are applied.
If X-Tenant-ID or X-Session-ID is missing, the limiter falls back to IP-based keying:
curl -i -s -X POST http://localhost:8003/chat \
-H 'Content-Type: application/json' \
-d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"no-headers\"}"Repeated quickly → you’ll see 429 based on IP.
The server also enforces session ownership: a session issued to one tenant cannot be reused by another, and misuse returns 403, as shown in the sketch and test below.
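A minimal sketch of such an ownership check, assuming a simple in-memory SESSIONS map (an illustration, not the project's actual session storage):

from fastapi import HTTPException

# Hypothetical mapping populated by /session: session_id -> owning tenant_id.
SESSIONS: dict = {}

def assert_session_owner(tenant_id: str, session_id: str) -> None:
    # Reject chat requests that present a session created for a different tenant.
    if SESSIONS.get(session_id) != tenant_id:
        raise HTTPException(status_code=403, detail="Session does not belong to tenant")

To trigger the check, present Acme's session ID under the globex tenant: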
curl -i -s -X POST http://localhost:8003/chat \
-H 'Content-Type: application/json' \
-H "X-Tenant-ID: globex" \
-H "X-Session-ID: $ACME_SID" \
-d "{\"tenant_id\":\"globex\",\"session_id\":\"$ACME_SID\",\"message\":\"bad\"}"Expect:
HTTP/1.1 403 Forbidden
{"detail":"Session does not belong to tenant"}/session IP limit (spam control)Prove per-minute IP cap (default 10):
for i in {1..12}; do
curl -i -s -X POST http://localhost:8003/session \
-H 'Content-Type: application/json' \
-d '{"tenant_id":"acme"}'
echo ""
done

Calls 11–12 → 429.
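For reference, an IP-scoped cap like this is typically declared right on the route; a small, self-contained sketch (route body and exact wiring are assumptions):

from fastapi import FastAPI, Request
from slowapi import Limiter
from slowapi.util import get_remote_address

app = FastAPI()
limiter = Limiter(key_func=get_remote_address, storage_uri="redis://redis:6379/0")
app.state.limiter = limiter

@app.post("/session")
@limiter.limit("10/minute")  # per-IP cap, matching the default described above
async def create_session(request: Request) -> dict:
    # The real handler validates the tenant and issues a session_id; elided here.
    # A RateLimitExceeded exception handler would also be registered for JSON 429 responses.
    return {"session_id": "..."}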
Open a shell in the Redis container:
docker compose exec redis redis-cli

Useful commands:
# See keys (limits uses its own naming; you’ll see time-windowed keys)
SCAN 0 COUNT 100
# Inspect TTL on a recent rate key (replace with an actual key from SCAN)
TTL <key>
# Flush counters (resets all windows; use with care in demos)
FLUSHALL

You can also run:
MONITOR

Then trigger a few chat requests in another terminal to watch Redis ops live.
Edit backend/tenants.jsonl, e.g., raise Acme to 12:
{"id":"acme","limits":{"chat_per_10s":12}}Restart the API containers (Redis can stay up):
docker compose restart api1 api2

Re-run your chat loop and observe that the new 12/10s budget is enforced across both replicas.
A few checks if something doesn't behave as expected:
- The / endpoint on each replica should report storage: redis://redis:6379/0; if it shows an in-memory backend, review RATE_LIMIT_STORAGE_URI in docker-compose.yml.
- No jq? Craft the JSON manually in -d '{"tenant_id":"acme",...}'.
- Remember the two scopes: IP-based limits on /session, composite (tenant+session) limits on /chat.

With Project 3, we've transformed our SaaS prototype into a distributed, production-ready architecture. The addition of Redis shifts rate limiting from a local safeguard into a central governance layer — one that enforces consistency, fairness, and trust across every running instance.
You've now implemented dynamic per-tenant limits, composite tenant+session keys, and a shared Redis backend that keeps every API replica enforcing the same global counters.
This isn’t just technical progress — it’s architectural maturity. Your SaaS now behaves like a living, coordinated ecosystem where every node knows what the others are doing. It guarantees that free users stay within limits, premium customers get priority, and the system remains stable even under heavy load.
Redis has made our rate limiting stateless yet globally aware — a critical milestone for any cloud-native SaaS. We’ve proven that scalability doesn’t come from adding servers; it comes from adding shared intelligence. Every replica, every tenant, every session now abides by one universal rule set, making the system predictable, resilient, and fair.
From here, we can evolve toward usage-based billing, intelligent analytics, and richer traffic policies built on the same shared counters. With this foundation, your SaaS platform is ready to handle thousands of tenants, millions of requests, and real-world production workloads — all with fairness and confidence.
Local enforcement builds protection. Distributed enforcement builds trust. Redis turns your SaaS into a globally fair system.
Stay tuned for Project 4, where we take the next leap — connecting this distributed throttling engine with usage-based billing and intelligent analytics, transforming technical governance into business intelligence. 🚀