As SaaS systems evolve from prototypes into production-grade platforms, visibility becomes the single most important factor that separates stable businesses from fragile ones. In earlier projects, we built a multi-tenant SaaS system with dynamic rate limiting and Redis-backed fairness. It worked — but we were still blind to what was happening inside.
Was one tenant consuming too much bandwidth? Were 429s spiking unexpectedly? Was latency growing across replicas or regions?
Without answers to these questions, scaling decisions, pricing tiers, and reliability guarantees remain guesswork. This is why observability is not optional — it’s the nervous system of a mature SaaS platform.
In Project 4, we integrate Prometheus and Grafana into our multi-tenant SaaS architecture to create a full monitoring and visualization layer. These tools transform raw metrics into actionable intelligence, helping engineers, DevOps, and business stakeholders see — in real time — how tenants, services, and infrastructure behave under load.
This project adds a complete observability layer to our existing FastAPI + Redis + SlowAPI stack. We now monitor API performance, rate limiting, tenant activity, and Redis health, all visualized in Grafana. This marks the shift from a working system to a well-instrumented platform.
Without observability, troubleshooting becomes guesswork. Modern SaaS platforms must answer key questions in real time: who is consuming resources, whether errors such as 429s are spiking, and how latency is trending across replicas.
Observability ensures your team can detect issues before users report them and can optimize performance, pricing, and user experience with confidence.
We expand our architecture with two new services: Prometheus, which scrapes and stores metrics, and Grafana, which visualizes them.
Core Stack: FastAPI (api1 & api2), Redis, Redis Exporter, Prometheus, and Grafana.
Together, they provide a complete feedback loop — from data generation to visualization.
We use prometheus-fastapi-instrumentator to automatically collect request counts, latency histograms, and in-progress request gauges.
This library attaches a /metrics endpoint to every API instance.
Prometheus scrapes these endpoints every few seconds, converting metrics into time series data.
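For reference, wiring the instrumentator into a FastAPI app takes only a couple of lines. This is a minimal sketch rather than the project's exact code; the /health route is only there so the snippet runs on its own:

```python
# Minimal sketch: expose Prometheus metrics from a FastAPI app.
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

# instrument() adds middleware that records request counts and latencies;
# expose() mounts the /metrics endpoint that Prometheus will scrape.
Instrumentator().instrument(app).expose(app)

@app.get("/health")
def health():
    return {"status": "ok"}
```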
api1 and api2 run in separate containers (ports 8005 and 8006).
Prometheus scrapes both /metrics endpoints, giving us visibility across replicas.
This helps us answer whether load is evenly distributed across replicas and whether either instance is degrading.
It’s the foundation for cross-instance health tracking.
Redis isn’t just the rate-limit store — it’s a critical dependency.
We use redis_exporter to expose Redis internals such as memory usage, connected clients, and liveness (redis_memory_used_bytes, redis_connected_clients, redis_up).
Prometheus scrapes these metrics too, giving full visibility into backend stability. This is essential for detecting overloads, leaks, or eviction events.
In prometheus.yml, we define three scrape jobs:
- api1 → scrape /metrics
- api2 → scrape /metrics
- redis_exporter → scrape Redis metrics

Prometheus turns these into labeled time series, allowing flexible queries:

```promql
rate(http_requests_total[1m])
rate(http_requests_total{status="429"}[1m])
redis_up
```

These queries power Grafana panels and alert rules.
Grafana connects to Prometheus and allows us to build dashboards for request throughput, latency percentiles, 429 rates, and Redis health.
The visual clarity enables faster decisions and trend recognition. You can literally “see” your SaaS performance evolve in real time.
The most critical metrics to track for tenant fairness and user experience are per-tenant request rates, 429 counts, latency percentiles, and Redis memory usage.
By observing these, we can fine-tune rate limits, pricing plans, and scaling strategies.
Prometheus supports alerting rules for critical conditions:
- rate(http_requests_total{status="429"}[5m]) > threshold → “Rate limit flood”
- redis_up == 0 → “Redis down”
- avg(latency) > 0.5s → “Performance degradation”

Alerts can trigger Slack or email notifications, allowing proactive mitigation. This converts raw observability into automated awareness.
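In practice these rules would be evaluated by Prometheus (with Alertmanager) or by Grafana alerts, but the same expressions can also be polled from Prometheus's HTTP query API. The sketch below assumes the default Prometheus endpoint on port 9090; the 0.5 req/s threshold and the function name are illustrative, not values from the project:

```python
# Sketch: poll Prometheus's query API for the 429 rate and flag a "flood".
import requests

PROM_URL = "http://localhost:9090/api/v1/query"
QUERY = 'rate(http_requests_total{status="429"}[5m])'
THRESHOLD = 0.5  # assumed threshold: 429s per second, per series

def check_rate_limit_flood() -> None:
    resp = requests.get(PROM_URL, params={"query": QUERY}, timeout=5)
    resp.raise_for_status()
    # Instant-query results come back as a vector of labeled samples.
    for series in resp.json()["data"]["result"]:
        handler = series["metric"].get("handler", "unknown")
        value = float(series["value"][1])
        if value > THRESHOLD:
            print(f"ALERT: 429 flood on {handler}: {value:.2f}/s")

if __name__ == "__main__":
    check_rate_limit_flood()
```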
Our docker-compose.yml runs six integrated services: api1, api2, Redis, Redis Exporter, Prometheus, and Grafana.
Everything runs locally yet represents a real distributed SaaS architecture. With one command:
```bash
docker compose up --build
```

you get a full-featured monitoring environment.
To validate the setup, open /metrics on both APIs (http://localhost:8005/metrics and http://localhost:8006/metrics) and confirm the data is flowing into Prometheus and Grafana. You’ll see 429 counts, latency curves, and request volume appear within seconds — real observability in action.
To make dashboards tenant-aware, label metrics with the tenant identifier taken from the X-Tenant-ID header (see the sketch below). This enables business insights:
Which tenants drive the most traffic? Are premium tenants experiencing faster responses?
Observability meets business intelligence.
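One way to get these per-tenant breakdowns is a small custom counter labeled with the tenant ID from the request header. The sketch below uses prometheus_client directly; the metric name tenant_http_requests_total and its label set are assumptions for illustration, not names from the project:

```python
# Sketch: count requests per tenant using the X-Tenant-ID header.
from fastapi import FastAPI, Request
from prometheus_client import Counter

app = FastAPI()

TENANT_REQUESTS = Counter(
    "tenant_http_requests_total",
    "HTTP requests per tenant",
    ["tenant", "path", "status"],
)

@app.middleware("http")
async def count_tenant_requests(request: Request, call_next):
    response = await call_next(request)
    tenant = request.headers.get("X-Tenant-ID", "unknown")
    # Tenant IDs come from tenants.jsonl, so label cardinality stays small.
    TENANT_REQUESTS.labels(
        tenant=tenant, path=request.url.path, status=str(response.status_code)
    ).inc()
    return response
```

A Grafana panel query such as sum(rate(tenant_http_requests_total[5m])) by (tenant) then splits traffic per tenant.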
From here, your SaaS can evolve toward automated alerting, usage-based billing, tenant analytics, and anomaly detection.
This observability foundation transforms your SaaS into a self-aware system — capable of monitoring, alerting, and even self-optimizing in the future.
Project 4 completes your SaaS’s transformation into a fully observable, data-driven platform. You’ve now built a system that doesn’t just run — it knows how it’s running.
Observability brings confidence.
This is the hallmark of mature SaaS engineering — every component is monitored, every anomaly visible, and every decision backed by data.
In the journey so far, we have moved from a working multi-tenant system with Redis-backed rate limiting to a fully observable platform.
Next, in Project 5, we’ll transform these metrics into automated insights — integrating billing, tenant analytics, and AI-driven anomaly detection. That’s where your SaaS evolves from being “monitored” to being intelligent.
“You can’t improve what you can’t see — and now, you can see everything.”
This project builds on Project 3 (Redis-backed rate limiting) by adding full observability using Prometheus and Grafana. The goal is to see, measure, and analyze your SaaS system’s performance, usage, and fairness — across multiple tenants and API replicas.
| Service | Role |
|---|---|
| FastAPI (api1 & api2) | Multi-tenant endpoints with Redis-backed rate limiting |
| Redis | Shared backend for rate limit counters |
| Redis Exporter | Exposes Redis health metrics to Prometheus |
| Prometheus | Scrapes and stores metrics from APIs and Redis |
| Grafana | Visualizes metrics, trends, and alerts |
```
Client → FastAPI (/session, /chat)
              ↕
     Redis ←→ Redis Exporter
              ↕
Prometheus ←→ Grafana Dashboards
```

Each API instance has a /metrics endpoint exposing Prometheus-format metrics (via prometheus-fastapi-instrumentator).
Prometheus scrapes these periodically, stores them as time series, and Grafana visualizes them interactively.
Tenants are defined in tenants.jsonl:
{"id":"acme","limits":{"chat_per_10s":8}}
{"id":"globex","limits":{"chat_per_10s":4}}
{"id":"initech","limits":{"chat_per_10s":6}}Each has its own quota, model, and welcome message.
SlowAPI stores rate-limit counters in Redis, ensuring global consistency across both api1 and api2.
If Acme hits api1 seven times and api2 once, the global limit of 8/10s still applies.
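Below is a minimal sketch of how SlowAPI can be pointed at Redis so every replica shares the same counters. The storage URI, the tenant key function, and the hard-coded "8/10seconds" quota are illustrative assumptions; the real project presumably resolves each tenant's limit from tenants.jsonl (SlowAPI also accepts a callable for the limit value):

```python
# Sketch: SlowAPI rate limiting backed by Redis, keyed per tenant.
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded

def tenant_key(request: Request) -> str:
    # Key counters by tenant so all replicas share one bucket in Redis.
    return request.headers.get("X-Tenant-ID") or "anonymous"

limiter = Limiter(key_func=tenant_key, storage_uri="redis://redis:6379")

app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/chat")
@limiter.limit("8/10seconds")  # Acme's chat_per_10s quota, fixed here for illustration
async def chat(request: Request):
    return {"reply": "ok"}
```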
Each FastAPI container automatically exposes metrics via /metrics, such as:
- http_requests_total — total requests by status code
- http_request_duration_seconds_bucket — latency distribution
- in_progress_requests — concurrent calls

Redis metrics are exposed through Redis Exporter at port 9121.
Prometheus periodically (every 5s) collects:
- /metrics from api1 and api2
- Redis metrics from the redis_exporter

These metrics are stored in a time-series database with tenant-aware labels.
Grafana connects to Prometheus and visualizes metrics through dashboards showing request throughput, latency percentiles, 429 rates, and Redis memory usage.
This creates real-time operational visibility.
Let’s walk through how to verify that everything works correctly.
Run all containers:
```bash
docker compose up --build
```

Access: the APIs at http://localhost:8005 and http://localhost:8006, Prometheus at http://localhost:9090, and Grafana at http://localhost:3000.
Use curl to create sessions for each tenant:
```bash
ACME_SID=$(curl -s -X POST http://localhost:8005/session \
  -H 'Content-Type: application/json' \
  -d '{"tenant_id":"acme"}' | jq -r .session_id)
```

Repeat for the other tenants (globex, initech).
Send multiple requests to /chat:
```bash
for i in {1..10}; do
  curl -s -X POST http://localhost:8005/chat \
    -H 'Content-Type: application/json' \
    -H "X-Tenant-ID: acme" \
    -H "X-Session-ID: $ACME_SID" \
    -d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"msg-$i\"}" >/dev/null
done
```

Check /metrics directly:

```bash
curl -s http://localhost:8005/metrics | grep http_requests_total
```

You’ll see output like:
```
http_requests_total{handler="/chat",method="POST",status="200"} 8
http_requests_total{handler="/chat",method="POST",status="429"} 2
```

This confirms Prometheus-compatible metrics are being produced.
Open Prometheus → http://localhost:9090 and try queries like:

```promql
# Total requests by endpoint
sum(rate(http_requests_total[1m])) by (handler)

# 429 rate
rate(http_requests_total{status="429"}[1m])

# 95th percentile latency
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[1m])) by (le, handler))
```

You’ll see live metrics plotted as graphs.
Open http://localhost:9121/metrics and look for:

```
redis_connected_clients
redis_memory_used_bytes
redis_up
```

If redis_up is 1, Redis is healthy.
Open Grafana → http://localhost:3000
Login → admin / admin
Go to Dashboards → New → Add Visualization
Add panels with queries such as:
```promql
sum(rate(http_requests_total[1m])) by (status)
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, handler))
redis_memory_used_bytes
rate(http_requests_total{status="429"}[1m])
```

You’ll instantly see live graphs for requests, latency, Redis usage, and 429s.
Make requests to both APIs:
```bash
for i in {1..7}; do
  curl -s -X POST http://localhost:8005/chat \
    -H 'X-Tenant-ID: acme' -H "X-Session-ID: $ACME_SID" \
    -d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"api1-$i\"}" >/dev/null
done

curl -i -X POST http://localhost:8006/chat \
  -H 'X-Tenant-ID: acme' -H "X-Session-ID: $ACME_SID" \
  -d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"api2-over\"}"
```

Result: the 8th request returns 429, showing that Redis-backed limits are global.
In Grafana, create an alert rule based on the Prometheus query:

```promql
rate(http_requests_total{status="429"}[1m]) > 0.5
```

You can configure the alert to trigger when rate limiting exceeds a threshold, simulating a “tenant flood” scenario.
| Stage | Component | Description |
|---|---|---|
| 1 | FastAPI (Instrumented) | Generates request/response metrics |
| 2 | Prometheus | Scrapes /metrics every 5 seconds |
| 3 | Redis Exporter | Provides backend health metrics |
| 4 | Prometheus DB | Stores all metrics in time-series format |
| 5 | Grafana | Queries and visualizes metrics for analysis |
This flow creates a complete feedback loop — from action (request) to visibility (dashboard).
| Issue | Cause | Fix |
|---|---|---|
| Prometheus shows “target down” | API not reachable | Check container names and ports |
| Grafana “no data” | Prometheus scrape misconfigured | Verify prometheus.yml job names |
| 429s not showing | SlowAPI or Redis misconfigured | Check RATE_LIMIT_STORAGE_URI env var |
| Redis memory leak | Missing TTL on keys | Use FLUSHALL in demo environments |
✅ Multi-tenant rate limits are enforced consistently across replicas
✅ Prometheus captures API + Redis performance metrics
✅ Grafana visualizes throughput, latency, and errors in real time
✅ The system supports proactive alerting and scalability planning
This project marks the maturity phase of our SaaS journey — your platform now “sees” itself and reacts accordingly.
| Step | Description | Example |
|---|---|---|
| 1 | Tenant makes API request | /chat with tenant headers |
| 2 | FastAPI logs + exports metrics | via prometheus-fastapi-instrumentator |
| 3 | Redis updates counters | tenant/session scoped |
| 4 | Prometheus scrapes metrics | from APIs + Redis exporter |
| 5 | Grafana visualizes data | latency, 429s, memory, throughput |
| 6 | DevOps sets alerts | Prometheus rules or Grafana alerts |
| 7 | Business insights | identify heavy tenants, adjust pricing |
Project 4 completes your SaaS observability loop. You can now measure per-tenant usage, verify rate-limit fairness across replicas, track latency and Redis health, and alert on anomalies before users notice.
This creates a self-monitoring SaaS ecosystem — fair, stable, and transparent.
“Data turns guesswork into precision — observability turns systems into living organisms.”