As SaaS systems evolve from prototypes into production-grade platforms, visibility becomes the single most important factor that separates stable businesses from fragile ones. In earlier projects, we built a multi-tenant SaaS system with dynamic rate limiting and Redis-backed fairness. It worked — but we were still blind to what was happening inside.
Was one tenant consuming too much bandwidth? Were 429s spiking unexpectedly? Was latency growing across replicas or regions?
Without answers to these questions, scaling decisions, pricing tiers, and reliability guarantees remain guesswork. This is why observability is not optional — it’s the nervous system of a mature SaaS platform.
In Project 4, we integrate Prometheus and Grafana into our multi-tenant SaaS architecture to create a full monitoring and visualization layer. These tools transform raw metrics into actionable intelligence, helping engineers, DevOps, and business stakeholders see — in real time — how tenants, services, and infrastructure behave under load.
This project adds a complete observability layer to our existing FastAPI + Redis + SlowAPI stack. We now monitor API performance, rate limiting, tenant activity, and Redis health, all visualized in Grafana. This marks the shift from a working system to a well-instrumented platform.
Without observability, troubleshooting becomes guesswork. Modern SaaS platforms must answer key questions in real time: who is consuming resources, whether errors such as 429s are spiking, and how latency is trending across replicas.
Observability ensures your team can detect issues before users report them and can optimize performance, pricing, and user experience with confidence.
We expand our architecture with two new services: Prometheus, which scrapes and stores metrics, and Grafana, which visualizes them.
Core Stack: FastAPI (api1 & api2), Redis, Redis Exporter, Prometheus, and Grafana.
Together, they provide a complete feedback loop — from data generation to visualization.
We use prometheus-fastapi-instrumentator to automatically collect request counts, latency histograms, and in-progress request gauges.
This library attaches a /metrics endpoint to every API instance.
Prometheus scrapes these endpoints every few seconds, converting metrics into time series data.
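For reference, wiring the instrumentator into a FastAPI app takes only a couple of lines. This is a minimal sketch rather than the project's exact code; the /health route is only there so the snippet runs on its own:

```python
# Minimal sketch: expose Prometheus metrics from a FastAPI app.
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

# instrument() adds middleware that records request counts and latencies;
# expose() mounts the /metrics endpoint that Prometheus will scrape.
Instrumentator().instrument(app).expose(app)

@app.get("/health")
def health():
    return {"status": "ok"}
```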
api1 and api2 run in separate containers (ports 8005 and 8006).
Prometheus scrapes both /metrics endpoints, giving us visibility across replicas.
This helps us answer whether load is evenly distributed across replicas and whether either instance is degrading.
It’s the foundation for cross-instance health tracking.
Redis isn’t just the rate-limit store — it’s a critical dependency.
We use redis_exporter to expose Redis internals such as memory usage, connected clients, and liveness (redis_memory_used_bytes, redis_connected_clients, redis_up).
Prometheus scrapes these metrics too, giving full visibility into backend stability. This is essential for detecting overloads, leaks, or eviction events.
In prometheus.yml, we define three scrape jobs:
- api1 → scrape /metrics
- api2 → scrape /metrics
- redis_exporter → scrape Redis metrics

Prometheus turns these into labeled time series, allowing flexible queries:

```promql
rate(http_requests_total[1m])
rate(http_requests_total{status="429"}[1m])
redis_up
```

These queries power Grafana panels and alert rules.
Grafana connects to Prometheus and allows us to build dashboards for request throughput, latency percentiles, 429 rates, and Redis health.
The visual clarity enables faster decisions and trend recognition. You can literally “see” your SaaS performance evolve in real time.
The most critical metrics to track for tenant fairness and user experience are per-tenant request rates, 429 counts, latency percentiles, and Redis memory usage.
By observing these, we can fine-tune rate limits, pricing plans, and scaling strategies.
Prometheus supports alerting rules for critical conditions:
- rate(http_requests_total{status="429"}[5m]) > threshold → “Rate limit flood”
- redis_up == 0 → “Redis down”
- avg(latency) > 0.5s → “Performance degradation”

Alerts can trigger Slack or email notifications, allowing proactive mitigation. This converts raw observability into automated awareness.
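In practice these rules would be evaluated by Prometheus (with Alertmanager) or by Grafana alerts, but the same expressions can also be polled from Prometheus's HTTP query API. The sketch below assumes the default Prometheus endpoint on port 9090; the 0.5 req/s threshold and the function name are illustrative, not values from the project:

```python
# Sketch: poll Prometheus's query API for the 429 rate and flag a "flood".
import requests

PROM_URL = "http://localhost:9090/api/v1/query"
QUERY = 'rate(http_requests_total{status="429"}[5m])'
THRESHOLD = 0.5  # assumed threshold: 429s per second, per series

def check_rate_limit_flood() -> None:
    resp = requests.get(PROM_URL, params={"query": QUERY}, timeout=5)
    resp.raise_for_status()
    # Instant-query results come back as a vector of labeled samples.
    for series in resp.json()["data"]["result"]:
        handler = series["metric"].get("handler", "unknown")
        value = float(series["value"][1])
        if value > THRESHOLD:
            print(f"ALERT: 429 flood on {handler}: {value:.2f}/s")

if __name__ == "__main__":
    check_rate_limit_flood()
```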
Our docker-compose.yml runs six integrated services: api1, api2, Redis, Redis Exporter, Prometheus, and Grafana.
Everything runs locally yet represents a real distributed SaaS architecture. With one command:
```bash
docker compose up --build
```

you get a full-featured monitoring environment.
To validate the setup, open /metrics on both APIs (http://localhost:8005/metrics and http://localhost:8006/metrics) and confirm the data is flowing into Prometheus and Grafana. You’ll see 429 counts, latency curves, and request volume appear within seconds — real observability in action.
To make dashboards tenant-aware, label metrics with the tenant identifier taken from the X-Tenant-ID header (see the sketch below). This enables business insights:
Which tenants drive the most traffic? Are premium tenants experiencing faster responses?
Observability meets business intelligence.
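One way to get these per-tenant breakdowns is a small custom counter labeled with the tenant ID from the request header. The sketch below uses prometheus_client directly; the metric name tenant_http_requests_total and its label set are assumptions for illustration, not names from the project:

```python
# Sketch: count requests per tenant using the X-Tenant-ID header.
from fastapi import FastAPI, Request
from prometheus_client import Counter

app = FastAPI()

TENANT_REQUESTS = Counter(
    "tenant_http_requests_total",
    "HTTP requests per tenant",
    ["tenant", "path", "status"],
)

@app.middleware("http")
async def count_tenant_requests(request: Request, call_next):
    response = await call_next(request)
    tenant = request.headers.get("X-Tenant-ID", "unknown")
    # Tenant IDs come from tenants.jsonl, so label cardinality stays small.
    TENANT_REQUESTS.labels(
        tenant=tenant, path=request.url.path, status=str(response.status_code)
    ).inc()
    return response
```

A Grafana panel query such as sum(rate(tenant_http_requests_total[5m])) by (tenant) then splits traffic per tenant.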
From here, your SaaS can evolve toward automated alerting, usage-based billing, tenant analytics, and anomaly detection.
This observability foundation transforms your SaaS into a self-aware system — capable of monitoring, alerting, and even self-optimizing in the future.
Project 4 completes your SaaS’s transformation into a fully observable, data-driven platform. You’ve now built a system that doesn’t just run — it knows how it’s running.
Observability brings confidence.
This is the hallmark of mature SaaS engineering — every component is monitored, every anomaly visible, and every decision backed by data.
In the journey so far, we have moved from a working multi-tenant system with Redis-backed rate limiting to a fully observable platform.
Next, in Project 5, we’ll transform these metrics into automated insights — integrating billing, tenant analytics, and AI-driven anomaly detection. That’s where your SaaS evolves from being “monitored” to being intelligent.
“You can’t improve what you can’t see — and now, you can see everything.”
This project builds on Project 3 (Redis-backed rate limiting) by adding full observability using Prometheus and Grafana. The goal is to see, measure, and analyze your SaaS system’s performance, usage, and fairness — across multiple tenants and API replicas.
| Service | Role |
|---|---|
| FastAPI (api1 & api2) | Multi-tenant endpoints with Redis-backed rate limiting |
| Redis | Shared backend for rate limit counters |
| Redis Exporter | Exposes Redis health metrics to Prometheus |
| Prometheus | Scrapes and stores metrics from APIs and Redis |
| Grafana | Visualizes metrics, trends, and alerts |
```
Client → FastAPI (/session, /chat)
              ↕
     Redis ←→ Redis Exporter
              ↕
Prometheus ←→ Grafana Dashboards
```

Each API instance has a /metrics endpoint exposing Prometheus-format metrics (via prometheus-fastapi-instrumentator).
Prometheus scrapes these periodically, stores them as time series, and Grafana visualizes them interactively.
Tenants are defined in tenants.jsonl:
{"id":"acme","limits":{"chat_per_10s":8}}
{"id":"globex","limits":{"chat_per_10s":4}}
{"id":"initech","limits":{"chat_per_10s":6}}Each has its own quota, model, and welcome message.
SlowAPI stores rate-limit counters in Redis, ensuring global consistency across both api1 and api2.
If Acme hits api1 seven times and api2 once, the global limit of 8/10s still applies.
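Below is a minimal sketch of how SlowAPI can be pointed at Redis so every replica shares the same counters. The storage URI, the tenant key function, and the hard-coded "8/10seconds" quota are illustrative assumptions; the real project presumably resolves each tenant's limit from tenants.jsonl (SlowAPI also accepts a callable for the limit value):

```python
# Sketch: SlowAPI rate limiting backed by Redis, keyed per tenant.
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded

def tenant_key(request: Request) -> str:
    # Key counters by tenant so all replicas share one bucket in Redis.
    return request.headers.get("X-Tenant-ID") or "anonymous"

limiter = Limiter(key_func=tenant_key, storage_uri="redis://redis:6379")

app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/chat")
@limiter.limit("8/10seconds")  # Acme's chat_per_10s quota, fixed here for illustration
async def chat(request: Request):
    return {"reply": "ok"}
```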
Each FastAPI container automatically exposes metrics via /metrics, such as:
- http_requests_total — total requests by status code
- http_request_duration_seconds_bucket — latency distribution
- in_progress_requests — concurrent calls

Redis metrics are exposed through Redis Exporter at port 9121.
Prometheus periodically (every 5s) collects:
- /metrics from api1 and api2
- Redis metrics from the redis_exporter

These metrics are stored in a time-series database with tenant-aware labels.
Grafana connects to Prometheus and visualizes metrics through dashboards showing request throughput, latency percentiles, 429 rates, and Redis memory usage.
This creates real-time operational visibility.
Let’s walk through how to verify that everything works correctly.
Run all containers:
```bash
docker compose up --build
```

Access: the APIs at http://localhost:8005 and http://localhost:8006, Prometheus at http://localhost:9090, and Grafana at http://localhost:3000.
Use curl to create sessions for each tenant:
```bash
ACME_SID=$(curl -s -X POST http://localhost:8005/session \
  -H 'Content-Type: application/json' \
  -d '{"tenant_id":"acme"}' | jq -r .session_id)
```

Repeat for the other tenants (globex, initech).
Send multiple requests to /chat:
```bash
for i in {1..10}; do
  curl -s -X POST http://localhost:8005/chat \
    -H 'Content-Type: application/json' \
    -H "X-Tenant-ID: acme" \
    -H "X-Session-ID: $ACME_SID" \
    -d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"msg-$i\"}" >/dev/null
done
```

Check /metrics directly:

```bash
curl -s http://localhost:8005/metrics | grep http_requests_total
```

You’ll see output like:
```
http_requests_total{handler="/chat",method="POST",status="200"} 8
http_requests_total{handler="/chat",method="POST",status="429"} 2
```

This confirms Prometheus-compatible metrics are being produced.
Open Prometheus → http://localhost:9090 and try queries like:

```promql
# Total requests by endpoint
sum(rate(http_requests_total[1m])) by (handler)

# 429 rate
rate(http_requests_total{status="429"}[1m])

# 95th percentile latency
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[1m])) by (le, handler))
```

You’ll see live metrics plotted as graphs.
Open http://localhost:9121/metrics and look for:

```
redis_connected_clients
redis_memory_used_bytes
redis_up
```

If redis_up is 1, Redis is healthy.
Open Grafana → http://localhost:3000
Login → admin / admin
Go to Dashboards → New → Add Visualization
Add panels with queries such as:
```promql
sum(rate(http_requests_total[1m])) by (status)
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, handler))
redis_memory_used_bytes
rate(http_requests_total{status="429"}[1m])
```

You’ll instantly see live graphs for requests, latency, Redis usage, and 429s.
Make requests to both APIs:
```bash
for i in {1..7}; do
  curl -s -X POST http://localhost:8005/chat \
    -H 'X-Tenant-ID: acme' -H "X-Session-ID: $ACME_SID" \
    -d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"api1-$i\"}" >/dev/null
done

curl -i -X POST http://localhost:8006/chat \
  -H 'X-Tenant-ID: acme' -H "X-Session-ID: $ACME_SID" \
  -d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"api2-over\"}"
```

Result: the 8th request returns 429, showing that Redis-backed limits are global.
In Grafana, create an alert rule based on the Prometheus query:

```promql
rate(http_requests_total{status="429"}[1m]) > 0.5
```

You can configure the alert to trigger when rate limiting exceeds a threshold, simulating a “tenant flood” scenario.
| Stage | Component | Description |
|---|---|---|
| 1 | FastAPI (Instrumented) | Generates request/response metrics |
| 2 | Prometheus | Scrapes /metrics every 5 seconds |
| 3 | Redis Exporter | Provides backend health metrics |
| 4 | Prometheus DB | Stores all metrics in time-series format |
| 5 | Grafana | Queries and visualizes metrics for analysis |
This flow creates a complete feedback loop — from action (request) to visibility (dashboard).
| Issue | Cause | Fix |
|---|---|---|
| Prometheus shows “target down” | API not reachable | Check container names and ports |
| Grafana “no data” | Prometheus scrape misconfigured | Verify prometheus.yml job names |
| 429s not showing | SlowAPI or Redis misconfigured | Check RATE_LIMIT_STORAGE_URI env var |
| Redis memory leak | Missing TTL on keys | Use FLUSHALL in demo environments |
✅ Multi-tenant rate limits are enforced consistently across replicas
✅ Prometheus captures API + Redis performance metrics
✅ Grafana visualizes throughput, latency, and errors in real time
✅ The system supports proactive alerting and scalability planning
This project marks the maturity phase of our SaaS journey — your platform now “sees” itself and reacts accordingly.
| Step | Description | Example |
|---|---|---|
| 1 | Tenant makes API request | /chat with tenant headers |
| 2 | FastAPI logs + exports metrics | via prometheus-fastapi-instrumentator |
| 3 | Redis updates counters | tenant/session scoped |
| 4 | Prometheus scrapes metrics | from APIs + Redis exporter |
| 5 | Grafana visualizes data | latency, 429s, memory, throughput |
| 6 | DevOps sets alerts | Prometheus rules or Grafana alerts |
| 7 | Business insights | identify heavy tenants, adjust pricing |
Project 4 completes your SaaS observability loop. You can now measure per-tenant usage, verify rate-limit fairness across replicas, track latency and Redis health, and alert on anomalies before users notice.
This creates a self-monitoring SaaS ecosystem — fair, stable, and transparent.
“Data turns guesswork into precision — observability turns systems into living organisms.”