🧭 Project 4: Multi-Tenant SaaS with Prometheus and Grafana Observability


🚀 Introduction: The Need for Observability in Modern SaaS

As SaaS systems evolve from prototypes into production-grade platforms, visibility becomes the single most important factor that separates stable businesses from fragile ones. In earlier projects, we built a multi-tenant SaaS system with dynamic rate limiting and Redis-backed fairness. It worked — but we were still blind to what was happening inside.

Was one tenant consuming too much bandwidth? Were 429s spiking unexpectedly? Was latency growing across replicas or regions?

Without answers to these questions, scaling decisions, pricing tiers, and reliability guarantees remain guesswork. This is why observability is not optional — it’s the nervous system of a mature SaaS platform.

In Project 4, we integrate Prometheus and Grafana into our multi-tenant SaaS architecture to create a full monitoring and visualization layer. These tools transform raw metrics into actionable intelligence, helping engineers, DevOps, and business stakeholders see — in real time — how tenants, services, and infrastructure behave under load.


📊 Slide-by-Slide Explanation


🖥️ Slide 1: Project 4 – Multi-Tenant SaaS with Prometheus and Grafana Observability

This project adds a complete observability layer to our existing FastAPI + Redis + SlowAPI stack. We now monitor API performance, rate limiting, tenant activity, and Redis health, all visualized in Grafana. This marks the shift from a working system to a well-instrumented platform.


🌍 Slide 2: Why Observability Matters in SaaS

Without observability, troubleshooting becomes guesswork. Modern SaaS platforms must answer three key questions in real time:

  1. What’s happening? (Monitoring)
  2. Why is it happening? (Tracing and metrics correlation)
  3. What should we do next? (Alerts and automated scaling)

Observability ensures your team can detect issues before users report them and can optimize performance, pricing, and user experience with confidence.


🏗️ Slide 3: Architecture Overview — FastAPI + Redis + Prometheus + Grafana

We expand our architecture with two new services: Prometheus, which scrapes and stores metrics, and Grafana, which visualizes them.

Core Stack:

  - FastAPI (api1 & api2): multi-tenant endpoints with SlowAPI rate limiting
  - Redis: shared backend for rate-limit counters
  - Redis Exporter: exposes Redis health metrics
  - Prometheus: scrapes and stores metrics from the APIs and Redis
  - Grafana: visualizes metrics, trends, and alerts

Together, they provide a complete feedback loop — from data generation to visualization.


📈 Slide 4: Metrics Integration using Prometheus-FastAPI-Instrumentator

We use prometheus-fastapi-instrumentator to automatically collect:

  - Request counts (http_requests_total) by handler, method, and status code
  - Request latency histograms (http_request_duration_seconds)
  - Request and response size summaries

This library attaches a /metrics endpoint to every API instance. Prometheus scrapes these endpoints every few seconds, converting metrics into time series data.
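Enabling it takes only a couple of lines. A minimal sketch; the app layout here is illustrative:

# main.py: a minimal sketch; the app layout is illustrative
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

# Instrument all routes and expose Prometheus-format metrics at /metrics
Instrumentator().instrument(app).expose(app)

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}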


🔄 Slide 5: Collecting Metrics from API1 and API2 Instances

Both api1 and api2 run in separate containers (ports 8005 and 8006). Prometheus scrapes both /metrics endpoints, giving us visibility across replicas. This helps us answer:

  - Is traffic distributed evenly across instances?
  - Is one replica slower, or returning more errors, than the other?
  - Do rate limits hold globally when a tenant’s requests span both replicas?

It’s the foundation for cross-instance health tracking.
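Because Prometheus attaches an instance label to every scraped sample, comparing replicas is a one-line query:

# Requests per second, broken down by scrape target (api1 vs api2)
sum(rate(http_requests_total[1m])) by (instance)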


💾 Slide 6: Redis Exporter for Backend Health Monitoring

Redis isn’t just the rate-limit store; it’s a critical dependency. We use redis_exporter to expose Redis internals like:

  - redis_connected_clients: active client connections
  - redis_memory_used_bytes: memory footprint
  - redis_up: liveness indicator (1 = healthy)
  - keyspace hit/miss and eviction counters

Prometheus scrapes these metrics too, giving full visibility into backend stability. This is essential for detecting overloads, leaks, or eviction events.


📜 Slide 7: Prometheus Scrape Configuration and Queries

In prometheus.yml, we define three scrape jobs: one for api1, one for api2, and one for the Redis exporter (a sketch follows below).

Prometheus turns these into labeled time series, allowing flexible queries such as per-handler request rates, 429 ratios, and latency percentiles (full examples appear in the testing guide below).

These queries power Grafana panels and alert rules.
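Here is a minimal prometheus.yml sketch; the job names, target hostnames, and ports are assumptions based on this project’s Compose services:

# prometheus.yml: a minimal sketch; hostnames and ports are assumptions
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "api1"
    static_configs:
      - targets: ["api1:8005"]
  - job_name: "api2"
    static_configs:
      - targets: ["api2:8006"]
  - job_name: "redis"
    static_configs:
      - targets: ["redis-exporter:9121"]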


📊 Slide 8: Grafana Dashboards — Visualizing SaaS Performance

Grafana connects to Prometheus and allows us to build dashboards like:

  - Request throughput by endpoint and status code
  - p95 latency per handler
  - 429 (rate-limited) responses over time
  - Redis memory usage and connected clients

The visual clarity enables faster decisions and trend recognition. You can literally “see” your SaaS performance evolve in real time.


📉 Slide 9: Key Metrics — Requests, Latency, and 429 Rate

These are the most critical metrics to track for tenant fairness and user experience:

  - Request rate: overall and per-endpoint traffic volume
  - Latency: p95/p99 response times from the duration histograms
  - 429 rate: how often tenants hit their limits

By observing these, we can fine-tune rate limits, pricing plans, and scaling strategies.
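A useful derived signal is the share of traffic being rate limited. In standard PromQL (not specific to this project):

# Fraction of all requests answered with 429 over the last 5 minutes
sum(rate(http_requests_total{status="429"}[5m]))
  /
sum(rate(http_requests_total[5m]))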


⚠️ Slide 10: Alerting and Error Tracking via Prometheus Rules

Prometheus supports alerting rules for critical conditions:

  - A sustained spike in 429s (a tenant flooding the system)
  - p95 latency exceeding an SLO threshold
  - Redis down (redis_up == 0) or running out of memory

A concrete rule example appears in the testing guide below.

Alerts can trigger Slack or email notifications, allowing proactive mitigation. This converts raw observability into automated awareness.


🐳 Slide 11: Docker Compose Setup — Full Observability Stack

Our docker-compose.yml runs six integrated services: api1, api2, redis, the Redis exporter, Prometheus, and Grafana (a condensed sketch appears below).

Everything runs locally yet represents a real distributed SaaS architecture. With one command:

docker compose up --build

you get a full-featured monitoring environment.
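A condensed docker-compose.yml sketch; service names, images, and environment variables are illustrative rather than the project’s exact file:

# docker-compose.yml: condensed sketch; names, images, and env vars are illustrative
services:
  api1:
    build: .
    ports: ["8005:8005"]
    environment:
      - RATE_LIMIT_STORAGE_URI=redis://redis:6379
    depends_on: [redis]
  api2:
    build: .
    ports: ["8006:8006"]
    environment:
      - RATE_LIMIT_STORAGE_URI=redis://redis:6379
    depends_on: [redis]
  redis:
    image: redis:7
  redis-exporter:
    image: oliver006/redis_exporter
    environment:
      - REDIS_ADDR=redis://redis:6379
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports: ["9090:9090"]
  grafana:
    image: grafana/grafana
    ports: ["3000:3000"]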


🧪 Slide 12: Testing the Metrics Endpoints

To validate the setup:

  1. Generate API traffic using curl loops.
  2. Hit /metrics on both APIs (http://localhost:8005/metrics and http://localhost:8006/metrics).
  3. Verify data in Prometheus at http://localhost:9090.
  4. Watch live updates in Grafana dashboards.

You’ll see 429 counts, latency curves, and request volume appear within seconds — real observability in action.


📊 Slide 13: Building Dashboards for Tenant Performance

To make dashboards tenant-aware:

  1. Label metrics with the tenant, e.g., via a custom counter keyed by the X-Tenant-ID header (a sketch follows below).
  2. Create a Grafana dashboard variable over that tenant label.
  3. Filter each panel’s query by the variable.

This enables business insights:

Which tenants drive the most traffic? Are premium tenants experiencing faster responses?

Observability meets business intelligence.
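The instrumentator’s default metrics aren’t tenant-labeled, so one approach (an assumption, not the project’s actual code) is a custom prometheus_client counter fed by middleware. It appears on the same /metrics endpoint because both libraries use the default registry:

# tenant_metrics.py: a sketch; the metric name and label set are assumptions
from fastapi import FastAPI, Request
from prometheus_client import Counter

app = FastAPI()

# Per-tenant request counter, labeled by tenant and response status
TENANT_REQUESTS = Counter(
    "tenant_requests_total",
    "Requests per tenant",
    ["tenant_id", "status"],
)

@app.middleware("http")
async def count_tenant_requests(request: Request, call_next):
    response = await call_next(request)
    tenant = request.headers.get("X-Tenant-ID", "unknown")
    TENANT_REQUESTS.labels(tenant_id=tenant, status=str(response.status_code)).inc()
    return response

In Grafana, a dashboard variable defined as label_values(tenant_requests_total, tenant_id) then lets you filter every panel per tenant.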


☁️ Slide 14: Scaling, Optimization, and Next Steps

From here, your SaaS can evolve to:

  - Alert on anomalies via Prometheus rules with Slack or email notifications
  - Scale replicas based on observed load and latency
  - Feed tenant usage metrics into pricing tiers and capacity planning

This observability foundation transforms your SaaS into a self-aware system — capable of monitoring, alerting, and even self-optimizing in the future.


🏁 Outro: From Visibility to Intelligence

Project 4 completes your SaaS’s transformation into a fully observable, data-driven platform. You’ve now built a system that doesn’t just run — it knows how it’s running.

Observability brings confidence:

  - Issues are detected before users report them.
  - Rate limits and pricing tiers are tuned from real usage data.
  - Scaling decisions rest on measured load, not guesswork.

This is the hallmark of mature SaaS engineering — every component is monitored, every anomaly visible, and every decision backed by data.

In the journey so far:

  - Earlier projects built the multi-tenant SaaS foundation with dynamic rate limiting.
  - Project 3 added Redis-backed fairness that holds across replicas.
  - Project 4 adds the observability layer: Prometheus metrics and Grafana dashboards.

Next, in Project 5, we’ll transform these metrics into automated insights — integrating billing, tenant analytics, and AI-driven anomaly detection. That’s where your SaaS evolves from being “monitored” to being intelligent.

“You can’t improve what you can’t see — and now, you can see everything.”


🧠 Project 4: Multi-Tenant SaaS with Prometheus and Grafana Observability

🔍 Overall System Flow and Testing Guide


⚙️ 1️⃣ System Overview

This project builds on Project 3 (Redis-backed rate limiting) by adding full observability using Prometheus and Grafana. The goal is to see, measure, and analyze your SaaS system’s performance, usage, and fairness — across multiple tenants and API replicas.

🎯 Key Objectives:

  - Enforce consistent, Redis-backed rate limits across API replicas
  - Expose Prometheus metrics for API performance, 429s, and Redis health
  - Visualize tenant activity, latency, and throughput in Grafana
  - Lay the groundwork for alerting and capacity planning


🧩 2️⃣ System Architecture

🏗️ Components:

| Service | Role |
| --- | --- |
| FastAPI (api1 & api2) | Multi-tenant endpoints with Redis-backed rate limiting |
| Redis | Shared backend for rate limit counters |
| Redis Exporter | Exposes Redis health metrics to Prometheus |
| Prometheus | Scrapes and stores metrics from APIs and Redis |
| Grafana | Visualizes metrics, trends, and alerts |

🔄 Flow Diagram (Conceptual)

Client → FastAPI (/session, /chat)
        ↕
      Redis ←→ Redis Exporter
        ↕
  Prometheus ←→ Grafana Dashboards

Each API instance has a /metrics endpoint exposing Prometheus-format metrics (via prometheus-fastapi-instrumentator). Prometheus scrapes these periodically, stores them as time series, and Grafana visualizes them interactively.


🧠 3️⃣ Step-by-Step Flow

① Tenant Configuration

Tenants are defined in tenants.jsonl:

{"id":"acme","limits":{"chat_per_10s":8}}
{"id":"globex","limits":{"chat_per_10s":4}}
{"id":"initech","limits":{"chat_per_10s":6}}

Each tenant has its own quota; the full configuration also carries a model and welcome message.
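For reference, a minimal loader for this file might look like the following sketch; the project’s real loading code may differ:

# load_tenants.py: a sketch; the project's real loader may differ
import json

def load_tenants(path: str = "tenants.jsonl") -> dict:
    """Read one JSON object per line and index tenants by id."""
    tenants = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                record = json.loads(line)
                tenants[record["id"]] = record
    return tenants

# Example: look up Acme's 10-second chat quota
TENANTS = load_tenants()
print(TENANTS["acme"]["limits"]["chat_per_10s"])  # -> 8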


② Redis-Backed Rate Limiting

SlowAPI stores rate-limit counters in Redis, ensuring global consistency across both api1 and api2. If Acme hits api1 seven times and api2 once, the global limit of 8/10s still applies.
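A sketch of how SlowAPI can be pointed at Redis; the tenant key function, the storage URI, and the static limit are assumptions (the real project varies limits per tenant):

# limits.py: a sketch; key function, URI, and the static limit are assumptions
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded

def tenant_key(request: Request) -> str:
    # Scope counters per tenant so each tenant gets its own quota
    return request.headers.get("X-Tenant-ID", "anonymous")

# storage_uri makes counters live in Redis, shared by api1 and api2
limiter = Limiter(key_func=tenant_key, storage_uri="redis://redis:6379")

app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/chat")
@limiter.limit("8 per 10 seconds")  # static limit for brevity
async def chat(request: Request):
    return {"ok": True}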


③ Metrics Collection

Each FastAPI container automatically exposes metrics via /metrics, such as:

  - http_requests_total (counts by handler, method, and status)
  - http_request_duration_seconds (latency histograms)

Redis metrics are exposed through Redis Exporter at port 9121.


④ Prometheus Scraping

Prometheus periodically (every 5s) collects:

  - API metrics from api1 and api2 (each instance’s /metrics endpoint)
  - Redis health metrics from the Redis Exporter (port 9121)

These metrics are stored in a time-series database as labeled series; with the custom tenant labels from the dashboards section, they become tenant-aware.


⑤ Grafana Visualization

Grafana connects to Prometheus and visualizes metrics through dashboards showing:

  - Request throughput and status-code breakdowns
  - Latency percentiles per endpoint
  - 429 rates as tenants hit their quotas
  - Redis memory and connection health

This creates real-time operational visibility.


🧪 4️⃣ Testing the System

Let’s walk through how to verify that everything works correctly.


Step 1 — Start the Stack

Run all containers:

docker compose up --build

Access:

  - API1: http://localhost:8005
  - API2: http://localhost:8006
  - Prometheus: http://localhost:9090
  - Grafana: http://localhost:3000 (admin / admin)
  - Redis Exporter: http://localhost:9121/metrics


Step 2 — Create Tenant Sessions

Use curl to create sessions for each tenant:

ACME_SID=$(curl -s -X POST http://localhost:8005/session \
  -H 'Content-Type: application/json' \
  -d '{"tenant_id":"acme"}' | jq -r .session_id)

Repeat for other tenants (globex, initech).


Step 3 — Generate Load

Send multiple requests to /chat:

for i in {1..10}; do
  curl -s -X POST http://localhost:8005/chat \
    -H 'Content-Type: application/json' \
    -H "X-Tenant-ID: acme" \
    -H "X-Session-ID: $ACME_SID" \
    -d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"msg-$i\"}" >/dev/null
done

Step 4 — Validate Metrics

Check /metrics directly:

curl -s http://localhost:8005/metrics | grep http_requests_total

You’ll see output like:

http_requests_total{handler="/chat",method="POST",status="200"} 8
http_requests_total{handler="/chat",method="POST",status="429"} 2

This confirms Prometheus-compatible metrics are being produced.


Step 5 — View in Prometheus

Open Prometheus at http://localhost:9090 and try queries like:

# Total requests by endpoint
sum(rate(http_requests_total[1m])) by (handler)

# 429 rate
rate(http_requests_total{status="429"}[1m])

# 95th percentile latency
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[1m])) by (le, handler))

You’ll see live metrics plotted as graphs.


Step 6 — Redis Metrics

Open http://localhost:9121/metrics and look for:

redis_connected_clients
redis_memory_used_bytes
redis_up

If redis_up is 1, Redis is healthy.


Step 7 — Grafana Dashboards

Open Grafana at http://localhost:3000 and log in with admin / admin. Then go to Dashboards → New → Add Visualization.

Add panels with queries such as:

sum(rate(http_requests_total[1m])) by (status)
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, handler))
redis_memory_used_bytes
rate(http_requests_total{status="429"}[1m])

You’ll instantly see live graphs for requests, latency, Redis usage, and 429s.


Step 8 — Cross-Replica Validation

Make requests to both APIs:

for i in {1..7}; do
  curl -s -X POST http://localhost:8005/chat \
    -H 'X-Tenant-ID: acme' -H "X-Session-ID: $ACME_SID" \
    -d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"api1-$i\"}" >/dev/null
done

curl -i -X POST http://localhost:8006/chat \
  -H 'X-Tenant-ID: acme' -H "X-Session-ID: $ACME_SID" \
  -d "{\"tenant_id\":\"acme\",\"session_id\":\"$ACME_SID\",\"message\":\"api2-over\"}"

Result: once Acme exceeds its global 8/10s quota, the next request returns 429 even though it arrives at a different replica, showing that Redis-backed limits are global.


Step 9 — Observe Changes

In Grafana:

  - Request-rate panels rise for both api1 and api2 as traffic flows.
  - The 429 panel spikes the moment Acme exceeds its quota.
  - Redis memory and connected-client counts tick up under load.


Step 10 — Alert Simulation

Use Prometheus query:

rate(http_requests_total{status="429"}[1m]) > 0.5

You can configure an alert to trigger when rate limiting exceeds a threshold, simulating a “tenant flood” scenario.
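Wired into Prometheus, that query becomes an alerting rule. A sketch; the rule name, threshold, and file layout are illustrative:

# alerts.yml: a sketch; rule name and thresholds are illustrative
groups:
  - name: saas-rate-limits
    rules:
      - alert: TenantFlood
        expr: rate(http_requests_total{status="429"}[1m]) > 0.5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Sustained 429s: a tenant may be flooding the API"

Reference the file from prometheus.yml via rule_files, and route notifications to Slack or email through Alertmanager or Grafana alerting.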


📈 5️⃣ Example Metric Flow Summary

| Stage | Component | Description |
| --- | --- | --- |
| 1 | FastAPI (instrumented) | Generates request/response metrics |
| 2 | Prometheus | Scrapes /metrics every 5 seconds |
| 3 | Redis Exporter | Provides backend health metrics |
| 4 | Prometheus DB | Stores all metrics in time-series format |
| 5 | Grafana | Queries and visualizes metrics for analysis |

This flow creates a complete feedback loop — from action (request) to visibility (dashboard).


🧰 6️⃣ Troubleshooting Tips

| Issue | Cause | Fix |
| --- | --- | --- |
| Prometheus shows “target down” | API not reachable | Check container names and ports |
| Grafana “no data” | Prometheus scrape misconfigured | Verify prometheus.yml job names |
| 429s not showing | SlowAPI or Redis misconfigured | Check the RATE_LIMIT_STORAGE_URI env var |
| Redis memory leak | Missing TTL on keys | Use FLUSHALL in demo environments |

🚀 7️⃣ What This Demo Proves

✅ Multi-tenant rate limits are enforced consistently across replicas
✅ Prometheus captures API + Redis performance metrics
✅ Grafana visualizes throughput, latency, and errors in real time
✅ The system supports proactive alerting and scalability planning

This project marks the maturity phase of our SaaS journey — your platform now “sees” itself and reacts accordingly.


🏁 8️⃣ The Big Picture — End-to-End Flow

| Step | Description | Example |
| --- | --- | --- |
| 1 | Tenant makes API request | /chat with tenant headers |
| 2 | FastAPI logs + exports metrics | via prometheus-fastapi-instrumentator |
| 3 | Redis updates counters | tenant/session scoped |
| 4 | Prometheus scrapes metrics | from APIs + Redis exporter |
| 5 | Grafana visualizes data | latency, 429s, memory, throughput |
| 6 | DevOps sets alerts | Prometheus rules or Grafana alerts |
| 7 | Business insights | identify heavy tenants, adjust pricing |

💡 9️⃣ Summary

Project 4 completes your SaaS observability loop. You can now:

  - Measure per-tenant traffic and rate-limit behavior
  - Visualize latency, throughput, and errors in real time
  - Alert on anomalies before they reach users

This creates a self-monitoring SaaS ecosystem — fair, stable, and transparent.

“Data turns guesswork into precision — observability turns systems into living organisms.”