#55 - Public Status Page (Active Incidents & Uptime)

PLANNED

Priority: 🟠 HIGH · Type: TYPE D · Container: rgz-web · Code: web/src/pages/StatusPage.tsx

Dependencies: #2 rgz-web


Description

Public status page, accessible without authentication (https://access-rgz.duckdns.org/status). It shows active incidents, per-service uptime (Core, CPE Network, Billing, Portal), and a 30-day history. Minimalist design, dark mode, refreshed every minute. Subscribers and resellers can check it before contacting support.

No sensitive data is displayed (no customer names, no revenue figures). Only generic messages appear, such as "Network delay reported (15 min ago)" or "1 AP in Cotonou degraded". The page uses public data from #43 SLA probe and #38 Prometheus, without authentication.

Internal Architecture

Status Page Data Flow (Public, No Auth Required):
  1. Frontend StatusPage.tsx:
     └─ useEffect(() => { fetch('/api/v1/status/current') }, [])
     └─ NO JWT required
     └─ Query parameters: ?refresh=auto (1min polling)
     └─ No sensitive data exposed

  2. API Endpoint (Public):
     GET /api/v1/status/current
     ├─ NO authentication required
     ├─ Rate limiting: 10 req/min per IP (prevent DoS)
     ├─ Cache: Redis rgz:status:current TTL=60s
     └─ Response:
        {
          "page_status": "operational|degraded|down",
          "last_update": "2026-02-21T14:30:00Z",
          "incident_count": 1,
          "incidents": [
            {
              "id": "uuid",
              "title": "1 AP at Cotonou site: weak signal",
              "status": "investigating|identified|monitoring|resolved",
              "impact": "minor",  // enum: none|minor|major|critical
              "started_at": "2026-02-21T14:00:00Z",
              "updated_at": "2026-02-21T14:30:00Z",
              "components_affected": ["rf_coverage"],
              "description": "RSSI <-75dBm reported at access_kossou for 30 minutes"
            },
            ...
          ],
          "services": [
            {
              "name": "Core API",
              "status": "operational",  // operational|degraded|offline
              "uptime_percent_24h": 99.8,
              "response_time_ms": 45
            },
            {
              "name": "WiFi Network",
              "status": "degraded",
              "uptime_percent_24h": 97.5,
              "response_time_ms": 120
            },
            {
              "name": "Authentication",
              "status": "operational",
              "uptime_percent_24h": 99.9
            },
            {
              "name": "Billing System",
              "status": "operational",
              "uptime_percent_24h": 99.5
            },
            {
              "name": "Portal",
              "status": "degraded",
              "uptime_percent_24h": 98.2
            }
          ]
        }

  3. SLA Probe Data (Source of Truth):
     └─ Query table sla_results_hourly (public view, no auth)
     └─ Aggregate last 24h uptime per service
     └─ Thresholds:
        • operational: >= 99% uptime
        • degraded: >= 95% and < 99% uptime
        • offline: < 95% uptime

  4. Active Incidents Mapping:
     ├─ Prometheus AlertManager firing alerts
     ├─ Match severity HIGH → incident status "investigating"
     ├─ Insert incident record: incidents table (publicly visible)
     ├─ Incident auto-closes when the alert stops firing (status "resolved")
     └─ Incident types:
        • site_offline (any AP down)
        • performance_degraded (latency >200ms or uptime <99%)
        • security_alert (only generic message, no details)
        • maintenance_window (scheduled downtime)

  5. Historical Uptime (30 days):
     GET /api/v1/status/history?days=30
     └─ NO auth required
     └─ Response:
        {
          "items": [
            {"date": "2026-02-21", "uptime_percent": 99.8, "incidents": 1},
            {"date": "2026-02-20", "uptime_percent": 99.5, "incidents": 0},
            ...
          ]
        }

  6. Frontend Components:
     ├─ Status Badge: operational (green), degraded (yellow), offline (red)
     ├─ Service Cards: each shows current status + 24h uptime%
     ├─ Incident Timeline: reverse chronological, expandable details
     ├─ Uptime Chart: line graph 30 days
     ├─ Auto-refresh: every 60s with visual indicator
     └─ Responsive: mobile-first, works on slow 2G

  7. SEO & Caching:
     ├─ Static HTML shell + client-side hydration
     ├─ OG meta tags: title, description, og:image
     ├─ robots.txt: allow all
     ├─ sitemap.xml: /status/ included
     └─ CDN cache: 60s (Cloudflare or nginx reverse-proxy)

  8. Public Visibility Rules:
     ├─ Incident details: generic only (no subscriber names, no revenues)
     │  ✓ "1 AP degraded in Cotonou"
     │  ✗ "Kossou reseller down"
     │  ✗ "Billing sync failure"
     │  ✗ "Customer data breach detected"
     └─ Severity colors: display delay/downtime, not root cause
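
Steps 4 and 8 above (alert mapping and public visibility rules) could be combined roughly as follows. This is a minimal sketch, not the project's implementation: `AlertLike` is a reduced subset of an AlertManager webhook alert, the `incident_type` and `city` labels are assumptions made for illustration, and the alert fingerprint stands in for the incident UUID. The function names and title strings are hypothetical.

```typescript
type IncidentType =
  | 'site_offline'
  | 'performance_degraded'
  | 'security_alert'
  | 'maintenance_window';

// Reduced view of an AlertManager alert (assumed labels: severity, incident_type, city).
interface AlertLike {
  status: 'firing' | 'resolved';
  labels: { [name: string]: string };
  startsAt: string; // ISO 8601 timestamp
  fingerprint: string;
}

interface PublicIncident {
  id: string;
  type: IncidentType;
  title: string;
  status: 'investigating' | 'resolved';
  started_at: string;
}

// Generic, public-safe titles: they describe the symptom, never the root cause.
const PUBLIC_TITLES: { [K in IncidentType]: string } = {
  site_offline: 'Access point offline',
  performance_degraded: 'Network delay reported',
  security_alert: 'Service issue under investigation',
  maintenance_window: 'Scheduled maintenance',
};

function isIncidentType(value: string | undefined): value is IncidentType {
  return value !== undefined && Object.prototype.hasOwnProperty.call(PUBLIC_TITLES, value);
}

// Returns a public incident for HIGH-severity alerts of a known type, otherwise null.
function toPublicIncident(alert: AlertLike): PublicIncident | null {
  const severity = (alert.labels['severity'] || '').toLowerCase();
  const incidentType = alert.labels['incident_type'];
  if (severity !== 'high' || !isIncidentType(incidentType)) return null;

  // Only the city is exposed; every other label (reseller, customer, ...) is dropped.
  // Security alerts stay fully generic and never carry a location.
  const city = alert.labels['city'];
  const suffix = city && incidentType !== 'security_alert' ? ` in ${city}` : '';

  return {
    id: alert.fingerprint,
    type: incidentType,
    title: PUBLIC_TITLES[incidentType] + suffix,
    // A firing alert opens the incident; a resolved alert auto-closes it.
    status: alert.status === 'firing' ? 'investigating' : 'resolved',
    started_at: alert.startsAt,
  };
}
```

Building the title from a fixed template, rather than filtering the alert's own text, means a new label or annotation cannot leak onto the public page by accident.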

Configuration

env
# Frontend (web/.env)
VITE_STATUS_API_URL=https://api-rgz.duckdns.org/api/v1/status
VITE_STATUS_REFRESH_INTERVAL=60000  # 1 min
VITE_STATUS_HISTORY_DAYS=30
VITE_STATUS_TIMEZONE=Africa/Lagos

# Backend (app/config.py)
STATUS_PAGE_ENABLED=true
STATUS_PAGE_RATE_LIMIT=10  # requests per minute per IP
STATUS_CACHE_TTL=60  # seconds (Redis rgz:status:current)
STATUS_INCIDENT_TTL_HOURS=24  # auto-resolve after 24h
STATUS_PUBLIC_DETAIL_LEVEL=generic  # enum: generic|technical

# Uptime calculation
STATUS_UPTIME_CALCULATION_WINDOW=1h  # calculate per 1h bucket
STATUS_UPTIME_THRESHOLD_OPERATIONAL=99.0  # %
STATUS_UPTIME_THRESHOLD_DEGRADED=95.0  # %

# Incident auto-categorization
INCIDENT_THRESHOLD_LATENCY_MS=200
INCIDENT_THRESHOLD_LOSS_PERCENT=5
INCIDENT_THRESHOLD_SITES_DOWN=3
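
To illustrate how the two uptime thresholds above could drive the per-service status, here is a small sketch. It assumes the thresholds are read from an environment-like record and that the lower bound of each band is inclusive; `thresholdsFromEnv` and `classifyUptime` are hypothetical names, not existing project functions.

```typescript
type ServiceStatus = 'operational' | 'degraded' | 'offline';

interface UptimeThresholds {
  operational: number; // from STATUS_UPTIME_THRESHOLD_OPERATIONAL
  degraded: number;    // from STATUS_UPTIME_THRESHOLD_DEGRADED
}

const DEFAULT_THRESHOLDS: UptimeThresholds = { operational: 99.0, degraded: 95.0 };

// Reads the thresholds from an env-like record, falling back to the defaults
// when a variable is missing or not a number.
function thresholdsFromEnv(env: { [name: string]: string | undefined }): UptimeThresholds {
  const read = (name: string, fallback: number): number => {
    const parsed = parseFloat(env[name] || '');
    return isNaN(parsed) ? fallback : parsed;
  };
  return {
    operational: read('STATUS_UPTIME_THRESHOLD_OPERATIONAL', DEFAULT_THRESHOLDS.operational),
    degraded: read('STATUS_UPTIME_THRESHOLD_DEGRADED', DEFAULT_THRESHOLDS.degraded),
  };
}

// Maps a 24h uptime percentage to a service status (lower bounds inclusive).
function classifyUptime(
  uptimePercent: number,
  thresholds: UptimeThresholds = DEFAULT_THRESHOLDS,
): ServiceStatus {
  if (uptimePercent >= thresholds.operational) return 'operational';
  if (uptimePercent >= thresholds.degraded) return 'degraded';
  return 'offline';
}
```

With the default values, `classifyUptime(99.8)` yields `'operational'`, `classifyUptime(97.5)` yields `'degraded'`, and `classifyUptime(94.9)` yields `'offline'`.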

API Endpoints (Public)

| Method | Route | Response |
|--------|-------|----------|
| GET | /api/v1/status/current | {page_status, incidents: [...], services: [...]} (cached 60s) |
| GET | /api/v1/status/history?days=30 | {items: [{date, uptime_percent, incidents}]} |
| GET | /api/v1/status/incidents?status=&severity=&limit=50 | {items: [{id, title, status, started_at}], total} |
| GET | /api/v1/status/incidents/{incident_id} | Full incident details (public view) |
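
The 60-second cache on `/api/v1/status/current` follows a cache-aside pattern, which can be sketched as below. This is an illustration only: an in-memory `Map` stands in for the Redis key `rgz:status:current`, the loader is synchronous for brevity (a real handler would be async), and `TtlCache` is a hypothetical name. The clock is injectable so that expiry can be tested without waiting.

```typescript
interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

class TtlCache<T> {
  private entries: Map<string, CacheEntry<T>>;
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.entries = new Map<string, CacheEntry<T>>();
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value while it is fresh; otherwise calls `load`,
  // stores the result with a new expiry, and returns it.
  getOrLoad(key: string, load: () => T): T {
    const currentTime = this.now();
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > currentTime) {
      return hit.value;
    }
    const value = load();
    this.entries.set(key, { value: value, expiresAt: currentTime + this.ttlMs });
    return value;
  }
}
```

With a 60 000 ms TTL, repeated requests within the same minute reuse one computed response, so the SLA and incident queries run at most about once per minute regardless of traffic.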

React Components

typescript
// web/src/pages/StatusPage.tsx

import React from 'react';
import { useQuery } from '@tanstack/react-query';
import api from '@/services/api';

export default function StatusPage() {
  // Query: Current status (no auth)
  const { data: status, isLoading, refetch, dataUpdatedAt } = useQuery({
    queryKey: ['status', 'current'],
    queryFn: () => api.get('/api/v1/status/current'),
    staleTime: 0,  // Always fresh
    refetchInterval: 60000,  // 1 min polling
    retry: 2,
  });

  // Query: Historical uptime
  const { data: history } = useQuery({
    queryKey: ['status', 'history'],
    queryFn: () => api.get('/api/v1/status/history?days=30'),
    staleTime: 300000,  // 5 min
  });

  // Time of the last successful fetch, so the displayed timestamp reflects real
  // data freshness (dataUpdatedAt is 0 until the first response arrives)
  const lastUpdate = new Date(dataUpdatedAt || Date.now());

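  // Summary card colors; neutral gray while the page status is not yet known
  const getStatusColor = (pageStatus?: string) => {
    if (pageStatus === undefined) return 'bg-gray-50 border-gray-200';
    if (pageStatus === 'operational') return 'bg-green-50 border-green-200';
    if (pageStatus === 'degraded') return 'bg-yellow-50 border-yellow-200';
    return 'bg-red-50 border-red-200';
  };

  // Badge colors for both service statuses and incident statuses
  const getStatusBadgeColor = (s?: string) => {
    if (s === 'operational' || s === 'resolved') return 'bg-green-100 text-green-800';
    if (s === 'degraded' || s === 'identified' || s === 'monitoring') return 'bg-yellow-100 text-yellow-800';
    return 'bg-red-100 text-red-800';  // offline, investigating, or unknown
  };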

  return (
    <div className="min-h-screen bg-white">
      {/* Header */}
      <header className="bg-access-blue text-white p-6 shadow">
        <div className="max-w-4xl mx-auto">
          <h1 className="text-3xl font-bold mb-2">ACCESS RGZ — Statut du Service</h1>
          <p className="text-blue-100">Uptime et incident tracking en temps réel</p>
        </div>
      </header>

      <main className="max-w-4xl mx-auto p-6 space-y-8">
        {/* Status Summary Card */}
        <div className={`rounded-lg border p-6 ${getStatusColor(status?.data?.page_status)}`}>
          <div className="flex items-center justify-between">
            <div>
              <h2 className="text-2xl font-bold mb-2">
                {isLoading
                  ? 'Chargement du statut…'
                  : status?.data?.page_status === 'operational'
                  ? '✓ Tous les services opérationnels'
                  : status?.data?.page_status === 'degraded'
                  ? '⚠️ Dégradation détectée'
                  : '✗ Service indisponible'}
              </h2>
              <p className="text-sm text-gray-700">
                Mise à jour : {lastUpdate.toLocaleTimeString('fr-FR')}
                {status?.data?.incident_count > 0 && (
                  <span className="ml-4 font-semibold">
                    {status.data.incident_count} incident{status.data.incident_count > 1 ? 's' : ''} en cours
                  </span>
                )}
              </p>
            </div>
            <button
              onClick={() => refetch()}
              className="px-4 py-2 bg-access-yellow text-access-dark rounded-lg hover:bg-opacity-90 text-sm"
            >
              Actualiser
            </button>
          </div>
        </div>

        {/* Active Incidents */}
        {status?.data?.incidents?.length > 0 && (
          <section className="space-y-4">
            <h3 className="text-xl font-bold">Incidents Actifs</h3>
            {status.data.incidents.map((incident) => (
              <div
                key={incident.id}
                className={`border rounded-lg p-4 ${
                  incident.impact === 'critical'
                    ? 'border-red-300 bg-red-50'
                    : incident.impact === 'major'
                    ? 'border-orange-300 bg-orange-50'
                    : 'border-yellow-300 bg-yellow-50'
                }`}
              >
                <div className="flex items-start justify-between">
                  <div className="flex-1">
                    <h4 className="font-semibold text-lg">{incident.title}</h4>
                    <p className="text-sm text-gray-700 mt-1">{incident.description}</p>
                    <div className="flex gap-4 mt-3 text-xs text-gray-600">
                      <span>
                        Démarré : {new Date(incident.started_at).toLocaleString('fr-FR')}
                      </span>
                      <span>
                        Mis à jour : {new Date(incident.updated_at).toLocaleString('fr-FR')}
                      </span>
                    </div>
                  </div>
                  <div className={`px-3 py-1 rounded-full text-xs font-semibold ${getStatusBadgeColor(incident.status)}`}>
                    {incident.status === 'investigating' && '🔍 En investigation'}
                    {incident.status === 'identified' && '🎯 Identifié'}
                    {incident.status === 'monitoring' && '📊 Suivi'}
                    {incident.status === 'resolved' && '✓ Résolu'}
                  </div>
                </div>
              </div>
            ))}
          </section>
        )}

        {/* Services Grid */}
        <section className="space-y-4">
          <h3 className="text-xl font-bold">État des Services</h3>
          <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
            {status?.data?.services?.map((service) => (
              <div key={service.name} className="border rounded-lg p-4 bg-gray-50 border-gray-200 hover:border-gray-300">
                <div className="flex items-center justify-between mb-2">
                  <h4 className="font-semibold">{service.name}</h4>
                  <span className={`px-2 py-1 rounded text-xs font-bold ${getStatusBadgeColor(service.status)}`}>
                    {service.status === 'operational' && '✓ Opérationnel'}
                    {service.status === 'degraded' && '⚠️ Dégradé'}
                    {service.status === 'offline' && '✗ Hors ligne'}
                  </span>
                </div>
                <div className="space-y-1 text-sm text-gray-700">
                  <p>Uptime 24h : <span className="font-bold">{service.uptime_percent_24h}%</span></p>
                  {service.response_time_ms != null && (
                    <p>Temps réponse : <span className="font-bold">{service.response_time_ms}ms</span></p>
                  )}
                </div>
              </div>
            ))}
          </div>
        </section>

        {/* Uptime History Chart */}
        <section className="space-y-4">
          <h3 className="text-xl font-bold">Historique 30 Jours</h3>
          <div className="border rounded-lg p-4 bg-gray-50 border-gray-200">
            <div className="grid grid-cols-[repeat(30,minmax(0,1fr))] gap-1">
              {history?.data?.items?.map((day) => (
                <div
                  key={day.date}
                  title={`${day.date}: ${day.uptime_percent}%`}
                  className={`h-8 rounded-sm cursor-pointer hover:opacity-80 transition
                    ${day.uptime_percent >= 99 ? 'bg-green-400' : day.uptime_percent >= 95 ? 'bg-yellow-400' : 'bg-red-400'}
                  `}
                />
              ))}
            </div>
            <p className="text-xs text-gray-600 mt-4 text-center">
              Vert : ≥99% | Jaune : 95-99% | Rouge : &lt;95%
            </p>
          </div>
        </section>

        {/* Footer */}
        <div className="border-t border-gray-200 pt-6 text-center text-sm text-gray-600">
          <p>Cette page est automatiquement mise à jour toutes les minutes.</p>
          <p>Pour plus d'informations, contactez support@rgz.cm</p>
        </div>
      </main>
    </div>
  );
}
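
The 30-day heatmap logic in the component above can also be pulled out of the JSX into a pure helper, which makes it easy to unit test. The sketch below assumes history items shaped like the `/api/v1/status/history` response; `toHeatmapCells` and `uptimeColorClass` are hypothetical names. Sorting oldest-first is a presentation choice made here, not something the spec states.

```typescript
interface HistoryItem {
  date: string; // YYYY-MM-DD
  uptime_percent: number;
  incidents: number;
}

interface HeatmapCell {
  date: string;
  label: string;     // tooltip text
  className: string; // Tailwind background class
}

// Same color bands as the legend: green >= 99, yellow >= 95, red below.
function uptimeColorClass(uptimePercent: number): string {
  if (uptimePercent >= 99) return 'bg-green-400';
  if (uptimePercent >= 95) return 'bg-yellow-400';
  return 'bg-red-400';
}

// Orders days oldest-first (ISO dates sort lexicographically), keeps the most
// recent `days` entries, and attaches the tooltip label and color class.
function toHeatmapCells(items: HistoryItem[], days: number = 30): HeatmapCell[] {
  return items
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => (a.date < b.date ? -1 : a.date > b.date ? 1 : 0))
    .slice(-days)
    .map((item) => ({
      date: item.date,
      label: `${item.date}: ${item.uptime_percent}%`,
      className: uptimeColorClass(item.uptime_percent),
    }));
}
```

In the component, the grid could then render `toHeatmapCells(history?.data?.items ?? [])` and use `cell.label` as the `title` attribute.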

Useful Commands

bash
# Test the public status endpoint (no auth needed)
curl https://api-rgz.duckdns.org/api/v1/status/current

# View the 30-day history (URL quoted so the shell does not treat "?" as a glob)
curl "https://api-rgz.duckdns.org/api/v1/status/history?days=30"

# View detailed incidents
curl https://api-rgz.duckdns.org/api/v1/status/incidents

# Check the Redis cache
docker exec rgz-redis redis-cli GET rgz:status:current

# Test uptime calculation
docker exec rgz-db psql -U rgz -d rgz -c "
SELECT date, uptime_percent, incident_count
FROM status_uptime_history
ORDER BY date DESC
LIMIT 30;
"

# Monitor rate limiting (per IP)
docker exec rgz-api tail -f /var/log/rgz/api.log | grep "status_page"

# Build & deploy status page
cd /home/claude-dev/RGZ/web && npm run build
docker-compose -f docker-compose.core.yml up -d --build rgz-web
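
The uptime query above reads daily rows from `status_uptime_history`. One way such rows could be derived from hourly SLA probe buckets (the 1h window in the configuration) is sketched below. The `probes_ok` and `probes_total` fields are assumptions, since the schema of `sla_results_hourly` is not given here, and `dailyUptime` is a hypothetical name.

```typescript
interface HourlyBucket {
  hour: string; // ISO 8601 UTC, e.g. "2026-02-21T14:00:00Z"
  probes_ok: number;
  probes_total: number;
}

interface DailyUptime {
  date: string; // YYYY-MM-DD
  uptime_percent: number; // rounded to one decimal
}

// Sums probe counts per UTC day and returns one row per day, newest first.
// Buckets without any probe are skipped, so a day with no data yields no row.
function dailyUptime(buckets: HourlyBucket[]): DailyUptime[] {
  const totals: { [date: string]: { ok: number; total: number } } = {};
  for (const bucket of buckets) {
    if (bucket.probes_total <= 0) continue;
    const date = bucket.hour.slice(0, 10);
    const dayTotals = totals[date] || (totals[date] = { ok: 0, total: 0 });
    dayTotals.ok += bucket.probes_ok;
    dayTotals.total += bucket.probes_total;
  }
  return Object.keys(totals)
    .sort()
    .reverse()
    .map((date) => {
      const dayTotals = totals[date];
      const percent = (dayTotals.ok / dayTotals.total) * 100;
      return { date: date, uptime_percent: Math.round(percent * 10) / 10 };
    });
}
```

Weighting by probe count, rather than averaging the hourly percentages, keeps an hour with few probes from skewing the daily figure.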

Implementation TODO

  • [ ] Create the React component StatusPage.tsx (public, no auth)
  • [ ] Implement endpoint GET /api/v1/status/current (cached, public)
  • [ ] Implement endpoint GET /api/v1/status/history (30-day uptime)
  • [ ] Create incidents table (incident_id, title, status, impact, components, created_at)
  • [ ] Create service_status_snapshots table (hypertable, 1h aggregation)
  • [ ] Add incident auto-generation from AlertManager
  • [ ] Implement rate limiting: 10 req/min per IP (nginx)
  • [ ] Create status badge colors: operational (green), degraded (yellow), offline (red)
  • [ ] Add 30-day uptime history chart (heatmap or bar)
  • [ ] Tests: public access (no JWT), cache hit rates, incident mapping

Last updated: 2026-02-21

PROJET MOSAÏQUE - 81 tools, 22 containers, 500+ WiFi Zone resellers