#52 - NOC Dashboard (Real-Time Operations)

PLANNED

Priority: 🟠 HIGH · Type: TYPE D · Container: rgz-web · Code: web/src/pages/NocDashboard.tsx

Dependencies: #2 rgz-web, #37 dashboards-grafana, #38 prometheus-alert


Description

Real-time dashboard restricted to NOC operators and administrators. It shows the full operational state of the network: Core status (API, DB, Redis, RADIUS), active alerts handled by priority (P0/P1/P2), CPE per site with RSSI and uptime, active RADIUS sessions, and a geographic map of the 22 sites.

Alerts and sessions update in real time over WebSocket (5 s); CPU/RAM metrics are polled every 30 s. The page embeds Grafana iframes for the detailed dashboards (#37) and links to Prometheus and Kibana. The interface uses a dark theme for 24/7 NOC use.
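
Since alerts are handled by priority, the alert list needs a stable ordering. The sketch below is one way to do it, not the project's actual code: `NocAlert` and `sortAlertsByPriority` are hypothetical names, and the tie-break (oldest first within a severity level) is an assumption.

```typescript
// Severity levels from the description; P0 is the most urgent.
type Severity = "P0" | "P1" | "P2";

// Minimal alert shape for this sketch; the real payload carries more fields.
interface NocAlert {
  alert_id: string;
  severity: Severity;
  timestamp: string; // ISO 8601, UTC
}

// Lower rank sorts first.
const SEVERITY_RANK: Record<Severity, number> = { P0: 0, P1: 1, P2: 2 };

// Returns a new array ordered by severity, then oldest first within a level
// (the tie-break is an assumption, not something the spec states).
function sortAlertsByPriority(alerts: readonly NocAlert[]): NocAlert[] {
  return [...alerts].sort(
    (a, b) =>
      SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
      Date.parse(a.timestamp) - Date.parse(b.timestamp),
  );
}
```

Using an explicit rank table rather than comparing the strings directly keeps the order correct if a severity label is ever renamed.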

Internal Architecture

NOC Dashboard Real-Time Flow:
  1. React Component NocDashboard.tsx mount:
     └─ useAuth() → JWT admin role
     └─ useWebSocket("/ws/noc/updates") → streaming alerts + sessions
     └─ useQuery("sla/current") → polling 30s CPU/RAM/uptime
     └─ useQuery("fraud/alerts") → high-priority anomalies (#44)

  2. WebSocket Server (FastAPI WebSocket):
     └─ app/api/v1/endpoints/monitoring.py::websocket_noc_updates()
     └─ Broadcast to all connected NOC clients:
        • New Prometheus alerts (P0 immediate, P1 <30s, P2 <5min)
        • Active RADIUS sessions (CONNECT, DISCONNECT events)
        • CPE status changes (AP went down/up)
        • Metric spikes (CPU >80%, RAM >90%)
     └─ Message format (JSON):
        {
          "type": "alert|session|cpe_status|metric_spike",
          "timestamp": "2026-02-21T14:30:00Z",
          "data": { ... }  // payload depends on "type"
        }

  3. SLA Current Endpoint:
     GET /api/v1/sla/current
     Response:
     {
       "probes": [
         {"target": "rgz-api", "type": "tcp:8000", "latency_ms": 12, "success": true},
         {"target": "rgz-db", "type": "tcp:5432", "latency_ms": 5, "success": true},
         {"target": "rgz-redis", "type": "tcp:6379", "latency_ms": 2, "success": true},
         {"target": "rgz-radius", "type": "udp:1812", "latency_ms": 8, "success": true},
         {"target": "access_kossou (AP)", "type": "icmp", "latency_ms": 45, "success": true},
         ...
       ]
     }

  4. CPE Monitoring Query:
     GET /api/v1/monitoring/cpe?limit=100
     Response:
     {
       "items": [
         {
           "nas_id": "access_kossou",
           "site_number": 1,
           "city": "Cotonou",
           "latitude": 6.4969,
           "longitude": 2.6289,
           "ap_ip": "192.168.1.1",
           "uptime_percent": 99.8,
           "rssi_dbm": -68,
           "active_sessions": 42,
           "bandwidth_mbps_current": 125.3,
           "cpu_percent": 67,
           "ram_percent": 78,
           "last_check": "2026-02-21T14:29:45Z",
           "status": "healthy"  // enum: healthy|degraded|critical|offline
         },
         ...
       ],
       "total": 22
     }

  5. Alerts Active Query:
     GET /api/v1/monitoring/alerts?severity=high&status=open
     Response:
     {
       "items": [
         {
           "alert_id": "uuid",
           "severity": "P0",  // enum: P0|P1|P2
           "type": "site_down|cpu_spike|bandwidth_exceeded|security_threat",
           "title": "access_kossou (Cotonou) DOWN",
           "description": "ICMP probe failed for 10 minutes",
           "affected_resource": "access_kossou",
           "affected_subscribers": 42,
           "timestamp": "2026-02-21T14:20:00Z",
           "acknowledged": false,
           "assigned_to": null
         },
         ...
       ],
       "total": 3,
       "total_by_severity": {"P0": 1, "P1": 2, "P2": 5}
     }

  6. Sessions RADIUS Live:
     GET /api/v1/monitoring/sessions/live?limit=50
     Response (paginated, refresh 5s):
     {
       "items": [
         {
           "session_id": "uuid",
           "subscriber_ref": "RGZ-0197979964",
           "mac_address": "AA:BB:CC:DD:EE:FF",
           "nas_id": "access_kossou",
           "ip_address": "10.142.5.67",
           "session_start": "2026-02-21T10:15:00Z",
           "session_duration_seconds": 14700,
           "bytes_in_mb": 245.3,
           "bytes_out_mb": 98.5,
           "forfait_name": "Pass 5GB",
           "status": "active"
         },
         ...
       ],
       "total_active": 1234
     }

  7. Fraud Alerts (Anomalies #44):
     GET /api/v1/anomalies/current?severity=high&nas_id=
     └─ Port scans, MAC spoofing, DNS tunneling, HTTP floods
     └─ Merged into the global alerts table

  8. Map Visualization (Leaflet.js):
     └─ GeoJSON data: reseller_sites (lat, lon, status, active_sessions)
     └─ Colored markers (green=healthy, yellow=degraded, red=critical, grey=offline)
     └─ Click marker → sidebar CPE details + 24h sparkline metrics
     └─ RSSI heatmap overlay (optional, #41)

  9. Dark Mode Theme:
     └─ Tailwind dark: mode class='dark'
     └─ bg-slate-900, text-slate-100
     └─ Colors: accent-yellow (#f5c445), accent-red (#da3747)
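
Step 8 feeds the map with GeoJSON built from the CPE items. The sketch below shows one possible conversion; `CpeSite`, `SiteFeature`, `toSiteFeature` and `toFeatureCollection` are hypothetical names, and only the fields and marker colors listed above are taken from the spec.

```typescript
// Status values from the CPE monitoring response.
type CpeStatus = "healthy" | "degraded" | "critical" | "offline";

// Subset of a CPE item needed to place a marker.
interface CpeSite {
  nas_id: string;
  latitude: number;
  longitude: number;
  status: CpeStatus;
  active_sessions: number;
}

// Marker colors as listed in step 8.
const MARKER_COLOR: Record<CpeStatus, string> = {
  healthy: "green",
  degraded: "yellow",
  critical: "red",
  offline: "grey",
};

interface SiteFeature {
  type: "Feature";
  geometry: { type: "Point"; coordinates: [number, number] };
  properties: { nas_id: string; status: CpeStatus; active_sessions: number; color: string };
}

// GeoJSON positions are [longitude, latitude], the reverse of Leaflet's
// (lat, lng) argument order, so the swap happens here once.
function toSiteFeature(site: CpeSite): SiteFeature {
  return {
    type: "Feature",
    geometry: { type: "Point", coordinates: [site.longitude, site.latitude] },
    properties: {
      nas_id: site.nas_id,
      status: site.status,
      active_sessions: site.active_sessions,
      color: MARKER_COLOR[site.status],
    },
  };
}

// Wraps all sites in a FeatureCollection that a GeoJSON layer can consume.
function toFeatureCollection(sites: readonly CpeSite[]) {
  return { type: "FeatureCollection" as const, features: sites.map(toSiteFeature) };
}
```

Typing the color table as `Record<CpeStatus, string>` makes the compiler flag any status added to the enum without a matching color.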

Configuration

env
# Frontend (web/.env)
VITE_NOC_WEBSOCKET_URL=wss://api-rgz.duckdns.org/ws/noc/updates
VITE_GRAFANA_EMBED_URL=https://grafana-rgz.duckdns.org
VITE_PROMETHEUS_URL=http://prometheus:9090  # Internal
VITE_KIBANA_URL=https://kibana-rgz.duckdns.org
VITE_MAP_CENTER_LAT=6.4969  # Cotonou
VITE_MAP_CENTER_LON=2.6289
VITE_MAP_ZOOM=7

# Backend (app/config.py)
NOC_WEBSOCKET_ENABLED=true
NOC_WEBSOCKET_HEARTBEAT_INTERVAL=30  # seconds
NOC_POLLING_INTERVAL_METRICS=30  # seconds
NOC_POLLING_INTERVAL_SESSIONS=5  # seconds
NOC_POLLING_INTERVAL_ALERTS=10  # seconds
ALERT_SEVERITY_P0_TTL=3600  # 1h before auto-close
ALERT_SEVERITY_P1_TTL=7200  # 2h
ALERT_SEVERITY_P2_TTL=14400  # 4h
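
The TTL and interval values above are expressed in seconds, while browser timers and react-query's `refetchInterval` work in milliseconds. The following sketch shows how the values might be applied; `isPastTtl` and `secondsToMs` are hypothetical helpers, and measuring the TTL from the alert's `timestamp` field is an assumption.

```typescript
type Severity = "P0" | "P1" | "P2";

// TTLs in seconds, copied from the ALERT_SEVERITY_*_TTL settings.
const ALERT_TTL_SECONDS: Record<Severity, number> = { P0: 3600, P1: 7200, P2: 14400 };

// Polling intervals in seconds, copied from the NOC_POLLING_INTERVAL_* settings.
const POLLING_SECONDS = { metrics: 30, sessions: 5, alerts: 10 };

// Converts a config value in seconds to the milliseconds timers expect.
function secondsToMs(seconds: number): number {
  return seconds * 1000;
}

// True once an alert has been open longer than the TTL for its severity.
// Assumption: age is measured from the alert's `timestamp` field.
function isPastTtl(severity: Severity, openedAtIso: string, now: Date): boolean {
  const ageMs = now.getTime() - Date.parse(openedAtIso);
  return ageMs > secondsToMs(ALERT_TTL_SECONDS[severity]);
}
```

For example, `secondsToMs(POLLING_SECONDS.sessions)` yields the 5000 ms value a sessions query would poll with.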

API Endpoints

| Method | Route | Response |
|---|---|---|
| GET | /api/v1/sla/current | {probes: [{target, type, latency_ms, success}]} |
| GET | /api/v1/monitoring/cpe?limit=100&status= | {items: [{nas_id, uptime%, rssi, sessions, cpu%, ram%}], total} |
| GET | /api/v1/monitoring/alerts?severity=&status=open | {items: [{alert_id, severity, type, title, affected_subscribers}], total_by_severity} |
| PUT | /api/v1/monitoring/alerts/{alert_id}/acknowledge | {status: acknowledged, acknowledged_by, timestamp} |
| POST | /api/v1/monitoring/alerts/{alert_id}/assign | {assigned_to: user_id, status: assigned} |
| GET | /api/v1/monitoring/sessions/live?limit=50 | {items: [{session_id, subscriber_ref, nas_id, duration_sec, bytes_in, bytes_out}], total_active} |
| WS | /ws/noc/updates | WebSocket stream: {type, timestamp, data} |
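
Every frame on `/ws/noc/updates` carries a `{type, timestamp, data}` envelope, so the client can validate it before acting on it. The sketch below is one way to do that; `NocEnvelope` and `parseNocEnvelope` are hypothetical names, and `data` is left as `unknown` because the per-type payload shape is not specified here.

```typescript
// The four message types listed for the NOC stream.
type NocMessageType = "alert" | "session" | "cpe_status" | "metric_spike";

interface NocEnvelope {
  type: NocMessageType;
  timestamp: string;
  data: unknown; // payload shape per type is not specified in this document
}

const NOC_MESSAGE_TYPES: readonly string[] = ["alert", "session", "cpe_status", "metric_spike"];

// Parses a raw WebSocket frame. Returns null for malformed JSON, a missing
// timestamp, or an unknown type, so callers can ignore bad frames safely.
function parseNocEnvelope(raw: string): NocEnvelope | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch (_err) {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;

  const candidate = value as Record<string, unknown>;
  if (typeof candidate.type !== "string") return null;
  if (NOC_MESSAGE_TYPES.indexOf(candidate.type) === -1) return null;
  if (typeof candidate.timestamp !== "string") return null;

  return {
    type: candidate.type as NocMessageType,
    timestamp: candidate.timestamp,
    data: candidate.data,
  };
}
```

Returning `null` instead of throwing keeps a single malformed frame from tearing down the message handler.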

React Components

typescript
// web/src/pages/NocDashboard.tsx

import React, { useEffect, useState } from 'react';
import { useAuth } from '@/hooks/useAuth';
import { useQuery, useQueryClient } from '@tanstack/react-query';
import useWebSocket from '@/hooks/useWebSocket';
import AlertTable from '@/components/Tables/AlertTable';
import CpeMap from '@/components/Maps/CpeMap';
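import SlaStatusGrid from '@/components/Cards/SlaStatusGrid';
// AlertSummaryRow is used below but was never imported; this path is an assumption
import AlertSummaryRow from '@/components/Cards/AlertSummaryRow';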
import SessionsPanel from '@/components/Panels/SessionsPanel';
import api from '@/services/api';

export default function NocDashboard() {
  const { user } = useAuth();
  const queryClient = useQueryClient();
  const [darkMode, setDarkMode] = useState(true);

  // WebSocket real-time updates
  const { data: wsMessage } = useWebSocket(
    import.meta.env.VITE_NOC_WEBSOCKET_URL,  // set in web/.env
    {
      onOpen: () => console.log('NOC WS connected'),
      onError: (e) => console.error('WS error:', e),
      onClose: () => console.log('WS disconnected'),
    }
  );

  // Process WebSocket updates
  useEffect(() => {
    if (!wsMessage) return;

    const { type, data } = wsMessage;
    switch (type) {
      case 'alert':
        // Invalidate alerts cache + show toast notification
        queryClient.invalidateQueries({ queryKey: ['monitoring', 'alerts'] });
        break;
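      case 'session':
        // Refresh the live sessions list. The event payload shape is not
        // specified, so refetch rather than patch the cache in place.
        queryClient.invalidateQueries({ queryKey: ['monitoring', 'sessions', 'live'] });
        break;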
      case 'cpe_status':
        // Update CPE status on map
        queryClient.invalidateQueries({ queryKey: ['monitoring', 'cpe'] });
        break;
      case 'metric_spike':
        // Show banner alert
        break;
    }
  }, [wsMessage, queryClient]);

  // Queries
  const { data: slaCurrent } = useQuery({
    queryKey: ['sla', 'current'],
    queryFn: () => api.get('/api/v1/sla/current'),
    refetchInterval: 30000,  // 30s polling
  });

  const { data: cpeMonitoring } = useQuery({
    queryKey: ['monitoring', 'cpe'],
    queryFn: () => api.get('/api/v1/monitoring/cpe?limit=100'),
    refetchInterval: 30000,
  });

  const { data: alertsActive } = useQuery({
    queryKey: ['monitoring', 'alerts'],
    queryFn: () => api.get('/api/v1/monitoring/alerts?status=open'),
    refetchInterval: 10000,  // 10s for HIGH/P0 priority
  });

  const { data: sessionsLive } = useQuery({
    queryKey: ['monitoring', 'sessions', 'live'],
    queryFn: () => api.get('/api/v1/monitoring/sessions/live?limit=50'),
    refetchInterval: 5000,  // 5s for active sessions
  });

  return (
    <div className={`${darkMode ? 'dark' : ''} bg-slate-900 text-slate-100 min-h-screen`}>
      {/* Header */}
      <header className="bg-slate-800 border-b border-slate-700 p-6">
        <div className="flex justify-between items-center">
          <h1 className="text-3xl font-bold">Dashboard NOC</h1>
          <div className="flex gap-4">
            <button className="px-4 py-2 bg-access-yellow text-slate-900 rounded-lg hover:bg-opacity-90">
              Incidents P0
            </button>
            <span className="text-sm text-slate-400">
              {sessionsLive?.data?.total_active?.toLocaleString()} sessions actives
            </span>
          </div>
        </div>
      </header>

      {/* Main Grid */}
      <div className="grid grid-cols-1 lg:grid-cols-4 gap-6 p-6">
        {/* Left Column: Alerts + SLA */}
        <div className="lg:col-span-1 space-y-6">
          {/* Alert Summary */}
          <div className="bg-slate-800 rounded-lg p-6 border border-slate-700">
            <h2 className="text-lg font-bold mb-4">Alertes Actives</h2>
            <div className="space-y-2">
              {alertsActive?.data?.total_by_severity && (
                <>
                  <AlertSummaryRow label="P0 CRITIQUE" count={alertsActive.data.total_by_severity.P0} color="red" />
                  <AlertSummaryRow label="P1 URGENT" count={alertsActive.data.total_by_severity.P1} color="yellow" />
                  <AlertSummaryRow label="P2 IMPORTANT" count={alertsActive.data.total_by_severity.P2} color="blue" />
                </>
              )}
            </div>
          </div>

          {/* SLA Status */}
          <SlaStatusGrid probes={slaCurrent?.data?.probes || []} />
        </div>

        {/* Center: Map + Alerts Table */}
        <div className="lg:col-span-2 space-y-6">
          {/* CPE Map */}
          <div className="bg-slate-800 rounded-lg p-6 border border-slate-700 h-96">
            <h2 className="text-lg font-bold mb-4">22 Sites</h2>
            <CpeMap data={cpeMonitoring?.data?.items || []} />
          </div>

          {/* Active Alerts */}
          <div className="bg-slate-800 rounded-lg p-6 border border-slate-700">
            <h2 className="text-lg font-bold mb-4">Incidents Ouverts</h2>
            <AlertTable
              alerts={alertsActive?.data?.items || []}
              onAcknowledge={(id) => api.put(`/api/v1/monitoring/alerts/${id}/acknowledge`)}
              onAssign={(id, userId) => api.post(`/api/v1/monitoring/alerts/${id}/assign`, { assigned_to: userId })}
            />
          </div>
        </div>

        {/* Right Column: Live Sessions */}
        <div className="lg:col-span-1">
          <div className="bg-slate-800 rounded-lg p-6 border border-slate-700 sticky top-6 max-h-96 overflow-y-auto">
            <h2 className="text-lg font-bold mb-4">Sessions RADIUS</h2>
            <SessionsPanel sessions={sessionsLive?.data?.items || []} />
          </div>
        </div>
      </div>

      {/* Embedded Grafana Dashboards */}
      <div className="p-6 space-y-6">
        <div className="grid grid-cols-1 lg:grid-cols-2 gap-6">
          <iframe
            title="Grafana: Network Metrics"
            src="https://grafana-rgz.duckdns.org/d/network_metrics/network-metrics?refresh=30s"
            className="w-full h-80 rounded-lg border border-slate-700"
          />
          <iframe
            title="Grafana: CPE Heatmap"
            src="https://grafana-rgz.duckdns.org/d/cpe_heatmap/cpe-heatmap?refresh=30s"
            className="w-full h-80 rounded-lg border border-slate-700"
          />
        </div>
      </div>
    </div>
  );
}
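
The switch in the `useEffect` above can also be expressed as a lookup table, which makes the routing testable without mounting the component. This is a sketch under the assumption that every cache-backed event is handled by invalidating its query; `KEYS_BY_MESSAGE` and `queryKeyFor` are hypothetical names.

```typescript
type NocMessageType = "alert" | "session" | "cpe_status" | "metric_spike";
type NocQueryKey = readonly string[];

// Query keys match those passed to useQuery in NocDashboard.
const KEYS_BY_MESSAGE: Record<NocMessageType, NocQueryKey | null> = {
  alert: ["monitoring", "alerts"],
  session: ["monitoring", "sessions", "live"],
  cpe_status: ["monitoring", "cpe"],
  metric_spike: null, // shown as a banner; there is no cache entry to refresh
};

// Returns the query key to invalidate for a message type, or null if none.
function queryKeyFor(type: NocMessageType): NocQueryKey | null {
  return KEYS_BY_MESSAGE[type];
}

// Inside the component, the effect body would then reduce to:
//   const key = queryKeyFor(wsMessage.type);
//   if (key) queryClient.invalidateQueries({ queryKey: key });
```

Because the table is typed as a `Record` over `NocMessageType`, adding a new message type without deciding its cache behavior becomes a compile error.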

Useful Commands

bash
# Test the WebSocket connection
wscat -c wss://api-rgz.duckdns.org/ws/noc/updates \
  -H "Authorization: Bearer ${JWT_TOKEN}"

# Trigger a test P0 alert
docker exec rgz-api python -c "
from app.services.incident import trigger_alert
trigger_alert(severity='P0', type='site_down', resource='access_test', title='TEST ALERT')
"

# Check active RADIUS sessions
docker exec rgz-db psql -U rgz -d rgz -c "
SELECT COUNT(*) as active_sessions
FROM radius_sessions
WHERE session_stop IS NULL;
"

# List open alerts
docker exec rgz-db psql -U rgz -d rgz -c "
SELECT alert_id, severity, type, title, timestamp
FROM alerts
WHERE status = 'open'
ORDER BY severity ASC;  -- P0 first: 'P0' < 'P1' < 'P2' as text
"

# Fetch a Grafana dashboard definition
curl -H "Authorization: Bearer ${GRAFANA_API_TOKEN}" \
  https://grafana-rgz.duckdns.org/api/dashboards/uid/network_metrics

Implementation TODO

  • [ ] Create the NocDashboard.tsx React component with WebSocket integration
  • [ ] Implement the WebSocket endpoint: app/api/v1/endpoints/monitoring.py::websocket_noc_updates()
  • [ ] Add the alert broadcaster: AlertManager → WebSocket → all NOC clients
  • [ ] Create endpoint GET /api/v1/sla/current (Core status + latency)
  • [ ] Create endpoint GET /api/v1/monitoring/cpe (CPE health, RSSI, sessions)
  • [ ] Create endpoint GET /api/v1/monitoring/alerts (active incidents with priority)
  • [ ] Implement the Leaflet map: GeoJSON reseller_sites, cluster markers, click details
  • [ ] Create the Tailwind dark mode theme (bg-slate-900, accent colors)
  • [ ] Add the Grafana dashboard iframes (network metrics, CPE heatmap)
  • [ ] Tests: WebSocket connection, real-time message delivery, alert routing

Last updated: 2026-02-21

MOSAÏQUE PROJECT: 81 tools, 22 containers, 500+ WiFi Zone resellers