#80 — canary-test-system
PLANNED
Priority: 🟠 HIGH · Type: TYPE C · Container: rgz-beat · Code: app/tasks/canary.py
Dependencies: #10 rgz-beat
Description
A synthetic probe system that simulates a real subscriber every 5 minutes. It exercises the full end-to-end user flow: DNS sinkhole, API health, RADIUS authentication, captive portal, and data transfer. The goal is to catch regressions before real subscribers hit them.
Unlike the SLA probes (#43), which test the availability of individual components, the canary test validates end-to-end integration. If a hypothetical subscriber cannot connect, the canary detects it within one 5-minute cycle.
The tests use a dedicated fictitious subscriber (subscriber_ref=RGZ-CANARY), created at system initialization, so they have no impact on real statistics.
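The probe cycle described here can be sketched as a small runner that times each check and records one result per test. This is a minimal sketch, not the contents of app/tasks/canary.py: `CheckResult` and `run_checks` are hypothetical names, the result fields mirror the canary_results columns, and it assumes every check runs even when an earlier one fails.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CheckResult:
    # Field names mirror the canary_results columns.
    test_name: str
    success: bool
    latency_ms: int
    error_msg: Optional[str] = None


def run_checks(checks: Dict[str, Callable[[], None]]) -> List[CheckResult]:
    """Run each check in insertion order; a check signals failure by raising."""
    results = []
    for name, check in checks.items():
        start = time.monotonic()
        try:
            check()
            success, error = True, None
        except Exception as exc:
            # Assumption: record the failure and keep running the other checks.
            success, error = False, str(exc)
        latency_ms = int((time.monotonic() - start) * 1000)
        results.append(CheckResult(name, success, latency_ms, error))
    return results


def _failing_check() -> None:
    raise RuntimeError("Access-Reject")


# Stand-in checks; real ones would call nslookup, radtest, iperf3, and so on.
demo_results = run_checks({"api_health": lambda: None, "radius_auth": _failing_check})
```

Recording a failure instead of aborting keeps the per-test breakdown complete, which is what the status endpoint exposes.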
Internal architecture
Celery Beat → rgz.canary.run → every 5 min → queue=rgz.monitoring
↓
Test 1: DNS Sinkhole
nslookup test.example.com [rgz-dns] → must return the portal IP
↓
Test 2: API Health
GET http://rgz-api:8000/health → 200 in <100 ms
↓
Test 3: RADIUS Auth
radtest RGZ-CANARY canary_password rgz-radius:1812 0 $RADIUS_SECRET
→ Access-Accept expected
↓
Test 4: Portal Endpoint
GET https://access-rgz.duckdns.org/ → 200 in <2 s
↓
Test 5: Data Transfer (mini)
iperf3 -c [iperf_server] -t 5 → DL > 1 Mbps
↓
Results → canary_results table (TimescaleDB)
↓
If 2 consecutive failures (CANARY_FAIL_THRESHOLD) → P1 alert → AlertManager → SMS to NOC
Table canary_results
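The alerting rule in the architecture (a P1 alert after CANARY_FAIL_THRESHOLD consecutive failures) reduces to counting the failure streak at the end of the run history stored in canary_results. The sketch below is illustrative only: `consecutive_fails` and `should_alert` are hypothetical helpers, and it assumes a run counts as failed when any of its tests failed and that outcomes are ordered oldest first.

```python
from typing import Sequence


def consecutive_fails(run_outcomes: Sequence[bool]) -> int:
    """Count failed runs at the tail of the history (True = run succeeded)."""
    count = 0
    for ok in reversed(run_outcomes):
        if ok:
            break
        count += 1
    return count


def should_alert(run_outcomes: Sequence[bool], threshold: int = 2) -> bool:
    """Return True once the failure streak reaches the threshold."""
    return consecutive_fails(run_outcomes) >= threshold


print(should_alert([True, False, False]))  # streak of 2 reaches the threshold
print(should_alert([False, True, False]))  # streak of 1 does not
```

With `>=`, the condition stays true on every run while the outage lasts, so this sketch relies on the alerting layer to deduplicate repeated notifications.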
CREATE TABLE canary_results (
id UUID NOT NULL DEFAULT gen_random_uuid(),
test_name TEXT NOT NULL,
success BOOLEAN NOT NULL,
latency_ms INT,
error_msg TEXT,
tested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- TimescaleDB requires the partitioning column in every unique index
PRIMARY KEY (id, tested_at)
);
SELECT create_hypertable('canary_results', 'tested_at');
Canary subscriber (one-time init)
# Created by scripts/ops/init.sh
CANARY_SUBSCRIBER = {
"subscriber_ref": "RGZ-CANARY",
"msisdn": "+22900000000",
"full_name": "CANARY TEST",
"status": "active",
}
Configuration
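The CANARY_* variables in this section could be read into a typed settings object at task start-up. This is a sketch under assumptions: `CanaryConfig` and `load_canary_config` are hypothetical names, and the defaults simply mirror the sample values listed here. The password and iperf server have no default, so a missing value fails fast with a KeyError.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CanaryConfig:
    radius_user: str
    radius_password: str
    iperf_server: str
    iperf_min_mbps: float
    fail_threshold: int
    api_timeout_ms: int


def load_canary_config(env: Mapping[str, str] = os.environ) -> CanaryConfig:
    """Build the config from environment variables (defaults are assumptions)."""
    return CanaryConfig(
        radius_user=env.get("CANARY_RADIUS_USER", "RGZ-CANARY"),
        radius_password=env["CANARY_RADIUS_PASSWORD"],  # required, no default
        iperf_server=env["CANARY_IPERF_SERVER"],  # required, no default
        iperf_min_mbps=float(env.get("CANARY_IPERF_MIN_MBPS", "1")),
        fail_threshold=int(env.get("CANARY_FAIL_THRESHOLD", "2")),
        api_timeout_ms=int(env.get("CANARY_API_TIMEOUT_MS", "2000")),
    )


# Passing a plain dict instead of os.environ keeps the loader easy to test.
cfg = load_canary_config(
    {"CANARY_RADIUS_PASSWORD": "example", "CANARY_IPERF_SERVER": "10.0.0.1"}
)
```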
CANARY_RADIUS_USER=RGZ-CANARY
CANARY_RADIUS_PASSWORD=canary_secure_password_here
CANARY_IPERF_SERVER=10.0.0.1
CANARY_IPERF_MIN_MBPS=1
# Alert after N consecutive failures
CANARY_FAIL_THRESHOLD=2
CANARY_API_TIMEOUT_MS=2000
CELERY_BROKER_URL=redis://:password@rgz-redis:6379/0
Celery Task (app/celery_app.py)
# Entry in Celery's beat_schedule mapping (requires: from datetime import timedelta)
'rgz.canary.run': {
'task': 'rgz.canary.run',
'schedule': timedelta(minutes=5),
'options': {'queue': 'rgz.monitoring'},
},
API Endpoints
| Method | Route | Response | Auth |
|---|---|---|---|
| GET | /api/v1/canary/status | {tests:[], last_run, consecutive_fails, overall_status} | JWT |
| GET | /api/v1/canary/history?hours=24 | {items, total} | JWT |
| POST | /api/v1/canary/run | 202 {task_id} | Admin JWT |
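A consumer of the status endpoint, such as a dashboard or an on-call script, mostly needs the names of the failing tests. The sketch below works on an already-decoded JSON payload and uses only the documented fields (tests, last_run, consecutive_fails). `failing_tests` and `summarize` are hypothetical helpers, and the sample payload is illustrative, not real output.

```python
from typing import Any, Dict, List


def failing_tests(status: Dict[str, Any]) -> List[str]:
    """Return the names of tests whose last result was a failure."""
    return [t["name"] for t in status.get("tests", []) if not t["success"]]


def summarize(status: Dict[str, Any]) -> str:
    """One-line summary suitable for a log line or a chat notification."""
    failed = failing_tests(status)
    if not failed:
        return f"canary OK at {status['last_run']}"
    return (
        f"canary FAILING at {status['last_run']}: {', '.join(failed)} "
        f"(consecutive_fails={status['consecutive_fails']})"
    )


# Illustrative payload shaped like the status response.
example_status = {
    "last_run": "2026-02-21T10:25:00Z",
    "consecutive_fails": 1,
    "tests": [
        {"name": "dns_sinkhole", "success": True, "latency_ms": 8},
        {"name": "radius_auth", "success": False, "latency_ms": 120},
    ],
}
print(summarize(example_status))
```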
GET /api/v1/canary/status
{
"last_run": "2026-02-21T10:25:00Z",
"consecutive_fails": 0,
"overall_status": "healthy",
"tests": [
{"name": "dns_sinkhole", "success": true, "latency_ms": 8},
{"name": "api_health", "success": true, "latency_ms": 45},
{"name": "radius_auth", "success": true, "latency_ms": 120},
{"name": "portal_access", "success": true, "latency_ms": 340},
{"name": "data_transfer", "success": true, "latency_ms": 5200}
]
}
Useful commands
# Run the canary manually
docker exec rgz-beat celery -A app.celery_app call rgz.canary.run
# Check status via the API
curl -H "Authorization: Bearer $JWT" \
https://api-rgz.duckdns.org/api/v1/canary/status
# 24-hour history
curl -H "Authorization: Bearer $JWT" \
"https://api-rgz.duckdns.org/api/v1/canary/history?hours=24"
# Success/failure stats per test over the last 24 hours
docker exec rgz-db psql -U postgres -c \
"SELECT test_name,
COUNT(*) AS total,
SUM(CASE WHEN success THEN 1 ELSE 0 END) AS success_count,
ROUND(100.0*SUM(CASE WHEN success THEN 1 ELSE 0 END)/COUNT(*), 2) AS success_pct
FROM canary_results
WHERE tested_at > NOW() - INTERVAL '24 hours'
GROUP BY test_name;"
Implementation TODO
- [ ] app/tasks/canary.py: 5 synthetic tests
- [ ] CANARY subscriber created in scripts/ops/init.sh
- [ ] TimescaleDB hypertable canary_results
- [ ] Detection of N consecutive failures → P1 alert
- [ ] Endpoints GET /api/v1/canary/status + history
- [ ] Grafana dashboard "Canary Tests" (uptime of the 5 tests)
- [ ] Unit tests (mock RADIUS, mock iperf3)
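For the unit-test item, one way to mock RADIUS is to patch `subprocess.run` so that no real radtest binary or server is needed. This is a sketch under assumptions: `radius_auth_ok` is a hypothetical wrapper, not the project's code, and the canned stdout strings only approximate radtest output. The argument order follows radtest's usage (user, password, server[:port], nas-port-number, secret).

```python
import subprocess
from unittest import mock


def radius_auth_ok(user: str, password: str, server: str, secret: str,
                   timeout_s: float = 10.0) -> bool:
    """Run radtest and report whether the server answered Access-Accept."""
    proc = subprocess.run(
        # radtest user password server[:port] nas-port-number secret
        ["radtest", user, password, server, "0", secret],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return "Access-Accept" in proc.stdout


def test_radius_auth_accept_and_reject() -> None:
    # Canned results standing in for a real radtest invocation.
    accept = subprocess.CompletedProcess(
        args=[], returncode=0, stdout="Received Access-Accept Id 12\n", stderr="")
    reject = subprocess.CompletedProcess(
        args=[], returncode=1, stdout="Received Access-Reject Id 13\n", stderr="")
    with mock.patch("subprocess.run", side_effect=[accept, reject]) as fake_run:
        assert radius_auth_ok("RGZ-CANARY", "pw", "rgz-radius:1812", "s3cret") is True
        assert radius_auth_ok("RGZ-CANARY", "bad", "rgz-radius:1812", "s3cret") is False
    # The wrapper should have invoked radtest twice with the canary user.
    assert fake_run.call_count == 2
    first_cmd = fake_run.call_args_list[0].args[0]
    assert first_cmd[:2] == ["radtest", "RGZ-CANARY"]


test_radius_auth_accept_and_reject()
```

The same patching approach would apply to the iperf3 check, with canned stdout in place of a real transfer.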
Last updated: 2026-02-21