NOC DASHBOARD
P1 Incidents
1
4 total open
▲ 1 new this hour
Active Alerts
18
3 crit · 6 high · 9 warn
▲ +4 this hour
Services Impact
3
14 total / 11 OK
▲ +1 degraded
Hosts Up
142
147 total · 5 issues
▸ 96.6% avail
SLO Health
2
SLOs breached of 14
▲ 1 new breach
Capacity Alerts
2
stor-nas · log-elk
▲ growing
Metrics/sec
48k
99.8% drop-free
▼ -3% load
// SERVICE HEALTH — STATUS PROPAGATION
PROPAGATION: app-prod-02 (DOWN) → Customer API Gateway (DEGRADED) · 14.2k users affected ⌚ 51 min ago · INC-2847 P1
PROPAGATION: k8s-node-02 (DEGRADED) + k8s-node-03 (DOWN) → Search & Indexing (PARTIAL OUTAGE) · Search stale 15min ⌚ 2h 18m ago · INC-2846 P2
TIER 1 · MISSION CRITICAL · Revenue/safety-critical; any degradation triggers immediate response
🚨 Response: 5 min · 📣 Notify: SMS + Voice + Slack · 📉 SLO: 99.99%
1 DEGRADED · 2 OK
💳
Payment Platform
@payments-team
OPERATIONAL
UPTIME 30D
99.97%
SLO TARGET
99.99%
// RESOURCES
db-prod-01 db-prod-02 Billing API Stripe WH
30-day 99.97%
🌐
Customer API Gateway
@platform-team
DEGRADED
UPTIME 30D
99.72%
ERROR RATE
8.4%
// RESOURCES
web-prod-01 web-prod-02 app-prod-02 fw-edge-01
30-day 99.72% ⚠
🗄
Database Cluster
@db-team
OPERATIONAL
UPTIME 30D
99.98%
SLO TARGET
99.99%
// RESOURCES
db-prod-01 db-prod-02
30-day 99.98%
TIER 2 · BUSINESS CRITICAL · Core business ops; significant user/revenue impact
⏱ Response: 15 min · 📣 Notify: Slack + Email + SMS · 📉 SLO: 99.9%
1 PARTIAL OUTAGE · 3 OK
🏠
Customer Portal
@frontend-team
OK
UPTIME
100.0%
MEMBERS
2/2 OK
🔑
Auth Service
@security-team
OK
UPTIME
99.99%
SLO
✓ MET
🔍
Search & Indexing
@data-team
PARTIAL
UPTIME
98.10%
MEMBERS
2/3 ↓
📧
Email / Notifs
@platform-team
OK
UPTIME
99.85%
SLO
✓ MET
// ACTIVE ALERTS & OPEN INCIDENTS
Active Alerts
18 FIRING
CRIT
app-prod-02 unreachable — ICMP & TCP timeout
HOST: app-prod-02 → INC-2847 P1
0:51
CRIT
BGP Session Down: core-rt-01 → ISP-A (Cogent AS174)
DEVICE: core-rt-01 → INC-2845 P2
0:23
HIGH
SSL Certificate Expiring: api.prod.internal (6 days)
HOST: web-prod-01 · RULE: cert-expiry-7d
6:00
HIGH
K8s Node Memory Pressure: k8s-node-02 at 94%
HOST: k8s-node-02 · RULE: memory-pressure
1:05
WARN
High Disk I/O: db-prod-01 — io_wait 87%
HOST: db-prod-01 · RULE: io-wait-high
2:13
WARN
NAS Storage Warning: stor-nas-01 pool at 91.2%
HOST: stor-nas-01 · RULE: disk-cap-90
4:22
Open Incidents
4 OPEN
P1 — Customer API Degradation · Elevated 5xx
INC-2847 · @platform-sre · OPEN 51m
14:31
P2 — Elasticsearch Split Brain · Search Degraded
INC-2846 · @data-team · OPEN 2h 18m
12:14
P2 — BGP Failover · ISP-A Link Down
INC-2845 · @netops · ACK 23m
14:09
P3 — Log Aggregation Backlog · ELK 8-12m delay
INC-2844 · @infra-team · OPEN 3h 10m
11:22
// HOST STATUS & NETWORK TOPOLOGY
Network Topology
🔌
core-rt-01
🛡
fw-edge-01
⚠️
sw-dist-01
AWS us-east-1
🔀
sw-acc-02
🖥
web-cluster
🖥
app-prod-02 ✕
🗄
db-cluster
WAN IN
820 Mbps
WAN OUT
315 Mbps
LATENCY
4.2 ms
PKT LOSS
0.01%
Host Status
▲ 142 UP · ▼ 5 DOWN
HOST         IP         STATUS  ROLE     CPU  MEM  DISK
web-prod-01  10.0.1.11  UP      web      32%  71%  45%
db-prod-01   10.0.2.10  WARN    db       78%  85%  38%
app-prod-02  10.0.1.22  DOWN    app      UNREACHABLE (metrics unavailable)
k8s-node-02  10.0.3.22  WARN    k8s      62%  94%  29%
stor-nas-01  10.0.4.5   WARN    storage  12%  48%  91%
core-rt-01   10.0.0.1   WARN    router   72%  31%  18%
// METRICS & BUSINESS KPI
Host Metrics — db-prod-01
CPU
78%
8-core · 6.2 used
MEM
85%
64GB · 54.4 used
DISK
40%
2TB · 800GB used
IO WAIT
88%
HIGH · ALERT FIRING
CPU 2H TREND
Business KPI
ORDERS / HR
4,280
▲ +8.3% vs 1h avg
API ERROR RATE
8.4%
⚠ above 0.5% target
P99 API LATENCY
312ms
▲ target 250ms
REVENUE / MIN
$18.4k
▲ on target
// SLA / SLO STATUS & LIVE LOGS
SLA / SLO Status
2 BREACHED
Customer API GW
Target: 99.9%
99.72%
Payment Platform
Target: 99.99%
99.97%
Search & Indexing
Target: 99.5%
98.10%
Auth Service
Target: 99.99%
99.99%
Database Cluster
Target: 99.99%
99.98%
Customer Portal
Target: 99.9%
100.0%
30-DAY WINDOW · MTTR avg: 42min · MTBF avg: 18.4d · 2 of 14 SLOs BREACHED
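The percentages above are easier to compare when converted into minutes of downtime. The sketch below assumes the figures are time-based availability over the 30-day window shown; the dashboard does not state how Flux Monitor actually derives them, and the function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Minutes of downtime implied by an availability percentage over the window."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return (100.0 - availability_pct) / 100.0 * window_days * MINUTES_PER_DAY


def budget_consumed(uptime_pct: float, target_pct: float, window_days: int = 30) -> float:
    """Fraction of the error budget used; a value above 1.0 means the budget is spent."""
    allowed = downtime_minutes(target_pct, window_days)
    if allowed == 0.0:
        raise ValueError("a 100% target leaves no error budget to divide by")
    return downtime_minutes(uptime_pct, window_days) / allowed


if __name__ == "__main__":
    # Customer API GW row from the panel above: target 99.9%, measured 99.72%.
    print(f"allowed:  {downtime_minutes(99.9):.1f} min")
    print(f"observed: {downtime_minutes(99.72):.1f} min")
    print(f"consumed: {budget_consumed(99.72, 99.9):.1f}x budget")
```

Under these assumptions a 99.9% target allows roughly 43 minutes of downtime in 30 days, so the 99.72% measured for Customer API GW corresponds to about 2.8 times that budget.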
Live Log Stream
● LIVE
14:31:02  db-prod-01   mysqld   Slow query: 4821ms - SELECT * FROM orders WHERE ...
14:31:05  web-prod-01  nginx    GET /api/v2/items 200 12ms - 10.0.5.44
14:31:07  app-prod-02  systemd  ERROR: Service app-backend.service entered failed state
14:31:09  core-rt-01   bgpd     BGP peer 203.0.113.1 state: Established → Active
14:31:12  k8s-node-03  kubelet  Pod flux-collector-7d4f9 started successfully
14:31:14  k8s-node-02  kubelet  Memory pressure: evicting pod search-worker-7x2
14:31:16  fw-edge-01   ipsec    VPN tunnel vpn-dc-02 established - 10.255.1.1
14:31:18  collector    snmp     Polled 142 interfaces - 0 errors - queue: 14
// CAPACITY FORECAST & INTEGRATIONS
Capacity Forecast
stor-nas-01 /data
~18 days
Log ELK Storage
~22 days
db-prod-01 /var
~47 days
k8s-node-03 RAM
~90 days
Metrics TSDB
>6 months
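A days-until-full estimate like the ones above can be produced by simple linear extrapolation. The sketch below is one such approach and is not necessarily what Flux Monitor does; `days_until_full` is a hypothetical helper, and the growth rate in the example is an illustrative assumption because the dashboard does not report one.

```python
def days_until_full(used_pct: float, growth_pct_per_day: float) -> float:
    """Linear estimate of days until a resource reaches 100% utilisation.

    Returns infinity when usage is flat or shrinking.
    """
    if not 0.0 <= used_pct <= 100.0:
        raise ValueError("used_pct must be between 0 and 100")
    if growth_pct_per_day <= 0.0:
        return float("inf")
    return (100.0 - used_pct) / growth_pct_per_day


if __name__ == "__main__":
    # 91.2% is the stor-nas-01 pool level from the active alert;
    # 0.5 percentage points per day is an assumed growth rate.
    print(f"{days_until_full(91.2, 0.5):.1f} days")
```

A linear model ignores bursts and retention jobs, so an estimate of this kind is better treated as a rough ordering of which resource fills first than as a deadline.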
Integrations
6 CONNECTED
📧
Email SMTP
● connected
💬
Slack
● connected
🔷
MS Teams
◑ no webhook
🚨
PagerDuty
● connected
🎟
ServiceNow
○ disabled
🔵
Jira Cloud
● connected
📱
AWS SNS/SMS
● connected
🔗
Flux Notify
● bridged
FLUX NOTIFY BRIDGE · ● connected · notifications → flux-notify:4004