Flux Suite — Enterprise NOC Platform

// OUR APPLICATIONS

THE FLUX SUITE

Five purpose-built applications that work independently or together as a unified NOC platform. Each ships as a Docker-ready service with shared auth, cross-app integrations, and a unified cyberpunk UI.

📡

FLUX NOTIFY

Incident Notification System

Thread-based incident lifecycle management with multi-channel delivery. Coordinate your entire NOC response from initial alert to post-mortem, with a tamper-evident audit trail built for compliance teams.

Thread-based incident tracking — one thread, full lifecycle
5 message types: Initial, Update, Resolved, Bridge, One-Time
Email (SMTP), SMS (AWS SNS), Slack, Teams, Voice, Bridge
Async queue worker with exponential backoff retry
Quick-compose presets — dispatch common incidents in two clicks
Preview & approve flow before any delivery fires
War room bridge dial-in + Zoom links (2 configurable bridges)
Incident Commander assignment and acknowledgment tracking
Escalation policies — auto-notify on SLA breach
Runbook links auto-surfaced by service pattern match
Real-time WebSocket dashboard push (no polling)
SHA-256 hash-chain audit log — tamper-evident and verifiable
TOTP 2FA, 4-tier RBAC, CSRF tokens, bcrypt passwords
Inbound webhooks: Datadog, Grafana, CloudWatch, Splunk, Prometheus
REST API with JWT, static tokens, and scoped API keys
Per-thread war room chat, post-mortem notes

📊

FLUX MONITOR

Infrastructure Monitoring

Continuous health visibility across your entire infrastructure stack. Real-time host, service, container, and network dashboards — organized by criticality tier with SLO tracking, APM, and full-screen NOC wallboard views.

NOC Wallboard — full-screen status display for ops centers
4 service tiers: Mission Critical, Business Critical, Operational, Dev/Lab
Host & server metrics: CPU, memory, disk I/O, network throughput
Container monitoring via Docker API — health, restarts, resource usage
Network monitoring — switches, routers, firewalls, latency
CMDB / Inventory — full asset register with relationship mapping
APM & distributed tracing — latency, error rate, throughput per service
Synthetic monitoring — uptime probes, SSL cert expiry tracking
Log ingestion — structured and unstructured log search
Configurable threshold alert rules with auto-escalation
SLA / SLO tracking — availability targets with breach reporting
On-call routing and schedule management
Capacity planning — trend forecasting and resource headroom
Business KPI dashboards — connect infra health to business metrics
Certificate & secrets expiry tracker with advance warnings
Change management log — track infra changes alongside incidents
Integration with Flux Notify for automated alert dispatch

⚡

FLUX EVENT

Event Orchestration Engine

Intelligent event collection, deduplication, and correlation at scale. Transform a flood of raw system events into a small set of actionable incidents — with automated escalation chains, suppression windows, and webhook outbound delivery.

Multi-source ingestion: SNMP, syslog, API webhooks, Kafka streams
Smart deduplication — fingerprint-based event grouping
Flap detection — suppresses noisy on/off cycling alerts
Event signatures — regex + field-based pattern matching rules
Rule-based correlation engine — group events into incidents
Escalation chains — tiered L1→L2→L3 with configurable delays
Suppression rules — silence known noise windows
Maintenance windows — scheduled suppression with calendar control
Outbound webhook delivery to any HTTP endpoint
Redis-backed delivery queue with retry dashboard
Contact subscriptions — per-user event category filtering
Notification log — full delivery history per event per recipient
Event timeline and root-cause correlation visualization
SLA tracking with breach notifications and MTTR reporting
Auto-incident creation in Flux Notify on correlation match
Full audit log with complete event processing trace
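
The deduplication and flap-detection features above can be illustrated with a minimal sketch. It assumes an event is a plain dictionary and that the fingerprint is a SHA-256 digest over a few identifying fields; the field names, thresholds, and class names are hypothetical and are not taken from Flux Event itself.

```python
import hashlib
from collections import deque

# Hypothetical identifying fields; the real fingerprint inputs are not documented here.
FINGERPRINT_FIELDS = ("source", "host", "check")


def fingerprint(event: dict) -> str:
    """Stable SHA-256 digest over the identifying fields of an event."""
    key = "|".join(str(event.get(field, "")) for field in FINGERPRINT_FIELDS)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


class Deduplicator:
    """Groups events by fingerprint and counts repeats."""

    def __init__(self):
        self.counts = {}

    def ingest(self, event: dict) -> bool:
        """Return True for the first event of a group, False for a duplicate."""
        fp = fingerprint(event)
        self.counts[fp] = self.counts.get(fp, 0) + 1
        return self.counts[fp] == 1


class FlapDetector:
    """Flags a fingerprint whose state changes too often inside a time window."""

    def __init__(self, max_changes: int = 4, window_seconds: float = 300):
        self.max_changes = max_changes          # assumed default
        self.window_seconds = window_seconds    # assumed default
        self.last_state = {}
        self.changes = {}

    def observe(self, fp: str, state: str, ts: float) -> bool:
        """Record a state at time ts; return True if fp is currently flapping."""
        changes = self.changes.setdefault(fp, deque())
        if fp in self.last_state and self.last_state[fp] != state:
            changes.append(ts)
        self.last_state[fp] = state
        # Drop state changes that have fallen out of the window.
        while changes and ts - changes[0] > self.window_seconds:
            changes.popleft()
        return len(changes) >= self.max_changes
```

Two events that differ only in free-text fields share a fingerprint and collapse into one group, while a check that toggles up and down repeatedly inside the window is reported as flapping.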

🚀

FLUX SPEED

Network Speedtest & Diagnostics

On-demand and scheduled network performance testing, diagnostics, and historical trending — purpose-built for NOC environments. Flux Speed measures real-world bandwidth, latency, jitter, and packet loss from any point in your infrastructure, and stores every result for trend analysis and SLA reporting.

Download, upload, and bidirectional throughput testing against configurable targets
Latency, jitter, and packet loss measurement per test run
Scheduled test jobs — run automatically on intervals (1min → 24h)
Multi-node testing — deploy lightweight agents to test from any network segment
Geographic path tracing — hop-by-hop route analysis with latency per hop
DNS resolution timing — detect slow or misconfigured resolvers
TCP/UDP port reachability probes — verify connectivity to critical services
Historical result storage — full time-series database of every test
Trend dashboards — visualize performance over hours, days, and weeks
Threshold alerting — trigger Flux Notify on speed drop or latency spike
ISP and circuit comparison — benchmark multiple WAN links side by side
Baseline auto-calculation — learn normal performance, flag deviations
Exportable reports — PDF and CSV for carrier SLA dispute evidence
REST API for triggering tests and retrieving results programmatically
RBAC — control who can run tests, view history, and manage agents
Integration with Flux Monitor — enrich host records with link performance data
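
As a rough illustration of the per-run measurements listed above, the sketch below derives average latency, jitter, and packet loss from a series of probe round-trip times. It assumes jitter is the mean absolute difference between consecutive received samples, which is one common definition and not necessarily the one Flux Speed uses; the function name is hypothetical.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe results; rtts_ms holds round-trip times in ms, None for a lost probe."""
    received = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0
    latency_ms = mean(received) if received else None
    # Assumed jitter definition: mean absolute delta between consecutive received samples.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(deltas) if deltas else 0.0
    return {"latency_ms": latency_ms, "jitter_ms": jitter_ms, "loss_pct": loss_pct}
```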

🤖

FLUX AI

Intelligence & Automation Engine

The intelligent automation layer for your NOC. Flux AI ingests every alert, metric, and event across the entire Flux Suite — autonomously triaging incidents, detecting anomalies, predicting failures before they cause outages, and slashing mean time to resolve with AI-powered runbook suggestions and post-mortem drafts.

Natural language incident summarization — instant readable briefings for every alert
Automated triage — AI-assigned severity, service, and priority in under 15 seconds
Anomaly detection — ML-based baselining with 14-day rolling window per metric
Predictive alerting — surface leading indicators 10–30 min before outages occur
Runbook auto-suggestion — match active incidents to historical resolution patterns
Root cause analysis — AI-guided investigation with cross-app causal chain tracing
Alert noise reduction — suppress 85%+ of low-signal storms intelligently
On-call chatbot — natural language queries over live incident data (Ollama + Claude)
Post-mortem drafting — auto-generate structured drafts from thread timeline + audit log
Capacity forecasting — predictive resource modeling from Flux Monitor metric trends
Automated remediation — safe self-healing actions with human approval gates
Cross-app intelligence — unified learning from Notify, Monitor, Event & Speed

// PLATFORM CAPABILITIES

ENTERPRISE FEATURES

Every capability built for production NOC environments from day one — no bolt-ons required, no workarounds.

📡

Incident Management

Thread-Based Lifecycle

One incident = one thread, managed through Initial → Update → Resolved. Message history is automatically included in every Update and Resolved notification, so stakeholders always have full context without digging.

5 message types with distinct per-channel formatting
Incident Commander assignment + acknowledgment with notes
Per-thread war room chat for real-time collaboration
Mid-incident severity change with full audit entry
Post-mortem documentation stored on the thread
Pin critical incidents to the top of the dashboard

📬

Multi-Channel Delivery

6 Channels, Per-Recipient Tracking

Reach your team everywhere. Delivery status tracked per recipient per channel with full error logging and retry history. Channels are independently toggleable per message.

Email via SMTP + PHPMailer (TLS/SSL, custom from address)
SMS via AWS SNS — global reach, transactional mode
Slack — incoming webhook with rich block formatting
Microsoft Teams — Adaptive Card format
Voice — automated dial-out notifications
Bridge — war room dial-in with PIN + Zoom link (2 bridges)

⚙️

Queue Architecture

Async Worker + Retry Pipeline

All notifications are processed asynchronously by a dedicated PHP worker daemon, which decouples the UI from delivery latency and provides resilient retry handling with per-attempt error logging.

Configurable max retry attempts (default: 3)
Exponential backoff between attempts (default base delay: 60s)
Dead-letter tracking — failed messages always visible
Worker heartbeat — dashboard shows worker liveness
Scheduled delivery — queue messages for future send time
Draft → Queued → Sent / Failed state machine
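
The retry behavior and state machine above can be sketched as follows. This is an illustration in Python, not the PHP worker's code. It assumes the delay doubles after each retry (the growth factor is not stated in the source) and that the retry limit counts total delivery attempts.

```python
def backoff_delay(retry_number: int, base_delay: float = 60.0, factor: float = 2.0) -> float:
    """Seconds to wait before retry number `retry_number` (1-based).

    base_delay mirrors the documented 60s default; factor=2.0 is an assumption.
    """
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    return base_delay * factor ** (retry_number - 1)


def next_state(delivered: bool, attempts_made: int, max_attempts: int = 3) -> str:
    """State of a queued message after a delivery attempt."""
    if delivered:
        return "sent"
    # Requeue while attempts remain; otherwise the message is marked failed
    # and stays visible for dead-letter tracking.
    return "queued" if attempts_made < max_attempts else "failed"
```

Under these assumptions the waits before the first three retries are 60, 120, and 240 seconds.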

🔌

Inbound Integrations

6 Monitoring Platform Parsers

Flux Notify accepts inbound webhooks through six parsers (five monitoring platforms plus a generic format) and auto-creates incident threads. When a recovery event arrives, auto-resolution closes the matching thread with no operator clicks required.

Datadog — maps alert_type to P1–P4, extracts service from tags
Grafana — parses evalMatches metrics, maps alert state
AWS CloudWatch — unwraps SNS envelope, reads alarm details
Splunk — parses search result name and severity field
Prometheus / AlertManager — reads labels and annotations
Generic — title, brief, service, severity, details fields
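
To make the parsing step concrete, here is a hedged sketch of how an Alertmanager webhook body could be normalized into the thread fields named above. The `alerts`, `status`, `labels`, and `annotations` keys are part of Alertmanager's webhook payload; the severity mapping, the choice of the `service` or `job` label, and the function name are assumptions for illustration only.

```python
# Assumed mapping from a `severity` label to Flux Notify priorities.
SEVERITY_MAP = {"critical": "P1", "error": "P2", "warning": "P3", "info": "P4"}


def normalize_alertmanager(payload: dict) -> list:
    """Turn an Alertmanager webhook payload into a list of thread field dicts."""
    threads = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        threads.append({
            "title": labels.get("alertname", "unknown alert"),
            "service": labels.get("service") or labels.get("job", "unknown"),
            "severity": SEVERITY_MAP.get(labels.get("severity", ""), "P4"),
            "brief": annotations.get("summary", ""),
            "details": annotations.get("description", ""),
            # A resolved alert closes the matching thread instead of opening one.
            "action": "resolve" if alert.get("status") == "resolved" else "create",
        })
    return threads
```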

📋

Templates & Presets

Fully Customizable Messaging

Database-backed message templates per type and channel with dynamic placeholder substitution. Quick-compose presets let operators dispatch common incident types in two clicks.

{{service}} {{brief}} {{details}} {{history}}
{{bridge_info}} {{update_frequency}}
Per-channel: HTML email, plain SMS, Slack blocks, Teams cards
Fallback chain: type+channel → type+all → built-in default
Notification presets with pre-filled severity, channels, groups
Preview and approve before any message is dispatched
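
The fallback chain and placeholder substitution above can be sketched in a few lines. The in-memory dictionary stands in for the database-backed template store, and the built-in default text is invented for the example; only the `{{name}}` placeholder syntax and the lookup order come from the feature list.

```python
import re

# Hypothetical built-in default used when no stored template matches.
BUILTIN_DEFAULT = "[{{service}}] {{brief}}"

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def resolve_template(templates: dict, msg_type: str, channel: str) -> str:
    """Look up type+channel, then type+all, then fall back to the built-in default."""
    for key in ((msg_type, channel), (msg_type, "all")):
        if key in templates:
            return templates[key]
    return BUILTIN_DEFAULT


def render(template: str, values: dict) -> str:
    """Replace {{name}} placeholders; unknown names are left untouched."""
    return PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), m.group(0))), template)
```

Leaving unknown placeholders in place is a design choice made for this sketch: a typo in a template then stays visible in the preview step instead of silently disappearing.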

⚡

Real-Time Updates

WebSocket Push Architecture

A dedicated Node.js WebSocket server broadcasts refresh events the moment a message is sent or a thread is updated. The dashboard updates in near real time, with no polling required.

Node.js WS server on :8081, ws:// protocol
Broadcasts refresh event to all connected sessions simultaneously
Dashboard auto-reloads stats, incident list, activity feed
30-second SSE polling fallback if WebSocket unavailable
Live UTC clock updated every second across all views
Global keyboard shortcuts: Ctrl+N compose, Ctrl+K search, D dashboard

🚨

Escalation & SLAs

Automated Escalation Policies

Condition: no_ack — not acknowledged within threshold
Condition: still_open — not resolved within threshold
Action: notify_group — send to recipient group
Action: notify_channel — post to Slack or Teams
Multi-tier chains: L1→L2→L3 with per-tier delays
MTTR and MTTA tracked and reported per severity level
Anomaly banners — flags statistical spikes (e.g. 300% increase)
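
The two escalation conditions and two actions listed above could be evaluated along these lines. The dataclasses and the minutes-based clock are hypothetical simplifications; the condition and action names are the ones given in the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    condition: str       # "no_ack" or "still_open"
    threshold_min: float
    action: str          # "notify_group" or "notify_channel"
    target: str


@dataclass
class Incident:
    opened_at: float                     # minutes on an arbitrary clock
    acked_at: Optional[float] = None
    resolved_at: Optional[float] = None


def due_actions(incident: Incident, policies: list, now: float) -> list:
    """Return (action, target) pairs whose condition holds at time `now`."""
    age = now - incident.opened_at
    due = []
    for p in policies:
        if age < p.threshold_min or incident.resolved_at is not None:
            continue
        if p.condition == "no_ack" and incident.acked_at is None:
            due.append((p.action, p.target))
        elif p.condition == "still_open":
            due.append((p.action, p.target))
    return due
```

A multi-tier chain is then a list of policies with increasing thresholds, one per tier.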

📖

Runbooks

Context-Aware Runbook Links

Pattern-matched to incident service name (partial or full)
Links surface automatically on the thread page
Title, URL, and description per runbook entry
Multiple runbooks per service pattern supported
Managed in Settings → Runbooks (Admin only)
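
A minimal sketch of the pattern match described above, assuming a case-insensitive substring comparison between each runbook's pattern and the incident's service name. The actual matching rules are not specified here, and the dictionary keys are illustrative.

```python
def match_runbooks(service: str, runbooks: list) -> list:
    """Return every runbook whose pattern occurs in the service name (partial or full)."""
    name = service.lower()
    return [rb for rb in runbooks if rb["pattern"].lower() in name]
```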

📊

Reporting

NOC Analytics Dashboard

Messages by type — Initial, Update, Resolved, Bridge breakdown
Channel delivery rates — success / failure per channel
Incidents by severity — MTTR, longest resolution, open counts
Anomaly banners — statistical spike detection on dashboard
Recent admin activity preview with audit log link
CSV export for all report datasets

🚀

Deployment & Ops

Production-Ready from Day One

Windows .bat + Mac/Linux .sh startup scripts
AWS EC2/EKS, Ubuntu, Debian, RHEL supported
Kubernetes via Helm chart — EKS, AKS, GKE, on-prem
Backup: ./scripts/backup.sh (DB dump + storage)
Restore: ./scripts/restore.sh [backup-name]
Health check: ./scripts/healthcheck.sh
/healthz /readyz /metrics /status.json system endpoints

☸️

KUBERNETES

Scalability & Fault Tolerance

Kubernetes-Native Deployment

Flux Suite is designed to scale horizontally across Kubernetes clusters. Each application component runs as an independent Deployment with its own replica set, resource limits, and health probes — enabling zero-downtime rolling updates, automatic pod recovery, and horizontal autoscaling under load.

Horizontal Pod Autoscaler (HPA) scales web and worker pods under load
Pod disruption budgets ensure availability during node maintenance
Liveness & readiness probes via /healthz and /readyz endpoints
Rolling update strategy — zero-downtime deployments across all services
Kubernetes Services expose web (ClusterIP + Ingress) and WebSocket (NodePort)
ConfigMaps & Secrets for environment-specific configuration management
PersistentVolumeClaims (PVC) replace bind mounts — works with any storage class
Namespace isolation — deploy multiple Flux Suite instances per cluster
Compatible with EKS (AWS), AKS (Azure), GKE (Google Cloud), and on-prem K8s 1.24+
Helm chart packages all manifests — deploy or upgrade with a single command

🚀

FLUX SPEED

Network Performance & Diagnostics

Speedtest & Network Observatory

Flux Speed delivers on-demand and scheduled network performance testing across your entire infrastructure. Measure real-world bandwidth, latency, jitter, and packet loss from any node — and store every result for trend analysis, SLA evidence, and carrier dispute documentation.

Download, upload, and bidirectional throughput testing against configurable targets
Latency, jitter, and packet loss measured per test run with millisecond precision
Scheduled test jobs — configurable intervals from 1 minute to 24 hours
Multi-node agent support — test from any network segment simultaneously
Hop-by-hop path tracing with per-hop latency and packet loss breakdown
DNS resolution timing — detect slow or misconfigured resolvers instantly
TCP/UDP port reachability probes — verify connectivity to critical services
ISP and circuit comparison — benchmark multiple WAN links side by side
Automatic baseline learning — flag deviations from normal performance
Threshold alerts → Flux Notify on speed drop or latency spike
Historical time-series dashboards — hours, days, weeks, months
PDF and CSV report export for carrier SLA dispute evidence

// FLUX AI — INTELLIGENCE & AUTOMATION ENGINE

⚡

Automated Triage

AI Severity & Priority Scoring

P1–P4 severity assigned in under 15 seconds per alert
Natural language incident summary generated for every event
Confidence score with supporting evidence cited
Service and team auto-identified from alert payload context
Historical pattern matching against 90-day incident library
Triage accuracy: 94.2% vs. manual assignment baseline

💬

NOC Chatbot

Natural Language NOC Interface

Ask questions in plain English over live incident data
Dual-provider: Ollama (local, fast) + Claude API (deep reasoning)
Context window includes open incidents, anomalies & runbooks
Multi-turn conversation — follow-up questions retain full context
Query MTTR, SLA status, service health, and on-call assignments
Initiate post-mortem drafting directly from the chat interface

📡

Anomaly Detection

ML-Based Metric Baselining

14-day rolling baseline per metric — auto-updates nightly
Z-score, Isolation Forest, and seasonal decomposition (STL)
Cross-metric correlation catches cascading failure patterns
True positive rate: 94.2% — false positive rate: 1.8%
847+ metrics monitored across all Flux Suite applications
Anomaly signals auto-linked to triage when threshold exceeded
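
Of the three methods named above, the z-score is the simplest to show. The sketch below scores a new sample against a baseline window supplied by the caller (for example, the last 14 days of values). The threshold of 3.0 is an assumed default, not a documented Flux AI setting.

```python
from statistics import mean, stdev


def zscore_anomaly(history, value, threshold=3.0):
    """Return (z, is_anomaly) for `value` against the baseline samples in `history`."""
    if len(history) < 2:
        return 0.0, False            # not enough data to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is anomalous.
        return (0.0, False) if value == mu else (float("inf"), True)
    z = (value - mu) / sigma
    return z, abs(z) >= threshold
```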

🔍

Root Cause Analysis

Cross-App Causal Chain Tracing

Correlates events from Notify, Monitor, Event, and Speed simultaneously
Identifies deploy, config change, and maintenance window correlations
Evidence-scored findings — each conclusion cites supporting data points
RCA routed to Claude API for maximum analytical depth
86% root cause accuracy on first analysis — improves with feedback
Accepted root causes retrain the model automatically overnight

📝

Post-Mortem Automation

AI-Drafted Incident Reports

Auto-generated on incident resolution — draft ready in seconds
Pulls timeline from Flux Notify thread + Flux Event audit log
Structured sections: summary, impact, timeline, root cause, actions
Revenue impact estimation from affected service + duration
Action items with AI-suggested assignees and due dates
Export to PDF — human review always required before publish

🔇

Noise Reduction

85%+ Alert Suppression Rate

Maintenance window suppression — zero alerts during planned work
Flap detection — suppress recurring alerts after configurable threshold
Duplicate correlation — child alerts linked and suppressed under P1 parent
Transient suppression — self-healing alerts under 30s not paged
Staging environment filter — production NOC view stays clean
All suppression decisions logged with full audit trail for review

🔮

Predictive Alerting

Pre-Outage Early Warning System

Surfaces leading indicators 10–30 minutes before failure thresholds are crossed
DB connection pool at 85%+ → predicts exhaustion outage 8 min out
Cache hit rate dropping >15%/hr → predicts origin overload 12 min out
Disk free rate of change >2%/hr → predicts write failure 22 min out
187 predictive alerts fired last 30 days — 81% prevented an outage
Configurable prediction horizon: 5 min to 4 hours per indicator

📈

Capacity Forecasting

30-Day Resource Outlook

CPU, memory, disk, and bandwidth forecast per host, service, and cluster
Projected breach dates with confidence ranges and growth rate models
Alert horizon configurable per resource — default 14 days before breach
Memory leak detection via long-term trend slope analysis
Ingest rate modeling for databases — predict disk full before it happens
Cost-aware recommendations — scale vs. archive vs. optimize analysis
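
A projected breach date can be estimated in many ways; the sketch below uses an ordinary least-squares line through recent usage samples, chosen for brevity and not because it is Flux AI's documented model. The function name and the (day, usage) sample format are assumptions.

```python
def days_until_breach(samples, limit):
    """Days after the last sample until a linear trend reaches `limit`.

    samples: list of (day, usage) pairs in chronological order.
    Returns None when there is no upward trend to extrapolate.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = [day for day, _ in samples]
    ys = [usage for _, usage in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    breach_day = (limit - intercept) / slope
    return max(0.0, breach_day - xs[-1])
```

Comparing the result with the alert horizon (14 days by default, per the list above) decides whether a capacity warning should be raised.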

🔧

Auto-Remediation

Safe Self-Healing with Approval Gates

Kubernetes: pod restart, deployment rollout, horizontal scale, rollback
Caching: Redis flush, CDN purge, application cache clear
Database: kill long queries, pool reset, read replica failover
Every action logged in audit trail with full rollback capability
Human approval required by default — auto-execute after 30+ clean approvals
Never auto-executes destructive or irreversible actions regardless of setting
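
The approval rules above reduce to a small decision function. In this sketch the `destructive` flag and the per-action approval counter are hypothetical fields; which actions count as destructive is not specified in the list.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    destructive: bool          # hypothetical flag set per action type
    clean_approvals: int = 0   # prior approvals that completed without issue


def requires_human_approval(action: Action,
                            auto_execute_enabled: bool = False,
                            min_clean_approvals: int = 30) -> bool:
    """Apply the approval gate: destructive actions always need a human."""
    if action.destructive:
        return True
    if not auto_execute_enabled:
        return True            # human approval is the default
    return action.clean_approvals < min_clean_approvals
```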

🔗

CROSS-APP

Unified Intelligence Layer

Cross-App Intelligence — The Full Platform Picture

Flux AI is the only application in the Flux Suite with read access to all other apps' data stores simultaneously. By joining incident threads from Flux Notify, metric baselines from Flux Monitor, event correlation chains from Flux Event, and network performance data from Flux Speed, Flux AI builds a complete operational picture that no single app could produce alone. Root cause analyses that would take an engineer an hour to piece together manually are available in seconds.

Reads incident threads, MTTR data, and on-call assignments from Flux Notify
Reads service metrics, topology maps, and SLA status from Flux Monitor
Reads event correlation chains and pipeline stage data from Flux Event
Reads network throughput baselines and traceroute history from Flux Speed
Read-only access — Flux AI never writes to other apps' databases directly
Cross-app actions (e.g. create incident in Flux Notify) go through official REST APIs
Unified anomaly scoring: network degradation from Flux Speed can trigger triage in Flux Notify
Post-mortems automatically pull timeline data from all four apps for complete coverage

🤖

FLUX AI

Dual-Provider AI Architecture

Ollama + Claude API — Speed & Depth

Flux AI uses a smart dual-provider routing strategy: latency-critical operations like triage and chatbot responses run on your local Ollama instance (sub-500ms), while complex multi-step reasoning tasks like root cause analysis and post-mortem drafting route to the Anthropic Claude API for maximum analytical depth. The result is a NOC intelligence layer that is both fast enough for real-time operations and deep enough for incident investigation — at a fraction of the cost of API-only approaches.

Intelligent routing engine assigns every task to the optimal provider automatically
Ollama runs locally — zero data leaves your network for triage and chat
Claude API (claude-sonnet-4-6) handles RCA, post-mortems, and complex analysis
Custom routing rules — override defaults per task type or severity level
Automatic failover — if Ollama is unreachable, falls back to Claude API
Provider metrics dashboard — uptime, latency, request counts, cost tracking
97% of requests served by local Ollama — saves ~$1,200/month vs API-only
Runbook suggestions, capacity forecasting, and anomaly scoring all on-device
Predictive alerting models trained locally — never sends metric data to cloud
Full cross-app intelligence: reads Notify, Monitor, Event, and Speed data
RBAC-gated: AI provider config accessible to Admins only
Kubernetes-ready — Ollama runs as a GPU-enabled pod alongside other services
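
The routing behavior described above (local by default, Claude API for deep analysis, overrides per task, failover when Ollama is down) can be sketched as a single function. The task keys and provider labels are hypothetical names chosen for the example.

```python
# Task types the text assigns to the Claude API; everything else stays local.
CLOUD_TASKS = {"rca", "postmortem"}


def choose_provider(task: str, ollama_up: bool = True, overrides: dict = None) -> str:
    """Pick "ollama" or "claude" for a task, honoring overrides and failover."""
    overrides = overrides or {}
    preferred = overrides.get(task) or ("claude" if task in CLOUD_TASKS else "ollama")
    if preferred == "ollama" and not ollama_up:
        return "claude"        # automatic failover when the local provider is unreachable
    return preferred
```

For example, `choose_provider("triage")` stays local, while `choose_provider("triage", overrides={"triage": "claude"})` applies a custom routing rule.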

// PLATFORM DESIGN

SYSTEM ARCHITECTURE

Five Docker containers on a shared bridge network. Every service is health-checked, volume-persisted, and cross-platform deployable in a single command.

// DOCKER CONTAINERS

Container

flux-web

php:8.2-apache

:4004 → :80

Web application, REST API, inbound webhook parsers. DocumentRoot: /public only. Health check: curl /healthz.

Container

flux-db

mariadb:10.11

:3306 internal

Data persistence. 21 tables, InnoDB, utf8mb4. Health-checked via mysqladmin ping. Never exposed externally.

Container

flux-worker

php:8.2-cli

internal only

Background queue processor, delivery retries, escalation policy checks, worker heartbeat updates.

Container

flux-ws

node:18-alpine

:8081

Node.js WebSocket server. Broadcasts refresh events to all connected browser sessions on every message send.

Container

flux-adminer

adminer:latest

:6006 → :8080

Database management GUI. Direct MariaDB access for admins, schema inspection, and debugging queries.

// STACK LAYERS

// FRONTEND LAYER (flux-web · PHP 8.2 · :4004)

PHP 8.2 / Apache

Orbitron · Rajdhani · Share Tech Mono UI

WebSocket Client (JS auto-refresh)

Dark / Light Theme (localStorage)

Keyboard Shortcuts (Ctrl+N, Ctrl+K)

Live UTC Clock (1s interval)

Scanline + Grid Background FX

Scroll-triggered Fade-in Animations

Adminer DB GUI (:6006)

// SERVICES LAYER (background processes)

Node.js WebSocket Server (:8081)

PHP Queue Worker Daemon

Escalation Policy Engine

Async Message Delivery Pipeline

PHPMailer (SMTP / TLS)

AWS SNS SDK (SMS)

Slack Incoming Webhook Sender

Teams Adaptive Card Sender

Worker Heartbeat Monitor

System Health Probe (8 components)

SSE Fallback (30s poll)

// DATA LAYER (flux-db · MariaDB 10.11 · :3306)

MariaDB 10.11 · InnoDB

utf8mb4 Full Unicode Encoding

21 Tables · Full FK Constraints

Optimized Indexes Per Query Pattern

./data/db Volume Persistence

./scripts/init.sql Schema Bootstrap

./scripts/seed.sql Demo Data

SHA-256 Hash-Chain Audit Log

setup_passwords.php First-Run Init

./data/storage File Uploads / Exports

// END-TO-END MESSAGE DATA FLOW

①

INBOUND

Monitoring webhook or operator compose triggers event

②

NORMALIZE

Parse payload → service, severity, title, details extracted

③

COMPOSE

Select channels, groups, update frequency, preview & approve

④

PERSIST

INSERT threads + messages table — status: queued or scheduled

⑤

EXPAND

group_ids → group_members → recipient email + phone roster

⑥

TEMPLATE

Match template by type+channel, replace {{service}} {{brief}} {{history}}

⑦

DELIVER

Route to Email / SMS / Slack / Teams / Voice per channel

⑧

LOG

delivery_log per recipient per channel — status, error, attempts

⑨

BROADCAST

WebSocket pushes refresh event to all connected browser clients

⑩

AUDIT

SHA-256 hash chain entry written — tamper-evident, verifiable
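
Step ⑤ (EXPAND) in the flow above can be illustrated with a short sketch. It assumes group membership and recipient records are available as dictionaries and that a recipient who belongs to several selected groups should be notified once; both the data shapes and the de-duplication are assumptions.

```python
def expand_recipients(group_ids, group_members, recipients):
    """Resolve group_ids -> member ids -> recipient records, without duplicates."""
    seen = set()
    roster = []
    for group_id in group_ids:
        for recipient_id in group_members.get(group_id, []):
            if recipient_id not in seen:
                seen.add(recipient_id)
                roster.append(recipients[recipient_id])
    return roster
```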

// NETWORK & VOLUME DESIGN

Docker Network

flux-network — Bridge Mode

All five containers share a single Docker bridge network. Inter-container calls use service names as hostnames: db, ws-server, worker. Only three ports are host-exposed: 4004 (web), 8081 (WebSocket), 6006 (Adminer). MariaDB port 3306 is strictly internal — never reachable from outside the Docker network.

Persistent Volumes

Bind Mounts — Data Survives Restarts

Three bind mounts keep data outside containers. ./data/db holds the MariaDB data directory and survives docker compose down. ./data/storage holds web file uploads and CSV exports. ./scripts/init.sql and seed.sql are entrypoint-mounted for first-run schema creation and Star Wars demo data seeding.

// SECURITY DEFENSE LAYERS

🌐

Network Isolation

Docker bridge — only ports 4004, 8081, 6006 exposed. MariaDB never reachable externally.

ACTIVE

📁

Apache DocumentRoot Restriction

Only /public is served. PHP source, config files, and credentials are never web-accessible.

ACTIVE

🍪

PHP Sessions — HTTP-Only Cookies

SameSite cookies. Session ID regenerated on every login. Configurable timeout (default 4h).

ACTIVE

🛡

CSRF Token Validation

Per-form CSRF tokens validated on every state-changing POST request across all endpoints.

ACTIVE

👤

RBAC — 4 Role Levels

Superadmin, Admin, Operator, Viewer. Enforced at both page level and API route level on every request.

RBAC

📱

TOTP Two-Factor Authentication

Per-user TOTP 2FA with QR code enrollment and backup codes. Admin-enforced by default.

2FA
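
For readers unfamiliar with how TOTP codes are derived, the sketch below implements the standard RFC 6238 algorithm (HMAC-SHA1, 30-second step) using only the Python standard library. It illustrates the standard itself and is not Flux Notify's PHP implementation.

```python
import base64
import hashlib
import hmac
import struct


def totp(secret_b32: str, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for a Base32-encoded secret."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int(unix_time) // step
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte selects the offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

With the RFC 6238 test secret (the ASCII string `12345678901234567890`, Base32-encoded) and a timestamp of 59 seconds, the 8-digit code is `94287082`, which matches the published test vector.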

🔑

Bcrypt Password Hashing — Cost 10

All passwords and API tokens bcrypt-hashed at rest. Zero plaintext secrets in the database.

ACTIVE

🔗

SHA-256 Hash-Chain Audit Log

Every entry hashes its content + previous entry hash. Tampering breaks the chain — detectable instantly via the in-app verifier tool.

AUDIT
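
The hash-chain idea described above can be demonstrated in a few lines. This sketch assumes each entry is serialized as canonical JSON and concatenated with the previous entry's hash before hashing; the serialization, field names, and genesis value are illustrative and not taken from the Flux Notify schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(content: dict, prev_hash: str) -> str:
    """SHA-256 over the entry content plus the previous entry's hash."""
    payload = json.dumps(content, sort_keys=True, separators=(",", ":")) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list, content: dict) -> None:
    """Append an entry that is chained to the current tail of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"content": content, "prev_hash": prev, "hash": entry_hash(content, prev)})


def verify(log: list):
    """Return the index of the first broken entry, or None if the chain is intact."""
    prev = GENESIS
    for index, entry in enumerate(log):
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(entry["content"], prev):
            return index
        prev = entry["hash"]
    return None
```

Editing any stored entry changes its recomputed hash, so `verify` reports that entry's index; an attacker would have to rewrite every later hash as well to hide the change.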

🔐

API Auth — 3 Independent Methods

Authorization: Bearer <JWT> RS256, 24h expiry · X-API-Token bcrypt-verified · Scoped API keys with expiry and per-key permissions.

API

🚫

SQL Injection Prevention

All database queries use PDO prepared statements throughout the codebase. No string-interpolated SQL.

ACTIVE

⚡

Login Rate Limiting

Configurable rate limiting on authentication endpoints. Failed attempts are logged in the audit trail with IP and user-agent.

ACTIVE

// DEVELOPER INTERFACE

REST API

Complete REST API with three authentication methods, full CRUD operations, inbound webhook parsers for six monitoring platforms, Flux Speed network diagnostics endpoints, Flux AI intelligence endpoints, and Prometheus-format metrics.

Method	Endpoint	Description	Auth
Authentication
POST	/api/auth/login	Authenticate user credentials, returns JWT token	—
Incidents
GET	/api/incidents	List incidents — filterable by severity, status, service keyword	Bearer / Token
POST	/api/incidents	Create new incident thread with service, severity, title, and details	Bearer / Token
POST	/api/incidents/{id}/acknowledge	Acknowledge an open incident with optional acknowledgment note	Bearer / Token
Messages & Recipients
POST	/api/messages	Send notification on existing thread — queued for async delivery	Bearer / Token
GET	/api/recipients	List all recipients with group membership details	Bearer / Token
GET	/api/groups	List recipient groups with member counts	Bearer / Token
GET	/api/templates	List message templates filtered by type and channel	Bearer / Token
GET	/api/reports	Delivery stats, MTTR by severity, SLA compliance data	Admin
System
POST	/api/system-test	Probe all 8 components: DB, WebSocket, SMTP, Slack, Teams, AWS SNS, Twilio, Worker	Admin
GET	/healthz	Liveness check — returns HTTP 200 if web container is running	—
GET	/readyz	Readiness check — returns 200 only if DB connection and worker heartbeat are healthy	—
GET	/metrics	Prometheus-format metrics — queue depth, delivery counts, error rates, worker status	Token
GET	/status.json	Public-facing service health summary in JSON	—
Inbound Webhook Parsers — Monitoring Platform Integrations
POST	/api/inbound/datadog	Datadog alert webhook → auto-create or auto-resolve incident thread via alert_type + service tags	X-API-Key
POST	/api/inbound/grafana	Grafana alert → evalMatches parsed, alert state mapped to P1–P4 severity	X-API-Key
POST	/api/inbound/cloudwatch	AWS CloudWatch alarm via SNS envelope — alarm name, state, affected dimensions extracted	X-API-Key
POST	/api/inbound/splunk	Splunk saved search alert — result name, count, and severity field parsed	X-API-Key
POST	/api/inbound/prometheus	Prometheus AlertManager — labels and annotations mapped to thread fields, auto-resolves when the alert status is resolved	X-API-Key
POST	/api/inbound	Generic webhook — accepts title, brief, service, severity, details fields from any source	X-API-Key
Flux Speed — Network Tests & Diagnostics
POST	/api/speed/test	Trigger an on-demand speedtest — specify target, protocol (TCP/UDP), and test duration	Bearer / Token
GET	/api/speed/results	List test results — filterable by node, target, date range, and test type	Bearer / Token
GET	/api/speed/results/{id}	Get full detail for a single test run including per-second throughput samples	Bearer / Token
POST	/api/speed/trace	Run a traceroute/MTR path analysis to a target — returns hop list with latency and loss per hop	Bearer / Token
POST	/api/speed/probe	TCP/UDP port reachability probe — returns open/closed status and connection time in ms	Bearer / Token
POST	/api/speed/dns	DNS resolution timing test — measures lookup time across configured resolvers	Bearer / Token
GET	/api/speed/nodes	List registered test agents with status, location, last-seen, and capability flags	Bearer / Token
GET	/api/speed/schedules	List scheduled test jobs with interval, target, node assignment, and last result	Bearer / Token
POST	/api/speed/schedules	Create a new scheduled test job — define target, interval, thresholds, and alert routing	Admin
GET	/api/speed/report	Generate summary report — avg/P95 throughput and latency over a date range, per node or target	Bearer / Token
Flux AI — Intelligence & Automation
POST	/api/ai/triage	Triage an alert payload — returns AI-assigned severity, natural language summary, runbook matches, and confidence score	Bearer / Token
GET	/api/ai/triage/feed	List recent auto-triage results with AI summaries, severity assignments, and matched runbooks	Bearer / Token
POST	/api/ai/chat	Send a natural language message to the NOC chatbot — returns AI response with live incident context	Bearer / Token
GET	/api/ai/anomalies	List active anomaly signals with deviation score, baseline delta, detection method, and linked incidents	Bearer / Token
POST	/api/ai/rca	Initiate root cause analysis for an incident ID — async job, returns job ID for polling	Bearer / Token
GET	/api/ai/rca/{job_id}	Poll RCA job status and retrieve causal chain findings with evidence scores when complete	Bearer / Token
POST	/api/ai/postmortem	Generate post-mortem draft for an incident ID — routed to Claude API, returns structured sections	Bearer / Token
GET	/api/ai/runbooks/suggest	Get top runbook suggestions for an incident ID — returns up to 3 matches with confidence percentages	Bearer / Token
GET	/api/ai/capacity	Get capacity forecast — projected breach dates per resource (CPU, disk, mem, bandwidth) with confidence ranges	Bearer / Token
GET	/api/ai/providers	List AI provider status, uptime, request counts, avg latency, and cost metrics for Ollama and Claude API	Admin

// API AUTHENTICATION — 3 INDEPENDENT METHODS

JWT BEARER

Authorization: Bearer <token>

RS256-signed JSON Web Token. 24-hour expiry. Returned by POST /api/auth/login. Best for interactive sessions and user-context API calls.

STATIC TOKEN

X-API-Token: <token>

Static bcrypt-hashed token with no expiry. Managed in Settings → Users. Best for server-to-server integrations and long-lived automation scripts.

SCOPED KEY

X-API-Key: <scoped-key>

Scoped API keys with optional expiry and per-key permission grants. Used by all inbound webhook parsers. Rotate independently without affecting other integrations.
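
As a quick illustration of the three header formats above, the sketch below builds (but does not send) an authenticated request using the Python standard library. The base URL assumes the default web port mapping (4004) on localhost, and the use of a JSON request body is an assumption; the header names are the ones documented above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4004"  # assumption: default flux-web port on the local host


def auth_headers(method: str, credential: str) -> dict:
    """Return the HTTP header for one of the three authentication methods."""
    if method == "jwt":
        return {"Authorization": f"Bearer {credential}"}
    if method == "static":
        return {"X-API-Token": credential}
    if method == "scoped":
        return {"X-API-Key": credential}
    raise ValueError(f"unknown auth method: {method}")


def build_request(path: str, method: str, credential: str,
                  body: dict = None, http_method: str = "GET") -> urllib.request.Request:
    """Build an authenticated request object; the caller decides when to send it."""
    headers = auth_headers(method, credential)
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"  # assumed request format
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=http_method)
```

A request built this way can be passed to `urllib.request.urlopen` once a running instance and a valid credential are available.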

// ECOSYSTEM

INTEGRATIONS & ECOSYSTEM

Flux Suite connects to the platforms your team already uses — for alert ingestion, notification delivery, metrics collection, and operational visibility.

📧

SMTP Email

Delivery · PHPMailer · TLS / SSL

📱

AWS SNS

SMS Delivery · Transactional Mode

#️⃣

Slack

Outbound Webhook · Block Kit

🟦

Microsoft Teams

Outbound · Adaptive Cards

🐶

Datadog

Inbound Alert Webhook Parser

📈

Grafana

Inbound Alert Webhook Parser

☁️

AWS CloudWatch

Inbound via SNS Envelope

🔍

Splunk

Inbound Saved Search Alert

🔥

Prometheus / AlertManager

Inbound Webhook · Auto-Resolve

🐳

Docker / Docker Compose

Container Orchestration

🗄

MariaDB 10.11

Primary Relational Data Store

🟢

Node.js 18

WebSocket Push Server

📊

Prometheus

Metrics Scrape Endpoint

🛠

Adminer

Web DB Management GUI

🔐

TOTP Authenticator Apps

Google Auth · Authy · 1Password

📡

SNMP / Syslog

Flux Event Ingestion Sources

☸️

Kubernetes

K8s Deployments · HPA · Services

⚓

Helm

Kubernetes Package Manager

🚀

Flux Speed

Network Speedtest & Diagnostics

📶

iPerf3

Throughput & Bandwidth Engine

🔭

MTR / Traceroute

Path Analysis · Hop Latency

// GETTING STARTED

DEPLOYMENT & OPERATIONS

Zero to operational in under 10 minutes on any platform. Cross-platform startup scripts included. All persistent data lives in bind-mounted volumes — never inside containers.

🪟

Windows Server

2016 · 2019 · 2022 · 2025

Docker Desktop + WSL2 or native Hyper-V

start.bat

Ports 4004, 8081, 6006 must be available

🍎

macOS

Intel + Apple Silicon (M-series)

Docker Desktop or OrbStack

chmod +x start.sh && ./start.sh

ARM64 images fully supported

☁️

AWS EC2 / EKS

Amazon Linux 2023 · AL2 · Ubuntu on AWS

Docker Engine + Compose plugin

./start.sh

EC2, ECS, EKS, and bare-metal tested

🐧

RHEL / Ubuntu / Debian

Any Linux with Docker Engine 24+

On-prem, bare-metal, or any cloud VM

docker compose up -d

Allow ~30s for DB init and schema seeding

☸️

Kubernetes (K8s)

EKS · AKS · GKE · On-Prem K8s 1.24+

Helm chart — Deployments, HPA, Services

helm install flux-suite ./charts/flux-suite

ConfigMaps & PVCs included

Service Management

Start / Stop / Restart

docker compose up -d — start all 5 services
docker compose down — stop, data preserved
docker compose down -v — stop + remove named and anonymous volumes (bind-mounted ./data is not deleted)
docker compose restart — restart all services
docker compose restart web — restart single service
docker compose pull — pull latest image versions

Backup & Restore

Data Management Scripts

./scripts/backup.sh — full DB dump + storage tar
Backups stored in ./backups/ with datetime stamp
./scripts/restore.sh [backup-name] — restore
./scripts/healthcheck.sh — probe all services
Full reset: down -v → rm -rf data/db/* → up -d
Data lives in ./data/db and ./data/storage

Observability

Logs & Health Endpoints

docker compose logs -f — stream all container logs
docker compose logs -f worker — worker-only logs
GET /healthz — container liveness (HTTP 200)
GET /readyz — readiness: DB connection + worker alive
GET /metrics — Prometheus scrape endpoint
GET /status.json — public JSON service status