Flux Monitor

// OVERVIEW

Flux Monitor provides continuous health monitoring for all hosts and services in your infrastructure. It collects metrics through SNMP, agent-based checks, and API polling, and presents them on a unified real-time dashboard.

When thresholds are breached, Flux Monitor automatically creates alerts and (when configured) triggers incident notifications via Flux Notify.

// Integration

Flux Monitor is designed to feed directly into Flux Notify and Flux Event. When a critical threshold is breached, it can automatically create an incident in Flux Notify and push event data to Flux Event for correlation.

// HOST MONITORING

HOST STATUS STATES

State	Condition	Action
OK	All metrics within normal thresholds	No action
WARNING	One or more metrics in warning range	Alert generated, logged
CRITICAL	One or more metrics in critical range	Alert + optional auto-notify
UNKNOWN	Check failed or host unreachable	Alert, investigate connectivity
MAINTENANCE	Host in maintenance window	Suppress all alerts
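The table does not say how these states combine when several apply at once. The sketch below is a minimal illustration that assumes a precedence of MAINTENANCE, then UNKNOWN, then the most severe metric result. The precedence, `MetricState`, and `host_state` are hypothetical and not part of Flux Monitor.

```python
from enum import IntEnum


class MetricState(IntEnum):
    """Per-metric result; a higher value means more severe (hypothetical type)."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


def host_state(metric_states, reachable=True, in_maintenance=False):
    """Derive one host status from its metric results (assumed precedence)."""
    # Assumed: an active maintenance window overrides every other condition.
    if in_maintenance:
        return "MAINTENANCE"
    # A failed check or an unreachable host means the metrics cannot be trusted.
    if not reachable:
        return "UNKNOWN"
    # Otherwise the host takes the state of its most severe metric.
    worst = max(metric_states, default=MetricState.OK)
    return worst.name
```

For example, `host_state([MetricState.OK, MetricState.WARNING])` returns `"WARNING"`. A host with no metrics yet reports `"OK"` in this sketch, which a real implementation might treat as UNKNOWN instead.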

DEFAULT METRIC THRESHOLDS

Metric	Warning	Critical
CPU Usage	> 75%	> 90%
Memory Usage	> 80%	> 95%
Disk Usage	> 80%	> 90%
Disk I/O Wait	> 20%	> 40%
Network Error Rate	> 1%	> 5%
Load Average (1m)	> 2× CPU cores	> 4× CPU cores
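The default thresholds can be read as a simple classifier. This sketch applies the table's strict `>` comparisons, so a value exactly on a threshold stays in the lower band. The load thresholds scale with the host's core count. Only `cpu_usage` appears elsewhere in this page; the other metric keys are hypothetical names chosen for illustration.

```python
# Default (warning, critical) thresholds from the table, in percent.
# Metric keys other than cpu_usage are hypothetical.
DEFAULT_THRESHOLDS = {
    "cpu_usage": (75.0, 90.0),
    "memory_usage": (80.0, 95.0),
    "disk_usage": (80.0, 90.0),
    "disk_io_wait": (20.0, 40.0),
    "network_error_rate": (1.0, 5.0),
}


def classify(metric, value, cpu_cores=None):
    """Return "OK", "WARNING" or "CRITICAL" for a single metric sample."""
    if metric == "load_average_1m":
        # Load thresholds are multiples of the CPU core count.
        if not cpu_cores:
            raise ValueError("cpu_cores is required for load_average_1m")
        warning, critical = 2 * cpu_cores, 4 * cpu_cores
    else:
        warning, critical = DEFAULT_THRESHOLDS[metric]
    # Strict ">" comparisons, matching the table.
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"
```

On a 4-core host, a 1-minute load average of 9.0 is above 2 × 4 = 8 but not above 4 × 4 = 16, so `classify("load_average_1m", 9.0, cpu_cores=4)` returns `"WARNING"`.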

// SERVICE CHECKS

In addition to host-level metrics, Flux Monitor performs active service checks to verify application availability.

Check Type	Description	Config
HTTP/HTTPS	URL availability, status code, response time	`check_http`
TCP Port	Port open/closed, connection time	`check_tcp`
ICMP Ping	Host reachability, round-trip time	`check_ping`
MySQL/MariaDB	Connection test, query latency	`check_mysql`
Process	Process running, CPU/memory usage	`check_process`
Custom Script	Any shell script returning 0/1/2/3	`check_custom`
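As an illustration of a custom check, the sketch below reports how stale a file is, such as a nightly backup. It assumes the 0/1/2/3 return values follow the common Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), which matches the order of the host states above. It also assumes `check_custom` will run any executable with a shebang line, not only shell scripts. The script, its arguments, and its output format are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical check_custom script: alert when a file (e.g. a backup) goes stale."""
import os
import sys
import time

# Assumed exit-code mapping (Nagios plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_file_age(path, warn_seconds, crit_seconds, now=None):
    """Return (exit_code, message) based on how long ago `path` was modified."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError as exc:
        # The check itself could not run, which corresponds to UNKNOWN.
        return UNKNOWN, f"UNKNOWN - cannot read {path}: {exc}"
    if age > crit_seconds:
        return CRITICAL, f"CRITICAL - {path} last modified {age:.0f}s ago"
    if age > warn_seconds:
        return WARNING, f"WARNING - {path} last modified {age:.0f}s ago"
    return OK, f"OK - {path} last modified {age:.0f}s ago"


def main(argv=None):
    """Expects [PATH, WARN_SECONDS, CRIT_SECONDS]; prints one status line."""
    argv = sys.argv[1:] if argv is None else argv
    if len(argv) != 3:
        print("UNKNOWN - usage: check_file_age PATH WARN_SECONDS CRIT_SECONDS")
        return UNKNOWN
    try:
        warn, crit = float(argv[1]), float(argv[2])
    except ValueError:
        print("UNKNOWN - thresholds must be numbers")
        return UNKNOWN
    code, message = check_file_age(argv[0], warn, crit)
    print(message)
    return code
```

To install it as a check, make the file executable and end it with `if __name__ == "__main__": raise SystemExit(main())` so that the return value becomes the process exit code.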

// ALERT RULES

Alert rules define the conditions under which notifications are generated. Rules support:

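Threshold conditions: the metric value is above or below a set value (`condition`, `warning_threshold`, `critical_threshold`)
Duration requirements: the condition must persist for N minutes before an alert fires (`duration_minutes`)
Re-alert intervals: the notification repeats every N minutes until it is acknowledged (`re_alert_minutes`)
Recovery notifications: an optional alert is sent when the condition clears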

# Example rule definition
rule_name: "High CPU"
host_pattern: "db-prod-*"
metric: cpu_usage
condition: ">"
warning_threshold: 75
critical_threshold: 90
duration_minutes: 5
re_alert_minutes: 30
auto_notify: true
notify_contact_list: "DBA Team"
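The sketch below shows how the four rule features could interact when the rule is fed one sample per minute. It mirrors the evaluation fields of the example rule. The `notify_recovery` flag, the acknowledgement flag, and the functions are hypothetical. The sketch is not Flux Monitor's actual evaluator.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Maps the rule's `condition` field to a comparison function.
OPS = {">": operator.gt, "<": operator.lt}


@dataclass
class Rule:
    """Evaluation fields from the example rule definition."""
    condition: str
    warning_threshold: float
    critical_threshold: float
    duration_minutes: int
    re_alert_minutes: int
    notify_recovery: bool = True  # hypothetical field, not in the example rule


@dataclass
class RuleState:
    """Per-host tracking for one rule (hypothetical)."""
    breach_started: Optional[int] = None  # minute the current breach began
    last_alert: Optional[int] = None      # minute of the most recent notification
    acknowledged: bool = False            # set by an operator to stop re-alerts


def evaluate(rule, state, minute, value):
    """Process one sample taken at `minute`; return a notification or None."""
    breached = OPS[rule.condition]
    if breached(value, rule.critical_threshold):
        severity = "CRITICAL"
    elif breached(value, rule.warning_threshold):
        severity = "WARNING"
    else:
        # Condition cleared: reset tracking and optionally announce recovery.
        had_alerted = state.last_alert is not None
        state.breach_started, state.last_alert, state.acknowledged = None, None, False
        return "RECOVERY" if had_alerted and rule.notify_recovery else None

    if state.breach_started is None:
        state.breach_started = minute
    # Duration requirement: stay silent until the breach has lasted long enough.
    if minute - state.breach_started < rule.duration_minutes:
        return None
    # First alert for this breach, or a repeat once the re-alert interval passes.
    if state.last_alert is None or (
        not state.acknowledged and minute - state.last_alert >= rule.re_alert_minutes
    ):
        state.last_alert = minute
        return severity
    return None
```

With `Rule(">", 75, 90, duration_minutes=5, re_alert_minutes=30)` and CPU held at 95% from minute 0, the sketch sends CRITICAL at minute 5 and again at minute 35 unless someone acknowledges the alert. For simplicity, an escalation from WARNING to CRITICAL during a breach waits for the next re-alert instead of notifying immediately.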

// FLUX NOTIFY INTEGRATION

When a critical alert is triggered, Flux Monitor can automatically create an incident in Flux Notify and send notifications to the configured contact list.

# Settings → Integrations → Flux Notify
FLUX_NOTIFY_URL=http://flux-notify:8080
FLUX_NOTIFY_API_KEY=your_api_key
# Severity that auto-creates an incident: CRITICAL, HIGH, or both
AUTO_INCIDENT_SEVERITY=CRITICAL
DEFAULT_CONTACT_LIST="NOC Team"
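If you keep these settings in a `KEY=VALUE` file, a small loader can read them and decide whether an alert should open an incident. This is a minimal sketch. The page does not say how "both" is written, so the sketch assumes a comma-separated value such as `CRITICAL,HIGH`. `load_settings` and `should_create_incident` are hypothetical helpers, not part of Flux Monitor.

```python
def load_settings(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments (hypothetical helper)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            # Strip matching quotes, e.g. "NOC Team" -> NOC Team.
            value = value[1:-1]
        else:
            # Drop an inline " # comment" from an unquoted value.
            value = value.split(" #", 1)[0].rstrip()
        settings[key.strip()] = value
    return settings


def should_create_incident(alert_severity, settings):
    """Assumed: AUTO_INCIDENT_SEVERITY lists severities separated by commas."""
    configured = settings.get("AUTO_INCIDENT_SEVERITY", "CRITICAL")
    allowed = {s.strip().upper() for s in configured.split(",")}
    return alert_severity.upper() in allowed
```

With `AUTO_INCIDENT_SEVERITY=CRITICAL`, `should_create_incident("HIGH", settings)` returns `False`. Under the same assumption, setting the value to `CRITICAL,HIGH` would cover both severities.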

// Avoiding Alert Storms

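Set a duration requirement (`duration_minutes`) on alert rules so that flapping services do not send a notification every time they cross a threshold. With a 5-minute duration, a condition that clears within five minutes never alerts, which filters out short-lived spikes.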
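To see the effect, the sketch below counts alerts for a hypothetical hour of per-minute CPU samples that repeatedly spike for three minutes and drop for two. It applies only the duration requirement and ignores re-alert intervals. The data and `count_alerts` are illustrative.

```python
def count_alerts(samples, threshold, duration_minutes):
    """Count alerts for one-per-minute samples: one per breach that lasts long enough."""
    alerts = 0
    run = 0  # consecutive minutes above the threshold so far
    for value in samples:
        if value > threshold:
            run += 1
            # Alert once, when the breach has persisted for the required duration.
            if run == duration_minutes + 1:
                alerts += 1
        else:
            run = 0
    return alerts


# Hypothetical flapping service: 3 minutes at 95%, 2 minutes at 40%, for one hour.
flapping = [95, 95, 95, 40, 40] * 12
```

Here `count_alerts(flapping, 90, 0)` gives 12 alerts, one per spike. `count_alerts(flapping, 90, 5)` gives 0, because no spike lasts long enough. A sustained breach such as `[95] * 10` still produces one alert with the 5-minute duration.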

// MAINTENANCE WINDOWS

Maintenance windows suppress alerts for matching hosts during scheduled work, so planned downtime does not trigger notifications. While a window is active, those hosts report the MAINTENANCE state.
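The sketch below shows one way to decide whether a host is currently suppressed. It uses the `host_pattern`, `start`, `end`, and `suppress_notifications` fields from the API example in this section. It assumes that `host_pattern` is a shell-style glob, that `end` is exclusive, and that a missing `suppress_notifications` defaults to true. None of these behaviours is confirmed by this page, and `is_suppressed` is hypothetical.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def parse_utc(stamp):
    """Parse an ISO 8601 timestamp with a trailing "Z" as an aware UTC datetime."""
    # fromisoformat only accepts a trailing "Z" from Python 3.11 onward.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def is_suppressed(host, at, window):
    """True if `host` matches the window's glob and aware datetime `at` is in [start, end)."""
    # Assumed default when the field is absent.
    if not window.get("suppress_notifications", True):
        return False
    if not fnmatchcase(host, window["host_pattern"]):
        return False
    return parse_utc(window["start"]) <= at < parse_utc(window["end"])
```

For the "DB Patching - Feb 25" window, `db-prod-01` is suppressed at 03:00 UTC on 25 February 2026, while `web-prod-01` is not. At 06:00 UTC the window has ended under the exclusive-end assumption.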

# Create a maintenance window via API
POST /api/maintenance
{
  "name": "DB Patching - Feb 25",
  "host_pattern": "db-prod-*",
  "start": "2026-02-25T02:00:00Z",
  "end": "2026-02-25T06:00:00Z",
  "suppress_notifications": true,
  "created_by": "j.smith"
}
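The same request can be built from a script with the standard library. This sketch only constructs the `POST /api/maintenance` request and does not send it. The base URL `http://flux-monitor:8080` is a hypothetical placeholder. Any authentication header the API requires is not documented here and is omitted.

```python
import json
import urllib.request

# Hypothetical base URL; point this at your Flux Monitor instance.
BASE_URL = "http://flux-monitor:8080"


def maintenance_request(window, base_url=BASE_URL):
    """Build (but do not send) a POST /api/maintenance request with a JSON body."""
    body = json.dumps(window).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/maintenance",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


window = {
    "name": "DB Patching - Feb 25",
    "host_pattern": "db-prod-*",
    "start": "2026-02-25T02:00:00Z",
    "end": "2026-02-25T06:00:00Z",
    "suppress_notifications": True,
    "created_by": "j.smith",
}
req = maintenance_request(window)
```

To send the request, pass it to `urllib.request.urlopen(req, timeout=10)` inside a `with` block and check `resp.status`.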
