Critical alerts
Get notified when a critical event occurs on your instance, such as a job being re-run after a crash.
This feature is available in the Enterprise Edition.
If the node on which a job runs halts suddenly (for example, due to a power loss), the job is restarted automatically. Windmill itself does not crash, and softer interruptions such as a pod termination come with a grace period (300s) that lets the job finish.
Critical alerts are generated under the following conditions:
- A job is re-run after a crash.
- The license key fails to renew.
- A workspace error handler fails.
- The number of running workers in a worker group falls below a specified threshold (configured in the worker group config).
- The number of jobs waiting in the queue stays above a threshold for longer than a specified amount of time.
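The last two conditions are threshold checks, and the queue condition also involves a duration. The sketch below illustrates one way such checks could be evaluated. It is not Windmill's implementation: the names `QueueAlertState`, `observe` and `workers_below_threshold` are hypothetical, and it assumes that the queue must stay above the threshold continuously (a dip resets the timer) and that one alert is raised per breach.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueAlertState:
    """Hypothetical tracker for the "queue above threshold for too long" condition."""

    threshold: int          # max number of waiting jobs tolerated
    max_duration_s: float   # how long the queue may stay above the threshold
    above_since: Optional[float] = None
    alerted: bool = False

    def observe(self, now: float, waiting_jobs: int) -> bool:
        """Record one sample; return True only when a new alert should fire."""
        if waiting_jobs <= self.threshold:
            # Back under the threshold: reset so a later breach can alert again.
            self.above_since = None
            self.alerted = False
            return False
        if self.above_since is None:
            # First sample above the threshold: start the timer.
            self.above_since = now
        if not self.alerted and now - self.above_since > self.max_duration_s:
            # Above the threshold for longer than allowed: alert once per breach.
            self.alerted = True
            return True
        return False


def workers_below_threshold(running_workers: int, min_workers: int) -> bool:
    """Hypothetical check for the worker group condition."""
    return running_workers < min_workers


if __name__ == "__main__":
    state = QueueAlertState(threshold=100, max_duration_s=60)
    for t, waiting in [(0, 150), (30, 150), (61, 150), (90, 150)]:
        print(t, waiting, state.observe(t, waiting))
```

With a 60s limit, the alert fires at the first sample taken more than 60s after the queue crossed the threshold, and it does not fire again until the queue has dropped back under the threshold.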
Critical alert channels
To receive critical alerts, configure SMTP and set up a critical alert channel (an email address) in the instance settings, or connect your instance to Slack and enter a channel name.
You can also set an alert that notifies you when the number of running workers in a group falls below a given number. This is configured in the worker group config.
Critical alerts in UI
Windmill also displays critical alert notifications directly in the UI.
You can disable this in the instance settings.