Alerts and notifications

To notify you about changes in performance, Uptrace creates alerts and sends you notifications via email, Slack, or PagerDuty.

Alert types

Uptrace supports the following types of alerts:

  • error:new alert when a new error group is created.
  • error:recurring alert when an error group reaches 10k/100k/1m occurrences.
  • anomaly:span-count alert when there is an anomaly in the number of spans as reported by the span.count column.
  • anomaly:span-errors alert when there is an anomaly in the error rate as reported by the span.error_pct column. span.error_pct is calculated as span.error_count / span.count.
  • anomaly:span-duration alert when there is an anomaly in the median span duration as reported by the p50(span.duration) column.
  • anomaly:metric alert whenever a monitor detects an anomaly in the metric data.
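The error-rate formula and the occurrence thresholds listed above can be sketched as follows. This is only an illustration, not Uptrace's implementation: the function names, the threshold tuple, and the handling of a zero span count are assumptions made for the example.

```python
from typing import List

# Occurrence counts at which an error:recurring alert is described as firing.
RECURRING_THRESHOLDS = (10_000, 100_000, 1_000_000)


def error_pct(error_count: int, count: int) -> float:
    """Compute span.error_pct as span.error_count / span.count.

    Returning 0.0 for an empty group is an assumption of this sketch.
    """
    if count == 0:
        return 0.0
    return error_count / count


def crossed_thresholds(previous: int, current: int) -> List[int]:
    """Return the thresholds reached when a group's count grows
    from `previous` to `current` occurrences (hypothetical helper)."""
    return [t for t in RECURRING_THRESHOLDS if previous < t <= current]
```

For example, a group with 5 failed spans out of 100 has an error rate of 5 / 100, and a group that grows from 9,999 to 10,000 occurrences reaches the first threshold.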

Alert severity

For each created alert, Uptrace assigns a severity: minor, major, or critical. If the required conditions are met, the alert severity can be raised to a higher level, for example, from minor to major.

Based on the alert severity, you can send notifications via email, Slack, and PagerDuty, for example:

  1. Uptrace creates an alert with minor severity and sends a notification via email.
  2. After some time, if the alert is not resolved, Uptrace raises the alert severity to major and sends a notification via Slack.
  3. Lastly, Uptrace raises the severity to critical and creates an incident in PagerDuty.
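The escalation flow above amounts to a mapping from severity to notification channel. The sketch below models that mapping; the `Severity` enum, the routing table, and the helper names are hypothetical and do not reflect Uptrace's configuration format or API.

```python
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


# Assumed routing that mirrors the example: email, then Slack, then PagerDuty.
CHANNEL_BY_SEVERITY = {
    Severity.MINOR: "email",
    Severity.MAJOR: "slack",
    Severity.CRITICAL: "pagerduty",
}


def escalate(current: Severity) -> Severity:
    """Raise the severity by one level, stopping at CRITICAL."""
    return Severity(min(current + 1, Severity.CRITICAL))


def channel_for(severity: Severity) -> str:
    """Look up the notification channel for a severity level."""
    return CHANNEL_BY_SEVERITY[severity]
```

Starting from `Severity.MINOR` and calling `escalate` twice walks through the same three steps as the list above.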

Anomaly detection

On paid accounts, Uptrace uses an anomaly detector to automatically detect anomalies in span groups. You can also create and monitor metrics using metric monitors.

In both cases, you can configure the anomaly detector to work in automatic mode, or you can manually set fixed bounds, for example, to create an alert when the data is smaller than X or larger than Y.
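A fixed-bounds check of this kind reduces to a simple comparison. The following sketch assumes a hypothetical `out_of_bounds` helper in which either bound may be omitted; it is not part of Uptrace.

```python
from typing import Optional


def out_of_bounds(
    value: float,
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> bool:
    """Return True when `value` is smaller than `lower` or larger than `upper`.

    A bound set to None is not checked.
    """
    if lower is not None and value < lower:
        return True
    if upper is not None and value > upper:
        return True
    return False
```

With `lower=10` and `upper=100`, the values 5 and 150 would raise an alert, while 50 would not.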

In automatic mode, the anomaly detector analyzes existing data to automatically calculate the upper and lower bounds. In this mode, the detector supports 3 tolerance levels: low, medium, and high. The low tolerance level is less tolerant and creates more alerts. The high tolerance level is more tolerant and creates fewer alerts.
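To illustrate how a tolerance level affects the number of alerts, the sketch below derives bounds from historical data as the mean plus or minus a multiple of the standard deviation. This is a stand-in technique chosen for the example: the text does not say how Uptrace calculates its bounds, and the multipliers assigned to each level are assumptions.

```python
import statistics
from typing import Sequence, Tuple

# Assumed multipliers: a lower tolerance gives narrower bounds and more alerts.
TOLERANCE_MULTIPLIER = {"low": 2.0, "medium": 3.0, "high": 4.0}


def auto_bounds(
    history: Sequence[float], tolerance: str = "medium"
) -> Tuple[float, float]:
    """Return (lower, upper) bounds as mean +/- k * population stddev."""
    k = TOLERANCE_MULTIPLIER[tolerance]
    mean = statistics.fmean(history)
    spread = statistics.pstdev(history)
    return mean - k * spread, mean + k * spread


def is_anomaly(value: float, history: Sequence[float], tolerance: str) -> bool:
    """Flag `value` when it falls outside the bounds for the given tolerance."""
    lower, upper = auto_bounds(history, tolerance)
    return value < lower or value > upper
```

Under these assumptions, the same data point can be flagged at the low tolerance level yet pass at the high level, because the high level produces wider bounds.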
