Alerting and Notifications

Uptrace supports 2 types of monitors: metric and error monitors.

Metric monitors let you create alerts and receive notifications when metric values meet certain conditions.

Error monitors let you receive notifications for certain errors (exceptions) and logs, for example, production logs with the ERROR severity level.

Notification channels

You can create notification channels to receive notifications via email, Slack, PagerDuty, Opsgenie, AlertManager, and webhooks. When creating a monitor, you can specify which notification channels it should use.

Monitoring metrics

Uptrace lets you create alerts when a monitored metric value meets certain conditions. For example, you can create an alert when the system.filesystem.usage metric exceeds 90%.


Examples

Here are some examples of metric monitors you can create to monitor OpenTelemetry host metrics. The examples use YAML syntax to define monitors, but in practice you will create monitors using the Uptrace UI.

To monitor CPU usage:

monitors:
  - name: CPU usage
    metrics:
      - system.cpu.load_average.15m as $load_avg_15m
      - system.cpu.time as $cpu_time
    query:
      - $load_avg_15m / uniq($cpu_time.cpu) as cpu_util
      - group by host.name
    columns:
      cpu_util: { unit: utilization }
    max_allowed_value: 3
    for_duration: 10

To monitor filesystem usage:

monitors:
  - name: Filesystem usage
    metrics:
      - system.filesystem.usage as $fs_usage
    query:
      - $fs_usage{state='used'} / $fs_usage as fs_util
      - group by host.name, mountpoint
      - where mountpoint !~ "/snap"
    columns:
      fs_util: { unit: utilization }
    max_allowed_value: 0.9
    for_duration: 3

To monitor the number of pending disk operations:

monitors:
  - name: Disk pending operations
    metrics:
      - system.disk.pending_operations as $pending_ops
    query:
      - $pending_ops
      - group by host.name, device
    max_allowed_value: 100
    for_duration: 10

To monitor network errors:

monitors:
  - name: Network errors
    metrics:
      - system.network.errors as $net_errors
    query:
      - $net_errors
      - group by host.name
    max_allowed_value: 0
    for_duration: 3
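
The same pattern can be applied to other host metrics. As a minimal sketch, a memory usage monitor could follow the filesystem example. It assumes your collector exports system.memory.usage with a state attribute, as the OpenTelemetry host metrics receiver does. The monitor name, threshold, and duration below are illustrative, not recommended values:

```yaml
monitors:
  - name: Memory usage
    metrics:
      - system.memory.usage as $mem_usage
    query:
      # Share of memory in the 'used' state (assumes a state attribute is present).
      - $mem_usage{state='used'} / $mem_usage as mem_util
      - group by host.name
    columns:
      mem_util: { unit: utilization }
    # Illustrative values; tune them for your hosts.
    max_allowed_value: 0.9
    for_duration: 5
```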

Monitoring span metrics

You can also monitor span metrics using the following metrics created by Uptrace:

  • uptrace.tracing.spans. Number of spans and their duration (excluding events and logs).
  • uptrace.tracing.logs. Number of logs (excluding spans and events).
  • uptrace.tracing.events. Number of events (excluding spans and logs).

You can use all available span attributes for filtering and grouping, for example, where .status_code = 'error' or group by host.name.

Examples

To monitor average PostgreSQL SELECT query duration:

monitors:
  - name: PostgreSQL SELECT duration
    metrics:
      - uptrace.tracing.spans as $spans
    query:
      - avg($spans)
      - where .system = 'db:postgresql'
      - where db.operation = 'SELECT'
    max_allowed_value: 10000 # 10 milliseconds, expressed in microseconds
    for_duration: 5

To monitor median duration of all database operations:

monitors:
  - name: Database operations duration
    metrics:
      - uptrace.tracing.spans as $spans
    query:
      - p50($spans)
      - where .type = 'db'
    max_allowed_value: 10000 # 10 milliseconds, expressed in microseconds
    for_duration: 5

To monitor the number of errors:

monitors:
  - name: Number of errors
    metrics:
      - uptrace.tracing.logs as $logs
    query:
      - per_min($logs)
      - where .system in ('log:error', 'log:fatal')
    max_allowed_value: 10
    for_duration: 3

To monitor the number of exceptions:

monitors:
  - name: Number of exceptions
    metrics:
      - uptrace.tracing.logs as $logs
    query:
      - per_min($logs)
      - where .system = 'log:error'
      - where exception.type exists
    max_allowed_value: 10
    for_duration: 3
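
Filtering and grouping can be combined in one monitor. The sketch below reuses the error-count query and adds group by host.name, on the assumption that each host is then evaluated against the threshold separately. The monitor name and threshold are illustrative:

```yaml
monitors:
  - name: Number of errors per host
    metrics:
      - uptrace.tracing.logs as $logs
    query:
      - per_min($logs)
      - where .system in ('log:error', 'log:fatal')
      # Assumption: grouping yields one series per host to check.
      - group by host.name
    max_allowed_value: 10
    for_duration: 3
```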

Monitoring errors

Uptrace automatically creates alerts for exceptions and for logs with a log.severity level of ERROR or higher.

By default, Uptrace has an error monitor that sends an email notification for every error alert. You can create additional error monitors that send notifications only for errors that match certain conditions, for example, errors with deployment.environment=prod and db.system=postgresql.


Email notifications

To receive email notifications in the Uptrace Community version, make sure users have correct email addresses and the smtp_mailer is properly configured and enabled:

# uptrace.yml

auth:
  users:
    - name: John Smith
      email: john.smith@gmail.com
      password: uptrace
      notify_by_email: true

smtp_mailer:
  enabled: true
  host: smtp.gmail.com
  port: 587
  username: '[SENDER]@gmail.com'
  password: '[APP_PASSWORD]'
  from: '[SENDER]@gmail.com'

Note that Gmail does not allow you to use your real password in smtp_mailer.password. Instead, you should generate an app password for Gmail:

  1. In Gmail, click on your avatar -> "Manage your Google Account".
  2. On the left, click on "Security".
  3. Scroll to "Signing in to Google" and click on "App password".

See the Gmail documentation for details.
