Uptrace: Querying Metrics

TIP

To learn about metrics, see the OpenTelemetry Metrics documentation.

Uptrace provides a powerful query language that supports joining, grouping, and aggregating multiple metrics in a single query.

Timeseries

A timeseries is a metric with a unique set of attributes. For example, each host has a separate timeseries for the same metric name:

# metric_name{ attr1, attr2... }
system.filesystem.usage{host.name='host1'} # timeseries 1
system.filesystem.usage{host.name='host2'} # timeseries 2

You can add more attributes to create more detailed timeseries. For example, you can use the state attribute to report the number of free and used bytes in a filesystem:

system.filesystem.usage{host.name='host1', state='free'} # timeseries 1
system.filesystem.usage{host.name='host1', state='used'} # timeseries 2

system.filesystem.usage{host.name='host2', state='free'} # timeseries 3
system.filesystem.usage{host.name='host2', state='used'} # timeseries 4

With just 2 attributes, you can write a number of useful queries. In the examples below, $fs_usage is an alias for the system.filesystem.usage metric:

# the filesystem size (free+used bytes) on each host
query:
  - $fs_usage group by host.name

# the number of free bytes on each host
query:
  - $fs_usage{state='free'} as free group by host.name

# fs utilization on each host
query:
  - $fs_usage{state='used'} / $fs_usage as fs_util group by host.name

# the size of your dataset on all hosts
query:
  - $fs_usage{state='used'} as dataset_size
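
To make the semantics of these queries concrete, here is a minimal Python sketch that models the four timeseries as in-memory datapoints and computes the same results. The sample byte values and the `sum_by` helper are hypothetical and exist only for illustration; this is not how Uptrace evaluates queries internally.

```python
from collections import defaultdict

# Hypothetical datapoints: (attributes, value in bytes).
FS_USAGE = [
    ({"host.name": "host1", "state": "free"}, 60),
    ({"host.name": "host1", "state": "used"}, 40),
    ({"host.name": "host2", "state": "free"}, 10),
    ({"host.name": "host2", "state": "used"}, 90),
]


def sum_by(points, group_key, **filters):
    """Sum values per group_key, keeping only points that match the filters."""
    totals = defaultdict(int)
    for attrs, value in points:
        if all(attrs.get(k) == v for k, v in filters.items()):
            totals[attrs[group_key]] += value
    return dict(totals)


# $fs_usage group by host.name
size = sum_by(FS_USAGE, "host.name")
# $fs_usage{state='used'} group by host.name
used = sum_by(FS_USAGE, "host.name", state="used")
# $fs_usage{state='used'} / $fs_usage as fs_util group by host.name
fs_util = {host: used[host] / size[host] for host in size}
# $fs_usage{state='used'} as dataset_size (no grouping: one total)
dataset_size = sum(value for attrs, value in FS_USAGE if attrs["state"] == "used")

print(size, fs_util, dataset_size)
```

Note that omitting the group by clause collapses all matching timeseries into a single total, which is why dataset_size is one number rather than one per host.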

Writing queries

Overview

You start creating a query by selecting metric names and giving them a short alias, for example:

metrics:
  # metric aliases always start with a dollar sign
  - system.filesystem.usage as $fs_usage
  - system.network.packets as $packets

Because Uptrace supports multiple metrics in the same query, you must use the alias to reference the metric ($fs_usage) and metric attributes ($fs_usage.host.name), for example:

query:
  # unlike metric aliases, column aliases don't start with a dollar
  - $fs_usage as disk_size
  - $fs_usage{state="used"} as used_space

  # disk size on the specified device
  - $fs_usage{host.name='host1', device='/dev/sdd1'} as host1_sdd1

  # number of packets on each host.name
  - per_min($packets) as packets_per_min group by host.name

You can use multiple metrics in arithmetic expressions. For example, the following query plots the number of hits and misses and calculates the hit rate:

metrics:
  - service.cache.redis as $redis
query:
  - $redis{type="hits"} as hits
  - $redis{type="misses"} as misses
  - hits / (hits + misses) as hit_rate
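
The hit rate expression is applied point by point to the hits and misses timeseries. The Python sketch below shows that arithmetic on made-up sample values. Returning `None` when there is no traffic is an assumption of this sketch; the source does not say how Uptrace handles division by zero.

```python
def hit_rate(hits, misses):
    """hits / (hits + misses) for one datapoint."""
    total = hits + misses
    # Assumption: no value is reported when there were no requests at all.
    return hits / total if total else None


# Hypothetical per-interval values of the two columns.
hits = [90, 45, 0]
misses = [10, 5, 0]

rates = [hit_rate(h, m) for h, m in zip(hits, misses)]
print(rates)
```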

Filtering

You can filter datapoints using the following operators:

  • =, !=, <, <=, >, >=, for example, where host.name = "myhost".
  • ~, !~, for example, where host.name ~ "^prod-[a-z]+-[0-9]+$".
  • like, not like, for example, where host.name like "prod-%".
  • in, not in, for example, where host.name in ("host1", "host2").

Without a metric alias, Uptrace applies filters to all metrics, but you can specify a metric alias to filter a specific metric, for example, where $metric1.host.name = "myhost".
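
As a rough illustration of how the regular-expression, like, and in operators differ, the following Python sketch applies each of them to a list of hypothetical host names. It assumes that `%` in a like pattern matches any run of characters, as in SQL; other wildcards and the exact regular-expression dialect used by Uptrace are not covered here.

```python
import re


def like_to_regex(pattern):
    """Translate a like pattern into a compiled regex.

    Assumption: "%" matches any run of characters (SQL-style); every other
    character is matched literally.
    """
    parts = (".*" if ch == "%" else re.escape(ch) for ch in pattern)
    return re.compile("".join(parts), re.DOTALL)


hosts = ["prod-web-1", "prod-db-22", "staging-web-1", "prod-web-x"]

# where host.name ~ "^prod-[a-z]+-[0-9]+$"
by_regex = [h for h in hosts if re.search(r"^prod-[a-z]+-[0-9]+$", h)]

# where host.name like "prod-%"
like = like_to_regex("prod-%")
by_like = [h for h in hosts if like.fullmatch(h)]

# where host.name in ("prod-web-1", "staging-web-1")
by_in = [h for h in hosts if h in ("prod-web-1", "staging-web-1")]

print(by_regex, by_like, by_in)
```

The regular expression rejects prod-web-x because it requires a numeric suffix, while the looser like pattern accepts it.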

Grouping

You can use grouping to get multiple timeseries, for example, one timeseries for each host name:

metrics:
  - system.filesystem.usage as $fs_usage
  - system.network.packets as $packets
query:
  - group by host.name

You can have multiple attributes in the group by clause, for example, group by host.name, service.name. To group by all attributes at once, use group by all.
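
The effect of adding attributes to the group by clause can be sketched in Python as follows: each distinct combination of the grouping attributes produces its own output timeseries. The datapoints and the `group_by` helper are hypothetical.

```python
from collections import defaultdict


def group_by(points, keys):
    """Bucket datapoint values by the tuple of the given attribute values."""
    groups = defaultdict(list)
    for attrs, value in points:
        groups[tuple(attrs.get(k) for k in keys)].append(value)
    return dict(groups)


# Hypothetical datapoints: (attributes, value).
points = [
    ({"host.name": "h1", "service.name": "api"}, 1),
    ({"host.name": "h1", "service.name": "api"}, 2),
    ({"host.name": "h1", "service.name": "db"}, 3),
    ({"host.name": "h2", "service.name": "api"}, 4),
]

# group by host.name
by_host = group_by(points, ["host.name"])
# group by host.name, service.name
by_host_service = group_by(points, ["host.name", "service.name"])

print(len(by_host), len(by_host_service))
```

With this sample data, grouping by host.name yields two groups, and grouping by both attributes yields three.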

Advanced

Count the number of unique combinations of host.name and service.name attributes:

uniq($metric_alias, host.name, service.name) as num_timeseries

Calculate the difference between the current and previous values:

delta($kafka_part_offset) as messages_processed

Calculate CPU utilization using system.cpu.load_average.15m and system.cpu.time:

$load_avg_15m / uniq($cpu_time, cpu) as cpu_util

Get the first/last time the metric received an update (note the double dot):

min($cache..time), max($cache..time)
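
The uniq and delta functions above can be approximated with the Python sketch below. The sample data is made up, and dropping the first datapoint in `delta` (because it has no previous value) is an assumption of this sketch rather than documented Uptrace behavior.

```python
def uniq(points, *attr_names):
    """Count distinct combinations of the given attributes."""
    combos = {tuple(attrs.get(name) for name in attr_names) for attrs, _ in points}
    return len(combos)


def delta(values):
    """Difference between each value and the previous one.

    Assumption: the first value has no predecessor and is dropped.
    """
    return [curr - prev for prev, curr in zip(values, values[1:])]


# uniq($metric_alias, host.name, service.name) as num_timeseries
points = [
    ({"host.name": "host1", "service.name": "api"}, 5),
    ({"host.name": "host1", "service.name": "api"}, 7),
    ({"host.name": "host1", "service.name": "db"}, 3),
    ({"host.name": "host2", "service.name": "api"}, 9),
]
num_timeseries = uniq(points, "host.name", "service.name")

# delta($kafka_part_offset) as messages_processed
messages_processed = delta([100, 130, 130, 175])

# $load_avg_15m / uniq($cpu_time, cpu) as cpu_util, on a hypothetical 4-CPU host
cpu_time_points = [({"cpu": str(i)}, 0.0) for i in range(4)]
cpu_util = 2.0 / uniq(cpu_time_points, "cpu")

print(num_timeseries, messages_processed, cpu_util)
```

In the CPU example, uniq counts the number of distinct cpu attribute values, so the load average is divided by the number of CPUs.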

Instruments

OpenTelemetry provides different instruments, and each instrument supports different aggregations (functions):

Instrument Name | Timeseries kind
--- | ---
Counter, CounterObserver | Counter
UpDownCounter, UpDownCounterObserver | Additive
GaugeObserver | Gauge
Histogram | Histogram
AWS CloudWatch | Summary

Additionally, some of the functions carry special meaning when used in table-based dashboards, for example, note how the last function affects the table value result when used with additive instruments:

Expression | Result timeseries | Table value
--- | --- | ---
avg($metric) | [1, 4, 3] | 2.67 (avg of the values in the result)
last(avg($metric)) | [1, 4, 3] | 3 (the last value in the result)
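
The two table values can be checked with a few lines of Python. This only reproduces the arithmetic on the result timeseries [1, 4, 3]; it makes no claim about how Uptrace computes table values internally.

```python
from statistics import mean

result = [1, 4, 3]  # the result timeseries from the table above

avg_value = round(mean(result), 2)  # table value for avg($metric)
last_value = result[-1]             # table value for last(avg($metric))

print(avg_value, last_value)
```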

Counter

Counter is a timeseries kind that measures additive non-decreasing values, for example, the total number of:

  • processed requests
  • received bytes
  • disk reads

Uptrace supports the following functions to aggregate counter timeseries:

Expression | Result timeseries | Table value
--- | --- | ---
$metric | Sum of timeseries | Sum of the result timeseries
last($metric) | Sum of timeseries | Last value in the result timeseries
per_min($metric) | $metric / _minutes | Avg of the result timeseries
per_sec($metric) | $metric / _seconds | Avg of the result timeseries
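
The per_min and per_sec rows can be read as a simple rate conversion. The Python sketch below assumes that _minutes and _seconds denote the length of each datapoint's interval; that interpretation, the 5-minute interval, and the sample counts are assumptions made for illustration.

```python
from statistics import mean


def per_min(values, interval_seconds):
    """Convert per-interval counts to a per-minute rate."""
    minutes = interval_seconds / 60
    return [v / minutes for v in values]


def per_sec(values, interval_seconds):
    """Convert per-interval counts to a per-second rate."""
    return [v / interval_seconds for v in values]


# Hypothetical request counts, one value per 5-minute (300 s) interval.
requests = [600, 1200, 300]

rate_per_min = per_min(requests, 300)
rate_per_sec = per_sec(requests, 300)

# According to the table above, the table value is the average of the result.
table_value = mean(rate_per_min)

print(rate_per_min, rate_per_sec, table_value)
```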

Gauge

Gauge is a timeseries kind that measures non-additive values, that is, values for which a sum does not produce a meaningful result, for example:

  • error rate
  • memory utilization
  • cache hit rate

Uptrace supports the following functions to aggregate gauge timeseries:

Expression | Result timeseries | Table value
--- | --- | ---
$metric | Avg of timeseries | Last value in the result
avg($metric) | Avg of timeseries | Avg of the result
min($metric) | Min of timeseries | Min of the result
max($metric) | Max of timeseries | Max of the result
sum($metric) * | Sum of timeseries | Sum of the result
per_min($metric) * | $metric / _minutes | Avg of the result
per_sec($metric) * | $metric / _seconds | Avg of the result
delta($metric) * | Diff between curr and previous values | Sum of the result

* Note that the sum, per_min, per_sec, and delta functions should not normally be used with this instrument and were added only for compatibility with Prometheus and AWS metrics. For the same reason, per_min(sum($metric)) and delta(sum($metric)) are also supported.

Additive

Additive is a timeseries kind which measures additive values that increase or decrease with time, for example, the number of:

  • active requests
  • open connections
  • memory in use (megabytes)

Uptrace supports the following functions to aggregate additive timeseries:

Expression | Result timeseries | Table value
--- | --- | ---
$metric | Sum of timeseries | Last value in the result
sum($metric) | Same as $metric | Sum of the result
avg($metric) | Avg of timeseries | Avg of the result
last(avg($metric)) | Avg of timeseries | Last value in the result
min($metric) | Min of timeseries | Min of the result
max($metric) | Max of timeseries | Max of the result
per_min($metric) | $metric / _minutes | Avg of the result
per_sec($metric) | $metric / _seconds | Avg of the result
delta($metric) * | Diff between curr and previous values | Sum of the result

* Note that the delta function should not normally be used with this instrument and was added only for compatibility with Prometheus and AWS metrics.

Histogram

Histogram is a timeseries kind that stores a histogram of recorded values, for example:

  • request latency
  • request size

Uptrace supports the following functions to aggregate histogram timeseries:

Expression | Result timeseries | Table value
--- | --- | ---
count($metric) | Number of observed values in timeseries | Sum of the result
p50($metric) | P50 of timeseries | Avg of the result
p75($metric) | P75 of timeseries | Avg of the result
p90($metric) | P90 of timeseries | Avg of the result
p95($metric) | P95 of timeseries | Avg of the result
p99($metric) | P99 of timeseries | Avg of the result
avg($metric) | sum($metric) / count($metric) | Avg of the result
last(avg($metric)) | sum($metric) / count($metric) | Last value in the result
min($metric) | Min observed value in the histogram | Min of the result
max($metric) | Max observed value in the histogram | Max of the result
sum($metric) | Sum of timeseries | Sum of the result

Note that you can also use per_min(count($metric)) and per_sec(count($metric)).
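
To give a feel for what the histogram functions report, the Python sketch below computes count, sum, avg, and a few percentiles over raw sample latencies. It uses the nearest-rank percentile method on the raw values, which is a swapped-in technique chosen for simplicity: a real histogram stores bucketed counts, and the source does not describe how Uptrace estimates percentiles from them.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile of raw values (not a bucket-based estimate)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, with one slow outlier.
latencies = [12, 15, 11, 200, 14, 13, 16, 18, 17, 19]

count = len(latencies)
total = sum(latencies)
avg = total / count  # avg($metric) = sum($metric) / count($metric)
p50 = percentile(latencies, 50)
p90 = percentile(latencies, 90)
p99 = percentile(latencies, 99)

print(count, total, avg, p50, p90, p99)
```

The single outlier pulls the average well above the median, which is one reason percentiles are usually preferred over avg for latency.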

Summary

Summary is a timeseries kind that exists for compatibility with Prometheus and AWS. It stores the min, max, sum, and count aggregates of observed values.

Expression | Result timeseries | Table value
--- | --- | ---
sum($metric) | Sum of timeseries | Sum of the result
count($metric) | Number of observed values in timeseries | Sum of the result
avg($metric) | sum($metric) / count($metric) | Avg of the result
last(avg($metric)) | sum($metric) / count($metric) | Last value in the result
min($metric) | Min observed value | Min of the result
max($metric) | Max observed value | Max of the result

Note that you can also use per_min(count($metric)) and per_sec(count($metric)).

Dashboards

Uptrace uses 2 different types of dashboards together to visualize metrics data:

  • A grid-based dashboard looks like a classic grid of charts.
  • A table-based dashboard is a table where each row leads to a separate grid-based dashboard, for example, a table of host names with a separate grid for each host name.

In other words, table dashboards allow you to parameterize grid dashboards with attributes from the table. You can use tables as a replacement for Grafana variables.

For example, Uptrace uses a table-based dashboard to monitor the number of sampled and dropped spans for each project:

metrics:
  - uptrace.projects.spans as $spans
query:
  - group by project_id
  - $spans{type='spans'} as sampled_spans
  - $spans{type='dropped'} as dropped_spans

project_id | sampled_spans | dropped_spans | Link to a grid-based dashboard
--- | --- | --- | ---
1 | 100 | 0 | Grid with where project_id = 1
2 | 110 | 0 | Grid with where project_id = 2
... | ... | ... | ...
999 | 90 | 0 | Grid with where project_id = 999
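
The way a table row parameterizes a grid dashboard can be sketched in Python as generating one filter per row. The `grid_filter` helper and the row dictionaries are hypothetical; they only mirror the where clauses shown in the table.

```python
def grid_filter(row, attr="project_id"):
    """Build the where clause that a table row passes to its grid dashboard."""
    return f"where {attr} = {row[attr]}"


# Hypothetical table rows, one per project.
rows = [{"project_id": 1}, {"project_id": 2}, {"project_id": 999}]

filters = [grid_filter(row) for row in rows]
print(filters)
```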

Binary operator precedence

The following list shows the precedence of binary operators in Uptrace, from highest to lowest.

  • ^
  • *, /, %
  • +, -
  • ==, !=, <=, <, >=, >
  • and, unless
  • or

Operators on the same precedence level are left-associative. For example, 2 * 3 % 2 is equivalent to (2 * 3) % 2.
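
The grouping in this example can be verified with ordinary arithmetic. The Python snippet below evaluates both possible groupings; Python happens to give `*` and `%` the same precedence and left-associativity, so it is used here only as a calculator, not as a model of the Uptrace parser. Python's `^` is bitwise XOR rather than exponentiation, so it is deliberately left out.

```python
left_assoc = (2 * 3) % 2   # grouping implied by left-associativity
right_assoc = 2 * (3 % 2)  # the grouping that left-associativity rules out
unparenthesized = 2 * 3 % 2

# Precedence between levels: * binds tighter than +.
mixed = 1 + 2 * 3

print(left_assoc, right_assoc, unparenthesized, mixed)
```

The two groupings give different results (0 and 2), so associativity matters whenever operators of the same level are mixed.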
