Querying Uptrace Metrics

TIP

To learn about metrics, see the OpenTelemetry Metrics documentation.

Uptrace provides a powerful query language that supports joining, grouping, and aggregating multiple metrics in a single query.

Timeseries

A timeseries is a metric with a unique set of attributes. For example, each host has a separate timeseries for the same metric name:

-- metric name { attr1, attr2... }
system.filesystem.usage { host.name='host1' } -- timeseries 1
system.filesystem.usage { host.name='host2' } -- timeseries 2

You can also use attributes to make metrics more detailed. For example, you can use the state attribute to report the number of free and used bytes in filesystems:

system.filesystem.usage { host.name='host1', state='free' } -- timeseries 1
system.filesystem.usage { host.name='host1', state='used' } -- timeseries 2

system.filesystem.usage { host.name='host2', state='free' } -- timeseries 3
system.filesystem.usage { host.name='host2', state='used' } -- timeseries 4
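
To make the definition concrete, here is a minimal Python sketch (not Uptrace code) that models a timeseries as a metric name plus its full set of attributes. The `series_key` helper and the sample values are hypothetical and exist only for illustration; nothing here describes how Uptrace stores data internally.

```python
# Illustrative model only: a timeseries is identified by
# (metric name, set of attribute key/value pairs).

def series_key(metric, attrs):
    """Return a hashable identity for a timeseries."""
    return (metric, frozenset(attrs.items()))

# Hypothetical data points: (metric name, attributes, value in bytes).
points = [
    ("system.filesystem.usage", {"host.name": "host1", "state": "free"}, 40),
    ("system.filesystem.usage", {"host.name": "host1", "state": "used"}, 60),
    ("system.filesystem.usage", {"host.name": "host2", "state": "free"}, 10),
    ("system.filesystem.usage", {"host.name": "host2", "state": "used"}, 90),
    # A later point with the same name and attributes joins an existing timeseries.
    ("system.filesystem.usage", {"host.name": "host1", "state": "free"}, 35),
]

timeseries = {}
for metric, attrs, value in points:
    timeseries.setdefault(series_key(metric, attrs), []).append(value)

num_series = len(timeseries)  # five points, but only four distinct timeseries
```

Adding a new attribute value (for example, a third host) creates new timeseries, while a repeated attribute set only appends to an existing one.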

With just 2 attributes, you can write a number of useful queries:

-- the queries below assume system.filesystem.usage is aliased as $fs_usage (see Writing queries)

-- the filesystem size (free+used bytes) on each host
$fs_usage | group by host.name

-- the number of free bytes on each host
$fs_usage{state='free'} as free | group by host.name

-- the size of your dataset (on all hosts)
$fs_usage{state='used'} as dataset_size
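
The following Python sketch mimics what these three queries compute, under the assumption that values from matching timeseries are combined by summing. The `query` function and the sample rows are hypothetical stand-ins, not part of Uptrace.

```python
from collections import defaultdict

# Hypothetical latest values (bytes) of system.filesystem.usage.
rows = [
    {"host.name": "host1", "state": "free", "value": 40},
    {"host.name": "host1", "state": "used", "value": 60},
    {"host.name": "host2", "state": "free", "value": 10},
    {"host.name": "host2", "state": "used", "value": 90},
]

def query(rows, where=None, group_by=None):
    """Sum values, optionally filtering on attributes and grouping by one attribute."""
    where = where or {}
    totals = defaultdict(int)
    for row in rows:
        if all(row.get(attr) == want for attr, want in where.items()):
            key = row[group_by] if group_by else "all"
            totals[key] += row["value"]
    return dict(totals)

# the filesystem size (free+used bytes) on each host
size_by_host = query(rows, group_by="host.name")
# the number of free bytes on each host
free_by_host = query(rows, where={"state": "free"}, group_by="host.name")
# the size of the dataset on all hosts
dataset_size = query(rows, where={"state": "used"})
```

Filtering narrows the set of timeseries, and grouping decides which of the remaining timeseries are merged into one result.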

Writing queries

You start creating a query by selecting metric names and giving them a short alias, for example:

-- metric aliases always start with a dollar sign
system.filesystem.usage as $fs_usage
system.network.packets as $packets

Because Uptrace supports multiple metrics in the same query, you must use the alias to reference the metric and metric attributes, for example:

-- unlike metric aliases, column aliases don't start with a dollar
$fs_usage as disk_size

$fs_usage{state='used'} as used_space
-- the same as the previous query
$fs_usage as used_space | where $fs_usage.state = 'used'

-- disk size on the specified device
$fs_usage | where $fs_usage.host.name = 'host1' and $fs_usage.device = '/dev/sdd1'

-- the number of packets per minute on each host
per_min($packets) as packets_per_min | group by $packets.host.name

You can use multiple metrics to construct arithmetic expressions, for example, given the following 2 metrics:

api.user_cache.hits as $hits
api.user_cache.misses as $misses

You can write a query to plot the number of hits, misses, and calculate the hit rate:

per_min($hits) as hits | per_min($misses) as misses | hits / (hits + misses) as hit_rate
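
As a worked example of the arithmetic in that query, the Python sketch below computes per-minute rates and the hit rate from raw counts. The helper names, the 5-minute window, and the counts are assumptions chosen for illustration; the sketch does not show how Uptrace evaluates `per_min`.

```python
def per_min(count, window_seconds):
    """Convert a count observed over a time window into a per-minute rate."""
    return count * 60.0 / window_seconds

def hit_rate(hits, misses):
    """Return hits / (hits + misses), or None when there was no traffic."""
    total = hits + misses
    return hits / total if total else None

# Hypothetical counts collected over a 5-minute (300-second) window.
hits = per_min(450, 300)
misses = per_min(50, 300)
rate = hit_rate(hits, misses)
```

Because both operands are scaled by the same window, the hit rate is the same whether it is computed from raw counts or from per-minute rates.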

Instruments

OpenTelemetry provides different instruments that support different aggregation functions in Uptrace:

OTel Instrument Name | Uptrace Name | Aggregations
Counter, CounterObserver | counter | per_min, per_sec
UpDownCounter, UpDownCounterObserver | additive | sum of last values, min, max
GaugeObserver | gauge | last value, min, max
Histogram | histogram | percentiles, min, max, per_min, per_sec, count, avg
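
As a rough illustration of one entry in this table, the Python sketch below shows one plausible reading of "sum of last values" for an additive metric: take the most recent sample of each timeseries in an interval, then add those samples together. This interpretation, the metric, and the sample data are assumptions, not a description of Uptrace internals.

```python
# Hypothetical samples per timeseries within one time interval,
# for an UpDownCounter-style ("additive") metric such as open connections.
connections = {
    "host1": [5, 7, 6],
    "host2": [2, 3, 4],
}

def last_value(samples):
    """Return the most recent sample of a single timeseries."""
    return samples[-1]

def sum_of_last_values(series):
    """Add up the most recent sample of every timeseries."""
    return sum(last_value(samples) for samples in series.values())

total_connections = sum_of_last_values(connections)  # 6 + 4
peak_connections = max(max(samples) for samples in connections.values())
```

Summing every sample instead would overcount, because each sample of an additive metric already reports the current total for its timeseries.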

Dashboards

Uptrace supports 2 types of dashboards:

  • A grid-based dashboard looks like a classical grid of charts.
  • A table-based dashboard is a table of items where each item leads to a separate grid-based dashboard for the item, for example, a table of hostnames with some metrics for each hostname.

In other words, table-based dashboards allow you to parameterize grid-based dashboards with attributes from the table. For example, Uptrace uses a table-based dashboard to monitor the number of sampled and dropped spans for each project:

select
  $spans{type='spans'} as sampled_spans,
  $spans{type='dropped'} as dropped_spans
from uptrace.projects.spans as $spans
group by project_id

project_id | metric | metric | Link to a grid-based dashboard
1 | sampled_spans | dropped_spans | Dash with where project_id = 1
2 | sampled_spans | dropped_spans | Dash with where project_id = 2
...
999 | sampled_spans | dropped_spans | Dash with where project_id = 999
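
The parameterization can be pictured with a short Python sketch: each table row supplies the attribute values that scope the linked grid-based dashboard. The `grid_dashboard_filter` function and the row data are hypothetical; Uptrace builds these links itself.

```python
def grid_dashboard_filter(row, group_by):
    """Build the filter that scopes a grid-based dashboard to one table row."""
    # repr() leaves numbers bare and wraps strings in single quotes.
    return " and ".join(f"{attr} = {row[attr]!r}" for attr in group_by)

# Hypothetical rows produced by grouping on project_id.
table_rows = [{"project_id": 1}, {"project_id": 2}, {"project_id": 999}]

filters = ["where " + grid_dashboard_filter(row, ["project_id"]) for row in table_rows]
```

A single grid-based dashboard definition is reused for every row, and only the filter changes.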

Alternatively, you could create a single grid-based dashboard and clone it for each project. That approach does not scale well, but it can still be an option if you have only a few items. For example, you could create such dashboards for different database clusters or availability zones.
