Querying Uptrace Metrics
TIP
To learn about metrics, see the OpenTelemetry Metrics documentation.
Uptrace provides a powerful query language that supports joining, grouping, and aggregating multiple metrics in a single query.
Timeseries
A timeseries is a metric with a unique set of attributes. For example, each host has a separate timeseries for the same metric name:
-- metric name { attr1, attr2... }
system.filesystem.usage { host.name='host1' } -- timeseries 1
system.filesystem.usage { host.name='host2' } -- timeseries 2
You can also use attributes to create richer, more detailed metrics. For example, you can use the state attribute to report the number of free and used bytes in filesystems:
system.filesystem.usage { host.name='host1', state='free' } -- timeseries 1
system.filesystem.usage { host.name='host1', state='used' } -- timeseries 2
system.filesystem.usage { host.name='host2', state='free' } -- timeseries 3
system.filesystem.usage { host.name='host2', state='used' } -- timeseries 4
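To make the idea of a timeseries concrete, here is a minimal Python sketch. It assumes a timeseries can be modeled as a key made of the metric name plus the full set of attributes; the sample points and the `series_key` helper are hypothetical and are not part of Uptrace.

```python
from collections import defaultdict

# Hypothetical data points: (metric name, attributes, value in bytes).
points = [
    ("system.filesystem.usage", {"host.name": "host1", "state": "free"}, 70),
    ("system.filesystem.usage", {"host.name": "host1", "state": "used"}, 30),
    ("system.filesystem.usage", {"host.name": "host2", "state": "free"}, 10),
    ("system.filesystem.usage", {"host.name": "host2", "state": "used"}, 90),
    # A later point for an attribute set that was already seen.
    ("system.filesystem.usage", {"host.name": "host1", "state": "free"}, 65),
]


def series_key(name, attrs):
    """Identify a timeseries by the metric name and its attribute set."""
    return (name, frozenset(attrs.items()))


series = defaultdict(list)
for name, attrs, value in points:
    series[series_key(name, attrs)].append(value)

# Five points, but only four distinct attribute sets.
print(len(series))
```

The fifth point lands in an existing timeseries because its attributes match exactly; changing any attribute value would start a new one.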
With just 2 attributes, you can write a number of useful queries:
-- the filesystem size (free+used bytes) on each host
$fs_usage | group by host.name
-- the number of free bytes on each host
$fs_usage{state='free'} as free | group by host.name
-- the size of your dataset (on all hosts)
$fs_usage{state='used'} as dataset_size
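The three queries above can be mimicked in plain Python. This is only an illustration of the filtering and grouping semantics, assuming that values of matching timeseries are summed (as the "free+used bytes" comment suggests); the `query` helper and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical latest value (in bytes) of each timeseries.
latest = [
    ({"host.name": "host1", "state": "free"}, 70),
    ({"host.name": "host1", "state": "used"}, 30),
    ({"host.name": "host2", "state": "free"}, 10),
    ({"host.name": "host2", "state": "used"}, 90),
]


def query(rows, where=None, group_by=None):
    """Sum the values of matching timeseries, optionally grouped by one attribute."""
    totals = defaultdict(int)
    for attrs, value in rows:
        # Skip timeseries that do not match every filter in `where`.
        if where and any(attrs.get(k) != v for k, v in where.items()):
            continue
        key = attrs.get(group_by) if group_by else "all"
        totals[key] += value
    return dict(totals)


fs_size = query(latest, group_by="host.name")
free = query(latest, where={"state": "free"}, group_by="host.name")
dataset_size = query(latest, where={"state": "used"})

print(fs_size)
print(free)
print(dataset_size)
```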
Writing queries
You start creating a query by selecting metric names and giving them a short alias, for example:
-- metric aliases always start with a dollar sign
system.filesystem.usage as $fs_usage
system.network.packets as $packets
Because Uptrace supports multiple metrics in the same query, you must use the alias to reference the metric and metric attributes, for example:
-- unlike metric aliases, column aliases don't start with a dollar
$fs_usage as disk_size
$fs_usage{state='used'} as used_space
-- the same as previous query
$fs_usage as used_space | where $fs_usage.state = 'used'
-- disk size on the specified device
$fs_usage | where $fs_usage.host.name = 'host1' and $fs_usage.device = '/dev/sdd1'
-- number of packets per minute on each host
per_min($packets) as packets_per_min | group by $packets.host.name
You can use multiple metrics to construct arithmetic expressions, for example, given the following 2 metrics:
api.user_cache.hits as $hits
api.user_cache.misses as $misses
You can write a query to plot the number of hits, misses, and calculate the hit rate:
per_min($hits) as hits | per_min($misses) as misses | hits / (hits + misses) as hit_rate
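As a worked example of the arithmetic in this query, the following Python sketch computes the hit rate from per-minute hit and miss counts. The `hit_rate` function and the sample numbers are hypothetical; returning `None` when there is no traffic is an assumption of this sketch, not documented Uptrace behavior.

```python
def hit_rate(hits_per_min, misses_per_min):
    """Compute hits / (hits + misses), or None when there were no lookups."""
    total = hits_per_min + misses_per_min
    if total == 0:
        # Assumption: avoid dividing by zero when the cache saw no traffic.
        return None
    return hits_per_min / total


rate = hit_rate(90, 10)
print(rate)
print(hit_rate(0, 0))
```

With 90 hits and 10 misses per minute, the hit rate is 90 / 100.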
Instruments
OpenTelemetry provides different instruments that support different aggregation functions in Uptrace:
Otel Instrument Name | Uptrace Name | Aggregations |
---|---|---|
Counter, CounterObserver | counter | per_min, per_sec |
UpDownCounter, UpDownCounterObserver | additive | sum of last values, min, max |
GaugeObserver | gauge | last value, min, max |
Histogram | histogram | percentiles, min, max, per_min, per_sec, count, avg |
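To illustrate why different instruments call for different aggregations, here is a rough Python sketch of two rows from the table. It assumes counter data arrives as increments per fixed collection interval and that an additive metric keeps a history of values per timeseries; both helpers and their inputs are hypothetical and do not describe how Uptrace computes these functions internally.

```python
def per_min(deltas, interval_seconds):
    """Average per-minute rate from per-interval counter increments."""
    total_seconds = len(deltas) * interval_seconds
    return sum(deltas) * 60 / total_seconds


def sum_of_last_values(series):
    """Add up the most recent value of each timeseries (additive instruments)."""
    return sum(values[-1] for values in series.values())


# Counter: 120 requests observed over three 30-second intervals.
requests_rate = per_min([30, 60, 30], interval_seconds=30)

# Additive: current number of open connections on each host.
open_connections = sum_of_last_values({"host1": [5, 7, 6], "host2": [2, 3, 4]})

print(requests_rate)
print(open_connections)
```

A counter only ever grows, so a rate is the meaningful view; an additive value can go up and down, so the latest value per timeseries is what gets summed.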
Dashboards
Uptrace supports 2 types of dashboards:
- A grid-based dashboard looks like a classical grid of charts.
- A table-based dashboard is a table of items where each item leads to a separate grid-based dashboard for the item, for example, a table of hostnames with some metrics for each hostname.
In other words, table-based dashboards allow you to parameterize grid-based dashboards with attributes from the table. For example, Uptrace uses a table-based dashboard to monitor the number of sampled and dropped spans for each project:
select
$spans{type='spans'} as sampled_spans,
$spans{type='dropped'} as dropped_spans
from uptrace.projects.spans as $spans
group by project_id
project_id | metric | metric | Link to a grid-based dashboard |
---|---|---|---|
1 | sampled_spans | dropped_spans | Dash with where project_id = 1 |
2 | sampled_spans | dropped_spans | Dash with where project_id = 2 |
... | ... | ... | ... |
999 | sampled_spans | dropped_spans | Dash with where project_id = 999 |
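The parameterization shown in the table can be sketched in Python: each table row supplies an attribute value, and that value becomes the filter of the linked grid-based dashboard. The `grid_dashboard_filters` helper and the row values are hypothetical and only illustrate the idea.

```python
def grid_dashboard_filters(rows, attr):
    """Map each table row to the filter used by its grid-based dashboard."""
    return {row[attr]: f"where {attr} = {row[attr]}" for row in rows}


# Hypothetical rows of the table-based dashboard.
rows = [
    {"project_id": 1, "sampled_spans": 1200, "dropped_spans": 3},
    {"project_id": 2, "sampled_spans": 800, "dropped_spans": 0},
]

filters = grid_dashboard_filters(rows, "project_id")
for project_id, clause in filters.items():
    print(project_id, clause)
```

One grid-based dashboard definition is reused for every row; only the filter changes.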
Alternatively, you can create a single grid-based dashboard and clone it for each project. That does not scale well, but it can be an option when you have only a few items, for example, separate dashboards for different database clusters or availability zones.