PromQL compatibility

Introduction

Uptrace aims to be compatible with the Prometheus query language while extending it in a meaningful way. Most Prometheus queries can be used in Uptrace with minimal modifications. For example, the following Prometheus queries are also valid in Uptrace:

  • $metric_name{foo="xxx",bar~"yyy"}
  • increase($metric_name) and delta($metric_name)
  • rate($metric_name[5m]) and irate($metric_name[5m])
  • avg_over_time($go_goroutines[5m])
  • avg by (foo)(sum by(foo, bar)($metric_name))
  • $metric_name offset 1w
  • Math between series with automatic many-to-one/one-to-many vector matching, for example, sum($mem by (type)) / sum($mem) as mem.

However, some differences between the systems prevent you from simply copying and pasting queries from Prometheus. To ease the migration, you can use Uptrace as a Prometheus data source in Grafana, which is 100% compatible with the original Prometheus and allows you to use existing Grafana dashboards.

Aggregation

The main difference between Uptrace and Prometheus is that Prometheus selects all metric attributes by default, while Uptrace does not. This difference allows Uptrace to read much less data from disk than Prometheus.

# Prometheus selects all timeseries with all labels.
node_cpu_seconds_total

# Uptrace selects all timeseries but with a single `_hash` label.
node_cpu_seconds_total

The difference between the systems goes away when you add an aggregate function or grouping. For example, Prometheus and Uptrace return the same result for the following queries:

sum(node_cpu_seconds_total)
sum by (instance)(node_cpu_seconds_total)
sum(node_cpu_seconds_total) by (instance)
sum(irate(node_cpu_seconds_total[5m])) by (instance)
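
To make the grouping semantics concrete, here is a minimal Python sketch of what sum and sum by (instance) compute for a single timestamp. The sample data and the sum_by helper are hypothetical and only illustrate the idea; this is not how Uptrace or Prometheus implement aggregation.

```python
from collections import defaultdict

# Hypothetical samples for one timestamp: (labels, value).
samples = [
    ({"instance": "a", "cpu": "0", "mode": "idle"}, 10.0),
    ({"instance": "a", "cpu": "1", "mode": "idle"}, 12.0),
    ({"instance": "b", "cpu": "0", "mode": "idle"}, 7.0),
]

def sum_by(samples, keys):
    """Sum the values of all series that share the same values for `keys`."""
    groups = defaultdict(float)
    for labels, value in samples:
        group = tuple(labels.get(k, "") for k in keys)
        groups[group] += value
    return dict(groups)

# sum by (instance)(node_cpu_seconds_total)
by_instance = sum_by(samples, ["instance"])

# sum(node_cpu_seconds_total): no grouping keys, so everything collapses
# into a single group.
total = sum_by(samples, [])
```

Once the result only depends on the grouping keys, it no longer matters which other attributes were selected from storage.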

Aliases

Because metric names can be quite long, Uptrace requires you to provide a short metric alias that starts with the dollar sign:

metrics:
  - node_memory_MemFree_bytes as $mem_free
  - node_memory_Cached_bytes as $cached

These aliases are used as the names of the resulting timeseries when querying metrics:

query:
  - $mem_free
  - $cached
  - $mem_free + $cached

Uptrace also allows you to explicitly specify aliases for expressions:

$mem_free + $cached as total_mem

Because an Uptrace query can contain multiple expressions separated by |, you can reference other expressions using their aliases:

$mem_free + $cached as total_mem | 1 - ($mem_free / total_mem) as mem_utilization
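
As a worked example of how the second expression reuses the alias from the first, the following Python sketch evaluates both expressions for a single point in time. The byte values are hypothetical.

```python
# Hypothetical values, in bytes, for a single point in time.
mem_free = 2_000_000_000
cached = 6_000_000_000

# $mem_free + $cached as total_mem
total_mem = mem_free + cached

# 1 - ($mem_free / total_mem) as mem_utilization
mem_utilization = 1 - (mem_free / total_mem)
```

With these inputs, total_mem is 8,000,000,000 and mem_utilization is 0.75.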

Grouping

Just like Prometheus, Uptrace allows you to customize grouping. For example, the following queries return the same result in Uptrace and Prometheus:

sum by (cpu, mode)(node_cpu_seconds_total)
sum(node_cpu_seconds_total by (cpu, mode))

avg by (cpu)(sum by (cpu, mode)(node_cpu_seconds_total))
avg(sum(node_cpu_seconds_total by (cpu, mode)) by (cpu))

In addition, Uptrace supports expression-wide grouping which applies grouping to all metrics in the expression:

$metric1 by (type) / $metric2 group by host_name

# The same.
$metric1 by (type, host_name) / $metric2 by (host_name)

You can also specify global grouping that affects multiple expressions:

$metric1 | $metric2 | group by host_name

# The same using expression-wide grouping.
$metric1 group by host_name | $metric2 group by host_name
# The same but using custom grouping.
$metric1 by (host_name) | $metric2 by (host_name)

Manipulating attributes

You can rename attributes like this:

$metric1 by (deployment_environment as env, service_name as service)
$metric1 | group by deployment_environment as env, service_name as service

To manipulate attribute values, you can use the replace and replaceRegexp functions:

group by replace(host_name, 'uptrace-prod-', '') as host
group by replaceRegexp(host, `^`, 'prefix ') as host
group by replaceRegexp(host, `$`, ' suffix') as host

To change the case of a string, use the upper and lower functions:

group by lower(host_name) as host
group by upper(host_name) as host

You can also use a regexp to extract a substring from the attribute value:

group by extract(host_name, `^uptrace-prod-(\w+)$`) as host
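
To see what these functions do to an attribute value, the following Python sketch applies equivalent string operations to a hypothetical host name. It uses Python's re module, whose regular expression dialect may differ in details from the one Uptrace uses, and it assumes that extract returns the first capture group and an empty string when nothing matches.

```python
import re

host_name = "uptrace-prod-api"  # hypothetical attribute value

# replace(host_name, 'uptrace-prod-', '')
replaced = host_name.replace("uptrace-prod-", "")

# replaceRegexp(host, `^`, 'prefix ')
prefixed = re.sub(r"^", "prefix ", replaced)

# replaceRegexp(host, `$`, ' suffix')
suffixed = re.sub(r"$", " suffix", replaced)

# lower(host_name) and upper(host_name)
lowered = host_name.lower()
uppered = host_name.upper()

# extract(host_name, `^uptrace-prod-(\w+)$`)
# Assumption: the first capture group is returned, or "" if there is no match.
match = re.search(r"^uptrace-prod-(\w+)$", host_name)
extracted = match.group(1) if match else ""
```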

Filtering

Uptrace supports the same filters as PromQL:

node_cpu_seconds_total{cpu="0",mode="idle"}
node_cpu_seconds_total{cpu!="0",mode~"user|system"}

In addition, you can add global filters that affect all expressions:

$metric1 | $metric2 | where host = "myhost" | where service = "myservice"

# The same using inline filters.
$metric1{host="myhost",service="myservice"} | $metric2{host="myhost",service="myservice"}

Global filters support the following operators:

  • =, !=, <, <=, >, >=, for example, where host_name = "myhost".
  • ~, !~, for example, where host_name ~ "^prod-[a-z]+-[0-9]+$".
  • like, not like, for example, where host_name like "prod-%".
  • in, not in, for example, where host_name in ("host1", "host2").
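
The following Python sketch contrasts the regexp, like, and in operators on a hypothetical list of host names. It assumes that like follows the usual SQL semantics, where % matches any run of characters and _ matches exactly one; the like helper is illustrative and not part of Uptrace.

```python
import re

def like(value: str, pattern: str) -> bool:
    """SQL-style LIKE: % matches any run of characters, _ matches one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None

hosts = ["prod-api-1", "prod-db-22", "staging-api-1"]  # hypothetical values

# where host_name like "prod-%"
like_prod = [h for h in hosts if like(h, "prod-%")]

# where host_name ~ "^prod-[a-z]+-[0-9]+$"
regex_prod = [h for h in hosts if re.search(r"^prod-[a-z]+-[0-9]+$", h)]

# where host_name in ("prod-api-1", "staging-api-1")
in_list = [h for h in hosts if h in ("prod-api-1", "staging-api-1")]
```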

Joining

Uptrace supports math between series, for example, to add all equally-labelled series from both sides:

$mem_free + $mem_cached group by host_name, service_name

# The same.
$mem_free by (host_name, service_name) + $mem_cached by (host_name, service_name)

Uptrace also automatically supports one-to-many/many-to-one joins:

# One-to-many
$metric by (type) / $metric by (service_name, type)

# Many-to-one
$metric by (service_name, type) / $metric by (type)
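
A many-to-one join can be pictured as matching series on the attributes that both sides have in common. The Python sketch below illustrates that idea for a single timestamp; the divide helper and the sample data are hypothetical, and the actual matching rules in Uptrace may differ.

```python
def divide(left, right):
    """Divide series, matching on the attributes both sides share.

    Each side maps a tuple of (name, value) attribute pairs to a value.
    """
    result = {}
    for l_labels, l_value in left.items():
        for r_labels, r_value in right.items():
            l, r = dict(l_labels), dict(r_labels)
            shared = l.keys() & r.keys()
            if all(l[k] == r[k] for k in shared):
                # The result keeps the union of attributes from both sides.
                merged = tuple(sorted({**l, **r}.items()))
                result[merged] = l_value / r_value
    return result

# $metric by (service_name, type)
many = {
    (("service_name", "api"), ("type", "heap")): 30.0,
    (("service_name", "web"), ("type", "heap")): 10.0,
    (("service_name", "api"), ("type", "stack")): 5.0,
}

# $metric by (type)
one = {
    (("type", "heap"),): 40.0,
    (("type", "stack"),): 5.0,
}

shares = divide(many, one)
```

Each series on the "many" side is divided by the single series on the "one" side that has the same type, so shares holds the fraction each service contributes to its type.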

If attribute names don't match, you can rename them like this:

$metric by (hostname as host) + $metric by (host_name as host)

Supported functions

Uptrace supports the types of functions described in the sections below.

If Uptrace does not support the function you need, please open an issue on GitHub.

Aggregate

Aggregate functions combine multiple timeseries using the specified function and grouping attributes. When possible, aggregation is pushed down to ClickHouse for maximum efficiency.

  • min
  • max
  • sum
  • avg
  • median, p50, p75, p90, p99, count (histograms only)

The count function returns the number of observed values in a histogram. To count the number of timeseries, use uniq($metric, attr1, attr2), which efficiently counts the number of timeseries in the database without selecting all timeseries.
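
Conceptually, uniq counts the distinct combinations of the listed attribute values. The Python sketch below shows that idea on hypothetical data; it is only an illustration and says nothing about how Uptrace computes the count inside the database.

```python
# Hypothetical attribute sets of the stored timeseries.
series = [
    {"cpu": "0", "mode": "idle"},
    {"cpu": "0", "mode": "user"},
    {"cpu": "1", "mode": "idle"},
    {"cpu": "1", "mode": "user"},
]

def uniq(series, *attrs):
    """Count distinct combinations of the given attribute values."""
    return len({tuple(s.get(a) for a in attrs) for s in series})

cpu_count = uniq(series, "cpu")
cpu_mode_count = uniq(series, "cpu", "mode")
```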

Rollup

Rollup (or range/window) functions calculate rollups over data points in the specified lookbehind window. The number of timeseries and the number of datapoints remain the same.

  • min_over_time, max_over_time, sum_over_time, avg_over_time, median_over_time
  • rate and irate
  • increase and delta

You can specify the lookbehind window in square brackets, e.g. rate($metric[5i]) where i is equal to the current interval. When omitted, the default lookbehind window is 10i.
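
The Python sketch below illustrates how a rollup such as avg_over_time keeps the number of datapoints unchanged while averaging over a lookbehind window. It assumes one datapoint per interval, that a window of Ni covers N datapoints including the current one, and that points near the start use whatever data is available; these are assumptions of the sketch, not documented Uptrace behavior.

```python
def avg_over_time(points, window):
    """Average each point with up to `window - 1` preceding points."""
    out = []
    for i in range(len(points)):
        chunk = points[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

points = [1.0, 3.0, 5.0, 7.0]  # hypothetical datapoints, one per interval

# Roughly avg_over_time($metric[2i])
smoothed = avg_over_time(points, 2)
```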

Transform

Transform functions operate on each point of each timeseries. The number of timeseries and the number of datapoints remain the same.

  • abs
  • ceil, floor, trunc
  • cos, cosh, acos, acosh
  • sin, sinh, asin, asinh
  • tan, tanh, atan, atanh
  • exp, exp2
  • ln, log, log2, log10
  • perSec divides each point by the number of seconds in the grouping interval. You can achieve the same with $metric / _seconds.
  • perMin divides each point by the number of minutes in the grouping interval. You can achieve the same with $metric / _minutes.
  • clamp_min(ts timeseries, min scalar) clamps ts values to have a lower limit of min.
  • clamp_max(ts timeseries, max scalar) clamps ts values to have an upper limit of max.
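
As a small worked example, the following Python sketch applies perSec, perMin, clamp_min, and clamp_max to a few hypothetical datapoints, assuming a grouping interval of 60 seconds.

```python
interval_seconds = 60  # hypothetical grouping interval of one minute
points = [120.0, 30.0, 600.0]  # hypothetical datapoints

# perSec($metric), the same as $metric / _seconds
per_sec = [p / interval_seconds for p in points]

# perMin($metric), the same as $metric / _minutes
per_min = [p / (interval_seconds / 60) for p in points]

# clamp_min($metric, 100) and clamp_max($metric, 100)
clamped_min = [max(p, 100.0) for p in points]
clamped_max = [min(p, 100.0) for p in points]
```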

Attribute manipulation

Additionally, these functions manipulate attributes and can only be used in grouping expressions:

  • lower(attr) lowers the case of the attr value.
  • upper(attr) uppers the case of the attr value.
  • trimPrefix(attr, "prefix") removes the provided leading prefix string.
  • trimSuffix(attr, "suffix") removes the provided trailing suffix string.
  • extract(haystack, pattern) extracts a fragment of the haystack string using the regular expression pattern.
  • replace(haystack, substring, replacement) replaces all occurrences of the substring in haystack by the replacement string.
  • replaceRegexp(haystack, pattern, replacement) replaces all occurrences of the substring matching the regular expression pattern in haystack by the replacement string.

If

The special if function enables conditional branching. If the condition cond evaluates to a value other than 0 or NaN, the function returns the result of the then expression. If cond evaluates to 0 or NaN, the function returns the result of the else expression.

if(cond, then)
if(cond, then, else)

For example, you can use if to calculate the hit rate only if the number of hits and misses exceeds a certain threshold:

if(
  sum($hits) + sum($misses) >= 100,
  sum($hits) / (sum($hits) + sum($misses))
) as hit_rate

When you omit the else expression, Uptrace uses the null value. If all timeseries values are null, Uptrace removes such timeseries from the result. If you want to keep all timeseries, you can specify the else expression:

if(cond, then, 1)
if(cond, then, NaN)
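
The rules above can be sketched in Python as a pointwise selection followed by a cleanup step. The if_ and drop_all_null helpers are hypothetical names used only for illustration, with None standing in for the null value.

```python
import math

def if_(cond, then, otherwise=None):
    """Pointwise if: pick `then` where cond is neither 0 nor NaN."""
    if otherwise is None:
        # No else expression: fall back to null for every point.
        otherwise = [None] * len(cond)
    return [
        t if (c is not None and c != 0 and not math.isnan(c)) else e
        for c, t, e in zip(cond, then, otherwise)
    ]

def drop_all_null(series_by_name):
    """Remove timeseries whose values are all null."""
    return {
        name: pts
        for name, pts in series_by_name.items()
        if any(p is not None for p in pts)
    }

cond = [1.0, 0.0, float("nan")]
then = [0.75, 0.8, 0.9]

without_else = if_(cond, then)
with_else = if_(cond, then, [1.0, 1.0, 1.0])
kept = drop_all_null({"a": [None, None], "b": without_else})
```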

Offset

The offset modifier allows you to set a time offset for the query.

For example, this query retrieves the value of http_requests_total from 5 minutes ago, relative to the query evaluation time:

$http_requests_total offset 5m

A negative offset allows you to look ahead of the query evaluation time:

$http_requests_total offset -5m
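
One way to picture the offset modifier is as a shift of timestamps: with offset 5m, the value reported at time t is the one recorded at t minus 5 minutes. The Python sketch below illustrates this on hypothetical data keyed by Unix-style timestamps in seconds; the offset helper is illustrative only.

```python
def offset(series, seconds):
    """Report at time t the value that was recorded at t - seconds."""
    return {ts + seconds: value for ts, value in series.items()}

# Hypothetical datapoints: timestamp in seconds -> value.
series = {0: 1.0, 300: 2.0, 600: 3.0}

# $http_requests_total offset 5m
past = offset(series, 300)

# $http_requests_total offset -5m
future = offset(series, -300)
```

In this example, past[600] is the value recorded at 300, and future[300] is the value recorded at 600.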

Rate interval

Uptrace automatically picks and applies a suitable $__rate_interval just like Grafana does:

# Grafana and Prometheus
irate(node_cpu_seconds_total[$__rate_interval])

# Uptrace
irate(node_cpu_seconds_total)

You can specify the lookbehind window in square brackets, e.g. irate($metric[5i]) where i is equal to the current interval. When omitted, the default lookbehind window is 10i.

Range vs instant vectors

Because Uptrace does not distinguish between range and instant vectors, you should omit the lookbehind window and let Uptrace pick a default for you:

# Omit the window by default.
irate(node_cpu_seconds_total)

# Only specify it when needed.
max_over_time(process_resident_memory_bytes[1d])

Rate and OpenTelemetry

In Prometheus, rollup functions such as rate and increase play an important role when working with counters. Such functions can be difficult to implement and require users to pick an appropriate lookbehind window to work properly.

In OpenTelemetry, counters are a part of the data model, allowing Uptrace to store delta values instead of monotonically increasing cumulative values:

# Prometheus counter values
0 5 10 15

# Uptrace delta counter values
0 5 5 5

The value of delta counters is the same as the result of running the increase function in Prometheus, except that it is done either directly by OpenTelemetry SDK/Collector or by Uptrace when inserting metrics in ClickHouse. It provides higher accuracy and makes it easier to query such metrics.
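
The conversion from cumulative to delta values can be sketched in Python as follows. This is only an illustration of the idea: the to_deltas helper is hypothetical, and the handling of the first sample and of counter resets are assumptions of the sketch rather than a description of what the OpenTelemetry SDK, the Collector, or Uptrace actually do.

```python
def to_deltas(cumulative):
    """Convert cumulative counter samples to per-interval deltas."""
    deltas = []
    prev = None
    for value in cumulative:
        if prev is None:
            # Assumption: the first sample has no predecessor, so emit 0.
            deltas.append(0)
        elif value < prev:
            # Assumption: a drop means the counter was reset to zero,
            # so the new value is the increase since the reset.
            deltas.append(value)
        else:
            deltas.append(value - prev)
        prev = value
    return deltas

deltas = to_deltas([0, 5, 10, 15])
with_reset = to_deltas([0, 5, 10, 3, 8])
```

For the counter values shown above, deltas is [0, 5, 5, 5].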

By default, Uptrace automatically converts Prometheus counters to OpenTelemetry delta counters, but this requires adjusting your existing Prometheus queries when migrating from Grafana, for example, by removing rate/increase functions.

If this is an issue, or you want to use Uptrace as a Prometheus data source in Grafana with existing dashboards, you can enable a special Prometheus compatibility mode on your project settings page and Uptrace will store Prometheus metrics as is.

Translating Prometheus queries

Number of CPU Cores:

# Prometheus
count(count(node_cpu_seconds_total{instance="$node",job="$job"}) by (cpu))

# Uptrace
uniq(node_cpu_seconds_total, cpu)

CPU busy:

# Prometheus
(sum by(instance) (irate(node_cpu_seconds_total{instance="$node",job="$job", mode!="idle"}[$__rate_interval])) / on(instance) group_left sum by (instance)((irate(node_cpu_seconds_total{instance="$node",job="$job"}[$__rate_interval])))) * 100

# Uptrace
sum(irate(node_cpu_seconds_total{mode!="idle"})) / sum(irate(node_cpu_seconds_total)) as cpu_util group by instance, job

Sys Load (5m avg):

# Prometheus
avg_over_time(node_load5{instance="$node",job="$job"}[$__rate_interval]) * 100 / on(instance) group_left sum by (instance)(irate(node_cpu_seconds_total{instance="$node",job="$job"}[$__rate_interval]))

# Uptrace
avg($load5) / sum(irate(node_cpu_seconds_total)) group by instance, job

RootFS Total:

# Prometheus
100 - ((avg_over_time(node_filesystem_avail_bytes{instance="$node",job="$job",mountpoint="/",fstype!="rootfs"}[$__rate_interval]) * 100) / avg_over_time(node_filesystem_size_bytes{instance="$node",job="$job",mountpoint="/",fstype!="rootfs"}[$__rate_interval]))

# Uptrace
1 - avg_over_time(node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs"}) / avg_over_time(node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs"}) as fs_used group by instance, job
Last Updated: 9/19/2024, 11:21:11 AM