Querying Spans and Logs
Uptrace provides a powerful querying language that supports filters (where _status_code = "error
), grouping (group by _group_id
), and aggregates (p50(_duration)
).
To write useful and performant queries, you need to pre-process raw data so it has a well-defined structure. You can achieve this by recording contextual information in span attributes and events. For logs, you can use structured logging.
For searching over spans and logs, Uptrace supports more concise syntax described in this document.
Identifiers
Identifiers are unquoted strings, such as _name
, display_name
, _duration,
etc. You can use them to reference span fields, attributes, and extract values from JSON.
Span fields start with an underscore so they don't conflict with attributes.
Span field | Description |
---|---|
_id | Span id. |
_parent_id | Span parent id. |
_trace_id | Trace id. |
_name | Span name. |
_event_name | Event name. |
_kind | Span kind |
_duration | Span duration in microseconds. |
_time | Span time. |
_status_code | Span status code. |
_status_message | Span status message. |
Attribute names are mostly unchanged except that dots are replaced by underscores, for example:
Attribute name | Description |
---|---|
display_name | display.name attribute. |
service_name | OpenTelemetry service.name attribute. |
Types
Uptrace supports the following attribute types:
Attribute type | Supported comparison operators |
---|---|
string | = , in , like , contains , ~ (regexp), exists |
int and float | = , < , <= , > , >= , exists |
bool | = , != |
string[] | contains , exists |
When writing queries, you don't have to specify the attribute type, but you can if Uptrace can't discover the type automatically.
foo::string | bar::int | baz::float
Since Uptrace stores each type in a separate column, the following query is valid in case the attribute has different type in different spans (which is not recommended):
foo::string | foo::int
Note that Uptrace uses the type as a hint when reading columnar data, not for type conversion.
Units
To tell Uptrace about the attribute unit, add the unit name as a suffix to the attribute. For example:
Attribute | Unit |
---|---|
http_read_bytes | bytes |
request_duration_seconds | seconds |
elapsed_milliseconds | milliseconds |
memory_utilization | utilization |
If you can't change the attribute name, you can still change the column name by specifying an alias when writing a query:
sum(heap_size) as heap_size_bytes
Strings
Strings can be single or double quoted, for example:
"I'm a string\n"
'I\'m a string\n'
You can also use backticks to define strings that don't require/support any escape sequences, for example, to work with regular expressions:
replaceRegexp(host_name, `^some-prefix-(\w+)$`, `\1`)
Filters
Uptrace allows to filter spans and events by their attributes. Filters start with the keyword where
, for example, where display_name contains 'hello'
.
To filter query results, replace where
prefix with having
, for example, having p50(_duration) > 100ms
.
Uptrace filter | Description |
---|---|
where _status_code = "error" | Filter spans with error status code. Case-sensitive. |
where display_name like "hello%" | Filter span names that start with "hello". Case-insensitive. |
where display_name like "%hello" | Filter span names that end with "hello". Case-insensitive. |
where display_name contains "hello" | Filter span names that contain "hello". Case-insensitive. |
where display_name contains "foo|bar" | Same as display_name contains "foo" OR display_name contains "bar" . |
where _duration > 1ms | Same as _duration > 1000 . Uptrace supports μs , ms , and s units. |
where http_request_content_length > 1kb | Same as http_request_content_length > 1024 . Uptrace supports kb , mb , gb , and tb units. |
where foo exists | Filter spans that have attribute foo . |
Grouping
Grouping expressions start with group by
and work just like the corresponding SQL clause, for example, group by host_name
groups spans by the attribute host_name
and at the same time selects the host_name
.
Uptrace grouping | Note |
---|---|
group by _group_id | Groups similar spans together. |
group by _start_of_minute | Groups spans by the minute they were created. Uptrace also supports grouping by hour, day, and week. |
group by host_name | Groups spans by the host_name attribute. |
group by service_name, service_version | Groups spans by the combination of service_name and service_version attributes. |
group by lower(attribute) | Lowers the case of the attribute value. |
Aggregate functions
Aggregate functions perform a calculation on a set of values, and return a single value. They are often used together with grouping.
Aggregate function | Example | Result |
---|---|---|
count | count() | Number of matched spans/logs/events. |
any | any(_name) | Any (random) span name. |
anyLast | anyLast(_name) | Any last span name. |
avg | avg(_duration) | Average span duration. |
min, max | max(_duration) | Maximum span duration. |
sum | sum(http_request_content_length) | Total number of processed bytes. |
p50, p75, p90, p99 | p50(_duration) | Span duration percentile. |
top3, top10 | top3(code_function) | Top 3 most popular function names. |
uniq | uniq(http_client_ip) | Number of unique IP addresses. |
apdex | apdex(500ms, 3s) | Apdex score. |
Uptrace supports ClickHouse if combinator on aggregate functions, for example:
countIf(_status_code = "error")
. Number of matched spans with_status_code = "error"
.p50If(_duration, service_name = "service1")
. P50 duration for the serviceservice1
.
Uptrace also provides shortcuts for common aggregations:
Virtual column | Note |
---|---|
_error_rate | An alis for countIf(_status_code = "error") / count() . |
Transform functions
Transform functions accept a value and return a new value for each matched span/log/event.
Function | Example | Note |
---|---|---|
lower | lower(log_severity) | Lowers the string case. |
upper | upper(log_severity) | Lowers the string case. |
perMin | perMin(count()) | Divides the value by the number of minutes in the interval. |
perSec | perSec(count()) | Divides the value by the number of seconds in the interval. |
trimPrefix(str, prefix) | trimPrefix(str, "prefix") | Removes the provided leading prefix string. |
trimSuffix(str, suffix) | trimSuffix(str, "suffix") | Removes the provided trailing suffix string. |
extract(haystack, pattern) | extract(host_name, `^uptrace-prod-(\w+)$`) | Extracts a fragment of the haystack string using the regular expression pattern . |
replace(haystack, substring, replacement) | replace(host_name, 'uptrace-prod-', '') | Replaces all occurrences of the substring in haystack by the replacement string. |
replaceRegexp(haystack, pattern, replacement) | replaceRegexp(host, `^`, 'prefix ') | Replaces all occurrences of the substring matching the regular expression pattern in haystack by the replacement string. |
arrayJoin | arrayJoin(db_sql_tables) | See ClickHouse arrayJoin. |
parseInt64 | parseInt64(str_with_int) | Parses a string as an int64. |
parseFloat64 | parseFloat64(str_with_float) | Parses a string as a float64. |
parseDateTime | parseDateTime(str_with_time) | Parses a string as a date with time. |
toStartOfDay * | toStartOfDay(_time) | Rounds down a time to the start of the day. |
In addition to toStartOfDay
, Uptrace also supports toStartOfHour
, toStartOfMinute
, toStartOfSecond
, toStartOfFiveMinutes
, toStartOfTenMinutes
, and toStartOfFifteenMinutes
.
Combining all together
You can write powerful queries combining filters, grouping, and aggregates together. For example, to select the number of unique visitors for each day excluding bots:
where user_agent_is_bot not exists | uniq(client_address) | group by toStartOfDay(_time)
Performance
If your query is taking a long time to complete, you can try the following:
- Narrow down the date range, for example, select the "Last 1 hour" period if possible.
- Select a more specific system, for example, you can select the
httpserver:all
system if you're analyzing HTTP requests. - Further narrow the scope by adding the
_group_id
filter, for example,where _group_id = 123456789
.
It is also important to use the OpenTelemetry semantic convention when choosing an attribute name, because Uptrace optimises some attributes better than others.
To store data, Uptrace uses ClickHouse which is a columnar database that stores data in columns rather than rows. This means that instead of storing all the data for a single record together, each column of data is stored separately.
To improve query performance, Uptrace puts some attributes into separate columns. Uptrace can't put all attributes into separate columns because it doesn't know the list of attributes in advance and this list can get very large with thousands of attributes.
Here is the list of attributes that Uptrace optimises for:
Attribute | Spans | Logs | Events | Preaggregated |
---|---|---|---|---|
deployment_environment | Yes | Yes | Yes | Yes |
service_namespace | Yes | Yes | Yes | Yes |
service_name | Yes | Yes | Yes | Yes |
service_version | Yes | Yes | Yes | Yes |
host_name | Yes | Yes | Yes | Yes |
telemetry_sdk_name | Yes | Yes | ||
telemetry_sdk_language | Yes | Yes | ||
telemetry_sdk_version | Yes | Yes | ||
telemetry_auto_version | Yes | Yes | ||
otel_library_name | Yes | Yes | Yes | |
otel_library_version | Yes | Yes | Yes | |
client_address | Yes | |||
client_socket_address | Yes | |||
client_socket_port | Yes | |||
db_system | Yes | |||
db_name | Yes | |||
db_sql_table | Yes | |||
db_statement | Yes | |||
db_operation | Yes | |||
process_pid | Yes | Yes | ||
process_command | Yes | Yes | ||
process_runtime_name | Yes | Yes | ||
process_runtime_version | Yes | Yes | ||
process_runtime_description | Yes | Yes | ||
log_severity | Yes | |||
log_file_path | Yes | |||
log_file_name | Yes | |||
log_iostream | Yes | |||
log_source | Yes | |||
exception_type | Yes | |||
exception_message | Yes | |||
messaging_message_id | Yes | |||
messaging_message_type | Yes | |||
messaging_message_payload_size_bytes | Yes |