# Grouping rules

> Create Grok-style log grouping rules that normalize noisy messages and generate consistent fingerprints for recurring issues.

Grouping rules let you change how Uptrace groups logs and exceptions together. Each rule is a Grok-style pattern: literal words plus typed placeholders that extract variable parts (numbers, IPs, identifiers, etc.) and feed them into the grouping fingerprint.

<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
<iframe style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;" src="https://www.youtube.com/embed/Rv9AK4FwnP8?si=ceTyJ_j-PUyS4NLe" title="YouTube video player" frameBorder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerPolicy="strict-origin-when-cross-origin" allowFullScreen="true"></iframe>
</div>

For example, you can configure Uptrace to create a separate error group for each unknown PostgreSQL column:

```text
# Error messages
ERROR: column "event.created_at" does not exist (SQLSTATE=42703)
ERROR: column "updated_at" does not exist (SQLSTATE=42703)
ERROR: column "name" does not exist (SQLSTATE=42703)

# Pattern
%{LOG_LEVEL:log_severity} column %{QUOTED:#column} does not exist %{ATTR:sqlstate}
```

## Patterns

A pattern is a sequence of **matchers** separated by whitespace. Each matcher is either a literal word or a typed placeholder.

### Literals

Plain words match tokens exactly:

```text
error connecting to database
```

### Typed placeholders

Typed placeholders use Grok syntax `%{TYPE}` to match variable tokens by type:

```text
%{NUMBER}
%{IP}
%{LOG_LEVEL}
```

### Capture name

Add a capture name after a colon to extract the matched value into an attribute:

```text
%{NUMBER:status_code}
%{IP:remote_addr}
```

Prefix the capture name with `#` to also include the matched value in the grouping fingerprint hash. This is how you create a separate group per unique value:

```text
%{IDENT:#function_name}
```

Without a capture name, use the `fingerprint` option instead:

```text
%{IDENT,fingerprint}
```

### Primary constraint

The parenthesized argument `%{TYPE(arg)}` constrains which tokens match. Its meaning depends on the type:

- **Value constraint** (most types): match only tokens with this exact value.

  ```text
  %{LOG_LEVEL(ERROR)}
  %{LOG_LEVEL(ERROR):level}
  ```

- **Key constraint** (`ATTR`): match only `key=value` tokens whose key equals the argument. The key auto-captures, so `%{ATTR(status)}` is equivalent to `%{ATTR(status):status}`. Use `_` as the capture name to suppress auto-capture.

  ```text
  %{ATTR(status)}
  %{ATTR(user_id):var}
  %{ATTR(status):_}
  ```
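To make the key constraint concrete, here is a minimal Go sketch of the check it describes: split a token on the first `=` and compare the key. The function name `matchAttr` is hypothetical, and the sketch assumes a token is a bare `key=value` string; Uptrace's actual tokenizer may treat quoting or surrounding punctuation differently.

```go
package main

import (
	"fmt"
	"strings"
)

// matchAttr reports whether tok looks like key=value and, when wantKey is
// non-empty, whether its key equals wantKey. It returns the key and value
// on success. Hypothetical helper, not part of Uptrace.
func matchAttr(tok, wantKey string) (string, string, bool) {
	key, value, found := strings.Cut(tok, "=")
	if !found || key == "" {
		return "", "", false // not a key=value token
	}
	if wantKey != "" && key != wantKey {
		return "", "", false // key constraint not satisfied
	}
	return key, value, true
}

func main() {
	// Mimics %{ATTR(status)}: only the token with key "status" is accepted.
	for _, tok := range []string{"status=200", "user_id=42", "timeout"} {
		key, value, ok := matchAttr(tok, "status")
		fmt.Printf("%-12s key=%q value=%q ok=%v\n", tok, key, value, ok)
	}
}
```

Passing an empty `wantKey` corresponds to an unconstrained `%{ATTR}`, which accepts any `key=value` token.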

### Options

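Options are comma-separated and follow the capture name (or the type, when there is no capture name). Most options are `key=value` pairs; `fingerprint` is a bare flag that takes no value: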

```text
%{NUMBER:duration,unit=seconds}
```

<table>
<thead>
  <tr>
    <th>
      Option
    </th>
    
    <th>
      Description
    </th>
  </tr>
</thead>

<tbody>
  <tr>
    <td>
      <code>
        unit
      </code>
    </td>
    
    <td>
      Attach a unit to numeric values for normalization
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        fingerprint
      </code>
    </td>
    
    <td>
      Include the matched value in the grouping fingerprint hash (flag; takes no value)
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        extract
      </code>
    </td>
    
    <td>
      Backtick-quoted Go regex applied to a <code>
        %{QUOTED}
      </code>
      
       token; named groups become captures
    </td>
  </tr>
</tbody>
</table>

Supported units (used with `unit=...`):

- **Time:** `nanoseconds`, `microseconds` (`us`), `milliseconds` (`ms`), `seconds` (`s`)
- **Storage:** `bytes` (`by`), `kilobytes` (`kb`), `megabytes` (`mb`), `gigabytes` (`gb`), `terabytes` (`tb`)
- **Other:** `percents` (`%`), `count`, `celsius`, `volts`, `amperes`, `joules`, `grams`

### Regex extraction (`extract` option)

On `%{QUOTED}` matchers you can supply a Go regex in backticks. Named groups in the regex become additional captures at runtime:

```text
ERROR %{QUOTED:msg,extract=`(?P<name>\w+) is (?P<age>\d+)`}
```

Given the log line `ERROR "Alice is 25"`, this rule captures `msg=Alice is 25`, `name=Alice`, and `age=25`.

If the regex does not match the quoted token, the rule still fires and the outer `:capture` is still recorded, but no extra attributes are emitted. Optional named groups that do not participate in a match are not emitted as empty attributes. If a named group collides with the matcher's own `:capture` or another attribute key, a numeric suffix is appended (e.g., the second value becomes `name1`).
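The named-group behavior described above can be reproduced with Go's standard `regexp` package. The sketch below is not Uptrace's implementation; `extractNamed` is a hypothetical helper that only shows how named groups turn into key/value captures and why groups that do not participate in a match produce no entry.

```go
package main

import (
	"fmt"
	"regexp"
)

// extractNamed applies pattern to s and returns the named groups that took
// part in the match. It returns nil when the regex does not match at all.
// Hypothetical helper for illustration; it panics on an invalid pattern.
func extractNamed(pattern, s string) map[string]string {
	re := regexp.MustCompile(pattern)
	loc := re.FindStringSubmatchIndex(s)
	if loc == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		// Skip the whole match, unnamed groups, and groups that did not participate.
		if i == 0 || name == "" || loc[2*i] < 0 {
			continue
		}
		out[name] = s[loc[2*i]:loc[2*i+1]]
	}
	return out
}

func main() {
	const pattern = `(?P<name>\w+) is (?P<age>\d+)`
	fmt.Println(extractNamed(pattern, "Alice is 25")) // map[age:25 name:Alice]
	fmt.Println(extractNamed(pattern, "no digits here") == nil) // true
}
```

Testing a regex this way before putting it into an `extract` option is a quick check that the group names and the match boundaries are what you expect.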

### Optional matchers

Append `?` to make a matcher optional — the pattern still matches if the token is absent:

```text
error code %{NUMBER:code}? occurred
```

This matches both `error code 500 occurred` and `error code occurred`.

### Groups and alternatives

Parentheses define a group of alternatives separated by `|`:

```text
(%{LOG_LEVEL:level}|%{WORD:level}) %{WORD:msg}
```

Groups themselves can be optional:

```text
(%{LOG_LEVEL})?
```

### Repeat matchers (`%{ANY}+` / `%{ANY}*`)

Append `+` or `*` to `%{ANY}` to match multiple tokens of any type:

- `%{ANY}+` matches **one or more** tokens.
- `%{ANY}*` matches **zero or more** tokens.

Repeat matchers work anywhere in a pattern:

```text
error %{WORD:action} %{ANY:details}+ matches "error connect failed with timeout"
foo %{ANY:mid}+ bar matches "foo x y z bar"
%{IDENT:function} failed %{ANY}+ matches "myFunc failed with status 500"
```

Repeat is only valid on `%{ANY}`. It cannot be combined with a value constraint or a `unit` option.

## Available types

Some types are **virtual** — they expand to several concrete types.

<table>
<thead>
  <tr>
    <th>
      Virtual
    </th>
    
    <th>
      Expands to
    </th>
  </tr>
</thead>

<tbody>
  <tr>
    <td>
      <code>
        ANY
      </code>
    </td>
    
    <td>
      Any single token, regardless of type
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        NUMBER
      </code>
    </td>
    
    <td>
      INT, FLOAT, BYTE_SIZE, TRACE_ID_HEX
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        IP
      </code>
    </td>
    
    <td>
      IPV4, IPV6
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        IDENT
      </code>
    </td>
    
    <td>
      WORD, IDENT
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        TIMESTAMP
      </code>
    </td>
    
    <td>
      ISO8601_DATE, UNIX_DATE, HTTP_DATE, SYSLOG_DATE, DATETIME
    </td>
  </tr>
</tbody>
</table>

### Text

<table>
<thead>
  <tr>
    <th>
      Type
    </th>
    
    <th>
      Description
    </th>
    
    <th>
      Example
    </th>
  </tr>
</thead>

<tbody>
  <tr>
    <td>
      <code>
        WORD
      </code>
    </td>
    
    <td>
      A single alphabetical word
    </td>
    
    <td>
      <code>
        error
      </code>
      
      , <code>
        database
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        IDENT
      </code>
    </td>
    
    <td>
      An identifier
    </td>
    
    <td>
      <code>
        user_id
      </code>
      
      , <code>
        MyClass
      </code>
      
      , <code>
        obj.attr
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        QUOTED
      </code>
    </td>
    
    <td>
      A quoted string
    </td>
    
    <td>
      <code>
        "hello world"
      </code>
      
      , <code>
        'foo'
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        UNKNOWN
      </code>
    </td>
    
    <td>
      An unclassified segment (lexer fallback when no other type matches)
    </td>
    
    <td>
      <code>
        ??!!
      </code>
      
      , <code>
        \x1b[0m
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        ANY
      </code>
    </td>
    
    <td>
      Any single token. With <code>
        +
      </code>
      
      : one or more tokens. With <code>
        *
      </code>
      
      : zero or more tokens.
    </td>
    
    <td>
      <code>
        42
      </code>
      
      , <code>
        GET
      </code>
      
      , <code>
        foo
      </code>
    </td>
  </tr>
</tbody>
</table>

### Numeric

<table>
<thead>
  <tr>
    <th>
      Type
    </th>
    
    <th>
      Description
    </th>
    
    <th>
      Example
    </th>
  </tr>
</thead>

<tbody>
  <tr>
    <td>
      <code>
        NUMBER
      </code>
    </td>
    
    <td>
      Any numeric (INT, FLOAT, BYTE_SIZE, TRACE_ID_HEX)
    </td>
    
    <td>
      <code>
        42
      </code>
      
      , <code>
        3.14
      </code>
      
      , <code>
        10KB
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        INT
      </code>
    </td>
    
    <td>
      Integer
    </td>
    
    <td>
      <code>
        200
      </code>
      
      , <code>
        -17
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        FLOAT
      </code>
    </td>
    
    <td>
      Floating-point number
    </td>
    
    <td>
      <code>
        3.14
      </code>
      
      , <code>
        -0.5
      </code>
      
      , <code>
        1e9
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        BYTE_SIZE
      </code>
    </td>
    
    <td>
      Byte size with unit
    </td>
    
    <td>
      <code>
        10KB
      </code>
      
      , <code>
        2.5MiB
      </code>
      
      , <code>
        512B
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        TRACE_ID_HEX
      </code>
    </td>
    
    <td>
      32-character hex string
    </td>
    
    <td>
      <code>
        5d41402abc4b2a76b9719d911017c592
      </code>
      
       (MD5)
    </td>
  </tr>
</tbody>
</table>

### Network

<table>
<thead>
  <tr>
    <th>
      Type
    </th>
    
    <th>
      Description
    </th>
    
    <th>
      Example
    </th>
  </tr>
</thead>

<tbody>
  <tr>
    <td>
      <code>
        IP
      </code>
    </td>
    
    <td>
      IPV4 or IPV6 (virtual)
    </td>
    
    <td>
      <code>
        127.0.0.1
      </code>
      
      , <code>
        ::1
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        IPV4
      </code>
    </td>
    
    <td>
      IPV4 address
    </td>
    
    <td>
      <code>
        192.168.1.1
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        IPV6
      </code>
    </td>
    
    <td>
      IPV6 address
    </td>
    
    <td>
      <code>
        2001:db8::1
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        HOST_PORT
      </code>
    </td>
    
    <td>
      <code>
        host:port
      </code>
      
       combination
    </td>
    
    <td>
      <code>
        10.0.0.1:443
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        MAC
      </code>
    </td>
    
    <td>
      MAC address
    </td>
    
    <td>
      <code>
        00:1A:2B:3C:4D:5E
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        EMAIL
      </code>
    </td>
    
    <td>
      Email address
    </td>
    
    <td>
      <code>
        admin@example.com
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        URI
      </code>
    </td>
    
    <td>
      Full URI
    </td>
    
    <td>
      <code>
        https://example.com/api/v1
      </code>
    </td>
  </tr>
</tbody>
</table>

### Temporal

<table>
<thead>
  <tr>
    <th>
      Type
    </th>
    
    <th>
      Description
    </th>
    
    <th>
      Example
    </th>
  </tr>
</thead>

<tbody>
  <tr>
    <td>
      <code>
        TIMESTAMP
      </code>
    </td>
    
    <td>
      Any timestamp format (virtual)
    </td>
    
    <td>
      <code>
        2024-01-15T14:30:00Z
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        ISO8601_DATE
      </code>
    </td>
    
    <td>
      ISO8601 / RFC3339
    </td>
    
    <td>
      <code>
        2024-01-15T14:30:00Z
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        UNIX_DATE
      </code>
    </td>
    
    <td>
      Unix date
    </td>
    
    <td>
      <code>
        Mon Jan 2 15:04:05 MST 2006
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        HTTP_DATE
      </code>
    </td>
    
    <td>
      HTTP log date
    </td>
    
    <td>
      <code>
        21/Nov/2024:14:20:00 +0000
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        SYSLOG_DATE
      </code>
    </td>
    
    <td>
      Syslog timestamp
    </td>
    
    <td>
      <code>
        Jan 2 15:04:05
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        DATETIME
      </code>
    </td>
    
    <td>
      Date and time
    </td>
    
    <td>
      <code>
        2024-01-15 14:30:00
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        DATE
      </code>
    </td>
    
    <td>
      Date only
    </td>
    
    <td>
      <code>
        2024-01-15
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        TIME
      </code>
    </td>
    
    <td>
      Time of day
    </td>
    
    <td>
      <code>
        14:30:00
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        MONTH_NAME
      </code>
    </td>
    
    <td>
      Month name
    </td>
    
    <td>
      <code>
        Jan
      </code>
      
      , <code>
        February
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        WEEKDAY
      </code>
    </td>
    
    <td>
      Day-of-week name
    </td>
    
    <td>
      <code>
        Mon
      </code>
      
      , <code>
        Tuesday
      </code>
    </td>
  </tr>
</tbody>
</table>

### Structured

<table>
<thead>
  <tr>
    <th>
      Type
    </th>
    
    <th>
      Description
    </th>
    
    <th>
      Example
    </th>
  </tr>
</thead>

<tbody>
  <tr>
    <td>
      <code>
        JSON
      </code>
    </td>
    
    <td>
      JSON object or array
    </td>
    
    <td>
      <code>
        {"foo": "bar"}
      </code>
      
      , <code>
        [1, 2, 3]
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        ATTR
      </code>
    </td>
    
    <td>
      <code>
        key=value
      </code>
      
       attribute
    </td>
    
    <td>
      <code>
        status=200
      </code>
      
      , <code>
        user_id=42
      </code>
    </td>
  </tr>
</tbody>
</table>

### System

<table>
<thead>
  <tr>
    <th>
      Type
    </th>
    
    <th>
      Description
    </th>
    
    <th>
      Example
    </th>
  </tr>
</thead>

<tbody>
  <tr>
    <td>
      <code>
        LOG_LEVEL
      </code>
    </td>
    
    <td>
      Log severity level
    </td>
    
    <td>
      <code>
        INFO
      </code>
      
      , <code>
        WARN
      </code>
      
      , <code>
        ERROR
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        HTTP_METHOD
      </code>
    </td>
    
    <td>
      HTTP method
    </td>
    
    <td>
      <code>
        GET
      </code>
      
      , <code>
        POST
      </code>
      
      , <code>
        DELETE
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        HTTP_VERSION
      </code>
    </td>
    
    <td>
      HTTP protocol version
    </td>
    
    <td>
      <code>
        HTTP/1.1
      </code>
      
      , <code>
        HTTP/2.0
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        HTTP_STATUS
      </code>
    </td>
    
    <td>
      HTTP status code with reason phrase
    </td>
    
    <td>
      <code>
        200 OK
      </code>
      
      , <code>
        404 Not Found
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        URI_PATH
      </code>
    </td>
    
    <td>
      File or URL path
    </td>
    
    <td>
      <code>
        /api/users
      </code>
      
      , <code>
        /var/log/syslog
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        UUID
      </code>
    </td>
    
    <td>
      UUID string
    </td>
    
    <td>
      <code>
        88da75f6-a07e-40b3-8c62-f2b28c505ff2
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        HASHTAG
      </code>
    </td>
    
    <td>
      Hashtag
    </td>
    
    <td>
      <code>
        #deploy
      </code>
      
      , <code>
        #uptrace
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        HTML_TAG
      </code>
    </td>
    
    <td>
      HTML tag
    </td>
    
    <td>
      <code>
        &lt;html&gt;
      </code>
      
      , <code>
        &lt;/div&gt;
      </code>
      
      , <code>
        &lt;br/&gt;
      </code>
    </td>
  </tr>
</tbody>
</table>

### Type aliases

For compatibility with existing Grok corpora, several types have alternative names:

<table>
<thead>
  <tr>
    <th>
      Alias
    </th>
    
    <th>
      Canonical
    </th>
  </tr>
</thead>

<tbody>
  <tr>
    <td>
      <code>
        STRING
      </code>
    </td>
    
    <td>
      <code>
        QUOTED
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        NUM
      </code>
    </td>
    
    <td>
      <code>
        NUMBER
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        INTEGER
      </code>
    </td>
    
    <td>
      <code>
        INT
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        TIMESTAMP_ISO8601
      </code>
    </td>
    
    <td>
      <code>
        ISO8601_DATE
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        SYSLOGTIMESTAMP
      </code>
    </td>
    
    <td>
      <code>
        SYSLOG_DATE
      </code>
    </td>
  </tr>
</tbody>
</table>

## Fingerprints

Uptrace groups similar logs and exceptions by hashing certain parts of the message. By default, it hashes only the literal words; prefix a capture name with `#` (or use the `fingerprint` option) to include the captured value in the hash as well.

For example:

```text
unknown column: %{WORD:#column}
```

The pattern above creates a separate group for each column, which is useful for [alerting](/features/alerting):

```text
# Group 1
unknown column: foo
unknown column: foo

# Group 2
unknown column: bar
unknown column: bar
```
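The effect of `#` on grouping can be pictured with a small Go sketch. This is not Uptrace's hashing algorithm: the `token` type, the FNV-1a hash, and the helper names are all assumptions chosen for illustration. The only point is that tokens excluded from the hash cannot split a group, while fingerprinted captures can.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// token is one matched piece of a message (hypothetical type).
type token struct {
	text   string
	hashed bool // true for literal words and for captures marked with #
}

// fingerprint hashes only the tokens flagged as hashed.
func fingerprint(tokens []token) uint64 {
	h := fnv.New64a()
	for _, t := range tokens {
		if !t.hashed {
			continue
		}
		h.Write([]byte(t.text))
		h.Write([]byte{0}) // separator so adjacent tokens cannot run together
	}
	return h.Sum64()
}

// unknownColumn models "unknown column: <column>" matched by a pattern whose
// column placeholder is either fingerprinted (#column) or not (column).
func unknownColumn(column string, fingerprinted bool) []token {
	return []token{
		{text: "unknown", hashed: true},
		{text: "column:", hashed: true},
		{text: column, hashed: fingerprinted},
	}
}

func main() {
	// With #column, foo and bar should land in different groups.
	fmt.Println(fingerprint(unknownColumn("foo", true)) == fingerprint(unknownColumn("bar", true)))
	// Without #, the column value is ignored and both messages share a group.
	fmt.Println(fingerprint(unknownColumn("foo", false)) == fingerprint(unknownColumn("bar", false)))
}
```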

You can also set the `grouping.fingerprint` attribute when creating logs and exceptions, which overrides the automatically derived fingerprint:

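```go
import (
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/trace"
)

// Record an exception event on the span stored in ctx.
span := trace.SpanFromContext(ctx)

span.AddEvent("exception", trace.WithAttributes(
    attribute.String("exception.type", "*exec.ExitError"),
    attribute.String("exception.message", "exit status 1"),
    // Explicit fingerprint that overrides the derived one.
    attribute.String("grouping.fingerprint", "exec.ExitError"),
))
```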

## Examples

Go-style error messages:

```text
# Messages
strconv.ParseInt failed
SendEmail failed
mypkg.MyFunc failed

# Pattern
%{IDENT:#code_function} failed
```

PostgreSQL unknown column errors:

```text
# Error messages
ERROR: column "event.created_at" does not exist (SQLSTATE=42703)
ERROR: column "updated_at" does not exist (SQLSTATE=42703)
ERROR: column "name" does not exist (SQLSTATE=42703)

# Pattern
%{LOG_LEVEL:log_severity} column %{QUOTED:#column} does not exist %{ATTR:sqlstate}
```

A single grouping rule may declare multiple patterns; the rule applies as soon as any one of them matches:

```text
can't find item %{NUMBER:item_id}
can not find item %{NUMBER:item_id}
%{NUMBER:item_id} not found
```
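The "any pattern is enough" behavior amounts to trying each pattern in turn and stopping at the first match. The Go sketch below uses plain regular expressions as stand-ins for the three patterns above, so it is an approximation rather than the Grok-style matcher itself: `%{NUMBER:item_id}` is reduced to a digits-only named group, and `matchItemRule` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// Regex stand-ins for the three patterns of the rule (an approximation).
var itemPatterns = []*regexp.Regexp{
	regexp.MustCompile(`^can't find item (?P<item_id>\d+)$`),
	regexp.MustCompile(`^can not find item (?P<item_id>\d+)$`),
	regexp.MustCompile(`^(?P<item_id>\d+) not found$`),
}

// matchItemRule returns the item_id captured by the first pattern that
// matches msg, or ok=false when none of the patterns match.
func matchItemRule(msg string) (itemID string, ok bool) {
	for _, re := range itemPatterns {
		if m := re.FindStringSubmatch(msg); m != nil {
			return m[re.SubexpIndex("item_id")], true
		}
	}
	return "", false
}

func main() {
	for _, msg := range []string{
		"can't find item 42",
		"can not find item 42",
		"42 not found",
		"item 42 is missing",
	} {
		id, ok := matchItemRule(msg)
		fmt.Printf("%q -> item_id=%q matched=%v\n", msg, id, ok)
	}
}
```

Because every pattern captures into the same `item_id` name, all three message variants yield the same attribute regardless of which pattern matched.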

## Conclusion

Grouping rules work best with [structured logs](/glossary/structured-logging) and are not a replacement for the log parsers provided by [OpenTelemetry Logs](/opentelemetry/logs) and [Vector](/ingest/logs/vector).
