# Scaling Uptrace

> Tune Uptrace processing pipelines, ClickHouse inserts, and optionally add Kafka for high-volume ingestion.

This guide covers how to scale Uptrace for higher ingestion volumes. Work through these steps in order:

1. Review [scaling strategy](#scaling-strategy) — vertical first, then horizontal.
2. [Diagnose the bottleneck](#diagnosing-bottlenecks) — buffer-full errors, part counts, CPU.
3. Tune [Uptrace processing pipelines](#uptrace-processing-pipelines) — more threads and larger buffers.
4. Optimize [ClickHouse inserts](#clickhouse-insert-tuning) — async inserts and batching.
5. Add [ClickHouse sharding](#clickhouse-sharding) — when a single node isn't enough.
6. Consider [Kafka](#kafka-based-ingestion), but only after exhausting the other options.

For the full list of configuration options, see the [configuration reference](/get/hosted/config). Most changes require an Uptrace restart — see [upgrading Uptrace](/get/hosted/install#upgrading-uptrace) for the procedure.

## Scaling strategy

### Vertical first

The [ClickHouse blog](https://clickhouse.com/blog/common-getting-started-issues-with-clickhouse#2-going-horizontal-too-early) recommends scaling vertically before scaling horizontally:

> Successful deployments with ClickHouse use servers with hundreds of cores, terabytes of RAM, and petabytes of disk space. Scaling vertically first provides cost efficiency, lower operational overhead, and better query performance.

### Horizontal scaling

Uptrace instances are stateless and can be scaled horizontally. Use a load balancer to distribute traffic across instances — each instance should connect to the same PostgreSQL and ClickHouse databases.

### Resource requirements

<table>
<thead>
  <tr>
    <th>
      Component
    </th>
    
    <th>
      CPU
    </th>
    
    <th>
      RAM
    </th>
    
    <th>
      Notes
    </th>
  </tr>
</thead>

<tbody>
  <tr>
    <td>
      Uptrace instance
    </td>
    
    <td>
      2–4
    </td>
    
    <td>
      4–8 GB
    </td>
    
    <td>
      Per instance behind load balancer
    </td>
  </tr>
  
  <tr>
    <td>
      PostgreSQL
    </td>
    
    <td>
      2–4
    </td>
    
    <td>
      4–8 GB
    </td>
    
    <td>
      SSD storage recommended
    </td>
  </tr>
  
  <tr>
    <td>
      ClickHouse
    </td>
    
    <td>
      Scales with volume
    </td>
    
    <td>
      Scales with volume
    </td>
    
    <td>
      Prefer a few large instances over many small ones
    </td>
  </tr>
  
  <tr>
    <td>
      Redis
    </td>
    
    <td>
      1–2
    </td>
    
    <td>
      2–4 GB
    </td>
    
    <td>
      For caching and session storage
    </td>
  </tr>
</tbody>
</table>

## Diagnosing bottlenecks

Before changing configuration, identify the actual bottleneck:

- **`in-memory buffer is full` in Uptrace logs.** Ingestion is outpacing ClickHouse inserts. Increase `max_buffered_bytes` or add more processing threads.
- **High active part count** (`system.parts` table). Too many small inserts are creating parts faster than merges can consolidate them. Enable async inserts or increase `ch_max_insert_size`.
- **High merge activity** (`system.merges` table). This usually accompanies a high part count. Larger, less frequent inserts reduce merge pressure.
- **Insert latency.** Slow inserts cause buffers to fill up. Check ClickHouse disk I/O and CPU; SSDs are strongly recommended. Also ensure [`async_insert`](#async_insert) is enabled and [`wait_for_async_insert`](#wait_for_async_insert) is set appropriately, because synchronous inserts add a flush round-trip to every batch.
- **CPU saturation on the Uptrace host.** If all cores are busy, raising `max_threads` won't help. Consider moving ClickHouse to a separate server or adding shards.

## Uptrace processing pipelines

Uptrace processes telemetry through parallel pipelines — one each for spans, span links, logs, events, and metrics. Each pipeline has three settings:

<table>
<thead>
  <tr>
    <th>
      Setting
    </th>
    
    <th>
      What it controls
    </th>
    
    <th>
      Default
    </th>
  </tr>
</thead>

<tbody>
  <tr>
    <td>
      <code>
        max_threads
      </code>
    </td>
    
    <td>
      Processing goroutines
    </td>
    
    <td>
      <code>
        GOMAXPROCS
      </code>
      
       (CPU cores)
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        max_buffered_bytes
      </code>
    </td>
    
    <td>
      In-memory buffer cap
    </td>
    
    <td>
      <code>
        max_threads × 64 MiB
      </code>
    </td>
  </tr>
  
  <tr>
    <td>
      <code>
        ch_max_insert_size
      </code>
    </td>
    
    <td>
      Records per INSERT batch
    </td>
    
    <td>
      <code>
        10000
      </code>
      
       (varies by pipeline)
    </td>
  </tr>
</tbody>
</table>

### max_threads

The number of goroutines that process records and insert them into ClickHouse. Each thread pulls a batch from the in-memory buffer, processes it, and executes a ClickHouse `INSERT`.

Default: `GOMAXPROCS` (CPU cores). Increase when CPU is not the bottleneck but insert latency is — for example, when ClickHouse is on a separate server and network round-trips dominate.

```yml
spans:
  max_threads: 16
```

Setting `max_threads` higher than the number of CPU cores is safe — the threads spend most of their time waiting on ClickHouse I/O.

### max_buffered_bytes

Caps the total estimated bytes held in memory for a pipeline. When the buffer is full, incoming records are dropped and an `in-memory buffer is full` error is logged.

Default: `max_threads × 64 MiB`. With 16 threads the default buffer is 1 GiB.

```yml
spans:
  max_buffered_bytes: 536870912 # 512 MiB
```

If you see buffer-full errors, ingestion is temporarily outpacing ClickHouse insert speed. Increase the value to absorb bursts without dropping data. As a rough guide, size the buffer for the longest realistic ClickHouse stall (a merge, a brief network hiccup, etc.).
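As a rough worked example of that sizing rule, suppose the spans pipeline receives about 20 MiB/s and you want to ride out a 60-second ClickHouse stall. Both numbers are hypothetical and chosen only for illustration; measure your own ingestion rate before picking a value.

```yml
# Sketch: buffer size ≈ ingest rate × longest expected stall.
# Assumed (hypothetical) numbers: 20 MiB/s of spans, 60 s stall.
#   20 MiB/s × 60 s = 1200 MiB = 1200 × 1048576 bytes = 1258291200 bytes
spans:
  max_buffered_bytes: 1258291200 # 1200 MiB
```

The buffer tracks estimated in-memory bytes, which may differ from the size on the wire, so treat the result as a starting point and adjust based on whether buffer-full errors still appear.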

### ch_max_insert_size

The maximum number of records batched into a single ClickHouse `INSERT`.

```yml
spans:
  ch_max_insert_size: 10000
```

The right value depends on whether ClickHouse async inserts are enabled:

- **async_insert = 1 (recommended).** ClickHouse accumulates inserts into its own buffer and flushes them as larger batches. You can use a smaller `ch_max_insert_size` (`5000`–`10000`) — smaller Uptrace batches flush faster and reduce buffer pressure, while ClickHouse still creates efficiently sized data parts.
- **async_insert = 0.** Each Uptrace insert creates a separate data part. Use larger values (`10000`–`20000`) to avoid creating too many small parts, which increase merge pressure and degrade query performance.

In both cases, very large batches increase per-insert latency and memory usage. Reduce the value if inserts are timing out.
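For a deployment that keeps async inserts disabled, the guidance above might translate into something like the following sketch. It reuses the `query_settings` block covered under ClickHouse insert tuning; the batch size is an illustrative pick from the range above, not a measured recommendation.

```yml
# Sketch: async inserts disabled, so every Uptrace INSERT
# becomes its own data part and batches should be larger.
ch_cluster:
  shards:
    - replicas:
        - addr: localhost:9000
          query_settings:
            async_insert: 0

spans:
  ch_max_insert_size: 20000 # upper end of the 10000-20000 range
```

If inserts start timing out with this batch size, step the value back down toward `10000`.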

### Example: high-volume configuration

The values below assume `async_insert = 1` and use more threads and larger buffers than the defaults. The combined buffer totals roughly 2.6 GiB — make sure the host has enough RAM, especially if Uptrace and ClickHouse share the same server.

```yml
spans:
  max_threads: 16
  ch_max_insert_size: 10000
  max_buffered_bytes: 1073741824 # 1 GiB

span_links:
  max_threads: 2
  ch_max_insert_size: 5000
  max_buffered_bytes: 134217728 # 128 MiB

logs:
  max_threads: 12
  ch_max_insert_size: 10000
  max_buffered_bytes: 805306368 # 768 MiB

events:
  max_threads: 4
  ch_max_insert_size: 5000
  max_buffered_bytes: 268435456 # 256 MiB

metrics:
  max_threads: 8
  ch_max_insert_size: 10000
  max_buffered_bytes: 536870912 # 512 MiB
```

If you see `in-memory buffer is full` errors, increase `max_buffered_bytes` for the affected pipeline, adding RAM to the host if the larger buffers no longer fit.

## ClickHouse insert tuning

The settings below are ClickHouse server settings. You can set them globally in ClickHouse (`users.xml` or `profiles.xml`) or pass them per-query through the Uptrace config:

```yml
ch_cluster:
  shards:
    - replicas:
        - addr: localhost:9000
          query_settings:
            async_insert: 1
            wait_for_async_insert: 1
            async_insert_max_data_size: 200000000
            async_insert_busy_timeout_ms: 5000
```

The async insert buffer flushes when **either** `async_insert_max_data_size` or `async_insert_busy_timeout_ms` is reached — whichever comes first.

### async_insert

When enabled (`1`, recommended), ClickHouse buffers incoming inserts and flushes them as a single batch instead of writing each insert immediately. This reduces the number of data parts created, lowering merge pressure and improving throughput.

### wait_for_async_insert

Controls whether ClickHouse acknowledges an insert immediately or waits until the buffer is flushed to disk.

- **wait_for_async_insert: 1.** Uptrace waits for the flush to complete. Acknowledged data is guaranteed to be persisted, and insert errors are reported back to Uptrace.
- **wait_for_async_insert: 0.** The insert returns as soon as data enters the buffer. Faster, but a ClickHouse crash can lose buffered data, and insert errors are silently ignored.

### async_insert_max_data_size

Maximum bytes in the async insert buffer before a flush is triggered.

Default: `100 MiB`. ClickHouse has changed this default between releases, so confirm the effective value in `system.settings` on your server. Since Uptrace already batches on the client side via `ch_max_insert_size`, the async buffer merges a few concurrent batches rather than thousands of tiny inserts. For high-volume deployments, increase the limit to 200–500 MB to create fewer, larger data parts at the cost of more memory on the ClickHouse side.

### async_insert_busy_timeout_ms

Maximum time (ms) before the async buffer is flushed, even if it hasn't reached `async_insert_max_data_size`.

Default: `1000` ms. Increase to `3000`–`5000` ms to give the buffer more time to fill, resulting in fewer and larger data parts. Lower values reduce the data-loss window when `wait_for_async_insert: 0`, but create more parts. High-volume deployments generally benefit from longer timeouts.
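To illustrate the trade-off, here is a sketch of a latency-oriented profile that acknowledges inserts before the flush and keeps the timeout short to limit the data-loss window. The values are illustrative assumptions; whether losing a short window of buffered data is acceptable depends on your deployment.

```yml
ch_cluster:
  shards:
    - replicas:
        - addr: localhost:9000
          query_settings:
            async_insert: 1
            # Return as soon as data is buffered; flush errors are not reported.
            wait_for_async_insert: 0
            # Short timeout: roughly 1 s of buffered data is at risk if
            # ClickHouse crashes, at the cost of more, smaller parts.
            async_insert_busy_timeout_ms: 1000
```

With `wait_for_async_insert: 1`, the same timeout can be raised toward `5000` ms instead, since acknowledged data is already persisted.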

## ClickHouse sharding

For ingestion volumes that exceed a single ClickHouse node, distribute data across multiple shards. Sharding is a **Premium** feature.

Each shard is an independent ClickHouse node (or replicated set) that stores a portion of the data. Uptrace distributes writes using weighted round-robin.

```yml
ch_cluster:
  cluster: 'uptrace1'
  replicated: true
  distributed: true # Required for sharding (Premium only)

  shards:
    - weight: 1
      replicas:
        - addr: clickhouse-1a:9000
          database: uptrace
        - addr: clickhouse-1b:9000
          database: uptrace

    - weight: 1
      replicas:
        - addr: clickhouse-2a:9000
          database: uptrace
        - addr: clickhouse-2b:9000
          database: uptrace
```

The `weight` parameter controls write distribution — a shard with weight `2` receives twice as many writes as weight `1`. Use this when shards have different hardware capacity.

This configuration tells Uptrace which shards exist. You also need to configure the ClickHouse cluster itself — define the topology in ClickHouse's `remote_servers` so that ClickHouse knows about all shards and replicas. See the [ClickHouse cluster deployment docs](https://clickhouse.com/docs/architecture/cluster-deployment) for details.

When adding shards to an existing deployment, you don't need to redistribute old data. New writes are spread across all shards, including the new ones, and queries automatically span all shards via distributed tables.
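To illustrate the `weight` parameter, suppose the first shard runs on roughly twice the hardware of the second. The sketch below reuses the host names from the example above; the 2:1 ratio is an assumption for illustration.

```yml
ch_cluster:
  cluster: 'uptrace1'
  replicated: true
  distributed: true

  shards:
    # Larger shard: receives about 2 out of every 3 writes.
    - weight: 2
      replicas:
        - addr: clickhouse-1a:9000
          database: uptrace
        - addr: clickhouse-1b:9000
          database: uptrace

    # Smaller shard: receives about 1 out of every 3 writes.
    - weight: 1
      replicas:
        - addr: clickhouse-2a:9000
          database: uptrace
        - addr: clickhouse-2b:9000
          database: uptrace
```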

## Kafka-based ingestion

Kafka integration is available in the **Enterprise tier** only.

Kafka solves two problems:

1. **Buffering during maintenance.** When ClickHouse is down or can't keep up, Kafka retains data until ClickHouse is ready. Without Kafka, data arriving during these windows is dropped once the in-memory buffers fill up.
2. **Efficient batching.** Kafka enables larger, more efficient batches before writing to ClickHouse, improving insert throughput.

That said, Kafka adds operational complexity — evaluate whether the buffering benefits justify the additional infrastructure. Consider it only after exhausting the tuning options above.

### Architecture

1. **Uptrace API** receives data from clients, validates it, applies rate limits, and writes to Kafka.
2. **Uptrace Worker** consumes raw data, processes it (sampling, service graph, etc.), and publishes to a second Kafka topic.
3. **ClickHouse** consumes directly from that topic using the [Kafka engine](https://clickhouse.com/docs/engines/table-engines/integrations/kafka).

```mermaid
graph TD
    Clients["Clients\n(OTel SDK, Collector)"]
    API["Uptrace API\nvalidate, rate-limit"]
    K1["Kafka\n(raw topics)"]
    Worker["Uptrace Worker\nsampling, service graph"]
    K2["Kafka\n(processed topics)"]
    CH[("ClickHouse")]

    Clients --> API
    API --> K1
    K1 --> Worker
    Worker --> K2
    K2 -->|Kafka engine| CH
```

Each component scales and fails independently — the API accepts data at line rate, the worker processes at its own pace, and ClickHouse consumes with its own batching strategy.

### Enabling Kafka

Kafka is enabled simply by configuring brokers — there is no separate on/off switch. As soon as `kafka.brokers` is non-empty, Uptrace switches the telemetry pipeline from direct ClickHouse inserts to Kafka. There are two related blocks:

- **kafka** — the top-level transport used by the Uptrace API (producer) and the worker (consumer).
- **ch_cluster.kafka** — the brokers the [ClickHouse Kafka engine](https://clickhouse.com/docs/engines/table-engines/integrations/kafka) tables read from. In a typical deployment these point at the same brokers.

```yml
##
## Top-level Kafka transport. The API publishes telemetry here and the
## worker consumes it. Setting at least one broker switches the pipeline
## from direct ClickHouse inserts to Kafka.
##
kafka:
  brokers:
    - 'kafka-1:9092'
    - 'kafka-2:9092'

  # Optional SASL authentication.
  # Mechanism: PLAIN, SCRAM-SHA-256, or SCRAM-SHA-512.
  #sasl: SCRAM-SHA-512
  #user: uptrace
  #password: <secret>

  # Optional TLS for broker connections.
  #tls:
  #  ca_file: /etc/uptrace/kafka-ca.pem

  # Producer tuning.
  max_message_bytes: 10485760 # 10 MiB, must fit your largest batch
  linger: 10ms # how long the producer waits to fill a batch

  # Emit OpenTelemetry traces/metrics for the Kafka client itself.
  otel_tracing: false
  otel_metrics: false

##
## ClickHouse Kafka engine tables read from these brokers. Usually the
## same brokers as above. SASL/TLS for the engine is configured in
## ClickHouse, not here — only the broker list is used.
##
ch_cluster:
  kafka:
    brokers:
      - 'kafka-1:9092'
      - 'kafka-2:9092'
```

When `kafka.brokers` is empty (the default), Uptrace runs in-process and writes directly to ClickHouse — no worker and no Kafka required.

If you deploy with [Ansible](/get/hosted/ansible#step-5-deploy-kafka-optional), adding a `kafka` group to the inventory renders both of these blocks and starts the worker automatically.

### Running the worker

With Kafka enabled, the API only validates and publishes data — a separate worker process drains the Kafka topics and inserts into ClickHouse and PostgreSQL. Run it alongside the main `uptrace` process:

```shell
uptrace --config=/etc/uptrace/config.yml worker
```

The worker requires both Kafka brokers and the licensed **Kafka feature** (Enterprise tier). It exits on startup if either is missing:

- `worker requires Kafka brokers: set kafka.brokers in the config` means that brokers are not configured.
- `worker requires the Kafka feature, which is not enabled on this instance` means that the license does not include Kafka.

Run as many worker instances as needed to keep up with ingestion; they share the consumer group and partition the load automatically.

### Tuning the ClickHouse Kafka engine

Per-table consumer behavior is tuned under `ch_schema`. Each telemetry table (`spans_data`, `logs_data`, `events_data`, `datapoints`, …) accepts the same Kafka engine settings, so you can tune high-volume tables independently:

```yml
ch_schema:
  spans_data:
    # Rows per block read from Kafka before flushing to the table.
    kafka_max_block_size: 1048576
    # Max messages fetched per poll.
    kafka_poll_max_batch_size: 65536
    # Timeout for a single Kafka poll.
    kafka_poll_timeout: 1s
    # How often buffered data is flushed, even if a block isn't full.
    kafka_flush_interval: 5s
    # One thread per consumer (default 1).
    kafka_thread_per_consumer: 1
    # Malformed-message handling: default, stream, or dead_letter_queue.
    kafka_handle_error_mode: stream
    # Schema-incompatible messages tolerated per block before erroring.
    kafka_skip_broken_messages: 0
    # Per-partition consumer lag (messages) above which the pipeline is
    # treated as stale, so alert monitors defer the current evaluation
    # until the consumer catches up (avoids alerting on incomplete data).
    kafka_lag_threshold: 100
```

Only `kafka_thread_per_consumer` (`1`), `kafka_handle_error_mode` (`stream`), and `kafka_lag_threshold` (`100`) have defaults set by Uptrace; the remaining settings fall back to the ClickHouse Kafka engine defaults when omitted. Raise `kafka_max_block_size` and `kafka_flush_interval` on high-volume tables to create fewer, larger data parts, at the cost of higher consumer memory and flush latency.
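As a sketch of tuning tables independently, a high-volume logs table could use a larger block size and a longer flush interval than the spans example above. The specific values are illustrative assumptions, not tested recommendations.

```yml
ch_schema:
  logs_data:
    # Twice the block size of the spans example: fewer, larger parts.
    kafka_max_block_size: 2097152
    # Flush less often; data can take up to 10 s longer to become queryable.
    kafka_flush_interval: 10s
```

Settings omitted here, such as `kafka_poll_max_batch_size`, keep their defaults for this table.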
