# Deploying Uptrace with Ansible

> Automate Uptrace deployments on bare metal with the official Ansible playbooks, inventories, and role-based setup.

This guide provides comprehensive instructions for deploying Uptrace, an open-source APM and observability platform, on bare metal servers using Ansible automation. Uptrace supports [distributed tracing](/opentelemetry/distributed-tracing), metrics, and log management to help you monitor your applications effectively.

## What is Ansible?

Ansible is an open-source IT automation tool that enables configuration management, application deployment, infrastructure provisioning, and orchestration. It allows you to automate repetitive tasks, deploy software, and manage systems consistently and reliably across multiple servers.

## Prerequisites

Before proceeding with this guide, ensure you have:

1. **Ansible installed** on your control machine - Follow the [official installation guide](https://docs.ansible.com/ansible/latest/installation_guide/index.html)
2. **SSH access** to your target servers with sudo privileges
3. **Minimum server requirements**:
   - 2+ CPU cores
   - 4GB+ RAM
   - 20GB+ disk space
   - Ubuntu 24.04
4. **Network connectivity** between servers for cluster configurations

## Getting Started

### Clone the Ansible Repository

Uptrace maintains a comprehensive set of Ansible playbooks to deploy Uptrace with all required dependencies, including ClickHouse, PostgreSQL, and Redis:

```shell
git clone https://github.com/uptrace/ansible.git
cd ansible
```

### Understanding the Repository Structure

The repository contains several key components:

- **Playbooks**: Main deployment scripts (`.yml` files in the root directory)
- **Roles**: Reusable automation components for each service
- **Templates**: Configuration file templates
- **Inventory**: Server definitions and groupings
- **Group variables**: Shared configuration settings

## Inventory Configuration

### Understanding Ansible Inventory

An Ansible inventory is a file that defines the hosts (servers or nodes) that Ansible will manage and execute tasks on. It organizes your infrastructure into logical groups and defines variables for each host or group.

### Setting Up Your Inventory

1. **Copy the sample inventory file**:
   ```shell
   cp inventory.sample.yml inventory.yml
   ```
2. **Edit the inventory file** to include your server details.

The inventory groups hosts by the role they play. A minimal single-node setup looks like this:

```yaml
uptrace:
  hosts:
    10.10.1.1: { primary: true }

postgresql:
  hosts:
    10.10.1.1:

clickhouse:
  children:
    clickhouse_server:
    clickhouse_keeper:

clickhouse_server:
  hosts:
    10.10.1.1: { ch_cluster: uptrace1, ch_shard: shard1, ch_replica: replica1 }

clickhouse_keeper:
  hosts:
    10.10.1.1: { keeper_id: 1 }

redis:
  children:
    redis_cache:

redis_cache:
  hosts:
    10.10.1.1: { redis_node_id: alpha, redis_maxmemory_mb: 128 }
```

You can colocate every group on a single host to start, then split services onto dedicated hosts as you scale. The `kafka` group is optional — see [Step 5](#step-5-deploy-kafka-optional).

## Configuration Management

### Configuration File Locations

The configuration is distributed across several locations for modularity and maintainability:

- **group_vars/all.yml**: Main configuration file containing database passwords and basic Uptrace settings
- **roles/uptrace/templates/config.yml**: Advanced Uptrace configuration options
- **roles/clickhouse-server/templates/config.xml**: ClickHouse-specific configuration
- **Role-specific templates**: Each service role contains its own configuration templates

### Setting Up Main Configuration

1. **Copy the sample configuration**:
   ```shell
   cp group_vars/all.sample.yml group_vars/all.yml
   ```
2. **Update the configuration** with your specific requirements:
   - Database passwords
   - Network settings
   - Security configurations
   - Resource limits

### Configuration Security

- Use strong, unique passwords for all services
- Consider using Ansible Vault for sensitive data
- Regularly rotate credentials
- Implement proper firewall rules
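
A minimal sketch of the first two recommendations, assuming a POSIX shell with `/dev/urandom` available on the control machine. `gen_password` is a hypothetical helper name and is not something the Uptrace playbooks ship.

```shell
# gen_password [LENGTH]: print a random alphanumeric password (default: 32 characters).
# Hypothetical helper for illustration; not part of the Uptrace playbooks.
gen_password() {
  length="${1:-32}"
  # Read a fixed chunk of random bytes, keep only letters and digits
  # (so the value needs no YAML quoting), then trim to the requested length.
  head -c 2048 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c "1-${length}"
}
```

To keep the value out of plain text, `ansible-vault encrypt_string --name ch_db_password "$(gen_password)"` prints an encrypted entry that can be pasted into `group_vars/all.yml`. Ansible prompts for a vault password when encrypting, and playbooks then need `--ask-vault-pass` (or a vault password file) to decrypt it.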

## Deployment Process

### Step 1: Bootstrap Servers

Bootstrap your servers and install common software, including the OpenTelemetry Collector:

```shell
ansible-playbook -i inventory.yml bootstrap.yml
```

**What this does**:

- Updates system packages
- Installs essential tools and dependencies
- Configures basic security settings
- Sets up OpenTelemetry Collector

### Step 2: Deploy ClickHouse

ClickHouse stores all observability data including spans, logs, events, and metrics.

1. **Update ClickHouse variables** in `group_vars/all.yml`:
   ```yaml
   ch_cluster: uptrace1
   ch_db_name: uptrace
   ch_db_user: uptrace
   ch_db_password: your_secure_password
   ```
2. **Deploy ClickHouse**:
   ```shell
   ansible-playbook -i inventory.yml ch_server.yml
   ```

#### High Availability ClickHouse Configuration

For production, run multiple ClickHouse replicas coordinated by [ClickHouse Keeper](https://clickhouse.com/docs/guides/sre/keeper/clickhouse-keeper) (the built-in ZooKeeper replacement). Replication needs three pieces: hosts in the `clickhouse_keeper` group, replicas defined in `clickhouse_server`, and the `ch_replicated` flag.

1. **Define the cluster topology** in `inventory.yml`. Each ClickHouse host gets a `ch_shard` and `ch_replica`; the `clickhouse_keeper` group runs the coordination service (use 1 or 3 keeper nodes):
   ```yaml
   clickhouse_server:
     hosts:
       10.10.1.1: { ch_cluster: uptrace1, ch_shard: shard1, ch_replica: replica1 }
       10.10.1.2: { ch_cluster: uptrace1, ch_shard: shard1, ch_replica: replica2 }
       10.10.1.3: { ch_cluster: uptrace1, ch_shard: shard2, ch_replica: replica1 }
       10.10.1.4: { ch_cluster: uptrace1, ch_shard: shard2, ch_replica: replica2 }

   clickhouse_keeper:
     hosts:
       10.10.1.1: { keeper_id: 1 }
       10.10.1.2: { keeper_id: 2 }
       10.10.1.3: { keeper_id: 3 }
   ```
2. **Enable replicated (and, for multiple shards, distributed) tables** in `group_vars/all.yml`:
   ```yaml
   ch_replicated: true # replicated tables for redundancy and failover
   ch_distributed: true # distributed tables across shards (Premium feature)
   ```
3. **Deploy Keeper, then the servers**:
   ```shell
   ansible-playbook -i inventory.yml ch_keeper.yml
   ansible-playbook -i inventory.yml ch_server.yml
   ```
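
Once both playbooks have finished, the topology can be cross-checked from any ClickHouse host. The sketch below assumes `clickhouse-client` is on the `PATH`, that the cluster is registered in ClickHouse under the same name as `ch_cluster`, and that the default connection settings work (add `--user`/`--password` otherwise). The function names are hypothetical.

```shell
# ch_cluster_layout CLUSTER: list the shards and replicas ClickHouse knows about.
ch_cluster_layout() {
  cluster="$1"
  clickhouse-client --query "
    SELECT shard_num, replica_num, host_name
    FROM system.clusters
    WHERE cluster = '${cluster}'
    ORDER BY shard_num, replica_num"
}

# ch_readonly_replicas: list replicated tables that are in read-only mode,
# which usually means the server cannot reach ClickHouse Keeper.
ch_readonly_replicas() {
  clickhouse-client --query "
    SELECT database, table
    FROM system.replicas
    WHERE is_readonly"
}
```

With the inventory above, `ch_cluster_layout uptrace1` would be expected to return four rows (two shards with two replicas each), and `ch_readonly_replicas` should return nothing on a healthy cluster.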

**Important**: When adding new replicas to an *existing* cluster that already holds data, the new replica must sync the existing tables. See [adding and removing replicas](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/add_remove_replica/) in the Altinity knowledge base.

### Step 3: Deploy PostgreSQL

PostgreSQL stores metadata such as users, projects, and monitor configurations.

1. **Update PostgreSQL variables** in `group_vars/all.yml`:
   ```yaml
   pg_db_name: uptrace
   pg_db_user: uptrace
   pg_db_password: your_secure_password
   ```
2. **Deploy PostgreSQL**:
   ```shell
   ansible-playbook -i inventory.yml pg.yml
   ```

#### High Availability PostgreSQL Configuration

Configure primary-standby replication for high availability:

```yaml
postgresql:
  hosts:
    10.10.1.1: # Primary server
    10.10.1.2: # Standby server
      pg_primary_host: 10.10.1.1
```

**Note**: You must manually set up PostgreSQL replication between the standby and primary servers. This typically involves:

1. Creating a base backup on the primary server
2. Restoring the backup on the standby server
3. Configuring streaming replication
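
As a rough sketch of these three steps, `pg_basebackup` can take the base backup, write it straight into the standby's data directory, and generate the streaming-replication settings in one pass. Everything environment-specific here is an assumption: the `replicator` role, its `pg_hba.conf` entry on the primary, and the data directory path are not created by the playbooks and must match your setup. The function name is hypothetical.

```shell
# pg_clone_standby PRIMARY_HOST DATA_DIR [REPL_USER]
# Run on the standby as the postgres OS user, with PostgreSQL stopped and
# DATA_DIR empty. Assumes a replication role (default name: replicator)
# already exists on the primary and is allowed in its pg_hba.conf.
pg_clone_standby() {
  primary_host="$1"
  data_dir="$2"
  repl_user="${3:-replicator}"
  # -X stream: stream WAL while the backup runs so the copy is consistent
  # -R: write standby.signal and primary_conninfo so the node starts as a standby
  # -P: show progress
  pg_basebackup -h "$primary_host" -U "$repl_user" -D "$data_dir" -X stream -R -P
}
```

For the inventory above this would be called on `10.10.1.2` as `pg_clone_standby 10.10.1.1 <data-directory>`, followed by starting PostgreSQL on the standby.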

### Step 4: Deploy Redis

Redis provides in-memory caching for improved performance.

**Deploy Redis**:

```shell
ansible-playbook -i inventory.yml redis.yml
```

#### High Availability Redis Configuration

Configure multiple Redis instances for redundancy:

```yaml
redis_cache:
  hosts:
    10.10.1.1:
      redis_node_id: alpha
      redis_maxmemory_mb: 256
    10.10.1.2:
      redis_node_id: bravo
      redis_maxmemory_mb: 256
    10.10.1.3:
      redis_node_id: charlie
      redis_maxmemory_mb: 256
```

**Note**: Each Redis node operates independently, so no replication setup is required. Uptrace handles node selection automatically.

### Step 5: Deploy Kafka (Optional)

By default Uptrace writes incoming telemetry to ClickHouse synchronously. For high-ingestion deployments you can place [Apache Kafka](https://kafka.apache.org/) between ingestion and storage: Uptrace publishes spans, logs, and metrics to Kafka, and a separate `uptrace worker` process consumes them and inserts the data into ClickHouse. The same brokers are also used by the [ClickHouse Kafka engine](https://clickhouse.com/docs/engines/table-engines/integrations/kafka).

Kafka solves two problems: it **buffers data** when ClickHouse is down or can't keep up (without it, data arriving during those windows is dropped), and it enables **larger, more efficient batches** before writing to ClickHouse. It also lets ingestion and storage scale and fail independently. In return it adds operational complexity, so reach for it only after exhausting the other [scaling](/get/hosted/scale) options. The Kafka **worker requires the Enterprise tier** — without the licensed Kafka feature it refuses to start. See [scaling Uptrace](/get/hosted/scale#kafka) for the full configuration reference and tuning options.

Kafka is entirely optional and is enabled simply by adding a `kafka` group to your inventory. When the group is present, the Uptrace role automatically:

- renders a top-level `kafka` section and a `ch_cluster.kafka` section in `config.yml` from the group's hosts, and
- installs and starts an additional `uptrace-worker` systemd service that drains the Kafka queues.

If you later remove the `kafka` group and re-run `uptrace.yml`, the worker is stopped and removed and Uptrace reverts to writing directly to ClickHouse.

1. **Add a Kafka broker** to your inventory:
   ```yaml
   kafka:
     hosts:
       10.10.1.1: { kafka_node_id: 1 }
   ```
   Brokers listen on port `9092`. Each broker advertises its inventory hostname, so make sure your Uptrace and ClickHouse hosts can reach the Kafka host at that address.
2. **Keep the KRaft cluster id constant** in `group_vars/all.yml` (already present in the sample). It is used to format Kafka storage and must not change between runs:
   ```yaml
   # Generate your own with: /opt/kafka/bin/kafka-storage.sh random-uuid
   kafka_cluster_id: MkU3OEVBNTcwNTJENDM2Qk
   ```
3. **Deploy Kafka**:
   ```shell
   ansible-playbook -i inventory.yml kafka.yml
   ```
4. **Re-deploy Uptrace** so it picks up the Kafka configuration and starts the worker (covered in the next step):
   ```shell
   ansible-playbook -i inventory.yml uptrace.yml
   ```

#### Verification

```shell
# Check the worker service status
sudo systemctl status uptrace-worker

# View worker logs (should show it consuming Kafka topics)
sudo journalctl -u uptrace-worker -f
```
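
The broker side can be inspected as well with Kafka's own command-line tools. This is a sketch that assumes the tools live under `/opt/kafka/bin` (the path used for `kafka-storage.sh` above) and that the worker consumes through regular Kafka consumer groups; the function names are hypothetical.

```shell
# Location of the Kafka distribution; override KAFKA_HOME if yours differs.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"

# kafka_topics [BROKER]: list the topics on a broker (default: localhost:9092).
kafka_topics() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --bootstrap-server "${1:-localhost:9092}" --list
}

# kafka_lag [BROKER]: describe every consumer group, including per-partition lag.
kafka_lag() {
  "$KAFKA_HOME/bin/kafka-consumer-groups.sh" --bootstrap-server "${1:-localhost:9092}" \
    --describe --all-groups
}
```

A lag column that keeps growing in the `kafka_lag` output suggests the worker is not keeping up with ingestion.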

### Step 6: Deploy Uptrace

With all dependencies in place, deploy the main Uptrace application:

```shell
ansible-playbook -i inventory.yml uptrace.yml
```
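
For repeat deployments, the playbooks from the steps above can be chained so that a failure stops the sequence early. This is only a convenience sketch; `deploy_all` is a hypothetical wrapper and not part of the repository.

```shell
# deploy_all INVENTORY PLAYBOOK...: run playbooks in order, stop at the first failure.
deploy_all() {
  inventory="$1"
  shift
  for playbook in "$@"; do
    echo "==> $playbook"
    if ! ansible-playbook -i "$inventory" "$playbook"; then
      echo "playbook failed: $playbook" >&2
      return 1
    fi
  done
}
```

A single-node deployment without Kafka would then be `deploy_all inventory.yml bootstrap.yml ch_server.yml pg.yml redis.yml uptrace.yml`. For replicated ClickHouse add `ch_keeper.yml` before `ch_server.yml`, and with Kafka add `kafka.yml` before `uptrace.yml`.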

#### Verification

Check that Uptrace is running correctly:

```shell
# View Uptrace logs
sudo journalctl -u uptrace -f

# Check service status
sudo systemctl status uptrace

# If Kafka is enabled, also check the worker that drains the queues
sudo systemctl status uptrace-worker

# Test connectivity
curl -f http://localhost:80/api/v1/health
```
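
Uptrace may need a few seconds after the playbook finishes before the health endpoint responds, so a single `curl` can fail spuriously. The hypothetical helper below, a sketch rather than anything shipped with the playbooks, retries the same request a fixed number of times.

```shell
# wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
# Poll URL until curl gets a successful response or the attempts run out.
wait_for_health() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -sf -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "still unhealthy after $attempts attempt(s)" >&2
  return 1
}
```

For example, `wait_for_health http://localhost:80/api/v1/health 30 2` waits up to about a minute and returns a non-zero status if Uptrace never answers, which makes it usable in scripts.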

#### First Login

Open Uptrace at the `site_url` you configured in `group_vars/all.yml` (the sample defaults to `http://uptrace.local`). On first startup the playbook seeds an admin account from the `seed_data` block in `group_vars/all.yml` — the sample ships with `admin@uptrace.local` / `admin`.

**Change these seed credentials before exposing Uptrace.** Edit the `seed_data` users, tokens, and project tokens in `group_vars/all.yml` and re-run `uptrace.yml`.

## SSL/TLS Configuration with Let's Encrypt

### Prerequisites for Let's Encrypt

Let's Encrypt uses an HTTP challenge for domain verification. Ensure:

1. **Domain resolution**: Your domain must be publicly resolvable:
   ```shell
   nslookup your-domain.com
   ```
2. **Port accessibility**: Ports 80 and 443 must be accessible from the internet

### Configure Let's Encrypt

Update your `group_vars/all.yml` file:

```yaml
# Domain for SSL certificate
site_url: https://your-domain.com/

# Enable Let's Encrypt
certmagic_enabled: true

# Use staging environment for testing (disable for production)
certmagic_staging_ca: true

# Listen on HTTPS port
listen_http_addr: ':443'
```

### Testing SSL Configuration

1. **Test with staging environment** first (set `certmagic_staging_ca: true`)
2. **Verify certificate issuance** in logs
3. **Switch to production** (set `certmagic_staging_ca: false`)
4. **Redeploy** to obtain the production certificate
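
Besides the logs, the certificate that is actually being served can be inspected with `openssl`. The sketch below assumes `openssl` is installed and relies on the Let's Encrypt staging CA using issuer names that contain `(STAGING)`, which holds at the time of writing but should be treated as an assumption. Both function names are hypothetical.

```shell
# cert_issuer DOMAIN: print the issuer of the certificate served on port 443.
cert_issuer() {
  domain="$1"
  openssl s_client -connect "${domain}:443" -servername "$domain" </dev/null 2>/dev/null |
    openssl x509 -noout -issuer
}

# is_staging_issuer ISSUER: succeed if the issuer string looks like the
# Let's Encrypt staging CA (its certificate names contain "STAGING").
is_staging_issuer() {
  case "$1" in
    *STAGING*) return 0 ;;
    *) return 1 ;;
  esac
}
```

After switching to production, `is_staging_issuer "$(cert_issuer your-domain.com)" && echo "still on the staging CA"` should print nothing.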

## Optional Components

### Mailhog for Email Testing

Deploy Mailhog to capture and view emails sent by Uptrace during development:

```shell
ansible-playbook -i inventory.yml mailhog.yml
```

Access the Mailhog web interface at [http://localhost:8025](http://localhost:8025).

**Production Note**: Disable Mailhog in production environments and configure proper SMTP settings.

## Scaling and Maintenance

### Horizontal Scaling

#### Scaling Uptrace Application

Add additional Uptrace instances for load distribution:

```yaml
uptrace:
  hosts:
    10.10.1.1:
    10.10.1.2:
    10.10.1.3:
```

Redeploy after updating inventory:

```shell
ansible-playbook -i inventory.yml uptrace.yml
```

#### Scaling ClickHouse

Scale ClickHouse by adding [shards and replicas](https://clickhouse.com/docs/architecture/horizontal-scaling) to the `clickhouse_server` group. Replicas add redundancy within a shard; shards spread data horizontally across hosts. This is the same topology described in [High Availability ClickHouse Configuration](#high-availability-clickhouse-configuration) — remember to also run a `clickhouse_keeper` group and set `ch_replicated`/`ch_distributed`, then redeploy with `ch_keeper.yml` and `ch_server.yml`.

## Troubleshooting

### Common Issues and Solutions

#### Connection Issues

**Problem**: Ansible cannot connect to hosts

**Solution**:

```shell
# Test connectivity
ansible all -i inventory.yml -m ping

# Check SSH access
ssh user@host-ip

# Verify inventory syntax
ansible-inventory -i inventory.yml --list
```

#### Service Failures

**Problem**: Services fail to start

**Solution**:

```shell
# Check service status
ansible all -i inventory.yml -m shell -a "systemctl status uptrace --no-pager"

# View logs
ansible all -i inventory.yml -m shell -a "journalctl -u uptrace -n 50"

# Check disk space
ansible all -i inventory.yml -m shell -a "df -h"
```

#### Performance Issues

**Problem**: Slow query performance

**Solutions**:

- Increase ClickHouse memory allocation
- Add more ClickHouse replicas
- Optimize query patterns
- Review resource utilization

### Diagnostic Commands

```shell
# Check all services status
ansible all -i inventory.yml -m shell -a "systemctl status uptrace clickhouse-server postgresql redis"

# Monitor resource usage
ansible all -i inventory.yml -m shell -a "top -n 1 -b"

# List listening TCP ports to confirm each service is bound where you expect
ansible all -i inventory.yml -m shell -a "ss -tlnp"

# Verify configuration files
ansible all -i inventory.yml -m shell -a "nginx -t" # if using nginx
```

## Security Considerations

### Network Security

- Configure firewalls to allow only necessary ports
- Use VPN or private networks for inter-service communication
- Implement proper network segmentation

### Access Control

- Use SSH key authentication instead of passwords
- Implement role-based access control
- Regularly audit user access

### Data Protection

- Encrypt sensitive data at rest
- Use TLS for all network communications
- Implement proper backup encryption

## Best Practices

### Infrastructure Management

1. **Use version control** for all configuration files
2. **Test changes** in a staging environment first
3. **Implement monitoring** for all services
4. **Document custom configurations** and procedures
5. **Plan for disaster recovery** scenarios

### Configuration Management

1. **Use Ansible Vault** for sensitive data
2. **Implement configuration validation** before deployment
3. **Maintain environment-specific configurations**
4. **Use meaningful variable names** and comments

### Operational Excellence

1. **Automate routine tasks** with additional playbooks
2. **Implement health checks** and monitoring
3. **Create runbooks** for common scenarios
4. **Establish maintenance windows** for updates

## Upgrading Uptrace

Only upgrades to the next minor version are tested and supported, for example, upgrading from 1.1 to 1.2. Skipping minor versions (e.g., 1.1 to 1.3) is not supported — upgrade one minor version at a time.
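
The rule can be expressed as a small pre-upgrade check. This is a sketch with two assumptions of its own that the policy above does not state: moving between patch releases of the same minor version is treated as fine, and a change of major version is rejected. `is_supported_upgrade` is a hypothetical name, and versions are written without the leading `v`.

```shell
# is_supported_upgrade CURRENT TARGET
# Succeeds when TARGET is in the same minor version as CURRENT or exactly one
# minor version ahead, within the same major version.
is_supported_upgrade() {
  cur_major=${1%%.*}; cur_rest=${1#*.}; cur_minor=${cur_rest%%.*}
  new_major=${2%%.*}; new_rest=${2#*.}; new_minor=${new_rest%%.*}
  # Assumption: crossing a major version is not covered by the one-step rule.
  [ "$cur_major" -eq "$new_major" ] || return 1
  step=$((new_minor - cur_minor))
  [ "$step" -eq 0 ] || [ "$step" -eq 1 ]
}
```

With this, `is_supported_upgrade 1.1.0 1.2.0` succeeds while `is_supported_upgrade 1.1.0 1.3.0` fails, matching the example above.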

### Check Current Version

Before upgrading, verify the installed version:

```shell
ansible all -i inventory.yml -m shell -a "uptrace version" -l uptrace
```

Check the latest available version on [GitHub Releases](https://github.com/uptrace/uptrace/releases).

### Back Up Databases

Always create backups of both PostgreSQL and ClickHouse before upgrading.

**PostgreSQL:**

```shell
ansible all -i inventory.yml -m shell -a \
  "sudo -u postgres pg_dump uptrace > /tmp/uptrace-pg-backup-\$(date +%Y%m%d).sql" \
  -l postgresql
```
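
A dump that was cut short is easy to miss until it is needed, so it is worth a quick sanity check before upgrading. The sketch below relies on plain-format `pg_dump` output starting with a `-- PostgreSQL database dump` comment and ending with `-- PostgreSQL database dump complete`; `check_pg_dump` is a hypothetical helper to run on the PostgreSQL host where the file was written.

```shell
# check_pg_dump FILE: succeed only if FILE looks like a complete plain-text pg_dump.
check_pg_dump() {
  file="$1"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  # The header comment appears within the first few lines of the dump.
  head -n 5 "$file" | grep -q 'PostgreSQL database dump' ||
    { echo "not a pg_dump file: $file" >&2; return 1; }
  # The trailer comment is only written when the dump ran to completion.
  tail -n 10 "$file" | grep -q 'PostgreSQL database dump complete' ||
    { echo "dump looks truncated: $file" >&2; return 1; }
}
```

For example, `check_pg_dump /tmp/uptrace-pg-backup-YYYYMMDD.sql` with the date of the backup filled in. This only confirms the file is structurally complete; a test restore into a scratch database remains the stronger check.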

**ClickHouse:**

The Ansible playbooks install ClickHouse server and client packages, but they don't install `clickhouse-backup`. Use the ClickHouse backup mechanism you operate in production, such as storage snapshots or a separately installed backup tool.

If you manage `clickhouse-backup` yourself, create a backup before running the upgrade:

```shell
ansible all -i inventory.yml -m shell -a \
  "clickhouse-backup create uptrace-backup-\$(date +%Y%m%d)" \
  -l clickhouse_server
```

### Run the Upgrade

Pull the latest Ansible playbooks and re-run the Uptrace playbook:

```shell
git pull
ansible-playbook -i inventory.yml uptrace.yml
```

The playbook automatically installs the new version, validates the config, runs database migrations, and restarts Uptrace.

To install a specific version instead of the latest, set `uptrace_version` in `group_vars/all.yml`. Choose an existing release from [GitHub Releases](https://github.com/uptrace/uptrace/releases) and omit the leading `v`:

```yaml
uptrace_version: '<release-version>'
```

### Verify the Upgrade

After the playbook completes, confirm the upgrade was successful:

```shell
# Check the new version
ansible all -i inventory.yml -m shell -a "uptrace version" -l uptrace

# Verify the service is running
ansible all -i inventory.yml -m shell -a "systemctl status uptrace" -l uptrace

# Check logs for migration errors
ansible all -i inventory.yml -m shell -a "journalctl -u uptrace --since '5 minutes ago' --no-pager" -l uptrace

# Test the health endpoint
ansible all -i inventory.yml -m shell -a "curl -sf http://localhost:80/api/v1/health" -l uptrace
```

### Rolling Back

If the upgrade fails before database migrations run, restore the previous version by setting `uptrace_version` in `group_vars/all.yml` to the previous version and re-running the playbook:

```shell
ansible-playbook -i inventory.yml uptrace.yml
```

If database migrations have already run, stop Uptrace, restore PostgreSQL into a clean database, and then re-run the playbook with `uptrace_version` still set to the previous version:

```shell
# Stop Uptrace so it can't write during the restore
ansible all -i inventory.yml -m shell -a "systemctl stop uptrace" -l uptrace

# Recreate PostgreSQL from the backup
ansible all -i inventory.yml -m shell -a \
  "sudo -u postgres dropdb --if-exists uptrace && sudo -u postgres createdb -O uptrace uptrace && sudo -u postgres psql -v ON_ERROR_STOP=1 uptrace < /tmp/uptrace-pg-backup-YYYYMMDD.sql" \
  -l postgresql

# Reinstall the previous Uptrace version
ansible-playbook -i inventory.yml uptrace.yml
```

Restore ClickHouse using the same backup method you used before the upgrade. If you manage `clickhouse-backup` yourself:

```shell
ansible all -i inventory.yml -m shell -a \
  "clickhouse-backup restore uptrace-backup-YYYYMMDD" \
  -l clickhouse_server
```

## Alternative Deployment Methods

Ansible is one of several deployment options for Uptrace:

- [Docker](/get/hosted/docker) - Quick deployment for development and small-scale production
- [DEB/RPM packages](/get/hosted/install) - Traditional server deployments
- [Kubernetes](/get/hosted/k8s) - Container orchestration at scale

Choose the method that best fits your infrastructure and requirements.

## Next Steps

After successful deployment:

1. **Configure your applications** to send telemetry data to Uptrace
2. **Set up monitoring dashboards** for your services
3. **Create alerting rules** for critical metrics
4. **Implement log aggregation** for better observability
5. **Explore advanced features** like distributed tracing and custom metrics
