Deploying Uptrace with Ansible

This guide explains how to deploy Uptrace, an open-source APM and observability platform, on bare-metal servers using Ansible. Uptrace supports distributed tracing, metrics, and log management to help you monitor your applications.

What is Ansible?

Ansible is an open-source IT automation tool that enables configuration management, application deployment, infrastructure provisioning, and orchestration. It allows you to automate repetitive tasks, deploy software, and manage systems consistently and reliably across multiple servers.

Prerequisites

Before proceeding with this guide, ensure you have:

  1. Ansible installed on your control machine - Follow the official installation guide
  2. SSH access to your target servers with sudo privileges
  3. Minimum server requirements:
    • 2+ CPU cores
    • 4GB+ RAM
    • 20GB+ disk space
    • Ubuntu 24.04
  4. Network connectivity between servers for cluster configurations
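
The hardware minimums above can be checked on a target server before you run any playbook. The following is a minimal sketch, not part of the Uptrace repository: the function name check_minimums and the MIN_* variables are hypothetical, and comparing against available (rather than total) disk space is an assumption.

```shell
# Minimums from the prerequisites list above.
MIN_CORES=2
MIN_RAM_GB=4
MIN_DISK_GB=20

# check_minimums CORES RAM_GB DISK_GB
# Prints one line per unmet requirement and returns non-zero if any is unmet.
check_minimums() {
  cm_rc=0
  if [ "$1" -lt "$MIN_CORES" ]; then
    echo "need ${MIN_CORES}+ CPU cores, found $1"
    cm_rc=1
  fi
  if [ "$2" -lt "$MIN_RAM_GB" ]; then
    echo "need ${MIN_RAM_GB}GB+ RAM, found ${2}GB"
    cm_rc=1
  fi
  if [ "$3" -lt "$MIN_DISK_GB" ]; then
    echo "need ${MIN_DISK_GB}GB+ disk, found ${3}GB"
    cm_rc=1
  fi
  return "$cm_rc"
}

# Example on a Linux target. RAM is rounded to the nearest GB because a
# nominal 4GB machine reports slightly less; disk is the space available on /.
#   cores=$(nproc)
#   ram_gb=$(awk '/MemTotal/ {print int($2 / 1024 / 1024 + 0.5)}' /proc/meminfo)
#   disk_gb=$(df -Pk / | awk 'NR == 2 {print int($4 / 1024 / 1024)}')
#   check_minimums "$cores" "$ram_gb" "$disk_gb"
```

The function prints nothing when all three minimums are met, so it can gate a deployment script with a plain `check_minimums ... || exit 1`.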

Getting Started

Clone the Ansible Repository

Uptrace maintains a set of Ansible playbooks that deploy Uptrace together with its required dependencies: ClickHouse, PostgreSQL, and Redis.

shell
git clone https://github.com/uptrace/ansible.git
cd ansible

Understanding the Repository Structure

The repository contains several key components:

  • Playbooks: Main deployment scripts (.yml files in the root directory)
  • Roles: Reusable automation components for each service
  • Templates: Configuration file templates
  • Inventory: Server definitions and groupings
  • Group variables: Shared configuration settings

Inventory Configuration

Understanding Ansible Inventory

An Ansible inventory is a file that defines the hosts (servers or nodes) that Ansible will manage and execute tasks on. It organizes your infrastructure into logical groups and defines variables for each host or group.

Setting Up Your Inventory

  1. Copy the sample inventory file:
    shell
    cp inventory.sample.yml inventory.yml
    
  2. Edit the inventory file to include your server details.

The inventory groups hosts by the role they play. A minimal single-node setup looks like this:

yaml
uptrace:
  hosts:
    10.10.1.1: { primary: true }

postgresql:
  hosts:
    10.10.1.1:

clickhouse:
  children:
    clickhouse_server:
    clickhouse_keeper:

clickhouse_server:
  hosts:
    10.10.1.1: { ch_cluster: uptrace1, ch_shard: shard1, ch_replica: replica1 }

clickhouse_keeper:
  hosts:
    10.10.1.1: { keeper_id: 1 }

redis:
  children:
    redis_cache:

redis_cache:
  hosts:
    10.10.1.1: { redis_node_id: alpha, redis_maxmemory_mb: 128 }

You can colocate every group on a single host to start, then split services onto dedicated hosts as you scale. The kafka group is optional — see Step 5.

Configuration Management

Configuration File Locations

The configuration is distributed across several locations for modularity and maintainability:

  • group_vars/all.yml: Main configuration file containing database passwords and basic Uptrace settings
  • roles/uptrace/templates/config.yml: Advanced Uptrace configuration options
  • roles/clickhouse-server/templates/config.xml: ClickHouse-specific configuration
  • Role-specific templates: Each service role contains its own configuration templates

Setting Up Main Configuration

  1. Copy the sample configuration:
    shell
    cp group_vars/all.sample.yml group_vars/all.yml
    
  2. Update the configuration with your specific requirements:
    • Database passwords
    • Network settings
    • Security configurations
    • Resource limits

Configuration Security

  • Use strong, unique passwords for all services
  • Consider using Ansible Vault for sensitive data
  • Regularly rotate credentials
  • Implement proper firewall rules
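
One way to follow the first recommendation is to generate the database passwords instead of inventing them. This is a minimal sketch under stated assumptions: gen_password and print_db_passwords are hypothetical helper names, and only the two password variables named in this guide (ch_db_password, pg_db_password) are covered.

```shell
# gen_password: print 32 hex characters (128 bits) read from /dev/urandom.
gen_password() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# print_db_passwords: emit fresh values for the two password variables
# that this guide sets in group_vars/all.yml.
print_db_passwords() {
  echo "ch_db_password: $(gen_password)"
  echo "pg_db_password: $(gen_password)"
}
```

Copy the printed values into group_vars/all.yml. If you then encrypt the file with `ansible-vault encrypt group_vars/all.yml`, add `--ask-vault-pass` to the ansible-playbook commands in the steps that follow.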

Deployment Process

Step 1: Bootstrap Servers

Bootstrap your servers and install common software, including the OpenTelemetry Collector:

shell
ansible-playbook -i inventory.yml bootstrap.yml

What this does:

  • Updates system packages
  • Installs essential tools and dependencies
  • Configures basic security settings
  • Sets up OpenTelemetry Collector

Step 2: Deploy ClickHouse

ClickHouse stores all observability data including spans, logs, events, and metrics.

  1. Update ClickHouse variables in group_vars/all.yml:
    yaml
    ch_cluster: uptrace1
    ch_db_name: uptrace
    ch_db_user: uptrace
    ch_db_password: your_secure_password
    
  2. Deploy ClickHouse:
    shell
    ansible-playbook -i inventory.yml ch_server.yml
    

High Availability ClickHouse Configuration

For production, run multiple ClickHouse replicas coordinated by ClickHouse Keeper (the built-in ZooKeeper replacement). Replication needs three pieces: hosts in the clickhouse_keeper group, replicas defined in clickhouse_server, and the ch_replicated flag.

  1. Define the cluster topology in inventory.yml. Each ClickHouse host gets a ch_shard and ch_replica; the clickhouse_keeper group runs the coordination service (use 1 or 3 keeper nodes):
    yaml
    clickhouse_server:
      hosts:
        10.10.1.1: { ch_cluster: uptrace1, ch_shard: shard1, ch_replica: replica1 }
        10.10.1.2: { ch_cluster: uptrace1, ch_shard: shard1, ch_replica: replica2 }
        10.10.1.3: { ch_cluster: uptrace1, ch_shard: shard2, ch_replica: replica1 }
        10.10.1.4: { ch_cluster: uptrace1, ch_shard: shard2, ch_replica: replica2 }
    
    clickhouse_keeper:
      hosts:
        10.10.1.1: { keeper_id: 1 }
        10.10.1.2: { keeper_id: 2 }
        10.10.1.3: { keeper_id: 3 }
    
  2. Enable replicated (and, for multiple shards, distributed) tables in group_vars/all.yml:
    yaml
    ch_replicated: true # replicated tables for redundancy and failover
    ch_distributed: true # distributed tables across shards (Premium feature)
    
  3. Deploy Keeper, then the servers:
    shell
    ansible-playbook -i inventory.yml ch_keeper.yml
    ansible-playbook -i inventory.yml ch_server.yml
    

Important: When adding new replicas to an existing cluster that already holds data, the new replica must sync the existing tables. See adding and removing replicas in the Altinity knowledge base.
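
After deploying a replicated cluster, you can compare what ClickHouse sees with the topology in your inventory by querying the system.clusters table. The sketch below only builds the query; cluster_topology_sql is a hypothetical helper name, and how you authenticate clickhouse-client depends on your setup.

```shell
# cluster_topology_sql CLUSTER
# Prints a query listing every shard/replica ClickHouse knows for CLUSTER.
# Rejects names with characters outside [A-Za-z0-9_] to keep the SQL safe.
cluster_topology_sql() {
  case $1 in
    '' | *[!A-Za-z0-9_]*)
      echo "invalid cluster name: $1" >&2
      return 1
      ;;
  esac
  printf "SELECT shard_num, replica_num, host_name FROM system.clusters WHERE cluster = '%s' ORDER BY shard_num, replica_num\n" "$1"
}

# Example, run on any ClickHouse host:
#   clickhouse-client --query "$(cluster_topology_sql uptrace1)"
```

With the four-host inventory shown above, you would expect one row per entry in the clickhouse_server group; a missing row suggests the host was not picked up by the last ch_server.yml run.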

Step 3: Deploy PostgreSQL

PostgreSQL stores metadata such as users, projects, and monitor configurations.

  1. Update PostgreSQL variables in group_vars/all.yml:
    yaml
    pg_db_name: uptrace
    pg_db_user: uptrace
    pg_db_password: your_secure_password
    
  2. Deploy PostgreSQL:
    shell
    ansible-playbook -i inventory.yml pg.yml
    

High Availability PostgreSQL Configuration

Configure primary-standby replication for high availability:

yaml
postgresql:
  hosts:
    10.10.1.1: # Primary server
    10.10.1.2: # Standby server
      pg_primary_host: 10.10.1.1

Note: You must manually set up PostgreSQL replication between the standby and primary servers. This typically involves:

  1. Creating a base backup on the primary server
  2. Restoring the backup on the standby server
  3. Configuring streaming replication

Step 4: Deploy Redis

Redis provides in-memory caching for improved performance.

Deploy Redis:

shell
ansible-playbook -i inventory.yml redis.yml

High Availability Redis Configuration

Configure multiple Redis instances for redundancy:

yaml
redis_cache:
  hosts:
    10.10.1.1:
      redis_node_id: alpha
      redis_maxmemory_mb: 256
    10.10.1.2:
      redis_node_id: bravo
      redis_maxmemory_mb: 256
    10.10.1.3:
      redis_node_id: charlie
      redis_maxmemory_mb: 256

Note: Each Redis node operates independently, so no replication setup is required. Uptrace handles node selection automatically.

Step 5: Deploy Kafka (Optional)

By default Uptrace writes incoming telemetry to ClickHouse synchronously. For high-ingestion deployments you can place Apache Kafka between ingestion and storage: Uptrace publishes spans, logs, and metrics to Kafka, and a separate uptrace worker process consumes them and inserts the data into ClickHouse. The same brokers are also used by the ClickHouse Kafka engine.

Kafka solves two problems: it buffers data when ClickHouse is down or can't keep up (without it, data arriving during those windows is dropped), and it enables larger, more efficient batches before writing to ClickHouse. It also lets ingestion and storage scale and fail independently. In return it adds operational complexity, so reach for it only after exhausting the other scaling options. The Kafka worker requires the Enterprise tier — without the licensed Kafka feature it refuses to start. See scaling Uptrace for the full configuration reference and tuning options.

Kafka is entirely optional and is enabled simply by adding a kafka group to your inventory. When the group is present, the Uptrace role automatically:

  • renders a top-level kafka section and a ch_cluster.kafka section in config.yml from the group's hosts, and
  • installs and starts an additional uptrace-worker systemd service that drains the Kafka queues.

If you later remove the kafka group and re-run uptrace.yml, the worker is stopped and removed and Uptrace reverts to writing directly to ClickHouse.

  1. Add a Kafka broker to your inventory:
    yaml
    kafka:
      hosts:
        10.10.1.1: { kafka_node_id: 1 }
    

    Brokers listen on port 9092. Each broker advertises its inventory hostname, so make sure your Uptrace and ClickHouse hosts can reach the Kafka host at that address.
  2. Keep the KRaft cluster id constant in group_vars/all.yml (already present in the sample). It is used to format Kafka storage and must not change between runs:
    yaml
    # Generate your own with: /opt/kafka/bin/kafka-storage.sh random-uuid
    kafka_cluster_id: MkU3OEVBNTcwNTJENDM2Qk
    
  3. Deploy Kafka:
    shell
    ansible-playbook -i inventory.yml kafka.yml
    
  4. Re-deploy Uptrace so it picks up the Kafka configuration and starts the worker (covered in the next step):
    shell
    ansible-playbook -i inventory.yml uptrace.yml
    

Verification

shell
# Check the worker service status
sudo systemctl status uptrace-worker

# View worker logs (should show it consuming Kafka topics)
sudo journalctl -u uptrace-worker -f

Step 6: Deploy Uptrace

With all dependencies in place, deploy the main Uptrace application:

shell
ansible-playbook -i inventory.yml uptrace.yml
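
The playbook runs from Steps 1 to 4 and Step 6 can also be chained in one small wrapper that stops at the first failure. This is a sketch, not a script shipped with the repository: run_playbooks, PLAYBOOKS, and ANSIBLE_PLAYBOOK are hypothetical names, and the default list covers only the non-optional playbooks.

```shell
# Command used to run playbooks. Override for a dry run: ANSIBLE_PLAYBOOK=echo
ANSIBLE_PLAYBOOK=${ANSIBLE_PLAYBOOK:-ansible-playbook}

# Playbooks in the order this guide runs them. If you use them, add
# ch_keeper.yml before ch_server.yml and kafka.yml before uptrace.yml.
PLAYBOOKS="bootstrap.yml ch_server.yml pg.yml redis.yml uptrace.yml"

# run_playbooks INVENTORY [PLAYBOOK...]
# Runs the given playbooks (default: $PLAYBOOKS) and stops at the first failure.
run_playbooks() {
  rp_inventory=$1
  shift
  # Intentional word splitting: PLAYBOOKS is a space-separated list.
  [ "$#" -gt 0 ] || set -- $PLAYBOOKS
  for rp_playbook in "$@"; do
    echo "==> $rp_playbook"
    $ANSIBLE_PLAYBOOK -i "$rp_inventory" "$rp_playbook" || return 1
  done
}

# Example:
#   run_playbooks inventory.yml
```

Setting ANSIBLE_PLAYBOOK=echo prints the commands instead of running them, which is a cheap way to review the order before touching any server.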

Verification

Check that Uptrace is running correctly:

shell
# View Uptrace logs
sudo journalctl -u uptrace -f

# Check service status
sudo systemctl status uptrace

# If Kafka is enabled, also check the worker that drains the queues
sudo systemctl status uptrace-worker

# Test connectivity
curl -f http://localhost:80/api/v1/health

First Login

Open Uptrace at the site_url you configured in group_vars/all.yml (the sample defaults to http://uptrace.local). On first startup the playbook seeds an admin account from the seed_data block in group_vars/all.yml — the sample ships with admin@uptrace.local / admin.

Change these seed credentials before exposing Uptrace. Edit the seed_data users, tokens, and project tokens in group_vars/all.yml and re-run uptrace.yml.

SSL/TLS Configuration with Let's Encrypt

Prerequisites for Let's Encrypt

Let's Encrypt uses an HTTP challenge to verify that you control the domain. Ensure:

  1. Domain resolution: Your domain must be publicly resolvable:
    shell
    nslookup your-domain.com
    
  2. Port accessibility: Ports 80 and 443 must be accessible from the internet

Configure Let's Encrypt

Update your group_vars/all.yml file:

yaml
# Domain for SSL certificate
site_url: https://your-domain.com/

# Enable Let's Encrypt
certmagic_enabled: true

# Use staging environment for testing (disable for production)
certmagic_staging_ca: true

# Listen on HTTPS port
listen_http_addr: ':443'

Testing SSL Configuration

  1. Test with staging environment first (set certmagic_staging_ca: true)
  2. Verify certificate issuance in logs
  3. Switch to production (set certmagic_staging_ca: false)
  4. Redeploy to get production certificate
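
To confirm which certificate is being served after each redeploy, you can inspect its issuer with openssl. This is a sketch under an assumption: it relies on Let's Encrypt staging certificates carrying "(STAGING)" in the issuer name, and cert_issuer and is_staging_issuer are hypothetical helper names.

```shell
# cert_issuer HOST
# Prints the issuer line of the certificate served on HOST:443.
cert_issuer() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -issuer
}

# is_staging_issuer ISSUER
# Succeeds if the issuer string looks like a Let's Encrypt staging CA.
# Assumption: staging issuers contain the literal text "(STAGING)".
is_staging_issuer() {
  case $1 in
    *"(STAGING)"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   issuer=$(cert_issuer your-domain.com)
#   if is_staging_issuer "$issuer"; then echo "still on staging: $issuer"; fi
```

Run it once after step 2 and again after step 4; the second run should no longer report a staging issuer.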

Optional Components

Mailhog for Email Testing

Deploy Mailhog to capture and view emails sent by Uptrace during development:

shell
ansible-playbook -i inventory.yml mailhog.yml

Access the Mailhog web interface at http://localhost:8025.

Production Note: Disable Mailhog in production environments and configure proper SMTP settings.

Scaling and Maintenance

Horizontal Scaling

Scaling Uptrace Application

Add additional Uptrace instances for load distribution:

yaml
uptrace:
  hosts:
    10.10.1.1:
    10.10.1.2:
    10.10.1.3:

Redeploy after updating inventory:

shell
ansible-playbook -i inventory.yml uptrace.yml

Scaling ClickHouse

Scale ClickHouse by adding shards and replicas to the clickhouse_server group. Replicas add redundancy within a shard; shards spread data horizontally across hosts. This is the same topology described in High Availability ClickHouse Configuration — remember to also run a clickhouse_keeper group and set ch_replicated/ch_distributed, then redeploy with ch_keeper.yml and ch_server.yml.

Troubleshooting

Common Issues and Solutions

Connection Issues

Problem: Ansible cannot connect to hosts

Solution:

shell
# Test connectivity
ansible all -i inventory.yml -m ping

# Check SSH access
ssh user@host-ip

# Verify inventory syntax
ansible-inventory -i inventory.yml --list

Service Failures

Problem: Services fail to start

Solution:

shell
# Check service status
ansible all -i inventory.yml -m shell -a "systemctl status uptrace --no-pager" -l uptrace

# View logs
ansible all -i inventory.yml -m shell -a "journalctl -u uptrace -n 50"

# Check disk space
ansible all -i inventory.yml -m shell -a "df -h"

Performance Issues

Problem: Slow query performance

Solutions:

  • Increase ClickHouse memory allocation
  • Add more ClickHouse replicas
  • Optimize query patterns
  • Review resource utilization

Diagnostic Commands

shell
# Check all services status
ansible all -i inventory.yml -m shell -a "systemctl status uptrace clickhouse-server postgresql redis"

# Monitor resource usage
ansible all -i inventory.yml -m shell -a "top -n 1 -b"

# List listening TCP ports to confirm each service is bound
ansible all -i inventory.yml -m shell -a "ss -tlnp"

# Verify configuration files
ansible all -i inventory.yml -m shell -a "nginx -t" # if using nginx

Security Considerations

Network Security

  • Configure firewalls to allow only necessary ports
  • Use VPN or private networks for inter-service communication
  • Implement proper network segmentation

Access Control

  • Use SSH key authentication instead of passwords
  • Implement role-based access control
  • Regularly audit user access

Data Protection

  • Encrypt sensitive data at rest
  • Use TLS for all network communications
  • Implement proper backup encryption

Best Practices

Infrastructure Management

  1. Use version control for all configuration files
  2. Test changes in a staging environment first
  3. Implement monitoring for all services
  4. Document custom configurations and procedures
  5. Plan for disaster recovery scenarios

Configuration Management

  1. Use Ansible Vault for sensitive data
  2. Implement configuration validation before deployment
  3. Maintain environment-specific configurations
  4. Use meaningful variable names and comments

Operational Excellence

  1. Automate routine tasks with additional playbooks
  2. Implement health checks and monitoring
  3. Create runbooks for common scenarios
  4. Establish maintenance windows for updates

Upgrading Uptrace

Only upgrades to the next minor version are tested and supported, for example, upgrading from 1.1 to 1.2. Skipping minor versions (e.g., 1.1 to 1.3) is not supported — upgrade one minor version at a time.
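
The one-minor-version-at-a-time rule can be checked mechanically before you change uptrace_version. The sketch below is illustrative only: upgrade_check is a hypothetical helper, it ignores patch numbers, and treating a major-version change as unsupported is an assumption, since the rule above only speaks about minor versions.

```shell
# upgrade_check FROM TO
# Expects MAJOR.MINOR or MAJOR.MINOR.PATCH, with an optional leading "v".
# Prints "supported" when TO is the minor release right after FROM,
# "same-minor" when only the patch level could differ, else "unsupported".
upgrade_check() {
  uc_from=${1#v}
  uc_to=${2#v}
  uc_from_major=${uc_from%%.*}
  uc_to_major=${uc_to%%.*}
  uc_from_rest=${uc_from#*.}
  uc_to_rest=${uc_to#*.}
  uc_from_minor=${uc_from_rest%%.*}
  uc_to_minor=${uc_to_rest%%.*}
  if [ "$uc_from_major" -ne "$uc_to_major" ]; then
    echo unsupported
  elif [ "$uc_to_minor" -eq "$uc_from_minor" ]; then
    echo same-minor
  elif [ "$uc_to_minor" -eq $((uc_from_minor + 1)) ]; then
    echo supported
  else
    echo unsupported
  fi
}

# Example:
#   upgrade_check 1.1 1.3   # skips 1.2, so upgrade to 1.2 first
```

If the result is "unsupported" because you are several minor versions behind, upgrade through each intermediate release in turn, verifying each one before moving on.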

Check Current Version

Before upgrading, verify the installed version:

shell
ansible all -i inventory.yml -m shell -a "uptrace version" -l uptrace

Check the latest available version on GitHub Releases.

Back Up Databases

Always create backups of both PostgreSQL and ClickHouse before upgrading.

PostgreSQL:

shell
ansible all -i inventory.yml -m shell -a \
  "sudo -u postgres pg_dump uptrace > /tmp/uptrace-pg-backup-\$(date +%Y%m%d).sql" \
  -l postgresql
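
Before relying on the dump, it is worth checking on the PostgreSQL host that the file is non-empty and ends with the marker pg_dump writes at the end of a plain-format dump. This is a minimal sketch; dump_looks_complete is a hypothetical helper name, and it is a sanity check, not a substitute for a test restore.

```shell
# dump_looks_complete FILE
# Succeeds if FILE is non-empty and its last lines contain the
# "PostgreSQL database dump complete" comment that pg_dump writes
# at the end of a plain-format (SQL) dump.
dump_looks_complete() {
  [ -s "$1" ] && tail -n 10 "$1" | grep -q 'PostgreSQL database dump complete'
}

# Example, using the file name pattern from the command above:
#   dump_looks_complete "/tmp/uptrace-pg-backup-$(date +%Y%m%d).sql" || echo "backup incomplete"
```

An empty or truncated file usually means pg_dump failed partway, for example because of a permissions problem or a full disk.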

ClickHouse:

The Ansible playbooks install ClickHouse server and client packages, but they don't install clickhouse-backup. Use the ClickHouse backup mechanism you operate in production, such as storage snapshots or a separately installed backup tool.

If you manage clickhouse-backup yourself, create a backup before running the upgrade:

shell
ansible all -i inventory.yml -m shell -a \
  "clickhouse-backup create uptrace-backup-\$(date +%Y%m%d)" \
  -l clickhouse_server

Run the Upgrade

Pull the latest Ansible playbooks and re-run the Uptrace playbook:

shell
git pull
ansible-playbook -i inventory.yml uptrace.yml

The playbook automatically installs the new version, validates the config, runs database migrations, and restarts Uptrace.

To install a specific version instead of the latest, set uptrace_version in group_vars/all.yml. Choose an existing release from GitHub Releases and omit the leading v:

yaml
uptrace_version: '<release-version>'

Verify the Upgrade

After the playbook completes, confirm the upgrade was successful:

shell
# Check the new version
ansible all -i inventory.yml -m shell -a "uptrace version" -l uptrace

# Verify the service is running
ansible all -i inventory.yml -m shell -a "systemctl status uptrace" -l uptrace

# Check logs for migration errors
ansible all -i inventory.yml -m shell -a "journalctl -u uptrace --since '5 minutes ago' --no-pager" -l uptrace

# Test the health endpoint
ansible all -i inventory.yml -m shell -a "curl -sf http://localhost:80/api/v1/health" -l uptrace

Rolling Back

If the upgrade fails before database migrations run, set uptrace_version in group_vars/all.yml to the release you were running before and re-run the playbook:

shell
ansible-playbook -i inventory.yml uptrace.yml

If database migrations have already run, stop Uptrace, restore PostgreSQL into a clean database, and then re-run uptrace.yml with uptrace_version set to the previous release:

shell
# Stop Uptrace so it can't write during the restore
ansible all -i inventory.yml -m shell -a "sudo systemctl stop uptrace" -l uptrace

# Recreate PostgreSQL from the backup
ansible all -i inventory.yml -m shell -a \
  "sudo -u postgres dropdb --if-exists uptrace && sudo -u postgres createdb -O uptrace uptrace && sudo -u postgres psql -v ON_ERROR_STOP=1 uptrace < /tmp/uptrace-pg-backup-YYYYMMDD.sql" \
  -l postgresql

# Reinstall the previous Uptrace version
ansible-playbook -i inventory.yml uptrace.yml

Restore ClickHouse using the same backup method you used before the upgrade. If you manage clickhouse-backup yourself:

shell
ansible all -i inventory.yml -m shell -a \
  "clickhouse-backup restore uptrace-backup-YYYYMMDD" \
  -l clickhouse_server

Alternative Deployment Methods

Ansible is one of several deployment options for Uptrace:

  • Docker - Quick deployment for development and small-scale production
  • DEB/RPM packages - Traditional server deployments
  • Kubernetes - Container orchestration at scale

Choose the method that best fits your infrastructure and requirements.

Next Steps

After successful deployment:

  1. Configure your applications to send telemetry data to Uptrace
  2. Set up monitoring dashboards for your services
  3. Create alerting rules for critical metrics
  4. Implement log aggregation for better observability
  5. Explore advanced features like distributed tracing and custom metrics