Deploying Uptrace with Ansible

This guide provides comprehensive instructions for deploying Uptrace, an open-source APM and observability platform, on bare metal servers using Ansible automation. Uptrace supports distributed tracing, metrics, and log management to help you monitor your applications effectively.

What is Ansible?

Ansible is an open-source IT automation tool that enables configuration management, application deployment, infrastructure provisioning, and orchestration. It allows you to automate repetitive tasks, deploy software, and manage systems consistently and reliably across multiple servers.

Prerequisites

Before proceeding with this guide, ensure you have:

  1. Ansible installed on your control machine - Follow the official installation guide
  2. SSH access to your target servers with sudo privileges
  3. Minimum server requirements:
    • 2+ CPU cores
    • 4GB+ RAM
    • 20GB+ disk space
    • Ubuntu 24.04
  4. Network connectivity between servers for cluster configurations
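
Before continuing, you can quickly confirm that Ansible is available on your control machine:

shell
ansible --version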

Getting Started

Clone the Ansible Repository

Uptrace maintains a comprehensive set of Ansible playbooks to deploy Uptrace with all required dependencies, including ClickHouse, PostgreSQL, and Redis:

shell
git clone https://github.com/uptrace/ansible.git
cd ansible

Understanding the Repository Structure

The repository contains several key components:

  • Playbooks: Main deployment scripts (.yml files in the root directory)
  • Roles: Reusable automation components for each service
  • Templates: Configuration file templates
  • Inventory: Server definitions and groupings
  • Group variables: Shared configuration settings

Inventory Configuration

Understanding Ansible Inventory

An Ansible inventory is a file that defines the hosts (servers or nodes) that Ansible will manage and execute tasks on. It organizes your infrastructure into logical groups and defines variables for each host or group.

Setting Up Your Inventory

  1. Copy the sample inventory file:
    shell
    cp inventory.sample.yml inventory.yml
    
  2. Edit the inventory file to include your server details.
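
As a rough sketch, a single-server inventory might look like this (the group names follow the examples used later in this guide; treat inventory.sample.yml as the authoritative layout):

yaml
clickhouse_server:
  hosts:
    10.10.1.1:

postgresql:
  hosts:
    10.10.1.1:

redis_cache:
  hosts:
    10.10.1.1:

uptrace:
  hosts:
    10.10.1.1: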

Inventory Best Practices

  • Use descriptive hostnames or IP addresses
  • Group related services together
  • Consider network topology when assigning hosts
  • Plan for high availability from the start

Configuration Management

Configuration File Locations

The configuration is distributed across several locations for modularity and maintainability:

  • group_vars/all.yml: Main configuration file containing database passwords and basic Uptrace settings
  • roles/uptrace/templates/config.yml: Advanced Uptrace configuration options
  • roles/clickhouse-server/templates/config.xml: ClickHouse-specific configuration
  • Role-specific templates: Each service role contains its own configuration templates

Setting Up Main Configuration

  1. Copy the sample configuration:
    shell
    cp group_vars/all.sample.yml group_vars/all.yml
    
  2. Update the configuration with your specific requirements:
    • Database passwords
    • Network settings
    • Security configurations
    • Resource limits

Configuration Security

  • Use strong, unique passwords for all services
  • Consider using Ansible Vault for sensitive data
  • Regularly rotate credentials
  • Implement proper firewall rules
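
For example, Ansible Vault can encrypt the variables file that holds your passwords; playbooks then prompt for the vault password at run time:

shell
# Encrypt the file containing sensitive variables
ansible-vault encrypt group_vars/all.yml

# Edit it later without leaving a decrypted copy on disk
ansible-vault edit group_vars/all.yml

# Run playbooks with a vault password prompt
ansible-playbook -i inventory.yml uptrace.yml --ask-vault-pass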

Deployment Process

Step 1: Bootstrap Servers

Bootstrap your servers and install common software, including the OpenTelemetry Collector:

shell
ansible-playbook -i inventory.yml bootstrap.yml

What this does:

  • Updates system packages
  • Installs essential tools and dependencies
  • Configures basic security settings
  • Sets up OpenTelemetry Collector
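
After bootstrapping, you can spot-check a server. The collector's systemd unit name depends on how the role packages it; otelcol-contrib (the upstream contrib distribution's name) is an assumption here:

shell
# Unit name is an assumption; adjust to match the installed package
sudo systemctl status otelcol-contrib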

Step 2: Deploy ClickHouse

ClickHouse stores all observability data including spans, logs, events, and metrics.

  1. Update ClickHouse variables in group_vars/all.yml:
    yaml
    ch_cluster: uptrace1
    ch_password: your_secure_password
    ch_database: uptrace
    
  2. Deploy ClickHouse:
    shell
    ansible-playbook -i inventory.yml ch_server.yml
    

High Availability ClickHouse Configuration

For production environments, configure multiple ClickHouse instances:

yaml
clickhouse_server:
  hosts:
    10.10.1.1:
      ch_cluster: uptrace1
      ch_shard: shard1
      ch_replica: replica1
    10.10.1.2:
      ch_cluster: uptrace1
      ch_shard: shard1
      ch_replica: replica2
    10.10.1.3:
      ch_cluster: uptrace1
      ch_shard: shard2
      ch_replica: replica1
    10.10.1.4:
      ch_cluster: uptrace1
      ch_shard: shard2
      ch_replica: replica2

Important: When adding new replicas to an existing ClickHouse cluster, you must manually configure ClickHouse replication on the new replica.
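
One way to verify the state of a new replica is to query ClickHouse's system.replicas table (assuming clickhouse-client access with the ch_password from group_vars/all.yml):

shell
clickhouse-client --password "your_secure_password" \
  --query "SELECT database, table, is_readonly, absolute_delay FROM system.replicas"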

Step 3: Deploy PostgreSQL

PostgreSQL stores metadata such as users, projects, and monitor configurations.

  1. Update PostgreSQL variables in group_vars/all.yml:
    yaml
    pg_database: uptrace
    pg_user: uptrace
    pg_password: your_secure_password
    
  2. Deploy PostgreSQL:
    shell
    ansible-playbook -i inventory.yml pg.yml
    

High Availability PostgreSQL Configuration

Configure primary-standby replication for high availability:

yaml
postgresql:
  hosts:
    10.10.1.1: # Primary server
    10.10.1.2: # Standby server
      pg_primary_host: 10.10.1.1

Note: You must manually set up PostgreSQL replication from the primary to the standby server. This typically involves:

  1. Creating a base backup on the primary server
  2. Restoring the backup on the standby server
  3. Configuring streaming replication
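
A minimal sketch of these steps, assuming PostgreSQL 16 on Ubuntu and a replication role already created on the primary (the role name and data directory are assumptions; adjust them to your installation):

shell
# On the standby: stop PostgreSQL and clear its data directory
sudo systemctl stop postgresql
sudo rm -rf /var/lib/postgresql/16/main

# Take a base backup from the primary; -R writes standby.signal and the
# primary connection settings so streaming replication starts automatically
sudo -u postgres pg_basebackup -h 10.10.1.1 -U replicator \
  -D /var/lib/postgresql/16/main -R -X stream -P

sudo systemctl start postgresql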

Step 4: Deploy Redis

Redis provides in-memory caching for improved performance.

Deploy Redis:

shell
ansible-playbook -i inventory.yml redis.yml

High Availability Redis Configuration

Configure multiple Redis instances for redundancy:

yaml
redis_cache:
  hosts:
    10.10.1.1:
      redis_node_id: alpha
      redis_maxmemory_mb: 256
    10.10.1.2:
      redis_node_id: bravo
      redis_maxmemory_mb: 256
    10.10.1.3:
      redis_node_id: charlie
      redis_maxmemory_mb: 256

Note: Each Redis node operates independently, so no replication setup is required. Uptrace handles node selection automatically.

Step 5: Deploy Uptrace

With all dependencies in place, deploy the main Uptrace application:

shell
ansible-playbook -i inventory.yml uptrace.yml

Verification

Check that Uptrace is running correctly:

shell
# View Uptrace logs
sudo journalctl -u uptrace -f

# Check service status
sudo systemctl status uptrace

# Test connectivity
curl -f http://localhost:80/api/v1/health

SSL/TLS Configuration with Let's Encrypt

Prerequisites for Let's Encrypt

Let's Encrypt uses an HTTP-01 challenge for domain verification. Ensure:

  1. Domain resolution: Your domain must be publicly resolvable:
    shell
    nslookup your-domain.com
    
  2. Port accessibility: Ports 80 and 443 must be accessible from the internet
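
From a machine outside your network, you can confirm that both ports accept connections:

shell
nc -zv your-domain.com 80
nc -zv your-domain.com 443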

Configure Let's Encrypt

Update your group_vars/all.yml file:

yaml
# Domain for SSL certificate
site_url: https://your-domain.com/

# Enable Let's Encrypt
certmagic_enabled: true

# Use staging environment for testing (disable for production)
certmagic_staging_ca: true

# Listen on HTTPS port
listen_http_addr: ':443'

Testing SSL Configuration

  1. Test with staging environment first (set certmagic_staging_ca: true)
  2. Verify certificate issuance in logs
  3. Switch to production (set certmagic_staging_ca: false)
  4. Redeploy to get production certificate
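
Once the production certificate is issued, you can inspect it directly:

shell
# The issuer should be Let's Encrypt, not the staging CA
openssl s_client -connect your-domain.com:443 -servername your-domain.com </dev/null \
  | openssl x509 -noout -issuer -dates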

Optional Components

Mailhog for Email Testing

Deploy Mailhog to capture and view emails sent by Uptrace during development:

shell
ansible-playbook -i inventory.yml mailhog.yml

Access the Mailhog web interface at http://localhost:8025.
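
To confirm Mailhog is capturing messages, you can also query its HTTP API (assuming the default API port):

shell
curl -s http://localhost:8025/api/v2/messages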

Production Note: Disable Mailhog in production environments and configure proper SMTP settings.

Scaling and Maintenance

Horizontal Scaling

Scaling Uptrace Application

Add additional Uptrace instances for load distribution:

yaml
uptrace:
  hosts:
    10.10.1.1:
    10.10.1.2:
    10.10.1.3:

Redeploy after updating inventory:

shell
ansible-playbook -i inventory.yml uptrace.yml

Scaling ClickHouse

Scale ClickHouse by adding shards and replicas:

yaml
clickhouse_server:
  hosts:
    # Shard 1
    10.10.1.1: { ch_cluster: uptrace1, ch_shard: shard1, ch_replica: replica1 }
    10.10.1.2: { ch_cluster: uptrace1, ch_shard: shard1, ch_replica: replica2 }
    # Shard 2
    10.10.1.3: { ch_cluster: uptrace1, ch_shard: shard2, ch_replica: replica1 }
    10.10.1.4: { ch_cluster: uptrace1, ch_shard: shard2, ch_replica: replica2 }

Troubleshooting

Common Issues and Solutions

Connection Issues

Problem: Ansible cannot connect to hosts

Solution:

shell
# Test connectivity
ansible all -i inventory.yml -m ping

# Check SSH access
ssh user@host-ip

# Verify inventory syntax
ansible-inventory -i inventory.yml --list

Service Failures

Problem: Services fail to start

Solution:

shell
# Check service status (use the shell module; Ansible's systemd module
# has no "status" state)
ansible all -i inventory.yml -m shell -a "systemctl status uptrace"

# View logs
ansible all -i inventory.yml -m shell -a "journalctl -u uptrace -n 50"

# Check disk space
ansible all -i inventory.yml -m shell -a "df -h"

Performance Issues

Problem: Slow query performance

Solutions:

  • Increase ClickHouse memory allocation
  • Add more ClickHouse replicas
  • Optimize query patterns
  • Review resource utilization
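
To identify the slowest recent queries, you can inspect ClickHouse's query log (written by default), as in this sketch run on a ClickHouse host:

shell
clickhouse-client --password "your_secure_password" --query "
  SELECT query_duration_ms, substring(query, 1, 80) AS query_start
  FROM system.query_log
  WHERE type = 'QueryFinish'
  ORDER BY query_duration_ms DESC
  LIMIT 10"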

Diagnostic Commands

shell
# Check all services status
ansible all -i inventory.yml -m shell -a "systemctl status uptrace clickhouse-server postgresql redis"

# Monitor resource usage
ansible all -i inventory.yml -m shell -a "top -n 1 -b"

# List listening ports to verify each service is bound where expected
ansible all -i inventory.yml -m shell -a "netstat -tlnp"

# Verify configuration files
ansible all -i inventory.yml -m shell -a "nginx -t" # if using nginx

Security Considerations

Network Security

  • Configure firewalls to allow only necessary ports
  • Use VPN or private networks for inter-service communication
  • Implement proper network segmentation

Access Control

  • Use SSH key authentication instead of passwords
  • Implement role-based access control
  • Regularly audit user access

Data Protection

  • Encrypt sensitive data at rest
  • Use TLS for all network communications
  • Implement proper backup encryption

Best Practices

Infrastructure Management

  1. Use version control for all configuration files
  2. Test changes in a staging environment first
  3. Implement monitoring for all services
  4. Document custom configurations and procedures
  5. Plan for disaster recovery scenarios

Configuration Management

  1. Use Ansible Vault for sensitive data
  2. Implement configuration validation before deployment
  3. Maintain environment-specific configurations
  4. Use meaningful variable names and comments
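
For item 2, Ansible's built-in checks can validate a playbook before it touches any server:

shell
# Syntax-check without connecting to hosts
ansible-playbook -i inventory.yml uptrace.yml --syntax-check

# Dry run: report what would change without applying anything
ansible-playbook -i inventory.yml uptrace.yml --check --diff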

Operational Excellence

  1. Automate routine tasks with additional playbooks
  2. Implement health checks and monitoring
  3. Create runbooks for common scenarios
  4. Establish maintenance windows for updates

Next Steps

After successful deployment:

  1. Configure your applications to send telemetry data to Uptrace
  2. Set up monitoring dashboards for your services
  3. Create alerting rules for critical metrics
  4. Implement log aggregation for better observability
  5. Explore advanced features like distributed tracing and custom metrics