Deploying Uptrace with Ansible
This guide explains how to deploy Uptrace, an open-source APM and observability platform, on bare metal servers using Ansible. Uptrace supports distributed tracing, metrics, and log management to help you monitor your applications.
What is Ansible?
Ansible is an open-source IT automation tool that enables configuration management, application deployment, infrastructure provisioning, and orchestration. It allows you to automate repetitive tasks, deploy software, and manage systems consistently and reliably across multiple servers.
Prerequisites
Before proceeding with this guide, ensure you have:
- Ansible installed on your control machine - Follow the official installation guide
- SSH access to your target servers with sudo privileges
- Minimum server requirements:
  - 2+ CPU cores
  - 4GB+ RAM
  - 20GB+ disk space
  - Ubuntu 24.04
- Network connectivity between servers for cluster configurations
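The minimums above can be checked on a target server before any playbook runs. The following is a minimal sketch, not part of the Uptrace repository: the function names are hypothetical, the RAM threshold of 3800 MB is an assumed allowance for the memory a nominal 4 GB machine actually reports, and the disk check looks at free space on the root filesystem only.

```shell
#!/usr/bin/env bash
# Preflight sketch: compare a host's resources with the minimums listed above.
# Function names and the RAM slack are assumptions, not part of the playbooks.

# check_min LABEL ACTUAL REQUIRED: print a verdict, return non-zero when ACTUAL < REQUIRED.
check_min() {
  local label=$1 actual=$2 required=$3
  if [ "$actual" -ge "$required" ]; then
    echo "OK   $label: $actual (need >= $required)"
  else
    echo "FAIL $label: $actual (need >= $required)"
    return 1
  fi
}

# preflight: gather values with standard Linux tools and check each one.
preflight() {
  local cores mem_mb disk_gb status=0
  cores=$(nproc)
  # MemTotal is reported in kB; convert to MB.
  mem_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
  # Free space on the root filesystem, in whole GB (GNU df).
  disk_gb=$(df -BG --output=avail / | tail -n 1 | tr -dc '0-9')
  check_min "CPU cores" "$cores" 2 || status=1
  check_min "RAM (MB)" "$mem_mb" 3800 || status=1
  check_min "Free disk (GB)" "$disk_gb" 20 || status=1
  return "$status"
}
```

Calling `preflight` on a server prints one line per resource and returns non-zero if any check fails, so it can gate a deployment script.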
Getting Started
Clone the Ansible Repository
Uptrace maintains a comprehensive set of Ansible playbooks to deploy Uptrace with all required dependencies, including ClickHouse, PostgreSQL, and Redis:
git clone https://github.com/uptrace/ansible.git
cd ansible
Understanding the Repository Structure
The repository contains several key components:
- Playbooks: Main deployment scripts (.yml files in the root directory)
- Roles: Reusable automation components for each service
- Templates: Configuration file templates
- Inventory: Server definitions and groupings
- Group variables: Shared configuration settings
Inventory Configuration
Understanding Ansible Inventory
An Ansible inventory is a file that defines the hosts (servers or nodes) that Ansible will manage and execute tasks on. It organizes your infrastructure into logical groups and defines variables for each host or group.
Setting Up Your Inventory
- Copy the sample inventory file:
cp inventory.sample.yml inventory.yml
- Edit the inventory file to include your server details.
The inventory groups hosts by the role they play. A minimal single-node setup looks like this:
uptrace:
  hosts:
    10.10.1.1: { primary: true }
postgresql:
  hosts:
    10.10.1.1:
clickhouse:
  children:
    clickhouse_server:
    clickhouse_keeper:
clickhouse_server:
  hosts:
    10.10.1.1: { ch_cluster: uptrace1, ch_shard: shard1, ch_replica: replica1 }
clickhouse_keeper:
  hosts:
    10.10.1.1: { keeper_id: 1 }
redis:
  children:
    redis_cache:
redis_cache:
  hosts:
    10.10.1.1: { redis_node_id: alpha, redis_maxmemory_mb: 128 }
You can colocate every group on a single host to start, then split services onto dedicated hosts as you scale. The kafka group is optional — see Step 5.
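As an illustration of splitting services, the sketch below writes a two-host variant of the single-node inventory in which ClickHouse and its Keeper move to a second server. The address 10.10.1.2 is a hypothetical host, and the example assumes the group names shown above are all the playbooks need; it writes to a temporary file so an existing inventory.yml is not overwritten.

```shell
#!/usr/bin/env bash
# Sketch: write a two-host variant of the single-node inventory.
# 10.10.1.2 is a hypothetical second server that takes over ClickHouse.
out=$(mktemp)

cat > "$out" <<'EOF'
uptrace:
  hosts:
    10.10.1.1: { primary: true }
postgresql:
  hosts:
    10.10.1.1:
clickhouse:
  children:
    clickhouse_server:
    clickhouse_keeper:
clickhouse_server:
  hosts:
    10.10.1.2: { ch_cluster: uptrace1, ch_shard: shard1, ch_replica: replica1 }
clickhouse_keeper:
  hosts:
    10.10.1.2: { keeper_id: 1 }
redis:
  children:
    redis_cache:
redis_cache:
  hosts:
    10.10.1.1: { redis_node_id: alpha, redis_maxmemory_mb: 128 }
EOF

# List the distinct host addresses that appear in the generated file.
hosts=$(grep -oE '[0-9]+(\.[0-9]+){3}' "$out" | sort -u)
echo "$hosts"
```

Once the file looks right, copy it to inventory.yml and inspect the resulting groups with `ansible-inventory -i inventory.yml --graph`.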
Configuration Management
Configuration File Locations
The configuration is distributed across several locations for modularity and maintainability:
- group_vars/all.yml: Main configuration file containing database passwords and basic Uptrace settings
- roles/uptrace/templates/config.yml: Advanced Uptrace configuration options
- roles/clickhouse-server/templates/config.xml: ClickHouse-specific configuration
- Role-specific templates: Each service role contains its own configuration templates
Setting Up Main Configuration
- Copy the sample configuration:
cp group_vars/all.sample.yml group_vars/all.yml
- Update the configuration with your specific requirements:
  - Database passwords
  - Network settings
  - Security configurations
  - Resource limits
Configuration Security
- Use strong, unique passwords for all services
- Consider using Ansible Vault for sensitive data
- Regularly rotate credentials
- Implement proper firewall rules
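One way to combine the first two recommendations is to generate random passwords and store them as vault-encrypted values. This is a sketch under assumptions: the helper names are hypothetical, and it presumes the playbooks read pg_db_password and ch_db_password from group_vars/all.yml like any other variable. Note that passing the secret as an argument exposes it briefly in the local process list.

```shell
#!/usr/bin/env bash
# Sketch: generate random passwords and keep them encrypted with Ansible Vault.
# Helper names are hypothetical; ansible-vault itself ships with Ansible.

# gen_password [BYTES]: print BYTES random bytes (default 24) encoded as base64.
gen_password() {
  head -c "${1:-24}" /dev/urandom | base64 | tr -d '\n'
}

# vault_encrypt_var NAME VALUE: print an encrypted YAML entry to paste into
# group_vars/all.yml. ansible-vault prompts for the vault password.
vault_encrypt_var() {
  ansible-vault encrypt_string --ask-vault-pass --name "$1" "$2"
}

# Example (not run here):
#   vault_encrypt_var pg_db_password "$(gen_password)"
#   vault_encrypt_var ch_db_password "$(gen_password)"
#   ansible-playbook -i inventory.yml pg.yml --ask-vault-pass
```

With encrypted values in place, every `ansible-playbook` run needs `--ask-vault-pass` or a vault password file.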
Deployment Process
Step 1: Bootstrap Servers
Bootstrap your servers and install common software, including the OpenTelemetry Collector:
ansible-playbook -i inventory.yml bootstrap.yml
What this does:
- Updates system packages
- Installs essential tools and dependencies
- Configures basic security settings
- Sets up OpenTelemetry Collector
Step 2: Deploy ClickHouse
ClickHouse stores all observability data including spans, logs, events, and metrics.
- Update ClickHouse variables in group_vars/all.yml:
ch_cluster: uptrace1
ch_db_name: uptrace
ch_db_user: uptrace
ch_db_password: your_secure_password
- Deploy ClickHouse:
ansible-playbook -i inventory.yml ch_server.yml
High Availability ClickHouse Configuration
For production, run multiple ClickHouse replicas coordinated by ClickHouse Keeper (the built-in ZooKeeper replacement). Replication needs three pieces: hosts in the clickhouse_keeper group, replicas defined in clickhouse_server, and the ch_replicated flag.
- Define the cluster topology in inventory.yml. Each ClickHouse host gets a ch_shard and ch_replica; the clickhouse_keeper group runs the coordination service (use 1 or 3 keeper nodes):
clickhouse_server:
  hosts:
    10.10.1.1: { ch_cluster: uptrace1, ch_shard: shard1, ch_replica: replica1 }
    10.10.1.2: { ch_cluster: uptrace1, ch_shard: shard1, ch_replica: replica2 }
    10.10.1.3: { ch_cluster: uptrace1, ch_shard: shard2, ch_replica: replica1 }
    10.10.1.4: { ch_cluster: uptrace1, ch_shard: shard2, ch_replica: replica2 }
clickhouse_keeper:
  hosts:
    10.10.1.1: { keeper_id: 1 }
    10.10.1.2: { keeper_id: 2 }
    10.10.1.3: { keeper_id: 3 }
- Enable replicated (and, for multiple shards, distributed) tables in group_vars/all.yml:
ch_replicated: true # replicated tables for redundancy and failover
ch_distributed: true # distributed tables across shards (Premium feature)
- Deploy Keeper, then the servers:
ansible-playbook -i inventory.yml ch_keeper.yml
ansible-playbook -i inventory.yml ch_server.yml
Important: When adding new replicas to an existing cluster that already holds data, the new replica must sync the existing tables. See adding and removing replicas in the Altinity knowledge base.
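After deploying or adding replicas, ClickHouse's own system.replicas table shows whether each replicated table has caught up. The sketch below is one possible check, not something the playbooks provide: the function names and the 60-second delay threshold are assumptions, and clickhouse-client may need --user and --password options matching your ch_db_user settings.

```shell
#!/usr/bin/env bash
# Sketch: report replicated tables that look unhealthy on the local ClickHouse server.
# Function names and the default threshold are assumptions.

# replica_health_query [MAX_DELAY_SECONDS]: print the SQL used for the check.
replica_health_query() {
  local max_delay=${1:-60}
  cat <<EOF
SELECT database, table, is_readonly, absolute_delay, queue_size
FROM system.replicas
WHERE is_readonly OR absolute_delay > ${max_delay}
FORMAT PrettyCompact
EOF
}

# check_replicas [MAX_DELAY_SECONDS]: run the query; no rows means no problems found.
check_replicas() {
  clickhouse-client --query "$(replica_health_query "$1")"
}
```

Run `check_replicas` on each host in the clickhouse_server group; a read-only replica or a growing absolute_delay usually means the replica cannot reach Keeper or is still syncing.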
Step 3: Deploy PostgreSQL
PostgreSQL stores metadata such as users, projects, and monitor configurations.
- Update PostgreSQL variables in group_vars/all.yml:
pg_db_name: uptrace
pg_db_user: uptrace
pg_db_password: your_secure_password
- Deploy PostgreSQL:
ansible-playbook -i inventory.yml pg.yml
High Availability PostgreSQL Configuration
Configure primary-standby replication for high availability:
postgresql:
  hosts:
    10.10.1.1: # Primary server
    10.10.1.2: # Standby server
pg_primary_host: 10.10.1.1
Note: You must manually set up PostgreSQL replication between the standby and primary servers. This typically involves:
- Creating a base backup on the primary server
- Restoring the backup on the standby server
- Configuring streaming replication
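With stock PostgreSQL tooling, the three steps above can be collapsed into a single pg_basebackup run on the standby. The following is a rough sketch rather than the playbooks' procedure: REPL_USER and DATA_DIR are hypothetical values, and it assumes the primary already has a role with the REPLICATION attribute and a pg_hba.conf entry that lets the standby connect.

```shell
#!/usr/bin/env bash
# Sketch of the manual steps, run on the standby as the postgres user.
# REPL_USER and DATA_DIR are hypothetical; use the values from your own setup.
PRIMARY_HOST=10.10.1.1
REPL_USER=replicator
DATA_DIR=/var/lib/postgresql/16/main

# clone_standby: copy the primary's data directory and configure streaming replication.
# PostgreSQL must be stopped on the standby and DATA_DIR must be empty.
clone_standby() {
  # -X stream : stream WAL while the backup runs
  # -R        : write standby.signal and primary_conninfo so the server starts as a standby
  # -P        : show progress
  pg_basebackup -h "$PRIMARY_HOST" -U "$REPL_USER" -D "$DATA_DIR" -X stream -R -P
}

# is_standby: succeed when the local server is in recovery, i.e. replaying WAL.
is_standby() {
  [ "$(psql -Atc 'SELECT pg_is_in_recovery()')" = "t" ]
}
```

After `clone_standby` finishes, start PostgreSQL on the standby and confirm with `is_standby`; on the primary, `SELECT client_addr, state FROM pg_stat_replication;` should list the standby.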
Step 4: Deploy Redis
Redis provides in-memory caching for improved performance.
Deploy Redis:
ansible-playbook -i inventory.yml redis.yml
High Availability Redis Configuration
Configure multiple Redis instances for redundancy:
redis_cache:
  hosts:
    10.10.1.1:
      redis_node_id: alpha
      redis_maxmemory_mb: 256
    10.10.1.2:
      redis_node_id: bravo
      redis_maxmemory_mb: 256
    10.10.1.3:
      redis_node_id: charlie
      redis_maxmemory_mb: 256
Note: Each Redis node operates independently, so no replication setup is required. Uptrace handles node selection automatically.
Step 5: Deploy Kafka (Optional)
By default Uptrace writes incoming telemetry to ClickHouse synchronously. For high-ingestion deployments you can place Apache Kafka between ingestion and storage: Uptrace publishes spans, logs, and metrics to Kafka, and a separate uptrace worker process consumes them and inserts the data into ClickHouse. The same brokers are also used by the ClickHouse Kafka engine.
Kafka solves two problems: it buffers data when ClickHouse is down or can't keep up (without it, data arriving during those windows is dropped), and it enables larger, more efficient batches before writing to ClickHouse. It also lets ingestion and storage scale and fail independently. In return it adds operational complexity, so reach for it only after exhausting the other scaling options. The Kafka worker requires the Enterprise tier — without the licensed Kafka feature it refuses to start. See scaling Uptrace for the full configuration reference and tuning options.
Kafka is entirely optional and is enabled simply by adding a kafka group to your inventory. When the group is present, the Uptrace role automatically:
- renders a top-level kafka section and a ch_cluster.kafka section in config.yml from the group's hosts, and
- installs and starts an additional uptrace-worker systemd service that drains the Kafka queues.
If you later remove the kafka group and re-run uptrace.yml, the worker is stopped and removed and Uptrace reverts to writing directly to ClickHouse.
- Add a Kafka broker to your inventory:
kafka:
  hosts:
    10.10.1.1: { kafka_node_id: 1 }
Brokers listen on port 9092. Each broker advertises its inventory hostname, so make sure your Uptrace and ClickHouse hosts can reach the Kafka host at that address.
- Keep the KRaft cluster id constant in group_vars/all.yml (already present in the sample). It is used to format Kafka storage and must not change between runs:
# Generate your own with: /opt/kafka/bin/kafka-storage.sh random-uuid
kafka_cluster_id: MkU3OEVBNTcwNTJENDM2Qk
- Deploy Kafka:
ansible-playbook -i inventory.yml kafka.yml
- Re-deploy Uptrace so it picks up the Kafka configuration and starts the worker (covered in the next step):
ansible-playbook -i inventory.yml uptrace.yml
Verification
# Check the worker service status
sudo systemctl status uptrace-worker
# View worker logs (should show it consuming Kafka topics)
sudo journalctl -u uptrace-worker -f
Step 6: Deploy Uptrace
With all dependencies in place, deploy the main Uptrace application:
ansible-playbook -i inventory.yml uptrace.yml
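For repeat deployments, the playbooks from Steps 1 to 6 can be chained in one script. This is a sketch, not a script shipped with the repository: the deploy_all name and the DRY_RUN switch are hypothetical, and the list covers only the single-node path, so add ch_keeper.yml before ch_server.yml for replicated clusters and kafka.yml before uptrace.yml if you use Kafka.

```shell
#!/usr/bin/env bash
# Sketch: run the playbooks from Steps 1-6 in order and stop at the first failure.
# Set DRY_RUN=1 to print the commands instead of running them.
INVENTORY=${INVENTORY:-inventory.yml}

deploy_all() {
  local playbook
  for playbook in bootstrap.yml ch_server.yml pg.yml redis.yml uptrace.yml; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "ansible-playbook -i $INVENTORY $playbook"
    else
      ansible-playbook -i "$INVENTORY" "$playbook" || {
        echo "stopped: $playbook failed" >&2
        return 1
      }
    fi
  done
}
```

Running `DRY_RUN=1 deploy_all` first is a cheap way to confirm the order before touching any server.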
Verification
Check that Uptrace is running correctly:
# View Uptrace logs
sudo journalctl -u uptrace -f
# Check service status
sudo systemctl status uptrace
# If Kafka is enabled, also check the worker that drains the queues
sudo systemctl status uptrace-worker
# Test connectivity
curl -f http://localhost:80/api/v1/health
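Uptrace may need a few seconds after a restart before the health endpoint responds, so a single curl can fail spuriously. A small retry helper smooths that over; the function below is a generic sketch, and the attempt count and delay in the example are arbitrary choices.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds or ATTEMPTS runs out.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep between attempts, but not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (not run here): wait up to about a minute for the health endpoint.
#   retry 30 2 curl -sf http://localhost:80/api/v1/health
```

The helper returns the success of the last attempt, so it can be used directly in an `if` or chained with `&&`.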
First Login
Open Uptrace at the site_url you configured in group_vars/all.yml (the sample defaults to http://uptrace.local). On first startup the playbook seeds an admin account from the seed_data block in group_vars/all.yml — the sample ships with admin@uptrace.local / admin.
Change these seed credentials before exposing Uptrace. Edit the seed_data users, tokens, and project tokens in group_vars/all.yml and re-run uptrace.yml.
SSL/TLS Configuration with Let's Encrypt
Prerequisites for Let's Encrypt
Let's Encrypt uses an HTTP challenge for domain verification. Ensure:
- Domain resolution: Your domain must be publicly resolvable:
nslookup your-domain.com
- Port accessibility: Ports 80 and 443 must be accessible from the internet
Configure Let's Encrypt
Update your group_vars/all.yml file:
# Domain for SSL certificate
site_url: https://your-domain.com/
# Enable Let's Encrypt
certmagic_enabled: true
# Use staging environment for testing (disable for production)
certmagic_staging_ca: true
# Listen on HTTPS port
listen_http_addr: ':443'
Testing SSL Configuration
- Test with staging environment first (set certmagic_staging_ca: true)
- Verify certificate issuance in logs
- Switch to production (set certmagic_staging_ca: false)
- Redeploy to get production certificate
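Besides the logs, the served certificate itself shows which CA issued it. The sketch below reads the issuer with openssl and looks for a staging marker; it assumes that Let's Encrypt staging issuers carry "(STAGING)" in their name, which has been the case for its staging hierarchy but is not guaranteed by this guide. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: tell a staging certificate from a production one by its issuer.
# Assumes staging issuer names contain "STAGING".

# cert_issuer HOST: print the issuer line of the certificate served on HOST:443.
cert_issuer() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -issuer
}

# is_staging_issuer ISSUER: succeed when the issuer text carries the staging marker.
is_staging_issuer() {
  case $1 in
    *STAGING*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (not run here):
#   is_staging_issuer "$(cert_issuer your-domain.com)" && echo "still on the staging CA"
```

Browsers do not trust staging certificates, so run this check after the final redeploy to confirm the production certificate is being served.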
Optional Components
Mailhog for Email Testing
Deploy Mailhog to capture and view emails sent by Uptrace during development:
ansible-playbook -i inventory.yml mailhog.yml
Access the Mailhog web interface at http://localhost:8025.
Production Note: Disable Mailhog in production environments and configure proper SMTP settings.
Scaling and Maintenance
Horizontal Scaling
Scaling Uptrace Application
Add additional Uptrace instances for load distribution:
uptrace:
  hosts:
    10.10.1.1:
    10.10.1.2:
    10.10.1.3:
Redeploy after updating inventory:
ansible-playbook -i inventory.yml uptrace.yml
Scaling ClickHouse
Scale ClickHouse by adding shards and replicas to the clickhouse_server group. Replicas add redundancy within a shard; shards spread data horizontally across hosts. This is the same topology described in High Availability ClickHouse Configuration — remember to also run a clickhouse_keeper group and set ch_replicated/ch_distributed, then redeploy with ch_keeper.yml and ch_server.yml.
Troubleshooting
Common Issues and Solutions
Connection Issues
Problem: Ansible cannot connect to hosts
Solution:
# Test connectivity
ansible all -i inventory.yml -m ping
# Check SSH access
ssh user@host-ip
# Verify inventory syntax
ansible-inventory -i inventory.yml --list
Service Failures
Problem: Services fail to start
Solution:
# Check service status
ansible all -i inventory.yml -m shell -a "systemctl status uptrace --no-pager"
# View logs
ansible all -i inventory.yml -m shell -a "journalctl -u uptrace -n 50"
# Check disk space
ansible all -i inventory.yml -m shell -a "df -h"
Performance Issues
Problem: Slow query performance
Solutions:
- Increase ClickHouse memory allocation
- Add more ClickHouse replicas
- Optimize query patterns
- Review resource utilization
Diagnostic Commands
# Check all services status
ansible all -i inventory.yml -m shell -a "systemctl status uptrace clickhouse-server postgresql redis"
# Monitor resource usage
ansible all -i inventory.yml -m shell -a "top -n 1 -b"
# List listening TCP sockets to confirm each service is bound to the expected port
ansible all -i inventory.yml -m shell -a "ss -tlnp"
# Verify configuration files
ansible all -i inventory.yml -m shell -a "nginx -t" # if using nginx
Security Considerations
Network Security
- Configure firewalls to allow only necessary ports
- Use VPN or private networks for inter-service communication
- Implement proper network segmentation
Access Control
- Use SSH key authentication instead of passwords
- Implement role-based access control
- Regularly audit user access
Data Protection
- Encrypt sensitive data at rest
- Use TLS for all network communications
- Implement proper backup encryption
Best Practices
Infrastructure Management
- Use version control for all configuration files
- Test changes in a staging environment first
- Implement monitoring for all services
- Document custom configurations and procedures
- Plan for disaster recovery scenarios
Configuration Management
- Use Ansible Vault for sensitive data
- Implement configuration validation before deployment
- Maintain environment-specific configurations
- Use meaningful variable names and comments
Operational Excellence
- Automate routine tasks with additional playbooks
- Implement health checks and monitoring
- Create runbooks for common scenarios
- Establish maintenance windows for updates
Upgrading Uptrace
Only upgrades to the next minor version are tested and supported, for example, upgrading from 1.1 to 1.2. Skipping minor versions (e.g., 1.1 to 1.3) is not supported — upgrade one minor version at a time.
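The one-minor-version rule is easy to check mechanically before changing uptrace_version. The helper below is a hypothetical sketch: it accepts versions without the leading v, allows patch upgrades within the same minor version, and treats a change of major version as unsupported, which is an assumption since the rule above only addresses minor versions.

```shell
#!/usr/bin/env bash
# is_supported_upgrade CURRENT TARGET: succeed when TARGET has the same major version
# as CURRENT and is at most one minor version ahead (MAJOR.MINOR[.PATCH], no leading "v").
is_supported_upgrade() {
  local cur=$1 new=$2 cur_major cur_minor new_major new_minor
  # Split off the major part, then take the minor part from the remainder.
  cur_major=${cur%%.*}; cur=${cur#*.}; cur_minor=${cur%%.*}
  new_major=${new%%.*}; new=${new#*.}; new_minor=${new%%.*}
  [ "$cur_major" -eq "$new_major" ] || return 1
  [ "$new_minor" -ge "$cur_minor" ] && [ "$new_minor" -le $((cur_minor + 1)) ]
}
```

For example, `is_supported_upgrade 1.1 1.2` succeeds while `is_supported_upgrade 1.1 1.3` fails, signalling that 1.2 must be installed first.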
Check Current Version
Before upgrading, verify the installed version:
ansible all -i inventory.yml -m shell -a "uptrace version" -l uptrace
Check the latest available version on GitHub Releases.
Back Up Databases
Always create backups of both PostgreSQL and ClickHouse before upgrading.
PostgreSQL:
ansible all -i inventory.yml -m shell -a \
"sudo -u postgres pg_dump uptrace > /tmp/uptrace-pg-backup-\$(date +%Y%m%d).sql" \
-l postgresql
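The dump above stays in /tmp on the database host, where it can be lost on reboot or along with the server. One option is to pull a copy to the control machine with Ansible's fetch module. In this sketch the helper names and the local backups/ directory are hypothetical, and the default date is taken from the control machine's clock, which may differ from the server's near midnight.

```shell
#!/usr/bin/env bash
# Sketch: copy the PostgreSQL dump from the database host to the control machine.
# The local "backups" directory is a hypothetical destination.

# pg_backup_path [YYYYMMDD]: print the dump path used by the command above (default: today).
pg_backup_path() {
  printf '/tmp/uptrace-pg-backup-%s.sql\n' "${1:-$(date +%Y%m%d)}"
}

# fetch_pg_backup [YYYYMMDD]: download the dump with Ansible's fetch module.
# fetch stores the file under backups/<inventory hostname>/tmp/.
fetch_pg_backup() {
  ansible postgresql -i inventory.yml -m fetch \
    -a "src=$(pg_backup_path "$1") dest=backups/"
}
```

Check that the fetched file is non-empty before starting the upgrade.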
ClickHouse:
The Ansible playbooks install ClickHouse server and client packages, but they don't install clickhouse-backup. Use the ClickHouse backup mechanism you operate in production, such as storage snapshots or a separately installed backup tool.
If you manage clickhouse-backup yourself, create a backup before running the upgrade:
ansible all -i inventory.yml -m shell -a \
"clickhouse-backup create uptrace-backup-\$(date +%Y%m%d)" \
-l clickhouse_server
Run the Upgrade
Pull the latest Ansible playbooks and re-run the Uptrace playbook:
git pull
ansible-playbook -i inventory.yml uptrace.yml
The playbook automatically installs the new version, validates the config, runs database migrations, and restarts Uptrace.
To install a specific version instead of the latest, set uptrace_version in group_vars/all.yml. Choose an existing release from GitHub Releases and omit the leading v:
uptrace_version: '<release-version>'
Verify the Upgrade
After the playbook completes, confirm the upgrade was successful:
# Check the new version
ansible all -i inventory.yml -m shell -a "uptrace version" -l uptrace
# Verify the service is running
ansible all -i inventory.yml -m shell -a "systemctl status uptrace" -l uptrace
# Check logs for migration errors
ansible all -i inventory.yml -m shell -a "journalctl -u uptrace --since '5 minutes ago' --no-pager" -l uptrace
# Test the health endpoint
ansible all -i inventory.yml -m shell -a "curl -sf http://localhost:80/api/v1/health" -l uptrace
Rolling Back
If the upgrade fails before database migrations run, restore the previous version by setting uptrace_version in group_vars/all.yml to the previous version and re-running the playbook:
ansible-playbook -i inventory.yml uptrace.yml
If database migrations have already run, stop Uptrace, restore PostgreSQL into a clean database, and then run the rollback playbook:
# Stop Uptrace so it can't write during the restore
ansible all -i inventory.yml -m shell -a "systemctl stop uptrace" -l uptrace
# Recreate PostgreSQL from the backup
ansible all -i inventory.yml -m shell -a \
"sudo -u postgres dropdb --if-exists uptrace && sudo -u postgres createdb -O uptrace uptrace && sudo -u postgres psql -v ON_ERROR_STOP=1 uptrace < /tmp/uptrace-pg-backup-YYYYMMDD.sql" \
-l postgresql
# Reinstall the previous Uptrace version
ansible-playbook -i inventory.yml uptrace.yml
Restore ClickHouse using the same backup method you used before the upgrade. If you manage clickhouse-backup yourself:
ansible all -i inventory.yml -m shell -a \
"clickhouse-backup restore uptrace-backup-YYYYMMDD" \
-l clickhouse_server
Alternative Deployment Methods
Ansible is one of several deployment options for Uptrace:
- Docker - Quick deployment for development and small-scale production
- DEB/RPM packages - Traditional server deployments
- Kubernetes - Container orchestration at scale
Choose the method that best fits your infrastructure and requirements.
Next Steps
After successful deployment:
- Configure your applications to send telemetry data to Uptrace
- Set up monitoring dashboards for your services
- Create alerting rules for critical metrics
- Implement log aggregation for better observability
- Explore advanced features like distributed tracing and custom metrics