Deploying Uptrace with Ansible
This guide provides comprehensive instructions for deploying Uptrace, an open-source APM and observability platform, on bare metal servers using Ansible automation. Uptrace supports distributed tracing, metrics, and log management to help you monitor your applications effectively.
What is Ansible?
Ansible is an open-source IT automation tool that enables configuration management, application deployment, infrastructure provisioning, and orchestration. It allows you to automate repetitive tasks, deploy software, and manage systems consistently and reliably across multiple servers.
Prerequisites
Before proceeding with this guide, ensure you have:
- Ansible installed on your control machine (see the official Ansible installation guide)
- SSH access to your target servers with sudo privileges
- Minimum server requirements:
  - 2+ CPU cores
  - 4GB+ RAM
  - 20GB+ disk space
  - Ubuntu 24.04
- Network connectivity between servers for cluster configurations
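The minimums above can be checked on a target before any playbook runs. The following is a minimal sketch, not part of the Uptrace repository: the function names are hypothetical, the RAM threshold is lowered slightly on the assumption that a 4GB machine reports a little under 4096 MB, and the disk check assumes the 20GB figure refers to free space on the root filesystem.

```shell
#!/bin/sh
# Preflight sketch: compare a server's resources with the minimums listed above.
# Function names are illustrative only.

# meets_minimum ACTUAL MINIMUM -> succeeds when ACTUAL >= MINIMUM (integers).
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# report LABEL ACTUAL MINIMUM -> prints one OK/FAIL line for a resource.
report() {
    if meets_minimum "$2" "$3"; then
        echo "OK   $1: $2 (minimum $3)"
    else
        echo "FAIL $1: $2 (minimum $3)"
    fi
}

# preflight -> gathers values with standard Linux tools and reports on each.
preflight() {
    cores=$(nproc)
    # MemTotal is reported in kB; convert to whole MB.
    mem_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
    # Available space on / in whole GB (df -Pk prints 1K blocks).
    disk_gb=$(df -Pk / | awk 'NR == 2 {print int($4 / 1024 / 1024)}')

    report "CPU cores" "$cores" 2
    # Assumption: allow some slack below 4096 MB for a nominal 4GB machine.
    report "RAM (MB)" "$mem_mb" 3800
    report "Free disk (GB)" "$disk_gb" 20
}
```

Sourcing the file on a target and calling `preflight` prints one line per resource, so any FAIL line is visible before deployment starts.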
Getting Started
Clone the Ansible Repository
Uptrace maintains a comprehensive set of Ansible playbooks to deploy Uptrace with all required dependencies, including ClickHouse, PostgreSQL, and Redis:
git clone https://github.com/uptrace/ansible.git
cd ansible
Understanding the Repository Structure
The repository contains several key components:
- Playbooks: Main deployment scripts (.yml files in the root directory)
- Roles: Reusable automation components for each service
- Templates: Configuration file templates
- Inventory: Server definitions and groupings
- Group variables: Shared configuration settings
Inventory Configuration
Understanding Ansible Inventory
An Ansible inventory is a file that defines the hosts (servers or nodes) that Ansible will manage and execute tasks on. It organizes your infrastructure into logical groups and defines variables for each host or group.
Setting Up Your Inventory
- Copy the sample inventory file:
cp inventory.sample.yml inventory.yml
- Edit the inventory file to include your server details.
Inventory Best Practices
- Use descriptive hostnames or IP addresses
- Group related services together
- Consider network topology when assigning hosts
- Plan for high availability from the start
Configuration Management
Configuration File Locations
The configuration is distributed across several locations for modularity and maintainability:
- group_vars/all.yml: Main configuration file containing database passwords and basic Uptrace settings
- roles/uptrace/templates/config.yml: Advanced Uptrace configuration options
- roles/clickhouse-server/templates/config.xml: ClickHouse-specific configuration
- Role-specific templates: Each service role contains its own configuration templates
Setting Up Main Configuration
- Copy the sample configuration:
cp group_vars/all.sample.yml group_vars/all.yml
- Update the configuration with your specific requirements:
  - Database passwords
  - Network settings
  - Security configurations
  - Resource limits
Configuration Security
- Use strong, unique passwords for all services
- Consider using Ansible Vault for sensitive data
- Regularly rotate credentials
- Implement proper firewall rules
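As one way to act on the first two points, the sketch below generates a random password and lists the Ansible Vault commands that would keep `group_vars/all.yml` encrypted. `gen_password` is a hypothetical helper, not something the repository provides, and whether to encrypt the whole file or only individual values is a choice left to you.

```shell
#!/bin/sh
# Sketch: generate a random password and keep group_vars/all.yml encrypted.
# gen_password is a hypothetical helper, not part of the Uptrace repository.

# gen_password [LENGTH] -> prints LENGTH random alphanumeric characters (default 32).
gen_password() {
    length="${1:-32}"
    # Read random bytes, keep only letters and digits, stop after LENGTH characters.
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
    echo
}

# Typical Ansible Vault workflow (run manually from the repository root):
#   ansible-vault encrypt group_vars/all.yml     # encrypt the file in place
#   ansible-vault edit group_vars/all.yml        # change values later
#   ansible-playbook -i inventory.yml uptrace.yml --ask-vault-pass
```

Calling `gen_password` once per service gives each of `ch_password` and `pg_password` its own value.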
Deployment Process
Step 1: Bootstrap Servers
Bootstrap your servers and install common software, including the OpenTelemetry Collector:
ansible-playbook -i inventory.yml bootstrap.yml
What this does:
- Updates system packages
- Installs essential tools and dependencies
- Configures basic security settings
- Sets up OpenTelemetry Collector
Step 2: Deploy ClickHouse
ClickHouse stores all observability data including spans, logs, events, and metrics.
- Update ClickHouse variables in group_vars/all.yml:
ch_cluster: uptrace1
ch_password: your_secure_password
ch_database: uptrace
- Deploy ClickHouse:
ansible-playbook -i inventory.yml ch_server.yml
High Availability ClickHouse Configuration
For production environments, configure multiple ClickHouse instances:
clickhouse_server:
  hosts:
    10.10.1.1:
      ch_cluster: uptrace1
      ch_shard: shard1
      ch_replica: replica1
    10.10.1.2:
      ch_cluster: uptrace1
      ch_shard: shard1
      ch_replica: replica2
    10.10.1.3:
      ch_cluster: uptrace1
      ch_shard: shard2
      ch_replica: replica1
    10.10.1.4:
      ch_cluster: uptrace1
      ch_shard: shard2
      ch_replica: replica2
Important: When adding new replicas to an existing ClickHouse cluster, you must manually configure ClickHouse replication on the new replica.
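Before configuring replication on a new replica, it may help to confirm that ClickHouse sees the topology declared in the inventory. The sketch below queries the built-in `system.clusters` table. The helper names and the `CH_PASSWORD` variable are assumptions for illustration, and the check only shows cluster membership, not whether replicated tables exist on the new node.

```shell
#!/bin/sh
# Sketch: list the shards and replicas ClickHouse knows about for one cluster.
# Helper names are hypothetical; system.clusters is a built-in ClickHouse table.

# cluster_query CLUSTER -> prints the SQL used to inspect that cluster's topology.
cluster_query() {
    printf "SELECT shard_num, replica_num, host_name FROM system.clusters WHERE cluster = '%s' ORDER BY shard_num, replica_num" "$1"
}

# show_cluster CLUSTER -> runs the query against the local server.
# Assumes clickhouse-client is on PATH and CH_PASSWORD holds the ch_password value.
show_cluster() {
    clickhouse-client --password "$CH_PASSWORD" --query "$(cluster_query "$1")"
}
```

Running `show_cluster uptrace1` on a node should print one row per host in the `clickhouse_server` group. A missing row suggests the new replica has not yet been picked up by the cluster configuration.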
Step 3: Deploy PostgreSQL
PostgreSQL stores metadata such as users, projects, and monitor configurations.
- Update PostgreSQL variables in group_vars/all.yml:
pg_database: uptrace
pg_user: uptrace
pg_password: your_secure_password
- Deploy PostgreSQL:
ansible-playbook -i inventory.yml pg.yml
High Availability PostgreSQL Configuration
Configure primary-standby replication for high availability:
postgresql:
  hosts:
    10.10.1.1: # Primary server
    10.10.1.2: # Standby server
      pg_primary_host: 10.10.1.1
Note: You must manually set up PostgreSQL replication between the standby and primary servers. This typically involves:
- Creating a base backup on the primary server
- Restoring the backup on the standby server
- Configuring streaming replication
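The first two steps are commonly combined with `pg_basebackup`, which copies the primary's data directory straight onto the standby. The sketch below is illustrative only: the `replicator` role name and the data directory path are assumptions that must be checked against what `pg.yml` actually installs, and the `run` wrapper with `DRY_RUN` is a hypothetical convenience for previewing the command.

```shell
#!/bin/sh
# Sketch of seeding a standby with pg_basebackup. Values marked "assumed" are
# placeholders: verify them against what the pg.yml playbook installs.

PRIMARY_HOST="${PRIMARY_HOST:-10.10.1.1}"                  # pg_primary_host from the inventory
REPL_USER="${REPL_USER:-replicator}"                       # assumed replication role name
PGDATA_DIR="${PGDATA_DIR:-/var/lib/postgresql/16/main}"    # assumed data directory

# run CMD... -> prints the command when DRY_RUN=1, otherwise executes it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# seed_standby -> run on the standby as the postgres user, with PostgreSQL
# stopped and PGDATA_DIR empty.
seed_standby() {
    # -R writes standby.signal and the primary connection settings so the server
    # starts as a streaming replica; -X stream also copies WAL produced during
    # the backup; -P shows progress.
    run pg_basebackup -h "$PRIMARY_HOST" -U "$REPL_USER" -D "$PGDATA_DIR" -R -X stream -P
}
```

Setting `DRY_RUN=1` before calling `seed_standby` prints the command without touching the data directory. The primary also needs a role with the `REPLICATION` attribute and a `pg_hba.conf` entry that admits the standby, neither of which is shown here.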
Step 4: Deploy Redis
Redis provides in-memory caching for improved performance.
Deploy Redis:
ansible-playbook -i inventory.yml redis.yml
High Availability Redis Configuration
Configure multiple Redis instances for redundancy:
redis_cache:
  hosts:
    10.10.1.1:
      redis_node_id: alpha
      redis_maxmemory_mb: 256
    10.10.1.2:
      redis_node_id: bravo
      redis_maxmemory_mb: 256
    10.10.1.3:
      redis_node_id: charlie
      redis_maxmemory_mb: 256
Note: Each Redis node operates independently, so no replication setup is required. Uptrace handles node selection automatically.
Step 5: Deploy Uptrace
With all dependencies in place, deploy the main Uptrace application:
ansible-playbook -i inventory.yml uptrace.yml
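Steps 1 through 5 can also be chained so that a failure in one playbook stops the rest. The following is a minimal sketch using the playbook names from this guide. The `ANSIBLE_PLAYBOOK` and `INVENTORY` variables are illustrative overrides, not settings the repository defines.

```shell
#!/bin/sh
# Sketch: run the five playbooks from this guide in dependency order.
# ANSIBLE_PLAYBOOK and INVENTORY are illustrative overrides.

ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"
INVENTORY="${INVENTORY:-inventory.yml}"

# deploy_all -> runs each playbook in order and stops at the first failure.
deploy_all() {
    for playbook in bootstrap.yml ch_server.yml pg.yml redis.yml uptrace.yml; do
        echo "==> $playbook"
        if ! "$ANSIBLE_PLAYBOOK" -i "$INVENTORY" "$playbook"; then
            echo "Playbook $playbook failed; later steps were skipped." >&2
            return 1
        fi
    done
}
```

Calling `deploy_all` from the repository root runs the same commands as the individual steps. Stopping at the first failure avoids deploying Uptrace against a database that was never provisioned.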
Verification
Check that Uptrace is running correctly:
# View Uptrace logs
sudo journalctl -u uptrace -f
# Check service status
sudo systemctl status uptrace
# Test connectivity
curl -f http://localhost:80/api/v1/health
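Right after a deploy the service may need a few seconds before it answers, so a single `curl` can fail spuriously. The sketch below retries the same health URL a fixed number of times. The function name and the `PROBE` override are hypothetical, the latter added only so the loop can be exercised without a running server.

```shell
#!/bin/sh
# Sketch: poll the health URL used above until it answers or the attempts run out.
# PROBE is an illustrative override for the command that performs the check.

PROBE="${PROBE:-curl -fsS -o /dev/null}"

# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS] -> 0 once the probe succeeds, 1 otherwise.
wait_for_url() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # PROBE is intentionally unquoted so its options split into separate words.
        if $PROBE "$url" 2>/dev/null; then
            echo "healthy after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "no healthy response after $attempts attempt(s)" >&2
    return 1
}
```

For example, `wait_for_url http://localhost:80/api/v1/health 30 2` waits up to about a minute before giving up.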
SSL/TLS Configuration with Let's Encrypt
Prerequisites for Let's Encrypt
Let's Encrypt uses an HTTP challenge for domain verification. Ensure:
- Domain resolution: Your domain must be publicly resolvable:
nslookup your-domain.com
- Port accessibility: Ports 80 and 443 must be accessible from the internet
Configure Let's Encrypt
Update your group_vars/all.yml file:
# Domain for SSL certificate
site_url: https://your-domain.com/
# Enable Let's Encrypt
certmagic_enabled: true
# Use staging environment for testing (disable for production)
certmagic_staging_ca: true
# Listen on HTTPS port
listen_http_addr: ':443'
Testing SSL Configuration
- Test with the staging environment first (set certmagic_staging_ca: true)
- Verify certificate issuance in the logs
- Switch to production (set certmagic_staging_ca: false)
- Redeploy to get the production certificate
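Besides reading the logs, the served certificate itself shows which environment issued it. The sketch below reads the issuer with `openssl` and looks for the "(STAGING)" marker that Let's Encrypt staging certificates carry in their issuer name. Both helper names are hypothetical, and `your-domain.com` is the placeholder used throughout this guide.

```shell
#!/bin/sh
# Sketch: inspect the certificate a host serves and tell staging from production.
# Helper names are hypothetical.

# cert_issuer HOST -> prints the issuer line of the certificate served on port 443.
cert_issuer() {
    openssl s_client -connect "$1:443" -servername "$1" < /dev/null 2>/dev/null |
        openssl x509 -noout -issuer
}

# is_staging_issuer ISSUER -> succeeds when the issuer text contains "(STAGING)".
is_staging_issuer() {
    case "$1" in
        *"(STAGING)"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

After the production redeploy, `is_staging_issuer "$(cert_issuer your-domain.com)" && echo "still on staging"` should print nothing.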
Optional Components
Mailhog for Email Testing
Deploy Mailhog to capture and view emails sent by Uptrace during development:
ansible-playbook -i inventory.yml mailhog.yml
Access the Mailhog web interface at http://localhost:8025.
Production Note: Disable Mailhog in production environments and configure proper SMTP settings.
Scaling and Maintenance
Horizontal Scaling
Scaling Uptrace Application
Add additional Uptrace instances for load distribution:
uptrace:
  hosts:
    10.10.1.1:
    10.10.1.2:
    10.10.1.3:
Redeploy after updating inventory:
ansible-playbook -i inventory.yml uptrace.yml
Scaling ClickHouse
Scale ClickHouse by adding shards and replicas:
clickhouse_server:
  hosts:
    # Shard 1
    10.10.1.1: { ch_cluster: uptrace1, ch_shard: shard1, ch_replica: replica1 }
    10.10.1.2: { ch_cluster: uptrace1, ch_shard: shard1, ch_replica: replica2 }
    # Shard 2
    10.10.1.3: { ch_cluster: uptrace1, ch_shard: shard2, ch_replica: replica1 }
    10.10.1.4: { ch_cluster: uptrace1, ch_shard: shard2, ch_replica: replica2 }
Troubleshooting
Common Issues and Solutions
Connection Issues
Problem: Ansible cannot connect to hosts
Solution:
# Test connectivity
ansible all -i inventory.yml -m ping
# Check SSH access
ssh user@host-ip
# Verify inventory syntax
ansible-inventory -i inventory.yml --list
Service Failures
Problem: Services fail to start
Solution:
# Check service status
ansible all -i inventory.yml -m shell -a "systemctl status uptrace --no-pager"
# View logs
ansible all -i inventory.yml -m shell -a "journalctl -u uptrace -n 50"
# Check disk space
ansible all -i inventory.yml -m shell -a "df -h"
Performance Issues
Problem: Slow query performance
Solutions:
- Increase ClickHouse memory allocation
- Add more ClickHouse replicas
- Optimize query patterns
- Review resource utilization
Diagnostic Commands
# Check all services status
ansible all -i inventory.yml -m shell -a "systemctl status uptrace clickhouse-server postgresql redis"
# Monitor resource usage
ansible all -i inventory.yml -m shell -a "top -n 1 -b"
# Check network connectivity between services
ansible all -i inventory.yml -m shell -a "netstat -tlnp"
# Verify configuration files
ansible all -i inventory.yml -m shell -a "nginx -t" # if using nginx
Security Considerations
Network Security
- Configure firewalls to allow only necessary ports
- Use VPN or private networks for inter-service communication
- Implement proper network segmentation
Access Control
- Use SSH key authentication instead of passwords
- Implement role-based access control
- Regularly audit user access
Data Protection
- Encrypt sensitive data at rest
- Use TLS for all network communications
- Implement proper backup encryption
Best Practices
Infrastructure Management
- Use version control for all configuration files
- Test changes in a staging environment first
- Implement monitoring for all services
- Document custom configurations and procedures
- Plan for disaster recovery scenarios
Configuration Management
- Use Ansible Vault for sensitive data
- Implement configuration validation before deployment
- Maintain environment-specific configurations
- Use meaningful variable names and comments
Operational Excellence
- Automate routine tasks with additional playbooks
- Implement health checks and monitoring
- Create runbooks for common scenarios
- Establish maintenance windows for updates
Next Steps
After successful deployment:
- Configure your applications to send telemetry data to Uptrace
- Set up monitoring dashboards for your services
- Create alerting rules for critical metrics
- Implement log aggregation for better observability
- Explore advanced features like distributed tracing and custom metrics