OpenTelemetry Collector Configuration Tutorial
The OpenTelemetry Collector receives, processes, and exports telemetry data (traces, metrics, logs) from your applications. You configure receivers (OTLP, Prometheus), processors (batch, sampling), and exporters (Jaeger, cloud providers) in YAML. Default ports: 4317 (gRPC) and 4318 (HTTP). Validate a config with otelcol-contrib validate --config=config.yaml.
What You'll Learn
By the end of this tutorial, you'll know how to:
- Set up a basic Collector configuration
- Configure receivers, processors, and exporters
- Handle different telemetry data types
- Implement common use cases like data sampling and enrichment
- Troubleshoot configuration issues
Prerequisites
- Basic understanding of YAML
- Familiarity with observability concepts (traces, metrics, logs)
- A running application that generates telemetry data (we'll provide examples if you don't have one)
Collector Architecture
The Collector architecture consists of:
[Your App] → [Receivers] → [Processors] → [Exporters] → [Backend]
- Receivers: How data gets into the Collector (OTLP, Jaeger, Prometheus, etc.)
- Processors: What happens to data in transit (sampling, filtering, enriching)
- Exporters: Where data goes next (Jaeger, Prometheus, cloud providers)
- Pipelines: Connect receivers → processors → exporters for each data type
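In YAML, a pipeline mirrors this flow directly. For example, a traces pipeline (the component names are placeholders for ones configured later in this tutorial):
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]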
Quick Reference
Essential Components to Remember
- otlp receiver for modern apps
- batch processor for performance
- memory_limiter processor for stability
- debug exporter for testing
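A minimal config that wires these four essentials together might look like this (a sketch; the memory limits are illustrative):
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  batch: {}
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch] # memory_limiter goes first
      exporters: [debug]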
Default Ports
- 4317: OTLP gRPC
- 4318: OTLP HTTP
- 8888: Collector metrics
- 8889: Prometheus exporter (if configured)
Useful Commands
# Validate config
otelcol-contrib validate --config=config.yaml
# Run with debug logging (set service::telemetry::logs::level: debug in the config)
otelcol-contrib --config=config.yaml
# Check collector health
curl http://localhost:8888/metrics
Basic Configuration Structure
Every Collector config follows this YAML structure:
# collector-config.yaml
receivers:
# How to receive data
processors:
# How to process data (optional)
exporters:
# Where to send data
service:
pipelines:
# Connect everything together
First Configuration
Let's start with a minimal setup that receives OTLP data and exports it to the console:
# basic-config.yaml
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
exporters:
debug:
verbosity: detailed
service:
pipelines:
traces:
receivers: [otlp]
exporters: [debug]
metrics:
receivers: [otlp]
exporters: [debug]
logs:
receivers: [otlp]
exporters: [debug]
Run it:
otelcol --config=basic-config.yaml
This config:
- Accepts OTLP data on standard ports (4317 for gRPC, 4318 for HTTP)
- Prints all received data to the console
- Handles traces, metrics, and logs separately
Configuration Example
Here's a more practical setup that you might use in production:
# production-config.yaml
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
# Scrape Prometheus metrics
prometheus:
config:
scrape_configs:
- job_name: 'my-service'
static_configs:
- targets: ['localhost:8080']
processors:
# Sample traces to reduce volume
probabilistic_sampler:
sampling_percentage: 10.0
# Add resource attributes
resource:
attributes:
- key: environment
value: production
action: upsert
- key: service.version
from_attribute: app.version
action: insert
# Batch data for efficiency
batch:
timeout: 1s
send_batch_size: 1024
exporters:
  # Export to Jaeger (newer Collector releases removed this dedicated exporter;
  # with those, point an otlp exporter at Jaeger's OTLP port instead)
jaeger:
endpoint: jaeger-collector:14250
tls:
insecure: true
# Export to Prometheus
prometheus:
endpoint: "0.0.0.0:8889"
# Export to cloud provider
otlp/uptrace:
endpoint: https://api.uptrace.dev:4317
headers:
"uptrace-dsn": "${UPTRACE_DSN}"
service:
pipelines:
traces:
receivers: [otlp]
processors: [probabilistic_sampler, resource, batch]
exporters: [jaeger, otlp/uptrace]
metrics:
receivers: [otlp, prometheus]
processors: [resource, batch]
exporters: [prometheus, otlp/uptrace]
logs:
receivers: [otlp]
processors: [resource, batch]
exporters: [otlp/uptrace]
Configuration Patterns
Pattern 1: Multi-Environment Setup
Use different configs for dev/staging/prod:
# Use environment variables for flexibility
exporters:
jaeger:
endpoint: ${JAEGER_ENDPOINT}
processors:
probabilistic_sampler:
    sampling_percentage: ${env:SAMPLING_RATE:-100.0} # Defaults to 100% if SAMPLING_RATE is unset
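To switch environments you only change the environment variables, not the config file. For example (the endpoint values here are placeholders):
# Development: keep every trace, send to a local Jaeger
export JAEGER_ENDPOINT=localhost:14250
export SAMPLING_RATE=100.0
otelcol-contrib --config=collector-config.yaml

# Production: sample 10% of traces, send to the prod Jaeger collector
export JAEGER_ENDPOINT=jaeger-prod:14250
export SAMPLING_RATE=10.0
otelcol-contrib --config=collector-config.yaml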
Pattern 2: Data Enrichment
Add context to your telemetry:
processors:
resource:
attributes:
- key: k8s.cluster.name
value: ${K8S_CLUSTER_NAME}
action: upsert
- key: deployment.environment
value: ${ENVIRONMENT}
action: upsert
transform:
trace_statements:
- context: span
statements:
- set(attributes["custom.field"], "processed-by-collector")
Pattern 3: Data Filtering
Remove unwanted data:
processors:
filter:
traces:
span:
- 'attributes["http.route"] == "/health"'
- 'name == "GET /metrics"'
metrics:
metric:
- 'name == "unwanted_metric"'
Receivers
OTLP Receiver (Most Common)
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
# Optional: TLS configuration
tls:
cert_file: /path/to/cert.pem
key_file: /path/to/key.pem
http:
endpoint: 0.0.0.0:4318
cors:
allowed_origins: ["*"] # Be more restrictive in production
Prometheus Receiver
receivers:
prometheus:
config:
scrape_configs:
- job_name: 'my-app'
scrape_interval: 30s
static_configs:
- targets: ['app:8080']
metrics_path: /metrics
Filelog Receiver (for logs)
receivers:
filelog:
include: [/var/log/myapp/*.log]
operators:
- type: json_parser
timestamp:
parse_from: attributes.timestamp
layout: '%Y-%m-%d %H:%M:%S'
Processors
Batch Processor (Essential for Performance)
processors:
batch:
timeout: 1s # How long to wait before sending
send_batch_size: 512 # Send when this many items collected
send_batch_max_size: 1024 # Never exceed this size
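Memory Limiter Processor (Essential for Stability)
The Quick Reference lists memory_limiter as essential; here is a minimal sketch (the limits are illustrative, tune them to your deployment):
processors:
  memory_limiter:
    check_interval: 1s    # required: how often memory usage is checked
    limit_mib: 512        # hard limit on the Collector's memory
    spike_limit_mib: 128  # headroom: data is refused once usage passes limit_mib - spike_limit_mib
Put memory_limiter first in each pipeline's processors list so it can push back before other processors buffer data.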
Sampling Processors
processors:
# Sample 10% of traces
probabilistic_sampler:
sampling_percentage: 10.0
# More sophisticated sampling
tail_sampling:
decision_wait: 10s
policies:
- name: error-traces
type: status_code
status_code: {status_codes: [ERROR]}
- name: slow-traces
type: latency
latency: {threshold_ms: 1000}
- name: random-sample
type: probabilistic
probabilistic: {sampling_percentage: 1.0}
Exporters
Exporters send processed data to one or more backends. For complete exporter configuration including all available backends, authentication, and production patterns, see OpenTelemetry Collector Exporters.
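Most backends accept OTLP directly. As a sketch (the endpoint is a placeholder, and the retry/queue values are examples of the available knobs rather than recommendations):
exporters:
  otlp/backend:
    endpoint: backend.example.com:4317
    tls:
      insecure: false
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      queue_size: 1000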
Cloud Provider Exporters
exporters:
# Google Cloud
googlecloud:
project: my-gcp-project
# AWS X-Ray
awsxray:
region: us-west-2
# Azure Monitor
azuremonitor:
connection_string: ${APPLICATIONINSIGHTS_CONNECTION_STRING}
File Exporter (for debugging)
exporters:
file:
path: /tmp/otel-data.json
rotation:
max_megabytes: 100
max_days: 7
max_backups: 3
Environment Variables and Secrets
Keep sensitive data out of your config files:
# In your config
exporters:
otlp/backend:
endpoint: ${BACKEND_ENDPOINT}
headers:
authorization: "Bearer ${API_TOKEN}"
# In your environment
export BACKEND_ENDPOINT="https://api.example.com"
export API_TOKEN="your-secret-token"
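When the Collector runs in a container, pass the same variables with -e; this sketch mounts the config at the contrib image's default path (see the Docker section below):
docker run --rm \
  -e BACKEND_ENDPOINT="https://api.example.com" \
  -e API_TOKEN="your-secret-token" \
  -p 4317:4317 -p 4318:4318 \
  -v "$(pwd)/collector-config.yaml:/etc/otelcol-contrib/config.yaml" \
  otel/opentelemetry-collector-contrib:latest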
Docker Deployment
Here's a complete Docker setup:
# Dockerfile
FROM otel/opentelemetry-collector-contrib:latest
COPY collector-config.yaml /etc/otelcol-contrib/config.yaml
EXPOSE 4317 4318 8889
# docker-compose.yml
version: '3.8'
services:
otel-collector:
build: .
ports:
- "4317:4317" # OTLP gRPC
- "4318:4318" # OTLP HTTP
- "8889:8889" # Prometheus metrics
environment:
      - JAEGER_ENDPOINT=jaeger:14250
volumes:
- ./logs:/var/log
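Bring it up and confirm the Collector started cleanly:
docker compose up -d
docker compose logs -f otel-collector           # watch the startup logs for errors
curl -s http://localhost:8888/metrics | head    # Collector self-metrics (port 8888 must be published)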
Testing Your Configuration
1. Validate Syntax
otelcol-contrib validate --config=your-config.yaml
2. Check What's Running
# The collector exposes metrics about itself
curl http://localhost:8888/metrics
3. Send Test Data
# Send a test trace using curl
curl -X POST http://localhost:4318/v1/traces \
-H "Content-Type: application/json" \
-d '{
"resourceSpans": [{
"resource": {"attributes": [{"key": "service.name", "value": {"stringValue": "test-service"}}]},
"scopeSpans": [{
"spans": [{
"traceId": "5b8aa5a2d2c872e8321cf37308d69df2",
"spanId": "051581bf3cb55c13",
"name": "test-span",
"kind": "SPAN_KIND_CLIENT",
"startTimeUnixNano": "1640995200000000000",
"endTimeUnixNano": "1640995200100000000"
}]
}]
}]
}'
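Alternatively, the telemetrygen utility from the opentelemetry-collector-contrib repository can generate test traffic for you (this assumes you have Go installed to fetch it):
# Install once
go install github.com/open-telemetry/opentelemetry-collector-contrib/cmd/telemetrygen@latest
# Send a few test traces to the local OTLP gRPC endpoint (localhost:4317)
telemetrygen traces --otlp-insecure --traces 5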
Common Troubleshooting
Configuration Not Loading
- Check YAML syntax (indentation matters!)
- Verify file permissions
- Look for typos in component names
Data Not Flowing
- Check that receivers are listening on the right ports
- Verify pipeline connections (receivers → processors → exporters)
- Look at the Collector's own logs (enable debug logging via service::telemetry::logs::level: debug in the config):
otelcol-contrib --config=config.yaml
Performance Issues
- Add batch processor if missing
- Reduce sampling rates
- Check memory_limiter processor configuration
Memory Usage Growing
processors:
  memory_limiter:
    check_interval: 1s   # required; how often memory usage is checked
    limit_mib: 512
    spike_limit_mib: 128
Advanced Use Cases
Multi-Pipeline Setup
service:
pipelines:
# High-priority traces (errors) with no sampling
traces/errors:
receivers: [otlp]
processors: [filter/errors, batch]
exporters: [jaeger]
# Normal traces with sampling
traces/sampled:
receivers: [otlp]
processors: [filter/normal, probabilistic_sampler, batch]
exporters: [jaeger]
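The filter/errors and filter/normal processors referenced above aren't defined in this snippet; one possible definition uses the filter processor's OTTL conditions (a span matching a condition is dropped):
processors:
  # Keep only error spans by dropping everything that is not an error
  filter/errors:
    error_mode: ignore
    traces:
      span:
        - 'status.code != STATUS_CODE_ERROR'
  # Keep only non-error spans (errors are handled by the traces/errors pipeline)
  filter/normal:
    error_mode: ignore
    traces:
      span:
        - 'status.code == STATUS_CODE_ERROR'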
Data Routing by Attributes
processors:
routing:
from_attribute: service.name
table:
- value: frontend
exporters: [jaeger, prometheus/frontend]
- value: backend
exporters: [jaeger, prometheus/backend]
Got questions? The OpenTelemetry community is incredibly helpful—check out the CNCF Slack #opentelemetry channel or the GitHub discussions.