OpenTelemetry Go Sampling

What is sampling?

Sampling is a process that restricts the number of spans that a system generates. In high-volume applications, collecting 100% of traces can be expensive and unnecessary. Sampling lets you collect a representative subset of traces while reducing cost and performance overhead.

Go sampling overview

The OpenTelemetry Go SDK provides head-based sampling, in which the sampling decision is made at the beginning of a trace. By default, the tracer provider uses a ParentBased sampler that wraps the AlwaysSample sampler. You can configure a different sampler on the tracer provider with the WithSampler option.

For production use, we recommend using the consistent probability based sampler from the official contrib package. This sampler implements the OpenTelemetry specification for consistent probability sampling and records sampling information in the tracestate, enabling accurate span-to-metrics pipelines and ensuring traces remain complete across service boundaries.

Why use consistent probability sampling?

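The consistent probability sampler propagates two values in the tracestate: the "p-value", which encodes the sampling probability as a power of two (a p-value of p means a probability of 2^-p), and the "r-value", the source of randomness that is chosen once at the root of the trace and shared by every span in it. This approach provides several benefits: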

  • Consistent decisions: All services make the same sampling decision for a trace
  • Accurate metrics: Spans can be accurately counted using span-to-metrics pipelines
  • Complete traces: Traces tend to be complete even when spans make independent sampling decisions
  • Cross-service compatibility: Works correctly across different services and vendors
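
To make the "consistent decisions" and "accurate metrics" points concrete, here is a minimal, dependency-free sketch of the decision rule. It assumes the rule from the OpenTelemetry probability sampling specification: a span is kept when its sampler's p-value is less than or equal to the trace's r-value, and a span kept with p-value p stands for 2^p original spans. The function names `shouldSample` and `adjustedCount` are made up for this illustration and are not part of the contrib package.

```go
package main

import (
	"fmt"
	"math"
)

// shouldSample reports whether a sampler configured with pValue keeps a
// trace that carries rValue. The rule modeled here is "sample when p <= r".
func shouldSample(pValue, rValue int) bool {
	return pValue <= rValue
}

// adjustedCount is the number of original spans that one kept span
// represents when it was sampled with probability 2^-pValue.
func adjustedCount(pValue int) float64 {
	return math.Pow(2, float64(pValue))
}

func main() {
	// One r-value is chosen at the root and propagated to every service.
	const rValue = 3

	// Hypothetical services sampling at 1/2, 1/8, and 1/16.
	for _, p := range []int{1, 3, 4} {
		fmt.Printf("p=%d (probability 1/%.0f): sampled=%v\n",
			p, adjustedCount(p), shouldSample(p, rValue))
	}
}
```

Because every service compares its own p-value against the same r-value, a trace kept by a service with a low sampling probability is also kept by every service with a higher one. That is why traces tend to stay complete.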

Basic usage

go
import (
    "go.opentelemetry.io/contrib/samplers/probability/consistent"
    "go.opentelemetry.io/otel/sdk/trace"
)

// Recommended: Use consistent probability sampling
sampler := consistent.ProbabilityBased(0.1) // Sample 10% of traces

// Wrap with ParentBased for proper parent respect
provider := trace.NewTracerProvider(
    trace.WithSampler(consistent.ParentProbabilityBased(sampler)),
)

Advanced consistent sampling

go
import (
    "math/rand"

    "go.opentelemetry.io/contrib/samplers/probability/consistent"
    "go.opentelemetry.io/otel/sdk/trace"
)

// With a seeded random source for deterministic testing.
// WithRandomSource takes a math/rand Source, not crypto/rand.
testSampler := consistent.ProbabilityBased(0.25,
    consistent.WithRandomSource(rand.NewSource(42)))

// Parent-based consistent sampler with different child behaviors
parentSampler := consistent.ParentProbabilityBased(
    consistent.ProbabilityBased(0.1), // Root sampler
    trace.WithRemoteParentSampled(consistent.ProbabilityBased(0.05)),
    trace.WithLocalParentSampled(consistent.ProbabilityBased(0.15)),
)

Built-in samplers

While the consistent probability sampler is recommended for production, the Go SDK also provides these built-in samplers:

AlwaysSample

Samples every trace. Useful in development environments, but use it with caution in production services with significant traffic:

go
import (
    "go.opentelemetry.io/otel/sdk/trace"
)

provider := trace.NewTracerProvider(
    trace.WithSampler(trace.AlwaysSample()),
)

NeverSample

Samples no traces. Useful for completely disabling tracing:

go
provider := trace.NewTracerProvider(
    trace.WithSampler(trace.NeverSample()),
)

TraceIDRatioBased

Samples a fraction of traces based on the trace ID. The fraction should be between 0.0 and 1.0:

go
// Sample 10% of traces
provider := trace.NewTracerProvider(
    trace.WithSampler(trace.TraceIDRatioBased(0.1)),
)

// Alternatively, sample 50% of traces
provider = trace.NewTracerProvider(
    trace.WithSampler(trace.TraceIDRatioBased(0.5)),
)
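
The key property of a trace-ID-based sampler is that the decision is a pure function of the trace ID, so every service that sees the same trace ID with the same fraction reaches the same verdict. The sketch below is a simplified, dependency-free model of that idea: it reads the last 8 bytes of the ID as a number and compares it with a threshold derived from the fraction. Treat it as an illustration, not as the SDK's code. `ratioThreshold` and `sampledByRatio` are names invented here.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ratioThreshold converts a sampling fraction into an upper bound on a
// 63-bit number derived from the trace ID. Fractions outside [0, 1] are
// clamped.
func ratioThreshold(fraction float64) uint64 {
	if fraction >= 1 {
		return 1 << 63
	}
	if fraction <= 0 {
		return 0
	}
	return uint64(fraction * (1 << 63))
}

// sampledByRatio decides from the trace ID alone: the last 8 bytes are
// read as a big-endian integer, shifted to 63 bits, and compared with the
// threshold.
func sampledByRatio(traceID [16]byte, fraction float64) bool {
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < ratioThreshold(fraction)
}

func main() {
	// A hypothetical trace ID whose low half sits at 12.5% of the range.
	id := [16]byte{8: 0x20}

	for _, f := range []float64{0.1, 0.5} {
		fmt.Printf("fraction %.1f: sampled=%v\n", f, sampledByRatio(id, f))
	}
}
```

In this model the example ID is dropped at a fraction of 0.1 and kept at 0.5. Any ID kept at a lower fraction is also kept at every higher one, because the threshold only grows with the fraction.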

ParentBased

A sampler decorator that behaves differently based on the parent of the span. If the span has no parent, the decorated sampler is used to make the sampling decision. If the span has a parent, the sampler follows the parent's sampling decision:

go
// ParentBased with TraceIDRatioBased root sampler
provider := trace.NewTracerProvider(
    trace.WithSampler(trace.ParentBased(trace.TraceIDRatioBased(0.1))),
)

// Alternatively, ParentBased with AlwaysSample root sampler (default behavior)
provider = trace.NewTracerProvider(
    trace.WithSampler(trace.ParentBased(trace.AlwaysSample())),
)
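
The dispatch that ParentBased performs can be pictured as a small decision table. The sketch below is a dependency-free model, not SDK code: the `parent` struct and the `delegate` function are invented for illustration, and the returned strings are labels for the root sampler and for the samplers set through the SDK's WithRemoteParentSampled, WithRemoteParentNotSampled, WithLocalParentSampled, and WithLocalParentNotSampled options.

```go
package main

import "fmt"

// parent describes what is known about a span's parent.
type parent struct {
	exists  bool // false for a root span
	remote  bool // true when the parent context arrived from another process
	sampled bool // the parent's sampled flag
}

// delegate names the sampler that a parent-based sampler would consult
// for a span with the given parent.
func delegate(p parent) string {
	switch {
	case !p.exists:
		return "root"
	case p.remote && p.sampled:
		return "remoteParentSampled"
	case p.remote:
		return "remoteParentNotSampled"
	case p.sampled:
		return "localParentSampled"
	default:
		return "localParentNotSampled"
	}
}

func main() {
	cases := []parent{
		{}, // root span
		{exists: true, remote: true, sampled: true},
		{exists: true, remote: true},
		{exists: true, sampled: true},
		{exists: true},
	}
	for _, c := range cases {
		fmt.Printf("%+v -> %s\n", c, delegate(c))
	}
}
```

With the default configuration, the two "sampled" cases always sample and the two "not sampled" cases never sample, which is what "follows the parent's sampling decision" means in practice.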

Configuration options

Environment variables

The consistent probability sampler must be configured programmatically. The built-in samplers can also be selected with environment variables:

bash
# Built-in TraceIDRatioBased sampler with 50% sampling
export OTEL_TRACES_SAMPLER="traceidratio"
export OTEL_TRACES_SAMPLER_ARG="0.5"

# Built-in ParentBased with TraceIDRatioBased
export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
export OTEL_TRACES_SAMPLER_ARG="0.1"

# Always sample
export OTEL_TRACES_SAMPLER="always_on"

# Never sample
export OTEL_TRACES_SAMPLER="always_off"

Note: Environment variable configuration does not support the consistent probability sampler. Use programmatic configuration for consistent sampling.

Programmatic configuration with consistent sampling

go
package main

import (
    "context"
    "log"
    "os"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/resource"
    "go.opentelemetry.io/otel/sdk/trace"
    "go.opentelemetry.io/contrib/samplers/probability/consistent"
    semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
)

func setupTracing(ctx context.Context) func() {
    // Create OTLP exporter
    exporter, err := otlptracegrpc.New(ctx,
        // WithEndpoint expects host:port without a URL scheme
        otlptracegrpc.WithEndpoint("api.uptrace.dev:4317"),
        otlptracegrpc.WithHeaders(map[string]string{
            "uptrace-dsn": os.Getenv("UPTRACE_DSN"),
        }),
    )
    if err != nil {
        log.Fatal(err)
    }

    // Create resource
    res, err := resource.Merge(
        resource.Default(),
        resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName("my-service"),
            semconv.ServiceVersion("1.0.0"),
        ),
    )
    if err != nil {
        log.Fatal(err)
    }

    // Configure consistent probability sampler based on environment
    var sampler trace.Sampler
    env := os.Getenv("GO_ENV")
    switch env {
    case "development":
        sampler = trace.AlwaysSample()
    case "production":
        // Recommended: Use consistent probability sampling for production
        sampler = consistent.ParentProbabilityBased(
            consistent.ProbabilityBased(0.1), // 10% sampling
        )
    default:
        sampler = consistent.ParentProbabilityBased(
            consistent.ProbabilityBased(0.25), // 25% sampling
        )
    }

    // Create tracer provider
    tp := trace.NewTracerProvider(
        trace.WithBatcher(exporter),
        trace.WithResource(res),
        trace.WithSampler(sampler),
    )

    otel.SetTracerProvider(tp)

    return func() {
        if err := tp.Shutdown(ctx); err != nil {
            log.Printf("Error shutting down tracer provider: %v", err)
        }
    }
}

Custom samplers

Create custom sampling logic by implementing the Sampler interface:

go
package main

import (
    "strings"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/sdk/trace"
    oteltrace "go.opentelemetry.io/otel/trace"
)

// CustomSampler implements custom sampling logic
type CustomSampler struct {
    defaultSampler trace.Sampler
}

func NewCustomSampler() trace.Sampler {
    return &CustomSampler{
        defaultSampler: trace.TraceIDRatioBased(0.1), // 10% default sampling
    }
}

// result builds a SamplingResult that keeps the parent's tracestate.
// SamplingParameters.ParentContext is a context.Context, so the span
// context has to be extracted from it first.
func result(p trace.SamplingParameters, d trace.SamplingDecision) trace.SamplingResult {
    return trace.SamplingResult{
        Decision:   d,
        Tracestate: oteltrace.SpanContextFromContext(p.ParentContext).TraceState(),
    }
}

func (s *CustomSampler) ShouldSample(p trace.SamplingParameters) trace.SamplingResult {
    // Always sample spans that are marked as errors when they start.
    // Only attributes passed to tracer.Start are visible to a sampler.
    for _, attr := range p.Attributes {
        if attr.Key == "error" && attr.Value.AsBool() {
            return result(p, trace.RecordAndSample)
        }
    }

    // Always sample critical operations
    spanName := p.Name
    if strings.Contains(spanName, "critical") ||
        strings.Contains(spanName, "payment") ||
        strings.Contains(spanName, "auth") {
        return result(p, trace.RecordAndSample)
    }

    // Don't sample health checks
    if strings.Contains(spanName, "health") || strings.Contains(spanName, "ping") {
        return result(p, trace.Drop)
    }

    // Use default sampler for other cases
    return s.defaultSampler.ShouldSample(p)
}

func (s *CustomSampler) Description() string {
    return "CustomSampler{errors=always,critical=always,health=never,default=10%}"
}

// Usage
func main() {
    provider := trace.NewTracerProvider(
        trace.WithSampler(NewCustomSampler()),
        // other options...
    )
    otel.SetTracerProvider(provider)
}

Advanced sampling scenarios

Attribute-based sampling

go
import (
    "strings"

    "go.opentelemetry.io/otel/sdk/trace"
)

type AttributeBasedSampler struct {
    highPrioritySampler trace.Sampler
    defaultSampler      trace.Sampler
}

func NewAttributeBasedSampler() trace.Sampler {
    return &AttributeBasedSampler{
        highPrioritySampler: trace.AlwaysSample(),
        defaultSampler:      trace.TraceIDRatioBased(0.05), // 5% default
    }
}

func (s *AttributeBasedSampler) ShouldSample(p trace.SamplingParameters) trace.SamplingResult {
    // Check for high priority attributes
    for _, attr := range p.Attributes {
        switch attr.Key {
        case "http.route":
            route := attr.Value.AsString()
            // Always sample admin routes
            if strings.HasPrefix(route, "/admin") {
                return s.highPrioritySampler.ShouldSample(p)
            }
            // Sample 50% of API routes
            if strings.HasPrefix(route, "/api") {
                return trace.TraceIDRatioBased(0.5).ShouldSample(p)
            }
        case "user.tier":
            // Always sample premium users
            if attr.Value.AsString() == "premium" {
                return s.highPrioritySampler.ShouldSample(p)
            }
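        case "http.status_code":
            // Always sample error responses. This only matches when the
            // status code is passed as a start attribute; it is usually
            // not known until the span ends.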
            statusCode := attr.Value.AsInt64()
            if statusCode >= 400 {
                return s.highPrioritySampler.ShouldSample(p)
            }
        }
    }

    return s.defaultSampler.ShouldSample(p)
}

func (s *AttributeBasedSampler) Description() string {
    return "AttributeBasedSampler{admin=always,api=50%,premium=always,errors=always,default=5%}"
}

Rate-limiting sampler

go
import (
    "fmt"
    "sync"
    "time"

    "go.opentelemetry.io/otel/sdk/trace"
    oteltrace "go.opentelemetry.io/otel/trace"
)

type RateLimitingSampler struct {
    maxSamplesPerSecond int
    lastSecond          int64
    currentCount        int
    mutex               sync.Mutex
}

func NewRateLimitingSampler(maxSamplesPerSecond int) trace.Sampler {
    return &RateLimitingSampler{
        maxSamplesPerSecond: maxSamplesPerSecond,
    }
}

func (s *RateLimitingSampler) ShouldSample(p trace.SamplingParameters) trace.SamplingResult {
    s.mutex.Lock()
    defer s.mutex.Unlock()

    // Reset the counter at the start of each one-second window
    now := time.Now().Unix()
    if now != s.lastSecond {
        s.lastSecond = now
        s.currentCount = 0
    }

    // ParentContext is a context.Context; extract the span context to
    // pass the parent's tracestate through unchanged
    ts := oteltrace.SpanContextFromContext(p.ParentContext).TraceState()

    if s.currentCount < s.maxSamplesPerSecond {
        s.currentCount++
        return trace.SamplingResult{Decision: trace.RecordAndSample, Tracestate: ts}
    }

    return trace.SamplingResult{Decision: trace.Drop, Tracestate: ts}
}

func (s *RateLimitingSampler) Description() string {
    return fmt.Sprintf("RateLimitingSampler{%d samples/second}", s.maxSamplesPerSecond)
}
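
A fixed one-second window like the one above can admit up to twice the limit around a window boundary. A token bucket is a common alternative that smooths this out. The following is a minimal, dependency-free sketch of that technique, not an SDK component: `tokenBucket`, `newTokenBucket`, `allow`, and the `at` helper are names invented here, and the clock is passed in explicitly so the example is deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenBucket admits `rate` events per second on average and allows
// bursts of up to `burst` events.
type tokenBucket struct {
	mu     sync.Mutex
	rate   float64   // tokens added per second
	burst  float64   // maximum number of stored tokens
	tokens float64   // tokens currently available
	last   time.Time // time of the previous refill
}

func newTokenBucket(rate, burst float64, now time.Time) *tokenBucket {
	return &tokenBucket{rate: rate, burst: burst, tokens: burst, last: now}
}

// allow refills the bucket for the time elapsed since the previous call
// and consumes one token if at least one is available.
func (b *tokenBucket) allow(now time.Time) bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.burst {
			b.tokens = b.burst
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// at returns a synthetic timestamp ms milliseconds after a fixed origin.
func at(ms int) time.Time {
	return time.Unix(0, 0).Add(time.Duration(ms) * time.Millisecond)
}

func main() {
	bucket := newTokenBucket(2, 2, at(0)) // 2 samples/second, burst of 2

	// Five span starts, 250ms apart, on the synthetic clock.
	for i := 0; i < 5; i++ {
		ms := i * 250
		fmt.Printf("t=%dms sampled=%v\n", ms, bucket.allow(at(ms)))
	}
}
```

In this run the first three calls are admitted (the initial burst plus refill), the call at 750ms is rejected, and the call at 1000ms is admitted again. Inside a real sampler, `allow(time.Now())` would replace the per-second counter.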

Production deployment examples

HTTP server implementation

go
package main

import (
    "context"
    "fmt"
    "log"
    "net/http"
    "os"
    "time"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
    "go.opentelemetry.io/contrib/samplers/probability/consistent"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/resource"
    "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
)

func main() {
    ctx := context.Background()

    // Setup tracing
    shutdown := setupTracing(ctx)
    defer shutdown()

    // Create HTTP handler with OpenTelemetry instrumentation
    handler := http.HandlerFunc(handleRequest)
    wrappedHandler := otelhttp.NewHandler(handler, "my-server")

    // Start server
    log.Println("Server starting on :8080")
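    if err := http.ListenAndServe(":8080", wrappedHandler); err != nil {
        // Use Printf rather than Fatal: log.Fatal exits the process
        // without running the deferred shutdown above
        log.Printf("Server stopped: %v", err)
    }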
}

func handleRequest(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context()
    tracer := otel.Tracer("my-service")

    // Create custom span
    ctx, span := tracer.Start(ctx, "handle-request")
    defer span.End()

    // Check if span is recording to avoid expensive operations
    if span.IsRecording() {
        span.SetAttributes(
            attribute.String("http.method", r.Method),
            attribute.String("http.url", r.URL.String()),
            attribute.String("user.agent", r.UserAgent()),
        )
    }

    // Simulate some work
    time.Sleep(50 * time.Millisecond)

    // Add more attributes only if recording
    if span.IsRecording() {
        span.SetAttributes(
            attribute.Int("http.status_code", 200),
            attribute.String("response.type", "success"),
        )
    }

    w.WriteHeader(http.StatusOK)
    fmt.Fprintf(w, "Hello, World!")
}

func setupTracing(ctx context.Context) func() {
    exporter, err := otlptracegrpc.New(ctx,
        // WithEndpoint expects host:port without a URL scheme
        otlptracegrpc.WithEndpoint("api.uptrace.dev:4317"),
        otlptracegrpc.WithHeaders(map[string]string{
            "uptrace-dsn": os.Getenv("UPTRACE_DSN"),
        }),
    )
    if err != nil {
        log.Fatal(err)
    }

    res, err := resource.Merge(
        resource.Default(),
        resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName("my-http-server"),
            semconv.ServiceVersion("1.0.0"),
            semconv.DeploymentEnvironment(os.Getenv("ENVIRONMENT")),
        ),
    )
    if err != nil {
        log.Fatal(err)
    }

    // Configure sampling based on environment
    var sampler trace.Sampler
    switch os.Getenv("ENVIRONMENT") {
    case "development":
        sampler = trace.AlwaysSample()
    case "production":
        // Recommended: Use consistent probability sampling for production
        sampler = consistent.ParentProbabilityBased(
            consistent.ProbabilityBased(0.1), // 10% sampling
        )
    default:
        sampler = consistent.ParentProbabilityBased(
            consistent.ProbabilityBased(0.25), // 25% sampling
        )
    }

    tp := trace.NewTracerProvider(
        trace.WithBatcher(exporter,
            trace.WithBatchTimeout(time.Second),
            trace.WithMaxExportBatchSize(100),
        ),
        trace.WithResource(res),
        trace.WithSampler(sampler),
    )

    otel.SetTracerProvider(tp)

    return func() {
        if err := tp.Shutdown(ctx); err != nil {
            log.Printf("Error shutting down tracer provider: %v", err)
        }
    }
}

Database operations with sampling

go
package main

import (
    "context"
    "database/sql"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/trace"
)

type UserService struct {
    db     *sql.DB
    tracer trace.Tracer
}

func NewUserService(db *sql.DB) *UserService {
    return &UserService{
        db:     db,
        tracer: otel.Tracer("user-service"),
    }
}

func (s *UserService) CreateUser(ctx context.Context, user *User) error {
    ctx, span := s.tracer.Start(ctx, "create-user")
    defer span.End()

    // Only add expensive attributes if recording
    if span.IsRecording() {
        span.SetAttributes(
            attribute.String("user.email", user.Email),
            attribute.String("user.role", user.Role),
            attribute.Int("user.age", user.Age),
        )
    }

    query := "INSERT INTO users (email, name, role, age) VALUES (?, ?, ?, ?)"
    _, err := s.db.ExecContext(ctx, query, user.Email, user.Name, user.Role, user.Age)

    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, "Failed to create user")
        return err
    }

    if span.IsRecording() {
        span.SetAttributes(
            attribute.String("operation.result", "success"),
        )
        span.AddEvent("user-created")
    }

    return nil
}

func (s *UserService) GetUser(ctx context.Context, userID int) (*User, error) {
    ctx, span := s.tracer.Start(ctx, "get-user")
    defer span.End()

    if span.IsRecording() {
        span.SetAttributes(
            attribute.Int("user.id", userID),
        )
    }

    var user User
    query := "SELECT id, email, name, role, age FROM users WHERE id = ?"
    err := s.db.QueryRowContext(ctx, query, userID).Scan(
        &user.ID, &user.Email, &user.Name, &user.Role, &user.Age,
    )

    if err != nil {
        if err == sql.ErrNoRows {
            span.SetStatus(codes.Error, "User not found")
            if span.IsRecording() {
                span.SetAttributes(
                    attribute.String("error.type", "not_found"),
                )
            }
        } else {
            span.RecordError(err)
            span.SetStatus(codes.Error, "Database query failed")
        }
        return nil, err
    }

    if span.IsRecording() {
        span.SetAttributes(
            attribute.String("user.email", user.Email),
            attribute.String("operation.result", "success"),
        )
    }

    return &user, nil
}

type User struct {
    ID    int    `json:"id"`
    Email string `json:"email"`
    Name  string `json:"name"`
    Role  string `json:"role"`
    Age   int    `json:"age"`
}

Monitoring sampling performance

Statistics collector

go
package main

import (
    "context"
    "log"
    "sync/atomic"
    "time"

    "go.opentelemetry.io/otel/sdk/trace"
)

type SamplingStatsCollector struct {
    totalSpans   int64
    sampledSpans int64
    ticker       *time.Ticker
    done         chan bool
}

func NewSamplingStatsCollector() *SamplingStatsCollector {
    return &SamplingStatsCollector{
        ticker: time.NewTicker(30 * time.Second),
        done:   make(chan bool),
    }
}

func (c *SamplingStatsCollector) Start() {
    go func() {
        for {
            select {
            case <-c.ticker.C:
                c.logStats()
            case <-c.done:
                return
            }
        }
    }()
}

func (c *SamplingStatsCollector) Stop() {
    c.ticker.Stop()
    c.done <- true
}

func (c *SamplingStatsCollector) RecordSpan(sampled bool) {
    atomic.AddInt64(&c.totalSpans, 1)
    if sampled {
        atomic.AddInt64(&c.sampledSpans, 1)
    }
}

func (c *SamplingStatsCollector) logStats() {
    total := atomic.LoadInt64(&c.totalSpans)
    sampled := atomic.LoadInt64(&c.sampledSpans)

    if total > 0 {
        rate := float64(sampled) / float64(total) * 100
        log.Printf("Sampling statistics: %.2f%% (%d/%d spans sampled)",
                   rate, sampled, total)

        // Alert if sampling rate is unexpected
        if rate > 20.0 {
            log.Printf("WARNING: High sampling rate detected: %.2f%%", rate)
        } else if rate < 1.0 {
            log.Printf("WARNING: Low sampling rate detected: %.2f%%", rate)
        }
    }
}

// Custom span processor to collect sampling stats
type SamplingStatsProcessor struct {
    collector *SamplingStatsCollector
}

func NewSamplingStatsProcessor(collector *SamplingStatsCollector) trace.SpanProcessor {
    return &SamplingStatsProcessor{
        collector: collector,
    }
}

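func (p *SamplingStatsProcessor) OnStart(parent context.Context, s trace.ReadWriteSpan) {
    // Record that a span was created. The SDK invokes span processors
    // only for recording spans, so spans dropped by the sampler never
    // reach this method and the reported rate is skewed toward 100%.
    sampled := s.SpanContext().IsSampled()
    p.collector.RecordSpan(sampled)
}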

func (p *SamplingStatsProcessor) OnEnd(s trace.ReadOnlySpan) {
    // Nothing to do on span end for stats collection
}

func (p *SamplingStatsProcessor) Shutdown(ctx context.Context) error {
    return nil
}

func (p *SamplingStatsProcessor) ForceFlush(ctx context.Context) error {
    return nil
}

// Usage example
func main() {
    ctx := context.Background()

    // Create stats collector
    statsCollector := NewSamplingStatsCollector()
    statsCollector.Start()
    defer statsCollector.Stop()

    // Setup tracer provider with stats processor
    tp := trace.NewTracerProvider(
        trace.WithSampler(trace.TraceIDRatioBased(0.1)),
        trace.WithSpanProcessor(NewSamplingStatsProcessor(statsCollector)),
        // other processors...
    )
    defer tp.Shutdown(ctx)

    // Use the tracer...
}
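
A span processor only observes spans that are being recorded, so it cannot count the spans a sampler drops. One way to measure the real sampling rate is to wrap the sampler itself and count every decision. The sketch below shows that wrapper pattern without any dependencies: `decision`, `sampler`, `countingSampler`, and `everyNth` are simplified stand-ins invented for this example, not the SDK's types. In a real program the wrapper would implement the SDK's Sampler interface and delegate to the configured sampler.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// decision and sampler are simplified stand-ins for the SDK's sampling
// decision and sampler types.
type decision int

const (
	drop decision = iota
	recordAndSample
)

type sampler interface {
	shouldSample(traceID uint64) decision
}

// countingSampler wraps another sampler and counts every decision it
// makes, including drops.
type countingSampler struct {
	inner   sampler
	total   atomic.Int64
	sampled atomic.Int64
}

func (c *countingSampler) shouldSample(traceID uint64) decision {
	d := c.inner.shouldSample(traceID)
	c.total.Add(1)
	if d == recordAndSample {
		c.sampled.Add(1)
	}
	return d
}

// rate returns the observed fraction of sampled decisions, or 0 before
// the first decision.
func (c *countingSampler) rate() float64 {
	total := c.total.Load()
	if total == 0 {
		return 0
	}
	return float64(c.sampled.Load()) / float64(total)
}

// everyNth is a toy inner sampler that keeps one trace ID in n.
type everyNth struct{ n uint64 }

func (e everyNth) shouldSample(traceID uint64) decision {
	if traceID%e.n == 0 {
		return recordAndSample
	}
	return drop
}

func main() {
	cs := &countingSampler{inner: everyNth{n: 10}}
	for id := uint64(0); id < 1000; id++ {
		cs.shouldSample(id)
	}
	fmt.Printf("observed sampling rate: %.2f\n", cs.rate())
}
```

Because the counters are updated inside the sampling call, the observed rate reflects both kept and dropped spans. With the toy inner sampler above, the program reports a rate of 0.10.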

Best practices

Performance optimization

When implementing sampling, consider these performance best practices:

  1. Check span recording status: Use span.IsRecording() before adding expensive attributes or operations
  2. Minimize allocations: Reuse attribute slices and avoid unnecessary string concatenations in hot paths
  3. Batch span processing: Use WithBatcher() instead of simple span processors for better performance
  4. Configure appropriate batch sizes: Balance memory usage and export frequency based on your traffic patterns

Production considerations

For production deployments:

  1. Use consistent probability sampling: Prefer the consistent probability sampler over built-in samplers for production workloads as it records sampling information in tracestate and enables accurate span-to-metrics pipelines
  2. Start conservative: Begin with lower sampling rates (1-5%) and increase gradually based on your observability needs
  3. Monitor sampling effectiveness: Track sampling rates and ensure you're capturing enough data for meaningful analysis
  4. Consider costs: Balance observability value with storage and processing costs
  5. Use environment-based configuration: Configure different sampling rates for development, staging, and production
  6. Implement fallback strategies: Ensure your application continues to function even if sampling configuration fails
  7. Verify tracestate consistency: When using consistent sampling, monitor traces to ensure they don't contain multiple distinct r-values, which would indicate inconsistent sampling
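
As an illustration of items 5 and 6, the sketch below reads a sampling ratio from the environment and falls back to a safe default when the value is missing or invalid. It is a minimal sketch under assumptions: the variable name `TRACE_SAMPLE_RATIO` and the function `parseSampleRatio` are hypothetical and not defined by OpenTelemetry. The returned ratio would then be passed to whichever sampler you configure.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseSampleRatio converts raw into a ratio in [0, 1]. It returns
// fallback when raw is empty, not a number, NaN, or out of range.
func parseSampleRatio(raw string, fallback float64) float64 {
	ratio, err := strconv.ParseFloat(strings.TrimSpace(raw), 64)
	// The negated range check also rejects NaN, which fails every comparison.
	if err != nil || !(ratio >= 0 && ratio <= 1) {
		return fallback
	}
	return ratio
}

func main() {
	// TRACE_SAMPLE_RATIO is a hypothetical variable name for this sketch.
	ratio := parseSampleRatio(os.Getenv("TRACE_SAMPLE_RATIO"), 0.05)
	fmt.Printf("using sampling ratio %.2f\n", ratio)
}
```

Falling back to a conservative default keeps a typo in deployment configuration from either disabling tracing entirely or sampling every request.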

What's next?