OpenTelemetry Sampling: head-based and tail-based

OpenTelemetry sampling reduces the cost and verbosity of tracing by limiting the number of created (sampled) spans. In terms of performance, sampling saves the CPU cycles and memory required to collect, process, and export spans.

What is sampling?

Sampling is used in distributed tracing to control the volume of data collected and sent to the tracing backend. It helps to balance the tradeoff between data volume and trace accuracy.

In distributed tracing, a request generates spans as it flows through a system. Spans represent individual operations or events that occur during the processing of that request. These spans can become quite numerous in a complex system, and sending them all to the tracing backend can result in significant overhead and storage costs.

Sampling involves making decisions about which spans to record and which to discard.

Sampling: when and where

Sampling may happen at different stages of span processing:

  • At the head, when a span is created: the client decides upfront whether to record the span (head-based sampling).
  • At the tail, when all spans of a trace are available: the collector or backend decides using the complete trace (tail-based sampling).

The choice of sampling strategy depends on several factors, including the desired level of observability, available resources, and the specific use case of the system.

Choosing the right sampling strategy

Use this decision flowchart to select the most appropriate sampling strategy for your use case:

text
START: What is your primary concern?

├─ Cost reduction / High traffic volume
│  ├─ Need 100% error visibility?
│  │  ├─ YES → Tail-based sampling with error policy
│  │  └─ NO → Head-based TraceIDRatioBased (10-20%)
│  │
│  └─ Different requirements per service?
│     ├─ YES → Rule-based sampling with service-specific rates
│     └─ NO → Simple TraceIDRatioBased sampling

├─ Debugging production issues
│  ├─ Known problematic endpoints?
│  │  ├─ YES → Rule-based: 100% for problem areas, 10% elsewhere
│  │  └─ NO → Tail-based with error + latency policies
│  │
│  └─ Need to capture slow transactions?
│     └─ YES → Tail-based latency sampling (P95/P99)

├─ Compliance / Audit requirements
│  ├─ Must capture all transactions?
│  │  └─ YES → No sampling (100%) or AlwaysOn
│  │
│  └─ Need sampled data for specific endpoints?
│     └─ YES → Rule-based: 100% for audited endpoints

├─ Development / Staging environment
│  ├─ Limited traffic?
│  │  └─ YES → No sampling (100%) for full visibility
│  │
│  └─ Testing sampling configurations?
│     └─ YES → Match production sampling strategy

└─ Getting started / POC
   └─ Start with → Head-based TraceIDRatioBased (50%)
      └─ Then adjust based on → Data volume and requirements

Quick decision matrix

| Scenario | Recommended Strategy | Sampling Rate | Rationale |
|---|---|---|---|
| High-traffic production (>10k RPS) | Head-based + Tail-based | 10-20% base + errors | Cost control + error visibility |
| Payment/Financial services | Rule-based | 100% critical, 20% other | Compliance + observability |
| Microservices (mixed criticality) | Service-level sampling | Varies by service | Optimize per service |
| Startup/Low traffic (<100 RPS) | AlwaysOn | 100% | Cost negligible, max visibility |
| Development/Staging | AlwaysOn | 100% | Full debugging capability |
| Background jobs | Head-based | 1-5% | Low priority, cost savings |
| API gateway | Rule-based | Endpoint-specific | Critical paths 100% |
| Troubleshooting mode | Temporary AlwaysOn | 100% → then reduce | Investigation period |

When to use each strategy

Head-based sampling - Use when:

  • ✅ You need predictable costs
  • ✅ Traffic is consistent
  • ✅ Simple implementation is preferred
  • ✅ Client-side performance matters
  • ❌ Avoid when: Must capture all errors

Tail-based sampling - Use when:

  • ✅ Must capture all errors/slow traces
  • ✅ Have OpenTelemetry Collector
  • ✅ Can tolerate additional latency
  • ✅ Need sophisticated filtering
  • ❌ Avoid when: You have strict latency requirements

Rule-based sampling - Use when:

  • ✅ Different endpoints have different priorities
  • ✅ Need fine-grained control
  • ✅ Have clear business requirements
  • ✅ Can maintain sampling rules
  • ❌ Avoid when: Simple solution needed

Rate-limiting - Use when:

  • ✅ Traffic has unpredictable spikes
  • ✅ Have hard budget constraints
  • ✅ Backend supports it automatically
  • ✅ Need guaranteed maximum cost
  • ❌ Avoid when: Need statistical accuracy

Sampling probability

Sampling comes with a sampling probability, which enables accurate statistical estimation of the total number of spans from only the sampled portion. For example, if the sampling probability is 50% and the number of sampled spans is 10, then the adjusted (total) number of spans is 10 / 50% = 20.

| Name | Side | Adjusted count | Accuracy |
|---|---|---|---|
| Head-based sampling | Client-side | Yes | 100% |
| Rate-limiting sampling | Server-side | Yes | <90% |
| Tail-based sampling | Server-side | Yes | <90% |
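
The adjusted count is simply the number of sampled spans divided by the sampling probability. As a minimal sketch of the arithmetic (the helper name is hypothetical):

go
package main

import "fmt"

// adjustedCount estimates the total number of spans from the number of
// sampled spans and the sampling probability (0 < probability <= 1).
func adjustedCount(sampled int, probability float64) float64 {
    return float64(sampled) / probability
}

func main() {
    // 10 sampled spans at 50% probability represent ~20 total spans.
    fmt.Println(adjustedCount(10, 0.5)) // 20
}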

Head-based sampling

Head-based sampling makes the sampling decision as early as possible and propagates it to other participants using the context. This allows saving CPU and memory resources by not collecting any telemetry data for dropped spans (operations).
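
For example, with the W3C Trace Context propagation format, the decision travels in the traceparent header, whose last byte carries the trace flags (01 means sampled):

text
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01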

Head-based sampling is the simplest, most accurate, and most reliable sampling method, and you should prefer it over other methods whenever it meets your needs.

A disadvantage of head-based sampling is that you can't preferentially sample spans with errors, because that information is not available when spans are created. To address that concern, you can use tail-based sampling.

Head-based sampling also does not account for traffic spikes and may collect more data than desired. This is where rate-limiting sampling comes in handy.

Head-based sampling in OpenTelemetry

OpenTelemetry defines two span properties that control client-side sampling:

  • IsRecording - when false, the span discards attributes, events, links, etc.
  • Sampled - when false, OpenTelemetry drops the span.

You should check the IsRecording property to avoid collecting expensive telemetry data:

go
if span.IsRecording() {
    // collect expensive data
}

A sampler is a function that receives a root span that is about to be created and returns a sampling decision, which must be one of:

  • Drop - trace is dropped. IsRecording = false, Sampled = false.
  • RecordOnly - trace is recorded but not sampled. IsRecording = true, Sampled = false.
  • RecordAndSample - trace is recorded and sampled. IsRecording = true, Sampled = true.

By default, OpenTelemetry samples all traces, but you can configure it to sample a portion of traces. In that case, backends use the sampling probability to adjust the number of spans.

OpenTelemetry samplers

AlwaysOn sampler samples every trace: a new trace is started and exported for every request.

AlwaysOff sampler samples no traces or, in other words, drops all traces. You can use this sampler to perform load testing or to temporarily disable tracing.

TraceIDRatioBased sampler uses the trace ID to deterministically sample a fraction of traces, for example, 20% of them.

Parent-based sampler is a composite sampler that behaves differently depending on the span's parent. When you start a new trace, the sampler decides whether to sample it and propagates that decision down to other services.
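
As a minimal sketch, here is how you might combine these samplers in the Go SDK, using a parent-based sampler that samples 20% of new traces (exporter setup omitted):

go
package main

import (
    "go.opentelemetry.io/otel"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
    // ParentBased respects the decision propagated from the parent span and
    // applies TraceIDRatioBased(0.2) only to root spans (~20% of new traces).
    tp := sdktrace.NewTracerProvider(
        sdktrace.WithSampler(
            sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.2)),
        ),
    )
    otel.SetTracerProvider(tp)
}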

To configure head-based sampling for your programming language, see the language-specific sampling guides at the end of this article.

Advanced sampling strategies

Beyond the basic samplers, you can implement more sophisticated sampling strategies to optimize costs and maintain observability for critical operations.

Composite sampling

Composite sampling combines multiple sampling strategies to create more nuanced sampling decisions. You can layer different samplers based on priorities:

go Go
import (
    "go.opentelemetry.io/otel/sdk/trace"
)

// Composite sampler: always sample errors, 10% for everything else
type CompositeErrorSampler struct {
    baseSampler trace.Sampler
}

func (s *CompositeErrorSampler) ShouldSample(p trace.SamplingParameters) trace.SamplingResult {
    // Check if this is an error span based on attributes
    for _, attr := range p.Attributes {
        if attr.Key == "error" && attr.Value.AsBool() {
            return trace.SamplingResult{Decision: trace.RecordAndSample}
        }
    }

    // Otherwise use the base sampler (e.g., 10% sampling)
    return s.baseSampler.ShouldSample(p)
}

// Description is required by the trace.Sampler interface.
func (s *CompositeErrorSampler) Description() string {
    return "CompositeErrorSampler"
}
python Python
from opentelemetry.sdk.trace.sampling import (
    Sampler,
    SamplingResult,
    Decision,
    TraceIdRatioBased
)

class CompositeErrorSampler(Sampler):
    def __init__(self, base_sampler):
        self.base_sampler = base_sampler

    def should_sample(self, parent_context, trace_id, name, kind, attributes, links, trace_state):
        # Always sample if the error attribute is present
        if attributes and attributes.get("error"):
            return SamplingResult(
                decision=Decision.RECORD_AND_SAMPLE,
                attributes=attributes
            )

        # Otherwise use the base sampler (e.g., 10%)
        return self.base_sampler.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state
        )

    def get_description(self):
        # Required by the abstract Sampler base class
        return "CompositeErrorSampler"

# Usage
sampler = CompositeErrorSampler(TraceIdRatioBased(0.1))
java Java
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.trace.data.LinkData;
import io.opentelemetry.sdk.trace.samplers.Sampler;
import io.opentelemetry.sdk.trace.samplers.SamplingResult;

import java.util.List;

public class CompositeErrorSampler implements Sampler {
    private static final AttributeKey<Boolean> ERROR_KEY = AttributeKey.booleanKey("error");

    private final Sampler baseSampler;

    public CompositeErrorSampler(Sampler baseSampler) {
        this.baseSampler = baseSampler;
    }

    @Override
    public SamplingResult shouldSample(
        Context parentContext,
        String traceId,
        String name,
        SpanKind spanKind,
        Attributes attributes,
        List<LinkData> parentLinks) {

        // Always sample if the error attribute is present and true
        Boolean isError = attributes.get(ERROR_KEY);
        if (Boolean.TRUE.equals(isError)) {
            return SamplingResult.recordAndSample();
        }

        // Otherwise use the base sampler
        return baseSampler.shouldSample(
            parentContext, traceId, name, spanKind, attributes, parentLinks
        );
    }

    @Override
    public String getDescription() {
        // Required by the Sampler interface
        return "CompositeErrorSampler";
    }
}

Rule-based sampling

Rule-based sampling allows you to define sampling rules based on span attributes, service names, operation names, or other properties. This is particularly useful when you need different sampling rates for different parts of your application.

Common rule-based patterns:

  • Endpoint-based: Sample 100% of checkout/payment endpoints, 5% of health checks
  • User-based: Sample 100% for specific test users or VIP customers
  • Service-based: Sample critical services at 100%, background jobs at 10%
  • Attribute-based: Sample based on custom attributes like tenant ID, region, or feature flags
go Go
import (
    "strings"

    "go.opentelemetry.io/otel/sdk/trace"
)

type RuleBasedSampler struct {
    rules          []SamplingRule
    defaultSampler trace.Sampler
}

type SamplingRule struct {
    Matcher func(trace.SamplingParameters) bool
    Sampler trace.Sampler
}

func (s *RuleBasedSampler) ShouldSample(p trace.SamplingParameters) trace.SamplingResult {
    // Check each rule in order
    for _, rule := range s.rules {
        if rule.Matcher(p) {
            return rule.Sampler.ShouldSample(p)
        }
    }

    // Fall back to the default sampler
    return s.defaultSampler.ShouldSample(p)
}

// Description is required by the trace.Sampler interface.
func (s *RuleBasedSampler) Description() string {
    return "RuleBasedSampler"
}

// Example usage
func newRuleBasedSampler() trace.Sampler {
    return &RuleBasedSampler{
        rules: []SamplingRule{
            {
                // Always sample payment endpoints
                Matcher: func(p trace.SamplingParameters) bool {
                    return strings.HasPrefix(p.Name, "/api/payment")
                },
                Sampler: trace.AlwaysSample(),
            },
            {
                // Sample 1% of health checks
                Matcher: func(p trace.SamplingParameters) bool {
                    return p.Name == "/health"
                },
                Sampler: trace.TraceIDRatioBased(0.01),
            },
        },
        defaultSampler: trace.TraceIDRatioBased(0.1), // 10% default
    }
}
python Python
from typing import Callable
from opentelemetry.sdk.trace.sampling import Sampler, SamplingResult

class SamplingRule:
    def __init__(self, matcher: Callable, sampler: Sampler):
        self.matcher = matcher
        self.sampler = sampler

class RuleBasedSampler(Sampler):
    def __init__(self, rules: list[SamplingRule], default_sampler: Sampler):
        self.rules = rules
        self.default_sampler = default_sampler

    def should_sample(self, parent_context, trace_id, name, kind, attributes, links, trace_state):
        # Check each rule in order
        for rule in self.rules:
            if rule.matcher(name, attributes):
                return rule.sampler.should_sample(
                    parent_context, trace_id, name, kind, attributes, links, trace_state
                )

        # Fall back to the default sampler
        return self.default_sampler.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state
        )

    def get_description(self):
        # Required by the abstract Sampler base class
        return "RuleBasedSampler"

# Example usage
from opentelemetry.sdk.trace.sampling import ALWAYS_ON, TraceIdRatioBased

sampler = RuleBasedSampler(
    rules=[
        # Always sample payment endpoints
        SamplingRule(
            matcher=lambda name, attrs: name.startswith("/api/payment"),
            sampler=ALWAYS_ON
        ),
        # Sample 1% of health checks
        SamplingRule(
            matcher=lambda name, attrs: name == "/health",
            sampler=TraceIdRatioBased(0.01)
        ),
    ],
    default_sampler=TraceIdRatioBased(0.1)  # 10% default
)
java Java
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.trace.data.LinkData;
import io.opentelemetry.sdk.trace.samplers.Sampler;
import io.opentelemetry.sdk.trace.samplers.SamplingResult;

import java.util.List;
import java.util.function.Predicate;

public class RuleBasedSampler implements Sampler {
    // A rule pairs a span-name matcher with the sampler to apply.
    public record SamplingRule(Predicate<String> matcher, Sampler sampler) {}

    private final List<SamplingRule> rules;
    private final Sampler defaultSampler;

    public RuleBasedSampler(List<SamplingRule> rules, Sampler defaultSampler) {
        this.rules = rules;
        this.defaultSampler = defaultSampler;
    }

    @Override
    public SamplingResult shouldSample(
        Context parentContext,
        String traceId,
        String name,
        SpanKind spanKind,
        Attributes attributes,
        List<LinkData> parentLinks) {

        // Check each rule in order
        for (SamplingRule rule : rules) {
            if (rule.matcher().test(name)) {
                return rule.sampler().shouldSample(
                    parentContext, traceId, name, spanKind, attributes, parentLinks
                );
            }
        }

        // Fall back to the default sampler
        return defaultSampler.shouldSample(
            parentContext, traceId, name, spanKind, attributes, parentLinks
        );
    }

    @Override
    public String getDescription() {
        // Required by the Sampler interface
        return "RuleBasedSampler";
    }
}

// Usage
Sampler sampler = new RuleBasedSampler(
    List.of(
        // Always sample payment endpoints
        new RuleBasedSampler.SamplingRule(
            name -> name.startsWith("/api/payment"),
            Sampler.alwaysOn()
        ),
        // Sample 1% of health checks
        new RuleBasedSampler.SamplingRule(
            name -> name.equals("/health"),
            Sampler.traceIdRatioBased(0.01)
        )
    ),
    Sampler.traceIdRatioBased(0.1)  // 10% default
);

Error-based sampling

Error-based sampling ensures that all traces containing errors are captured while applying reduced sampling to successful operations. This is crucial for debugging production issues.

Implementation approaches:

  1. Head-based with error detection: Sample all traces initially but mark for retention on error
  2. Tail-based error sampling: Use OpenTelemetry Collector's tail sampling processor
  3. Hybrid approach: Combine head-based sampling with server-side error retention

Example tail sampling configuration for OpenTelemetry Collector:

yaml
processors:
  tail_sampling:
    policies:
      # Always sample traces with errors
      - name: error-policy
        type: status_code
        status_code:
          status_codes: [ERROR]

      # Always sample slow traces (>2s)
      - name: latency-policy
        type: latency
        latency:
          threshold_ms: 2000

      # Sample 10% of successful traces
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

service:
  pipelines:
    traces:
      # Assumes the otlp receiver and exporter are configured, as in the full examples below
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp]

Latency-based sampling

Latency-based sampling prioritizes traces that exceed certain duration thresholds, helping identify performance bottlenecks:

yaml
processors:
  tail_sampling:
    decision_wait: 10s  # Wait for complete trace
    num_traces: 100000
    policies:
      # P99 - sample all traces over 5 seconds
      - name: p99-latency
        type: latency
        latency:
          threshold_ms: 5000

      # P95 - sample traces between 2-5 seconds at 50%
      - name: p95-latency
        type: and
        and:
          and_sub_policy:
            - name: latency-check
              type: latency
              latency:
                threshold_ms: 2000
            - name: probabilistic
              type: probabilistic
              probabilistic:
                sampling_percentage: 50

      # All other traces - sample at 5%
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 5

Service-level sampling

Different services in your architecture may require different sampling strategies based on their criticality and traffic volume:

yaml
processors:
  tail_sampling:
    policies:
      # Critical services - 100% sampling
      - name: critical-services
        type: string_attribute
        string_attribute:
          key: service.name
          values:
            - payment-service
            - auth-service
            - checkout-service
          enabled_regex_matching: false
          invert_match: false

      # High-traffic services - 10% sampling
      - name: high-traffic-services
        type: and
        and:
          and_sub_policy:
            - name: service-match
              type: string_attribute
              string_attribute:
                key: service.name
                values:
                  - api-gateway
                  - user-service
            - name: sample-rate
              type: probabilistic
              probabilistic:
                sampling_percentage: 10

      # Background jobs - 1% sampling
      - name: background-services
        type: and
        and:
          and_sub_policy:
            - name: service-match
              type: string_attribute
              string_attribute:
                key: service.name
                values:
                  - batch-processor
                  - email-worker
            - name: sample-rate
              type: probabilistic
              probabilistic:
                sampling_percentage: 1

Rate-limiting sampling

Rate-limiting sampling happens on the server side and ensures that you don't exceed certain limits, for example, sampling at most 10 traces per second.

Rate-limiting sampling supports adjusted counts, but their accuracy is rather low. For better results and performance, use rate-limiting sampling together with head-based sampling, which is more efficient and accurate.

Most backends (including Uptrace) automatically apply rate-limiting sampling when necessary.
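
OpenTelemetry SDKs do not ship a standard rate-limiting sampler, so this usually remains a backend concern. For illustration only, a client-side approximation can be sketched in Go with a token bucket; this sketch assumes the golang.org/x/time/rate package, and the RateLimitingSampler name is hypothetical:

go
package sampling

import (
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    "golang.org/x/time/rate"
)

// RateLimitingSampler samples at most tracesPerSecond new traces and drops
// the rest. It is a client-side approximation for illustration; backends
// usually enforce rate limits server-side.
type RateLimitingSampler struct {
    limiter *rate.Limiter
}

func NewRateLimitingSampler(tracesPerSecond float64) *RateLimitingSampler {
    burst := int(tracesPerSecond)
    if burst < 1 {
        burst = 1
    }
    return &RateLimitingSampler{
        limiter: rate.NewLimiter(rate.Limit(tracesPerSecond), burst),
    }
}

func (s *RateLimitingSampler) ShouldSample(p sdktrace.SamplingParameters) sdktrace.SamplingResult {
    // Allow reports whether a token is available in the bucket.
    if s.limiter.Allow() {
        return sdktrace.SamplingResult{Decision: sdktrace.RecordAndSample}
    }
    return sdktrace.SamplingResult{Decision: sdktrace.Drop}
}

// Description is required by the sdktrace.Sampler interface.
func (s *RateLimitingSampler) Description() string {
    return "RateLimitingSampler"
}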

Tail-based sampling

With head-based sampling, the sampling decision is made upfront, usually at random. Head-based sampling can't preferentially sample failed or unusually long operations, because that information is only available at the end of a trace.

With tail-based sampling, the sampling decision is delayed until all spans of a trace are available, which enables better sampling decisions based on the complete trace. For example, you can sample failed or unusually long traces.

Most OpenTelemetry backends automatically apply tail-based sampling when necessary, but you can also use the OpenTelemetry Collector with the tailsamplingprocessor to configure sampling according to your needs.

Probability-based sampling

Probability-based sampling randomly selects a subset of traces to record based on a configured probability or sampling rate. For example, you can set a sampling rate of 10%, which means that only 10% of the traces are recorded and the rest are discarded.

Probability-based sampling is useful when you want to reduce the amount of trace data while still maintaining a representative sample of system behavior. It helps strike a balance between overhead and the level of observability you need.

Here is how you can configure a probability-based sampler in OpenTelemetry Go:

go
import "go.opentelemetry.io/contrib/samplers/probability/consistent"

sampler := consistent.ParentProbabilityBased(
    consistent.ProbabilityBased(0.5), // sample 50% of traces
)

uptrace.ConfigureOpentelemetry(
    uptrace.WithTraceSampler(sampler),

    // Other options
)

Real-world sampling configurations

Here are production-ready sampling configurations for common scenarios.

E-commerce platform

Architecture: Web frontend, API gateway, 15 microservices, payment processor, inventory system
Traffic: 50,000 req/min peak, 15,000 req/min average
Requirements: 100% visibility for payments, error tracking, cost control

Application-level sampling (head-based):

go Go
import (
    "go.opentelemetry.io/otel/sdk/trace"
    "strings"
)

type EcommerceSampler struct{}

func (s *EcommerceSampler) ShouldSample(p trace.SamplingParameters) trace.SamplingResult {
    spanName := p.Name

    // Always sample payment and checkout operations
    if strings.Contains(spanName, "payment") ||
       strings.Contains(spanName, "checkout") ||
       strings.Contains(spanName, "/api/orders") {
        return trace.SamplingResult{Decision: trace.RecordAndSample}
    }

    // Sample 50% of cart operations
    if strings.Contains(spanName, "cart") {
        return trace.TraceIDRatioBased(0.5).ShouldSample(p)
    }

    // Sample 5% of product browsing
    if strings.Contains(spanName, "product") ||
       strings.Contains(spanName, "catalog") {
        return trace.TraceIDRatioBased(0.05).ShouldSample(p)
    }

    // Sample 1% of health checks and static content
    if strings.Contains(spanName, "health") ||
       strings.Contains(spanName, "static") {
        return trace.TraceIDRatioBased(0.01).ShouldSample(p)
    }

    // Default: 10% sampling
    return trace.TraceIDRatioBased(0.1).ShouldSample(p)
}

// Description is required by the trace.Sampler interface.
func (s *EcommerceSampler) Description() string {
    return "EcommerceSampler"
}
python Python
from opentelemetry.sdk.trace.sampling import (
    Sampler,
    SamplingResult,
    Decision,
    TraceIdRatioBased
)

class EcommerceSampler(Sampler):
    def __init__(self):
        self.cart_sampler = TraceIdRatioBased(0.5)
        self.browse_sampler = TraceIdRatioBased(0.05)
        self.health_sampler = TraceIdRatioBased(0.01)
        self.default_sampler = TraceIdRatioBased(0.1)

    def should_sample(self, parent_context, trace_id, name, kind, attributes, links, trace_state):
        # Always sample payment and checkout
        if any(keyword in name for keyword in ["payment", "checkout", "/api/orders"]):
            return SamplingResult(Decision.RECORD_AND_SAMPLE)

        # 50% for cart operations
        if "cart" in name:
            return self.cart_sampler.should_sample(
                parent_context, trace_id, name, kind, attributes, links, trace_state
            )

        # 5% for browsing
        if "product" in name or "catalog" in name:
            return self.browse_sampler.should_sample(
                parent_context, trace_id, name, kind, attributes, links, trace_state
            )

        # 1% for health/static
        if "health" in name or "static" in name:
            return self.health_sampler.should_sample(
                parent_context, trace_id, name, kind, attributes, links, trace_state
            )

        # Default 10%
        return self.default_sampler.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state
        )

    def get_description(self):
        # Required by the abstract Sampler base class
        return "EcommerceSampler"
java Java
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.trace.data.LinkData;
import io.opentelemetry.sdk.trace.samplers.Sampler;
import io.opentelemetry.sdk.trace.samplers.SamplingResult;

import java.util.List;

public class EcommerceSampler implements Sampler {
    private final Sampler cartSampler = Sampler.traceIdRatioBased(0.5);
    private final Sampler browseSampler = Sampler.traceIdRatioBased(0.05);
    private final Sampler healthSampler = Sampler.traceIdRatioBased(0.01);
    private final Sampler defaultSampler = Sampler.traceIdRatioBased(0.1);

    @Override
    public SamplingResult shouldSample(
        Context parentContext,
        String traceId,
        String name,
        SpanKind spanKind,
        Attributes attributes,
        List<LinkData> parentLinks) {

        // Always sample payment and checkout
        if (name.contains("payment") ||
            name.contains("checkout") ||
            name.contains("/api/orders")) {
            return SamplingResult.recordAndSample();
        }

        // 50% for cart
        if (name.contains("cart")) {
            return cartSampler.shouldSample(parentContext, traceId, name, spanKind, attributes, parentLinks);
        }

        // 5% for browsing
        if (name.contains("product") || name.contains("catalog")) {
            return browseSampler.shouldSample(parentContext, traceId, name, spanKind, attributes, parentLinks);
        }

        // 1% for health/static
        if (name.contains("health") || name.contains("static")) {
            return healthSampler.shouldSample(parentContext, traceId, name, spanKind, attributes, parentLinks);
        }

        // Default 10%
        return defaultSampler.shouldSample(parentContext, traceId, name, spanKind, attributes, parentLinks);
    }

    @Override
    public String getDescription() {
        // Required by the Sampler interface
        return "EcommerceSampler";
    }
}

Collector-level sampling (tail-based with OpenTelemetry Collector):

yaml
# collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000
    expected_new_traces_per_sec: 1000
    policies:
      # Policy 1: Always sample errors
      - name: errors-policy
        type: status_code
        status_code:
          status_codes: [ERROR]

      # Policy 2: Always sample payment transactions
      - name: payment-policy
        type: string_attribute
        string_attribute:
          key: http.route
          values:
            - /api/payment
            - /api/checkout
            - /api/orders
          enabled_regex_matching: false

      # Policy 3: Sample slow transactions (>3s)
      - name: slow-transactions
        type: latency
        latency:
          threshold_ms: 3000

      # Policy 4: 50% of cart operations
      - name: cart-policy
        type: and
        and:
          and_sub_policy:
            - name: cart-match
              type: string_attribute
              string_attribute:
                key: http.route
                values: [/api/cart]
            - name: cart-sample
              type: probabilistic
              probabilistic:
                sampling_percentage: 50

      # Policy 5: 5% of all other traffic
      - name: baseline-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 5

exporters:
  otlp:
    endpoint: uptrace.dev:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp]

SaaS multi-tenant platform

Architecture: Multi-tenant SaaS, tenant isolation, usage-based billing
Traffic: Variable per tenant (10-10,000 req/min)
Requirements: Different sampling per tier, track all billing events

go Go
import (
    "strings"

    "go.opentelemetry.io/otel/sdk/trace"
)

type TenantAwareSampler struct {
    premiumTenants map[string]bool
}

func (s *TenantAwareSampler) ShouldSample(p trace.SamplingParameters) trace.SamplingResult {
    var tenantID string
    var tier string

    // Extract tenant info from attributes
    for _, attr := range p.Attributes {
        if attr.Key == "tenant.id" {
            tenantID = attr.Value.AsString()
        }
        if attr.Key == "tenant.tier" {
            tier = attr.Value.AsString()
        }
    }

    // Always sample billing-related operations
    if strings.Contains(p.Name, "billing") ||
        strings.Contains(p.Name, "usage") {
        return trace.SamplingResult{Decision: trace.RecordAndSample}
    }

    // Always sample tenants explicitly marked as premium
    if s.premiumTenants[tenantID] {
        return trace.SamplingResult{Decision: trace.RecordAndSample}
    }

    // Sampling based on tenant tier
    switch tier {
    case "enterprise":
        return trace.TraceIDRatioBased(1.0).ShouldSample(p) // 100%
    case "premium":
        return trace.TraceIDRatioBased(0.5).ShouldSample(p) // 50%
    case "free":
        return trace.TraceIDRatioBased(0.01).ShouldSample(p) // 1%
    default:
        return trace.TraceIDRatioBased(0.1).ShouldSample(p) // 10%
    }
}

// Description is required by the trace.Sampler interface.
func (s *TenantAwareSampler) Description() string {
    return "TenantAwareSampler"
}
python Python
from opentelemetry.sdk.trace.sampling import (
    Sampler,
    SamplingResult,
    Decision,
    TraceIdRatioBased
)

class TenantAwareSampler(Sampler):
    def should_sample(self, parent_context, trace_id, name, kind, attributes, links, trace_state):
        tenant_tier = attributes.get("tenant.tier", "standard") if attributes else "standard"

        # Always sample billing operations
        if "billing" in name or "usage" in name:
            return SamplingResult(Decision.RECORD_AND_SAMPLE)

        # Tier-based sampling
        sampling_rates = {
            "enterprise": 1.0,   # 100%
            "premium": 0.5,      # 50%
            "free": 0.01,        # 1%
            "standard": 0.1      # 10%
        }

        rate = sampling_rates.get(tenant_tier, 0.1)
        sampler = TraceIdRatioBased(rate)

        return sampler.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state
        )

    def get_description(self):
        # Required by the abstract Sampler base class
        return "TenantAwareSampler"
javascript Node.js
const {
  SamplingDecision,
  TraceIdRatioBasedSampler,
} = require('@opentelemetry/sdk-trace-base');

class TenantAwareSampler {
  constructor() {
    this.samplers = {
      enterprise: new TraceIdRatioBasedSampler(1.0),
      premium: new TraceIdRatioBasedSampler(0.5),
      free: new TraceIdRatioBasedSampler(0.01),
      standard: new TraceIdRatioBasedSampler(0.1),
    };
  }

  shouldSample(context, traceId, spanName, spanKind, attributes, links) {
    // Always sample billing operations
    if (spanName.includes('billing') || spanName.includes('usage')) {
      return {
        decision: SamplingDecision.RECORD_AND_SAMPLED,
      };
    }

    // Pick a sampler based on the tenant tier attribute
    const tier = attributes['tenant.tier'] || 'standard';
    const sampler = this.samplers[tier] || this.samplers.standard;

    return sampler.shouldSample(context, traceId, spanName, spanKind, attributes, links);
  }

  toString() {
    // Required by the Sampler interface
    return 'TenantAwareSampler';
  }
}

module.exports = { TenantAwareSampler };

Microservices with service mesh

Architecture: Kubernetes, Istio service mesh, 50+ microservices
Traffic: 100,000 req/min
Requirements: Service-level sampling, error tracking, performance monitoring

yaml
# OpenTelemetry Collector configuration for service mesh
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  # Add service metadata
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: insert

  tail_sampling:
    decision_wait: 15s
    num_traces: 100000
    expected_new_traces_per_sec: 2000
    policies:
      # Critical services - 100% sampling
      - name: critical-services
        type: string_attribute
        string_attribute:
          key: service.name
          values:
            - auth-service
            - payment-service
            - order-service
            - user-service

      # API gateway - errors and slow requests only
      - name: gateway-errors
        type: and
        and:
          and_sub_policy:
            - name: service-match
              type: string_attribute
              string_attribute:
                key: service.name
                values: [api-gateway]
            - name: status-or-latency
              type: or
              or:
                or_sub_policy:
                  - name: errors
                    type: status_code
                    status_code:
                      status_codes: [ERROR]
                  - name: slow
                    type: latency
                    latency:
                      threshold_ms: 1000

      # Database services - 20% sampling
      - name: database-services
        type: and
        and:
          and_sub_policy:
            - name: db-match
              type: string_attribute
              string_attribute:
                key: service.name
                values:
                  - postgres-service
                  - redis-service
                  - mongodb-service
            - name: sample-rate
              type: probabilistic
              probabilistic:
                sampling_percentage: 20

      # Background workers - 5% sampling
      - name: background-workers
        type: and
        and:
          and_sub_policy:
            - name: worker-match
              type: string_attribute
              string_attribute:
                key: service.name
                values:
                  - email-worker
                  - analytics-worker
                  - report-generator
            - name: sample-rate
              type: probabilistic
              probabilistic:
                sampling_percentage: 5

      # Internal services - 10% sampling
      - name: internal-services
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

      # Always sample traces with errors
      - name: errors-always
        type: status_code
        status_code:
          status_codes: [ERROR]

      # Always sample very slow traces (>5s)
      - name: very-slow
        type: latency
        latency:
          threshold_ms: 5000

exporters:
  otlp:
    endpoint: uptrace.dev:4317
    headers:
      uptrace-dsn: "your-dsn-here"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource, tail_sampling]
      exporters: [otlp]

Financial services

Requirements: Regulatory compliance, audit trails, security
Sampling strategy: 100% for regulated operations, reduced for others

yaml
# Compliance-focused configuration
processors:
  tail_sampling:
    decision_wait: 30s  # Longer wait for complete traces
    num_traces: 200000
    policies:
      # Regulatory requirement: Sample ALL financial transactions
      - name: financial-transactions
        type: string_attribute
        string_attribute:
          key: transaction.type
          values:
            - transfer
            - withdrawal
            - deposit
            - payment
            - trade
          enabled_regex_matching: false

      # Compliance: All authentication events
      - name: authentication-events
        type: string_attribute
        string_attribute:
          key: event.type
          values:
            - login
            - logout
            - password_change
            - mfa
          enabled_regex_matching: false

      # Security: All errors and security events
      - name: security-events
        type: or
        or:
          or_sub_policy:
            - name: errors
              type: status_code
              status_code:
                status_codes: [ERROR]
            - name: security-attribute
              type: string_attribute
              string_attribute:
                key: security.event
                values: ["true"]

      # Audit: User actions on sensitive data
      - name: sensitive-data-access
        type: string_attribute
        string_attribute:
          key: data.classification
          values:
            - pii
            - financial
            - confidential

      # Performance monitoring: 10% of read-only operations
      - name: readonly-sampling
        type: and
        and:
          and_sub_policy:
            - name: readonly-check
              type: string_attribute
              string_attribute:
                key: db.operation
                values: [SELECT, GET, READ]
            - name: sample-rate
              type: probabilistic
              probabilistic:
                sampling_percentage: 10

  # Ensure sensitive data is not exported
  attributes:
    actions:
      - key: credit_card
        action: delete
      - key: ssn
        action: delete
      - key: password
        action: delete

Development and staging environments

Goal: Maximum visibility for debugging without production constraints

yaml Development
# Development: No sampling, full visibility
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
    timeout: 1s
    send_batch_size: 50

exporters:
  otlp:
    endpoint: localhost:4317
  # Also export to console for immediate feedback
  logging:
    loglevel: debug

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, logging]
yaml Staging
# Staging: Match production sampling to test real conditions
receivers:
  otlp:
    protocols:
      grpc:

processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      # Same policies as production
      - name: errors-always
        type: status_code
        status_code:
          status_codes: [ERROR]

      - name: production-like-sampling
        type: probabilistic
        probabilistic:
          sampling_percentage: 10  # Match production

  # Add environment tag
  resource:
    attributes:
      - key: deployment.environment
        value: staging
        action: insert

exporters:
  otlp:
    endpoint: staging-uptrace.internal:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource, tail_sampling]
      exporters: [otlp]

Configuration with environment variables

For a quick setup without a collector, use environment variables:

bash Production
# Production - 10% sampling
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1
export OTEL_SERVICE_NAME=api-service
export OTEL_EXPORTER_OTLP_ENDPOINT=https://uptrace.dev:4317
export OTEL_EXPORTER_OTLP_HEADERS="uptrace-dsn=your-dsn"
bash Development
# Development - 100% sampling
export OTEL_TRACES_SAMPLER=always_on
export OTEL_SERVICE_NAME=api-service-dev
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
bash High-traffic service
# High-traffic - 1% sampling
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.01
export OTEL_SERVICE_NAME=content-delivery
export OTEL_EXPORTER_OTLP_ENDPOINT=https://uptrace.dev:4317
bash Critical service
# Critical service - 100% sampling with parent-based
export OTEL_TRACES_SAMPLER=parentbased_always_on
export OTEL_SERVICE_NAME=payment-processor
export OTEL_EXPORTER_OTLP_ENDPOINT=https://uptrace.dev:4317

Language-specific sampling guides

For implementation details and examples specific to your programming language, see the dedicated sampling guides, which cover language-specific code examples, SDK configuration, and best practices for implementing sampling in production applications.