What is Distributed Tracing? Concepts & OpenTelemetry Implementation

Distributed tracing is an observability technique that tracks requests as they flow through distributed systems, providing visibility into how different services interact to fulfill user requests. It creates a complete view of a request's journey across microservices, APIs, and databases, recording timing, dependencies, and failures along the way.

With distributed tracing, you can analyze the timing of each operation, monitor logs and errors as they occur in real-time, and identify bottlenecks across your entire system. This technique is particularly valuable in microservices architectures where applications consist of multiple independent services working together.

How Distributed Tracing Works

Modern applications built on microservices or serverless architectures rely on multiple services interacting to fulfill a single user request. This complexity makes it challenging to identify performance bottlenecks, diagnose issues, and analyze overall system behavior.

Distributed tracing addresses these challenges by creating a trace—a representation of a single request's journey through various services and components. Each trace consists of interconnected spans, where each span represents an individual operation within a specific service or component.

When a request enters a service, the trace context propagates with the request through trace headers, allowing downstream services to participate in the same trace. As the request flows through the system, each service generates its own span and updates the trace context with information about the operation's duration, metadata, and relevant context.

mermaid
flowchart LR
    browser(Browser) --- webapp(Web App)
    mobile(Mobile App) --- gateway(API Gateway)
    gateway --- service1 & service2 & service3
    webapp --- service1 & service2 & service3
    service1(Account service) --> db1[(Account DB)]
    service2(Inventory service) --> db2[(Inventory DB)]
    service3(Shipping service) --> db3[(Shipping DB)]

Distributed tracing tools use the generated trace data to provide visibility into system behavior, identify performance issues, assist with debugging, and help ensure the reliability and scalability of distributed applications.

Span Kinds

OpenTelemetry defines five span kinds that describe how services interact within a trace:

| Span Kind | Type | When to Use | Common Examples |
|-----------|------|-------------|-----------------|
| Server | Synchronous | Handling incoming requests | HTTP server, gRPC server, GraphQL resolvers |
| Client | Synchronous | Making outbound requests | HTTP client, database queries, Redis calls |
| Producer | Asynchronous | Publishing messages (ends when message accepted) | Kafka publish, RabbitMQ send, SQS enqueue |
| Consumer | Asynchronous | Processing messages (from receive to completion) | Kafka consume, background job processing |
| Internal | In-process | Operations within a service (no network calls) | Business logic, calculations, data transformations |

Choosing the correct span kind ensures accurate visualization in trace waterfalls and helps backends understand service dependencies.

Getting Started with OpenTelemetry Tracing

The easiest way to get started is to choose an OpenTelemetry APM and follow its documentation. Many vendors offer pre-configured OpenTelemetry distributions that simplify the setup process.

Some vendors, such as Uptrace and SkyWalking, allow you to try their products without creating an account.

Uptrace is an open source APM for OpenTelemetry with an intuitive query builder, rich dashboards, automatic alerts, and integrations for most languages and frameworks. It helps developers and operators gain insight into the latency, errors, and dependencies of their distributed applications, identify performance bottlenecks, debug problems, and optimize overall system performance.

You can get started with Uptrace by downloading a DEB/RPM package or a pre-compiled Go binary.

Core Concepts

Spans

A span represents a unit of work in a trace, such as a remote procedure call (RPC), database query, or in-process function call. Each span contains:

  • A span name (operation name)
  • A parent span ID (except for root spans)
  • A span kind
  • Start and end timestamps
  • A status indicating success or failure
  • Key-value attributes describing the operation
  • A timeline of events
  • Links to other spans
  • A span context that propagates trace ID and other data between services

A trace is a tree of spans showing the path of a request through an application. The root span is the first span in a trace.

mermaid
flowchart TD
    client(User Browser) --> webapp(Web App)
    webapp --> service1 & service2
    service1(Account service) --> db1[(Account DB)]
    db1 --> q1(SELECT * FROM accounts)
    service2(Inventory service) --> db2[(Inventory DB)]
    db2 --> q2(SELECT * FROM inventories)
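The parent-child structure of a trace can be illustrated with a toy in-memory model. The `Span` dataclass and `find_root` helper below are hypothetical, not part of any SDK:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str

def find_root(spans: list) -> Span:
    # The root span is the only span in a trace without a parent.
    roots = [s for s in spans if s.parent_id is None]
    assert len(roots) == 1, "a well-formed trace has exactly one root"
    return roots[0]

trace = [
    Span("a1", None, "GET /checkout"),       # root span
    Span("b2", "a1", "validate_cart"),
    Span("c3", "a1", "SELECT * FROM orders"),
]
print(find_root(trace).name)  # GET /checkout
```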

Span Names

OpenTelemetry backends use span names and attributes to group similar spans together. To ensure proper grouping, use short, concise names. Keep the total number of unique span names below 1,000 to avoid creating excessive span groups that can degrade performance.

Good span names (short, distinctive, and groupable):

| Span name | Comment |
|-----------|---------|
| GET /projects/:id | Route name with parameter placeholders |
| select_project | Function name without arguments |
| SELECT * FROM projects WHERE id = ? | Database query with placeholders |

Poor span names (contain variable parameters):

| Span name | Comment |
|-----------|---------|
| GET /projects/42 | Contains variable parameter 42 |
| select_project(42) | Contains variable argument 42 |
| SELECT * FROM projects WHERE id = 42 | Contains variable value 42 |
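One common way to keep span-name cardinality down is to normalize variable parts before using them as names. A minimal sketch (the `normalize_span_name` helper and its regexes are illustrative, not exhaustive):

```python
import re

def normalize_span_name(name: str) -> str:
    """Replace variable values with placeholders so similar spans group together."""
    # Numeric path segments: /projects/42 -> /projects/:id
    name = re.sub(r"/\d+", "/:id", name)
    # Numeric literals after "= ": id = 42 -> id = ?
    name = re.sub(r"(?<==\s)\d+", "?", name)
    # Numeric function arguments: select_project(42) -> select_project(?)
    name = re.sub(r"\(\d+\)", "(?)", name)
    return name

print(normalize_span_name("GET /projects/42"))                      # GET /projects/:id
print(normalize_span_name("SELECT * FROM projects WHERE id = 42"))  # SELECT * FROM projects WHERE id = ?
print(normalize_span_name("select_project(42)"))                    # select_project(?)
```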

Span Kind

Span kind describes the relationship between spans in a trace and helps systems understand how services interact. It must be one of the following values:

Server

Server spans represent synchronous request handling on the server side. The span covers the time from receiving a request to sending a response.

Common use cases:

  • HTTP server request handlers
  • gRPC server methods
  • GraphQL resolvers
  • Websocket message handlers

Examples:

go Go
_, span := tracer.Start(ctx, "handle_request",
    trace.WithSpanKind(trace.SpanKindServer),
    trace.WithAttributes(
        semconv.HTTPMethod("GET"),
        semconv.HTTPRoute("/api/users/:id"),
    ))
defer span.End()
python Python
with tracer.start_as_current_span(
    "handle_request",
    kind=trace.SpanKind.SERVER,
    attributes={
        "http.method": "GET",
        "http.route": "/api/users/:id",
    }
) as span:
    # Handle request
    pass
js Node.js
const span = tracer.startSpan('handle_request', {
  kind: trace.SpanKind.SERVER,
  attributes: {
    'http.method': 'GET',
    'http.route': '/api/users/:id',
  }
});
// Handle request
span.end();

Client

Client spans represent synchronous outbound requests from the client side. The span covers the time from sending a request to receiving a response.

Common use cases:

  • HTTP client requests
  • gRPC client calls
  • Database queries
  • Cache operations (Redis, Memcached)

Examples:

go Go
_, span := tracer.Start(ctx, "database_query",
    trace.WithSpanKind(trace.SpanKindClient),
    trace.WithAttributes(
        semconv.DBSystemPostgreSQL,
        semconv.DBQueryText("SELECT * FROM users WHERE id = ?"),
    ))
defer span.End()
python Python
with tracer.start_as_current_span(
    "database_query",
    kind=trace.SpanKind.CLIENT,
    attributes={
        "db.system": "postgresql",
        "db.query.text": "SELECT * FROM users WHERE id = ?",
    }
) as span:
    # Execute query
    pass
js Node.js
const span = tracer.startSpan('database_query', {
  kind: trace.SpanKind.CLIENT,
  attributes: {
    'db.system': 'postgresql',
    'db.query.text': 'SELECT * FROM users WHERE id = ?',
  }
});
// Execute query
span.end();

Producer

Producer spans represent asynchronous message creation and sending operations. The span ends when the message is accepted by the messaging system (not when it's consumed).

Common use cases:

  • Publishing to Kafka topics
  • Sending messages to RabbitMQ
  • Publishing to AWS SQS/SNS
  • Enqueueing background jobs

Examples:

go Go
_, span := tracer.Start(ctx, "publish_event",
    trace.WithSpanKind(trace.SpanKindProducer),
    trace.WithAttributes(
        semconv.MessagingSystemKafka,
        semconv.MessagingDestinationName("user.events"),
    ))
defer span.End()
python Python
with tracer.start_as_current_span(
    "publish_event",
    kind=trace.SpanKind.PRODUCER,
    attributes={
        "messaging.system": "kafka",
        "messaging.destination.name": "user.events",
    }
) as span:
    # Publish message
    pass
js Node.js
const span = tracer.startSpan('publish_event', {
  kind: trace.SpanKind.PRODUCER,
  attributes: {
    'messaging.system': 'kafka',
    'messaging.destination.name': 'user.events',
  }
});
// Publish message
span.end();

Consumer

Consumer spans represent asynchronous message receipt and processing operations. The span covers the time from receiving a message to completing its processing.

Common use cases:

  • Consuming from Kafka topics
  • Processing messages from RabbitMQ
  • Receiving from AWS SQS
  • Background job processing

Examples:

go Go
_, span := tracer.Start(ctx, "process_message",
    trace.WithSpanKind(trace.SpanKindConsumer),
    trace.WithAttributes(
        semconv.MessagingSystemKafka,
        semconv.MessagingOperationProcess,
    ))
defer span.End()
python Python
with tracer.start_as_current_span(
    "process_message",
    kind=trace.SpanKind.CONSUMER,
    attributes={
        "messaging.system": "kafka",
        "messaging.operation.type": "process",
    }
) as span:
    # Process message
    pass
js Node.js
const span = tracer.startSpan('process_message', {
  kind: trace.SpanKind.CONSUMER,
  attributes: {
    'messaging.system': 'kafka',
    'messaging.operation.type': 'process',
  }
});
// Process message
span.end();

Internal

Internal spans represent in-process operations that don't involve external services or network calls.

Common use cases:

  • Application business logic
  • Data transformation functions
  • Internal calculations
  • In-memory operations

Examples:

go Go
_, span := tracer.Start(ctx, "calculate_total",
    trace.WithSpanKind(trace.SpanKindInternal),
    trace.WithAttributes(
        attribute.Int("item_count", len(items)),
    ))
defer span.End()
python Python
with tracer.start_as_current_span(
    "calculate_total",
    kind=trace.SpanKind.INTERNAL,
    attributes={
        "item_count": len(items),
    }
) as span:
    # Calculate total
    pass
js Node.js
const span = tracer.startSpan('calculate_total', {
  kind: trace.SpanKind.INTERNAL,
  attributes: {
    'item_count': items.length,
  }
});
// Calculate total
span.end();

Span Kind in Traces: In a typical trace waterfall, you'll see client and server spans paired together (the client span calling a service creates a server span on that service), with internal spans showing work within each service, and producer/consumer spans showing asynchronous message flows.

Status Code

Status code indicates whether an operation succeeded or failed:

  • ok – Success
  • error – Failure
  • unset – Default value, allowing backends to assign status

Attributes

Attributes provide contextual information about spans. For example, an HTTP endpoint might have attributes like http.method = GET and http.route = /projects/:id.

While you can name attributes freely, use semantic attribute conventions for common operations to ensure consistency across systems.

Events

Events are timestamped annotations that carry attributes but have no end time (and therefore no duration). They typically represent exceptions, errors, logs, and messages, though you can also create custom events.

Context

Span context carries information about a span as it propagates through different components and services. It includes:

  • Trace ID: Globally unique identifier for the entire trace (128-bit / 16 bytes, shared by all spans in the trace)
  • Span ID: Unique identifier for a specific span within a trace (64-bit / 8 bytes)
  • Trace flags: Properties such as sampling status (8-bit field, where 01 = sampled)
  • Trace state: Optional vendor-specific or application-specific data

Context maintains continuity and correlation of spans within a distributed system, allowing services to associate their spans with the correct trace and providing end-to-end visibility.

Span Structure Example

Here's a complete JSON representation of a span showing all key fields:

json
{
  "traceId": "5b8efff798038103d269b633813fc60c",
  "spanId": "eee19b7ec3c1b174",
  "parentSpanId": "eee19b7ec3c1b173",
  "name": "GET /api/users/:id",
  "kind": "SERVER",
  "startTimeUnixNano": 1704067200000000000,
  "endTimeUnixNano": 1704067200150000000,
  "attributes": [
    {
      "key": "http.method",
      "value": { "stringValue": "GET" }
    },
    {
      "key": "http.route",
      "value": { "stringValue": "/api/users/:id" }
    },
    {
      "key": "http.status_code",
      "value": { "intValue": 200 }
    },
    {
      "key": "service.name",
      "value": { "stringValue": "user-service" }
    }
  ],
  "events": [
    {
      "timeUnixNano": 1704067200050000000,
      "name": "database.query.start",
      "attributes": [
        {
          "key": "db.statement",
          "value": { "stringValue": "SELECT * FROM users WHERE id = ?" }
        }
      ]
    },
    {
      "timeUnixNano": 1704067200100000000,
      "name": "cache.lookup",
      "attributes": [
        {
          "key": "cache.hit",
          "value": { "boolValue": true }
        }
      ]
    }
  ],
  "status": {
    "code": "STATUS_CODE_OK"
  },
  "resource": {
    "attributes": [
      {
        "key": "service.name",
        "value": { "stringValue": "user-service" }
      },
      {
        "key": "service.version",
        "value": { "stringValue": "1.2.3" }
      },
      {
        "key": "host.name",
        "value": { "stringValue": "prod-server-01" }
      }
    ]
  }
}

This span shows:

  • Duration: 150ms (from start to end time)
  • Parent relationship: Connected to parent span via parentSpanId
  • Attributes: HTTP request details and service information
  • Events: Two timestamped events during execution (database query and cache lookup)
  • Status: Successful operation
  • Resource: Service and host metadata

Context Propagation

Context propagation ensures that trace IDs, span IDs, and other metadata consistently propagate across services and components. OpenTelemetry handles both in-process and distributed propagation.

For a comprehensive guide on context propagation, including W3C TraceContext, propagators, baggage, and troubleshooting broken traces, see the OpenTelemetry Context Propagation guide.

In-Process Propagation

  • Implicit: Context is stored automatically in thread-local or async-local variables (Java, Python, Ruby, Node.js)
  • Explicit: Manual passing of context as function arguments (Go)
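Python's implicit propagation is built on the standard `contextvars` module; the following is a simplified illustration of the mechanism, not the actual OpenTelemetry API:

```python
import contextvars

# The Python SDK keeps the active span in a context variable,
# conceptually similar to this one (simplified sketch).
current_span = contextvars.ContextVar("current_span", default=None)
seen = []

def handle_request():
    token = current_span.set("span-123")  # "start" a span implicitly
    try:
        do_work()                         # no span argument passed
    finally:
        current_span.reset(token)         # "end" the span, restoring prior context

def do_work():
    # Downstream code reads the active span from the ambient context.
    seen.append(current_span.get())

handle_request()
print(seen)                # ['span-123']
print(current_span.get())  # None: context restored after the request
```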

Distributed Propagation

OpenTelemetry supports several protocols for serializing and passing context data:

  • W3C Trace Context (recommended, enabled by default): Uses traceparent header
    Example: traceparent=00-84b54e9330faae5350f0dd8673c98146-279fa73bc935cc05-01
  • B3 Zipkin: Uses headers starting with x-b3-
    Example: X-B3-TraceId

W3C Trace Context Format

The traceparent header contains four fields separated by dashes:

text
traceparent: 00-5b8efff798038103d269b633813fc60c-eee19b7ec3c1b174-01
             │  │                                │                └─ Trace flags (01 = sampled, 00 = not sampled)
             │  │                                └─ Parent ID (16 hex chars, 8 bytes)
             │  └─ Trace ID (32 hex chars, 16 bytes)
             └─ Version (00 = current W3C standard)
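The format can be validated mechanically. A small parsing sketch (the `parse_traceparent` helper is illustrative; real SDKs do this in their propagators):

```python
def parse_traceparent(header: str) -> dict:
    """Split a W3C traceparent header into its four dash-separated fields."""
    version, trace_id, parent_id, flags = header.split("-")
    if len(trace_id) != 32 or len(parent_id) != 16:
        raise ValueError("malformed traceparent")
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero IDs are invalid per the W3C spec")
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        # Bit 0 of the flags byte is the sampled flag.
        "sampled": bool(int(flags, 16) & 0x01),
    }

tp = parse_traceparent("00-5b8efff798038103d269b633813fc60c-eee19b7ec3c1b174-01")
print(tp["sampled"])  # True
```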

Example HTTP Request with Context:

http
GET /api/users/123 HTTP/1.1
Host: api.example.com
traceparent: 00-5b8efff798038103d269b633813fc60c-eee19b7ec3c1b174-01
tracestate: uptrace=t61rcWkgMzE

Manual Context Propagation

While instrumentation libraries handle propagation automatically, you may need to manually propagate context for custom protocols or unsupported frameworks.

HTTP Client Example:

go Go
import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/propagation"
)

// Create a new span
ctx, span := tracer.Start(ctx, "external_api_call")
defer span.End()

// Create HTTP request
req, _ := http.NewRequestWithContext(ctx, "GET", "https://api.example.com/data", nil)

// Inject trace context into request headers
otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))

// Make the request
resp, err := http.DefaultClient.Do(req)
python Python
from opentelemetry import trace
from opentelemetry.propagate import inject
import requests

# Create a new span
with tracer.start_as_current_span("external_api_call"):
    headers = {}

    # Inject trace context into headers
    inject(headers)

    # Make the request
    response = requests.get("https://api.example.com/data", headers=headers)
js Node.js
const { trace, context, propagation } = require('@opentelemetry/api');
const axios = require('axios');

// Create a new span
const span = tracer.startSpan('external_api_call');

context.with(trace.setSpan(context.active(), span), () => {
  const headers = {};

  // Inject trace context into headers
  propagation.inject(context.active(), headers);

  // Make the request
  axios.get('https://api.example.com/data', { headers })
    .finally(() => span.end());
});

HTTP Server Example:

go Go
import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/propagation"
)

func handler(w http.ResponseWriter, r *http.Request) {
    // Extract trace context from incoming request headers
    ctx := otel.GetTextMapPropagator().Extract(r.Context(),
        propagation.HeaderCarrier(r.Header))

    // Create span with extracted context
    ctx, span := tracer.Start(ctx, "handle_request")
    defer span.End()

    // Process request with traced context
    processRequest(ctx, r)
}
python Python
from opentelemetry import trace
from opentelemetry.propagate import extract
from flask import Flask, request

app = Flask(__name__)

@app.route('/api/endpoint')
def handle_request():
    # Extract trace context from incoming headers
    ctx = extract(request.headers)

    # Create span with extracted context
    with tracer.start_as_current_span("handle_request", context=ctx):
        # Process request with traced context
        return process_request()
js Node.js
const { trace, context, propagation } = require('@opentelemetry/api');
const express = require('express');

const app = express();

app.get('/api/endpoint', (req, res) => {
  // Extract trace context from incoming headers
  const extractedContext = propagation.extract(context.active(), req.headers);

  // Create span with extracted context
  const span = tracer.startSpan('handle_request', {}, extractedContext);

  context.with(trace.setSpan(extractedContext, span), () => {
    // Process request with traced context
    processRequest(req, res);
    span.end();
  });
});

Troubleshooting Context Propagation

Verify headers are present:

bash
# Check if the traceparent header is being sent (curl -v prints headers to stderr)
curl -v https://api.example.com/endpoint 2>&1 | grep -i traceparent

Common propagation issues:

  1. Missing propagator configuration: Ensure propagator is set globally
    go
    otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
        propagation.TraceContext{},
        propagation.Baggage{},
    ))
    
  2. Custom HTTP client not instrumented: Use instrumented HTTP client or manually inject context
  3. Async operations losing context: Explicitly pass context to goroutines/threads
    go
    // ✅ Good: Pass context explicitly
    go func(ctx context.Context) {
        _, span := tracer.Start(ctx, "async_work")
        defer span.End()
        // work...
    }(ctx)
    
    // ❌ Bad: Context lost
    go func() {
        _, span := tracer.Start(context.Background(), "async_work")
        defer span.End()
        // work...
    }()
    
  4. Middleware order: Ensure tracing middleware runs before other middleware that creates spans

Baggage

Baggage propagates custom key-value pairs between services, similar to span context. It allows you to associate contextual information (such as user IDs or session IDs) with requests or transactions.

Baggage provides a standardized way to pass relevant data throughout the system, enabling better observability and analysis without relying on ad hoc mechanisms or manual instrumentation.
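On the wire, baggage travels in the W3C `baggage` HTTP header as comma-separated key=value pairs. A simplified encoder/decoder sketch (it ignores per-entry properties and some percent-encoding edge cases the spec covers):

```python
from urllib.parse import quote, unquote

def encode_baggage(items: dict) -> str:
    # e.g. {"user.id": "42"} -> "user.id=42"
    return ",".join(f"{quote(k)}={quote(str(v))}" for k, v in items.items())

def decode_baggage(header: str) -> dict:
    result = {}
    for entry in header.split(","):
        key, _, value = entry.strip().partition("=")
        result[unquote(key)] = unquote(value)
    return result

header = encode_baggage({"user.id": "42", "session.id": "abc"})
print(header)                  # user.id=42,session.id=abc
print(decode_baggage(header))  # {'user.id': '42', 'session.id': 'abc'}
```

In practice you would use your SDK's baggage API and propagators rather than building the header by hand.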

Instrumentation

OpenTelemetry instrumentations are plugins for popular frameworks and libraries that use the OpenTelemetry API to record important operations such as HTTP requests, database queries, logs, and errors.

What to Instrument

Focus instrumentation efforts on operations that provide the most value:

  • Network operations: HTTP requests, RPC calls
  • Filesystem operations: Reading and writing files
  • Database queries: Combined network and filesystem operations
  • Errors and logs: Using structured logging

Manual Instrumentation

While automatic instrumentation covers common frameworks, manual instrumentation gives you fine-grained control over what gets traced. Here are comprehensive examples for creating and managing spans.

Creating Spans

go Go
import (
    "context"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/trace"
)

// Get tracer (typically done once at startup; := is invalid at package scope)
var tracer = otel.Tracer("my-service")

func processOrder(ctx context.Context, orderID string) error {
    // Create a span
    ctx, span := tracer.Start(ctx, "process_order",
        trace.WithSpanKind(trace.SpanKindInternal),
    )
    defer span.End()

    // Add attributes
    span.SetAttributes(
        attribute.String("order.id", orderID),
        attribute.String("customer.tier", "premium"),
    )

    // Do work
    if err := validateOrder(ctx, orderID); err != nil {
        // Record error
        span.RecordError(err)
        span.SetStatus(codes.Error, "order validation failed")
        return err
    }

    // Record event
    span.AddEvent("order_validated",
        trace.WithAttributes(
            attribute.String("validation.result", "success"),
        ))

    span.SetStatus(codes.Ok, "order processed successfully")
    return nil
}
python Python
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("my-service")

def process_order(order_id: str):
    with tracer.start_as_current_span(
        "process_order",
        kind=trace.SpanKind.INTERNAL,
    ) as span:
        # Add attributes
        span.set_attributes({
            "order.id": order_id,
            "customer.tier": "premium",
        })

        try:
            validate_order(order_id)

            # Record event
            span.add_event("order_validated", {
                "validation.result": "success"
            })

            span.set_status(Status(StatusCode.OK))

        except Exception as e:
            # Record error
            span.record_exception(e)
            span.set_status(Status(StatusCode.ERROR, "order validation failed"))
            raise
js Node.js
const { trace } = require('@opentelemetry/api');
const { SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('my-service');

function processOrder(orderId) {
  const span = tracer.startSpan('process_order', {
    kind: trace.SpanKind.INTERNAL,
  });

  // Add attributes
  span.setAttributes({
    'order.id': orderId,
    'customer.tier': 'premium',
  });

  try {
    validateOrder(orderId);

    // Record event
    span.addEvent('order_validated', {
      'validation.result': 'success'
    });

    span.setStatus({ code: SpanStatusCode.OK });

  } catch (error) {
    // Record error
    span.recordException(error);
    span.setStatus({
      code: SpanStatusCode.ERROR,
      message: 'order validation failed'
    });
    throw error;

  } finally {
    span.end();
  }
}

Creating Nested Spans

Nested spans show parent-child relationships and help visualize the breakdown of operations.

go Go
func processOrder(ctx context.Context, orderID string) error {
    ctx, span := tracer.Start(ctx, "process_order")
    defer span.End()

    // Child span 1: Validate
    if err := validateOrder(ctx, orderID); err != nil {
        return err
    }

    // Child span 2: Calculate
    total, err := calculateTotal(ctx, orderID)
    if err != nil {
        return err
    }

    // Child span 3: Save
    return saveOrder(ctx, orderID, total)
}

func validateOrder(ctx context.Context, orderID string) error {
    _, span := tracer.Start(ctx, "validate_order")
    defer span.End()

    // Validation logic
    return nil
}

func calculateTotal(ctx context.Context, orderID string) (float64, error) {
    _, span := tracer.Start(ctx, "calculate_total")
    defer span.End()

    // Calculation logic
    return 99.99, nil
}

func saveOrder(ctx context.Context, orderID string, total float64) error {
    _, span := tracer.Start(ctx, "save_order",
        trace.WithSpanKind(trace.SpanKindClient),
    )
    defer span.End()

    span.SetAttributes(
        attribute.Float64("order.total", total),
        attribute.String("db.system", "postgresql"),
    )

    // Database save logic
    return nil
}
python Python
def process_order(order_id: str):
    with tracer.start_as_current_span("process_order") as span:
        # Child span 1: Validate
        validate_order(order_id)

        # Child span 2: Calculate
        total = calculate_total(order_id)

        # Child span 3: Save
        save_order(order_id, total)

def validate_order(order_id: str):
    with tracer.start_as_current_span("validate_order"):
        # Validation logic
        pass

def calculate_total(order_id: str) -> float:
    with tracer.start_as_current_span("calculate_total"):
        # Calculation logic
        return 99.99

def save_order(order_id: str, total: float):
    with tracer.start_as_current_span(
        "save_order",
        kind=trace.SpanKind.CLIENT,
        attributes={
            "order.total": total,
            "db.system": "postgresql",
        }
    ):
        # Database save logic
        pass
js Node.js
function processOrder(orderId) {
  const span = tracer.startSpan('process_order');

  try {
    // Child span 1: Validate
    validateOrder(orderId);

    // Child span 2: Calculate
    const total = calculateTotal(orderId);

    // Child span 3: Save
    saveOrder(orderId, total);
  } finally {
    span.end();
  }
}

function validateOrder(orderId) {
  const span = tracer.startSpan('validate_order');
  try {
    // Validation logic
  } finally {
    span.end();
  }
}

function calculateTotal(orderId) {
  const span = tracer.startSpan('calculate_total');
  try {
    // Calculation logic
    return 99.99;
  } finally {
    span.end();
  }
}

function saveOrder(orderId, total) {
  const span = tracer.startSpan('save_order', {
    kind: trace.SpanKind.CLIENT,
    attributes: {
      'order.total': total,
      'db.system': 'postgresql',
    }
  });
  try {
    // Database save logic
  } finally {
    span.end();
  }
}

The resulting trace will show:

text
process_order (200ms)
├── validate_order (50ms)
├── calculate_total (30ms)
└── save_order (120ms)

Adding Semantic Attributes

Use semantic conventions for consistent attribute naming:

go Go
import "go.opentelemetry.io/otel/semconv/v1.24.0"

// HTTP attributes
span.SetAttributes(
    semconv.HTTPMethod("GET"),
    semconv.HTTPRoute("/api/users/:id"),
    semconv.HTTPStatusCode(200),
)

// Database attributes
span.SetAttributes(
    semconv.DBSystemPostgreSQL,
    semconv.DBNamespace("production"),
    semconv.DBQueryText("SELECT * FROM users WHERE id = ?"),
)

// Messaging attributes
span.SetAttributes(
    semconv.MessagingSystemKafka,
    semconv.MessagingDestinationName("user.events"),
)

// RPC attributes
span.SetAttributes(
    semconv.RPCSystemGRPC,
    semconv.RPCService("UserService"),
    semconv.RPCMethod("GetUser"),
)
python Python
from opentelemetry.semconv.trace import SpanAttributes

# HTTP attributes
span.set_attributes({
    SpanAttributes.HTTP_METHOD: "GET",
    SpanAttributes.HTTP_ROUTE: "/api/users/:id",
    SpanAttributes.HTTP_STATUS_CODE: 200,
})

# Database attributes
span.set_attributes({
    SpanAttributes.DB_SYSTEM: "postgresql",
    SpanAttributes.DB_NAMESPACE: "production",
    SpanAttributes.DB_QUERY_TEXT: "SELECT * FROM users WHERE id = ?",
})

# Messaging attributes
span.set_attributes({
    SpanAttributes.MESSAGING_SYSTEM: "kafka",
    SpanAttributes.MESSAGING_DESTINATION_NAME: "user.events",
})

# RPC attributes
span.set_attributes({
    SpanAttributes.RPC_SYSTEM: "grpc",
    SpanAttributes.RPC_SERVICE: "UserService",
    SpanAttributes.RPC_METHOD: "GetUser",
})
js Node.js
const { SEMATTRS_HTTP_METHOD, SEMATTRS_HTTP_ROUTE, SEMATTRS_HTTP_STATUS_CODE,
        SEMATTRS_DB_SYSTEM, SEMATTRS_DB_NAMESPACE, SEMATTRS_DB_QUERY_TEXT,
        SEMATTRS_MESSAGING_SYSTEM, SEMATTRS_MESSAGING_DESTINATION_NAME,
        SEMATTRS_RPC_SYSTEM, SEMATTRS_RPC_SERVICE, SEMATTRS_RPC_METHOD
} = require('@opentelemetry/semantic-conventions');

// HTTP attributes
span.setAttributes({
  [SEMATTRS_HTTP_METHOD]: 'GET',
  [SEMATTRS_HTTP_ROUTE]: '/api/users/:id',
  [SEMATTRS_HTTP_STATUS_CODE]: 200,
});

// Database attributes
span.setAttributes({
  [SEMATTRS_DB_SYSTEM]: 'postgresql',
  [SEMATTRS_DB_NAMESPACE]: 'production',
  [SEMATTRS_DB_QUERY_TEXT]: 'SELECT * FROM users WHERE id = ?',
});

// Messaging attributes
span.setAttributes({
  [SEMATTRS_MESSAGING_SYSTEM]: 'kafka',
  [SEMATTRS_MESSAGING_DESTINATION_NAME]: 'user.events',
});

// RPC attributes
span.setAttributes({
  [SEMATTRS_RPC_SYSTEM]: 'grpc',
  [SEMATTRS_RPC_SERVICE]: 'UserService',
  [SEMATTRS_RPC_METHOD]: 'GetUser',
});

Recording Events and Errors

Events capture point-in-time occurrences within a span:

go Go
// Record a simple event
span.AddEvent("cache_miss")

// Event with attributes
span.AddEvent("retry_attempt",
    trace.WithAttributes(
        attribute.Int("attempt.number", 3),
        attribute.String("retry.reason", "connection_timeout"),
    ))

// Record an error
if err != nil {
    span.RecordError(err,
        trace.WithAttributes(
            attribute.String("error.type", "ValidationError"),
        ))
    span.SetStatus(codes.Error, err.Error())
}
python Python
# Record a simple event
span.add_event("cache_miss")

# Event with attributes
span.add_event("retry_attempt", {
    "attempt.number": 3,
    "retry.reason": "connection_timeout"
})

# Record exception with stack trace
try:
    risky_operation()
except Exception as e:
    span.record_exception(e)  # Automatically captures stack trace
    span.set_status(Status(StatusCode.ERROR))
    raise
js Node.js
// Record a simple event
span.addEvent('cache_miss');

// Event with attributes
span.addEvent('retry_attempt', {
  'attempt.number': 3,
  'retry.reason': 'connection_timeout'
});

// Record an error
if (error) {
  span.recordException(error);
  span.setStatus({
    code: SpanStatusCode.ERROR,
    message: error.message
  });
}

Best Practices

Initialize Early

Initialize OpenTelemetry before importing libraries that require instrumentation to ensure accurate trace capture.

Balance Automatic and Manual Instrumentation

While automatic instrumentation provides a good starting point, manual instrumentation offers more control for specific scenarios.

Focus on Critical Components

Instrument components critical for performance, reliability, or user experience. Be selective to avoid unnecessary overhead.

Follow Semantic Conventions

Use standardized attribute names, span names, and tags as defined by the OpenTelemetry specification to ensure consistency and interoperability.

Implement Smart Sampling

Consider tail-based sampling to manage trace data volume while capturing critical traces.
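Head-based probability sampling can be understood from a toy model: derive a deterministic decision from the trace ID so every service in a trace makes the same choice. This mirrors how ratio-based samplers work, but is not the SDK implementation:

```python
def should_sample(trace_id: str, ratio: float) -> bool:
    """Deterministically sample a fraction of traces based on the trace ID."""
    # Interpret the lower 8 bytes of the 16-byte trace ID as an unsigned integer
    # and sample when it falls below ratio * 2^64.
    bound = int(ratio * (1 << 64))
    return int(trace_id[16:], 16) < bound

# The same trace ID yields the same decision in every service.
tid = "5b8efff798038103d269b633813fc60c"
print(should_sample(tid, 1.0))  # True: everything is sampled at ratio 1.0
print(should_sample(tid, 0.0))  # False: nothing is sampled at ratio 0.0
```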

Troubleshooting

Missing Spans

Problem: Expected spans don't appear in your tracing backend.

Common Causes:

  • SDK not initialized before application startup
  • Instrumentation libraries misconfigured
  • Overly aggressive sampling
  • Export endpoint unreachable

Solutions:

  • Verify initialization order
  • Check auto-instrumentation package installation
  • Temporarily set sampling to 100% for debugging
  • Test backend connectivity and credentials
  • Enable debug logging

Broken Context Propagation

Problem: Spans appear disconnected or traces fragment across services.

Common Causes:

  • Context not propagated between services
  • Uninstrumented custom protocols
  • Async operations breaking context
  • Missing trace headers

Solutions:

  • Verify HTTP client/server instrumentation
  • Manually manage context for custom protocols
  • Use explicit context management for async operations
  • Confirm trace headers are present in requests
  • Configure propagation for all communication protocols

Performance Overhead

Problem: Application performance degrades after enabling tracing.

Common Causes:

  • Over-instrumentation
  • Synchronous export blocking threads
  • Large attributes or excessive events
  • High sampling rates

Solutions:

  • Use asynchronous batch exporters
  • Implement appropriate sampling (1-5% for high-traffic applications)
  • Remove unnecessary spans
  • Limit attribute sizes
  • Consider tail-based sampling

High Cardinality Issues

Problem: Too many unique span names or attribute values cause storage issues.

Common Causes:

  • Variable data in span names
  • Unlimited attribute values
  • Auto-generated unique identifiers

Solutions:

  • Use parameterized span names
  • Normalize or bucket attribute values
  • Follow semantic conventions for naming

Export Failures

Problem: Spans generate but don't reach the backend.

Common Causes:

  • Network connectivity issues
  • Authentication problems
  • Backend unavailability
  • Buffer overflow

Solutions:

  • Monitor exporter metrics and logs
  • Implement retry with exponential backoff
  • Verify endpoints and authentication
  • Adjust batch size and timeout settings
  • Set up export failure alerts

Memory Issues

Problem: Memory leaks or high usage.

Common Causes:

  • Spans not properly exported
  • Data accumulation in buffers
  • Long-running spans holding references

Solutions:

  • Ensure proper span lifecycle management
  • Configure appropriate export intervals
  • Review attribute sizes
  • Monitor buffer sizes
  • Implement resource cleanup

Next Steps

Distributed tracing provides valuable insights for understanding end-to-end application behavior, identifying performance issues, and optimizing system resources.

Explore the OpenTelemetry tracing API for your programming language: