Structured Logging Best Practices: Implementation Guide with Examples

Vladimir Mihailenco
January 08, 2025
10 min read

In structured logging, log messages are broken down into key-value pairs, making it easier to search, filter, and analyze logs. This is in contrast to traditional logging, which usually consists of unstructured text that is difficult to parse and analyze.

What is structured logging?

Structured logging is the practice of capturing and storing log messages in a structured and organized format.

Traditional logging often involves printing raw text messages to log files, which can be difficult to parse and analyze programmatically.

In contrast, structured logging formats log messages as key-value pairs or in structured data formats such as JSON or XML.

Common logging challenges

Organizations face several challenges with traditional logging approaches:

  • Difficulty in parsing and analyzing unstructured log data
  • Inconsistent log formats across different services
  • High storage costs due to inefficient log formats
  • Complex log aggregation and correlation
  • Limited searchability and filtering capabilities

Use cases for structured logging

Structured logging proves valuable across various application types, each with its specific requirements and challenges:

Microservices Architecture

When a request flows through multiple services, structured logs help track its journey and identify issues:

json
{
  "service": "payment-processor",
  "trace_id": "abc-123-def-456",
  "event": "payment_initiated",
  "upstream_service": "order-service",
  "downstream_service": "payment-gateway",
  "request_id": "req_789",
  "latency_ms": 145
}

Service interaction tracking can identify communication patterns and potential bottlenecks. Request flow monitoring helps understand the sequence of operations, while error correlation across services enables quick problem resolution in complex distributed systems.
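
If you already use OpenTelemetry for tracing, one practical approach is to attach the active trace ID to every log entry so logs and traces can be correlated later. Here is a minimal sketch that assumes the opentelemetry and structlog packages are installed and a tracer is already configured; the field names are illustrative:

python
import structlog
from opentelemetry import trace

logger = structlog.get_logger()

def log_with_trace(event, **fields):
    # Read the trace ID of the currently active span, if any
    span_context = trace.get_current_span().get_span_context()
    if span_context.is_valid:
        # Trace IDs are 128-bit integers; render them as 32 hex characters
        fields["trace_id"] = format(span_context.trace_id, "032x")
    logger.info(event, **fields)

log_with_trace("payment_initiated", service="payment-processor")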

High-Load Applications

Applications handling thousands of requests per second require sophisticated logging strategies. Structured logging helps monitor and optimize performance:

json
{
  "component": "api-gateway",
  "event": "request_processed",
  "endpoint": "/api/v1/users",
  "method": "GET",
  "response_time_ms": 45,
  "cpu_usage_percent": 78,
  "memory_usage_mb": 1240,
  "concurrent_requests": 156
}

Performance monitoring becomes more efficient when logs include specific metrics and timings. Resource usage tracking helps identify potential memory leaks or CPU bottlenecks, and logging these metrics systematically lets you spot and resolve problems before they affect users.

Security-Critical Systems

When working with sensitive data and compliance requirements, well-implemented logging becomes your primary tool for security oversight:

json
{
  "system": "authentication-service",
  "event": "login_attempt",
  "status": "failed",
  "reason": "invalid_2fa",
  "ip_address": "192.168.1.1",
  "geo_location": "US-NY",
  "user_agent": "Mozilla/5.0...",
  "attempt_count": 3,
  "security_level": "high"
}

Why use structured logging?

Structured logging provides the following benefits:

  • Improved readability. The structured format makes log messages more human-readable, allowing developers and operators to easily understand the content without relying solely on raw text parsing.
  • Better searching and filtering. Structured data makes it easier to search for specific log entries or filter logs based on specific criteria. This is especially useful for large-scale applications with large amounts of log data.
  • Easy integration with tools. Structured logs can be ingested and processed by various log management and analysis tools, enabling powerful analysis, visualization, and monitoring of application behavior.
  • Improved debugging and troubleshooting. When log messages are structured, it is easier to include relevant contextual information, such as timestamps, error codes, and specific attributes related to the logged events, which facilitates effective debugging and troubleshooting.
  • Consistency and scalability. Structured logging promotes a consistent and uniform log format throughout the application, making it easier to scale logging capabilities and maintain logs in a standardized manner.

Structured logging forms a critical foundation for effective data observability practices. By providing consistent, machine-parsable log data, structured logging enables more sophisticated monitoring and analysis of data systems. Learn more in our data observability guide, which explores how these practices work together.

Structured log formats

Structured logging can be implemented using various data formats, with JSON being one of the most commonly used due to its simplicity and human-readability.

However, other formats can also be used depending on the requirements of the application and the logging framework in use.

JSON format

JSON is a lightweight data exchange format that is easy for both humans and machines to read and write. It represents data as key-value pairs and arrays, making it an excellent choice for structured logging due to its simplicity and widespread support.

Example for a web application:

json
{
  "timestamp": "2025-01-08T12:34:56Z",
  "level": "ERROR",
  "service": "payment-service",
  "message": "Payment processing failed",
  "error": {
    "code": "INSUFFICIENT_FUNDS",
    "message": "Account balance too low"
  },
  "context": {
    "user_id": "12345",
    "transaction_id": "tx_789",
    "amount": 150.75,
    "currency": "USD"
  },
  "request": {
    "method": "POST",
    "path": "/api/v1/payments",
    "ip": "192.168.1.1"
  }
}

You can also append a JSON payload to an otherwise plain-text message to include structured data:

text
request failed {"http.method": "GET", "http.route": "/users/:id", "enduser.id": 123, "foo": "hello world"}

logfmt

This format represents log entries as a series of key=value pairs separated by spaces. It is compact and easy to implement and read.

If a value contains a space, you must enclose it in quotation marks. For example:

text
request failed http.method=GET http.route=/users/:id enduser.id=123 foo="hello world"
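
If your logging library does not emit logfmt natively, the format is simple enough to produce by hand. The helper below is a rough sketch rather than a complete logfmt implementation (the function name and quoting rules are illustrative):

python
def to_logfmt(message, **fields):
    pairs = []
    for key, value in fields.items():
        value = str(value)
        # Values containing spaces must be quoted so they parse as a single token
        if " " in value:
            value = f'"{value}"'
        pairs.append(f"{key}={value}")
    return message + " " + " ".join(pairs)

print(to_logfmt("request failed", method="GET", foo="hello world"))
# request failed method=GET foo="hello world"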

Format Comparison

| Format | Pros | Cons | Size Overhead | Parse Speed |
| --- | --- | --- | --- | --- |
| JSON | Human-readable, widely supported | Verbose | High | Medium |
| logfmt | Compact, easy to read | Limited nested structure | Low | High |
| Raw text | Minimal size | Hard to parse | Minimal | Slow |

Free format

If your logging library does not support structured logging, you can still make messages easier to group and parse by quoting dynamic parameters:

text
# good
can't parse string: "the original string"
"foo" param can't be empty

# bad
can't parse string: the original string
foo param can't be empty

Implementation Examples

Python Implementation

python
import structlog

logger = structlog.get_logger()
logger.info("payment_processed",
    amount=100.0,
    currency="USD",
    user_id="12345",
    transaction_id="tx_789"
)
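
By default structlog renders entries as plain text. To get JSON output similar to the examples above, configure a JSON renderer; the processor chain below is one reasonable setup, not the only one:

python
import structlog

structlog.configure(
    processors=[
        structlog.processors.add_log_level,           # adds "level"
        structlog.processors.TimeStamper(fmt="iso"),  # adds an ISO 8601 "timestamp"
        structlog.processors.JSONRenderer(),          # renders each entry as JSON
    ]
)

logger = structlog.get_logger()
logger.info("payment_processed", amount=100.0, currency="USD")
# {"amount": 100.0, "currency": "USD", "event": "payment_processed", "level": "info", "timestamp": "..."}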

Java Implementation

java
import net.logstash.logback.argument.StructuredArguments;

logger.info("payment_processed",
    StructuredArguments.kv("amount", 100.0),
    StructuredArguments.kv("currency", "USD"),
    StructuredArguments.kv("user_id", "12345")
);

Node.js Implementation

javascript
const pino = require('pino')()

pino.info({
  event: 'payment_processed',
  amount: 100.0,
  currency: 'USD',
  user_id: '12345',
})

Best Practices and Common Pitfalls

Implementing structured logging effectively requires careful consideration of various practices and potential issues. Following established best practices helps ensure your logging system remains maintainable, efficient, and valuable for troubleshooting and monitoring.

Best Practices

Consistent Field Names

When implementing structured logging across multiple services, maintaining consistent field names is crucial. This ensures easier log aggregation and analysis. For example, always use the same field name for user identification:

json
// Good - consistent naming
{"user_id": "12345", "action": "login"}
{"user_id": "12345", "action": "purchase"}

// Bad - inconsistent naming
{"userId": "12345", "action": "login"}
{"user": "12345", "action": "purchase"}

Correlation IDs

Correlation IDs are essential for tracking requests across distributed systems. Each request should receive a unique ID that's passed through all services:

json
{
  "correlation_id": "req_abc123",
  "service": "auth-service",
  "event": "user_authenticated"
}
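
In practice you bind the correlation ID once, at the point where a request enters the service, so that every later log entry carries it automatically. A minimal sketch with structlog (the header name and request object are illustrative):

python
import uuid
import structlog

logger = structlog.get_logger()

def handle_request(request):
    # Reuse the ID set by an upstream service, or start a new request chain
    correlation_id = request.headers.get("X-Correlation-ID") or str(uuid.uuid4())
    # bind() returns a logger that attaches these fields to every entry it emits
    log = logger.bind(correlation_id=correlation_id, service="auth-service")
    log.info("user_authenticated")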

Context Information

Every log entry should contain sufficient context to understand the event without requiring additional lookups. Include relevant business context, technical details, and environmental information:

json
{
  "event": "payment_failed",
  "amount": 99.99,
  "currency": "USD",
  "payment_provider": "stripe",
  "error_code": "insufficient_funds",
  "customer_type": "premium",
  "environment": "production"
}

Common Pitfalls

Sensitive Data Exposure

One of the most critical mistakes is logging sensitive information. Consider this example:

json
// Bad - exposing sensitive data
{
  "user_email": "john@example.com",
  "credit_card": "4111-1111-1111-1111",
  "password": "secretpass"
}

// Good - masked sensitive data
{
  "user_email_hash": "a1b2c3...",
  "credit_card_last4": "1111",
  "password": "[REDACTED]"
}

Timestamp Consistency

Inconsistent timestamp formats can make log analysis difficult. Always use UTC and ISO 8601 format:

json
// Good
{"timestamp": "2025-01-08T14:30:00Z"}

// Bad
{"timestamp": "01/08/24 14:30:00"}
{"time": "2025-01-08 14:30:00 +0200"}
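
In Python, timezone-aware datetimes make this straightforward:

python
from datetime import datetime, timezone

# ISO 8601 timestamp in UTC, e.g. "2025-01-08T14:30:00.123456+00:00"
timestamp = datetime.now(timezone.utc).isoformat()

# If you prefer the "Z" suffix used in the examples above:
timestamp = timestamp.replace("+00:00", "Z")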

Performance Considerations

Log Sampling Strategies

Choosing the right sampling strategy is crucial for high-volume applications. Here's how different strategies work:

Probabilistic Sampling

This approach randomly samples a percentage of log entries:

python
import random

def should_log(sampling_rate=0.1):
    return random.random() < sampling_rate

if should_log():
    logger.info("User action", extra={"user_id": "123"})

Rate Limiting

Implement rate limiting to cap the number of logs per time window:

python
from datetime import datetime, timedelta

class RateLimitedLogger:
    def __init__(self, max_logs_per_second=100):
        self.max_logs = max_logs_per_second
        self.counter = 0
        self.window_start = datetime.now()

    def should_log(self):
        now = datetime.now()
        # Start a fresh one-second window once the current one has elapsed
        if now - self.window_start > timedelta(seconds=1):
            self.counter = 0
            self.window_start = now

        # Accept the entry only while the per-window budget remains
        if self.counter < self.max_logs:
            self.counter += 1
            return True
        return False
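
A usage sketch for the rate limiter above (the wrapper function and logger setup are illustrative):

python
import logging

logger = logging.getLogger("app")
rate_limiter = RateLimitedLogger(max_logs_per_second=100)

def log_event(message, **fields):
    # Silently drop entries once the per-second budget is exhausted
    if rate_limiter.should_log():
        logger.info(message, extra=fields)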

High-Load Handling

Asynchronous Logging

Implement asynchronous logging to prevent blocking operations:

python
import asyncio
import aiofiles

async def async_log(message, file_path):
    # aiofiles writes to the file without blocking the event loop
    async with aiofiles.open(file_path, mode='a') as file:
        await file.write(message + '\n')
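
If your application is not built on asyncio, the standard library offers a similar non-blocking pattern: a QueueHandler pushes records onto an in-memory queue and a QueueListener writes them on a background thread. A minimal sketch (the file destination is illustrative):

python
import logging
import logging.handlers
import queue

log_queue = queue.Queue(-1)  # unbounded queue between producers and the writer
queue_handler = logging.handlers.QueueHandler(log_queue)
file_handler = logging.FileHandler("app.log")
listener = logging.handlers.QueueListener(log_queue, file_handler)
listener.start()

logger = logging.getLogger("app")
logger.addHandler(queue_handler)
logger.setLevel(logging.INFO)

logger.info("request_processed")  # returns immediately; the listener does the I/O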

Batch Processing

Group logs into batches to reduce I/O operations:

python
class BatchLogger:
    def __init__(self, batch_size=100):
        self.batch = []
        self.batch_size = batch_size

    def add_log(self, log_entry):
        # Buffer entries and flush once the batch is full
        self.batch.append(log_entry)
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.batch:
            # _write_batch performs the actual I/O (file write, network call, etc.)
            self._write_batch(self.batch)
            self.batch = []

Security Guidelines

Logging systems play a dual role in application security: they're essential for security monitoring and audit trails, but they can also become a security vulnerability if not properly secured.

Modern applications process vast amounts of sensitive data, from personal information to business-critical details, making it crucial to implement proper security measures for your logging infrastructure.

This section covers key data protection practices and maintaining secure logging operations.

Sensitive Data Protection

PII Masking

Implement robust PII masking, for example with regular expressions that match common PII patterns:

python
import re

PII_PATTERNS = {
    'email': r'\b[\w\.-]+@[\w\.-]+\.\w+\b',
    'credit_card': r'\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'
}

def mask_pii(log_entry):
    for pii_type, pattern in PII_PATTERNS.items():
        log_entry = re.sub(pattern, f'[MASKED_{pii_type}]', log_entry)
    return log_entry
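
Applying the masker to a raw log line:

python
print(mask_pii("payment failed for john@example.com using card 4111-1111-1111-1111"))
# payment failed for [MASKED_email] using card [MASKED_credit_card]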

Encryption

For sensitive logs that must be retained, implement encryption:

python
from cryptography.fernet import Fernet

class EncryptedLogger:
    def __init__(self, encryption_key):
        self.cipher_suite = Fernet(encryption_key)

    def log_sensitive(self, message):
        # Encrypt the full message before it ever touches disk
        encrypted_message = self.cipher_suite.encrypt(message.encode())
        self._write_encrypted_log(encrypted_message)

    def _write_encrypted_log(self, encrypted_message):
        # Append the encrypted entry to a dedicated file (destination is illustrative)
        with open("sensitive.log.enc", "ab") as file:
            file.write(encrypted_message + b"\n")
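
Fernet keys should be generated once and kept in a secrets manager rather than in code. A usage sketch:

python
key = Fernet.generate_key()  # store this in a secrets manager, not in source code
encrypted_logger = EncryptedLogger(encryption_key=key)
encrypted_logger.log_sensitive('{"user_id": "12345", "event": "password_reset"}')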

Log Retention and Audit

Implement a comprehensive log retention policy that balances security requirements with storage constraints:

python
from datetime import datetime, timedelta
from pathlib import Path

class LogRetentionManager:
    def __init__(self, log_dir, retention_days=30):
        self.log_dir = Path(log_dir)
        self.retention_days = retention_days

    def cleanup_old_logs(self):
        cutoff = datetime.now() - timedelta(days=self.retention_days)
        # Remove log files last modified before the cutoff date
        for log_file in self.log_dir.glob("*.log"):
            if datetime.fromtimestamp(log_file.stat().st_mtime) < cutoff:
                log_file.unlink()

Troubleshooting Guide

Handling High Log Volume

When facing high log volume issues, implement a systematic approach:

  1. Analyze current logging patterns:
python
def analyze_log_patterns(logs):
    # extract_log_pattern is an application-specific helper (not shown) that
    # reduces a log line to its message template
    pattern_counts = {}
    for log in logs:
        pattern = extract_log_pattern(log)
        pattern_counts[pattern] = pattern_counts.get(pattern, 0) + 1
    return pattern_counts
  2. Implement dynamic sampling based on patterns:
python
import random

THRESHOLD = 10_000  # illustrative cutoff: above this volume, keep only ~10% of entries

def should_log_pattern(pattern, pattern_counts):
    if pattern_counts[pattern] > THRESHOLD:
        return random.random() < 0.1
    return True

Logging backend

Uptrace is an open source APM for OpenTelemetry that supports logs, traces, and metrics. You can use it to monitor applications and troubleshoot issues.

Uptrace natively supports structured logging and automatically parses log messages to extract the structured data and store it as attributes.

Uptrace comes with an intuitive query builder, rich dashboards, alerting rules, notifications, and integrations for most languages and frameworks.

Uptrace can process billions of logs on a single server and allows you to monitor your applications at 10x lower cost.

In just a few minutes, you can try Uptrace by visiting the cloud demo (no login required) or running it locally with Docker. The source code is available on GitHub.

Conclusion

Structured logging enables better log management, faster troubleshooting, and more effective application monitoring, resulting in more efficient and reliable software development and maintenance.

FAQ

  1. How much logging is appropriate for production applications? The optimal logging volume depends on your application's complexity and requirements. High-traffic applications typically implement sampling strategies, logging 1-10% of routine operations while maintaining 100% coverage for errors and critical events. Consider storage costs and performance impact: logging can consume 1-5% of your application's resources in a well-configured system.
  2. What's the performance impact of structured logging? Modern structured logging libraries add minimal overhead, typically 0.1-0.5ms per log entry. However, synchronous disk I/O can impact performance significantly. Implementing asynchronous logging with buffering can reduce this to microseconds. For high-throughput systems processing 10,000+ requests per second, consider implementing batching and sampling strategies.
  3. How should I handle log rotation in containerized environments? Container logs are typically handled differently from traditional applications. Instead of file-based rotation, implement log streaming to external aggregators. If using file-based logging, configure retention based on size (e.g., 100MB per container) and time (7-30 days). Many organizations retain the last 2-3 rotated files for immediate troubleshooting.
  4. What's the best approach for handling sensitive data in logs? Implement multi-layer protection for sensitive data. First, use pattern matching to identify and mask PII (emails, credit cards, SSNs) before logging. Second, encrypt logs at rest using industry-standard algorithms (AES-256). Third, implement role-based access control for log viewing. Some organizations maintain separate logging streams for sensitive and non-sensitive data.
  5. How can I effectively debug issues across microservices? Correlation IDs are essential for distributed tracing. Generate a unique ID for each request chain and propagate it across services. Tools like OpenTelemetry can automate this process. Also, implement consistent timestamp formats (ISO 8601 in UTC) and log levels across services. Many organizations find that 60-70% of debugging time is saved with proper correlation implementation.
  6. What are the storage requirements for structured logging? Storage needs vary by format choice and retention policies. JSON logging typically requires 1.5-2x more storage than plain text, while binary formats can reduce size by 30-50%. For a medium-sized application (1M requests/day), expect 1-5GB of logs per day before compression. Implementing GZIP compression typically reduces storage needs by 60-80%.
  7. How should I handle logging during system outages? Implement a local buffer for logs when external logging systems are unavailable. Configure your logging library to maintain the last 1000-10000 entries in memory, with periodic writes to local storage. Once connectivity is restored, implement smart retry logic with exponential backoff. Critical error logs should have redundant storage paths to ensure preservation during outages.
