Structured Logging Best Practices: Implementation Guide with Examples
In structured logging, log messages are broken down into key-value pairs, making it easier to search, filter, and analyze logs. This is in contrast to traditional logging, which usually consists of unstructured text that is difficult to parse and analyze.
What is structured logging?
Structured logging is the practice of capturing and storing log messages in a structured and organized format.
Traditional logging often involves printing raw text messages to log files, which can be difficult to parse and analyze programmatically.
In contrast, structured logging formats log messages as key-value pairs or in structured data formats such as JSON or XML.
Common logging challenges
Organizations face several challenges with traditional logging approaches:
- Difficulty in parsing and analyzing unstructured log data
- Inconsistent log formats across different services
- High storage costs due to inefficient log formats
- Complex log aggregation and correlation
- Limited searchability and filtering capabilities
Use cases for structured logging
Structured logging proves valuable across various application types, each with its specific requirements and challenges:
Microservices Architecture
When a request flows through multiple services, structured logs help track its journey and identify issues:
{
  "service": "payment-processor",
  "trace_id": "abc-123-def-456",
  "event": "payment_initiated",
  "upstream_service": "order-service",
  "downstream_service": "payment-gateway",
  "request_id": "req_789",
  "latency_ms": 145
}
Service interaction tracking can identify communication patterns and potential bottlenecks. Request flow monitoring helps understand the sequence of operations, while error correlation across services enables quick problem resolution in complex distributed systems.
High-Load Applications
Applications handling thousands of requests per second require sophisticated logging strategies. Structured logging helps monitor and optimize performance:
{
  "component": "api-gateway",
  "event": "request_processed",
  "endpoint": "/api/v1/users",
  "method": "GET",
  "response_time_ms": 45,
  "cpu_usage_percent": 78,
  "memory_usage_mb": 1240,
  "concurrent_requests": 156
}
Performance monitoring becomes more efficient when logs include specific metrics and timings. Resource usage tracking helps identify potential memory leaks or CPU bottlenecks, while systematic logging of performance metrics helps identify and resolve bottlenecks before they affect users.
Security-Critical Systems
When working with sensitive data and compliance requirements, well-implemented logging becomes your primary tool for security oversight:
{
  "system": "authentication-service",
  "event": "login_attempt",
  "status": "failed",
  "reason": "invalid_2fa",
  "ip_address": "192.168.1.1",
  "geo_location": "US-NY",
  "user_agent": "Mozilla/5.0...",
  "attempt_count": 3,
  "security_level": "high"
}
Why use structured logging?
Structured logging provides the following benefits:
- Improved readability. The structured format makes log messages more human-readable, allowing developers and operators to easily understand the content without relying solely on raw text parsing.
- Better searching and filtering. Structured data makes it easier to search for specific log entries or filter logs based on specific criteria. This is especially useful for large-scale applications with large amounts of log data.
- Easy integration with tools. Structured logs can be ingested and processed by various log management and analysis tools, enabling powerful analysis, visualization, and monitoring of application behavior.
- Improved debugging and troubleshooting. When log messages are structured, it is easier to include relevant contextual information, such as timestamps, error codes, and specific attributes related to the logged events, which facilitates effective debugging and troubleshooting.
- Consistency and scalability. Structured logging promotes a consistent and uniform log format throughout the application, making it easier to scale logging capabilities and maintain logs in a standardized manner.
Structured log formats
Structured logging can be implemented using various data formats, with JSON being one of the most commonly used due to its simplicity and human-readability.
However, other formats can also be used depending on the requirements of the application and the logging framework in use.
JSON format
JSON is a lightweight data exchange format that is easy for both humans and machines to read and write. It represents data as key-value pairs and arrays, making it an excellent choice for structured logging due to its simplicity and widespread support.
Example for a web application:
{
  "timestamp": "2025-01-08T12:34:56Z",
  "level": "ERROR",
  "service": "payment-service",
  "message": "Payment processing failed",
  "error": {
    "code": "INSUFFICIENT_FUNDS",
    "message": "Account balance too low"
  },
  "context": {
    "user_id": "12345",
    "transaction_id": "tx_789",
    "amount": 150.75,
    "currency": "USD"
  },
  "request": {
    "method": "POST",
    "path": "/api/v1/payments",
    "ip": "192.168.1.1"
  }
}
You can also use JSON to include structured data in your log messages:
request failed {"http.method": "GET", "http.route": "/users/:id", "enduser.id": 123, "foo": "hello world"}
logfmt
logfmt represents a log entry as a series of key=value pairs separated by spaces. It is simple and easy to implement.
If a value contains a space, you must enclose it in quotation marks. For example:
request failed http.method=GET http.route=/users/:id enduser.id=123 foo="hello world"
Format Comparison
| Format | Pros | Cons | Size Overhead | Parse Speed |
|---|---|---|---|---|
| JSON | Human-readable, widely supported | Verbose | High | Medium |
| logfmt | Compact, easy to read | Limited nested structure | Low | High |
| Raw text | Minimal size | Hard to parse | Minimal | Slow |
Free format
If your logging library does not support structured logging, you can still make messages easier to group and parse by quoting parameters:
# good
can't parse string: "the original string"
"foo" param can't be empty
# bad
can't parse string: the original string
foo param can't be empty
Implementation Examples
Python Implementation
import structlog

logger = structlog.get_logger()

logger.info(
    "payment_processed",
    amount=100.0,
    currency="USD",
    user_id="12345",
    transaction_id="tx_789",
)
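The exact output format depends on how structlog is configured. One way to emit one JSON object per line is to set up a processor chain; a minimal sketch using structlog's standard processors:

import structlog

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso", utc=True),  # ISO 8601 UTC timestamp
        structlog.processors.JSONRenderer(),                    # render the event dict as JSON
    ]
)

logger = structlog.get_logger()
logger.info("payment_processed", amount=100.0, currency="USD")
# emits something like:
# {"amount": 100.0, "currency": "USD", "event": "payment_processed", "timestamp": "2025-01-08T12:34:56Z"}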
Java Implementation
import net.logstash.logback.argument.StructuredArguments;

logger.info("payment_processed",
    StructuredArguments.kv("amount", 100.0),
    StructuredArguments.kv("currency", "USD"),
    StructuredArguments.kv("user_id", "12345")
);
Node.js Implementation
const logger = require('pino')()

logger.info({
  event: 'payment_processed',
  amount: 100.0,
  currency: 'USD',
  user_id: '12345',
})
Best Practices and Common Pitfalls
Implementing structured logging effectively requires careful consideration of various practices and potential issues. Following established best practices helps ensure your logging system remains maintainable, efficient, and valuable for troubleshooting and monitoring.
Best Practices
Consistent Field Names
When implementing structured logging across multiple services, maintaining consistent field names is crucial. This ensures easier log aggregation and analysis. For example, always use the same field name for user identification:
// Good - consistent naming
{"user_id": "12345", "action": "login"}
{"user_id": "12345", "action": "purchase"}
// Bad - inconsistent naming
{"userId": "12345", "action": "login"}
{"user": "12345", "action": "purchase"}
Correlation IDs
Correlation IDs are essential for tracking requests across distributed systems. Each request should receive a unique ID that's passed through all services:
{
  "correlation_id": "req_abc123",
  "service": "auth-service",
  "event": "user_authenticated"
}
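A minimal sketch of generating and propagating a correlation ID inside a Python service, using only the standard library (names such as correlation_id_var and log_event are illustrative):

import contextvars
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("auth-service")

# Holds the correlation ID for the current request (works across async tasks and threads)
correlation_id_var = contextvars.ContextVar("correlation_id", default="unknown")

def log_event(event, **fields):
    # Every entry automatically carries the current correlation ID
    payload = {"correlation_id": correlation_id_var.get(), "service": "auth-service", "event": event}
    payload.update(fields)
    logger.info(json.dumps(payload))

def handle_request(incoming_id=None):
    # Reuse the upstream ID if one was propagated, otherwise start a new request chain
    correlation_id_var.set(incoming_id or f"req_{uuid.uuid4().hex[:12]}")
    log_event("user_authenticated")

handle_request()
# {"correlation_id": "req_...", "service": "auth-service", "event": "user_authenticated"}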
Context Information
Every log entry should contain sufficient context to understand the event without requiring additional lookups. Include relevant business context, technical details, and environmental information:
{
  "event": "payment_failed",
  "amount": 99.99,
  "currency": "USD",
  "payment_provider": "stripe",
  "error_code": "insufficient_funds",
  "customer_type": "premium",
  "environment": "production"
}
Common Pitfalls
Sensitive Data Exposure
One of the most critical mistakes is logging sensitive information. Consider this example:
// Bad - exposing sensitive data
{
  "user_email": "john@example.com",
  "credit_card": "4111-1111-1111-1111",
  "password": "secretpass"
}

// Good - masked sensitive data
{
  "user_email_hash": "a1b2c3...",
  "credit_card_last4": "1111",
  "password": "[REDACTED]"
}
Timestamp Consistency
Inconsistent timestamp formats can make log analysis difficult. Always use UTC and ISO 8601 format:
// Good
{"timestamp": "2024-01-08T14:30:00Z"}
// Bad
{"timestamp": "01/08/24 14:30:00"}
{"time": "2024-01-08 14:30:00 +0200"}
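For reference, the standard library can produce such a timestamp directly (a minimal sketch):

from datetime import datetime, timezone

# UTC timestamp in ISO 8601 format, e.g. "2024-01-08T14:30:00Z"
timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")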
Performance Considerations
Log Sampling Strategies
Choosing the right sampling strategy is crucial for high-volume applications. Here's how different strategies work:
Probabilistic Sampling
This approach randomly samples a percentage of log entries:
import random

def should_log(sampling_rate=0.1):
    return random.random() < sampling_rate

if should_log():
    logger.info("User action", extra={"user_id": "123"})
Rate Limiting
Implement rate limiting to cap the number of logs per time window:
from datetime import datetime, timedelta

class RateLimitedLogger:
    def __init__(self, max_logs_per_second=100):
        self.max_logs = max_logs_per_second
        self.counter = 0
        self.window_start = datetime.now()

    def should_log(self):
        now = datetime.now()
        if now - self.window_start > timedelta(seconds=1):
            self.counter = 0
            self.window_start = now
        if self.counter < self.max_logs:
            self.counter += 1
            return True
        return False
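To plug this into the standard logging module, the limiter can be attached as a filter. A sketch assuming the RateLimitedLogger class above is in scope (RateLimitFilter is an illustrative name):

import logging

class RateLimitFilter(logging.Filter):
    def __init__(self, max_logs_per_second=100):
        super().__init__()
        self.limiter = RateLimitedLogger(max_logs_per_second)

    def filter(self, record):
        # Drop the record once the per-second budget is exhausted
        return self.limiter.should_log()

logger = logging.getLogger("api-gateway")
logger.addFilter(RateLimitFilter(max_logs_per_second=100))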
High-Load Handling
Asynchronous Logging
Implement asynchronous logging to prevent blocking operations:
import asyncio
import aiofiles

async def async_log(message, file_path):
    async with aiofiles.open(file_path, mode='a') as file:
        await file.write(message + '\n')
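If you use the standard logging module, a similar non-blocking effect is available without extra dependencies through QueueHandler and QueueListener, which hand records off to a background thread. A minimal sketch (the "app.log" destination is illustrative):

import logging
import logging.handlers
import queue

log_queue = queue.Queue(-1)  # unbounded in-memory buffer

# The application logs into the queue, which never blocks on disk I/O ...
logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))

# ... while a background thread drains the queue into the real handler
listener = logging.handlers.QueueListener(log_queue, logging.FileHandler("app.log"))
listener.start()

logger.info('{"event": "request_processed", "latency_ms": 45}')
listener.stop()  # flushes remaining records on shutdown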
Batch Processing
Group logs into batches to reduce I/O operations:
class BatchLogger:
    def __init__(self, batch_size=100):
        self.batch = []
        self.batch_size = batch_size

    def add_log(self, log_entry):
        self.batch.append(log_entry)
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.batch:
            # _write_batch is backend-specific (file, socket, HTTP exporter, etc.)
            self._write_batch(self.batch)
            self.batch = []
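A hypothetical usage, assuming _write_batch has been implemented for your backend; registering flush at exit avoids losing a partially filled batch:

import atexit

batch_logger = BatchLogger(batch_size=100)
atexit.register(batch_logger.flush)  # flush whatever is left on shutdown

batch_logger.add_log('{"event": "request_processed", "endpoint": "/api/v1/users"}')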
Security Guidelines
Logging systems play a dual role in application security: they're essential for security monitoring and audit trails, but they can also become a security vulnerability if not properly secured.
Modern applications process vast amounts of sensitive data, from personal information to business-critical details, making it crucial to implement proper security measures for your logging infrastructure.
This section covers key data protection practices and maintaining secure logging operations.
Sensitive Data Protection
PII Masking
Implement robust PII masking using regular expressions and lookup tables:
import re

PII_PATTERNS = {
    'email': r'\b[\w\.-]+@[\w\.-]+\.\w+\b',
    'credit_card': r'\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'
}

def mask_pii(log_entry):
    for pii_type, pattern in PII_PATTERNS.items():
        log_entry = re.sub(pattern, f'[MASKED_{pii_type}]', log_entry)
    return log_entry
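Applying the masker to a raw log line, for example:

entry = "payment failed for john@example.com using card 4111-1111-1111-1111"
print(mask_pii(entry))
# payment failed for [MASKED_email] using card [MASKED_credit_card]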
Encryption
For sensitive logs that must be retained, implement encryption:
from cryptography.fernet import Fernet

class EncryptedLogger:
    def __init__(self, encryption_key):
        # encryption_key is a Fernet key, e.g. generated with Fernet.generate_key()
        self.cipher_suite = Fernet(encryption_key)

    def log_sensitive(self, message):
        encrypted_message = self.cipher_suite.encrypt(message.encode())
        # _write_encrypted_log is storage-specific (file, database, etc.)
        self._write_encrypted_log(encrypted_message)
Log Retention and Audit
Implement a comprehensive log retention policy that balances security requirements with storage constraints:
from datetime import datetime, timedelta

class LogRetentionManager:
    def __init__(self, retention_days=30):
        self.retention_days = retention_days

    def cleanup_old_logs(self):
        cutoff_date = datetime.now() - timedelta(days=self.retention_days)
        # Implementation of log cleanup logic
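One possible cleanup implementation, as a sketch assuming file-based logs under a hypothetical /var/log/myapp directory, deletes files whose modification time falls before the cutoff:

from datetime import datetime, timedelta
from pathlib import Path

def cleanup_old_logs(log_dir="/var/log/myapp", retention_days=30):  # hypothetical path
    cutoff = datetime.now() - timedelta(days=retention_days)
    for log_file in Path(log_dir).glob("*.log"):
        # Compare each file's last-modified time against the retention cutoff
        if datetime.fromtimestamp(log_file.stat().st_mtime) < cutoff:
            log_file.unlink()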
Troubleshooting Guide
Handling High Log Volume
When facing high log volume issues, implement a systematic approach:
- Analyze current logging patterns:
def analyze_log_patterns(logs):
    pattern_counts = {}
    for log in logs:
        # extract_log_pattern is application-specific (e.g. message template or event name)
        pattern = extract_log_pattern(log)
        pattern_counts[pattern] = pattern_counts.get(pattern, 0) + 1
    return pattern_counts
- Implement dynamic sampling based on patterns:
import random

def should_log_pattern(pattern, pattern_counts):
    # THRESHOLD is an application-defined frequency cap per analysis window
    if pattern_counts[pattern] > THRESHOLD:
        return random.random() < 0.1  # keep ~10% of very frequent patterns
    return True
Logging backend
Uptrace is an open source APM for OpenTelemetry that supports logs, traces, and metrics. You can use it to monitor applications and troubleshoot issues.
Uptrace natively supports structured logging and automatically parses log messages to extract the structured data and store it as attributes.
Uptrace comes with an intuitive query builder, rich dashboards, alerting rules, notifications, and integrations for most languages and frameworks.
Uptrace can process billions of logs on a single server and allows you to monitor your applications at 10x lower cost.
In just a few minutes, you can try Uptrace by visiting the cloud demo (no login required) or running it locally with Docker. The source code is available on GitHub.
Conclusion
Structured logging enables better log management, faster troubleshooting, and more effective application monitoring, resulting in more efficient and reliable software development and maintenance.
Frequently Asked Questions
How much logging is appropriate for production applications? The optimal logging volume depends on your application's complexity and requirements. High-traffic applications typically implement sampling strategies, logging 1-10% of routine operations while maintaining 100% coverage for errors and critical events. Consider storage costs and performance impact: logging can consume 1-5% of your application's resources in a well-configured system.
What's the performance impact of structured logging? Modern structured logging libraries add minimal overhead, typically 0.1-0.5ms per log entry. However, synchronous disk I/O can impact performance significantly. Implementing asynchronous logging with buffering can reduce this to microseconds. For high-throughput systems processing 10,000+ requests per second, consider implementing batching and sampling strategies.
How should I handle log rotation in containerized environments? Container logs are typically handled differently from traditional applications. Instead of file-based rotation, implement log streaming to external aggregators. If using file-based logging, configure retention based on size (e.g., 100MB per container) and time (7-30 days). Many organizations retain the last 2-3 rotated files for immediate troubleshooting.
What's the best approach for handling sensitive data in logs? Implement multi-layer protection for sensitive data. First, use pattern matching to identify and mask PII (emails, credit cards, SSNs) before logging. Second, encrypt logs at rest using industry-standard algorithms (AES-256). Third, implement role-based access control for log viewing. Some organizations maintain separate logging streams for sensitive and non-sensitive data.
How can I effectively debug issues across microservices? Correlation IDs are essential for distributed tracing. Generate a unique ID for each request chain and propagate it across services. Tools like OpenTelemetry can automate this process. Also, implement consistent timestamp formats (ISO 8601 in UTC) and log levels across services. Many organizations find that 60-70% of debugging time is saved with proper correlation implementation.
What are the storage requirements for structured logging? Storage needs vary by format choice and retention policies. JSON logging typically requires 1.5-2x more storage than plain text, while binary formats can reduce size by 30-50%. For a medium-sized application (1M requests/day), expect 1-5GB of logs per day before compression. Implementing GZIP compression typically reduces storage needs by 60-80%.
How should I handle logging during system outages? Implement a local buffer for logs when external logging systems are unavailable. Configure your logging library to maintain the last 1000-10000 entries in memory, with periodic writes to local storage. Once connectivity is restored, implement smart retry logic with exponential backoff. Critical error logs should have redundant storage paths to ensure preservation during outages.