Tracing tools help you manage, monitor, and assess the performance of your cloud infrastructure, services, and applications, and make sure your customers get the best digital experience.
The best tracing tools can help you eliminate performance bottlenecks and recover from incidents faster. Use this guide to pick the right one for you.
What is a distributed tracing tool?
A distributed tracing tool is a software tool used to track the performance and behavior of complex, distributed systems, such as microservices-based applications.
Distributed tracing lets you see how a request progresses through different services and systems, how long each operation takes, and which logs and errors occur along the way.
In a distributed environment, tracing tools also help you understand relationships and interactions between microservices. They give insight into how a particular microservice is performing and how that service affects other microservices.
Distributed tracing is performed by instrumenting each component of the system to emit trace information, such as timing information and metadata, as requests pass through. The trace information is then collected and analyzed by the tracing tool, which provides a graphical representation of the request flow and performance metrics for each component.
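To make the instrumentation step more concrete, here is a minimal sketch in Python that uses only the standard library. It is a toy illustration, not the API of OpenTelemetry or of any tool listed in this guide: the names `Span`, `start_span`, `FINISHED_SPANS`, and `handle_checkout`, as well as the service names, are all hypothetical. It shows the general idea of emitting spans that carry timing information and metadata and that are linked by a shared trace ID and a parent ID.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical in-memory "exporter". A real tracing tool would receive
# finished spans over the network instead of reading a Python list.
FINISHED_SPANS = []

# Holds the span that is currently active so new spans can link to it.
_current_span = contextvars.ContextVar("current_span", default=None)


@dataclass
class Span:
    name: str
    trace_id: str             # shared by every span of one request
    span_id: str              # unique per operation
    parent_id: Optional[str]  # None for the root span
    attributes: dict = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0

    @property
    def duration_ms(self) -> float:
        return (self.end - self.start) * 1000


@contextmanager
def start_span(name, **attributes):
    """Time a block of code and record it as a span (toy implementation)."""
    parent = _current_span.get()
    span = Span(
        name=name,
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
        attributes=dict(attributes),
    )
    token = _current_span.set(span)
    span.start = time.perf_counter()
    try:
        yield span
    except Exception as exc:
        # Record the error as metadata, then let it propagate.
        span.attributes["error"] = repr(exc)
        raise
    finally:
        span.end = time.perf_counter()
        _current_span.reset(token)
        FINISHED_SPANS.append(span)


def handle_checkout():
    # One incoming request that fans out to two (simulated) services.
    with start_span("GET /checkout", service="frontend"):
        with start_span("charge_card", service="payments"):
            time.sleep(0.01)
        with start_span("reserve_stock", service="inventory"):
            time.sleep(0.005)


handle_checkout()
for span in FINISHED_SPANS:
    print(f"{span.name:15} parent={span.parent_id} {span.duration_ms:.1f} ms")
```

In this sketch everything runs in one process. In a real distributed system the trace ID and parent span ID also have to travel between services, typically in request headers, so that the tracing tool can reassemble spans from different services into a single request flow.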
Why do you need a distributed tracing tool?
Get a centralized view. Tracing provides a single view of your distributed microservices. Your team can more easily understand how an application is built and how services interact with each other.
Fix bottlenecks faster. With performance data for all your services in one place, you can quickly pinpoint failures and identify performance bottlenecks.
Recover from incidents faster. A great tracing tool can notify you when the site is down or a performance anomaly is detected.
Open source tracing tools
Uptrace
Uptrace is an OpenTelemetry APM that monitors performance, errors, and logs. Main features include an intuitive query builder, rich dashboards, percentiles, and user and project management.
Tech stack:
- Backend: Go
- Frontend: Vue.js
- Instrumentation: OpenTelemetry / OTLP
- Storage: ClickHouse with S3
Pros:
- Supports both tracing and metrics
- Rich UI with charts
- Advanced filtering capabilities
- Simple setup with ClickHouse being the only dependency
- OpenTelemetry support including pre-configured distros
Cons:
- ClickHouse is the only supported DBMS
SigNoz
SigNoz is an open-source APM. It helps developers monitor their applications and troubleshoot problems.
SigNoz provides a unified UI for metrics and traces so that there is no need to switch between different tools such as Jaeger and Prometheus.
Tech stack:
- Backend: Go
- Frontend: React
- Instrumentation: OpenTelemetry / OTLP
- Storage: ClickHouse
Pros:
- Native OpenTelemetry support
- Rich UI with charts
- Metrics support using Prometheus as a backend and custom UI
- Traces visualization using Flamegraphs and Gantt charts
- Filters based on tags, status codes, service names, operation, etc.
- Alarms
Jaeger
Jaeger is a distributed tracing platform created by Uber Technologies. It can be used for monitoring microservices-based distributed systems.
Jaeger provides visibility into the flow of requests across microservices, allowing developers to understand the performance and behavior of their applications. It is used to gather timing data and logs from different services, and present them in a single view to help developers identify performance bottlenecks and errors.
Tech stack:
- Backend: Go
- Frontend: React
- Instrumentation: OpenTelemetry / OTLP
- Storage: Cassandra, Elasticsearch; ClickHouse using a plugin
Pros:
- Stable and well-known project
- Adaptive sampling
- Support for multiple DBMS via plugins
- Sponsored by CNCF
Cons:
- No charts / percentiles
- Limited filtering capabilities
- Not all plugins are maintained and usable
Sentry
Sentry tracks your software performance, measures metrics like throughput and latency, and displays the impact of errors across multiple systems.
Sentry provides detailed crash reports, including stack traces, user information, and logs, to help developers diagnose and resolve issues quickly.
Sentry includes features like notifications, prioritization, and collaboration tools to help teams work together to resolve issues. By providing a centralized view of all errors, Sentry can help teams improve the quality and stability of their applications.
Tech stack:
- Backend: Python
- Frontend: React
- Instrumentation: Sentry SDK
- Storage: Kafka, Redis, PostgreSQL, ClickHouse
Pros:
- Excellent error monitoring
- Quality SDK for Go, Python, Ruby, .NET, and PHP
- Friendly UI
Cons:
- Complex setup
- No OpenTelemetry support
- The UI is built around error monitoring
SkyWalking
SkyWalking is an open source APM system that provides monitoring, tracing, and diagnostic capabilities for distributed systems in cloud native architectures.
SkyWalking provides a comprehensive solution for monitoring and analyzing the performance and behavior of modern applications, helping teams to identify and resolve issues before they impact end users.
SkyWalking provides features such as distributed tracing, application performance management (APM), and service mesh observability, all of which can be used to gain insights into the behavior and health of your applications.
SkyWalking also provides a centralized dashboard for visualizing data, as well as alerts and notifications that warn teams about potential issues.
Tech stack:
- Backend: Java
- Frontend: Vue.js
- Instrumentation: SkyWalking
- Storage: Elasticsearch, MySQL, TiDB, InfluxDB, and more
Pros:
- Rich UI with charts
- Good metrics support (including dashboards)
- Alarms
- Support for multiple DBMS
Cons:
- Complex setup
- Complex and overloaded UI
- Confusing tracing UI
- OpenTelemetry support requires OpenTelemetry Collector
Zipkin
Zipkin is an open-source distributed tracing system that helps to gather data on the interactions between microservices in a distributed system.
Zipkin provides a way to visualize the flow of requests and responses between services, as well as the performance characteristics of each request, such as latency and response times.
Zipkin's key feature is the ability to trace a request as it flows through multiple microservices. This information can be used to gain insights into the performance of each service and the interactions between them, helping teams to identify and resolve performance and stability issues.
Zipkin's UI is minimalistic, but you can replace it with Grafana or Kibana configured to work with a Zipkin data source.
Tech stack:
- Backend: Java
- Frontend: React
- Instrumentation: Zipkin span model; OpenTelemetry via adapter
- Storage: MySQL, Cassandra, or Elasticsearch
Pros:
- Stable and well-known project
- Support for multiple DBMS
Cons:
- No active development
- Limited UI and filtering capabilities
- OpenTelemetry support requires an adapter
- No ClickHouse support
Grafana Tempo
Grafana Tempo is an open source, easy-to-use, and high-scale distributed tracing backend.
Tempo is designed to work seamlessly with Grafana, providing a complete solution for observability of distributed systems and microservices.
Tempo is cost-efficient, requiring only object storage to operate, and is deeply integrated with Grafana, Prometheus, and Loki. Tempo can ingest common open source tracing protocols, including Jaeger, Zipkin, and OpenTelemetry.
Tempo is optimized for high performance and can handle large amounts of tracing data, making it well-suited for use in large, complex systems. It provides a highly scalable, high-availability backend for storing, querying, and visualizing tracing data.
Tech stack:
- Backend: Go
- Frontend: React
- Instrumentation: OpenTelemetry / OTLP
- Storage: Object storage such as S3, GCS, or Azure Blob Storage
Pros:
- Integration with Grafana metrics dashboard
- OpenTelemetry support
Cons:
- The UI is built around metrics and feels awkward / clumsy for everything else
- Limited filtering capabilities
Paid cloud tracing tools
If you are looking for a paid tracing tool in the cloud, see our guide for DataDog competitors and alternatives.