Node.js Performance Monitoring Guide

Alexandr Bandurchin
November 19, 2025

Node.js applications power millions of APIs, microservices, and real-time systems. But without proper monitoring, performance issues, memory leaks, and errors can go undetected until they impact users. This guide explains how to monitor Node.js applications in production, what metrics to track, and which tools deliver the best results.

Why Monitor Node.js Applications?

Node.js runs JavaScript on a single-threaded event loop, which makes it fast and efficient but also vulnerable to blocking operations. A CPU-intensive task or a synchronous call can stall every request the process is handling, and a slow database query can hold up all the requests that wait on it. Memory leaks accumulate over time, and asynchronous operations can fail silently without proper instrumentation.

Monitoring helps you:

  • Detect performance degradation before users notice
  • Identify memory leaks and resource exhaustion
  • Track error rates and stack traces across deployments
  • Correlate frontend issues with backend API performance
  • Optimize database queries and external API calls

For a detailed comparison of available solutions, see our guide on open-source Node.js monitoring tools.

What to Monitor in Node.js

Event Loop Metrics

The event loop is the heart of Node.js. When it becomes blocked, response times spike and throughput drops.

Key metrics:

  • Event loop lag (should stay below 10-20ms)
  • Event loop utilization (percentage of time spent processing events)
  • Active handles and requests (open connections, timers, file descriptors)

Why it matters: A lagging event loop means something is blocking JavaScript execution, typically CPU-intensive operations or synchronous I/O.
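As a minimal sketch of how these metrics can be read without any external library, the example below uses the built-in perf_hooks module. The helper name readEventLoopStats, the 20 ms sampling resolution, and the 10-second reporting interval are assumptions made for illustration.

```javascript
const { monitorEventLoopDelay, performance } = require('node:perf_hooks');

// Sample the event loop delay every 20 ms (assumed resolution).
const delayHistogram = monitorEventLoopDelay({ resolution: 20 });
delayHistogram.enable();

let lastUtilization = performance.eventLoopUtilization();

// Hypothetical helper: summarize delay and utilization since the previous call.
function readEventLoopStats() {
  const current = performance.eventLoopUtilization();
  const delta = performance.eventLoopUtilization(current, lastUtilization);
  lastUtilization = current;

  const stats = {
    // The histogram reports nanoseconds; convert to milliseconds.
    meanLagMs: delayHistogram.mean / 1e6,
    p99LagMs: delayHistogram.percentile(99) / 1e6,
    maxLagMs: delayHistogram.max / 1e6,
    // Fraction of time the loop was busy, between 0 and 1.
    utilization: delta.utilization,
  };
  delayHistogram.reset();
  return stats;
}

// Report every 10 seconds; unref() so the timer does not keep the process alive.
setInterval(() => console.log(readEventLoopStats()), 10_000).unref();
```

On an idle process the reported delay tends to sit near the sampling resolution, so it is usually more useful to watch how the values change over time than to read them as absolute lag.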

Memory Usage

Node.js uses V8's garbage collector, but memory leaks still happen through unclosed connections, caches that grow without bound, and references kept alive by forgotten event listeners or long-lived closures.

Key metrics:

  • Heap size (used vs. available)
  • External memory (Buffers, C++ objects outside V8 heap)
  • Garbage collection frequency and duration
  • Memory growth rate over time

Why it matters: Memory leaks cause gradual performance degradation and eventually crash the process when heap is exhausted.
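The metrics above can be sampled with built-in APIs. The following is a sketch, not a complete collector: memorySnapshot and heapGrowthPerMinute are hypothetical helper names, and the growth rate is simply the difference between the first and last sample.

```javascript
const { PerformanceObserver } = require('node:perf_hooks');

const toMb = (bytes) => bytes / 1024 / 1024;

// Hypothetical helper: one timestamped memory sample, in megabytes.
function memorySnapshot() {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  return {
    at: Date.now(),
    rssMb: toMb(rss),
    heapTotalMb: toMb(heapTotal),
    heapUsedMb: toMb(heapUsed),
    externalMb: toMb(external),
  };
}

// Hypothetical helper: heap growth in MB per minute between the first and last sample.
function heapGrowthPerMinute(samples) {
  if (samples.length < 2) return 0;
  const first = samples[0];
  const last = samples[samples.length - 1];
  const minutes = (last.at - first.at) / 60_000;
  return minutes > 0 ? (last.heapUsedMb - first.heapUsedMb) / minutes : 0;
}

// Log the duration of each garbage collection pass.
const gcObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(`GC took ${entry.duration.toFixed(2)} ms`);
  }
});
gcObserver.observe({ entryTypes: ['gc'] });
```

A growth rate that stays positive across many samples, even after garbage collection has run, is the pattern worth alerting on.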

Request/Response Performance

For HTTP services, request latency directly impacts user experience.

Key metrics:

  • Request rate (requests per second)
  • Response time percentiles (p50, p95, p99)
  • Error rates by status code (4xx client errors, 5xx server errors)
  • Request duration by endpoint

Why it matters: Slow endpoints impact user experience. High error rates indicate bugs or misconfigurations.
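To illustrate why percentiles are tracked rather than averages, here is a small sketch that computes them with the nearest-rank method. The sample durations are made up, and production systems would normally rely on histograms in the metrics backend instead of sorting raw values.

```javascript
// Nearest-rank percentile: the smallest value such that at least p% of samples are <= it.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up request durations in milliseconds; two requests are slow outliers.
const durationsMs = [12, 15, 11, 240, 18, 14, 13, 16, 17, 950];

const mean = durationsMs.reduce((sum, d) => sum + d, 0) / durationsMs.length;

console.log('mean:', mean);                        // 130.6
console.log('p50:', percentile(durationsMs, 50));  // 15
console.log('p95:', percentile(durationsMs, 95));  // 950
```

The mean suggests every request takes about 130 ms, while the percentiles show that most requests are fast and a small tail is very slow.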

Dependencies and External Services

Node.js applications typically integrate with databases, caches, message queues, and third-party APIs.

Key metrics:

  • Database query duration and error rates
  • Connection pool size and utilization
  • External API latency and failures
  • Cache hit/miss ratios

Why it matters: External dependencies often cause performance bottlenecks. Distributed tracing helps identify which service is slowing down requests.
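One way to collect these numbers by hand is to wrap each outbound call in a timing helper. This is a sketch under assumptions: timed, recordCall, and hitRatio are hypothetical names, and the in-memory Map stands in for whatever metrics client you actually use.

```javascript
// Hypothetical in-memory stats keyed by dependency name (for example 'postgres').
const dependencyStats = new Map();

function recordCall(name, durationMs, ok) {
  const s = dependencyStats.get(name) ?? { calls: 0, errors: 0, totalMs: 0, maxMs: 0 };
  s.calls += 1;
  if (!ok) s.errors += 1;
  s.totalMs += durationMs;
  s.maxMs = Math.max(s.maxMs, durationMs);
  dependencyStats.set(name, s);
}

// Wrap any promise-returning call to time it and count failures.
async function timed(name, fn) {
  const start = process.hrtime.bigint();
  let ok = true;
  try {
    return await fn();
  } catch (err) {
    ok = false;
    throw err;
  } finally {
    const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
    recordCall(name, durationMs, ok);
  }
}

// Cache hit ratio from raw counters; 0 when there has been no traffic.
function hitRatio(hits, misses) {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}
```

A database query would then be wrapped as timed('postgres', () => runQuery()), where runQuery is a placeholder for your own data-access function. Automatic instrumentation does the same job without the manual wrapping.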

Node.js Monitoring Solutions

OpenTelemetry

OpenTelemetry is a vendor-neutral framework for instrumenting Node.js applications, offering both automatic and manual instrumentation.

How it works:

javascript
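// Load this file before the rest of the application so the instrumentations
// can patch http and express when they are first required.
// Written against the 1.x SDK packages; later releases changed some of these
// exports (for example Resource and SemanticResourceAttributes), so check
// the versions you install.
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { ExpressInstrumentation } = require('@opentelemetry/instrumentation-express');

// Describe the service that emits the telemetry.
const provider = new NodeTracerProvider({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'my-nodejs-app',
  }),
});

// Create spans automatically for incoming and outgoing HTTP and for Express routes.
registerInstrumentations({
  instrumentations: [
    new HttpInstrumentation(),
    new ExpressInstrumentation(),
  ],
});

// Make this provider the global tracer provider.
// No span processor or exporter is configured here, so spans are created
// but not sent anywhere until you add one for your backend.
provider.register();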

Why choose OpenTelemetry:

  • Automatic instrumentation for Express, Fastify, Koa, and other frameworks
  • Collects distributed traces, metrics, and logs in one system
  • Works with any OpenTelemetry-compatible backend (Uptrace, Jaeger, etc.)
  • No vendor lock-in

When to use it: When you need comprehensive observability across multiple services or want to avoid proprietary agents.

Prometheus + Grafana

Prometheus scrapes metrics over HTTP from your application or from exporters and stores them as time series, while Grafana visualizes them.

How it works:

javascript
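const promClient = require('prom-client');
const express = require('express');

const app = express();
const register = new promClient.Registry();

// Collect default process metrics (CPU, memory, event loop lag, GC).
promClient.collectDefaultMetrics({ register });

const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  registers: [register]
});

// Time every request and record it once the response has been sent.
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on('finish', () => {
    // Use the matched route pattern, not the raw URL, to keep label cardinality low.
    const route = req.route ? req.route.path : 'unmatched';
    end({ method: req.method, route, status_code: res.statusCode });
  });
  next();
});

// Endpoint that Prometheus scrapes.
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

// Port 3000 is an assumption for this example.
app.listen(3000);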

Why choose Prometheus:

  • Strong Kubernetes integration
  • Built-in time-series storage (long-term retention usually relies on remote storage)
  • Powerful query language (PromQL)
  • Large ecosystem of exporters

When to use it: When you need metrics-only monitoring or already use Prometheus for infrastructure monitoring.

Commercial APM Solutions

Solutions like Datadog, New Relic, and Dynatrace offer all-in-one monitoring with minimal setup. For a comparison of monitoring platforms, see observability tools.

Benefits:

  • Automatic instrumentation through agents
  • Pre-built dashboards and alerts
  • Support teams and SLAs
  • Machine learning for anomaly detection

Drawbacks:

  • Higher cost (typically $15-50 per host per month)
  • Vendor lock-in
  • Less flexibility for custom instrumentation

When to use them: When you need out-of-the-box monitoring and have budget for commercial tools.

Monitoring Node.js with Uptrace

Uptrace is an OpenTelemetry-native APM that combines distributed tracing, metrics, and logs in a unified interface. It's designed for teams that want comprehensive observability without vendor lock-in.

Key features for Node.js:

  • Automatic instrumentation for Express, Fastify, NestJS, and other frameworks
  • Distributed tracing across microservices
  • Event loop lag and memory metrics
  • SQL query performance analysis
  • Log-to-trace correlation

Setup:

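bash
npm install @uptrace/node

javascript
// Assumes the configureOpentelemetry export of @uptrace/node;
// check the linked setup guide for the options your version supports.
const uptrace = require('@uptrace/node');

uptrace.configureOpentelemetry({
  dsn: 'https://your-project@api.uptrace.dev',
  serviceName: 'my-nodejs-app',
});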

For complete setup instructions, see OpenTelemetry Node.js with Uptrace.

Cost: Uptrace offers an open-source self-hosted edition and a cloud version starting at $0.10-0.15 per GB of telemetry data.

Troubleshooting

High Memory Usage

Symptoms: Increasing heap size, frequent garbage collection, eventual crashes

Diagnosis:

javascript
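const v8 = require('v8');

// Log heap usage once a minute; a steady climb between samples suggests a leak.
const timer = setInterval(() => {
  const heapStats = v8.getHeapStatistics();
  const toMb = (bytes) => (bytes / 1024 / 1024).toFixed(1);
  console.log('Heap used:', toMb(heapStats.used_heap_size), 'MB');
  console.log('Heap limit:', toMb(heapStats.heap_size_limit), 'MB');
}, 60000);

// Do not let the logging timer keep the process alive on its own.
timer.unref();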

Common causes:

  • Unclosed connections (database, HTTP requests)
  • Growing caches without eviction policies
  • Event listeners not removed
  • Large objects retained in closures

Solution: Use memory profiling tools, implement connection pooling, and add cache size limits.
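As an example of a cache size limit, the sketch below evicts the least recently used entry once a maximum is reached. BoundedCache is a hypothetical class written for illustration and the default limit of 1000 entries is arbitrary. It relies on the fact that a Map iterates its keys in insertion order.

```javascript
// Minimal LRU cache: the Map's insertion order doubles as the recency order.
class BoundedCache {
  constructor(maxEntries = 1000) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
  }

  get size() {
    return this.map.size;
  }
}
```

With a limit in place the cache's memory footprint is bounded by the entry count, so a traffic spike can lower the hit ratio but cannot grow the heap indefinitely.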

Event Loop Blocking

Symptoms: Slow response times, timeout errors, high event loop lag

Diagnosis:

javascript
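const { performance, PerformanceObserver } = require('perf_hooks');

// Report any measured section that takes longer than 50 ms.
const obs = new PerformanceObserver((items) => {
  items.getEntries().forEach((entry) => {
    if (entry.duration > 50) {
      console.warn('Slow operation:', entry.name, entry.duration, 'ms');
    }
  });
});
obs.observe({ entryTypes: ['measure'] });

// The observer only sees sections you measure explicitly, for example:
performance.mark('parse-start');
JSON.parse('{"items": []}'); // stand-in for the suspect synchronous work
performance.mark('parse-end');
performance.measure('parse-payload', 'parse-start', 'parse-end');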

Common causes:

  • CPU-intensive synchronous operations
  • Blocking I/O (synchronous file operations)
  • Large JSON parsing or serialization
  • Regular expressions with catastrophic backtracking

Solution: Move CPU-intensive work to worker threads, use asynchronous APIs, and optimize algorithms.
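A minimal sketch of moving CPU-bound work to a worker thread is shown below. The recursive Fibonacci function is only a stand-in for an expensive computation, fibInWorker is a hypothetical helper name, and the worker code is passed as a string with eval: true to keep the example in one file. A real application would normally put the worker in its own module and reuse workers through a pool.

```javascript
const { Worker } = require('node:worker_threads');

// Code that runs inside the worker thread.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
  parentPort.postMessage(fib(workerData));
`;

// Run the computation off the main thread and resolve with its result.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

// The event loop stays free to serve requests while the worker computes.
fibInWorker(30).then((result) => console.log('fib(30) =', result));
```

Starting a worker has a cost of its own, so this pattern pays off for work that takes tens of milliseconds or more, not for small calculations.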

High Error Rates

Symptoms: 5xx errors, uncaught exceptions, rejected promises

Diagnosis: Distributed tracing shows error stack traces and request context.

Common causes:

  • Unhandled promise rejections
  • Database connection timeouts
  • Missing error handling (try/catch or .catch()) in async code
  • External API failures

Solution: Implement proper error handling, add retries with exponential backoff, and use circuit breakers for external services.
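Retries with exponential backoff can be sketched as follows. The function names, the default of three retries, the 100 ms base delay, and the 5-second cap are all assumptions chosen for illustration. Only retry operations that are safe to repeat.

```javascript
// Delay before retry number `attempt` (0-based): doubled each time, then capped.
function backoffDelay(attempt, baseDelayMs = 100, maxDelayMs = 5_000) {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry a promise-returning function; rethrow the last error when retries run out.
async function retryWithBackoff(fn, { retries = 3, baseDelayMs = 100, maxDelayMs = 5_000 } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      // Random jitter (50-100% of the delay) keeps clients from retrying in lockstep.
      const delay = backoffDelay(attempt, baseDelayMs, maxDelayMs) * (0.5 + Math.random() / 2);
      await sleep(delay);
    }
  }
}
```

With the defaults, a call that keeps failing is attempted four times in total, with base delays of 100 ms, 200 ms, and 400 ms between attempts before jitter is applied.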

Best Practices for Node.js Monitoring

1. Instrument Early

Add monitoring before launching to production. Retrofitting instrumentation is harder and delays incident response.

2. Monitor Dependencies

Track database queries, external API calls, and cache operations. Many performance issues originate in dependencies rather than in application code. For full-stack visibility, consider end-to-end monitoring.

3. Use Distributed Tracing

In microservices architectures, distributed tracing is essential. It shows how requests flow through services and where bottlenecks occur.

4. Set Up Alerts

Configure alerts for:

  • Error rate spikes (threshold: >1% of requests)
  • High latency (p95 > 500ms)
  • Memory usage (heap > 80% of limit)
  • Event loop lag (>50ms sustained)
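Expressed as code, the thresholds above might look like the sketch below. In practice these rules usually live in the alerting system, not in the application. The metric field names and the evaluateAlerts helper are hypothetical.

```javascript
// Thresholds taken from the list above; tune them for your own service.
const thresholds = {
  errorRate: 0.01,      // more than 1% of requests
  p95LatencyMs: 500,
  heapUsedRatio: 0.8,   // heap used divided by heap limit
  eventLoopLagMs: 50,
};

// Hypothetical helper: return the names of all rules the current metrics violate.
function evaluateAlerts(metrics, limits = thresholds) {
  const alerts = [];
  if (metrics.errorRate > limits.errorRate) alerts.push('error-rate');
  if (metrics.p95LatencyMs > limits.p95LatencyMs) alerts.push('latency');
  if (metrics.heapUsedRatio > limits.heapUsedRatio) alerts.push('heap-usage');
  if (metrics.eventLoopLagMs > limits.eventLoopLagMs) alerts.push('event-loop-lag');
  return alerts;
}

console.log(
  evaluateAlerts({ errorRate: 0.02, p95LatencyMs: 300, heapUsedRatio: 0.85, eventLoopLagMs: 10 })
); // [ 'error-rate', 'heap-usage' ]
```

This checks a single sample. A "sustained" condition such as the event loop lag rule should only fire after the threshold has been exceeded for several consecutive samples, which most alerting tools express as a duration on the rule.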

5. Correlate Logs with Traces

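When errors occur, logs that carry trace context make debugging faster. OpenTelemetry's logging instrumentations (for example, for Pino and Winston) can inject trace and span IDs into log records automatically; elsewhere, you can read the IDs from the active span through the tracing API and add them yourself.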
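The idea behind this correlation can be shown without OpenTelemetry. The sketch below is a stand-in, not the OpenTelemetry mechanism: it uses the built-in AsyncLocalStorage to carry a randomly generated request ID into every log line written while a request is handled. With OpenTelemetry, the ID would come from the active span's context instead. The names requestContext, withRequestContext, and formatLog are hypothetical.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');
const { randomUUID } = require('node:crypto');

const requestContext = new AsyncLocalStorage();

// Build a JSON log line that carries the current request's ID, if there is one.
function formatLog(level, message) {
  const store = requestContext.getStore();
  return JSON.stringify({
    level,
    message,
    requestId: store ? store.requestId : undefined,
  });
}

// Run a handler inside a context that holds a fresh ID.
function withRequestContext(handler) {
  return requestContext.run({ requestId: randomUUID() }, handler);
}

// Every log line produced inside the handler shares the same requestId.
withRequestContext(() => {
  console.log(formatLog('info', 'order received'));
  console.log(formatLog('error', 'payment failed'));
});
```

Searching logs by that one ID then returns every line for the failing request, which is the same benefit a trace ID provides across services.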

6. Load Test Before Production

Use tools like k6 or Artillery to generate realistic load. Monitor metrics under stress to identify limits.

7. Track Deployment Impact

Monitor metrics before and after deployments. Regressions in response time or error rates indicate issues.
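A before-and-after comparison can be as simple as the sketch below. The helper names and the 20% tolerance are assumptions, and it treats a higher value as worse, which holds for latency and error rate but not for metrics such as throughput.

```javascript
// Relative change from `before` to `after`; 0.2 means a 20% increase.
function relativeChange(before, after) {
  if (before === 0) return after === 0 ? 0 : Infinity;
  return (after - before) / before;
}

// Hypothetical helper: names of metrics that got worse by more than `tolerance`.
function findRegressions(before, after, tolerance = 0.2) {
  return Object.keys(before).filter(
    (name) => name in after && relativeChange(before[name], after[name]) > tolerance
  );
}

const beforeDeploy = { p95Ms: 200, errorRate: 0.01 };
const afterDeploy = { p95Ms: 260, errorRate: 0.01 };

console.log(findRegressions(beforeDeploy, afterDeploy)); // [ 'p95Ms' ]
```

Comparing equal time windows at similar traffic levels keeps normal daily variation from showing up as a regression.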

Comparing Monitoring Solutions

| Approach | Setup Complexity | Cost | Vendor Lock-in | Best For |
|---|---|---|---|---|
| OpenTelemetry + Uptrace | Medium | Low | None | Teams wanting vendor-neutral observability |
| Prometheus + Grafana | High | Low | None | Metrics-only monitoring, Kubernetes environments |
| Datadog / New Relic | Low | High | High | Teams needing out-of-the-box solution with support |
| prom-client (manual) | High | Low | None | Custom metrics without full observability |

For a detailed comparison of distributed tracing tools, check out distributed tracing tools list.

FAQ

  1. What's the best way to monitor Node.js in production? OpenTelemetry provides the most comprehensive and vendor-neutral approach. It automatically instruments frameworks, collects traces and metrics, and works with any compatible backend.
  2. How do I monitor Node.js memory leaks? Track heap size over time and set alerts when it grows consistently. Use heap snapshots from the built-in Node.js inspector or tools like Clinic.js to identify leak sources.
  3. Do I need APM if I already have Prometheus? Prometheus collects metrics but doesn't provide distributed tracing or log correlation. For microservices, adding tracing significantly improves debugging.
  4. What metrics indicate Node.js performance problems? Watch event loop lag (>20ms), memory growth, p95 response times, and error rates. Sudden changes in these metrics signal issues.
  5. How much does Node.js monitoring cost? Uptrace offers an open-source self-hosted edition, and its cloud version starts at $0.10-0.15 per GB of telemetry data. Commercial APM solutions typically cost $15-50 per host per month, depending on features.
  6. Can I monitor serverless Node.js functions? Yes. OpenTelemetry supports AWS Lambda, and services like Datadog have Lambda-specific integrations. Cold start times and invocation duration are key metrics.
  7. What's the difference between APM and logging? APM tracks application performance (traces, metrics), while logging captures event records. For a full comparison, see open-source APM tools.