AWS CloudWatch Metrics Mastery: The Ultimate Reference List for Every AWS Service
Amazon CloudWatch Metrics are a core component of AWS monitoring, allowing you to track the operational health and performance of your resources. This comprehensive reference guide provides a detailed list of the most important AWS CloudWatch metrics for all major AWS services.
What are AWS CloudWatch Metrics?
AWS CloudWatch Metrics are time-ordered data points published by AWS services. Each metric represents a specific variable that is monitored over time, such as CPU utilization, network traffic, or error counts. These metrics provide visibility into resource utilization, application performance, and operational health. CloudWatch monitoring is the foundation of observability in AWS environments, helping you detect anomalies, visualize performance trends, and trigger automated actions.
How to Retrieve CloudWatch Metrics
There are three primary ways to access CloudWatch metrics:
AWS Console
- Log in to the AWS Management Console
- Navigate to the CloudWatch service
- Select "Metrics" from the left navigation panel
- Browse metrics by category or search for specific metrics
AWS CLI
Use the AWS Command Line Interface to retrieve metrics:
# List metrics for a specific namespace
aws cloudwatch list-metrics --namespace AWS/EC2
# Get metric statistics
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
--start-time 2023-01-01T00:00:00Z \
--end-time 2023-01-02T00:00:00Z \
--period 3600 \
--statistics Average Maximum
CloudWatch API
For programmatic access, use the CloudWatch API endpoints:
- ListMetrics: Retrieve available metrics
- GetMetricData: Retrieve multiple metrics with a single call
- GetMetricStatistics: Get statistics for a specific metric
For detailed examples and code snippets on working with these API endpoints, refer to our article "Unlock the Power of CloudWatch API: A Developer's Toolkit for Custom Metrics".
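As a rough illustration of the GetMetricData operation listed above, the sketch below builds a request for the same EC2 CPUUtilization metric used in the CLI example. The helper names (`build_cpu_query`, `build_request`) are illustrative, not part of any SDK, and the snippet only assembles the request; actually sending it assumes boto3 is installed and credentials are configured.

```python
from datetime import datetime, timedelta, timezone


def build_cpu_query(instance_id, period=300, stat="Average"):
    """Return one MetricDataQueries entry for EC2 CPUUtilization."""
    return {
        "Id": "cpu",  # query IDs must start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/EC2",
                "MetricName": "CPUUtilization",
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            },
            "Period": period,  # seconds per data point
            "Stat": stat,
        },
        "ReturnData": True,
    }


def build_request(instance_id, hours=24, now=None):
    """Assemble keyword arguments for a GetMetricData call covering the last `hours`."""
    now = now or datetime.now(timezone.utc)
    return {
        "MetricDataQueries": [build_cpu_query(instance_id)],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
    }


request = build_request("i-1234567890abcdef0")
# With boto3 available, the request could be sent as:
#   import boto3
#   response = boto3.client("cloudwatch").get_metric_data(**request)
```

Further entries can be appended to `MetricDataQueries` to fetch several metrics in one call.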
CloudWatch Metrics by Service
EC2 Metrics (Namespace: AWS/EC2)
Metric Name | Description | Unit |
---|---|---|
CPUUtilization | Percentage of allocated EC2 compute units that are currently in use | Percent |
DiskReadOps | Completed read operations from all instance store volumes | Count |
DiskWriteOps | Completed write operations to all instance store volumes | Count |
DiskReadBytes | Bytes read from all instance store volumes | Bytes |
DiskWriteBytes | Bytes written to all instance store volumes | Bytes |
NetworkIn | Bytes received on all network interfaces by the instance | Bytes |
NetworkOut | Bytes sent out on all network interfaces by the instance | Bytes |
NetworkPacketsIn | Number of packets received on all network interfaces | Count |
NetworkPacketsOut | Number of packets sent out on all network interfaces | Count |
StatusCheckFailed | Instance status checks that have failed | Count |
StatusCheckFailed_Instance | Status checks that have failed for instance | Count |
StatusCheckFailed_System | Status checks that have failed for system | Count |
RDS Metrics (Namespace: AWS/RDS)
Metric Name | Description | Unit |
---|---|---|
CPUUtilization | Percentage of CPU utilization | Percent |
DatabaseConnections | Number of client connections to the database | Count |
FreeableMemory | Amount of available RAM | Bytes |
FreeStorageSpace | Amount of available storage space | Bytes |
ReadIOPS | Average number of disk read I/O operations per second | Count/Second |
WriteIOPS | Average number of disk write I/O operations per second | Count/Second |
ReadLatency | Average time taken per disk I/O operation | Seconds |
WriteLatency | Average time taken per disk write operation | Seconds |
ReadThroughput | Average number of bytes read from disk per second | Bytes/Second |
WriteThroughput | Average number of bytes written to disk per second | Bytes/Second |
BurstBalance | Percent of General Purpose SSD (gp2) burst-bucket I/O credits available | Percent |
Lambda Metrics (Namespace: AWS/Lambda)
Metric Name | Description | Unit |
---|---|---|
Invocations | Number of times function has been invoked | Count |
Errors | Number of invocations that resulted in a function error | Count |
Duration | Time that the function code spends processing an event | Milliseconds |
Throttles | Number of invocation attempts that were throttled | Count |
ConcurrentExecutions | Number of function instances running simultaneously | Count |
DeadLetterErrors | Number of times Lambda failed to send events to a DLQ | Count |
IteratorAge | Age of the last record for stream event sources | Milliseconds |
ProvisionedConcurrencyInvocations | Number of times provisioned concurrency was used | Count |
ProvisionedConcurrencyUtilization | Percent of provisioned concurrency that is in use | Percent |
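The Errors and Invocations metrics are often combined into an error rate. The following sketch shows one way to express that with CloudWatch metric math inside a GetMetricData query list. The function name `my-function` and the helper `lambda_error_rate_queries` are hypothetical, and the snippet only builds the query structure.

```python
def lambda_error_rate_queries(function_name, period=300):
    """Build MetricDataQueries that return a Lambda error rate in percent."""

    def summed(query_id, metric_name):
        # Raw metric summed per period; hidden so only the ratio is returned.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return [
        summed("errors", "Errors"),
        summed("invocations", "Invocations"),
        {
            # Metric math expression referencing the two query IDs above.
            "Id": "error_rate",
            "Expression": "100 * errors / invocations",
            "Label": "Error rate (%)",
            "ReturnData": True,
        },
    ]


queries = lambda_error_rate_queries("my-function")  # hypothetical function name
```

Setting `ReturnData` to `False` on the raw metrics keeps the response limited to the computed ratio.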
S3 Metrics (Namespace: AWS/S3)
Metric Name | Description | Unit |
---|---|---|
BucketSizeBytes | Total size of all objects in the bucket | Bytes |
NumberOfObjects | Total number of objects stored in a bucket | Count |
AllRequests | Total number of HTTP requests made to a bucket | Count |
GetRequests | Number of HTTP GET requests made to a bucket | Count |
PutRequests | Number of HTTP PUT requests made to a bucket | Count |
DeleteRequests | Number of HTTP DELETE requests made to a bucket | Count |
HeadRequests | Number of HTTP HEAD requests made to a bucket | Count |
4xxErrors | Number of HTTP 4xx client error status codes | Count |
5xxErrors | Number of HTTP 5xx server error status codes | Count |
FirstByteLatency | Per-request time from receiving request to sending first byte | Milliseconds |
TotalRequestLatency | Per-request time from receiving request to last byte | Milliseconds |
ELB Metrics (Namespace: AWS/ELB for Classic, AWS/ApplicationELB for ALB, AWS/NetworkELB for NLB)
Application Load Balancer Metrics
Metric Name | Description | Unit |
---|---|---|
RequestCount | Number of requests processed | Count |
HTTPCode_Target_2XX_Count | Number of HTTP 2XX response codes from targets | Count |
HTTPCode_Target_3XX_Count | Number of HTTP 3XX response codes from targets | Count |
HTTPCode_Target_4XX_Count | Number of HTTP 4XX response codes from targets | Count |
HTTPCode_Target_5XX_Count | Number of HTTP 5XX response codes from targets | Count |
HTTPCode_ELB_4XX_Count | Number of HTTP 4XX response codes generated by the ALB | Count |
HTTPCode_ELB_5XX_Count | Number of HTTP 5XX response codes generated by the ALB | Count |
TargetResponseTime | Time elapsed from request to response from target | Seconds |
ActiveConnectionCount | Number of concurrent TCP connections | Count |
NewConnectionCount | Number of new TCP connections established | Count |
RejectedConnectionCount | Number of connections rejected because the load balancer reached its maximum number of connections | Count |
Network Load Balancer Metrics
Metric Name | Description | Unit |
---|---|---|
ActiveFlowCount | Number of concurrent TCP flows | Count |
ProcessedBytes | Total number of bytes processed by the load balancer | Bytes |
TCP_Client_Reset_Count | Number of reset (RST) packets sent from client to target | Count |
TCP_Target_Reset_Count | Number of reset (RST) packets sent from target to client | Count |
HealthyHostCount | Number of targets considered healthy | Count |
UnHealthyHostCount | Number of targets considered unhealthy | Count |
ECS Metrics (Namespace: AWS/ECS; task and service counts require Container Insights, namespace ECS/ContainerInsights)
Metric Name | Description | Unit |
---|---|---|
CPUUtilization | Percentage of CPU units used by the cluster or service | Percent |
MemoryUtilization | Percentage of memory used by the cluster or service | Percent |
CPUReservation | Percentage of CPU units reserved by running tasks | Percent |
MemoryReservation | Percentage of memory reserved by running tasks | Percent |
RunningTaskCount | Number of tasks in the RUNNING state | Count |
PendingTaskCount | Number of tasks in the PENDING state | Count |
ServiceCount | Number of services in the cluster | Count |
DynamoDB Metrics (Namespace: AWS/DynamoDB)
Metric Name | Description | Unit |
---|---|---|
ConsumedReadCapacityUnits | Number of read capacity units consumed | Count |
ConsumedWriteCapacityUnits | Number of write capacity units consumed | Count |
ReadThrottleEvents | Requests to DynamoDB that exceed the provisioned read capacity | Count |
WriteThrottleEvents | Requests to DynamoDB that exceed the provisioned write capacity | Count |
ProvisionedReadCapacityUnits | Number of provisioned read capacity units | Count |
ProvisionedWriteCapacityUnits | Number of provisioned write capacity units | Count |
SuccessfulRequestLatency | Elapsed time for successful requests to DynamoDB | Milliseconds |
SystemErrors | Requests to DynamoDB that generate an HTTP 500 status code | Count |
UserErrors | Requests to DynamoDB that generate an HTTP 400 status code | Count |
ThrottledRequests | Requests to DynamoDB that exceed the provisioned throughput limits | Count |
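Consumed capacity is usually read with the Sum statistic, so comparing it with provisioned capacity takes a small conversion to units per second. The sketch below shows that arithmetic; the function name and the sample numbers are illustrative only.

```python
def capacity_utilization(consumed_sum, period_seconds, provisioned_units):
    """Return consumed capacity as a percentage of provisioned capacity.

    consumed_sum: Sum of ConsumedReadCapacityUnits (or Write) over one period.
    period_seconds: length of that period in seconds.
    provisioned_units: ProvisionedReadCapacityUnits (or Write), in units per second.
    """
    if period_seconds <= 0 or provisioned_units <= 0:
        raise ValueError("period_seconds and provisioned_units must be positive")
    consumed_per_second = consumed_sum / period_seconds
    return 100.0 * consumed_per_second / provisioned_units


# Illustrative numbers: 18,000 units consumed in 5 minutes against 100 provisioned units.
utilization = capacity_utilization(18000, 300, 100)
```

A value that stays close to 100 suggests the table is near its provisioned limit and throttling events may follow.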
API Gateway Metrics (Namespace: AWS/ApiGateway)
Metric Name | Description | Unit |
---|---|---|
Count | Total number of API requests | Count |
Latency | Time between when API Gateway receives a request and when it returns a response | Milliseconds |
IntegrationLatency | Time between API Gateway relaying a request to the backend and receiving a response | Milliseconds |
4XXError | Number of client-side errors | Count |
5XXError | Number of server-side errors | Count |
CacheHitCount | Number of requests served from API cache | Count |
CacheMissCount | Number of requests served from the backend | Count |
SQS Metrics (Namespace: AWS/SQS)
Metric Name | Description | Unit |
---|---|---|
NumberOfMessagesSent | Number of messages added to a queue | Count |
NumberOfMessagesReceived | Number of messages returned by calls to the ReceiveMessage action | Count |
NumberOfMessagesDeleted | Number of messages deleted from the queue | Count |
NumberOfEmptyReceives | Number of ReceiveMessage API calls that didn't return a message | Count |
ApproximateNumberOfMessagesVisible | Number of messages available for retrieval | Count |
ApproximateNumberOfMessagesNotVisible | Number of messages that are in flight | Count |
ApproximateAgeOfOldestMessage | Age of the oldest non-deleted message in the queue | Seconds |
SNS Metrics (Namespace: AWS/SNS)
Metric Name | Description | Unit |
---|---|---|
NumberOfMessagesPublished | Number of messages published to your Amazon SNS topics | Count |
NumberOfNotificationsDelivered | Number of messages successfully delivered | Count |
NumberOfNotificationsFailed | Number of messages that SNS failed to deliver | Count |
NumberOfNotificationsFilteredOut | Number of messages that were filtered out by subscription filter policies | Count |
NumberOfNotificationsFilteredOut-NoMessageAttributes | Number of messages filtered out due to missing message attributes | Count |
NumberOfNotificationsFilteredOut-InvalidMessageAttributes | Number of messages filtered out due to invalid message attributes | Count |
PublishSize | Size of messages published | Bytes |
SMSSuccessRate | Rate of successful SMS message deliveries | Percent |
Kinesis Metrics (Namespace: AWS/Kinesis)
Metric Name | Description | Unit |
---|---|---|
GetRecords.Bytes | Bytes retrieved from the stream, measured over the specified time period | Bytes |
GetRecords.IteratorAgeMilliseconds | Age of the last record in a GetRecords call | Milliseconds |
GetRecords.Latency | Time taken per GetRecords operation | Milliseconds |
GetRecords.Records | Number of records retrieved per GetRecords operation | Count |
GetRecords.Success | Number of successful GetRecords operations | Count |
IncomingBytes | Number of bytes successfully put to the Kinesis stream | Bytes |
IncomingRecords | Number of records successfully put to the Kinesis stream | Count |
PutRecord.Bytes | Bytes put to the Kinesis stream using PutRecord | Bytes |
PutRecord.Latency | Time taken per PutRecord operation | Milliseconds |
PutRecord.Success | Number of successful PutRecord operations | Count |
PutRecords.Bytes | Bytes put to the Kinesis stream using PutRecords | Bytes |
PutRecords.Latency | Time taken per PutRecords operation | Milliseconds |
PutRecords.Records | Number of records put using PutRecords | Count |
PutRecords.Success | Number of successful PutRecords operations | Count |
ReadProvisionedThroughputExceeded | Number of GetRecords calls throttled for the stream | Count |
WriteProvisionedThroughputExceeded | Number of records rejected due to throttling | Count |
ElastiCache Metrics (Namespace: AWS/ElastiCache)
Redis Metrics
Metric Name | Description | Unit |
---|---|---|
CPUUtilization | Percentage of CPU utilization | Percent |
EngineCPUUtilization | Percentage of CPU used by the Redis engine thread | Percent |
CacheHits | Number of successful read-only key lookups in the main dictionary | Count |
CacheMisses | Number of unsuccessful read-only key lookups in the main dictionary | Count |
CurrConnections | Number of client connections | Count |
NetworkBytesIn | Number of bytes the host has read from the network | Bytes |
NetworkBytesOut | Number of bytes sent by the host to the network | Bytes |
NewConnections | Total number of connections that have been accepted | Count |
Evictions | Number of keys evicted due to maxmemory limit | Count |
DatabaseMemoryUsagePercentage | Percentage of the memory available to the cluster that is in use | Percent |
ReplicationLag | Number of seconds by which the replica lags behind the primary | Seconds |
Memcached Metrics
Metric Name | Description | Unit |
---|---|---|
CPUUtilization | Percentage of CPU utilization | Percent |
CurrConnections | Number of client connections | Count |
NetworkBytesIn | Number of bytes read from the network by the node | Bytes |
NetworkBytesOut | Number of bytes sent by the node on the network | Bytes |
GetHits | Number of get requests that result in a cache hit | Count |
GetMisses | Number of get requests that result in a cache miss | Count |
CurrItems | Number of items currently stored in the cache | Count |
Evictions | Number of items evicted from the cache | Count |
CasBadval | Number of CAS (compare-and-swap) requests with a bad identifier | Count |
CasHits | Number of CAS requests that found an existing item | Count |
CasMisses | Number of CAS requests for items that were not found | Count |
FreeableMemory | Amount of free memory available on the node | Bytes |
Metric Retention and Storage Costs
CloudWatch retains metrics according to the following schedule:
- Data points with a period less than 60 seconds: Available for 3 hours
- Data points with a period of 60 seconds (1 minute): Available for 15 days
- Data points with a period of 300 seconds (5 minutes): Available for 63 days
- Data points with a period of 3600 seconds (1 hour): Available for 455 days (15 months)
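The schedule above can be read as a lookup from data age to the finest period still available. The following sketch encodes it that way. The names `RETENTION_TIERS` and `finest_period` are illustrative, tier boundaries are treated as inclusive (an assumption), and the 1-second tier applies only to metrics published at high resolution.

```python
from datetime import timedelta

# (maximum age, finest period in seconds), taken from the retention schedule above.
RETENTION_TIERS = [
    (timedelta(hours=3), 1),      # sub-minute data points (high-resolution metrics only)
    (timedelta(days=15), 60),     # 1-minute data points
    (timedelta(days=63), 300),    # 5-minute data points
    (timedelta(days=455), 3600),  # 1-hour data points
]


def finest_period(age):
    """Return the smallest period (seconds) retained for data of this age, or None if expired."""
    for max_age, period in RETENTION_TIERS:
        if age <= max_age:
            return period
    return None
```

For example, a query for data from 30 days ago would need a period of at least 300 seconds under this model.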
CloudWatch Metrics Pricing
The cost structure for CloudWatch metrics monitoring includes:
- Basic monitoring (default for most services): Free
- Detailed monitoring (higher frequency): Starts at $0.30 per metric per month
- Custom metrics: $0.30 per metric per month
- High-resolution custom metrics: $0.10 per 1,000 metrics
- API requests: First million API requests per month are free, then $0.01 per 1,000 API requests
- Alarms: $0.10 per alarm metric per month (standard resolution)
These prices may vary by region, so consult the AWS pricing page for the most current information.
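For a rough budget estimate, the list prices above can be plugged into a small calculator. This is only a sketch: the constants are copied from this article rather than from the AWS pricing page, the function name is made up for illustration, and volume tiers and regional differences are ignored.

```python
# Prices in USD, copied from the list above; verify against the AWS pricing page.
CUSTOM_METRIC_PRICE = 0.30     # per custom metric per month
STANDARD_ALARM_PRICE = 0.10    # per standard-resolution alarm metric per month
API_FREE_REQUESTS = 1_000_000  # free API requests per month
API_PRICE_PER_1000 = 0.01      # per 1,000 API requests beyond the free allowance


def estimate_monthly_cost(custom_metrics, standard_alarms, api_requests):
    """Estimate a monthly CloudWatch bill from the article's list prices."""
    billable_requests = max(0, api_requests - API_FREE_REQUESTS)
    total = (
        custom_metrics * CUSTOM_METRIC_PRICE
        + standard_alarms * STANDARD_ALARM_PRICE
        + billable_requests / 1000 * API_PRICE_PER_1000
    )
    return round(total, 2)


# Illustrative workload: 50 custom metrics, 20 alarms, 3 million API requests.
estimate = estimate_monthly_cost(50, 20, 3_000_000)
```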
Exporting CloudWatch Metrics
You can export CloudWatch metrics to third-party monitoring tools for extended analytics, visualization, or longer retention. Popular options include:
- Amazon Data Firehose: Forward metrics to destinations like Elasticsearch, Splunk, or custom HTTP endpoints
- CloudWatch Metric Streams: Stream metrics in near real-time to partners and third-party solutions
- OpenTelemetry integration: Use the OpenTelemetry Collector to forward metrics to other systems
For a detailed guide on exporting AWS CloudWatch metrics and logs to external systems, you can refer to Uptrace's guide on CloudWatch metrics and logs, which provides step-by-step instructions for setting up integrations using AWS Data Firehose or Prometheus with yet-another-cloudwatch-exporter.
AWS CloudWatch Alarms
AWS CloudWatch alarms monitor your CloudWatch metrics and trigger actions when metrics breach predefined thresholds. They are essential for proactive monitoring and automated response.
CloudWatch Alarm States
CloudWatch metric alarms have three possible states:
- OK - The metric is within the defined threshold
- ALARM - The metric has breached the defined threshold
- INSUFFICIENT_DATA - The alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state
Creating Basic CloudWatch Alarms
You can create alarms through the AWS Console, AWS CLI, or CloudFormation (AWS::CloudWatch::Alarm resource).
Example using AWS CLI:
aws cloudwatch put-metric-alarm \
--alarm-name "High-CPU-Alarm" \
--alarm-description "Alarm when CPU exceeds 70%" \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--period 300 \
--threshold 70 \
--comparison-operator GreaterThanThreshold \
--dimensions "Name=InstanceId,Value=i-1234567890abcdef0" \
--evaluation-periods 2 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:my-topic
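To make the threshold and evaluation-periods parameters in the command above more concrete, here is a deliberately simplified model of how such an alarm could settle on a state. It assumes every one of the most recent evaluation periods must breach the threshold and ignores the datapoints-to-alarm and missing-data options, so it is not a reproduction of the actual CloudWatch evaluation logic. The function name is hypothetical.

```python
def evaluate_alarm(datapoints, threshold=70.0, evaluation_periods=2):
    """Return a simplified alarm state for a GreaterThanThreshold alarm.

    datapoints: per-period Average values, oldest first.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        # Not enough data points yet to cover every evaluation period.
        return "INSUFFICIENT_DATA"
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"


# Two consecutive 5-minute averages above 70% put this model into ALARM.
state = evaluate_alarm([65.0, 72.0, 81.0])
```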
CloudWatch Alarms Pricing
The cost of CloudWatch alarms depends on the type:
- Standard Resolution Alarms: $0.10 per alarm metric per month
- High Resolution Alarms: $0.30 per alarm metric per month
- Composite Alarms: $0.50 per alarm per month
CloudWatch Monitoring Best Practices
Effective CloudWatch monitoring requires a strategic approach:
- Set appropriate thresholds based on application behavior
- Use composite alarms for complex conditions
- Implement multi-metric alarms for correlated issues
- Configure meaningful actions (SNS notifications, Auto Scaling, etc.)
- Use anomaly detection for dynamic thresholds
- Monitor custom metrics alongside AWS metrics
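Composite alarms, mentioned in the list above, are defined by a rule expression that references other alarms. The sketch below assembles such an expression from alarm names. The helper `composite_rule` and the alarm name `High-Latency-Alarm` are hypothetical, and only the simple case of joining ALARM() terms with a single operator is covered.

```python
def composite_rule(alarm_names, operator="AND"):
    """Join alarm names into a composite alarm rule such as ALARM("a") AND ALARM("b")."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not alarm_names:
        raise ValueError("at least one alarm name is required")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


rule = composite_rule(["High-CPU-Alarm", "High-Latency-Alarm"])
# With boto3 available, the rule could be passed to put_composite_alarm:
#   boto3.client("cloudwatch").put_composite_alarm(
#       AlarmName="High-CPU-And-Latency", AlarmRule=rule)
```

Requiring both child alarms to fire is one way to cut down on notifications caused by a single noisy metric.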
FAQ
1. What are CloudWatch metrics? CloudWatch metrics are time-ordered data points published to CloudWatch by AWS services and your applications. Each metric represents a variable that changes over time (such as CPU utilization, network throughput, or error counts). AWS CloudWatch metrics provide visibility into resource performance, operational health, and overall AWS environment status.
2. What are the three states of a CloudWatch metric alarm? CloudWatch metric alarms have three possible states:
- OK - The metric is within the defined threshold
- ALARM - The metric has breached the defined threshold
- INSUFFICIENT_DATA - There's not enough data to evaluate the alarm state, or the metric isn't being reported
3. What are CloudWatch custom metrics? CloudWatch custom metrics are metrics that you define and publish to CloudWatch yourself, rather than metrics automatically collected by AWS services. Custom metrics allow you to monitor application-specific data points, business metrics, or any other variable important to your workloads. You can publish custom metrics using the AWS SDK, AWS CLI, or CloudWatch agent.
4. Are CloudWatch metrics free? Basic monitoring metrics that AWS services publish by default (for example, EC2 data points at 5-minute intervals) are available at no additional charge. However, there are costs associated with:
- Detailed monitoring (1-minute intervals): Starting at $0.30 per metric per month
- Custom metrics: $0.30 per metric per month
- High-resolution custom metrics (1-second intervals): $0.10 per 1,000 metrics
- API requests: First million API requests per month are free, then $0.01 per 1,000 API requests
- CloudWatch alarms: Starting at $0.10 per alarm metric per month
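As an illustration of the custom metrics described in question 3, the sketch below builds a single data point in the shape expected by the PutMetricData operation. The namespace `MyApp/Checkout`, the metric name, and the helper function are all hypothetical, and the snippet stops short of sending anything.

```python
from datetime import datetime, timezone


def build_custom_metric(name, value, unit="Count", dimensions=None, high_resolution=False):
    """Return one MetricData entry for a PutMetricData request."""
    datum = {
        "MetricName": name,
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
        # 1 = high resolution (1-second), 60 = standard resolution (1-minute).
        "StorageResolution": 1 if high_resolution else 60,
    }
    if dimensions:
        datum["Dimensions"] = [{"Name": k, "Value": v} for k, v in dimensions.items()]
    return datum


datum = build_custom_metric("OrdersPlaced", 3, dimensions={"Environment": "prod"})
# With boto3 available, the data point could be published as:
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="MyApp/Checkout", MetricData=[datum])
```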
Conclusion
AWS CloudWatch metrics are essential for monitoring the health and performance of your AWS resources. This reference guide provides a comprehensive list of the most important metrics for major AWS services, along with ways to retrieve them and understand their storage parameters and costs.
For the most up-to-date and complete information, always refer to the official AWS CloudWatch documentation.