Go Context timeouts can be harmful
You should probably avoid context.WithTimeout and context.WithDeadline in code that makes network calls. Here is why.
Using context for cancellation
Typically, context.Context is used to cancel operations like this:
package main
import (
	"context"
	"fmt"
	"time"
)
func main() {
	ctx := context.Background()
	ctx, cancel := context.WithTimeout(ctx, time.Second)
	defer cancel()
	// Block until the one-second timeout cancels the context.
	select {
	case <-ctx.Done():
		fmt.Println(ctx.Err()) // context deadline exceeded
		fmt.Println("cancelling...")
	}
}
You can then pass such a context to, for example, a Redis client:
import "github.com/redis/go-redis/v9"
rdb := redis.NewClient(...)
ctx := context.Background()
ctx, cancel := context.WithTimeout(ctx, 3*time.Second)
defer cancel()
val, err := rdb.Get(ctx, "redis-key").Result()
At first glance, the code above works fine. But what happens when the rdb.Get operation exceeds the timeout?
Context deadline exceeded
When the context is cancelled mid-operation, go-redis and most other database clients (including database/sql) must do the following:
- Close the connection, because it can't be safely reused.
- Open a new connection.
- Perform a TLS handshake on the new connection.
- Optionally, pass some authentication checks, for example, using the Redis AUTH command.
Effectively, your application stops using the connection pool. Each operation now requires a fresh TCP connection and TLS handshake, making it slower and increasing the chance of exceeding the timeout again. This creates a cascading failure: timeouts cause connection churn, which causes more timeouts, until the application is spending all its time opening and closing connections.
In distributed tracing terms, you will see a flood of short-lived spans all ending with context deadline exceeded errors, which makes it harder to find the real problem.
Strictly speaking, this problem is not specific to context.Context: small deadlines on net.Conn can cause similar issues. But because a context imposes a single deadline on every operation that uses it, each individual operation ends up with an unpredictable timeout that depends on how long the previous operations took.
What to do instead?
Your first option is to use fixed net.Conn deadlines:
var cn net.Conn
cn.SetDeadline(time.Now().Add(3 * time.Second))
With go-redis, you can use ReadTimeout and WriteTimeout options which control net.Conn deadlines:
rdb := redis.NewClient(&redis.Options{
	ReadTimeout:  3 * time.Second,
	WriteTimeout: 3 * time.Second,
})
Alternatively, you can use a separate context timeout for each operation so that one slow operation does not eat into the budget of the next:
ctx := context.Background()
ctx1, cancel1 := context.WithTimeout(ctx, time.Second)
defer cancel1()
op1(ctx1)
ctx2, cancel2 := context.WithTimeout(ctx, time.Second)
defer cancel2()
op2(ctx2)
You should also avoid timeouts smaller than 1 second, because they have the same problem. If you must meet an SLA no matter what, you can make sure to generate a response in time but let the operation continue in the background:
func handler(w http.ResponseWriter, req *http.Request) {
	// Process asynchronously in a goroutine.
	ch := process(req)
	select {
	case res := <-ch:
		// Success: send the result to the client.
		fmt.Fprint(w, res)
	case <-time.After(time.Second):
		// Unknown result: respond now while the operation keeps running.
	}
}
Since Go 1.21, you can also use context.AfterFunc to schedule cleanup when a context is cancelled without blocking the current goroutine:
ctx, cancel := context.WithTimeout(ctx, 3*time.Second)
defer cancel()
stop := context.AfterFunc(ctx, func() {
	log.Println("context cancelled, running cleanup")
})
defer stop()
Detaching from parent cancellation
Sometimes you need to run a background operation that should not be cancelled when the parent HTTP request finishes. Since Go 1.21, you can use context.WithoutCancel to create a context that inherits values but ignores cancellation:
func handler(w http.ResponseWriter, req *http.Request) {
	// This context won't be cancelled when the HTTP request ends,
	// but still carries trace context and other values.
	bgCtx := context.WithoutCancel(req.Context())
	go sendAnalytics(bgCtx, event)
}
This is particularly useful for fire-and-forget operations like sending analytics, writing audit logs, or updating caches after returning a response.