Introduction
In this post, I've tried to summarize the notes I took while reading through the code of the context package. I will
share the mental model that has worked well for me so far, as well as some code I wrote to validate my understanding.
Mental model
I think of the context package as an abstraction for working with cancellation trees. We can pass nodes from these
trees to functions in order to give them the ability to detect cancellation signals as they ripple through the branches.
The package exports a function called Background that can be used to create new root nodes, and a handful of other
functions, like WithCancel, WithTimeout, and WithDeadline, to derive new branches.
There is no magic involved though. The functions you pass your context to won't get stopped automatically by the
runtime. Instead, it works by having developers follow a convention: any function that receives a context can use its
Done method to access a read-only channel. The expectation is that once this channel is closed, the function should
stop.
There is one caveat here. The root node that we retrieve using context.Background is always going to return a nil
channel:
package context

type backgroundCtx struct{ emptyCtx }

func (emptyCtx) Done() <-chan struct{} {
	return nil
}
Hence, reading from a root node's Done channel is going to block forever. To be honest, I don't find the name
Background very intuitive. Whenever I review a pull request, I try to substitute it in my head with
UncancellableRootNode. By doing so, I find it easier to ask myself whether this function should be allowed to
potentially run forever, and whether it justifies the creation of a new cancellation tree, or if it's merely an
extension of some larger operation.
Propagation
Let's proceed by constructing a cancellation tree in its simplest form: a single straight line:
┌────────────┐
│    root    │
└────────────┘
       │
       ▼
┌────────────┐
│  nodeOne   │
└────────────┘
       │
       ▼
┌────────────┐
│  nodeTwo   │
└────────────┘
       │
       ▼
┌────────────┐
│ nodeThree  │
└────────────┘
To create the tree as illustrated above, we're going to use Background for the root node, and WithCancel for the
branches:
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// print waits for ctx to be cancelled, and gives up on its own after 2 seconds.
func print(ctx context.Context, wg *sync.WaitGroup, name string) {
	defer wg.Done()
	select {
	case <-time.After(2 * time.Second):
		fmt.Println(name, "timed out")
	case <-ctx.Done():
		fmt.Println(name, "canceled")
	}
}

func main() {
	root := context.Background()
	wg := sync.WaitGroup{}
	wg.Add(3)

	nodeOne, cancelOne := context.WithCancel(root)
	defer cancelOne()
	go print(nodeOne, &wg, "nodeOne")

	nodeTwo, cancelTwo := context.WithCancel(nodeOne)
	defer cancelTwo()
	go print(nodeTwo, &wg, "nodeTwo")

	nodeThree, cancelThree := context.WithCancel(nodeTwo)
	defer cancelThree()
	go print(nodeThree, &wg, "nodeThree")

	wg.Wait()
}
To create a new branch using WithCancel, we have to specify an existing node that we'd like to branch from. In
return, we get the newly created node and a function for cancelling it.
The code above also highlights the importance of deferring a call to cancel each node. In this run nobody cancels the
contexts, so every select statement ends up in the case where a message is received on the time.After channel rather
than on the context. The print functions return and, subsequently, so does main. A derived node holds on to its
resources, and may stay registered with its parent, until it is cancelled. So whenever an operation can complete
before its context is cancelled, we need to call cancel ourselves once it's done. Otherwise the node lingers for as
long as its parent is alive, which amounts to a memory leak.
If we run the program with the three print goroutines, we should see something like the following printed to our
terminal (the order of the lines may vary):
❯ go run .
nodeOne timed out
nodeTwo timed out
nodeThree timed out
Now, to observe how cancellations propagate, we can modify the code so that nodeTwo, which sits in the middle of the
tree, is cancelled after 1 second, that is, 1 second before the print functions would time out:
func main() {
	// ...
	nodeTwo, cancelTwo := context.WithCancel(nodeOne)
	time.AfterFunc(time.Second, cancelTwo) // This line was changed.
	go print(nodeTwo, &wg, "nodeTwo")
	// ...
}
Running the code again yields the following result:
❯ go run .
nodeThree canceled
nodeTwo canceled
nodeOne timed out
Here, we can see that cancelling a node will traverse the tree and cancel the node's children as well. The
cancellations only propagate down, never up.
Looking at the order of the printed messages, one might mistakenly assume that the context package traverses the tree
all the way down, and then performs the cancellations bottom-up, but this is not the case. In reality, if we examine
the code within the context package, we'll find that each cancellable context maintains an internal map of its
children:
children map[canceler]struct{}
and when we call cancel on a node, it's going to close its own channel first, and then perform a depth-first
traversal to cancel all of its descendants:
func (c *cancelCtx) cancel(removeFromParent bool, err, cause error) {
	// ...
	d, _ := c.done.Load().(chan struct{})
	if d == nil {
		c.done.Store(closedchan)
	} else {
		close(d) // NOTE: This is where this node's channel is closed.
	}
	for child := range c.children {
		child.cancel(false, err, cause) // NOTE: This is where it calls the same cancel function for all of its children.
	}
}
Seeing this, it might feel unintuitive that "nodeThree canceled" was printed before "nodeTwo canceled". However, the
channels for both nodes are closed almost simultaneously, probably within nanoseconds of each other.
The decision of which goroutine to wake up first is going to fall on the scheduler. Therefore, if we were to run the
program multiple times, we should be able to see the messages alternate:
❯ go run .
nodeTwo canceled
nodeThree canceled
nodeOne timed out
❯ go run .
nodeThree canceled
nodeTwo canceled
nodeOne timed out
The key takeaway here is that we shouldn't structure our programs in a way where we rely on our nodes to be cancelled in
a specific order. Regardless of whether a goroutine listens to a node at the top of the tree, or to another node 100
branches down, the order in which their cancellation logic gets to execute is going to be nondeterministic.
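To separate the two orderings, it can help to look at a toy model of the traversal described earlier. The node type
below is a deliberately simplified stand-in that I made up for illustration. It is not the package's implementation:
there is no locking, no error value, and no removal from the parent. It only records the order in which channels are
closed.

```go
package main

import "fmt"

// node is a simplified, hypothetical stand-in for a cancellable context.
type node struct {
	name     string
	done     chan struct{}
	children map[*node]struct{}
}

// newNode creates a node and registers it with its parent, if any.
func newNode(name string, parent *node) *node {
	n := &node{name: name, done: make(chan struct{}), children: map[*node]struct{}{}}
	if parent != nil {
		parent.children[n] = struct{}{}
	}
	return n
}

// cancel closes the node's own channel first and then recurses into its
// children, appending each name to order as its channel is closed.
func (n *node) cancel(order *[]string) {
	select {
	case <-n.done:
		return // Already cancelled.
	default:
	}
	close(n.done)
	*order = append(*order, n.name)
	for child := range n.children {
		child.cancel(order)
	}
}

func main() {
	one := newNode("nodeOne", nil)
	two := newNode("nodeTwo", one)
	newNode("nodeThree", two)

	var order []string
	two.cancel(&order)
	fmt.Println(order) // [nodeTwo nodeThree]
}
```

In this model the channels are closed top-down within a single call. The order in which goroutines waiting on those
channels get to run is a separate matter, and that is the part we can't rely on.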
More branching
So far, we've used Background to create a new tree, and WithCancel for our branches.
We've observed that the channels from nodes created with WithCancel close only when we explicitly invoke the cancel
function, or when a signal propagates from one of their ancestors.
In addition to this, there is a third set of nodes that can be created using the WithTimeout and WithDeadline
functions. These nodes have, in addition to the cancel function and the signal from their ancestors, a third,
time-based mechanism for closing their channels.
And although these functions have different names, the nodes they create are functionally equivalent:
func WithTimeout(parent Context, timeout time.Duration) (Context, CancelFunc) {
	return WithDeadline(parent, time.Now().Add(timeout))
}
Your choice between them depends solely on whether you want to specify the self-cancellation timing using a
time.Duration or a time.Time.
Let us proceed by modifying nodeOne to cancel itself after 100 milliseconds like this:
func main() {
	// ...
	nodeOne, cancelOne := context.WithTimeout(root, 100*time.Millisecond)
	defer cancelOne()
	go print(nodeOne, &wg, "nodeOne")
	// ...
}
Running the code again, we should see the cancellation message being printed for all of our nodes:
❯ go run .
nodeOne canceled
nodeTwo canceled
nodeThree canceled
Having this ability to create branches based on time is really powerful. It allows us to build a cancellation tree based
on priority, and distribute the nodes across different functions.
Let's use a search endpoint as an example. The entire search operation could be divided further into multiple
sub-operations: one for performing a text search, another for images, a third based on location, and so on.
If we deem the image search to be a less critical feature, we could assign it a node with a shorter timeout. By doing
so, we're able to restrict its ability to affect the response time of the search operation as a whole.
Ending notes
Making the context.Context abstraction part of the standard library was a really wise decision by the Go team.
Having cancellations propagate to release resources at scale can be notoriously difficult to achieve. Often, it's under
high load or, unfortunately, during an incident, that we realize that an expensive operation wasn't terminated in time.
I also appreciate how the standard library usually handles the creation of the more complex cancellation trees for
us. For example, consider this basic HTTP server I've set up to mirror the search scenario we discussed earlier:
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	fmt.Println("Starting server on :8080")
	http.HandleFunc("/search", searchHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}

// searchHandler orchestrates the search operation, initiating two
// parallel sub-operations: one for text and another for images.
func searchHandler(w http.ResponseWriter, r *http.Request) {
	query := r.URL.Query().Get("query")

	// We don't have to create a cancellation tree ourselves, instead we're able to
	// add branches to an existing one that the standard library has created for us.
	ctx := r.Context()

	// Here, we're adding one branch to the tree that is cancelled in
	// 2 seconds. We'll use this node when performing the text search.
	highPriorityBranch, highPriorityCancel := context.WithTimeout(ctx, 2*time.Second)
	defer highPriorityCancel()
	go performSearch(highPriorityBranch, "text", query)

	// We consider the image search more of a nice-to-have, therefore we'll cancel this
	// branch 1 second earlier to reduce its ability to affect the overall response time.
	lowPrioBranch, lowPrioCancel := context.WithTimeout(ctx, time.Second)
	defer lowPrioCancel()
	go performSearch(lowPrioBranch, "image", query)

	// Sleep to allow the timeouts to trigger.
	time.Sleep(2 * time.Second)
	w.Write([]byte("Search completed"))
}

// performSearch captures the current time, waits for the context
// to cancel, and then prints the elapsed waiting time.
func performSearch(ctx context.Context, operation, query string) {
	before := time.Now()
	<-ctx.Done()
	duration := time.Since(before)
	fmt.Printf("Cancelling the %s search for %s after %s\n", operation, query, duration)
}
If we were to start this server:
❯ go run .
Starting server on :8080
and curl it from a separate terminal window:
❯ curl "http://localhost:8080/search?query=go"
We can see that the image search was cancelled 1 second before the text search:
Cancelling the image search for go after 1.001227208s
Cancelling the text search for go after 2.001186333s
The important part here is that we didn't construct the cancellation tree ourselves. Instead, we added two branches to
the node that the standard library attached to the request. To see why this is beneficial, we'll make another request
and immediately cancel it by hitting CTRL + C on our keyboard. As a result, we should be able to see the following:
Cancelling the image search for go after 185.223375ms
Cancelling the text search for go after 185.257167ms
As you can see, we were able to release all of our resources as soon as the client chose to close the connection. This
works because the server generates a node for each request, and if the client closes the connection prematurely, it
cancels this node. So, by making this node our ancestor, we ensure that the cancellation signal reaches our branches
too.
This concludes the post. I hope you've enjoyed it!