tracing tokio resources

by Hayden Stainsby on Wednesday, 7 August 2024

Usually, I like to write posts about things I understand. Even if this means that I have to go and understand something first.

This post is going to be a bit different. Instead, we're going to follow along as I try to understand something.

One of my favourite topics to write about is the tracing instrumentation in Tokio and how it can be used to better understand your asynchronous code. I've previously written a bit about the tracing instrumentation for tasks: (tracing tokio tasks, tokio waker instrumentation, also mentioned in debugging tokio instrumentation and scheduled time). But there is a whole other side to the tracing in tokio which instruments resources.

A resource in this context is an async primitive provided by Tokio which can be used as a building block. Timers are resources, as are most of the types in the tokio::sync module, such as mutexes, semaphores, and channels. (Important caveat: as of today, the only channel that is instrumented is the oneshot channel).

I've spent a bit of time working with the tracing instrumentation in Tokio, but I will admit that I've never really understood the structure of the resource instrumentation - it's significantly more complicated than the structure of the task instrumentation. So let's jump into whatever sources we can find to help us understand how resources are instrumented.

The code is a good starting point. The instrumentation in Tokio was built specifically for (and at the same time as) Tokio Console, so it makes sense to look in both places. All the links to actual code will be based on the following versions:

However, the code can be a bit overwhelming. So let's look at the PRs when this instrumentation was added.

The first pair of PRs are when tokio::time::Sleep was instrumented and the code to read resource instrumentation was added to the console-subscriber:

The next pair are when further resources (all from tokio::sync) were instrumented and when the resource detail view was added to Tokio Console:

After these PRs, no further resources were instrumented. Interestingly (but perhaps not surprisingly in the world of Open Source Software), they were all written by the one person: Zahari Dichev. Unfortunately, Zahari isn't active in the Tokio project these days, but if I could get a hold of him, I would love to pick his brain about this work.

I checked with Eliza about whether there was any more written up about the strategy behind resource instrumentation, but unfortunately there isn't. This wasn't necessarily a given, because there is a fantastic write-up by Felix S Klock II which describes much of the vision that Tokio Console (and hence the instrumentation in Tokio itself) hopes to achieve: Road to TurboWish (part 1, part 2, part 3).

Now that we know that there isn't any easy source for this information, let's go through the code and make use of my ari-subscriber crate to view the tracing output directly.

a note on tone

As I mentioned above, this post was written while I was investigating and digging into how resource instrumentation works. There are some parts where I question the way things are implemented. However, I mean no disrespect to anyone involved in the implementation.

The instrumentation for resources and the visualisation for that instrumentation went into Tokio and Tokio Console over a small number of PRs; most of it is in the 4 PRs I linked earlier. This feature is incredibly complete given that fact, and it's been running and doing its thing (in tokio_unstable) since then. With the benefit of looking back on this a number of years later, there might be things that I would change now, but it's not fair to expect anyone to have done differently at the time.

A huge thanks to Zahari who wrote most of this, as well as Eliza, Alice, Sean, Carl, and Felix who reviewed. We have an incredible foundation from that work alone!

oneshot resource

Let's begin with an example. It was harder to pick an example than I anticipated. It turns out that many resources operate via a "sub-resource", which makes things more confusing on a first read-through. For this reason, I went with the tokio::sync::oneshot channel - it's a fairly simple example, even if the code is a little longer than we might wish.

use tracing::debug;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    use tracing_subscriber::prelude::*;
    let layer = ari_subscriber::Layer::new();
    tracing_subscriber::registry().with(layer).init();

    spawn_named("tokio-oneshot", tokio_oneshot()).await.unwrap();
}

async fn tokio_oneshot() {
    debug!("creating oneshot");
    let (tx, rx) = tokio::sync::oneshot::channel::<u64>();
    debug!("oneshot created");

    let jh = spawn_named("receiver", async move {
        debug!("awaiting message");
        let msg = rx.await.unwrap();
        debug!(msg, "message received");
    });

    spawn_named("sender", async move {
        debug!("sending message");
        tx.send(5).unwrap();
        debug!("message sent");
    });

    debug!(?jh, "awaiting receiver task");
    jh.await.unwrap();
    debug!("good-bye");
}

// `spawn_named` isn't part of Tokio; a plausible definition (assuming the
// tokio_unstable cfg, which the task::Builder API requires) looks like this:
fn spawn_named<Fut>(name: &str, future: Fut) -> tokio::task::JoinHandle<Fut::Output>
where
    Fut: std::future::Future + Send + 'static,
    Fut::Output: Send + 'static,
{
    tokio::task::Builder::new()
        .name(name)
        .spawn(future)
        .expect("failed to spawn task")
}

Our application is async and starts up in a current_thread Tokio runtime. It's easier to see what is happening when we restrict everything interesting we do in the application to a single thread.

We start by setting up the ari-subscriber layer. The output shown in this post is from a slightly modified version. I hope to make some of these changes optional features of the published crate in the future; for now, you can use them from the ari-subscriber tracing-tokio-tasks branch. It even includes the above code in the examples directory in tokio-oneshot.rs.

We then spawn a task to run what would otherwise be the rest of our main function, but has been moved to its own function tokio_oneshot(). I've learnt to do this from previous posts, because this way we can easily see when this "main" task yields to the runtime.

In tokio_oneshot() we have 4 sections.

  1. Create oneshot channel
  2. Spawn receiver task which awaits the message from the receiver half of the channel
  3. Spawn sender task that sends a message via the sender half of the channel
  4. Await the receiver task and exit

Of course, this is an async application, so things won't necessarily run in that order. We wrap most of the interesting things we're doing in debug! events to make the important steps easier to separate. Notice that we only await the receiver task; this is because the sender task doesn't yield to the runtime until it completes, so it will always finish before the receiver task (which is awaiting the message it sends).

Before we look at the traces generated by this code, let's try and work out what kinds of spans and events we're going to find.

instrumentation types

In previous posts, we focused on only the spans and events related to tasks and wakers, for which we had one type of span (for tasks) and one type of event (for wakers) respectively. For resources, there are a lot more.

Reading through the first of the 2 console PRs I linked above (tokio-rs/console#77), we can find all the span names and event targets for the different instrumentation that we're interested in. Those are in lib.rs in the implementation of Layer::register_callsite().

Fortunately, we already have an enumeration of these instrumentation types and what they correspond to built right into ari-subscriber. The blog post introducing it lists the trace types. To save you following that link and because we're going to be looking at all of these in a moment, I'll repeat that list here:

  • runtime.spawn spans (green) instrument tasks
  • runtime.resource spans (red) instrument resources
  • runtime.resource.async_op spans (blue) instrument async operations
  • runtime.resource.async_op.poll spans (yellow) instrument the individual polls on async operations
  • tokio::task::waker events (purple) represent discrete waker operations
  • runtime::resource::poll_op events (orange) represent poll state changes
  • runtime::resource::state_update events (pink) represent resource state changes
  • runtime::resource::async_op::state_update events (turquoise) represent async operation state changes

We'll now spend the rest of the post getting a bit more familiar with most of these instrumentation types.

channel creation

Let's start by looking at what instrumentation is produced when we create the oneshot channel. That corresponds to the code snippet below.

debug!("creating oneshot");
let (tx, rx) = tokio::sync::oneshot::channel::<u64>();
debug!("oneshot created");

This code produces the following traces, output by ari-subscriber.

2024-07-31T15:35:48.586108Z TRACE 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2}  new
2024-07-31T15:35:48.586396Z TRACE 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2}  enter
2024-07-31T15:35:48.586456Z DEBUG 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2} 
      tokio_oneshot: creating oneshot
2024-07-31T15:35:48.586519Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  new
2024-07-31T15:35:48.586592Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  enter
2024-07-31T15:35:48.586645Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime::resource::state_update: tx_dropped=false, tx_dropped.op="override"
2024-07-31T15:35:48.586707Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  exit
2024-07-31T15:35:48.586758Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  enter
2024-07-31T15:35:48.586808Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime::resource::state_update: rx_dropped=false, rx_dropped.op="override"
2024-07-31T15:35:48.586867Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  exit
2024-07-31T15:35:48.586914Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  enter
2024-07-31T15:35:48.586964Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime::resource::state_update: value_sent=false, value_sent.op="override"
2024-07-31T15:35:48.587023Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  exit
2024-07-31T15:35:48.587073Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  enter
2024-07-31T15:35:48.587121Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime::resource::state_update: value_received=false, value_received.op="override"
2024-07-31T15:35:48.587179Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  exit
2024-07-31T15:35:48.587229Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  enter
2024-07-31T15:35:48.587279Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"}  new
2024-07-31T15:35:48.587360Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  exit
2024-07-31T15:35:48.587408Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"}  enter
2024-07-31T15:35:48.587463Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"} 
        runtime.resource.async_op.poll[4]{}  new
2024-07-31T15:35:48.587539Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"}  exit
2024-07-31T15:35:48.587592Z DEBUG 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2} 
      tokio_oneshot: oneshot created

Oh wow! That's already a lot of traces. This may look a bit overwhelming, but we'll go through it bit by bit. The rest of it won't be so bad (we hope!).

Right at the beginning, we create a new task. This is something we've seen in previous posts. This is the task that was spawned from main() using the async function tokio_oneshot(). Let's look at those traces, up to our first debug message.

2024-07-31T15:35:48.586108Z TRACE 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2}  new
2024-07-31T15:35:48.586396Z TRACE 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2}  enter
2024-07-31T15:35:48.586456Z DEBUG 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2} 
      tokio_oneshot: creating oneshot

That's not so bad. We create a new task, it gets polled for the first time, and then we emit the debug message (within the scope of the task, so it has the runtime.spawn span as its parent).

Now, let's look at the next little bit.

2024-07-31T15:35:48.586519Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  new
2024-07-31T15:35:48.586592Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  enter
2024-07-31T15:35:48.586645Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime::resource::state_update: tx_dropped=false, tx_dropped.op="override"
2024-07-31T15:35:48.586707Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  exit

Here we create a runtime.resource span. From our list above, we know that this span represents a resource. The kind is Sync and the concrete type is specified as Sender|Receiver. This doesn't really seem concrete enough to me: what if there are other resources that are concretely a "sender & receiver"? But never mind; we know that this must represent our oneshot channel, so that's good enough for us.

We then enter the new runtime.resource span and emit a runtime::resource::state_update event. From our list, we know that this event represents a state change in the resource. This event has 2 fields:

  • tx_dropped=false
  • tx_dropped.op=override

This seems to mean that we want to override the value of the state tx_dropped in our resource by setting it to false. We can intuit that this refers to the sender: tx is a common abbreviation for signal transmission, which is to say, "sending"; in fact, we also used tx as the variable name for the sender half of our oneshot channel.

After emitting this event, we exit the runtime.resource span again.

Now that this all makes sense, we can look back at the larger snippet of traces that we started with. We see the same "enter span - emit event - exit span" pattern repeat another three times. Each time, we're overriding the value of a different bit of state. The other 3 are:

  • rx_dropped=false
  • value_sent=false
  • value_received=false
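Taken together, a subscriber receiving these events could fold each `<field>` value and its companion `<field>.op` into a per-resource state map. Here's a toy sketch of that idea (purely illustrative, not the Console Subscriber's actual code; booleans are shown as 0/1, and the "add"/"sub" ops are the ones Tokio's instrumentation uses for numeric state like semaphore permits):

```rust
use std::collections::HashMap;

// Fold one state_update field into the resource's state map.
fn apply_update(state: &mut HashMap<String, i64>, field: &str, value: i64, op: &str) {
    let entry = state.entry(field.to_string()).or_insert(0);
    match op {
        "override" => *entry = value, // replace the stored value
        "add" => *entry += value,     // increment (e.g. semaphore permits)
        "sub" => *entry -= value,     // decrement
        _ => {}                       // unknown op: ignore
    }
}

fn main() {
    let mut state = HashMap::new();
    // the four updates emitted when the oneshot channel is created
    apply_update(&mut state, "tx_dropped", 0, "override");
    apply_update(&mut state, "rx_dropped", 0, "override");
    apply_update(&mut state, "value_sent", 0, "override");
    apply_update(&mut state, "value_received", 0, "override");
    assert_eq!(state["value_sent"], 0);
    // later, the sender overrides value_sent to true
    apply_update(&mut state, "value_sent", 1, "override");
    assert_eq!(state["value_sent"], 1);
}
```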

This also makes sense. These values are a starting state for the channel. What is a little strange is that we enter and exit the spans each time. The code that produces these traces was introduced in the second of the 2 Tokio PRs I mentioned at the beginning of the post (tokio-rs/tokio#4302). Let's look at the code today; it hasn't changed and can be found in oneshot.rs:485-515.

resource_span.in_scope(|| {
    tracing::trace!(
        target: "runtime::resource::state_update",
        tx_dropped = false,
        tx_dropped.op = "override",
    )
});

resource_span.in_scope(|| {
    tracing::trace!(
        target: "runtime::resource::state_update",
        rx_dropped = false,
        rx_dropped.op = "override",
    )
});

resource_span.in_scope(|| {
    tracing::trace!(
        target: "runtime::resource::state_update",
        value_sent = false,
        value_sent.op = "override",
    )
});

resource_span.in_scope(|| {
    tracing::trace!(
        target: "runtime::resource::state_update",
        value_received = false,
        value_received.op = "override",
    )
});

Now we see what's happening: the runtime.resource span is entered separately for each event. The reason it's entered is that the Console Subscriber uses the current context (which spans are active) to determine which runtime.resource span a runtime::resource::state_update event should be updating. I'm not sure why the span was entered separately for each event; I believe that this could be rewritten to enter the runtime.resource span just once and then emit all the events - but I'd have to test to be sure.
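As a toy model of that context-based parent lookup (purely illustrative, not tracing's actual implementation): the subscriber keeps a per-thread stack of entered spans, and an event's implicit parent is whatever span is on top of the stack at the moment the event is emitted.

```rust
use std::cell::RefCell;

thread_local! {
    // Per-thread stack of entered span IDs, like tracing's span context.
    static SPAN_STACK: RefCell<Vec<u64>> = RefCell::new(Vec::new());
}

fn enter(span_id: u64) {
    SPAN_STACK.with(|s| s.borrow_mut().push(span_id));
}

fn exit() {
    SPAN_STACK.with(|s| {
        s.borrow_mut().pop();
    });
}

// An event's implicit parent is the innermost entered span at emit time.
fn current_parent() -> Option<u64> {
    SPAN_STACK.with(|s| s.borrow().last().copied())
}

fn main() {
    assert_eq!(current_parent(), None);
    enter(2); // enter runtime.resource[2]
    // a state_update emitted here would be attributed to span 2
    assert_eq!(current_parent(), Some(2));
    exit(); // exit runtime.resource[2]
    assert_eq!(current_parent(), None);
}
```

This is why each state_update must be wrapped in `in_scope`: without the enter, the stack is empty and the event has no resource parent.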

If the Console Subscriber were modified to use the traditional tracing span parent lookup, which is to first check for an explicit parent and only if one isn't set to check the context, then we could use the parent: directive in the events and we wouldn't need to enter the runtime.resource span at all. This was discussed on the PR where resource instrumentation was first introduced into Tokio (tokio-rs/tokio#4072), but wasn't changed. Again, I'm not entirely sure why.

If we did that, then each of the runtime::resource::state_update events would look like this:

tracing::trace!(
    target: "runtime::resource::state_update",
    parent: resource_span,
    tx_dropped = false,
    tx_dropped.op = "override",
);

Maybe this is something we can look at in the future, but it would break older versions of the Console Subscriber (including the one that is the latest at the time of writing). It would avoid all the entering and exiting spans though.

After all the state updates (which I won't repeat all the traces for), the runtime.resource.async_op and runtime.resource.async_op.poll spans are created.

2024-07-31T15:35:48.587229Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  enter
2024-07-31T15:35:48.587279Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"}  new
2024-07-31T15:35:48.587360Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  exit
2024-07-31T15:35:48.587408Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"}  enter
2024-07-31T15:35:48.587463Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"} 
        runtime.resource.async_op.poll[4]{}  new
2024-07-31T15:35:48.587539Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"}  exit

Here the runtime.resource span is entered again to create a runtime.resource.async_op span. In turn, the runtime.resource.async_op span is entered to create a runtime.resource.async_op.poll span. Once again, the reason that the parent span is entered before creating the next span is because the console subscriber depends on the current active span state to determine the parent of each of these spans.

Finally, we see our own debug message stating that the oneshot channel has been created.

2024-07-31T15:35:48.587592Z DEBUG 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2} 
      tokio_oneshot: oneshot created

spawning tasks

The next thing that we do is spawn the receiver and the sender tasks. If we take out the contents of the async blocks we pass to those tasks, that code looks like the following.

let jh = spawn_named("receiver", async move { .. });

spawn_named("sender", async move { .. });

debug!(?jh, "awaiting receiver task");
jh.await.unwrap();

Remember that while spawned tasks don't have to be awaited to start running, we're using a current_thread Tokio runtime, so until our tokio_oneshot task yields to the runtime, those new tasks won't get a chance to run.

The next lot of traces correspond to those lines of code.

2024-07-31T15:35:48.587650Z TRACE 
    runtime.spawn[5]{kind=task, task.name=receiver, task.id=3}  new
2024-07-31T15:35:48.587730Z TRACE 
    runtime.spawn[6]{kind=task, task.name=sender, task.id=4}  new
2024-07-31T15:35:48.587807Z DEBUG 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2} 
      tokio_oneshot: jh=JoinHandle { id: Id(3) } awaiting receiver task
2024-07-31T15:35:48.587876Z TRACE 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2} 
      tokio::task::waker: op="waker.clone", task.id=1
2024-07-31T15:35:48.587940Z TRACE 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2}  exit

We see that 2 new runtime.spawn spans are created, each one representing one of the tasks we're spawning. The first is the receiver task, the second is the sender task. As mentioned previously, we only store the join handle jh for the receiver task.

We've got a debug line in there before we await the join handle, and we're outputting the Debug value of that join handle as well. We can see that it's the join handle for Tokio task Id 3. Looking at the task.id field in the runtime.spawn spans, we can see that 3 corresponds to the receiver task, as expected.

We then see the waker for task.id=1 (for a waker event, this is the runtime.spawn span ID). That span ID corresponds to the task we're currently in, the tokio_oneshot task. Then that same runtime.spawn span exits - the task it represents yields to the runtime. For more details on how to read waker events, you can read the post tokio waker instrumentation.

join handles are resources too

Not all resources in Tokio are instrumented. One of the ones that isn't is the JoinHandle. While you may not think of a join handle as a resource, it really is. It allows one task to synchronise on when another task ends, as well as transporting the return value of the future that drives that task.

In the future, instrumenting the join handle would allow us to follow the dependence of one task on the completion of another!

oneshot receiver task

Let's look at the code in the receiver task.

debug!("awaiting message");
let msg = rx.await.unwrap();
debug!(msg, "message received");

It's very short. We emit a debug message, await on the oneshot receiver and when we eventually get a message we emit another debug message with the contents of the message.

We expect (although there may not be a guarantee) that the receiver task will be polled first, and the traces show that this is indeed the case. Let's see how far we get with our receiver task.

2024-07-31T15:35:48.587997Z TRACE 
    runtime.spawn[5]{kind=task, task.name=receiver, task.id=3}  enter
2024-07-31T15:35:48.588051Z DEBUG 
    runtime.spawn[5]{kind=task, task.name=receiver, task.id=3} 
      tokio_oneshot: awaiting message
2024-07-31T15:35:48.588105Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  enter
2024-07-31T15:35:48.588166Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"}  enter
2024-07-31T15:35:48.588219Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"} 
        runtime.resource.async_op.poll[4]{}  enter
2024-07-31T15:35:48.588278Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"} 
        runtime.resource.async_op.poll[4]{} 
          tokio::task::waker: op="waker.clone", task.id=5
2024-07-31T15:35:48.588346Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"} 
        runtime.resource.async_op.poll[4]{} 
          runtime::resource::poll_op: op_name="poll_recv", is_ready=false
2024-07-31T15:35:48.588415Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"} 
        runtime.resource.async_op.poll[4]{}  exit
2024-07-31T15:35:48.588472Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"}  exit
2024-07-31T15:35:48.588524Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  exit
2024-07-31T15:35:48.588571Z TRACE 
    runtime.spawn[5]{kind=task, task.name=receiver, task.id=3}  exit

In these traces we can see that we enter the runtime.spawn span for our receiver task and emit the debug event "awaiting message".

Then we enter each of the 3 nested spans, which were created earlier, in turn:

  • runtime.resource
  • runtime.resource.async_op source=Receiver::await
  • runtime.resource.async_op.poll

Once we're inside our span hierarchy 3 levels deep, we clone the waker for the current task (the receiver task). Afterwards we see a new instrumentation event runtime::resource::poll_op. This event comes with 2 fields: op_name=poll_recv and is_ready=false. This event is emitted after poll_recv has been called on the oneshot channel's internal state and indicates that it has returned Poll::Pending and so it isn't ready yet.

After that our 3 levels of spans exit in the opposite order to which they entered. As a small aside: this enter-exit stack behaviour isn't actually guaranteed by tracing, the spans could have exited in any order.

Since our task has awaited on the oneshot receiver, it will also yield back to the runtime, and the runtime.spawn span exits, which is where this snippet of traces ends.

async_op vs. async_op.poll

One question you may be asking yourself at this point is what the difference is between a runtime.resource.async_op span and a runtime.resource.async_op.poll span.

I'm still not entirely sure, but from reading through comments in the 2 Tokio PRs, it seems that a runtime.resource.async_op span represents a future and its lifetime, while a runtime.resource.async_op.poll span represents only the poll operations on that future.

This makes some sense, but I'm not sure why we need the extra runtime.resource.async_op.poll span. A runtime.spawn span represents the future that is the basis for a task; I would think that runtime.resource.async_op could be the same. Currently the runtime.resource.async_op span is entered more times than the future is polled, but that is mostly so that specific events can be emitted within the right scope. I wonder whether we could use explicit parents to link those events to the necessary parent and then reserve entering the runtime.resource.async_op span for when the future is polled. We could set it up so that the state update belongs to an async op (ready=true|false) and simplify the instrumentation somewhat.

Having a separate runtime.resource.async_op.poll span may still be necessary if we think that a future could be moved between tasks between polls (although I'm not sure how this could be managed; it's certainly fraught with problems). But currently there isn't a new runtime.resource.async_op.poll span for each poll anyway; there is a one-to-one relationship between runtime.resource.async_op and runtime.resource.async_op.poll spans.

oneshot sender task

The sender task also has only a small bit of code.

debug!("sending message");
tx.send(5).unwrap();
debug!("message sent");

Here we emit an initial debug message "sending message", then we send our message 5 via the oneshot sender that we have. We unwrap the result as we expect it to succeed (and would really rather just fail than gracefully handle something unexpected in demo code). Finally, we emit another debug message "message sent".

An important point to note here is that the send method on the oneshot sender isn't async. It doesn't block either. It will either send the first and only message through the channel, or it will fail.
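To make "not async and doesn't block" concrete, here's a toy one-slot channel (a sketch only; Tokio's real oneshot is lock-free and also wakes the receiver's waker on send, as we'll see in the traces): send completes synchronously and can succeed at most once.

```rust
use std::sync::{Arc, Mutex};

// Toy one-slot channel state: the value slot plus a "sent" flag so a
// second send fails even after the value has been taken.
struct Inner<T> {
    slot: Option<T>,
    sent: bool,
}

struct ToySender<T>(Arc<Mutex<Inner<T>>>);
struct ToyReceiver<T>(Arc<Mutex<Inner<T>>>);

fn toy_oneshot<T>() -> (ToySender<T>, ToyReceiver<T>) {
    let inner = Arc::new(Mutex::new(Inner { slot: None, sent: false }));
    (ToySender(inner.clone()), ToyReceiver(inner))
}

impl<T> ToySender<T> {
    // Like tokio's Sender::send: not async, never blocks; it either
    // stores the one value or hands it back as an error.
    fn send(&self, value: T) -> Result<(), T> {
        let mut inner = self.0.lock().unwrap();
        if inner.sent {
            return Err(value);
        }
        inner.slot = Some(value);
        inner.sent = true;
        Ok(())
    }
}

impl<T> ToyReceiver<T> {
    fn try_recv(&self) -> Option<T> {
        self.0.lock().unwrap().slot.take()
    }
}

fn main() {
    let (tx, rx) = toy_oneshot();
    assert!(tx.send(5).is_ok());    // completes synchronously, no .await
    assert_eq!(tx.send(7), Err(7)); // only one value can ever be sent
    assert_eq!(rx.try_recv(), Some(5));
}
```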

Let's have a look at the traces for this part of the code.

2024-07-31T15:35:48.588625Z TRACE 
    runtime.spawn[6]{kind=task, task.name=sender, task.id=4}  enter
2024-07-31T15:35:48.588674Z DEBUG 
    runtime.spawn[6]{kind=task, task.name=sender, task.id=4} 
      tokio_oneshot: sending message
2024-07-31T15:35:48.588726Z TRACE 
    runtime.spawn[6]{kind=task, task.name=sender, task.id=4} 
      tokio::task::waker: op="waker.wake_by_ref", task.id=5
2024-07-31T15:35:48.588788Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  enter
2024-07-31T15:35:48.588836Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime::resource::state_update: value_sent=true, value_sent.op="override"
2024-07-31T15:35:48.588893Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  exit
2024-07-31T15:35:48.588941Z DEBUG 
    runtime.spawn[6]{kind=task, task.name=sender, task.id=4} 
      tokio_oneshot: message sent
2024-07-31T15:35:48.588991Z TRACE 
    runtime.spawn[6]{kind=task, task.name=sender, task.id=4}  exit

This snippet is a bit shorter than for the first time the receiver was polled. Let's go through it.

We poll the sender task, which we see as entering the corresponding runtime.spawn span. Then we see our first debug message. Interestingly, we then wake the waker for the receiver task (we know which one it is because the task.id field matches the span ID of the receiver task's runtime.spawn span). It's only after the waker is woken that we enter the runtime.resource span and update the oneshot channel's resource state with a runtime::resource::state_update event. We set value_sent=true, so we know that by this point, the value has been sent (it is set in the channel's internal state).

I think that perhaps this could be reworked to enter the runtime.resource span for the duration of the send function, that way we would be able to link waking the waker to not just this task (the sender task), but also to this specific runtime.resource which represents the oneshot channel.

Another interesting thing is that the waker is woken by reference. Normally when a resource clones a waker it will wake it by value (which consumes the waker in the process). I was wondering why this isn't the case here, so I went to where the waker is woken by reference in the code (oneshot.rs:1133) and found this:

// TODO: Consume waker?

That line was last modified in 2019 (tokio-rs/tokio#1120). This is fine though: when the channel is dropped, the waker will be dropped as well.

Finally, we see our second debug message "message sent" and then the runtime.spawn span exits, indicating that the task has yielded to the runtime. Since we have no .await points in this task, we know that the task must have returned Poll::Ready, which we can confirm in the next 3 traces.

2024-07-31T15:35:48.589041Z TRACE 
    runtime.spawn[6]{kind=task, task.name=sender, task.id=4}  enter
2024-07-31T15:35:48.589088Z TRACE 
    runtime.spawn[6]{kind=task, task.name=sender, task.id=4}  exit
2024-07-31T15:35:48.589133Z TRACE 
    runtime.spawn[6]{kind=task, task.name=sender, task.id=4}  close

Here we see the runtime.spawn span quickly enter and exit, before finally closing.

enter-exit dance

This last enter-exit pair before a runtime.spawn span closes doesn't represent a poll; it is performed by the Instrumented struct when it gets dropped. It enters the span that is instrumenting the future prior to dropping it.

This behaviour was introduced in tokio-rs/tracing#2541 specifically as a way to have the instrumenting span in context for the drop operation (it was then fixed, because the original implementation caused an unintended breaking change which is now on Predrag's radar, but no cargo-semver-checks lint has been implemented yet). Unfortunately this also means that we have to take into consideration that a runtime.spawn span will always have 1 extra enter-exit pair than actual polls. The only alternative would be to vendor the Instrumented struct without the drop behaviour, which seems excessive.
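A toy model of that drop behaviour makes the off-by-one visible (a sketch only; the real Instrumented lives in the tracing crate and wraps an actual future): if we count span entries, the Drop implementation always contributes one entry beyond the polls.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Stand-in for tracing's Instrumented: it enters its span on every poll,
// and once more when dropped, so the future's drop runs inside the span.
struct Instrumented {
    span_enters: Rc<Cell<u32>>,
}

impl Instrumented {
    fn poll(&self) {
        // each poll enters (and exits) the instrumenting span
        self.span_enters.set(self.span_enters.get() + 1);
    }
}

impl Drop for Instrumented {
    fn drop(&mut self) {
        // the drop glue also enters the span, producing the extra
        // enter-exit pair seen in the traces above
        self.span_enters.set(self.span_enters.get() + 1);
    }
}

fn main() {
    let enters = Rc::new(Cell::new(0));
    let task = Instrumented { span_enters: enters.clone() };
    task.poll();
    task.poll();
    drop(task);
    // 2 polls + 1 drop-enter: one more enter-exit pair than polls
    assert_eq!(enters.get(), 3);
}
```

So when counting polls from runtime.spawn spans, subtract one enter-exit pair for the drop.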

Now that the sender task has completed, we can go back to our receiver task which should now get polled again!

receiving the message

As we were hoping, now that the sender has sent a message through the oneshot channel, the receiver gets polled.

The next swath of traces is a bit long, so let's just look up until the runtime.resource.async_op span enters and exits again.

 runtime.spawn[5]{kind=task, task.name=receiver, task.id=3}  enter
2024-07-31T15:35:48.589248Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  enter
2024-07-31T15:35:48.589294Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"}  enter
2024-07-31T15:35:48.589345Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"} 
        runtime.resource.async_op.poll[4]{}  enter
2024-07-31T15:35:48.589402Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"} 
        runtime.resource.async_op.poll[4]{} 
          runtime::resource::poll_op: op_name="poll_recv", is_ready=true
2024-07-31T15:35:48.589470Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"} 
        runtime.resource.async_op.poll[4]{} 
          tokio::task::waker: op="waker.drop", task.id=5
2024-07-31T15:35:48.589539Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"} 
        runtime.resource.async_op.poll[4]{}  exit
2024-07-31T15:35:48.589597Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"}  exit

Here we see the runtime.spawn span representing the receiver task enter, which is expected - it just got polled again after having been woken by the send operation.

Then successively the runtime.resource, runtime.resource.async_op, and runtime.resource.async_op.poll spans are all entered. This indicates that the receiver is being polled. We see a runtime::resource::poll_op event indicating that the leaf future was polled via poll_recv and that it returned Poll::Ready (because that event has the field is_ready=true). So it looks like we've successfully got a result. Note that at this point, we don't actually know whether we have a value or whether an error will be returned.

Following the completion of the poll operation, we see that the waker is dropped. This is expected as the sender half of the oneshot channel woke the receiver task by reference, not by value, which we saw in the oneshot sender task section.
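That by-reference/by-value distinction is visible directly on std's Waker API: wake_by_ref takes &self and leaves the waker alive (requiring a separate drop later, which is the waker.drop event here), while wake takes self and consumes it. Here's a small std-only sketch using a counting waker; all names and counters are illustrative, not part of Tokio's instrumentation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::task::{RawWaker, RawWakerVTable, Waker};

// Counters so we can observe what happens to the waker (illustrative).
static WAKES: AtomicUsize = AtomicUsize::new(0);
static DROPS: AtomicUsize = AtomicUsize::new(0);

unsafe fn vt_clone(data: *const ()) -> RawWaker {
    RawWaker::new(data, &VTABLE)
}
unsafe fn vt_wake(_: *const ()) {
    // `wake` consumes the waker, so count both a wake and a gone waker.
    WAKES.fetch_add(1, Ordering::SeqCst);
    DROPS.fetch_add(1, Ordering::SeqCst);
}
unsafe fn vt_wake_by_ref(_: *const ()) {
    WAKES.fetch_add(1, Ordering::SeqCst);
}
unsafe fn vt_drop(_: *const ()) {
    DROPS.fetch_add(1, Ordering::SeqCst);
}

const VTABLE: RawWakerVTable =
    RawWakerVTable::new(vt_clone, vt_wake, vt_wake_by_ref, vt_drop);

fn counting_waker() -> Waker {
    // SAFETY: the vtable functions above uphold the `RawWaker` contract.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = counting_waker();
    waker.wake_by_ref(); // wakes by reference: the waker stays alive
    assert_eq!(WAKES.load(Ordering::SeqCst), 1);
    assert_eq!(DROPS.load(Ordering::SeqCst), 0);

    drop(waker); // the separate `waker.drop` we see in the traces
    assert_eq!(DROPS.load(Ordering::SeqCst), 1);

    let waker = counting_waker();
    waker.wake(); // wakes by value: consumes the waker in one step
    assert_eq!(WAKES.load(Ordering::SeqCst), 2);
    assert_eq!(DROPS.load(Ordering::SeqCst), 2);
}
```

This is why waking by reference always implies a waker.drop event some time later: the clone stored in the channel still has to be disposed of.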

After that, the runtime.resource.async_op.poll span exits, and then so does the runtime.resource.async_op span.

Now let's follow the traces through until the receiver task completes.

2024-07-31T15:35:48.589649Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  exit
2024-07-31T15:35:48.589843Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"} 
        runtime.resource.async_op.poll[4]{}  close
2024-07-31T15:35:48.589900Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"} 
      runtime.resource.async_op[3]{source="Receiver::await"}  close
2024-07-31T15:35:48.589955Z TRACE 
    runtime.resource[2]{concrete_type="Sender|Receiver", kind="Sync"}  close
2024-07-31T15:35:48.590014Z DEBUG 
    runtime.spawn[5]{kind=task, task.name=receiver, task.id=3} 
      tokio_oneshot: msg=5 message received
2024-07-31T15:35:48.590070Z TRACE 
    runtime.spawn[5]{kind=task, task.name=receiver, task.id=3}  exit
2024-07-31T15:35:48.590207Z TRACE 
    runtime.spawn[5]{kind=task, task.name=receiver, task.id=3}  enter
2024-07-31T15:35:48.590257Z TRACE 
    runtime.spawn[5]{kind=task, task.name=receiver, task.id=3}  exit
2024-07-31T15:35:48.590433Z TRACE 
    runtime.spawn[5]{kind=task, task.name=receiver, task.id=3}  close

First up, we see the runtime.resource span exit (it had entered in the first traces we saw in this section). After that, the three spans in our hierarchy close in the reverse order of their creation: runtime.resource.async_op.poll, runtime.resource.async_op, and finally runtime.resource. This tells us that the oneshot channel itself has been dropped.

The oneshot channel drops before the end of the scope because awaiting on the receiving half consumes it, so once the future completes that receiving half is dropped (and the sender half had already been dropped earlier).

Finally, we see our debug event stating that the message has been received, and giving us the value (5!). Now we know that receiving the message was successful.

After our debug event, the runtime.spawn span representing the receiver task exits (complete with enter-exit dance) and then closes.

And that's the end of the resource instrumentation. Or at least, it should be.

resource state values

Perhaps you remember that when the resource was created, there were 4 resource state values which were set:

  • tx_dropped=false
  • rx_dropped=false
  • value_sent=false
  • value_received=false

However, only 1 of these was updated! That was:

  • value_sent=true

What happened to the others? I did a bit of digging into the implementation, and found that there are event macros that update these values in the code, but it looks like they're not hit on all code paths. It just so happens that my test code missed a bunch of them. We already have some tests for the tracing instrumentation in Tokio. However, they're currently quite limited.

The plan is to extend them to cover more cases in the future; it looks like I've already found the first thing that needs a test!

finishing up

We've created a oneshot channel, waited for a message, sent that message, and then finally received it. And we got to see all of this happen through the metaphorical eyes of Tokio's tracing instrumentation.

Let's remind ourselves of the final bit of code.

async fn tokio_oneshot() {
    // Everything else that has already happened.

    debug!(?jh, "awaiting receiver task");
    jh.await.unwrap();
    debug!("good-bye");
}

We had already seen the debug event "awaiting receiver task" earlier on, since this task ran up until the point where we awaited the join handle for the receiver task (the variable jh). Here are the final traces.

2024-07-31T15:35:48.590493Z TRACE 
    tokio::task::waker: op="waker.wake_by_ref", task.id=1
2024-07-31T15:35:48.590554Z TRACE 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2}  enter
2024-07-31T15:35:48.590614Z TRACE 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2} 
      tokio::task::waker: op="waker.drop", task.id=1
2024-07-31T15:35:48.590674Z DEBUG 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2} 
      tokio_oneshot: good-bye
2024-07-31T15:35:48.590727Z TRACE 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2}  exit
2024-07-31T15:35:48.590774Z TRACE 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2}  enter
2024-07-31T15:35:48.590820Z TRACE 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2}  exit
2024-07-31T15:35:48.590865Z TRACE 
    runtime.spawn[1]{kind=task, task.name=tokio-oneshot, task.id=2}  close

Now that we've remembered that our tokio_oneshot task had awaited the join handle for the receiver task, it's no surprise that the first event we see is a waker event, specifically the tokio_oneshot task getting woken up. This tokio::task::waker event has no parent span, so it didn't occur in any task or resource that we're aware of. This is happening somewhere inside the runtime.

After that, we see that the runtime.spawn span representing the tokio_oneshot task is entered. The waker is dropped, presumably when the join handle is consumed, although to be sure we'd need to instrument join handles as described in the section join handles are resources too.

To finish off, we see our debug message "good-bye" and then the runtime.spawn span exits and closes (enter-exit dance included).

That should be the end of the traces, but it isn't...

but wait there's more

There are actually 3 more traces at the end, and it took me a little while to realise what was going on. Here they are.

2024-07-31T15:35:48.590983Z TRACE 
    runtime::resource::state_update: tx_dropped=true, tx_dropped.op="override"
2024-07-31T15:35:48.591050Z TRACE 
    runtime::resource::poll_op: op_name="poll_recv", is_ready=true
2024-07-31T15:35:48.591106Z TRACE 
    runtime::resource::state_update: rx_dropped=true, rx_dropped.op="override"

These traces show state updates that look like ones we've seen previously. They don't have any parent spans, which seems weird, but we can see the following things happen:

  • a sender is dropped
  • a receiver is polled and returns Poll::Ready
  • a receiver is dropped

We've seen 2 of these messages before: the sender getting dropped and the poll_recv operation. And believe it or not, these 3 traces come from the same kind of resource that we've been investigating during this whole blog post: a oneshot channel! This can be verified with a quick search for rx_dropped in the Tokio codebase. No link for this one, since only a project's default branch is indexed and the link would go stale if more channels get instrumented.

So which oneshot channel is it? It turns out that the mechanism used to shut down the blocking pool uses a oneshot channel internally, so these messages come from there!

With that mystery solved, we've made our way to the end of the traces and the end of the post. I learnt a lot while writing this post, which was pretty much the idea. I hope that the 3 of you who have an interest in the internals of Tokio's tracing instrumentation also learnt something!

feedback?

Got feedback on this post?

I'd love to hear from you!

Get in touch on Mastodon @hds@hachyderm.io or by email.