Saturday, 27 October 2018

The clock used for waiting on a condition variable is a property of the wait, not the condition variable

After writing my previous blog post about libstdc++ and libc++ being prone to races when waiting on a condition variable, I proposed the addition of a pthread_cond_timedwaitonclock function on the Austin Group mailing list. (The Austin Group is responsible for the POSIX® standards.) Some of the responses I received made me realise that it may not be completely clear why it is beneficial for the clock to be a property of the wait rather than of the condition variable itself. Some responses even suggested that the fault lies with the C++ standard, and that it should be changed so that it can be implemented on top of POSIX as it stands today.

So, let's consider this simple program that uses POSIX threads directly from C:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

struct my_state {
    pthread_mutex_t protect_result;
    pthread_cond_t result_changed;
    bool result_ready;
    double result;
};

void *other_thread(void *state_in)
{
    struct my_state *state = state_in;

    /* Do some work */
    sleep(40);

    pthread_mutex_lock(&state->protect_result);
    state->result = 42.0;
    state->result_ready = true;
    pthread_mutex_unlock(&state->protect_result);
    pthread_cond_broadcast(&state->result_changed);
    return NULL;
}

bool wait_for_result(struct my_state *state)
{
    struct timespec expiry;
    clock_gettime(CLOCK_MONOTONIC, &expiry);
    expiry.tv_sec += 42;

    pthread_mutex_lock(&state->protect_result);
    while (!state->result_ready) {
        if (pthread_cond_timedwait(&state->result_changed,
                                   &state->protect_result,
                                   &expiry)) {
            pthread_mutex_unlock(&state->protect_result);
            fprintf(stderr, "Timeout!\n");
            return false;
        }
    }

    pthread_mutex_unlock(&state->protect_result);
    return true;
}

int main()
{
    struct my_state state;
    pthread_t thread;
    state.result_ready = false;

    pthread_mutex_init(&state.protect_result, NULL);

    pthread_condattr_t attr;
    pthread_condattr_init(&attr);
    pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    pthread_cond_init(&state.result_changed, &attr);

    if (pthread_create(&thread, NULL, other_thread, &state) != 0) {
        fprintf(stderr, "Failed to create thread\n");
        return 2;
    }

    if (wait_for_result(&state))
        printf("Result=%f\n", state.result);
    else
        fprintf(stderr, "Timeout!\n");

    pthread_join(thread, NULL);

    return 0;
}

Hopefully there's nothing particularly surprising here. Notice that it's necessary to pass CLOCK_MONOTONIC both to clock_gettime in wait_for_result and to pthread_condattr_setclock in main. If the clocks passed in these two locations differ, or if the pthread_condattr_t initialisation is omitted altogether (in which case the default of CLOCK_REALTIME is used), then we end up passing a struct timespec measured against one clock to a wait performed against another, and we'll probably wait either a very long time or no time at all.
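
One way to reduce the risk of such a mismatch is to name the clock in exactly one place and derive everything else from it. Here's a minimal sketch (the wait_clock constant and make_expiry helper are my own, not part of the program above):

#include <time.h>

/* Name the clock once; use it both for pthread_condattr_setclock and
 * for calculating absolute expiry times. */
static const clockid_t wait_clock = CLOCK_MONOTONIC;

static struct timespec make_expiry(time_t seconds_from_now)
{
    struct timespec expiry;
    clock_gettime(wait_clock, &expiry);
    expiry.tv_sec += seconds_from_now;
    return expiry;
}

Even so, this is only a convention; nothing stops other code from waiting with a timespec measured against a different clock.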

Now, let's consider an approximately-equivalent C++ program:

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <thread>

using namespace std::chrono;
using namespace std;

class my_state {
    mutex protect_result;
    condition_variable result_changed;
    optional<double> result;

public:
    void other_thread()
    {
        // Do some work
        std::this_thread::sleep_for(seconds(40));

        unique_lock<mutex> lock(protect_result);
        result = 42.0;
        result_changed.notify_all();
    }

    optional<double> wait_for_result()
    {
        unique_lock<mutex> lock(protect_result);
        result_changed.wait_until(lock, steady_clock::now() + seconds(42), [this] {
                return result.has_value();
            });
        return result;
    }
};

int main()
{
    my_state state;
    auto t = std::thread([&state]{ state.other_thread(); });

    optional<double> result = state.wait_for_result();
    t.join();
    if (result.has_value()) {
        cout << *result << "\n";
        return 0;
    } else {
        cerr << "Timeout\n";
        return 1;
    }
}

Unfortunately this program isn't really a good example of the way this would be done in C++1, but it serves as a simple way to show waiting on a condition variable.

The important part is that there is only one mention of a clock. The wait_for_result function does the equivalent of clock_gettime by calling std::chrono::steady_clock::now(), which returns an instance of std::chrono::steady_clock::time_point. This is a distinct type that represents (in libstdc++ and libc++, at least) a time point measured against the CLOCK_MONOTONIC clock. The expiry time is set by adding a number of seconds. The std::condition_variable::wait_until method knows from the type of the timeout parameter which clock should be used for the wait. There's no longer any duplication.

If we knew that we had a highly-synchronised realtime clock and wanted to co-ordinate between machines then we could switch to using CLOCK_REALTIME by simply changing steady_clock to system_clock. No other changes are required.2 The type of the parameter passed to std::condition_variable::wait_until will now be std::chrono::system_clock::time_point, so the implementation will know which clock to use.

Of course, in this example, the wait_for_result function would actually be better written as:

    optional<double> wait_for_result()
    {
        unique_lock<mutex> lock(protect_result);
        result_changed.wait_for(lock, seconds(42), [this] {
                return result.has_value();
            });
        return result;
    }

so there's no mention of the clock at all. std::condition_variable::wait_for turns into the equivalent call to wait_until using std::chrono::steady_clock.
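
For reference, the standard specifies wait_for in terms of wait_until. A sketch of the equivalence (my own code, not libstdc++'s):

#include <chrono>
#include <condition_variable>
#include <mutex>

// A sketch of what the standard says wait_for(lock, rel_time, pred) is
// equivalent to: the relative timeout becomes an absolute
// steady_clock time point.
template <typename Rep, typename Period, typename Predicate>
bool wait_for_equivalent(std::condition_variable& cv,
                         std::unique_lock<std::mutex>& lock,
                         const std::chrono::duration<Rep, Period>& rel_time,
                         Predicate pred)
{
    return cv.wait_until(lock,
                         std::chrono::steady_clock::now() + rel_time,
                         std::move(pred));
}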

If the clock to use is part of the type of the time point, then it's not possible to accidentally wait using the wrong clock. Library code can easily provide a wait interface that is agnostic to the actual clock that will be used, whilst providing the same guarantees. It's even possible to support user-defined clocks (although these clocks need to be converted to a supported clock when making system calls.)
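
For example, a library can accept any time_point from its callers and simply forward it; a minimal sketch (the function name is mine):

#include <chrono>
#include <condition_variable>
#include <mutex>

// A clock-agnostic library wait: the clock travels inside the type of
// the expiry parameter, so this code never needs to name one.
template <typename Clock, typename Duration, typename Predicate>
bool wait_until_ready(std::condition_variable& cv,
                      std::unique_lock<std::mutex>& lock,
                      const std::chrono::time_point<Clock, Duration>& expiry,
                      Predicate ready)
{
    return cv.wait_until(lock, expiry, std::move(ready));
}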

If, as POSIX currently requires, the clock must be provided at the time of std::condition_variable construction, then it is still possible to provide some of this clock agnosticism. An optional clock template parameter could be added to the std::condition_variable type and used to pass an appropriate clock to pthread_condattr_setclock. In the implementation of std::condition_variable::wait_until, any clock that doesn't match must be converted to the clock specified during construction (which must be stored separately, since it cannot be retrieved from the condition variable instance itself.) A condition variable created to use CLOCK_MONOTONIC can wait against a std::chrono::system_clock::time_point parameter, but any warping of the system clock during the wait will have no effect. A condition variable created to use CLOCK_REALTIME cannot wait against a std::chrono::steady_clock::time_point parameter without risking an incorrect wait time if the system clock is warped at the wrong moment.
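
A sketch of the conversion such an implementation would have to perform (my own code; WaitClock stands for whichever clock was stored at construction time):

#include <chrono>

template <typename WaitClock, typename Clock, typename Duration>
typename WaitClock::time_point
to_wait_clock(const std::chrono::time_point<Clock, Duration>& tp)
{
    // If either clock is warped after its now() call but before the
    // wait completes, the converted time point no longer corresponds
    // to the caller's intended expiry.
    const auto delta = tp - Clock::now();
    return std::chrono::time_point_cast<typename WaitClock::duration>(
        WaitClock::now() + delta);
}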

In conclusion, there are real race conditions here for systems that may not always have access to a reliable source of real time. This is a particular problem just after boot. Adding appropriate functions to POSIX that can wait against CLOCK_MONOTONIC can solve these problems.


  1. This situation is exactly what std::future is used for, so we'd be better off using std::async.

  2. On Linux at least, CLOCK_MONOTONIC is affected by incremental time adjustments applied by adjtime anyway. However, if the system is using absolute timeouts across machines then CLOCK_REALTIME may still be an appropriate choice.

Thursday, 28 June 2018

Type-agnostic tracing using {fmt}

Nearly ten years after my last article for an ACCU journal, I've managed to write another one. This one is about using the {fmt} library to replace an old printf-based tracing system and can be read in both HTML and PDF forms. I must thank Victor Zverovich, the author of {fmt}, and the Overload editorial team for their help with writing it.

Sunday, 24 June 2018

It's about time. Monotonic time.

std::condition_variable::wait_for prone to early/late timeout with libstdc++

Take this simple program:

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

int main()
{
    std::mutex m;
    std::condition_variable cv;

    std::unique_lock<std::mutex> lock(m);

    auto const start = std::chrono::steady_clock::now();
    auto const result = cv.wait_for(lock, std::chrono::seconds(2));
    auto const finish = std::chrono::steady_clock::now();

    auto const elapsed_ms = std::chrono::duration_cast<std::chrono::milliseconds>(finish - start);

    if (result == std::cv_status::timeout)
        std::cout << "Timeout after " << elapsed_ms.count() << "ms\n";
    else
        std::cout << "Awoken after " << elapsed_ms.count() << "ms\n";

    return 0;
}

It looks like this program should wait in vain for a condition variable that is never notified, time out after two seconds, and report the timeout. However, when I compile it with GCC and run it on one Linux machine, I get:

$ ./wait
Timeout after 200ms
$ ./wait
Timeout after 200ms
$ ./wait
Timeout after 200ms

When I do the same on a different machine, I get:

$ ./wait

and the program does not exit.

This might be surprising until you learn that I was also running this program on the first machine to advance the system clock by one second every 100ms:

#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);

    while (true)
    {
        usleep(100000);
        ++ts.tv_sec;
        clock_settime(CLOCK_REALTIME, &ts);
    }
}

and a similar program that just kept rewinding the system clock to the start time every 100ms on the second machine:

#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);

    while (true) {
        usleep(100000);
        clock_settime(CLOCK_REALTIME, &ts);
    }
}

It seems that the behaviour of std::condition_variable::wait_for is being influenced by changing the system clock. That's a strange thing for a function being passed a duration to do.

Why does this happen?

To work out why this happens, we need to delve into the libstdc++ implementation. Here's the implementation of wait_for from libstdc++'s condition_variable header:

template<typename _Rep, typename _Period>
  cv_status
  wait_for(unique_lock<mutex>& __lock,
           const chrono::duration<_Rep, _Period>& __rtime)
  {
    using __dur = typename __clock_t::duration;
    auto __reltime = chrono::duration_cast<__dur>(__rtime);
    if (__reltime < __rtime)
      ++__reltime;
    return wait_until(__lock, __clock_t::now() + __reltime);
  }

It seems sane enough: it adds the current time to the relative timeout that was passed in to produce an absolute time, then calls wait_until to wait for the condition variable to be notified or for that absolute time to be reached. But what's this __clock_t type? That's a typedef from earlier in the header:

typedef chrono::system_clock        __clock_t;

Aha! So, __clock_t is std::chrono::system_clock which is a clock that does change in response to the system time being changed. This explains why the behaviour of std::condition_variable::wait_for is affected by system clock changes.

So, what can we do to avoid this? How about changing the code to explicitly use std::chrono::steady_clock instead of __clock_t? Unfortunately that doesn't help, because then this overload of wait_until would be called:

template<typename _Clock, typename _Duration>
  cv_status
  wait_until(unique_lock<mutex>& __lock,
             const chrono::time_point<_Clock, _Duration>& __atime)
  {
    // DR 887 - Sync unknown clock to known clock.
    const typename _Clock::time_point __c_entry = _Clock::now();
    const __clock_t::time_point __s_entry = __clock_t::now();
    const auto __delta = __atime - __c_entry;
    const auto __s_atime = __s_entry + __delta;

    return __wait_until_impl(__lock, __s_atime);
  }

This code converts an absolute time measured against an arbitrary clock, in our case std::chrono::steady_clock, to __clock_t, aka std::chrono::system_clock. So we're back to using a clock that can be changed. But there's a clue here: what's this "DR 887"? We'll come back to that later.

It seems that what we need is an implementation of __wait_until_impl that accepts a std::chrono::steady_clock. Let's have a look at the __clock_t implementation:

template<typename _Dur>
  cv_status
  __wait_until_impl(unique_lock<mutex>& __lock,
                    const chrono::time_point<__clock_t, _Dur>& __atime)
  {
    auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
    auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);

    __gthread_time_t __ts =
      {
        static_cast<std::time_t>(__s.time_since_epoch().count()),
        static_cast<long>(__ns.count())
      };

    __gthread_cond_timedwait(&_M_cond, __lock.mutex()->native_handle(),
                             &__ts);

    return (__clock_t::now() < __atime
            ? cv_status::no_timeout : cv_status::timeout);
  }

It boils down to calling __gthread_cond_timedwait. I'm not going to go through the gthreads abstraction here, so let's just pretend that it's a call to pthread_cond_timedwait, which is what it ultimately is on POSIX-like systems.

pthread_cond_timedwait does support waiting against both std::chrono::system_clock (as CLOCK_REALTIME) and std::chrono::steady_clock (as CLOCK_MONOTONIC) but the clock to use is a property set at the time that the condition variable is created by using pthread_condattr_setclock and can't be changed later. Unfortunately, for std::condition_variable we don't know which clock to use until the time of the wait, and in fact there could be a mixture of clocks used with any particular std::condition_variable instance.
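
To make that concrete, here's a minimal sketch of creating a condition variable that waits against CLOCK_MONOTONIC; once this function returns, the choice cannot be revisited for individual waits:

#include <pthread.h>
#include <time.h>

/* The clock a pthread_cond_t waits against is fixed at creation time
 * via the attribute; individual waits cannot override it. */
static int init_monotonic_cond(pthread_cond_t *cond)
{
    pthread_condattr_t attr;
    pthread_condattr_init(&attr);
    pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    int rc = pthread_cond_init(cond, &attr);
    pthread_condattr_destroy(&attr);
    return rc;
}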

Why do we care about this?

Of course it's unrealistic to be running programs that modify the system clock in this way continually. However, I'm just using them to make a rare problem more obvious. Systems, particularly embedded ones, may boot without knowing the current time and only set the system clock later. This may be quite a long time later if they don't immediately have an Internet connection. A desktop user may notice that their clock is completely wrong and take action to correct it.

Problems seen by the test programs above might be seen only rarely, but that doesn't stop them potentially having bad and difficult-to-diagnose effects. There has been a trend among many Linux system libraries over the last few years away from using CLOCK_REALTIME (often just via time(2) or gettimeofday(2)) towards using CLOCK_MONOTONIC instead.1

I went through a large-ish embedded Linux code base to identify the places in the code that used timeouts when waiting for a condition variable. I found the following possibilities:

  1. Waiting for a period of time whilst still responding immediately to other requests that notify the condition variable, such as being told to exit. For example, flashing an LED to indicate an error until an external notification comes in to say that the error is no longer important. Or perhaps polling an external device periodically but still responding immediately to an exit request.

  2. Supporting a timeout when reading from an asynchronously-filled queue.

  3. A deferred procedure call implementation that will execute at a specified time, but still wants to respond immediately to cancellation requests or to add other items to the queue.

  4. Giving up on data arriving when updating the screen in real time. In these cases the timeout will be very short, and exceeding it would cause visual glitches.

  5. Providing an upper bound to how long the wait can last. Such cases are followed by calls to std::terminate if the timeout occurs. This was particularly common in unit tests.

In all of these cases it is important that the wait doesn't respond with a timeout earlier than it should.

In all but the last of these cases it is important that the wait doesn't continue once the timeout time has been reached.

In none of these cases are we interested in measuring our timeout against a system clock that might be changed.

I think that the majority of users of std::condition_variable would be surprised to discover that relative waits and waits against std::chrono::steady_clock are affected by the system clock changing.

DR 887

From what I can glean, this problem was known about before std::condition_variable was standardised and was known as "DR 887". The best documentation I can find for why it was not deemed important enough to fix is in N4486 (search for 887), where Howard (presumably Howard Hinnant) seems to be considering only the last use case described above:

condition_variable::wait_until (and pthread_cond_timedwait) is little more than a convenience function for making sure condition_variable::wait doesn't hang for an unreasonable amount of time (where the client gets to define "unreasonable"). I do not think it is in anyone's interest to try to make it into anything more than that.

There's plenty of other discussion there too. Including some suggestions that I consider below.

What can we do about this?

There are a number of different ways that we can fix this, but before discussing those, let's talk about some useful features of the current implementation that we might want to consider not breaking.

Waits that want to react to system clock changes

With recent glibc and recent kernels, pthread_cond_timedwait with CLOCK_REALTIME behaves very well in the face of system clock changes if you genuinely are interested in "wall" time. For example, you might be implementing a calendar application and want to be woken up when another thread changes any of the events or if the first in a sorted series of events expires. In that case, you would want to react to system clock changes. The kernel is capable of waking up the futex used internally by glibc's pthread_cond_timedwait implementation immediately if necessary when the system clock is changed. Similarly, if the system clock is moved back then the futex remains dormant until the desired timeout is reached. This is relatively2 new behaviour in glibc since the required support was only added to the Linux kernel in v2.6.28. Previously CLOCK_REALTIME waits were converted to relative timeouts (measured internally by the kernel against CLOCK_MONOTONIC) and once that was done the duration of the wait was immune to system clock changes.

It's not clear to me that there are many, or perhaps even any, applications that make use of this behaviour. Certainly, I've never written such code.

Possible solutions

Now, let's look at the possibilities for solving this.

Just report a spurious wakeup if we wake up early

We could stop wait_until and wait_for returning cv_status::timeout if the time hasn't been reached when measured against the caller-supplied clock even if the underlying wait against std::chrono::system_clock returns a timeout.

This is the approach that Clang's libc++ takes. My reading of the DR 887 link above makes me think that this behaviour might even be mandatory, which would make libstdc++'s current implementation deficient.

class condition_variable
{
  //...
  typedef chrono::steady_clock        __steady_clock_t;

  template<typename _Rep, typename _Period>
  cv_status
  wait_for(unique_lock<mutex>& __lock,
           const chrono::duration<_Rep, _Period>& __rtime)
  {
    using __dur = typename __steady_clock_t::duration;
    auto __reltime = chrono::duration_cast<__dur>(__rtime);
    if (__reltime < __rtime)
      ++__reltime;
    return wait_until(__lock, __steady_clock_t::now() + __reltime);
  }

  template<typename _Clock, typename _Duration>
  cv_status
  wait_until(unique_lock<mutex>& __lock,
             const chrono::time_point<_Clock, _Duration>& __atime)
  {
    // DR 887 - Sync unknown clock to known clock.
    const typename _Clock::time_point __c_entry = _Clock::now();
    const __clock_t::time_point __s_entry = __clock_t::now();
    const auto __delta = __atime - __c_entry;
    const auto __s_atime = __s_entry + __delta;

    // We might get a timeout from __clock_t but we shouldn't have
    // done when measured against the caller-specified clock.
    if (__wait_until_impl(__lock, __s_atime) == cv_status::timeout)
      return _Clock::now() < __atime ? cv_status::no_timeout : cv_status::timeout;
    else
      return cv_status::no_timeout;
  }

  //...
};

If the system clock is advanced, causing the pthread_cond_timedwait on a CLOCK_REALTIME condition variable to wake up with a timeout, then we can check the desired timeout against std::chrono::steady_clock::now() and only return a timeout if that time has been exceeded. If it hasn't, we must indicate a spurious wakeup by returning cv_status::no_timeout to avoid racing against notification of the condition variable. The caller should check the actual condition and call wait_until again if necessary (or, even better, use the overload that takes a predicate, which does this automatically.)
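
For completeness, a sketch of that caller-side pattern (the names are mine):

#include <chrono>
#include <condition_variable>
#include <mutex>

// The predicate overload re-checks the condition and waits again if
// the wakeup was spurious, returning false only on a genuine timeout.
bool wait_ready(std::condition_variable& cv, std::mutex& m,
                bool& data_ready)
{
    std::unique_lock<std::mutex> lock(m);
    return cv.wait_until(lock,
                         std::chrono::steady_clock::now() +
                             std::chrono::seconds(2),
                         [&] { return data_ready; });
}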

If the system clock is put back then pthread_cond_timedwait will stay blocked inside the kernel waiting for the timeout to be reached. Of course, there's no guarantee how quickly the wait will return on timeout anyway, but since the system clock could be put back by an arbitrary amount of time, the wait could be delayed significantly. If the explanation in "DR 887" is correct, then perhaps this isn't a significant problem, but several of the use cases described above are broken by this behaviour.

This solution is definitely preferable to the current situation, and perhaps it's worth applying it to libstdc++ whilst we work on a better long-term solution.3

Tell pthread_cond_timedwait to always use CLOCK_MONOTONIC

This is equivalent to changing the definition of __clock_t to

typedef chrono::steady_clock        __clock_t;

and making the necessary changes to gthreads to create the condition variable having called pthread_condattr_setclock with CLOCK_MONOTONIC. It looks like this is the approach that recent versions of Boost.Thread have taken. This takes us back to the behaviour with versions of glibc before it added support for FUTEX_CLOCK_REALTIME but without the potential race condition that existed then.

The trouble is that we lose the desirable behaviour of using CLOCK_REALTIME when reacting to system clock changes. We effectively create the same problem in reverse: for an application that cares about "wall" time, there's a risk that waits will be too short or too long. We can apply the short-term solution described earlier to the std::chrono::system_clock case to avoid reporting a timeout too early, but we may still wait longer than necessary.

The non-standard template parameter

We could add a non-standard template parameter to std::condition_variable to indicate the clock to be specified when creating the underlying POSIX condition variable. This would result in code like:

std::condition_variable<std::chrono::steady_clock> cv;
//...
cv.wait_for(lock, std::chrono::seconds(10));

This allows us to specify the __clock_t that all other clocks are converted to for each condition variable.

This could be made to work (with the required gthreads changes), but it requires clients to modify their code. We could combine this with the previous solution by making the default __clock_t std::chrono::steady_clock, while allowing calling code to choose std::chrono::system_clock if it so desires. This isn't pretty, though, and it means that it's not possible to tell purely from the code performing the wait whether it is susceptible to waiting for the wrong time when the system clock changes.

Add a non-standard constructor parameter

This was suggested in the response to "DR 887". I think that the previous solution is better since in C++ a clock is a type and not a value and it's clearer to provide a C++ clock such as std::chrono::steady_clock than a POSIX clockid_t like CLOCK_MONOTONIC.

Invent a new pthreads function

There's no reason why glibc couldn't provide a function that accepts a parameter to indicate the clock to use for the wait:

int pthread_cond_timedwaitonclock_np (pthread_cond_t *cond, pthread_mutex_t *mutex,
                                      clockid_t clockid, const struct timespec *abstime);

This function ignores any clock specified at the time the condition variable was created and uses the one from the clockid parameter instead.
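
A sketch of how a wait might look with the proposed function (hypothetical, since the function exists only as a patch): the clock is named at the point of the wait, so the expiry calculation and the wait can no longer disagree.

#include <pthread.h>
#include <time.h>

static int wait_two_seconds(pthread_cond_t *cond, pthread_mutex_t *mutex)
{
    struct timespec expiry;
    clock_gettime(CLOCK_MONOTONIC, &expiry);
    expiry.tv_sec += 2;

    /* The clockid parameter applies to this wait only, regardless of
     * any attribute used when the condition variable was created. */
    return pthread_cond_timedwaitonclock_np(cond, mutex,
                                            CLOCK_MONOTONIC, &expiry);
}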

I've been trying to get such a change into glibc for several years, but have so far failed to do so. The current opinion on the glibc mailing list appears to be that we need to get the Austin Group4 to agree to adding the new function.

Reimplement std::condition_variable on top of futex(2)

libstdc++ already implements something similar to a condition variable inside its implementation of std::future using the Linux-specific futex(2) system call. The same could be done for std::condition_variable (although some care would need to be taken to avoid a dependency cycle since std::future can also fall back to using std::condition_variable.)

However, this would be quite a risky change that could affect the behaviour of existing programs.

Related problems

This problem doesn't just affect std::condition_variable.

std::timed_mutex

std::timed_mutex::try_lock_for and std::timed_mutex::try_lock_until suffer from exactly the same problem. However, POSIX does not provide any way to wait on a mutex using CLOCK_MONOTONIC. Inventing pthread_mutex_timedlockonclock_np would be a potential fix for that.

std::future

std::future::wait_for and std::future::wait_until also suffer from the same problem. The underlying implementation on modern Linux systems uses futex(2) directly so, unlike the above, this can be fixed entirely within libstdc++ when the correct method can be agreed upon.

What next?

Good question. I have failed in my attempts to get either my pthread_cond_timedwaitonclock_np or std::condition_variable patches accepted. I've even failed to get the related self-contained std::future fix in. By writing this blog post I aim to raise awareness, in the hope that someone reading it will review my patches and help me get them accepted.


  1. In fact, about twenty years ago my desktop screen would go blank each time I dialled my ISP because doing so automatically ran ntpdate to correct my clock. Back then the kernel's screensaver timeout was implemented using the system clock rather than a monotonic clock.

  2. Well, "relatively" for anyone who has been fighting with this problem for as long as I have, anyway.

  3. In fact, I submitted this change after I wrote this blog post, and it was accepted.

  4. The people in charge of the POSIX standard.