Sunday 24 June 2018

It's about time. Monotonic time.

std::condition_variable::wait_for prone to early/late timeout with libstdc++

Take this simple program:

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

int main()
{
    std::mutex m;
    std::condition_variable cv;

    std::unique_lock<std::mutex> lock(m);

    auto const start = std::chrono::steady_clock::now();
    auto const result = cv.wait_for(lock, std::chrono::seconds(2));
    auto const finish = std::chrono::steady_clock::now();

    auto const elapsed_ms = std::chrono::duration_cast<std::chrono::milliseconds>(finish - start);

    if (result == std::cv_status::timeout)
        std::cout << "Timeout after " << elapsed_ms.count() << "ms\n";
    else
        std::cout << "Awoken after " << elapsed_ms.count() << "ms\n";

    return 0;
}

It looks like this program should wait in vain for a condition variable to be notified before timing out after two seconds and reporting a timeout. However, when I compile it with GCC and run it on one Linux machine, I get:

$ ./wait
Timeout after 200ms
$ ./wait
Timeout after 200ms
$ ./wait
Timeout after 200ms

When I do the same on a different machine, I get:

$ ./wait

and the program does not exit.

This might be surprising until you learn that I was also running this program to advance the system clock by one second every 100ms on the first machine:

#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);

    while (true)
    {
        usleep(100000);
        ++ts.tv_sec;
        clock_settime(CLOCK_REALTIME, &ts);
    }
}

and a similar program that just kept rewinding the system clock to the start time every 100ms on the second machine:

#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);

    while (true) {
        usleep(100000);
        clock_settime(CLOCK_REALTIME, &ts);
    }
}

It seems that the behaviour of std::condition_variable::wait_for is being influenced by changing the system clock. That's a strange thing for a function being passed a duration to do.

Why does this happen?

To work out why this happens, we need to delve into the libstdc++ implementation. Here's the implementation of wait_for from libstdc++'s condition_variable header:

template<typename _Rep, typename _Period>
  cv_status
  wait_for(unique_lock<mutex>& __lock,
           const chrono::duration<_Rep, _Period>& __rtime)
  {
    using __dur = typename __clock_t::duration;
    auto __reltime = chrono::duration_cast<__dur>(__rtime);
    if (__reltime < __rtime)
      ++__reltime;
    return wait_until(__lock, __clock_t::now() + __reltime);
  }

It seems sane enough, it adds the current time to the relative timeout that was passed to generate an absolute time and then calls wait_until to wait for the condition variable to be notified, or for that absolute time to be reached. But, what's this __clock_t type? That's a typedef from earlier:

typedef chrono::system_clock        __clock_t;

Aha! So, __clock_t is std::chrono::system_clock which is a clock that does change in response to the system time being changed. This explains why the behaviour of std::condition_variable::wait_for is affected by system clock changes.

So, what can we do to avoid this? How about changing the code to explictly use std::chrono::steady_clock instead of __clock_t? Unfortunately that doesn't help, because that would mean that this overload of wait_until will be called:

template<typename _Clock, typename _Duration>
  cv_status
  wait_until(unique_lock<mutex>& __lock,
             const chrono::time_point<_Clock, _Duration>& __atime)
  {
    // DR 887 - Sync unknown clock to known clock.
    const typename _Clock::time_point __c_entry = _Clock::now();
    const __clock_t::time_point __s_entry = __clock_t::now();
    const auto __delta = __atime - __c_entry;
    const auto __s_atime = __s_entry + __delta;

    return __wait_until_impl(__lock, __s_atime);
  }

This code converts an absolute time measured against an arbitrary clock, in our case std::chrono::steady_clock, to __clock_t aka std::chrono::system_clock. So we're back to using a clock that can be changed again. But there's a clue here, what's this "DR 887"? We'll come back to that later.

It seems that what we need is an implementation of __wait_until_impl that accepts a std::chrono::steady_clock. Let's have a look at the __clock_t implementation:

template<typename _Dur>
  cv_status
  __wait_until_impl(unique_lock<mutex>& __lock,
                    const chrono::time_point<__clock_t, _Dur>& __atime)
  {
    auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
    auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);

    __gthread_time_t __ts =
      {
        static_cast<std::time_t>(__s.time_since_epoch().count()),
        static_cast<long>(__ns.count())
      };

    __gthread_cond_timedwait(&_M_cond, __lock.mutex()->native_handle(),
                             &__ts);

    return (__clock_t::now() < __atime
            ? cv_status::no_timeout : cv_status::timeout);
  }

It boils down to calling __gthread_cond_timedwait. I'm not going to go through the gthreads abstraction here, so let's just pretend that it's a call to pthread_cond_timedwait, which is what it ultimately is on POSIX-like systems.

pthread_cond_timedwait does support waiting against both std::chrono::system_clock (as CLOCK_REALTIME) and std::chrono::steady_clock (as CLOCK_MONOTONIC) but the clock to use is a property set at the time that the condition variable is created by using pthread_condattr_setclock and can't be changed later. Unfortunately, for std::condition_variable we don't know which clock to use until the time of the wait, and in fact there could be a mixture of clocks used with any particular std::condition_variable instance.

Why do we care about this?

Of course it's unrealistic to be running programs that modify the system clock in this way continually. However, I'm just using them to make a rare problem more obvious. Systems, particularly embedded ones, may boot without knowing the current time and only set the system clock later. This may be quite a long time later if they don't immediately have an Internet connection. A desktop user may notice that their clock is completely wrong and take action to correct it.

Problems seen by the test programs above might be seen only rarely, but that doesn't stop them potentially having bad and difficult-to-diagnose effects. There has been a trend among many Linux system libraries over the last few years away from using CLOCK_REALTIME (often just via time(2) or gettimeofday(2)) towards using CLOCK_MONOTONIC instead.1

I went through a large-ish embedded Linux code base to identify the places in the code that used timeouts when waiting for a condition variable. I found the following possibilities:

  1. Waiting for a period of time whilst still responding immediately to other requests that notify the condition variable, such as being told to exit. For example, flashing an LED to indicate an error until an external notification comes in to say that the error is no longer important. Or perhaps polling an external device periodically but still responding immediately to an exit request.

  2. Supporting a timeout when reading from an asynchronously-filled queue.

  3. A deferred procedure call implementation that will execute at a specified time, but still wants to respond immediately to cancellation requests or to add other items to the queue.

  4. Giving up on data arriving when updating the screen in real time. In these cases the timeout will be very short, and exceeding it would cause visual glitches.

  5. Providing an upper bound to how long the wait can last. Such cases are followed by calls to std::terminate if the timeout occurs. This was particularly common in unit tests.

In all of these cases it is important that the wait doesn't respond with a timeout earlier than it should.

In all but the last of these cases it is important that the wait doesn't continue once the timeout time has been reached.

In none of these cases are we interested in measuring our timeout against a system clock that might be changed.

I think that the majority of users of std::condition_variable would be surprised to discover that relative waits and waits against std::chrono::steady_clock are affected by the system clock changing.

DR 887

From what I can glean, this problem was known about before std::condition_variable was standardised and was known as "DR-887". The best documentation I can find for why it was not deemed important enough to fix is in N4486 (search for 887) where Howard (presumably Howard Hinnant) seems to only be considering the last use case described above:

condition_variable::wait_until (and pthread_cond_timedwait) is little more than a convenience function for making sure condition_variable::wait doesn't hang for an unreasonable amount of time (where the client gets to define "unreasonable"). I do not think it is in anyone's interest to try to make it into anything more than that.

There's plenty of other discussion there too. Including some suggestions that I consider below.

What can we do about this?

There are a number of different ways that we can fix this, but before discussing those, let's talk about some useful features of the current implementation that we might want to consider not breaking.

Waits that want to react to system clock changes

With recent glibc and recent kernels, pthread_cond_timedwait with CLOCK_REALTIME behaves very well in the face of system clock changes if you genuinely are interested in "wall" time. For example, you might be implementing a calendar application and want to be woken up when another thread changes any of the events or if the first in a sorted series of events expires. In that case, you would want to react to system clock changes. The kernel is capable of waking up the futex used internally by glibc's pthread_cond_timedwait implementation immediately if necessary when the system clock is changed. Similarly, if the system clock is moved back then the futex remains dormant until the desired timeout is reached. This is relatively2 new behaviour in glibc since the required support was only added to the Linux kernel in v2.6.28. Previously CLOCK_REALTIME waits were converted to relative timeouts (measured internally by the kernel against CLOCK_MONOTONIC) and once that was done the duration of the wait was immune to system clock changes.

It's not clear to me that there are many, or perhaps even any, applications that make use of this behaviour. Certainly, I've never written such code.

Possible solutions

Now, let's look at the possibilities for solving this.

Just report a spurious wakeup if we wake up early

We could stop wait_until and wait_for returning cv_status::timeout if the time hasn't been reached when measured against the caller-supplied clock even if the underlying wait against std::chrono::system_clock returns a timeout.

This is the approach that Clang's libc++ takes. My reading of the DR-887 link above makes me think that this behaviour might even be mandatory, which would make libstdc++'s current implementation deficient.

class condition_variable
{
  //...
  typedef chrono::steady_clock        __steady_clock_t;

  template<typename _Rep, typename _Period>
  cv_status
  wait_for(unique_lock<mutex>& __lock,
           const chrono::duration<_Rep, _Period>& __rtime)
  {
    using __dur = typename __steady_clock_t::duration;
    auto __reltime = chrono::duration_cast<__dur>(__rtime);
    if (__reltime < __rtime)
      ++__reltime;
    return wait_until(__lock, __steady_clock_t::now() + __reltime);
  }

  template<typename _Clock, typename _Duration>
  cv_status
  wait_until(unique_lock<mutex>& __lock,
             const chrono::time_point<_Clock, _Duration>& __atime)
  {
    // DR 887 - Sync unknown clock to known clock.
    const typename _Clock::time_point __c_entry = _Clock::now();
    const __clock_t::time_point __s_entry = __clock_t::now();
    const auto __delta = __atime - __c_entry;
    const auto __s_atime = __s_entry + __delta;

    // We might get a timeout from __clock_t but we shouldn't have
    // done when measured against the caller-specified clock.
    if (__wait_until_impl(__lock, __s_atime) == cv_status::timeout)
      return _Clock::now() < __atime ? cv_status::no_timeout : cv_status::timeout;
    else
      return cv_status::no_timeout;
  }

  //...
};

If the system clock is advanced causing the pthread_cond_timedwait on a CLOCK_REALTIME condition variable to wake up with a timeout then we can check the desired timeout against std::chrono::steady_clock::now() and only return a timeout if that time has been exceeded. If it hasn't we must indicate a spurious wakeup by returning cv_status::no_timeout to avoid racing against notification of the condition variable. The caller should check the actual condition and call wait_until again if necessary (or even better, use the overload that takes a predicate that will do this automatically.)

If the system clock is put back then pthread_cond_timedwait will stay blocked inside the kernel waiting for the timeout to be reached. Of course there's no guarantee how quickly the wait will return on timeout anyway, but since the system clock could be put back by an arbitrary amount of time, the wait could be delayed significantly. If the explanation in "DR 877" is correct, then perhaps this isn't a significant problem, but several of the use cases described above are broken by this behaviour.

This solution is definitely preferable to the current situation, and perhaps it's worth applying it to libstdc++ whilst we work on a better long-term solution.3

Tell pthread_cond_timedwait to always use CLOCK_MONOTONIC

This is equivalent to changing the definition of __clock_t to

typedef chrono::steady_clock        __clock_t;

and making the necessary changes to gthreads to create the condition variable having called pthread_condattr_setclock with CLOCK_MONOTONIC. It looks like this is the approach that recent versions of Boost.Thread have taken. This takes us back to the behaviour with versions of glibc before it added support for FUTEX_CLOCK_REALTIME but without the potential race condition that existed then.

The trouble is that we lose the desirable behaviour of using CLOCK_REALTIME when reacting to system clock changes. We effectively create the same problem in reverse: for an application that cares about "wall time", there's a risk that waits will be too short or too long. Of course, we can apply the short-term solution described earlier in the std::chrono::system_clock case so we can avoid reporting timeout to early, but we may still wait longer than necessary.

The non-standard template parameter

We could add a non-standard template parameter to std::condition_variable to indicate the clock to be specified when creating the underlying POSIX condition variable. This would result in code like:

std::condition_variable<std::chrono::steady_clock> cv;
//...
cv.wait_for(std::chrono::seconds(10));

This allows us to specify the __clock_t that all other clocks are converted to for each condition variable.

This could be made to work (with the required gthreads changes), but requires clients to modify their code to make it work properly. We could consider combining this with the previous solution to make the default for __clock_t be std::chrono::steady_clock, but allowing the calling code to choose std::chrono::system_clock if they desired so. This isn't pretty though, and means that it's not possible to tell purely from the code performing the wait whether it is susceptible to waiting for the wrong time when the system clock changes.

Add a non-standard constructor parameter

This was suggested in the response to "DR 887". I think that the previous solution is better since in C++ a clock is a type and not a value and it's clearer to provide a C++ clock such as std::chrono::steady_clock than a POSIX clockid_t like CLOCK_MONOTONIC.

Invent a new pthreads function

There's no reason why glibc couldn't provide a function that accepts a parameter to indicate the clock to use for the wait:

int pthread_cond_timedwaitonclock_np (pthread_cond_t *cond, pthread_mutex_t *mutex,
                                      clockid_t clockid, const struct timespec *abstime);

This function ignores any clock specified at the time the condition variable was created and uses the one from the clockid parameter instead.

I've been trying to get such a change into glibc for several years, but have so far failed to do so. The current opinion on the glibc mailing list appears to be that we need to get the Austin Group4 to agree to adding the new function.

Reimplement std::condition_variable on top of futex(2)

libstdc++ already implements something similar to a condition variable inside its implementation of std::future using the Linux-specific futex(2) system call. The same could be done for std::condition_variable (although some care would need to be taken to avoid a dependency cycle since std::future can also fall back to using std::condition_variable.)

This would mean quite a risky change that could affect the behaviour of existing programs.

Related problems

This problem doesn't just affect std::condition_variable.

std::timed_mutex

std::timed_mutex::try_lock_for and std::timed_mutex::try_lock_until suffer from exactly the same problem. However, POSIX does not provide any way to wait on a mutex using CLOCK_MONOTONIC. Inventing pthread_mutex_timedlockonclock_np would be a potential fix for that.

std::future

std::future::wait_for and std::future::wait_until also suffer from the same problem. The underlying implementation on modern Linux systems uses futex(2) directly so, unlike the above, this can be fixed entirely within libstdc++ when the correct method can be agreed upon.

What next?

Good question. I have failed in my attempts to get either my pthread_timedwaitonclock_np or std::future patches accepted. I've even failed to get the related self-contained std::future fix in. By writing this blog post I aim to raise awareness in the hope that someone reading it will review my patches and help me to get them accepted.


  1. In fact, about twenty years ago my desktop screen would go blank each time I dialled my ISP because doing so automatically ran ntpdate to correct my clock. Back then the kernel's screensaver timeout was implemented using the system clock rather than a monotonic clock.

  2. Well, "relatively" for anyone who has been fighting with this problem for as long as I have, anyway.

  3. In fact, I submitted this change after I wrote this blog post, and it was accepted.

  4. The people in charge of the POSIX standard.

2 comments:

Kostia said...

Good article. I was looking for this exact info. Thanks.
There may be one more use case - to handle multiple timers in one thread (e.g. keep them all in a vector and wait for the nearest time point or for a new time point(timer) to arrive).

Unknown said...

We discovered the same problem in an automotive use case. This blog posting saves me a lot of analysis effort! Thanks! 👍