📌 Disclaimers:

The idea of this overview is self-education. Please consider this post as my notes during the reading.
The main goal - keep it simple and short.

❗ There is no guarantee that the theoretical and practical parts below are correct and up-to-date.

Introduction
Sync vs Async and Concurrent vs Parallel
My First Coroutine
The Awaitable Interface
Some More Details
In addition
Summary
Sources

Introduction

Coroutines are a complex but revolutionary step towards elegant asynchronous programming in C++. You can write code that looks sequential but executes asynchronously. Yes, coroutines are all about asynchronous programming. So what is that? Let’s sort out some terms before writing the first coroutine.

Sync vs Async and Concurrent vs Parallel

Synchronous programming is the easiest term to understand. It means a program performs only one operation at a time while the other operations wait in a queue. Simply in order, one by one. To illustrate synchronous programming, let’s use blocks that represent tasks:

Figure 1. Simple synchronous programming.

In the real world, tasks don’t look like homogeneous blocks of computation. They look more like the next image, with chunks of idle time spent waiting for data from a disk or for a network request. Hence, the CPU spends a lot of time idle, and running four such tasks still takes four times as long as a single task.

Figure 2. Synchronous programming with idle chunks.

Instead, we can use the idle moments to start executing other tasks, so all four tasks can be finished in much less time. The more time the CPU spends idle on any individual task, the more tasks it can overlap to save time. This is the core principle and the goal of concurrency. At this point we’re moving to asynchronous programming.

Figure 3. “Perfect” concurrency.

So with concurrency several tasks can be in progress at the same time even on a single-core processor. The processor accommodates this with context switching, and from our perspective it looks like parallel execution. This is an illusion: at any given moment only one task is actually executing on the core.
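To make the gain from overlapping idle time more tangible, here is a toy model, not a real scheduler. It assumes each task is a string of ticks, where 'W' is CPU work and 'I' is idle waiting; the names SyncTicks and ConcurrentTicks are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Synchronous: tasks run one after another, idle ticks included.
int SyncTicks(const std::vector<std::string>& tasks) {
  int total = 0;
  for (const auto& task : tasks) total += static_cast<int>(task.size());
  return total;
}

// Concurrent on one core: in every tick each waiting task advances its wait,
// and at most one task that needs the CPU performs a work tick.
int ConcurrentTicks(const std::vector<std::string>& tasks) {
  std::vector<std::size_t> pos(tasks.size(), 0);
  int ticks = 0;
  for (;;) {
    bool unfinished = false;
    bool cpu_used = false;
    for (std::size_t i = 0; i < tasks.size(); ++i) {
      if (pos[i] == tasks[i].size()) continue;  // this task is finished
      unfinished = true;
      const char tick = tasks[i][pos[i]];
      assert(tick == 'W' || tick == 'I');
      if (tick == 'I') {
        ++pos[i];  // waiting needs no CPU
      } else if (!cpu_used) {
        ++pos[i];  // the single core does one work tick
        cpu_used = true;
      }
    }
    if (!unfinished) return ticks;
    ++ticks;
  }
}
```

For two tasks of the form "WIIW", the synchronous model needs 8 ticks, while the concurrent one needs 5, because the waits of one task overlap with the work of the other.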

Parallelism and concurrency are often used interchangeably, but there is a difference in what they represent. Parallelism is associated with a multicore processor that runs tasks on different cores at the same time.

Figure 4. Parallelism.

Parallelism is performance oriented, whereas concurrency is about performing non-blocking operations.

To sum up, sync and async are programming techniques, whereas concurrent and parallel define the ways tasks are executed.

My First Coroutine

A coroutine is a generalisation of a subroutine. A subroutine is a normal function that can be invoked and then returns control to its caller. The main difference between them is that a coroutine can suspend and resume its execution multiple times.

In C++, a function is a coroutine if its body contains at least one of the following keywords: co_return, co_await, or co_yield.

However, these keywords alone are not enough to make a coroutine compile, e.g.

#include <coroutine>

int Square(const int val) { co_return val * val; }

int main() {
  return Square(2);
}

// MSVC 19.31.31107.0
// >_ error C2039: 'promise_type': is not a member of 'std::coroutine_traits<int,int>'

// Apple clang version 13.1.6 (clang-1316.0.21.2.5)
// >_ This function cannot be a coroutine: 'std::experimental::coroutine_traits<int, int>' has no member named 'promise_type'

// GCC 12.1
// >_ error: unable to find the promise type for this coroutine

We have to implement promise_type (also called the promise interface). The promise type specifies methods for customising the behaviour of the coroutine itself: what happens when the coroutine is called, what happens when it returns, and how any co_await or co_yield inside it behaves. It may be easier to think of the coroutine’s promise as a type that controls the behaviour of the coroutine and can be used to track its state. For a coroutine that returns a value with co_return, the promise type must implement 5 functions:

#include <coroutine>
#include <utility>

template <typename T> class Task {
public:
  struct promise_type {
    // 1. Obtains the return object. The return object is the value that is returned
    //    to the caller when the coroutine first suspends or after it runs to completion.
    auto get_return_object() {
      return Task{std::coroutine_handle<promise_type>::from_promise(*this)};
    }

    // 2. Controls whether the coroutine should suspend before executing
    //    the coroutine body or start executing the coroutine body immediately.
    //    Method returns either std::suspend_always (if the operation is lazily started)
    //    or std::suspend_never (if the operation is eagerly started).
    std::suspend_never initial_suspend() const noexcept { return {}; }

    // 3. Gives an opportunity to execute some additional logic (such as
    //    publishing a result, signalling completion or resuming a continuation)
    //    before execution is returned back to the caller/resumer.
    std::suspend_never final_suspend() const noexcept { return {}; }

    // 4. What if an exception happens inside a coroutine? We should tell
    //    how to handle it respectively. Do nothing for simplicity.
    void unhandled_exception() const {}

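    // 5. Sets the value that needs to be returned as a result.
    //    The parameter is named U because a member template must not
    //    reuse the name of the enclosing template parameter T.
    template <typename U> void return_value(U &&value) noexcept {
      m_value = std::forward<U>(value);
    }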

    auto result() const { return m_value; }

  private:
    T m_value{-1};
  };

  explicit Task(std::coroutine_handle<promise_type> handle) noexcept
      : m_handle{handle} {}

  T result() { return m_handle.promise().result(); }

private:
  // Coroutine handle represents a non-owning handle to the coroutine frame and
  // can be used to resume execution of the coroutine or to destroy the coroutine frame.
  // It can also be used to get access to the coroutine’s promise object.
  std::coroutine_handle<promise_type> m_handle;
};

That is a lot of work, and it still doesn’t work.

TEST(TaskTest, FirstTask) {
  auto square = [](const int val) -> Task<int> { co_return(val * val); };
  auto task = square(2);
  EXPECT_EQ(task.result(), 4);
}

// >_ error: Expected equality of these values:
//    task.result()
//      Which is : -572662307
//    4

Why? First of all, we need to understand what the compiler does under the hood with promise_type. The body of a coroutine is transformed into something like this:

{
  __coroutine_context *__context = new __coroutine_context();
  auto __retval = __context->_promise.get_return_object();
  co_await __context->_promise.initial_suspend();

  try {
    __context->_promise.return_value(val * val); // body statements
    goto __final_suspend_label;
  } catch (...) {
    __context->_promise.unhandled_exception();
  }

__final_suspend_label:
  co_await __context->_promise.final_suspend();
  delete __context;
  return __retval;
}

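Since final_suspend returns std::suspend_never, the coroutine does not suspend at the final suspend point and its frame, including the promise, is destroyed as soon as the body finishes. The handle stored in Task is then dangling, so reading the result through it is undefined behaviour. To keep the promise alive until the result is read, final_suspend must return std::suspend_always. With that change the test passes.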

And what if initial_suspend returns std::suspend_always? You will see a wrong result, e.g.

TEST(TaskTest, FirstTask) {
  auto square = [](const int val) -> Task<int> { co_return(val * val); };
  auto task = square(2);
  EXPECT_EQ(task.result(), 4);
}

// >_ error: Expected equality of these values:
//    task.result()
//      Which is: -1
//    4

This is because execution is suspended at the beginning, before the body runs, so we have to resume the coroutine to let it finish. One more method needs to be added to the Task class:

template <typename T> class Task {
  // ...

  // Reactivates a suspended coroutine at the resume point.
  void resume() {
    if (m_handle) {
      m_handle.resume();
    }
  }

  // ...
};

Then resume the coroutine in the test:

TEST(TaskTest, FirstTask) {
  auto square = [](const int val) -> Task<int> { co_return(val * val); };
  auto task = square(2);
  task.resume();
  EXPECT_EQ(task.result(), 4);
}

// >_ [ RUN      ] TaskTest.FirstTask
//    [       OK ] TaskTest.FirstTask (0 ms)

Let’s illustrate what is going on here:

Figure 5. The coroutine flow.

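But that’s not the end. We have a memory leak here: a coroutine that stays suspended at its final suspend point never frees its frame on its own. The coroutine frame must be destroyed explicitly through the handle in the Task destructor.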

template <typename T> class Task {
  // ...

  // Destroys the coroutine frame, calling the destructors of any in-scope
  // variables and freeing memory used by the coroutine frame.
  ~Task() noexcept {
    if (m_handle) {
      m_handle.destroy();
    }
  }

  // ...
};

OK, we know what needs to be done to use co_return. What about co_await?

The Awaitable Interface

The Awaitable interface specifies methods that control the semantics of a co_await expression. co_await is a new unary operator that can be applied to a value within a coroutine context. One of the powerful design features of the co_await operator is the ability to execute code after the coroutine has been suspended but before execution is returned to the caller.

Task<int> bar(const int val) { co_return(val * val); }
Task<int> foo(const int val) { co_return co_await bar(val); }

TEST(TaskTest, CoroInCoro) {
  auto task = foo(3);
  while (task)
    task.resume();
  EXPECT_EQ(task.result(), 9);
}

// >_ error C2039: 'await_resume': is not a member of 'Task<int>'

We need to implement support for the co_await operator in the Task class.

template <typename T> class Task {
public:
  // ...

  auto operator co_await() const {
    return Awaiter<promise_type>{m_handle};
  }

private:
  std::coroutine_handle<promise_type> m_handle;
};

Please note, it is called Awaiter because Awaiter and Awaitable mean different things:

💡 A type that supports the co_await operator is called an Awaitable type. An Awaiter type is a type that implements the three special methods that are called as part of a co_await expression: await_ready, await_suspend and await_resume.

And here is the implementation of the Awaiter type with its 3 special methods:

template <typename Promise> class Awaiter {
public:
  explicit Awaiter(std::coroutine_handle<Promise> handle) : m_handle{handle} {}

  // 1. Tells whether the result is already available. If it is, the
  //    awaiting coroutine is not suspended at all.
  bool await_ready() const noexcept { return !m_handle || m_handle.done(); }

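  // 2. Called after the awaiting coroutine has been suspended. Returning a
  //    handle transfers execution to that coroutine, here the awaited task.
  //    The continuation is not stored, so the caller has to resume it later.
  auto await_suspend(std::coroutine_handle<> continuation) noexcept {
    return m_handle;
  }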

  // 3. Returns the value that becomes the result of the `co_await` expression.
  //    The `await_resume` can also throw an exception in which case the exception
  //    propagates out of the `co_await` expression.
  decltype(auto) await_resume() noexcept { return m_handle.promise().result(); }

private:
  std::coroutine_handle<Promise> m_handle;
};

Now the test case passes.

Task<int> bar(const int val) { co_return(val * val); }
Task<int> foo(const int val) { co_return co_await bar(val); }

TEST(TaskTest, CoroInCoro) {
  auto task = foo(3);
  while (task)
    task.resume();
  EXPECT_EQ(task.result(), 9);
}

// >_ [ RUN      ] TaskTest.CoroInCoro
//    [       OK ] TaskTest.CoroInCoro (1 ms)

One potential bug is hidden here. I'll skip it; Mikhail Svetkin covers it in his talk.

The full version of the first coroutine is on GitHub, along with a more advanced implementation.

Some More Details

What about co_yield?

It's pretty easy to understand co_yield by making a generator. For instance, cppreference has a good example. Lots of other examples of generators built on coroutines can be found on the internet.

So to sum up, a coroutine consists of 3 parts:
1. A promise object - promise_type.
2. A non-owning handle, which is used to resume or destroy the coroutine from outside.
3. The coroutine state. It is typically heap-allocated and contains the promise object, the arguments to the coroutine and its local variables.

What does the compiler do with the co_return, co_await, and co_yield keywords under the hood?

co_return x; transforms into

__promise.return_value(x);
goto __final_suspend_label;

co_await y; transforms into

auto&& __awaiter = y;
if (!__awaiter.await_ready()) {
  __awaiter.await_suspend(__coroutine_handle); // handle of the current coroutine
  // suspend/resume point
}
__awaiter.await_resume();

co_yield z; transforms into
```
co_await __promise.yield_value(z);
```

Is it possible to use co_await in the main function?

Nope! You can use it only in a coroutine context, and main cannot be a coroutine. Here is a good example - sync_await - of how to make sync and async code work together.

When should I use coroutines?
- Launching a suspension function.
- From callbacks to coroutines.
- Lazy sequences.
- Channels/Pipelines (not sure, need to search more about this item)

In addition

Thread vs coroutine

The main difference between threads and coroutines is that coroutines are cooperatively multitasked, whereas threads are typically preemptively multitasked.

However, in general, coroutines are very similar to threads, with a few advantages. Switching between coroutines need not involve any system calls or any blocking calls. As long as all coroutines run on the same thread, there is no need for synchronisation primitives such as mutexes, semaphores, etc. to guard critical sections, because a coroutine gives up control only at explicit suspension points. Also, a suspended coroutine keeps only its own frame in memory, whereas every thread reserves a whole stack, so memory usage grows much faster with the number of threads.

In short, a coroutine is often a more scalable and efficient choice than a thread, especially for tasks that spend most of their time waiting.

Coroutine vs fiber

To be honest, I had never heard of fibers before; examples are Win32 fibers or boost::context. The key difference between fibers and coroutines is that fibers require a scheduler, which decides which active fiber is called next. Coroutines don't have a concept of a scheduler. They pass execution to a specified point in code.

Figure 6. From "C++20 Coroutine: Under The Hood" post.

Remember, a stackless coroutine loses the use of the stack when it is suspended, so everything that must survive a suspension is kept in the coroutine state.

Summary

Now we know that a coroutine is a variant of a function that enables concurrency via cooperative multitasking, and that it is a language facility that makes writing asynchronous code a whole lot easier. It's an advanced topic and most C++ developers (99.98%?) will not deal with coroutines explicitly. The standard provides low-level facilities that can be difficult to use safely and are mainly intended for library writers to build higher-level abstractions that app developers can work with safely.