Overview of C++20: Modules


Posts in "Overview of C++20"

  1. Ranges
  2. Concepts
  3. Modules
  4. TBA: ...
📌 Disclaimers:
  1. The idea of this overview is self-education. Please consider this post as my reading notes.
  2. The main goal is to keep it simple and short. Not this one, sorry.
There is no guarantee that the theoretical and practical parts below are correct and up-to-date.


Introduction

The C++20 standard brought modules to us. It's definitely a big language feature that takes a lot of effort to understand. In short, modules are a new way to organize, encapsulate, and isolate code. I think I spent several weeks reading plenty of theory and testing lots of examples. At first, it seemed easy. Then I realized it's not. The more I learned, the more questions I had. The Dunning-Kruger effect in action. Now I know modules well enough to share with you.

Let's start with the pros and cons of using headers.


Old Good Headers

C++ has used headers for a really long time now. However, this form of dependency inclusion has its flaws:
  • Separate header and implementation files.

    In most cases, if we want to add a class, we have to create both a header and an implementation file. As we know, headers contain declarations, whereas implementation files contain definitions. Consequently, the project usually ends up with a pair of files for every class.
    You always create a file twice, unless you use IDE features. On the other hand, this separation reduces the amount of recompilation needed when you change only the implementation. Last but not least, no headers means no include guards or #pragma once. That was a tough choice in the past.

  • Preprocessing stage is slow, because it has to process lots of text.

    Let's consider an extremely straightforward example:
    #include <iostream>
    
    int main()
    {
      std::cout << "Hello World!\n";
    }
    
    // >_ clang++ -std=c++20 -stdlib=libc++ -E HelloWorld.cpp | wc -cl  
    //    49450 1809441
    Even a "Hello World" expands to almost 50K lines and 1.8 MB of text. That's because the preprocessor replaces each #include directive (in our case only <iostream>) with the content of the included file, and it does so recursively for everything those files include. Real-world projects end up with gigabytes of text. By contrast, the same program with import is much, much smaller:
    import <iostream>;
    
    int main()
    {
      std::cout << "Hello World!\n";
    }
    
    // >_ clang++ -std=c++20 -stdlib=libc++ -fmodules -fbuiltin-module-map \
    //            -E HelloWorldWithImport.cpp | wc -cl
    //      12     235

  • One Definition Rule violations.

    In short, a non-inline function or object must have only one definition in the entire program. For instance, suppose two implementation files each define an identical "private" function with external linkage.
    The build should fail with an "already defined" error (usually reported by the linker). However, if that function were either a function template or inline, there would be no errors. That's because types, templates, and inline functions can be defined in more than one translation unit. To clarify, a translation unit consists of an implementation file and all the headers that it includes directly or indirectly. This topic is definitely a rabbit hole. If you're interested in ODR, please check some more examples.

  • The order of includes matters, but it shouldn't.

    The last time I encountered this issue was when I was working with the WTL library, e.g.
    #include <atlapp.h>
    #include <atlbase.h>
    
    // fatal error C1189: #error:  atlapp.h requires atlbase.h to be included first
    In order to fix that, we have to turn off clang-format and change the order.
    // clang-format off
    #include <atlbase.h>
    #include <atlapp.h>
    // clang-format on
    
    // OK
    This item is twice as bad as the preceding one as it leads to cyclic dependencies. For more info, please look at the post with a list of several reasons against circular dependencies.

  • Finally, it's hard to encapsulate stuff that just needs to be in header files.

    Even if you put some stuff in a detail namespace, someone will use it, as Hyrum's law predicts.
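To make the One Definition Rule item above more concrete, here is a minimal sketch of the kinds of definitions that may legally be repeated across translation units. The function names (Twice, Triple, Quadruple) are made up for illustration; imagine this code being pasted into several .cpp files of the same program.

```cpp
#include <cassert>

// inline function: every translation unit may carry its own identical definition.
inline int Twice(const int x) { return 2 * x; }

// Function template: likewise allowed to be defined in many translation units.
template<typename T>
T Triple(const T x) { return 3 * x; }

// Internal linkage: each translation unit gets its own private copy,
// so copies in different .cpp files never clash with each other.
namespace {
int Quadruple(const int x) { return 4 * x; }
} // namespace
```

If Twice were a plain non-inline function with external linkage and the same definition ended up in two .cpp files, the build would typically fail at link time with a duplicate definition error.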
Looks like a big list. To be fair, I've been working with C++ for more than 8 years now and I've not had any problems with headers. So here are some advantages:
  • Reduce cognitive load:
    • If you're trying to understand a large system, headers help.
    • It's easy to review a pull request with headers. You don't need to pay much attention to the implementation; every developer has their own unique style. Just ensure that the declaration is simple to read, the interface is simple to use, and there's nothing redundant.
  • Before the Conan package manager, header-only libraries saved me a lot of time. Managing 3rd-party dependencies is always a pain.
Fortunately, this is where modules enter the game. They should solve the aforementioned flaws, bringing a great speedup to build times and better scalability of C++ builds. With modules, you export only what you want to export, which results in good encapsulation. A specific order of dependency inclusion is no longer an issue either, as the order of imports doesn't matter. Also, modules don't allow macros to propagate to other modules.

Module Structure

Before diving into the topic, I'd like to mention that the code examples were tested in Visual Studio 2022 (Version 17.1.0 Preview 5.0). It has complete modules support according to the C++ compiler support table.

I think a simple example should be a good starting point. Imagine working on funds withdrawal. Usually, when users withdraw funds, the service sends a code to their email or mobile as an additional authentication step. Generating such a random code is an easy task to start with. So we need a class that generates random strings of a specified length from specified symbols. Here is a suitable implementation for the current use case:
// Utils.Randomizer.ixx
module;

#include <algorithm>

export module Utils.Randomizer;

import <random>;
import <string>;

namespace utils {

namespace {
std::string GetCharset(const int literals);
} // namespace

export class Randomizer
{
public:
  enum Literals
  {
    UpperCaseLetters = 0b0001,
    LowerCaseLetters = 0b0010,
    Digits = 0b0100,

    Letters = UpperCaseLetters | LowerCaseLetters,
    Alnum = Letters | Digits
  };

  std::string generateString(const size_t &length, const int literals)
  {
    const auto charset = GetCharset(literals);
    std::uniform_int_distribution<> distr{ 0, static_cast<int>(charset.size()) - 1 };

    std::string str(length, 0);
    std::generate_n(str.begin(), length, [&] { return charset.at(static_cast<size_t>(distr(m_rng))); });

    return str;
  }

private:
  std::random_device m_rd{};
  std::mt19937 m_rng{ m_rd() };
};

} // namespace utils

module : private; // the private module fragment must start at global scope

namespace utils {

namespace {

std::string GetCharset(const int literals)
{
  std::string charset;

  if (literals & Randomizer::Digits) {
    charset += "0123456789";
  }
  if (literals & Randomizer::UpperCaseLetters) {
    charset += "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  }
  if (literals & Randomizer::LowerCaseLetters) {
    charset += "abcdefghijklmnopqrstuvwxyz";
  }

  return charset;
}

} // namespace

} // namespace utils

Literally 3 keywords to learn - module, import, and export.

💡 Interesting fact: the export keyword was originally meant to permit the separation of a template's definition from its usages. It proved to be incredibly difficult to implement, and the "export templates" feature was removed in C++11.

The rest should be pretty easy: 1 exported Randomizer class with 1 public generateString method, and 1 helper GetCharset function. If we omit the code itself, we can see the basic module structure:

module; // starts a global module fragment

// Global module fragment [optional].
//   No declarations. Preprocessor directives only! You can't write anything between
//  'module;' and the module declaration except for preprocessing directives. So
//   mainly you'll include headers here when importing them isn't possible
//   (notably when a header uses preprocessing macros as configuration).
//   The code in this section isn't exported by the module interface.

export module name; // module declaration

// Module preamble.
//   Import declarations mostly. Also, we can put special includes here to become
//     * the exported interface;
//     * the non-exported but reachable implementation details.

// Module purview.
//   Exported types and behaviour, etc.

module : private; // starts a private module fragment

// Private module fragment [optional].
//   Provides a separation of interface and implementation in a way that
//   you can provide them together in a single source file without
//   exposing the implementation details in the interface. Consequently,
//   a modification of this part does not cause recompilation.

If you have some doubts regarding private module fragment, consider it as a traditional .cpp file. So it can only contain the stuff that you would put into a .cpp.

To finish our example, we can add another module. It adds a layer of abstraction between usage (client side) and implementation. It's a good approach if

  1. Client side has clear requirements, whereas implementation is generic.
  2. Usage code doesn't need to know anything about implementation details.
  3. Implementation itself isn't stable yet.

Architectural digression... So a new module will contain only GenerateRandomCode function that creates a code by specified criteria:

// Domain.Auth.ixx
export module Domain.Auth;

import Utils.Randomizer;

import <string>;

namespace domain::auth {

export auto GenerateRandomCode(const size_t length = 5)
{
  using namespace utils;

  constexpr int literals = Randomizer::UpperCaseLetters | Randomizer::Digits;

  return Randomizer{}.generateString(length, literals);
}

} // namespace domain::auth

Tiny module. It imports the Utils.Randomizer module, which means it can use every entity that module exports. In our case that's only the utils::Randomizer class, because the utils::GetCharset function is not exported.
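
The generation logic itself doesn't depend on modules, so it can be tried out in a single ordinary source file. Below is a minimal sketch under that assumption: the free function GenerateString and the early return for an empty charset are my additions for illustration, not part of the module above (with no flags set, the original distribution would get the invalid range [0, -1]).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Bit flags mirroring Randomizer::Literals.
enum Literals : int
{
  UpperCaseLetters = 0b0001,
  LowerCaseLetters = 0b0010,
  Digits = 0b0100,
};

// Builds the alphabet for the requested flags (same order as in the module).
inline std::string GetCharset(const int literals)
{
  std::string charset;
  if (literals & Digits) { charset += "0123456789"; }
  if (literals & UpperCaseLetters) { charset += "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; }
  if (literals & LowerCaseLetters) { charset += "abcdefghijklmnopqrstuvwxyz"; }
  return charset;
}

// Hypothetical free-function variant of Randomizer::generateString.
inline std::string GenerateString(const std::size_t length, const int literals)
{
  const auto charset = GetCharset(literals);
  if (charset.empty()) { return {}; } // assumption: no flags means an empty result

  std::random_device rd;
  std::mt19937 rng{ rd() };
  std::uniform_int_distribution<std::size_t> distr{ 0, charset.size() - 1 };

  std::string str(length, '\0');
  std::generate_n(str.begin(), length, [&] { return charset[distr(rng)]; });
  return str;
}
```

For instance, GenerateString(5, UpperCaseLetters | Digits) returns 5 characters, all taken from the 36-character charset of digits and upper-case letters.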

And finally main.cpp:

#include <iostream>

import Domain.Auth;

int main()
{
  std::cout << domain::auth::GenerateRandomCode() << '\t'
    << domain::auth::GenerateRandomCode() << '\n';
}

// >_ K77JS   KD671

So far so good. Modules appear to be easy.

Let's move on to the next topic, where we cover some more details.


Some More Details

  1. I use the .ixx file extension for modules, because Visual Studio sets it by default. You can use .cppm too. Why not simply .mpp? - 🤷‍♂️

  2. How to name a module? It's always hard to name things. I chose this pattern - ${ComponentName}.${ClassOrEntityName}.ixx, mainly because CppCon speakers used a similar pattern. Dots are used informally to represent hierarchy. In addition to the above examples,

    Http.Client.ixx
    Http.ClientFactory.ixx
    Http.Request.ixx
    Http.Response.ixx

  3. In the Utils.Randomizer.ixx we could do the following:
    export module Utils.Randomizer;
    
    import <algorithm>;
    import <random>;
    
    // ...

    The include was added for demonstration purposes, to show the global module fragment at the beginning.

    Let's move on and make it simpler and shorter:

    export module Utils.Randomizer;
    
    import std.core;
    
    // ...

    Microsoft has already added the possibility to import the C++ Standard Library as modules. In order to use them, install the C++ Modules build tools component via the Visual Studio Installer.

    And then enable experimental compiler support for C++ Standard modules in the project settings.

    It's an experimental feature, so don't be surprised to see some warnings:

    warning C5050: Possible incompatible environment while importing module 'std.core': _GUARDOVERFLOW_CRT_ALLOCATORS=1 is defined in current command line and not in module command line
    warning C5050: Possible incompatible environment while importing module 'std.core': _M_FP_PRECISE is defined in current command line and not in module command line
    warning C5050: Possible incompatible environment while importing module 'std.core': mismatched C++ versions. Current "202002" module version "202004"

  4. An exported entity implicitly exports the containing namespace.
    // Domain.Auth.ixx
    // ...
    
    namespace domain::auth {
    
    export auto GenerateRandomCode(const size_t length = 5)
    {
      // ...
    }
    
    } // namespace domain::auth
    Exported as domain::auth::GenerateRandomCode, and namespace domain::auth is now exported too.

  5. Have you noticed the return type of the GenerateRandomCode function? It turns out that Visual Studio's tooltips aren't ready for automatic return type deduction: hovering over a variable or function doesn't tell you what the function returns.
    Doxygen + auto might make the return type clearer, but writing it explicitly is the best option for now.

  6. So far two entities were exported: type and function. Blocks and namespaces can be exported as well, e.g.
    export {
    uint64_t GetCpuUsage() { /* ... */ }
    uint64_t GetGpuUsage() { /* ... */ }
    }
    
    export namespace crypto {
    void Hash256(std::string_view source) { /* ... */ }
    void HashMd5(std::string_view source) { /* ... */ }
    }

    BTW, modules and namespaces are orthogonal. What does that mean? The Pragmatic Programmer book has a great explanation:

    "Two or more things are orthogonal if changes in one do not affect any of the others."
    Chapter 8. Orthogonality.
  7. You can't export entities with internal linkage:
    • classes, functions, and variables defined within an anonymous namespace;
    • static variables and functions.
    namespace {
    
    export class Token {}; // Illegal!
    
    export void Foo() { /* ... */ } // Nope!
    
    export int kSomeVar = 1; //  Nooo!
    
    } // namespace
    
    export static int kAnotherVar = 2; // Illegal again!
    
    export static void Bar() { /* ... */ } // Plz stop!

  8. Module aggregation lets you assemble bigger modules out of submodules, e.g.
    export module Http;
    
    export {
    import Http.Client;
    import Http.ClientFactory;
    import Http.Request;
    import Http.Response;
    }

  9. Let's return to the private module fragment for a moment.

    I made 2 versions of the Randomizer example, with and without a private module fragment, and noticed that expectation differs from reality. In an ideal world, renaming a local variable in the private module fragment must not cause recompilation of other modules that import the modified one. But I saw recompilation! Tom also saw that weird behavior.

  10. What's wrong with this code?
    // Utils.Randomizer.ixx
    export module Utils.Randomizer;
    
    #include <algorithm>
    #include <random>
    
    // ...

    It didn't compile in my case. Why? That's a good question. In some other cases everything is ok and Visual Studio shows a nice warning:

    warning C5244: '#include <string>' in the purview of module 'System.WinService' appears erroneous.  Consider moving that directive before the module declaration, or replace the textual inclusion with 'import <string>;'.

    So here is a new place where C++ developers will shoot themselves in the foot. In general, including headers in the module purview is a bad idea, because all included declarations and definitions would be considered part of the module. Back to the above example: every single symbol in the algorithm and random headers becomes attached to Utils.Randomizer. Obviously, the Utils.Randomizer module doesn't own the contents of either include.

  11. What is header unit import?

    Header units are a smooth way to transition from headers to modules.

    import <string>; // as in the Domain.Auth.ixx
    
    // Import 3rd-party libraries
    import <SDL.h>;
    
    // You can also use double quotes
    import "Log.h";

    In order to find out the difference between <> and "", please check out lookup rules of #include.

    This is a preprocessor directive, but it's nothing at all like #include. The compiler generates something similar to a module out of the import directive and treats the result as if it were a module. Importing these synthesized header units is faster than including the header and comparable in speed to precompiled headers. All exportable names from the header, including macros, are made available in the importing module. Hold on, macros? Yes, macros from the header become visible in the importing translation unit after the import directive. On the other hand, macros from the header unit are not re-exported to transitive importers. Only the regular language symbols.

    Let's make an example with macros. For instance, we have a simple Config.h:

    // Config.h
    #pragma once
    
    #ifdef MY_PROD_ENV
    #define MY_API_KEY 42
    #else
    #define MY_API_KEY 24
    #endif

    Module Module.One, where the GetApiKey function wraps the macro:

    // Module.One.ixx
    export module Module.One;
    
    import "Config.h";
    
    export namespace one {
    int GetApiKey() { return MY_API_KEY; }
    } // namespace one

    One more module, Module.Two, which includes Config.h and re-exports Module.One via export import:

    // Module.Two.ixx
    module;
    
    #define MY_PROD_ENV
    #include "Config.h"
    
    export module Module.Two;
    
    export import Module.One;
    
    export namespace two {
    int GetApiKey1() { return MY_API_KEY; }
    int GetApiKey2() { return one::GetApiKey(); }
    } // namespace two
    

    And main.cpp:

    #include <iostream>
    
    import Module.Two;
    
    int main()
    {
      std::cout << two::GetApiKey1() << '\t' << two::GetApiKey2() << '\n';
    }
    
    // >_ 42   24
    

    Current example shows 3 important things:

    1. // Module.One.ixx
      export module Module.One;
      
      #define MY_PROD_ENV // doesn't affect the following file
      import "Config.h";
      
      // ...
      
    2. // Module.Two.ixx
      module;
      
      #define MY_PROD_ENV // now it works
      #include "Config.h"
      
      // ...
      
      int GetApiKey2() {
        // Even though MY_PROD_ENV is defined here,
        // it doesn't have any power in the Module.One.
        return one::GetApiKey();
      }
    3. // main.cpp
      #include <iostream>
      
      import Module.Two;
      
      int main()
      {
        const auto apiKey = MY_API_KEY; // error: undeclared identifier, because
                                        // macros from header unit aren't re-exported
      }
  12. What about templates? I was surprised testing the following code:

    // Domain.Math.ixx
    export module Domain.Math;
    
    import <concepts>;
    
    template<typename T>
    concept Number = std::integral<T> || std::floating_point<T>;
    
    template<typename T>
    export auto Add(const T a, const T b) { return a + b; }
    
    If you try to compile it, you'll see weird errors:
    // main.cpp
    #include <iostream>
    
    import Domain.Math;
    
    int main()
    {
      std::cout << Add(1, 2) << '\n';
    }
    
    // error C2988: unrecognizable template declaration/definition
    // error C2059: syntax error: 'export'
    // error C2143: syntax error: missing ';' before '{'
    // error C2447: '{': missing function header (old-style formal list?)
    

    Luckily, it's easy to fix. There are 3 solutions:

    // Domain.Math.ixx
    // ...
    
    // Export keyword must be before template.
    export template<typename T>
    auto Add(const T a, const T b) { return a + b; }
    
    // Or use export block.
    export {
    template<typename T>
    auto Add(const T a, const T b) { return a + b; }
    }
    
    // Placeholder syntax works too.
    export auto Add(const Number auto a, const Number auto b) { return a + b; }
    

    If you want to know more about concepts, please look at my overview of C++20: Concepts. It explains when and why Add(1, 2.3) works.

Too much? Agreed. The more details I learned about modules, the more questions I had. Now we know enough to move on to the next topic.


The PImpl Idiom And Modules

Pointer to implementation (PImpl) idiom separates interface and implementation by moving the implementation details of a class into a separate or inner class, accessed through an opaque pointer. In general, it gives two advantages:

  1. It can greatly reduce compilation time. Private changes don't affect the header of our class, so no clients have to be recompiled.
  2. Stabilize the class interface.
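
For reference, the classic (header-based) form of the idiom can be sketched as follows. This is only an illustration under my own assumptions: both "files" are shown in one listing, the set/get methods are made up, and a std::map stands in for the real JSON library.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// ---- Json.h (hypothetical): everything clients get to see ----
namespace utils::json {

class Document
{
public:
  Document();
  ~Document(); // defined below, where Impl is a complete type
  Document(Document&&) noexcept;
  Document& operator=(Document&&) noexcept;

  void set(const std::string& key, const std::string& value);
  std::string get(const std::string& key) const;

private:
  class Impl; // opaque: only declared in the header
  std::unique_ptr<Impl> m_impl;
};

} // namespace utils::json

// ---- Json.cpp (hypothetical): free to change without touching clients ----
namespace utils::json {

class Document::Impl
{
public:
  std::map<std::string, std::string> values; // stand-in for nlohmann::json
};

Document::Document() : m_impl{ std::make_unique<Impl>() } {}
Document::~Document() = default;
Document::Document(Document&&) noexcept = default;
Document& Document::operator=(Document&&) noexcept = default;

void Document::set(const std::string& key, const std::string& value)
{
  m_impl->values[key] = value;
}

std::string Document::get(const std::string& key) const
{
  const auto it = m_impl->values.find(key);
  return it != m_impl->values.end() ? it->second : std::string{};
}

} // namespace utils::json
```

The destructor and move operations are declared in the "header" part and defaulted in the "implementation" part, because std::unique_ptr needs the complete Impl type at the point where it is destroyed.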

Let's consider a real-world example and make a class to work with JSON. In a classic implementation it would be split into Json.h and Json.cpp. And with modules:

export module Utils.Json;

import <nlohmann/json.hpp>;

import std.core;

namespace utils::json {

namespace impl {
class Document
{
public:
  /* ... */

private:
  nlohmann::json m_document;
};
} // namespace impl

export class Document
{
public:
  /* ... */

private:
  std::unique_ptr<impl::Document> m_impl;
};

} // namespace utils::json

Attempt #1 successfully failed:

error C2672: 'nlohmann::basic_json<std::map,std::vector,std::basic_string<char,std::char_traits<char>,std::allocator<char>>,bool,__int64,unsigned __int64,double,std::allocator,nlohmann::adl_serializer,std::vector<unsigned char,std::allocator<unsigned char>>>::get': no matching overloaded function found
message: see reference to function template instantiation 'auto utils::json::impl::Document::get<bool>(const std::string &) const' being compiled
error C2893: Failed to specialize function template 'unknown-type nlohmann::basic_json<std::map,std::vector,std::basic_string<char,std::char_traits<char>,std::allocator<char>>,bool,__int64,unsigned __int64,double,std::allocator,nlohmann::adl_serializer,std::vector<unsigned char,std::allocator<unsigned char>>>::get(void) noexcept(<expr>) const'
message: see declaration of 'nlohmann::basic_json<std::map,std::vector,std::basic_string<char,std::char_traits<char>,std::allocator<char>>,bool,__int64,unsigned __int64,double,std::allocator,nlohmann::adl_serializer,std::vector<unsigned char,std::allocator<unsigned char>>>::get'
message: With the following template arguments:
message: 'ValueTypeCV=T'
message: 'ValueType=bool'
error C2440: 'return': cannot convert from 'void' to 'bool'
...

Maybe the problem is in import <nlohmann/json.hpp>. So let's try include:

module;

#include <nlohmann/json.hpp>

export module Utils.Json;

import std.core;

namespace utils::json {

// ...

Attempt #2 failed again:

error C2270: '()': modifiers not allowed on nonmember functions
message: while compiling class template member function 'int std::basic_stringbuf<char,std::char_traits<char>,std::allocator<char>>::overflow(int)'
message: see reference to class template instantiation 'std::basic_stringbuf<char,std::char_traits<char>,std::allocator<char>>' being compiled
message: see reference to class template instantiation 'std::basic_ostringstream<char,std::char_traits<char>,std::allocator<char>>' being compiled

Ok, probably import std.core is not needed, since nlohmann/json.hpp already includes all the headers it needs. Moreover, I saw another error after mixing STL modules and STL headers, but I couldn't reproduce it later. Anyway...

module;

#include <nlohmann/json.hpp>

export module Utils.Json;

namespace utils::json {

// ...

And attempt #3 failed yet again:

error C2039: 'json_sax_dom_callback_parser': is not a member of 'nlohmann::detail'

It looks similar to an already reported error, but there was no answer. In addition, I saw:

error MSB6006: "CL.exe" exited with code -1073741571.

I didn't expect that. Too many strange errors for me. At this point I wasn't sure that I had enough knowledge about modules. But I know for sure that 3rd-party libraries aren't ready for modules, and probably won't be in the near future. Not all libraries, though! For example, JsonCpp works well. All the code can be checked in my GitHub repository.


What Else?

Finishing this post, I'd like to briefly mention a few more things.

  • Partitions vs submodules.
    export module Http;
    
    export import :Client;
    export import :Response;
    export import :Request;
    
    // vs
    
    export module Http;
    
    export import Http.Client;
    export import Http.Response;
    export import Http.Request;

    Both of these appear identical to the average engineer. So why and when would you choose one over the other? It's all about tradeoffs:

    When using partitions, every entity in the interface partitions is part of the same module. The module that owns an entity is intended to be part of that entity’s ABI! This means that moving an entity from one module to another is potentially ABI breaking.
    When using submodules, you give users the ability to be more granular in what they import. Despite the potential speed-up from modules, an import boost; that imports the entirety of Boost could be deathly expensive to compile times!

    In other words, if you work on a highly cohesive unit, use partitions. On the other hand, if you have several related things, use submodules.

  • Visible vs reachable.

    If we return to our Randomizer example, where GenerateRandomCode has an auto return type, we'll see that the following example doesn't compile:

    import Domain.Auth;
    
    int main()
    {
      std::string str{ "Hello" };
    
      auto code = domain::auth::GenerateRandomCode();
    }
    
    // error C2039: 'string': is not a member of 'std'
    // error C2065: 'string': undeclared identifier

    The name std::string is only visible inside the module Domain.Auth, because import <string>; sits in the module purview and imports are not re-exported by default. But at the same time this example works:

    import Domain.Auth;
    
    int main()
    {
      auto code = domain::auth::GenerateRandomCode();
      code.size(); // size is reachable
      std::string copy = code; // as well as std::string
    }

    I don't know whether it is good or bad. It's weird.

  • Cross platform projects and modules.

    Boris Kolpackov said that include translation was better for cross-platform development. That was in 2019. Even though CMake 3.20 introduced experimental support for modules with the Ninja generator, I don't think it helps much yet. One thing is clear: modules need a lot of testing before being used in production.

  • Also, you can separate the module interface and implementation in 2 files like .h and .cpp.

    Maybe some engineers will find it convenient and/or useful.

  • What is Binary Module Interface?

    While I was checking out modules, I came across the term BMI a few times. In short, a BMI is a binary file on the filesystem that describes the exported interface of a module. I wish I had found more info about BMI.

Summary

It was a 4-month journey for me (the Russian invasion stopped me for a while). I learned many things: module structure, import vs export, import vs include, submodules vs partitions, visibility vs reachability, and went through many examples. No doubt, modules are a game-changing feature. Also, modules should be a big step towards a unified build system and package manager. But, there is always a but, I see several problems:

  1. Modules are still in development.
  2. As an average engineer, I feel the lack of good projects and advanced samples.
  3. Third-party libraries aren't ready for modules yet.

I think we just have to wait...


Sources

And many more which I may have forgotten. If I infringed your copyright, let me know 🙂

