C++ build process

Introduction

Compilation and linking are two basic processes that happen all the time during C++ software development, but oddly enough, they are not well understood by many C++ developers.

In general, the build process consists of 4 stages:

  • Preprocessing
  • Compilation
  • Assembly
  • Linking

Before you begin reading this simple article, please keep in mind that it was originally written for self-education and provides only a basic explanation of how a C++ compiler works.

Prerequisite

The current test project is on GitHub, and my development setup is:

Preprocessing

In the first step, the preprocessor essentially performs textual substitution. In reality it can do more than this: it can conditionally compile (or ignore) portions of code, and it can expand macros that behave like functions. So the preprocessor does the following:

  • File inclusion, e.g.
    #include <string>
    Includes header files for other libraries, classes, etc. The preprocessor actually copies the entire header into your source file.

  • Macro expansion, e.g.
    #define MAX(a, b) ((a) > (b) ? (a) : (b))
    Macro expansion literally replaces each use of the macro in the code with its definition.

  • Conditional compilation, e.g.
     #if defined(WIN32) || defined(_WIN32) || defined(__WIN32__)
     // Define something common for Windows 32-bit and 64-bit.
     #ifdef _WIN64
       // Define something for Windows 64-bit only.
     #else
       // And for Windows 32-bit only.
     #endif
     #elif __APPLE__
       // macOS code goes here.
     #elif __linux__
       // Linux code.
     #else
     #error "Unknown compiler."
     #endif
    Conditional directives tell the preprocessor to keep the code inside the conditional block only if the condition is met. You can use them much like if-else statements, choosing from #ifdef, #ifndef, #if, #else, and #elif, and closing each block with #endif.

  • Comment removal: each comment is replaced with a single space.
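Because macro expansion is purely textual, operator precedence at the place of use can change the meaning of a macro whose body is not fully parenthesized. The sketch below contrasts two hypothetical macros, MAX_UNSAFE and MAX_SAFE (both names are invented for this illustration), and checks the difference at compile time.

```cpp
// Hypothetical macros for illustration; neither name comes from the test project.

// Body and arguments are not fully parenthesized:
#define MAX_UNSAFE(a, b) (a > b) ? a : b

// The whole body and every argument are parenthesized:
#define MAX_SAFE(a, b) ((a) > (b) ? (a) : (b))

// 1 + MAX_UNSAFE(3, 4) expands to:  1 + (3 > 4) ? 3 : 4
// which parses as (1 + (3 > 4)) ? 3 : 4, so the result is 3 instead of 5.
constexpr int unsafe_result = 1 + MAX_UNSAFE(3, 4);

// 1 + MAX_SAFE(3, 4) expands to:  1 + ((3) > (4) ? (3) : (4)), which is 5.
constexpr int safe_result = 1 + MAX_SAFE(3, 4);

static_assert(unsafe_result == 3, "textual expansion changed the grouping");
static_assert(safe_result == 5, "the parenthesized macro yields the maximum");
```

Even the parenthesized form still evaluates one of its arguments twice, which is one reason a constexpr function or std::max is often preferred over a macro in modern C++.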

If you want to see what your file looks like after preprocessing, pass gcc the -E option, which tells the compiler to stop after preprocessing (without compiling, assembling, or linking), e.g.

 

The -o option specifies the desired name of the preprocessed source file.

To see the preprocessor in action, let's refer to our test project. The Greeting class has one macro and a small comment. After preprocessing, the comment is removed and the macro is replaced:



The resulting files are in the build/pre folder.
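A minimal sketch of the same effect on a hypothetical file: demo.cpp, ANSWER, and answer() are invented names and are not the project's Greeting class. The comments describe roughly what the preprocessor would leave behind; the exact output also contains line markers that vary between compilers.

```cpp
// demo.cpp (hypothetical file name)
// Preprocess only:  g++ -E demo.cpp -o demo.ii

#define ANSWER 42  // object-like macro

// Before preprocessing:
constexpr int answer() {
    return ANSWER; /* this comment disappears */
}

// After preprocessing, the function body is roughly:
//     return 42;
// The macro name is replaced by its definition and the comment by a space.
// The #define line itself does not appear in the output.
```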

Note the number of lines in the last image. The compiler has to process a much larger file than our simple source file because of the included headers. In our example, Greeting.ii comes from a source file that directly includes only "Utils.h" and <string>, yet together they pull in more than 150 headers. Here is a header graph that shows all included headers.

To summarize the discussion above, the following image gives an overview of preprocessing:



Compilation

In the second step, the compiler does its main task: it translates each preprocessed source file (which no longer contains directives) into assembly code. This is an intermediate step between the high-level programming language and machine (binary) code.

Pass gcc the -S option to get assembly code in the output file, e.g.

As a result, we will get assembly code, e.g.


The resulting files are in the build/asm folder.

This code is still fairly readable (if you know assembly 😉), but machines cannot execute it directly. They work with machine code, which is produced in the next step.

Compilation itself is divided into the following stages:

  • Lexical analysis (producing tokens and lexical errors).
  • Syntactical analysis (producing a parse tree and syntactical errors).
  • Semantic analysis (producing a symbol table, scoping info and scoping/typing errors).
  • Optimization.
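To make the stages above more concrete, the sketch below pairs each one with a typical error or effect. The erroneous lines are commented out so the file still compiles; uncommenting one should produce a diagnostic of the named kind. The function name is invented, the mapping of errors to stages is simplified, and the exact messages differ between compilers.

```cpp
// Each commented-out line shows an error that a given stage would typically catch.

constexpr int stages_demo() {
    // Lexical analysis: a stray '@' outside a literal does not form a valid token.
    // int a = 1 @ 2;

    // Syntactical analysis: every token is valid, but a ')' is missing.
    // int b = (1 + 2;

    // Semantic analysis (scoping): 'undeclared' is not in the symbol table.
    // int c = undeclared + 1;

    // Semantic analysis (typing): a string literal cannot initialize an int.
    // int d = "text";

    return 1 + 2;  // Optimization: a compiler may fold this into the constant 3.
}
```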

Here is a simple image that represents the compilation step:



Assembly

Assembly is the third stage. The assembler takes the assembly source code and translates it into machine code, which it stores in object files.

Machine code looks like:



To get an object file, use the as program, e.g.

 
The assembly step is pretty simple to illustrate:


This is not the end yet. We may have many object files that must be tied together into one binary (executable) file by the linker.
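Before moving on to linking, the compilation and assembly steps covered so far can be tried on any small translation unit. The sketch below uses a hypothetical add.cpp; the file and function names are invented, while -S, -c, and as are the usual GCC and binutils entry points. No expected assembly is shown, because it depends on the compiler version, target, and optimization level.

```cpp
// add.cpp (hypothetical file name)
//
// Stop after compilation and keep the assembly:  g++ -S add.cpp -o add.s
// Assemble that file into an object file:        as add.s -o add.o
// Or let the compiler driver do both steps:      g++ -c add.cpp -o add.o

#include <cassert>

// An ordinary (non-inline) function, so it is emitted into the assembly output.
int add(int a, int b) {
    return a + b;
}

// Small runtime check that a main() in another file could call after linking.
void check_add() {
    assert(add(2, 3) == 5);
}
```

The file has no main() on purpose: an object file does not need one, and only the final link step requires that some object file provides it.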

Linking

In the fourth step, the linker combines the object files for a program, along with any library functions that are necessary, into a file containing the complete executable program.
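A minimal sketch of what the linker resolves: one translation unit calls a function that is only declared there, and another translation unit defines it. Both parts are shown in a single listing for brevity; the file names in the comments and the function names are hypothetical.

```cpp
#include <cassert>

// --- square.h (hypothetical): declaration visible to every translation unit
int square(int x);

// --- sum.cpp (hypothetical): compiles on its own, but its calls to square()
// --- remain unresolved symbols in the object file until link time
int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}

// --- square.cpp (hypothetical): the definition the linker matches to those calls
int square(int x) {
    return x * x;
}

// Check that a main() could run once everything is linked together.
void check_linked() {
    assert(sum_of_squares(3, 4) == 25);
}
```

If the object file containing the definition of square() were left out of the link step, the linker would typically stop with an undefined reference error.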

To get an executable program, let's apply the final command:

There are two types of linking:

  • Static linking: the library code is copied into the executable at link time, so a call jumps directly to the function. This is more efficient at call time but less flexible, because a library update requires relinking the program.
  • Dynamic linking: the executable keeps a table of the functions it needs, and the address to jump to is looked up in that table before the call. This is slightly slower but much more flexible, and it is the standard way to ship a library.
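The difference between a direct jump and a table lookup can be sketched in plain C++ as a direct call versus a call through a table of function pointers. This is only an analogy with invented names: a real dynamic linker fills such tables when the program is loaded or when a function is first called, and the details are platform-specific.

```cpp
#include <cassert>
#include <cstddef>

int twice(int x) { return 2 * x; }
int thrice(int x) { return 3 * x; }

// "Static" flavour: the call target is fixed in the code itself.
int call_direct(int x) {
    return twice(x);
}

// "Dynamic" flavour: the target is read from a table first, then called.
using Function = int (*)(int);
Function function_table[] = { twice };  // stands in for a loader-filled table

int call_via_table(std::size_t slot, int x) {
    Function target = function_table[slot];  // extra lookup before the jump
    return target(x);
}

// Swapping a table entry changes behaviour without touching the callers,
// which is the flexibility the extra indirection buys.
void run_table_demo() {
    function_table[0] = twice;
    assert(call_via_table(0, 5) == call_direct(5));
    function_table[0] = thrice;  // analogous to installing an updated library
    assert(call_via_table(0, 5) == 15);
}
```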

And here is an image that illustrates the linking stage:


Conclusion

As we just saw, building an executable file from C++ source files is a multi-step process. In short, we could build an executable via one command, e.g.

