Reducing Application Size using Link Time Optimization

January 2, 2019 Simon Hausmann

We need to talk about calories! Not the calories from your Christmas cookies — those don’t count. But, calories in your Qt application. We’re going to take a look at a technique that is easy to enable and helps you save precious bytes around your application’s waistline.

The Old vs The New

Traditionally, you would build your application by letting the compiler translate your .cpp source files to machine code. The result is stored in .o object files, which we then pass over to the linker, to resolve references between the files. At this point, the linker does not change the machine code that was generated. This division of work between the compiler and the linker allows for quick development cycles. If you modify one source file, only that file gets recompiled and then the linker quickly re-assembles the application’s binary. Unfortunately, this also means that we are missing out on an opportunity to optimize.

Imagine that your application has two functions: main() in main.cpp and render() in graphics.cpp. As an experienced developer, you keep all your graphics code encapsulated in the render() function — anyone can call it from anywhere! In reality, it is only the application’s main() that calls render(). Theoretically, we could just copy and paste the code in render() into main() — inlining it. This would save the machine code instructions in main() to call render(). Once that’s done, we may even see opportunities to reuse some variables and save even more space and code. Now, if we tried to do this by hand, it would quickly escalate into Spaghetti code with lots of sauce.

Luckily, most compilers these days offer a technique that allows you apply such optimizations (and deal with the spaghetti mess) while retaining the modularity and cleanliness of your code. This is commonly called “Link Time Optimizations” or “Link Time Code Generation”. The latter describes best what really happens: Instead of compiling each source file to machine code one-by-one, we delay the code generation step until the very end — linking time. Code generation at linking time not only enables smart inlining of code, but it also allows for optimizations such as de-virtualizing methods and improved elimination of unused code.

Link Time Optimization in Qt

To enable this technique in Qt, you have to build from source. At the configure step, add -ltcg to the command line options. We thought hard, and this is the most cryptic and vowel-free name we could come up with 😉

To demonstrate the effectiveness of Link Time Code Generation, let’s look at a fresh build of the Qt 5.12 branch, compiled with GCC 7.3.0 for ARMv7 against an imx6 Boot2Qt sysroot. For analysis, we’re going to use Bloaty McBloatface (https://github.com/google/bloaty), which is a lovely size profiler for binaries. The Qt Quick Controls 2 Gallery, statically linked, serves as a sample executable. When running bloaty on it, with a regular Qt build, you’ll see output like this:

    VM SIZE                      FILE SIZE
 --------------                --------------
   0.0%       0 .debug_info      529Mi  83.2%
   0.0%       0 .debug_loc      30.4Mi   4.8%
   0.0%       0 .debug_str      18.6Mi   2.9%
   0.0%       0 .debug_line     14.2Mi   2.2%
  68.1%  13.9Mi .text           13.9Mi   2.2%
   0.0%       0 .debug_ranges   9.60Mi   1.5%
   0.0%       0 .debug_abbrev   6.29Mi   1.0%
  29.5%  6.01Mi .rodata         6.01Mi   0.9%
   0.0%       0 .strtab         3.17Mi   0.5%
   0.0%       0 .symtab         2.35Mi   0.4%
   0.0%       0 .debug_frame    1.80Mi   0.3%
   0.0%       0 .debug_aranges   485Ki   0.1%
   1.2%   249Ki .data.rel.ro     249Ki   0.0%
   0.3%  68.2Ki .ARM.extab      68.2Ki   0.0%
   0.2%  38.2Ki .bss                 0   0.0%
   0.1%  30.3Ki [25 Others]     35.4Ki   0.0%
   0.1%  30.3Ki .got            30.3Ki   0.0%
   0.1%  24.1Ki .ARM.exidx      24.1Ki   0.0%
   0.1%  15.1Ki .dynstr         15.1Ki   0.0%
   0.1%  13.6Ki .data           13.6Ki   0.0%
   0.1%  13.2Ki .dynsym         13.2Ki   0.0%
 100.0%  20.4Mi TOTAL            637Mi 100.0%

 

The “VM SIZE” column is what’s particularly interesting to us — it tells us how much space the different sections of the program consume when loaded into memory. Here, we see that the total cost will be ~20 MB.

Now, let’s compare that to a build with -ltcg enabled.

Comparison between regular and LTCG buildThe new VM size is at 17.3 MiB — that’s nearly a 15% reduction in cost, just by passing a parameter to configure.

This drastic gain here is because we chose a static build. However, even when you use a dynamic build, this optimization is worth it. In this case, LTCG is applied at the boundary of shared libraries.

Bloaty can show this by comparing a regular build against an LTCG-enabled build of libQt5Core.so.5.12.0:

    VM SIZE                      FILE SIZE
 --------------                --------------
...
 -53.8%     -28 [LOAD [RW]]          0  [ = ]
...
 -11.9% -1.78Ki .got           -1.78Ki -11.9%
  -0.2% -3.05Ki .rodata        -3.05Ki  -0.2%
 -10.0% -3.54Ki .rel.dyn       -3.54Ki -10.0%
 -17.2% -7.52Ki .ARM.exidx     -7.52Ki -17.2%
 -16.9% -18.4Ki .ARM.extab     -18.4Ki -16.9%
...
 -21.2%  -691Ki .text           -691Ki -21.2%
 -13.9%  -727Ki TOTAL           -838Ki -13.8%

 

The linker produced a smaller library with less code, less relocations, and a smaller read/write data section.

Conclusion

At this point, this seems like a win-win situation, and you may wonder: Why isn’t this enabled by default? No, it’s not because we’re stingy 😉

One issue is that in the Qt build system, currently, this is a global option. So if we were to enable this with the Qt binaries, everyone using them will be slowed down and it requires them to opt-out explicitly, in the build system. We’re working on fixing that, so that eventually, we can ship Qt with LTCG enabled, and then you can enable this at application level.

Another issue is that by delaying the code generation to link time, we are increasing the time it takes from modifying a single source file to creating a new program or library. It’s almost as if you touch every single source file every time, making it less practical for day-to-day use. But, this optimization is definitely something that fits well into the release process, when creating your final build. So, your Release Manager can use it.

The post Reducing Application Size using Link Time Optimization appeared first on Qt Blog.

Previous
Announcing the Qt Automotive Suite 5.12
Announcing the Qt Automotive Suite 5.12

We are very happy to announce the latest edition of our Qt Automotive Suite 5.12, a unified HMI toolchain a...

Next Article
Tableview performance
Tableview performance

In my previous blog post, I wrote about the new TableView for Qt-5.12. What I didn’t mention was how the ne...

When you play the Game of Code, you use Qt or it's the end for you!

Get Qt 5.12 LTS