One of my main tasks since 2015 has been optimizing the performance of APIs written in various languages (mainly C/C++). This post tries to recap best practices in this area.
For those who, like me, have worked in IT since the Z80, let me say that CPUs have changed, a lot. Variability in computing time is simply unavoidable on modern computer architectures: while we can guarantee the result of a computation, we cannot guarantee how fast that computation will be:
“Computer can reproduce answers, not performance” : Bryce Adelstein Lelbach, https://youtu.be/zWxSZcpeS8Q?t=6m45s
The reasons for variance in computation time can be summarized as :
- Hardware jitter : instruction pipelines, cpu frequency scaling and power management, shared caches and many other things
- OS activities : a huge list of things the kernel can do to screw up your benchmark performance
- Observer effect : every time we instrument code to measure its performance, we introduce variance.
Warming up the CPU also seems to have become necessary to get meaningful results. Running hot instead of cold on a single piece of code is well described here : https://youtu.be/zWxSZcpeS8Q?t=18m51s
You have to measure. There is no other way: things that, in your experience, look faster when done a certain way can turn out to be slower when measured, so put away all your preconceptions and prepare to A/B test your code for performance. Here are some hints, by no means a complete list :
1) Make sure your code is doing what you expect. Profile your code compiled without the optimizer and check that you are not calling unwanted code (valgrind/kcachegrind for profiling).
2) Measure/time your code : on Linux/C I use this code for duration and the GNU Scientific Library (libgsl) for the related math. Check out chrono for C++ and/or Google Benchmark for a complete framework.
3) As mentioned above, warm up the CPU before measuring by running your code a large number of times. Then measure the average execution time over a large number of runs. Ideally, your measurement is good when the results have a “normal” distribution. Narrow down the code you measure until you get normally distributed results.