Main entry point to nanobench's benchmarking facility.
It holds configuration and results from one or more benchmark runs. Usually it is used in a single statement, where the object is constructed, configured, and then a benchmark is run, e.g. like this:
ankerl::nanobench::Bench().unit("byte").batch(1000).run("random fluctuations", [&] {
    // here be the benchmark code
});
In that example Bench() constructs the benchmark, it is then configured with unit() and batch(), and after configuration the benchmark is executed with run(). Once run() has finished, it prints the result to std::cout. It also stores the results in the Bench instance, but in this case the object is immediately destroyed, so they are no longer available.
Sets the batch size, e.g. the number of processed bytes or some other metric for the size of the data processed in each iteration. If you benchmark hashing of a 1000 byte long string and want byte/sec as a result, you can specify 1000 as the batch size.
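For instance, a minimal sketch of that hashing case; std::hash is used here only as a stand-in for whatever hash function you actually benchmark:

#include <nanobench.h>

#include <functional>
#include <string>

int main() {
    std::string text(1000, 'x'); // a 1000 byte long input
    ankerl::nanobench::Bench()
        .batch(text.size()) // 1000 bytes processed per iteration
        .unit("byte")       // table header shows byte/sec, not op/sec
        .run("hash 1000 bytes", [&] {
            auto h = std::hash<std::string>{}(text);
            ankerl::nanobench::doNotOptimizeAway(h);
        });
}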
Modern processors have a very accurate clock, able to measure intervals as small as 20 nanoseconds. This is the main trick that makes nanobench so fast: we find out how accurate the clock is, then run the benchmark only as often as necessary for the clock's accuracy to be good enough for accurate measurements.
The default is to run one epoch for 1000 times the clock resolution. So for 20ns resolution and 11 epochs, this gives a total runtime of 20ns * 1000 * 11 = 220 microseconds.
To be precise, nanobench adds 0-20% random noise to each evaluation. This prevents aliasing effects and further improves accuracy.
Total runtime will be higher, though: some initial time is needed to find the target number of iterations for each epoch, and there is some overhead involved in starting & stopping timers, calculating the resulting statistics, and writing the output.
Parameters
multiple
Target multiple of the clock resolution. Usually 1000 is a good compromise between runtime and accuracy.
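As a sketch, a smaller multiple trades accuracy for a shorter runtime (the numbers in the comments are illustrative):

#include <nanobench.h>

int main() {
    // Illustrative: with a 20ns clock, multiple = 100 targets roughly
    // 20ns * 100 = 2 microseconds of measured time per epoch instead
    // of the default 20 microseconds.
    int x = 0;
    ankerl::nanobench::Bench()
        .clockResolutionMultiple(100) // default is 1000
        .run("quick but less accurate", [&] { x += 1; });
    ankerl::nanobench::doNotOptimizeAway(x);
}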
Calculates Big O of the results with all preconfigured complexity functions. Currently these complexity functions are fitted to the benchmark results: O(1), O(log n), O(n), O(n log n), O(n^2), O(n^3).
If we e.g. evaluate the complexity of std::sort, std::cout << bench.complexityBigO() prints the fitted complexity functions, ranked best fit first.
embed:rst
Sets N for asymptotic complexity calculation, so it becomes possible to calculate `Big O
<https://en.wikipedia.org/wiki/Big_O_notation>`_ from multiple benchmark evaluations.
Use :cpp:func:`ankerl::nanobench::Bench::complexityBigO` when the evaluation has finished. See the tutorial
:ref:`asymptotic-complexity` for details.
Template Parameters
T
Any type is cast to double.
Parameters
b
The problem size N for the next benchmark run, so it is possible to calculate Big O.
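A sketch of how complexityN() and complexityBigO() work together, benchmarking std::sort at several problem sizes. ankerl::nanobench::Rng is nanobench's bundled random number generator; it is not covered in this section and is used here just for convenience:

#include <nanobench.h>

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    ankerl::nanobench::Bench bench;
    ankerl::nanobench::Rng rng;

    // Run the same benchmark for several problem sizes N.
    for (size_t n = 10; n <= 10000; n *= 10) {
        std::vector<uint64_t> data(n);
        std::generate(data.begin(), data.end(), [&] { return rng(); });
        bench.complexityN(n).run("std::sort", [&] {
            // note: after the first iteration the data is already
            // sorted, which is good enough for this sketch
            std::sort(data.begin(), data.end());
        });
    }

    // Fits all preconfigured complexity functions to the results.
    std::cout << bench.complexityBigO() << std::endl;
}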
Retrieves all benchmark results collected by the bench object so far.
Each call to run() generates a Result that is stored within the Bench instance. This is mostly for advanced users who want to see all the nitty-gritty details.
Returns
All results collected so far.
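A sketch of inspecting the results programmatically; Result::Measure and median() are details of the Result API that are not shown in this section, so treat them as assumptions:

#include <nanobench.h>

#include <iostream>

int main() {
    int x = 0;
    ankerl::nanobench::Bench bench;
    bench.run("x += 1", [&] { x += 1; });
    ankerl::nanobench::doNotOptimizeAway(x);

    // one Result per call to run()
    for (auto const& r : bench.results()) {
        std::cout << r.median(ankerl::nanobench::Result::Measure::elapsed)
                  << " seconds per iteration (median)\n";
    }
}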
embed:rst
Convenience shortcut to :cpp:func:`ankerl::nanobench::doNotOptimizeAway`.
Sets exactly the number of iterations for each epoch.
Ignores all other epoch limits. This forces nanobench to use exactly the given number of iterations for each epoch, not more and not less. Default is 0 (disabled).
Parameters
numIters
Exact number of iterations to use. Set to 0 to disable.
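A minimal sketch:

#include <nanobench.h>

int main() {
    int x = 0;
    ankerl::nanobench::Bench()
        .epochIterations(1000) // exactly 1000 iterations per epoch
        .run("fixed iteration count", [&] { x += 1; });
    ankerl::nanobench::doNotOptimizeAway(x);
}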
Controls number of epochs, the number of measurements to perform.
The reported result will be the median of the evaluations of each epoch. The higher you choose this, the more deterministic the result will be, and outliers will be more easily removed. The err% will also be more accurate the higher this number is. Note that the err% will not necessarily decrease when the number of epochs is increased; but it will be a more accurate representation of the benchmarked code's runtime stability.
Choose the value wisely. In practice, 11 has been shown to be a reasonable compromise between runtime performance and accuracy. This setting goes hand in hand with minEpochIterations() (or minEpochTime()). If you are more interested in median runtime, you might want to increase epochs(). If you are more interested in mean runtime, you might want to increase minEpochIterations() instead.
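A minimal sketch; more epochs mean more samples for the median and err%, at the cost of total runtime:

#include <nanobench.h>

int main() {
    int x = 0;
    ankerl::nanobench::Bench()
        .epochs(51) // default is 11
        .run("more stable median", [&] { x += 1; });
    ankerl::nanobench::doNotOptimizeAway(x);
}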
Sets the maximum time each epoch may take. As a safety precaution, if the clock is not very accurate we can set an upper limit for the maximum evaluation time per epoch. Default is 100ms. At least a single evaluation of the benchmark is performed.
Sets the minimum number of iterations each epoch should take.
Default is 1, and we rely on clockResolutionMultiple(). If the err% is high and you want a smoother result, you might want to increase the minimum number of iterations, or increase minEpochTime().
Sets the minimum time each epoch should take. Default is zero, so we rely fully on clockResolutionMultiple(). In most cases this is exactly what you want. If you see that the evaluation is unreliable with a high err%, you can increase either minEpochTime() or minEpochIterations().
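A sketch combining the three epoch-time knobs, minEpochIterations(), minEpochTime(), and maxEpochTime():

#include <nanobench.h>

#include <chrono>

int main() {
    using namespace std::chrono_literals;
    int x = 0;
    ankerl::nanobench::Bench()
        .minEpochIterations(100) // at least 100 iterations per epoch
        .minEpochTime(1ms)       // and at least 1ms of measured time
        .maxEpochTime(200ms)     // but stop an epoch after 200ms
        .run("smoother err%", [&] { x += 1; });
    ankerl::nanobench::doNotOptimizeAway(x);
}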
On Linux, nanobench has a powerful feature to use performance counters. This enables counting retired instructions, the number of branches, missed branches, etc. By default this is enabled, but you can disable it if you don't need the feature.
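A minimal sketch, assuming the setter takes a bool; disabling can be useful e.g. in environments where perf events are restricted:

#include <nanobench.h>

int main() {
    int x = 0;
    ankerl::nanobench::Bench()
        .performanceCounters(false) // enabled by default
        .run("no perf counters", [&] { x += 1; });
    ankerl::nanobench::doNotOptimizeAway(x);
}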
Repeatedly calls op() based on the configuration, and performs measurements.
This call is marked noinline to prevent the compiler from optimizing across different benchmarks. This can have quite a big effect on benchmark accuracy.
embed:rst
.. note::
Each call to your lambda must have a side effect that the compiler can't possibly optimize away. E.g. add a result to an
externally defined number (like `x` in the example below), and finally call `doNotOptimizeAway` on the variables the compiler
must not remove. You can also use :cpp:func:`ankerl::nanobench::doNotOptimizeAway` directly in the lambda, but be aware that
this has a small overhead.
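A minimal sketch of that pattern:

#include <nanobench.h>

#include <cstdint>

int main() {
    uint64_t x = 1;
    ankerl::nanobench::Bench().run("x += x", [&] {
        // the externally defined x is the side effect
        x += x;
    });
    // make sure the compiler can't remove x and the work feeding it
    ankerl::nanobench::doNotOptimizeAway(x);
}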
Sets the time unit to be used for the default output.
Nanobench defaults to using ns (nanoseconds) as output in the markdown. For some benchmarks this is not a convenient unit, so it is possible to configure this. E.g. use timeUnit(1ms, "ms") to show ms/op instead of ns/op.
Parameters
tu
Time unit to display the results in, default is 1ns.
tuName
Name of the time unit, default is "ns".
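A minimal sketch using the std::chrono literal:

#include <nanobench.h>

#include <chrono>

int main() {
    using namespace std::chrono_literals;
    int x = 0;
    ankerl::nanobench::Bench()
        .timeUnit(1ms, "ms") // table shows ms/op instead of ns/op
        .run("slow operation", [&] { x += 1; });
    ankerl::nanobench::doNotOptimizeAway(x);
}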
Defaults to "op". Could be e.g. "byte" for string processing. This is used for the table header, e.g. to show ns/byte. Use singular (byte, not bytes). A change clears the currently collected results.
Sets a number of iterations that are initially performed without any measurements.
Some benchmarks need a few evaluations to warm up caches / database / whatever access. Normally this should not be needed, since we show the median result so initial outliers will be filtered away automatically. If the warmup effect is large though, you might want to set it. Default is 0.
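A minimal sketch:

#include <nanobench.h>

int main() {
    int x = 0;
    ankerl::nanobench::Bench()
        .warmup(100) // 100 unmeasured iterations to warm up caches
        .run("warmed up", [&] { x += 1; });
    ankerl::nanobench::doNotOptimizeAway(x);
}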