Meta Functions

As far as I know, metaprogramming is used in C++ for better performance. I tried to test this and figure it out using the example code below:

#include <iostream>
#include <chrono>

int sum(int x, int y) {  // regular function 
    return x + y;
}

template <int X, int Y> // Meta function
struct IntSum {
    static constexpr int value = X + Y;
};

int main()
{
    auto start1 = std::chrono::high_resolution_clock::now();
    std::cout << IntSum<2, 4>::value << '\n';
    auto end1 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> delta1 = end1 - start1;

    auto start2 = std::chrono::high_resolution_clock::now();
    std::cout << sum(2, 4) << '\n';
    auto end2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> delta2 = end2 - start2;

    std::cout << delta1.count() << ' ' << delta2.count() << '\n';
    return 0;
}


Two questions:
1) Is this a proper example of comparing a meta function (the metaprogramming usage) to a regular function to see the metaprogramming performance effect?
2) If so, why does the regular function run dozens of times faster than that meta function in effect, then!?
https://coliru.stacked-crooked.com/a/c5ab590caf3dfcd2
As far as I know, meta programming is used in C++ for better performance.


Not necessarily. Metaprogramming can also be used to compute types at compile time, as well as containers of types. Have a look at Boost Hana.

When one uses literal values in a program, the optimiser can remove most of the code except returning the value, because the result is already known at compile time.

You are still using high_resolution_clock; you need to check whether that is appropriate. I have mentioned std::steady_clock before.

When I compile with this (removed -pthread)
g++ -std=c++20 -O3 -Wall -Wextra -pedantic main.cpp && ./a.out

6
6
6.394e-05 1.029e-06
The code that you measure probably takes too short a time for the measurement to be reliable. Repeat the experiment and you're likely to get a different result. Sometimes the order in which you do things matters. The code surrounding it might matter. The alignment where the compiler ends up placing the functions could also make a difference. But if you just compare the code generated from the two pieces of code, they appear to be identical.
The calculation of the sums takes essentially no time at all. What you're measuring is mainly the output code with std::cout.
Peter87 is exactly right, as always :+)

To do any kind of code timing properly, use at least a million data points. For this case, make them randomly generated but small enough not to overflow their type; try the std::size_t maximum value divided by 3 as a limit, say. But it probably won't be much different for this trivial case.

But I still have doubts about the premise of the question:

https://en.wikipedia.org/wiki/Template_metaprogramming

With the computation of values or static tables the gain at runtime is done at the expense of compile time. But usually the computation would be more complex, as in the wiki examples using squaring and factorial. So this would involve more compile time.
Rather a lot of integer truncation, but it makes a point.

#include <iostream>
#include <chrono>
#include <cmath>

int sum(int x, int y) {  // regular function 
    return log( x * x ) * log( y * y );
}

template <int X, int Y> // Meta function
struct IntSum {
    static constexpr int value = log( X * X ) * log( Y * Y );  // note: std::log is not constexpr in standard C++; GCC accepts this as an extension
};

int main()
{
   const int LOTS = 10000000;
   int dummy = 0;
 
   auto start1 = std::chrono::high_resolution_clock::now();
   for ( int i = 0; i < LOTS; i++ ) dummy = IntSum<2, 4>::value;
   auto end1 = std::chrono::high_resolution_clock::now();
   std::chrono::duration<double> delta1 = end1 - start1;
   std::cout << dummy << '\n';
   
   auto start2 = std::chrono::high_resolution_clock::now();
   for ( int i = 0; i < LOTS; i++ ) dummy = sum(2, 4);
   auto end2 = std::chrono::high_resolution_clock::now();
   std::chrono::duration<double> delta2 = end2 - start2;
   std::cout << dummy << '\n';
   
   std::cout << delta1.count() << ' ' << delta2.count() << '\n';
}


With default (-O2) optimisation:
3
3
1.68e-07 0.745189



But, rather brilliantly(!), with -O3 optimisation:
3
3
1.9e-07 5.5e-08


Modern compilers really are amazing! I have to make one of the arguments to the function volatile to stop it recognising the repeat.
For run-time performance, there's also constexpr/consteval/constinit, which evaluate at compile time, even without using templates. As constexpr etc. can be applied to a function (and, since C++20, also to local new/delete), quite complicated functions can be evaluated at compile time (at the expense of greater compile time) rather than at run time. Also, in C++20, std::vector, std::string etc. can be used within a constexpr function!

This is purely compile-time! :

#include <vector>
#include <cstddef>

constexpr bool testVector(std::size_t n) {
	std::vector<std::size_t> vec(n, 1);

	std::size_t sum {};

	for (const auto& elem : vec)
		sum += elem;

	return n == sum;
}

int main() {
	static_assert(testVector(10));
}

Last edited on
What I get from this thread, including almost all the examples, is that metaprogramming is not better in terms of performance than a regular function, whether small or huge chunks of data are involved. I don't know why metaprogramming is said to be that important, or something like a good feature. Or better to say, I still don't know where or when metaprogramming gives better performance. :|
I'm not sure that you get that at all, @Frek.

The problem with your original code was that the individual task was (a) small, (b) inflexible (2 fixed arguments), (c) not a complex problem. So, basically, it wasn't worth doing by metaprogramming.

Metaprogramming should give better performance IF you have a COMPLEX problem that you COULD do at COMPILE time and not have to repeat at run time.
Or better to say, I still don't know where or when metaprogramming gives better performance.


Did you read the links I posted?


https://en.wikipedia.org/wiki/Template_metaprogramming

http://boostorg.github.io/hana/
I am not sure there is a formula or anything you can memorize to say 'this block is faster with it, that block is faster without it'. It's too dependent on too many things: your compiler and how it does things, your specific code, and probably other factors I am not thinking of right now.

My 2 cents: I don't think the point of metaprogramming is to have any effect on performance, but to avoid code duplication by generating code at compile time. Sometimes the code generated this way is more performant, sometimes less so, but it doesn't matter because performance was never the point.
@TheIdeasMan
I couldn't read the Boost page since I'm not used to it, so it couldn't clear up the ambiguity of the whole idea of metaprogramming, or where and when to use it. The second link (Wikipedia) was fine and I could learn some things from it, but the last part of that, too, was too sophisticated. :(

Up to now, what I've gathered about metaprogramming is that it is generally used where there's some code that can be performed/evaluated at compile time, which involves const/constexpr data/functions. But to me, this feature is almost useless. It's as though there's code that, when called, delivers the factorial of 10, a fixed number each time! We could use the exact number which equals the return value of such a function instead. Since nothing happens at run time, the benefit is almost nothing! :| But it must not be this way and I'm almost sure I'm mistaken.

@helios
to avoid code duplication by generating code at compile time.
This per se may make a positive impact on performance, no?
Will you please offer an example of that usage you mentioned?
To get more useful ideas about meta and compile time programming watch this Video from Jason Turner at CppCon 2021

https://www.youtube.com/watch?v=MdrfPSUtMVM
This per se may make a positive impact on performance, not?
Like I said, sometimes it'll be faster, other times it'll be slower.

Unfortunately every example of MP is going to seem a bit contrived, but please bear with me.
Imagine you have a structure like this:
struct Buffer{
    int type;
    char data[256];
};
where data is a raw buffer that is the in-memory representation of an array of numbers of some type. It could be an array of ints or an array of doubles. The format of those numbers is specified by the type member.
Now, you want a Buffer convert(const Buffer &, int new_type); that correctly converts a Buffer from one number format to another (for the sake of simplicity let's ignore that the destination array may need more or less space to store the same numbers).
convert() would look something like this:
switch (src.type){
    case 0:
        switch (new_type){
            case 0:
                //Do nothing.
                return src;
            case 1:
                return convert_char_to_short(src);
            case 2:
                return convert_char_to_int(src);
            //...
        }
        break;
    case 1:
        switch (new_type){
    //and so on
}
If type can have any of, say, six values, convert() would need to handle 36 cases in total, and you'd need to write all those functions, too.
Now, if you can write a general template <typename Src, typename Dst> Dst convert_number(Src);, you can avoid writing all those conversion functions by hand and instead switch first on the value of src.type and then on the value of new_type to select the compile-time template arguments Src and Dst.
It's somewhat difficult to read if you're not used to template metaprogramming, but I've implemented this here: https://github.com/Helios-vmg/napalm/blob/master/src/napalm/SampleConverterFilter.cpp#L336
SampleConverterFilter is a class that converts an audio buffer between different sample formats: for example, signed 16-bit, signed 24-bit, signed 32-bit, float, etc. The highlighted line chooses the right template function based on run-time values.

EDIT:
These are the conversion functions:
https://github.com/Helios-vmg/napalm/blob/master/src/napalm/converters.inl
They're generated by a program because there's no way to do it with TMP (or it would have been obscenely complex).
frek wrote:
Up to now, what I've gathered about metaprogramming is that it is generally used where there's some code that can be performed/evaluated at compile time, which involves const/constexpr data/functions. But to me, this feature is almost useless. It's as though there's code that, when called, delivers the factorial of 10, a fixed number each time! We could use the exact number which equals the return value of such a function instead. Since nothing happens at run time, the benefit is almost nothing! :| But it must not be this way and I'm almost sure I'm mistaken.


But that is only one part of TMP, here is the first paragraph of the wiki page:

wiki wrote:
Template metaprogramming (TMP) is a metaprogramming technique in which templates are used by a compiler to generate temporary source code, which is merged by the compiler with the rest of the source code and then compiled. The output of these templates can include compile-time constants, data structures, and complete functions. The use of templates can be thought of as compile-time polymorphism. The technique is used by a number of languages, the best-known being C++, but also Curl, D, Nim, and XL.


With constants, the idea is to calculate them once at compile time, then use them many times as an actual value (with no calculation time) at runtime. If there is a small number of constants, one could use some other method to calculate them (a spreadsheet, say), then have them as a vector of literal values; this would achieve the same thing. Or one could write another C++ program to calculate these values and store them in a file, then your program can read them from the file, but this involves the overhead of reading the file.

I couldn't read the Boost page since I'm not used to it, so it couldn't clear up the ambiguity of the whole idea of metaprogramming, or where and when to use it.


When trying to digest complex articles, try to get the overall picture into your head first. Then break it down into pieces, understand them, look at examples.

With the Hana library they talk about the 4 computational quadrants:

1. Runtime computations which are the usual computations we use in C++. In that world, we have runtime containers, runtime functions and runtime algorithms:

2. constexpr computations. There, we have constexpr containers, constexpr functions and constexpr algorithms. These may be done at compile time.

3. heterogeneous computations. Heterogeneous computations differ from normal computations in that instead of having containers holding homogeneous objects (all objects having the same type), the containers may hold objects with different types. Furthermore, functions in this quadrant of computation are heterogeneous functions, which is a complicated way of talking about template functions. Similarly, we have heterogeneous algorithms that manipulate heterogeneous containers and functions:

4. type-level computations. In this quadrant, we have type-level containers, type-level functions (usually called metafunctions) and type-level algorithms. Here, everything operates on types: containers hold types and metafunctions take types as arguments and return types as results.

Here is another use of TMP: on my Linux system, the type traits header (#include <type_traits>) is the file

/usr/include/c++/11/type_traits

It may be a little hard to read, but one can see how they build up types:

There is integral_constant, then true_type and false_type, and so on; there are lots of them. One can see how these types are made by inheriting from a previously defined type. All of this is used by the things found on this page:

https://en.cppreference.com/w/cpp/types#Type_traits

I hope this helps :+)
A metaprogram is just a program whose input or output is another computer program. Under this definition, all templates are meta-programs, all compilers and interpreters are meta-programs. The C preprocessor is one; so is the linker.

Consider the C++ compiler as a metaprogram. It doesn't expand our capabilities; everything possible with C++ was already possible with machine language. Still, C++ is not useless, because it offers practical benefits to performance, correctness, and programmer productivity. The C++ compiler is a dramatic example of a metaprogram compared to the silly template IntSum, but the comparison holds anyway.

Even if we limit ourselves to meta-programs written in C++ templates, there are examples more interesting than IntSum:
- Robert Ramey's Safe Numerics is a template library which implements a compiler for a language which does arithmetic safely. If arithmetic is expressed using the library's language, it is guaranteed to either be free of integer overflow or to result in a compiler error.
- Hana Dusikova's compile time regular expression library is a template library which implements a compiler for the language of regular expressions.
- Boost Spirit is a (older) template library which implements a parser generator - a compiler for the language of attribute grammars.
All three examples offer benefits to correctness and productivity; the second (and maybe the third) offer benefits to performance.

As a self-contained example, one important use of metaprogramming for performance involves the templates in the definition of std::move, which is used to help the compiler avoid unnecessary copies of data structures. It looks something like this:
template <typename T> struct my_remove_reference { using type = T; };
template <typename T> struct my_remove_reference<T&> { using type = T; }; 
template <typename T> struct my_remove_reference<T&&> { using type = T; };
template <typename T> 
  constexpr typename my_remove_reference<T>::type&& my_move(T&& x) noexcept
  { 
    return static_cast<typename my_remove_reference<T>::type&&>(x); 
  }

This program fragment relies on the definition of the class template
my_remove_reference
to write classes that look (conceptually) like this:
struct my_remove_reference<int&> { using type = int; };
The set of classes produced by this template is needed to operate with types that aren't known to the library authors.

The alternative is to write a set of repetitive functions that enumerate the required behavior for every type that is ever passed to my_move:
1
2
3
4
5
int&& my_move(int& x) { return static_cast<int&&>(x); }
int&& my_move(int&& x) { return static_cast<int&&>(x); }
float&& my_move(float& x) { return static_cast<float&&>(x); }
float&& my_move(float&& x) { return static_cast<float&&>(x); }
// ... 
Writing this is tedious and error-prone; it is better done by machine. A metaprogram is the right tool here.