AVX and other SIMD

Aug 11, 2022 at 6:57am

Does anyone have an example of code that tends to generate AVX or other SIMD instructions in the resulting compiled assembly file?
I'm on Linux with g++, but can switch to clang if it's more likely to produce SIMD instructions...

I'm certainly curious if there's simple code in which these instructions tend to occur (A kind of hello world SIMD).

Is it as simple as adding two vectors, or is there more to it than that?

Would they have to be vectors of doubles, or could it also handle floats and ints?
Should I be careful about how many elements are in the vector/array, or is it more arbitrary than that?

Also, do they need to be enabled on the command line at compile time? I'm seeing -mtune=native as an option, if I can guarantee that the client computer is no less capable than my own.

Last edited on Aug 11, 2022 at 7:38am

Aug 11, 2022 at 8:21am

TheIdeasMan (6847)

https://www.codeproject.com/articles/874396/crunching-numbers-with-avx-and-avx

I found this, haven't used it, maybe helpful for you?

Aug 11, 2022 at 1:53pm

kigar64551 (837)

For gcc and g++ (and clang), you can use -march=<cpu-type> to set the target CPU type.

See here for a list of available options:
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

You'd have to pick one that supports AVX, so that the compiler may, at least potentially, generate AVX instructions. But the resulting binary will then require AVX support from both the CPU and the OS.
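
If the program must also run on machines without AVX, one option is to check at run time before taking an AVX code path. The following is only a sketch: __builtin_cpu_supports is a GCC/clang builtin for x86, while has_avx and pick_code_path are hypothetical helper names made up for illustration. It also assumes the AVX code lives in a separate source file built with -march, since a whole binary built with an AVX -march may use AVX anywhere, including before the check runs.

```cpp
// Returns true when the compiler's runtime detection reports AVX support.
// __builtin_cpu_supports is a GCC/clang builtin available on x86 targets;
// on any other compiler or target this sketch simply reports false.
bool has_avx()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;
#endif
}

// Hypothetical helper: names the code path a program might choose.
// A real program would call an AVX or a baseline implementation here.
const char* pick_code_path()
{
    return has_avx() ? "avx" : "baseline";
}
```

Calling pick_code_path() once at startup and printing the result is a quick way to see what the machine reports.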

Probably you want to use -O3 (or at least -O2) too, so that the compiler will extensively optimize your code.

If you want to know if a binary does actually contain AVX instructions, you can do something like this:
objdump -d my_program > disassembled.asm

Then simply check whether the assembly code contains any AVX instructions. A list can be found here:
https://docs.oracle.com/cd/E36784_01/html/E36859/gntbd.html

What code is likely to use AVX instructions, or SIMD instructions in general?

I think your best bet is a loop that performs the same computation on a long sequence (array) of elements.

(And make sure that those computations cannot be optimized away!)
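
As a rough "hello world" for auto-vectorization, a loop like the one below is the kind of thing meant here. It is a minimal sketch, not a guarantee: add_arrays and the file names in the comments are made up for illustration, __restrict is a compiler extension (accepted by GCC and clang) rather than standard C++, and whether SIMD instructions actually appear depends on compiler version and flags.

```cpp
#include <cstddef>

// Adds two float arrays element by element.
// __restrict promises the compiler that the three arrays do not overlap,
// which removes one common obstacle to vectorizing the loop.
void add_arrays(const float* __restrict a,
                const float* __restrict b,
                float* __restrict out,
                std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];   // same operation on every element
}

// One possible way to inspect the result (file names are placeholders):
//   g++ -O3 -march=haswell -c add.cpp -o add.o
//   objdump -d add.o > disassembled.asm
// Then look in disassembled.asm for AVX instructions such as vaddps
// operating on ymm registers.
```

Because the function is not inlined into a caller whose result is unused, the compiler cannot optimize the loop away, so the disassembly of add_arrays itself is what to look at.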

Last edited on Aug 11, 2022 at 2:26pm

Aug 11, 2022 at 3:14pm

keskiverto (10425)

This is old, but might still help:

[EDIT]
???

https://gcc.gnu.org/projects/tree-ssa/vectorization.html

Last edited on Aug 11, 2022 at 7:28pm

Aug 11, 2022 at 5:00pm

newbieg (764)

Thanks guys, those are exactly the resources I needed.
The article from TheIdeasMan is going to keep me busy for a while.
Much appreciated.

Keskiverto, that just links back to this page... would love to see what it was meant to be.

Last edited on Aug 11, 2022 at 5:01pm

Aug 11, 2022 at 7:08pm

mbozzi (3943)

You could try looking at the generated code for parallel algorithms in the C++ standard library.
For example, this (std::execution::unseq needs C++20):

#include <numeric>
#include <execution>

alignas(16) int xs[1'000'000]; 

int f() 
{ 
    return std::reduce(std::execution::unseq, xs, xs + 1'000'000); 
}

https://godbolt.org/z/xYTnEbdz1
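
For comparison, it may be worth pasting a plain loop next to the std::reduce version in Compiler Explorer. This is just a sketch with a made-up function name (sum_ints); integer sums like this are commonly auto-vectorized at -O3 by GCC and clang, but check the generated assembly rather than take that on trust.

```cpp
#include <cstddef>

// Sums n ints with an ordinary loop. Integer addition can be reordered
// freely, so the compiler is allowed to accumulate several partial sums
// in SIMD lanes and combine them at the end.
int sum_ints(const int* data, std::size_t n)
{
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}
```

A float version of the same loop is a different story: the compiler usually will not reorder floating-point additions unless told it may (for example with -ffast-math), which is one reason std::reduce with an execution policy can produce different code.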

Last edited on Aug 11, 2022 at 7:09pm
