## GCC SIMD vector types

phkahler
Posts: 5
Joined: Thu Jan 23, 2014 8:11 pm

### GCC SIMD vector types

I'm taking my first dive into vectorization, using a recent version of GCC. I like their method of defining vector types described here:
http://jeanjacques.lacrampe.free.fr/web ... tml#SEC160

This lets you write basic vector operations as ordinary code, and pass vectors as parameters and return values. It raises a lot of questions for me, but for today I want to focus on hardware that does not directly support the types used. Specifically, I'm creating a 32-byte v4df on a machine (Athlon64, circa 2005) that only has SSE2. The docs say GCC will still allow the vector types but will fall back to a smaller size internally. I got the impression from somewhere that it will use SSE2 vectors in this case, but the documentation is so sparse that I can't confirm it. I really don't want to break the vectors down myself or convert to v4sf; I just want whatever performance is available from architecture-independent code.

My question is: will it internally use two 2-element vectors with SSE2, or will it break everything down to scalar code, since my hardware doesn't support the full 256-bit AVX registers?
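For readers following along, here is a minimal sketch of the kind of code being asked about, assuming GCC's `vector_size` attribute with a `double` element type. The helper names `scale_add` and `lane_sum` are made up for illustration and do not come from the GCC docs.

```c
#include <stddef.h>

/* Four doubles in one 32-byte vector (GCC vector extension). */
typedef double v4df __attribute__ ((vector_size (32)));

/* Hypothetical helper: ordinary operators work element-wise.
   The scalar k is broadcast into a vector by hand. */
static inline v4df scale_add(v4df a, v4df b, double k)
{
    v4df kv = {k, k, k, k};
    return a + b * kv;
}

/* Hypothetical helper: individual lanes can be read with subscripts. */
static inline double lane_sum(v4df v)
{
    double s = 0.0;
    for (size_t i = 0; i < 4; ++i)
        s += v[i];
    return s;
}
```

Nothing here names SSE2 or AVX; what instructions come out is left entirely to the compiler and the target flags, which is exactly the open question.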

Posts: 206
Joined: Fri Dec 02, 2011 8:00 am

### Re: GCC SIMD vector types

phkahler wrote: My question is: Will it internally use 2 2-element vectors with SSE2, or will it break it all the way down to scalar code since my hardware doesn't support the full 256bit AVX registers?
You can get the assembler output (http://stackoverflow.com/questions/1370 ... rce-in-gcc) for a very simple test program (i.e. one line of code with a vector operation) and check what kind of instructions GCC emits in your case.

P.S. I'm afraid GCC will fall back to scalar code in your case.
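As a complement to reading the assembler output, a small sketch that reports which x86 SIMD level the compiler was told to target. It relies on the predefined macros `__AVX__` and `__SSE2__`; the function name `simd_target` is hypothetical. Note that this only shows what the compiler is allowed to use, not what it actually emitted for a given function.

```c
/* Report the widest x86 SIMD level enabled at compile time.
   __AVX__ and __SSE2__ are predefined by GCC when the matching
   instruction sets are enabled (for example via -mavx or -march=...). */
const char *simd_target(void)
{
#if defined(__AVX__)
    return "AVX enabled (256-bit ymm registers)";
#elif defined(__SSE2__)
    return "SSE2 enabled (128-bit xmm registers only)";
#else
    return "neither SSE2 nor AVX enabled";
#endif
}
```

On an SSE2-only build this should take the second branch, which is the situation where a 32-byte vector type cannot map onto a single register.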

phkahler

### Re: GCC SIMD vector types

Thanks for the tip. I compiled the following as test.c with -S and -O3:

typedef v4df __attribute__ ((vector_size (32)));

v4df addem(v4df a, v4df b)
{
return a+b;
}
And got this for output:

addem:
.LFB0:
.cfi_startproc
movdqa	8(%rsp), %xmm2
movq	%rdi, %rax
movdqa	40(%rsp), %xmm1
movdqa	%xmm1, -104(%rsp)
movq	-104(%rsp), %rdx
movdqa	%xmm1, -88(%rsp)
movq	%rdx, (%rdi)
movq	-80(%rsp), %rdx
movdqa	24(%rsp), %xmm0
movq	%rdx, 8(%rdi)
movdqa	%xmm0, -72(%rsp)
movq	-72(%rsp), %rdx
movq	%rdx, 16(%rdi)
movq	-64(%rsp), %rdx
movq	%rdx, 24(%rdi)
ret
.cfi_endproc

While that seems like an awful lot of instructions for -O3, it is using a lot of SSE and there are 2 vector add instructions. I also got the following warnings when I compiled:

test.c: In function ‘addem’:
test.c:4:6: note: The ABI for passing parameters with 32-byte alignment has changed in GCC 4.6
v4df addem(v4df a, v4df b)
^
test.c:4:6: warning: AVX vector argument without AVX enabled changes the ABI [enabled by default]

The attribute "vector_size" is different from "mode", which specifies the internal type to use. By specifying the length instead of the type, you let GCC choose the internal size. This makes sense because GCC has a vectorizer: even if it did generate scalar code (the worst kind, with a loop), it would be able to vectorize that for the target. It makes sense that it can do this; I just needed to be sure.
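To make the "smaller size internally" idea concrete, here is a sketch that writes the split out by hand: one 32-byte add next to the same add done as two 16-byte halves. The names `add_whole` and `add_halves` are hypothetical, and this is only a model of the lowering being discussed; whether GCC really emits two packed adds on a given target is something only the assembler output can confirm.

```c
#include <string.h>

typedef double v4df __attribute__ ((vector_size (32)));
typedef double v2df __attribute__ ((vector_size (16)));

/* What the source says: a single 32-byte vector add. */
static inline v4df add_whole(v4df a, v4df b)
{
    return a + b;
}

/* Hand-written model of a split: two 16-byte adds, the width an
   SSE2 register can hold. memcpy is used to reinterpret the bytes. */
static inline v4df add_halves(v4df a, v4df b)
{
    v2df ah[2], bh[2], rh[2];
    v4df out;

    memcpy(ah, &a, sizeof a);
    memcpy(bh, &b, sizeof b);
    rh[0] = ah[0] + bh[0];   /* low two doubles  */
    rh[1] = ah[1] + bh[1];   /* high two doubles */
    memcpy(&out, rh, sizeof out);
    return out;
}
```

Both functions give the same result lane for lane, so a compiler is free to pick either form depending on what the target supports.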

So this is vector code that can run on x86, ARM, PPC, or other targets without modification; GCC will make it use whatever vector resources are available on the target. Isn't it about time C and C++ got an official version of this?

phkahler

### Re: GCC SIMD vector types

Oops. I left out the double in the typedef. That seems to change the asm but not the conclusion:

addem:
.LFB0:
.cfi_startproc
movapd	8(%rsp), %xmm2
movq	%rdi, %rax
movapd	40(%rsp), %xmm1
movapd	24(%rsp), %xmm0
movapd	%xmm1, -104(%rsp)
movq	-104(%rsp), %rdx
movapd	%xmm1, -88(%rsp)
movapd	%xmm0, -72(%rsp)
movq	%rdx, (%rdi)
movq	-80(%rsp), %rdx
movq	%rdx, 8(%rdi)
movq	-72(%rsp), %rdx
movq	%rdx, 16(%rdi)
movq	-64(%rsp), %rdx
movq	%rdx, 24(%rdi)
ret
.cfi_endproc

But that's a lot of movq for what?
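One plausible reading of the movq traffic: with AVX disabled, the 32-byte arguments arrive on the stack (the loads from 8(%rsp), 24(%rsp) and 40(%rsp)) and the result goes back through a caller-supplied buffer (the stores through %rdi, which is also copied to %rax), so much of the function is copying. A sketch that makes the memory traffic explicit by taking pointers instead of values, assuming the caller keeps the operands in memory anyway; `addem_ptr` is a hypothetical name, not something from this thread's test file.

```c
typedef double v4df __attribute__ ((vector_size (32)));

/* Hypothetical pointer-based variant of addem: operands and result
   live in caller-provided storage, so no 32-byte value has to be
   passed or returned by value across the call. */
void addem_ptr(v4df *out, const v4df *a, const v4df *b)
{
    *out = *a + *b;
}
```

Declaring the by-value version static inline is another option worth trying, since an inlined call has no calling convention to satisfy. Either way, comparing the -S output is the only way to see whether the extra moves go away.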