I'm taking my first dive into vectorization, using a recent version of GCC. I like their method of defining vector types, described here:
http://jeanjacques.lacrampe.free.fr/webada/doc/gnat/gcc_6.html#SEC160
This lets you write ordinary code for basic vector operations, and also pass vectors as parameters and return values. It raises a lot of questions for me, but today I want to focus on hardware that does not directly support the types used. Specifically, I'm creating a 32-byte v4df on a machine (Athlon64, circa 2005) that only has SSE2. The GCC documentation says the vector types are still allowed in that case, but that the compiler will revert to a smaller size internally. I got the impression from somewhere that it will use SSE2 vectors here, but the documentation is so sparse that I can't confirm it. I really don't want to break the vectors down myself or convert to v4sf; I just want the performance available from architecture-independent code.
My question is: will it internally use two 2-element vectors with SSE2, or will it break everything down to scalar code, since my hardware doesn't support the full 256-bit AVX registers?
GCC SIMD vector types
Re: GCC SIMD vector types
phkahler wrote: My question is: Will it internally use 2 2-element vectors with SSE2, or will it break it all the way down to scalar code since my hardware doesn't support the full 256bit AVX registers?
You can get the assembler source output (http://stackoverflow.com/questions/1370 ... rce-in-gcc) for a very simple test program (i.e. one line of code with a vector operation) and check what kind of instructions GCC emits in your case.
P.S. I'm afraid that GCC will fall back to scalar code in your case.
Re: GCC SIMD vector types
Thanks for the tip. I compiled the following, saved as test.c, with -S and -O3:
typedef v4df __attribute__ ((vector_size (32)));
v4df addem(v4df a, v4df b)
{
return a+b;
}
And got this for output:
addem:
.LFB0:
.cfi_startproc
movdqa 8(%rsp), %xmm2
movq %rdi, %rax
movdqa 40(%rsp), %xmm1
paddd %xmm2, %xmm1
movdqa %xmm1, -104(%rsp)
movq -104(%rsp), %rdx
movdqa %xmm1, -88(%rsp)
movq %rdx, (%rdi)
movq -80(%rsp), %rdx
movdqa 24(%rsp), %xmm0
movq %rdx, 8(%rdi)
paddd 56(%rsp), %xmm0
movdqa %xmm0, -72(%rsp)
movq -72(%rsp), %rdx
movq %rdx, 16(%rdi)
movq -64(%rsp), %rdx
movq %rdx, 24(%rdi)
ret
.cfi_endproc
While that seems like an awful lot of instructions for -O3, it is using a lot of SSE and there are two vector add instructions. I also got the following warnings when I compiled:
test.c: In function ‘addem’:
test.c:4:6: note: The ABI for passing parameters with 32-byte alignment has changed in GCC 4.6
v4df addem(v4df a, v4df b)
^
test.c:4:6: warning: AVX vector argument without AVX enabled changes the ABI [enabled by default]
The "vector_size" attribute is different from "mode", which specifies the internal type to use. So by specifying the length instead of the type, GCC gets to choose the size it uses internally. This makes sense because it has a vectorizer: even if it did generate scalar code (the worst kind, with a loop), it would be able to vectorize that for the target. It makes sense that it can do this; I just needed to be sure.
So this is vector code that can run on x86, ARM, PPC, or other targets without modification. GCC will make it use whatever vector resources are available on the target. Isn't it about time C and C++ got an official version of this?
Re: GCC SIMD vector types
Oops. I left out the double in the typedef. That seems to change the asm but not the conclusion:
addem:
.LFB0:
.cfi_startproc
movapd 8(%rsp), %xmm2
movq %rdi, %rax
movapd 40(%rsp), %xmm1
addpd %xmm2, %xmm1
movapd 24(%rsp), %xmm0
addpd 56(%rsp), %xmm0
movapd %xmm1, -104(%rsp)
movq -104(%rsp), %rdx
movapd %xmm1, -88(%rsp)
movapd %xmm0, -72(%rsp)
movq %rdx, (%rdi)
movq -80(%rsp), %rdx
movq %rdx, 8(%rdi)
movq -72(%rsp), %rdx
movq %rdx, 16(%rdi)
movq -64(%rsp), %rdx
movq %rdx, 24(%rdi)
ret
.cfi_endproc
But that's a lot of movq for what?