* [Bug target/26546] missed optimization with respect of vector intrinsics
From: rguenth at gcc dot gnu.org @ 2013-02-26 10:42 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26546
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|meta-bug                    |
             Target|                            |x86_64-*-*, i?86-*-*
          Component|tree-optimization           |target
            Version|4.1.0                       |4.8.0
            Summary|[meta-bugs] couple of       |missed optimization with
                   |missed optimization with    |respect of vector
                   |respect of vector and       |intrinsics
                   |unions                      |
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-26 10:42:04 UTC ---
I get:
main:
.LFB518:
        .cfi_startproc
        xorps   %xmm0, %xmm0
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movl    $.LC0, %edi
        movl    $1, %eax
        unpcklps        %xmm0, %xmm0
        cvtps2pd        %xmm0, %xmm0
        call    printf
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
with 4.8, and the asm from the description with 4.7 (with -O2). Leaving
the union uninitialized of course makes it a bad testcase, and is probably
why it gets optimized in the first place.
With
#include <xmmintrin.h>

typedef union
{
  __m128 vec;
  float data[4];
  struct { float x,y,z,w; };
} vec4f_t;

static inline float __attribute__((__always_inline__))
acc(vec4f_t src)
{
  float a;
  src.vec = _mm_add_ps(src.vec, _mm_movehl_ps(src.vec, src.vec));
  _mm_store_ss(&a, _mm_add_ss(src.vec, _mm_shuffle_ps(src.vec, src.vec,
                                                      _MM_SHUFFLE(3,2,1,1))));
  return a;
}

vec4f_t b;

int
main(int argc, char *argv[])
{
  __builtin_printf("%f\n", acc(b));
  return 0;
}
we are back to the unoptimized assembly. The tree optimizers have no chance
of optimizing this because all they see are target builtins:
  __m128 src;
  float a;
  double _2;
  __m128 _4;
  __m128 _5;
  __m128 _6;
  __m128 _7;

  <bb 2>:
  src_9 = MEM[(union  *)&b];
  _4 = __builtin_ia32_movhlps (src_9, src_9);
  _5 = __builtin_ia32_addps (src_9, _4);
  _6 = __builtin_ia32_shufps (_5, _5, 229);
  _7 = __builtin_ia32_addss (_5, _6);
  a_8 = __builtin_ia32_vec_ext_v4sf (_7, 0);
  _2 = (double) a_8;
  printf ("%f\n", _2);
So all in all this is now a target issue.
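
For comparison, here is a rough sketch of the same horizontal sum written with GCC's generic vector extension instead of the SSE intrinsics, so that the middle end sees ordinary element reads and additions rather than __builtin_ia32_* calls. The type name v4sf and the helper hsum_generic are made up for illustration and are not part of the testcase above. The additions are paired the way the movehl/shuffle sequence pairs them, (x0+x2) + (x1+x3). No claim is made here about what code any particular GCC version generates for it.

```c
/* GCC generic vector type: four floats in 16 bytes.
   The name v4sf is an assumption chosen for this sketch.  */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: horizontal sum of the four lanes using only
   generic vector subscripting, with no target builtins involved.
   The pairing mirrors the intrinsic version:
     movehl + addps  -> lane0 = x0 + x2, lane1 = x1 + x3
     shufps + addss  -> lane0 + lane1  */
static inline float
hsum_generic(v4sf v)
{
  float lo = v[0] + v[2];
  float hi = v[1] + v[3];
  return lo + hi;
}
```

Called as hsum_generic((v4sf){1.0f, 2.0f, 3.0f, 4.0f}) this returns 10.0f.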