* [Bug middle-end/55266] vector expansion: 36 movs for 4 adds
2012-11-10 15:10 [Bug middle-end/55266] New: vector expansion: 36 movs for 4 adds glisse at gcc dot gnu.org
@ 2012-11-13 10:23 ` glisse at gcc dot gnu.org
2012-11-28 10:11 ` glisse at gcc dot gnu.org
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-11-13 10:23 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55266
--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> 2012-11-13 10:23:03 UTC ---
The first copy is PR 52436.
The second copy has a patch posted here:
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00900.html
The last copy would require turning:
gimple_assign <constructor, _5, {_15, _18}, NULL, NULL>
gimple_assign <ssa_name, *x_2(D), _5, NULL, NULL>
into:
gimple_assign <ssa_name, *x_2(D), _15, NULL, NULL>
gimple_assign <ssa_name, MEM[(vec *)x_2(D) + 16B], _18, NULL, NULL>
(I am not sure whether endianness matters here.)
This could perhaps be done more easily by splitting the memory write (when
the vector type is not supported) into a suitable number of BIT_FIELD_REF
extractions and memory writes, and relying on forwprop4 to simplify the
BIT_FIELD_REFs of the constructor.
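To make the proposed rewrite concrete, here is a minimal source-level sketch
using GCC vector extensions. It is not the GIMPLE transformation itself: the
names half, store_whole, store_split and same_bytes are hypothetical and only
mirror the two forms above (build a 32-byte vector from two 16-byte halves and
store it, versus store each half at byte offsets 0 and 16). It assumes that
generic vector elements are laid out in memory in index order, which is the
point the endianness remark is unsure about.

```cpp
#include <cstring>

typedef double vec  __attribute__((vector_size(32)));  // 4 doubles
typedef double half __attribute__((vector_size(16)));  // 2 doubles

// Form 1: build the wide vector from its halves, then store it whole
// (the CONSTRUCTOR followed by a single store).
void store_whole(vec *x, half lo, half hi) {
    vec v = {lo[0], lo[1], hi[0], hi[1]};
    *x = v;
}

// Form 2: store each half directly, at byte offsets 0 and 16.
void store_split(vec *x, half lo, half hi) {
    char *p = reinterpret_cast<char *>(x);
    std::memcpy(p, &lo, sizeof lo);
    std::memcpy(p + 16, &hi, sizeof hi);
}

// Returns true when both forms leave identical bytes in memory.
bool same_bytes(half lo, half hi) {
    vec a, b;
    store_whole(&a, lo, hi);
    store_split(&b, lo, hi);
    return std::memcmp(&a, &b, sizeof a) == 0;
}
```

If same_bytes holds on a given target, the split form needs no intermediate
32-byte value, which is what removes the last copy.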
* [Bug middle-end/55266] vector expansion: 36 movs for 4 adds
From: glisse at gcc dot gnu.org @ 2012-11-28 10:11 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55266
--- Comment #2 from Marc Glisse <glisse at gcc dot gnu.org> 2012-11-28 10:11:31 UTC ---
Author: glisse
Date: Wed Nov 28 10:11:27 2012
New Revision: 193884
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193884
Log:
2012-11-28 Marc Glisse <marc.glisse@inria.fr>
PR middle-end/55266
* fold-const.c (fold_ternary_loc) [BIT_FIELD_REF]: Handle
CONSTRUCTOR with vector elements.
* tree-ssa-propagate.c (valid_gimple_rhs_p): Handle CONSTRUCTOR
and BIT_FIELD_REF.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/fold-const.c
trunk/gcc/tree-ssa-propagate.c
* [Bug middle-end/55266] vector expansion: 24 movs for 4 adds
From: pinskia at gcc dot gnu.org @ 2012-12-09 2:08 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55266
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2012-12-09
Ever Confirmed|0 |1
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-12-09 02:07:58 UTC ---
The other issue is that no DCE pass runs after forwprop4.
* [Bug middle-end/55266] vector expansion: 24 movs for 4 adds
From: vincenzo.innocente at cern dot ch @ 2013-03-03 11:58 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55266
--- Comment #4 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2013-03-03 11:58:24 UTC ---
I still see problems when calling inline functions.
It seems that the code needed to satisfy the calling ABI is generated anyway.
Take the example below and compare the code generated for "dotd1" with that
for "dotd2": dotd2 has a "storm" of moves before the reduction.
c++ -std=c++11 -Ofast -march=corei7 -S conversions.cc -fabi-version=0
The AVX version is better, except for dotd4 (actually dotd1 is lelf see like)
typedef float __attribute__( ( vector_size( 16 ) ) ) float32x4_t;
typedef double __attribute__( ( vector_size( 32 ) ) ) float64x4_t;

inline
float64x4_t convert(float32x4_t f) {
  return float64x4_t{f[0],f[1],f[2],f[3]};
}

float dotf(float32x4_t x, float32x4_t y) {
  float ret=0;
  for (int i=0;i!=4;++i) ret+=x[i]*y[i];
  return ret;
}

inline
double dotd(float64x4_t x, float64x4_t y) {
  double ret=0;
  for (int i=0;i!=4;++i) ret+=x[i]*y[i];
  return ret;
}

float dotd1(float32x4_t x, float32x4_t y) {
  float64x4_t dx,dy;
  for (int i=0;i!=4;++i) {
    dx[i]=x[i]; dy[i]=y[i];
  }
  double ret=0;
  for (int i=0;i!=4;++i) ret+=dx[i]*dy[i];
  return ret;
}

float dotd2(float32x4_t x, float32x4_t y) {
  float64x4_t dx=convert(x);
  float64x4_t dy=convert(y);
  return dotd(dx,dy);
}

float dotd3(float32x4_t x, float32x4_t y) {
  float64x4_t dx{x[0],x[1],x[2],x[3]};
  float64x4_t dy{y[0],y[1],y[2],y[3]};
  double ret=0;
  for (int i=0;i!=4;++i) ret+=dx[i]*dy[i];
  return ret;
}

float dotd4(float32x4_t x, float32x4_t y) {
  float64x4_t dx,dy;
  for (int i=0;i!=4;++i) {
    dx[i]=x[i]; dy[i]=y[i];
  }
  return dotd(dx,dy);
}
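Before comparing the assembly of the variants, it can help to confirm that
the open-coded form and the inline-helper form compute the same value, so
that any difference in the generated code is purely a code-generation matter.
The sketch below repeats convert and dotd from the example; open_coded and
via_helpers are hypothetical names standing in for the dotd1 and dotd2
shapes, and the zero-initialisation of dx and dy is an addition that the
example does not have.

```cpp
typedef float  __attribute__((vector_size(16))) float32x4_t;
typedef double __attribute__((vector_size(32))) float64x4_t;

inline float64x4_t convert(float32x4_t f) {
    return float64x4_t{f[0], f[1], f[2], f[3]};
}

inline double dotd(float64x4_t x, float64x4_t y) {
    double ret = 0;
    for (int i = 0; i != 4; ++i) ret += x[i] * y[i];
    return ret;
}

// Shape of dotd1: widen and reduce inside one function body.
float open_coded(float32x4_t x, float32x4_t y) {
    float64x4_t dx = {}, dy = {};  // zero-initialised here; the example leaves them unset
    for (int i = 0; i != 4; ++i) { dx[i] = x[i]; dy[i] = y[i]; }
    double ret = 0;
    for (int i = 0; i != 4; ++i) ret += dx[i] * dy[i];
    return ret;
}

// Shape of dotd2: the same work routed through the inline helpers.
float via_helpers(float32x4_t x, float32x4_t y) {
    return dotd(convert(x), convert(y));
}
```

Without AVX enabled, GCC may emit a -Wpsabi note about passing 32-byte
vectors by value; that is expected for this example.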
* [Bug middle-end/55266] vector expansion: 24 movs for 4 adds
From: rguenth at gcc dot gnu.org @ 2023-07-21 12:12 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55266
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
The original issue is fixed.
f:
.LFB0:
.cfi_startproc
movapd (%rdi), %xmm2
movapd 16(%rdi), %xmm1
movapd %xmm2, %xmm0
addpd %xmm2, %xmm0
addpd %xmm2, %xmm0
movaps %xmm0, (%rdi)
movapd %xmm1, %xmm0
addpd %xmm1, %xmm0
addpd %xmm1, %xmm0
movaps %xmm0, 16(%rdi)
ret
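For readers without the original report at hand: the listing above triples
each element of a four-double vector in place, using two addpd instructions
per 16-byte half. The test case is not quoted in this thread, so the function
below is an assumption inferred from the assembly, not the reporter's exact
source.

```cpp
typedef double vec __attribute__((vector_size(4 * sizeof(double))));

// Assumed shape of the test case: x = x + (x + x), i.e. 3 * x per element.
// The 32-byte vector is wider than an SSE register, so it is processed as
// two 16-byte halves, at (%rdi) and 16(%rdi) in the listing.
void f(vec *x) {
    *x += *x + *x;
}
```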
The issue in comment #4 is fixed as well, I think:
_Z5dotd1Dv4_fS_:
.LFB3:
.cfi_startproc
movaps %xmm1, %xmm3
pxor %xmm2, %xmm2
movhlps %xmm0, %xmm2
cvtps2pd %xmm0, %xmm0
cvtps2pd %xmm2, %xmm1
pxor %xmm2, %xmm2
movhlps %xmm3, %xmm2
cvtps2pd %xmm3, %xmm3
cvtps2pd %xmm2, %xmm2
mulpd %xmm3, %xmm0
mulpd %xmm2, %xmm1
addpd %xmm0, %xmm1
movapd %xmm1, %xmm0
unpckhpd %xmm1, %xmm0
addpd %xmm1, %xmm0
cvtsd2ss %xmm0, %xmm0
ret
.cfi_endproc
.LFE3:
.size _Z5dotd1Dv4_fS_, .-_Z5dotd1Dv4_fS_
.p2align 4
.globl _Z5dotd2Dv4_fS_
.type _Z5dotd2Dv4_fS_, @function
_Z5dotd2Dv4_fS_:
.LFB4:
.cfi_startproc
movaps %xmm1, %xmm3
cvtps2pd %xmm0, %xmm4
pxor %xmm2, %xmm2
movhlps %xmm0, %xmm2
pxor %xmm0, %xmm0
movhlps %xmm3, %xmm0
cvtps2pd %xmm2, %xmm2
cvtps2pd %xmm1, %xmm1
cvtps2pd %xmm0, %xmm0
mulpd %xmm4, %xmm1
mulpd %xmm0, %xmm2
addpd %xmm2, %xmm1
movapd %xmm1, %xmm0
unpckhpd %xmm1, %xmm0
addpd %xmm1, %xmm0
cvtsd2ss %xmm0, %xmm0
ret