public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/102494] New: Failure to optimize out vector reduction properly especially when using OpenMP
From: gabravier at gmail dot com @ 2021-09-27 0:52 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
Bug ID: 102494
Summary: Failure to optimize out vector reduction properly
especially when using OpenMP
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
#include <stdint.h>
#include <stddef.h>
typedef int8_t simde_int8x8_t __attribute__((__vector_size__(8)));
int16_t
simde_vaddlv_s8(simde_int8x8_t a) {
int16_t r = 0;
#pragma omp simd reduction(+:r)
for (size_t i = 0 ; i < (sizeof(a) / sizeof(a[0])) ; i++) {
r += a[i];
}
return r;
}
Compiled with -O3 -fopenmp-simd, this is the output on AMD64:
simde_vaddlv_s8(signed char __vector(8)):
pxor xmm1, xmm1
movdqa xmm2, xmm0
pcmpgtb xmm1, xmm0
punpcklbw xmm0, xmm1
punpcklbw xmm2, xmm1
pshufd xmm0, xmm0, 78
movq QWORD PTR [rsp-24], xmm2
movq QWORD PTR [rsp-16], xmm0
movdqa xmm0, XMMWORD PTR [rsp-24]
psrldq xmm0, 8
paddw xmm0, XMMWORD PTR [rsp-24]
movdqa xmm1, xmm0
psrldq xmm1, 4
paddw xmm0, xmm1
movdqa xmm1, xmm0
psrldq xmm1, 2
paddw xmm0, xmm1
pextrw eax, xmm0, 0
ret
This is what Clang manages:
simde_vaddlv_s8(signed char __vector(8)):
punpcklbw xmm0, xmm0 # xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
psraw xmm0, 8
pshufd xmm1, xmm0, 238 # xmm1 = xmm0[2,3,2,3]
paddw xmm1, xmm0
pshufd xmm0, xmm1, 85 # xmm0 = xmm1[1,1,1,1]
paddw xmm0, xmm1
movdqa xmm1, xmm0
psrld xmm1, 16
paddw xmm1, xmm0
movd eax, xmm1
ret
Weirdly enough, removing the `#pragma omp simd reduction(+:r)` slightly improves
GCC's output to this:
simde_vaddlv_s8(signed char __vector(8)):
pxor xmm1, xmm1
movdqa xmm2, xmm0
pcmpgtb xmm1, xmm0
punpcklbw xmm0, xmm1
punpcklbw xmm2, xmm1
pshufd xmm0, xmm0, 78
paddw xmm0, xmm2
pextrw edx, xmm0, 1
pextrw eax, xmm0, 0
add eax, edx
pextrw edx, xmm0, 2
add eax, edx
pextrw edx, xmm0, 3
add eax, edx
ret
* [Bug tree-optimization/102494] Failure to optimize out vector reduction properly especially when using OpenMP
From: pinskia at gcc dot gnu.org @ 2021-09-27 1:45 UTC
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*
           Keywords|                            |missed-optimization
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Both with and without -fopenmp-simd, this works on aarch64-linux-gnu, which has an
add-reduction instruction.
It just looks like a problem with how reduction addition is handled for x86_64.
Also we have:
MEM <vector(4) short int> [(short int *)&D.2916] = vect__21.35_111;
MEM <vector(4) short int> [(short int *)&D.2916 + 8B] = vect__21.35_112;
vect__24.24_88 = MEM <vector(8) short int> [(short int *)&D.2916];
* [Bug tree-optimization/102494] Failure to optimize out vector reduction properly especially when using OpenMP
From: crazylht at gmail dot com @ 2021-09-27 3:01 UTC
--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
It seems x86 doesn't support the reduc_plus_scal_v8hi optab yet.
* [Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP
From: crazylht at gmail dot com @ 2021-09-27 5:08 UTC
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #2)
> It seems x86 doesn't support the reduc_plus_scal_v8hi optab yet.
The vectorizer does the work for the backend in this case:
typedef short v8hi __attribute__((vector_size(16)));
short
foo1 (v8hi p, int n)
{
short sum = 0;
for (int i = 0; i != 8; i++)
sum += p[i];
return sum;
}
# sum_21 = PHI <sum_9(3)>
# vect_sum_9.26_5 = PHI <vect_sum_9.26_6(3)>
_22 = (vector(8) unsigned short) vect_sum_9.26_5;
_23 = VEC_PERM_EXPR <_22, { 0, 0, 0, 0, 0, 0, 0, 0 }, { 4, 5, 6, 7, 8, 9, 10, 11 }>;
_24 = _23 + _22;
_25 = VEC_PERM_EXPR <_24, { 0, 0, 0, 0, 0, 0, 0, 0 }, { 2, 3, 4, 5, 6, 7, 8, 9 }>;
_26 = _25 + _24;
_27 = VEC_PERM_EXPR <_26, { 0, 0, 0, 0, 0, 0, 0, 0 }, { 1, 2, 3, 4, 5, 6, 7, 8 }>;
_28 = _27 + _26;
stmp_sum_9.27_29 = BIT_FIELD_REF <_28, 16, 0>;
But for the case in this PR, the v8qi input is widened to 2 v4hi vectors, and there is no vector reduction for v4hi.
* [Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP
From: crazylht at gmail dot com @ 2021-09-27 5:13 UTC
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
>
> But for the case in PR, it's v8qi -> 2 v4hi, and no vector reduction for
> v4hi.
We need to add (define_expand "reduc_plus_scal_v4hi", just like (define_expand
"reduc_plus_scal_v8qi", in mmx.md.
* [Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP
From: crazylht at gmail dot com @ 2021-09-27 5:55 UTC
--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #4)
> >
> > But for the case in PR, it's v8qi -> 2 v4hi, and no vector reduction for
> > v4hi.
>
> We need add (define_expand "reduc_plus_scal_v4hi" just like (define_expand
> "reduc_plus_scal_v8qi" in mmx.md.
The same is needed for reduc_{umax,umin,smax,smin}_scal_v4hi.
* [Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP
From: rguenth at gcc dot gnu.org @ 2021-09-27 8:47 UTC
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
The vectorizer looks for a way to "shift" the whole vector by either vec_shr
or a corresponding vec_perm with constant shuffle operands. When the target
provides none of those you get element extracts and scalar adds.
So yes, the vectorizer does the work for you but only if you hand it the
pieces.
It could possibly use a larger vector, doing only the "tail" of its final
reduction (so try with v8hi instead of v4hi), but it's not really clear whether
such a strategy would be good in general.
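
As a rough illustration of the two strategies described above (not the vectorizer's actual code), the sketch below writes the shift-and-add epilogue by hand with GCC's generic vector extensions, next to the element-extract fallback. The function names are made up for this example, `__builtin_shuffle` is GCC-specific, and the permutation indices mirror the VEC_PERM_EXPR masks in comment #3.

```c
#include <stdint.h>

typedef int16_t  v8hi __attribute__((vector_size(16)));
typedef uint16_t v8hu __attribute__((vector_size(16)));

/* Hypothetical helper: log2(8) = 3 "shift the whole vector and add" steps.
   Indices 0..7 pick lanes of v, indices 8..15 pick lanes of zero, so each
   shuffle shifts v down by 4, 2 and 1 lanes with zero fill. */
static int16_t reduce_add_shift(v8hi p)
{
    v8hu v = (v8hu)p;   /* add as unsigned so wraparound is well defined */
    const v8hu zero = {0, 0, 0, 0, 0, 0, 0, 0};
    const v8hu shr4 = {4, 5, 6, 7, 8, 9, 10, 11};
    const v8hu shr2 = {2, 3, 4, 5, 6, 7, 8, 9};
    const v8hu shr1 = {1, 2, 3, 4, 5, 6, 7, 8};

    v += __builtin_shuffle(v, zero, shr4);
    v += __builtin_shuffle(v, zero, shr2);
    v += __builtin_shuffle(v, zero, shr1);
    return (int16_t)v[0];   /* lane 0 now holds the full sum */
}

/* Hypothetical helper: the fallback shape, one extract and one scalar add
   per element. */
static int16_t reduce_add_extract(v8hi p)
{
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += p[i];
    return (int16_t)sum;
}
```

Both helpers return the same value for any input; only the shape of the generated code is expected to differ, and that still depends on the target and GCC version.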
* [Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP
From: crazylht at gmail dot com @ 2021-09-28 6:57 UTC
--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
After adding v4hi reduction support, the GIMPLE still looks suboptimal in how it converts v8qi to v8hi.
vector(4) short int vect__21.36;
vector(4) unsigned short vect__2.31;
int16_t stmp_r_17.17;
vector(8) short int vect__16.15;
int16_t D.2229[8];
vector(8) short int _50;
vector(8) short int _51;
vector(8) short int _52;
vector(8) short int _53;
vector(8) short int _54;
vector(8) short int _55;

<bb 2> [local count: 189214783]:
vect__2.31_97 = [vec_unpack_lo_expr] a_90(D);
vect__2.31_98 = [vec_unpack_hi_expr] a_90(D);
vect__21.36_105 = VIEW_CONVERT_EXPR<vector(4) short int>(vect__2.31_97);
vect__21.36_106 = VIEW_CONVERT_EXPR<vector(4) short int>(vect__2.31_98);
MEM <vector(4) short int> [(short int *)&D.2229] = vect__21.36_105;
MEM <vector(4) short int> [(short int *)&D.2229 + 8B] = vect__21.36_106;
vect__16.15_47 = MEM <vector(8) short int> [(short int *)&D.2229];
* [Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP
From: rguenther at suse dot de @ 2021-09-28 7:09 UTC
--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 28 Sep 2021, crazylht at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
>
> --- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
> After supporting v4hi reduce, gimple seems not optimal to convert v8qi to v8hi.
>
> vector(4) short int vect__21.36;
> vector(4) unsigned short vect__2.31;
> int16_t stmp_r_17.17;
> vector(8) short int vect__16.15;
> int16_t D.2229[8];
> vector(8) short int _50;
> vector(8) short int _51;
> vector(8) short int _52;
> vector(8) short int _53;
> vector(8) short int _54;
> vector(8) short int _55;
>
> <bb 2> [local count: 189214783]:
> vect__2.31_97 = [vec_unpack_lo_expr] a_90(D);
> vect__2.31_98 = [vec_unpack_hi_expr] a_90(D);
> vect__21.36_105 = VIEW_CONVERT_EXPR<vector(4) short int>(vect__2.31_97);
> vect__21.36_106 = VIEW_CONVERT_EXPR<vector(4) short int>(vect__2.31_98);
> MEM <vector(4) short int> [(short int *)&D.2229] = vect__21.36_105;
> MEM <vector(4) short int> [(short int *)&D.2229 + 8B] = vect__21.36_106;
So the above could possibly use a V8QI -> V8HI conversion; the loop
vectorizer isn't good at producing those, though. And of course the
appropriate conversion optab has to exist.
> vect__16.15_47 = MEM <vector(8) short int> [(short int *)&D.2229];
Here there's a lack of "CSE" - I do have patches somewhere to turn this into
vect__16.15_47 = { vect__21.36_105, vect__21.36_106 };
but I'm not sure that's going to be profitable (well, the code as-is
will get a STLF, i.e. store-to-load forwarding, hit).
There's also store-merging that could instead merge the stores
similarly (but then there's no CSE after store-merging so the load
would remain).
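
For reference, the single V8QI -> V8HI conversion mentioned above can be written at the source level with `__builtin_convertvector` (available in GCC 9 and later, and in Clang). This is only a sketch of the intended data flow, with a made-up function name; whether the compiler turns it into one widening instruction plus a vector reduction depends on the target and version.

```c
#include <stdint.h>

typedef int8_t  v8qi __attribute__((vector_size(8)));
typedef int16_t v8hi __attribute__((vector_size(16)));

/* Hypothetical variant of the test case: widen all eight bytes in one
   step, then reduce the eight 16-bit lanes. No temporary array is
   written with two narrow stores and read back with one wide load. */
static int16_t vaddlv_s8_widen_once(v8qi a)
{
    v8hi w = __builtin_convertvector(a, v8hi);   /* sign-extends each lane */
    int16_t r = 0;
    for (int i = 0; i < 8; i++)
        r += w[i];
    return r;
}
```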
* [Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP
From: cvs-commit at gcc dot gnu.org @ 2021-10-08 2:10 UTC
--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:77ca2cfcdcccee3c8e8aeaf1d03e9920893d2486
commit r12-4241-g77ca2cfcdcccee3c8e8aeaf1d03e9920893d2486
Author: liuhongt <hongtao.liu@intel.com>
Date: Tue Sep 28 12:55:10 2021 +0800
Support reduc_{plus,smax,smin,umax,umin}_scal_v4hi.
gcc/ChangeLog:
PR target/102494
* config/i386/i386-expand.c (emit_reduc_half): Handle V4HImode.
* config/i386/mmx.md (reduc_plus_scal_v4hi): New.
(reduc_<code>_scal_v4hi): New.
gcc/testsuite/ChangeLog:
* gcc.target/i386/mmx-reduce-op-1.c: New test.
* gcc.target/i386/mmx-reduce-op-2.c: New test.
* [Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP
From: peter at cordes dot ca @ 2021-10-25 21:44 UTC
Peter Cordes <peter at cordes dot ca> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca
--- Comment #10 from Peter Cordes <peter at cordes dot ca> ---
Current trunk with -fopenmp is still not good: https://godbolt.org/z/b3jjhcvTa
It still does two separate sign extensions, then two narrow stores followed by a
wider reload (a store-forwarding stall):
-O3 -march=skylake -fopenmp
simde_vaddlv_s8:
push rbp
vpmovsxbw xmm2, xmm0
vpsrlq xmm0, xmm0, 32
mov rbp, rsp
vpmovsxbw xmm3, xmm0
and rsp, -32
vmovq QWORD PTR [rsp-16], xmm2
vmovq QWORD PTR [rsp-8], xmm3
vmovdqa xmm4, XMMWORD PTR [rsp-16]
... then asm using byte-shifts
Including stuff like
movdqa xmm1, xmm0
psrldq xmm1, 4
instead of pshufd, which is an option because high garbage can be ignored.
And ARM64 goes scalar.
----
Current trunk *without* -fopenmp produces decent asm
https://godbolt.org/z/h1KEKPTW9
For ARM64 we've been making good asm since GCC 10.x (vs. scalar in 9.3)
simde_vaddlv_s8:
sxtl v0.8h, v0.8b
addv h0, v0.8h
umov w0, v0.h[0]
ret
x86-64 gcc -O3 -march=skylake
simde_vaddlv_s8:
vpmovsxbw xmm1, xmm0
vpsrlq xmm0, xmm0, 32
vpmovsxbw xmm0, xmm0
vpaddw xmm0, xmm1, xmm0
vpsrlq xmm1, xmm0, 32
vpaddw xmm0, xmm0, xmm1
vpsrlq xmm1, xmm0, 16
vpaddw xmm0, xmm0, xmm1
vpextrw eax, xmm0, 0
ret
That's pretty good, but VMOVD eax, xmm0 would be more efficient than VPEXTRW
when we don't need to avoid high garbage (because it's a return value in this
case). VPEXTRW zero-extends into RAX, so it's not directly helpful if we need
to sign-extend to 32 or 64-bit for some reason; we'd still need a scalar movsx.
Or with BMI2, go scalar before the last shift / VPADDW step, e.g.
...
vmovd eax, xmm0
rorx edx, eax, 16
add eax, edx
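
The arithmetic behind that scalar tail can be checked in plain C. In the sketch below (the function name is made up), the 32-bit input stands for what `vmovd` would copy out of the vector: partial sum 0 in bits 0..15 and partial sum 1 in bits 16..31. Adding the value to a copy of itself rotated by 16 leaves the sum of both lanes, modulo 2^16, in the low 16 bits.

```c
#include <stdint.h>

/* Hypothetical model of the vmovd / rorx / add sequence. */
static int16_t finish_in_scalar(uint32_t low32)
{
    uint32_t rotated = (low32 >> 16) | (low32 << 16);   /* rorx edx, eax, 16 */
    uint32_t sum = low32 + rotated;                     /* add  eax, edx     */
    /* Only the low 16 bits are meaningful; the upper half is garbage that
       an int16_t return value ignores. The narrowing conversion wraps on
       GCC and Clang. */
    return (int16_t)(uint16_t)sum;
}
```

For example, lanes 3 and 5 packed as 0x00050003 give 8, and lanes -3 and 1 packed as 0x0001FFFD give -2.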
* [Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP
From: peter at cordes dot ca @ 2021-10-25 22:00 UTC
--- Comment #11 from Peter Cordes <peter at cordes dot ca> ---
Also, horizontal byte sums are generally best done with VPSADBW against a zero
vector, even if that means some fiddling to flip to unsigned first and then
undo the bias.
simde_vaddlv_s8:
vpxor xmm0, xmm0, .LC0[rip] # set1_epi8(0x80): flip to unsigned 0..255 range
vpxor xmm1, xmm1, xmm1 # zero vector
vpsadbw xmm0, xmm0, xmm1 # horizontal byte sum within each 64-bit half
vmovd eax, xmm0 # we only wanted the low half anyway
sub eax, 8 * 128 # subtract the bias we added earlier by flipping sign bits
ret
This is so much shorter we'd still be ahead if we generated the vector constant
on the fly instead of loading it. (3 instructions: vpcmpeqd same,same / vpabsb
/ vpslld by 7. Or pcmpeqd / psllw 8 / packsswb same,same to saturate to -128)
If we had wanted a 128-bit (16 byte) vector sum, we'd need
...
vpsadbw ...
vpshufd xmm1, xmm0, 0xfe # shuffle upper 64 bits to the bottom
vpaddd xmm0, xmm0, xmm1
vmovd eax, xmm0
sub eax, 16 * 128
Works efficiently with only SSE2. Actually with AVX2, we should unpack the top
half with VPUNPCKHQDQ to save a byte (no immediate operand), since we don't need
PSHUFD copy-and-shuffle.
Or movd / pextrw / scalar add but that's more uops: pextrw is 2 on its own.
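
The same idea written with SSE2 intrinsics, as a sketch rather than a proposed implementation: the function names are made up, the inputs are taken from memory instead of a vector register for simplicity, and a plain scalar fallback (same arithmetic, no PSADBW) is included so the example still builds on targets without SSE2.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical scalar model: flip each sign bit, sum as unsigned,
   then remove the n * 128 bias. */
static int16_t hsum_s8_scalar(const int8_t *p, int n)
{
    unsigned sum = 0;
    for (int i = 0; i < n; i++)
        sum += (uint8_t)p[i] ^ 0x80u;
    return (int16_t)((int)sum - n * 128);
}

/* Sum of 8 signed bytes. */
static int16_t hsum_s8x8(const int8_t p[8])
{
#if defined(__SSE2__)
    __m128i v = _mm_loadl_epi64((const __m128i *)p);   /* low 8 bytes */
    v = _mm_xor_si128(v, _mm_set1_epi8(-128));         /* flip to 0..255 */
    v = _mm_sad_epu8(v, _mm_setzero_si128());          /* psadbw vs zero */
    return (int16_t)(_mm_cvtsi128_si32(v) - 8 * 128);  /* low half only */
#else
    return hsum_s8_scalar(p, 8);
#endif
}

/* Sum of 16 signed bytes: one extra step combines the two 64-bit halves. */
static int16_t hsum_s8x16(const int8_t p[16])
{
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    v = _mm_xor_si128(v, _mm_set1_epi8(-128));
    v = _mm_sad_epu8(v, _mm_setzero_si128());
    v = _mm_add_epi32(v, _mm_unpackhi_epi64(v, v));    /* high half + low */
    return (int16_t)(_mm_cvtsi128_si32(v) - 16 * 128);
#else
    return hsum_s8_scalar(p, 16);
#endif
}
```

The bias correction is exact because each byte was raised by 128 before summing, so the unsigned total overshoots by 128 per element.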
* [Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP
From: crazylht at gmail dot com @ 2021-10-26 8:13 UTC
--- Comment #12 from Hongtao.liu <crazylht at gmail dot com> ---
> That's pretty good, but VMOVD eax, xmm0 would be more efficient than
> VPEXTRW when we don't need to avoid high garbage (because it's a return
> value in this case).
And TARGET_AVX512FP16 has vmovw.