public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/109885] New: gcc does not generate movmskps and testps instructions (clang does)
@ 2023-05-17 7:40 vincenzo.innocente at cern dot ch
2023-05-17 7:44 ` [Bug target/109885] " pinskia at gcc dot gnu.org
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2023-05-17 7:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885
Bug ID: 109885
Summary: gcc does not generate movmskps and testps instructions
(clang does)
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
Target Milestone: ---
In this simple code (compiled for AVX2):
int sum(float const * x) {
  int ret = 0;
  for (int i=0; i<8; ++i) ret += (0==x[i]);
  return ret;
}
int one(float const * x) {
  int ret = 0;
  for (int i=0; i<8; ++i) ret |= (0==x[i]);
  return ret;
}
int all(float const * x) {
  int ret = 1;
  for (int i=0; i<8; ++i) ret &= (0==x[i]);
  return ret;
}
clang uses the movmskps and testps instructions; gcc does not.
See, for instance:
https://godbolt.org/z/r11r8xoYz
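For reference, the mask-based formulation behind the request can be sketched by hand. This is only an illustration, not what either compiler emits: the helper name zero_mask8 and the *_mask function names are made up here, the AVX path uses the standard _mm256_cmp_ps / _mm256_movemask_ps intrinsics (the latter maps to vmovmskps), and a scalar fallback is used when AVX is not enabled at compile time. __builtin_popcount is a GCC/clang builtin.

```c
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper: bit i of the result is set when x[i] == 0. */
static unsigned zero_mask8(const float *x) {
#if defined(__AVX__)
    __m256 v  = _mm256_loadu_ps(x);
    __m256 eq = _mm256_cmp_ps(v, _mm256_setzero_ps(), _CMP_EQ_OQ);
    return (unsigned)_mm256_movemask_ps(eq);  /* one sign bit per lane */
#else
    /* Scalar fallback with the same result, for builds without AVX. */
    unsigned m = 0;
    for (int i = 0; i < 8; ++i) m |= (unsigned)(x[i] == 0.0f) << i;
    return m;
#endif
}

/* The three reductions from the report, expressed on the 8-bit mask. */
int sum_mask(const float *x) { return __builtin_popcount(zero_mask8(x)); }
int one_mask(const float *x) { return zero_mask8(x) != 0; }
int all_mask(const float *x) { return zero_mask8(x) == 0xFFu; }
```

Once the comparison result is collapsed to a scalar bitmask, sum becomes a popcount and one/all become simple integer tests, which is the shape of code the report asks gcc to produce.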
* [Bug target/109885] gcc does not generate movmskps and testps instructions (clang does)
From: pinskia at gcc dot gnu.org @ 2023-05-17 7:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Component|tree-optimization |target
Severity|normal |enhancement
* [Bug target/109885] gcc does not generate movmskps and testps instructions (clang does)
From: pinskia at gcc dot gnu.org @ 2023-05-17 14:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Just FYI, GCC does better on aarch64 with sum.
GCC:
ldp q29, q30, [x0]
movi v31.4s, 0x1
fcmeq v29.4s, v29.4s, 0
fcmeq v30.4s, v30.4s, 0
and v31.16b, v31.16b, v29.16b
sub v31.4s, v31.4s, v30.4s
addv s31, v31.4s
fmov w0, s31
ret
vs. this mess from clang/LLVM:
sub sp, sp, #16
ldp q1, q0, [x0]
adrp x8, .LCPI0_0
fcmeq v1.4s, v1.4s, #0.0
fcmeq v0.4s, v0.4s, #0.0
uzp1 v0.8h, v1.8h, v0.8h
ldr q1, [x8, :lo12:.LCPI0_0]
and v0.16b, v0.16b, v1.16b
addv h0, v0.8h
fmov w8, s0
and w8, w8, #0xff
fmov s0, w8
cnt v0.8b, v0.8b
uaddlv h0, v0.8b
fmov w0, s0
add sp, sp, #16
ret
The reason appears to be that clang/LLVM is tuned to try to use movmskps/testps,
while GCC is tuned to do a plain sum reduction in general.
Though I think GCC could be slightly better here too:
ldp q29, q30, [x0]
fcmeq v29.4s, v29.4s, 0
fcmeq v30.4s, v30.4s, 0
add v31.4s, v29.4s, v30.4s
addv s31, v31.4s
fmov w0, s31
neg w0, w0
ret
I think this might be the best code for an aarch64 reduction of bools.
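The idea behind that shorter sequence can be written out in scalar C as a rough sketch: keep the comparison results as 0/-1 masks (what fcmeq produces per lane), add them up, and negate once at the end instead of first converting each lane to 0/1. The function name sum_neg_masks is made up for this illustration.

```c
#include <stdint.h>

/* Scalar model of the suggested sequence: accumulate 0/-1 masks, then
   negate the total once, rather than masking each lane down to 0/1. */
int sum_neg_masks(const float *x) {
    int32_t acc = 0;
    for (int i = 0; i < 8; ++i) {
        int32_t mask = -(int32_t)(x[i] == 0.0f);  /* 0 or -1, as fcmeq gives per lane */
        acc += mask;                              /* models the add + addv steps */
    }
    return -acc;                                  /* models the final neg */
}
```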
* [Bug target/109885] gcc does not generate movmskps and testps instructions (clang does)
From: pinskia at gcc dot gnu.org @ 2023-05-17 15:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2023-05-17
Ever confirmed|0 |1
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
* [Bug tree-optimization/109885] gcc does not generate movmskps and testps instructions (clang does)
From: pinskia at gcc dot gnu.org @ 2024-02-10 9:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|target |tree-optimization
CC| |pinskia at gcc dot gnu.org
Blocks| |53947
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
What is even funnier on the LLVM side is if we have:
```
void f(unsigned int * __restrict a, unsigned int * __restrict b)
{
  unsigned int t = 0;
  t += (a[0] == b[0]);
  t += (a[1] == b[1])<<1;
  t += (a[2] == b[2])<<2;
  t += (a[3] == b[3])<<3;
  *a = t;
}
```
LLVM can produce movmskps for x86_64, but then does a trick similar to the one
it used for the sum on aarch64.
Note that GCC does not handle reductions well for SLP either.
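To make the connection to movmskps concrete: the shifted sum above packs one comparison bit per element, which is what the instruction does with the lane sign bits. A hand-written sketch, assuming SSE2 is available (with a scalar fallback otherwise), might look like the following; pack_eq4 is a made-up name, and it returns the mask rather than storing it through a as f does.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: bit i of the result is set when a[i] == b[i]. */
unsigned pack_eq4(const unsigned int *a, const unsigned int *b) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi32(va, vb);                    /* 0 / -1 per lane */
    return (unsigned)_mm_movemask_ps(_mm_castsi128_ps(eq));  /* movmskps */
#else
    /* Scalar fallback: the same shifted sum as in f. */
    unsigned t = 0;
    for (int i = 0; i < 4; ++i) t += (unsigned)(a[i] == b[i]) << i;
    return t;
#endif
}
```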
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
* [Bug tree-optimization/109885] gcc does not generate movmskps and testps instructions (clang does)
From: liuhongt at gcc dot gnu.org @ 2024-02-18 3:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885
--- Comment #4 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
// v is a global 8-element vector; its declaration was not included here.
int sum() {
  int ret = 0;
  for (int i=0; i<8; ++i) ret += (0==v[i]);
  return ret;
}
int sum2() {
  int ret = 0;
  auto m = v==0;
  for (int i=0; i<8; ++i) ret += m[i];
  return ret;
}
For sum, gcc tries to reduce a {0/1, 0/1, ...} vector; for sum2, it tries
to reduce a {0/-1, 0/-1, ...} vector. LLVM, however, tries to reduce a
{0/1, 0/1, ...} vector for both sum and sum2. Not sure which is correct?
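The difference between the two forms can be reproduced with a small C sketch using the GCC vector extension, where an element-wise comparison is documented to yield 0 for false and -1 for true. The typedefs v8sf/v8si and both function names are assumptions made for this illustration, since the declaration of v was not shown; the vector is loaded from a pointer and compared against an explicit zero vector to keep the example self-contained.

```c
#include <string.h>

/* Assumed types: 8 floats, and the matching 8-int comparison result. */
typedef float v8sf __attribute__((vector_size(32)));
typedef int   v8si __attribute__((vector_size(32)));

/* Scalar comparison: each (0 == x[i]) is an int, 0 or 1. */
int sum_scalar_compare(const float *x) {
    int ret = 0;
    for (int i = 0; i < 8; ++i) ret += (0 == x[i]);
    return ret;
}

/* Vector comparison: each element of m is 0 or -1. */
int sum_vector_compare(const float *x) {
    v8sf v, zero = {0};
    memcpy(&v, x, sizeof v);
    v8si m = (v == zero);
    int ret = 0;
    for (int i = 0; i < 8; ++i) ret += m[i];
    return ret;
}
```

With these definitions the second function returns the negated count of zero elements, so the two source forms are not interchangeable as written.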