public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/109885] New: gcc does not generate movmskps and testps instructions (clang does)
From: vincenzo.innocente at cern dot ch @ 2023-05-17 7:40 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

            Bug ID: 109885
           Summary: gcc does not generate movmskps and testps instructions (clang does)
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

In this simple code (on AVX2):

  // number of elements equal to zero
  int sum(float const * x) {
    int ret = 0;
    for (int i=0; i<8; ++i) ret +=(0==x[i]);
    return ret;
  }

  // 1 if at least one element is zero
  int one(float const * x) {
    int ret = 0;
    for (int i=0; i<8; ++i) ret |=(0==x[i]);
    return ret;
  }

  // 1 if all elements are zero
  int all(float const * x) {
    int ret = 1;
    for (int i=0; i<8; ++i) ret &=(0==x[i]);
    return ret;
  }

clang uses movmskps and testps instructions; gcc does not. See for instance
https://godbolt.org/z/r11r8xoYz
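For reference, the movemask strategy the report attributes to clang can be modelled roughly like this: compare all eight lanes against zero, collapse the comparison into an 8-bit mask, then answer each of the three queries with one scalar operation on that mask. This is a sketch under stated assumptions, not clang's actual output: the names `zero_mask`, `sum_mask`, `one_mask` and `all_mask` are hypothetical, the AVX path is compiled only when `__AVX__` is defined (for example with `-mavx2`), a portable loop builds the same mask otherwise, and the vtestps variant clang may use for `one`/`all` is not modelled.

```c
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper: bit i of the result is set iff x[i] == 0.0f. */
static unsigned zero_mask(float const *x)
{
#if defined(__AVX__)
    __m256 v = _mm256_loadu_ps(x);
    /* Ordered, non-signalling equality: each lane becomes all-ones or zero. */
    __m256 eq = _mm256_cmp_ps(v, _mm256_setzero_ps(), _CMP_EQ_OQ);
    /* vmovmskps: copy the sign bit of each of the 8 lanes into bits 0..7. */
    return (unsigned)_mm256_movemask_ps(eq);
#else
    /* Portable fallback that builds the same 8-bit mask lane by lane. */
    unsigned m = 0;
    for (int i = 0; i < 8; ++i)
        m |= (unsigned)(0 == x[i]) << i;
    return m;
#endif
}

/* sum: number of zero lanes = population count of the mask. */
int sum_mask(float const *x) { return __builtin_popcount(zero_mask(x)); }

/* one: at least one zero lane = mask is non-zero. */
int one_mask(float const *x) { return zero_mask(x) != 0; }

/* all: every lane is zero = all 8 mask bits are set. */
int all_mask(float const *x) { return zero_mask(x) == 0xFFu; }
```

For the input `{0, 1, 0, 2, 3, 0, 4, 5}` the mask is 0x25 (zeros at lanes 0, 2 and 5), so `sum_mask` returns 3, `one_mask` 1 and `all_mask` 0. The point of the design is that after a single compare and a single movemask, each reduction is one cheap scalar instruction.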
* [Bug target/109885] gcc does not generate movmskps and testps instructions (clang does)
From: pinskia at gcc dot gnu.org @ 2023-05-17 7:44 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|tree-optimization           |target
           Severity|normal                      |enhancement
* [Bug target/109885] gcc does not generate movmskps and testps instructions (clang does)
From: pinskia at gcc dot gnu.org @ 2023-05-17 14:51 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Just FYI, GCC does better on aarch64 with sum.

GCC:

        ldp     q29, q30, [x0]
        movi    v31.4s, 0x1
        fcmeq   v29.4s, v29.4s, 0
        fcmeq   v30.4s, v30.4s, 0
        and     v31.16b, v31.16b, v29.16b
        sub     v31.4s, v31.4s, v30.4s
        addv    s31, v31.4s
        fmov    w0, s31
        ret

vs this mess (clang):

        sub     sp, sp, #16
        ldp     q1, q0, [x0]
        adrp    x8, .LCPI0_0
        fcmeq   v1.4s, v1.4s, #0.0
        fcmeq   v0.4s, v0.4s, #0.0
        uzp1    v0.8h, v1.8h, v0.8h
        ldr     q1, [x8, :lo12:.LCPI0_0]
        and     v0.16b, v0.16b, v1.16b
        addv    h0, v0.8h
        fmov    w8, s0
        and     w8, w8, #0xff
        fmov    s0, w8
        cnt     v0.8b, v0.8b
        uaddlv  h0, v0.8b
        fmov    w0, s0
        add     sp, sp, #16
        ret

The reason seems to be that clang/LLVM is tuned to try to use movmskps/testps, while GCC is tuned to do a plain sum reduction in general. That said, I think GCC could do slightly better here too:

        ldp     q29, q30, [x0]
        fcmeq   v29.4s, v29.4s, 0
        fcmeq   v30.4s, v30.4s, 0
        add     v31.4s, v29.4s, v30.4s
        addv    s31, v31.4s
        fmov    w0, s31
        neg     w0, w0
        ret

I think this might be the best code for an aarch64 reduction of bools.
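The shorter sequence proposed above relies on fcmeq producing 0/-1 lanes: adding the two 4-lane masks lane-wise, doing one horizontal add and negating yields the count of zeros. A portable sketch of that idea, written with GCC's generic vector extensions rather than aarch64 intrinsics; the names `v4sf`, `v4si` and `count_zeros_neg` are illustrative, and the comments only indicate which instruction each step corresponds to.

```c
#include <string.h>

/* Illustrative GCC generic vectors: four 32-bit lanes, like an aarch64 .4s register. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

/* Hypothetical name: counts the zeros in x[0..7] the way the proposed code does. */
int count_zeros_neg(float const *x)
{
    v4sf lo, hi;
    memcpy(&lo, x, sizeof lo);          /* ldp: low half  */
    memcpy(&hi, x + 4, sizeof hi);      /* ldp: high half */
    const v4sf zero = {0.0f, 0.0f, 0.0f, 0.0f};
    v4si mlo = lo == zero;              /* fcmeq: each lane is 0 or -1 */
    v4si mhi = hi == zero;
    v4si acc = mlo + mhi;               /* 32-bit lane add: each lane is 0, -1 or -2 */
    int s = 0;
    for (int i = 0; i < 4; ++i)         /* addv: horizontal sum */
        s += acc[i];
    return -s;                          /* neg: turn the negative sum into a count */
}
```

A lane reaches -2 when both halves compare equal there, which is why the lane-wise add has to operate on 32-bit lanes (.4s): a byte-wise add would not carry between the bytes of a -1 lane.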
* [Bug target/109885] gcc does not generate movmskps and testps instructions (clang does)
From: pinskia at gcc dot gnu.org @ 2023-05-17 15:31 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-05-17
     Ever confirmed|0                           |1

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
* [Bug tree-optimization/109885] gcc does not generate movmskps and testps instructions (clang does)
From: pinskia at gcc dot gnu.org @ 2024-02-10 9:53 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization
                 CC|                            |pinskia at gcc dot gnu.org
             Blocks|                            |53947

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
What is even funnier on the LLVM side is if we have:
```
void f(unsigned int * __restrict a, unsigned int * __restrict b)
{
  unsigned int t = 0;
  t += (a[0] == b[0]);
  t += (a[1] == b[1])<<1;
  t += (a[2] == b[2])<<2;
  t += (a[3] == b[3])<<3;
  *a = t;
}
```
LLVM can produce movmskps for x86_64, but then it does a trick similar to the one it used for sum on aarch64.

Note GCC does not handle reductions that well for SLP either.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
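The function `f` in comment #3 packs four lane-wise equality results into bits 0..3, which is exactly the mask movmskps extracts from a 32-bit lane compare. A sketch checking that equivalence, assuming an x86_64 target where SSE2 is baseline; on other targets the hypothetical `f_movmsk` simply falls back to the scalar code. Both function names are illustrative, and `f_scalar` returns the mask instead of storing it through `a`.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Same computation as f() in comment #3, returning the mask instead of storing it. */
unsigned f_scalar(unsigned int const *a, unsigned int const *b)
{
    unsigned int t = 0;
    t += (a[0] == b[0]);
    t += (a[1] == b[1]) << 1;
    t += (a[2] == b[2]) << 2;
    t += (a[3] == b[3]) << 3;
    return t;
}

/* Hypothetical name: the same mask via pcmpeqd + movmskps. */
unsigned f_movmsk(unsigned int const *a, unsigned int const *b)
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((__m128i const *)a);
    __m128i vb = _mm_loadu_si128((__m128i const *)b);
    __m128i eq = _mm_cmpeq_epi32(va, vb);   /* each 32-bit lane is 0 or -1 */
    /* movmskps: the sign bit of lane i becomes bit i of the result. */
    return (unsigned)_mm_movemask_ps(_mm_castsi128_ps(eq));
#else
    return f_scalar(a, b);                  /* portable fallback */
#endif
}
```

With `a = {1, 2, 3, 4}` and `b = {1, 0, 3, 0}`, lanes 0 and 2 match, so both functions return 0x5.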
* [Bug tree-optimization/109885] gcc does not generate movmskps and testps instructions (clang does)
From: liuhongt at gcc dot gnu.org @ 2024-02-18 3:09 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

--- Comment #4 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---

  int sum()
  {
    int ret = 0;
    for (int i=0; i<8; ++i)
      ret +=(0==v[i]);
    return ret;
  }

  int sum2()
  {
    int ret = 0;
    auto m = v==0;
    for (int i=0; i<8; ++i)
      ret += m[i];
    return ret;
  }

For sum, GCC tries to reduce a {0/1, 0/1, ...} vector; for sum2, it tries to reduce a {0/-1, 0/-1, ...} vector. LLVM, however, tries to reduce a {0/1, 0/1, ...} vector for both sum and sum2. Not sure which approach is correct.
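To make the distinction in comment #4 concrete: with GCC's vector extensions a lane-wise comparison yields 0/-1 lanes, so the sum2 formulation adds up -1s and returns the negated count, while sum adds 0/1 values; converting one result into the other costs a single negation. The comment does not show how `v` is declared, so this sketch assumes it is an 8-lane float vector and loads it from a float array; the names `v8sf`, `v8si`, `sum_01` and `sum_m1` are illustrative.

```c
#include <string.h>

/* Assumption: v is a GCC vector of 8 floats (its declaration is not shown). */
typedef float v8sf __attribute__((vector_size(32)));
typedef int   v8si __attribute__((vector_size(32)));

/* Like sum(): reduce a vector of 0/1 values, one per zero lane. */
int sum_01(float const *p)
{
    v8sf x;
    memcpy(&x, p, sizeof x);
    int ret = 0;
    for (int i = 0; i < 8; ++i)
        ret += (0 == x[i]);
    return ret;
}

/* Like sum2(): reduce the comparison mask itself, whose lanes are 0/-1. */
int sum_m1(float const *p)
{
    v8sf x;
    memcpy(&x, p, sizeof x);
    const v8sf zero = {0};
    v8si m = x == zero;        /* each lane is 0 or -1 */
    int ret = 0;
    for (int i = 0; i < 8; ++i)
        ret += m[i];
    return ret;
}
```

For `{0, 1, 0, 2, 3, 0, 4, 5}`, `sum_01` returns 3 and `sum_m1` returns -3; the open question in the comment is which of the two forms the vectorizer should prefer when lowering such reductions.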