public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/109885] New: gcc does not generate movmskps and testps instructions  (clang does)
@ 2023-05-17  7:40 vincenzo.innocente at cern dot ch
  2023-05-17  7:44 ` [Bug target/109885] " pinskia at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2023-05-17  7:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

            Bug ID: 109885
           Summary: gcc does not generate movmskps and testps instructions
                     (clang does)
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

In this simple code (compiled for AVX2):

int sum(float const * x) {
   int ret = 0;
   for (int i=0; i<8; ++i) ret +=(0==x[i]);
   return ret;
}

int one(float const * x) {
   int ret = 0;
   for (int i=0; i<8; ++i) ret |=(0==x[i]);
   return ret;
}

int all(float const * x) {
   int ret = 1;
   for (int i=0; i<8; ++i) ret &=(0==x[i]);
   return ret;
}

Clang uses the movmskps and testps instructions; GCC does not.

See, for instance:

https://godbolt.org/z/r11r8xoYz
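
For readers who want to see the mask-based approach spelled out, here is a minimal sketch using SSE intrinsics. It is not taken from the report or from either compiler's output: the helper name zero_mask8 and the three *_mask functions are hypothetical, the 128-bit SSE form is used only so the sketch builds without extra flags (an AVX build could compare all eight lanes at once), and __builtin_popcount is a GCC/Clang builtin.

```c
#if defined(__SSE__)
#include <xmmintrin.h>  /* _mm_loadu_ps, _mm_cmpeq_ps, _mm_movemask_ps */
#endif

/* Hypothetical helper (not from the report): bit i is set when x[i] == 0.0f. */
static unsigned zero_mask8(const float *x) {
#if defined(__SSE__)
    const __m128 zero = _mm_setzero_ps();
    /* Each compare yields all-ones lanes on equality; movmskps packs the
       four lane sign bits into the low four bits of an integer. */
    unsigned lo = (unsigned)_mm_movemask_ps(_mm_cmpeq_ps(_mm_loadu_ps(x), zero));
    unsigned hi = (unsigned)_mm_movemask_ps(_mm_cmpeq_ps(_mm_loadu_ps(x + 4), zero));
    return lo | (hi << 4);
#else
    /* Portable fallback that builds the same 8-bit mask one lane at a time. */
    unsigned mask = 0;
    for (int i = 0; i < 8; ++i) mask |= (unsigned)(x[i] == 0.0f) << i;
    return mask;
#endif
}

/* Same results as sum/one/all above, computed from the mask. */
int sum_mask(const float *x) { return __builtin_popcount(zero_mask8(x)); }
int one_mask(const float *x) { return zero_mask8(x) != 0; }
int all_mask(const float *x) { return zero_mask8(x) == 0xFFu; }
```

Once the eight comparison results sit in one integer, the three reductions become a popcount, a test against zero, and a test against 0xFF.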

* [Bug target/109885] gcc does not generate movmskps and testps instructions  (clang does)
From: pinskia at gcc dot gnu.org @ 2023-05-17  7:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|tree-optimization           |target
           Severity|normal                      |enhancement

* [Bug target/109885] gcc does not generate movmskps and testps instructions  (clang does)
From: pinskia at gcc dot gnu.org @ 2023-05-17 14:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Just FYI, GCC does better on aarch64 with sum.
GCC:
        ldp     q29, q30, [x0]
        movi    v31.4s, 0x1
        fcmeq   v29.4s, v29.4s, 0
        fcmeq   v30.4s, v30.4s, 0
        and     v31.16b, v31.16b, v29.16b
        sub     v31.4s, v31.4s, v30.4s
        addv    s31, v31.4s
        fmov    w0, s31
        ret

vs. this mess from clang/LLVM:
        sub     sp, sp, #16
        ldp     q1, q0, [x0]
        adrp    x8, .LCPI0_0
        fcmeq   v1.4s, v1.4s, #0.0
        fcmeq   v0.4s, v0.4s, #0.0
        uzp1    v0.8h, v1.8h, v0.8h
        ldr     q1, [x8, :lo12:.LCPI0_0]
        and     v0.16b, v0.16b, v1.16b
        addv    h0, v0.8h
        fmov    w8, s0
        and     w8, w8, #0xff
        fmov    s0, w8
        cnt     v0.8b, v0.8b
        uaddlv  h0, v0.8b
        fmov    w0, s0
        add     sp, sp, #16
        ret

The reason appears to be that clang/LLVM is tuned to use movmskps/testps,
while GCC is tuned to do a plain sum reduction in general.
Though I think GCC could be slightly better here too:
        ldp     q29, q30, [x0]
        fcmeq   v29.4s, v29.4s, 0
        fcmeq   v30.4s, v30.4s, 0
        add     v31.4s, v29.4s, v30.4s
        addv    s31, v31.4s
        fmov    w0, s31
        neg     w0, w0
        ret

I think this might be the best code for an aarch64 reduction of bools.
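
A plain C model of the shorter sequence may make the trick easier to follow. This is only a sketch: the function name sum_negated_masks is hypothetical, and each vector lane is modeled as a 32-bit integer that is all-ones (-1) on equality and zero otherwise, as fcmeq produces.

```c
#include <stdint.h>

/* Hypothetical scalar model of: fcmeq, fcmeq, add, addv, neg. */
int sum_negated_masks(const float *x) {
    int32_t lo[4], hi[4], acc[4];
    for (int i = 0; i < 4; ++i) {
        lo[i] = -(int32_t)(x[i] == 0.0f);      /* fcmeq on the first q register  */
        hi[i] = -(int32_t)(x[i + 4] == 0.0f);  /* fcmeq on the second q register */
        acc[i] = lo[i] + hi[i];                /* lanewise add: 0, -1 or -2      */
    }
    int32_t total = acc[0] + acc[1] + acc[2] + acc[3];  /* addv across lanes */
    return -total;                                       /* neg: count of matches */
}
```

Because every matching lane contributes -1, the horizontal sum is the negated count, so a single negate at the end replaces the per-lane masking with 1.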

* [Bug target/109885] gcc does not generate movmskps and testps instructions  (clang does)
From: pinskia at gcc dot gnu.org @ 2023-05-17 15:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-05-17
     Ever confirmed|0                           |1

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.

* [Bug tree-optimization/109885] gcc does not generate movmskps and testps instructions  (clang does)
From: pinskia at gcc dot gnu.org @ 2024-02-10  9:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization
                 CC|                            |pinskia at gcc dot gnu.org
             Blocks|                            |53947

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
What is even funnier on the LLVM side is if we have:
```
void f(unsigned int * __restrict a, unsigned int * __restrict b)
{
  unsigned int t = 0;
  t += (a[0] == b[0]);
  t += (a[1] == b[1])<<1;
  t += (a[2] == b[2])<<2;
  t += (a[3] == b[3])<<3;
  *a = t;
}
```
LLVM can produce movmskps for x86_64, but on aarch64 it then resorts to a
trick similar to the one it used for the sum.

Note GCC does not handle reductions that well for SLP either.
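
To show why the function above maps onto movmskps, here is a hedged sketch with SSE2 intrinsics. It is not compiler output: eq_mask4 is a hypothetical name, and unlike f it returns the mask instead of storing it through a.

```c
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_loadu_si128, _mm_cmpeq_epi32, _mm_castsi128_ps */
#endif

/* Hypothetical helper: bit i of the result is set when a[i] == b[i]. */
unsigned eq_mask4(const unsigned *a, const unsigned *b) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi32(va, vb);  /* all-ones lanes where equal */
    /* movmskps collects the sign bit of each 32-bit lane into bits 0..3. */
    return (unsigned)_mm_movemask_ps(_mm_castsi128_ps(eq));
#else
    /* Portable fallback with the same result as the shifts and adds in f. */
    unsigned t = 0;
    for (int i = 0; i < 4; ++i) t |= (unsigned)(a[i] == b[i]) << i;
    return t;
#endif
}
```

The shifted additions in f never overlap, so they are equivalent to ORing one bit per lane, which is what the mask extraction computes.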


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

* [Bug tree-optimization/109885] gcc does not generate movmskps and testps instructions  (clang does)
From: liuhongt at gcc dot gnu.org @ 2024-02-18  3:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885

--- Comment #4 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
int sum() {
   int ret = 0;
   for (int i=0; i<8; ++i) ret +=(0==v[i]);
   return ret;
}

int sum2() {
   int ret = 0;
   auto m = v==0;
   for (int i=0; i<8; ++i) ret += m[i];
   return ret;
}

For sum, GCC tries to reduce a {0/1, 0/1, ...} vector; for sum2, it tries to
reduce a {0/-1, 0/-1, ...} vector. LLVM, however, tries to reduce a
{0/1, 0/1, ...} vector for both sum and sum2. Not sure which is correct?
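
As a reference point for the question, here is a small sketch of what the GNU vector extension documents for comparisons: each result lane is -1 when the comparison is true and 0 when it is false. The comment does not show how v is declared, so this assumes an 8-lane float vector built with vector_size(32); the typedef names and sum2_model are hypothetical.

```c
#include <string.h>

/* Assumed types: the declaration of v is not shown in the comment. */
typedef float v8sf __attribute__((vector_size(32)));
typedef int   v8si __attribute__((vector_size(32)));

/* Models sum2 on an array argument instead of a global vector. */
int sum2_model(const float *x) {
    v8sf v, zero = {0};
    memcpy(&v, x, sizeof v);   /* load eight floats into the vector */
    v8si m = (v == zero);      /* each lane: -1 if equal, 0 otherwise */
    int ret = 0;
    for (int i = 0; i < 8; ++i) ret += m[i];
    return ret;                /* minus the number of zero elements */
}
```

Under these assumptions, sum2 on an input with four zeros yields -4, whereas the scalar comparison in sum yields 4, so at the source level the two functions reduce different vectors.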
