[Bug c/106323] New: [Suboptimal] memcmp(s1, s2, n) == 0 expansion on AArch64 compare to llvm

public inbox for gcc-bugs@sourceware.org

* [Bug c/106323] New: [Suboptimal] memcmp(s1, s2, n) == 0 expansion on AArch64 compare to llvm
From: zhongyunde at huawei dot com @ 2022-07-16  9:55 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106323

            Bug ID: 106323
           Summary: [Suboptimal] memcmp(s1, s2, n) == 0 expansion on
                    AArch64 compare to llvm
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zhongyunde at huawei dot com
  Target Milestone: ---

Test case (details: https://gcc.godbolt.org/z/PM3jxEM9M):

```
#include <string.h>

int src(char* s1, char* s2) { 
  return memcmp(s1, s2, 3) == 0; 
}
```

* LLVM emits no branches: it merges the partial results with eor/orr and materializes the boolean with cset:
```
src:                                    // @src
        ldrh    w8, [x0]
        ldrh    w9, [x1]
        ldrb    w10, [x0, #2]
        ldrb    w11, [x1, #2]
        eor     w8, w8, w9
        eor     w9, w10, w11
        orr     w8, w8, w9
        cmp     w8, #0
        cset    w0, eq
        ret
```
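As a rough C-level illustration (a sketch, not how LLVM actually derives its output), the branchless sequence above computes the following. The function name `eq3_xor_or` is hypothetical; `memcpy` stands in for the unaligned `ldrh` so the code avoids alignment and strict-aliasing problems, and byte order does not matter because only equality is tested.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical branchless equivalent of memcmp(s1, s2, 3) == 0,
 * mirroring the LLVM listing above. */
int eq3_xor_or(const char *s1, const char *s2)
{
    uint16_t a_lo, b_lo;
    memcpy(&a_lo, s1, sizeof a_lo);   /* ldrh w8, [x0]       */
    memcpy(&b_lo, s2, sizeof b_lo);   /* ldrh w9, [x1]       */
    uint8_t a_hi = (uint8_t)s1[2];    /* ldrb w10, [x0, #2]  */
    uint8_t b_hi = (uint8_t)s2[2];    /* ldrb w11, [x1, #2]  */

    /* Any differing bit survives its XOR; OR merges the two partial results. */
    uint32_t diff = (uint32_t)(a_lo ^ b_lo) | (uint32_t)(a_hi ^ b_hi);
    return diff == 0;                 /* cmp w8, #0 ; cset w0, eq */
}
```

All four loads are issued unconditionally, so there is no data-dependent branch to mispredict.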

* GCC emits two compare-and-branch blocks:
```
src:
        ldrh    w3, [x0]
        ldrh    w2, [x1]
        cmp     w3, w2
        beq     .L5
.L2:
        mov     w0, 1
        eor     w0, w0, 1
        ret
.L5:
        ldrb    w2, [x0, 2]
        ldrb    w0, [x1, 2]
        cmp     w2, w0
        bne     .L2
        mov     w0, 0
        eor     w0, w0, 1
        ret
```
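For comparison, a hedged C-level sketch of the control flow in GCC's output: the third byte is loaded only when the first two bytes match. The name `eq3_early_exit` is hypothetical, and the comments map each step onto the listing above. Note that `mov w0, 1` followed by `eor w0, w0, 1` merely materializes the constant 0 (and `mov w0, 0; eor w0, w0, 1` the constant 1), a redundancy the sketch folds away.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical early-exit equivalent of memcmp(s1, s2, 3) == 0,
 * mirroring the GCC listing above. */
int eq3_early_exit(const char *s1, const char *s2)
{
    uint16_t a_lo, b_lo;
    memcpy(&a_lo, s1, sizeof a_lo);        /* ldrh w3, [x0] */
    memcpy(&b_lo, s2, sizeof b_lo);        /* ldrh w2, [x1] */
    if (a_lo != b_lo)                      /* cmp w3, w2 ; beq .L5 not taken */
        return 0;                          /* .L2: mov w0, 1 ; eor w0, w0, 1 */
    /* .L5: the third byte is only loaded when the first two bytes match. */
    if ((uint8_t)s1[2] != (uint8_t)s2[2])  /* ldrb, ldrb, cmp ; bne .L2 */
        return 0;
    return 1;                              /* mov w0, 0 ; eor w0, w0, 1 */
}
```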

* [Bug middle-end/106323] [Suboptimal] memcmp(s1, s2, n) == 0 expansion on AArch64 compare to llvm
From: pinskia at gcc dot gnu.org @ 2022-07-17 20:23 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106323

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
GCC's sequence might be better if the first bytes are in cache but the next bytes
are not, and the branch is predictable (which it might be).

So this is much more complex than just changing the expansion.

* [Bug middle-end/106323] [Suboptimal] memcmp(s1, s2, n) == 0 expansion on AArch64 compare to llvm
From: pinskia at gcc dot gnu.org @ 2022-07-17 20:37 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106323

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |104611

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note this could be even better using cmp/ccmp on AArch64 rather than eor/orr.
See bug 104611 comment #1 for that.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104611
[Bug 104611] memcmp/strcmp/strncmp can be optimized when the result is tested
for [in]equality with 0 on aarch64

* [Bug middle-end/106323] [Suboptimal] memcmp(s1, s2, n) == 0 expansion on AArch64 compare to llvm
From: wilco at gcc dot gnu.org @ 2022-07-18  7:31 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106323

Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #3 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> GCC's sequence might be better if the first bytes are in cache but the next
> bytes are not, and the branch is predictable (which it might be).
> 
> So this is much more complex than just changing the expansion.

Neither sequence is efficient. Caches are not really relevant here; it's more
about giving a wide out-of-order (OoO) core lots of useful parallel work to do,
which means avoiding unnecessary instructions and branches that just slow it
down. Hence 4 loads and CMP+CCMP is best.
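A sketch of the source-level shape described here, assuming only standard C: all four loads happen up front and the two equality tests are combined with a non-short-circuiting `&`, so the result does not depend on a branch. The name `eq3_cmp_ccmp` and the registers in the comment are illustrative; whether a given GCC or LLVM version actually emits `cmp`/`ccmp`/`cset` for this code depends on the version and options.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical "4 loads + CMP+CCMP" form of memcmp(s1, s2, 3) == 0.
 * On AArch64 the final expression can map to:
 *     cmp  w8, w9              // compare first two bytes
 *     ccmp w10, w11, #0, eq    // if EQ, compare third byte; else set NZCV=0 (NE)
 *     cset w0, eq              // 1 only if both comparisons were equal
 */
int eq3_cmp_ccmp(const char *s1, const char *s2)
{
    uint16_t a_lo, b_lo;
    memcpy(&a_lo, s1, sizeof a_lo);   /* load 1 */
    memcpy(&b_lo, s2, sizeof b_lo);   /* load 2 */
    uint8_t a_hi = (uint8_t)s1[2];    /* load 3 */
    uint8_t b_hi = (uint8_t)s2[2];    /* load 4 */

    /* Bitwise & evaluates both comparisons without short-circuiting. */
    return (a_lo == b_lo) & (a_hi == b_hi);
}
```

Compared with the eor/orr form, this replaces the two XORs, the OR and the compare against zero with a compare plus a conditional compare.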

* [Bug middle-end/106323] [Suboptimal] memcmp(s1, s2, n) == 0 expansion on AArch64 compare to llvm
From: zhongyunde at huawei dot com @ 2022-12-06 13:03 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106323

--- Comment #4 from vfdff <zhongyunde at huawei dot com> ---
LLVM now uses 4 loads and CMP+CCMP: https://gcc.godbolt.org/z/PM3jxEM9M
