public inbox for gcc-bugs@sourceware.org
* [Bug c/106323] New: [Suboptimal] memcmp(s1, s2, n) == 0 expansion on AArch64 compare to llvm
From: zhongyunde at huawei dot com @ 2022-07-16 9:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106323
Bug ID: 106323
Summary: [Suboptimal] memcmp(s1, s2, n) == 0 expansion on
AArch64 compare to llvm
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: zhongyunde at huawei dot com
Target Milestone: ---
Test case (see https://gcc.godbolt.org/z/PM3jxEM9M for details):
```
#include <string.h>
int src(char* s1, char* s2) {
return memcmp(s1, s2, 3) == 0;
}
```
* LLVM emits no branch and materializes the result with a cset instruction:
```
src: // @src
ldrh w8, [x0]
ldrh w9, [x1]
ldrb w10, [x0, #2]
ldrb w11, [x1, #2]
eor w8, w8, w9
eor w9, w10, w11
orr w8, w8, w9
cmp w8, #0
cset w0, eq
ret
```
* GCC emits two conditional branches:
```
src:
ldrh w3, [x0]
ldrh w2, [x1]
cmp w3, w2
beq .L5
.L2:
mov w0, 1
eor w0, w0, 1
ret
.L5:
ldrb w2, [x0, 2]
ldrb w0, [x1, 2]
cmp w2, w0
bne .L2
mov w0, 0
eor w0, w0, 1
ret
```
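For readers who prefer C to assembly, the two code shapes above can be sketched roughly as follows. This is only an illustration of the two strategies, not what either compiler does internally; the names `load16`, `eq3_branchless` and `eq3_branchy` are made up for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unaligned 16-bit load done via memcpy. */
static uint16_t load16(const char *p) {
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Branchless shape (compare the LLVM output): XOR each pair of loads,
   OR the differences together, and test the result once. */
int eq3_branchless(const char *s1, const char *s2) {
    uint32_t d_lo = (uint32_t)load16(s1) ^ load16(s2);
    uint32_t d_hi = (uint32_t)(unsigned char)s1[2] ^ (unsigned char)s2[2];
    return (d_lo | d_hi) == 0;
}

/* Branchy shape (compare the GCC output): the third byte is only
   loaded when the first two bytes already match. */
int eq3_branchy(const char *s1, const char *s2) {
    if (load16(s1) != load16(s2))
        return 0;
    return (unsigned char)s1[2] == (unsigned char)s2[2];
}
```

Both functions return the same value as `memcmp(s1, s2, 3) == 0`; they differ only in whether the second pair of loads depends on a branch.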
* [Bug middle-end/106323] [Suboptimal] memcmp(s1, s2, n) == 0 expansion on AArch64 compare to llvm
From: pinskia at gcc dot gnu.org @ 2022-07-17 20:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106323
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
GCC's sequence might be better if the first bytes are in cache but the next
bytes are not, and the branch is then predictable (which it might be).
So this is much more complex than just changing the expansion.
* [Bug middle-end/106323] [Suboptimal] memcmp(s1, s2, n) == 0 expansion on AArch64 compare to llvm
From: pinskia at gcc dot gnu.org @ 2022-07-17 20:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106323
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Depends on| |104611
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that on AArch64 this could be even better using cmp/ccmp rather than
eor/orr.
See bug 104611 comment #1 for that.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104611
[Bug 104611] memcmp/strcmp/strncmp can be optimized when the result is tested
for [in]equality with 0 on aarch64
* [Bug middle-end/106323] [Suboptimal] memcmp(s1, s2, n) == 0 expansion on AArch64 compare to llvm
From: wilco at gcc dot gnu.org @ 2022-07-18 7:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106323
Wilco <wilco at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |wilco at gcc dot gnu.org
--- Comment #3 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> GCC might be better if the first bytes are in cache but the next bytes are
> not and then branch is predictable (which it might be).
>
> So this is much more complex than just changing this really.
Neither sequence is efficient. Caches are not really relevant here; it's more
about giving a wide OoO core lots of useful parallel work to do, which means
avoiding unnecessary instructions and branches that just slow you down. Hence
4 loads and CMP+CCMP is best.
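
A rough C sketch of the "4 loads and CMP+CCMP" shape, for illustration only. The names `ld16` and `eq3_ccmp` are hypothetical, and the assembly in the comment is a hand-written guess at the intended sequence, not output from any compiler; whether a given compiler turns this C into cmp/ccmp is not guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unaligned 16-bit load done via memcpy. */
static uint16_t ld16(const char *p) {
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Issue all four loads first, with no branch between them, then
   combine the two equality tests with a non-short-circuit '&'.
   Hand-written illustration of the intended AArch64 shape:
       ldrh w2, [x0]
       ldrh w3, [x1]
       ldrb w4, [x0, 2]
       ldrb w5, [x1, 2]
       cmp  w2, w3
       ccmp w4, w5, 0, eq    // if first pair differs, force flags to "ne"
       cset w0, eq
       ret                                                            */
int eq3_ccmp(const char *s1, const char *s2) {
    uint16_t a_lo = ld16(s1);
    uint16_t b_lo = ld16(s2);
    unsigned char a_hi = (unsigned char)s1[2];
    unsigned char b_hi = (unsigned char)s2[2];
    return (a_lo == b_lo) & (a_hi == b_hi);
}
```

The point of this shape is that the four loads are independent of each other, so a wide out-of-order core can issue them in parallel, and the two compares fold into a single flag result without a branch or extra eor/orr instructions.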
* [Bug middle-end/106323] [Suboptimal] memcmp(s1, s2, n) == 0 expansion on AArch64 compare to llvm
From: zhongyunde at huawei dot com @ 2022-12-06 13:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106323
--- Comment #4 from vfdff <zhongyunde at huawei dot com> ---
LLVM now uses 4 loads and CMP+CCMP; see https://gcc.godbolt.org/z/PM3jxEM9M