public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/108953] New: inefficient codegen for trivial equality
@ 2023-02-27 16:29 barry.revzin at gmail dot com
2023-02-27 16:38 ` [Bug c++/108953] " pinskia at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: barry.revzin at gmail dot com @ 2023-02-27 16:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108953
Bug ID: 108953
Summary: inefficient codegen for trivial equality
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: barry.revzin at gmail dot com
Target Milestone: ---
Consider this example:
#include <cstdint>
#include <cstddef>
#include <string.h>
struct C
{
uint8_t a;
uint8_t b;
uint8_t c;
uint8_t d;
uint16_t e;
uint16_t f;
int32_t g;
bool operator==(C const&) const = default;
};
bool check(C const& lhs, C const& rhs) {
#ifdef MEMCMP
return memcmp(&lhs, &rhs, sizeof(lhs)) == 0;
#else
return lhs == rhs;
#endif
}
There are two implementations of check here, but lead to suboptimal code.
When using MEMCMP, gcc trunk -O3 emits:
check(C const&, C const&):
mov rax, QWORD PTR [rsi]
cmp QWORD PTR [rdi], rax
je .L5
.L2:
mov eax, 1
test eax, eax
sete al
ret
.L5:
mov eax, DWORD PTR [rsi+8]
cmp DWORD PTR [rdi+8], eax
jne .L2
xor eax, eax
test eax, eax
sete al
ret
There's a few extra instructions here (mov eax, 1; test eax, eax; sete al;...
do we need all three of those to return 0?)
When using defaulted comparisons, gcc trunk -O3 doesn't collapse any of the
comparisons, and instead emits 7 distinct checks:
check(C const&, C const&):
movzx ecx, BYTE PTR [rsi]
xor eax, eax
cmp BYTE PTR [rdi], cl
jne .L1
movzx edx, BYTE PTR [rsi+1]
cmp BYTE PTR [rdi+1], dl
jne .L1
movzx edx, BYTE PTR [rsi+2]
cmp BYTE PTR [rdi+2], dl
jne .L1
movzx edx, BYTE PTR [rsi+3]
cmp BYTE PTR [rdi+3], dl
jne .L1
movzx edx, WORD PTR [rsi+4]
cmp WORD PTR [rdi+4], dx
jne .L1
movzx eax, WORD PTR [rsi+6]
cmp WORD PTR [rdi+6], ax
mov edx, DWORD PTR [rsi+8]
sete al
cmp DWORD PTR [rdi+8], edx
sete dl
and eax, edx
.L1:
ret
Compare this to clang, which for both the memcmp and the default equality
versions emits this:
check(C const&, C const&): # @check(C const&, C const&)
mov rax, qword ptr [rdi]
xor rax, qword ptr [rsi]
mov ecx, dword ptr [rdi + 8]
xor ecx, dword ptr [rsi + 8]
or rcx, rax
sete al
ret
Looks like there are two missing optimizations here for gcc: (1) the memcmp
does get optimized into an 8-byte and 4-byte comparison, but then the result of
that optimization doesn't get optimized further and (2) multiple trivial
comparisons don't get coalesced together.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug c++/108953] inefficient codegen for trivial equality
2023-02-27 16:29 [Bug c++/108953] New: inefficient codegen for trivial equality barry.revzin at gmail dot com
@ 2023-02-27 16:38 ` pinskia at gcc dot gnu.org
2023-02-27 17:11 ` [Bug tree-optimization/108953] " jakub at gcc dot gnu.org
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-02-27 16:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108953
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Keywords| |missed-optimization
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug tree-optimization/108953] inefficient codegen for trivial equality
2023-02-27 16:29 [Bug c++/108953] New: inefficient codegen for trivial equality barry.revzin at gmail dot com
2023-02-27 16:38 ` [Bug c++/108953] " pinskia at gcc dot gnu.org
@ 2023-02-27 17:11 ` jakub at gcc dot gnu.org
2023-02-28 10:44 ` rguenth at gcc dot gnu.org
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-02-27 17:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108953
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|c++ |tree-optimization
CC| |jakub at gcc dot gnu.org
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I don't think there is anything the C++ FE should do here, user can write it
like that
by hand too:
bool check(C const& lhs, C const& rhs) {
if (lhs.a != rhs.a) return false;
if (lhs.b != rhs.b) return false;
if (lhs.c != rhs.c) return false;
if (lhs.d != rhs.d) return false;
if (lhs.e != rhs.e) return false;
if (lhs.f != rhs.f) return false;
if (lhs.g != rhs.g) return false;
return true;
}
etc., so we should pattern match this somewhere during GIMPLE opts.
The memcmp issue is a RTL issue (so I think we want actually two bugs), at
GIMPLE level it is optimized into:
_1 = __builtin_memcmp_eq (lhs_3(D), rhs_4(D), 12);
_5 = _1 == 0;
return _5;
which is I think what we want.
The reason for doing conditional branch after each comparison is
targetm.compare_by_pieces_branch_ratio (DImode) tells it not to batch more than
one.
Not really sure if we should override it to something else and on which
targets.
The reason for the weird
mov eax, 1
test eax, eax
sete al
instead of
xor eax, eax
and
xor eax, eax
test eax, eax
sete al
instead of
mov eax, 1
is that the epilogue is threaded only too late where the constant propagation
through it isn't possible.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug tree-optimization/108953] inefficient codegen for trivial equality
2023-02-27 16:29 [Bug c++/108953] New: inefficient codegen for trivial equality barry.revzin at gmail dot com
2023-02-27 16:38 ` [Bug c++/108953] " pinskia at gcc dot gnu.org
2023-02-27 17:11 ` [Bug tree-optimization/108953] " jakub at gcc dot gnu.org
@ 2023-02-28 10:44 ` rguenth at gcc dot gnu.org
2023-03-02 15:05 ` jakub at gcc dot gnu.org
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-02-28 10:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108953
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2023-02-28
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think it only makes sense to combine the compares when the operand(s) are
from memory which means the machinery we have in load/store merging becomes
useful
which means we eventually want to do this in this very pass (or in the bswap
pass).
The alternative would be ifcombine but then the order of the compares in the
source could be arbitrary, so it doesn't really fit that one for the most
optimal result.
There's also a related vectorization bug in case the SLP vectorizer is the
target to look at.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug tree-optimization/108953] inefficient codegen for trivial equality
2023-02-27 16:29 [Bug c++/108953] New: inefficient codegen for trivial equality barry.revzin at gmail dot com
` (2 preceding siblings ...)
2023-02-28 10:44 ` rguenth at gcc dot gnu.org
@ 2023-03-02 15:05 ` jakub at gcc dot gnu.org
2023-06-23 22:59 ` pinskia at gcc dot gnu.org
2023-07-13 18:15 ` [Bug c++/108953] " pinskia at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-03-02 15:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108953
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
BTW, I've tried:
struct A
{
unsigned char a, b, c, d;
unsigned short e, f;
int g;
bool operator== (A const &) const = default;
};
struct B
{
unsigned char b;
bool operator== (B const &) const = default;
};
struct C
{
unsigned char a, b, c;
B d;
unsigned short e, f;
int g;
bool operator== (C const &) const = default;
};
struct D
{
unsigned char b;
bool operator== (D const &x) const { return b == x.b; }
};
struct E
{
unsigned char a, b, c;
D d;
unsigned short e, f;
int g;
bool operator== (E const &) const = default;
};
struct F
{
unsigned char a, b, c, d;
unsigned short e, f;
int g;
bool operator== (F const &x) const {
if (a != x.a) return false;
if (b != x.b) return false;
if (c != x.c) return false;
if (d != x.d) return false;
if (e != x.e) return false;
if (f != x.f) return false;
return g == x.a;
}
};
struct G
{
unsigned char a, b, c, d;
unsigned short e, f;
float g;
bool operator== (G const &) const = default;
};
bool
foo (A const &lhs, A const &rhs)
{
return lhs == rhs;
}
bool
bar (C const &lhs, C const &rhs)
{
return lhs == rhs;
}
bool
baz (E const &lhs, E const &rhs)
{
return lhs == rhs;
}
bool
qux (F const &lhs, F const &rhs)
{
return lhs == rhs;
}
bool
corge (G const &lhs, G const &rhs)
{
return lhs == rhs;
}
to see what clang trunk does and they optimize this way only the defaulted
bool A::operator== (A const &) const, nothing else, so it clearly isn't a late
pattern matching. Probably just they notice that the structure only contains
integral fields
and in that case emit memcmp for the portions without padding bits if certain
conditions are met (seems two consecutive char fields are vectorized, but 3
result in memcmp).
But certainly even the nested struct with also defaulted operator== isn't
optimized that way at all, even the unaffected consecutive parts there.
Just in case we'd like to do this as a late pattern matching optimization and
in the C++ FE.
Note, I wonder if reassoc wouldn't be the right spot instead of store-merging.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug tree-optimization/108953] inefficient codegen for trivial equality
2023-02-27 16:29 [Bug c++/108953] New: inefficient codegen for trivial equality barry.revzin at gmail dot com
` (3 preceding siblings ...)
2023-03-02 15:05 ` jakub at gcc dot gnu.org
@ 2023-06-23 22:59 ` pinskia at gcc dot gnu.org
2023-07-13 18:15 ` [Bug c++/108953] " pinskia at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-23 22:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108953
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |madhur4127 at gmail dot com
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 110383 has been marked as a duplicate of this bug. ***
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug c++/108953] inefficient codegen for trivial equality
2023-02-27 16:29 [Bug c++/108953] New: inefficient codegen for trivial equality barry.revzin at gmail dot com
` (4 preceding siblings ...)
2023-06-23 22:59 ` pinskia at gcc dot gnu.org
@ 2023-07-13 18:15 ` pinskia at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-13 18:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108953
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jengelh at inai dot de
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 103733 has been marked as a duplicate of this bug. ***
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-07-13 18:15 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-27 16:29 [Bug c++/108953] New: inefficient codegen for trivial equality barry.revzin at gmail dot com
2023-02-27 16:38 ` [Bug c++/108953] " pinskia at gcc dot gnu.org
2023-02-27 17:11 ` [Bug tree-optimization/108953] " jakub at gcc dot gnu.org
2023-02-28 10:44 ` rguenth at gcc dot gnu.org
2023-03-02 15:05 ` jakub at gcc dot gnu.org
2023-06-23 22:59 ` pinskia at gcc dot gnu.org
2023-07-13 18:15 ` [Bug c++/108953] " pinskia at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).