public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization
@ 2024-06-28 1:30 paulhaile3 at gmail dot com
2024-06-28 2:17 ` [Bug tree-optimization/115693] " xry111 at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: paulhaile3 at gmail dot com @ 2024-06-28 1:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693
Bug ID: 115693
Summary: 8 std::byte std::array comparison potential missed
optimization
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: paulhaile3 at gmail dot com
Target Milestone: ---
It seems to be inconsistent with char array comparison and with std::bit_cast
Code Snippet: compiled with x86-64 gcc 14.1 with flags --std=c++20 -O3
#include <array>
#include <bit>
#include <cstdint>
bool compare1(const std::array<std::byte, 8> &p, std::array<std::byte, 8> r)
{
return p == r;
}
bool compare2(const std::array<char, 8> &p, std::array<char, 8> r)
{
return p == r;
}
// same assembly if you use char instead of byte
bool compare3(const std::array<std::byte, 8> &p, std::array<std::byte, 8> r)
{
return std::bit_cast<uint64_t>(p) == std::bit_cast<uint64_t>(r);
}
generated assembly:
compare1(std::array<std::byte, 8ul> const&, std::array<std::byte, 8ul>):
mov rax, rsi
cmp BYTE PTR [rdi], sil
jne .L9
cmp ah, BYTE PTR [rdi+1]
jne .L9
mov rdx, rsi
shr rdx, 16
cmp BYTE PTR [rdi+2], dl
jne .L9
mov rdx, rsi
shr rdx, 24
cmp BYTE PTR [rdi+3], dl
jne .L9
mov rdx, rsi
shr rdx, 32
cmp BYTE PTR [rdi+4], dl
jne .L9
mov rdx, rsi
shr rdx, 40
cmp BYTE PTR [rdi+5], dl
jne .L9
mov rdx, rsi
shr rdx, 48
cmp BYTE PTR [rdi+6], dl
jne .L9
shr rax, 56
cmp BYTE PTR [rdi+7], al
sete al
ret
.L9:
xor eax, eax
ret
compare2(std::array<char, 8ul> const&, std::array<char, 8ul>):
cmp QWORD PTR [rdi], rsi
sete al
ret
compare3(std::array<std::byte, 8ul> const&, std::array<std::byte, 8ul>):
cmp QWORD PTR [rdi], rsi
sete al
ret
link to compiler explorer:
https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,selection:(endColumn:1,endLineNumber:4,positionColumn:1,positionLineNumber:4,selectionStartColumn:1,selectionStartLineNumber:4,startColumn:1,startLineNumber:4),source:'%23include+%3Carray%3E%0A%23include+%3Cbit%3E%0A%23include+%3Ccstdint%3E%0A%0Abool+compare1(const+std::array%3Cstd::byte,+8%3E+%26p,+std::array%3Cstd::byte,+8%3E+r)+%7Breturn+p+%3D%3D+r%3B%7D%0Abool+compare2(const+std::array%3Cchar,+8%3E+%26p,+std::array%3Cchar,+8%3E+r)+%7B+return+p+%3D%3D+r%3B+%7D%0Abool+compare3(const+std::array%3Cstd::byte,+8%3E+%26p,+std::array%3Cstd::byte,+8%3E+r)+%7Breturn+std::bit_cast%3Cuint64_t%3E(p)+%3D%3D+std::bit_cast%3Cuint64_t%3E(r)%3B%7D'),l:'5',n:'1',o:'C%2B%2B+source+%231',t:'0')),k:63.98956975228162,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:g141,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1',verboseDemangling:'0'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:2,lang:c%2B%2B,libs:!(),options:'--std%3Dc%2B%2B20+-O3',overrides:!(),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'+x86-64+gcc+14.1+(Editor+%231)',t:'0')),k:36.01043024771838,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug tree-optimization/115693] 8 std::byte std::array comparison potential missed optimization
2024-06-28 1:30 [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization paulhaile3 at gmail dot com
@ 2024-06-28 2:17 ` xry111 at gcc dot gnu.org
2024-06-28 9:54 ` redi at gcc dot gnu.org
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: xry111 at gcc dot gnu.org @ 2024-06-28 2:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693
Xi Ruoyao <xry111 at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Version|unknown |14.1.0
CC| |xry111 at gcc dot gnu.org
Component|libstdc++ |tree-optimization
--- Comment #1 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
I'm transferring it to tree-optimization as the following cases are compiled to
stupid code:
char a[8], b[8];
int test()
{
for (int i = 0; i < 8; i++)
if (a[i] != b[i])
return 0;
return 1;
}
int test1()
{
int ret = 0;
for (int i = 0; i < 8; i++)
ret = ret || a[i] != b[i];
return ret;
}
So it makes more sense to fix this in the optimization passes, instead of
ad-hoc hack in libstdc++.
But I'm not sure if there already exists a dup.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug tree-optimization/115693] 8 std::byte std::array comparison potential missed optimization
2024-06-28 1:30 [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization paulhaile3 at gmail dot com
2024-06-28 2:17 ` [Bug tree-optimization/115693] " xry111 at gcc dot gnu.org
@ 2024-06-28 9:54 ` redi at gcc dot gnu.org
2024-06-28 11:03 ` rguenth at gcc dot gnu.org
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2024-06-28 9:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693
--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Xi Ruoyao from comment #1)
> So it makes more sense to fix this in the optimization passes, instead of
> ad-hoc hack in libstdc++.
Yes, I agree, although we already have the hack in libstdc++ it just doesn't
work for std::byte:
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1232,6 +1232,9 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
const bool __simple = ((__is_integer<_ValueType1>::__value
#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
|| __is_pointer(_ValueType1)
+#endif
+#if __glibcxx_byte
+ || __is_same(byte, _ValueType1)
#endif
) && __memcmpable<_II1, _II2>::__value);
return std::__equal<__simple>::equal(__first1, __last1, __first2);
> But I'm not sure if there already exists a dup.
The bug reported here is an exact dup of Bug 101485
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug tree-optimization/115693] 8 std::byte std::array comparison potential missed optimization
2024-06-28 1:30 [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization paulhaile3 at gmail dot com
2024-06-28 2:17 ` [Bug tree-optimization/115693] " xry111 at gcc dot gnu.org
2024-06-28 9:54 ` redi at gcc dot gnu.org
@ 2024-06-28 11:03 ` rguenth at gcc dot gnu.org
2024-06-30 16:24 ` paulhaile3 at gmail dot com
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-28 11:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Blocks| |53947
Last reconfirmed| |2024-06-28
Target| |x86_64-*-*
Ever confirmed|0 |1
CC| |crazylht at gmail dot com
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Xi Ruoyao from comment #1)
> I'm transferring it to tree-optimization as the following cases are compiled
> to stupid code:
>
> char a[8], b[8];
>
> int test()
> {
> for (int i = 0; i < 8; i++)
> if (a[i] != b[i])
> return 0;
>
> return 1;
> }
>
> int test1()
> {
> int ret = 0;
> for (int i = 0; i < 8; i++)
> ret = ret || a[i] != b[i];
>
> return ret;
> }
>
> So it makes more sense to fix this in the optimization passes, instead of
> ad-hoc hack in libstdc++.
>
> But I'm not sure if there already exists a dup.
Let's keep this bug for the above testcase(s). For test() the issue is
that even with SSE4.1 we don't seem to support ptest for V8QImode?
For test1 cost modeling makes vectorization worthwhile, though with just SSE2
we get
test1:
.LFB1:
.cfi_startproc
movq a(%rip), %xmm1
pxor %xmm2, %xmm2
movq b(%rip), %xmm0
pcmpeqb %xmm1, %xmm0
movq .LC0(%rip), %xmm1
pandn %xmm1, %xmm0
movdqa %xmm0, %xmm1
punpcklbw %xmm2, %xmm0
punpcklbw %xmm2, %xmm1
pshufd $78, %xmm0, %xmm0
pxor %xmm2, %xmm2
movdqa %xmm0, %xmm3
punpcklwd %xmm2, %xmm0
punpcklwd %xmm2, %xmm3
pshufd $78, %xmm0, %xmm0
por %xmm3, %xmm0
movdqa %xmm1, %xmm3
punpcklwd %xmm2, %xmm1
punpcklwd %xmm2, %xmm3
pshufd $78, %xmm1, %xmm1
por %xmm3, %xmm1
por %xmm1, %xmm0
movdqa %xmm0, %xmm1
psrlq $32, %xmm1
por %xmm1, %xmm0
movd %xmm0, %eax
ret
with SSE4.2 it's a bit better and "just"
test1:
.LFB1:
.cfi_startproc
movq a(%rip), %xmm1
movq b(%rip), %xmm0
pcmpeqb %xmm1, %xmm0
movq .LC0(%rip), %xmm1
pandn %xmm1, %xmm0
pmovzxbw %xmm0, %xmm2
psrlq $32, %xmm0
pmovzxbw %xmm0, %xmm0
pmovzxwd %xmm0, %xmm1
psrlq $32, %xmm0
pmovzxwd %xmm0, %xmm0
por %xmm1, %xmm0
pmovzxwd %xmm2, %xmm1
psrlq $32, %xmm2
pmovzxwd %xmm2, %xmm2
por %xmm2, %xmm1
por %xmm1, %xmm0
movdqa %xmm0, %xmm1
psrlq $32, %xmm1
por %xmm1, %xmm0
movd %xmm0, %eax
ret
but we fail to realize that the bitwise-OR reduction could be narrowed to
char:
<bb 3> [local count: 954449106]:
# ret_12 = PHI <iftmp.0_5(7), 0(15)>
# i_14 = PHI <i_7(7), 0(15)>
# ivtmp_4 = PHI <ivtmp_3(7), 8(15)>
_1 = a[i_14];
_2 = b[i_14];
_8 = _1 != _2;
_9 = (int) _8;
iftmp.0_5 = _9 | ret_12;
i_7 = i_14 + 1;
ivtmp_3 = ivtmp_4 - 1;
if (ivtmp_3 != 0)
goto <bb 7>; [87.50%]
instead we keep 4 V2SImode "accumulators" and widen the compare results.
The best would be if scalar opts would make this a bool reduction though
IIRC we have a PR for that being not handled.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug tree-optimization/115693] 8 std::byte std::array comparison potential missed optimization
2024-06-28 1:30 [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization paulhaile3 at gmail dot com
` (2 preceding siblings ...)
2024-06-28 11:03 ` rguenth at gcc dot gnu.org
@ 2024-06-30 16:24 ` paulhaile3 at gmail dot com
2024-06-30 16:44 ` pinskia at gcc dot gnu.org
2024-07-01 1:14 ` liuhongt at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: paulhaile3 at gmail dot com @ 2024-06-30 16:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693
--- Comment #4 from Paulous Haile <paulhaile3 at gmail dot com> ---
Maybe not necessary, just wanted to add a case for the default operator==. I
have noticed it also doesn't seem to coalesce adjacent cmps, which I assume is
part of the same issue.
#include <cstdint>
struct Point
{
uint32_t x,y;
bool operator==(const Point &other) const=default;
};
Point x, y;
bool test()
{
return x == y;
}
test():
mov edx, DWORD PTR y[rip]
xor eax, eax
cmp DWORD PTR x[rip], edx
jne .L1
mov eax, DWORD PTR y[rip+4]
cmp DWORD PTR x[rip+4], eax
sete al
.L1:
ret
y:
.zero 8
x:
.zero 8
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug tree-optimization/115693] 8 std::byte std::array comparison potential missed optimization
2024-06-28 1:30 [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization paulhaile3 at gmail dot com
` (3 preceding siblings ...)
2024-06-30 16:24 ` paulhaile3 at gmail dot com
@ 2024-06-30 16:44 ` pinskia at gcc dot gnu.org
2024-07-01 1:14 ` liuhongt at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-06-30 16:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Paulous Haile from comment #4)
> Maybe not necessary, just wanted to add a case for the default operator==. I
> have noticed it also doesn't seem to coalesce adjacent cmps, which I assume
> is part of the same issue.
That is PR 108953 already.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug tree-optimization/115693] 8 std::byte std::array comparison potential missed optimization
2024-06-28 1:30 [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization paulhaile3 at gmail dot com
` (4 preceding siblings ...)
2024-06-30 16:44 ` pinskia at gcc dot gnu.org
@ 2024-07-01 1:14 ` liuhongt at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-07-01 1:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693
Hongtao Liu <liuhongt at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |liuhongt at gcc dot gnu.org
--- Comment #6 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> >
> > So it makes more sense to fix this in the optimization passes, instead of
> > ad-hoc hack in libstdc++.
> >
> > But I'm not sure if there already exists a dup.
>
> Let's keep this bug for the above testcase(s). For test() the issue is
> that even with SSE4.1 we don't seem to support ptest for V8QImode?
With SSE4.1 and above, We can support cbranchv8qi(and other 32/64-bit vector)
with pmovzxv8qiv8hi + cbranchv8hi.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-07-01 1:14 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-28 1:30 [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization paulhaile3 at gmail dot com
2024-06-28 2:17 ` [Bug tree-optimization/115693] " xry111 at gcc dot gnu.org
2024-06-28 9:54 ` redi at gcc dot gnu.org
2024-06-28 11:03 ` rguenth at gcc dot gnu.org
2024-06-30 16:24 ` paulhaile3 at gmail dot com
2024-06-30 16:44 ` pinskia at gcc dot gnu.org
2024-07-01 1:14 ` liuhongt at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).