public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization
@ 2024-06-28  1:30 paulhaile3 at gmail dot com
  2024-06-28  2:17 ` [Bug tree-optimization/115693] " xry111 at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: paulhaile3 at gmail dot com @ 2024-06-28  1:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693

            Bug ID: 115693
           Summary: 8 std::byte std::array comparison potential missed
                    optimization
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: paulhaile3 at gmail dot com
  Target Milestone: ---

It seems to be inconsistent with char array comparison and with std::bit_cast
Code Snippet: compiled with x86-64 gcc 14.1 with flags --std=c++20 -O3

#include <array>
#include <bit>
#include <cstdint>

bool compare1(const std::array<std::byte, 8> &p, std::array<std::byte, 8> r)
{
    return p == r;
}

bool compare2(const std::array<char, 8> &p, std::array<char, 8> r)
{
    return p == r;
}

// same assembly if you use char instead of byte
bool compare3(const std::array<std::byte, 8> &p, std::array<std::byte, 8> r)
{
    return std::bit_cast<uint64_t>(p) == std::bit_cast<uint64_t>(r);
}


generated assembly:

compare1(std::array<std::byte, 8ul> const&, std::array<std::byte, 8ul>):
        mov     rax, rsi
        cmp     BYTE PTR [rdi], sil
        jne     .L9
        cmp     ah, BYTE PTR [rdi+1]
        jne     .L9
        mov     rdx, rsi
        shr     rdx, 16
        cmp     BYTE PTR [rdi+2], dl
        jne     .L9
        mov     rdx, rsi
        shr     rdx, 24
        cmp     BYTE PTR [rdi+3], dl
        jne     .L9
        mov     rdx, rsi
        shr     rdx, 32
        cmp     BYTE PTR [rdi+4], dl
        jne     .L9
        mov     rdx, rsi
        shr     rdx, 40
        cmp     BYTE PTR [rdi+5], dl
        jne     .L9
        mov     rdx, rsi
        shr     rdx, 48
        cmp     BYTE PTR [rdi+6], dl
        jne     .L9
        shr     rax, 56
        cmp     BYTE PTR [rdi+7], al
        sete    al
        ret
.L9:
        xor     eax, eax
        ret
compare2(std::array<char, 8ul> const&, std::array<char, 8ul>):
        cmp     QWORD PTR [rdi], rsi
        sete    al
        ret
compare3(std::array<std::byte, 8ul> const&, std::array<std::byte, 8ul>):
        cmp     QWORD PTR [rdi], rsi
        sete    al
        ret

link to compiler explorer:
https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,selection:(endColumn:1,endLineNumber:4,positionColumn:1,positionLineNumber:4,selectionStartColumn:1,selectionStartLineNumber:4,startColumn:1,startLineNumber:4),source:'%23include+%3Carray%3E%0A%23include+%3Cbit%3E%0A%23include+%3Ccstdint%3E%0A%0Abool+compare1(const+std::array%3Cstd::byte,+8%3E+%26p,+std::array%3Cstd::byte,+8%3E+r)+%7Breturn+p+%3D%3D+r%3B%7D%0Abool+compare2(const+std::array%3Cchar,+8%3E+%26p,+std::array%3Cchar,+8%3E+r)+%7B+return+p+%3D%3D+r%3B+%7D%0Abool+compare3(const+std::array%3Cstd::byte,+8%3E+%26p,+std::array%3Cstd::byte,+8%3E+r)+%7Breturn+std::bit_cast%3Cuint64_t%3E(p)+%3D%3D+std::bit_cast%3Cuint64_t%3E(r)%3B%7D'),l:'5',n:'1',o:'C%2B%2B+source+%231',t:'0')),k:63.98956975228162,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:g141,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1',verboseDemangling:'0'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:2,lang:c%2B%2B,libs:!(),options:'--std%3Dc%2B%2B20+-O3',overrides:!(),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'+x86-64+gcc+14.1+(Editor+%231)',t:'0')),k:36.01043024771838,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/115693] 8 std::byte std::array comparison potential missed optimization
  2024-06-28  1:30 [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization paulhaile3 at gmail dot com
@ 2024-06-28  2:17 ` xry111 at gcc dot gnu.org
  2024-06-28  9:54 ` redi at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: xry111 at gcc dot gnu.org @ 2024-06-28  2:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693

Xi Ruoyao <xry111 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |14.1.0
                 CC|                            |xry111 at gcc dot gnu.org
          Component|libstdc++                   |tree-optimization

--- Comment #1 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
I'm transferring it to tree-optimization as the following cases are compiled to
stupid code:

char a[8], b[8];

int test()
{
        for (int i = 0; i < 8; i++)
                if (a[i] != b[i])
                        return 0;

        return 1;
}

int test1()
{
        int ret = 0;
        for (int i = 0; i < 8; i++)
                ret = ret || a[i] != b[i];

        return ret;
}

So it makes more sense to fix this in the optimization passes, instead of
ad-hoc hack in libstdc++.

But I'm not sure if there already exists a dup.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/115693] 8 std::byte std::array comparison potential missed optimization
  2024-06-28  1:30 [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization paulhaile3 at gmail dot com
  2024-06-28  2:17 ` [Bug tree-optimization/115693] " xry111 at gcc dot gnu.org
@ 2024-06-28  9:54 ` redi at gcc dot gnu.org
  2024-06-28 11:03 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2024-06-28  9:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693

--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Xi Ruoyao from comment #1)
> So it makes more sense to fix this in the optimization passes, instead of
> ad-hoc hack in libstdc++.

Yes, I agree, although we already have the hack in libstdc++ it just doesn't
work for std::byte:


--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1232,6 +1232,9 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
       const bool __simple = ((__is_integer<_ValueType1>::__value
 #if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
                              || __is_pointer(_ValueType1)
+#endif
+#if __glibcxx_byte
+                             || __is_same(byte, _ValueType1)
 #endif
                             ) && __memcmpable<_II1, _II2>::__value);
       return std::__equal<__simple>::equal(__first1, __last1, __first2);



> But I'm not sure if there already exists a dup.

The bug reported here is an exact dup of Bug 101485

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/115693] 8 std::byte std::array comparison potential missed optimization
  2024-06-28  1:30 [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization paulhaile3 at gmail dot com
  2024-06-28  2:17 ` [Bug tree-optimization/115693] " xry111 at gcc dot gnu.org
  2024-06-28  9:54 ` redi at gcc dot gnu.org
@ 2024-06-28 11:03 ` rguenth at gcc dot gnu.org
  2024-06-30 16:24 ` paulhaile3 at gmail dot com
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-28 11:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
             Blocks|                            |53947
   Last reconfirmed|                            |2024-06-28
             Target|                            |x86_64-*-*
     Ever confirmed|0                           |1
                 CC|                            |crazylht at gmail dot com

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Xi Ruoyao from comment #1)
> I'm transferring it to tree-optimization as the following cases are compiled
> to stupid code:
> 
> char a[8], b[8];
> 
> int test()
> {
> 	for (int i = 0; i < 8; i++)
> 		if (a[i] != b[i])
> 			return 0;
> 
> 	return 1;
> }
> 
> int test1()
> {
> 	int ret = 0;
> 	for (int i = 0; i < 8; i++)
> 		ret = ret || a[i] != b[i];
> 
> 	return ret;
> }
> 
> So it makes more sense to fix this in the optimization passes, instead of
> ad-hoc hack in libstdc++.
> 
> But I'm not sure if there already exists a dup.

Let's keep this bug for the above testcase(s).  For test() the issue is
that even with SSE4.1 we don't seem to support ptest for V8QImode?

For test1 cost modeling makes vectorization worthwhile, though with just SSE2
we get

test1:
.LFB1:
        .cfi_startproc
        movq    a(%rip), %xmm1
        pxor    %xmm2, %xmm2
        movq    b(%rip), %xmm0
        pcmpeqb %xmm1, %xmm0
        movq    .LC0(%rip), %xmm1
        pandn   %xmm1, %xmm0
        movdqa  %xmm0, %xmm1
        punpcklbw       %xmm2, %xmm0
        punpcklbw       %xmm2, %xmm1
        pshufd  $78, %xmm0, %xmm0
        pxor    %xmm2, %xmm2
        movdqa  %xmm0, %xmm3
        punpcklwd       %xmm2, %xmm0
        punpcklwd       %xmm2, %xmm3
        pshufd  $78, %xmm0, %xmm0
        por     %xmm3, %xmm0
        movdqa  %xmm1, %xmm3
        punpcklwd       %xmm2, %xmm1
        punpcklwd       %xmm2, %xmm3
        pshufd  $78, %xmm1, %xmm1
        por     %xmm3, %xmm1
        por     %xmm1, %xmm0
        movdqa  %xmm0, %xmm1
        psrlq   $32, %xmm1
        por     %xmm1, %xmm0
        movd    %xmm0, %eax
        ret

with SSE4.2 it's a bit better and "just"

test1:
.LFB1:
        .cfi_startproc
        movq    a(%rip), %xmm1
        movq    b(%rip), %xmm0
        pcmpeqb %xmm1, %xmm0
        movq    .LC0(%rip), %xmm1
        pandn   %xmm1, %xmm0
        pmovzxbw        %xmm0, %xmm2
        psrlq   $32, %xmm0
        pmovzxbw        %xmm0, %xmm0
        pmovzxwd        %xmm0, %xmm1
        psrlq   $32, %xmm0
        pmovzxwd        %xmm0, %xmm0
        por     %xmm1, %xmm0
        pmovzxwd        %xmm2, %xmm1
        psrlq   $32, %xmm2
        pmovzxwd        %xmm2, %xmm2
        por     %xmm2, %xmm1
        por     %xmm1, %xmm0
        movdqa  %xmm0, %xmm1
        psrlq   $32, %xmm1
        por     %xmm1, %xmm0
        movd    %xmm0, %eax
        ret

but we fail to realize that the bitwise-OR reduction could be narrowed to
char:

  <bb 3> [local count: 954449106]:
  # ret_12 = PHI <iftmp.0_5(7), 0(15)>
  # i_14 = PHI <i_7(7), 0(15)>
  # ivtmp_4 = PHI <ivtmp_3(7), 8(15)>
  _1 = a[i_14];
  _2 = b[i_14];
  _8 = _1 != _2; 
  _9 = (int) _8;
  iftmp.0_5 = _9 | ret_12;
  i_7 = i_14 + 1;
  ivtmp_3 = ivtmp_4 - 1;
  if (ivtmp_3 != 0)
    goto <bb 7>; [87.50%]

instead we keep 4 V2SImode "accumulators" and widen the compare results.

The best would be if scalar opts would make this a bool reduction though
IIRC we have a PR for that being not handled.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/115693] 8 std::byte std::array comparison potential missed optimization
  2024-06-28  1:30 [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization paulhaile3 at gmail dot com
                   ` (2 preceding siblings ...)
  2024-06-28 11:03 ` rguenth at gcc dot gnu.org
@ 2024-06-30 16:24 ` paulhaile3 at gmail dot com
  2024-06-30 16:44 ` pinskia at gcc dot gnu.org
  2024-07-01  1:14 ` liuhongt at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: paulhaile3 at gmail dot com @ 2024-06-30 16:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693

--- Comment #4 from Paulous Haile <paulhaile3 at gmail dot com> ---
Maybe not necessary, just wanted to add a case for the default operator==. I
have noticed it also doesn't seem to coalesce adjacent cmps, which I assume is
part of the same issue.

#include <cstdint>

struct Point
{
    uint32_t x,y;
    bool operator==(const Point &other) const=default;
};

Point x, y;

bool test()
{
    return x == y;
}

test():
        mov     edx, DWORD PTR y[rip]
        xor     eax, eax
        cmp     DWORD PTR x[rip], edx
        jne     .L1
        mov     eax, DWORD PTR y[rip+4]
        cmp     DWORD PTR x[rip+4], eax
        sete    al
.L1:
        ret
y:
        .zero   8
x:
        .zero   8

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/115693] 8 std::byte std::array comparison potential missed optimization
  2024-06-28  1:30 [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization paulhaile3 at gmail dot com
                   ` (3 preceding siblings ...)
  2024-06-30 16:24 ` paulhaile3 at gmail dot com
@ 2024-06-30 16:44 ` pinskia at gcc dot gnu.org
  2024-07-01  1:14 ` liuhongt at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-06-30 16:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Paulous Haile from comment #4)
> Maybe not necessary, just wanted to add a case for the default operator==. I
> have noticed it also doesn't seem to coalesce adjacent cmps, which I assume
> is part of the same issue.

That is PR 108953 already.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/115693] 8 std::byte std::array comparison potential missed optimization
  2024-06-28  1:30 [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization paulhaile3 at gmail dot com
                   ` (4 preceding siblings ...)
  2024-06-30 16:44 ` pinskia at gcc dot gnu.org
@ 2024-07-01  1:14 ` liuhongt at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-07-01  1:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693

Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |liuhongt at gcc dot gnu.org

--- Comment #6 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---

> > 
> > So it makes more sense to fix this in the optimization passes, instead of
> > ad-hoc hack in libstdc++.
> > 
> > But I'm not sure if there already exists a dup.
> 
> Let's keep this bug for the above testcase(s).  For test() the issue is
> that even with SSE4.1 we don't seem to support ptest for V8QImode?
With SSE4.1 and above, We can support cbranchv8qi(and other 32/64-bit vector)
with pmovzxv8qiv8hi + cbranchv8hi.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-07-01  1:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-28  1:30 [Bug libstdc++/115693] New: 8 std::byte std::array comparison potential missed optimization paulhaile3 at gmail dot com
2024-06-28  2:17 ` [Bug tree-optimization/115693] " xry111 at gcc dot gnu.org
2024-06-28  9:54 ` redi at gcc dot gnu.org
2024-06-28 11:03 ` rguenth at gcc dot gnu.org
2024-06-30 16:24 ` paulhaile3 at gmail dot com
2024-06-30 16:44 ` pinskia at gcc dot gnu.org
2024-07-01  1:14 ` liuhongt at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).