public inbox for gcc-bugs@sourceware.org
* [Bug c++/95669] New: -O3 generates more complicated code to return 8-byte struct of zeros, sometimes
@ 2020-06-14  8:52 jzwinck at gmail dot com
  2020-06-15  6:55 ` [Bug middle-end/95669] " rguenth at gcc dot gnu.org
  2021-05-30 22:49 ` pinskia at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: jzwinck at gmail dot com @ 2020-06-14  8:52 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95669

            Bug ID: 95669
           Summary: -O3 generates more complicated code to return 8-byte
                    struct of zeros, sometimes
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jzwinck at gmail dot com
  Target Milestone: ---

Consider this C++ code:

    struct res
    {
        int val;
        bool ok;
        bool dummy;
    };

    res fun(int a, int b)
    {
        if (a < b)
            return {0, false};
        return {a * b, true};
    }

GCC 10.1 (or 8.x) with -O2 or -O3 on x86_64 generates good code:

        cmp     edi, esi ; a < b
        jge     .L2
        xor     eax, eax ; {0, false}
        ret
    .L2:
        imul    edi, esi ; a * b
        mov     rax, rdi
        bts     rax, 32  ; ok = true
        ret

But if you remove "dummy" and compile with -O3, it looks worse:

        cmp     edi, esi ; a < b
        jl      .L3
        imul    edi, esi ; a * b
        mov     esi, 1   ; ok = true
        sal     rsi, 32
        mov     eax, edi
        or      rax, rsi
        ret
    .L3:
        xor     esi, esi
        xor     edi, edi
        mov     eax, edi
        sal     rsi, 32
        or      rax, rsi
        ret

Both branches are longer, and it is especially strange that .L3 uses five
instructions to produce zero instead of a single "xor eax, eax".
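
To make the expected result concrete, here is a minimal sketch of the 64-bit value that ends up in rax for the struct without "dummy". It assumes a little-endian target where the 8-byte struct is returned in a single register, as the listings above suggest. pack_res is a hypothetical helper introduced only for this illustration; it is not part of the test case or of GCC.

```cpp
#include <cstddef>
#include <cstdint>

// The struct from the report with 'dummy' removed.
struct res
{
    int val;
    bool ok;
};

// Assumptions of this sketch: 4-byte int, 'ok' at byte offset 4, 8 bytes total.
static_assert(sizeof(res) == 8, "sketch assumes an 8-byte struct");
static_assert(offsetof(res, ok) == 4, "sketch assumes 'ok' at byte offset 4");

// Hypothetical helper: models the register image of a 'res' on a
// little-endian target. 'val' fills the low 32 bits, 'ok' lands in bit 32,
// and the padding bytes are modelled as zero.
constexpr std::uint64_t pack_res(int val, bool ok)
{
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(val))
         | (static_cast<std::uint64_t>(ok) << 32);
}

// {0, false} is the all-zero register, which is why one "xor eax, eax" is enough.
static_assert(pack_res(0, false) == 0, "zero struct packs to zero");
// {6, true} is the product with bit 32 set, matching the "bts rax, 32" idea.
static_assert(pack_res(6, true) == 0x100000006ULL, "ok sets bit 32");
```

Under these assumptions the .L3 path only needs to materialize the constant 0, so the shift and OR there contribute nothing.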

Something similar happens in every GCC version I tried except 4.5 and 4.6.

Live demo: https://godbolt.org/z/4Lisy6


* [Bug middle-end/95669] -O3 generates more complicated code to return 8-byte struct of zeros, sometimes
  2020-06-14  8:52 [Bug c++/95669] New: -O3 generates more complicated code to return 8-byte struct of zeros, sometimes jzwinck at gmail dot com
@ 2020-06-15  6:55 ` rguenth at gcc dot gnu.org
  2021-05-30 22:49 ` pinskia at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-06-15  6:55 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95669

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c++                         |middle-end
             Target|                            |x86_64-*-*
   Last reconfirmed|                            |2020-06-15
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
With 'dummy' and its implicit zero initialization we retain

  <bb 2> [local count: 1073741824]:
  if (a_3(D) < b_4(D)) 
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870913]:
  D.2350 = {};
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 536870913]:
  _1 = a_3(D) * b_4(D);
  D.2350.val = _1;
  MEM <unsigned int> [(void *)&D.2350 + 4B] = 1;

  <bb 5> [local count: 1073741824]:
  return D.2350;

which generates straightforward code, while with 'dummy' elided we manage
to completely scalarize things and produce

  <bb 2> [local count: 1073741824]:
  if (a_3(D) < b_4(D)) 
    goto <bb 4>; [50.00%]
  else
    goto <bb 3>; [50.00%]

  <bb 3> [local count: 536870913]:
  _1 = a_3(D) * b_4(D);

  <bb 4> [local count: 1073741824]:
  # cstore_11 = PHI <_1(3), 0(2)>
  # cstore_10 = PHI <1(3), 0(2)>
  D.2349.ok = cstore_10;
  D.2349.val = cstore_11;
  return D.2349;

which is basically two conditional moves that we expand via strange bit
shuffling because D.2349 (struct res) is assigned a register.
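
For readers less used to GIMPLE, the two dumps can be paraphrased at source level roughly as follows. This is only a sketch: fun_aggregate, fun_scalarized and res_dummy are hypothetical names chosen for the illustration, the 32-bit store at offset 4 assumes a little-endian target, and the code mirrors the shape of the dumps rather than anything GCC itself emits.

```cpp
#include <cstdint>
#include <cstring>

// Layout with 'dummy' present (first dump).
struct res_dummy
{
    int val;
    bool ok;
    bool dummy;
};

// Layout with 'dummy' removed (second dump).
struct res
{
    int val;
    bool ok;
};

// Paraphrase of the first dump: the aggregate stays in memory and each
// path stores into it directly.
res_dummy fun_aggregate(int a, int b)
{
    res_dummy r;
    if (a < b) {
        std::memset(&r, 0, sizeof r);            // D.2350 = {};
    } else {
        r.val = a * b;                           // D.2350.val = _1;
        const std::uint32_t one = 1;             // one 32-bit store covers ok,
        std::memcpy(reinterpret_cast<unsigned char*>(&r) + 4,
                    &one, sizeof one);           // dummy and the padding
    }
    return r;
}

// Paraphrase of the second dump: both fields become scalars selected per
// path (the two PHI nodes) and the struct is only assembled at the end.
res fun_scalarized(int a, int b)
{
    const bool lt = a < b;
    const int cstore_val = lt ? 0 : a * b;       // cstore_11 = PHI <_1(3), 0(2)>
    const bool cstore_ok = lt ? false : true;    // cstore_10 = PHI <1(3), 0(2)>
    res r;
    r.ok = cstore_ok;                            // D.2349.ok = cstore_10;
    r.val = cstore_val;                          // D.2349.val = cstore_11;
    return r;
}
```

Both functions return the same field values for the same inputs. The difference is only in how the result is built: in the second form the two field stores target a struct that lives in a register, which is where the shift and OR sequences come from.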


* [Bug middle-end/95669] -O3 generates more complicated code to return 8-byte struct of zeros, sometimes
  2020-06-14  8:52 [Bug c++/95669] New: -O3 generates more complicated code to return 8-byte struct of zeros, sometimes jzwinck at gmail dot com
  2020-06-15  6:55 ` [Bug middle-end/95669] " rguenth at gcc dot gnu.org
@ 2021-05-30 22:49 ` pinskia at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-05-30 22:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95669

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

