public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization
@ 2020-12-16 22:02 mcolavita at fb dot com
  2020-12-17 16:19 ` [Bug rtl-optimization/98335] " mcolavita at fb dot com
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: mcolavita at fb dot com @ 2020-12-16 22:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

            Bug ID: 98335
           Summary: [9/10/11 Regression] Poor code generation for partial
                    struct initialization
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mcolavita at fb dot com
  Target Milestone: ---

Consider the following code:

  struct Data {
    char a;
    int b;
  };

  char c;

  Data val(int idx) {
    return { c };
  }

On x86-64 (with sizeof(char) = 1 and sizeof(int) = 4), val() can be implemented
with a single mov to %rax. With -O3, g++ produces the following inefficient
output:

    xorl        %eax, %eax
    movb        $0, -18(%rsp)
    movabsq     $72057594037927935, %rdx
    movw        %ax, -20(%rsp)
    movl        $0, -24(%rsp)
    andq        -24(%rsp), %rdx
    movq        %rdx, %rax
    salq        $8, %rax
    movb        c(%rip), %al
    ret

Similar outputs are seen for any level of optimization above O0 on GCC 9, 10,
and 11. The bug is not present in GCC 8.

Reasonable code is generated if the second field of the struct is explicitly
initialized to a constant, either in the struct definition or the initializer.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization
  2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
@ 2020-12-17 16:19 ` mcolavita at fb dot com
  2021-01-04 15:40 ` [Bug tree-optimization/98335] " rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: mcolavita at fb dot com @ 2020-12-17 16:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

--- Comment #1 from Michael Colavita <mcolavita at fb dot com> ---
A similar problem appears to occur for the following example:

  struct Data {
    long a;
    union {
      long u;
      struct {
        char b;
        char pad[3];
      };
    };
  };

  Data val(long d, long e) {
    return { d, e };
  }

Yielding:

    movq            %rdi, %xmm0
    movq            %rsi, %xmm1
    punpcklqdq      %xmm1, %xmm0
    movaps          %xmm0, -56(%rsp)
    movq            -56(%rsp), %rax
    movq            -48(%rsp), %rdx
    ret

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization
  2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
  2020-12-17 16:19 ` [Bug rtl-optimization/98335] " mcolavita at fb dot com
@ 2021-01-04 15:40 ` rguenth at gcc dot gnu.org
  2021-01-06 11:01 ` ebotcazou at gcc dot gnu.org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-04 15:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebotcazou at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org
           Keywords|                            |missed-optimization
           Priority|P3                          |P2
          Component|rtl-optimization            |tree-optimization
   Last reconfirmed|                            |2021-01-04
     Ever confirmed|0                           |1
   Target Milestone|---                         |9.4
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We expand the first case from

  MEM <char[7]> [(struct Data *)&D.2365 + 1B] = {};
  c.0_1 = c;
  D.2365.a = c.0_1;
  return D.2365;

I guess store-merging could "merge" the stores as

  D.2365 = {};
  D.2365.a = c.0_1;

thus figure the partial unaligned zeroing is better done aligned
(and redundant).  Alternatively it could emit

  V_C_E<unsigned> = (unsigned) c.0_1;

The second testcase looks vectorization/ABI related for which we have plenty
of dups.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization
  2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
  2020-12-17 16:19 ` [Bug rtl-optimization/98335] " mcolavita at fb dot com
  2021-01-04 15:40 ` [Bug tree-optimization/98335] " rguenth at gcc dot gnu.org
@ 2021-01-06 11:01 ` ebotcazou at gcc dot gnu.org
  2021-01-20  9:51 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: ebotcazou at gcc dot gnu.org @ 2021-01-06 11:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

--- Comment #3 from Eric Botcazou <ebotcazou at gcc dot gnu.org> ---
> We expand the first case from
> 
>   MEM <char[7]> [(struct Data *)&D.2365 + 1B] = {};
>   c.0_1 = c;
>   D.2365.a = c.0_1;
>   return D.2365;

But why generate a 7-byte zeroing instead of a 8-byte one?  I gather this is
the cause of the regression.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization
  2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
                   ` (2 preceding siblings ...)
  2021-01-06 11:01 ` ebotcazou at gcc dot gnu.org
@ 2021-01-20  9:51 ` jakub at gcc dot gnu.org
  2021-01-21 11:27 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-01-20  9:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I agree it would be weird to try to undo the
-  D.2365 = {};
+  MEM <char[7]> [(struct Data *)&D.2365 + 1B] = {};
transformation by DSE in store_merging instead of adjusting the DSE
optimization to take into account costs and likely ways how the clearing will
be expanded.
On the other side, the user could have written it that way.

Regressed with r9-1663-g99e87c0eef2f6020a3ded2c785389939c07ac04e aka PR86010
fix.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization
  2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
                   ` (3 preceding siblings ...)
  2021-01-20  9:51 ` jakub at gcc dot gnu.org
@ 2021-01-21 11:27 ` jakub at gcc dot gnu.org
  2021-06-01  8:19 ` [Bug tree-optimization/98335] [9/10/11/12 " rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-01-21 11:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So I think we want to improve that
+  /* If more than a word remains, then make sure to keep the
+     starting point at least word aligned.  */
+  if (last_live - first_live > UNITS_PER_WORD)
+    *trim_head &= (UNITS_PER_WORD - 1);

Note, last_live is the start of the last live byte (so last_live + 1 is the end
of that).
For the small sizes, I'd say we should consider both alignment and exact
head/tail trim values.
Whole word store is definitely more efficient than 7 bytes store at offset 1,
ditto head trim 2 and 3, storing just second half is ok.
So shall we e.g. call by_pieces_ninsns for the before/after the expected
triming and determine only trim if it doesn't increase number of by pieces
store insns?
It could also iterate on those.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/98335] [9/10/11/12 Regression] Poor code generation for partial struct initialization
  2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
                   ` (4 preceding siblings ...)
  2021-01-21 11:27 ` jakub at gcc dot gnu.org
@ 2021-06-01  8:19 ` rguenth at gcc dot gnu.org
  2022-03-07 13:01 ` roger at nextmovesoftware dot com
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-01  8:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.4                         |9.5

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9.4 is being released, retargeting bugs to GCC 9.5.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/98335] [9/10/11/12 Regression] Poor code generation for partial struct initialization
  2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
                   ` (5 preceding siblings ...)
  2021-06-01  8:19 ` [Bug tree-optimization/98335] [9/10/11/12 " rguenth at gcc dot gnu.org
@ 2022-03-07 13:01 ` roger at nextmovesoftware dot com
  2022-03-11 17:53 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-03-07 13:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com
           Assignee|unassigned at gcc dot gnu.org      |roger at nextmovesoftware dot com
             Status|NEW                         |ASSIGNED

--- Comment #7 from Roger Sayle <roger at nextmovesoftware dot com> ---
Patches proposed (one middle-end and one backend).
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591273.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591285.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/98335] [9/10/11/12 Regression] Poor code generation for partial struct initialization
  2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
                   ` (6 preceding siblings ...)
  2022-03-07 13:01 ` roger at nextmovesoftware dot com
@ 2022-03-11 17:53 ` cvs-commit at gcc dot gnu.org
  2022-03-11 17:59 ` cvs-commit at gcc dot gnu.org
  2022-03-14 18:44 ` roger at nextmovesoftware dot com
  9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-03-11 17:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:c5288df751f9ecd11898dec5f2a7b6b03267f79e

commit r12-7615-gc5288df751f9ecd11898dec5f2a7b6b03267f79e
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Fri Mar 11 17:46:50 2022 +0000

    PR tree-optimization/98335: Improvements to DSE's compute_trims.

    This patch is the main middle-end piece of a fix for PR tree-opt/98335,
    which is a code-quality regression affecting mainline.  The issue occurs
    in DSE's (dead store elimination's) compute_trims function that determines
    where a store to memory can be trimmed.  In the testcase given in the
    PR, this function notices that the first byte of a DImode store is dead,
    and replaces the 8-byte store at (aligned) offset zero, with a 7-byte store
    at (unaligned) offset one.  Most architectures can store a power-of-two
    bytes (up to a maximum) in single instruction, so writing 7 bytes requires
    more instructions than writing 8 bytes.  This patch follows Jakub Jelinek's
    suggestion in comment 5, that compute_trims needs improved heuristics.

    On x86_64-pc-linux-gnu with -O2 the new test case in the PR goes from:

            movl    $0, -24(%rsp)
            movabsq $72057594037927935, %rdx
            movl    $0, -21(%rsp)
            andq    -24(%rsp), %rdx
            movq    %rdx, %rax
            salq    $8, %rax
            movb    c(%rip), %al
            ret

    to

            xorl    %eax, %eax
            movb    c(%rip), %al
            ret

    2022-03-11  Roger Sayle  <roger@nextmovesoftware.com>
                Richard Biener  <rguenther@suse.de>

    gcc/ChangeLog
            PR tree-optimization/98335
            * builtins.cc (get_object_alignment_2): Export.
            * builtins.h (get_object_alignment_2): Likewise.
            * tree-ssa-alias.cc (ao_ref_alignment): New.
            * tree-ssa-alias.h (ao_ref_alignment): Declare.

            * tree-ssa-dse.cc (compute_trims): Improve logic deciding whether
            to align head/tail, writing more bytes but using fewer store insns.

    gcc/testsuite/ChangeLog
            PR tree-optimization/98335
            * g++.dg/pr98335.C: New test case.
            * gcc.dg/pr86010.c: New test case.
            * gcc.dg/pr86010-2.c: New test case.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/98335] [9/10/11/12 Regression] Poor code generation for partial struct initialization
  2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
                   ` (7 preceding siblings ...)
  2022-03-11 17:53 ` cvs-commit at gcc dot gnu.org
@ 2022-03-11 17:59 ` cvs-commit at gcc dot gnu.org
  2022-03-14 18:44 ` roger at nextmovesoftware dot com
  9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-03-11 17:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:251ea6dfbdb4448875e41081682bb3aa451b5729

commit r12-7616-g251ea6dfbdb4448875e41081682bb3aa451b5729
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Fri Mar 11 17:57:12 2022 +0000

    PR tree-optimization/98335: New peephole2 xorl;movb -> movzbl

    This patch is the backend piece of my proposed fix to PR tree-opt/98335,
    to allow C++ partial struct initialization to be as efficient/optimized
    as full struct initialization.

    With the middle-end patch just posted to gcc-patches, the test case
    in the PR compiles on x86_64-pc-linux-gnu with -O2 to:

            xorl    %eax, %eax
            movb    c(%rip), %al
            ret

    with this additional peephole2 (actually four peephole2s):

            movzbl  c(%rip), %eax
            ret

    2022-03-11  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            PR tree-optimization/98335
            * config/i386/i386.md (peephole2): Eliminate redundant insv.
            Combine movl followed by movb.  Transform xorl followed by
            a suitable movb or movw into the equivalent movz[bw]l.

    gcc/testsuite/ChangeLog
            PR tree-optimization/98335
            * g++.target/i386/pr98335.C: New test case.
            * gcc.target/i386/pr98335.c: New test case.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/98335] [9/10/11/12 Regression] Poor code generation for partial struct initialization
  2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
                   ` (8 preceding siblings ...)
  2022-03-11 17:59 ` cvs-commit at gcc dot gnu.org
@ 2022-03-14 18:44 ` roger at nextmovesoftware dot com
  9 siblings, 0 replies; 11+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-03-14 18:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.5                         |12.0
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #10 from Roger Sayle <roger at nextmovesoftware dot com> ---
This should now be fixed on mainline.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-03-14 18:44 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
2020-12-17 16:19 ` [Bug rtl-optimization/98335] " mcolavita at fb dot com
2021-01-04 15:40 ` [Bug tree-optimization/98335] " rguenth at gcc dot gnu.org
2021-01-06 11:01 ` ebotcazou at gcc dot gnu.org
2021-01-20  9:51 ` jakub at gcc dot gnu.org
2021-01-21 11:27 ` jakub at gcc dot gnu.org
2021-06-01  8:19 ` [Bug tree-optimization/98335] [9/10/11/12 " rguenth at gcc dot gnu.org
2022-03-07 13:01 ` roger at nextmovesoftware dot com
2022-03-11 17:53 ` cvs-commit at gcc dot gnu.org
2022-03-11 17:59 ` cvs-commit at gcc dot gnu.org
2022-03-14 18:44 ` roger at nextmovesoftware dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).