public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/102981] New: [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
@ 2021-10-28  9:38 theodort at inf dot ethz.ch
  2021-10-28 11:45 ` [Bug tree-optimization/102981] " rguenth at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: theodort at inf dot ethz.ch @ 2021-10-28  9:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102981

            Bug ID: 102981
           Summary: [12 Regression] Dead Code Elimination Regression at
                    -O3 (trunk vs 11.2.0)
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: theodort at inf dot ethz.ch
  Target Milestone: ---

cat test.c    
void foo(void);
void bar(void);

static int a;
static short b[2][2][2] = {1};
int main() {
  int c = 0;
  short d = 0;
  for (; d <= 1; d++) {
    if (c)
      foo();
    for (; a < 0; a++) {
      bar();
      if (!b[d][d][d])
        c = 1;
    }
  }
}


GCC 11.2.0 at -O3 eliminates the call to foo, but trunk at -O3 does not:

gcc-11 test.c -O3 -S -o /dev/stdout
...
main:
.LFB0:
        .cfi_startproc
        movl    a(%rip), %eax
        testl   %eax, %eax
        jns     .L6
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        .p2align 4,,10
        .p2align 3
.L3:
        call    bar
        addl    $1, a(%rip)
        js      .L3
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
.L6:
        xorl    %eax, %eax
        ret
        .cfi_endproc

gcc-trunk test.c -O3 -S -o /dev/stdout
main:
.LFB0:
        .cfi_startproc
        pushq   %r12
        .cfi_def_cfa_offset 16
        .cfi_offset 12, -16
        movl    $1, %r12d
        pushq   %rbp
        .cfi_def_cfa_offset 24
        .cfi_offset 6, -24
        movl    $b, %ebp
        pushq   %rbx
        .cfi_def_cfa_offset 32
        .cfi_offset 3, -32
        xorl    %ebx, %ebx
.L2:
        movl    a(%rip), %eax
        testl   %eax, %eax
        jns     .L4
        .p2align 4,,10
        .p2align 3
.L6:
        call    bar
        cmpw    $0, 0(%rbp)
        cmove   %r12d, %ebx
        addl    $1, a(%rip)
        js      .L6
.L4:
        addq    $14, %rbp
        cmpq    $b+28, %rbp
        je      .L14
        testl   %ebx, %ebx
        je      .L2
        call    foo
        jmp     .L2
        .p2align 4,,10
        .p2align 3
.L14:
        popq    %rbx
        .cfi_def_cfa_offset 24
        xorl    %eax, %eax
        popq    %rbp
        .cfi_def_cfa_offset 16
        popq    %r12
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

gcc-trunk -v
Using built-in specs.
Target: x86_64-pc-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.0.0 20211028 (experimental) (GCC)

The regression started with this commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d8edfadfc7a9795b65177a50ce44fd348858e844
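
For reference, the claim that foo() is dead can be checked at run time with a
small harness.  This is a sketch and not part of the original report: the
counters foo_calls and bar_calls, the helper run_testcase(), and its initial_a
parameter are hypothetical additions.  Only the loop nest is copied from
test.c.

```c
/* Hypothetical instrumentation: counters stand in for the external
   foo() and bar() declared in test.c.  */
static int foo_calls;
static int bar_calls;

static void foo(void) { foo_calls++; }
static void bar(void) { bar_calls++; }

static int a;
static short b[2][2][2] = {1};

/* Body of main() from test.c.  The starting value of `a` is a
   parameter (an assumption of this sketch) so that the inner loop
   can be made to execute.  */
void run_testcase(int initial_a)
{
  int c = 0;
  short d = 0;

  a = initial_a;
  foo_calls = 0;
  bar_calls = 0;

  for (; d <= 1; d++) {
    if (c)
      foo();
    for (; a < 0; a++) {
      bar();
      if (!b[d][d][d])
        c = 1;
    }
  }
}
```

Because b[0][0][0] is 1 and the inner loop leaves a non-negative, c is still 0
when d reaches 1.  Calling run_testcase(-3) therefore leaves foo_calls at 0
and bar_calls at 3.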

* [Bug tree-optimization/102981] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
From: rguenth at gcc dot gnu.org @ 2021-10-28 11:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102981

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
           Keywords|                            |missed-optimization

* [Bug tree-optimization/102981] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
From: aldyh at gcc dot gnu.org @ 2021-10-30  7:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102981

Aldy Hernandez <aldyh at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-10-30
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
In the pre-loop threaders (ethread, thread1, threadfull1), we can't touch
anything because it would cross loops, but by threadfull2 we should be able to.

There's a threadable path starting at the 2->6 edge here:

<bb 6> [local count: 118111600]:
  # c_21 = PHI <c_30(5), 0(2)>
  # ivtmp.16_8 = PHI <ivtmp.16_13(5), ivtmp.16_25(2)>
  a.1_26 = a;
  if (a.1_26 < 0)
    goto <bb 12>; [89.00%]
  else
    goto <bb 10>; [11.00%]

but we don't, because doing so would peel off an iteration.  Hmmm, this is
really old code.  I'm going to have to think about this:

      // This is like path_crosses_loops in profitable_path_p but more
      // restrictive, since profitable_path_p allows threading the
      // first block because it would be redirected anyhow.
      //
      // If we loosened the restriction and used profitable_path_p()
      // here instead, we would peel off the first iterations of loops
      // in places like tree-ssa/pr14341.c.

* [Bug tree-optimization/102981] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
From: aldyh at gcc dot gnu.org @ 2021-10-30 17:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102981

Aldy Hernandez <aldyh at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org,
                   |                            |matz at gcc dot gnu.org

--- Comment #2 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
I'm not sure what to do here.  Perhaps one of the loop experts can opine.

The threadable path starting at 2->11 could elide the call to foo(), but...

  <bb 2> [local count: 59046943]:
  goto <bb 11>; [100.00%]

  <bb 11> [local count: 177158542]:
  # c_12 = PHI <0(2), c_11(10)>
  # d_13 = PHI <0(2), d_18(10)>
  if (d_13 != 2)
    goto <bb 3>; [66.67%]
  else
    goto <bb 12>; [33.33%]

The pre-loop threaders chose not to thread because it would destroy loop form. 
The late DOM pass can't even thread it, because the IL is too complex for it. 
The backward threader can easily see the candidate, but it has restrictions in
place specifically to avoid peeling the first iteration of loops (regardless of
loopdone):

      // This is like path_crosses_loops in profitable_path_p but more
      // restrictive, since profitable_path_p allows threading the
      // first block because it would be redirected anyhow.
      //
      // If we loosened the restriction and used profitable_path_p()
      // here instead, we would peel off the first iterations of loops
      // in places like tree-ssa/pr14341.c.

I'm not sure massaging the above conditional will ultimately fix this, since
the IL is sufficiently different, but that's the gist of it.

This seems to be a special case where the first iteration of a loop has
unreachable code, and the overly aggressive threaders in earlier GCC releases
could elide it early in the pipeline.

It also looks like a highly contrived testcase.  Does this happen enough in
real life that we should handle it?  If so, should we try harder in the
threader, or could another pass pick up the slack?

* [Bug tree-optimization/102981] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
From: aldyh at gcc dot gnu.org @ 2021-10-30 18:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102981

--- Comment #3 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
*** Bug 102895 has been marked as a duplicate of this bug. ***

* [Bug tree-optimization/102981] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
From: law at gcc dot gnu.org @ 2021-11-04 18:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102981

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2

--- Comment #4 from Jeffrey A. Law <law at gcc dot gnu.org> ---
I have no strong opinions about this specific testcase.  More generally I am in
agreement with Zdenek and others that the threaders should not be peeling
iterations off loops or rotating loops.

Fundamentally the threaders don't have the kind of costing model to know if
peeling an iteration off is profitable or not.  So even after the loop
optimizers are done, I'd still lean against peeling since if it was profitable
it should have been done by the loop optimizer or vectorizer.

So unless someone can show this is a significant issue in real world code, I
would argue that it ought to be fixed by including the possibility of
eliminating unreachable code in the profitability analysis for loop peeling by
the loop optimizers and possibly the unroller (for this specific example).

* [Bug tree-optimization/102981] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
From: aldyh at gcc dot gnu.org @ 2021-11-16 20:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102981

--- Comment #5 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
*** Bug 103280 has been marked as a duplicate of this bug. ***

* [Bug tree-optimization/102981] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
From: aldyh at gcc dot gnu.org @ 2021-11-17  9:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102981

--- Comment #6 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
This looks like a class of problems we could easily catch if we wanted to.  The
pattern is:

PREHEADER
    |
    |
    V
  HEADER --> LOOPEXIT
    |
    |
    V
   SUCC
    |  \
    |   \
   DEAD  \
     |   /
     |  /
     | v
   XXXXXX

On the PREHEADER->HEADER->SUCC path we want to know if the edge out of SUCC can
be statically determined.  The threader can't do this for a number of reasons. 
First, we'd be essentially peeling an iteration.  Second, IIUC, we'd be
rotating the loop.

However, there's no reason we can't catch this in a loop optimizer like we did
with the loopch pass.  This is the exact type of problem that is trivially
handled by the path solver, which is quite cheap when you don't have to do full
path discovery like the threader has to do.

Something like:

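// Statement (if any) that controls the edges out of SUCC.
gimple *control = gimple_outgoing_range_stmt_p (succ);
if (control) {
  // Blocks on the PREHEADER->HEADER->SUCC path.
  auto_vec<basic_block> bbs (3);
  bbs.quick_push (preheader);
  bbs.quick_push (header);
  bbs.quick_push (succ);

  // Ask the path solver for the range of the condition along that path.
  int_range<2> r;
  path_range_query query;
  query.compute_ranges (bbs);
  query.range_of_stmt (r, control);
  // Peel only if the condition is statically known on the first iteration.
  if (r == desired_static_value...)
    peel();
...
}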

If "dead code on the first iteration" is something we want to handle, I could
help with the ranger bits if someone gives me a hand with the loop bits.
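
As a source-level illustration of the "dead code on the first iteration"
pattern, the loop nest from test.c can be peeled by hand.  This is only a
sketch under assumptions: run_original(), run_peeled(), and struct call_counts
are hypothetical names, the initial value of a is passed as a parameter, and
GCC would do the equivalent transformation on GIMPLE rather than on C source.

```c
/* Hypothetical counters standing in for the external foo() and bar().  */
struct call_counts { int foo_calls; int bar_calls; };
static struct call_counts counts;

static void foo(void) { counts.foo_calls++; }
static void bar(void) { counts.bar_calls++; }

static short b[2][2][2] = {1};

/* Loop nest as written in test.c; `a` is a parameter in this sketch.  */
struct call_counts run_original(int a)
{
  int c = 0;
  short d = 0;

  counts.foo_calls = counts.bar_calls = 0;
  for (; d <= 1; d++) {
    if (c)
      foo();
    for (; a < 0; a++) {
      bar();
      if (!b[d][d][d])
        c = 1;
    }
  }
  return counts;
}

/* Same loop nest with the d == 0 iteration peeled off by hand.  */
struct call_counts run_peeled(int a)
{
  int c = 0;
  short d;

  counts.foo_calls = counts.bar_calls = 0;

  /* Peeled iteration, d == 0: c is known to be 0 on entry, so the
     `if (c) foo();` test has been dropped.  */
  for (; a < 0; a++) {
    bar();
    if (!b[0][0][0])
      c = 1;
  }

  /* Remaining iterations, starting at d == 1.  */
  for (d = 1; d <= 1; d++) {
    if (c)
      foo();
    for (; a < 0; a++) {
      bar();
      if (!b[d][d][d])
        c = 1;
    }
  }
  return counts;
}
```

Both functions report the same counts for any starting value of a.  In the
peeled form the remaining `if (c)` depends only on b[0][0][0], which is
initialized to 1 and never written, so a compiler that folds that load may be
able to prove c == 0 and drop the call to foo().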

* [Bug tree-optimization/102981] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
From: aldyh at gcc dot gnu.org @ 2021-11-23 19:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102981

--- Comment #7 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
*** Bug 103388 has been marked as a duplicate of this bug. ***

* [Bug tree-optimization/102981] [12/13 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
From: jakub at gcc dot gnu.org @ 2022-05-06  8:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102981

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.0                        |12.2

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 12.1 is being released, retargeting bugs to GCC 12.2.

* [Bug tree-optimization/102981] [12/13/14 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
From: rguenth at gcc dot gnu.org @ 2023-05-08 12:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102981

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.3                        |12.4

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 12.3 is being released, retargeting bugs to GCC 12.4.

Thread overview: 11+ messages
2021-10-28  9:38 [Bug tree-optimization/102981] New: [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0) theodort at inf dot ethz.ch
2021-10-28 11:45 ` [Bug tree-optimization/102981] " rguenth at gcc dot gnu.org
2021-10-30  7:03 ` aldyh at gcc dot gnu.org
2021-10-30 17:42 ` aldyh at gcc dot gnu.org
2021-10-30 18:08 ` aldyh at gcc dot gnu.org
2021-11-04 18:47 ` law at gcc dot gnu.org
2021-11-16 20:08 ` aldyh at gcc dot gnu.org
2021-11-17  9:49 ` aldyh at gcc dot gnu.org
2021-11-23 19:19 ` aldyh at gcc dot gnu.org
2022-05-06  8:31 ` [Bug tree-optimization/102981] [12/13 " jakub at gcc dot gnu.org
2023-05-08 12:22 ` [Bug tree-optimization/102981] [12/13/14 " rguenth at gcc dot gnu.org
