public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/102895] New: [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
From: theodort at inf dot ethz.ch @ 2021-10-22  9:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102895

            Bug ID: 102895
           Summary: [12 Regression] Dead Code Elimination Regression at
                    -O3 (trunk vs 11.2.0)
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: theodort at inf dot ethz.ch
  Target Milestone: ---

cat test.c
static int a, b, c;
void foo(void);
int main() {
  for (a = 0; a <= 1; ++a)
    if (c <= a) {
      for (b = 0; b <= 1; ++b)
        ;
    } else
      foo();
}
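
For readers checking why the call is dead: `c` is a file-local static that is never written, so it stays 0, and `c <= a` holds for both values of `a` that the loop visits. A minimal sketch of that reasoning, assuming a counting stub in place of the external `foo` (the stub, `foo_calls` and `run_reduced_loop` are hypothetical names, not part of the report):

```c
#include <assert.h>

/* Same shape as the reduced test case. The globals are static and c is
   never written, so c stays 0 for the whole run. */
static int a, b, c;

/* Hypothetical stand-in for the external foo(): it only counts calls. */
static int foo_calls;
static void foo(void) { ++foo_calls; }

/* Runs the loop from the report and returns how often foo() was reached. */
static int run_reduced_loop(void) {
  foo_calls = 0;
  for (a = 0; a <= 1; ++a)
    if (c <= a) {
      for (b = 0; b <= 1; ++b)
        ;
    } else
      foo();
  return foo_calls;
}
```

Calling `run_reduced_loop()` from a small driver should report zero calls and leave `a` and `b` at 2, which matches the stores visible in the GCC 11 assembly.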

GCC 11.2.0 at -O3 eliminates the call to foo, but trunk at -O3 does not:

gcc-11 -O3 -S test.c -o /dev/stdout
...
main:
.LFB0:
        .cfi_startproc
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L2:
        movl    $2, b(%rip)
        addl    $1, %eax
        movl    %eax, a(%rip)
        cmpl    $1, %eax
        je      .L2
        xorl    %eax, %eax
        ret
        .cfi_endproc

gcc-trunk -O3 -S test.c -o /dev/stdout
...
main:
.LFB0:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        xorl    %eax, %eax
        movl    $0, a(%rip)
        .p2align 4,,10
        .p2align 3
.L5:
        testl   %eax, %eax
        jns     .L9
        call    foo
        .p2align 4,,10
        .p2align 3
.L7:
        movl    a(%rip), %eax
        addl    $1, %eax
        movl    %eax, a(%rip)
        cmpl    $1, %eax
        jle     .L5
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        ret
        .p2align 4,,10
        .p2align 3
.L9:
        .cfi_restore_state
        movl    $2, b(%rip)
        jmp     .L7
        .cfi_endproc


gcc-trunk -v
Using built-in specs.
Target: x86_64-pc-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.0.0 20211022 (experimental) (GCC)

Introduced with
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d8e1f1d24179690fd9c0f63c27b12e030010d9ea


* [Bug tree-optimization/102895] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
From: rguenth at gcc dot gnu.org @ 2021-10-22 10:32 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102895

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
                 CC|                            |aldyh at gcc dot gnu.org
   Last reconfirmed|                            |2021-10-22
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
There's identical IL before the vrp2 pass (the one after strlen) but on the GCC
11 branch vrp2 eliminates the call to foo while on trunk it does not.

On the branch VRP registers

  Registering jump thread: (2, 3) incoming edge;  (3, 4) normal;

which elides the call but vrp-thread2 does not do this:

 Registering value_relation (_5 > a.4_4) (bb5) at _5 = a.4_4 + 1;
 Registering value_relation (_19 > a.4_13) (bb4) at _19 = a.4_13 + 1;
  [4] Registering jump thread: (2, 3) incoming edge;  (3, 4) normal;
Failure in thread_through_loop_header:   Cancelling jump thread: (2, 3) incoming edge;  (3, 4) normal;

On the branch, this threading destroys the loop structure.


* [Bug tree-optimization/102895] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
From: aldyh at gcc dot gnu.org @ 2021-10-22 11:08 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102895

--- Comment #2 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> There's identical IL before the vrp2 pass (the one after strlen) but on the
> GCC 11 branch vrp2 eliminates the call to foo while on trunk it does not.
> 
> On the branch VRP registers
> 
>   Registering jump thread: (2, 3) incoming edge;  (3, 4) normal;
> 
> which elides the call but vrp-thread2 does not do this:
> 
>  Registering value_relation (_5 > a.4_4) (bb5) at _5 = a.4_4 + 1;
>  Registering value_relation (_19 > a.4_13) (bb4) at _19 = a.4_13 + 1;
>   [4] Registering jump thread: (2, 3) incoming edge;  (3, 4) normal;
> Failure in thread_through_loop_header:   Cancelling jump thread: (2, 3) incoming edge;  (3, 4) normal;
> 
> On the branch, this threading destroys the loop structure.

It seems there were other threads in play before the loop restriction changes
went in.

In the branch we have:

$ gcc a.c -O2 -fdump-tree-all-details -c
abulafia:~/bld/t/gcc$ grep thread a.c.*
a.c.124t.dom2:  Registering jump thread: (10, 5) incoming edge;  (5, 12) normal;
a.c.124t.dom2:  Registering jump thread: (9, 3) incoming edge;  (3, 4) normal;
a.c.189t.dom3:  Registering jump thread: (2, 3) incoming edge;  (3, 4) normal;
a.c.192t.vrp2:  Registering jump thread: (2, 3) incoming edge;  (3, 4) normal;

DOM2 was getting 2 threads, but in mainline we have:

$ ./cc1 a.c -O2 -fdump-tree-all-details -quiet
abulafia:~/bld/t/gcc$ grep thread a.c.*
a.c.128t.dom2:Threading through latch before loop opts would create non-empty latch:   Cancelling jump thread: (10, 5) incoming edge;  (5, 12) normal;
a.c.128t.dom2:Path rotates loop:   Cancelling jump thread: (9, 3) incoming edge;  (3, 4) normal;
a.c.193t.dom3:  [3] Registering jump thread: (2, 3) incoming edge;  (3, 4) normal;
a.c.193t.dom3:Failure in thread_through_loop_header:   Cancelling jump thread: (2, 3) incoming edge;  (3, 4) normal;
a.c.197t.vrp-thread2:  [4] Registering jump thread: (2, 3) incoming edge;  (3, 4) normal;
a.c.197t.vrp-thread2:Failure in thread_through_loop_header:   Cancelling jump thread: (2, 3) incoming edge;  (3, 4) normal;

Those DOM2 threads were cancelled because of the loop restrictions we put in. 
Since jump threads in one pass open the possibilities of further jump threads
by other passes, it could be that the missing DOM2 threads are causing VRP2 to
miss out.

However, vrp-thread2 *does* find and register the path.  It's the block copier
that is complaining:

a.c.197t.vrp-thread2:  [4] Registering jump thread: (2, 3) incoming edge;  (3, 4) normal;
a.c.197t.vrp-thread2:Failure in thread_through_loop_header:   Cancelling jump thread: (2, 3) incoming edge;  (3, 4) normal;

Note that these "Failure in thread_through_loop_header" messages are new
debugging aids in this release, but the cancel_thread was present nevertheless.
It was just silent.

I would guess it's either the missing DOM threads having cascading effects,
or something in the block copier (fwd_jt_path_registry::).  FWIW, there have
been no changes in the block copier in this release.


* [Bug tree-optimization/102895] [12 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)
From: aldyh at gcc dot gnu.org @ 2021-10-30 18:08 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102895

Aldy Hernandez <aldyh at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE
                 CC|                            |law at gcc dot gnu.org,
                   |                            |matz at gcc dot gnu.org

--- Comment #3 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
This is *almost* a duplicate of PR102981.

Here we have the same scenario: the first iteration of a loop has unreachable
code.  But interestingly, the IL is sufficiently simple that DOM3 (post
loopdone) can see the threading opportunity:

a.c.192t.dom3:  [3] Registering jump thread: (2, 3) incoming edge;  (3, 4) normal;

but... there's some limitation in the custom block copier the old forward
threader uses:

a.c.192t.dom3:Failure in thread_through_loop_header:   Cancelling jump thread: (2, 3) incoming edge;  (3, 4) normal;

Again, the backward threader will refuse to thread this, regardless of
loopdone, because it is essentially peeling off the first iteration of a loop. 
This is the main issue we should address, regardless of DOM's limitation.
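
As a source-level analogy only (GCC performs this on GIMPLE basic blocks, not on C source), threading the loop entry edge past the condition corresponds roughly to peeling the first iteration by hand. The sketch below assumes a counting stub for `foo`; `body`, `original`, `peeled` and `foo_calls` are hypothetical names, and reading the dump's (2, 3) and (3, 4) edges as "loop entry" and "then arm" is an assumption rather than something the dumps state.

```c
#include <assert.h>

static int a, b, c;          /* c is never written, so it stays 0 */
static int foo_calls;        /* hypothetical counter standing in for foo() */
static void foo(void) { ++foo_calls; }

/* Loop body shared by both variants below. */
static void body(void) {
  if (c <= a) {
    for (b = 0; b <= 1; ++b)
      ;
  } else
    foo();
}

/* Original shape: the condition is evaluated on every iteration. */
static void original(void) {
  for (a = 0; a <= 1; ++a)
    body();
}

/* Hand-peeled shape: the first iteration skips the test because
   c <= a is known to hold on entry (0 <= 0). */
static void peeled(void) {
  a = 0;
  for (b = 0; b <= 1; ++b)   /* first iteration, "then" arm only */
    ;
  ++a;
  for (; a <= 1; ++a)        /* remaining iterations keep the test */
    body();
}
```

Both variants should leave the globals in the same state; the peeled form simply makes explicit that the first iteration never needs the branch to `foo`.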

I'm going to mark this as a duplicate, because I doubt anyone is inclined
to fix the old forward threader's copier.

[FWIW, this supersedes the previous comment I made for this PR, as the threader
pipeline has changed in trunk.]

*** This bug has been marked as a duplicate of bug 102981 ***

