public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/103388] New: [12 Regression] missed optimization for dead code elimination at -O3 (vs. -O2)
@ 2021-11-23 18:21 theodort at inf dot ethz.ch
  2021-11-23 19:19 ` [Bug tree-optimization/103388] " aldyh at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: theodort at inf dot ethz.ch @ 2021-11-23 18:21 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103388

            Bug ID: 103388
           Summary: [12 Regression] missed optimization for dead code
                    elimination at -O3 (vs. -O2)
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: theodort at inf dot ethz.ch
  Target Milestone: ---

cat case.c
void foo(void);
void bar(void);

static int b, d, e, *c, *f = &d, *h = &b;

int main() {
  int **i = &c;
  if (e) {
    e = *f;
    bar();
    if (!(((i && d) + *h >= 1 ^ d & b) <= 4 | d))
      foo();
  }
}
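
A minimal sketch of why the call to foo() is dead, which the report does not spell out. By C operator precedence the guard parses as !((y <= 4) | d), with x = ((i && d) + *h >= 1) and y = x ^ (d & b). If d is nonzero, the `| d` keeps the operand of `!` nonzero. If d is zero, then d & b is 0 and y equals x, which is 0 or 1, so y <= 4 yields 1. Either way the branch is never taken. The helper names below (would_call_foo, count_foo_calls) are hypothetical, *h is modelled as b, and i is modelled as always non-null because it holds &c.

```c
#include <assert.h>
#include <stdbool.h>

/* Fully parenthesized form of the guard in case.c.
   Returns true when foo() would be called for the given d and b. */
bool would_call_foo(int d, int b)
{
    int i_nonnull = 1;                        /* stands in for i == &c */
    int x = ((i_nonnull && d) + b) >= 1;      /* (i && d) + *h >= 1 */
    int y = x ^ (d & b);                      /* ... ^ d & b */
    return !((y <= 4) | d);                   /* !(... <= 4 | d) */
}

/* Counts the (d, b) pairs in [-limit, limit] x [-limit, limit]
   for which foo() would run. */
long count_foo_calls(int limit)
{
    assert(limit >= 0);
    long calls = 0;
    for (int d = -limit; d <= limit; d++)
        for (int b = -limit; b <= limit; b++)
            if (would_call_foo(d, b))
                calls++;
    return calls;
}
```

Calling count_foo_calls over any small range should find no pair for which the guard holds. This is a brute-force check of the argument above, not a description of how GCC proves it.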

trunk cannot eliminate the call to foo but 11.2.0 can:

gcc-11.2.0 -O3 -S -o /dev/stdout case.c
main:
.LFB0:
        .cfi_startproc
        movl    e(%rip), %ecx
        testl   %ecx, %ecx
        jne     .L8
        xorl    %eax, %eax
        ret
.L8:
        pushq   %rax
        .cfi_def_cfa_offset 16
        movl    d(%rip), %eax
        movl    %eax, e(%rip)
        call    bar
        xorl    %eax, %eax
        popq    %rdx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE0:

gcc-trunk -O3 -S -o /dev/stdout case.c
main:
.LFB0:
        .cfi_startproc
        movl    e(%rip), %esi
        testl   %esi, %esi
        jne     .L10
        xorl    %eax, %eax
        ret
.L10:
        pushq   %rcx
        .cfi_def_cfa_offset 16
        movl    d(%rip), %eax
        movl    %eax, e(%rip)
        call    bar
        movl    d(%rip), %edx
        movl    b(%rip), %eax
        cmpl    $1, %edx
        movl    %eax, %ecx
        sbbl    $-1, %ecx
        testl   %ecx, %ecx
        setg    %cl
        andl    %edx, %eax
        movzbl  %cl, %ecx
        xorl    %ecx, %eax
        cmpl    $4, %eax
        setle   %al
        movzbl  %al, %eax
        orl     %edx, %eax
        je      .L11
.L3:
        xorl    %eax, %eax
        popq    %rdx
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        ret
.L11:
        .cfi_restore_state
        call    foo
        jmp     .L3
        .cfi_endproc
.LFE0:

gcc-trunk -v
Using built-in specs.
Supported LTO compression algorithms: zlib zstd
gcc version 12.0.0 20211123 (experimental) (GCC)

Started with
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=4b3a325f07acebf47e82de227ce1d5ba62f5bcae

* [Bug tree-optimization/103388] [12 Regression] missed optimization for dead code elimination at -O3 (vs. -O2)
From: aldyh at gcc dot gnu.org @ 2021-11-23 19:19 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103388

Aldy Hernandez <aldyh at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #1 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
By *.threadfull1 this is the path at 4->5->7.  It looks like:

PREHEADER
|
v
HEADER--------+
|             |
V
UNREACHABLE   |
|            /
V           /
return 0 <-+

This is more or less PR102981.

Is there any way we can stop reporting the same thing over and over?

*** This bug has been marked as a duplicate of bug 102981 ***

* [Bug tree-optimization/103388] [12 Regression] missed optimization for dead code elimination at -O3 (vs. -O2)
From: aldyh at gcc dot gnu.org @ 2021-11-23 19:31 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103388

Aldy Hernandez <aldyh at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org
         Resolution|DUPLICATE                   |---
             Status|RESOLVED                    |NEW
   Last reconfirmed|                            |2021-11-23
     Ever confirmed|0                           |1

--- Comment #2 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
(In reply to Aldy Hernandez from comment #1)
> By *.threadfull1 this is the path at 4->5->7.  It looks like:
> 
> PREHEADER
> |
> v
> HEADER--------+
> |             |
> V
> UNREACHABLE   |
> |            /
> V           /
> return 0 <-+
> 
> This is more or less PR102981.
> 
> Is there any way we can stop reporting the same thing over and over?
> 
> *** This bug has been marked as a duplicate of bug 102981 ***

Errr, wait a minute, that's not a loop.  My bad.

We're failing to thread 4->5->xxx because:

Checking profitability of path (backwards):  bb:5 (10 insns) bb:4
  Control statement insns: 2
  Overall: 8 insns
  FAIL: Did not thread around loop and would copy too many statements.

which is a limitation of the backward threader copier:

  /* The generic copier used by the backthreader does not re-use an
     existing threading path to reduce code duplication.  So for that
     case, drastically reduce the number of statements we are allowed
     to copy.  */
  if (!(threaded_through_latch && threaded_multiway_branch)
      && (n_insns * param_fsm_scale_path_stmts
          >= param_max_jump_thread_duplication_stmts))
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file,
                 "  FAIL: Did not thread around loop and would copy too "
                 "many statements.\n");
      return false;
    }

Confirmed.
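
To make the arithmetic behind that FAIL line concrete, here is a minimal sketch of the size test from the excerpt above as a standalone function. The function name path_rejected_for_size is hypothetical. The defaults of 2 for fsm-scale-path-stmts and 15 for max-jump-thread-duplication-stmts are assumptions on my part and are not quoted anywhere in this thread. Under those assumptions the 8-insn path gives 8 * 2 = 16 >= 15 and is rejected, while a scale of 1 gives 8 < 15 and would pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed defaults; check params.opt in your tree before relying on them. */
enum {
    ASSUMED_FSM_SCALE_PATH_STMTS = 2,
    ASSUMED_MAX_JUMP_THREAD_DUPLICATION_STMTS = 15
};

/* Mirrors the condition in the excerpt: returns true when the backward
   threader would refuse the path because it copies too many statements. */
bool path_rejected_for_size(int n_insns,
                            int fsm_scale_path_stmts,
                            int max_jump_thread_duplication_stmts,
                            bool threaded_through_latch,
                            bool threaded_multiway_branch)
{
    assert(n_insns >= 0);
    return !(threaded_through_latch && threaded_multiway_branch)
           && (n_insns * fsm_scale_path_stmts
               >= max_jump_thread_duplication_stmts);
}
```

For the path in the dump, n_insns is 8 (10 insns in bb:5 minus 2 control statement insns) and neither flag is set, since the path does not go around a loop.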

* [Bug tree-optimization/103388] [12 Regression] missed optimization for dead code elimination at -O3 (vs. -O2)
From: law at gcc dot gnu.org @ 2021-11-23 23:18 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103388

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |law at gcc dot gnu.org

--- Comment #3 from Jeffrey A. Law <law at gcc dot gnu.org> ---
So to fix this right we'd need to duplicate some of the logic in
tree-ssa-threadupdate.c.  Conceptually for block B where one or more
predecessors thread to target T, you make a single copy B', and redirect *all*
the relevant predecessors to B'.

In addition to allowing more aggressive threading, it would also reduce
code size since currently we'll end up with multiple copies of B'.  We have
optimizers that are supposed to clean that up, but I've never seen them do a
particularly good job.

This isn't likely to land in gcc-12.

An interim approach might be to go ahead and register the thread and only
reject it for size later if we're going to end up with multiple copies.  After
all this is a cost analysis question and we don't know until all the paths are
registered if it's profitable or not.

Anyway, it should probably be assigned to me.  Not sure if I'll get to the
interim approach or not for gcc-12.  I'll have to poke around a bit.
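
The single-copy idea from the first paragraph can be illustrated with a toy model. This is a sketch only: struct thread_request and both functions are hypothetical and do not correspond to anything in tree-ssa-threadupdate.c. It just counts how many copies of a block each scheme would create for a set of threads.

```c
#include <assert.h>
#include <stddef.h>

/* One jump thread: the edge pred->block is redirected so that control
   reaches target through a copy of block. */
struct thread_request {
    int pred;
    int block;
    int target;
};

/* Per-edge copying: every request gets its own copy of its block. */
size_t copies_per_edge(const struct thread_request *reqs, size_t n)
{
    assert(reqs != NULL || n == 0);
    return n;
}

/* Coalesced copying: all requests with the same (block, target) pair
   share a single copy B', and their predecessors are redirected to it. */
size_t copies_coalesced(const struct thread_request *reqs, size_t n)
{
    assert(reqs != NULL || n == 0);
    size_t copies = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++)
            if (reqs[j].block == reqs[i].block
                && reqs[j].target == reqs[i].target)
                seen = 1;
        if (!seen)
            copies++;
    }
    return copies;
}
```

With three predecessors threading through block 5, two of them to target 7 and one to target 8, the per-edge scheme makes three copies and the coalesced scheme makes two.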

* [Bug tree-optimization/103388] [12 Regression] missed optimization for dead code elimination at -O3 (trunk vs 11.2.0)
From: rguenth at gcc dot gnu.org @ 2021-11-24  8:50 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103388

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
           Keywords|                            |missed-optimization

* [Bug tree-optimization/103388] [12 Regression] missed optimization for dead code elimination at -O3 (trunk vs 11.2.0)
From: rguenth at gcc dot gnu.org @ 2022-01-18 14:24 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103388

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2021-11-23 00:00:00         |2022-1-18
             Status|NEW                         |ASSIGNED

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Re-confirmed.

(In reply to Jeffrey A. Law from comment #3)
> So to fix this right we'd need to duplicate some of the logic in
> tree-ssa-threadupdate.c.  Conceptually for block B where one or more
> predecessors thread to target T, you make a single copy B', and redirect
> *all* the relevant predecessors to B'.
> 
> In addition to allowing more aggressive threading, it would also reduce
> code size since currently we'll end up with multiple copies of B'.  We have
> optimizers that are supposed to clean that up, but I've never seen them do a
> particularly good job.
> 
> This isn't likely to land in gcc-12.
> 
> An interim approach might be to go ahead and register the thread and only
> reject it for size later if we're going to end up with multiple copies. 
> After all this is a cost analysis question and we don't know until all the
> paths are registered if it's profitable or not.

So I think at least this should be possible, no?  Also, why do we need the
extra limitation?  Without the optimization we end up accounting for B's size
N times, so the costing is still accurate, no?

IMHO the scaling factors do not make much sense; they were introduced
to fix PR68398.

We need --param fsm-scale-path-stmts=1 to get the desired threading done.

* [Bug tree-optimization/103388] [12 Regression] missed optimization for dead code elimination at -O3 (trunk vs 11.2.0)
From: law at gcc dot gnu.org @ 2022-01-18 15:28 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103388

--- Comment #5 from Jeffrey A. Law <law at gcc dot gnu.org> ---
We thread one edge at a time, so we don't know ahead of time how many copies
there would be.

It could be restructured to go ahead and register these threads, then compute
the copy cost on a more global basis.  That would allow us to bump up the
threshold to register the thread, but still reject things later if the cost
appears to be too high.

The bookkeeping necessary to do that would actually be step #0 for the real
solution, which would be to fix the new copier to coalesce cases where multiple
incoming edges thread to the same outgoing edge, in a manner similar to what
tree-ssa-threadupdate does.
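
A rough sketch of the register-first, cost-later restructuring described above. This is a toy model under stated assumptions, not the backward threader's data structures: struct candidate, finalize_candidates and the budget parameter are all hypothetical. Every candidate is registered unconditionally. Once all of them are known, each block's total copy cost is summed across every candidate that duplicates it (one copy per candidate, as the current copier would produce), and only blocks whose total stays under the budget are kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A registered but not yet applied thread: applying it duplicates
   `block`, which costs n_insns statements. */
struct candidate {
    int block;
    int n_insns;
    bool accepted;
};

/* Second phase: decide with global knowledge.  A block threaded k times
   is copied k times without coalescing, so its total cost is the sum of
   n_insns over all candidates naming it.  Returns the number kept. */
size_t finalize_candidates(struct candidate *c, size_t n, int budget)
{
    assert(c != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int total = 0;
        for (size_t j = 0; j < n; j++)
            if (c[j].block == c[i].block)
                total += c[j].n_insns;
        c[i].accepted = total < budget;
        if (c[i].accepted)
            kept++;
    }
    return kept;
}
```

With a budget of 15, a lone 8-insn candidate is kept, while two 8-insn candidates for the same block total 16 and are both rejected. A coalescing copier would count such a shared block once instead of once per candidate.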

* [Bug tree-optimization/103388] [12/13 Regression] missed optimization for dead code elimination at -O3 (trunk vs 11.2.0)
From: jakub at gcc dot gnu.org @ 2022-05-06  8:31 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103388

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.0                        |12.2

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 12.1 is being released, retargeting bugs to GCC 12.2.

* [Bug tree-optimization/103388] [12/13 Regression] missed optimization for dead code elimination at -O3 (trunk vs 11.2.0)
From: rguenth at gcc dot gnu.org @ 2022-07-26 13:09 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103388

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2

* [Bug tree-optimization/103388] [12/13/14 Regression] missed optimization for dead code elimination at -O3 (trunk vs 11.2.0)
From: rguenth at gcc dot gnu.org @ 2023-05-08 12:23 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103388

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.3                        |12.4

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 12.3 is being released, retargeting bugs to GCC 12.4.
