public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/100314] New: missed optimization for dead code elimination at -O3 (vs. -O1)
@ 2021-04-28 10:17 zhendong.su at inf dot ethz.ch
  2021-09-25 10:05 ` [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1) (inlining differences) pinskia at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: zhendong.su at inf dot ethz.ch @ 2021-04-28 10:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100314

            Bug ID: 100314
           Summary: missed optimization for dead code elimination at -O3
                    (vs. -O1)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zhendong.su at inf dot ethz.ch
  Target Milestone: ---

[592] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap
--prefix=/local/suz-local/software/local/gcc-trunk --enable-languages=c,c++
--disable-werror --enable-multilib --with-system-zlib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20210428 (experimental) [master revision
852dd866e2f:9b04e5b2651:b81e2d5e76a6bcc71f45b122e8b5538ddb7ebf4c] (GCC) 
[593] % 
[593] % gcctk -O1 -S -o O1.s small.c
[594] % gcctk -O3 -S -o O3.s small.c
[595] % 
[595] % wc O1.s O3.s
  21   45  392 O1.s
  42   86  686 O3.s
  63  131 1078 total
[596] % 
[596] % grep foo O1.s
[597] % grep foo O3.s
        jmp     foo
[598] % 
[598] % cat small.c
extern void foo(void);
static int a, *c, g, **j;
int b;
static void e() {
  int k, *l[5] = {&k, &k, &k, &k, &k};
  while (g) {
    j = &l[0];
    b++;
  }
}
static void d(int m) {
  int **h[30] = {&c}, ***i[1] = {&h[3]};
  if (m)
    foo();
  e();
}
int main() {
  d(a);
  return 0;
}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1)  (inlining differences)
  2021-04-28 10:17 [Bug tree-optimization/100314] New: missed optimization for dead code elimination at -O3 (vs. -O1) zhendong.su at inf dot ethz.ch
@ 2021-09-25 10:05 ` pinskia at gcc dot gnu.org
  2022-01-11  9:53 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-25 10:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100314

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
            Summary|missed optimization for     |missed optimization for
                   |dead code elimination at    |dead code elimination at
                   |-O3 (vs. -O1)               |-O3 (vs. -O1)  (inlining
                   |                            |differences)
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-09-25
          Component|tree-optimization           |ipa
                 CC|                            |marxin at gcc dot gnu.org
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=100220

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed, similar issue to PR 100220.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1)  (inlining differences)
  2021-04-28 10:17 [Bug tree-optimization/100314] New: missed optimization for dead code elimination at -O3 (vs. -O1) zhendong.su at inf dot ethz.ch
  2021-09-25 10:05 ` [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1) (inlining differences) pinskia at gcc dot gnu.org
@ 2022-01-11  9:53 ` rguenth at gcc dot gnu.org
  2022-01-11 10:36 ` hubicka at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-11  9:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100314

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |12.0
   Last reconfirmed|2021-09-25 00:00:00         |2022-1-11
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |jamborm at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
the key is to notice that 'a' is not modified and thus zero and propagate that
to 'd'.  With -O3 IPA figures out 'a' is not modified but that isn't taken into
account by IPA CP which sees

  a.0_1 = a;
  d (a.0_1);

with -O1 we happen to inline everything during IPA and nothing early while
with -O3 'e' is early inlined into 'd' (but not d into main) and IPA
refuses to inline d into main:

t.c:18:3: missed:   not inlinable: main/7 -> d/6, --param
large-stack-frame-growth limit reached

sth is off with the stack limit I guess, maybe it's not cummulated correctly
at -O1.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1)  (inlining differences)
  2021-04-28 10:17 [Bug tree-optimization/100314] New: missed optimization for dead code elimination at -O3 (vs. -O1) zhendong.su at inf dot ethz.ch
  2021-09-25 10:05 ` [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1) (inlining differences) pinskia at gcc dot gnu.org
  2022-01-11  9:53 ` rguenth at gcc dot gnu.org
@ 2022-01-11 10:36 ` hubicka at gcc dot gnu.org
  2022-01-11 10:40 ` hubicka at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: hubicka at gcc dot gnu.org @ 2022-01-11 10:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100314

--- Comment #3 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
With -O1 i get:
IPA function summary for main/7 inlinable
  global time:     72.936364
  self size:       6
  global size:     19
  min size:       16
  self stack:      0
  global stack:    44
    size:15.000000, time:67.636364
    size:3.000000, time:2.000000,  executed if:(not inlined)
  calls:
    d/6 inlined
      freq:1.00
      Stack frame offset 0, callee self size 0
      e/5 inlined
        freq:1.00
        Stack frame offset 0, callee self size 44
      foo/8 function body not available
        freq:0.33 loop depth: 0 size: 1 time: 10

So we get 44 bytes of stack use.

However with -O3 we see:
IPA function summary for main/7 inlinable
  global time:     14.000000
  self size:       6
  global size:     6
  min size:       3
  self stack:      0
  global stack:    0
    size:1.000000, time:1.000000
    size:3.000000, time:2.000000,  executed if:(not inlined)
  calls:
    d/6 function not considered for inlining
      freq:1.00 loop depth: 0 size: 2 time: 11 callee size:11 stack:284

IPA function summary for d/6 inlinable
  global time:     87.936364
  self size:       23
  global size:     23
  min size:       17
  self stack:      284
  global stack:    284
    size:17.000000, time:80.636364
    size:3.000000, time:2.000000,  executed if:(not inlined)
    size:2.000000, time:2.000000,  nonconst if:(op0 changed)
  calls:
    foo/8 function body not available
      freq:0.33 loop depth: 0 size: 1 time: 10 predicate: (op0 != 0)

So now the stack is 284 bytes for d which hits the limits.
It seems to be array h[30] (240 bytes) and l[5] (40 bytes).

With -O1 array h is optimized out completely however with -O3 I see:
void d (int m)                                                                  
{                                                                               
  int k;                                                                        
  int * l[5];                                                                   
  int * * h[30];                                                                
  int b.1_8;                                                                    
  int _9;                                                                       
  int g.2_10;                                                                   

  <bb 2> [local count: 118111600]:                                              
  MEM <char[232]> [(int * *[30] *)&h + 8B] = {};                                
  h[0] = &c;                                                                    
  if (m_5(D) != 0)                                                              
    goto <bb 3>; [33.00%]                                                       
  else                                                                          
    goto <bb 4>; [67.00%]                                                       

  <bb 3> [local count: 38976828]:                                               
  foo ();                                                                       

  <bb 4> [local count: 118111600]:                                              
  l[0] = &k;                                                                    
  l[1] = &k;                                                                    
  l[2] = &k;                                                                    
  l[3] = &k;                                                                    
  l[4] = &k;                                                                    
  goto <bb 6>; [100.00%]                                                        

  <bb 5> [local count: 955630225]:                                              
  j = &l[0];                                                                    
  b.1_8 = b;                                                                    
  _9 = b.1_8 + 1;                                                               
  b = _9;                                                                       

  <bb 6> [local count: 1073741824]:                                             
  g.2_10 = g;                                                                   
  if (g.2_10 != 0)                                                              
    goto <bb 5>; [89.00%]                                                       
  else                                                                          
    goto <bb 7>; [11.00%]                                                       

  <bb 7> [local count: 118111600]:                                              
  k ={v} {CLOBBER};                                                             
  l ={v} {CLOBBER};                                                             
  h ={v} {CLOBBER};                                                             
  return;                                                                       

}                                                                               

So h is dead but DSE did not happen since at DSE time we see:

  <bb 2> :                                                                      
  MEM <char[232]> [(int * *[30] *)&h + 8B] = {};                                
  h[0] = &c;                                                                    
  i[0] = &h[3];                                                                 
  if (m_6(D) != 0)                                                              
    goto <bb 3>; [INV]                                                          
  else                                                                          
    goto <bb 4>; [INV]                                                          

  <bb 3> :                                                                      
  foo ();                                                                       

  <bb 4> :                                                                      

Which keeps h alive.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1)  (inlining differences)
  2021-04-28 10:17 [Bug tree-optimization/100314] New: missed optimization for dead code elimination at -O3 (vs. -O1) zhendong.su at inf dot ethz.ch
                   ` (2 preceding siblings ...)
  2022-01-11 10:36 ` hubicka at gcc dot gnu.org
@ 2022-01-11 10:40 ` hubicka at gcc dot gnu.org
  2023-08-18  2:32 ` [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1) (inlining differences due to missed dse) pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: hubicka at gcc dot gnu.org @ 2022-01-11 10:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100314

--- Comment #4 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
However i is also dead at dse time:
void d (int m)                                                                  
{                                                                               
  int k;                                                                        
  int * l[5];                                                                   
  int * * * i[1];                                                               
  int * * h[30];                                                                
  int b.1_11;                                                                   
  int _12;                                                                      
  int g.2_13;                                                                   

  <bb 2> :                                                                      
  MEM <char[232]> [(int * *[30] *)&h + 8B] = {};                                
  h[0] = &c;                                                                    
  i[0] = &h[3];                                                                 
  if (m_6(D) != 0)                                                              
    goto <bb 3>; [INV]                                                          
  else                                                                          
    goto <bb 4>; [INV]                                                          

  <bb 3> :                                                                      
  foo ();                                                                       

  <bb 4> :                                                                      
  l[0] = &k;                                                                    
  l[1] = &k;                                                                    
  l[2] = &k;                                                                    
  l[3] = &k;                                                                    
  l[4] = &k;                                                                    
  goto <bb 6>; [100.00%]                                                        

  <bb 5> :                                                                      
  j = &l[0];                                                                    
  b.1_11 = b;                                                                   
  _12 = b.1_11 + 1;                                                             
  b = _12;                                                                      

  <bb 6> :                                                                      
  g.2_13 = g;                                                                   
  if (g.2_13 != 0)                                                              
    goto <bb 5>; [89.00%]                                                       
  else                                                                          
    goto <bb 7>; [11.00%]                                                       

  <bb 7> :                                                                      
  k ={v} {CLOBBER};                                                             
  l ={v} {CLOBBER};                                                             
  h ={v} {CLOBBER};                                                             
  i ={v} {CLOBBER};                                                             
  return;                                                                       

}                                                                               

Not sure why DSE is not optimizing this out.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1)  (inlining differences due to missed dse)
  2021-04-28 10:17 [Bug tree-optimization/100314] New: missed optimization for dead code elimination at -O3 (vs. -O1) zhendong.su at inf dot ethz.ch
                   ` (3 preceding siblings ...)
  2022-01-11 10:40 ` hubicka at gcc dot gnu.org
@ 2023-08-18  2:32 ` pinskia at gcc dot gnu.org
  2024-05-15 16:44 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-08-18  2:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100314

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=100188

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> the key is to notice that 'a' is not modified and thus zero and propagate
> that to 'd'.  With -O3 IPA figures out 'a' is not modified but that isn't
> taken into

And that is the same issue as PR 100188 (and maybe others).

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1)  (inlining differences due to missed dse)
  2021-04-28 10:17 [Bug tree-optimization/100314] New: missed optimization for dead code elimination at -O3 (vs. -O1) zhendong.su at inf dot ethz.ch
                   ` (4 preceding siblings ...)
  2023-08-18  2:32 ` [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1) (inlining differences due to missed dse) pinskia at gcc dot gnu.org
@ 2024-05-15 16:44 ` rguenth at gcc dot gnu.org
  2024-05-16  9:04 ` cvs-commit at gcc dot gnu.org
  2024-05-16  9:05 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-05-15 16:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100314

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Testing a patch.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1)  (inlining differences due to missed dse)
  2021-04-28 10:17 [Bug tree-optimization/100314] New: missed optimization for dead code elimination at -O3 (vs. -O1) zhendong.su at inf dot ethz.ch
                   ` (5 preceding siblings ...)
  2024-05-15 16:44 ` rguenth at gcc dot gnu.org
@ 2024-05-16  9:04 ` cvs-commit at gcc dot gnu.org
  2024-05-16  9:05 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-16  9:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100314

--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:1e0ae1f52741f7e0133661659ed2d210f939a398

commit r15-571-g1e0ae1f52741f7e0133661659ed2d210f939a398
Author: Richard Biener <rguenther@suse.de>
Date:   Wed May 15 18:32:37 2024 +0200

    tree-optimization/79958 - make DSE track multiple paths

    DSE currently gives up when the path we analyze forks.  This leads
    to multiple missed dead store elimination PRs.  The following fixes
    this by recursing for each path and maintaining the visited bitmap
    to avoid visiting CFG re-merges multiple times.  The overall cost
    is still limited by the same bound, it's just more likely we'll hit
    the limit now.  The patch doesn't try to deal with byte tracking
    once a path forks but drops info on the floor and only handling
    fully dead stores in that case.

            PR tree-optimization/79958
            PR tree-optimization/109087
            PR tree-optimization/100314
            PR tree-optimization/114774
            * tree-ssa-dse.cc (dse_classify_store): New forwarder.
            (dse_classify_store): Add arguments cnt and visited, recurse
            to track multiple paths when we end up with multiple defs.

            * gcc.dg/tree-ssa/ssa-dse-48.c: New testcase.
            * gcc.dg/tree-ssa/ssa-dse-49.c: Likewise.
            * gcc.dg/tree-ssa/ssa-dse-50.c: Likewise.
            * gcc.dg/tree-ssa/ssa-dse-51.c: Likewise.
            * gcc.dg/graphite/pr80906.c: Avoid DSE of last data reference
            in loop.
            * g++.dg/ipa/devirt-24.C: Adjust for extra DSE.
            * g++.dg/warn/Wuninitialized-pr107919-1.C: Use more important
            -O2 optimization level, -O1 regresses.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1)  (inlining differences due to missed dse)
  2021-04-28 10:17 [Bug tree-optimization/100314] New: missed optimization for dead code elimination at -O3 (vs. -O1) zhendong.su at inf dot ethz.ch
                   ` (6 preceding siblings ...)
  2024-05-16  9:04 ` cvs-commit at gcc dot gnu.org
@ 2024-05-16  9:05 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-05-16  9:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100314

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |15.0
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Should be fixed for GCC 15.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-05-16  9:05 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-28 10:17 [Bug tree-optimization/100314] New: missed optimization for dead code elimination at -O3 (vs. -O1) zhendong.su at inf dot ethz.ch
2021-09-25 10:05 ` [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1) (inlining differences) pinskia at gcc dot gnu.org
2022-01-11  9:53 ` rguenth at gcc dot gnu.org
2022-01-11 10:36 ` hubicka at gcc dot gnu.org
2022-01-11 10:40 ` hubicka at gcc dot gnu.org
2023-08-18  2:32 ` [Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1) (inlining differences due to missed dse) pinskia at gcc dot gnu.org
2024-05-15 16:44 ` rguenth at gcc dot gnu.org
2024-05-16  9:04 ` cvs-commit at gcc dot gnu.org
2024-05-16  9:05 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).