public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/115256] New: [15 Regression] 502.gcc_r Run failed with '-march=native -Ofast -funroll-loops -flto' since r15-571-g1e0ae1f52741f7
@ 2024-05-28  9:05 admin at levyhsu dot com
  2024-05-28 10:46 ` [Bug tree-optimization/115256] " rguenth at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: admin at levyhsu dot com @ 2024-05-28  9:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115256

            Bug ID: 115256
           Summary: [15 Regression] 502.gcc_r Run failed with
                    '-march=native -Ofast -funroll-loops -flto' since
                    r15-571-g1e0ae1f52741f7
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: admin at levyhsu dot com
  Target Milestone: ---

Bisected down to r15-571-g1e0ae1f52741f7
(1e0ae1f52741f7e0133661659ed2d210f939a398)

tree-optimization/79958 - make DSE track multiple paths
DSE currently gives up when the path we analyze forks.  This leads
to multiple missed dead store elimination PRs.  The following fixes
this by recursing for each path and maintaining the visited bitmap
to avoid visiting CFG re-merges multiple times.  The overall cost
is still limited by the same bound, it's just more likely we'll hit
the limit now.  The patch doesn't try to deal with byte tracking
once a path forks but drops info on the floor and only handling
fully dead stores in that case.

        PR tree-optimization/79958
        PR tree-optimization/109087
        PR tree-optimization/100314
        PR tree-optimization/114774
        * tree-ssa-dse.cc (dse_classify_store): New forwarder.
        (dse_classify_store): Add arguments cnt and visited, recurse
        to track multiple paths when we end up with multiple defs.

        * gcc.dg/tree-ssa/ssa-dse-48.c: New testcase.
        * gcc.dg/tree-ssa/ssa-dse-49.c: Likewise.
        * gcc.dg/tree-ssa/ssa-dse-50.c: Likewise.
        * gcc.dg/tree-ssa/ssa-dse-51.c: Likewise.
        * gcc.dg/graphite/pr80906.c: Avoid DSE of last data reference
        in loop.
        * g++.dg/ipa/devirt-24.C: Adjust for extra DSE.
        * g++.dg/warn/Wuninitialized-pr107919-1.C: Use more important
        -O2 optimization level, -O1 regresses.
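
For illustration only, the kind of store the commit message above describes can be sketched as below. This is a hypothetical example, not one of the new testcases: the struct and function names are made up, and whether a given GCC version removes this particular store may also depend on other passes.

```c
/* Hypothetical type and function, used only to illustrate a store that is
   dead on every path after the control flow forks. */
struct pair { int a; int b; };

void set_a(struct pair *p, int c)
{
    p->a = 1;       /* overwritten on both branches before any read */
    if (c)
        p->a = 2;   /* path 1 kills the first store */
    else
        p->a = 3;   /* path 2 kills the first store as well */
}
```

A DSE that gives up when the path forks has to keep the first store; one that follows both paths can prove it dead.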

Observed on:
Ice Lake
Cascade Lake
Alder Lake
Zen3 Server/Client
Also fails on AArch64 (not bisected there).

* [Bug tree-optimization/115256] [15 Regression] 502.gcc_r Run failed with '-march=native -Ofast -funroll-loops -flto' since r15-571-g1e0ae1f52741f7
From: rguenth at gcc dot gnu.org @ 2024-05-28 10:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115256

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2024-05-28
   Target Milestone|---                         |15.0
           Keywords|                            |needs-reduction, wrong-code
     Ever confirmed|0                           |1
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Mine.

* [Bug tree-optimization/115256] [15 Regression] 502.gcc_r Run failed with '-march=native -Ofast -funroll-loops -flto' since r15-571-g1e0ae1f52741f7
From: rguenth at gcc dot gnu.org @ 2024-05-28 12:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115256

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed on Zen3 btw, fails with the test input already.  Note that this may
still be a latent issue in 502.gcc_r.  -funroll-loops isn't necessary;
-O3 -flto was enough to reproduce (no specific sub-architecture required).

-fno-strict-aliasing avoids the issue. 

--param dse-max-object-size=0 (which turns off live byte tracking) doesn't help.

The patch itself likely adds quite some extra DSE so that's too much to
track down.  DSE doesn't have a debug counter at the moment,
but "bisecting" --param dse-max-alias-queries-per-store shows the issue
still happens with 64 but not with 48.

The issue still reproduces with -flto-partition=1to1 (if one wants to
try per-TU compile flags) and with -flto-partition=one (if you want to
add a debug counter and bisect the bad store, but =one is slow).
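
As a rough sketch of the "bisecting" idea above (narrowing a numeric limit between a known-good and a known-bad value), assuming the failure is monotonic in the limit: every name below is hypothetical, and the predicate stands in for "rebuild with this limit and run the benchmark".

```c
/* Hypothetical stand-in for "rebuild with this limit and run the test":
   reports failure once the limit reaches the threshold passed via ctx. */
int fails_at_or_above(int limit, void *ctx)
{
    return limit >= *(const int *)ctx;
}

/* Return the smallest failing limit in (good, bad], assuming fails(good)
   is false, fails(bad) is true, and failure is monotonic in between. */
int bisect_limit(int good, int bad,
                 int (*fails)(int limit, void *ctx), void *ctx)
{
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (fails(mid, ctx))
            bad = mid;    /* still failing: shrink from above */
        else
            good = mid;   /* passes: shrink from below */
    }
    return bad;
}
```

Starting from good = 48 and bad = 64, this needs four rebuild-and-run steps to find the first failing value.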

We ICE in cfgloopmanip.c:create_preheader here:

basic_block
create_preheader (struct loop *loop, int flags)
{
  edge e, fallthru;
  basic_block dummy;
  int nentry = 0;
  bool irred = false;
  bool latch_edge_was_fallthru;
  edge one_succ_pred = NULL, single_entry = NULL;
  edge_iterator ei;

  FOR_EACH_EDGE (e, ei, loop->header->preds)
    { 
      if (e->src == loop->latch)
        continue;
      irred |= (e->flags & EDGE_IRREDUCIBLE_LOOP) != 0;
      nentry++;
      single_entry = e;
      if (single_succ_p (e->src))
        one_succ_pred = e;
    }
  gcc_assert (nentry);
^^^

Placing noinline on the above function still reproduces the issue.  We
seem to run the above for the loop tree root, but the call comes from
create_preheaders, which does

  FOR_EACH_LOOP (li, loop, 0)
    create_preheader (loop, flags);

(note absence of LI_INCLUDE_ROOT) so somehow the loop iterator setup
is broken.
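
To make the expectation above concrete, here is a much simplified model of an iterator setup that skips the tree root unless asked to include it. It is only a sketch: the flag name and function are hypothetical, it assumes the root is the loop numbered 0, and it is not the benchmark's actual iterator code.

```c
#include <stddef.h>

/* Hypothetical flag for this sketch; it only mirrors the role that
   LI_INCLUDE_ROOT plays in the discussion above. */
enum { SKETCH_INCLUDE_ROOT = 1 };

/* Copy the loop numbers that should be visited into OUT and return how
   many were stored.  Loop number 0 is assumed to be the tree root. */
size_t collect_loops(const int *loop_nums, size_t n, int flags, int *out)
{
    int min_num = (flags & SKETCH_INCLUDE_ROOT) ? 0 : 1;
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (loop_nums[i] >= min_num)
            out[count++] = loop_nums[i];
    return count;
}
```

With flags == 0 the root never lands in the visit list, so seeing create_preheader run on it suggests the list itself (or its element count) was corrupted.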

* [Bug tree-optimization/115256] [15 Regression] 502.gcc_r Run failed with '-march=native -Ofast -funroll-loops -flto' since r15-571-g1e0ae1f52741f7
From: admin at levyhsu dot com @ 2024-05-28 13:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115256

--- Comment #3 from Levy Hsu <admin at levyhsu dot com> ---
FYI we tried several combinations; -funroll-loops didn't cause the issue.  The
link-time optimization option -flto may have caused it: we pass with the
options [-march=native -Ofast -funroll-loops].

But compiling with -flto makes it harder to minimize a test case. So we're
still not clear what exactly the issue is.

* [Bug tree-optimization/115256] [15 Regression] 502.gcc_r Run failed with '-march=native -Ofast -funroll-loops -flto' since r15-571-g1e0ae1f52741f7
From: hongyuw at gcc dot gnu.org @ 2024-06-17  7:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115256

Hongyu Wang <hongyuw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hongyuw at gcc dot gnu.org

--- Comment #4 from Hongyu Wang <hongyuw at gcc dot gnu.org> ---
Part of the dump for create_preheaders before DSE

------
<bb 6> [local count: 29277718]:            
# .MEM_153 = PHI <.MEM_161(4), .MEM_161(3)>
# _15 = PHI <_17(4), 0(3)>                 
# _120 = PHI <_19(4), 0(3)>                
if (_120 == 0)                             
  goto <bb 10>; [45.64%]                   
else                                       
  goto <bb 7>; [54.36%]                    

<bb 7> [local count: 27536775]:            
# _66 = PHI <_15(6), _17(5)>               
# .MEM_125 = PHI <.MEM_153(6), .MEM_164(5)>
_87 = (long unsigned int) _66;             
_88 = _87 * 4;                             
_89 = _88 + 8;                             
_110 = _89;                                
# .MEM_167 = VDEF <.MEM_125>               
newmem_111 = malloc (_110);                
if (newmem_111 == 0B)                      
  goto <bb 8>; [0.04%]                     
else                                       
  goto <bb 9>; [99.96%]                    

<bb 8> [local count: 11015]:               
# .MEM_168 = VDEF <.MEM_167>               
xmalloc_failed (_110);                     

<bb 9> [local count: 27536775]:            
# .MEM_154 = PHI <.MEM_167(7), .MEM_168(8)>
# .MEM_170 = VDEF <.MEM_154>               
MEM[(struct vec_prefix *)newmem_111].alloc = _66;      
# .MEM_171 = VDEF <.MEM_170>                           
MEM[(struct vec_prefix *)newmem_111].num = 0;          

<bb 10> [local count: 39298950]:                       
# _91 = PHI <0B(6), newmem_111(9)>                     
# .MEM_152 = PHI <.MEM_153(6), .MEM_171(9)>            
# .MEM_174 = VDEF <.MEM_152>                           
li.to_visit = _91;                                     
# VUSE <.MEM_174>                                      
_61 = cfun;                                            
# VUSE <.MEM_174>                                      
_62 = _61->x_current_loops;                            
# VUSE <.MEM_174>                                      
_63 = _62->tree_root;                                  

<bb 11> [local count: 77159561]:                       
# aloop_80 = PHI <_63(10), _108(26)>                   
# .MEM_147 = PHI <.MEM_174(10), .MEM_90(26)>           

<bb 12> [local count: 701450557]:                      
# aloop_64 = PHI <aloop_80(11), _71(16)>               
# .MEM_148 = PHI <.MEM_147(11), .MEM_149(16)>          
# VUSE <.MEM_148>                                      
_65 = aloop_64->num;                                   
if (_65 > 0)                                           
  goto <bb 13>; [50.00%]                               
else                                                   
  goto <bb 16>; [50.00%]                               

<bb 13> [local count: 350725279]:                      
if (_91 != 0B)                                         
  goto <bb 14>; [70.00%]                               
else                                                   
  goto <bb 15>; [30.00%]                               

<bb 14> [local count: 245507696]:                      
_67 = &MEM[(struct VEC_int_heap *)_91].base;           

<bb 15> [local count: 350725279]:                      
# _68 = PHI <0B(13), _67(14)>                          
# VUSE <.MEM_148>                                      
_69 = _68->num;                                        
_70 = _69 + 1;                                         
# .MEM_179 = VDEF <.MEM_148>                           
_68->num = _70;                                        
# .MEM_180 = VDEF <.MEM_179>                           
MEM <struct VEC_int_base> [(int *)_68].vec[_69] = _65; 
------

The problem is with the stores to the malloc'ed memory:

MEM[(struct vec_prefix *)newmem_111].alloc = _66;                    
MEM[(struct vec_prefix *)newmem_111].num = 0;    

These two statements are marked as dead stores and eliminated, but there is
actually a use chain:

------
<bb 10> [local count: 39298950]:                       
# _91 = PHI <0B(6), newmem_111(9)>
# .MEM_152 = PHI <.MEM_153(6), .MEM_171(9)>
# .MEM_174 = VDEF <.MEM_152>               
li.to_visit = _91;
...

<bb 13> [local count: 350725279]:
if (_91 != 0B)                                         
  goto <bb 14>; [70.00%]                               
else                                                   
  goto <bb 15>; [30.00%]   

<bb 14> [local count: 245507696]:                      
_67 = &MEM[(struct VEC_int_heap *)_91].base; 

<bb 15> [local count: 350725279]:                      
# _68 = PHI <0B(13), _67(14)>                          
# VUSE <.MEM_148>                                      
_69 = _68->num;
_70 = _69 + 1;                                         
# .MEM_179 = VDEF <.MEM_148>                           
_68->num = _70;                                        
# .MEM_180 = VDEF <.MEM_179>                           
MEM <struct VEC_int_base> [(int *)_68].vec[_69] = _65;
------

The source code has an implicit type cast:

li->to_visit = (VEC_int_heap_alloc(number_of_loops () ));
...
(VEC_int_base_quick_push(((li->to_visit) ? &(li->to_visit)->base :
0),aloop->num ));

This casts the malloc'ed pointer newmem_111 to (struct VEC_int_heap *), and then
casts it again to (int *) in VEC_int_base_quick_push. But the stores to
newmem_111 were made through a (struct vec_prefix *) cast in VEC_int_heap_alloc.

Since such casting violates the aliasing rules, -fno-strict-aliasing may be
needed; otherwise TBAA cannot tell that the stores and the later use actually
alias.
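
A minimal sketch of the pattern, using simplified stand-ins for the types named in the dump (the layouts, the flexible array member, and the function names are assumptions for illustration, and the VEC_int_heap wrapper is omitted). The code shown is the variant that initialises the header through the same struct type the push later reads; the punned variant from the benchmark is only described in a comment.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the types named in the dump. */
struct vec_prefix   { unsigned num; unsigned alloc; };
struct VEC_int_base { unsigned num; unsigned alloc; int vec[]; };

/* Allocate room for N ints and initialise the header through
   struct VEC_int_base, the type the push routine reads it through.

   The pattern discussed above instead initialises the header as
     ((struct vec_prefix *) mem)->alloc = n;
     ((struct vec_prefix *) mem)->num = 0;
   and later reads it through a different struct type, which type-based
   alias analysis may treat as unrelated accesses. */
struct VEC_int_base *vec_int_alloc(unsigned n)
{
    struct VEC_int_base *v = malloc(sizeof *v + n * sizeof(int));
    if (v == NULL)
        return NULL;
    v->alloc = n;
    v->num = 0;
    return v;
}

/* Append X; the caller must ensure num < alloc. */
void vec_int_quick_push(struct VEC_int_base *v, int x)
{
    v->vec[v->num++] = x;
}
```

Keeping the initialising stores and the later loads on the same type is one way to avoid depending on -fno-strict-aliasing.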

Patch r15-571 allows dse_classify_store to analyze multiple vdefs; prior to the
patch it directly returned DSE_STORE_MAYBE_PARTIAL_DEAD for multiple vdefs in
this case, so the issue was merely exposed by the patch.

* [Bug tree-optimization/115256] [15 Regression] 502.gcc_r Run failed with '-march=native -Ofast -funroll-loops -flto' since r15-571-g1e0ae1f52741f7
From: rguenth at gcc dot gnu.org @ 2024-06-17 12:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115256

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |INVALID

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Thanks for analyzing.  Indeed, older GCC sources had this sort of issue, so
this is invalid.
