[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

public inbox for gcc-bugs@sourceware.org

From: "tnfchris at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.
Date: Wed, 05 Jun 2024 09:42:44 +0000
Message-ID: <bug-114932-4-9hZSVXq2oK@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114932-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

--- Comment #9 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
It's taken me a bit of time to track down all the reasons for the speedup with
the earlier patch.

The speedup comes from two sources:

1. Signed IVs don't get simplified.  Because signed overflow is undefined
behaviour, GIMPLE expressions are not simplified when their type is signed.

However, for addressing modes this doesn't matter: even after the constants are
simplified, any potential overflow can still happen.  Secondly, most
architectures state that you can never reach the full address-space range
anyway.  Those that do (such as Arm and AArch64, which offer bare-metal
variants) explicitly specify that overflow is defined as wrapping around.  That
means that IVs used for addressing in IVopts should be safe to simplify as if
they were unsigned.

I have a patch that, during the creation of IV candidates, folds them as
unsigned and then converts them back to their original signed types.  This
preserves all of the original overflow analysis and keeps the typing in GIMPLE
correct.

2. The second problem is that, because Fortran has no unsigned types, the
front end generates a signed IV.  Some optimizations can convert these to
unsigned as a side effect of folding; extract_muldiv is one place where this
happens.

This can leave us with the same IV in both signed and unsigned form, as is
the case here:

<Invariant Expressions>:
inv_expr 1:     stride.3_27 * 4
inv_expr 2:     (unsigned long) stride.3_27 * 4

These end up being used in the same group:

Group 1:
  cand  cost    compl.  inv.expr.       inv.vars
  1     0       0       NIL;    6
  2     0       0       NIL;    6
  3     0       0       NIL;    6
  4     0       0       NIL;    6

which results in IVopts picking both the signed and the unsigned IV:

Improved to:
  cost: 24 (complexity 3)
  reg_cost: 9
  cand_cost: 15
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6, 8
   group:0 --> iv_cand:6, cost=(0,1)
   group:1 --> iv_cand:1, cost=(0,0)
   group:2 --> iv_cand:8, cost=(0,1)
   group:3 --> iv_cand:8, cost=(0,1)
  invariant variables: 6
  invariant expressions: 1, 2

and so it generates the same IV in both signed and unsigned form:

;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq 58.2545), maybe hot
;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080) (FALLTHRU,EXECUTABLE)
;;                25 [always]  count:191126046 (estimated locally, freq 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE)
  # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
  # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
  # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
  # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>

...

;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq 58.2545), maybe hot
;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)
;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909) (FALLTHRU)
;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq 19.4182) (TRUE_VALUE,EXECUTABLE)
;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq 12.9455) (TRUE_VALUE,EXECUTABLE)
  # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
  ivtmp.22_82 = ivtmp.22_41 + 1;
  ivtmp.26_72 = ivtmp.26_51 + _80;
  ivtmp.28_98 = ivtmp.28_90 + _39;

These two IVs are always used as unsigned, so IVopts generates:

  _73 = stride.3_27 * 4;
  _80 = (unsigned long) _73;
  _54 = (unsigned long) stride.3_27;
  _39 = _54 * 4;

This means that in, e.g., exchange2 we generate a lot of duplicate code.
I'm not yet sure how to fix this; I think I need to know how the IV values are
used.

Given that the signed IV is used as unsigned, the two should be the same.
