Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jeff Law <law@redhat.com>
To: JiangNing OS <jiangning@os.amperecomputing.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
Date: Mon, 17 Jun 2019 21:59:00 -0000	[thread overview]
Message-ID: <cd4c88a2-155a-6c35-fb1c-c6b82eb7b2ef@redhat.com> (raw)
In-Reply-To: <BL0PR0102MB3588BB0559DF4ECE564C898C9C440@BL0PR0102MB3588.prod.exchangelabs.com>

On 3/14/19 10:46 PM, JiangNing OS wrote:
> This patch is to fix a missing ifcvt opportunity in back-end. For the simple case below,
> 
> if (...)
>     x = a;  /* x is memory */
> /* no else */
> 
> We can generate conditional move and remove the branch as below if the target cost is acceptable. 
> 
> r1 = x
> r2 = a
> cmp ...
> csel r3, r1, r2, cond
> x = r3
> 
> This could be safe if x is a stack variable, and there isn't any address taken in current function, so the store speculation can be avoided. 
So at a high level should we be doing this in gimple rather than RTL?
We're going to have a lot more information about types, better
infrastructure for looking at uses/defs, access to the alias oracle, we
should be able to accurately distinguish between potentially shared
objects vs those which are local to the thread, etc.  We lose the low
level costing information though.

I'm still going to go through the patch and do some level of review, but
I do think we need to answer the higher level question though.


>  
> Index: ifcvt.c
> ===================================================================
> --- ifcvt.c	(revision 269698)
> +++ ifcvt.c	(working copy)
> @@ -2029,6 +2045,93 @@
>    return true;
>  }
>  
> +/* Return true if MEM x is on stack. a_insn contains x, if it exists. */
Nits: We typically refer to parameters, variables, etc in comments using
upper case.  You'll need to review the entire patch for these its.

So perhaps the comment should be something like:

/* Return true of X, a MEM expression, is on the stack.  A_INSN contains
   X if A_INSN exists.  */


Just from a design standpoint, what are the consequences if this
function returns true for something that isn't actually in the stack or
false for something that is on the stack?



> +
> +static bool
> +noce_mem_is_on_stack (rtx_insn *a_insn, const_rtx x)
> +{
> +  df_ref use;
> +
> +  gcc_assert (x);
> +  gcc_assert (MEM_P (x));
> +
> +  /* Early exits if find base is a stack register. */
> +  rtx a = XEXP (x, 0);
> +  if (fixed_base_plus_p(a))
> +    return true;
Nit: Space between the function name and its open paren for arguments.  ie

if (fixed_base_plus_p (a)
                     ^
I see other instances of this nit, please review the patch and correct them.


> +
> +  if (!a_insn)
> +    return false;
I'm not sure what the calling profile is for this function, but this is
a cheaper test, so you might consider moving it before the test of
fixed_base_plus_p.


> +
> +  if (!reg_mentioned_p (x, a_insn))
> +    return false;
> +
> +  /* Check if x is on stack. Assume a mem expression using registers
> +     related to stack register is always on stack. */
> +  FOR_EACH_INSN_USE (use, a_insn)
> +    if (reg_mentioned_p (DF_REF_REG (use), x)
> +        && bitmap_bit_p (bba_sets_must_be_sfp, DF_REF_REGNO (use)))
> +      return true;
> +
> +  return false;
> +}
So is X always a MEM?      Just wanted to make sure I understand.
reg_mentioned_p will do the right thing (using rtx_equal_p) for the
comparison.


> +
> +/* Always return true, if there is a dominating write.
> +   
> +   When there is a dominating read from memory on stack,
> +   1) if x = a is a memory read, return true.
> +   2) if x = a is a memory write, return true if the memory is on stack.
> +      This is the guarantee the memory is *not* readonly. */
> +
> +static bool
> +noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn,
> +                           const_rtx x, bool is_store)
> +{
> +  rtx_insn *insn;
> +  rtx set;
> +
> +  gcc_assert (MEM_P (x));
> +
> +  FOR_BB_INSNS (bb, insn)
> +    {
> +      set = single_set (insn);
> +      if (!set)
> +        continue;
> +
> +      /* Dominating store */
> +      if (rtx_equal_p (x, SET_DEST (set)))
> +        return true;
> +
> +      /* Dominating load */
> +      if (rtx_equal_p (x, SET_SRC (set)))
> +        if (is_store && noce_mem_is_on_stack (a_insn, x))
> +          return true;
> +    }
> +
> +  return false;
> +}
So what would be the consequences here of returning false when in fact
there was a dominating read or write?  That could easily happen if the
dominating read or write was not a single_set.

I'm guessing that from a design standpoint you're trying to find cases
where you know an object was written to within the block and does not
escape.  So returning false when there was a dominating write is safe.
Returning true when there was not would be disastrous.  Right?





> @@ -5347,6 +5462,234 @@
>  
>    return FALSE;
>  }
> +
> +
> +/* Find all insns that must be stack address and store REGNO into
> +   bitmap bba_sets_must_be_sfp. */
> +
> +static void
> +find_all_must_be_sfp_insns (void)
> +{
> +  rtx_insn *a_insn;
> +  basic_block bb;
> +  bool sfp_found;
> +  auto_vec<int, 1> def_count;
> +  df_ref def, use;
> +
> +  /* Calculate def counts for each insn. */
> +  def_count.safe_grow_cleared (max_reg_num () + 1);
> +  FOR_ALL_BB_FN (bb, cfun)
> +    FOR_BB_INSNS (bb, a_insn)
> +      {
> +        if (!INSN_P (a_insn))
> +          continue;
> +
> +        FOR_EACH_INSN_DEF (def, a_insn)
> +          def_count[DF_REF_REGNO (def)]++;
> +      }Is there a reason why you can't use the information computed by reginfo?




> +
> +  /* Iterate all insns until finding all stack pointers, and store
> +     them into bba_sets_must_be_sfp. */
> +  bba_sets_must_be_sfp = BITMAP_ALLOC (&reg_obstack);
> +  do
> +    {
> +      sfp_found = false;
> +      FOR_ALL_BB_FN (bb, cfun)
> +        FOR_BB_INSNS (bb, a_insn)
> +          {
> +            rtx sset_a = single_set (a_insn);
> +            if (!sset_a)
> +              continue;
> +            rtx src = SET_SRC (sset_a);
> +            rtx dest = SET_DEST (sset_a);
So again, we can get things that aren't a single_set here and they'll be
ignored.  But I believe that's safe as you're trying to identify insns
which store stack addresses into a REG.  Missing a more complicated case
where a stack address is stored into a REG is safe.  Right?




> +
> +            if (!REG_P (dest))
> +              continue;
> +
> +            /* For the case below,
> +                 Control flow: B1->B3, B2->B3
> +                 B1: p = &local_var
> +                 B2: p = &global_var
> +                 B3: ... = *p
> +               pointer p is an address for either local or global variable.
> +               so we don't treat p as a stack address pointer. To make
> +               algorithm simple, we ignore all non-SSA cases. */
> +            bool skip_insn = false;
> +            unsigned int dest_regno = 0;
> +            FOR_EACH_INSN_DEF (def, a_insn)
> +              {
> +                dest_regno = DF_REF_REGNO (def);
> +                /* Skip current insn if
> +                   1) already marked as stack pointer.
> +                   2) or see multiple definition points. */
> +                if (bitmap_bit_p (bba_sets_must_be_sfp, dest_regno)
> +                    || def_count[dest_regno] > 1)
> +                  {
> +                    skip_insn = true;
> +                    break;
> +                  }
> +              }
> +            if (skip_insn)
> +              continue;
Right.  Again I wonder if you should be using the info computed by
regstat.  You can get single use information easily from that.

> +
> +            /* Handle case like "x1 = sp + offset". */
> +            if (fixed_base_plus_p (src))
> +              {
> +  	        bitmap_set_bit (bba_sets_must_be_sfp, dest_regno);
> +                sfp_found = true;
> +                continue;
> +              }
Looks like your you've got an indentation problem here.  Most likely
you've got a case where you've got 8 spaces that should be replaced with
a tab.


> +
> +            /* Handle case like "x2 = x1 + offset", in which x1 is already
> +               identified as a stack pointer. */
> +            if ((GET_CODE (src) == PLUS || GET_CODE (src) == MINUS)
> +                && CONST_INT_P (XEXP (src, 1)))
> +              {
> +                rtx x1 = XEXP (src, 0);
> +                if (!REG_P (x1))
> +                  continue;
> +
> +                FOR_EACH_INSN_USE (use, a_insn)
> +                  {
> +                    if (!rtx_equal_p (x1, DF_REF_REG (use)))
> +                      continue;
> +
> +                    if (bitmap_bit_p (bba_sets_must_be_sfp, DF_REF_REGNO (use)))
> +                      {
> +                        bitmap_set_bit (bba_sets_must_be_sfp, dest_regno);
> +                        sfp_found = true;
> +                        break;
> +                      }
> +                  }
So how much is there to be gained from going from this kind of localized
computation to global?  It shouldn't be hard to turn this into a truly
global algorithm, but I'm not sure it's worth the effort.

I _do_ think it's worth visiting blocks in dominator order.  It's better
than what you're doing, but nowhere near as expensive as a global
propagation engine.

Is it worth handling simple copies in the code above?


> +  	      }
> +          }
> +    }
> +  while (sfp_found);
> +}
> +
> +/* Find all insns that may be stack pointer and store REGNO into
> +   bitmap bba_sets_may_be_sfp. We iterate all insns in current func
> +   until no more latent insns generating stack address are found.
> +   The collection of stack pointer bba_sets_may_be_sfp will be used
> +   to help analyze local stack variable address taken. Stack variable
> +   address can be passed out of current frame if only any stack pointer
> +   is passed to hard register or stored into memory. */
Shouldn't "may be stack pointer" be "must be stack pointer"?


> +
> +static void
> +find_all_may_be_sfp_insns (void)
> +{
> +  rtx_insn *a_insn;
> +  basic_block bb;
> +  bool sfp_found;
> +  bba_sets_may_be_sfp = BITMAP_ALLOC (&reg_obstack);
> +
> +  /* Iterate all insns that may be stack pointers, and store them into
> +     bitmap bba_sets_must_be_sfp. */
> +  do
> +    {
> +      sfp_found = false;
> +      FOR_ALL_BB_FN (bb, cfun)
Again, you may ultimately be better off with a dominator walk here.


> +        FOR_BB_INSNS (bb, a_insn)
> +          {
> +            /* Assume stack pointers can only be generated by single SET. */
> +            rtx sset_a = single_set (a_insn);
> +            if (!sset_a)
> +              continue;
> +            rtx src = SET_SRC (sset_a);
> +            rtx dest = SET_DEST (sset_a);
I'm not sure that's a safe assumption.


> +
> +            /* If we see "hard_register = ...", which implies passing out
> +               of current frame potentially, we don't collect this case.
> +               It can be treated as address taken in no_stack_address_taken */
> +            if (HARD_REGISTER_P (dest))
> +              continue;
> +
> +            /* Memory load and store are not to generate stack pointer. */
> +            if (MEM_P (src) || MEM_P (dest))
> +              continue;
> +
> +            /* Skip insn that is already identified as stack pointer. */
> +            bool skip_insn = false;
> +            df_ref def;
> +            unsigned int dest_regno = 0;
> +            FOR_EACH_INSN_DEF (def, a_insn)
> +              {
> +                dest_regno = DF_REF_REGNO (def);
> +                if (bitmap_bit_p (bba_sets_may_be_sfp, dest_regno))
> +                  {
> +                    skip_insn = true;
> +                    break;
> +                  }
> +              }
So earlier you've verified you've got a single set.  I'm not sure you
need an iterator over the defs.  Can't you just look at SET_DEST
(sset_a) which is conveniently in DEST?


I don't like the term "stack pointer" here -- most of us read "stack
pointer" and we think the register holding the current stack pointer.
Would "pointer into the stack" be more appropriate?


> +/* Return true if current function doesn't pass stack address out of frame. */
> +
> +static bool
> +no_stack_address_taken (void)
> +{
> +  basic_block bb;
> +  rtx_insn *a_insn;
> +  df_ref use;
> +
> +  FOR_ALL_BB_FN (bb, cfun)
> +    FOR_BB_INSNS (bb, a_insn)
> +      {
> +        if (!INSN_P (a_insn))
> +          continue;
> +
> +        rtx sset_a = single_set (a_insn);
> +        if (!sset_a)
> +          continue;
> +        rtx src = SET_SRC (sset_a);
> +        rtx dest = SET_DEST (sset_a);
> +
> +        /* Skip insn that is already identified as frame pointers. */
> +        if (bitmap_bit_p (bba_sets_may_be_sfp, REGNO (dest)))
> +          continue;
> +                    
> +        FOR_EACH_INSN_USE (use, a_insn)
> +          {
> +            /* Check if current insn is using any stack pointer. Ignore
> +               if it doesn't use frame pointers at all. */
> +            if (!bitmap_bit_p (bba_sets_may_be_sfp, DF_REF_REGNO (use)))
> +              continue;
> +
> +            /* Load cannot introduce address taken. */
> +            if (MEM_P (src))
> +              continue;
> +
> +            /* Store is safe if only the src doesn't contain stack pointer. */
> +            if (MEM_P (dest)
> +                && !reg_mentioned_p (DF_REF_REG (use), src))
> +              continue;
> +
> +            /* All other cases potentially introduce address taken. */
> +            return false;
> +          }
So what if you have a PARALLEL where one entry causes an escape of a
pointer into the stack and another entry does some other operation?  If
so, don't you miss the fact that you've got an escaping pointer into the
stack?  I don't think you can restrict yourself to single_sets here.

Jeff

next prev parent reply	other threads:[~2019-06-17 21:59 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-15  8:22 JiangNing OS
2019-03-18 21:53 ` Jeff Law
2019-04-30 16:00 ` Jeff Law
2019-05-05  0:11   ` JiangNing OS
2019-05-24 14:54 ` Kyrill Tkachov
2019-06-17  0:36   ` JiangNing OS
2019-06-17 21:59 ` Jeff Law [this message]
2019-06-20  9:53   ` JiangNing OS
2019-06-20 23:13     ` Jeff Law
2019-06-21 12:24       ` Richard Biener
2019-06-21 16:10         ` Jeff Law
2019-07-08  7:43           ` JiangNing OS
2019-07-12  5:22             ` Andrew Pinski
2019-07-12 16:11               ` Jeff Law
2019-07-12 16:40             ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cd4c88a2-155a-6c35-fb1c-c6b82eb7b2ef@redhat.com \
    --to=law@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jiangning@os.amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).