public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [PATCH] Don't combine param and return value copies
  2015-05-23 11:20 [PATCH] Don't combine param and return value copies Alan Modra
@ 2015-05-23 11:20 ` Andrew Pinski
  2015-05-23 19:40 ` Segher Boessenkool
  1 sibling, 0 replies; 8+ messages in thread
From: Andrew Pinski @ 2015-05-23 11:20 UTC (permalink / raw)
  To: GCC Patches, Segher Boessenkool

On Sat, May 23, 2015 at 1:33 AM, Alan Modra <amodra@gmail.com> wrote:
> This stops combine messing with parameter and return value copies
> from/to hard registers.  Bootstrapped and regression tested
> powerpc64le-linux, powerpc64-linux and x86_64-linux.  In looking at a
> number of different powerpc64le gcc/*.o files, I noticed a few code
> generation improvements.  There were cases where a register copy was
> no longer needed, cmp used in place of mr., and rlwinm instead of
> rldicl.  x86_64 gcc/*.o showed no changes (apart from combine.o of
> course), but should see a small improvement in compile time with this
> change.

I noticed this a while back but I never got around to fixing this.
This should also improve other RISC targets like AARCH64 and MIPS
where I had saw issues like this too.

Thanks,
Andrew

>
> The "clear next_use when !can_combine_p" change is to fix a non-bug.
> Given
>
> 1) insn defining rn
>    ...
> 2) insn defining rn but !can_combine_p
>    ...
> 3) insn using rn
>
> then create_log_links might create a link from (3) to (1), I thought.
> However, can_combine_p doesn't currently allow this to happen.
> Obviously, any can_combine_p result depending on regno shouldn't give
> a different result at (1) and (2), but there is also at test of
> DF_REF_PRE_POST_MODIFY that can.  The saving grace is that pre/post
> modify insns also use the register, which means next_use[rn] will
> point at (2), not (3), when (1) is processed.
>
> I came across this because at one stage I considered modifying
> can_combine_p.  Someone who does so in the future might trigger the
> above problem, so I thought it worth posting the change.
>
> OK for mainline?
>
>         * combine.c (set_return_regs): New function.
>         (twiddle_first_block, twiddle_last_block): New functions.
>         (create_log_links): Exclude instructions copying parameter
>         values from hard regs to pseudos, and instructions copying
>         return value pseudos to hard regs.  Clear next_use when
>         !can_combine_p.
>
> Index: gcc/combine.c
> ===================================================================
> --- gcc/combine.c       (revision 223463)
> +++ gcc/combine.c       (working copy)
> @@ -1048,9 +1048,79 @@ can_combine_use_p (df_ref use)
>    return true;
>  }
>
> -/* Fill in log links field for all insns.  */
> +/* Used to build set of return value regs.  Add X to the set.  */
>
>  static void
> +set_return_regs (rtx x, void *arg)
> +{
> +  HARD_REG_SET *regs = (HARD_REG_SET *) arg;
> +
> +  add_to_hard_reg_set (regs, GET_MODE (x), REGNO (x));
> +}
> +
> +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the first block BB
> +   that we don't want to combine with other instructions.  */
> +
> +static void
> +twiddle_first_block (basic_block bb, basic_block to)
> +{
> +  rtx_insn *insn;
> +
> +  FOR_BB_INSNS (bb, insn)
> +    {
> +      if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG)
> +       break;
> +      if (!NONDEBUG_INSN_P (insn))
> +       continue;
> +
> +      /* reg,reg copies from parameter hard regs.  */
> +      rtx set = single_set (insn);
> +      if (set
> +         && REG_P (SET_DEST (set))
> +         && REG_P (SET_SRC (set))
> +         && HARD_REGISTER_P (SET_SRC (set)))
> +       set_block_for_insn (insn, to);
> +    }
> +}
> +
> +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the last block BB
> +   that we don't want to combine with other instructions.  */
> +
> +static void
> +twiddle_last_block (basic_block bb, basic_block to, HARD_REG_SET return_regs)
> +{
> +  rtx_insn *insn;
> +
> +  FOR_BB_INSNS_REVERSE (bb, insn)
> +    {
> +      if (CALL_P (insn))
> +       break;
> +      if (!NONDEBUG_INSN_P (insn))
> +       continue;
> +
> +      rtx reg = NULL_RTX;
> +      /* use_return_regster added USEs.  */
> +      if (GET_CODE (PATTERN (insn)) == USE)
> +       reg = XEXP (PATTERN (insn), 0);
> +      else
> +       {
> +         /* reg,reg copies that set return value hard regs.  */
> +         rtx set = single_set (insn);
> +         if (set && REG_P (SET_SRC (set)))
> +           reg = SET_DEST (set);
> +       }
> +      if (reg
> +         && REG_P (reg)
> +         && HARD_REGISTER_P (reg)
> +         && overlaps_hard_reg_set_p (return_regs,
> +                                     GET_MODE (reg), REGNO (reg)))
> +       set_block_for_insn (insn, to);
> +    }
> +}
> +
> +/* Fill in log links field for all insns that we wish to combine.  */
> +
> +static void
>  create_log_links (void)
>  {
>    basic_block bb;
> @@ -1057,9 +1127,28 @@ create_log_links (void)
>    rtx_insn **next_use;
>    rtx_insn *insn;
>    df_ref def, use;
> +  HARD_REG_SET return_regs;
>
>    next_use = XCNEWVEC (rtx_insn *, max_reg_num ());
>
> +  /* Don't combine instructions copying parameter values from hard
> +     regs to pseudos.  Exclude such instructions from LOG_LINKS by
> +     temporarily zapping BLOCK_FOR_INSN.  */
> +
> +  bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
> +  twiddle_first_block (bb, 0);
> +
> +  /* Similarly, don't combine instructions copying return values
> +     from pseudos to hard regs.  */
> +
> +  CLEAR_HARD_REG_SET (return_regs);
> +  diddle_return_value (set_return_regs, &return_regs);
> +  if (!hard_reg_set_empty_p (return_regs))
> +    {
> +      bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
> +      twiddle_last_block (bb, 0, return_regs);
> +    }
> +
>    /* Pass through each block from the end, recording the uses of each
>       register and establishing log links when def is encountered.
>       Note that we do not clear next_use array in order to save time,
> @@ -1087,12 +1176,12 @@ create_log_links (void)
>                if (!next_use[regno])
>                  continue;
>
> +             use_insn = next_use[regno];
> +             next_use[regno] = NULL;
> +
>               if (!can_combine_def_p (def))
>                 continue;
>
> -             use_insn = next_use[regno];
> -             next_use[regno] = NULL;
> -
>               if (BLOCK_FOR_INSN (use_insn) != bb)
>                 continue;
>
> @@ -1103,7 +1192,7 @@ create_log_links (void)
>                  we might wind up changing the semantics of the insn,
>                  even if reload can make what appear to be valid
>                  assignments later.  */
> -             if (regno < FIRST_PSEUDO_REGISTER
> +             if (HARD_REGISTER_NUM_P (regno)
>                   && asm_noperands (PATTERN (use_insn)) >= 0)
>                 continue;
>
> @@ -1124,6 +1213,16 @@ create_log_links (void)
>          }
>      }
>
> +  /* Repair BLOCK_FOR_INSN.  */
> +
> +  bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
> +  twiddle_first_block (bb, bb);
> +  if (!hard_reg_set_empty_p (return_regs))
> +    {
> +      bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
> +      twiddle_last_block (bb, bb, return_regs);
> +    }
> +
>    free (next_use);
>  }
>
>
> --
> Alan Modra
> Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] Don't combine param and return value copies
@ 2015-05-23 11:20 Alan Modra
  2015-05-23 11:20 ` Andrew Pinski
  2015-05-23 19:40 ` Segher Boessenkool
  0 siblings, 2 replies; 8+ messages in thread
From: Alan Modra @ 2015-05-23 11:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

This stops combine messing with parameter and return value copies
from/to hard registers.  Bootstrapped and regression tested
powerpc64le-linux, powerpc64-linux and x86_64-linux.  In looking at a
number of different powerpc64le gcc/*.o files, I noticed a few code
generation improvements.  There were cases where a register copy was
no longer needed, cmp used in place of mr., and rlwinm instead of
rldicl.  x86_64 gcc/*.o showed no changes (apart from combine.o of
course), but should see a small improvement in compile time with this
change.

The "clear next_use when !can_combine_p" change is to fix a non-bug.
Given

1) insn defining rn
   ...
2) insn defining rn but !can_combine_p
   ...
3) insn using rn

then create_log_links might create a link from (3) to (1), I thought.
However, can_combine_p doesn't currently allow this to happen.
Obviously, any can_combine_p result depending on regno shouldn't give
a different result at (1) and (2), but there is also at test of
DF_REF_PRE_POST_MODIFY that can.  The saving grace is that pre/post
modify insns also use the register, which means next_use[rn] will
point at (2), not (3), when (1) is processed.

I came across this because at one stage I considered modifying
can_combine_p.  Someone who does so in the future might trigger the
above problem, so I thought it worth posting the change.

OK for mainline?

	* combine.c (set_return_regs): New function.
	(twiddle_first_block, twiddle_last_block): New functions.
	(create_log_links): Exclude instructions copying parameter
	values from hard regs to pseudos, and instructions copying
	return value pseudos to hard regs.  Clear next_use when
	!can_combine_p.

Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 223463)
+++ gcc/combine.c	(working copy)
@@ -1048,9 +1048,79 @@ can_combine_use_p (df_ref use)
   return true;
 }
 
-/* Fill in log links field for all insns.  */
+/* Used to build set of return value regs.  Add X to the set.  */
 
 static void
+set_return_regs (rtx x, void *arg)
+{
+  HARD_REG_SET *regs = (HARD_REG_SET *) arg;
+
+  add_to_hard_reg_set (regs, GET_MODE (x), REGNO (x));
+}
+
+/* Twiddle BLOCK_FOR_INSN to TO for instructions in the first block BB
+   that we don't want to combine with other instructions.  */
+
+static void
+twiddle_first_block (basic_block bb, basic_block to)
+{
+  rtx_insn *insn;
+
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG)
+	break;
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+
+      /* reg,reg copies from parameter hard regs.  */
+      rtx set = single_set (insn);
+      if (set
+	  && REG_P (SET_DEST (set))
+	  && REG_P (SET_SRC (set))
+	  && HARD_REGISTER_P (SET_SRC (set)))
+	set_block_for_insn (insn, to);
+    }
+}
+
+/* Twiddle BLOCK_FOR_INSN to TO for instructions in the last block BB
+   that we don't want to combine with other instructions.  */
+
+static void
+twiddle_last_block (basic_block bb, basic_block to, HARD_REG_SET return_regs)
+{
+  rtx_insn *insn;
+
+  FOR_BB_INSNS_REVERSE (bb, insn)
+    {
+      if (CALL_P (insn))
+	break;
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+
+      rtx reg = NULL_RTX;
+      /* use_return_regster added USEs.  */
+      if (GET_CODE (PATTERN (insn)) == USE)
+	reg = XEXP (PATTERN (insn), 0);
+      else
+	{
+	  /* reg,reg copies that set return value hard regs.  */
+	  rtx set = single_set (insn);
+	  if (set && REG_P (SET_SRC (set)))
+	    reg = SET_DEST (set);
+	}
+      if (reg
+	  && REG_P (reg)
+	  && HARD_REGISTER_P (reg)
+	  && overlaps_hard_reg_set_p (return_regs,
+				      GET_MODE (reg), REGNO (reg)))
+	set_block_for_insn (insn, to);
+    }
+}
+
+/* Fill in log links field for all insns that we wish to combine.  */
+
+static void
 create_log_links (void)
 {
   basic_block bb;
@@ -1057,9 +1127,28 @@ create_log_links (void)
   rtx_insn **next_use;
   rtx_insn *insn;
   df_ref def, use;
+  HARD_REG_SET return_regs;
 
   next_use = XCNEWVEC (rtx_insn *, max_reg_num ());
 
+  /* Don't combine instructions copying parameter values from hard
+     regs to pseudos.  Exclude such instructions from LOG_LINKS by
+     temporarily zapping BLOCK_FOR_INSN.  */
+
+  bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
+  twiddle_first_block (bb, 0);
+
+  /* Similarly, don't combine instructions copying return values
+     from pseudos to hard regs.  */
+
+  CLEAR_HARD_REG_SET (return_regs);
+  diddle_return_value (set_return_regs, &return_regs);
+  if (!hard_reg_set_empty_p (return_regs))
+    {
+      bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
+      twiddle_last_block (bb, 0, return_regs);
+    }
+
   /* Pass through each block from the end, recording the uses of each
      register and establishing log links when def is encountered.
      Note that we do not clear next_use array in order to save time,
@@ -1087,12 +1176,12 @@ create_log_links (void)
               if (!next_use[regno])
                 continue;
 
+	      use_insn = next_use[regno];
+	      next_use[regno] = NULL;
+
 	      if (!can_combine_def_p (def))
 		continue;
 
-	      use_insn = next_use[regno];
-	      next_use[regno] = NULL;
-
 	      if (BLOCK_FOR_INSN (use_insn) != bb)
 		continue;
 
@@ -1103,7 +1192,7 @@ create_log_links (void)
 		 we might wind up changing the semantics of the insn,
 		 even if reload can make what appear to be valid
 		 assignments later.  */
-	      if (regno < FIRST_PSEUDO_REGISTER
+	      if (HARD_REGISTER_NUM_P (regno)
 		  && asm_noperands (PATTERN (use_insn)) >= 0)
 		continue;
 
@@ -1124,6 +1213,16 @@ create_log_links (void)
         }
     }
 
+  /* Repair BLOCK_FOR_INSN.  */
+
+  bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
+  twiddle_first_block (bb, bb);
+  if (!hard_reg_set_empty_p (return_regs))
+    {
+      bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
+      twiddle_last_block (bb, bb, return_regs);
+    }
+
   free (next_use);
 }
 

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Don't combine param and return value copies
  2015-05-23 11:20 [PATCH] Don't combine param and return value copies Alan Modra
  2015-05-23 11:20 ` Andrew Pinski
@ 2015-05-23 19:40 ` Segher Boessenkool
  2015-05-25  5:24   ` Alan Modra
  1 sibling, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2015-05-23 19:40 UTC (permalink / raw)
  To: gcc-patches

Hi Alan,

On Sat, May 23, 2015 at 06:03:05PM +0930, Alan Modra wrote:
> This stops combine messing with parameter and return value copies
> from/to hard registers.  Bootstrapped and regression tested
> powerpc64le-linux, powerpc64-linux and x86_64-linux.  In looking at a
> number of different powerpc64le gcc/*.o files, I noticed a few code
> generation improvements.  There were cases where a register copy was
> no longer needed, cmp used in place of mr., and rlwinm instead of
> rldicl.  x86_64 gcc/*.o showed no changes (apart from combine.o of
> course), but should see a small improvement in compile time with this
> change.

Thanks.  Did you see improvements around return as well, or mostly /
only related to the function start?

> The "clear next_use when !can_combine_p" change is to fix a non-bug.
> Given
> 
> 1) insn defining rn
>    ...
> 2) insn defining rn but !can_combine_p
>    ...
> 3) insn using rn
> 
> then create_log_links might create a link from (3) to (1), I thought.
> However, can_combine_p doesn't currently allow this to happen.
> Obviously, any can_combine_p result depending on regno shouldn't give
> a different result at (1) and (2), but there is also at test of
> DF_REF_PRE_POST_MODIFY that can.  The saving grace is that pre/post
> modify insns also use the register, which means next_use[rn] will
> point at (2), not (3), when (1) is processed.
> 
> I came across this because at one stage I considered modifying
> can_combine_p.  Someone who does so in the future might trigger the
> above problem, so I thought it worth posting the change.

I agree it looks like a bug waiting to happen.  Please post it as a
separate patch though.

> +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the first block BB
> +   that we don't want to combine with other instructions.  */
> +
> +static void
> +twiddle_first_block (basic_block bb, basic_block to)

I don't like this much.  Messing with global state makes things harder
to change later.  If it is hard to come up with a good name for a
factor, it usually means it is not such a good factor.

You can do these inside can_combine_{def,use}_p as far as I can see?
Probably need to give those an extra parameter then: for _def, a bool
that says "don't allow moves from hard regs", set when the scan has
encountered a NOTE_INSN_FUNCTION_BEG in this bb; and for _use, a regset
of those hard regs we don't want to allow moves to (those seen in USEs
later in that same block).  Or do it in the main loop itself, _{def,use}
are each called in one place only; whatever works best.

What makes return values different from CALL args here, btw?  Why do
you not want to combine return values, but you leave alone call args?

> +/* Fill in log links field for all insns that we wish to combine.  */

Please don't change this comment; it was more correct before.

> @@ -1103,7 +1192,7 @@ create_log_links (void)
>  		 we might wind up changing the semantics of the insn,
>  		 even if reload can make what appear to be valid
>  		 assignments later.  */
> -	      if (regno < FIRST_PSEUDO_REGISTER
> +	      if (HARD_REGISTER_NUM_P (regno)
>  		  && asm_noperands (PATTERN (use_insn)) >= 0)
>  		continue;

An independent change.


Segher

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Don't combine param and return value copies
  2015-05-23 19:40 ` Segher Boessenkool
@ 2015-05-25  5:24   ` Alan Modra
  2015-05-25  7:30     ` Alan Modra
  2015-05-25 20:19     ` Segher Boessenkool
  0 siblings, 2 replies; 8+ messages in thread
From: Alan Modra @ 2015-05-25  5:24 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On Sat, May 23, 2015 at 08:45:23AM -0500, Segher Boessenkool wrote:
> Thanks.  Did you see improvements around return as well, or mostly /
> only related to the function start?

The rlwinm vs. rldicl change was in dwarf2out.c add_ranges_num for r3
one insn before a blr.  I'm sure that was due to not combining return
value copies.  In fact, I'm sure all the improvements I saw were due
to changing the exit..  See below.

[snip]
> I agree it looks like a bug waiting to happen.  Please post it as a
> separate patch though.

OK.

> > +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the first block BB
> > +   that we don't want to combine with other instructions.  */
> > +
> > +static void
> > +twiddle_first_block (basic_block bb, basic_block to)
> 
> I don't like this much.  Messing with global state makes things harder
> to change later.  If it is hard to come up with a good name for a
> factor, it usually means it is not such a good factor.
> 
> You can do these inside can_combine_{def,use}_p as far as I can see?
> Probably need to give those an extra parameter then: for _def, a bool
> that says "don't allow moves from hard regs", set when the scan has
> encountered a NOTE_INSN_FUNCTION_BEG in this bb; and for _use, a regset
> of those hard regs we don't want to allow moves to (those seen in USEs
> later in that same block).  Or do it in the main loop itself, _{def,use}
> are each called in one place only; whatever works best.

Huh, that's the way I started writing this patch..  The reason I went
with modifying BLOCK_FOR_INSN is that the loop setting up LOG_LINKS
needs to test that anyway.  Changing code in the main loop means
every insn in a function will need to be tested for additional
conditions.  I thought that might be slower.  I can see your point
though, especially if someone wanted to wean combine off LOG_LINKS.
I'll rewrite the patch, which I realized was buggy anyway, and didn't
prevent param copies from being combined.  :-(

> What makes return values different from CALL args here, btw?  Why do
> you not want to combine return values, but you leave alone call args?

I don't think there is much difference between SETs for return values
and SETs for call args.  The reason I deliberately left them out is
that in the original discussion we were talking about parameters and
return values.  So I thought I'd better restrict the change to just
those SETs.

It would be a much simpler patch to make combine ignore all SETs
that are a reg,reg copy with one of them a hard reg.  I was a little
worried I'd regress some target if I tried that.  (Results on
powerpc64le-linux for such a change look good.  A lot more cases where
code is better, due to catching the parameter reg,reg copies.  In
looking at gcc/*.o I haven't yet seen any regressions in code quality.)

> > +/* Fill in log links field for all insns that we wish to combine.  */
> 
> Please don't change this comment; it was more correct before.

But it wasn't correct!  LOG_LINKS are not set up for *all* insns.
Perhaps I should add "that we might wish to combine" rather than
"that we wish to combine"?

> > @@ -1103,7 +1192,7 @@ create_log_links (void)
> >  		 we might wind up changing the semantics of the insn,
> >  		 even if reload can make what appear to be valid
> >  		 assignments later.  */
> > -	      if (regno < FIRST_PSEUDO_REGISTER
> > +	      if (HARD_REGISTER_NUM_P (regno)
> >  		  && asm_noperands (PATTERN (use_insn)) >= 0)
> >  		continue;
> 
> An independent change.

Yeah, I guess.  I tidy these if I'm working on a piece of code.
Here's the whole file done.  Boostrapped and regression tested
powerpc64le-linux and x86_64-linux.

	* combine.c: Use HARD_REGISTER_P and HARD_REGISTER_NUM_P as
	appropriate throughout file in place of comparing regno
	against FIRST_PSEUDO_REGISTER.

Index: combine.c
===================================================================
--- combine.c	(revision 223463)
+++ combine.c	(working copy)
@@ -1103,7 +1103,7 @@ create_log_links (void)
 		 we might wind up changing the semantics of the insn,
 		 even if reload can make what appear to be valid
 		 assignments later.  */
-	      if (regno < FIRST_PSEUDO_REGISTER
+	      if (HARD_REGISTER_NUM_P (regno)
 		  && asm_noperands (PATTERN (use_insn)) >= 0)
 		continue;
 
@@ -1730,7 +1730,7 @@ set_nonzero_bits_and_sign_copies (rtx x, const_rtx
   rtx_insn *insn = (rtx_insn *) data;
 
   if (REG_P (x)
-      && REGNO (x) >= FIRST_PSEUDO_REGISTER
+      && !HARD_REGISTER_P (x)
       /* If this register is undefined at the start of the file, we can't
 	 say what its contents were.  */
       && ! REGNO_REG_SET_P
@@ -1897,7 +1897,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i
 			  && (REGNO (XEXP (i3elt, 0)) == regno
 			      ? reg_set_between_p (XEXP (elt, 0),
 						   PREV_INSN (insn), i3)
-			      : regno >= FIRST_PSEUDO_REGISTER))
+			      : !HARD_REGISTER_NUM_P (regno)))
 			return 0;
 		    }
 		  while (--i >= 0);
@@ -1971,7 +1971,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i
       || (CALL_P (i3)
 	  && (find_reg_fusage (i3, USE, dest)
 	      || (REG_P (dest)
-		  && REGNO (dest) < FIRST_PSEUDO_REGISTER
+		  && HARD_REGISTER_P (dest)
 		  && global_regs[REGNO (dest)])))
       /* Don't substitute into an incremented register.  */
       || FIND_REG_INC_NOTE (i3, dest)
@@ -2021,7 +2021,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i
 	 register.  */
 
       if (REG_P (src)
-	  && ((REGNO (dest) < FIRST_PSEUDO_REGISTER
+	  && ((HARD_REGISTER_P (dest)
 	       && ! HARD_REGNO_MODE_OK (REGNO (dest), GET_MODE (dest)))
 	      /* Don't extend the life of a hard register unless it is
 		 user variable (if we have few registers) or it can't
@@ -2030,7 +2030,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i
 		 Also avoid substituting a return register into I3, because
 		 reload can't handle a conflict with constraints of other
 		 inputs.  */
-	      || (REGNO (src) < FIRST_PSEUDO_REGISTER
+	      || (HARD_REGISTER_P (src)
 		  && ! HARD_REGNO_MODE_OK (REGNO (src), GET_MODE (src)))))
 	return 0;
     }
@@ -2052,7 +2052,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i
 	     we leave it up to the machine description to either accept or
 	     reject use-and-clobber patterns.  */
 	  if (!REG_P (reg)
-	      || REGNO (reg) >= FIRST_PSEUDO_REGISTER
+	      || !HARD_REGISTER_P (reg)
 	      || !fixed_regs[REGNO (reg)])
 	    if (reg_overlap_mentioned_p (reg, src))
 	      return 0;
@@ -2075,7 +2075,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i
      to be an explicit register variable, and was chosen for a reason.  */
 
   if (GET_CODE (src) == ASM_OPERANDS
-      && REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER)
+      && REG_P (dest) && HARD_REGISTER_P (dest))
     return 0;
 
   /* If INSN contains volatile references (specifically volatile MEMs),
@@ -2221,7 +2221,7 @@ combinable_i3pat (rtx_insn *i3, rtx *loc, rtx i2de
 	     checks this; here, we do a more specific test for this case.  */
 
 	  || (REG_P (inner_dest)
-	      && REGNO (inner_dest) < FIRST_PSEUDO_REGISTER
+	      && HARD_REGISTER_P (inner_dest)
 	      && (! HARD_REGNO_MODE_OK (REGNO (inner_dest),
 					GET_MODE (inner_dest))))
 	  || (i1_not_in_src && reg_overlap_mentioned_p (i1dest, src))
@@ -2469,7 +2469,7 @@ can_change_dest_mode (rtx x, int added_sets, machi
   regno = REGNO (x);
   /* Allow hard registers if the new mode is legal, and occupies no more
      registers than the old mode.  */
-  if (regno < FIRST_PSEUDO_REGISTER)
+  if (HARD_REGISTER_NUM_P (regno))
     return (HARD_REGNO_MODE_OK (regno, mode)
 	    && REG_NREGS (x) >= hard_regno_nregs[regno][mode]);
 
@@ -2790,7 +2790,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
 
   if (i1 == 0 && NONJUMP_INSN_P (i3) && GET_CODE (PATTERN (i3)) == SET
       && REG_P (SET_SRC (PATTERN (i3)))
-      && REGNO (SET_SRC (PATTERN (i3))) >= FIRST_PSEUDO_REGISTER
+      && !HARD_REGISTER_P (SET_SRC (PATTERN (i3)))
       && find_reg_note (i3, REG_DEAD, SET_SRC (PATTERN (i3)))
       && GET_CODE (PATTERN (i2)) == PARALLEL
       && ! side_effects_p (SET_DEST (PATTERN (i3)))
@@ -3219,7 +3219,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
 		{
 		  unsigned int regno = REGNO (newpat_dest);
 		  compare_mode = new_mode;
-		  if (regno < FIRST_PSEUDO_REGISTER)
+		  if (HARD_REGISTER_NUM_P (regno))
 		    newpat_dest = gen_rtx_REG (compare_mode, regno);
 		  else
 		    {
@@ -3588,7 +3588,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
 	      machine_mode old_mode = GET_MODE (i2dest);
 	      rtx ni2dest;
 
-	      if (REGNO (i2dest) < FIRST_PSEUDO_REGISTER)
+	      if (HARD_REGISTER_P (i2dest))
 		ni2dest = gen_rtx_REG (new_mode, REGNO (i2dest));
 	      else
 		{
@@ -3604,7 +3604,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
 	      m_split_insn = combine_split_insns (parallel, i3);
 
 	      if (m_split_insn == 0
-		  && REGNO (i2dest) >= FIRST_PSEUDO_REGISTER)
+		  && !HARD_REGISTER_P (i2dest))
 		{
 		  struct undo *buf;
 
@@ -3725,7 +3725,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
 	     validated that we can do this.  */
 	  if (GET_MODE (i2dest) != split_mode && split_mode != VOIDmode)
 	    {
-	      if (REGNO (i2dest) < FIRST_PSEUDO_REGISTER)
+	      if (HARD_REGISTER_P (i2dest))
 		newdest = gen_rtx_REG (split_mode, REGNO (i2dest));
 	      else
 		{
@@ -5390,7 +5390,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int i
 
 		  if (code == SUBREG
 		      && REG_P (to)
-		      && REGNO (to) < FIRST_PSEUDO_REGISTER
+		      && HARD_REGISTER_P (to)
 		      && simplify_subreg_regno (REGNO (to), GET_MODE (to),
 						SUBREG_BYTE (x),
 						GET_MODE (x)) < 0)
@@ -6645,7 +6645,7 @@ simplify_set (rtx x)
 	      unsigned int regno = REGNO (dest);
 	      rtx new_dest;
 
-	      if (regno < FIRST_PSEUDO_REGISTER)
+	      if (HARD_REGISTER_NUM_P (regno))
 		new_dest = gen_rtx_REG (compare_mode, regno);
 	      else
 		{
@@ -6750,7 +6750,7 @@ simplify_set (rtx x)
 	< GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))))
 #endif
 #ifdef CANNOT_CHANGE_MODE_CLASS
-      && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER
+      && ! (REG_P (dest) && HARD_REGISTER_P (dest)
 	    && REG_CANNOT_CHANGE_MODE_P (REGNO (dest),
 					 GET_MODE (SUBREG_REG (src)),
 					 GET_MODE (src)))
@@ -9815,7 +9815,7 @@ reg_nonzero_bits_for_combine (const_rtx x, machine
 	   && rsp->last_set_label < label_tick)
 	  || (rsp->last_set_label == label_tick
               && DF_INSN_LUID (rsp->last_set) < subst_low_luid)
-	  || (REGNO (x) >= FIRST_PSEUDO_REGISTER
+	  || (!HARD_REGISTER_P (x)
 	      && REGNO (x) < reg_n_sets_max
 	      && REG_N_SETS (REGNO (x)) == 1
 	      && !REGNO_REG_SET_P
@@ -9879,7 +9879,7 @@ reg_num_sign_bit_copies_for_combine (const_rtx x,
 	   && rsp->last_set_label < label_tick)
 	  || (rsp->last_set_label == label_tick
               && DF_INSN_LUID (rsp->last_set) < subst_low_luid)
-	  || (REGNO (x) >= FIRST_PSEUDO_REGISTER
+	  || (!HARD_REGISTER_P (x)
 	      && REGNO (x) < reg_n_sets_max
 	      && REG_N_SETS (REGNO (x)) == 1
 	      && !REGNO_REG_SET_P
@@ -12921,7 +12921,7 @@ record_truncated_value (rtx x)
     }
   /* ??? For hard-regs we now record everything.  We might be able to
      optimize this using last_set_mode.  */
-  else if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER)
+  else if (REG_P (x) && HARD_REGISTER_P (x))
     truncated_mode = GET_MODE (x);
   else
     return false;
@@ -13012,7 +13012,7 @@ get_last_value_validate (rtx *loc, rtx_insn *insn,
 	  if (rsp->last_set_invalid
 	      /* If this is a pseudo-register that was only set once and not
 		 live at the beginning of the function, it is always valid.  */
-	      || (! (regno >= FIRST_PSEUDO_REGISTER
+	      || (! (!HARD_REGISTER_NUM_P (regno)
 		     && regno < reg_n_sets_max
 		     && REG_N_SETS (regno) == 1
 		     && (!REGNO_REG_SET_P
@@ -13129,7 +13129,7 @@ get_last_value (const_rtx x)
 
   if (value == 0
       || (rsp->last_set_label < label_tick_ebb_start
-	  && (regno < FIRST_PSEUDO_REGISTER
+	  && (HARD_REGISTER_NUM_P (regno)
 	      || regno >= reg_n_sets_max
 	      || REG_N_SETS (regno) != 1
 	      || REGNO_REG_SET_P
@@ -13257,7 +13257,7 @@ reg_dead_at_p (rtx reg, rtx_insn *insn)
   /* Check that reg isn't mentioned in NEWPAT_USED_REGS.  For fixed registers
      we allow the machine description to decide whether use-and-clobber
      patterns are OK.  */
-  if (reg_dead_regno < FIRST_PSEUDO_REGISTER)
+  if (HARD_REGISTER_NUM_P (reg_dead_regno))
     {
       for (i = reg_dead_regno; i < reg_dead_endregno; i++)
 	if (!fixed_regs[i] && TEST_HARD_REG_BIT (newpat_used_regs, i))
@@ -13331,7 +13331,7 @@ mark_used_regs_combine (rtx x)
       regno = REGNO (x);
       /* A hard reg in a wide mode may really be multiple registers.
 	 If so, mark all of them just like the first.  */
-      if (regno < FIRST_PSEUDO_REGISTER)
+      if (HARD_REGISTER_NUM_P (regno))
 	{
 	  /* None of this applies to the stack, frame or arg pointers.  */
 	  if (regno == STACK_POINTER_REGNUM
@@ -13449,7 +13449,7 @@ move_deaths (rtx x, rtx maybe_kill_insn, int from_
 	     including X.  In that case, we must put REG_DEAD notes for
 	     the remaining registers in place of NOTE.  */
 
-	  if (note != 0 && regno < FIRST_PSEUDO_REGISTER
+	  if (note != 0 && HARD_REGISTER_NUM_P (regno)
 	      && (GET_MODE_SIZE (GET_MODE (XEXP (note, 0)))
 		  > GET_MODE_SIZE (GET_MODE (x))))
 	    {
@@ -13472,7 +13472,7 @@ move_deaths (rtx x, rtx maybe_kill_insn, int from_
 		    || (note != 0
 			&& (GET_MODE_SIZE (GET_MODE (XEXP (note, 0)))
 			    < GET_MODE_SIZE (GET_MODE (x)))))
-		   && regno < FIRST_PSEUDO_REGISTER
+		   && HARD_REGISTER_NUM_P (regno)
 		   && REG_NREGS (x) > 1)
 	    {
 	      unsigned int ourend = END_REGNO (x);
@@ -13588,7 +13588,7 @@ reg_bitfield_target_p (rtx x, rtx body)
 	return 0;
 
       tregno = REGNO (target), regno = REGNO (x);
-      if (tregno >= FIRST_PSEUDO_REGISTER || regno >= FIRST_PSEUDO_REGISTER)
+      if (!HARD_REGISTER_NUM_P (tregno) || !HARD_REGISTER_NUM_P (regno))
 	return target == x;
 
       endtregno = end_hard_regno (GET_MODE (target), tregno);
@@ -13922,7 +13922,7 @@ distribute_notes (rtx notes, rtx_insn *from_insn,
 		     TEM_INSN is doing.  If so, delete TEM_INSN.  Otherwise, make this
 		     into a REG_UNUSED note instead. Don't delete sets to
 		     global register vars.  */
-		  if ((REGNO (XEXP (note, 0)) >= FIRST_PSEUDO_REGISTER
+		  if ((!HARD_REGISTER_P (XEXP (note, 0))
 		       || !global_regs[REGNO (XEXP (note, 0))])
 		      && reg_set_p (XEXP (note, 0), PATTERN (tem_insn)))
 		    {

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Don't combine param and return value copies
  2015-05-25  5:24   ` Alan Modra
@ 2015-05-25  7:30     ` Alan Modra
  2015-05-25 20:19     ` Segher Boessenkool
  1 sibling, 0 replies; 8+ messages in thread
From: Alan Modra @ 2015-05-25  7:30 UTC (permalink / raw)
  To: Segher Boessenkool, gcc-patches

On Mon, May 25, 2015 at 10:26:35AM +0930, Alan Modra wrote:
> looking at gcc/*.o I haven't yet seen any regressions in code quality.)

Well that didn't last very long.  There are regressions, and just from
looking at disassembled object files it would appear to be frame
pointer related.  So the simple patch won't work.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Don't combine param and return value copies
  2015-05-25  5:24   ` Alan Modra
  2015-05-25  7:30     ` Alan Modra
@ 2015-05-25 20:19     ` Segher Boessenkool
  2015-05-26  8:15       ` Alan Modra
  1 sibling, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2015-05-25 20:19 UTC (permalink / raw)
  To: gcc-patches

On Mon, May 25, 2015 at 10:26:35AM +0930, Alan Modra wrote:
> On Sat, May 23, 2015 at 08:45:23AM -0500, Segher Boessenkool wrote:
> > Thanks.  Did you see improvements around return as well, or mostly /
> > only related to the function start?
> 
> The rlwinm vs. rldicl change was in dwarf2out.c add_ranges_num for r3
> one insn before a blr.  I'm sure that was due to not combining return
> value copies.  In fact, I'm sure all the improvements I saw were due
> to changing the exit..  See below.
> 
> > > +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the first block BB
> > > +   that we don't want to combine with other instructions.  */
> > > +
> > > +static void
> > > +twiddle_first_block (basic_block bb, basic_block to)
> > 
> > I don't like this much.  Messing with global state makes things harder
> > to change later.  If it is hard to come up with a good name for a
> > factor, it usually means it is not such a good factor.
> > 
> > You can do these inside can_combine_{def,use}_p as far as I can see?
> > Probably need to give those an extra parameter then: for _def, a bool
> > that says "don't allow moves from hard regs", set when the scan has
> > encountered a NOTE_INSN_FUNCTION_BEG in this bb; and for _use, a regset
> > of those hard regs we don't want to allow moves to (those seen in USEs
> > later in that same block).  Or do it in the main loop itself, _{def,use}
> > are each called in one place only; whatever works best.
> 
> Huh, that's the way I started writing this patch..  The reason I went
> with modifying BLOCK_FOR_INSN is that the loop setting up LOG_LINKS
> needs to test that anyway.  Changing code in the main loop means
> every insn in a function will need to be tested for additional
> conditions.  I thought that might be slower.  I can see your point
> though, especially if someone wanted to wean combine off LOG_LINKS.

The setup of the LOG_LINKS is a simple fast linear loop, much less
work than the rest of combine.  So don't worry about performance too
much :-)

> > What makes return values different from CALL args here, btw?  Why do
> > you not want to combine return values, but you leave alone call args?
> 
> I don't think there is much difference between SETs for return values
> and SETs for call args.  The reason I deliberately left them out is
> that in the original discussion we were talking about parameters and
> return values.  So I thought I'd better restrict the change to just
> those SETs.
> 
> It would be a much simpler patch to make combine ignore all SETs
> that are a reg,reg copy with one of them a hard reg.  I was a little
> worried I'd regress some target if I tried that.

_All_ moves to/from hard regs includes much more (register asm, fixed
registers, maybe some targets do weird things as well?).  So yes I share
those worries.

Since we do not want any other pass (before RA) to foul up these
register moves either, maybe a better solution would be to mark them
some way at expand time?

> (Results on
> powerpc64le-linux for such a change look good.  A lot more cases where
> code is better, due to catching the parameter reg,reg copies.  In
> looking at gcc/*.o I haven't yet seen any regressions in code quality.)

That is good news :-)

> > > +/* Fill in log links field for all insns that we wish to combine.  */
> > 
> > Please don't change this comment; it was more correct before.
> 
> But it wasn't correct!  LOG_LINKS are not set up for *all* insns.

It "sets" all LOG_LINKS to some value ("sets", it doesn't actually
zero them at the start, it has an assert for that though).

> > > @@ -1103,7 +1192,7 @@ create_log_links (void)
> > >  		 we might wind up changing the semantics of the insn,
> > >  		 even if reload can make what appear to be valid
> > >  		 assignments later.  */
> > > -	      if (regno < FIRST_PSEUDO_REGISTER
> > > +	      if (HARD_REGISTER_NUM_P (regno)
> > >  		  && asm_noperands (PATTERN (use_insn)) >= 0)
> > >  		continue;
> > 
> > An independent change.
> 
> Yeah, I guess.  I tidy these if I'm working on a piece of code.

Pretty far away in this case ;-)

> Here's the whole file done.  Boostrapped and regression tested
> powerpc64le-linux and x86_64-linux.
> 
> 	* combine.c: Use HARD_REGISTER_P and HARD_REGISTER_NUM_P as
> 	appropriate throughout file in place of comparing regno
> 	against FIRST_PSEUDO_REGISTER.

Have we decided we want to convert the whole compiler to this?  It
is a pretty lousy interface IMO: HARD_REGISTER_P does not check if
its arg is a register; it does not check if its arg is a hard register
either (it checks if it is not a pseudo); it cannot be used in all
places in all passes because of that (which means, not in all macros
and hooks either).


Segher

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Don't combine param and return value copies
  2015-05-25 20:19     ` Segher Boessenkool
@ 2015-05-26  8:15       ` Alan Modra
  2015-05-28 15:33         ` Segher Boessenkool
  0 siblings, 1 reply; 8+ messages in thread
From: Alan Modra @ 2015-05-26  8:15 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On powerpc64le, modifying the way combine treats function parameters
and call arguments results in some regressions.

For instance, this testcase from varasm.c

extern int foo3 (void *, ...);
extern void foo4 (void *, const char *);
int
emit_tls_common (void *decl,
		 const char *name,
		 unsigned long size)
{
  foo3 (0, "\t%s\t", "..");
  foo4 (0, name);
  foo3 (0, ",%lu,%u\n", size, ((unsigned int *)decl)[88] / 8);
  return 1;
}

at -O2 produces for the prologue and first call

old				new
	mflr 0				mflr 0
	std 29,-24(1)			std 29,-24(1)
	std 30,-16(1)			std 30,-16(1)
	mr 29,4				addis 9,2,.LC0@toc@ha
	std 31,-8(1)			std 31,-8(1)
	addis 4,2,.LC1@toc@ha		addis 10,2,.LC1@toc@ha
	mr 31,5				addi 9,9,.LC0@toc@l
	addis 5,2,.LC0@toc@ha		addi 10,10,.LC1@toc@l
	mr 30,3				mr 30,3
	addi 5,5,.LC0@toc@l		mr 29,4
	addi 4,4,.LC1@toc@l		mr 31,5
	li 3,0				mr 4,10
	std 0,16(1)			mr 5,9
	stdu 1,-128(1)			std 0,16(1)
	bl foo3				stdu 1,-128(1)
	nop				li 3,0
					bl foo3
					nop

As you can see, we have some extra register shuffling from keeping a
pseudo for arg setup insns.  I guess the pseudos allow sched more
freedom to mess around..

On the positive side, I saw cases where keeping parameter pseudos
allowed shrink-wrap to occur.  varasm.c:decode_reg_name_and_count is
one of them.  More shrink-wrapping is a big win.

Here's a case where changes at the return result in poorer code
int
decl_readonly_section_1 (int category)
{
  switch (category)
    {
    case 1:
    case 2:
    case 3:
    case 4:
    case 5:
      return 1;
    default:
      return 0;
    }
}
old			new
	addi 9,3,-6		addi 9,3,-6
	neg 3,3			neg 3,3
	and 3,9,3		and 3,9,3
	rldicl 3,3,33,63	srwi 3,3,31
	blr			rldicl 3,3,0,32
				blr

Previously this:
(insn 35 34 36 2 (set (reg:SI 161)
        (lshiftrt:SI (reg:SI 164)
            (const_int 31 [0x1f]))) {lshrsi3})
(insn 36 35 23 2 (set (reg:DI 155 [ D.2441 ])
        (zero_extend:DI (reg:SI 161))) {zero_extendsidi2})
(insn 23 36 24 2 (set (reg/i:DI 3 3)
        (reg:DI 155 [ D.2441 ])) {*movdi_internal64})

is first combined to
(insn 35 34 36 2 (set (reg:SI 161)
        (lshiftrt:SI (reg:SI 164)
            (const_int 31 [0x1f]))) {lshrsi3})
(insn 23 35 24 2 (set (reg/i:DI 3 3)
	(and:DI (subreg:DI (reg:SI 161) 0)
	    (const_int 1 [0x1]))))
which is somewhat surprising, but from my previous forays into
combine.c I'd say happens due to known zero bits.  (Just looking at
dumps here, rather than in gdb.)

Then the above is further combined to
(insn 23 34 24 2 (set (reg/i:DI 3 3)
	(zero_extract:DI (subreg:DI (reg:SI 164) 0)
	    (const_int 1 [0x1])
	    (const_int 32 [0x20])))

Looks to me like a missed optimization opportunity that insns 35 and
36 aren't combined without first going through the intermediate step.


Anyway, here's the rewritten patch.  I've left in some knobs I used
when testing in case you want to see for yourself what happens with
various options.  Bootstrapped etc. powerpc64le-linux and
x86_64-linux.  One testsuite regression appears on powerpc64 which
should go away if I push on getting an old patch of mine committed
https://gcc.gnu.org/ml/gcc-patches/2013-08/msg00971.html

	* combine.c (set_return_regs): New function.
	(create_log_links): Exclude instructions copying parameter
	values from hard regs to pseudos, and instructions copying
	call argument and return value pseudos to hard regs.

Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 223463)
+++ gcc/combine.c	(working copy)
@@ -1048,6 +1048,19 @@ can_combine_use_p (df_ref use)
   return true;
 }
 
+/* Used to build set of return value regs.  Add X to the set.  */
+
+static void
+set_return_regs (rtx x, void *arg)
+{
+  HARD_REG_SET *regs = (HARD_REG_SET *) arg;
+
+  add_to_hard_reg_set (regs, GET_MODE (x), REGNO (x));
+}
+
+#define DONT_COMBINE_PARAMS 1
+#define DONT_COMBINE_CALL_ARGS 1
+
 /* Fill in log links field for all insns.  */
 
 static void
@@ -1057,9 +1070,13 @@ create_log_links (void)
   rtx_insn **next_use;
   rtx_insn *insn;
   df_ref def, use;
+  HARD_REG_SET hard_regs;
 
   next_use = XCNEWVEC (rtx_insn *, max_reg_num ());
 
+  CLEAR_HARD_REG_SET (hard_regs);
+  diddle_return_value (set_return_regs, &hard_regs);
+
   /* Pass through each block from the end, recording the uses of each
      register and establishing log links when def is encountered.
      Note that we do not clear next_use array in order to save time,
@@ -1069,10 +1086,37 @@ create_log_links (void)
      usage -- these are taken from original flow.c did. Don't ask me why it is
      done this way; I don't know and if it works, I don't want to know.  */
 
-  FOR_EACH_BB_FN (bb, cfun)
+  bool in_parameters = false;
+  FOR_EACH_BB_REVERSE_FN (bb, cfun)
     {
       FOR_BB_INSNS_REVERSE (bb, insn)
         {
+	  if (DONT_COMBINE_PARAMS
+	      && NOTE_P (insn)
+	      && NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG)
+	    {
+	      in_parameters = true;
+	      continue;
+	    }
+
+	  if (CALL_P (insn))
+	    {
+	      /* Add function args passed in hard regs to HARD_REGS.  */
+	      CLEAR_HARD_REG_SET (hard_regs);
+	      if (DONT_COMBINE_CALL_ARGS)
+		for (rtx link = CALL_INSN_FUNCTION_USAGE (insn);
+		     link;
+		     link = XEXP (link, 1))
+		  if (GET_CODE (XEXP (link, 0)) == USE)
+		    {
+		      rtx reg = XEXP (XEXP (link, 0), 0);
+		      if (REG_P (reg)
+			  && HARD_REGISTER_P (reg))
+			add_to_hard_reg_set (&hard_regs,
+					     GET_MODE (reg), REGNO (reg));
+		    }
+	    }
+
           if (!NONDEBUG_INSN_P (insn))
             continue;
 
@@ -1079,13 +1123,27 @@ create_log_links (void)
 	  /* Log links are created only once.  */
 	  gcc_assert (!LOG_LINKS (insn));
 
+	  /* Exclude certain reg,reg copies when one is a hard reg.
+	     Leaving these insns alone provides the register allocator
+	     useful information.  */
+	  rtx reg_reg = 0;
+	  int hard = 0;
+	  if (GET_CODE (PATTERN (insn)) == SET)
+	    {
+	      reg_reg = PATTERN (insn);
+	      if (REG_P (SET_DEST (reg_reg))
+		  && REG_P (SET_SRC (reg_reg)))
+		hard = (HARD_REGISTER_P (SET_DEST (reg_reg))
+			- HARD_REGISTER_P (SET_SRC (reg_reg)));
+	    }
+
 	  FOR_EACH_INSN_DEF (def, insn)
-            {
-              unsigned int regno = DF_REF_REGNO (def);
-              rtx_insn *use_insn;
+	    {
+	      unsigned int regno = DF_REF_REGNO (def);
+	      rtx_insn *use_insn;
 
-              if (!next_use[regno])
-                continue;
+	      if (!next_use[regno])
+		continue;
 
 	      if (!can_combine_def_p (def))
 		continue;
@@ -1096,6 +1154,14 @@ create_log_links (void)
 	      if (BLOCK_FOR_INSN (use_insn) != bb)
 		continue;
 
+	      if (in_parameters && hard < 0)
+		{
+		  /* This is a reg,reg copy from a hard reg to a pseudo,
+		     such as those copying parameter registers to
+		     pseudos.  Don't set up LOG_LINKS to this insn.  */
+		  continue;
+		}
+
 	      /* flow.c claimed:
 
 		 We don't build a LOG_LINK for hard registers contained
@@ -1110,18 +1176,36 @@ create_log_links (void)
 	      /* Don't add duplicate links between instructions.  */
 	      struct insn_link *links;
 	      FOR_EACH_LOG_LINK (links, use_insn)
-	        if (insn == links->insn && regno == links->regno)
+		if (insn == links->insn && regno == links->regno)
 		  break;
 
 	      if (!links)
 		LOG_LINKS (use_insn)
 		  = alloc_insn_link (insn, regno, LOG_LINKS (use_insn));
-            }
+	    }
 
 	  FOR_EACH_INSN_USE (use, insn)
-	    if (can_combine_use_p (use))
-	      next_use[DF_REF_REGNO (use)] = insn;
+	    {
+	      if (hard > 0
+		  && overlaps_hard_reg_set_p (hard_regs,
+					      GET_MODE (SET_DEST (reg_reg)),
+					      REGNO (SET_DEST (reg_reg))))
+		{
+		  /* This is a reg,reg copy to a hard register, such
+		     as those setting the function return value, or
+		     setting up arguments for a function call.  Don't
+		     provide LOG_LINKS from this insn.  This prevents
+		     the insn defining the pseudo from being combined
+		     into the reg,reg copy insn.  */
+		  remove_from_hard_reg_set (&hard_regs,
+					    GET_MODE (SET_DEST (reg_reg)),
+					    REGNO (SET_DEST (reg_reg)));
+		}
+	      else if (can_combine_use_p (use))
+		next_use[DF_REF_REGNO (use)] = insn;
+	    }
         }
+      CLEAR_HARD_REG_SET (hard_regs);
     }
 
   free (next_use);


-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Don't combine param and return value copies
  2015-05-26  8:15       ` Alan Modra
@ 2015-05-28 15:33         ` Segher Boessenkool
  0 siblings, 0 replies; 8+ messages in thread
From: Segher Boessenkool @ 2015-05-28 15:33 UTC (permalink / raw)
  To: gcc-patches

On Tue, May 26, 2015 at 04:37:46PM +0930, Alan Modra wrote:
> On powerpc64le, modifying the way combine treats function parameters
> and call arguments results in some regressions.
> 
> For instance, this testcase from varasm.c
> 
> extern int foo3 (void *, ...);
> extern void foo4 (void *, const char *);
> int
> emit_tls_common (void *decl,
> 		 const char *name,
> 		 unsigned long size)
> {
>   foo3 (0, "\t%s\t", "..");
>   foo4 (0, name);
>   foo3 (0, ",%lu,%u\n", size, ((unsigned int *)decl)[88] / 8);
>   return 1;
> }
> 
> at -O2 produces for the prologue and first call
> 
> old				new
> 	mflr 0				mflr 0
> 	std 29,-24(1)			std 29,-24(1)
> 	std 30,-16(1)			std 30,-16(1)
> 	mr 29,4				addis 9,2,.LC0@toc@ha
> 	std 31,-8(1)			std 31,-8(1)
> 	addis 4,2,.LC1@toc@ha		addis 10,2,.LC1@toc@ha
> 	mr 31,5				addi 9,9,.LC0@toc@l
> 	addis 5,2,.LC0@toc@ha		addi 10,10,.LC1@toc@l
> 	mr 30,3				mr 30,3
> 	addi 5,5,.LC0@toc@l		mr 29,4
> 	addi 4,4,.LC1@toc@l		mr 31,5
> 	li 3,0				mr 4,10
> 	std 0,16(1)			mr 5,9
> 	stdu 1,-128(1)			std 0,16(1)
> 	bl foo3				stdu 1,-128(1)
> 	nop				li 3,0
> 					bl foo3
> 					nop
> 
> As you can see, we have some extra register shuffling from keeping a
> pseudo for arg setup insns.  I guess the pseudos allow sched more
> freedom to mess around..

... and then RA isn't able to move things back.  I see this happening
with all three changes (return value, incoming args, outgoing args);
the changes to combine give sched1 and RA more freedom, but those then
end up generating lots of unnecessary register moves.

> On the positive side, I saw cases where keeping parameter pseudos
> allowed shrink-wrap to occur.  varasm.c:decode_reg_name_and_count is
> one of them.  More shrink-wrapping is a big win.
> 
> Here's a case where changes at the return result in poorer code
> int
> decl_readonly_section_1 (int category)
> {
>   switch (category)
>     {
>     case 1:
>     case 2:
>     case 3:
>     case 4:
>     case 5:
>       return 1;
>     default:
>       return 0;
>     }
> }
> old			new
> 	addi 9,3,-6		addi 9,3,-6
> 	neg 3,3			neg 3,3
> 	and 3,9,3		and 3,9,3
> 	rldicl 3,3,33,63	srwi 3,3,31
> 	blr			rldicl 3,3,0,32
> 				blr
> 
> Previously this:
> (insn 35 34 36 2 (set (reg:SI 161)
>         (lshiftrt:SI (reg:SI 164)
>             (const_int 31 [0x1f]))) {lshrsi3})
> (insn 36 35 23 2 (set (reg:DI 155 [ D.2441 ])
>         (zero_extend:DI (reg:SI 161))) {zero_extendsidi2})
> (insn 23 36 24 2 (set (reg/i:DI 3 3)
>         (reg:DI 155 [ D.2441 ])) {*movdi_internal64})
> 
> is first combined to
> (insn 35 34 36 2 (set (reg:SI 161)
>         (lshiftrt:SI (reg:SI 164)
>             (const_int 31 [0x1f]))) {lshrsi3})
> (insn 23 35 24 2 (set (reg/i:DI 3 3)
> 	(and:DI (subreg:DI (reg:SI 161) 0)
> 	    (const_int 1 [0x1]))))
> which is somewhat surprising, but from my previous forays into
> combine.c I'd say happens due to known zero bits.  (Just looking at
> dumps here, rather than in gdb.)
> 
> Then the above is further combined to
> (insn 23 34 24 2 (set (reg/i:DI 3 3)
> 	(zero_extract:DI (subreg:DI (reg:SI 164) 0)
> 	    (const_int 1 [0x1])
> 	    (const_int 32 [0x20])))
> 
> Looks to me like a missed optimization opportunity that insns 35 and
> 36 aren't combined without first going through the intermediate step.

The rs6000 backend doesn't have zero_extend variants of many of its
patterns, only some.  Well-known problem, long-term project.

> Anyway, here's the rewritten patch.  I've left in some knobs I used
> when testing in case you want to see for yourself what happens with
> various options.  Bootstrapped etc. powerpc64le-linux and
> x86_64-linux.

> +#define DONT_COMBINE_PARAMS 1
> +#define DONT_COMBINE_CALL_ARGS 1

I tested with all combinations of those knob settings, building Linux
kernels (mostly defconfigs); these are the resulting text sizes:

  master   alan00   alan10   alan01   alan11
 5432728  5432728  5433848  5435472  5436080  alpha
 3851131  3851391  3852495  3852567  3853755  arm
 2190716  2190716  2190716  2190708  2190708  blackfin
 2191439  2191503  2191983  2192335  2192751  c6x
 2213186  2213250  2213154  2213482  2213546  cris
 3322420  3322420  3322500  3322564  3322692  frv
10898664 10898664 10898664 10898664 10898664  i386
 3253459  3253539  3253599  3255235  3255331  m32r
 4708528  4708532  4709772  4708660  4709656  microblaze
 3949689  3949745  3950269  3952401  3952857  mips
 5693823  5693975  5694443  5697971  5698227  mips64
 2374664  2374708  2375126  2375485  2375841  mn10300
 7488219  7488419  7492299  7489743  7493431  parisc
 6195727  6195935  6196367  6221647  6221687  parisc64
 8688975  8689111  8691931  8692619  8695279  powerpc
20252077 20252337 20255185 20266897 20269477  powerpc64
11423734 11421858 11423946 11422726 11424838  s390
 6488342  6488406  6488534  6489110  6489302  sh64
 1545652  1545652  1545776  1545812  1545944  shnommu
 3737005  3736973  3737357  3737585  3737945  sparc
 6075342  6074426  6074762  6075570  6075834  sparc64
12301607 12301607 12301543 12301787 12301723  x86_64
 1954029  1954061  1954441  1948841  1949305  xtensa

As you see, almost everything regresses code size.  s390 and sparc and
xtensa like some of it; everything else generates more register moves.
A typical one for powerpc, with the "00" setting:

before              after
rlwinm 3,9,N,N,N    rlwinm 9,9,N,N,N
                    mr 3,9

(and then blr etc., no further uses of r3 or r9; nothing special about
rlwinm here, there also are "and" and "addi" cases, etc.)

I think we should at least ameliorate this regression before we can
apply any variant of this :-(


Segher

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-05-28 15:22 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-23 11:20 [PATCH] Don't combine param and return value copies Alan Modra
2015-05-23 11:20 ` Andrew Pinski
2015-05-23 19:40 ` Segher Boessenkool
2015-05-25  5:24   ` Alan Modra
2015-05-25  7:30     ` Alan Modra
2015-05-25 20:19     ` Segher Boessenkool
2015-05-26  8:15       ` Alan Modra
2015-05-28 15:33         ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).