public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* RFC/RFA: Fix bug with REE optimization corrupting extended registers
@ 2015-11-18 14:15 Nick Clifton
  2015-11-19  0:52 ` Bernd Schmidt
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Nick Clifton @ 2015-11-18 14:15 UTC (permalink / raw)
  To: gcc-patches

Hi Guys,

  I recently discovered a bug in the current Redundant Extension
  Elimination optimization.  If the candidate extension instruction
  increases the number of hard registers used, the pass does not check
  to see if these extra registers are live between the definition and
  the extension.

  For example:

    (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20))) 
    (insn  45 (set (reg:QI r10) (mem:QI (reg:HI r18))) 
    [...]
    (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
    [...]                                                  
    (insn  88 (set (reg:HI r10) (zero_extend:HI (reg:QI r10)))

  (This is on the RL78 target where HImode values occupy two hard
  registers and QImode values only one.  The bug however is generic, not
  RL78 specific).
  
  The REE pass transforms this into:

    (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20))) 
    (insn  45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18)))) 
    [...]
    (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
    [...]
    (insn  88 deleted)

  Note how the new set at insn 45 clobbers the value loaded by insn 44
  into r11.

  The patch below is my attempt to fix this.  It is not correct
  however.  I could not work out how to determine if a given hard
  register is live at any point between two insns.  So instead I used
  the liveness information associated with the basic block containing
  the definition.  If one of the extended registers is live or becomes
  live in this block, and this block is not the same block as the one
  containing the extension insn, then I stop the optimization.  This
  works for the RL78 for all of the cases that I could find.

  So ... comments please ?

  Will gcc ever generate a single basic block that contains both the
  definition and the extension instructions, but also where the extra
  hard registers are used for another purpose as well ?

  Tested with no regressions (or fixes) on an x86-pc-linux-gnu target.
  Also tested with no regression and 7 fixes on an rl78-elf target.

Cheers
  Nick

gcc/ChangeLog
2015-11-18  Nick Clifton  <nickc@redhat.com>

	* ree.c (regs_live_between): New function.
        (add_removable_extension): If the extension uses more hard
        registers than the definition then check that the extra hard
        registers are not live between the definition and the
        extension.

Index: gcc/ree.c
===================================================================
--- gcc/ree.c	(revision 230517)
+++ gcc/ree.c	(working copy)
@@ -952,6 +952,49 @@
   return false;
 }
 
+/* Returns TRUE if any of the registers [start, stop) are live between DEF and up
+   to, but not including, INSN.  */
+
+static bool
+regs_live_between (rtx_insn * insn,
+		   struct df_link * def,
+		   unsigned int start,
+		   unsigned int stop)
+{
+  basic_block bb;
+
+  /* FIXME: This is an incomplete test.  It only checks the DEF and USE states of the
+     registers in basic block of DEF.  If any of them match the start-stop range and
+     the INSN is not in the same block, then it is assumed that the registers must be
+     live.  */
+  /* These first few checks are just paranoia... */
+  if (def != NULL
+      && def->ref != NULL
+      && def->ref->base.insn_info != NULL
+      && def->ref->base.insn_info->insn != NULL
+      && (bb = BLOCK_FOR_INSN (def->ref->base.insn_info->insn)) != NULL
+      && bb != BLOCK_FOR_INSN (insn))
+    {
+      struct df_lr_bb_info * bb_info = df_lr_get_bb_info (bb->index);
+      bitmap_iterator bi;
+      unsigned int reg;
+
+      EXECUTE_IF_SET_IN_BITMAP (& bb_info->def, 0, reg, bi)
+	{
+	  if (reg >= start && reg < stop)
+	    return true;
+	}
+
+      EXECUTE_IF_SET_IN_BITMAP (& bb_info->out, 0, reg, bi)
+	{
+	  if (reg >= start && reg < stop)
+	    return true;
+	}
+    }
+
+  return false;
+}
+
 /* Add an extension pattern that could be eliminated.  */
 
 static void
@@ -1080,6 +1123,30 @@
 	      }
 	  }
 
+      /* Fourth, if the extended version occupies more registers than
+	 the original version then make sure that these extra registers
+	 are not live between the definition and the extension.  */
+      if (HARD_REGNO_NREGS (REGNO (dest), mode)
+	  > HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg)))
+	{
+	  unsigned int start_reg = REGNO (reg) + HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg));
+	  unsigned int stop_reg  = REGNO (reg) + HARD_REGNO_NREGS (REGNO (dest), mode);
+
+	  for (def = defs; def; def = def->next)
+	    {
+	      if (regs_live_between (insn, def, start_reg, stop_reg))
+		{
+		  if (dump_file)
+		    {
+		      fprintf (dump_file, "Cannot eliminate extension:\n");
+		      print_rtl_single (dump_file, insn);
+		      fprintf (dump_file, " because extended register(s) are already used\n");
+		    }
+		  return;
+		}
+	    }
+	}
+
       /* Then add the candidate to the list and insert the reaching definitions
          into the definition map.  */
       ext_cand e = {expr, code, mode, insn};

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
  2015-11-18 14:15 RFC/RFA: Fix bug with REE optimization corrupting extended registers Nick Clifton
@ 2015-11-19  0:52 ` Bernd Schmidt
  2015-11-19  1:56   ` Jeff Law
  2015-11-19 15:34   ` Nick Clifton
  2015-11-19 19:25 ` Jeff Law
  2015-11-20 22:28 ` Jeff Law
  2 siblings, 2 replies; 15+ messages in thread
From: Bernd Schmidt @ 2015-11-19  0:52 UTC (permalink / raw)
  To: Nick Clifton, gcc-patches, Jeff Law

>    (This is on the RL78 target where HImode values occupy two hard
>    registers and QImode values only one.  The bug however is generic, not
>    RL78 specific).
>
>    The REE pass transforms this into:
>
>      (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
>      (insn  45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18))))
>      [...]
>      (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
>      [...]
>      (insn  88 deleted)
>
>    Note how the new set at insn 45 clobbers the value loaded by insn 44
>    into r11.

I had a look around. There's code testing HARD_REGNO_NREGS in 
ree.c:combine_set_extension. It's inside #if 0, and labelled 
"temporarily disabled". See if enabling that helps you? (Jeff, that #if 
0 was added by you).


Bernd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
  2015-11-19  0:52 ` Bernd Schmidt
@ 2015-11-19  1:56   ` Jeff Law
  2015-11-19 15:34   ` Nick Clifton
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff Law @ 2015-11-19  1:56 UTC (permalink / raw)
  To: Bernd Schmidt, Nick Clifton, gcc-patches

On 11/18/2015 05:52 PM, Bernd Schmidt wrote:
>>    (This is on the RL78 target where HImode values occupy two hard
>>    registers and QImode values only one.  The bug however is generic, not
>>    RL78 specific).
>>
>>    The REE pass transforms this into:
>>
>>      (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
>>      (insn  45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18))))
>>      [...]
>>      (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
>>      [...]
>>      (insn  88 deleted)
>>
>>    Note how the new set at insn 45 clobbers the value loaded by insn 44
>>    into r11.
>
> I had a look around. There's code testing HARD_REGNO_NREGS in
> ree.c:combine_set_extension. It's inside #if 0, and labelled
> "temporarily disabled". See if enabling that helps you? (Jeff, that #if
> 0 was added by you).
I was going to dig into this discussion tomorrow since I was the last 
one in that code.  Just had to get that threading regression resolved 
first :-)

jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
  2015-11-19  0:52 ` Bernd Schmidt
  2015-11-19  1:56   ` Jeff Law
@ 2015-11-19 15:34   ` Nick Clifton
  2015-11-19 16:53     ` Bernd Schmidt
  2015-11-19 18:31     ` Jeff Law
  1 sibling, 2 replies; 15+ messages in thread
From: Nick Clifton @ 2015-11-19 15:34 UTC (permalink / raw)
  To: Bernd Schmidt, gcc-patches, Jeff Law

Hi Bernd,

> I had a look around. There's code testing HARD_REGNO_NREGS in
> ree.c:combine_set_extension. It's inside #if 0, and labelled
> "temporarily disabled". See if enabling that helps you? (Jeff, that #if
> 0 was added by you).

I suspect that the code was disabled because it prevented too many 
useful optimization opportunities.

The code there would solve this problem, but the approach is is overly 
cautious, since it disables the optimization for all extensions that 
increase the number of hard registers used.  Some of these will be 
viable candidates, provided that the extra hard registers are no used. 
(This is certainly true for the RL78, where the (patched) optimization 
does improve code, even though the widening does use extra registers).

Cheers
   Nick


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
  2015-11-19 15:34   ` Nick Clifton
@ 2015-11-19 16:53     ` Bernd Schmidt
  2015-11-19 17:06       ` Jeff Law
  2015-11-19 18:31     ` Jeff Law
  1 sibling, 1 reply; 15+ messages in thread
From: Bernd Schmidt @ 2015-11-19 16:53 UTC (permalink / raw)
  To: Nick Clifton, gcc-patches, Jeff Law

On 11/19/2015 04:34 PM, Nick Clifton wrote:
> Hi Bernd,
>
>> I had a look around. There's code testing HARD_REGNO_NREGS in
>> ree.c:combine_set_extension. It's inside #if 0, and labelled
>> "temporarily disabled". See if enabling that helps you? (Jeff, that #if
>> 0 was added by you).
>
> I suspect that the code was disabled because it prevented too many
> useful optimization opportunities.

Well, correctness goes first. Since reenabling it fixes your problem, 
that change is approved (conditional on testing as usual obviously).


Bernd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
  2015-11-19 16:53     ` Bernd Schmidt
@ 2015-11-19 17:06       ` Jeff Law
  2015-11-19 17:18         ` Bernd Schmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Law @ 2015-11-19 17:06 UTC (permalink / raw)
  To: Bernd Schmidt, Nick Clifton, gcc-patches

On 11/19/2015 09:53 AM, Bernd Schmidt wrote:
> On 11/19/2015 04:34 PM, Nick Clifton wrote:
>> Hi Bernd,
>>
>>> I had a look around. There's code testing HARD_REGNO_NREGS in
>>> ree.c:combine_set_extension. It's inside #if 0, and labelled
>>> "temporarily disabled". See if enabling that helps you? (Jeff, that #if
>>> 0 was added by you).
>>
>> I suspect that the code was disabled because it prevented too many
>> useful optimization opportunities.
>
> Well, correctness goes first. Since reenabling it fixes your problem,
> that change is approved (conditional on testing as usual obviously).
I couldn't convince myself it was a correct check.  Having Nick's 
testcase should help with the analysis.

Frankly, the overall structure of ree is a mess -- I've found it 
incredibly difficult to follow every time I've had to look at it.

Let me take a peek today and see if I can get familiar enough with this 
code again to help.


Jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
  2015-11-19 17:06       ` Jeff Law
@ 2015-11-19 17:18         ` Bernd Schmidt
  2015-11-19 18:01           ` Jeff Law
  0 siblings, 1 reply; 15+ messages in thread
From: Bernd Schmidt @ 2015-11-19 17:18 UTC (permalink / raw)
  To: Jeff Law, Nick Clifton, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 437 bytes --]

On 11/19/2015 06:06 PM, Jeff Law wrote:

> Frankly, the overall structure of ree is a mess -- I've found it
> incredibly difficult to follow every time I've had to look at it.

Yeah, no kidding. The check seems to be in the wrong place - it's done 
very late when we're replacing things, and IMO we should just restrict 
the candidates for optimization much earlier.

Also, I want to apply the following. Ok if testing succeeds?


Bernd

[-- Attachment #2: clean-ree1.diff --]
[-- Type: text/x-patch, Size: 3929 bytes --]

commit ce68938b5150f5d41a54ed317ab97d98461be064
Author: Bernd Schmidt <bernds_cb1@t-online.de>
Date:   Thu Nov 19 17:38:15 2015 +0100

    Readability improvements in ree.c:combine_reaching_defs.
    
    	* ree.c (combine_reaching_defs): Simplify code by caching expressions
    	in variables.

diff --git a/gcc/ree.c b/gcc/ree.c
index b8436f2..4550cc3 100644
--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -731,6 +731,9 @@ get_extended_src_reg (rtx src)
 static bool
 combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
 {
+  rtx cand_pat = PATTERN (cand->insn);
+  rtx cand_dest = SET_DEST (cand_pat);
+  rtx cand_src = SET_SRC (cand_pat);
   rtx_insn *def_insn;
   bool merge_successful = true;
   int i;
@@ -749,9 +752,8 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
      register than the source operand, then additional restrictions
      are needed.  Note we have to handle cases where we have nested
      extensions in the source operand.  */
-  bool copy_needed
-    = (REGNO (SET_DEST (PATTERN (cand->insn)))
-       != REGNO (get_extended_src_reg (SET_SRC (PATTERN (cand->insn)))));
+  rtx src_reg = get_extended_src_reg (cand_src);
+  bool copy_needed = REGNO (cand_dest) != REGNO (src_reg);
   if (copy_needed)
     {
       /* Considering transformation of
@@ -780,8 +782,7 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
       if (state->modified[INSN_UID (cand->insn)].kind != EXT_MODIFIED_NONE)
 	return false;
 
-      machine_mode dst_mode = GET_MODE (SET_DEST (PATTERN (cand->insn)));
-      rtx src_reg = get_extended_src_reg (SET_SRC (PATTERN (cand->insn)));
+      machine_mode dst_mode = GET_MODE (cand_dest);
 
       /* Ensure the number of hard registers of the copy match.  */
       if (HARD_REGNO_NREGS (REGNO (src_reg), dst_mode)
@@ -812,17 +813,15 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
 	  || !REG_P (SET_DEST (*dest_sub_rtx)))
 	return false;
 
-      rtx tmp_reg = gen_rtx_REG (GET_MODE (SET_DEST (PATTERN (cand->insn))),
+      rtx tmp_reg = gen_rtx_REG (GET_MODE (cand_dest),
 				 REGNO (SET_DEST (*dest_sub_rtx)));
-      if (reg_overlap_mentioned_p (tmp_reg, SET_DEST (PATTERN (cand->insn))))
+      if (reg_overlap_mentioned_p (tmp_reg, cand_dest))
 	return false;
 
       /* The destination register of the extension insn must not be
 	 used or set between the def_insn and cand->insn exclusive.  */
-      if (reg_used_between_p (SET_DEST (PATTERN (cand->insn)),
-			      def_insn, cand->insn)
-	  || reg_set_between_p (SET_DEST (PATTERN (cand->insn)),
-				def_insn, cand->insn))
+      if (reg_used_between_p (cand_dest, def_insn, cand->insn)
+	  || reg_set_between_p (cand_dest, def_insn, cand->insn))
 	return false;
 
       /* We must be able to copy between the two registers.   Generate,
@@ -836,11 +835,8 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
 	 is different than in the code to emit the copy as we have not
 	 modified the defining insn yet.  */
       start_sequence ();
-      rtx pat = PATTERN (cand->insn);
-      rtx new_dst = gen_rtx_REG (GET_MODE (SET_DEST (pat)),
-                                 REGNO (get_extended_src_reg (SET_SRC (pat))));
-      rtx new_src = gen_rtx_REG (GET_MODE (SET_DEST (pat)),
-                                 REGNO (SET_DEST (pat)));
+      rtx new_dst = gen_rtx_REG (GET_MODE (cand_dest), REGNO (src_reg));
+      rtx new_src = gen_rtx_REG (GET_MODE (cand_dest), REGNO (cand_dest));
       emit_move_insn (new_dst, new_src);
 
       rtx_insn *insn = get_insns();
@@ -854,7 +850,6 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
 	return false;
     }
 
-
   /* If cand->insn has been already modified, update cand->mode to a wider
      mode if possible, or punt.  */
   if (state->modified[INSN_UID (cand->insn)].kind != EXT_MODIFIED_NONE)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
  2015-11-19 17:18         ` Bernd Schmidt
@ 2015-11-19 18:01           ` Jeff Law
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Law @ 2015-11-19 18:01 UTC (permalink / raw)
  To: Bernd Schmidt, Nick Clifton, gcc-patches

On 11/19/2015 10:18 AM, Bernd Schmidt wrote:
> On 11/19/2015 06:06 PM, Jeff Law wrote:
>
>> Frankly, the overall structure of ree is a mess -- I've found it
>> incredibly difficult to follow every time I've had to look at it.
>
> Yeah, no kidding. The check seems to be in the wrong place - it's done
> very late when we're replacing things, and IMO we should just restrict
> the candidates for optimization much earlier.
And I didn't make it any cleaner when I hacked it up to handle one more 
case :(    I'm hoping that a year away will allow me to look and think 
about the code again without getting a horrid headache.


>
> Also, I want to apply the following. Ok if testing succeeds?
>
>
> Bernd
>
> clean-ree1.diff
>
>
> commit ce68938b5150f5d41a54ed317ab97d98461be064
> Author: Bernd Schmidt<bernds_cb1@t-online.de>
> Date:   Thu Nov 19 17:38:15 2015 +0100
>
>      Readability improvements in ree.c:combine_reaching_defs.
>
>      	* ree.c (combine_reaching_defs): Simplify code by caching expressions
>      	in variables.
Yes. That's good with me.

jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
  2015-11-19 15:34   ` Nick Clifton
  2015-11-19 16:53     ` Bernd Schmidt
@ 2015-11-19 18:31     ` Jeff Law
  2015-11-20  9:20       ` Nick Clifton
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff Law @ 2015-11-19 18:31 UTC (permalink / raw)
  To: Nick Clifton, Bernd Schmidt, gcc-patches

On 11/19/2015 08:34 AM, Nick Clifton wrote:
> Hi Bernd,
>
>> I had a look around. There's code testing HARD_REGNO_NREGS in
>> ree.c:combine_set_extension. It's inside #if 0, and labelled
>> "temporarily disabled". See if enabling that helps you? (Jeff, that #if
>> 0 was added by you).
>
> I suspect that the code was disabled because it prevented too many
> useful optimization opportunities.
>
> The code there would solve this problem, but the approach is is overly
> cautious, since it disables the optimization for all extensions that
> increase the number of hard registers used.  Some of these will be
> viable candidates, provided that the extra hard registers are no used.
> (This is certainly true for the RL78, where the (patched) optimization
> does improve code, even though the widening does use extra registers).
Nick -- can you pass along your testcode?

Jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
  2015-11-18 14:15 RFC/RFA: Fix bug with REE optimization corrupting extended registers Nick Clifton
  2015-11-19  0:52 ` Bernd Schmidt
@ 2015-11-19 19:25 ` Jeff Law
  2015-11-20 22:28 ` Jeff Law
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff Law @ 2015-11-19 19:25 UTC (permalink / raw)
  To: Nick Clifton, gcc-patches

On 11/18/2015 07:15 AM, Nick Clifton wrote:
> Hi Guys,
>
>    I recently discovered a bug in the current Redundant Extension
>    Elimination optimization.  If the candidate extension instruction
>    increases the number of hard registers used, the pass does not check
>    to see if these extra registers are live between the definition and
>    the extension.
>
>    For example:
>
>      (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
>      (insn  45 (set (reg:QI r10) (mem:QI (reg:HI r18)))
>      [...]
>      (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
>      [...]
>      (insn  88 (set (reg:HI r10) (zero_extend:HI (reg:QI r10)))
>
>    (This is on the RL78 target where HImode values occupy two hard
>    registers and QImode values only one.  The bug however is generic, not
>    RL78 specific).
Right.  I can recall walking though multi-reg cases on my whiteboard. 
There was one particular case Jakub was concerned about in this space, 
but I couldn't contrive a testcase to trigger and I was getting to the 
point where I was struggling to be able to reason about the behaviour of 
the code.




>
>    The REE pass transforms this into:
>
>      (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
>      (insn  45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18))))
>      [...]
>      (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
>      [...]
>      (insn  88 deleted)
>
>    Note how the new set at insn 45 clobbers the value loaded by insn 44
>    into r11.
Right.

>
>    The patch below is my attempt to fix this.  It is not correct
>    however.  I could not work out how to determine if a given hard
>    register is live at any point between two insns.  So instead I used
>    the liveness information associated with the basic block containing
>    the definition.  If one of the extended registers is live or becomes
>    live in this block, and this block is not the same block as the one
>    containing the extension insn, then I stop the optimization.  This
>    works for the RL78 for all of the cases that I could find.
The way this pass has dealt with this class of problems is the various 
routines in rtlanal.c  reg_used_between_p, reg_set_between_p and the like.


>
>    So ... comments please ?
>
>    Will gcc ever generate a single basic block that contains both the
>    definition and the extension instructions, but also where the extra
>    hard registers are used for another purpose as well ?
I can't see any reason why it would not.


jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
  2015-11-19 18:31     ` Jeff Law
@ 2015-11-20  9:20       ` Nick Clifton
  2015-11-20 17:32         ` Jeff Law
  2015-11-20 22:01         ` Jeff Law
  0 siblings, 2 replies; 15+ messages in thread
From: Nick Clifton @ 2015-11-20  9:20 UTC (permalink / raw)
  To: Jeff Law, Bernd Schmidt, gcc-patches

Hi Jeff,

>> The code there would solve this problem, but the approach is is overly
>> cautious, since it disables the optimization for all extensions that
>> increase the number of hard registers used.  Some of these will be
>> viable candidates, provided that the extra hard registers are no used.
>> (This is certainly true for the RL78, where the (patched) optimization
>> does improve code, even though the widening does use extra registers).
> Nick -- can you pass along your testcode?

Sure - this is for the RL78 toolchain.  In theory the problem is 
generic, but I have not tested other toolchains.

Compile the gcc.c-torture/execute/pr42833.c or 
gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free 
also specified on the command line.  Without -free these tests pass. 
With -free they fail.

Cheers
   Nick

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
  2015-11-20  9:20       ` Nick Clifton
@ 2015-11-20 17:32         ` Jeff Law
  2015-11-20 22:01         ` Jeff Law
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff Law @ 2015-11-20 17:32 UTC (permalink / raw)
  To: Nick Clifton, Bernd Schmidt, gcc-patches

On 11/20/2015 02:19 AM, Nick Clifton wrote:
> Hi Jeff,
>
>>> The code there would solve this problem, but the approach is is overly
>>> cautious, since it disables the optimization for all extensions that
>>> increase the number of hard registers used.  Some of these will be
>>> viable candidates, provided that the extra hard registers are no used.
>>> (This is certainly true for the RL78, where the (patched) optimization
>>> does improve code, even though the widening does use extra registers).
>> Nick -- can you pass along your testcode?
>
> Sure - this is for the RL78 toolchain.  In theory the problem is
> generic, but I have not tested other toolchains.
Right.  It just really helps me to have something I can poke at -- it's 
just the type of learner I am.  I tried to trip the test in that #if 
several times last year and never came up with anything.  So naturally 
if we can trip it with the rl78 I want to dig into it.


>
> Compile the gcc.c-torture/execute/pr42833.c or
> gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free
> also specified on the command line.  Without -free these tests pass.
> With -free they fail.
Perfect.  Building rl78-elf cross bits as I type...

jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
  2015-11-20  9:20       ` Nick Clifton
  2015-11-20 17:32         ` Jeff Law
@ 2015-11-20 22:01         ` Jeff Law
  2015-11-20 22:10           ` Eric Botcazou
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff Law @ 2015-11-20 22:01 UTC (permalink / raw)
  To: Nick Clifton, Bernd Schmidt, gcc-patches

On 11/20/2015 02:19 AM, Nick Clifton wrote:
> Hi Jeff,
>
>>> The code there would solve this problem, but the approach is is overly
>>> cautious, since it disables the optimization for all extensions that
>>> increase the number of hard registers used.  Some of these will be
>>> viable candidates, provided that the extra hard registers are no used.
>>> (This is certainly true for the RL78, where the (patched) optimization
>>> does improve code, even though the widening does use extra registers).
>> Nick -- can you pass along your testcode?
>
> Sure - this is for the RL78 toolchain.  In theory the problem is
> generic, but I have not tested other toolchains.
>
> Compile the gcc.c-torture/execute/pr42833.c or
> gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free
> also specified on the command line.  Without -free these tests pass.
> With -free they fail.
So this looks like a pretty fundamental failing in ree.c and AFAICT has 
been there since day 1.  Thankfully, I don't think it's likely to 
trigger all that often.

It's useful to ponder two cases.  One where we're going to add a copy 
and one where we don't.  I added the copy case during the 4.9 cycle as 
an extension of existing code.  It only fires in easy to analyze 
circumstances -- when the original set and the extension are in the same 
basic block, but have different destination registers.

In the copy case.  The code checks that the destination of the 
*extension* is neither used nor set between the original set and the 
extension.  So in the case of multi-word hard reg, we'll verify that the 
entire multi-word hard reg survives from the original insn to the 
extension.  That avoids this problem in cases where a copy is needed.

However, in the case where no copy is needed, no such checks are made. 
In fact the checks could be fairly painful as the original set and the 
extension can be in different blocks with arbitrary flow between them.

I would hazard a guess that the authors simply didn't consider the 
multi-hard reg case.  Essentially if the original set reached an 
extension, then obviously the original set got there unharmed and the 
extended destination should reach as well -- except that doesn't apply 
to multi-word hard regs.

Given the expense in checking all the potential paths between the 
original set and the extension, I think just disallowing this case ought 
to be is the best way to go.  Ideally we'll filter out that case before 
we get to combine_set_extension.  But if we can't, it's easy enough to 
test there.

Jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
  2015-11-20 22:01         ` Jeff Law
@ 2015-11-20 22:10           ` Eric Botcazou
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Botcazou @ 2015-11-20 22:10 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Nick Clifton, Bernd Schmidt

> I would hazard a guess that the authors simply didn't consider the
> multi-hard reg case.  Essentially if the original set reached an
> extension, then obviously the original set got there unharmed and the
> extended destination should reach as well -- except that doesn't apply
> to multi-word hard regs.

This pass started as a Zero-Extension Elimination pass for x86-64 and was only 
considering implicit SI->DI extensions initially.  The algorithm was tailored 
to this specific pattern and I agree that the pass should be reimplemented.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
  2015-11-18 14:15 RFC/RFA: Fix bug with REE optimization corrupting extended registers Nick Clifton
  2015-11-19  0:52 ` Bernd Schmidt
  2015-11-19 19:25 ` Jeff Law
@ 2015-11-20 22:28 ` Jeff Law
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff Law @ 2015-11-20 22:28 UTC (permalink / raw)
  To: Nick Clifton, gcc-patches

On 11/18/2015 07:15 AM, Nick Clifton wrote:
> Hi Guys,
>
>    I recently discovered a bug in the current Redundant Extension
>    Elimination optimization.  If the candidate extension instruction
>    increases the number of hard registers used, the pass does not check
>    to see if these extra registers are live between the definition and
>    the extension.
>
>    For example:
>
>      (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
>      (insn  45 (set (reg:QI r10) (mem:QI (reg:HI r18)))
>      [...]
>      (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
>      [...]
>      (insn  88 (set (reg:HI r10) (zero_extend:HI (reg:QI r10)))
>
>    (This is on the RL78 target where HImode values occupy two hard
>    registers and QImode values only one.  The bug however is generic, not
>    RL78 specific).
>
>    The REE pass transforms this into:
>
>      (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
>      (insn  45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18))))
>      [...]
>      (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
>      [...]
>      (insn  88 deleted)
>
>    Note how the new set at insn 45 clobbers the value loaded by insn 44
>    into r11.
>
>    The patch below is my attempt to fix this.  It is not correct
>    however.  I could not work out how to determine if a given hard
>    register is live at any point between two insns.  So instead I used
>    the liveness information associated with the basic block containing
>    the definition.  If one of the extended registers is live or becomes
>    live in this block, and this block is not the same block as the one
>    containing the extension insn, then I stop the optimization.  This
>    works for the RL78 for all of the cases that I could find.
>
>    So ... comments please ?
>
>    Will gcc ever generate a single basic block that contains both the
>    definition and the extension instructions, but also where the extra
>    hard registers are used for another purpose as well ?
>
>    Tested with no regressions (or fixes) on an x86-pc-linux-gnu target.
>    Also tested with no regression and 7 fixes on an rl78-elf target.
>
> Cheers
>    Nick
>
> gcc/ChangeLog
> 2015-11-18  Nick Clifton  <nickc@redhat.com>
>
> 	* ree.c (regs_live_between): New function.
>          (add_removable_extension): If the extension uses more hard
>          registers than the definition then check that the extra hard
>          registers are not live between the definition and the
>          extension.
>
> @@ -1080,6 +1123,30 @@
>   	      }
>   	  }
>
> +      /* Fourth, if the extended version occupies more registers than
> +	 the original version then make sure that these extra registers
> +	 are not live between the definition and the extension.  */
> +      if (HARD_REGNO_NREGS (REGNO (dest), mode)
> +	  > HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg)))
So this seems like a reasonable place to catch this.  I certainly prefer 
to avoid work later by filtering earlier like you've done.

The problem is with your implementation of regs_live_between.  You're 
assuming we've got straight-line code between the original set and the 
extension.  That's not guaranteed.  Essentially we've got use-def chains 
via the DF framework.  So, in theory, they can be in different blocks 
and arbitrary flow control between them.  In fact, the extension could 
be before the original set in the insn chain (though obviously not for 
execution order).

I think it's reasonable to just punt when we see that the source/dest 
register of the extension are the same and the number of hard regs is 
different.  That's what I'm testing now.

jeff


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-11-20 22:24 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-18 14:15 RFC/RFA: Fix bug with REE optimization corrupting extended registers Nick Clifton
2015-11-19  0:52 ` Bernd Schmidt
2015-11-19  1:56   ` Jeff Law
2015-11-19 15:34   ` Nick Clifton
2015-11-19 16:53     ` Bernd Schmidt
2015-11-19 17:06       ` Jeff Law
2015-11-19 17:18         ` Bernd Schmidt
2015-11-19 18:01           ` Jeff Law
2015-11-19 18:31     ` Jeff Law
2015-11-20  9:20       ` Nick Clifton
2015-11-20 17:32         ` Jeff Law
2015-11-20 22:01         ` Jeff Law
2015-11-20 22:10           ` Eric Botcazou
2015-11-19 19:25 ` Jeff Law
2015-11-20 22:28 ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).