public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* PR 35232: Recent regression due to reload inheritance bug
@ 2008-02-18 19:29 Richard Sandiford
  2008-03-12 18:50 ` Ulrich Weigand
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2008-02-18 19:29 UTC (permalink / raw)
  To: gcc-patches

fp-int-convert-double.c ICEs on mipsisa64-elf when compiled with
"-O2 -mabi=o64 -mips16".  This is a recent failure; it's a latent
reload inheritance bug exposed by:

2008-02-06  Uros Bizjak  <ubizjak@gmail.com>

        PR target/35083
        * optabs.c (expand_float): Do not check for decimal modes when
        expanding unsigned integer through signed conversion.

We have the following pre-reload insn:

(insn 1749 1748 1750 193 ../../../gcc/gcc/testsuite/gcc.dg/torture/fp-int-convert-double.c:12 (set (reg:SI 1475 [ ivout.331+-3 ])
        (zero_extend:SI (reg:QI 2209 [ ivout ]))) 147 {*zero_extendqisi2_mips16e} (expr_list:REG_DEAD (reg:QI 2209 [ ivout ])
        (nil)))

and MIPS16 zero extensions require the input operand to match the output.
The insn gets the following reload:

Reloads for insn # 1749
Reload 0: reload_in (QI) = (reg:QI 2 $2 [orig:2209 ivout ] [2209])
        reload_out (SI) = (reg:SI 1475 [ ivout.331+-3 ])
        M16_REGS, RELOAD_OTHER (opnum = 0)
        reload_in_reg: (reg:QI 2 $2 [orig:2209 ivout ] [2209])
        reload_out_reg: (reg:SI 1475 [ ivout.331+-3 ])
        reload_reg_rtx: (reg:QI 2 $2 [orig:2209 ivout ] [2209])

Note that the reload_reg_rtx is QImode.  This is valid, because
(reg:QI 2) and (reg:SI 2) occupy the same number of registers
(and the code that chose (reg:QI 2) did check this).  But we
then record inheritance information as though we had a copy
of the form:

   (reg:SI 1475) <- (reg:QI 2)

(so reg_last_reload_reg[1475] = (reg:QI 2), etc.).  The next use
of (reg:SI 1475) is in a HImode lowpart:

   (subreg:HI (reg:SI 1475) 2)

so when inheriting the previous reload of 1475, we try to reduce:

   (subreg:HI (reg:QI 2) 2)

leading to an ICE because of the bogus byte offset.

In general, if we have an in-out reload, the input side may use the
reload register in a different mode from the output side.  The problem
is that, when recording information for inheritance purposes,
we assume that the two modes are the same.

In this case, the problem is somewhat benign: we're recording the
right registers (and the right _number_ of registers), but we're doing
it in the wrong mode.  This isn't always the case though.  You can see
the same thing in a wrong-code scenario with this testcase on big-endian
32-bit MIPS systems:

--------------------------------------------------------------------
__attribute__((nomips16)) unsigned int
f1 (unsigned long long x)
{
  unsigned int r;
  asm ("# %0" : "=a" (r) : "0" (x));
  asm ("# %0" : "=h" (r) : "0" (r));
  return r;
}

int
main (void)
{
  return f1 (4) != 4;
}
--------------------------------------------------------------------

The first asm needs an in-out reload with a 64-bit input (x) and a
32-bit output (r), so the reload register itself is 64 bits wide.
emit_output_reload_insns correctly adjusts the reload register
for the output mode, creating a move from (reg:SI lo) to "r".
However, emit_reload_insns records the copy for inheritance purposes
as though it used the unadjusted reg_rtx, (reg:DI hi).  This means that,
when processing the second asm, choose_reload_regs thinks that "r" is
already in (reg:SI hi).

The fix I've gone for is to introduce two new arrays,
reload_reg_rtx_for_input and reload_reg_rtx_for_output.
do_input_reload sets reload_reg_rtx_for_input[r] to the
input form of rld[r].reg_rtx and do_output_reload sets
reload_reg_rtx_for_output[r] to the output form.
The new arrays are short-lived (like reload_inheritance_insn & co.)
so there's no need to bloat struct reload with this information.

The logic in emit_reload_insns then becomes:

  for each reload r
    if (rld[r].reg_rtx is a spill register)
      {
        clear all inheritance info for rld[r].reg_rtx;
        if (reload r is an output reload)
          record output reload involving reload_reg_rtx_for_output[r];
        else if (reload r is an input reload)
          record input reload involving reload_reg_rtx_for_input[r];
      }
    else if (rld[r].reg_rtx is not a spill register
             && reload r is an output reload)
      record output reload involving reload_reg_rtx_for_output[r];

Various other bits of code in the emit_reload_insns hierachy need
to be adjusted too:

- delete_output_reload has:

  --------------------------------------------------------------------------
  if (rld[j].out != rld[j].in
      && REG_N_DEATHS (REGNO (reg)) == 1
      && REG_N_SETS (REGNO (reg)) == 1
      && REG_BASIC_BLOCK (REGNO (reg)) >= NUM_FIXED_BLOCKS
      && find_regno_note (insn, REG_DEAD, REGNO (reg)))
    {
      ...
      /* For the debugging info, say the pseudo lives in this reload reg.  */
      reg_renumber[REGNO (reg)] = REGNO (rld[j].reg_rtx);
  --------------------------------------------------------------------------

  The caller has determined that either an input reload or an output reload
  has made a previous reload redundant.  The rld[j].reg_rtx above should
  then be the input reload register or the output reload register respectively.

  I've added a new rtx parameter to delete_output_reload, "new_reload_reg",
  to go alongside the existing "last_reload_reg".

- emit_input_reload_insns has similar code:

	      /* If these are the only uses of the pseudo reg,
		 pretend for GDB it lives in the reload reg we used.  */
	      if (REG_N_DEATHS (REGNO (old)) == 1
		  && REG_N_SETS (REGNO (old)) == 1)
		{
		  reg_renumber[REGNO (old)] = REGNO (rl->reg_rtx);
		  alter_reg (REGNO (old), -1);
		}

  Again, this should use the input reload register instead of
  the unadjusted rl->reg_rtx.

- emit_output_reload_insns looks through the output reload insns to
  look for stores of the reload register.  This search should use
  the output reload register rather than the unadjusted register.

- do_input_reload should use the input reload register (rather than
  the unadjusted register) when deciding whether the reload would be
  a no-op.

- do_input_reload should use the input reload register when deciding
  whether the reload makes a previous output reload unnecessary.

- do_output_reload should use the output reload register rather than
  the unadjusted reload register in REG_UNUSED notes.

Some of these changes are for correctness.  The others effectively
just fix missed optimisation opportunities, but I've included them for
consistency.  The idea is that emit_reload_insns should always use the
relevant input or output reload register rather than the unadjusted
form.  It seems better to make the change across the board than to
try to justify on a case-by-case whether using the wrong register
would actually lead to wrong code, rather than simply being
conceptually wrong.

Code outside of emit_reload_insns also treats reg_rtx as suitable for
both the input and output, but this is usually for dependency analysis,
where (unlike here) the assumption is conservatively correct.  reg_rtx is
wide enough for reloads in both directions.

Code like find_dummy_reload tends to be mindful of mode differences,
doing nothing if the input reload occupies a different number of words
from the output reload register.

A few other notes:

- I've moved the mode and register calculations from
  emit_{input,output}_reload_insns to do_{input,output}_reload as-is.
  Some of the commentary may or may not be out of date, it may or may
  not be safe to use gen_lowpart_common these days, and the "output is
  VOIDmode" code may or may not be handled in a more elegant way these
  days.  I think any changes in those areas should be handled
  separately from this patch.

- Changing the handling of spill registers in emit_reload_insns meant
  conflating two loops to clear reg_reloaded_valid.  One of these loops
  also clobbered reg_reloaded_call_part_clobbered while the other didn't,
  and there was no explanation why.  I had to decide what the new loop
  should do.

  Logically, reg_reloaded_call_part_clobbered should only be meaningful
  for members of reg_reloaded_valid, and it is easy to see that this is
  indeed the case.  I therefore moved the code to clear
  reg_reloaded_call_part_clobbered to the code that sets
  reg_reloaded_valid; each such piece of code had a statement
  of the form:

     if (HARD_REGNO_CALL_PART_CLOBBERED (r, m))
       SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered, r);

  which I turned into:

     if (HARD_REGNO_CALL_PART_CLOBBERED (r, m))
       SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered, r);
     else
       CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered, r);

  (these being the only bits of code to _set_ bits in the variable).
  I tweaked the comment above reg_reloaded_call_part_clobbered as well.

- It seems more robust to set reload_reg_rtx_for_{input,output}
  unconditionally -- setting them to null if no reload is needed --
  than to try to be selective.  For example, inc/dec reloads have an
  output reload, but that reload is handled by emit_input_reload_insns
  rather than emit_output_reload_insns.  We should nevertheless set
  reload_reg_rtx_for_output to the output register used.

Er... well, that wasn't short.  I hope it doesn't put anyone off.
I think this is actually a relatively straight-forward change as
far as reload changes go.  I just wanted to describe it in even
more painful detail than usual because of the late stage we're at.

That said, I understand that making reload changes so close to a release
may not seem like a good idea.  If you don't want to apply a patch like
this now, what should we do with the PR?  On the one hand, I think
everyone agrees Uros's fix is correct in itself.  On the other hand,
PR 35083 is "only" for soft-float x86, and the ICE itself was fixed by:

2008-02-05  Uros Bizjak  <ubizjak@gmail.com>

	PR target/35083
	* config/i386/i386.md (floatunsisf2): Enable for TARGET_SSE_MATH only.
	Call ix86_expand_convert_uns_sisf_sse for TARGET_SSE2.

The 2008-02-06 part is a follow-on to fix a code-size regression from 4.2.
Also, mipsisa64-elf is a primary target, and I've been trying hard to get
us down to zero failures on MIPS targets.  (I've been rotating through
quite a few different configurations -- so many to test! -- which is why
I only picked this up a week after Uros's patch.)

If you're uncomfortable with my patch, perhaps one option would be to
revert Uros's optabs patch for now, on the understanding that both his
patch and whatever patch we pick for this PR can go in before 4.3.1.
I'd obviously prefer to fix the underlying bug (and keep Uros's patch)
if we can though.

Anyway, bootstrapped & regression-tested on x86_64-linux-gnu.
Also regression-tested on mipsisa64-elfoabi.  I compared the
output of gcc.c-torture, gcc.dg and g++.dg on mipsisa64-elfoabi
before and after the patch at -O2 and -O2 -mips16.  The script
I use for this is supposed to be an "any target" thing and doesn't
work for tests that include standard headers, including
fp-int-convert-double.c.  Still, a lot of tests don't use standard
headers, and there were no changes in the assembly output for them.
So, maybe a weak test, but not completely useless...

Richard


gcc/
	PR rtl-optimization/35232
	* reload1.c (reg_reloaded_call_part_clobbered): Clarify comment.
	(forget_old_reloads_1, forget_marked_reloads): Don't clear
	reg_reloaded_call_part_clobbered here.
	(reload_regs_reach_end_p): New function.
	(reload_reg_rtx_for_input): New variable.
	(reload_reg_rtx_for_output): Likewise.
	(emit_input_reload_insns): Use reloadreg rather than rl->reg_rtx
	when reassigning a pseudo register.  Load reloadreg from 
	reload_reg_rtx_for_input, moving the mode and register
	calculation to...
	(do_input_reload): ...here.  Use the mode-adjusted reg_rtx
	instead of the original when deciding whether an input reload
	would be a no-op or whether an output reload can be deleted.
	(emit_output_reload_insns): Use the mode-adjusted reg_rtx
	when setting up new_spill_reg_store.  Load it from
	reload_reg_rtx_for_output, moving the mode and register
	calculation to...
	(do_output_reload): ...here.  Use the mode-adjusted reg_rtx
	instead of the original when deciding whether an output reload
	would be a no-op.  Do the same when modifying insn notes.
	(inherit_piecemeal_p): Take a mode and two register numbers
	as argument.
	(emit_reload_insns): Clear new_spill_reg_store for every hard
	register in the reload register.  Remove spill registers
	from reg_reloaded_valid before considering whether to record
	inheritance information for them.  Use reload_reg_rtx_for_output
	instead of reg_rtx when recording output reloads.  Use
	reload_reg_rtx_for_input instead of reg_rtx when recording
	input reloads.  Set or clear reg_reloaded_call_part_clobbered
	at the same time as setting reg_reloaded_valid.
	(delete_output_reload): Add a new_reload_reg parameter and use it
	instead of rld[j].reg_rtx.
	(emit_input_reload_insns, do_input_reload, do_output_reload): Adjust
	calls accordingly.

gcc/testsuite/
	PR rtl-optimization/35232
	* gcc.target/mips/pr35232.c: New test.

Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	2008-02-17 22:30:14.000000000 +0000
+++ gcc/reload1.c	2008-02-18 17:40:51.000000000 +0000
@@ -158,7 +158,7 @@ VEC(rtx,gc) *reg_equiv_memory_loc_vec;
 
 /* Indicate whether the register's current value is one that is not
    safe to retain across a call, even for registers that are normally
-   call-saved.  */
+   call-saved.  This is only meaningful for members of reg_reloaded_valid.  */
 static HARD_REG_SET reg_reloaded_call_part_clobbered;
 
 /* Number of spill-regs so far; number of valid elements of spill_regs.  */
@@ -434,9 +434,8 @@ static void emit_output_reload_insns (st
 				      int);
 static void do_input_reload (struct insn_chain *, struct reload *, int);
 static void do_output_reload (struct insn_chain *, struct reload *, int);
-static bool inherit_piecemeal_p (int, int);
 static void emit_reload_insns (struct insn_chain *);
-static void delete_output_reload (rtx, int, int);
+static void delete_output_reload (rtx, int, int, rtx);
 static void delete_address_reloads (rtx, rtx);
 static void delete_address_reloads_1 (rtx, rtx, rtx);
 static rtx inc_for_reload (rtx, rtx, rtx, int);
@@ -4371,7 +4370,6 @@ forget_old_reloads_1 (rtx x, const_rtx i
 	      || ! TEST_HARD_REG_BIT (reg_is_output_reload, regno + i))
 	    {
 	      CLEAR_HARD_REG_BIT (reg_reloaded_valid, regno + i);
-	      CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered, regno + i);
 	      spill_reg_store[regno + i] = 0;
 	    }
     }
@@ -4408,7 +4406,6 @@ forget_marked_reloads (regset regs)
 	      || ! TEST_HARD_REG_BIT (reg_is_output_reload, reg)))
 	  {
 	    CLEAR_HARD_REG_BIT (reg_reloaded_valid, reg);
-	    CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered, reg);
 	    spill_reg_store[reg] = 0;
 	  }
       if (n_reloads == 0
@@ -4922,6 +4919,21 @@ reload_reg_reaches_end_p (unsigned int r
       gcc_unreachable ();
     }
 }
+
+/* Like reload_reg_reaches_end_p, but check that the condition holds for
+   every register in the range [REGNO, REGNO + NR).  */
+
+static bool
+reload_regs_reach_end_p (unsigned int regno, int nr,
+			 int opnum, enum reload_type type)
+{
+  int i;
+
+  for (i = 0; i < nr; i++)
+    if (!reload_reg_reaches_end_p (regno + i, opnum, type))
+      return false;
+  return true;
+}
 \f
 
 /*  Returns whether R1 and R2 are uniquely chained: the value of one
@@ -5061,6 +5073,12 @@ reloads_conflict (int r1, int r2)
    or -1 if we did not need a register for this reload.  */
 static int reload_spill_index[MAX_RELOADS];
 
+/* Index X is the value of rld[X].reg_rtx, adjusted for the input mode.  */
+static rtx reload_reg_rtx_for_input[MAX_RELOADS];
+
+/* Index X is the value of rld[X].reg_rtx, adjusted for the output mode.  */
+static rtx reload_reg_rtx_for_output[MAX_RELOADS];
+
 /* Subroutine of free_for_value_p, used to check a single register.
    START_REGNO is the starting regno of the full reload register
    (possibly comprising multiple hard registers) that we are considering.  */
@@ -6559,42 +6577,6 @@ emit_input_reload_insns (struct insn_cha
   enum machine_mode mode;
   rtx *where;
 
-  /* Determine the mode to reload in.
-     This is very tricky because we have three to choose from.
-     There is the mode the insn operand wants (rl->inmode).
-     There is the mode of the reload register RELOADREG.
-     There is the intrinsic mode of the operand, which we could find
-     by stripping some SUBREGs.
-     It turns out that RELOADREG's mode is irrelevant:
-     we can change that arbitrarily.
-
-     Consider (SUBREG:SI foo:QI) as an operand that must be SImode;
-     then the reload reg may not support QImode moves, so use SImode.
-     If foo is in memory due to spilling a pseudo reg, this is safe,
-     because the QImode value is in the least significant part of a
-     slot big enough for a SImode.  If foo is some other sort of
-     memory reference, then it is impossible to reload this case,
-     so previous passes had better make sure this never happens.
-
-     Then consider a one-word union which has SImode and one of its
-     members is a float, being fetched as (SUBREG:SF union:SI).
-     We must fetch that as SFmode because we could be loading into
-     a float-only register.  In this case OLD's mode is correct.
-
-     Consider an immediate integer: it has VOIDmode.  Here we need
-     to get a mode from something else.
-
-     In some cases, there is a fourth mode, the operand's
-     containing mode.  If the insn specifies a containing mode for
-     this operand, it overrides all others.
-
-     I am not sure whether the algorithm here is always right,
-     but it does the right things in those cases.  */
-
-  mode = GET_MODE (old);
-  if (mode == VOIDmode)
-    mode = rl->inmode;
-
   /* delete_output_reload is only invoked properly if old contains
      the original pseudo register.  Since this is replaced with a
      hard reg when RELOAD_OVERRIDE_IN is set, see if we can
@@ -6612,6 +6594,9 @@ emit_input_reload_insns (struct insn_cha
   else if (GET_CODE (oldequiv) == SUBREG)
     oldequiv_reg = SUBREG_REG (oldequiv);
 
+  reloadreg = reload_reg_rtx_for_input[j];
+  mode = GET_MODE (reloadreg);
+
   /* If we are reloading from a register that was recently stored in
      with an output-reload, see if we can prove there was
      actually no need to store the old value in it.  */
@@ -6623,16 +6608,11 @@ emit_input_reload_insns (struct insn_cha
       && (dead_or_set_p (insn, spill_reg_stored_to[REGNO (oldequiv)])
 	  || rtx_equal_p (spill_reg_stored_to[REGNO (oldequiv)],
 			  rl->out_reg)))
-    delete_output_reload (insn, j, REGNO (oldequiv));
+    delete_output_reload (insn, j, REGNO (oldequiv), reloadreg);
 
-  /* Encapsulate both RELOADREG and OLDEQUIV into that mode,
-     then load RELOADREG from OLDEQUIV.  Note that we cannot use
-     gen_lowpart_common since it can do the wrong thing when
-     RELOADREG has a multi-word mode.  Note that RELOADREG
-     must always be a REG here.  */
+  /* Encapsulate OLDEQUIV into the reload mode, then load RELOADREG from
+     OLDEQUIV.  */
 
-  if (GET_MODE (reloadreg) != mode)
-    reloadreg = reload_adjust_reg_for_mode (reloadreg, mode);
   while (GET_CODE (oldequiv) == SUBREG && GET_MODE (oldequiv) != mode)
     oldequiv = SUBREG_REG (oldequiv);
   if (GET_MODE (oldequiv) != VOIDmode
@@ -6696,7 +6676,7 @@ emit_input_reload_insns (struct insn_cha
 			     spill_reg_stored_to[REGNO (oldequiv)])
 	      || rtx_equal_p (spill_reg_stored_to[REGNO (oldequiv)],
 			      old)))
-	delete_output_reload (insn, j, REGNO (oldequiv));
+	delete_output_reload (insn, j, REGNO (oldequiv), reloadreg);
 
       /* Prevent normal processing of this reload.  */
       special = 1;
@@ -6756,7 +6736,7 @@ emit_input_reload_insns (struct insn_cha
 	      if (REG_N_DEATHS (REGNO (old)) == 1
 		  && REG_N_SETS (REGNO (old)) == 1)
 		{
-		  reg_renumber[REGNO (old)] = REGNO (rl->reg_rtx);
+		  reg_renumber[REGNO (old)] = REGNO (reloadreg);
 		  alter_reg (REGNO (old), -1);
 		}
 	      special = 1;
@@ -7015,35 +6995,23 @@ emit_input_reload_insns (struct insn_cha
 emit_output_reload_insns (struct insn_chain *chain, struct reload *rl,
 			  int j)
 {
-  rtx reloadreg = rl->reg_rtx;
+  rtx reloadreg;
   rtx insn = chain->insn;
   int special = 0;
   rtx old = rl->out;
-  enum machine_mode mode = GET_MODE (old);
+  enum machine_mode mode;
   rtx p;
+  rtx rl_reg_rtx;
 
   if (rl->when_needed == RELOAD_OTHER)
     start_sequence ();
   else
     push_to_sequence (output_reload_insns[rl->opnum]);
 
-  /* Determine the mode to reload in.
-     See comments above (for input reloading).  */
-
-  if (mode == VOIDmode)
-    {
-      /* VOIDmode should never happen for an output.  */
-      if (asm_noperands (PATTERN (insn)) < 0)
-	/* It's the compiler's fault.  */
-	fatal_insn ("VOIDmode on an output", insn);
-      error_for_asm (insn, "output operand is constant in %<asm%>");
-      /* Prevent crash--use something we know is valid.  */
-      mode = word_mode;
-      old = gen_rtx_REG (mode, REGNO (reloadreg));
-    }
+  rl_reg_rtx = reload_reg_rtx_for_output[j];
+  mode = GET_MODE (rl_reg_rtx);
 
-  if (GET_MODE (reloadreg) != mode)
-    reloadreg = reload_adjust_reg_for_mode (reloadreg, mode);
+  reloadreg = rl_reg_rtx;
 
   /* If we need two reload regs, set RELOADREG to the intermediate
      one, since it will be stored into OLD.  We might need a secondary
@@ -7167,12 +7135,12 @@ emit_output_reload_insns (struct insn_ch
 	   reg_has_output_reload will make this do nothing.  */
 	note_stores (pat, forget_old_reloads_1, NULL);
 
-	if (reg_mentioned_p (rl->reg_rtx, pat))
+	if (reg_mentioned_p (rl_reg_rtx, pat))
 	  {
 	    rtx set = single_set (insn);
 	    if (reload_spill_index[j] < 0
 		&& set
-		&& SET_SRC (set) == rl->reg_rtx)
+		&& SET_SRC (set) == rl_reg_rtx)
 	      {
 		int src = REGNO (SET_SRC (set));
 
@@ -7181,7 +7149,7 @@ emit_output_reload_insns (struct insn_ch
 		if (find_regno_note (insn, REG_DEAD, src))
 		  SET_HARD_REG_BIT (reg_reloaded_died, src);
 	      }
-	    if (REGNO (rl->reg_rtx) < FIRST_PSEUDO_REGISTER)
+	    if (HARD_REGISTER_P (rl_reg_rtx))
 	      {
 		int s = rl->secondary_out_reload;
 		set = single_set (p);
@@ -7194,7 +7162,7 @@ emit_output_reload_insns (struct insn_ch
 		     made; leave new_spill_reg_store alone.  */
 		  ;
 		else if (s >= 0
-			 && SET_SRC (set) == rl->reg_rtx
+			 && SET_SRC (set) == rl_reg_rtx
 			 && SET_DEST (set) == rld[s].reg_rtx)
 		  {
 		    /* Usually the next instruction will be the
@@ -7215,7 +7183,7 @@ emit_output_reload_insns (struct insn_ch
 		      }
 		  }
 		else
-		  new_spill_reg_store[REGNO (rl->reg_rtx)] = p;
+		  new_spill_reg_store[REGNO (rl_reg_rtx)] = p;
 	      }
 	  }
       }
@@ -7242,13 +7210,62 @@ do_input_reload (struct insn_chain *chai
   rtx insn = chain->insn;
   rtx old = (rl->in && MEM_P (rl->in)
 	     ? rl->in_reg : rl->in);
+  rtx reg_rtx = rl->reg_rtx;
+
+  if (old && reg_rtx)
+    {
+      enum machine_mode mode;
+
+      /* Determine the mode to reload in.
+	 This is very tricky because we have three to choose from.
+	 There is the mode the insn operand wants (rl->inmode).
+	 There is the mode of the reload register RELOADREG.
+	 There is the intrinsic mode of the operand, which we could find
+	 by stripping some SUBREGs.
+	 It turns out that RELOADREG's mode is irrelevant:
+	 we can change that arbitrarily.
+
+	 Consider (SUBREG:SI foo:QI) as an operand that must be SImode;
+	 then the reload reg may not support QImode moves, so use SImode.
+	 If foo is in memory due to spilling a pseudo reg, this is safe,
+	 because the QImode value is in the least significant part of a
+	 slot big enough for a SImode.  If foo is some other sort of
+	 memory reference, then it is impossible to reload this case,
+	 so previous passes had better make sure this never happens.
+
+	 Then consider a one-word union which has SImode and one of its
+	 members is a float, being fetched as (SUBREG:SF union:SI).
+	 We must fetch that as SFmode because we could be loading into
+	 a float-only register.  In this case OLD's mode is correct.
+
+	 Consider an immediate integer: it has VOIDmode.  Here we need
+	 to get a mode from something else.
+
+	 In some cases, there is a fourth mode, the operand's
+	 containing mode.  If the insn specifies a containing mode for
+	 this operand, it overrides all others.
+
+	 I am not sure whether the algorithm here is always right,
+	 but it does the right things in those cases.  */
+
+      mode = GET_MODE (old);
+      if (mode == VOIDmode)
+	mode = rl->inmode;
+
+      /* We cannot use gen_lowpart_common since it can do the wrong thing
+	 when REG_RTX has a multi-word mode.  Note that REG_RTX must
+	 always be a REG here.  */
+      if (GET_MODE (reg_rtx) != mode)
+	reg_rtx = reload_adjust_reg_for_mode (reg_rtx, mode);
+    }
+  reload_reg_rtx_for_input[j] = reg_rtx;
 
   if (old != 0
       /* AUTO_INC reloads need to be handled even if inherited.  We got an
 	 AUTO_INC reload if reload_out is set but reload_out_reg isn't.  */
       && (! reload_inherited[j] || (rl->out && ! rl->out_reg))
-      && ! rtx_equal_p (rl->reg_rtx, old)
-      && rl->reg_rtx != 0)
+      && ! rtx_equal_p (reg_rtx, old)
+      && reg_rtx != 0)
     emit_input_reload_insns (chain, rld + j, old, j);
 
   /* When inheriting a wider reload, we have a MEM in rl->in,
@@ -7267,24 +7284,21 @@ do_input_reload (struct insn_chain *chai
 
   if (optimize
       && (reload_inherited[j] || reload_override_in[j])
-      && rl->reg_rtx
-      && REG_P (rl->reg_rtx)
-      && spill_reg_store[REGNO (rl->reg_rtx)] != 0
+      && reg_rtx
+      && REG_P (reg_rtx)
+      && spill_reg_store[REGNO (reg_rtx)] != 0
 #if 0
       /* There doesn't seem to be any reason to restrict this to pseudos
 	 and doing so loses in the case where we are copying from a
 	 register of the wrong class.  */
-      && (REGNO (spill_reg_stored_to[REGNO (rl->reg_rtx)])
-	  >= FIRST_PSEUDO_REGISTER)
+      && !HARD_REGISTER_P (spill_reg_stored_to[REGNO (reg_rtx)])
 #endif
       /* The insn might have already some references to stackslots
 	 replaced by MEMs, while reload_out_reg still names the
 	 original pseudo.  */
-      && (dead_or_set_p (insn,
-			 spill_reg_stored_to[REGNO (rl->reg_rtx)])
-	  || rtx_equal_p (spill_reg_stored_to[REGNO (rl->reg_rtx)],
-			  rl->out_reg)))
-    delete_output_reload (insn, j, REGNO (rl->reg_rtx));
+      && (dead_or_set_p (insn, spill_reg_stored_to[REGNO (reg_rtx)])
+	  || rtx_equal_p (spill_reg_stored_to[REGNO (reg_rtx)], rl->out_reg)))
+    delete_output_reload (insn, j, REGNO (reg_rtx), reg_rtx);
 }
 
 /* Do output reloading for reload RL, which is for the insn described by
@@ -7300,6 +7314,30 @@ do_output_reload (struct insn_chain *cha
      not loaded in this same reload, see if we can eliminate a previous
      store.  */
   rtx pseudo = rl->out_reg;
+  rtx reg_rtx = rl->reg_rtx;
+
+  if (rl->out && reg_rtx)
+    {
+      enum machine_mode mode;
+
+      /* Determine the mode to reload in.
+	 See comments above (for input reloading).  */
+      mode = GET_MODE (rl->out);
+      if (mode == VOIDmode)
+	{
+	  /* VOIDmode should never happen for an output.  */
+	  if (asm_noperands (PATTERN (insn)) < 0)
+	    /* It's the compiler's fault.  */
+	    fatal_insn ("VOIDmode on an output", insn);
+	  error_for_asm (insn, "output operand is constant in %<asm%>");
+	  /* Prevent crash--use something we know is valid.  */
+	  mode = word_mode;
+	  rl->out = gen_rtx_REG (mode, REGNO (reg_rtx));
+	}
+      if (GET_MODE (reg_rtx) != mode)
+	reg_rtx = reload_adjust_reg_for_mode (reg_rtx, mode);
+    }
+  reload_reg_rtx_for_output[j] = reg_rtx;
 
   if (pseudo
       && optimize
@@ -7318,13 +7356,13 @@ do_output_reload (struct insn_chain *cha
 	  && reg_reloaded_contents[last_regno] == pseudo_no
 	  && spill_reg_store[last_regno]
 	  && rtx_equal_p (pseudo, spill_reg_stored_to[last_regno]))
-	delete_output_reload (insn, j, last_regno);
+	delete_output_reload (insn, j, last_regno, reg_rtx);
     }
 
   old = rl->out_reg;
   if (old == 0
-      || rl->reg_rtx == old
-      || rl->reg_rtx == 0)
+      || reg_rtx == old
+      || reg_rtx == 0)
     return;
 
   /* An output operand that dies right away does need a reload,
@@ -7333,7 +7371,7 @@ do_output_reload (struct insn_chain *cha
   if ((REG_P (old) || GET_CODE (old) == SCRATCH)
       && (note = find_reg_note (insn, REG_UNUSED, old)) != 0)
     {
-      XEXP (note, 0) = rl->reg_rtx;
+      XEXP (note, 0) = reg_rtx;
       return;
     }
   /* Likewise for a SUBREG of an operand that dies.  */
@@ -7342,8 +7380,7 @@ do_output_reload (struct insn_chain *cha
 	   && 0 != (note = find_reg_note (insn, REG_UNUSED,
 					  SUBREG_REG (old))))
     {
-      XEXP (note, 0) = gen_lowpart_common (GET_MODE (old),
-					   rl->reg_rtx);
+      XEXP (note, 0) = gen_lowpart_common (GET_MODE (old), reg_rtx);
       return;
     }
   else if (GET_CODE (old) == SCRATCH)
@@ -7357,22 +7394,20 @@ do_output_reload (struct insn_chain *cha
   emit_output_reload_insns (chain, rld + j, j);
 }
 
-/* Reload number R reloads from or to a group of hard registers starting at
-   register REGNO.  Return true if it can be treated for inheritance purposes
-   like a group of reloads, each one reloading a single hard register.
-   The caller has already checked that the spill register and REGNO use
-   the same number of registers to store the reload value.  */
+/* A reload copies values of MODE from register SRC to register DEST.
+   Return true if it can be treated for inheritance purposes like a
+   group of reloads, each one reloading a single hard register.  The
+   caller has already checked that (reg:MODE SRC) and (reg:MODE DEST)
+   occupy the same number of hard registers.  */
 
 static bool
-inherit_piecemeal_p (int r ATTRIBUTE_UNUSED, int regno ATTRIBUTE_UNUSED)
+inherit_piecemeal_p (int dest ATTRIBUTE_UNUSED,
+		     int src ATTRIBUTE_UNUSED,
+		     enum machine_mode mode ATTRIBUTE_UNUSED)
 {
 #ifdef CANNOT_CHANGE_MODE_CLASS
-  return (!REG_CANNOT_CHANGE_MODE_P (reload_spill_index[r],
-				     GET_MODE (rld[r].reg_rtx),
-				     reg_raw_mode[reload_spill_index[r]])
-	  && !REG_CANNOT_CHANGE_MODE_P (regno,
-					GET_MODE (rld[r].reg_rtx),
-					reg_raw_mode[regno]));
+  return (!REG_CANNOT_CHANGE_MODE_P (dest, mode, reg_raw_mode[dest])
+	  && !REG_CANNOT_CHANGE_MODE_P (src, mode, reg_raw_mode[src]));
 #else
   return true;
 #endif
@@ -7414,9 +7449,13 @@ emit_reload_insns (struct insn_chain *ch
 
   for (j = 0; j < n_reloads; j++)
     {
-      if (rld[j].reg_rtx
-	  && REGNO (rld[j].reg_rtx) < FIRST_PSEUDO_REGISTER)
-	new_spill_reg_store[REGNO (rld[j].reg_rtx)] = 0;
+      if (rld[j].reg_rtx && HARD_REGISTER_P (rld[j].reg_rtx))
+	{
+	  unsigned int i;
+
+	  for (i = REGNO (rld[j].reg_rtx); i < END_REGNO (rld[j].reg_rtx); i++)
+	    new_spill_reg_store[i] = 0;
+	}
 
       do_input_reload (chain, rld + j, j);
       do_output_reload (chain, rld + j, j);
@@ -7515,40 +7554,31 @@ emit_reload_insns (struct insn_chain *ch
 	{
 	  int nr = hard_regno_nregs[i][GET_MODE (rld[r].reg_rtx)];
 	  int k;
-	  int part_reaches_end = 0;
-	  int all_reaches_end = 1;
 
 	  /* For a multi register reload, we need to check if all or part
 	     of the value lives to the end.  */
 	  for (k = 0; k < nr; k++)
-	    {
-	      if (reload_reg_reaches_end_p (i + k, rld[r].opnum,
-					    rld[r].when_needed))
-		part_reaches_end = 1;
-	      else
-		all_reaches_end = 0;
-	    }
-
-	  /* Ignore reloads that don't reach the end of the insn in
-	     entirety.  */
-	  if (all_reaches_end)
-	    {
-	      /* First, clear out memory of what used to be in this spill reg.
-		 If consecutive registers are used, clear them all.  */
-
-	      for (k = 0; k < nr; k++)
-  	        {
-		CLEAR_HARD_REG_BIT (reg_reloaded_valid, i + k);
-  		  CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered, i + k);
-  		}
-
-	      /* Maybe the spill reg contains a copy of reload_out.  */
-	      if (rld[r].out != 0
-		  && (REG_P (rld[r].out)
+	    if (reload_reg_reaches_end_p (i + k, rld[r].opnum,
+					  rld[r].when_needed))
+	      CLEAR_HARD_REG_BIT (reg_reloaded_valid, i + k);
+
+	  /* Maybe the spill reg contains a copy of reload_out.  */
+	  if (rld[r].out != 0
+	      && (REG_P (rld[r].out)
 #ifdef AUTO_INC_DEC
-		      || ! rld[r].out_reg
+		  || ! rld[r].out_reg
 #endif
-		      || REG_P (rld[r].out_reg)))
+		  || REG_P (rld[r].out_reg)))
+	    {
+	      rtx src;
+	      enum machine_mode mode;
+
+	      src = reload_reg_rtx_for_output[r];
+	      mode = GET_MODE (src);
+	      i = REGNO (src);
+	      nr = hard_regno_nregs[i][mode];
+	      if (reload_regs_reach_end_p (i, nr, rld[r].opnum,
+					   rld[r].when_needed))
 		{
 		  rtx out = (REG_P (rld[r].out)
 			     ? rld[r].out
@@ -7557,17 +7587,16 @@ emit_reload_insns (struct insn_chain *ch
 /* AUTO_INC */		     : XEXP (rld[r].in_reg, 0));
 		  int nregno = REGNO (out);
 		  int nnr = (nregno >= FIRST_PSEUDO_REGISTER ? 1
-			     : hard_regno_nregs[nregno]
-					       [GET_MODE (rld[r].reg_rtx)]);
+			     : hard_regno_nregs[nregno][mode]);
 		  bool piecemeal;
 
 		  spill_reg_store[i] = new_spill_reg_store[i];
 		  spill_reg_stored_to[i] = out;
-		  reg_last_reload_reg[nregno] = rld[r].reg_rtx;
+		  reg_last_reload_reg[nregno] = src;
 
 		  piecemeal = (nregno < FIRST_PSEUDO_REGISTER
 			       && nr == nnr
-			       && inherit_piecemeal_p (r, nregno));
+			       && inherit_piecemeal_p (nregno, i, mode));
 
 		  /* If NREGNO is a hard register, it may occupy more than
 		     one register.  If it does, say what is in the
@@ -7578,9 +7607,7 @@ emit_reload_insns (struct insn_chain *ch
 		  if (nregno < FIRST_PSEUDO_REGISTER)
 		    for (k = 1; k < nnr; k++)
 		      reg_last_reload_reg[nregno + k]
-			= (piecemeal
-			   ? regno_reg_rtx[REGNO (rld[r].reg_rtx) + k]
-			   : 0);
+			= (piecemeal ? regno_reg_rtx[i + k] : 0);
 
 		  /* Now do the inverse operation.  */
 		  for (k = 0; k < nr; k++)
@@ -7592,24 +7619,38 @@ emit_reload_insns (struct insn_chain *ch
 			   : nregno + k);
 		      reg_reloaded_insn[i + k] = insn;
 		      SET_HARD_REG_BIT (reg_reloaded_valid, i + k);
-		      if (HARD_REGNO_CALL_PART_CLOBBERED (i + k, GET_MODE (out)))
-			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered, i + k);
+		      if (HARD_REGNO_CALL_PART_CLOBBERED (i + k, mode))
+			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
+					  i + k);
+		      else
+			CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
+					    i + k);
 		    }
 		}
-
-	      /* Maybe the spill reg contains a copy of reload_in.  Only do
-		 something if there will not be an output reload for
-		 the register being reloaded.  */
-	      else if (rld[r].out_reg == 0
-		       && rld[r].in != 0
-		       && ((REG_P (rld[r].in)
-			    && REGNO (rld[r].in) >= FIRST_PSEUDO_REGISTER
-	                    && !REGNO_REG_SET_P (&reg_has_output_reload,
-			      			 REGNO (rld[r].in)))
-			   || (REG_P (rld[r].in_reg)
-			       && !REGNO_REG_SET_P (&reg_has_output_reload,
-						    REGNO (rld[r].in_reg))))
-		       && ! reg_set_p (rld[r].reg_rtx, PATTERN (insn)))
+	    }
+	  /* Maybe the spill reg contains a copy of reload_in.  Only do
+	     something if there will not be an output reload for
+	     the register being reloaded.  */
+	  else if (rld[r].out_reg == 0
+		   && rld[r].in != 0
+		   && ((REG_P (rld[r].in)
+			&& !HARD_REGISTER_P (rld[r].in)
+			&& !REGNO_REG_SET_P (&reg_has_output_reload,
+					     REGNO (rld[r].in)))
+		       || (REG_P (rld[r].in_reg)
+			   && !REGNO_REG_SET_P (&reg_has_output_reload,
+						REGNO (rld[r].in_reg))))
+		   && !reg_set_p (reload_reg_rtx_for_input[r], PATTERN (insn)))
+	    {
+	      rtx dest;
+	      enum machine_mode mode;
+
+	      dest = reload_reg_rtx_for_input[r];
+	      i = REGNO (dest);
+	      mode = GET_MODE (dest);
+	      nr = hard_regno_nregs[i][mode];
+	      if (reload_regs_reach_end_p (i, nr, rld[r].opnum,
+					   rld[r].when_needed))
 		{
 		  int nregno;
 		  int nnr;
@@ -7626,21 +7667,18 @@ emit_reload_insns (struct insn_chain *ch
 		  nregno = REGNO (in);
 
 		  nnr = (nregno >= FIRST_PSEUDO_REGISTER ? 1
-			 : hard_regno_nregs[nregno]
-					   [GET_MODE (rld[r].reg_rtx)]);
+			 : hard_regno_nregs[nregno][mode]);
 
-		  reg_last_reload_reg[nregno] = rld[r].reg_rtx;
+		  reg_last_reload_reg[nregno] = dest;
 
 		  piecemeal = (nregno < FIRST_PSEUDO_REGISTER
 			       && nr == nnr
-			       && inherit_piecemeal_p (r, nregno));
+			       && inherit_piecemeal_p (i, nregno, mode));
 
 		  if (nregno < FIRST_PSEUDO_REGISTER)
 		    for (k = 1; k < nnr; k++)
 		      reg_last_reload_reg[nregno + k]
-			= (piecemeal
-			   ? regno_reg_rtx[REGNO (rld[r].reg_rtx) + k]
-			   : 0);
+			= (piecemeal ? regno_reg_rtx[i + k] : 0);
 
 		  /* Unless we inherited this reload, show we haven't
 		     recently done a store.
@@ -7659,22 +7697,15 @@ emit_reload_insns (struct insn_chain *ch
 			   : nregno + k);
 		      reg_reloaded_insn[i + k] = insn;
 		      SET_HARD_REG_BIT (reg_reloaded_valid, i + k);
-		      if (HARD_REGNO_CALL_PART_CLOBBERED (i + k, GET_MODE (in)))
-			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered, i + k);
+		      if (HARD_REGNO_CALL_PART_CLOBBERED (i + k, mode))
+			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
+					  i + k);
+		      else
+			CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
+					    i + k);
 		    }
 		}
 	    }
-
-	  /* However, if part of the reload reaches the end, then we must
-	     invalidate the old info for the part that survives to the end.  */
-	  else if (part_reaches_end)
-	    {
-	      for (k = 0; k < nr; k++)
-		if (reload_reg_reaches_end_p (i + k,
-					      rld[r].opnum,
-					      rld[r].when_needed))
-		  CLEAR_HARD_REG_BIT (reg_reloaded_valid, i + k);
-	    }
 	}
 
       /* The following if-statement was #if 0'd in 1.34 (or before...).
@@ -7687,17 +7718,18 @@ emit_reload_insns (struct insn_chain *ch
 	 it thinks only about the original insn.  So invalidate it here.
 	 Also do the same thing for RELOAD_OTHER constraints where the
 	 output is discarded.  */
-      if (i < 0 
-	  && ((rld[r].out != 0
-	       && (REG_P (rld[r].out)
-		   || (MEM_P (rld[r].out)
+      else if (i < 0
+	       && ((rld[r].out != 0
+		    && (REG_P (rld[r].out)
+			|| (MEM_P (rld[r].out)
+			    && REG_P (rld[r].out_reg))))
+		   || (rld[r].out == 0 && rld[r].out_reg
 		       && REG_P (rld[r].out_reg))))
-	      || (rld[r].out == 0 && rld[r].out_reg
-		  && REG_P (rld[r].out_reg))))
 	{
 	  rtx out = ((rld[r].out && REG_P (rld[r].out))
 		     ? rld[r].out : rld[r].out_reg);
 	  int nregno = REGNO (out);
+	  enum machine_mode mode = GET_MODE (out);
 
 	  /* REG_RTX is now set or clobbered by the main instruction.
 	     As the comment above explains, forget_old_reloads_1 only
@@ -7724,7 +7756,7 @@ emit_reload_insns (struct insn_chain *ch
 	      /* If we can find a hard register that is stored, record
 		 the storing insn so that we may delete this insn with
 		 delete_output_reload.  */
-	      src_reg = rld[r].reg_rtx;
+	      src_reg = reload_reg_rtx_for_output[r];
 
 	      /* If this is an optional reload, try to find the source reg
 		 from an input reload.  */
@@ -7741,7 +7773,7 @@ emit_reload_insns (struct insn_chain *ch
 			{
 			  if (rld[k].in == src_reg)
 			    {
-			      src_reg = rld[k].reg_rtx;
+			      src_reg = reload_reg_rtx_for_input[k];
 			      break;
 			    }
 			}
@@ -7752,13 +7784,17 @@ emit_reload_insns (struct insn_chain *ch
 	      if (src_reg && REG_P (src_reg)
 		  && REGNO (src_reg) < FIRST_PSEUDO_REGISTER)
 		{
-		  int src_regno = REGNO (src_reg);
-		  int nr = hard_regno_nregs[src_regno][rld[r].mode];
+		  int src_regno, nr;
+		  rtx note;
+
+		  gcc_assert (GET_MODE (src_reg) == mode);
+		  src_regno = REGNO (src_reg);
+		  nr = hard_regno_nregs[src_regno][mode];
 		  /* The place where to find a death note varies with
 		     PRESERVE_DEATH_INFO_REGNO_P .  The condition is not
 		     necessarily checked exactly in the code that moves
 		     notes, so just check both locations.  */
-		  rtx note = find_regno_note (insn, REG_DEAD, src_regno);
+		  note = find_regno_note (insn, REG_DEAD, src_regno);
 		  if (! note && store_insn)
 		    note = find_regno_note (store_insn, REG_DEAD, src_regno);
 		  while (nr-- > 0)
@@ -7769,10 +7805,13 @@ emit_reload_insns (struct insn_chain *ch
 		      reg_reloaded_insn[src_regno + nr] = store_insn;
 		      CLEAR_HARD_REG_BIT (reg_reloaded_dead, src_regno + nr);
 		      SET_HARD_REG_BIT (reg_reloaded_valid, src_regno + nr);
-		      if (HARD_REGNO_CALL_PART_CLOBBERED (src_regno + nr, 
-							  GET_MODE (src_reg)))
+		      if (HARD_REGNO_CALL_PART_CLOBBERED (src_regno + nr,
+							  mode))
 			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered, 
 					  src_regno + nr);
+		      else
+			CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
+					    src_regno + nr);
 		      SET_HARD_REG_BIT (reg_is_output_reload, src_regno + nr);
 		      if (note)
 			SET_HARD_REG_BIT (reg_reloaded_died, src_regno);
@@ -7789,7 +7828,7 @@ emit_reload_insns (struct insn_chain *ch
 	    }
 	  else
 	    {
-	      int num_regs = hard_regno_nregs[nregno][GET_MODE (out)];
+	      int num_regs = hard_regno_nregs[nregno][mode];
 
 	      while (num_regs-- > 0)
 		reg_last_reload_reg[nregno + num_regs] = 0;
@@ -8075,10 +8114,11 @@ gen_reload (rtx out, rtx in, int opnum, 
    LAST_RELOAD_REG is the hard register number for which we want to delete
    the last output reload.
    J is the reload-number that originally used REG.  The caller has made
-   certain that reload J doesn't use REG any longer for input.  */
+   certain that reload J doesn't use REG any longer for input.
+   NEW_RELOAD_REG is reload register that reload J is using for REG.  */
 
 static void
-delete_output_reload (rtx insn, int j, int last_reload_reg)
+delete_output_reload (rtx insn, int j, int last_reload_reg, rtx new_reload_reg)
 {
   rtx output_reload_insn = spill_reg_store[last_reload_reg];
   rtx reg = spill_reg_stored_to[last_reload_reg];
@@ -8230,7 +8270,7 @@ delete_output_reload (rtx insn, int j, i
 	}
 
       /* For the debugging info, say the pseudo lives in this reload reg.  */
-      reg_renumber[REGNO (reg)] = REGNO (rld[j].reg_rtx);
+      reg_renumber[REGNO (reg)] = REGNO (new_reload_reg);
       alter_reg (REGNO (reg), -1);
     }
   else
Index: gcc/testsuite/gcc.target/mips/pr35232.c
===================================================================
--- /dev/null	2008-02-11 07:41:57.552097000 +0000
+++ gcc/testsuite/gcc.target/mips/pr35232.c	2008-02-17 22:31:26.000000000 +0000
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+/* { dg-mips-options "-O" } */
+
+NOMIPS16 unsigned int
+f1 (unsigned long long x)
+{
+  unsigned int r;
+  asm ("# %0" : "=a" (r) : "0" (x));
+  asm ("# %0" : "=h" (r) : "0" (r));
+  return r;
+}
+
+int
+main (void)
+{
+  return f1 (4) != 4;
+}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-02-18 19:29 PR 35232: Recent regression due to reload inheritance bug Richard Sandiford
@ 2008-03-12 18:50 ` Ulrich Weigand
  2008-03-13 18:03   ` Richard Sandiford
  0 siblings, 1 reply; 18+ messages in thread
From: Ulrich Weigand @ 2008-03-12 18:50 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

Richard Sandiford wrote:

Sorry for the late review, I've unfortunately been quite busy
over the last couple of weeks, and this was a big patch ;-)

>The fix I've gone for is to introduce two new arrays,
>reload_reg_rtx_for_input and reload_reg_rtx_for_output.
>do_input_reload sets reload_reg_rtx_for_input[r] to the
>input form of rld[r].reg_rtx and do_output_reload sets
>reload_reg_rtx_for_output[r] to the output form.

OK, this definitely seems like the right thing to do.

>Er... well, that wasn't short.  I hope it doesn't put anyone off.
>I think this is actually a relatively straight-forward change as
>far as reload changes go.  I just wanted to describe it in even
>more painful detail than usual because of the late stage we're at.

Thanks for the detailed description of the changes, this was
very helpful.


The patch looks good to me, just a couple of questions and/or
suggestions:

emit_input_reload_insns has an initialization of reloadreg:
   rtx reloadreg = rl->reg_rtx;
which is now dead and should be removed.

>@@ -7194,7 +7162,7 @@ emit_output_reload_insns (struct insn_ch
> 		     made; leave new_spill_reg_store alone.  */
> 		  ;
> 		else if (s >= 0
>-			 && SET_SRC (set) == rl->reg_rtx
>+			 && SET_SRC (set) == rl_reg_rtx
> 			 && SET_DEST (set) == rld[s].reg_rtx)

What about cases where the secondary reload register also had
its mode adjusted?  Can this actually happen?

>   old = rl->out_reg;
>   if (old == 0
>-      || rl->reg_rtx == old
>-      || rl->reg_rtx == 0)
>+      || reg_rtx == old
>+      || reg_rtx == 0)
>     return;

I'm wondering whether == compares actually work here (and elsewhere),
as reload_adjust_reg_for_mode may allocate fresh rtx's ...


>@@ -7414,9 +7449,13 @@ emit_reload_insns (struct insn_chain *ch
> 
>   for (j = 0; j < n_reloads; j++)
>     {
>-      if (rld[j].reg_rtx
>-	  && REGNO (rld[j].reg_rtx) < FIRST_PSEUDO_REGISTER)
>-	new_spill_reg_store[REGNO (rld[j].reg_rtx)] = 0;
>+      if (rld[j].reg_rtx && HARD_REGISTER_P (rld[j].reg_rtx))
>+	{
>+	  unsigned int i;
>+
>+	  for (i = REGNO (rld[j].reg_rtx); i < END_REGNO (rld[j].reg_rtx); i++)
>+	    new_spill_reg_store[i] = 0;
>+	}

Maybe just memset'ing new_spill_reg_store to zero at the start
of emit_reload_insns would be clearer?


>+	  /* Maybe the spill reg contains a copy of reload_out.  */
>+	  if (rld[r].out != 0
>+	      && (REG_P (rld[r].out)
> #ifdef AUTO_INC_DEC
>-		      || ! rld[r].out_reg
>+		  || ! rld[r].out_reg
> #endif
>-		      || REG_P (rld[r].out_reg)))
>+		  || REG_P (rld[r].out_reg)))
>+	    {
>+	      rtx src;
>+	      enum machine_mode mode;
>+
>+	      src = reload_reg_rtx_for_output[r];
>+	      mode = GET_MODE (src);
>+	      i = REGNO (src);
>+	      nr = hard_regno_nregs[i][mode];

Redefining i and nr in the middle of this function strikes me as a
bit confusing, maybe it would be better to use different variable
names here ...

>+	  /* Maybe the spill reg contains a copy of reload_in.  Only do
>+	     something if there will not be an output reload for
>+	     the register being reloaded.  */
>+	  else if (rld[r].out_reg == 0
>+		   && rld[r].in != 0
>+		   && ((REG_P (rld[r].in)
>+			&& !HARD_REGISTER_P (rld[r].in)
>+			&& !REGNO_REG_SET_P (&reg_has_output_reload,
>+					     REGNO (rld[r].in)))
>+		       || (REG_P (rld[r].in_reg)
>+			   && !REGNO_REG_SET_P (&reg_has_output_reload,
>+						REGNO (rld[r].in_reg))))
>+		   && !reg_set_p (reload_reg_rtx_for_input[r], PATTERN (insn)))
>+	    {
>+	      rtx dest;
>+	      enum machine_mode mode;
>+
>+	      dest = reload_reg_rtx_for_input[r];

Why do you call the variable "dest" here and not "src" as above?
It is still the (potential) source of a future reload inheritance ...


>@@ -7687,17 +7718,18 @@ emit_reload_insns (struct insn_chain *ch
> 	 it thinks only about the original insn.  So invalidate it here.
> 	 Also do the same thing for RELOAD_OTHER constraints where the
> 	 output is discarded.  */
>-      if (i < 0 
>-	  && ((rld[r].out != 0
>-	       && (REG_P (rld[r].out)
>-		   || (MEM_P (rld[r].out)
>+      else if (i < 0
>+	       && ((rld[r].out != 0
>+		    && (REG_P (rld[r].out)
>+			|| (MEM_P (rld[r].out)
>+			    && REG_P (rld[r].out_reg))))

Why is the "else" necessary?


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-03-12 18:50 ` Ulrich Weigand
@ 2008-03-13 18:03   ` Richard Sandiford
  2008-03-17 19:11     ` Ulrich Weigand
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2008-03-13 18:03 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc-patches

Hi Ulrich,

Thanks for the review, and for wading through this horrible code with me.

"Ulrich Weigand" <uweigand@de.ibm.com> writes:
> emit_input_reload_insns has an initialization of reloadreg:
>    rtx reloadreg = rl->reg_rtx;
> which is now dead and should be removed.

Oops!  Good catch, thanks.

>>@@ -7194,7 +7162,7 @@ emit_output_reload_insns (struct insn_ch
>> 		     made; leave new_spill_reg_store alone.  */
>> 		  ;
>> 		else if (s >= 0
>>-			 && SET_SRC (set) == rl->reg_rtx
>>+			 && SET_SRC (set) == rl_reg_rtx
>> 			 && SET_DEST (set) == rld[s].reg_rtx)
>
> What about cases where the secondary reload register also had
> its mode adjusted?  Can this actually happen?

Well, the following adjustment to the testcase creates an in/out
reload with miosmatched modes and secondary reloads in each
direction:

-------------------------------------------------------------------------
/* { dg-do run } */
/* { dg-mips-options "-O" } */

unsigned int
f1 (void)
{
  register unsigned long long x asm ("$f4");
  register unsigned int r asm ("$f2");
  asm ("# %0" : "=f" (x));
  asm ("# %0" : "=a" (r) : "0" (x));
  asm ("# %0" : "=h" (r) : "0" (r));
  return r;
}

int
main (void)
{
  return f1 () != 4;
}
-------------------------------------------------------------------------

(This is a 64-bit testcase, whereas the original miscompilation was
32-bit only.  I can't think of a way to get the same kind of secondary
reloads in 32-bit mode.)

But (as one would hope) the secondary reloads have the natural modes
for their respective directions:

Reloads for insn # 6
Reload 0: GR_REGS, RELOAD_FOR_OTHER_ADDRESS (opnum = 0), can't combine, secondar
y_reload_p
        reload_reg_rtx: (reg:DI 2 $2)
Reload 1: GR_REGS, RELOAD_FOR_OUTPUT_ADDRESS (opnum = 0), can't combine, seconda
ry_reload_p
        reload_reg_rtx: (reg:SI 2 $2)
Reload 2: reload_in (DI) = (reg/v:DI 36 $f4 [ x ])
        reload_out (SI) = (reg/v:SI 34 $f2 [ r ])
        ACC_REGS, RELOAD_OTHER (opnum = 0)
        reload_in_reg: (reg/v:DI 36 $f4 [ x ])
        reload_out_reg: (reg/v:SI 34 $f2 [ r ])
        reload_reg_rtx: (reg:DI 64 hi)
        secondary_in_reload = 0, secondary_out_reload = 1

AFAICT, push_secondary_reload always chooses the inmode of the parent
reload for inputs and the outmode of the parent reload for outputs,
so I'm not sure under what conditions:

	      if (GET_MODE (reloadreg) != mode)
		reloadreg = reload_adjust_reg_for_mode (reloadreg, mode);

would trigger.  (This call is only used for secondary reload moves,
not target-specific insns, and has been there in one form or another
since the beginning of the CVS history.)

As written, the final "else" of that statement is already required to
cope correctly with "s >= 0".  There is no guarantee that a secondary
reload will be either:

  (a) (set rld[s].reg_rtx rl_reg_rtx) or
  (b) not a single set at all

E.g., for MIPS, the secondary reload is actually a (set ... (unspec ...)).
reload_out* define_expands could also emit a SET other than (a).  So on
that basis:

  - The "else if" branch above should just be an optimisation

  - If the code is wrong, it shouldn't be any more wrong after my patch.
    The patch doesn't change the situations in which we adjust the
    secondary reload register (which, AFAICT, we probably never need to).

I agree -- if this is what you're thinking -- that it's far from obvious
that the original code is correct.  I assume the "not a single SET" part
exists because a reload_out* pattern might use UNSPEC_VOLATILEs, or some
other such thing whose semantics cannot be predicted by reload, and we
do not want to risk deleting it even if the result of the output reload
is found to be dead.  This should not happen for a normal move, whose
semantics ought to be just that: a move.  But if that's true, what about
reload_out* expanders that create a (set ... (unspec_volatile ...))?
And why do we not store to new_spill_reg_store[REGNO (rl_reg_rtx)] in
the "else if" you quote above, if the "else" really is correct for
"s >= 0" single sets?

However, going back to the "this isn't worse or better after my patch",
I think it would be best to deal with this as a separate issue.
Especially given that I'm still hoping to get this patch into 4.3. ;)

>
>>   old = rl->out_reg;
>>   if (old == 0
>>-      || rl->reg_rtx == old
>>-      || rl->reg_rtx == 0)
>>+      || reg_rtx == old
>>+      || reg_rtx == 0)
>>     return;
>
> I'm wondering whether == compares actually work here (and elsewhere),
> as reload_adjust_reg_for_mode may allocate fresh rtx's ...

I wondered about that too, but I don't think "rl->reg_rtx != old" ever
implied "!rtx_equal_p (rl->reg_rtx, old)" before the patch.  I think
this is just an "early out" for speed reasons, in which case using
rtx_equal_p might defeat the object.

>>@@ -7414,9 +7449,13 @@ emit_reload_insns (struct insn_chain *ch
>> 
>>   for (j = 0; j < n_reloads; j++)
>>     {
>>-      if (rld[j].reg_rtx
>>-	  && REGNO (rld[j].reg_rtx) < FIRST_PSEUDO_REGISTER)
>>-	new_spill_reg_store[REGNO (rld[j].reg_rtx)] = 0;
>>+      if (rld[j].reg_rtx && HARD_REGISTER_P (rld[j].reg_rtx))
>>+	{
>>+	  unsigned int i;
>>+
>>+	  for (i = REGNO (rld[j].reg_rtx); i < END_REGNO (rld[j].reg_rtx); i++)
>>+	    new_spill_reg_store[i] = 0;
>>+	}
>
> Maybe just memset'ing new_spill_reg_store to zero at the start
> of emit_reload_insns would be clearer?

TBH, I'd wondered about that too, but the original authors obviously
wanted to touch as little of the new_spill_reg_store array as possible.
I decided in the end to avoid changing peripheral decisions like that
as part of this patch.  (Again, this was partly a "for 4.3" thing.)

I suppose the thinking was that it was better not to dirty the whole
new_spill_reg_store array on targets with large register files when
only a very small number are actually set or used.

>>+	  /* Maybe the spill reg contains a copy of reload_out.  */
>>+	  if (rld[r].out != 0
>>+	      && (REG_P (rld[r].out)
>> #ifdef AUTO_INC_DEC
>>-		      || ! rld[r].out_reg
>>+		  || ! rld[r].out_reg
>> #endif
>>-		      || REG_P (rld[r].out_reg)))
>>+		  || REG_P (rld[r].out_reg)))
>>+	    {
>>+	      rtx src;
>>+	      enum machine_mode mode;
>>+
>>+	      src = reload_reg_rtx_for_output[r];
>>+	      mode = GET_MODE (src);
>>+	      i = REGNO (src);
>>+	      nr = hard_regno_nregs[i][mode];
>
> Redefining i and nr in the middle of this function strikes me as a
> bit confusing, maybe it would be better to use different variable
> names here ...

OK, I'll change it to use new local variables.  (I'm afraid this
was another sop to 4.3, to reduce the number of changes neeeded.
The justification was that the variables are still holding the same
conceptual information; we're just narrowing the range for the case
in hand.)

>>+	  /* Maybe the spill reg contains a copy of reload_in.  Only do
>>+	     something if there will not be an output reload for
>>+	     the register being reloaded.  */
>>+	  else if (rld[r].out_reg == 0
>>+		   && rld[r].in != 0
>>+		   && ((REG_P (rld[r].in)
>>+			&& !HARD_REGISTER_P (rld[r].in)
>>+			&& !REGNO_REG_SET_P (&reg_has_output_reload,
>>+					     REGNO (rld[r].in)))
>>+		       || (REG_P (rld[r].in_reg)
>>+			   && !REGNO_REG_SET_P (&reg_has_output_reload,
>>+						REGNO (rld[r].in_reg))))
>>+		   && !reg_set_p (reload_reg_rtx_for_input[r], PATTERN (insn)))
>>+	    {
>>+	      rtx dest;
>>+	      enum machine_mode mode;
>>+
>>+	      dest = reload_reg_rtx_for_input[r];
>
> Why do you call the variable "dest" here and not "src" as above?
> It is still the (potential) source of a future reload inheritance ...

If you look at each reload as a black box, it's really just a move, and
this code is trying to record that move for later.  So I was naming SRC
and DEST after their position in that move.

>>@@ -7687,17 +7718,18 @@ emit_reload_insns (struct insn_chain *ch
>> 	 it thinks only about the original insn.  So invalidate it here.
>> 	 Also do the same thing for RELOAD_OTHER constraints where the
>> 	 output is discarded.  */
>>-      if (i < 0 
>>-	  && ((rld[r].out != 0
>>-	       && (REG_P (rld[r].out)
>>-		   || (MEM_P (rld[r].out)
>>+      else if (i < 0
>>+	       && ((rld[r].out != 0
>>+		    && (REG_P (rld[r].out)
>>+			|| (MEM_P (rld[r].out)
>>+			    && REG_P (rld[r].out_reg))))
>
> Why is the "else" necessary?

That was part of the (rather misguided?) reassignments of "i" and "nr".
Changing this to "else" meant that we'd never refer to the original
"i" after "narrowing" it.  It wouldn't have made any difference in
practice; I just thought this was clearer.  (I'd argue it's a little
clearer on its own, but without the additional motivation of the
in-place narrowing, I'm happy to leave it out.)

Does the rest of the message convince you?  If so, I'll prepare
a version of the patch with new local vars and without the
s/if/else if/.

Richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-03-13 18:03   ` Richard Sandiford
@ 2008-03-17 19:11     ` Ulrich Weigand
  2008-03-19 22:31       ` Richard Sandiford
  0 siblings, 1 reply; 18+ messages in thread
From: Ulrich Weigand @ 2008-03-17 19:11 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

Hi Richard,

> "Ulrich Weigand" <uweigand@de.ibm.com> writes:
> > What about cases where the secondary reload register also had
> > its mode adjusted?  Can this actually happen?

> AFAICT, push_secondary_reload always chooses the inmode of the parent
> reload for inputs and the outmode of the parent reload for outputs,
> so I'm not sure under what conditions:
> 
> 	      if (GET_MODE (reloadreg) != mode)
> 		reloadreg = reload_adjust_reg_for_mode (reloadreg, mode);
> 
> would trigger.  (This call is only used for secondary reload moves,
> not target-specific insns, and has been there in one form or another
> since the beginning of the CVS history.)

Yes, I was wondering why this code was there ...

> I agree -- if this is what you're thinking -- that it's far from obvious
> that the original code is correct.  I assume the "not a single SET" part
> exists because a reload_out* pattern might use UNSPEC_VOLATILEs, or some
> other such thing whose semantics cannot be predicted by reload, and we
> do not want to risk deleting it even if the result of the output reload
> is found to be dead.  This should not happen for a normal move, whose
> semantics ought to be just that: a move.  But if that's true, what about
> reload_out* expanders that create a (set ... (unspec_volatile ...))?
> And why do we not store to new_spill_reg_store[REGNO (rl_reg_rtx)] in
> the "else if" you quote above, if the "else" really is correct for
> "s >= 0" single sets?

Agreed.  I think the latest "else" should really be a "else if (s < 0)".

> However, going back to the "this isn't worse or better after my patch",
> I think it would be best to deal with this as a separate issue.
> Especially given that I'm still hoping to get this patch into 4.3. ;)

OK.  I agree that your patch doesn't change the behaviour here, so I'm
fine with addressing this separately.


> > I'm wondering whether == compares actually work here (and elsewhere),
> > as reload_adjust_reg_for_mode may allocate fresh rtx's ...
> 
> I wondered about that too, but I don't think "rl->reg_rtx != old" ever
> implied "!rtx_equal_p (rl->reg_rtx, old)" before the patch.  I think
> this is just an "early out" for speed reasons, in which case using
> rtx_equal_p might defeat the object.

It's not just an "early out", it's trying to prevent generating an
identity move instruction if the reload register is already equal to
the target register for some reason.  Using rtx_equal_p seems to be
the correct solution here.

It's probably not a big deal if the identity move *is* generated,
as those will be cleaned up later on anyway.  Still, I don't see why
we *wouldn't* want to use rtx_equal_p here.

Anyway, this is not critical, so if you prefer this to be addressed
separately, that's fine with me.


> > Maybe just memset'ing new_spill_reg_store to zero at the start
> > of emit_reload_insns would be clearer?
> 
> TBH, I'd wondered about that too, but the original authors obviously
> wanted to touch as little of the new_spill_reg_store array as possible.
> I decided in the end to avoid changing peripheral decisions like that
> as part of this patch.  (Again, this was partly a "for 4.3" thing.)
> 
> I suppose the thinking was that it was better not to dirty the whole
> new_spill_reg_store array on targets with large register files when
> only a very small number are actually set or used.

OK, well the loop is fine with me.


> > Why do you call the variable "dest" here and not "src" as above?
> > It is still the (potential) source of a future reload inheritance ...
> 
> If you look at each reload as a black box, it's really just a move, and
> this code is trying to record that move for later.  So I was naming SRC
> and DEST after their position in that move.

The code isn't really trying to record the *move*.  It is trying to 
record that a register holds a particular value, so that this value
can be inherited later on.  At this point we do not care at all if
the reason why the register holds the value is because it was the
source or the destination of the reload.

See also the second if, starting below

      /* If a register gets output-reloaded from a non-spill register,
         that invalidates any previous reloaded copy of it.

where the variable is also called "src_reg" -- and gets its value from
either an output or an input reload.

I understood "source" here to mean "source of a future inheritance".


> >>@@ -7687,17 +7718,18 @@ emit_reload_insns (struct insn_chain *ch
> >> 	 it thinks only about the original insn.  So invalidate it here.
> >> 	 Also do the same thing for RELOAD_OTHER constraints where the
> >> 	 output is discarded.  */
> >>-      if (i < 0 
> >>-	  && ((rld[r].out != 0
> >>-	       && (REG_P (rld[r].out)
> >>-		   || (MEM_P (rld[r].out)
> >>+      else if (i < 0
> >>+	       && ((rld[r].out != 0
> >>+		    && (REG_P (rld[r].out)
> >>+			|| (MEM_P (rld[r].out)
> >>+			    && REG_P (rld[r].out_reg))))
> >
> > Why is the "else" necessary?
> 
> That was part of the (rather misguided?) reassignments of "i" and "nr".
> Changing this to "else" meant that we'd never refer to the original
> "i" after "narrowing" it.  It wouldn't have made any difference in
> practice; I just thought this was clearer.  (I'd argue it's a little
> clearer on its own, but without the additional motivation of the
> in-place narrowing, I'm happy to leave it out.)

I see.  I'd prefer to leave this as is (with the "i" unmodified).


> Does the rest of the message convince you?  If so, I'll prepare
> a version of the patch with new local vars and without the
> s/if/else if/.

OK, thanks!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-03-17 19:11     ` Ulrich Weigand
@ 2008-03-19 22:31       ` Richard Sandiford
  2008-03-19 23:20         ` Ulrich Weigand
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2008-03-19 22:31 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc-patches

"Ulrich Weigand" <uweigand@de.ibm.com> writes:
>> I agree -- if this is what you're thinking -- that it's far from obvious
>> that the original code is correct.  I assume the "not a single SET" part
>> exists because a reload_out* pattern might use UNSPEC_VOLATILEs, or some
>> other such thing whose semantics cannot be predicted by reload, and we
>> do not want to risk deleting it even if the result of the output reload
>> is found to be dead.  This should not happen for a normal move, whose
>> semantics ought to be just that: a move.  But if that's true, what about
>> reload_out* expanders that create a (set ... (unspec_volatile ...))?
>> And why do we not store to new_spill_reg_store[REGNO (rl_reg_rtx)] in
>> the "else if" you quote above, if the "else" really is correct for
>> "s >= 0" single sets?
>
> Agreed.  I think the latest "else" should really be a "else if (s < 0)".
>
>> However, going back to the "this isn't worse or better after my patch",
>> I think it would be best to deal with this as a separate issue.
>> Especially given that I'm still hoping to get this patch into 4.3. ;)
>
> OK.  I agree that your patch doesn't change the behaviour here, so I'm
> fine with addressing this separately.

OK, thanks.  Since it's a one-liner, I'll probably test it after
we've sorted out the current patch.

>> > I'm wondering whether == compares actually work here (and elsewhere),
>> > as reload_adjust_reg_for_mode may allocate fresh rtx's ...
>> 
>> I wondered about that too, but I don't think "rl->reg_rtx != old" ever
>> implied "!rtx_equal_p (rl->reg_rtx, old)" before the patch.  I think
>> this is just an "early out" for speed reasons, in which case using
>> rtx_equal_p might defeat the object.
>
> It's not just an "early out", it's trying to prevent generating an
> identity move instruction if the reload register is already equal to
> the target register for some reason.  Using rtx_equal_p seems to be
> the correct solution here.
>
> It's probably not a big deal if the identity move *is* generated,
> as those will be cleaned up later on anyway.  Still, I don't see why
> we *wouldn't* want to use rtx_equal_p here.

Yeah, you've convinced me I was being overly cautious here.
I've changed it to use rtx_equal_p.

>> > Maybe just memset'ing new_spill_reg_store to zero at the start
>> > of emit_reload_insns would be clearer?
>> 
>> TBH, I'd wondered about that too, but the original authors obviously
>> wanted to touch as little of the new_spill_reg_store array as possible.
>> I decided in the end to avoid changing peripheral decisions like that
>> as part of this patch.  (Again, this was partly a "for 4.3" thing.)
>> 
>> I suppose the thinking was that it was better not to dirty the whole
>> new_spill_reg_store array on targets with large register files when
>> only a very small number are actually set or used.
>
> OK, well the loop is fine with me.

Thanks.

>> > Why do you call the variable "dest" here and not "src" as above?
>> > It is still the (potential) source of a future reload inheritance ...
>> 
>> If you look at each reload as a black box, it's really just a move, and
>> this code is trying to record that move for later.  So I was naming SRC
>> and DEST after their position in that move.
>
> The code isn't really trying to record the *move*.  It is trying to 
> record that a register holds a particular value, so that this value
> can be inherited later on.  At this point we do not care at all if
> the reason why the register holds the value is because it was the
> source or the destination of the reload.
>
> See also the second if, starting below
>
>       /* If a register gets output-reloaded from a non-spill register,
>          that invalidates any previous reloaded copy of it.
>
> where the variable is also called "src_reg" -- and gets its value from
> either an output or an input reload.
>
> I understood "source" here to mean "source of a future inheritance".

Hmm.  You're probably right that that's the intention behind the naming,
but given that we're dealing with existing moves (and even looking at
existing sets) it seems a very counter-intuitive scheme to me.

However, it looks like both styles of naming are going to cause
confusion, so can we compromise on the ever-bland "reg" instead?
(The most natural names for the new "i"- and "nr"-like variables
were "regno" and "nregs", so "reg" fits in quite nicely with that.)

>> >>@@ -7687,17 +7718,18 @@ emit_reload_insns (struct insn_chain *ch
>> >> 	 it thinks only about the original insn.  So invalidate it here.
>> >> 	 Also do the same thing for RELOAD_OTHER constraints where the
>> >> 	 output is discarded.  */
>> >>-      if (i < 0 
>> >>-	  && ((rld[r].out != 0
>> >>-	       && (REG_P (rld[r].out)
>> >>-		   || (MEM_P (rld[r].out)
>> >>+      else if (i < 0
>> >>+	       && ((rld[r].out != 0
>> >>+		    && (REG_P (rld[r].out)
>> >>+			|| (MEM_P (rld[r].out)
>> >>+			    && REG_P (rld[r].out_reg))))
>> >
>> > Why is the "else" necessary?
>> 
>> That was part of the (rather misguided?) reassignments of "i" and "nr".
>> Changing this to "else" meant that we'd never refer to the original
>> "i" after "narrowing" it.  It wouldn't have made any difference in
>> practice; I just thought this was clearer.  (I'd argue it's a little
>> clearer on its own, but without the additional motivation of the
>> in-place narrowing, I'm happy to leave it out.)
>
> I see.  I'd prefer to leave this as is (with the "i" unmodified).

OK, I've kept it as-is and added the new "regno" and "nregs" variables
mentioned above.  I also renamed "nregno" and "nnr" to "out_regno"/
"in_regno"/"src_regno" and "out_nregs"/"in_nregs"/"src_nregs" for
consistency.  Hopefully this makes things a little easier to follow;
having so many terse variable names doesn't help with what is already
hard-to-understand code.

I also converted "while (foo_nregs--) ...use foo_nregs..." loops into
"for (k = 0; k < nregs; k++)" loops, again for consistency with other
loops in the function, and also to avoid awkward line-wrapping issues.

Bootstrapped & regression-tested on x86_64-linux-gnu.  Also
regression-tested on mips64el-linux-gnu.  OK to install?

Thanks again for looking at this.

Richard


gcc/
	PR rtl-optimization/35232
	* reload1.c (reg_reloaded_call_part_clobbered): Clarify comment.
	(forget_old_reloads_1, forget_marked_reloads): Don't clear
	reg_reloaded_call_part_clobbered here.
	(reload_regs_reach_end_p): New function.
	(reload_reg_rtx_for_input): New variable.
	(reload_reg_rtx_for_output): Likewise.
	(emit_input_reload_insns): Use reloadreg rather than rl->reg_rtx
	when reassigning a pseudo register.  Load reloadreg from 
	reload_reg_rtx_for_input, moving the mode and register
	calculation to...
	(do_input_reload): ...here.  Use the mode-adjusted reg_rtx
	instead of the original when deciding whether an input reload
	would be a no-op or whether an output reload can be deleted.
	(emit_output_reload_insns): Use the mode-adjusted reg_rtx
	when setting up new_spill_reg_store.  Load it from
	reload_reg_rtx_for_output, moving the mode and register
	calculation to...
	(do_output_reload): ...here.  Use the mode-adjusted reg_rtx
	instead of the original when deciding whether an output reload
	would be a no-op.  Do the same when modifying insn notes.
	Use rtx_equal_p instead of == to compare the registers.
	(inherit_piecemeal_p): Take a mode and two register numbers
	as argument.
	(emit_reload_insns): Clear new_spill_reg_store for every hard
	register in the reload register.  Remove spill registers
	from reg_reloaded_valid before considering whether to record
	inheritance information for them.  Use reload_reg_rtx_for_output
	instead of reg_rtx when recording output reloads.  Use
	reload_reg_rtx_for_input instead of reg_rtx when recording
	input reloads.  Set or clear reg_reloaded_call_part_clobbered
	at the same time as setting reg_reloaded_valid.
	(delete_output_reload): Add a new_reload_reg parameter and use it
	instead of rld[j].reg_rtx.
	(emit_input_reload_insns, do_input_reload, do_output_reload): Adjust
	calls accordingly.

gcc/testsuite/
	PR rtl-optimization/35232
	* gcc.target/mips/pr35232.c: New test.

Index: binutils/readelf.c
===================================================================
--- binutils/readelf.c	2008-03-19 19:30:55.000000000 +0000
+++ binutils/readelf.c	2008-03-19 21:00:59.000000000 +0000
@@ -9125,6 +9125,33 @@ process_power_specific (FILE *file)
 			     display_power_gnu_attribute);
 }
 
+/* DATA points to the contents of a MIPS GOT that starts at VMA PLTGOT.
+   Print the Address, Access and Initial fields of an entry at VMA ADDR
+   and return the VMA of the next entry.  */
+
+static bfd_vma
+print_mips_got_entry (unsigned char *data, bfd_vma pltgot, bfd_vma addr)
+{
+  printf ("  ");
+  print_vma (addr, LONG_HEX);
+  printf (" ");
+  if (addr < pltgot + 0xfff0)
+    printf ("%6d(gp)", (int) (addr - pltgot - 0x7ff0));
+  else
+    printf ("%10s", "");
+  printf (" ");
+  if (data == NULL)
+    printf ("%*s", is_32bit_elf ? 8 : 16, "<unknown>");
+  else
+    {
+      bfd_vma entry;
+
+      entry = byte_get (data + addr - pltgot, is_32bit_elf ? 4 : 8);
+      print_vma (entry, LONG_HEX);
+    }
+  return addr + (is_32bit_elf ? 4 : 8);
+}
+
 static int
 process_mips_specific (FILE *file)
 {
@@ -9134,6 +9161,10 @@ process_mips_specific (FILE *file)
   size_t conflictsno = 0;
   size_t options_offset = 0;
   size_t conflicts_offset = 0;
+  bfd_vma pltgot = 0;
+  bfd_vma local_gotno = 0;
+  bfd_vma gotsym = 0;
+  bfd_vma symtabno = 0;
 
   process_attributes (file, NULL, SHT_GNU_ATTRIBUTES, NULL,
 		      display_mips_gnu_attribute);
@@ -9165,6 +9196,17 @@ process_mips_specific (FILE *file)
       case DT_MIPS_CONFLICTNO:
 	conflictsno = entry->d_un.d_val;
 	break;
+      case DT_PLTGOT:
+	pltgot = entry->d_un.d_val;
+      case DT_MIPS_LOCAL_GOTNO:
+	local_gotno = entry->d_un.d_val;
+	break;
+      case DT_MIPS_GOTSYM:
+	gotsym = entry->d_un.d_val;
+	break;
+      case DT_MIPS_SYMTABNO:
+	symtabno = entry->d_un.d_val;
+	break;
       default:
 	break;
       }
@@ -9515,6 +9557,87 @@ process_mips_specific (FILE *file)
       free (iconf);
     }
 
+  if (pltgot != 0 && local_gotno != 0)
+    {
+      bfd_vma entry, local_end, global_end;
+      size_t addr_size, i, offset;
+      unsigned char *data;
+
+      entry = pltgot;
+      addr_size = (is_32bit_elf ? 4 : 8);
+      local_end = pltgot + local_gotno * addr_size;
+      global_end = local_end + (symtabno - gotsym) * addr_size;
+
+      offset = offset_from_vma (file, pltgot, global_end - pltgot);
+      data = get_data (NULL, file, offset, global_end - pltgot, 1, _("GOT"));
+      printf (_("\nPrimary GOT:\n"));
+      printf (_(" Canonical gp value: "));
+      print_vma (pltgot + 0x7ff0, LONG_HEX);
+      printf ("\n\n");
+
+      printf (_(" Reserved entries:\n"));
+      printf (_("  %*s %10s %*s Purpose\n"),
+	      addr_size * 2, "Address", "Access",
+	      addr_size * 2, "Initial");
+      entry = print_mips_got_entry (data, pltgot, entry);
+      printf (" Lazy resolver\n");
+      if (data
+	  && (byte_get (data + entry - pltgot, addr_size)
+	      >> (addr_size * 8 - 1)) != 0)
+	{
+	  entry = print_mips_got_entry (data, pltgot, entry);
+	  printf (" Module pointer (GNU extension)\n");
+	}
+      printf ("\n");
+
+      if (entry < local_end)
+	{
+	  printf (_(" Local entries:\n"));
+	  printf (_("  %*s %10s %*s\n"),
+		  addr_size * 2, "Address", "Access",
+		  addr_size * 2, "Initial");
+	  while (entry < local_end)
+	    {
+	      entry = print_mips_got_entry (data, pltgot, entry);
+	      printf ("\n");
+	    }
+	  printf ("\n");
+	}
+
+      if (gotsym < symtabno)
+	{
+	  int sym_width;
+
+	  printf (_(" Global entries:\n"));
+	  printf (_("  %*s %10s %*s %*s %-7s %3s %s\n"),
+		  addr_size * 2, "Address", "Access",
+		  addr_size * 2, "Initial",
+		  addr_size * 2, "Sym.Val.", "Type", "Ndx", "Name");
+	  sym_width = (is_32bit_elf ? 80 : 160) - 28 - addr_size * 6 - 1;
+	  for (i = gotsym; i < symtabno; i++)
+	    {
+	      Elf_Internal_Sym *psym;
+
+	      psym = dynamic_symbols + i;
+	      entry = print_mips_got_entry (data, pltgot, entry);
+	      printf (" ");
+	      print_vma (psym->st_value, LONG_HEX);
+	      printf (" %-7s %3s ",
+		      get_symbol_type (ELF_ST_TYPE (psym->st_info)),
+		      get_symbol_index_type (psym->st_shndx));
+	      if (VALID_DYNAMIC_NAME (psym->st_name))
+		print_symbol (sym_width, GET_DYNAMIC_NAME (psym->st_name));
+	      else
+		printf ("<corrupt: %14ld>", psym->st_name);
+	      printf ("\n");
+	    }
+	  printf ("\n");
+	}
+
+      if (data)
+	free (data);
+    }
+
   return 1;
 }
 
Index: ld/testsuite/ld-mips-elf/got-dump-1.d
===================================================================
--- /dev/null	2008-03-17 07:04:11.552097000 +0000
+++ ld/testsuite/ld-mips-elf/got-dump-1.d	2008-03-19 19:32:34.000000000 +0000
@@ -0,0 +1,25 @@
+#name: GOT dump (readelf -A) test 1
+#source: got-dump-1.s
+#as: -mips3
+#ld: -Tgot-dump-1.ld -shared
+#readelf: -A
+
+Primary GOT:
+ Canonical gp value: 00068000
+
+ Reserved entries:
+   Address     Access  Initial Purpose
+  00060010 -32752\(gp\) 00000000 Lazy resolver
+  00060014 -32748\(gp\) 80000000 Module pointer \(GNU extension\)
+
+ Local entries:
+   Address     Access  Initial
+  00060018 -32744\(gp\) 00060000
+  0006001c -32740\(gp\) 00060004
+
+ Global entries:
+   Address     Access  Initial Sym.Val. Type    Ndx Name
+  00060020 -32736\(gp\) 00050020 00050020 FUNC    UND extern
+  00060024 -32732\(gp\) 00000000 00000000 NOTYPE  UND undef
+  00060028 -32728\(gp\) 00050000 00050000 FUNC      7 glob
+
Index: ld/testsuite/ld-mips-elf/got-dump-1.s
===================================================================
--- /dev/null	2008-03-17 07:04:11.552097000 +0000
+++ ld/testsuite/ld-mips-elf/got-dump-1.s	2008-03-19 19:32:34.000000000 +0000
@@ -0,0 +1,22 @@
+	.global	glob
+	.ent	glob
+glob:
+	lw	$4,%got(local)($28)
+	addiu	$4,$4,%lo(local)
+	lw	$4,%got(hidden)($28)
+	lw	$4,%call16(glob)($28)
+	lw	$4,%call16(extern)($28)
+	.end	glob
+
+	.data
+	.type	local,%object
+	.size	local,4
+local:
+	.word	undef
+
+	.globl	hidden
+	.hidden	hidden
+	.type	hidden,%object
+	.size	hidden,4
+hidden:
+	.word	0
Index: ld/testsuite/ld-mips-elf/got-dump-1.ld
===================================================================
--- /dev/null	2008-03-17 07:04:11.552097000 +0000
+++ ld/testsuite/ld-mips-elf/got-dump-1.ld	2008-03-19 19:32:34.000000000 +0000
@@ -0,0 +1,19 @@
+SECTIONS
+{
+  . = 0x40000;
+  .reginfo : { *(.reginfo) }
+  .dynamic : { *(.dynamic) }
+  .hash : { *(.hash) }
+  .dynsym : { *(.dynsym) }
+  .dynstr : { *(.dynstr) }
+  .rel.dyn : { *(.rel.dyn) }
+
+  . = 0x50000;
+  .text : { *(.text) }
+  .MIPS.stubs : { *(.MIPS.stubs) }
+
+  . = 0x60000;
+  .data : { *(.data) }
+  _gp = ALIGN (16) + 0x7ff0;
+  .got : { *(.got) }
+}
Index: ld/testsuite/ld-mips-elf/got-dump-2.d
===================================================================
--- /dev/null	2008-03-17 07:04:11.552097000 +0000
+++ ld/testsuite/ld-mips-elf/got-dump-2.d	2008-03-19 19:42:31.000000000 +0000
@@ -0,0 +1,25 @@
+#name: GOT dump (readelf -A) test 2
+#source: got-dump-2.s
+#as: -mips3 -EB -64
+#ld: -Tgot-dump-2.ld -shared -melf64btsmip
+#readelf: -A
+
+Primary GOT:
+ Canonical gp value: 0001236000008000
+
+ Reserved entries:
+           Address     Access          Initial Purpose
+  0001236000000010 -32752\(gp\) 0000000000000000 Lazy resolver
+  0001236000000018 -32744\(gp\) 8000000000000000 Module pointer \(GNU extension\)
+
+ Local entries:
+           Address     Access          Initial
+  0001236000000020 -32736\(gp\) 0001236000000000
+  0001236000000028 -32728\(gp\) 0001236000000008
+
+ Global entries:
+           Address     Access          Initial         Sym.Val. Type    Ndx Name
+  0001236000000030 -32720\(gp\) 0001235000000020 0001235000000020 FUNC    UND extern
+  0001236000000038 -32712\(gp\) 0000000000000000 0000000000000000 NOTYPE  UND undef
+  0001236000000040 -32704\(gp\) 0001235000000000 0001235000000000 FUNC      7 glob
+
Index: ld/testsuite/ld-mips-elf/got-dump-2.s
===================================================================
--- /dev/null	2008-03-17 07:04:11.552097000 +0000
+++ ld/testsuite/ld-mips-elf/got-dump-2.s	2008-03-19 19:32:34.000000000 +0000
@@ -0,0 +1,22 @@
+	.global	glob
+	.ent	glob
+glob:
+	ld	$4,%got_page(local)($28)
+	daddiu	$4,$4,%got_ofst(local)
+	ld	$4,%got_disp(hidden)($28)
+	ld	$4,%call16(glob)($28)
+	ld	$4,%call16(extern)($28)
+	.end	glob
+
+	.data
+	.type	local,%object
+	.size	local,8
+local:
+	.dword	undef
+
+	.globl	hidden
+	.hidden	hidden
+	.type	hidden,%object
+	.size	hidden,8
+hidden:
+	.dword	0
Index: ld/testsuite/ld-mips-elf/got-dump-2.ld
===================================================================
--- /dev/null	2008-03-17 07:04:11.552097000 +0000
+++ ld/testsuite/ld-mips-elf/got-dump-2.ld	2008-03-19 19:41:40.000000000 +0000
@@ -0,0 +1,18 @@
+SECTIONS
+{
+  . = 0x1234000000000;
+  .dynamic : { *(.dynamic) }
+  .hash : { *(.hash) }
+  .dynsym : { *(.dynsym) }
+  .dynstr : { *(.dynstr) }
+  .rel.dyn : { *(.rel.dyn) }
+
+  . = 0x1235000000000;
+  .text : { *(.text) }
+  .MIPS.stubs : { *(.MIPS.stubs) }
+
+  . = 0x1236000000000;
+  .data : { *(.data) }
+  _gp = ALIGN (16) + 0x7ff0;
+  .got : { *(.got) }
+}
Index: ld/testsuite/ld-mips-elf/mips-elf.exp
===================================================================
--- ld/testsuite/ld-mips-elf/mips-elf.exp	2008-03-19 19:30:55.000000000 +0000
+++ ld/testsuite/ld-mips-elf/mips-elf.exp	2008-03-19 19:42:21.000000000 +0000
@@ -156,6 +156,10 @@ if { $linux_gnu } {
 	run_dump_test "got-page-2"
     }
     run_dump_test "got-page-3"
+    run_dump_test "got-dump-1"
+    if $has_newabi {
+	run_dump_test "got-dump-2"
+    }
 }
 
 if $has_newabi {

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-03-19 22:31       ` Richard Sandiford
@ 2008-03-19 23:20         ` Ulrich Weigand
  2008-03-19 23:39           ` Richard Sandiford
  0 siblings, 1 reply; 18+ messages in thread
From: Ulrich Weigand @ 2008-03-19 23:20 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

Richard Sandiford wrote:
> >> However, going back to the "this isn't worse or better after my patch",
> >> I think it would be best to deal with this as a separate issue.
> >> Especially given that I'm still hoping to get this patch into 4.3. ;)
> >
> > OK.  I agree that your patch doesn't change the behaviour here, so I'm
> > fine with addressing this separately.
> 
> OK, thanks.  Since it's a one-liner, I'll probably test it after
> we've sorted out the current patch.

Thanks!

> Hmm.  You're probably right that that's the intention behind the naming,
> but given that we're dealing with existing moves (and even looking at
> existing sets) it seems a very counter-intuitive scheme to me.
> 
> However, it looks like both styles of naming are going to cause
> confusion, so can we compromise on the ever-bland "reg" instead?
> (The most natural names for the new "i"- and "nr"-like variables
> were "regno" and "nregs", so "reg" fits in quite nicely with that.)

That's fine with me as well ...

> OK, I've kept it as-is and added the new "regno" and "nregs" variables
> mentioned above.  I also renamed "nregno" and "nnr" to "out_regno"/
> "in_regno"/"src_regno" and "out_nregs"/"in_nregs"/"src_nregs" for
> consistency.  Hopefully this makes things a little easier to follow;
> having so many terse variable names doesn't help with what is already
> hard-to-understand code.

I certainly agree.

> Index: binutils/readelf.c
> ===================================================================
> --- binutils/readelf.c	2008-03-19 19:30:55.000000000 +0000
> +++ binutils/readelf.c	2008-03-19 21:00:59.000000000 +0000
> @@ -9125,6 +9125,33 @@ process_power_specific (FILE *file)
>  			     display_power_gnu_attribute);
>  }
>  
> +/* DATA points to the contents of a MIPS GOT that starts at VMA PLTGOT.
> +   Print the Address, Access and Initial fields of an entry at VMA ADDR
> +   and return the VMA of the next entry.  */

Wrong patch?


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-03-19 23:20         ` Ulrich Weigand
@ 2008-03-19 23:39           ` Richard Sandiford
  2008-03-25 20:47             ` Ulrich Weigand
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2008-03-19 23:39 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc-patches

"Ulrich Weigand" <uweigand@de.ibm.com> writes:
>> +/* DATA points to the contents of a MIPS GOT that starts at VMA PLTGOT.
>> +   Print the Address, Access and Initial fields of an entry at VMA ADDR
>> +   and return the VMA of the next entry.  */
>
> Wrong patch?

Doh.  Maybe this'll seem more relevent.

Richard


gcc/
	PR rtl-optimization/35232
	* reload1.c (reg_reloaded_call_part_clobbered): Clarify comment.
	(forget_old_reloads_1, forget_marked_reloads): Don't clear
	reg_reloaded_call_part_clobbered here.
	(reload_regs_reach_end_p): New function.
	(reload_reg_rtx_for_input): New variable.
	(reload_reg_rtx_for_output): Likewise.
	(emit_input_reload_insns): Use reloadreg rather than rl->reg_rtx
	when reassigning a pseudo register.  Load reloadreg from 
	reload_reg_rtx_for_input, moving the mode and register
	calculation to...
	(do_input_reload): ...here.  Use the mode-adjusted reg_rtx
	instead of the original when deciding whether an input reload
	would be a no-op or whether an output reload can be deleted.
	(emit_output_reload_insns): Use the mode-adjusted reg_rtx
	when setting up new_spill_reg_store.  Load it from
	reload_reg_rtx_for_output, moving the mode and register
	calculation to...
	(do_output_reload): ...here.  Use the mode-adjusted reg_rtx
	instead of the original when deciding whether an output reload
	would be a no-op.  Do the same when modifying insn notes.
	Use rtx_equal_p instead of == to compare the registers.
	(inherit_piecemeal_p): Take a mode and two register numbers
	as argument.
	(emit_reload_insns): Clear new_spill_reg_store for every hard
	register in the reload register.  Remove spill registers
	from reg_reloaded_valid before considering whether to record
	inheritance information for them.  Use reload_reg_rtx_for_output
	instead of reg_rtx when recording output reloads.  Use
	reload_reg_rtx_for_input instead of reg_rtx when recording
	input reloads.  Set or clear reg_reloaded_call_part_clobbered
	at the same time as setting reg_reloaded_valid.
	(delete_output_reload): Add a new_reload_reg parameter and use it
	instead of rld[j].reg_rtx.
	(emit_input_reload_insns, do_input_reload, do_output_reload): Adjust
	calls accordingly.

gcc/testsuite/
	PR rtl-optimization/35232
	* gcc.target/mips/pr35232.c: New test.

Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	2008-03-18 22:24:05.000000000 +0000
+++ gcc/reload1.c	2008-03-19 18:19:56.000000000 +0000
@@ -158,7 +158,7 @@ VEC(rtx,gc) *reg_equiv_memory_loc_vec;
 
 /* Indicate whether the register's current value is one that is not
    safe to retain across a call, even for registers that are normally
-   call-saved.  */
+   call-saved.  This is only meaningful for members of reg_reloaded_valid.  */
 static HARD_REG_SET reg_reloaded_call_part_clobbered;
 
 /* Number of spill-regs so far; number of valid elements of spill_regs.  */
@@ -434,9 +434,8 @@ static void emit_output_reload_insns (st
 				      int);
 static void do_input_reload (struct insn_chain *, struct reload *, int);
 static void do_output_reload (struct insn_chain *, struct reload *, int);
-static bool inherit_piecemeal_p (int, int);
 static void emit_reload_insns (struct insn_chain *);
-static void delete_output_reload (rtx, int, int);
+static void delete_output_reload (rtx, int, int, rtx);
 static void delete_address_reloads (rtx, rtx);
 static void delete_address_reloads_1 (rtx, rtx, rtx);
 static rtx inc_for_reload (rtx, rtx, rtx, int);
@@ -4371,7 +4370,6 @@ forget_old_reloads_1 (rtx x, const_rtx i
 	      || ! TEST_HARD_REG_BIT (reg_is_output_reload, regno + i))
 	    {
 	      CLEAR_HARD_REG_BIT (reg_reloaded_valid, regno + i);
-	      CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered, regno + i);
 	      spill_reg_store[regno + i] = 0;
 	    }
     }
@@ -4408,7 +4406,6 @@ forget_marked_reloads (regset regs)
 	      || ! TEST_HARD_REG_BIT (reg_is_output_reload, reg)))
 	  {
 	    CLEAR_HARD_REG_BIT (reg_reloaded_valid, reg);
-	    CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered, reg);
 	    spill_reg_store[reg] = 0;
 	  }
       if (n_reloads == 0
@@ -4922,6 +4919,21 @@ reload_reg_reaches_end_p (unsigned int r
       gcc_unreachable ();
     }
 }
+
+/* Like reload_reg_reaches_end_p, but check that the condition holds for
+   every register in the range [REGNO, REGNO + NREGS).  */
+
+static bool
+reload_regs_reach_end_p (unsigned int regno, int nregs,
+			 int opnum, enum reload_type type)
+{
+  int i;
+
+  for (i = 0; i < nregs; i++)
+    if (!reload_reg_reaches_end_p (regno + i, opnum, type))
+      return false;
+  return true;
+}
 \f
 
 /*  Returns whether R1 and R2 are uniquely chained: the value of one
@@ -5061,6 +5073,12 @@ reloads_conflict (int r1, int r2)
    or -1 if we did not need a register for this reload.  */
 static int reload_spill_index[MAX_RELOADS];
 
+/* Index X is the value of rld[X].reg_rtx, adjusted for the input mode.  */
+static rtx reload_reg_rtx_for_input[MAX_RELOADS];
+
+/* Index X is the value of rld[X].reg_rtx, adjusted for the output mode.  */
+static rtx reload_reg_rtx_for_output[MAX_RELOADS];
+
 /* Subroutine of free_for_value_p, used to check a single register.
    START_REGNO is the starting regno of the full reload register
    (possibly comprising multiple hard registers) that we are considering.  */
@@ -6552,49 +6570,13 @@ emit_input_reload_insns (struct insn_cha
 			 rtx old, int j)
 {
   rtx insn = chain->insn;
-  rtx reloadreg = rl->reg_rtx;
+  rtx reloadreg;
   rtx oldequiv_reg = 0;
   rtx oldequiv = 0;
   int special = 0;
   enum machine_mode mode;
   rtx *where;
 
-  /* Determine the mode to reload in.
-     This is very tricky because we have three to choose from.
-     There is the mode the insn operand wants (rl->inmode).
-     There is the mode of the reload register RELOADREG.
-     There is the intrinsic mode of the operand, which we could find
-     by stripping some SUBREGs.
-     It turns out that RELOADREG's mode is irrelevant:
-     we can change that arbitrarily.
-
-     Consider (SUBREG:SI foo:QI) as an operand that must be SImode;
-     then the reload reg may not support QImode moves, so use SImode.
-     If foo is in memory due to spilling a pseudo reg, this is safe,
-     because the QImode value is in the least significant part of a
-     slot big enough for a SImode.  If foo is some other sort of
-     memory reference, then it is impossible to reload this case,
-     so previous passes had better make sure this never happens.
-
-     Then consider a one-word union which has SImode and one of its
-     members is a float, being fetched as (SUBREG:SF union:SI).
-     We must fetch that as SFmode because we could be loading into
-     a float-only register.  In this case OLD's mode is correct.
-
-     Consider an immediate integer: it has VOIDmode.  Here we need
-     to get a mode from something else.
-
-     In some cases, there is a fourth mode, the operand's
-     containing mode.  If the insn specifies a containing mode for
-     this operand, it overrides all others.
-
-     I am not sure whether the algorithm here is always right,
-     but it does the right things in those cases.  */
-
-  mode = GET_MODE (old);
-  if (mode == VOIDmode)
-    mode = rl->inmode;
-
   /* delete_output_reload is only invoked properly if old contains
      the original pseudo register.  Since this is replaced with a
      hard reg when RELOAD_OVERRIDE_IN is set, see if we can
@@ -6612,6 +6594,9 @@ emit_input_reload_insns (struct insn_cha
   else if (GET_CODE (oldequiv) == SUBREG)
     oldequiv_reg = SUBREG_REG (oldequiv);
 
+  reloadreg = reload_reg_rtx_for_input[j];
+  mode = GET_MODE (reloadreg);
+
   /* If we are reloading from a register that was recently stored in
      with an output-reload, see if we can prove there was
      actually no need to store the old value in it.  */
@@ -6623,16 +6608,11 @@ emit_input_reload_insns (struct insn_cha
       && (dead_or_set_p (insn, spill_reg_stored_to[REGNO (oldequiv)])
 	  || rtx_equal_p (spill_reg_stored_to[REGNO (oldequiv)],
 			  rl->out_reg)))
-    delete_output_reload (insn, j, REGNO (oldequiv));
+    delete_output_reload (insn, j, REGNO (oldequiv), reloadreg);
 
-  /* Encapsulate both RELOADREG and OLDEQUIV into that mode,
-     then load RELOADREG from OLDEQUIV.  Note that we cannot use
-     gen_lowpart_common since it can do the wrong thing when
-     RELOADREG has a multi-word mode.  Note that RELOADREG
-     must always be a REG here.  */
+  /* Encapsulate OLDEQUIV into the reload mode, then load RELOADREG from
+     OLDEQUIV.  */
 
-  if (GET_MODE (reloadreg) != mode)
-    reloadreg = reload_adjust_reg_for_mode (reloadreg, mode);
   while (GET_CODE (oldequiv) == SUBREG && GET_MODE (oldequiv) != mode)
     oldequiv = SUBREG_REG (oldequiv);
   if (GET_MODE (oldequiv) != VOIDmode
@@ -6696,7 +6676,7 @@ emit_input_reload_insns (struct insn_cha
 			     spill_reg_stored_to[REGNO (oldequiv)])
 	      || rtx_equal_p (spill_reg_stored_to[REGNO (oldequiv)],
 			      old)))
-	delete_output_reload (insn, j, REGNO (oldequiv));
+	delete_output_reload (insn, j, REGNO (oldequiv), reloadreg);
 
       /* Prevent normal processing of this reload.  */
       special = 1;
@@ -6756,7 +6736,7 @@ emit_input_reload_insns (struct insn_cha
 	      if (REG_N_DEATHS (REGNO (old)) == 1
 		  && REG_N_SETS (REGNO (old)) == 1)
 		{
-		  reg_renumber[REGNO (old)] = REGNO (rl->reg_rtx);
+		  reg_renumber[REGNO (old)] = REGNO (reloadreg);
 		  alter_reg (REGNO (old), -1);
 		}
 	      special = 1;
@@ -7015,35 +6995,23 @@ emit_input_reload_insns (struct insn_cha
 emit_output_reload_insns (struct insn_chain *chain, struct reload *rl,
 			  int j)
 {
-  rtx reloadreg = rl->reg_rtx;
+  rtx reloadreg;
   rtx insn = chain->insn;
   int special = 0;
   rtx old = rl->out;
-  enum machine_mode mode = GET_MODE (old);
+  enum machine_mode mode;
   rtx p;
+  rtx rl_reg_rtx;
 
   if (rl->when_needed == RELOAD_OTHER)
     start_sequence ();
   else
     push_to_sequence (output_reload_insns[rl->opnum]);
 
-  /* Determine the mode to reload in.
-     See comments above (for input reloading).  */
+  rl_reg_rtx = reload_reg_rtx_for_output[j];
+  mode = GET_MODE (rl_reg_rtx);
 
-  if (mode == VOIDmode)
-    {
-      /* VOIDmode should never happen for an output.  */
-      if (asm_noperands (PATTERN (insn)) < 0)
-	/* It's the compiler's fault.  */
-	fatal_insn ("VOIDmode on an output", insn);
-      error_for_asm (insn, "output operand is constant in %<asm%>");
-      /* Prevent crash--use something we know is valid.  */
-      mode = word_mode;
-      old = gen_rtx_REG (mode, REGNO (reloadreg));
-    }
-
-  if (GET_MODE (reloadreg) != mode)
-    reloadreg = reload_adjust_reg_for_mode (reloadreg, mode);
+  reloadreg = rl_reg_rtx;
 
   /* If we need two reload regs, set RELOADREG to the intermediate
      one, since it will be stored into OLD.  We might need a secondary
@@ -7167,12 +7135,12 @@ emit_output_reload_insns (struct insn_ch
 	   reg_has_output_reload will make this do nothing.  */
 	note_stores (pat, forget_old_reloads_1, NULL);
 
-	if (reg_mentioned_p (rl->reg_rtx, pat))
+	if (reg_mentioned_p (rl_reg_rtx, pat))
 	  {
 	    rtx set = single_set (insn);
 	    if (reload_spill_index[j] < 0
 		&& set
-		&& SET_SRC (set) == rl->reg_rtx)
+		&& SET_SRC (set) == rl_reg_rtx)
 	      {
 		int src = REGNO (SET_SRC (set));
 
@@ -7181,7 +7149,7 @@ emit_output_reload_insns (struct insn_ch
 		if (find_regno_note (insn, REG_DEAD, src))
 		  SET_HARD_REG_BIT (reg_reloaded_died, src);
 	      }
-	    if (REGNO (rl->reg_rtx) < FIRST_PSEUDO_REGISTER)
+	    if (HARD_REGISTER_P (rl_reg_rtx))
 	      {
 		int s = rl->secondary_out_reload;
 		set = single_set (p);
@@ -7194,7 +7162,7 @@ emit_output_reload_insns (struct insn_ch
 		     made; leave new_spill_reg_store alone.  */
 		  ;
 		else if (s >= 0
-			 && SET_SRC (set) == rl->reg_rtx
+			 && SET_SRC (set) == rl_reg_rtx
 			 && SET_DEST (set) == rld[s].reg_rtx)
 		  {
 		    /* Usually the next instruction will be the
@@ -7215,7 +7183,7 @@ emit_output_reload_insns (struct insn_ch
 		      }
 		  }
 		else
-		  new_spill_reg_store[REGNO (rl->reg_rtx)] = p;
+		  new_spill_reg_store[REGNO (rl_reg_rtx)] = p;
 	      }
 	  }
       }
@@ -7242,13 +7210,62 @@ do_input_reload (struct insn_chain *chai
   rtx insn = chain->insn;
   rtx old = (rl->in && MEM_P (rl->in)
 	     ? rl->in_reg : rl->in);
+  rtx reg_rtx = rl->reg_rtx;
+
+  if (old && reg_rtx)
+    {
+      enum machine_mode mode;
+
+      /* Determine the mode to reload in.
+	 This is very tricky because we have three to choose from.
+	 There is the mode the insn operand wants (rl->inmode).
+	 There is the mode of the reload register RELOADREG.
+	 There is the intrinsic mode of the operand, which we could find
+	 by stripping some SUBREGs.
+	 It turns out that RELOADREG's mode is irrelevant:
+	 we can change that arbitrarily.
+
+	 Consider (SUBREG:SI foo:QI) as an operand that must be SImode;
+	 then the reload reg may not support QImode moves, so use SImode.
+	 If foo is in memory due to spilling a pseudo reg, this is safe,
+	 because the QImode value is in the least significant part of a
+	 slot big enough for a SImode.  If foo is some other sort of
+	 memory reference, then it is impossible to reload this case,
+	 so previous passes had better make sure this never happens.
+
+	 Then consider a one-word union which has SImode and one of its
+	 members is a float, being fetched as (SUBREG:SF union:SI).
+	 We must fetch that as SFmode because we could be loading into
+	 a float-only register.  In this case OLD's mode is correct.
+
+	 Consider an immediate integer: it has VOIDmode.  Here we need
+	 to get a mode from something else.
+
+	 In some cases, there is a fourth mode, the operand's
+	 containing mode.  If the insn specifies a containing mode for
+	 this operand, it overrides all others.
+
+	 I am not sure whether the algorithm here is always right,
+	 but it does the right things in those cases.  */
+
+      mode = GET_MODE (old);
+      if (mode == VOIDmode)
+	mode = rl->inmode;
+
+      /* We cannot use gen_lowpart_common since it can do the wrong thing
+	 when REG_RTX has a multi-word mode.  Note that REG_RTX must
+	 always be a REG here.  */
+      if (GET_MODE (reg_rtx) != mode)
+	reg_rtx = reload_adjust_reg_for_mode (reg_rtx, mode);
+    }
+  reload_reg_rtx_for_input[j] = reg_rtx;
 
   if (old != 0
       /* AUTO_INC reloads need to be handled even if inherited.  We got an
 	 AUTO_INC reload if reload_out is set but reload_out_reg isn't.  */
       && (! reload_inherited[j] || (rl->out && ! rl->out_reg))
-      && ! rtx_equal_p (rl->reg_rtx, old)
-      && rl->reg_rtx != 0)
+      && ! rtx_equal_p (reg_rtx, old)
+      && reg_rtx != 0)
     emit_input_reload_insns (chain, rld + j, old, j);
 
   /* When inheriting a wider reload, we have a MEM in rl->in,
@@ -7267,24 +7284,21 @@ do_input_reload (struct insn_chain *chai
 
   if (optimize
       && (reload_inherited[j] || reload_override_in[j])
-      && rl->reg_rtx
-      && REG_P (rl->reg_rtx)
-      && spill_reg_store[REGNO (rl->reg_rtx)] != 0
+      && reg_rtx
+      && REG_P (reg_rtx)
+      && spill_reg_store[REGNO (reg_rtx)] != 0
 #if 0
       /* There doesn't seem to be any reason to restrict this to pseudos
 	 and doing so loses in the case where we are copying from a
 	 register of the wrong class.  */
-      && (REGNO (spill_reg_stored_to[REGNO (rl->reg_rtx)])
-	  >= FIRST_PSEUDO_REGISTER)
+      && !HARD_REGISTER_P (spill_reg_stored_to[REGNO (reg_rtx)])
 #endif
       /* The insn might have already some references to stackslots
 	 replaced by MEMs, while reload_out_reg still names the
 	 original pseudo.  */
-      && (dead_or_set_p (insn,
-			 spill_reg_stored_to[REGNO (rl->reg_rtx)])
-	  || rtx_equal_p (spill_reg_stored_to[REGNO (rl->reg_rtx)],
-			  rl->out_reg)))
-    delete_output_reload (insn, j, REGNO (rl->reg_rtx));
+      && (dead_or_set_p (insn, spill_reg_stored_to[REGNO (reg_rtx)])
+	  || rtx_equal_p (spill_reg_stored_to[REGNO (reg_rtx)], rl->out_reg)))
+    delete_output_reload (insn, j, REGNO (reg_rtx), reg_rtx);
 }
 
 /* Do output reloading for reload RL, which is for the insn described by
@@ -7300,6 +7314,30 @@ do_output_reload (struct insn_chain *cha
      not loaded in this same reload, see if we can eliminate a previous
      store.  */
   rtx pseudo = rl->out_reg;
+  rtx reg_rtx = rl->reg_rtx;
+
+  if (rl->out && reg_rtx)
+    {
+      enum machine_mode mode;
+
+      /* Determine the mode to reload in.
+	 See comments above (for input reloading).  */
+      mode = GET_MODE (rl->out);
+      if (mode == VOIDmode)
+	{
+	  /* VOIDmode should never happen for an output.  */
+	  if (asm_noperands (PATTERN (insn)) < 0)
+	    /* It's the compiler's fault.  */
+	    fatal_insn ("VOIDmode on an output", insn);
+	  error_for_asm (insn, "output operand is constant in %<asm%>");
+	  /* Prevent crash--use something we know is valid.  */
+	  mode = word_mode;
+	  rl->out = gen_rtx_REG (mode, REGNO (reg_rtx));
+	}
+      if (GET_MODE (reg_rtx) != mode)
+	reg_rtx = reload_adjust_reg_for_mode (reg_rtx, mode);
+    }
+  reload_reg_rtx_for_output[j] = reg_rtx;
 
   if (pseudo
       && optimize
@@ -7318,13 +7356,13 @@ do_output_reload (struct insn_chain *cha
 	  && reg_reloaded_contents[last_regno] == pseudo_no
 	  && spill_reg_store[last_regno]
 	  && rtx_equal_p (pseudo, spill_reg_stored_to[last_regno]))
-	delete_output_reload (insn, j, last_regno);
+	delete_output_reload (insn, j, last_regno, reg_rtx);
     }
 
   old = rl->out_reg;
   if (old == 0
-      || rl->reg_rtx == old
-      || rl->reg_rtx == 0)
+      || reg_rtx == 0
+      || rtx_equal_p (old, reg_rtx))
     return;
 
   /* An output operand that dies right away does need a reload,
@@ -7333,7 +7371,7 @@ do_output_reload (struct insn_chain *cha
   if ((REG_P (old) || GET_CODE (old) == SCRATCH)
       && (note = find_reg_note (insn, REG_UNUSED, old)) != 0)
     {
-      XEXP (note, 0) = rl->reg_rtx;
+      XEXP (note, 0) = reg_rtx;
       return;
     }
   /* Likewise for a SUBREG of an operand that dies.  */
@@ -7342,8 +7380,7 @@ do_output_reload (struct insn_chain *cha
 	   && 0 != (note = find_reg_note (insn, REG_UNUSED,
 					  SUBREG_REG (old))))
     {
-      XEXP (note, 0) = gen_lowpart_common (GET_MODE (old),
-					   rl->reg_rtx);
+      XEXP (note, 0) = gen_lowpart_common (GET_MODE (old), reg_rtx);
       return;
     }
   else if (GET_CODE (old) == SCRATCH)
@@ -7357,22 +7394,20 @@ do_output_reload (struct insn_chain *cha
   emit_output_reload_insns (chain, rld + j, j);
 }
 
-/* Reload number R reloads from or to a group of hard registers starting at
-   register REGNO.  Return true if it can be treated for inheritance purposes
-   like a group of reloads, each one reloading a single hard register.
-   The caller has already checked that the spill register and REGNO use
-   the same number of registers to store the reload value.  */
+/* A reload copies values of MODE from register SRC to register DEST.
+   Return true if it can be treated for inheritance purposes like a
+   group of reloads, each one reloading a single hard register.  The
+   caller has already checked that (reg:MODE SRC) and (reg:MODE DEST)
+   occupy the same number of hard registers.  */
 
 static bool
-inherit_piecemeal_p (int r ATTRIBUTE_UNUSED, int regno ATTRIBUTE_UNUSED)
+inherit_piecemeal_p (int dest ATTRIBUTE_UNUSED,
+		     int src ATTRIBUTE_UNUSED,
+		     enum machine_mode mode ATTRIBUTE_UNUSED)
 {
 #ifdef CANNOT_CHANGE_MODE_CLASS
-  return (!REG_CANNOT_CHANGE_MODE_P (reload_spill_index[r],
-				     GET_MODE (rld[r].reg_rtx),
-				     reg_raw_mode[reload_spill_index[r]])
-	  && !REG_CANNOT_CHANGE_MODE_P (regno,
-					GET_MODE (rld[r].reg_rtx),
-					reg_raw_mode[regno]));
+  return (!REG_CANNOT_CHANGE_MODE_P (dest, mode, reg_raw_mode[dest])
+	  && !REG_CANNOT_CHANGE_MODE_P (src, mode, reg_raw_mode[src]));
 #else
   return true;
 #endif
@@ -7414,9 +7449,13 @@ emit_reload_insns (struct insn_chain *ch
 
   for (j = 0; j < n_reloads; j++)
     {
-      if (rld[j].reg_rtx
-	  && REGNO (rld[j].reg_rtx) < FIRST_PSEUDO_REGISTER)
-	new_spill_reg_store[REGNO (rld[j].reg_rtx)] = 0;
+      if (rld[j].reg_rtx && HARD_REGISTER_P (rld[j].reg_rtx))
+	{
+	  unsigned int i;
+
+	  for (i = REGNO (rld[j].reg_rtx); i < END_REGNO (rld[j].reg_rtx); i++)
+	    new_spill_reg_store[i] = 0;
+	}
 
       do_input_reload (chain, rld + j, j);
       do_output_reload (chain, rld + j, j);
@@ -7515,104 +7554,108 @@ emit_reload_insns (struct insn_chain *ch
 	{
 	  int nr = hard_regno_nregs[i][GET_MODE (rld[r].reg_rtx)];
 	  int k;
-	  int part_reaches_end = 0;
-	  int all_reaches_end = 1;
 
 	  /* For a multi register reload, we need to check if all or part
 	     of the value lives to the end.  */
 	  for (k = 0; k < nr; k++)
-	    {
-	      if (reload_reg_reaches_end_p (i + k, rld[r].opnum,
-					    rld[r].when_needed))
-		part_reaches_end = 1;
-	      else
-		all_reaches_end = 0;
-	    }
-
-	  /* Ignore reloads that don't reach the end of the insn in
-	     entirety.  */
-	  if (all_reaches_end)
-	    {
-	      /* First, clear out memory of what used to be in this spill reg.
-		 If consecutive registers are used, clear them all.  */
-
-	      for (k = 0; k < nr; k++)
-  	        {
-		CLEAR_HARD_REG_BIT (reg_reloaded_valid, i + k);
-  		  CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered, i + k);
-  		}
-
-	      /* Maybe the spill reg contains a copy of reload_out.  */
-	      if (rld[r].out != 0
-		  && (REG_P (rld[r].out)
+	    if (reload_reg_reaches_end_p (i + k, rld[r].opnum,
+					  rld[r].when_needed))
+	      CLEAR_HARD_REG_BIT (reg_reloaded_valid, i + k);
+
+	  /* Maybe the spill reg contains a copy of reload_out.  */
+	  if (rld[r].out != 0
+	      && (REG_P (rld[r].out)
 #ifdef AUTO_INC_DEC
-		      || ! rld[r].out_reg
+		  || ! rld[r].out_reg
 #endif
-		      || REG_P (rld[r].out_reg)))
+		  || REG_P (rld[r].out_reg)))
+	    {
+	      rtx reg;
+	      enum machine_mode mode;
+	      int regno, nregs;
+
+	      reg = reload_reg_rtx_for_output[r];
+	      mode = GET_MODE (reg);
+	      regno = REGNO (reg);
+	      nregs = hard_regno_nregs[regno][mode];
+	      if (reload_regs_reach_end_p (regno, nregs, rld[r].opnum,
+					   rld[r].when_needed))
 		{
 		  rtx out = (REG_P (rld[r].out)
 			     ? rld[r].out
 			     : rld[r].out_reg
 			     ? rld[r].out_reg
 /* AUTO_INC */		     : XEXP (rld[r].in_reg, 0));
-		  int nregno = REGNO (out);
-		  int nnr = (nregno >= FIRST_PSEUDO_REGISTER ? 1
-			     : hard_regno_nregs[nregno]
-					       [GET_MODE (rld[r].reg_rtx)]);
+		  int out_regno = REGNO (out);
+		  int out_nregs = (!HARD_REGISTER_NUM_P (out_regno) ? 1
+				   : hard_regno_nregs[out_regno][mode]);
 		  bool piecemeal;
 
-		  spill_reg_store[i] = new_spill_reg_store[i];
-		  spill_reg_stored_to[i] = out;
-		  reg_last_reload_reg[nregno] = rld[r].reg_rtx;
-
-		  piecemeal = (nregno < FIRST_PSEUDO_REGISTER
-			       && nr == nnr
-			       && inherit_piecemeal_p (r, nregno));
+		  spill_reg_store[regno] = new_spill_reg_store[regno];
+		  spill_reg_stored_to[regno] = out;
+		  reg_last_reload_reg[out_regno] = reg;
+
+		  piecemeal = (HARD_REGISTER_NUM_P (out_regno)
+			       && nregs == out_nregs
+			       && inherit_piecemeal_p (out_regno, regno, mode));
 
-		  /* If NREGNO is a hard register, it may occupy more than
+		  /* If OUT_REGNO is a hard register, it may occupy more than
 		     one register.  If it does, say what is in the
 		     rest of the registers assuming that both registers
 		     agree on how many words the object takes.  If not,
 		     invalidate the subsequent registers.  */
 
-		  if (nregno < FIRST_PSEUDO_REGISTER)
-		    for (k = 1; k < nnr; k++)
-		      reg_last_reload_reg[nregno + k]
-			= (piecemeal
-			   ? regno_reg_rtx[REGNO (rld[r].reg_rtx) + k]
-			   : 0);
+		  if (HARD_REGISTER_NUM_P (out_regno))
+		    for (k = 1; k < out_nregs; k++)
+		      reg_last_reload_reg[out_regno + k]
+			= (piecemeal ? regno_reg_rtx[regno + k] : 0);
 
 		  /* Now do the inverse operation.  */
-		  for (k = 0; k < nr; k++)
+		  for (k = 0; k < nregs; k++)
 		    {
-		      CLEAR_HARD_REG_BIT (reg_reloaded_dead, i + k);
-		      reg_reloaded_contents[i + k]
-			= (nregno >= FIRST_PSEUDO_REGISTER || !piecemeal
-			   ? nregno
-			   : nregno + k);
-		      reg_reloaded_insn[i + k] = insn;
-		      SET_HARD_REG_BIT (reg_reloaded_valid, i + k);
-		      if (HARD_REGNO_CALL_PART_CLOBBERED (i + k, GET_MODE (out)))
-			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered, i + k);
+		      CLEAR_HARD_REG_BIT (reg_reloaded_dead, regno + k);
+		      reg_reloaded_contents[regno + k]
+			= (!HARD_REGISTER_NUM_P (out_regno) || !piecemeal
+			   ? out_regno
+			   : out_regno + k);
+		      reg_reloaded_insn[regno + k] = insn;
+		      SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
+		      if (HARD_REGNO_CALL_PART_CLOBBERED (regno + k, mode))
+			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
+					  regno + k);
+		      else
+			CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
+					    regno + k);
 		    }
 		}
-
-	      /* Maybe the spill reg contains a copy of reload_in.  Only do
-		 something if there will not be an output reload for
-		 the register being reloaded.  */
-	      else if (rld[r].out_reg == 0
-		       && rld[r].in != 0
-		       && ((REG_P (rld[r].in)
-			    && REGNO (rld[r].in) >= FIRST_PSEUDO_REGISTER
-	                    && !REGNO_REG_SET_P (&reg_has_output_reload,
-			      			 REGNO (rld[r].in)))
-			   || (REG_P (rld[r].in_reg)
-			       && !REGNO_REG_SET_P (&reg_has_output_reload,
-						    REGNO (rld[r].in_reg))))
-		       && ! reg_set_p (rld[r].reg_rtx, PATTERN (insn)))
+	    }
+	  /* Maybe the spill reg contains a copy of reload_in.  Only do
+	     something if there will not be an output reload for
+	     the register being reloaded.  */
+	  else if (rld[r].out_reg == 0
+		   && rld[r].in != 0
+		   && ((REG_P (rld[r].in)
+			&& !HARD_REGISTER_P (rld[r].in)
+			&& !REGNO_REG_SET_P (&reg_has_output_reload,
+					     REGNO (rld[r].in)))
+		       || (REG_P (rld[r].in_reg)
+			   && !REGNO_REG_SET_P (&reg_has_output_reload,
+						REGNO (rld[r].in_reg))))
+		   && !reg_set_p (reload_reg_rtx_for_input[r], PATTERN (insn)))
+	    {
+	      rtx reg;
+	      enum machine_mode mode;
+	      int regno, nregs;
+
+	      reg = reload_reg_rtx_for_input[r];
+	      mode = GET_MODE (reg);
+	      regno = REGNO (reg);
+	      nregs = hard_regno_nregs[regno][mode];
+	      if (reload_regs_reach_end_p (regno, nregs, rld[r].opnum,
+					   rld[r].when_needed))
 		{
-		  int nregno;
-		  int nnr;
+		  int in_regno;
+		  int in_nregs;
 		  rtx in;
 		  bool piecemeal;
 
@@ -7623,24 +7666,21 @@ emit_reload_insns (struct insn_chain *ch
 		    in = rld[r].in_reg;
 		  else
 		    in = XEXP (rld[r].in_reg, 0);
-		  nregno = REGNO (in);
+		  in_regno = REGNO (in);
+
+		  in_nregs = (!HARD_REGISTER_NUM_P (in_regno) ? 1
+			      : hard_regno_nregs[in_regno][mode]);
 
-		  nnr = (nregno >= FIRST_PSEUDO_REGISTER ? 1
-			 : hard_regno_nregs[nregno]
-					   [GET_MODE (rld[r].reg_rtx)]);
-
-		  reg_last_reload_reg[nregno] = rld[r].reg_rtx;
-
-		  piecemeal = (nregno < FIRST_PSEUDO_REGISTER
-			       && nr == nnr
-			       && inherit_piecemeal_p (r, nregno));
-
-		  if (nregno < FIRST_PSEUDO_REGISTER)
-		    for (k = 1; k < nnr; k++)
-		      reg_last_reload_reg[nregno + k]
-			= (piecemeal
-			   ? regno_reg_rtx[REGNO (rld[r].reg_rtx) + k]
-			   : 0);
+		  reg_last_reload_reg[in_regno] = reg;
+
+		  piecemeal = (HARD_REGISTER_NUM_P (in_regno)
+			       && nregs == in_nregs
+			       && inherit_piecemeal_p (regno, in_regno, mode));
+
+		  if (HARD_REGISTER_NUM_P (in_regno))
+		    for (k = 1; k < in_nregs; k++)
+		      reg_last_reload_reg[in_regno + k]
+			= (piecemeal ? regno_reg_rtx[regno + k] : 0);
 
 		  /* Unless we inherited this reload, show we haven't
 		     recently done a store.
@@ -7648,33 +7688,26 @@ emit_reload_insns (struct insn_chain *ch
 		     also have to be discarded.  */
 		  if (! reload_inherited[r]
 		      || (rld[r].out && ! rld[r].out_reg))
-		    spill_reg_store[i] = 0;
+		    spill_reg_store[regno] = 0;
 
-		  for (k = 0; k < nr; k++)
+		  for (k = 0; k < nregs; k++)
 		    {
-		      CLEAR_HARD_REG_BIT (reg_reloaded_dead, i + k);
-		      reg_reloaded_contents[i + k]
-			= (nregno >= FIRST_PSEUDO_REGISTER || !piecemeal
-			   ? nregno
-			   : nregno + k);
-		      reg_reloaded_insn[i + k] = insn;
-		      SET_HARD_REG_BIT (reg_reloaded_valid, i + k);
-		      if (HARD_REGNO_CALL_PART_CLOBBERED (i + k, GET_MODE (in)))
-			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered, i + k);
+		      CLEAR_HARD_REG_BIT (reg_reloaded_dead, regno + k);
+		      reg_reloaded_contents[regno + k]
+			= (!HARD_REGISTER_NUM_P (in_regno) || !piecemeal
+			   ? in_regno
+			   : in_regno + k);
+		      reg_reloaded_insn[regno + k] = insn;
+		      SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
+		      if (HARD_REGNO_CALL_PART_CLOBBERED (regno + k, mode))
+			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
+					  regno + k);
+		      else
+			CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
+					    regno + k);
 		    }
 		}
 	    }
-
-	  /* However, if part of the reload reaches the end, then we must
-	     invalidate the old info for the part that survives to the end.  */
-	  else if (part_reaches_end)
-	    {
-	      for (k = 0; k < nr; k++)
-		if (reload_reg_reaches_end_p (i + k,
-					      rld[r].opnum,
-					      rld[r].when_needed))
-		  CLEAR_HARD_REG_BIT (reg_reloaded_valid, i + k);
-	    }
 	}
 
       /* The following if-statement was #if 0'd in 1.34 (or before...).
@@ -7687,7 +7720,7 @@ emit_reload_insns (struct insn_chain *ch
 	 it thinks only about the original insn.  So invalidate it here.
 	 Also do the same thing for RELOAD_OTHER constraints where the
 	 output is discarded.  */
-      if (i < 0 
+      if (i < 0
 	  && ((rld[r].out != 0
 	       && (REG_P (rld[r].out)
 		   || (MEM_P (rld[r].out)
@@ -7697,7 +7730,8 @@ emit_reload_insns (struct insn_chain *ch
 	{
 	  rtx out = ((rld[r].out && REG_P (rld[r].out))
 		     ? rld[r].out : rld[r].out_reg);
-	  int nregno = REGNO (out);
+	  int out_regno = REGNO (out);
+	  enum machine_mode mode = GET_MODE (out);
 
 	  /* REG_RTX is now set or clobbered by the main instruction.
 	     As the comment above explains, forget_old_reloads_1 only
@@ -7715,16 +7749,16 @@ emit_reload_insns (struct insn_chain *ch
 	  if (rld[r].reg_rtx && rld[r].reg_rtx != out)
 	    forget_old_reloads_1 (rld[r].reg_rtx, NULL_RTX, NULL);
 
-	  if (nregno >= FIRST_PSEUDO_REGISTER)
+	  if (!HARD_REGISTER_NUM_P (out_regno))
 	    {
 	      rtx src_reg, store_insn = NULL_RTX;
 
-	      reg_last_reload_reg[nregno] = 0;
+	      reg_last_reload_reg[out_regno] = 0;
 
 	      /* If we can find a hard register that is stored, record
 		 the storing insn so that we may delete this insn with
 		 delete_output_reload.  */
-	      src_reg = rld[r].reg_rtx;
+	      src_reg = reload_reg_rtx_for_output[r];
 
 	      /* If this is an optional reload, try to find the source reg
 		 from an input reload.  */
@@ -7741,7 +7775,7 @@ emit_reload_insns (struct insn_chain *ch
 			{
 			  if (rld[k].in == src_reg)
 			    {
-			      src_reg = rld[k].reg_rtx;
+			      src_reg = reload_reg_rtx_for_input[k];
 			      break;
 			    }
 			}
@@ -7752,47 +7786,54 @@ emit_reload_insns (struct insn_chain *ch
 	      if (src_reg && REG_P (src_reg)
 		  && REGNO (src_reg) < FIRST_PSEUDO_REGISTER)
 		{
-		  int src_regno = REGNO (src_reg);
-		  int nr = hard_regno_nregs[src_regno][rld[r].mode];
+		  int src_regno, src_nregs, k;
+		  rtx note;
+
+		  gcc_assert (GET_MODE (src_reg) == mode);
+		  src_regno = REGNO (src_reg);
+		  src_nregs = hard_regno_nregs[src_regno][mode];
 		  /* The place where to find a death note varies with
 		     PRESERVE_DEATH_INFO_REGNO_P .  The condition is not
 		     necessarily checked exactly in the code that moves
 		     notes, so just check both locations.  */
-		  rtx note = find_regno_note (insn, REG_DEAD, src_regno);
+		  note = find_regno_note (insn, REG_DEAD, src_regno);
 		  if (! note && store_insn)
 		    note = find_regno_note (store_insn, REG_DEAD, src_regno);
-		  while (nr-- > 0)
+		  for (k = 0; k < src_nregs; k++)
 		    {
-		      spill_reg_store[src_regno + nr] = store_insn;
-		      spill_reg_stored_to[src_regno + nr] = out;
-		      reg_reloaded_contents[src_regno + nr] = nregno;
-		      reg_reloaded_insn[src_regno + nr] = store_insn;
-		      CLEAR_HARD_REG_BIT (reg_reloaded_dead, src_regno + nr);
-		      SET_HARD_REG_BIT (reg_reloaded_valid, src_regno + nr);
-		      if (HARD_REGNO_CALL_PART_CLOBBERED (src_regno + nr, 
-							  GET_MODE (src_reg)))
+		      spill_reg_store[src_regno + k] = store_insn;
+		      spill_reg_stored_to[src_regno + k] = out;
+		      reg_reloaded_contents[src_regno + k] = out_regno;
+		      reg_reloaded_insn[src_regno + k] = store_insn;
+		      CLEAR_HARD_REG_BIT (reg_reloaded_dead, src_regno + k);
+		      SET_HARD_REG_BIT (reg_reloaded_valid, src_regno + k);
+		      if (HARD_REGNO_CALL_PART_CLOBBERED (src_regno + k,
+							  mode))
 			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered, 
-					  src_regno + nr);
-		      SET_HARD_REG_BIT (reg_is_output_reload, src_regno + nr);
+					  src_regno + k);
+		      else
+			CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
+					    src_regno + k);
+		      SET_HARD_REG_BIT (reg_is_output_reload, src_regno + k);
 		      if (note)
 			SET_HARD_REG_BIT (reg_reloaded_died, src_regno);
 		      else
 			CLEAR_HARD_REG_BIT (reg_reloaded_died, src_regno);
 		    }
-		  reg_last_reload_reg[nregno] = src_reg;
+		  reg_last_reload_reg[out_regno] = src_reg;
 		  /* We have to set reg_has_output_reload here, or else 
 		     forget_old_reloads_1 will clear reg_last_reload_reg
 		     right away.  */
 		  SET_REGNO_REG_SET (&reg_has_output_reload,
-				     nregno);
+				     out_regno);
 		}
 	    }
 	  else
 	    {
-	      int num_regs = hard_regno_nregs[nregno][GET_MODE (out)];
+	      int k, out_nregs = hard_regno_nregs[out_regno][mode];
 
-	      while (num_regs-- > 0)
-		reg_last_reload_reg[nregno + num_regs] = 0;
+	      for (k = 0; k < out_nregs; k++)
+		reg_last_reload_reg[out_regno + k] = 0;
 	    }
 	}
     }
@@ -8075,10 +8116,11 @@ gen_reload (rtx out, rtx in, int opnum, 
    LAST_RELOAD_REG is the hard register number for which we want to delete
    the last output reload.
    J is the reload-number that originally used REG.  The caller has made
-   certain that reload J doesn't use REG any longer for input.  */
+   certain that reload J doesn't use REG any longer for input.
+   NEW_RELOAD_REG is reload register that reload J is using for REG.  */
 
 static void
-delete_output_reload (rtx insn, int j, int last_reload_reg)
+delete_output_reload (rtx insn, int j, int last_reload_reg, rtx new_reload_reg)
 {
   rtx output_reload_insn = spill_reg_store[last_reload_reg];
   rtx reg = spill_reg_stored_to[last_reload_reg];
@@ -8230,7 +8272,7 @@ delete_output_reload (rtx insn, int j, i
 	}
 
       /* For the debugging info, say the pseudo lives in this reload reg.  */
-      reg_renumber[REGNO (reg)] = REGNO (rld[j].reg_rtx);
+      reg_renumber[REGNO (reg)] = REGNO (new_reload_reg);
       alter_reg (REGNO (reg), -1);
     }
   else
Index: gcc/testsuite/gcc.target/mips/pr35232.c
===================================================================
--- /dev/null	2008-03-17 07:04:11.552097000 +0000
+++ gcc/testsuite/gcc.target/mips/pr35232.c	2008-03-18 22:25:55.000000000 +0000
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+/* { dg-mips-options "-O" } */
+
+NOMIPS16 unsigned int
+f1 (unsigned long long x)
+{
+  unsigned int r;
+  asm ("# %0" : "=a" (r) : "0" (x));
+  asm ("# %0" : "=h" (r) : "0" (r));
+  return r;
+}
+
+int
+main (void)
+{
+  return f1 (4) != 4;
+}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-03-19 23:39           ` Richard Sandiford
@ 2008-03-25 20:47             ` Ulrich Weigand
  0 siblings, 0 replies; 18+ messages in thread
From: Ulrich Weigand @ 2008-03-25 20:47 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

Richard Sandiford wrote:

> gcc/
> 	PR rtl-optimization/35232
> 	* reload1.c (reg_reloaded_call_part_clobbered): Clarify comment.
> 	(forget_old_reloads_1, forget_marked_reloads): Don't clear
> 	reg_reloaded_call_part_clobbered here.
> 	(reload_regs_reach_end_p): New function.
> 	(reload_reg_rtx_for_input): New variable.
> 	(reload_reg_rtx_for_output): Likewise.
> 	(emit_input_reload_insns): Use reloadreg rather than rl->reg_rtx
> 	when reassigning a pseudo register.  Load reloadreg from 
> 	reload_reg_rtx_for_input, moving the mode and register
> 	calculation to...
> 	(do_input_reload): ...here.  Use the mode-adjusted reg_rtx
> 	instead of the original when deciding whether an input reload
> 	would be a no-op or whether an output reload can be deleted.
> 	(emit_output_reload_insns): Use the mode-adjusted reg_rtx
> 	when setting up new_spill_reg_store.  Load it from
> 	reload_reg_rtx_for_output, moving the mode and register
> 	calculation to...
> 	(do_output_reload): ...here.  Use the mode-adjusted reg_rtx
> 	instead of the original when deciding whether an output reload
> 	would be a no-op.  Do the same when modifying insn notes.
> 	Use rtx_equal_p instead of == to compare the registers.
> 	(inherit_piecemeal_p): Take a mode and two register numbers
> 	as argument.
> 	(emit_reload_insns): Clear new_spill_reg_store for every hard
> 	register in the reload register.  Remove spill registers
> 	from reg_reloaded_valid before considering whether to record
> 	inheritance information for them.  Use reload_reg_rtx_for_output
> 	instead of reg_rtx when recording output reloads.  Use
> 	reload_reg_rtx_for_input instead of reg_rtx when recording
> 	input reloads.  Set or clear reg_reloaded_call_part_clobbered
> 	at the same time as setting reg_reloaded_valid.
> 	(delete_output_reload): Add a new_reload_reg parameter and use it
> 	instead of rld[j].reg_rtx.
> 	(emit_input_reload_insns, do_input_reload, do_output_reload): Adjust
> 	calls accordingly.
> 
> gcc/testsuite/
> 	PR rtl-optimization/35232
> 	* gcc.target/mips/pr35232.c: New test.

Thanks for making those changes; this version is OK for mainline.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-02-20 12:49               ` Jakub Jelinek
@ 2008-02-20 19:54                 ` Mark Mitchell
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Mitchell @ 2008-02-20 19:54 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches, rsandifo

Jakub Jelinek wrote:

>> Understood.  But these sort of things often need both the area maintainer
>> and RM to sign off on them.  I was more asking from the RM point of view.
>> (Specifically, I was worried that XFAILing the test might somehow stop
>> the PR from meeting the branch criteria.)
> 
> No, XFAILing doesn't change anything on whether something is a regression or
> not.

To echo Jakub, I think that if this works out well on mainline then 
backporting it to 4.3.1 would be totally appropriate.  It's certainly 
still a regression, even if XFAILed in 4.3.0.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-02-20 12:27             ` Richard Sandiford
@ 2008-02-20 12:49               ` Jakub Jelinek
  2008-02-20 19:54                 ` Mark Mitchell
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2008-02-20 12:49 UTC (permalink / raw)
  To: GCC Patches, rsandifo

On Wed, Feb 20, 2008 at 12:17:09PM +0000, Richard Sandiford wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
> > On Wed, Feb 20, 2008 at 11:28:02AM +0000, Richard Sandiford wrote:
> >> OK, so in summary: if the reload patch stands the test of time on mainline,
> >> it'll be OK to backport to 4.3.1?
> >
> > That will depend on reload maintainer's (or GWP) decision.
> 
> Understood.  But these sort of things often need both the area maintainer
> and RM to sign off on them.  I was more asking from the RM point of view.
> (Specifically, I was worried that XFAILing the test might somehow stop
> the PR from meeting the branch criteria.)

No, XFAILing doesn't change anything on whether something is a regression or
not.  When 4.3 will be released, the 4.3 branch will be in the usual mode,
no RM approval will be needed.

	Jakub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-02-20 12:21           ` Jakub Jelinek
@ 2008-02-20 12:27             ` Richard Sandiford
  2008-02-20 12:49               ` Jakub Jelinek
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2008-02-20 12:27 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

Jakub Jelinek <jakub@redhat.com> writes:
> On Wed, Feb 20, 2008 at 11:28:02AM +0000, Richard Sandiford wrote:
>> OK, so in summary: if the reload patch stands the test of time on mainline,
>> it'll be OK to backport to 4.3.1?
>
> That will depend on reload maintainer's (or GWP) decision.

Understood.  But these sort of things often need both the area maintainer
and RM to sign off on them.  I was more asking from the RM point of view.
(Specifically, I was worried that XFAILing the test might somehow stop
the PR from meeting the branch criteria.)

>>  If so, I'm happy skipping the test
>> for MIPS16 targets for 4.3.0.  (I suppose we'll have to skip it rather
>> than XFAIL it given that it's an ICE.)
>> 
>> Patch tested on mipsisa64-elfoabi and x86_64-linux-gnu (making sure it's
>> skipped for MIPS16 and not otherwise).  OK for 4.3?
>
> Fine with me.

Thanks, applied.

> Do you agree we can downgrade PR35232 to P2 to reflect it is not a
> release blocker?

Yeah, that's OK with me.

Richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-02-20 11:58         ` Richard Sandiford
@ 2008-02-20 12:21           ` Jakub Jelinek
  2008-02-20 12:27             ` Richard Sandiford
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2008-02-20 12:21 UTC (permalink / raw)
  To: GCC Patches, rsandifo

On Wed, Feb 20, 2008 at 11:28:02AM +0000, Richard Sandiford wrote:
> OK, so in summary: if the reload patch stands the test of time on mainline,
> it'll be OK to backport to 4.3.1?

That will depend on reload maintainer's (or GWP) decision.

>  If so, I'm happy skipping the test
> for MIPS16 targets for 4.3.0.  (I suppose we'll have to skip it rather
> than XFAIL it given that it's an ICE.)
> 
> Patch tested on mipsisa64-elfoabi and x86_64-linux-gnu (making sure it's
> skipped for MIPS16 and not otherwise).  OK for 4.3?

Fine with me.  Do you agree we can downgrade PR35232 to P2 to reflect
it is not a release blocker?

> gcc/testsuite/
> 	PR rtl-optimization/35232
> 	* gcc.dg/torture/fp-int-convert-float.c: Skip for MIPS16 targets.
> 	* gcc.dg/torture/fp-int-convert-double.c: Likewise.
> 	* gcc.dg/torture/fp-int-convert-long-double.c: Likewise.

	Jakub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-02-20 10:00       ` Jakub Jelinek
@ 2008-02-20 11:58         ` Richard Sandiford
  2008-02-20 12:21           ` Jakub Jelinek
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2008-02-20 11:58 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

Jakub Jelinek <jakub@redhat.com> writes:
> Is your inline asm testcase working with the same options in 4.2 (i.e. is
> the reload inheritance bug a regression), or is the only regression
> that fp-int-convert-double now triggers it?

The only regression is the fp-int-convert-double thing.  The inline asm
testcase failed for 4.2 and 4.1, and probably long before that.

> If I see the same bug as you in my x86_64-linux -> mipsisa64-elf
> cross, then you need a lot of different fp conversions together to trigger
> it and without volatile being used in almost all places it doesn't trigger
> either.  Perhaps I haven't minimized it completely, but at -O0 I count
> 8 different __mips16_float*df calls and 5 different __mips16_fix_*df, if I
> remove further code it doesn't trigger any more.  Is this something that
> could hit people on real-world code?

TBH, I'm not sure whether it's likely or not.  The failure happens at -O2
and above for both fp-int-convert-double.c and fp-int-convert-float.c[*].
I realise that the two tests have the same structure, but even so,
I'm used to seeing reload bugs affect one optimisation option only,
and to be sensitive to the sizes of the types involved.  That's what
made me a bit nervous about user-visible impact.  On the other hand,
I agree the volatiles are unusual, and they probably make -O2 and -O3
pretty much equivalent.

[*] and fp-int-convert-long-double.c as well, but long double == double,
    so that's not really relevant.

> If your reload patch is tested sufficiently on the trunk for a while,
> then I think it can be backported for 4.3.1, I just believe it is too
> late for such reload patches before the release.

Understood.

> Punishing all targets because of this is IMHO undesirable, so we either
> put a MIPS special-casing in optabs.c for MIPS, or XFAIL this test for
> mips.

OK, so in summary: if the reload patch stands the test of time on mainline,
it'll be OK to backport to 4.3.1?  If so, I'm happy skipping the test
for MIPS16 targets for 4.3.0.  (I suppose we'll have to skip it rather
than XFAIL it given that it's an ICE.)

Patch tested on mipsisa64-elfoabi and x86_64-linux-gnu (making sure it's
skipped for MIPS16 and not otherwise).  OK for 4.3?

Richard


gcc/testsuite/
	PR rtl-optimization/35232
	* gcc.dg/torture/fp-int-convert-float.c: Skip for MIPS16 targets.
	* gcc.dg/torture/fp-int-convert-double.c: Likewise.
	* gcc.dg/torture/fp-int-convert-long-double.c: Likewise.

Index: gcc/testsuite/gcc.dg/torture/fp-int-convert-float.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/fp-int-convert-float.c	(revision 132382)
+++ gcc/testsuite/gcc.dg/torture/fp-int-convert-float.c	(working copy)
@@ -1,6 +1,7 @@
 /* Test floating-point conversions.  Standard types and float.  */
 /* Origin: Joseph Myers <joseph@codesourcery.com> */
-/* { dg-do run } */
+/* Skipped for MIPS16 targets because of PR 35232 */
+/* { dg-do run { target { { ! mips*-*-* } || nomips16 } } } */
 /* { dg-options "" } */
 
 #include <float.h>
Index: gcc/testsuite/gcc.dg/torture/fp-int-convert-long-double.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/fp-int-convert-long-double.c	(revision 132382)
+++ gcc/testsuite/gcc.dg/torture/fp-int-convert-long-double.c	(working copy)
@@ -1,6 +1,7 @@
 /* Test floating-point conversions.  Standard types and long double.  */
 /* Origin: Joseph Myers <joseph@codesourcery.com> */
-/* { dg-do run } */
+/* Skipped for MIPS16 targets because of PR 35232 */
+/* { dg-do run { target { { ! mips*-*-* } || nomips16 } } } */
 /* { dg-options "" } */
 
 #include <float.h>
Index: gcc/testsuite/gcc.dg/torture/fp-int-convert-double.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/fp-int-convert-double.c	(revision 132382)
+++ gcc/testsuite/gcc.dg/torture/fp-int-convert-double.c	(working copy)
@@ -1,6 +1,7 @@
 /* Test floating-point conversions.  Standard types and double.  */
 /* Origin: Joseph Myers <joseph@codesourcery.com> */
-/* { dg-do run } */
+/* Skipped for MIPS16 targets because of PR 35232 */
+/* { dg-do run { target { { ! mips*-*-* } || nomips16 } } } */
 /* { dg-options "" } */
 
 #include <float.h>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-02-19 14:06     ` Richard Sandiford
@ 2008-02-20 10:00       ` Jakub Jelinek
  2008-02-20 11:58         ` Richard Sandiford
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2008-02-20 10:00 UTC (permalink / raw)
  To: GCC Patches, rsandifo

On Tue, Feb 19, 2008 at 01:55:34PM +0000, Richard Sandiford wrote:
> > For 4.3, have you considered just adding new mips.md patterns which would
> > expand to whatever you need to workaround the reload problem (whether it is
> > expand unsigned in terms of signed fallback or something else)?
> 
> You mean, define backend patterns for MIPS16 that duplicate the
> accidentally-chosen optabs.c code?  That seems a lot more fragile

Yep.

> than special-casing the optabs.c condition for MIPS (although
> I wouldn't really want to do that either).

Is your inline asm testcase working with the same options in 4.2 (i.e. is
the reload inheritance bug a regression), or is the only regression
that fp-int-convert-double now triggers it?

If I see the same bug as you in my x86_64-linux -> mipsisa64-elf
cross, then you need a lot of different fp conversions together to trigger
it and without volatile being used in almost all places it doesn't trigger
either.  Perhaps I haven't minimized it completely, but at -O0 I count
8 different __mips16_float*df calls and 5 different __mips16_fix_*df, if I
remove further code it doesn't trigger any more.  Is this something that
could hit people on real-world code?  If your reload patch is tested sufficiently
on the trunk for a while, then I think it can be backported for 4.3.1,
I just believe it is too late for such reload patches before the release.

Punishing all targets because of this is IMHO undesirable, so we either
put a MIPS special-casing in optabs.c for MIPS, or XFAIL this test for
mips.

	Jakub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-02-19 14:01   ` Jakub Jelinek
@ 2008-02-19 14:06     ` Richard Sandiford
  2008-02-20 10:00       ` Jakub Jelinek
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2008-02-19 14:06 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

Jakub Jelinek <jakub@redhat.com> writes:
> On Tue, Feb 19, 2008 at 09:26:53AM +0000, Richard Sandiford wrote:
>> Uros Bizjak <ubizjak@gmail.com> writes:
>> > If we choose to go this way, the easiest solution is to disable 
>> > following "if" in optabs.c, line 5148:
>> >
>> >   /* Unsigned integer, and no way to convert directly.  Convert as signed,
>> >      then unconditionally adjust the result.  */
>> >   if (unsignedp && can_do_signed)
>> >
>> > This will disable generic expansion of unsigned float conversion through 
>> > signed float insn. For x87, we provide SImode and DImode float pattern, 
>> > so this change will result in a libcall for "unsigned long long" 
>> > argument. For SSE targets, we provide unsigned float patterns for 
>> > SImode, so "unsigned long long" will also be affected. For 64bit 
>> > targets, we provide DImode patterns, so at least there is no problems.
>> >
>> > OTOH, I wonder how it is possible that my optabs.c change uncovered 
>> > latent bug you mentioned. Before my patch, we always expanded through 
>> > this generic code - please note logical "or" with !DECIMAL_FLOAT_MODE, 
>> > and my patch in fact disabled this wrong shortcut when neither signed 
>> > nor unsigned float pattern was available.
>> 
>> The bug affects MIPS16 code, which does not have access to the FPU.
>> As you say, before your patch, we were wrongly using the "unsigned in
>> terms of signed" fallback regardless of whether signed hardware support
>> was actually available.  This fallback was inefficient, but happened
>> to avoid the reload bug.  Now we go through the "proper" path and the
>> bug is exposed.
>
> For 4.3, have you considered just adding new mips.md patterns which would
> expand to whatever you need to workaround the reload problem (whether it is
> expand unsigned in terms of signed fallback or something else)?

You mean, define backend patterns for MIPS16 that duplicate the
accidentally-chosen optabs.c code?  That seems a lot more fragile
than special-casing the optabs.c condition for MIPS (although
I wouldn't really want to do that either).

Richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-02-19 10:01 ` Richard Sandiford
@ 2008-02-19 14:01   ` Jakub Jelinek
  2008-02-19 14:06     ` Richard Sandiford
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2008-02-19 14:01 UTC (permalink / raw)
  To: GCC Patches, rsandifo

On Tue, Feb 19, 2008 at 09:26:53AM +0000, Richard Sandiford wrote:
> Uros Bizjak <ubizjak@gmail.com> writes:
> > If we choose to go this way, the easiest solution is to disable 
> > following "if" in optabs.c, line 5148:
> >
> >   /* Unsigned integer, and no way to convert directly.  Convert as signed,
> >      then unconditionally adjust the result.  */
> >   if (unsignedp && can_do_signed)
> >
> > This will disable generic expansion of unsigned float conversion through 
> > signed float insn. For x87, we provide SImode and DImode float pattern, 
> > so this change will result in a libcall for "unsigned long long" 
> > argument. For SSE targets, we provide unsigned float patterns for 
> > SImode, so "unsigned long long" will also be affected. For 64bit 
> > targets, we provide DImode patterns, so at least there is no problems.
> >
> > OTOH, I wonder how it is possible that my optabs.c change uncovered 
> > latent bug you mentioned. Before my patch, we always expanded through 
> > this generic code - please note logical "or" with !DECIMAL_FLOAT_MODE, 
> > and my patch in fact disabled this wrong shortcut when neither signed 
> > nor unsigned float pattern was available.
> 
> The bug affects MIPS16 code, which does not have access to the FPU.
> As you say, before your patch, we were wrongly using the "unsigned in
> terms of signed" fallback regardless of whether signed hardware support
> was actually available.  This fallback was inefficient, but happened
> to avoid the reload bug.  Now we go through the "proper" path and the
> bug is exposed.

For 4.3, have you considered just adding new mips.md patterns which would
expand to whatever you need to workaround the reload problem (whether it is
expand unsigned in terms of signed fallback or something else)?

	Jakub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
  2008-02-18 22:32 Uros Bizjak
@ 2008-02-19 10:01 ` Richard Sandiford
  2008-02-19 14:01   ` Jakub Jelinek
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2008-02-19 10:01 UTC (permalink / raw)
  To: GCC Patches

Uros Bizjak <ubizjak@gmail.com> writes:
> If we choose to go this way, the easiest solution is to disable 
> following "if" in optabs.c, line 5148:
>
>   /* Unsigned integer, and no way to convert directly.  Convert as signed,
>      then unconditionally adjust the result.  */
>   if (unsignedp && can_do_signed)
>
> This will disable generic expansion of unsigned float conversion through 
> signed float insn. For x87, we provide SImode and DImode float pattern, 
> so this change will result in a libcall for "unsigned long long" 
> argument. For SSE targets, we provide unsigned float patterns for 
> SImode, so "unsigned long long" will also be affected. For 64bit 
> targets, we provide DImode patterns, so at least there is no problems.
>
> OTOH, I wonder how it is possible that my optabs.c change uncovered 
> latent bug you mentioned. Before my patch, we always expanded through 
> this generic code - please note logical "or" with !DECIMAL_FLOAT_MODE, 
> and my patch in fact disabled this wrong shortcut when neither signed 
> nor unsigned float pattern was available.

The bug affects MIPS16 code, which does not have access to the FPU.
As you say, before your patch, we were wrongly using the "unsigned in
terms of signed" fallback regardless of whether signed hardware support
was actually available.  This fallback was inefficient, but happened
to avoid the reload bug.  Now we go through the "proper" path and the
bug is exposed.

richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PR 35232: Recent regression due to reload inheritance bug
@ 2008-02-18 22:32 Uros Bizjak
  2008-02-19 10:01 ` Richard Sandiford
  0 siblings, 1 reply; 18+ messages in thread
From: Uros Bizjak @ 2008-02-18 22:32 UTC (permalink / raw)
  To: GCC Patches

Hello!

> If you're uncomfortable with my patch, perhaps one option would be to
> revert Uros's optabs patch for now, on the understanding that both his
> patch and whatever patch we pick for this PR can go in before 4.3.1.
> I'd obviously prefer to fix the underlying bug (and keep Uros's patch)
> if we can though.
>   

If we choose to go this way, the easiest solution is to disable 
following "if" in optabs.c, line 5148:

  /* Unsigned integer, and no way to convert directly.  Convert as signed,
     then unconditionally adjust the result.  */
  if (unsignedp && can_do_signed)

This will disable generic expansion of unsigned float conversion through 
signed float insn. For x87, we provide SImode and DImode float pattern, 
so this change will result in a libcall for "unsigned long long" 
argument. For SSE targets, we provide unsigned float patterns for 
SImode, so "unsigned long long" will also be affected. For 64bit 
targets, we provide DImode patterns, so at least there is no problems.

OTOH, I wonder how it is possible that my optabs.c change uncovered 
latent bug you mentioned. Before my patch, we always expanded through 
this generic code - please note logical "or" with !DECIMAL_FLOAT_MODE, 
and my patch in fact disabled this wrong shortcut when neither signed 
nor unsigned float pattern was available.

That said, I'm strongly against papering over exposed bug by disabling 
the functionality that has no direct connection to the bug itself. This 
_is_ a real defect that was identified and where proposed fix is 
effective. Since we are "just" creating a branch and not the release, 
IMO we can install this patch on the release branch. Otherwise, this 
unfixed problem can (and will) bite at some other, perhaps even more 
inappropriate time.

Uros.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2008-03-25 20:28 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-18 19:29 PR 35232: Recent regression due to reload inheritance bug Richard Sandiford
2008-03-12 18:50 ` Ulrich Weigand
2008-03-13 18:03   ` Richard Sandiford
2008-03-17 19:11     ` Ulrich Weigand
2008-03-19 22:31       ` Richard Sandiford
2008-03-19 23:20         ` Ulrich Weigand
2008-03-19 23:39           ` Richard Sandiford
2008-03-25 20:47             ` Ulrich Weigand
2008-02-18 22:32 Uros Bizjak
2008-02-19 10:01 ` Richard Sandiford
2008-02-19 14:01   ` Jakub Jelinek
2008-02-19 14:06     ` Richard Sandiford
2008-02-20 10:00       ` Jakub Jelinek
2008-02-20 11:58         ` Richard Sandiford
2008-02-20 12:21           ` Jakub Jelinek
2008-02-20 12:27             ` Richard Sandiford
2008-02-20 12:49               ` Jakub Jelinek
2008-02-20 19:54                 ` Mark Mitchell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).