public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* patch for elimination to SP when it is changed in RTL (PR57293)
@ 2013-11-29  5:19 Vladimir Makarov
  2013-11-29 10:51 ` H.J. Lu
  2013-12-01 12:57 ` James Greenhalgh
  0 siblings, 2 replies; 20+ messages in thread
From: Vladimir Makarov @ 2013-11-29  5:19 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3827 bytes --]

   The following patch fixes PR57293

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293

   It is actually an implementation of missed LRA functionality in reg
elimination.  Before the patch any explicit change of stack pointer in
RTL resulted in necessity to use the frame pointer.

   The patch has practically no effect on generic tuning of x86/x86-64.
But it has a dramatic effect on code performance for other tunings
like corei7 which don't use incoming args accumulation.  The maximum
SPEC2000 improvement 2.5% is achieved on x86 SPECInt2000.  But
SPECFP2000 rate also has improvement about 1% on x86 and x86-64.  Too
bad that I did not implement it at the first place.  The results would
have been even much better ones reported on 2012 GNU Cauldron as I
also used -mtune=corei7 that time.

The patch was bootstrapped and tested on x86-64/x86 and ppc.

Committed as rev. 205498.

  2013-11-28  Vladimir Makarov<vmakarov@redhat.com>

	PR target/57293
	* ira.h (ira_setup_eliminable_regset): Remove parameter.
	* ira.c (ira_setup_eliminable_regset): Ditto.  Add
	SUPPORTS_STACK_ALIGNMENT for crtl->stack_realign_needed.
	Don't call lra_init_elimination.
	(ira): Call ira_setup_eliminable_regset without arguments.
	* loop-invariant.c (calculate_loop_reg_pressure): Remove argument
	from ira_setup_eliminable_regset call.
	* gcse.c (calculate_bb_reg_pressure): Ditto.
	* haifa-sched.c (sched_init): Ditto.
	* lra.h (lra_init_elimination): Remove the prototype.
	* lra-int.h (lra_insn_recog_data): New member sp_offset.  Move
	used_insn_alternative upper.
	(lra_eliminate_regs_1): Add one more parameter.
	(lra-eliminate): Ditto.
	* lra.c (lra_invalidate_insn_data): Set sp_offset.
	(setup_sp_offset): New.
	(lra_process_new_insns): Call setup_sp_offset.
	(lra): Add argument to lra_eliminate calls.
	* lra-constraints.c (get_equiv_substitution): Rename to get_equiv.
	(get_equiv_with_elimination): New.
	(process_addr_reg): Call get_equiv_with_elimination instead of
	get_equiv_substitution.
	(equiv_address_substitution): Ditto.
	(loc_equivalence_change_p): Ditto.
	(loc_equivalence_callback, lra_constraints): Ditto.
	(curr_insn_transform): Ditto.  Print the sp offset
	(process_alt_operands): Prevent stack pointer reloads.
	(lra_constraints): Remove one argument from lra_eliminate call.
	Move it up.  Mark used hard regs bfore it.  Use
	get_equiv_with_elimination instead of get_equiv_substitution.
	* lra-eliminations.c (lra_eliminate_regs_1): Add parameter and
	assert for param values combination.  Use sp offset.  Add argument
	to lra_eliminate_regs_1 calls.
	(lra_eliminate_regs): Add argument to lra_eliminate_regs_1 call.
	(curr_sp_change): New static var.
	(mark_not_eliminable): Add parameter.  Update curr_sp_change.
	Don't prevent elimination to sp if we can calculate its change.
	Pass the argument to mark_not_eliminable calls.
	(eliminate_regs_in_insn): Add a parameter.  Use sp offset.  Add
	argument to lra_eliminate_regs_1 call.
	(update_reg_eliminate): Move calculation of hard regs for spill
	lower.  Switch off lra_in_progress temporarily to generate regs
	involved into elimination.
	(lra_init_elimination): Rename to init_elimination.  Make it
	static.  Set up insn sp offset, check the offsets at the end of
	BBs.
	(process_insn_for_elimination): Add parameter.  Pass its value to
	eliminate_regs_in_insn.
	(lra_eliminate): : Add parameter.  Pass its value to
	process_insn_for_elimination.  Add assert for param values
	combination.  Call init_elimination.  Don't update offsets in
	equivalence substitutions.
	* lra-spills.c (assign_mem_slot): Don't call lra_eliminate_regs_1
	for created stack slot.
	(remove_pseudos): Call lra_eliminate_regs_1 before changing memory
	onto stack slot.

2013-11-28  Vladimir Makarov<vmakarov@redhat.com>

	PR target/57293
	* gcc.target/i386/pr57293.c: New.


[-- Attachment #2: sp.patch --]
[-- Type: text/plain, Size: 41438 bytes --]

Index: gcse.c
===================================================================
--- gcse.c	(revision 205233)
+++ gcse.c	(working copy)
@@ -3509,7 +3509,7 @@ calculate_bb_reg_pressure (void)
   bitmap_iterator bi;
 
 
-  ira_setup_eliminable_regset (false);
+  ira_setup_eliminable_regset ();
   curr_regs_live = BITMAP_ALLOC (&reg_obstack);
   FOR_EACH_BB (bb)
     {
Index: haifa-sched.c
===================================================================
--- haifa-sched.c	(revision 205233)
+++ haifa-sched.c	(working copy)
@@ -6624,7 +6624,7 @@ sched_init (void)
     sched_pressure = SCHED_PRESSURE_NONE;
 
   if (sched_pressure != SCHED_PRESSURE_NONE)
-    ira_setup_eliminable_regset (false);
+    ira_setup_eliminable_regset ();
 
   /* Initialize SPEC_INFO.  */
   if (targetm.sched.set_sched_flags)
Index: ira.c
===================================================================
--- ira.c	(revision 205233)
+++ ira.c	(working copy)
@@ -2380,11 +2380,10 @@ compute_regs_asm_clobbered (void)
 }
 
 
-/* Set up ELIMINABLE_REGSET, IRA_NO_ALLOC_REGS, and REGS_EVER_LIVE.
-   If the function is called from IRA (not from the insn scheduler or
-   RTL loop invariant motion), FROM_IRA_P is true.  */
+/* Set up ELIMINABLE_REGSET, IRA_NO_ALLOC_REGS, and
+   REGS_EVER_LIVE.  */
 void
-ira_setup_eliminable_regset (bool from_ira_p)
+ira_setup_eliminable_regset (void)
 {
 #ifdef ELIMINABLE_REGS
   int i;
@@ -2401,16 +2400,16 @@ ira_setup_eliminable_regset (bool from_i
 	  if the stack pointer is moving.  */
        || (flag_stack_check && STACK_CHECK_MOVING_SP)
        || crtl->accesses_prior_frames
-       || crtl->stack_realign_needed
+       || (SUPPORTS_STACK_ALIGNMENT && crtl->stack_realign_needed)
        /* We need a frame pointer for all Cilk Plus functions that use
 	  Cilk keywords.  */
        || (flag_enable_cilkplus && cfun->is_cilk_function)
        || targetm.frame_pointer_required ());
 
-  if (from_ira_p && ira_use_lra_p)
-    /* It can change FRAME_POINTER_NEEDED.  We call it only from IRA
-       because it is expensive.  */
-    lra_init_elimination ();
+    /* The chance that FRAME_POINTER_NEEDED is changed from inspecting
+       RTL is very small.  So if we use frame pointer for RA and RTL
+       actually prevents this, we will spill pseudos assigned to the
+       frame pointer in LRA.  */
 
   if (frame_pointer_needed)
     df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM, true);
@@ -5260,7 +5259,7 @@ ira (FILE *f)
     find_moveable_pseudos ();
 
   max_regno_before_ira = max_reg_num ();
-  ira_setup_eliminable_regset (true);
+  ira_setup_eliminable_regset ();
 
   ira_overall_cost = ira_reg_cost = ira_mem_cost = 0;
   ira_load_cost = ira_store_cost = ira_shuffle_cost = 0;
Index: ira.h
===================================================================
--- ira.h	(revision 205233)
+++ ira.h	(working copy)
@@ -178,7 +178,7 @@ extern struct ira_reg_equiv *ira_reg_equ
 extern void ira_init_once (void);
 extern void ira_init (void);
 extern void ira_finish_once (void);
-extern void ira_setup_eliminable_regset (bool);
+extern void ira_setup_eliminable_regset (void);
 extern rtx ira_eliminate_regs (rtx, enum machine_mode);
 extern void ira_set_pseudo_classes (bool, FILE *);
 extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *);
Index: loop-invariant.c
===================================================================
--- loop-invariant.c	(revision 205233)
+++ loop-invariant.c	(working copy)
@@ -1823,7 +1823,7 @@ calculate_loop_reg_pressure (void)
 	bitmap_initialize (&LOOP_DATA (loop)->regs_ref, &reg_obstack);
 	bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
       }
-  ira_setup_eliminable_regset (false);
+  ira_setup_eliminable_regset ();
   bitmap_initialize (&curr_regs_live, &reg_obstack);
   FOR_EACH_BB (bb)
     {
Index: lra-constraints.c
===================================================================
--- lra-constraints.c	(revision 205233)
+++ lra-constraints.c	(working copy)
@@ -318,7 +318,7 @@ in_mem_p (int regno)
 /* If we have decided to substitute X with another value, return that
    value, otherwise return X.  */
 static rtx
-get_equiv_substitution (rtx x)
+get_equiv (rtx x)
 {
   int regno;
   rtx res;
@@ -337,6 +337,19 @@ get_equiv_substitution (rtx x)
   gcc_unreachable ();
 }
 
+/* If we have decided to substitute X with the equivalent value,
+   return that value after elimination for INSN, otherwise return
+   X.  */
+static rtx
+get_equiv_with_elimination (rtx x, rtx insn)
+{
+  rtx res = get_equiv (x);
+
+  if (x == res || CONSTANT_P (res))
+    return res;
+  return lra_eliminate_regs_1 (insn, res, GET_MODE (res), false, false, true);
+}
+
 /* Set up curr_operand_mode.  */
 static void
 init_curr_operand_mode (void)
@@ -1101,7 +1114,7 @@ process_addr_reg (rtx *loc, rtx *before,
     {
       regno = REGNO (reg);
       rclass = get_reg_class (regno);
-      if ((*loc = get_equiv_substitution (reg)) != reg)
+      if ((*loc = get_equiv_with_elimination (reg, curr_insn)) != reg)
 	{
 	  if (lra_dump_file != NULL)
 	    {
@@ -2007,6 +2020,13 @@ process_alt_operands (int only_alternati
 	      int const_to_mem = 0;
 	      bool no_regs_p;
 
+	      /* Never do output reload of stack pointer.  It makes
+		 impossible to do elimination when SP is changed in
+		 RTL.  */
+	      if (op == stack_pointer_rtx && ! frame_pointer_needed
+		  && curr_static_id->operand[nop].type != OP_IN)
+		goto fail;
+
 	      /* If this alternative asks for a specific reg class, see if there
 		 is at least one allocatable register in that class.  */
 	      no_regs_p
@@ -2517,7 +2537,7 @@ equiv_address_substitution (struct addre
   else
     {
       base_reg = *base_term;
-      new_base_reg = get_equiv_substitution (base_reg);
+      new_base_reg = get_equiv_with_elimination (base_reg, curr_insn);
     }
   index_term = strip_subreg (ad->index_term);
   if (index_term == NULL)
@@ -2525,7 +2545,7 @@ equiv_address_substitution (struct addre
   else
     {
       index_reg = *index_term;
-      new_index_reg = get_equiv_substitution (index_reg);
+      new_index_reg = get_equiv_with_elimination (index_reg, curr_insn);
     }
   if (base_reg == new_base_reg && index_reg == new_index_reg)
     return false;
@@ -3055,7 +3075,7 @@ curr_insn_transform (void)
 
       if (GET_CODE (old) == SUBREG)
 	old = SUBREG_REG (old);
-      subst = get_equiv_substitution (old);
+      subst = get_equiv_with_elimination (old, curr_insn);
       if (subst != old)
 	{
 	  subst = copy_rtx (subst);
@@ -3260,6 +3280,9 @@ curr_insn_transform (void)
       if (INSN_CODE (curr_insn) >= 0
           && (p = get_insn_name (INSN_CODE (curr_insn))) != NULL)
         fprintf (lra_dump_file, " {%s}", p);
+      if (curr_id->sp_offset != 0)
+        fprintf (lra_dump_file, " (sp_off=%" HOST_WIDE_INT_PRINT "d)",
+		 curr_id->sp_offset);
        fprintf (lra_dump_file, "\n");
     }
 
@@ -3638,7 +3661,7 @@ loc_equivalence_change_p (rtx *loc)
   if (code == SUBREG)
     {
       reg = SUBREG_REG (x);
-      if ((subst = get_equiv_substitution (reg)) != reg
+      if ((subst = get_equiv_with_elimination (reg, curr_insn)) != reg
 	  && GET_MODE (subst) == VOIDmode)
 	{
 	  /* We cannot reload debug location.  Simplify subreg here
@@ -3648,7 +3671,7 @@ loc_equivalence_change_p (rtx *loc)
 	  return true;
 	}
     }
-  if (code == REG && (subst = get_equiv_substitution (x)) != x)
+  if (code == REG && (subst = get_equiv_with_elimination (x, curr_insn)) != x)
     {
       *loc = subst;
       return true;
@@ -3676,7 +3699,7 @@ loc_equivalence_callback (rtx loc, const
   if (!REG_P (loc))
     return NULL_RTX;
 
-  rtx subst = get_equiv_substitution (loc);
+  rtx subst = get_equiv_with_elimination (loc, curr_insn);
   if (subst != loc)
     return subst;
 
@@ -3848,21 +3871,27 @@ lra_constraints (bool first_p)
   lra_risky_transformations_p = false;
   new_insn_uid_start = get_max_uid ();
   new_regno_start = first_p ? lra_constraint_new_regno_start : max_reg_num ();
+  /* Mark used hard regs for target stack size calulations.  */
+  for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++)
+    if (lra_reg_info[i].nrefs != 0
+	&& (hard_regno = lra_get_regno_hard_regno (i)) >= 0)
+      {
+	int j, nregs;
+
+	nregs = hard_regno_nregs[hard_regno][lra_reg_info[i].biggest_mode];
+	for (j = 0; j < nregs; j++)
+	  df_set_regs_ever_live (hard_regno + j, true);
+      }
+  /* Do elimination before the equivalence processing as we can spill
+     some pseudos during elimination.  */
+  lra_eliminate (false, first_p);
   bitmap_initialize (&equiv_insn_bitmap, &reg_obstack);
   for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++)
     if (lra_reg_info[i].nrefs != 0)
       {
 	ira_reg_equiv[i].profitable_p = true;
 	reg = regno_reg_rtx[i];
-	if ((hard_regno = lra_get_regno_hard_regno (i)) >= 0)
-	  {
-	    int j, nregs;
-
-	    nregs = hard_regno_nregs[hard_regno][lra_reg_info[i].biggest_mode];
-	    for (j = 0; j < nregs; j++)
-	      df_set_regs_ever_live (hard_regno + j, true);
-	  }
-	else if ((x = get_equiv_substitution (reg)) != reg)
+	if (lra_get_regno_hard_regno (i) < 0 && (x = get_equiv (reg)) != reg)
 	  {
 	    bool pseudo_p = contains_reg_p (x, false, false);
 
@@ -3911,7 +3940,7 @@ lra_constraints (bool first_p)
 	      ira_reg_equiv[i].defined_p = false;
 	    if (contains_reg_p (x, false, true))
 	      ira_reg_equiv[i].profitable_p = false;
-	    if (get_equiv_substitution (reg) != reg)
+	    if (get_equiv (reg) != reg)
 	      bitmap_ior_into (&equiv_insn_bitmap, &lra_reg_info[i].insn_bitmap);
 	  }
       }
@@ -3919,7 +3948,6 @@ lra_constraints (bool first_p)
      substituted by their equivalences.  */
   EXECUTE_IF_SET_IN_BITMAP (&equiv_insn_bitmap, 0, uid, bi)
     lra_push_insn_by_uid (uid);
-  lra_eliminate (false);
   min_len = lra_insn_stack_length ();
   new_insns_num = 0;
   last_bb = NULL;
@@ -3973,7 +4001,7 @@ lra_constraints (bool first_p)
 	      if (GET_CODE (dest_reg) == SUBREG)
 		dest_reg = SUBREG_REG (dest_reg);
 	      if ((REG_P (dest_reg)
-		   && (x = get_equiv_substitution (dest_reg)) != dest_reg
+		   && (x = get_equiv (dest_reg)) != dest_reg
 		   /* Remove insns which set up a pseudo whose value
 		      can not be changed.  Such insns might be not in
 		      init_insns because we don't update equiv data
@@ -3993,8 +4021,7 @@ lra_constraints (bool first_p)
 		       || in_list_p (curr_insn,
 				     ira_reg_equiv
 				     [REGNO (dest_reg)].init_insns)))
-		  || (((x = get_equiv_substitution (SET_SRC (set)))
-		       != SET_SRC (set))
+		  || (((x = get_equiv (SET_SRC (set))) != SET_SRC (set))
 		      && in_list_p (curr_insn,
 				    ira_reg_equiv
 				    [REGNO (SET_SRC (set))].init_insns)))
Index: lra-eliminations.c
===================================================================
--- lra-eliminations.c	(revision 205233)
+++ lra-eliminations.c	(working copy)
@@ -286,10 +286,11 @@ get_elimination (rtx reg)
 }
 
 /* Scan X and replace any eliminable registers (such as fp) with a
-   replacement (such as sp) if SUBST_P, plus an offset.	 The offset is
+   replacement (such as sp) if SUBST_P, plus an offset.  The offset is
    a change in the offset between the eliminable register and its
    substitution if UPDATE_P, or the full offset if FULL_P, or
-   otherwise zero.
+   otherwise zero.  If FULL_P, we also use the SP offsets for
+   elimination to SP.
 
    MEM_MODE is the mode of an enclosing MEM.  We need this to know how
    much to adjust a register for, e.g., PRE_DEC.  Also, if we are
@@ -298,10 +299,10 @@ get_elimination (rtx reg)
    outside a MEM.  In addition, we need to record the fact that a
    hard register is referenced outside a MEM.
 
-   Alternatively, INSN may be a note (an EXPR_LIST or INSN_LIST).
-   That's used when we eliminate in expressions stored in notes.  */
+   If we make full substitution to SP for non-null INSN, add the insn
+   sp offset.  */
 rtx
-lra_eliminate_regs_1 (rtx x, enum machine_mode mem_mode,
+lra_eliminate_regs_1 (rtx insn, rtx x, enum machine_mode mem_mode,
 		      bool subst_p, bool update_p, bool full_p)
 {
   enum rtx_code code = GET_CODE (x);
@@ -311,6 +312,7 @@ lra_eliminate_regs_1 (rtx x, enum machin
   const char *fmt;
   int copied = 0;
 
+  gcc_assert (!update_p || !full_p);
   if (! current_function_decl)
     return x;
 
@@ -338,7 +340,12 @@ lra_eliminate_regs_1 (rtx x, enum machin
 	  if (update_p)
 	    return plus_constant (Pmode, to, ep->offset - ep->previous_offset);
 	  else if (full_p)
-	    return plus_constant (Pmode, to, ep->offset);
+	    return plus_constant (Pmode, to,
+				  ep->offset
+				  - (insn != NULL_RTX
+				     && ep->to_rtx == stack_pointer_rtx
+				     ? lra_get_insn_recog_data (insn)->sp_offset
+				     : 0));
 	  else
 	    return to;
 	}
@@ -359,6 +366,8 @@ lra_eliminate_regs_1 (rtx x, enum machin
 
 	      offset = (update_p
 			? ep->offset - ep->previous_offset : ep->offset);
+	      if (full_p && insn != NULL_RTX && ep->to_rtx == stack_pointer_rtx)
+		offset -= lra_get_insn_recog_data (insn)->sp_offset;
 	      if (CONST_INT_P (XEXP (x, 1))
 		  && INTVAL (XEXP (x, 1)) == -offset)
 		return to;
@@ -384,9 +393,9 @@ lra_eliminate_regs_1 (rtx x, enum machin
 	 an address operand of a load-address insn.  */
 
       {
-	rtx new0 = lra_eliminate_regs_1 (XEXP (x, 0), mem_mode,
+	rtx new0 = lra_eliminate_regs_1 (insn, XEXP (x, 0), mem_mode,
 					 subst_p, update_p, full_p);
-	rtx new1 = lra_eliminate_regs_1 (XEXP (x, 1), mem_mode,
+	rtx new1 = lra_eliminate_regs_1 (insn, XEXP (x, 1), mem_mode,
 					 subst_p, update_p, full_p);
 
 	if (new0 != XEXP (x, 0) || new1 != XEXP (x, 1))
@@ -412,10 +421,16 @@ lra_eliminate_regs_1 (rtx x, enum machin
 			     (ep->offset - ep->previous_offset)
 			     * INTVAL (XEXP (x, 1)));
 	  else if (full_p)
-	    return
-	      plus_constant (Pmode,
-			     gen_rtx_MULT (Pmode, to, XEXP (x, 1)),
-			     ep->offset * INTVAL (XEXP (x, 1)));
+	    {
+	      HOST_WIDE_INT offset = ep->offset;
+
+	      if (insn != NULL_RTX && ep->to_rtx == stack_pointer_rtx)
+		offset -= lra_get_insn_recog_data (insn)->sp_offset;
+	      return
+		plus_constant (Pmode,
+			       gen_rtx_MULT (Pmode, to, XEXP (x, 1)),
+			       offset * INTVAL (XEXP (x, 1)));
+	    }
 	  else
 	    return gen_rtx_MULT (Pmode, to, XEXP (x, 1));
 	}
@@ -435,10 +450,10 @@ lra_eliminate_regs_1 (rtx x, enum machin
     case GE:	   case GT:	  case GEU:    case GTU:
     case LE:	   case LT:	  case LEU:    case LTU:
       {
-	rtx new0 = lra_eliminate_regs_1 (XEXP (x, 0), mem_mode,
+	rtx new0 = lra_eliminate_regs_1 (insn, XEXP (x, 0), mem_mode,
 					 subst_p, update_p, full_p);
 	rtx new1 = XEXP (x, 1)
-		   ? lra_eliminate_regs_1 (XEXP (x, 1), mem_mode,
+		   ? lra_eliminate_regs_1 (insn, XEXP (x, 1), mem_mode,
 					   subst_p, update_p, full_p) : 0;
 
 	if (new0 != XEXP (x, 0) || new1 != XEXP (x, 1))
@@ -451,7 +466,7 @@ lra_eliminate_regs_1 (rtx x, enum machin
 	 eliminate it.	*/
       if (XEXP (x, 0))
 	{
-	  new_rtx = lra_eliminate_regs_1 (XEXP (x, 0), mem_mode,
+	  new_rtx = lra_eliminate_regs_1 (insn, XEXP (x, 0), mem_mode,
 					  subst_p, update_p, full_p);
 	  if (new_rtx != XEXP (x, 0))
 	    {
@@ -460,7 +475,7 @@ lra_eliminate_regs_1 (rtx x, enum machin
 		 REG_DEAD note for the stack or frame pointer.	*/
 	      if (REG_NOTE_KIND (x) == REG_DEAD)
 		return (XEXP (x, 1)
-			? lra_eliminate_regs_1 (XEXP (x, 1), mem_mode,
+			? lra_eliminate_regs_1 (insn, XEXP (x, 1), mem_mode,
 						subst_p, update_p, full_p)
 			: NULL_RTX);
 
@@ -477,7 +492,7 @@ lra_eliminate_regs_1 (rtx x, enum machin
 	 strictly needed, but it simplifies the code.  */
       if (XEXP (x, 1))
 	{
-	  new_rtx = lra_eliminate_regs_1 (XEXP (x, 1), mem_mode,
+	  new_rtx = lra_eliminate_regs_1 (insn, XEXP (x, 1), mem_mode,
 					  subst_p, update_p, full_p);
 	  if (new_rtx != XEXP (x, 1))
 	    return
@@ -504,7 +519,8 @@ lra_eliminate_regs_1 (rtx x, enum machin
       if (GET_CODE (XEXP (x, 1)) == PLUS
 	  && XEXP (XEXP (x, 1), 0) == XEXP (x, 0))
 	{
-	  rtx new_rtx = lra_eliminate_regs_1 (XEXP (XEXP (x, 1), 1), mem_mode,
+	  rtx new_rtx = lra_eliminate_regs_1 (insn, XEXP (XEXP (x, 1), 1),
+					      mem_mode,
 					      subst_p, update_p, full_p);
 
 	  if (new_rtx != XEXP (XEXP (x, 1), 1))
@@ -528,14 +544,14 @@ lra_eliminate_regs_1 (rtx x, enum machin
     case POPCOUNT:
     case PARITY:
     case BSWAP:
-      new_rtx = lra_eliminate_regs_1 (XEXP (x, 0), mem_mode,
+      new_rtx = lra_eliminate_regs_1 (insn, XEXP (x, 0), mem_mode,
 				      subst_p, update_p, full_p);
       if (new_rtx != XEXP (x, 0))
 	return gen_rtx_fmt_e (code, GET_MODE (x), new_rtx);
       return x;
 
     case SUBREG:
-      new_rtx = lra_eliminate_regs_1 (SUBREG_REG (x), mem_mode,
+      new_rtx = lra_eliminate_regs_1 (insn, SUBREG_REG (x), mem_mode,
 				      subst_p, update_p, full_p);
 
       if (new_rtx != SUBREG_REG (x))
@@ -563,12 +579,12 @@ lra_eliminate_regs_1 (rtx x, enum machin
       return
 	replace_equiv_address_nv
 	(x,
-	 lra_eliminate_regs_1 (XEXP (x, 0), GET_MODE (x),
+	 lra_eliminate_regs_1 (insn, XEXP (x, 0), GET_MODE (x),
 			       subst_p, update_p, full_p));
 
     case USE:
       /* Handle insn_list USE that a call to a pure function may generate.  */
-      new_rtx = lra_eliminate_regs_1 (XEXP (x, 0), VOIDmode,
+      new_rtx = lra_eliminate_regs_1 (insn, XEXP (x, 0), VOIDmode,
 				      subst_p, update_p, full_p);
       if (new_rtx != XEXP (x, 0))
 	return gen_rtx_USE (GET_MODE (x), new_rtx);
@@ -589,7 +605,7 @@ lra_eliminate_regs_1 (rtx x, enum machin
     {
       if (*fmt == 'e')
 	{
-	  new_rtx = lra_eliminate_regs_1 (XEXP (x, i), mem_mode,
+	  new_rtx = lra_eliminate_regs_1 (insn, XEXP (x, i), mem_mode,
 					  subst_p, update_p, full_p);
 	  if (new_rtx != XEXP (x, i) && ! copied)
 	    {
@@ -603,7 +619,7 @@ lra_eliminate_regs_1 (rtx x, enum machin
 	  int copied_vec = 0;
 	  for (j = 0; j < XVECLEN (x, i); j++)
 	    {
-	      new_rtx = lra_eliminate_regs_1 (XVECEXP (x, i, j), mem_mode,
+	      new_rtx = lra_eliminate_regs_1 (insn, XVECEXP (x, i, j), mem_mode,
 					      subst_p, update_p, full_p);
 	      if (new_rtx != XVECEXP (x, i, j) && ! copied_vec)
 		{
@@ -631,16 +647,21 @@ rtx
 lra_eliminate_regs (rtx x, enum machine_mode mem_mode,
 		    rtx insn ATTRIBUTE_UNUSED)
 {
-  return lra_eliminate_regs_1 (x, mem_mode, true, false, true);
+  return lra_eliminate_regs_1 (NULL_RTX, x, mem_mode, true, false, true);
 }
 
+/* Stack pointer offset before the current insn relative to one at the
+   func start.  RTL insns can change SP explicitly.  We keep the
+   changes from one insn to another through this variable.  */
+static HOST_WIDE_INT curr_sp_change;
+
 /* Scan rtx X for references to elimination source or target registers
    in contexts that would prevent the elimination from happening.
    Update the table of eliminables to reflect the changed state.
    MEM_MODE is the mode of an enclosing MEM rtx, or VOIDmode if not
    within a MEM.  */
 static void
-mark_not_eliminable (rtx x)
+mark_not_eliminable (rtx x, enum machine_mode mem_mode)
 {
   enum rtx_code code = GET_CODE (x);
   struct elim_table *ep;
@@ -655,17 +676,40 @@ mark_not_eliminable (rtx x)
     case POST_DEC:
     case POST_MODIFY:
     case PRE_MODIFY:
-      if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER)
-	/* If we modify the source of an elimination rule, disable
-	   it.  Do the same if it is the source and not the hard frame
-	   register.  */
-	for (ep = reg_eliminate;
-	     ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
+      if (XEXP (x, 0) == stack_pointer_rtx
+	  && ((code != PRE_MODIFY && code != POST_MODIFY)
+	      || (GET_CODE (XEXP (x, 1)) == PLUS
+		  && XEXP (x, 0) == XEXP (XEXP (x, 1), 0)
+		  && CONST_INT_P (XEXP (XEXP (x, 1), 1)))))
+	{
+	  int size = GET_MODE_SIZE (mem_mode);
+	  
+#ifdef PUSH_ROUNDING
+	  /* If more bytes than MEM_MODE are pushed, account for
+	     them.  */
+	  size = PUSH_ROUNDING (size);
+#endif
+	  if (code == PRE_DEC || code == POST_DEC)
+	    curr_sp_change -= size;
+	  else if (code == PRE_INC || code == POST_INC)
+	    curr_sp_change += size;
+	  else if (code == PRE_MODIFY || code == POST_MODIFY)
+	    curr_sp_change += INTVAL (XEXP (XEXP (x, 1), 1));
+	}
+      else if (REG_P (XEXP (x, 0))
+	       && REGNO (XEXP (x, 0)) >= FIRST_PSEUDO_REGISTER)
+	{
+	  /* If we modify the source of an elimination rule, disable
+	     it.  Do the same if it is the destination and not the
+	     hard frame register.  */
+	  for (ep = reg_eliminate;
+	       ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
 	       ep++)
-	  if (ep->from_rtx == XEXP (x, 0)
-	      || (ep->to_rtx == XEXP (x, 0)
-		  && ep->to_rtx != hard_frame_pointer_rtx))
-	    setup_can_eliminate (ep, false);
+	    if (ep->from_rtx == XEXP (x, 0)
+		|| (ep->to_rtx == XEXP (x, 0)
+		    && ep->to_rtx != hard_frame_pointer_rtx))
+	      setup_can_eliminate (ep, false);
+	}
       return;
 
     case USE:
@@ -697,12 +741,22 @@ mark_not_eliminable (rtx x)
       return;
 
     case SET:
-      /* Check for setting a hard register that we know about.	*/
-      if (REG_P (SET_DEST (x)) && REGNO (SET_DEST (x)) < FIRST_PSEUDO_REGISTER)
+      if (SET_DEST (x) == stack_pointer_rtx
+	  && GET_CODE (SET_SRC (x)) == PLUS
+	  && XEXP (SET_SRC (x), 0) == SET_DEST (x)
+	  && CONST_INT_P (XEXP (SET_SRC (x), 1)))
+	{
+	  curr_sp_change += INTVAL (XEXP (SET_SRC (x), 1));
+	  return;
+	}
+      if (! REG_P (SET_DEST (x))
+	  || REGNO (SET_DEST (x)) >= FIRST_PSEUDO_REGISTER)
+	mark_not_eliminable (SET_DEST (x), mem_mode);
+      else
 	{
 	  /* See if this is setting the replacement hard register for
 	     an elimination.
-
+	     
 	     If DEST is the hard frame pointer, we do nothing because
 	     we assume that all assignments to the frame pointer are
 	     for non-local gotos and are being done at a time when
@@ -711,22 +765,21 @@ mark_not_eliminable (rtx x)
 	     even a fake frame pointer) with either the real frame
 	     pointer or the stack pointer.  Assignments to the hard
 	     frame pointer must not prevent this elimination.  */
-
 	  for (ep = reg_eliminate;
 	       ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
 	       ep++)
 	    if (ep->to_rtx == SET_DEST (x)
-		&& SET_DEST (x) != hard_frame_pointer_rtx
-		&& (! (SUPPORTS_STACK_ALIGNMENT && stack_realign_fp
-		       && REGNO (ep->to_rtx) == STACK_POINTER_REGNUM)
-		    || GET_CODE (SET_SRC (x)) != PLUS
-		    || XEXP (SET_SRC (x), 0) != SET_DEST (x)
-		    || ! CONST_INT_P (XEXP (SET_SRC (x), 1))))
+		&& SET_DEST (x) != hard_frame_pointer_rtx)
 	      setup_can_eliminate (ep, false);
 	}
+      
+      mark_not_eliminable (SET_SRC (x), mem_mode);
+      return;
 
-      mark_not_eliminable (SET_DEST (x));
-      mark_not_eliminable (SET_SRC (x));
+    case MEM:
+      /* Our only special processing is to pass the mode of the MEM to
+	 our recursive call.  */
+      mark_not_eliminable (XEXP (x, 0), GET_MODE (x));
       return;
 
     default:
@@ -737,10 +790,10 @@ mark_not_eliminable (rtx x)
   for (i = 0; i < GET_RTX_LENGTH (code); i++, fmt++)
     {
       if (*fmt == 'e')
-	mark_not_eliminable (XEXP (x, i));
+	mark_not_eliminable (XEXP (x, i), mem_mode);
       else if (*fmt == 'E')
 	for (j = 0; j < XVECLEN (x, i); j++)
-	  mark_not_eliminable (XVECEXP (x, i, j));
+	  mark_not_eliminable (XVECEXP (x, i, j), mem_mode);
     }
 }
 
@@ -778,13 +831,14 @@ remove_reg_equal_offset_note (rtx insn,
    delete the insn as dead it if it is setting an eliminable register.
 
    If REPLACE_P is false, just update the offsets while keeping the
-   base register the same.  Attach the note about used elimination for
+   base register the same.  If FIRST_P, use the sp offset for
+   elimination to sp.  Attach the note about used elimination for
    insns setting frame pointer to update elimination easy (without
    parsing already generated elimination insns to find offset
    previously used) in future.  */
 
 static void
-eliminate_regs_in_insn (rtx insn, bool replace_p)
+eliminate_regs_in_insn (rtx insn, bool replace_p, bool first_p)
 {
   int icode = recog_memoized (insn);
   rtx old_set = single_set (insn);
@@ -914,6 +968,8 @@ eliminate_regs_in_insn (rtx insn, bool r
 	  if (! replace_p)
 	    {
 	      offset += (ep->offset - ep->previous_offset);
+	      if (first_p && ep->to_rtx == stack_pointer_rtx)
+		offset -= lra_get_insn_recog_data (insn)->sp_offset;
 	      offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));
 	    }
 
@@ -985,8 +1041,9 @@ eliminate_regs_in_insn (rtx insn, bool r
 	  /* Companion to the above plus substitution, we can allow
 	     invariants as the source of a plain move.	*/
 	  substed_operand[i]
-	    = lra_eliminate_regs_1 (*id->operand_loc[i], VOIDmode,
-				    replace_p, ! replace_p, false);
+	    = lra_eliminate_regs_1 (insn, *id->operand_loc[i], VOIDmode,
+				    replace_p, ! replace_p && ! first_p,
+				    first_p);
 	  if (substed_operand[i] != orig_operand[i])
 	    validate_p = true;
 	}
@@ -1054,7 +1111,7 @@ spill_pseudos (HARD_REG_SET set)
 }
 
 /* Update all offsets and possibility for elimination on eliminable
-   registers.  Spill pseudos assigned to registers which became
+   registers.  Spill pseudos assigned to registers which are
    uneliminable, update LRA_NO_ALLOC_REGS and ELIMINABLE_REG_SET.  Add
    insns to INSNS_WITH_CHANGED_OFFSETS containing eliminable hard
    registers whose offsets should be changed.  Return true if any
@@ -1069,7 +1126,6 @@ update_reg_eliminate (bitmap insns_with_
   /* Clear self elimination offsets.  */
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     self_elim_offsets[ep->from] = 0;
-  CLEAR_HARD_REG_SET (temp_hard_reg_set);
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     {
       /* If it is a currently used elimination: update the previous
@@ -1096,6 +1152,9 @@ update_reg_eliminate (bitmap insns_with_
 	    fprintf (lra_dump_file,
 		     "	Elimination %d to %d is not possible anymore\n",
 		     ep->from, ep->to);
+	  /* If after processing RTL we decides that SP can be used as
+	     a result of elimination, it can not be changed.  */
+	  gcc_assert (ep->to_rtx != stack_pointer_rtx);
 	  /* Mark that is not eliminable anymore.  */
 	  elimination_map[ep->from] = NULL;
 	  for (ep1 = ep + 1; ep1 < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep1++)
@@ -1106,9 +1165,6 @@ update_reg_eliminate (bitmap insns_with_
 	      if (lra_dump_file != NULL)
 		fprintf (lra_dump_file, "    Using elimination %d to %d now\n",
 			 ep1->from, ep1->to);
-	      /* Prevent the hard register into which we eliminate now
-		 from the usage for pseudos.  */
-	      SET_HARD_REG_BIT (temp_hard_reg_set, ep1->to);
 	      lra_assert (ep1->previous_offset == 0);
 	      ep1->previous_offset = ep->offset;
 	    }
@@ -1121,7 +1177,6 @@ update_reg_eliminate (bitmap insns_with_
 		fprintf (lra_dump_file, "    %d is not eliminable at all\n",
 			 ep->from);
 	      self_elim_offsets[ep->from] = -ep->offset;
-	      SET_HARD_REG_BIT (temp_hard_reg_set, ep->from);
 	      if (ep->offset != 0)
 		bitmap_ior_into (insns_with_changed_offsets,
 				 &lra_reg_info[ep->from].insn_bitmap);
@@ -1134,23 +1189,33 @@ update_reg_eliminate (bitmap insns_with_
       INITIAL_FRAME_POINTER_OFFSET (ep->offset);
 #endif
     }
-  IOR_HARD_REG_SET (lra_no_alloc_regs, temp_hard_reg_set);
-  AND_COMPL_HARD_REG_SET (eliminable_regset, temp_hard_reg_set);
-  spill_pseudos (temp_hard_reg_set);
   setup_elimination_map ();
   result = false;
+  CLEAR_HARD_REG_SET (temp_hard_reg_set);
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
-    if (elimination_map[ep->from] == ep && ep->previous_offset != ep->offset)
+    if (elimination_map[ep->from] == NULL)
+      SET_HARD_REG_BIT (temp_hard_reg_set, ep->from);
+    else if (elimination_map[ep->from] == ep)
       {
-	bitmap_ior_into (insns_with_changed_offsets,
-			 &lra_reg_info[ep->from].insn_bitmap);
+	/* Prevent the hard register into which we eliminate from
+	   the usage for pseudos.  */
+        if (ep->from != ep->to)
+	  SET_HARD_REG_BIT (temp_hard_reg_set, ep->to);
+	if (ep->previous_offset != ep->offset)
+	  {
+	    bitmap_ior_into (insns_with_changed_offsets,
+			     &lra_reg_info[ep->from].insn_bitmap);
 
-	/* Update offset when the eliminate offset have been
-	   changed.  */
-	lra_update_reg_val_offset (lra_reg_info[ep->from].val,
-				   ep->offset - ep->previous_offset);
-	result = true;
+	    /* Update offset when the eliminate offset have been
+	       changed.  */
+	    lra_update_reg_val_offset (lra_reg_info[ep->from].val,
+				       ep->offset - ep->previous_offset);
+	    result = true;
+	  }
       }
+  IOR_HARD_REG_SET (lra_no_alloc_regs, temp_hard_reg_set);
+  AND_COMPL_HARD_REG_SET (eliminable_regset, temp_hard_reg_set);
+  spill_pseudos (temp_hard_reg_set);
   return result;
 }
 
@@ -1194,31 +1259,54 @@ init_elim_table (void)
   setup_can_eliminate (&reg_eliminate[0], ! frame_pointer_needed);
 #endif
 
-  /* Count the number of eliminable registers and build the FROM and TO
-     REG rtx's.	 Note that code in gen_rtx_REG will cause, e.g.,
-     gen_rtx_REG (Pmode, STACK_POINTER_REGNUM) to equal stack_pointer_rtx.
-     We depend on this.	 */
+  /* Build the FROM and TO REG rtx's.  Note that code in gen_rtx_REG
+     will cause, e.g., gen_rtx_REG (Pmode, STACK_POINTER_REGNUM) to
+     equal stack_pointer_rtx.  We depend on this. Threfore we switch
+     off that we are in LRA temporarily.  */
+  lra_in_progress = 0;
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     {
       ep->from_rtx = gen_rtx_REG (Pmode, ep->from);
       ep->to_rtx = gen_rtx_REG (Pmode, ep->to);
       eliminable_reg_rtx[ep->from] = ep->from_rtx;
     }
+  lra_in_progress = 1;
 }
 
-/* Entry function for initialization of elimination once per
-   function.  */
-void
-lra_init_elimination (void)
+/* Function for initialization of elimination once per function.  It
+   sets up sp offset for each insn.  */
+static void
+init_elimination (void)
 {
+  bool stop_to_sp_elimination_p;
   basic_block bb;
   rtx insn;
+  struct elim_table *ep;
 
   init_elim_table ();
   FOR_EACH_BB (bb)
-    FOR_BB_INSNS (bb, insn)
-    if (NONDEBUG_INSN_P (insn))
-      mark_not_eliminable (PATTERN (insn));
+    {
+      curr_sp_change = 0;
+      stop_to_sp_elimination_p = false;
+      FOR_BB_INSNS (bb, insn)
+	if (INSN_P (insn))
+	  {
+	    lra_get_insn_recog_data (insn)->sp_offset = curr_sp_change;
+	    if (NONDEBUG_INSN_P (insn))
+	      {
+		mark_not_eliminable (PATTERN (insn), VOIDmode);
+		if (curr_sp_change != 0
+		    && find_reg_note (insn, REG_LABEL_OPERAND, NULL_RTX))
+		  stop_to_sp_elimination_p = true;
+	      }
+	  }
+      if (! frame_pointer_needed
+	  && (curr_sp_change != 0 || stop_to_sp_elimination_p)
+	  && bb->succs && bb->succs->length () != 0)
+	for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
+	  if (ep->to == STACK_POINTER_REGNUM)
+	    setup_can_eliminate (ep, false);
+    }
   setup_elimination_map ();
 }
 
@@ -1237,12 +1325,13 @@ lra_eliminate_reg_if_possible (rtx *loc)
     *loc = ep->to_rtx;
 }
 
-/* Do (final if FINAL_P) elimination in INSN.  Add the insn for
-   subsequent processing in the constraint pass, update the insn info.	*/
+/* Do (final if FINAL_P or first if FIRST_P) elimination in INSN.  Add
+   the insn for subsequent processing in the constraint pass, update
+   the insn info.  */
 static void
-process_insn_for_elimination (rtx insn, bool final_p)
+process_insn_for_elimination (rtx insn, bool final_p, bool first_p)
 {
-  eliminate_regs_in_insn (insn, final_p);
+  eliminate_regs_in_insn (insn, final_p, first_p);
   if (! final_p)
     {
       /* Check that insn changed its code.  This is a case when a move
@@ -1262,20 +1351,23 @@ process_insn_for_elimination (rtx insn,
 }
 
 /* Entry function to do final elimination if FINAL_P or to update
-   elimination register offsets.  */
+   elimination register offsets (FIRST_P if we are doing it the first
+   time).  */
 void
-lra_eliminate (bool final_p)
+lra_eliminate (bool final_p, bool first_p)
 {
-  int i;
   unsigned int uid;
-  rtx mem_loc, invariant;
   bitmap_head insns_with_changed_offsets;
   bitmap_iterator bi;
   struct elim_table *ep;
-  int regs_num = max_reg_num ();
+
+  gcc_assert (! final_p || ! first_p);
 
   timevar_push (TV_LRA_ELIMINATE);
 
+  if (first_p)
+    init_elimination ();
+
   bitmap_initialize (&insns_with_changed_offsets, &reg_obstack);
   if (final_p)
     {
@@ -1299,28 +1391,11 @@ lra_eliminate (bool final_p)
       fprintf (lra_dump_file, "New elimination table:\n");
       print_elim_table (lra_dump_file);
     }
-  for (i = FIRST_PSEUDO_REGISTER; i < regs_num; i++)
-    if (lra_reg_info[i].nrefs != 0)
-      {
-	mem_loc = ira_reg_equiv[i].memory;
-	if (mem_loc != NULL_RTX)
-	  mem_loc = lra_eliminate_regs_1 (mem_loc, VOIDmode,
-					  final_p, ! final_p, false);
-	ira_reg_equiv[i].memory = mem_loc;
-	invariant = ira_reg_equiv[i].invariant;
-	if (invariant != NULL_RTX)
-	  invariant = lra_eliminate_regs_1 (invariant, VOIDmode,
-					    final_p, ! final_p, false);
-	ira_reg_equiv[i].invariant = invariant;
-	if (lra_dump_file != NULL
-	    && (mem_loc != NULL_RTX || invariant != NULL))
-	  fprintf (lra_dump_file,
-		   "Updating elimination of equiv for reg %d\n", i);
-      }
   EXECUTE_IF_SET_IN_BITMAP (&insns_with_changed_offsets, 0, uid, bi)
     /* A dead insn can be deleted in process_insn_for_elimination.  */
     if (lra_insn_recog_data[uid] != NULL)
-      process_insn_for_elimination (lra_insn_recog_data[uid]->insn, final_p);
+      process_insn_for_elimination (lra_insn_recog_data[uid]->insn,
+				    final_p, first_p);
   bitmap_clear (&insns_with_changed_offsets);
 
 lra_eliminate_done:
Index: lra-int.h
===================================================================
--- lra-int.h	(revision 205233)
+++ lra-int.h	(working copy)
@@ -207,6 +207,12 @@ struct lra_insn_recog_data
 {
   /* The insn code.  */
   int icode;
+  /* The alternative should be used for the insn, -1 if invalid, or we
+     should try to use any alternative, or the insn is a debug
+     insn.  */
+  int used_insn_alternative;
+  /* SP offset before the insn relative to one at the func start.  */
+  HOST_WIDE_INT sp_offset;
   /* The insn itself.  */
   rtx insn;
   /* Common data for insns with the same ICODE.  Asm insns (their
@@ -222,10 +228,6 @@ struct lra_insn_recog_data
   int *arg_hard_regs;
   /* Alternative enabled for the insn.	NULL for debug insns.  */
   bool *alternative_enabled_p;
-  /* The alternative should be used for the insn, -1 if invalid, or we
-     should try to use any alternative, or the insn is a debug
-     insn.  */
-  int used_insn_alternative;
   /* The following member value is always NULL for a debug insn.  */
   struct lra_insn_reg *regs;
 };
@@ -377,8 +379,8 @@ extern void lra_final_code_change (void)
 
 extern void lra_debug_elim_table (void);
 extern int lra_get_elimination_hard_regno (int);
-extern rtx lra_eliminate_regs_1 (rtx, enum machine_mode, bool, bool, bool);
-extern void lra_eliminate (bool);
+extern rtx lra_eliminate_regs_1 (rtx, rtx, enum machine_mode, bool, bool, bool);
+extern void lra_eliminate (bool, bool);
 
 extern void lra_eliminate_reg_if_possible (rtx *);
 
Index: lra-spills.c
===================================================================
--- lra-spills.c	(revision 205233)
+++ lra-spills.c	(working copy)
@@ -163,7 +163,6 @@ assign_mem_slot (int i)
       x = assign_stack_local (mode, total_size,
 			      min_align > inherent_align
 			      || total_size > inherent_size ? -1 : 0);
-      x = lra_eliminate_regs_1 (x, GET_MODE (x), false, false, true);
       stack_slot = x;
       /* Cancel the big-endian correction done in assign_stack_local.
 	 Get the address of the beginning of the slot.	This is so we
@@ -430,8 +429,15 @@ remove_pseudos (rtx *loc, rtx insn)
 	 into scratches back.  */
       && ! lra_former_scratch_p (i))
     {
-      hard_reg = spill_hard_reg[i];
-      *loc = copy_rtx (hard_reg != NULL_RTX ? hard_reg : pseudo_slots[i].mem);
+      if ((hard_reg = spill_hard_reg[i]) != NULL_RTX)
+	*loc = copy_rtx (hard_reg);
+      else
+	{
+	  rtx x = lra_eliminate_regs_1 (insn, pseudo_slots[i].mem,
+					GET_MODE (pseudo_slots[i].mem),
+					false, false, true);
+	  *loc = x != pseudo_slots[i].mem ? x : copy_rtx (x);
+	}
       return;
     }
 
Index: lra.c
===================================================================
--- lra.c	(revision 205233)
+++ lra.c	(working copy)
@@ -207,7 +207,8 @@ lra_set_regno_unique_value (int regno)
   lra_reg_info[regno].val = get_new_reg_value ();
 }
 
-/* Invalidate INSN related info used by LRA.  */
+/* Invalidate INSN related info used by LRA.  The info should never be
+   used after that.  */
 void
 lra_invalidate_insn_data (rtx insn)
 {
@@ -1273,17 +1274,24 @@ lra_update_insn_recog_data (rtx insn)
   int n;
   unsigned int uid = INSN_UID (insn);
   struct lra_static_insn_data *insn_static_data;
+  HOST_WIDE_INT sp_offset = 0;
 
   check_and_expand_insn_recog_data (uid);
   if ((data = lra_insn_recog_data[uid]) != NULL
       && data->icode != INSN_CODE (insn))
     {
+      sp_offset = data->sp_offset;
       invalidate_insn_data_regno_info (data, insn, get_insn_freq (insn));
       invalidate_insn_recog_data (uid);
       data = NULL;
     }
   if (data == NULL)
-    return lra_get_insn_recog_data (insn);
+    {
+      data = lra_get_insn_recog_data (insn);
+      /* Initiate or restore SP offset.  */
+      data->sp_offset = sp_offset;
+      return data;
+    }
   insn_static_data = data->insn_static_data;
   data->used_insn_alternative = -1;
   if (DEBUG_INSN_P (insn))
@@ -1837,6 +1845,20 @@ push_insns (rtx from, rtx to)
       lra_push_insn (insn);
 }
 
+/* Set up sp offset for insn in range [FROM, LAST].  The offset is
+   taken from the next BB insn after LAST or zero if there in such
+   insn.  */
+static void
+setup_sp_offset (rtx from, rtx last)
+{
+  rtx before = next_nonnote_insn_bb (last);
+  HOST_WIDE_INT offset = (before == NULL_RTX || ! INSN_P (before)
+			  ? 0 : lra_get_insn_recog_data (before)->sp_offset);
+
+  for (rtx insn = from; insn != NEXT_INSN (last); insn = NEXT_INSN (insn))
+    lra_get_insn_recog_data (insn)->sp_offset = offset;
+}
+
 /* Emit insns BEFORE before INSN and insns AFTER after INSN.  Put the
    insns onto the stack.  Print about emitting the insns with
    TITLE.  */
@@ -1845,7 +1867,9 @@ lra_process_new_insns (rtx insn, rtx bef
 {
   rtx last;
 
-  if (lra_dump_file != NULL && (before != NULL_RTX || after != NULL_RTX))
+  if (before == NULL_RTX && after == NULL_RTX)
+    return;
+  if (lra_dump_file != NULL)
     {
       dump_insn_slim (lra_dump_file, insn);
       if (before != NULL_RTX)
@@ -1864,6 +1888,7 @@ lra_process_new_insns (rtx insn, rtx bef
     {
       emit_insn_before (before, insn);
       push_insns (PREV_INSN (insn), PREV_INSN (before));
+      setup_sp_offset (before, PREV_INSN (insn));
     }
   if (after != NULL_RTX)
     {
@@ -1871,6 +1896,7 @@ lra_process_new_insns (rtx insn, rtx bef
 	;
       emit_insn_after (after, insn);
       push_insns (last, insn);
+      setup_sp_offset (after, last);
     }
 }
 
@@ -2314,7 +2340,7 @@ lra (FILE *f)
 	     For example, rs6000 can make
 	     RS6000_PIC_OFFSET_TABLE_REGNUM uneliminable if we started
 	     to use a constant pool.  */
-	  lra_eliminate (false);
+	  lra_eliminate (false, false);
 	  /* Do inheritance only for regular algorithms.  */
 	  if (! lra_simple_p)
 	    lra_inheritance ();
@@ -2368,13 +2394,13 @@ lra (FILE *f)
       lra_spill ();
       /* Assignment of stack slots changes elimination offsets for
 	 some eliminations.  So update the offsets here.  */
-      lra_eliminate (false);
+      lra_eliminate (false, false);
       lra_constraint_new_regno_start = max_reg_num ();
       lra_constraint_new_insn_uid_start = get_max_uid ();
       lra_constraint_iter_after_spill = 0;
     }
   restore_scratches ();
-  lra_eliminate (true);
+  lra_eliminate (true, false);
   lra_final_code_change ();
   lra_in_progress = 0;
   if (live_p)
Index: lra.h
===================================================================
--- lra.h	(revision 205233)
+++ lra.h	(working copy)
@@ -33,7 +33,6 @@ lra_get_allocno_class (int regno)
 
 extern rtx lra_create_new_reg (enum machine_mode, rtx, enum reg_class,
 			       const char *);
-extern void lra_init_elimination (void);
 extern rtx lra_eliminate_regs (rtx, enum machine_mode, rtx);
 extern void lra (FILE *);
 extern void lra_init_once (void);
Index: testsuite/gcc.target/i386/pr57293.c
===================================================================
--- testsuite/gcc.target/i386/pr57293.c	(revision 0)
+++ testsuite/gcc.target/i386/pr57293.c	(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile  { target { ia32 } } } */
+/* { dg-options "-O2 -fomit-frame-pointer" } */
+/* { dg-final { scan-assembler-not "%ebp" } } */
+
+__attribute__((__noinline__, __noclone__, __stdcall__)) void g(int a)
+{
+  __builtin_printf("in g(): %d\n", a);
+}
+
+__attribute__((__noinline__, __noclone__, __thiscall__)) void h(int a, int b)
+{
+  __builtin_printf("in h(): %d %d\n", a, b);
+}
+
+void f()
+{
+  g(0);
+  h(0, 1);
+  __builtin_puts("in f()");
+}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-11-29  5:19 patch for elimination to SP when it is changed in RTL (PR57293) Vladimir Makarov
@ 2013-11-29 10:51 ` H.J. Lu
  2013-11-29 11:10   ` Vladimir Makarov
  2013-12-01 12:57 ` James Greenhalgh
  1 sibling, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2013-11-29 10:51 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: GCC Patches

On Thu, Nov 28, 2013 at 2:11 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>   The following patch fixes PR57293
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293
>
>   It is actually an implementation of missed LRA functionality in reg
> elimination.  Before the patch any explicit change of stack pointer in
> RTL resulted in necessity to use the frame pointer.
>
>   The patch has practically no effect on generic tuning of x86/x86-64.
> But it has a dramatic effect on code performance for other tunings
> like corei7 which don't use incoming args accumulation.  The maximum
> SPEC2000 improvement 2.5% is achieved on x86 SPECInt2000.  But
> SPECFP2000 rate also has improvement about 1% on x86 and x86-64.  Too
> bad that I did not implement it at the first place.  The results would
> have been even much better ones reported on 2012 GNU Cauldron as I
> also used -mtune=corei7 that time.
>
> The patch was bootstrapped and tested on x86-64/x86 and ppc.
>
> Committed as rev. 205498.
>
>  2013-11-28  Vladimir Makarov<vmakarov@redhat.com>
>
>         PR target/57293
>         * ira.h (ira_setup_eliminable_regset): Remove parameter.
>         * ira.c (ira_setup_eliminable_regset): Ditto.  Add
>         SUPPORTS_STACK_ALIGNMENT for crtl->stack_realign_needed.
>         Don't call lra_init_elimination.
>         (ira): Call ira_setup_eliminable_regset without arguments.
>         * loop-invariant.c (calculate_loop_reg_pressure): Remove argument
>         from ira_setup_eliminable_regset call.
>         * gcse.c (calculate_bb_reg_pressure): Ditto.
>         * haifa-sched.c (sched_init): Ditto.
>         * lra.h (lra_init_elimination): Remove the prototype.
>         * lra-int.h (lra_insn_recog_data): New member sp_offset.  Move
>         used_insn_alternative upper.
>         (lra_eliminate_regs_1): Add one more parameter.
>         (lra-eliminate): Ditto.
>         * lra.c (lra_invalidate_insn_data): Set sp_offset.
>         (setup_sp_offset): New.
>         (lra_process_new_insns): Call setup_sp_offset.
>         (lra): Add argument to lra_eliminate calls.
>         * lra-constraints.c (get_equiv_substitution): Rename to get_equiv.
>         (get_equiv_with_elimination): New.
>         (process_addr_reg): Call get_equiv_with_elimination instead of
>         get_equiv_substitution.
>         (equiv_address_substitution): Ditto.
>         (loc_equivalence_change_p): Ditto.
>         (loc_equivalence_callback, lra_constraints): Ditto.
>         (curr_insn_transform): Ditto.  Print the sp offset
>         (process_alt_operands): Prevent stack pointer reloads.
>         (lra_constraints): Remove one argument from lra_eliminate call.
>         Move it up.  Mark used hard regs bfore it.  Use
>         get_equiv_with_elimination instead of get_equiv_substitution.
>         * lra-eliminations.c (lra_eliminate_regs_1): Add parameter and
>         assert for param values combination.  Use sp offset.  Add argument
>         to lra_eliminate_regs_1 calls.
>         (lra_eliminate_regs): Add argument to lra_eliminate_regs_1 call.
>         (curr_sp_change): New static var.
>         (mark_not_eliminable): Add parameter.  Update curr_sp_change.
>         Don't prevent elimination to sp if we can calculate its change.
>         Pass the argument to mark_not_eliminable calls.
>         (eliminate_regs_in_insn): Add a parameter.  Use sp offset.  Add
>         argument to lra_eliminate_regs_1 call.
>         (update_reg_eliminate): Move calculation of hard regs for spill
>         lower.  Switch off lra_in_progress temporarily to generate regs
>         involved into elimination.
>         (lra_init_elimination): Rename to init_elimination.  Make it
>         static.  Set up insn sp offset, check the offsets at the end of
>         BBs.
>         (process_insn_for_elimination): Add parameter.  Pass its value to
>         eliminate_regs_in_insn.
>         (lra_eliminate): : Add parameter.  Pass its value to
>         process_insn_for_elimination.  Add assert for param values
>         combination.  Call init_elimination.  Don't update offsets in
>         equivalence substitutions.
>         * lra-spills.c (assign_mem_slot): Don't call lra_eliminate_regs_1
>         for created stack slot.
>         (remove_pseudos): Call lra_eliminate_regs_1 before changing memory
>         onto stack slot.
>

Hi Vladimir,

Thanks for your hard work.   I noticed a few regressions
on x86-64:

FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 23 z == 8
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c  -O2 -flto -fno-use-linker-plugin
-flto-partition=none  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-3.c  -O2 -flto -fno-use-linker-plugin
-flto-partition=none  line 23 x == 98
FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 23 x == 98
FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 23 z == 8
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 23 x == 98
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 23 z == 8
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 20 x == 36
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 23 x == 98
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 23 z == 8
FAIL: gcc.dg/guality/pr54693-2.c  -O3 -fomit-frame-pointer
-funroll-all-loops -finline-functions  line 21 i == v + 1
FAIL: gcc.dg/guality/pr54693-2.c  -O3 -fomit-frame-pointer
-funroll-loops  line 21 i == v + 1
FAIL: gcc.target/i386/pr9771-1.c (internal compiler error)

Besides guality failures, there is

[hjl@gnu-6 gcc]$ make check-gcc
RUNTESTFLAGS="--target_board='unix{-m32\ -march=corei7}'
i386.exp=pr9771-1.c"
make[1]: Entering directory `/export/build/gnu/gcc/build-x86_64-linux/gcc'
test -d plugin || mkdir plugin
test -d testsuite || mkdir testsuite
test -d testsuite/gcc || mkdir testsuite/gcc
(rootme=`${PWDCMD-pwd}`; export rootme; \
srcdir=`cd /export/gnu/import/git/gcc/gcc; ${PWDCMD-pwd}` ; export srcdir ; \
cd testsuite/gcc; \
rm -f tmp-site.exp; \
sed '/set tmpdir/ s|testsuite$|testsuite/gcc|' \
    < ../../site.exp > tmp-site.exp; \
/bin/sh ${srcdir}/../move-if-change tmp-site.exp site.exp; \
EXPECT=`if [ -f ${rootme}/../expect/expect ] ; then echo
${rootme}/../expect/expect ; else echo expect ; fi` ; export EXPECT ;
\
if [ -f ${rootme}/../expect/expect ] ; then  \
   TCL_LIBRARY=`cd .. ; cd ${srcdir}/../tcl/library ; ${PWDCMD-pwd}` ; \
    export TCL_LIBRARY ; fi ; \
runtestflags= ; \
if [ -n "" ] ; then \
  runtestflags=""; \
elif [ -n "" ] ; then \
  parts="`echo '  ' \
      | sed 's/=[^ ]* / /g'`"; \
  for part in `find $srcdir/testsuite/gcc* -name \*.exp` ; do \
    part=`basename $part` ; \
    case " $parts $runtestflags " in \
      *" $part "*) ;; \
      *) runtestflags="$runtestflags $part" ;; \
    esac ; \
  done ; \
fi ; \
`if [ -f ${srcdir}/../dejagnu/runtest ] ; then echo
${srcdir}/../dejagnu/runtest ; else echo runtest; fi` --tool gcc
--target_board='unix{-m32\ -march=corei7}' i386.exp=pr9771-1.c
$runtestflags)
WARNING: Couldn't find the global config file.
Test Run By hjl on Thu Nov 28 16:50:05 2013
Native configuration is x86_64-unknown-linux-gnu

        === gcc tests ===

Schedule of variations:
    unix/-m32 -march=corei7

Running target unix/-m32 -march=corei7
Using /usr/share/dejagnu/baseboards/unix.exp as board description file
for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /export/gnu/import/git/gcc/gcc/testsuite/config/default.exp as
tool-and-target-specific interface file.
Running /export/gnu/import/git/gcc/gcc/testsuite/gcc.target/i386/i386.exp ...
FAIL: gcc.target/i386/pr9771-1.c (internal compiler error)
FAIL: gcc.target/i386/pr9771-1.c (test for excess errors)

        === gcc Summary ===

# of expected passes        10
# of unexpected failures    2
# of unresolved testcases    1

"-m32 -march=corei7" triggers the ICE.


-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-11-29 10:51 ` H.J. Lu
@ 2013-11-29 11:10   ` Vladimir Makarov
  2013-11-29 16:20     ` H.J. Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Vladimir Makarov @ 2013-11-29 11:10 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

On 11/28/2013, 7:51 PM, H.J. Lu wrote:
> On Thu, Nov 28, 2013 at 2:11 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>    The following patch fixes PR57293
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293
>>
>>    It is actually an implementation of missed LRA functionality in reg
>> elimination.  Before the patch any explicit change of stack pointer in
>> RTL resulted in necessity to use the frame pointer.
>>
>>    The patch has practically no effect on generic tuning of x86/x86-64.
>> But it has a dramatic effect on code performance for other tunings
>> like corei7 which don't use incoming args accumulation.  The maximum
>> SPEC2000 improvement 2.5% is achieved on x86 SPECInt2000.  But
>> SPECFP2000 rate also has improvement about 1% on x86 and x86-64.  Too
>> bad that I did not implement it at the first place.  The results would
>> have been even much better ones reported on 2012 GNU Cauldron as I
>> also used -mtune=corei7 that time.
>>
>> The patch was bootstrapped and tested on x86-64/x86 and ppc.
>>
>> Committed as rev. 205498.
>>
>>   2013-11-28  Vladimir Makarov<vmakarov@redhat.com>
>>
>>          PR target/57293
>>          * ira.h (ira_setup_eliminable_regset): Remove parameter.
>>          * ira.c (ira_setup_eliminable_regset): Ditto.  Add
>>          SUPPORTS_STACK_ALIGNMENT for crtl->stack_realign_needed.
>>          Don't call lra_init_elimination.
>>          (ira): Call ira_setup_eliminable_regset without arguments.
>>          * loop-invariant.c (calculate_loop_reg_pressure): Remove argument
>>          from ira_setup_eliminable_regset call.
>>          * gcse.c (calculate_bb_reg_pressure): Ditto.
>>          * haifa-sched.c (sched_init): Ditto.
>>          * lra.h (lra_init_elimination): Remove the prototype.
>>          * lra-int.h (lra_insn_recog_data): New member sp_offset.  Move
>>          used_insn_alternative upper.
>>          (lra_eliminate_regs_1): Add one more parameter.
>>          (lra-eliminate): Ditto.
>>          * lra.c (lra_invalidate_insn_data): Set sp_offset.
>>          (setup_sp_offset): New.
>>          (lra_process_new_insns): Call setup_sp_offset.
>>          (lra): Add argument to lra_eliminate calls.
>>          * lra-constraints.c (get_equiv_substitution): Rename to get_equiv.
>>          (get_equiv_with_elimination): New.
>>          (process_addr_reg): Call get_equiv_with_elimination instead of
>>          get_equiv_substitution.
>>          (equiv_address_substitution): Ditto.
>>          (loc_equivalence_change_p): Ditto.
>>          (loc_equivalence_callback, lra_constraints): Ditto.
>>          (curr_insn_transform): Ditto.  Print the sp offset
>>          (process_alt_operands): Prevent stack pointer reloads.
>>          (lra_constraints): Remove one argument from lra_eliminate call.
>>          Move it up.  Mark used hard regs bfore it.  Use
>>          get_equiv_with_elimination instead of get_equiv_substitution.
>>          * lra-eliminations.c (lra_eliminate_regs_1): Add parameter and
>>          assert for param values combination.  Use sp offset.  Add argument
>>          to lra_eliminate_regs_1 calls.
>>          (lra_eliminate_regs): Add argument to lra_eliminate_regs_1 call.
>>          (curr_sp_change): New static var.
>>          (mark_not_eliminable): Add parameter.  Update curr_sp_change.
>>          Don't prevent elimination to sp if we can calculate its change.
>>          Pass the argument to mark_not_eliminable calls.
>>          (eliminate_regs_in_insn): Add a parameter.  Use sp offset.  Add
>>          argument to lra_eliminate_regs_1 call.
>>          (update_reg_eliminate): Move calculation of hard regs for spill
>>          lower.  Switch off lra_in_progress temporarily to generate regs
>>          involved into elimination.
>>          (lra_init_elimination): Rename to init_elimination.  Make it
>>          static.  Set up insn sp offset, check the offsets at the end of
>>          BBs.
>>          (process_insn_for_elimination): Add parameter.  Pass its value to
>>          eliminate_regs_in_insn.
>>          (lra_eliminate): : Add parameter.  Pass its value to
>>          process_insn_for_elimination.  Add assert for param values
>>          combination.  Call init_elimination.  Don't update offsets in
>>          equivalence substitutions.
>>          * lra-spills.c (assign_mem_slot): Don't call lra_eliminate_regs_1
>>          for created stack slot.
>>          (remove_pseudos): Call lra_eliminate_regs_1 before changing memory
>>          onto stack slot.
>>
>
> Hi Vladimir,
>
> Thanks for your hard work.   I noticed a few regressions
> on x86-64:
>
> FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 20 y == 25
> FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 20 z == 6
> FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 23 y == 117
> FAIL: gcc.dg/guality/pr54519-1.c  -O2  line 23 z == 8
> FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 x == 36
> FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 y == 25
> FAIL: gcc.dg/guality/pr54519-1.c  -O3 -fomit-frame-pointer  line 20 z == 6
> FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  line 20 x == 36
> FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  line 20 y == 25
> FAIL: gcc.dg/guality/pr54519-1.c  -O3 -g  line 20 z == 6
> FAIL: gcc.dg/guality/pr54519-3.c  -O2 -flto -fno-use-linker-plugin
> -flto-partition=none  line 20 x == 36
> FAIL: gcc.dg/guality/pr54519-3.c  -O2 -flto -fno-use-linker-plugin
> -flto-partition=none  line 23 x == 98
> FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 20 x == 36
> FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 20 y == 25
> FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 20 z == 6
> FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 23 x == 98
> FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 23 y == 117
> FAIL: gcc.dg/guality/pr54519-3.c  -O2  line 23 z == 8
> FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 20 x == 36
> FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 20 y == 25
> FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 20 z == 6
> FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 23 x == 98
> FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 23 y == 117
> FAIL: gcc.dg/guality/pr54519-3.c  -O3 -fomit-frame-pointer  line 23 z == 8
> FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 20 x == 36
> FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 20 y == 25
> FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 20 z == 6
> FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 23 x == 98
> FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 23 y == 117
> FAIL: gcc.dg/guality/pr54519-3.c  -O3 -g  line 23 z == 8
> FAIL: gcc.dg/guality/pr54693-2.c  -O3 -fomit-frame-pointer
> -funroll-all-loops -finline-functions  line 21 i == v + 1
> FAIL: gcc.dg/guality/pr54693-2.c  -O3 -fomit-frame-pointer
> -funroll-loops  line 21 i == v + 1
> FAIL: gcc.target/i386/pr9771-1.c (internal compiler error)
>

Well, I guess it can happen when you don't use frame-pointer anymore.

> Besides guality failures, there is
>
> [hjl@gnu-6 gcc]$ make check-gcc
> RUNTESTFLAGS="--target_board='unix{-m32\ -march=corei7}'
> i386.exp=pr9771-1.c"
> make[1]: Entering directory `/export/build/gnu/gcc/build-x86_64-linux/gcc'
> test -d plugin || mkdir plugin
> test -d testsuite || mkdir testsuite
> test -d testsuite/gcc || mkdir testsuite/gcc
> (rootme=`${PWDCMD-pwd}`; export rootme; \
> srcdir=`cd /export/gnu/import/git/gcc/gcc; ${PWDCMD-pwd}` ; export srcdir ; \
> cd testsuite/gcc; \
> rm -f tmp-site.exp; \
> sed '/set tmpdir/ s|testsuite$|testsuite/gcc|' \
>      < ../../site.exp > tmp-site.exp; \
> /bin/sh ${srcdir}/../move-if-change tmp-site.exp site.exp; \
> EXPECT=`if [ -f ${rootme}/../expect/expect ] ; then echo
> ${rootme}/../expect/expect ; else echo expect ; fi` ; export EXPECT ;
> \
> if [ -f ${rootme}/../expect/expect ] ; then  \
>     TCL_LIBRARY=`cd .. ; cd ${srcdir}/../tcl/library ; ${PWDCMD-pwd}` ; \
>      export TCL_LIBRARY ; fi ; \
> runtestflags= ; \
> if [ -n "" ] ; then \
>    runtestflags=""; \
> elif [ -n "" ] ; then \
>    parts="`echo '  ' \
>        | sed 's/=[^ ]* / /g'`"; \
>    for part in `find $srcdir/testsuite/gcc* -name \*.exp` ; do \
>      part=`basename $part` ; \
>      case " $parts $runtestflags " in \
>        *" $part "*) ;; \
>        *) runtestflags="$runtestflags $part" ;; \
>      esac ; \
>    done ; \
> fi ; \
> `if [ -f ${srcdir}/../dejagnu/runtest ] ; then echo
> ${srcdir}/../dejagnu/runtest ; else echo runtest; fi` --tool gcc
> --target_board='unix{-m32\ -march=corei7}' i386.exp=pr9771-1.c
> $runtestflags)
> WARNING: Couldn't find the global config file.
> Test Run By hjl on Thu Nov 28 16:50:05 2013
> Native configuration is x86_64-unknown-linux-gnu
>
>          === gcc tests ===
>
> Schedule of variations:
>      unix/-m32 -march=corei7
>
> Running target unix/-m32 -march=corei7
> Using /usr/share/dejagnu/baseboards/unix.exp as board description file
> for target.
> Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
> Using /export/gnu/import/git/gcc/gcc/testsuite/config/default.exp as
> tool-and-target-specific interface file.
> Running /export/gnu/import/git/gcc/gcc/testsuite/gcc.target/i386/i386.exp ...
> FAIL: gcc.target/i386/pr9771-1.c (internal compiler error)
> FAIL: gcc.target/i386/pr9771-1.c (test for excess errors)
>

May be it is wrong to use this test with this tuning as reload also 
fails on this test when -march=corei7 is used.  Or most probably the 
problem is somewhere else.  The patch just triggered it.


>          === gcc Summary ===
>
> # of expected passes        10
> # of unexpected failures    2
> # of unresolved testcases    1
>
> "-m32 -march=corei7" triggers the ICE.
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-11-29 11:10   ` Vladimir Makarov
@ 2013-11-29 16:20     ` H.J. Lu
  2013-11-29 17:02       ` Jan Hubicka
  0 siblings, 1 reply; 20+ messages in thread
From: H.J. Lu @ 2013-11-29 16:20 UTC (permalink / raw)
  To: Vladimir Makarov, Jan Hubicka; +Cc: GCC Patches

On Thu, Nov 28, 2013 at 9:18 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> On 11/28/2013, 7:51 PM, H.J. Lu wrote:
>>
>> On Thu, Nov 28, 2013 at 2:11 PM, Vladimir Makarov <vmakarov@redhat.com>
>> wrote:
>>>
>>>    The following patch fixes PR57293
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293
>>>
>>>    It is actually an implementation of missed LRA functionality in reg
>>> elimination.  Before the patch any explicit change of stack pointer in
>>> RTL resulted in necessity to use the frame pointer.
>>>
>>>    The patch has practically no effect on generic tuning of x86/x86-64.
>>> But it has a dramatic effect on code performance for other tunings
>>> like corei7 which don't use incoming args accumulation.  The maximum
>>> SPEC2000 improvement 2.5% is achieved on x86 SPECInt2000.  But
>>> SPECFP2000 rate also has improvement about 1% on x86 and x86-64.  Too
>>> bad that I did not implement it at the first place.  The results would
>>> have been even much better ones reported on 2012 GNU Cauldron as I
>>> also used -mtune=corei7 that time.
>>>
>>> The patch was bootstrapped and tested on x86-64/x86 and ppc.
>>>
>>> Committed as rev. 205498.
>>>
>>>   2013-11-28  Vladimir Makarov<vmakarov@redhat.com>
>>>
>>>          PR target/57293
>>>          * ira.h (ira_setup_eliminable_regset): Remove parameter.
>>>          * ira.c (ira_setup_eliminable_regset): Ditto.  Add
>>>          SUPPORTS_STACK_ALIGNMENT for crtl->stack_realign_needed.
>>>          Don't call lra_init_elimination.
>>>          (ira): Call ira_setup_eliminable_regset without arguments.
>>>          * loop-invariant.c (calculate_loop_reg_pressure): Remove
>>> argument
>>>          from ira_setup_eliminable_regset call.
>>>          * gcse.c (calculate_bb_reg_pressure): Ditto.
>>>          * haifa-sched.c (sched_init): Ditto.
>>>          * lra.h (lra_init_elimination): Remove the prototype.
>>>          * lra-int.h (lra_insn_recog_data): New member sp_offset.  Move
>>>          used_insn_alternative upper.
>>>          (lra_eliminate_regs_1): Add one more parameter.
>>>          (lra-eliminate): Ditto.
>>>          * lra.c (lra_invalidate_insn_data): Set sp_offset.
>>>          (setup_sp_offset): New.
>>>          (lra_process_new_insns): Call setup_sp_offset.
>>>          (lra): Add argument to lra_eliminate calls.
>>>          * lra-constraints.c (get_equiv_substitution): Rename to
>>> get_equiv.
>>>          (get_equiv_with_elimination): New.
>>>          (process_addr_reg): Call get_equiv_with_elimination instead of
>>>          get_equiv_substitution.
>>>          (equiv_address_substitution): Ditto.
>>>          (loc_equivalence_change_p): Ditto.
>>>          (loc_equivalence_callback, lra_constraints): Ditto.
>>>          (curr_insn_transform): Ditto.  Print the sp offset
>>>          (process_alt_operands): Prevent stack pointer reloads.
>>>          (lra_constraints): Remove one argument from lra_eliminate call.
>>>          Move it up.  Mark used hard regs bfore it.  Use
>>>          get_equiv_with_elimination instead of get_equiv_substitution.
>>>          * lra-eliminations.c (lra_eliminate_regs_1): Add parameter and
>>>          assert for param values combination.  Use sp offset.  Add
>>> argument
>>>          to lra_eliminate_regs_1 calls.
>>>          (lra_eliminate_regs): Add argument to lra_eliminate_regs_1 call.
>>>          (curr_sp_change): New static var.
>>>          (mark_not_eliminable): Add parameter.  Update curr_sp_change.
>>>          Don't prevent elimination to sp if we can calculate its change.
>>>          Pass the argument to mark_not_eliminable calls.
>>>          (eliminate_regs_in_insn): Add a parameter.  Use sp offset.  Add
>>>          argument to lra_eliminate_regs_1 call.
>>>          (update_reg_eliminate): Move calculation of hard regs for spill
>>>          lower.  Switch off lra_in_progress temporarily to generate regs
>>>          involved into elimination.
>>>          (lra_init_elimination): Rename to init_elimination.  Make it
>>>          static.  Set up insn sp offset, check the offsets at the end of
>>>          BBs.
>>>          (process_insn_for_elimination): Add parameter.  Pass its value
>>> to
>>>          eliminate_regs_in_insn.
>>>          (lra_eliminate): : Add parameter.  Pass its value to
>>>          process_insn_for_elimination.  Add assert for param values
>>>          combination.  Call init_elimination.  Don't update offsets in
>>>          equivalence substitutions.
>>>          * lra-spills.c (assign_mem_slot): Don't call
>>> lra_eliminate_regs_1
>>>          for created stack slot.
>>>          (remove_pseudos): Call lra_eliminate_regs_1 before changing
>>> memory
>>>          onto stack slot.
>>>
>>
>> Hi Vladimir,
>>
>> Thanks for your hard work.   I noticed a few regressions
>> on x86-64:
>>

>> FAIL: gcc.target/i386/pr9771-1.c (internal compiler error)
>>
>
> Well, I guess it can happen when you don't use frame-pointer anymore.
>
>> FAIL: gcc.target/i386/pr9771-1.c (internal compiler error)
>> FAIL: gcc.target/i386/pr9771-1.c (test for excess errors)
>>
>
> May be it is wrong to use this test with this tuning as reload also fails on
> this test when -march=corei7 is used.  Or most probably the problem is
> somewhere else.  The patch just triggered it.
>

There is a comment:

   FIXME: the flags is incorrectly enabled for amdfam10, Bulldozer,
   Bobcat and Generic.  This is because disabling it causes large
   regression on mgrid due to IRA limitation leading to unecessary
   use of the frame pointer in 32bit mode.  */
DEF_TUNE (X86_TUNE_ACCUMULATE_OUTGOING_ARGS, "accumulate_outgoing_args",
          m_PPRO | m_P4_NOCONA | m_ATOM | m_SLM | m_AMD_MULTIPLE | m_GENERIC)

Does this mean we can enable it for m_AMD_MULTIPLE and
m_GENERIC?  If this bug is fixed, we should add

-mtune-ctrl=accumulate_outgoing_args

to gcc.target/i386/pr9771-1.c since it may be disabled by
default.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-11-29 16:20     ` H.J. Lu
@ 2013-11-29 17:02       ` Jan Hubicka
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Hubicka @ 2013-11-29 17:02 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Vladimir Makarov, Jan Hubicka, GCC Patches

> There is a comment:
> 
>    FIXME: the flags is incorrectly enabled for amdfam10, Bulldozer,
>    Bobcat and Generic.  This is because disabling it causes large
>    regression on mgrid due to IRA limitation leading to unecessary
>    use of the frame pointer in 32bit mode.  */
> DEF_TUNE (X86_TUNE_ACCUMULATE_OUTGOING_ARGS, "accumulate_outgoing_args",
>           m_PPRO | m_P4_NOCONA | m_ATOM | m_SLM | m_AMD_MULTIPLE | m_GENERIC)
> 
> Does this mean we can enable it for m_AMD_MULTIPLE and
> m_GENERIC?  If this bug is fixed, we should add

I hope we can add it for fam10/buldozer/generic.  m_AMD_MULTIPLE covers also
earier targets.  Of course we need to benchmark it first, I will configure benchmarking
now.

Thanks for looking into this!

Honza
> 
> -mtune-ctrl=accumulate_outgoing_args
> 
> to gcc.target/i386/pr9771-1.c since it may be disabled by
> default.
> 
> Thanks.
> 
> -- 
> H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-11-29  5:19 patch for elimination to SP when it is changed in RTL (PR57293) Vladimir Makarov
  2013-11-29 10:51 ` H.J. Lu
@ 2013-12-01 12:57 ` James Greenhalgh
  2013-12-02 16:32   ` Vladimir Makarov
  2013-12-02 23:44   ` Vladimir Makarov
  1 sibling, 2 replies; 20+ messages in thread
From: James Greenhalgh @ 2013-12-01 12:57 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: GCC Patches

On Thu, Nov 28, 2013 at 10:11:26PM +0000, Vladimir Makarov wrote:
> Committed as rev. 205498.
> 
>   2013-11-28  Vladimir Makarov<vmakarov@redhat.com>
> 
> 	PR target/57293
> 	* ira.h (ira_setup_eliminable_regset): Remove parameter.
> 	* ira.c (ira_setup_eliminable_regset): Ditto.  Add
> 	SUPPORTS_STACK_ALIGNMENT for crtl->stack_realign_needed.
> 	Don't call lra_init_elimination.
> 	(ira): Call ira_setup_eliminable_regset without arguments.
> 	* loop-invariant.c (calculate_loop_reg_pressure): Remove argument
> 	from ira_setup_eliminable_regset call.
> 	* gcse.c (calculate_bb_reg_pressure): Ditto.
> 	* haifa-sched.c (sched_init): Ditto.
> 	* lra.h (lra_init_elimination): Remove the prototype.
> 	* lra-int.h (lra_insn_recog_data): New member sp_offset.  Move
> 	used_insn_alternative upper.
> 	(lra_eliminate_regs_1): Add one more parameter.
> 	(lra-eliminate): Ditto.
> 	* lra.c (lra_invalidate_insn_data): Set sp_offset.
> 	(setup_sp_offset): New.
> 	(lra_process_new_insns): Call setup_sp_offset.
> 	(lra): Add argument to lra_eliminate calls.
> 	* lra-constraints.c (get_equiv_substitution): Rename to get_equiv.
> 	(get_equiv_with_elimination): New.
> 	(process_addr_reg): Call get_equiv_with_elimination instead of
> 	get_equiv_substitution.
> 	(equiv_address_substitution): Ditto.
> 	(loc_equivalence_change_p): Ditto.
> 	(loc_equivalence_callback, lra_constraints): Ditto.
> 	(curr_insn_transform): Ditto.  Print the sp offset
> 	(process_alt_operands): Prevent stack pointer reloads.
> 	(lra_constraints): Remove one argument from lra_eliminate call.
> 	Move it up.  Mark used hard regs bfore it.  Use
> 	get_equiv_with_elimination instead of get_equiv_substitution.
> 	* lra-eliminations.c (lra_eliminate_regs_1): Add parameter and
> 	assert for param values combination.  Use sp offset.  Add argument
> 	to lra_eliminate_regs_1 calls.
> 	(lra_eliminate_regs): Add argument to lra_eliminate_regs_1 call.
> 	(curr_sp_change): New static var.
> 	(mark_not_eliminable): Add parameter.  Update curr_sp_change.
> 	Don't prevent elimination to sp if we can calculate its change.
> 	Pass the argument to mark_not_eliminable calls.
> 	(eliminate_regs_in_insn): Add a parameter.  Use sp offset.  Add
> 	argument to lra_eliminate_regs_1 call.
> 	(update_reg_eliminate): Move calculation of hard regs for spill
> 	lower.  Switch off lra_in_progress temporarily to generate regs
> 	involved into elimination.
> 	(lra_init_elimination): Rename to init_elimination.  Make it
> 	static.  Set up insn sp offset, check the offsets at the end of
> 	BBs.
> 	(process_insn_for_elimination): Add parameter.  Pass its value to
> 	eliminate_regs_in_insn.
> 	(lra_eliminate): : Add parameter.  Pass its value to
> 	process_insn_for_elimination.  Add assert for param values
> 	combination.  Call init_elimination.  Don't update offsets in
> 	equivalence substitutions.
> 	* lra-spills.c (assign_mem_slot): Don't call lra_eliminate_regs_1
> 	for created stack slot.
> 	(remove_pseudos): Call lra_eliminate_regs_1 before changing memory
> 	onto stack slot.
> 
> 2013-11-28  Vladimir Makarov<vmakarov@redhat.com>
> 
> 	PR target/57293
> 	* gcc.target/i386/pr57293.c: New.

Hi Vlad,

This patch seems to cause some problems for AArch64. I see an assert
triggering when building libgloss:

/work/gcc-clean/build-aarch64-none-elf/obj/gcc1/gcc/xgcc -B/work/gcc-clean/build-aarch64-none-elf/obj/gcc1/gcc/ -B/work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/newlib/ -isystem /work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/newlib/targ-include -isystem /work/gcc-clean/src/binutils/newlib/libc/include -B/work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/libgloss/aarch64 -L/work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/libgloss/libnosys -L/work/gcc-clean/src/binutils/libgloss/aarch64 -L/work/gcc-clean/build-aarch64-none-elf/obj/binutils/./ld    -O2 -g -O2 -g -I. -I/work/gcc-clean/src/binutils/libgloss/aarch64/.. -DARM_RDI_MONITOR -o rdimon-_exit.o -c /work/gcc-clean/src/binutils/libgloss/aarch64/_exit.c
/work/gcc-clean/src/binutils/libgloss/aarch64/_exit.c: In function '_exit':
/work/gcc-clean/src/binutils/libgloss/aarch64/_exit.c:41:1: internal compiler error: in update_reg_eliminate, at lra-eliminations.c:1157
 }
 ^
0x84587e update_reg_eliminate
	/work/gcc-clean/src/gcc/gcc/lra-eliminations.c:1157
0x846ca6 lra_eliminate(bool, bool)
	/work/gcc-clean/src/gcc/gcc/lra-eliminations.c:1387
0x84114a lra_constraints(bool)
	/work/gcc-clean/src/gcc/gcc/lra-constraints.c:3887
0x832f3b lra(_IO_FILE*)
	/work/gcc-clean/src/gcc/gcc/lra.c:2331
0x7f3b1e do_reload
	/work/gcc-clean/src/gcc/gcc/ira.c:5451
0x7f3b1e rest_of_handle_reload
	/work/gcc-clean/src/gcc/gcc/ira.c:5580
0x7f3b1e execute
	/work/gcc-clean/src/gcc/gcc/ira.c:5609
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
make[2]: *** [rdimon-_exit.o] Error 1
make[2]: *** Waiting for unfinished jobs....

If you need any more details or logs, let me know.

Thanks,
James

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-12-01 12:57 ` James Greenhalgh
@ 2013-12-02 16:32   ` Vladimir Makarov
  2013-12-02 23:44   ` Vladimir Makarov
  1 sibling, 0 replies; 20+ messages in thread
From: Vladimir Makarov @ 2013-12-02 16:32 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: GCC Patches

On 12/1/2013, 7:57 AM, James Greenhalgh wrote:
> On Thu, Nov 28, 2013 at 10:11:26PM +0000, Vladimir Makarov wrote:
>> Committed as rev. 205498.
>>
>>    2013-11-28  Vladimir Makarov<vmakarov@redhat.com>
>>
>> 	PR target/57293
>> 	* ira.h (ira_setup_eliminable_regset): Remove parameter.
>> 	* ira.c (ira_setup_eliminable_regset): Ditto.  Add
>> 	SUPPORTS_STACK_ALIGNMENT for crtl->stack_realign_needed.
>> 	Don't call lra_init_elimination.
>> 	(ira): Call ira_setup_eliminable_regset without arguments.
>> 	* loop-invariant.c (calculate_loop_reg_pressure): Remove argument
>> 	from ira_setup_eliminable_regset call.
>> 	* gcse.c (calculate_bb_reg_pressure): Ditto.
>> 	* haifa-sched.c (sched_init): Ditto.
>> 	* lra.h (lra_init_elimination): Remove the prototype.
>> 	* lra-int.h (lra_insn_recog_data): New member sp_offset.  Move
>> 	used_insn_alternative upper.
>> 	(lra_eliminate_regs_1): Add one more parameter.
>> 	(lra-eliminate): Ditto.
>> 	* lra.c (lra_invalidate_insn_data): Set sp_offset.
>> 	(setup_sp_offset): New.
>> 	(lra_process_new_insns): Call setup_sp_offset.
>> 	(lra): Add argument to lra_eliminate calls.
>> 	* lra-constraints.c (get_equiv_substitution): Rename to get_equiv.
>> 	(get_equiv_with_elimination): New.
>> 	(process_addr_reg): Call get_equiv_with_elimination instead of
>> 	get_equiv_substitution.
>> 	(equiv_address_substitution): Ditto.
>> 	(loc_equivalence_change_p): Ditto.
>> 	(loc_equivalence_callback, lra_constraints): Ditto.
>> 	(curr_insn_transform): Ditto.  Print the sp offset
>> 	(process_alt_operands): Prevent stack pointer reloads.
>> 	(lra_constraints): Remove one argument from lra_eliminate call.
>> 	Move it up.  Mark used hard regs bfore it.  Use
>> 	get_equiv_with_elimination instead of get_equiv_substitution.
>> 	* lra-eliminations.c (lra_eliminate_regs_1): Add parameter and
>> 	assert for param values combination.  Use sp offset.  Add argument
>> 	to lra_eliminate_regs_1 calls.
>> 	(lra_eliminate_regs): Add argument to lra_eliminate_regs_1 call.
>> 	(curr_sp_change): New static var.
>> 	(mark_not_eliminable): Add parameter.  Update curr_sp_change.
>> 	Don't prevent elimination to sp if we can calculate its change.
>> 	Pass the argument to mark_not_eliminable calls.
>> 	(eliminate_regs_in_insn): Add a parameter.  Use sp offset.  Add
>> 	argument to lra_eliminate_regs_1 call.
>> 	(update_reg_eliminate): Move calculation of hard regs for spill
>> 	lower.  Switch off lra_in_progress temporarily to generate regs
>> 	involved into elimination.
>> 	(lra_init_elimination): Rename to init_elimination.  Make it
>> 	static.  Set up insn sp offset, check the offsets at the end of
>> 	BBs.
>> 	(process_insn_for_elimination): Add parameter.  Pass its value to
>> 	eliminate_regs_in_insn.
>> 	(lra_eliminate): : Add parameter.  Pass its value to
>> 	process_insn_for_elimination.  Add assert for param values
>> 	combination.  Call init_elimination.  Don't update offsets in
>> 	equivalence substitutions.
>> 	* lra-spills.c (assign_mem_slot): Don't call lra_eliminate_regs_1
>> 	for created stack slot.
>> 	(remove_pseudos): Call lra_eliminate_regs_1 before changing memory
>> 	onto stack slot.
>>
>> 2013-11-28  Vladimir Makarov<vmakarov@redhat.com>
>>
>> 	PR target/57293
>> 	* gcc.target/i386/pr57293.c: New.
>
> Hi Vlad,
>
> This patch seems to cause some problems for AArch64. I see an assert
> triggering when building libgloss:
>
> /work/gcc-clean/build-aarch64-none-elf/obj/gcc1/gcc/xgcc -B/work/gcc-clean/build-aarch64-none-elf/obj/gcc1/gcc/ -B/work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/newlib/ -isystem /work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/newlib/targ-include -isystem /work/gcc-clean/src/binutils/newlib/libc/include -B/work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/libgloss/aarch64 -L/work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/libgloss/libnosys -L/work/gcc-clean/src/binutils/libgloss/aarch64 -L/work/gcc-clean/build-aarch64-none-elf/obj/binutils/./ld    -O2 -g -O2 -g -I. -I/work/gcc-clean/src/binutils/libgloss/aarch64/.. -DARM_RDI_MONITOR -o rdimon-_exit.o -c /work/gcc-clean/src/binutils/libgloss/aarch64/_exit.c
> /work/gcc-clean/src/binutils/libgloss/aarch64/_exit.c: In function '_exit':
> /work/gcc-clean/src/binutils/libgloss/aarch64/_exit.c:41:1: internal compiler error: in update_reg_eliminate, at lra-eliminations.c:1157
>   }
>
Thanks, James.  I'll try to reproduce it with the cross-compiler.  I 
expected that the patch might be disruptive.  It is pretty big. 
Therefore I started to work on it (and the related PRs) first. I'll try 
to fix as soon as I can.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-12-01 12:57 ` James Greenhalgh
  2013-12-02 16:32   ` Vladimir Makarov
@ 2013-12-02 23:44   ` Vladimir Makarov
  2013-12-03  5:13     ` Jeff Law
  2013-12-03 11:54     ` Marcus Shawcroft
  1 sibling, 2 replies; 20+ messages in thread
From: Vladimir Makarov @ 2013-12-02 23:44 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: GCC Patches, Richard Earnshaw

On 12/1/2013, 7:57 AM, James Greenhalgh wrote:
> On Thu, Nov 28, 2013 at 10:11:26PM +0000, Vladimir Makarov wrote:
>> Committed as rev. 205498.
>>
>>    2013-11-28  Vladimir Makarov<vmakarov@redhat.com>
>>
>> 	PR target/57293
>> 	* ira.h (ira_setup_eliminable_regset): Remove parameter.
>> 	* ira.c (ira_setup_eliminable_regset): Ditto.  Add
>> 	SUPPORTS_STACK_ALIGNMENT for crtl->stack_realign_needed.
>> 	Don't call lra_init_elimination.
>> 	(ira): Call ira_setup_eliminable_regset without arguments.
>> 	* loop-invariant.c (calculate_loop_reg_pressure): Remove argument
>> 	from ira_setup_eliminable_regset call.
>> 	* gcse.c (calculate_bb_reg_pressure): Ditto.
>> 	* haifa-sched.c (sched_init): Ditto.
>> 	* lra.h (lra_init_elimination): Remove the prototype.
>> 	* lra-int.h (lra_insn_recog_data): New member sp_offset.  Move
>> 	used_insn_alternative upper.
>> 	(lra_eliminate_regs_1): Add one more parameter.
>> 	(lra-eliminate): Ditto.
>> 	* lra.c (lra_invalidate_insn_data): Set sp_offset.
>> 	(setup_sp_offset): New.
>> 	(lra_process_new_insns): Call setup_sp_offset.
>> 	(lra): Add argument to lra_eliminate calls.
>> 	* lra-constraints.c (get_equiv_substitution): Rename to get_equiv.
>> 	(get_equiv_with_elimination): New.
>> 	(process_addr_reg): Call get_equiv_with_elimination instead of
>> 	get_equiv_substitution.
>> 	(equiv_address_substitution): Ditto.
>> 	(loc_equivalence_change_p): Ditto.
>> 	(loc_equivalence_callback, lra_constraints): Ditto.
>> 	(curr_insn_transform): Ditto.  Print the sp offset
>> 	(process_alt_operands): Prevent stack pointer reloads.
>> 	(lra_constraints): Remove one argument from lra_eliminate call.
>> 	Move it up.  Mark used hard regs bfore it.  Use
>> 	get_equiv_with_elimination instead of get_equiv_substitution.
>> 	* lra-eliminations.c (lra_eliminate_regs_1): Add parameter and
>> 	assert for param values combination.  Use sp offset.  Add argument
>> 	to lra_eliminate_regs_1 calls.
>> 	(lra_eliminate_regs): Add argument to lra_eliminate_regs_1 call.
>> 	(curr_sp_change): New static var.
>> 	(mark_not_eliminable): Add parameter.  Update curr_sp_change.
>> 	Don't prevent elimination to sp if we can calculate its change.
>> 	Pass the argument to mark_not_eliminable calls.
>> 	(eliminate_regs_in_insn): Add a parameter.  Use sp offset.  Add
>> 	argument to lra_eliminate_regs_1 call.
>> 	(update_reg_eliminate): Move calculation of hard regs for spill
>> 	lower.  Switch off lra_in_progress temporarily to generate regs
>> 	involved into elimination.
>> 	(lra_init_elimination): Rename to init_elimination.  Make it
>> 	static.  Set up insn sp offset, check the offsets at the end of
>> 	BBs.
>> 	(process_insn_for_elimination): Add parameter.  Pass its value to
>> 	eliminate_regs_in_insn.
>> 	(lra_eliminate): : Add parameter.  Pass its value to
>> 	process_insn_for_elimination.  Add assert for param values
>> 	combination.  Call init_elimination.  Don't update offsets in
>> 	equivalence substitutions.
>> 	* lra-spills.c (assign_mem_slot): Don't call lra_eliminate_regs_1
>> 	for created stack slot.
>> 	(remove_pseudos): Call lra_eliminate_regs_1 before changing memory
>> 	onto stack slot.
>>
>> 2013-11-28  Vladimir Makarov<vmakarov@redhat.com>
>>
>> 	PR target/57293
>> 	* gcc.target/i386/pr57293.c: New.
>
> Hi Vlad,
>
> This patch seems to cause some problems for AArch64. I see an assert
> triggering when building libgloss:
>
> /work/gcc-clean/build-aarch64-none-elf/obj/gcc1/gcc/xgcc -B/work/gcc-clean/build-aarch64-none-elf/obj/gcc1/gcc/ -B/work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/newlib/ -isystem /work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/newlib/targ-include -isystem /work/gcc-clean/src/binutils/newlib/libc/include -B/work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/libgloss/aarch64 -L/work/gcc-clean/build-aarch64-none-elf/obj/binutils/aarch64-none-elf/libgloss/libnosys -L/work/gcc-clean/src/binutils/libgloss/aarch64 -L/work/gcc-clean/build-aarch64-none-elf/obj/binutils/./ld    -O2 -g -O2 -g -I. -I/work/gcc-clean/src/binutils/libgloss/aarch64/.. -DARM_RDI_MONITOR -o rdimon-_exit.o -c /work/gcc-clean/src/binutils/libgloss/aarch64/_exit.c
> /work/gcc-clean/src/binutils/libgloss/aarch64/_exit.c: In function '_exit':
> /work/gcc-clean/src/binutils/libgloss/aarch64/_exit.c:41:1: internal compiler error: in update_reg_eliminate, at lra-eliminations.c:1157
>   }
>   ^
>


   First of all, it is a bad situation for code performance when IRA
decides that it can use frame pointer for allocation, and after that
LRA/reload decides that frame pointer can not be used and spills all
pseudos assigned to FP.  The generated code will be much worse than
one generated if we decided not to use FP from the IRA start.

   Therefore I decided to put an assert for checking the situation.  It
was triggered by your case.  I think the problem in the following
code.

aarch64_can_eliminate (const int from, const int to)
{
   if (frame_pointer_needed)
      ...
   else
     {
       ...
       if (from == FRAME_POINTER_REGNUM && to == STACK_POINTER_REGNUM
	  && df_regs_ever_live_p (LR_REGNUM)
	  && faked_omit_frame_pointer)
	return false;
     }
   return true;
}


and

static bool
aarch64_frame_pointer_required (void)
{
   ...
   if (flag_omit_frame_pointer && !faked_omit_frame_pointer)
     return false;
   else if (flag_omit_leaf_frame_pointer)
     return !crtl->is_leaf;
   return true;
}

   IRA calls hook frame_pointer_required and it returns false.  After
that LRA calls can_eliminate hook and it returns false which means
that fp can not be used for allocation and we should spill all pseudos
assigned to it.

   So the problem can be solved by modifying frame_pointer_required
hook.

   I also don't like

       if (from == FRAME_POINTER_REGNUM && to == STACK_POINTER_REGNUM
           && df_regs_ever_live_p (LR_REGNUM)
           && faked_omit_frame_pointer)
         return false;

   It means we cannot eliminate FP by SP but we still can eliminate AP
by SP, for example.  I think it is wrong.

So the following patch solves the problem and improves generated code.

If somebody with the rights approves, I can commit it tomorrow.

2013-12-02  Vladimir Makarov  <vmakarov@redhat.com>

	* config/aarch64/aarch64.c (aarch64_frame_pointer_required): Check
	LR_REGNUM.
	(aarch64_can_eliminate): Don't check elimination source when
	frame_pointer_requred is false.

Index: ../../gcc/gcc/config/aarch64/aarch64.c
===================================================================
--- ../../gcc/gcc/config/aarch64/aarch64.c	(revision 205590)
+++ ../../gcc/gcc/config/aarch64/aarch64.c	(working copy)
@@ -1703,7 +1703,7 @@ aarch64_frame_pointer_required (void)
    if (flag_omit_frame_pointer && !faked_omit_frame_pointer)
      return false;
    else if (flag_omit_leaf_frame_pointer)
-    return !crtl->is_leaf;
+    return !crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM);
    return true;
  }

@@ -4126,7 +4126,7 @@ aarch64_can_eliminate (const int from, c
  	 of faked_omit_frame_pointer here (which is true when we always
  	 wish to keep non-leaf frame pointers but only wish to keep leaf frame
  	 pointers when LR is clobbered).  */
-      if (from == FRAME_POINTER_REGNUM && to == STACK_POINTER_REGNUM
+      if (to == STACK_POINTER_REGNUM
  	  && df_regs_ever_live_p (LR_REGNUM)
  	  && faked_omit_frame_pointer)
  	return false;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-12-02 23:44   ` Vladimir Makarov
@ 2013-12-03  5:13     ` Jeff Law
  2013-12-03 11:54     ` Marcus Shawcroft
  1 sibling, 0 replies; 20+ messages in thread
From: Jeff Law @ 2013-12-03  5:13 UTC (permalink / raw)
  To: Vladimir Makarov, James Greenhalgh; +Cc: GCC Patches, Richard Earnshaw

On 12/02/13 16:44, Vladimir Makarov wrote:
>
>
>    First of all, it is a bad situation for code performance when IRA
> decides that it can use frame pointer for allocation, and after that
> LRA/reload decides that frame pointer can not be used and spills all
> pseudos assigned to FP.  The generated code will be much worse than
> one generated if we decided not to use FP from the IRA start.
Yup.  Actually, I think we had the same problem with the old 
local/global/reload allocator as well.

I don't recall the specifics, but I think the problem was global thought 
it could eliminate FP, but reload didn't and as a result code generation 
suffered.

I don't recall ever auditing ports to see if they were vulnerable to 
this class of problem.  So there may be others that will might trigger 
the assert.

>
> 2013-12-02  Vladimir Makarov  <vmakarov@redhat.com>
>
>      * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Check
>      LR_REGNUM.
>      (aarch64_can_eliminate): Don't check elimination source when
>      frame_pointer_requred is false.
s/frame_pointer_requred/frame_pointer_required in the ChangeLog entry.

jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-12-02 23:44   ` Vladimir Makarov
  2013-12-03  5:13     ` Jeff Law
@ 2013-12-03 11:54     ` Marcus Shawcroft
  2013-12-03 15:39       ` Vladimir Makarov
  1 sibling, 1 reply; 20+ messages in thread
From: Marcus Shawcroft @ 2013-12-03 11:54 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: James Greenhalgh, GCC Patches, Richard Earnshaw

On 2 December 2013 23:44, Vladimir Makarov <vmakarov@redhat.com> wrote:

> If somebody with the rights approves, I can commit it tomorrow.
>
> 2013-12-02  Vladimir Makarov  <vmakarov@redhat.com>
>
>         * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Check
>         LR_REGNUM.
>         (aarch64_can_eliminate): Don't check elimination source when
>         frame_pointer_requred is false.
>


This is fine with me, go ahead and commit it.  Thanks /Marcus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-12-03 11:54     ` Marcus Shawcroft
@ 2013-12-03 15:39       ` Vladimir Makarov
  2013-12-11 10:35         ` Yvan Roux
  0 siblings, 1 reply; 20+ messages in thread
From: Vladimir Makarov @ 2013-12-03 15:39 UTC (permalink / raw)
  To: Marcus Shawcroft
  Cc: James Greenhalgh, GCC Patches, Richard Earnshaw, Jeff Law

On 12/3/2013, 6:54 AM, Marcus Shawcroft wrote:
> On 2 December 2013 23:44, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>> If somebody with the rights approves, I can commit it tomorrow.
>>
>> 2013-12-02  Vladimir Makarov  <vmakarov@redhat.com>
>>
>>          * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Check
>>          LR_REGNUM.
>>          (aarch64_can_eliminate): Don't check elimination source when
>>          frame_pointer_requred is false.
>>
>
>
> This is fine with me, go ahead and commit it.  Thanks /Marcus
>
Committed as rev. 205637 with changelog fix of a typo found by Jeff.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-12-03 15:39       ` Vladimir Makarov
@ 2013-12-11 10:35         ` Yvan Roux
  2013-12-11 11:08           ` Ramana Radhakrishnan
  2013-12-11 18:25           ` Vladimir Makarov
  0 siblings, 2 replies; 20+ messages in thread
From: Yvan Roux @ 2013-12-11 10:35 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Marcus Shawcroft, James Greenhalgh, GCC Patches,
	Richard Earnshaw, Jeff Law

Hi Vladimir,

I've some regressions on ARM after this SP elimination patch, and they
are execution failures.  Here is the list:

g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
gcc.c-torture/execute/va-arg-22.c  -O2
gcc.dg/atomic/c11-atomic-exec-5.c  -O0
gfortran.dg/direct_io_12.f90  -O[23]
gfortran.dg/elemental_dependency_1.f90  -O2
gfortran.dg/matmul_2.f90  -O2
gfortran.dg/matmul_6.f90  -O2
gfortran.dg/mvbits_7.f90  -O3
gfortran.dg/unlimited_polymorphic_1.f03  -O3

I reduced and looked at var-arg-22.c and the issue is that in
lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
we try to do here is to change the pseudo 195 of the insn 118 below :

(insn 118 114 112 8 (set (reg:DI 195)
        (unspec:DI [
                (mem:DI (plus:SI (reg/f:SI 215)
                        (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
+ 64B]+8 S8 A8])
            ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
     (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
                (const_int 8 [0x8])) [7 a35+8 S8 A32])
        (nil)))

with its equivalent (x arg of lra_eliminate_regs_1):

(mem/c:DI (plus:SI (reg/f:SI 102 sfp)
        (const_int 76 [0x4c])) [7 a35+8 S8 A32])

lra_eliminate_regs_1 is called with full_p = true (it is not really
clear for what it means), but in the PLUS switch case, we have offset
= 0xb (given by ep->offset) and  as lra_get_insn_recog_data
(insn)->sp_offset value is 0, we will indeed add 0xb to the original
0x4c offset.

So, here I don't get if it is the sp_offset value of the
lra_insn_recog_data element which is not well updated or if lra_
eliminate_regs_1 has to be called with update_p and not full_p (which
fixed the value in that particular case).  Is it more obvious for you
?

Thanks
Yvan


On 3 December 2013 16:39, Vladimir Makarov <vmakarov@redhat.com> wrote:
> On 12/3/2013, 6:54 AM, Marcus Shawcroft wrote:
>>
>> On 2 December 2013 23:44, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>
>>> If somebody with the rights approves, I can commit it tomorrow.
>>>
>>> 2013-12-02  Vladimir Makarov  <vmakarov@redhat.com>
>>>
>>>          * config/aarch64/aarch64.c (aarch64_frame_pointer_required):
>>> Check
>>>          LR_REGNUM.
>>>          (aarch64_can_eliminate): Don't check elimination source when
>>>          frame_pointer_requred is false.
>>>
>>
>>
>> This is fine with me, go ahead and commit it.  Thanks /Marcus
>>
> Committed as rev. 205637 with changelog fix of a typo found by Jeff.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-12-11 10:35         ` Yvan Roux
@ 2013-12-11 11:08           ` Ramana Radhakrishnan
  2013-12-11 11:31             ` Yvan Roux
  2013-12-11 18:25           ` Vladimir Makarov
  1 sibling, 1 reply; 20+ messages in thread
From: Ramana Radhakrishnan @ 2013-12-11 11:08 UTC (permalink / raw)
  To: Yvan Roux
  Cc: Vladimir Makarov, Marcus Shawcroft, James Greenhalgh,
	GCC Patches, Richard Earnshaw, Jeff Law

Yvan,

On Wed, Dec 11, 2013 at 10:35 AM, Yvan Roux <yvan.roux@linaro.org> wrote:
> Hi Vladimir,
>
> I've some regressions on ARM after this SP elimination patch, and they
> are execution failures.  Here is the list:

Pragmatically, I think it's time we turned LRA on by default now that
we are in stage3 and that would help with getting more issues out of
the auto-testers quicker than anything else. Given we are now well
into stage3, we should make sure that the LRA support gets as much
testing as it can get in the run-up to the release.

Can you prepare a patch for this please ?

regards
Ramana



>
> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
> gcc.c-torture/execute/va-arg-22.c  -O2
> gcc.dg/atomic/c11-atomic-exec-5.c  -O0
> gfortran.dg/direct_io_12.f90  -O[23]
> gfortran.dg/elemental_dependency_1.f90  -O2
> gfortran.dg/matmul_2.f90  -O2
> gfortran.dg/matmul_6.f90  -O2
> gfortran.dg/mvbits_7.f90  -O3
> gfortran.dg/unlimited_polymorphic_1.f03  -O3
>
> I reduced and looked at var-arg-22.c and the issue is that in
> lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
> we try to do here is to change the pseudo 195 of the insn 118 below :
>
> (insn 118 114 112 8 (set (reg:DI 195)
>         (unspec:DI [
>                 (mem:DI (plus:SI (reg/f:SI 215)
>                         (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
> + 64B]+8 S8 A8])
>             ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
>      (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
>                 (const_int 8 [0x8])) [7 a35+8 S8 A32])
>         (nil)))
>
> with its equivalent (x arg of lra_eliminate_regs_1):
>
> (mem/c:DI (plus:SI (reg/f:SI 102 sfp)
>         (const_int 76 [0x4c])) [7 a35+8 S8 A32])
>
> lra_eliminate_regs_1 is called with full_p = true (it is not really
> clear for what it means), but in the PLUS switch case, we have offset
> = 0xb (given by ep->offset) and  as lra_get_insn_recog_data
> (insn)->sp_offset value is 0, we will indeed add 0xb to the original
> 0x4c offset.
>
> So, here I don't get if it is the sp_offset value of the
> lra_insn_recog_data element which is not well updated or if lra_
> eliminate_regs_1 has to be called with update_p and not full_p (which
> fixed the value in that particular case).  Is it more obvious for you
> ?
>
> Thanks
> Yvan
>
>
> On 3 December 2013 16:39, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> On 12/3/2013, 6:54 AM, Marcus Shawcroft wrote:
>>>
>>> On 2 December 2013 23:44, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>>
>>>> If somebody with the rights approves, I can commit it tomorrow.
>>>>
>>>> 2013-12-02  Vladimir Makarov  <vmakarov@redhat.com>
>>>>
>>>>          * config/aarch64/aarch64.c (aarch64_frame_pointer_required):
>>>> Check
>>>>          LR_REGNUM.
>>>>          (aarch64_can_eliminate): Don't check elimination source when
>>>>          frame_pointer_requred is false.
>>>>
>>>
>>>
>>> This is fine with me, go ahead and commit it.  Thanks /Marcus
>>>
>> Committed as rev. 205637 with changelog fix of a typo found by Jeff.
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-12-11 11:08           ` Ramana Radhakrishnan
@ 2013-12-11 11:31             ` Yvan Roux
  0 siblings, 0 replies; 20+ messages in thread
From: Yvan Roux @ 2013-12-11 11:31 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Vladimir Makarov, Marcus Shawcroft, James Greenhalgh,
	GCC Patches, Richard Earnshaw, Jeff Law

> Pragmatically, I think it's time we turned LRA on by default now that
> we are in stage3 and that would help with getting more issues out of
> the auto-testers quicker than anything else. Given we are now well
> into stage3, we should make sure that the LRA support gets as much
> testing as it can get in the run-up to the release.

I agree.

> Can you prepare a patch for this please ?

I'll post the patch on the list.

Thanks,
Yvan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-12-11 10:35         ` Yvan Roux
  2013-12-11 11:08           ` Ramana Radhakrishnan
@ 2013-12-11 18:25           ` Vladimir Makarov
  2013-12-11 18:59             ` Yvan Roux
  1 sibling, 1 reply; 20+ messages in thread
From: Vladimir Makarov @ 2013-12-11 18:25 UTC (permalink / raw)
  To: Yvan Roux
  Cc: Marcus Shawcroft, James Greenhalgh, GCC Patches,
	Richard Earnshaw, Jeff Law

On 12/11/2013, 5:35 AM, Yvan Roux wrote:
> Hi Vladimir,
>
> I've some regressions on ARM after this SP elimination patch, and they
> are execution failures.  Here is the list:
>
> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
> gcc.c-torture/execute/va-arg-22.c  -O2
> gcc.dg/atomic/c11-atomic-exec-5.c  -O0
> gfortran.dg/direct_io_12.f90  -O[23]
> gfortran.dg/elemental_dependency_1.f90  -O2
> gfortran.dg/matmul_2.f90  -O2
> gfortran.dg/matmul_6.f90  -O2
> gfortran.dg/mvbits_7.f90  -O3
> gfortran.dg/unlimited_polymorphic_1.f03  -O3
>
> I reduced and looked at var-arg-22.c and the issue is that in
> lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
> we try to do here is to change the pseudo 195 of the insn 118 below :
>
> (insn 118 114 112 8 (set (reg:DI 195)
>          (unspec:DI [
>                  (mem:DI (plus:SI (reg/f:SI 215)
>                          (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
> + 64B]+8 S8 A8])
>              ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
>       (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
>                  (const_int 8 [0x8])) [7 a35+8 S8 A32])
>          (nil)))
>
> with its equivalent (x arg of lra_eliminate_regs_1):
>
> (mem/c:DI (plus:SI (reg/f:SI 102 sfp)
>          (const_int 76 [0x4c])) [7 a35+8 S8 A32])
>
> lra_eliminate_regs_1 is called with full_p = true (it is not really
> clear for what it means),

It means we use full offset between the regs, otherwise we use change in 
the full offset from the previous iteration (it can be changed as we 
reserve stack memory for spilled pseudos and the reservation can be done 
several times).  As equiv value is stored as it was before any 
elimination, we need always to use full offset to make elimination.

  but in the PLUS switch case, we have offset
> = 0xb (given by ep->offset) and  as lra_get_insn_recog_data
> (insn)->sp_offset value is 0, we will indeed add 0xb to the original
> 0x4c offset.
>

0 value is suspicious because it is default.  We might have not set up 
it from neighbor insns.


> So, here I don't get if it is the sp_offset value of the
> lra_insn_recog_data element which is not well updated or if lra_
> eliminate_regs_1 has to be called with update_p and not full_p (which
> fixed the value in that particular case).  Is it more obvious for you
> ?
>

Yvan, could you send me the reduced preprocessed case and the options 
for cc1 to reproduce it.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-12-11 18:25           ` Vladimir Makarov
@ 2013-12-11 18:59             ` Yvan Roux
  2013-12-12 19:18               ` Vladimir Makarov
  0 siblings, 1 reply; 20+ messages in thread
From: Yvan Roux @ 2013-12-11 18:59 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Marcus Shawcroft, James Greenhalgh, GCC Patches,
	Richard Earnshaw, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 2978 bytes --]

On 11 December 2013 19:25, Vladimir Makarov <vmakarov@redhat.com> wrote:
> On 12/11/2013, 5:35 AM, Yvan Roux wrote:
>>
>> Hi Vladimir,
>>
>> I've some regressions on ARM after this SP elimination patch, and they
>> are execution failures.  Here is the list:
>>
>> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
>> gcc.c-torture/execute/va-arg-22.c  -O2
>> gcc.dg/atomic/c11-atomic-exec-5.c  -O0
>> gfortran.dg/direct_io_12.f90  -O[23]
>> gfortran.dg/elemental_dependency_1.f90  -O2
>> gfortran.dg/matmul_2.f90  -O2
>> gfortran.dg/matmul_6.f90  -O2
>> gfortran.dg/mvbits_7.f90  -O3
>> gfortran.dg/unlimited_polymorphic_1.f03  -O3
>>
>> I reduced and looked at var-arg-22.c and the issue is that in
>> lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
>> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
>> we try to do here is to change the pseudo 195 of the insn 118 below :
>>
>> (insn 118 114 112 8 (set (reg:DI 195)
>>          (unspec:DI [
>>                  (mem:DI (plus:SI (reg/f:SI 215)
>>                          (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
>> + 64B]+8 S8 A8])
>>              ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
>>       (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
>>                  (const_int 8 [0x8])) [7 a35+8 S8 A32])
>>          (nil)))
>>
>> with its equivalent (x arg of lra_eliminate_regs_1):
>>
>> (mem/c:DI (plus:SI (reg/f:SI 102 sfp)
>>          (const_int 76 [0x4c])) [7 a35+8 S8 A32])
>>
>> lra_eliminate_regs_1 is called with full_p = true (it is not really
>> clear for what it means),
>
>
> It means we use full offset between the regs, otherwise we use change in the
> full offset from the previous iteration (it can be changed as we reserve
> stack memory for spilled pseudos and the reservation can be done several
> times).  As equiv value is stored as it was before any elimination, we need
> always to use full offset to make elimination.

Ok thanks it's clearer.

>  but in the PLUS switch case, we have offset
>>
>> = 0xb (given by ep->offset) and  as lra_get_insn_recog_data
>> (insn)->sp_offset value is 0, we will indeed add 0xb to the original
>> 0x4c offset.
>>
>
> 0 value is suspicious because it is default.  We might have not set up it
> from neighbor insns.
>
>
>
>> So, here I don't get if it is the sp_offset value of the
>> lra_insn_recog_data element which is not well updated or if lra_
>> eliminate_regs_1 has to be called with update_p and not full_p (which
>> fixed the value in that particular case).  Is it more obvious for you
>> ?
>>
>
> Yvan, could you send me the reduced preprocessed case and the options for
> cc1 to reproduce it.


Here is cc1 command line :

cc1 -quiet -march=armv7-a -mtune=cortex-a15 -mfloat-abi=hard
-mfpu=neon -mthumb  v2.c  -O2

I use a native build on a chromebook, but it's reproducible with a
cross compiler.

With the attached test case the issue is when processing insn 118.

Thanks,
Yvan

[-- Attachment #2: v2.c --]
[-- Type: text/x-csrc, Size: 1359 bytes --]

typedef __builtin_va_list __gnuc_va_list;
typedef __gnuc_va_list va_list;

extern void abort (void);
extern void exit (int);


void bar (int n, int c)
{
  static int lastn = -1, lastc = -1;

  if (lastn != n) {
    if (lastc != lastn)
      abort ();
    lastc = 0;
    lastn = n;
  }

  if (c != (char) (lastc ^ (n << 3)))
    abort ();
  lastc++;
}

 typedef struct { char x[31]; } A31;
 typedef struct { char x[32]; } A32;
 typedef struct { char x[35]; } A35;
 typedef struct { char x[72]; } A72;

void foo (int size, ...)
{
 A31 a31;
 A32 a32;
 A35 a35;
 A72 a72;


  va_list ap;

  int i;

  if (size != 21) abort ();

  __builtin_va_start(ap,size);

 a31 = __builtin_va_arg(ap,typeof (a31));
 for (i = 0; i < 31; i++) bar (31, a31.x[i]);
 a32 = __builtin_va_arg(ap,typeof (a32));
 for (i = 0; i < 32; i++) bar (32, a32.x[i]);
 a35 = __builtin_va_arg(ap,typeof (a35));
 for (i = 0; i < 35; i++) bar (35, a35.x[i]);
 a72 = __builtin_va_arg(ap,typeof (a72));
 for (i = 0; i < 72; i++) bar (72, a72.x[i]);

  __builtin_va_end(ap);

}

int main (void)
{
 A31 a31;
 A32 a32;
 A35 a35;
 A72 a72;
 int i;
 for (i = 0; i < 31; i++) a31.x[i] = i ^ (31 << 3);
 for (i = 0; i < 32; i++) a32.x[i] = i ^ (32 << 3);
 for (i = 0; i < 35; i++) a35.x[i] = i ^ (35 << 3);
 for (i = 0; i < 72; i++) a72.x[i] = i ^ (72 << 3);

  foo (21, a31, a32, a35, a72);
  exit (0);

}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-12-11 18:59             ` Yvan Roux
@ 2013-12-12 19:18               ` Vladimir Makarov
  2013-12-13 13:07                 ` Yvan Roux
  0 siblings, 1 reply; 20+ messages in thread
From: Vladimir Makarov @ 2013-12-12 19:18 UTC (permalink / raw)
  To: Yvan Roux
  Cc: Marcus Shawcroft, James Greenhalgh, GCC Patches,
	Richard Earnshaw, Jeff Law

On 12/11/2013, 1:59 PM, Yvan Roux wrote:
> On 11 December 2013 19:25, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> On 12/11/2013, 5:35 AM, Yvan Roux wrote:
>>>
>>> Hi Vladimir,
>>>
>>> I've some regressions on ARM after this SP elimination patch, and they
>>> are execution failures.  Here is the list:
>>>
>>> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
>>> gcc.c-torture/execute/va-arg-22.c  -O2
>>> gcc.dg/atomic/c11-atomic-exec-5.c  -O0
>>> gfortran.dg/direct_io_12.f90  -O[23]
>>> gfortran.dg/elemental_dependency_1.f90  -O2
>>> gfortran.dg/matmul_2.f90  -O2
>>> gfortran.dg/matmul_6.f90  -O2
>>> gfortran.dg/mvbits_7.f90  -O3
>>> gfortran.dg/unlimited_polymorphic_1.f03  -O3
>>>
>>> I reduced and looked at var-arg-22.c and the issue is that in
>>> lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
>>> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
>>> we try to do here is to change the pseudo 195 of the insn 118 below :
>>>
>>> (insn 118 114 112 8 (set (reg:DI 195)
>>>           (unspec:DI [
>>>                   (mem:DI (plus:SI (reg/f:SI 215)
>>>                           (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
>>> + 64B]+8 S8 A8])
>>>               ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
>>>        (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
>>>                   (const_int 8 [0x8])) [7 a35+8 S8 A32])
>>>           (nil)))
>>>
>>> with its equivalent (x arg of lra_eliminate_regs_1):
>>>
>>> (mem/c:DI (plus:SI (reg/f:SI 102 sfp)
>>>           (const_int 76 [0x4c])) [7 a35+8 S8 A32])
>>>
>>> lra_eliminate_regs_1 is called with full_p = true (it is not really
>>> clear for what it means),
>>
>>
>> It means we use full offset between the regs, otherwise we use change in the
>> full offset from the previous iteration (it can be changed as we reserve
>> stack memory for spilled pseudos and the reservation can be done several
>> times).  As equiv value is stored as it was before any elimination, we need
>> always to use full offset to make elimination.
>
> Ok thanks it's clearer.
>
>>   but in the PLUS switch case, we have offset
>>>
>>> = 0xb (given by ep->offset) and  as lra_get_insn_recog_data
>>> (insn)->sp_offset value is 0, we will indeed add 0xb to the original
>>> 0x4c offset.
>>>
>>
>> 0 value is suspicious because it is default.  We might have not set up it
>> from neighbor insns.
>>
>>
>>
>>> So, here I don't get if it is the sp_offset value of the
>>> lra_insn_recog_data element which is not well updated or if lra_
>>> eliminate_regs_1 has to be called with update_p and not full_p (which
>>> fixed the value in that particular case).  Is it more obvious for you
>>> ?
>>>
>>
>> Yvan, could you send me the reduced preprocessed case and the options for
>> cc1 to reproduce it.
>
>
> Here is cc1 command line :
>
> cc1 -quiet -march=armv7-a -mtune=cortex-a15 -mfloat-abi=hard
> -mfpu=neon -mthumb  v2.c  -O2
>
> I use a native build on a chromebook, but it's reproducible with a
> cross compiler.
>
> With the attached test case the issue is when processing insn 118.

The offset is updated two times and that is wrong.  That is because 
memory in init insn is shared by ira_reg_equiv and the test involves 2 
equivalent substitutions.  As I wrote equiv should be stored in original 
form by the current patch design.  Simple copying will not work as the 
first substitution is not done in this case.

I need some time to think how to fix it better still I'll try to fix it 
tomorrow.  I expected that the patch might have some problems.  The 
patch code is quite big although it is just a long standing PR fix. 
Therefore that was my first PR fixed on stage 3.  It is good to have it 
tested earlier and sorry to break some arm tests.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-12-12 19:18               ` Vladimir Makarov
@ 2013-12-13 13:07                 ` Yvan Roux
  2013-12-16 23:03                   ` Vladimir Makarov
  0 siblings, 1 reply; 20+ messages in thread
From: Yvan Roux @ 2013-12-13 13:07 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Marcus Shawcroft, James Greenhalgh, GCC Patches,
	Richard Earnshaw, Jeff Law, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 4379 bytes --]

Thanks for your help Vlad.  Another bad news about this PR fix, is
that it has resurrected the thumb_movhi_clobber bug (PR 58785) but in
a different manner as the original failing testcase still pass.  I
attached a testcase to be compiled with :

cc1 -mthumb -mcpu=cortex-m0 -O2 m.c

And Thumb bootstrap seems to be broken with an ICE in check_rtl, I'm
checking if it is the same issue.

Yvan


On 12 December 2013 20:18, Vladimir Makarov <vmakarov@redhat.com> wrote:
> On 12/11/2013, 1:59 PM, Yvan Roux wrote:
>>
>> On 11 December 2013 19:25, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>>
>>> On 12/11/2013, 5:35 AM, Yvan Roux wrote:
>>>>
>>>>
>>>> Hi Vladimir,
>>>>
>>>> I've some regressions on ARM after this SP elimination patch, and they
>>>> are execution failures.  Here is the list:
>>>>
>>>> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
>>>> gcc.c-torture/execute/va-arg-22.c  -O2
>>>> gcc.dg/atomic/c11-atomic-exec-5.c  -O0
>>>> gfortran.dg/direct_io_12.f90  -O[23]
>>>> gfortran.dg/elemental_dependency_1.f90  -O2
>>>> gfortran.dg/matmul_2.f90  -O2
>>>> gfortran.dg/matmul_6.f90  -O2
>>>> gfortran.dg/mvbits_7.f90  -O3
>>>> gfortran.dg/unlimited_polymorphic_1.f03  -O3
>>>>
>>>> I reduced and looked at var-arg-22.c and the issue is that in
>>>> lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
>>>> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
>>>> we try to do here is to change the pseudo 195 of the insn 118 below :
>>>>
>>>> (insn 118 114 112 8 (set (reg:DI 195)
>>>>           (unspec:DI [
>>>>                   (mem:DI (plus:SI (reg/f:SI 215)
>>>>                           (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
>>>> + 64B]+8 S8 A8])
>>>>               ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
>>>>        (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
>>>>                   (const_int 8 [0x8])) [7 a35+8 S8 A32])
>>>>           (nil)))
>>>>
>>>> with its equivalent (x arg of lra_eliminate_regs_1):
>>>>
>>>> (mem/c:DI (plus:SI (reg/f:SI 102 sfp)
>>>>           (const_int 76 [0x4c])) [7 a35+8 S8 A32])
>>>>
>>>> lra_eliminate_regs_1 is called with full_p = true (it is not really
>>>> clear for what it means),
>>>
>>>
>>>
>>> It means we use full offset between the regs, otherwise we use change in
>>> the
>>> full offset from the previous iteration (it can be changed as we reserve
>>> stack memory for spilled pseudos and the reservation can be done several
>>> times).  As equiv value is stored as it was before any elimination, we
>>> need
>>> always to use full offset to make elimination.
>>
>>
>> Ok thanks it's clearer.
>>
>>>   but in the PLUS switch case, we have offset
>>>>
>>>>
>>>> = 0xb (given by ep->offset) and  as lra_get_insn_recog_data
>>>> (insn)->sp_offset value is 0, we will indeed add 0xb to the original
>>>> 0x4c offset.
>>>>
>>>
>>> 0 value is suspicious because it is default.  We might have not set up it
>>> from neighbor insns.
>>>
>>>
>>>
>>>> So, here I don't get if it is the sp_offset value of the
>>>> lra_insn_recog_data element which is not well updated or if lra_
>>>> eliminate_regs_1 has to be called with update_p and not full_p (which
>>>> fixed the value in that particular case).  Is it more obvious for you
>>>> ?
>>>>
>>>
>>> Yvan, could you send me the reduced preprocessed case and the options for
>>> cc1 to reproduce it.
>>
>>
>>
>> Here is cc1 command line :
>>
>> cc1 -quiet -march=armv7-a -mtune=cortex-a15 -mfloat-abi=hard
>> -mfpu=neon -mthumb  v2.c  -O2
>>
>> I use a native build on a chromebook, but it's reproducible with a
>> cross compiler.
>>
>> With the attached test case the issue is when processing insn 118.
>
>
> The offset is updated two times and that is wrong.  That is because memory
> in init insn is shared by ira_reg_equiv and the test involves 2 equivalent
> substitutions.  As I wrote equiv should be stored in original form by the
> current patch design.  Simple copying will not work as the first
> substitution is not done in this case.
>
> I need some time to think how to fix it better still I'll try to fix it
> tomorrow.  I expected that the patch might have some problems.  The patch
> code is quite big although it is just a long standing PR fix. Therefore that
> was my first PR fixed on stage 3.  It is good to have it tested earlier and
> sorry to break some arm tests.
>
>

[-- Attachment #2: m.c --]
[-- Type: text/x-csrc, Size: 2169 bytes --]

void free (void *) ;

extern int *__errno (void);

typedef unsigned int size_t;


typedef unsigned int __uint32_t;
typedef unsigned short __uint16_t;

typedef struct _bufhead BUFHEAD;

struct _bufhead {
 BUFHEAD *prev;
 BUFHEAD *next;
 BUFHEAD *ovfl;
 __uint32_t addr;
 char *page;
 char flags;
};

typedef BUFHEAD **SEGMENT;

typedef struct hashhdr {
 int magic;
 int version;
 __uint32_t lorder;
 int bsize;
 int bshift;
 int dsize;
 int ssize;
 int sshift;
 int ovfl_point;

 int last_freed;
 int max_bucket;
 int high_mask;
 int low_mask;

 int ffactor;
 int nkeys;
 int hdrpages;
 int h_charkey;


 int spares[32];
 __uint16_t bitmaps[32];

} HASHHDR;

typedef struct htab {
 HASHHDR hdr;
 int nsegs;
 int exsegs;

 __uint32_t
     (*hash)(const void *, size_t);
 int flags;
 int fp;
 char *tmp_buf;
 char *tmp_key;
 BUFHEAD *cpage;
 int cbucket;
 int cndx;
 int error;

 int new_file;

 int save_file;


 __uint32_t *mapp[32];
 int nmaps;
 int nbufs;

 BUFHEAD bufhead;
 SEGMENT *dir;
} HTAB;


extern int
collect_data(hashp, bufp, len, set)
 HTAB *hashp;
 BUFHEAD *bufp;
 int len, set;
{
 __uint16_t *bp;
 char *p;
 BUFHEAD *xbp;
 __uint16_t save_addr;
 int mylen, totlen;

 p = bufp->page;
 bp = (__uint16_t *)p;
 mylen = hashp->hdr.bsize - bp[1];
 save_addr = bufp->addr;

 if (bp[2] == 3) {
  totlen = len + mylen;
  if (hashp->tmp_buf)
   free(hashp->tmp_buf);
  if ((hashp->tmp_buf = (char *)malloc(totlen)) == ((void *)0))
   return (-1);
  if (set) {
   hashp->cndx = 1;
   if (bp[0] == 2) {
    hashp->cpage = ((void *)0);
    hashp->cbucket++;
   } else {
    hashp->cpage =
        __get_buf(hashp, bp[bp[0] - 1], bufp, 0);
    if (!hashp->cpage)
     return (-1);
    else if (!((__uint16_t *)hashp->cpage->page)[0]) {
     hashp->cbucket++;
     hashp->cpage = ((void *)0);
    }
   }
  } 
 } else {
  xbp = __get_buf(hashp, bp[bp[0] - 1], bufp, 0);
  if (!xbp || ((totlen =
      collect_data(hashp, xbp, len + mylen, set)) < 1))
   return (-1);
 }
 if (bufp->addr != save_addr) {
  return (-1);
 }
 return (totlen);
}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-12-13 13:07                 ` Yvan Roux
@ 2013-12-16 23:03                   ` Vladimir Makarov
  2013-12-17 14:01                     ` Yvan Roux
  0 siblings, 1 reply; 20+ messages in thread
From: Vladimir Makarov @ 2013-12-16 23:03 UTC (permalink / raw)
  To: Yvan Roux
  Cc: Marcus Shawcroft, James Greenhalgh, GCC Patches,
	Richard Earnshaw, Jeff Law, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 1075 bytes --]

On 12/13/2013, 8:07 AM, Yvan Roux wrote:
> Thanks for your help Vlad.  Another bad news about this PR fix, is
> that it has resurrected the thumb_movhi_clobber bug (PR 58785) but in
> a different manner as the original failing testcase still pass.  I
> attached a testcase to be compiled with :
>
> cc1 -mthumb -mcpu=cortex-m0 -O2 m.c
>
> And Thumb bootstrap seems to be broken with an ICE in check_rtl, I'm
> checking if it is the same issue.
>

The compiler crashes because a reload pattern is trying to take address 
of memory which is actually a spilled pseudo for LRA.  The pattern is 
designed for reload which always uses memory not a spilled pseudo as LRA 
does.

But we don't need to adjust the pattern for LRA.  LRA can manage by 
itself without reload patterns.

I found that I missed to switch off these patterns for LRA fully.  The 
following patch solves the problem.  The same was done for 
THUMB_SECONDARY_INPUT_RELOAD_CLASS long ago.

Yvan, could go from this patch by yourself.  I mean testing and getting 
its approval from an ARM maintainer.  Thanks.





[-- Attachment #2: z --]
[-- Type: text/plain, Size: 918 bytes --]

Index: config/arm/arm.h
===================================================================
--- config/arm/arm.h	(revision 206023)
+++ config/arm/arm.h	(working copy)
@@ -1285,11 +1285,12 @@ enum reg_class
       : NO_REGS))
 
 #define THUMB_SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X)		\
-  ((CLASS) != LO_REGS && (CLASS) != BASE_REGS				\
-   ? ((true_regnum (X) == -1 ? LO_REGS					\
-       : (true_regnum (X) + HARD_REGNO_NREGS (0, MODE) > 8) ? LO_REGS	\
-       : NO_REGS)) 							\
-   : NO_REGS)
+  (lra_in_progress ? NO_REGS						\
+   : (CLASS) != LO_REGS && (CLASS) != BASE_REGS				\
+      ? ((true_regnum (X) == -1 ? LO_REGS				\
+         : (true_regnum (X) + HARD_REGNO_NREGS (0, MODE) > 8) ? LO_REGS	\
+         : NO_REGS)) 							\
+      : NO_REGS)
 
 /* Return the register class of a scratch register needed to copy IN into
    or out of a register in CLASS in MODE.  If it can be done directly,

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: patch for elimination to SP when it is changed in RTL (PR57293)
  2013-12-16 23:03                   ` Vladimir Makarov
@ 2013-12-17 14:01                     ` Yvan Roux
  0 siblings, 0 replies; 20+ messages in thread
From: Yvan Roux @ 2013-12-17 14:01 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Marcus Shawcroft, James Greenhalgh, GCC Patches,
	Richard Earnshaw, Jeff Law, Ramana Radhakrishnan

On 17 December 2013 00:03, Vladimir Makarov <vmakarov@redhat.com> wrote:
> On 12/13/2013, 8:07 AM, Yvan Roux wrote:
>>
>> Thanks for your help Vlad.  Another bad news about this PR fix, is
>> that it has resurrected the thumb_movhi_clobber bug (PR 58785) but in
>> a different manner as the original failing testcase still pass.  I
>> attached a testcase to be compiled with :
>>
>> cc1 -mthumb -mcpu=cortex-m0 -O2 m.c
>>
>> And Thumb bootstrap seems to be broken with an ICE in check_rtl, I'm
>> checking if it is the same issue.
>>
>
> The compiler crashes because a reload pattern is trying to take address of
> memory which is actually a spilled pseudo for LRA.  The pattern is designed
> for reload which always uses memory not a spilled pseudo as LRA does.
>
> But we don't need to adjust the pattern for LRA.  LRA can manage by itself
> without reload patterns.
>
> I found that I missed to switch off these patterns for LRA fully.  The
> following patch solves the problem.  The same was done for
> THUMB_SECONDARY_INPUT_RELOAD_CLASS long ago.
>
> Yvan, could go from this patch by yourself.  I mean testing and getting its
> approval from an ARM maintainer.  Thanks.


I remember having tested that very same patch when we changed
THUMB_SECONDARY_INPUT_RELOAD_CLASS and having build issues, but with
the fixes made since the summer, the build and the testsuite are now
ok.  Before submitting this patch I wanted to check if  it is not more
general fix which is needed, and modifying
SECONDARY_OUTPUT_RELOAD_CLASS and SECONDARY_INPUT_RELOAD_CLASS instead
of the THUMB macros, because this is here that the target IWMMXT is
handled and that we have the lra loop issue during the constraint
solving for that target.  First results shows that it fixes also some
Thumb1 regressions, but I don't have the full results for the moment.

Thanks
Yvan

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-12-17 14:01 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-29  5:19 patch for elimination to SP when it is changed in RTL (PR57293) Vladimir Makarov
2013-11-29 10:51 ` H.J. Lu
2013-11-29 11:10   ` Vladimir Makarov
2013-11-29 16:20     ` H.J. Lu
2013-11-29 17:02       ` Jan Hubicka
2013-12-01 12:57 ` James Greenhalgh
2013-12-02 16:32   ` Vladimir Makarov
2013-12-02 23:44   ` Vladimir Makarov
2013-12-03  5:13     ` Jeff Law
2013-12-03 11:54     ` Marcus Shawcroft
2013-12-03 15:39       ` Vladimir Makarov
2013-12-11 10:35         ` Yvan Roux
2013-12-11 11:08           ` Ramana Radhakrishnan
2013-12-11 11:31             ` Yvan Roux
2013-12-11 18:25           ` Vladimir Makarov
2013-12-11 18:59             ` Yvan Roux
2013-12-12 19:18               ` Vladimir Makarov
2013-12-13 13:07                 ` Yvan Roux
2013-12-16 23:03                   ` Vladimir Makarov
2013-12-17 14:01                     ` Yvan Roux

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).