public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* RFA: enable LRA for rs6000
@ 2013-04-11 20:09 Vladimir Makarov
  2013-04-11 20:56 ` David Edelsohn
  2013-04-16  9:13 ` Michael Meissner
  0 siblings, 2 replies; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-11 20:09 UTC (permalink / raw)
  To: gcc-patches, David Edelsohn, Michael Meissner, Bergner, Peter

[-- Attachment #1: Type: text/plain, Size: 2975 bytes --]

   Here is a patch to enable LRA for rs6000.  The patch includes code 
changes in rs6000 machine-dependent parts and in LRA parts.

   I've added a new switch -mlra for rs6000 to make debugging LRA for 
rs6000 easier but not documented it as it will be gone at the end of 
stage1 (may be with ability to use LRA for rs6000 if the results are 
unsatisfactory).  I have only one question about its default value.  
Currently by default LRA is used but if you prefer opposite I can 
reverse it. Please just let me know your opinion.

   The patch was successfully bootstrapped and tested on pppc64 and 
x86/x86-64.

   Are rs6000 parts ok for trunk?

2013-04-11  Vladimir Makarov  <vmakarov@redhat.com>

         * rtl.h (struct rtx_def): Add comment for field jump.
         (LRA_SUBREG_P): New macro.
         * recog.c (register_operand): Check LRA_SUBREG_P.
         * lra-constraints.c (match_reload, simplify_operand_subreg): Use
         LRA_SUBREG_P.
         (emit_spill_move): Set up LRA_SUBREG_P.
         * lra.c (lra): Add note at the end of RTL code. Align non-empty
         stack frame.
         * lra-spills.c (lra_spill): Align stack after spilling pseudos.
         (lra_final_code_change): Skip subreg change for operators.
         * lra-eliminations.c (eliminate_regs_in_insn): Make return earlier
         if there are no operand changes.
         * lra-constraints.c (curr_insn_set): New.
         (match_reload): Set LRA_SUBREG_P.
         (emit_spill_move): Ditto.
         (check_and_process_move): Use curr_insn_set. Process only single
         set insns.  Don't initialize sec_mem_p and change_p.
         (simplify_operand_subreg): Use LRA_SUBREG_P.
         (reg_in_class_p): New function.
         (process_alt_operands): Use it.  Use #if HAVE_ATTR_enabled instead
         of #ifdef.  Add code to remove cycling.
         (process_address): Check EXTRA_CONSTRAINT_STR. Process even if
         non-null disp.  Reload inner instead of disp when base and index
         are null.
         (EBB_PROBABILITY_CUTOFF): Redefine probability in percents.
         (curr_insn_transform): Initialize sec_mem_p and change_p.  Set up
         curr_insn_set.  Call check_and_process_move only for single set
         insns.
         * config/rs6000/rs6000.opt (mlra): New option.
         * config/rs6000/rs6000.h (SECONDARY_MEMORY_NEEDED_MODE): New macro.
         * config/rs6000/rs6000-protos.h
         (rs6000_secondary_memory_needed_mode): New prototype.
         * config/rs6000/rs6000.c: Include ira.h.
         (TARGET_LRA_P): Redefine.
         (legitimate_lo_sum_address_p): Permit modes bigger word for LRA.
         (rs6000_emit_move): Add movsd generation code for LRA.
         (rs6000_secondary_memory_needed_mode): New function.
         (rs6000_lra_p): Ditto.
         (rs6000_alloc_sdmode_stack_slot): Ignore code for LRA.
         (rs6000_secondary_reload_class): Return NO_REGS for LRA in case
         constants, memory, or fp regs.



[-- Attachment #2: lra-ppc.patch --]
[-- Type: text/x-patch, Size: 24584 bytes --]

Index: config/rs6000/rs6000-protos.h
===================================================================
--- config/rs6000/rs6000-protos.h	(revision 197640)
+++ config/rs6000/rs6000-protos.h	(working copy)
@@ -118,6 +118,8 @@ extern rtx create_TOC_reference (rtx, rt
 extern void rs6000_split_multireg_move (rtx, rtx);
 extern void rs6000_emit_move (rtx, rtx, enum machine_mode);
 extern rtx rs6000_secondary_memory_needed_rtx (enum machine_mode);
+extern enum machine_mode rs6000_secondary_memory_needed_mode (enum
+							      machine_mode);
 extern rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode,
 						    int, int, int, int *);
 extern bool rs6000_legitimate_offset_address_p (enum machine_mode, rtx,
Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c	(revision 197640)
+++ config/rs6000/rs6000.c	(working copy)
@@ -56,6 +56,7 @@
 #include "intl.h"
 #include "params.h"
 #include "tm-constrs.h"
+#include "ira.h"
 #include "opts.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1425,6 +1426,9 @@ static const struct attribute_spec rs600
 #undef TARGET_MODE_DEPENDENT_ADDRESS_P
 #define TARGET_MODE_DEPENDENT_ADDRESS_P rs6000_mode_dependent_address_p
 
+#undef TARGET_LRA_P
+#define TARGET_LRA_P rs6000_lra_p
+
 #undef TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE rs6000_can_eliminate
 
@@ -5561,7 +5565,7 @@ rs6000_legitimate_offset_address_p (enum
     return false;
   if (!reg_offset_addressing_ok_p (mode))
     return virtual_stack_registers_memory_p (x);
-  if (legitimate_constant_pool_address_p (x, mode, strict))
+  if (legitimate_constant_pool_address_p (x, mode, strict || lra_in_progress))
     return true;
   if (GET_CODE (XEXP (x, 1)) != CONST_INT)
     return false;
@@ -5701,19 +5705,23 @@ legitimate_lo_sum_address_p (enum machin
 
   if (TARGET_ELF || TARGET_MACHO)
     {
+      bool toc_ok_p;
+
       if (DEFAULT_ABI != ABI_AIX && DEFAULT_ABI != ABI_DARWIN && flag_pic)
 	return false;
-      if (TARGET_TOC)
+      toc_ok_p = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
+		  && small_toc_ref (x, VOIDmode));
+      if (TARGET_TOC && ! toc_ok_p)
 	return false;
       if (GET_MODE_NUNITS (mode) != 1)
 	return false;
-      if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
+      if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
 	  && !(/* ??? Assume floating point reg based on mode?  */
 	       TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
 	       && (mode == DFmode || mode == DDmode)))
 	return false;
 
-      return CONSTANT_P (x);
+      return CONSTANT_P (x) || toc_ok_p;
     }
 
   return false;
@@ -6711,7 +6719,8 @@ rs6000_legitimate_address_p (enum machin
   if (reg_offset_p && legitimate_small_data_p (mode, x))
     return 1;
   if (reg_offset_p
-      && legitimate_constant_pool_address_p (x, mode, reg_ok_strict))
+      && legitimate_constant_pool_address_p (x, mode,
+					     reg_ok_strict || lra_in_progress))
     return 1;
   /* If not REG_OK_STRICT (before reload) let pass any stack offset.  */
   if (! reg_ok_strict
@@ -7000,6 +7009,7 @@ rs6000_conditional_register_usage (void)
 	  fixed_regs[i] = call_used_regs[i] = call_really_used_regs[i] = 1;
     }
 }
+
 \f
 /* Try to output insns to set TARGET equal to the constant C if it can
    be done in less than N insns.  Do all computations in MODE.
@@ -7331,6 +7341,68 @@ rs6000_emit_move (rtx dest, rtx source,
     cfun->machine->sdmode_stack_slot =
       eliminate_regs (cfun->machine->sdmode_stack_slot, VOIDmode, NULL_RTX);
 
+
+  if (lra_in_progress
+      && mode == SDmode
+      && REG_P (operands[0]) && REGNO (operands[0]) >= FIRST_PSEUDO_REGISTER
+      && reg_preferred_class (REGNO (operands[0])) == NO_REGS
+      && (REG_P (operands[1])
+	  || (GET_CODE (operands[1]) == SUBREG
+	      && REG_P (SUBREG_REG (operands[1])))))
+    {
+      int regno = REGNO (GET_CODE (operands[1]) == SUBREG
+			 ? SUBREG_REG (operands[1]) : operands[1]);
+      enum reg_class cl;
+
+      if (regno >= FIRST_PSEUDO_REGISTER)
+	{
+	  cl = reg_preferred_class (regno);
+	  gcc_assert (cl != NO_REGS);
+	  regno = ira_class_hard_regs[cl][0];
+	}
+      if (FP_REGNO_P (regno))
+	{
+	  if (GET_MODE (operands[0]) != DDmode)
+	    operands[0] = gen_rtx_SUBREG (DDmode, operands[0], 0);
+	  emit_insn (gen_movsd_store (operands[0], operands[1]));
+	}
+      else if (INT_REGNO_P (regno))
+	emit_insn (gen_movsd_hardfloat (operands[0], operands[1]));
+      else
+	gcc_unreachable();
+      return;
+    }
+  if (lra_in_progress
+      && mode == SDmode
+      && (REG_P (operands[0])
+	  || (GET_CODE (operands[0]) == SUBREG
+	      && REG_P (SUBREG_REG (operands[0]))))
+      && REG_P (operands[1]) && REGNO (operands[1]) >= FIRST_PSEUDO_REGISTER
+      && reg_preferred_class (REGNO (operands[1])) == NO_REGS)
+    {
+      int regno = REGNO (GET_CODE (operands[0]) == SUBREG
+			 ? SUBREG_REG (operands[0]) : operands[0]);
+      enum reg_class cl;
+
+      if (regno >= FIRST_PSEUDO_REGISTER)
+	{
+	  cl = reg_preferred_class (regno);
+	  gcc_assert (cl != NO_REGS);
+	  regno = ira_class_hard_regs[cl][0];
+	}
+      if (FP_REGNO_P (regno))
+	{
+	  if (GET_MODE (operands[1]) != DDmode)
+	    operands[1] = gen_rtx_SUBREG (DDmode, operands[1], 0);
+	  emit_insn (gen_movsd_load (operands[0], operands[1]));
+	}
+      else if (INT_REGNO_P (regno))
+	emit_insn (gen_movsd_hardfloat (operands[0], operands[1]));
+      else
+	gcc_unreachable();
+      return;
+    }
+
   if (reload_in_progress
       && mode == SDmode
       && cfun->machine->sdmode_stack_slot != NULL_RTX
@@ -13848,6 +13920,17 @@ rs6000_secondary_memory_needed_rtx (enum
   return ret;
 }
 
+/* Return the mode to be used for memory when a secondary memory
+   location is needed.  For SDmode values we need to use DDmode, in
+   all other cases we can use the same mode.  */
+enum machine_mode
+rs6000_secondary_memory_needed_mode (enum machine_mode mode)
+{
+  if (mode == SDmode)
+    return DDmode;
+  return mode;
+}
+
 static tree
 rs6000_check_sdmode (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
 {
@@ -14511,6 +14594,10 @@ rs6000_alloc_sdmode_stack_slot (void)
   gimple_stmt_iterator gsi;
 
   gcc_assert (cfun->machine->sdmode_stack_slot == NULL_RTX);
+  /* We use a different approach for dealing with the secondary
+     memmory in LRA.  */
+  if (ira_use_lra_p)
+    return;
 
   if (TARGET_NO_SDMODE_STACK)
     return;
@@ -14747,7 +14834,7 @@ rs6000_secondary_reload_class (enum reg_
   /* Constants, memory, and FP registers can go into FP registers.  */
   if ((regno == -1 || FP_REGNO_P (regno))
       && (rclass == FLOAT_REGS || rclass == NON_SPECIAL_REGS))
-    return (mode != SDmode) ? NO_REGS : GENERAL_REGS;
+    return (mode != SDmode || lra_in_progress) ? NO_REGS : GENERAL_REGS;
 
   /* Memory, and FP/altivec registers can go into fp/altivec registers under
      VSX.  However, for scalar variables, use the traditional floating point
@@ -27666,6 +27753,13 @@ rs6000_libcall_value (enum machine_mode
 }
 
 
+/* Return true if we use LRA instead of reload pass.  */
+static bool
+rs6000_lra_p (void)
+{
+  return rs6000_lra_flag;
+}
+
 /* Given FROM and TO register numbers, say whether this elimination is allowed.
    Frame pointer elimination is automatically handled.
 
Index: config/rs6000/rs6000.h
===================================================================
--- config/rs6000/rs6000.h	(revision 197640)
+++ config/rs6000/rs6000.h	(working copy)
@@ -1391,6 +1391,13 @@ extern enum reg_class rs6000_constraints
 #define SECONDARY_MEMORY_NEEDED_RTX(MODE) \
   rs6000_secondary_memory_needed_rtx (MODE)
 
+/* Specify the mode to be used for memory when a secondary memory
+   location is needed.  For cpus that cannot load/store SDmode values
+   from the 64-bit FP registers without using a full 64-bit
+   load/store, we need a wider mode.  */
+#define SECONDARY_MEMORY_NEEDED_MODE(MODE)		\
+  rs6000_secondary_memory_needed_mode (MODE)
+
 /* Return the maximum number of consecutive registers
    needed to represent mode MODE in a register of class CLASS.
 
Index: config/rs6000/rs6000.opt
===================================================================
--- config/rs6000/rs6000.opt	(revision 197640)
+++ config/rs6000/rs6000.opt	(working copy)
@@ -443,6 +443,10 @@ mlong-double-
 Target RejectNegative Joined UInteger Var(rs6000_long_double_type_size) Save
 -mlong-double-<n>	Specify size of long double (64 or 128 bits)
 
+mlra
+Target Report Var(rs6000_lra_flag) Init(1) Save
+Use LRA instead of reload
+
 msched-costly-dep=
 Target RejectNegative Joined Var(rs6000_sched_costly_dep_str)
 Determine which dependences between insns are considered costly
Index: lra-constraints.c
===================================================================
--- lra-constraints.c	(revision 197640)
+++ lra-constraints.c	(working copy)
@@ -135,10 +135,11 @@
    reload insns.  */
 static int bb_reload_num;
 
-/* The current insn being processed and corresponding its data (basic
-   block, the insn data, the insn static data, and the mode of each
-   operand).  */
+/* The current insn being processed and corresponding its single set
+   (NULL otherwise), its data (basic block, the insn data, the insn
+   static data, and the mode of each operand).  */
 static rtx curr_insn;
+static rtx curr_insn_set;
 static basic_block curr_bb;
 static lra_insn_recog_data_t curr_id;
 static struct lra_static_insn_data *curr_static_id;
@@ -698,6 +699,7 @@ match_reload (signed char out, signed ch
 	    new_out_reg = gen_lowpart_SUBREG (outmode, reg);
 	  else
 	    new_out_reg = gen_rtx_SUBREG (outmode, reg, 0);
+	  LRA_SUBREG_P (new_out_reg) = 1;
 	  /* If the input reg is dying here, we can use the same hard
 	     register for REG and IN_RTX.  We do it only for original
 	     pseudos as reload pseudos can die although original
@@ -721,6 +723,7 @@ match_reload (signed char out, signed ch
 	     it at the end of LRA work.  */
 	  clobber = emit_clobber (new_out_reg);
 	  LRA_TEMP_CLOBBER_P (PATTERN (clobber)) = 1;
+	  LRA_SUBREG_P (new_in_reg) = 1;
 	  if (GET_CODE (in_rtx) == SUBREG)
 	    {
 	      rtx subreg_reg = SUBREG_REG (in_rtx);
@@ -856,32 +859,34 @@ static rtx
 emit_spill_move (bool to_p, rtx mem_pseudo, rtx val)
 {
   if (GET_MODE (mem_pseudo) != GET_MODE (val))
-    val = gen_rtx_SUBREG (GET_MODE (mem_pseudo),
-			  GET_CODE (val) == SUBREG ? SUBREG_REG (val) : val,
-			  0);
+    {
+      val = gen_rtx_SUBREG (GET_MODE (mem_pseudo),
+			    GET_CODE (val) == SUBREG ? SUBREG_REG (val) : val,
+			    0);
+      LRA_SUBREG_P (val) = 1;
+    }
   return (to_p
-	  ? gen_move_insn (mem_pseudo, val)
-	  : gen_move_insn (val, mem_pseudo));
+          ? gen_move_insn (mem_pseudo, val)
+          : gen_move_insn (val, mem_pseudo));
 }
 
 /* Process a special case insn (register move), return true if we
-   don't need to process it anymore.  Return that RTL was changed
-   through CHANGE_P and macro SECONDARY_MEMORY_NEEDED says to use
-   secondary memory through SEC_MEM_P.	*/
+   don't need to process it anymore.  INSN should be a single set
+   insn.  Set up that RTL was changed through CHANGE_P and macro
+   SECONDARY_MEMORY_NEEDED says to use secondary memory through
+   SEC_MEM_P.  */
 static bool
-check_and_process_move (bool *change_p, bool *sec_mem_p)
+check_and_process_move (bool *change_p, bool *sec_mem_p ATTRIBUTE_UNUSED)
 {
   int sregno, dregno;
-  rtx set, dest, src, dreg, sreg, old_sreg, new_reg, before, scratch_reg;
+  rtx dest, src, dreg, sreg, old_sreg, new_reg, before, scratch_reg;
   enum reg_class dclass, sclass, secondary_class;
   enum machine_mode sreg_mode;
   secondary_reload_info sri;
 
-  *sec_mem_p = *change_p = false;
-  if ((set = single_set (curr_insn)) == NULL)
-    return false;
-  dreg = dest = SET_DEST (set);
-  sreg = src = SET_SRC (set);
+  lra_assert (curr_insn_set != NULL_RTX);
+  dreg = dest = SET_DEST (curr_insn_set);
+  sreg = src = SET_SRC (curr_insn_set);
   /* Quick check on the right move insn which does not need
      reloads.  */
   if ((dclass = get_op_class (dest)) != NO_REGS
@@ -1008,7 +1013,7 @@ check_and_process_move (bool *change_p,
       if (GET_CODE (src) == SUBREG)
 	SUBREG_REG (src) = new_reg;
       else
-	SET_SRC (set) = new_reg;
+	SET_SRC (curr_insn_set) = new_reg;
     }
   else
     {
@@ -1205,7 +1210,10 @@ simplify_operand_subreg (int nop, enum m
        && (hard_regno_nregs[hard_regno][GET_MODE (reg)]
 	   >= hard_regno_nregs[hard_regno][mode])
        && simplify_subreg_regno (hard_regno, GET_MODE (reg),
-				 SUBREG_BYTE (operand), mode) < 0)
+				 SUBREG_BYTE (operand), mode) < 0
+       /* Don't reload subreg for matching reload.  It is actually
+	  valid subreg in LRA.  */
+       && ! LRA_SUBREG_P (operand))
       || CONSTANT_P (reg) || GET_CODE (reg) == PLUS || MEM_P (reg))
     {
       enum op_type type = curr_static_id->operand[nop].type;
@@ -1312,6 +1320,14 @@ general_constant_p (rtx x)
   return CONSTANT_P (x) && (! flag_pic || LEGITIMATE_PIC_OPERAND_P (x));
 }
 
+static bool
+reg_in_class_p (rtx reg, enum reg_class cl)
+{
+  if (cl == NO_REGS)
+    return get_reg_class (REGNO (reg)) == NO_REGS;
+  return in_class_p (reg, cl, NULL);
+}
+
 /* Major function to choose the current insn alternative and what
    operands should be reloaded and how.	 If ONLY_ALTERNATIVE is not
    negative we should consider only this alternative.  Return false if
@@ -1391,7 +1407,7 @@ process_alt_operands (int only_alternati
   for (nalt = 0; nalt < n_alternatives; nalt++)
     {
       /* Loop over operands for one constraint alternative.  */
-#ifdef HAVE_ATTR_enabled
+#if HAVE_ATTR_enabled
       if (curr_id->alternative_enabled_p != NULL
 	  && ! curr_id->alternative_enabled_p[nalt])
 	continue;
@@ -2048,6 +2064,31 @@ process_alt_operands (int only_alternati
 	  if (early_clobber_p && operand_reg[nop] != NULL_RTX)
 	    early_clobbered_nops[early_clobbered_regs_num++] = nop;
 	}
+      if (curr_insn_set != NULL_RTX && n_operands == 2
+	  && ((! curr_alt_win[0] && ! curr_alt_win[1]
+	       && REG_P (no_subreg_reg_operand[0])
+	       && REG_P (no_subreg_reg_operand[1])
+	       && (reg_in_class_p (no_subreg_reg_operand[0], curr_alt[1])
+		   || reg_in_class_p (no_subreg_reg_operand[1], curr_alt[0])))
+	      || (! curr_alt_win[0] && curr_alt_win[1]
+		  && REG_P (no_subreg_reg_operand[1])
+		  && reg_in_class_p (no_subreg_reg_operand[1], curr_alt[0]))
+	      || (curr_alt_win[0] && ! curr_alt_win[1]
+		  && REG_P (no_subreg_reg_operand[0])
+		  && reg_in_class_p (no_subreg_reg_operand[0], curr_alt[1])
+		  && (! CONST_POOL_OK_P (curr_operand_mode[1],
+					 no_subreg_reg_operand[1])
+		      || (targetm.preferred_reload_class
+			  (no_subreg_reg_operand[1],
+			   (enum reg_class) curr_alt[1]) != NO_REGS))
+		  /* If it is a result of recent elimination in move
+		     insn we can transform it into an add still by
+		     using this alternative.  */
+		  && GET_CODE (no_subreg_reg_operand[1]) != PLUS)))
+	/* We have a move insn and a new reload insn will be similar
+	   to the current insn.  We should avoid such situation as it
+	   results in LRA cycling.  */
+	overall += LRA_MAX_REJECT;
       ok_p = true;
       curr_alt_dont_inherit_ops_num = 0;
       for (nop = 0; nop < early_clobbered_regs_num; nop++)
@@ -2419,27 +2460,35 @@ process_address (int nop, rtx *before, r
       && process_addr_reg (ad.index_term, before, NULL, INDEX_REG_CLASS))
     change_p = true;
 
+#ifdef EXTRA_CONSTRAINT_STR
+  /* Target hooks sometimes reject extra constraint addresses -- use
+     EXTRA_CONSTRAINT_STR for the validation.  */
+  if (constraint[0] != 'p'
+      && EXTRA_ADDRESS_CONSTRAINT (constraint[0], constraint)
+      && EXTRA_CONSTRAINT_STR (op, constraint[0], constraint))
+    return change_p;
+#endif
+
   /* There are three cases where the shape of *AD.INNER may now be invalid:
 
      1) the original address was valid, but either elimination or
-	equiv_address_substitution applied a displacement that made
-	it invalid.
+	equiv_address_substitution was applied and that made
+	the address invalid.
 
      2) the address is an invalid symbolic address created by
 	force_const_to_mem.
 
      3) the address is a frame address with an invalid offset.
 
-     All these cases involve a displacement and a non-autoinc address,
-     so there is no point revalidating other types.  */
-  if (ad.disp == NULL || ad.autoinc_p || valid_address_p (&ad))
+     All these cases involve a non-autoinc address, so there is no
+     point revalidating other types.  */
+  if (ad.autoinc_p || valid_address_p (&ad))
     return change_p;
 
   /* Any index existed before LRA started, so we can assume that the
      presence and shape of the index is valid.  */
   push_to_sequence (*before);
-  gcc_assert (ad.segment == NULL);
-  gcc_assert (ad.disp == ad.disp_term);
+  lra_assert (ad.disp == ad.disp_term);
   if (ad.base == NULL)
     {
       if (ad.index == NULL)
@@ -2447,25 +2496,25 @@ process_address (int nop, rtx *before, r
 	  int code = -1;
 	  enum reg_class cl = base_reg_class (ad.mode, ad.as,
 					      SCRATCH, SCRATCH);
-	  rtx disp = *ad.disp;
+	  rtx addr = *ad.inner;
 
-	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "disp");
+	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "addr");
 #ifdef HAVE_lo_sum
 	  {
 	    rtx insn;
 	    rtx last = get_last_insn ();
 
-	    /* disp => lo_sum (new_base, disp), case (2) above.  */
+	    /* addr => lo_sum (new_base, addr), case (2) above.  */
 	    insn = emit_insn (gen_rtx_SET
 			      (VOIDmode, new_reg,
-			       gen_rtx_HIGH (Pmode, copy_rtx (disp))));
+			       gen_rtx_HIGH (Pmode, copy_rtx (addr))));
 	    code = recog_memoized (insn);
 	    if (code >= 0)
 	      {
-		*ad.disp = gen_rtx_LO_SUM (Pmode, new_reg, disp);
+		*ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
 		if (! valid_address_p (ad.mode, *ad.outer, ad.as))
 		  {
-		    *ad.disp = disp;
+		    *ad.inner = addr;
 		    code = -1;
 		  }
 	      }
@@ -2475,9 +2524,9 @@ process_address (int nop, rtx *before, r
 #endif
 	  if (code < 0)
 	    {
-	      /* disp => new_base, case (2) above.  */
-	      lra_emit_move (new_reg, disp);
-	      *ad.disp = new_reg;
+	      /* addr => new_base, case (2) above.  */
+	      lra_emit_move (new_reg, addr);
+	      *ad.inner = new_reg;
 	    }
 	}
       else
@@ -2690,7 +2739,10 @@ curr_insn_transform (void)
   no_input_reloads_p = no_output_reloads_p = false;
   goal_alt_number = -1;
 
-  if (check_and_process_move (&change_p, &sec_mem_p))
+  change_p = sec_mem_p = false;
+  curr_insn_set = single_set (curr_insn);
+  if (curr_insn_set != NULL_RTX
+      && check_and_process_move (&change_p, &sec_mem_p))
     return change_p;
 
   /* JUMP_INSNs and CALL_INSNs are not allowed to have any output
@@ -4806,7 +4858,7 @@ inherit_in_ebb (rtx head, rtx tail)
 /* This value affects EBB forming.  If probability of edge from EBB to
    a BB is not greater than the following value, we don't add the BB
    to EBB.  */
-#define EBB_PROBABILITY_CUTOFF (REG_BR_PROB_BASE / 2)
+#define EBB_PROBABILITY_CUTOFF ((REG_BR_PROB_BASE * 50) / 100)
 
 /* Current number of inheritance/split iteration.  */
 int lra_inheritance_iter;
Index: lra-eliminations.c
===================================================================
--- lra-eliminations.c	(revision 197640)
+++ lra-eliminations.c	(working copy)
@@ -975,6 +975,9 @@ eliminate_regs_in_insn (rtx insn, bool r
 	}
     }
 
+  if (! validate_p)
+    return;
+
   /* Substitute the operands; the new values are in the substed_operand
      array.  */
   for (i = 0; i < static_id->n_operands; i++)
@@ -982,16 +985,13 @@ eliminate_regs_in_insn (rtx insn, bool r
   for (i = 0; i < static_id->n_dups; i++)
     *id->dup_loc[i] = substed_operand[(int) static_id->dup_num[i]];
 
-  if (validate_p)
-    {
-      /* If we had a move insn but now we don't, re-recognize it.
-	 This will cause spurious re-recognition if the old move had a
-	 PARALLEL since the new one still will, but we can't call
-	 single_set without having put new body into the insn and the
-	 re-recognition won't hurt in this rare case.  */
-      id = lra_update_insn_recog_data (insn);
-      static_id = id->insn_static_data;
-    }
+  /* If we had a move insn but now we don't, re-recognize it.
+     This will cause spurious re-recognition if the old move had a
+     PARALLEL since the new one still will, but we can't call
+     single_set without having put new body into the insn and the
+     re-recognition won't hurt in this rare case.  */
+  id = lra_update_insn_recog_data (insn);
+  static_id = id->insn_static_data;
 }
 
 /* Spill pseudos which are assigned to hard registers in SET.  Add
Index: lra-spills.c
===================================================================
--- lra-spills.c	(revision 197640)
+++ lra-spills.c	(working copy)
@@ -548,6 +548,11 @@ lra_spill (void)
   for (i = 0; i < n; i++)
     if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
       assign_mem_slot (pseudo_regnos[i]);
+  if (n > 0 && crtl->stack_alignment_needed)
+    /* If we have a stack frame, we must align it now.  The stack size
+       may be a part of the offset computation for register
+       elimination.  */
+    assign_stack_local (BLKmode, 0, crtl->stack_alignment_needed);
   if (lra_dump_file != NULL)
     {
       for (i = 0; i < slots_num; i++)
@@ -644,10 +649,12 @@ lra_final_code_change (void)
 	    }
 
 	  lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
+	  struct lra_static_insn_data *static_id = id->insn_static_data;
 	  bool insn_change_p = false;
 
 	  for (i = id->insn_static_data->n_operands - 1; i >= 0; i--)
-	    if (alter_subregs (id->operand_loc[i], ! DEBUG_INSN_P (insn)))
+	    if ((DEBUG_INSN_P (insn) || ! static_id->operand[i].is_operator)
+		&& alter_subregs (id->operand_loc[i], ! DEBUG_INSN_P (insn)))
 	      {
 		lra_update_dup (id, i);
 		insn_change_p = true;
Index: lra.c
===================================================================
--- lra.c	(revision 197640)
+++ lra.c	(working copy)
@@ -2202,6 +2202,10 @@ lra (FILE *f)
 
   timevar_push (TV_LRA);
 
+  /* Make sure that the last insn is a note.  Some subsequent passes
+     need it.  */
+  emit_note (NOTE_INSN_DELETED);
+
   COPY_HARD_REG_SET (lra_no_alloc_regs, ira_no_alloc_regs);
 
   init_reg_info ();
@@ -2258,6 +2262,11 @@ lra (FILE *f)
   bitmap_initialize (&lra_split_regs, &reg_obstack);
   bitmap_initialize (&lra_optional_reload_pseudos, &reg_obstack);
   live_p = false;
+  if (get_frame_size () != 0 && crtl->stack_alignment_needed)
+    /* If we have a stack frame, we must align it now.  The stack size
+       may be a part of the offset computation for register
+       elimination.  */
+    assign_stack_local (BLKmode, 0, crtl->stack_alignment_needed);
   for (;;)
     {
       for (;;)
Index: recog.c
===================================================================
--- recog.c	(revision 197640)
+++ recog.c	(working copy)
@@ -1065,7 +1065,8 @@ register_operand (rtx op, enum machine_m
 	  && REGNO (sub) < FIRST_PSEUDO_REGISTER
 	  && REG_CANNOT_CHANGE_MODE_P (REGNO (sub), GET_MODE (sub), mode)
 	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_INT
-	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_FLOAT)
+	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_FLOAT
+	  && ! LRA_SUBREG_P (op))
 	return 0;
 #endif
 
Index: rtl.h
===================================================================
--- rtl.h	(revision 197640)
+++ rtl.h	(working copy)
@@ -265,7 +265,8 @@ struct GTY((chain_next ("RTX_NEXT (&%h)"
      1 in a SET that is for a return.
      In a CODE_LABEL, part of the two-bit alternate entry field.
      1 in a CONCAT is VAL_EXPR_IS_COPIED in var-tracking.c.
-     1 in a VALUE is SP_BASED_VALUE_P in cselib.c.  */
+     1 in a VALUE is SP_BASED_VALUE_P in cselib.c.
+     1 in a SUBREG generated by LRA for reload insns.  */
   unsigned int jump : 1;
   /* In a CODE_LABEL, part of the two-bit alternate entry field.
      1 in a MEM if it cannot trap.
@@ -1411,6 +1412,11 @@ do {									\
   ((RTL_FLAG_CHECK1("SUBREG_PROMOTED_UNSIGNED_P", (RTX), SUBREG)->volatil) \
    ? -1 : (int) (RTX)->unchanging)
 
+/* True if the subreg was generated by LRA for reload insns.  Such
+   subregs are valid only during LRA.  */
+#define LRA_SUBREG_P(RTX)	\
+  (RTL_FLAG_CHECK1("LRA_SUBREG_P", (RTX), SUBREG)->jump)
+
 /* Access various components of an ASM_OPERANDS rtx.  */
 
 #define ASM_OPERANDS_TEMPLATE(RTX) XCSTR (RTX, 0, ASM_OPERANDS)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000
  2013-04-11 20:09 RFA: enable LRA for rs6000 Vladimir Makarov
@ 2013-04-11 20:56 ` David Edelsohn
  2013-04-11 22:48   ` Vladimir Makarov
  2013-04-16  9:13 ` Michael Meissner
  1 sibling, 1 reply; 33+ messages in thread
From: David Edelsohn @ 2013-04-11 20:56 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc-patches, Michael Meissner, Bergner, Peter

On Thu, Apr 11, 2013 at 1:30 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>   Here is a patch to enable LRA for rs6000.  The patch includes code changes
> in rs6000 machine-dependent parts and in LRA parts.
>
>   I've added a new switch -mlra for rs6000 to make debugging LRA for rs6000
> easier but not documented it as it will be gone at the end of stage1 (may be
> with ability to use LRA for rs6000 if the results are unsatisfactory).  I
> have only one question about its default value.  Currently by default LRA is
> used but if you prefer opposite I can reverse it. Please just let me know
> your opinion.
>
>   The patch was successfully bootstrapped and tested on pppc64 and
> x86/x86-64.
>
>   Are rs6000 parts ok for trunk?

Vlad,

Thanks for your work on LRA and your work to convert the rs6000 port
to use the new feature.

My main question about the rs6000 parts of the patch is: what is
toc_ok_p suppose to mean?

-      if (TARGET_TOC)
+      toc_ok_p = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
+                        && small_toc_ref (x, VOIDmode));
+      if (TARGET_TOC && ! toc_ok_p)
  return false;

This is a change to the meaning of the test with no explanation or
comment.  At least the name is confusing because it seems to be
selecting a subset of valid TOC addressing forms.  Medium and large
model TOC references before reload?

Also suffix "_ok" and "_p" seem redundant.

Maybe your intent is a boolean for "large" TOC that could be named
large_toc_p or large_toc_ok?

Thanks, David

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000
  2013-04-11 20:56 ` David Edelsohn
@ 2013-04-11 22:48   ` Vladimir Makarov
  2013-04-12 19:09     ` Vladimir Makarov
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-11 22:48 UTC (permalink / raw)
  To: David Edelsohn; +Cc: gcc-patches, Michael Meissner, Bergner, Peter

On 04/11/2013 02:32 PM, David Edelsohn wrote:
> On Thu, Apr 11, 2013 at 1:30 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>    Here is a patch to enable LRA for rs6000.  The patch includes code changes
>> in rs6000 machine-dependent parts and in LRA parts.
>>
>>    I've added a new switch -mlra for rs6000 to make debugging LRA for rs6000
>> easier but not documented it as it will be gone at the end of stage1 (may be
>> with ability to use LRA for rs6000 if the results are unsatisfactory).  I
>> have only one question about its default value.  Currently by default LRA is
>> used but if you prefer opposite I can reverse it. Please just let me know
>> your opinion.
>>
>>    The patch was successfully bootstrapped and tested on pppc64 and
>> x86/x86-64.
>>
>>    Are rs6000 parts ok for trunk?
> Vlad,
>
> Thanks for your work on LRA and your work to convert the rs6000 port
> to use the new feature.
>
> My main question about the rs6000 parts of the patch is: what is
> toc_ok_p suppose to mean?
>
> -      if (TARGET_TOC)
> +      toc_ok_p = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
> +                        && small_toc_ref (x, VOIDmode));
> +      if (TARGET_TOC && ! toc_ok_p)
>    return false;
>
> This is a change to the meaning of the test with no explanation or
> comment.  At least the name is confusing because it seems to be
> selecting a subset of valid TOC addressing forms.  Medium and large
> model TOC references before reload?
>
> Also suffix "_ok" and "_p" seem redundant.
>
> Maybe your intent is a boolean for "large" TOC that could be named
> large_toc_p or large_toc_ok?
>
This is a hard question for me.  Thanks for pointing this out, David.  
The code is 9 months old.  It was introduced with lo_sum support for ppc:

http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00260.html

Without this change, there are >3000 failures on GCC testsuite.  If ppc 
target code says that the address

(lo_sum:DI (reg:DI <some pseudo>)
                 (plus:DI (unspec:DI [
                             (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
                             (reg:DI 2 2)
                         ] UNSPEC_TOCREL)
                     (const_int 10 [0xa])))

is not legitimate lo_sum address, LRA tries to reload an address 
generating insn

<a pseudo> = (plus:DI (unspec:DI [
                          (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
                         (reg:DI 2 2)
                        ] UNSPEC_TOCREL)
               (const_int 10 [0xa]))

and that fails

I believe it was a quick hack for old address decomposition code which 
processed wrongly some addresses.  I guess I forgot about this hack.  
Looking at this I don't like it now.

Probably we still need a more informative address decomposition which 
was rewritten by Richard Sandiford (and out of my control now).  I'll 
investigate this more but probably another solution will be ready next 
week in the best case.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000
  2013-04-11 22:48   ` Vladimir Makarov
@ 2013-04-12 19:09     ` Vladimir Makarov
  2013-04-13 15:02       ` David Edelsohn
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-12 19:09 UTC (permalink / raw)
  To: David Edelsohn; +Cc: gcc-patches, Michael Meissner, Bergner, Peter

On 13-04-11 5:04 PM, Vladimir Makarov wrote:
> On 04/11/2013 02:32 PM, David Edelsohn wrote:
>> On Thu, Apr 11, 2013 at 1:30 PM, Vladimir Makarov 
>> <vmakarov@redhat.com> wrote:
>>>    Here is a patch to enable LRA for rs6000.  The patch includes 
>>> code changes
>>> in rs6000 machine-dependent parts and in LRA parts.
>>>
>>>    I've added a new switch -mlra for rs6000 to make debugging LRA 
>>> for rs6000
>>> easier but not documented it as it will be gone at the end of stage1 
>>> (may be
>>> with ability to use LRA for rs6000 if the results are 
>>> unsatisfactory).  I
>>> have only one question about its default value. Currently by default 
>>> LRA is
>>> used but if you prefer opposite I can reverse it. Please just let me 
>>> know
>>> your opinion.
>>>
>>>    The patch was successfully bootstrapped and tested on pppc64 and
>>> x86/x86-64.
>>>
>>>    Are rs6000 parts ok for trunk?
>> Vlad,
>>
>> Thanks for your work on LRA and your work to convert the rs6000 port
>> to use the new feature.
>>
>> My main question about the rs6000 parts of the patch is: what is
>> toc_ok_p suppose to mean?
>>
>> -      if (TARGET_TOC)
>> +      toc_ok_p = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
>> +                        && small_toc_ref (x, VOIDmode));
>> +      if (TARGET_TOC && ! toc_ok_p)
>>    return false;
>>
>> This is a change to the meaning of the test with no explanation or
>> comment.  At least the name is confusing because it seems to be
>> selecting a subset of valid TOC addressing forms. Medium and large
>> model TOC references before reload?
>>
>> Also suffix "_ok" and "_p" seem redundant.
>>
>> Maybe your intent is a boolean for "large" TOC that could be named
>> large_toc_p or large_toc_ok?
>>
> This is a hard question for me.  Thanks for pointing this out, David.  
> The code is 9 months old.  It was introduced with lo_sum support for ppc:
>
> http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00260.html
>
> Without this change, there are >3000 failures on GCC testsuite.  If 
> ppc target code says that the address
>
> (lo_sum:DI (reg:DI <some pseudo>)
>                 (plus:DI (unspec:DI [
>                             (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
>                             (reg:DI 2 2)
>                         ] UNSPEC_TOCREL)
>                     (const_int 10 [0xa])))
>
> is not legitimate lo_sum address, LRA tries to reload an address 
> generating insn
>
> <a pseudo> = (plus:DI (unspec:DI [
>                          (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
>                         (reg:DI 2 2)
>                        ] UNSPEC_TOCREL)
>               (const_int 10 [0xa]))
>
> and that fails
>
> I believe it was a quick hack for old address decomposition code which 
> processed wrongly some addresses.  I guess I forgot about this hack.  
> Looking at this I don't like it now.
>
> Probably we still need a more informative address decomposition which 
> was rewritten by Richard Sandiford (and out of my control now).  I'll 
> investigate this more but probably another solution will be ready next 
> week in the best case.
>
After thorough investigation I found that this code is still ok.  The 
explanations are in the comment.  Here is the modified version of the 
code taking you proposals into account

@@ -5701,19 +5705,31 @@ legitimate_lo_sum_address_p (enum machin

    if (TARGET_ELF || TARGET_MACHO)
      {
+      bool large_toc_ok;
+
        if (DEFAULT_ABI != ABI_AIX && DEFAULT_ABI != ABI_DARWIN && flag_pic)
         return false;
-      if (TARGET_TOC)
+      /* LRA don't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls
+        push_reload from reload pass code. LEGITIMIZE_RELOAD_ADDRESS
+        recognizes some LO_SUM addresses as valid although this
+        function says opposite.  In most cases, LRA through different
+        transformations can generate correct code for address reloads.
+        It can not manage only some LO_SUM cases.  So we need to add
+        code analogous to one in rs6000_legitimize_reload_address for
+        LOW_SUM here saying that some addresses are still valid. */
+      large_toc_ok = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
+                     && small_toc_ref (x, VOIDmode));
+      if (TARGET_TOC && ! large_toc_ok)
         return false;
        if (GET_MODE_NUNITS (mode) != 1)
         return false;
-      if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
+      if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
           && !(/* ??? Assume floating point reg based on mode?  */
                TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
                && (mode == DFmode || mode == DDmode)))
         return false;

-      return CONSTANT_P (x);
+      return CONSTANT_P (x) || large_toc_ok;
      }

    return false;

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000
  2013-04-12 19:09     ` Vladimir Makarov
@ 2013-04-13 15:02       ` David Edelsohn
  0 siblings, 0 replies; 33+ messages in thread
From: David Edelsohn @ 2013-04-13 15:02 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc-patches, Michael Meissner, Bergner, Peter

On Fri, Apr 12, 2013 at 12:57 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:

> After thorough investigation I found that this code is still ok.  The
> explanations are in the comment.  Here is the modified version of the code
> taking you proposals into account
>
> @@ -5701,19 +5705,31 @@ legitimate_lo_sum_address_p (enum machin
>
>    if (TARGET_ELF || TARGET_MACHO)
>      {
> +      bool large_toc_ok;
> +
>        if (DEFAULT_ABI != ABI_AIX && DEFAULT_ABI != ABI_DARWIN && flag_pic)
>         return false;
> -      if (TARGET_TOC)
> +      /* LRA don't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls
> +        push_reload from reload pass code. LEGITIMIZE_RELOAD_ADDRESS
> +        recognizes some LO_SUM addresses as valid although this
> +        function says opposite.  In most cases, LRA through different
> +        transformations can generate correct code for address reloads.
> +        It can not manage only some LO_SUM cases.  So we need to add
> +        code analogous to one in rs6000_legitimize_reload_address for
> +        LOW_SUM here saying that some addresses are still valid. */
> +      large_toc_ok = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
>
> +                     && small_toc_ref (x, VOIDmode));
> +      if (TARGET_TOC && ! large_toc_ok)
>         return false;
>        if (GET_MODE_NUNITS (mode) != 1)
>         return false;
> -      if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
> +      if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
>           && !(/* ??? Assume floating point reg based on mode?  */
>                TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
>                && (mode == DFmode || mode == DDmode)))
>         return false;
>
> -      return CONSTANT_P (x);
> +      return CONSTANT_P (x) || large_toc_ok;
>      }
>
>    return false;

Okay, this makes more sense.

I think that Mike is running some experiments with LRA to see what
impact we see.  I would like to understand that a little more before
we apply the patch to trunk.

Thanks David

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000
  2013-04-11 20:09 RFA: enable LRA for rs6000 Vladimir Makarov
  2013-04-11 20:56 ` David Edelsohn
@ 2013-04-16  9:13 ` Michael Meissner
  2013-04-16  9:19   ` Steven Bosscher
  2013-04-17 11:53   ` RFA: enable LRA for rs6000 [patch for WRF] Michael Meissner
  1 sibling, 2 replies; 33+ messages in thread
From: Michael Meissner @ 2013-04-16  9:13 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: gcc-patches, David Edelsohn, Michael Meissner, Bergner, Peter

I built the spec 2006 suite with/without Vlad's patches for enabling using the
LRA register allocator for the powerpc.  Because of the bug with the count
register that was in the version I checked out, I have built things with the
-fno-branch-count-reg option.

I created a branch off of subversion id 197925 and applied Vlad's initial
patches:
svn+ssh://gcc.gnu.org/svn/gcc/branches/ibm/meissner-lra

I can't put the spec files in a general mailing list, but I will make them
available to Vlad as needed.

On the 64-bit side, the wrf benchmark does not build:

/home/meissner/fsf-install-ppc64/meissner-lra/bin/gfortran -c -o module_diffusion_em.fppized.o -I. -I./netcdf/include -g -save-temps=obj -ffast-math -O3 -mveclibabi=mass -mcpu=power7 -mrecip=rsqrt -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -msave-toc-indirect -fno-aggressive-loop-optimizations -fno-branch-count-reg -mno-pointers-to-nested-functions -mlra -m64 module_diffusion_em.fppized.f90
module_diffusion_em.fppized.f90: In function 'compute_diff_metrics':
module_diffusion_em.fppized.f90:5069:0: internal compiler error: in check_rtl, at lra.c:1999
     END SUBROUTINE compute_diff_metrics
 ^
0x1055e1bf check_rtl		 /home/meissner/fsf-src/meissner-lra/gcc/lra.c:1999
0x105604c3 lra(_IO_FILE*)	 /home/meissner/fsf-src/meissner-lra/gcc/lra.c:2374
0x10512f4b do_reload		 /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4619
0x10512f4b rest_of_handle_reload /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4731
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [module_diffusion_em.fppized.o] Error 1
specmake: *** Waiting for unfinished jobs....

On the 32-bit side, both wrf and dealII benchmarks do not build.  The wrf
failure looks like the 64-bit failure, but the file being compiled is
different:

/home/meissner/fsf-install-ppc64/meissner-lra/bin/gfortran -c -o ESMF_Alarm.fppized.o -I. -I./netcdf/include -g -save-temps=obj -ffast-math -Ofast -mveclibabi=mass -mcpu=power7 -mrecip=rsqrt -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -fno-aggressive-loop-optimizations -fno-branch-count-reg -mlra -m32 ESMF_Alarm.fppized.f90
module_soil_pre.fppized.f90:1184:0: internal compiler error: in check_rtl, at lra.c:1999
    END SUBROUTINE init_soil_3_real
 ^
0x1055e1bf check_rtl		 /home/meissner/fsf-src/meissner-lra/gcc/lra.c:1999
0x105604c3 lra(_IO_FILE*)	 /home/meissner/fsf-src/meissner-lra/gcc/lra.c:2374
0x10512f4b do_reload		 /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4619
0x10512f4b rest_of_handle_reload /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4731
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [module_soil_pre.fppized.o] Error 1
specmake: *** Waiting for unfinished jobs....
Error with make 'specmake -j40 build': check file '/home/meissner/spec-build/spec-2006-base-dev49-power7-vsx-svn197925-nocountreg-lra-shared-at6.0-32bit/benchspec/CPU2006/481.wrf/build/build_base_dev49-power7-vsx-32bit.0000/make.err'
  Command returned exit code 2
  Error with make!
*** Error building 481.wrf

In dealII, quadrature_lib.cc and polynomial.cc don't build.

/home/meissner/fsf-install-ppc64/meissner-lra/bin/g++ -c -o quadrature_lib.o -DSPEC_CPU -DNDEBUG  -Iinclude -DBOOST_DISABLE_THREADS -Ddeal_II_dimension=3 -g -save-temps=obj -ffast-math -Ofast -mveclibabi=mass -mcpu=power7 -mrecip=rsqrt -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -fno-aggressive-loop-optimizations -fno-branch-count-reg -mlra  -m32     -DSPEC_CPU_LINUX -include cstddef      quadrature_lib.cc
quadrature_lib.cc: In constructor 'QGauss<dim>::QGauss(unsigned int) [with int dim = 1]':
quadrature_lib.cc:95:1: internal compiler error: in check_rtl, at lra.c:1999
 }
 ^
0x106cb2bf check_rtl		 /home/meissner/fsf-src/meissner-lra/gcc/lra.c:1999
0x106cd5c3 lra(_IO_FILE*)        /home/meissner/fsf-src/meissner-lra/gcc/lra.c:2374
0x1068004b do_reload		 /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4619
0x1068004b rest_of_handle_reload /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4731
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [quadrature_lib.o] Error 1

/home/meissner/fsf-install-ppc64/meissner-lra/bin/g++ -c -o polynomial.o -DSPEC_CPU -DNDEBUG  -Iinclude -DBOOST_DISABLE_THREADS -Ddeal_II_dimension=3 -g -save-temps=obj -ffast-math -Ofast -mveclibabi=mass -mcpu=power7 -mrecip=rsqrt -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -fno-aggressive-loop-optimizations -fno-branch-count-reg -mlra  -m32     -DSPEC_CPU_LINUX -include cstddef      polynomial.cc
polynomial.cc: In member function 'Polynomials::Polynomial<number> Polynomials::Polynomial<number>::derivative() const [with number = long double]':
polynomial.cc:282:3: internal compiler error: in check_rtl, at lra.c:1999
   }
   ^
0x106cb2bf check_rtl		 /home/meissner/fsf-src/meissner-lra/gcc/lra.c:1999
0x106cd5c3 lra(_IO_FILE*)	 /home/meissner/fsf-src/meissner-lra/gcc/lra.c:2374
0x1068004b do_reload		 /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4619
0x1068004b rest_of_handle_reload /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4731
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [polynomial.o] Error 1



-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000
  2013-04-16  9:13 ` Michael Meissner
@ 2013-04-16  9:19   ` Steven Bosscher
  2013-04-16  9:27     ` Michael Meissner
  2013-04-17 11:53   ` RFA: enable LRA for rs6000 [patch for WRF] Michael Meissner
  1 sibling, 1 reply; 33+ messages in thread
From: Steven Bosscher @ 2013-04-16  9:19 UTC (permalink / raw)
  To: Michael Meissner, Vladimir Makarov, gcc-patches, David Edelsohn,
	Bergner, Peter

On Tue, Apr 16, 2013 at 12:48 AM, Michael Meissner wrote:
> 0x1055e1bf check_rtl             /home/meissner/fsf-src/meissner-lra/gcc/lra.c:1999

These are all cases of insns not satisfying their constraints. There
are no PRs for this, and there are no test suite failures of this kind
in the logs of my powerpc lra-branch test bot. I hope you can extract
test cases and file PRs...

Ciao!
Steven

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000
  2013-04-16  9:19   ` Steven Bosscher
@ 2013-04-16  9:27     ` Michael Meissner
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Meissner @ 2013-04-16  9:27 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Michael Meissner, Vladimir Makarov, gcc-patches, David Edelsohn,
	Bergner, Peter

On Tue, Apr 16, 2013 at 01:03:35AM +0200, Steven Bosscher wrote:
> On Tue, Apr 16, 2013 at 12:48 AM, Michael Meissner wrote:
> > 0x1055e1bf check_rtl             /home/meissner/fsf-src/meissner-lra/gcc/lra.c:1999
> 
> These are all cases of insns not satisfying their constraints. There
> are no PRs for this, and there are no test suite failures of this kind
> in the logs of my powerpc lra-branch test bot. I hope you can extract
> test cases and file PRs...

Yes of course, but I wanted to give Vlad a heads up, ASAP.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-16  9:13 ` Michael Meissner
  2013-04-16  9:19   ` Steven Bosscher
@ 2013-04-17 11:53   ` Michael Meissner
  2013-04-17 16:13     ` Vladimir Makarov
  1 sibling, 1 reply; 33+ messages in thread
From: Michael Meissner @ 2013-04-17 11:53 UTC (permalink / raw)
  To: Michael Meissner, Vladimir Makarov, gcc-patches, David Edelsohn,
	Bergner, Peter

[-- Attachment #1: Type: text/plain, Size: 1960 bytes --]

I tracked down the bug with the spec 2006 benchmark WRF using the LRA register
allocator.

At one point LRA has decided to use the CTR to hold a CCmode value:

(insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411])
        (reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 {*movcc_internal1}
     (expr_list:REG_DEAD (reg:CC 66 ctr [4411])
        (nil)))

Now movcc_internal1 has moves from r->h (which includes ctr/lr) and ctr/lr->r,
but it doesn't have a move to cover the nop move of moving the ctr to the ctr.
IMHO, LRA should not be generating NOP moves that are later deleted.

There are two ways to solve the problem.  One is not to let anything but int
modes into CTR/LR, which will also eliminate the register allocator from
spilling floating point values there (which we've seen in the past, but the
last time I tried to eliminate it I couldn't).  The following patch does this,
and also changes the assertion to call fatal_insn_not_found to make it clearer
what the error is.

I imagine, I could add a NOP move insn to movcc_internal1, but that just
strikes me as wrong.

Note, this does not fix the 32-bit failure in dealII, and I also noticed that I
can't bootstrap the compiler using --with-cpu=power7, which I will get to
tomorrow.

2013-04-16  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to
	control whether we only allow int modes to go in the CTR, LR,
	VRSAVE, VSCR registers.
	* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise.
	(rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are
	constrained.
	(rs6000_option_override_internal): Set -mconstrain-regs if we are
	using the LRA register allocator.

	* lra.c (check_rtl): Use fatal_insn_not_found to report constraint
	does not match.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power7.patch395b --]
[-- Type: text/plain, Size: 2664 bytes --]

Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 197925)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -522,3 +522,7 @@ Control whether we save the TOC in the p
 mvsx-timode
 Target Undocumented Mask(VSX_TIMODE) Var(rs6000_isa_flags)
 ; Allow/disallow TImode in VSX registers
+
+mconstrain-regs
+Target Undocumented Mask(CONSTRAIN_REGS) Var(rs6000_isa_flags)
+; Only allow ints of certain modes to go in SPRs
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 197925)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -1608,6 +1608,18 @@ rs6000_hard_regno_mode_ok (int regno, en
   if (SPE_SIMD_REGNO_P (regno) && TARGET_SPE && SPE_VECTOR_MODE (mode))
     return 1;
 
+  /* See if we need to be stricter about what goes into the special
+     registers (LR, CTR, VSAVE, VSCR).  */
+  if (TARGET_CONSTRAIN_REGS)
+    {
+      if (regno == LR_REGNO || regno == CTR_REGNO)
+	return (GET_MODE_CLASS (mode) == MODE_INT
+		&& rs6000_hard_regno_nregs[mode][regno] == 1);
+
+      if (regno == VRSAVE_REGNO || regno == VSCR_REGNO)
+	return (mode == SImode);
+    }
+
   /* We cannot put non-VSX TImode or PTImode anywhere except general register
      and it must be able to fit within the register set.  */
 
@@ -2054,6 +2066,9 @@ rs6000_debug_reg_global (void)
   if (targetm.lra_p ())
     fprintf (stderr, DEBUG_FMT_S, "lra", "true");
 
+  if (TARGET_CONSTRAIN_REGS)
+    fprintf (stderr, DEBUG_FMT_S, "constrain-regs", "true");
+
   fprintf (stderr, DEBUG_FMT_S, "plt-format",
 	   TARGET_SECURE_PLT ? "secure" : "bss");
   fprintf (stderr, DEBUG_FMT_S, "struct-return",
@@ -2945,6 +2960,12 @@ rs6000_option_override_internal (bool gl
 	}
     }
 
+  /* If we are using the LRA register allocator, constrain the SPRs to only
+     have integer modes, and not modes like CC.  */
+  if ((rs6000_isa_flags_explicit & OPTION_MASK_CONSTRAIN_REGS) == 0
+      && targetm.lra_p ())
+    rs6000_isa_flags |= OPTION_MASK_CONSTRAIN_REGS;
+
   /* Place FP constants in the constant pool instead of TOC
      if section anchors enabled.  */
   if (flag_section_anchors)
Index: gcc/lra.c
===================================================================
--- gcc/lra.c	(revision 197925)
+++ gcc/lra.c	(working copy)
@@ -1996,7 +1996,8 @@ check_rtl (bool final_p)
 	if (final_p)
 	  {
 	    extract_insn (insn);
-	    lra_assert (constrain_operands (1));
+	    if (!constrain_operands (1))
+	      fatal_insn_not_found (insn);
 	    continue;
 	  }
 	if (insn_invalid_p (insn, false))

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-17 11:53   ` RFA: enable LRA for rs6000 [patch for WRF] Michael Meissner
@ 2013-04-17 16:13     ` Vladimir Makarov
  2013-04-17 19:43       ` Michael Meissner
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-17 16:13 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bergner, Peter

On 13-04-16 6:56 PM, Michael Meissner wrote:
> I tracked down the bug with the spec 2006 benchmark WRF using the LRA register
> allocator.
>
> At one point LRA has decided to use the CTR to hold a CCmode value:
>
> (insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411])
>          (reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 {*movcc_internal1}
>       (expr_list:REG_DEAD (reg:CC 66 ctr [4411])
>          (nil)))
>
> Now movcc_internal1 has moves from r->h (which includes ctr/lr) and ctr/lr->r,
> but it doesn't have a move to cover the nop move of moving the ctr to the ctr.
> IMHO, LRA should not be generating NOP moves that are later deleted.
>
> There are two ways to solve the problem.  One is not to let anything but int
> modes into CTR/LR, which will also eliminate the register allocator from
> spilling floating point values there (which we've seen in the past, but the
> last time I tried to eliminate it I couldn't).  The following patch does this,
> and also changes the assertion to call fatal_insn_not_found to make it clearer
> what the error is.
>
> I imagine, I could add a NOP move insn to movcc_internal1, but that just
> strikes me as wrong.
>
> Note, this does not fix the 32-bit failure in dealII, and I also noticed that I
> can't bootstrap the compiler using --with-cpu=power7, which I will get to
> tomorrow.
>
> 2013-04-16  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
> 	* config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to
> 	control whether we only allow int modes to go in the CTR, LR,
> 	VRSAVE, VSCR registers.
> 	* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise.
> 	(rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are
> 	constrained.
> 	(rs6000_option_override_internal): Set -mconstrain-regs if we are
> 	using the LRA register allocator.
>
> 	* lra.c (check_rtl): Use fatal_insn_not_found to report constraint
> 	does not match.
>
Mike, thanks for the patch and all the SPEC2006 data  (which are very 
useful as I have no access to power machine which can be used for 
benchmarking).  I guess that may be some benchmark scores are lower 
because of LRA lacks some micro-optimizations which reload implements 
through many power hooks (e.g. LRA does not use push reload).  Although 
sometimes it is not a bad thing (e.g. LRA does not use  
SECONDARY_MEMORY_NEEDED_RTX which permits to reuse the stack slots for 
other useful things).

In general I got impression that power7 is the most difficult port for 
LRA.  If we manage to port it, LRA ports for other targets will be easier.

I also reproduced bootstrap failure --with-cpu=power7 and I am going to 
work on this and after that on SPEC2006 you wrote about.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-17 16:13     ` Vladimir Makarov
@ 2013-04-17 19:43       ` Michael Meissner
  2013-04-19  8:09         ` Vladimir Makarov
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Meissner @ 2013-04-17 19:43 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Michael Meissner, gcc-patches, David Edelsohn, Bergner, Peter

On Wed, Apr 17, 2013 at 10:14:53AM -0400, Vladimir Makarov wrote:
> Mike, thanks for the patch and all the SPEC2006 data  (which are
> very useful as I have no access to power machine which can be used
> for benchmarking).  I guess that may be some benchmark scores are
> lower because of LRA lacks some micro-optimizations which reload
> implements through many power hooks (e.g. LRA does not use push
> reload).  Although sometimes it is not a bad thing (e.g. LRA does
> not use  SECONDARY_MEMORY_NEEDED_RTX which permits to reuse the
> stack slots for other useful things).

SECONDARY_MEMORY_NEEDED_RTX is needed for SDmode on machines before the power7,
where we need a larger stack slot to hold spilled SDmode values (power6 did not
have the LFIWZX instruction that is needed to load SDmode values into floating
point registers).

> In general I got impression that power7 is the most difficult port
> for LRA.  If we manage to port it, LRA ports for other targets will
> be easier.

I dunno, has LRA been ported to the SH yet?

> I also reproduced bootstrap failure --with-cpu=power7 and I am going
> to work on this and after that on SPEC2006 you wrote about.

Ok.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-17 19:43       ` Michael Meissner
@ 2013-04-19  8:09         ` Vladimir Makarov
  2013-04-19  8:13           ` Michael Meissner
  2013-04-19 21:32           ` Vladimir Makarov
  0 siblings, 2 replies; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-19  8:09 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bergner, Peter, aavrunin

[-- Attachment #1: Type: text/plain, Size: 2482 bytes --]

On 04/17/2013 12:10 PM, Michael Meissner wrote:
> On Wed, Apr 17, 2013 at 10:14:53AM -0400, Vladimir Makarov wrote:
>> Mike, thanks for the patch and all the SPEC2006 data  (which are
>> very useful as I have no access to power machine which can be used
>> for benchmarking).  I guess that may be some benchmark scores are
>> lower because of LRA lacks some micro-optimizations which reload
>> implements through many power hooks (e.g. LRA does not use push
>> reload).  Although sometimes it is not a bad thing (e.g. LRA does
>> not use  SECONDARY_MEMORY_NEEDED_RTX which permits to reuse the
>> stack slots for other useful things).
> SECONDARY_MEMORY_NEEDED_RTX is needed for SDmode on machines before the power7,
> where we need a larger stack slot to hold spilled SDmode values (power6 did not
> have the LFIWZX instruction that is needed to load SDmode values into floating
> point registers).
Thanks for the info.
>> In general I got impression that power7 is the most difficult port
>> for LRA.  If we manage to port it, LRA ports for other targets will
>> be easier.
> I dunno, has LRA been ported to the SH yet?
Not yet.

Sorry for be inaccurate.  I meant 9 targets which I worked on to port LRA.
>> I also reproduced bootstrap failure --with-cpu=power7 and I am going
>> to work on this and after that on SPEC2006 you wrote about.
>
The bootstrap problem was in processing move whose operand was 
substituted by equiv. memory and the move needs secondary reload through 
a provided insn pattern.  The equiv memory was not legitimate and it 
resulted in failure to generated the secondary reload insn.

LRA can fix the wrong address but secondary reload  was done before 
processing addresses.  It could be fixed in rs6000.c code too but it is 
complicated and I found a better (and i think more right) solution by 
moving secondary reload generation after address processing.

Here is the patch for your branch (patch for trunk is a bit different as 
some changes in affected code were done on trunk).

2013-04-18  Vladimir Makarov  <vmakarov@redhat.com>

         * lra-constraints.c (check_and_process_move): Move code for move
         cost check to simple_move_p.  Remove equiv_substitution.
         (simple_move_p): New function.
         (curr_insn_transform): Use the new function.  Move call of
         check_and_process_move after operand equiv substitution and
         address process.

Tomorrow I am going to look at SPEC2006 dealII crash for 32-bit mode.


[-- Attachment #2: power7-bootstrap --]
[-- Type: text/plain, Size: 3179 bytes --]

Index: lra-constraints.c
===================================================================
--- lra-constraints.c	(revision 198028)
+++ lra-constraints.c	(working copy)
@@ -887,14 +887,6 @@ check_and_process_move (bool *change_p,
   lra_assert (curr_insn_set != NULL_RTX);
   dreg = dest = SET_DEST (curr_insn_set);
   sreg = src = SET_SRC (curr_insn_set);
-  /* Quick check on the right move insn which does not need
-     reloads.  */
-  if ((dclass = get_op_class (dest)) != NO_REGS
-      && (sclass = get_op_class (src)) != NO_REGS
-      /* The backend guarantees that register moves of cost 2 never
-	 need reloads.  */
-      && targetm.register_move_cost (GET_MODE (src), dclass, sclass) == 2)
-    return true;
   if (GET_CODE (dest) == SUBREG)
     dreg = SUBREG_REG (dest);
   if (GET_CODE (src) == SUBREG)
@@ -902,7 +894,6 @@ check_and_process_move (bool *change_p,
   if (! REG_P (dreg) || ! REG_P (sreg))
     return false;
   sclass = dclass = NO_REGS;
-  dreg = get_equiv_substitution (dreg);
   if (REG_P (dreg))
     dclass = get_reg_class (REGNO (dreg));
   if (dclass == ALL_REGS)
@@ -916,7 +907,6 @@ check_and_process_move (bool *change_p,
     return false;
   sreg_mode = GET_MODE (sreg);
   old_sreg = sreg;
-  sreg = get_equiv_substitution (sreg);
   if (REG_P (sreg))
     sclass = get_reg_class (REGNO (sreg));
   if (sclass == ALL_REGS)
@@ -2693,6 +2683,24 @@ emit_inc (enum reg_class new_rclass, rtx
   return result;
 }
 
+/* Return true if the current move insn does not need processing as we
+   already know that it satisfies its constraints.  */
+static bool
+simple_move_p (void)
+{
+  rtx dest, src;
+  enum reg_class dclass, sclass;
+
+  lra_assert (curr_insn_set != NULL_RTX);
+  dest = SET_DEST (curr_insn_set);
+  src = SET_SRC (curr_insn_set);
+  return ((dclass = get_op_class (dest)) != NO_REGS
+	  && (sclass = get_op_class (src)) != NO_REGS
+	  /* The backend guarantees that register moves of cost 2
+	     never need reloads.  */
+	  && targetm.register_move_cost (GET_MODE (src), dclass, sclass) == 2);
+ }
+
 /* Swap operands NOP and NOP + 1. */
 static inline void
 swap_operands (int nop)
@@ -2736,15 +2744,13 @@ curr_insn_transform (void)
   int max_regno_before;
   int reused_alternative_num;
 
+  curr_insn_set = single_set (curr_insn);
+  if (curr_insn_set != NULL_RTX && simple_move_p ())
+    return false;
+
   no_input_reloads_p = no_output_reloads_p = false;
   goal_alt_number = -1;
-
   change_p = sec_mem_p = false;
-  curr_insn_set = single_set (curr_insn);
-  if (curr_insn_set != NULL_RTX
-      && check_and_process_move (&change_p, &sec_mem_p))
-    return change_p;
-
   /* JUMP_INSNs and CALL_INSNs are not allowed to have any output
      reloads; neither are insns that SET cc0.  Insns that use CC0 are
      not allowed to have any input reloads.  */
@@ -2839,6 +2845,10 @@ curr_insn_transform (void)
        we chose previously may no longer be valid.  */
     lra_set_used_insn_alternative (curr_insn, -1);
 
+  if (curr_insn_set != NULL_RTX
+      && check_and_process_move (&change_p, &sec_mem_p))
+    return change_p;
+
  try_swapped:
 
   reused_alternative_num = curr_id->used_insn_alternative;

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-19  8:09         ` Vladimir Makarov
@ 2013-04-19  8:13           ` Michael Meissner
  2013-04-19 21:32           ` Vladimir Makarov
  1 sibling, 0 replies; 33+ messages in thread
From: Michael Meissner @ 2013-04-19  8:13 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Michael Meissner, gcc-patches, David Edelsohn, Bergner, Peter, aavrunin

On Thu, Apr 18, 2013 at 04:44:21PM -0400, Vladimir Makarov wrote:
> LRA can fix the wrong address but secondary reload  was done before
> processing addresses.  It could be fixed in rs6000.c code too but it
> is complicated and I found a better (and i think more right)
> solution by moving secondary reload generation after address
> processing.

I tended to think secondary_reload should always happen after address
processing.

> Here is the patch for your branch (patch for trunk is a bit
> different as some changes in affected code were done on trunk).
> 
> 2013-04-18  Vladimir Makarov  <vmakarov@redhat.com>
> 
>         * lra-constraints.c (check_and_process_move): Move code for move
>         cost check to simple_move_p.  Remove equiv_substitution.
>         (simple_move_p): New function.
>         (curr_insn_transform): Use the new function.  Move call of
>         check_and_process_move after operand equiv substitution and
>         address process.
> 
> Tomorrow I am going to look at SPEC2006 dealII crash for 32-bit mode.
> 

Thanks for the patch.  Unfortunately I just updated the branch to be merged up
to 198065.  So if you could send me a patch against current trunk, or update
the branch (branches/ibm/meissner-lra) and check it into the branch, it would
be appreciated.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-19  8:09         ` Vladimir Makarov
  2013-04-19  8:13           ` Michael Meissner
@ 2013-04-19 21:32           ` Vladimir Makarov
  2013-04-22 10:56             ` Alan Modra
  1 sibling, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-19 21:32 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bergner, Peter, aavrunin

On 13-04-18 4:44 PM, Vladimir Makarov wrote:
>
> Tomorrow I am going to look at SPEC2006 dealII crash for 32-bit mode.
>
   LRA crashes on insn

(insn 406 575 391 22 (set (reg:TF 35 3)
         (mem/u/c:TF (lo_sum:SI (reg:SI 7 7 [414])
                 (symbol_ref/u:SI ("*.LC10") [flags 0x82])) [85 S16 
A128])) 
/home/cygnus/vmakarov/build/spec2006/benchspec/CPU2006/447.dealII/src/quadrature_lib.cc:80 
373 {*movtf_intern\
al}
      (expr_list:REG_DEAD (reg:SI 7 7 [414])
         (nil)))

   As I understand this correct insn.  LRA checks the insn correctness in

             if (!constrain_operands (1))
               fatal_insn_not_found (insn);

   and fails.  The reason for this code in 
rs6000.c::legitimate_lo_sum_address_p

       if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
           && !(/* ??? Assume floating point reg based on mode?  */
                TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
                && (mode == DFmode || mode == DDmode)))
         return false;

   lra_in_progress is false and mode==TFmode.  I don't want to setup 
lra_in_progress to true at this stage as I need right correctness check 
(lra_in_progress may affect code checking RTLcorrectness, make the test 
less rigourous).

   If I change the above code to

       if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
           && !(/* ??? Assume floating point reg based on mode?  */
                TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
                && (mode == DFmode || mode == DDmode || mode == TFmode 
|| mode == TDmode)))
         return false;

   the crash is gone.

   I don't understand what this check means and what comments ??? means too.

   So it is up to you, Mike, to decide is my change correct or not.

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-19 21:32           ` Vladimir Makarov
@ 2013-04-22 10:56             ` Alan Modra
  2013-04-22 20:57               ` Vladimir Makarov
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Modra @ 2013-04-22 10:56 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Michael Meissner, gcc-patches, David Edelsohn, Bergner, Peter, aavrunin

On Fri, Apr 19, 2013 at 05:16:43PM -0400, Vladimir Makarov wrote:
>   If I change the above code to
> 
>       if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
>           && !(/* ??? Assume floating point reg based on mode?  */
>                TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
>                && (mode == DFmode || mode == DDmode || mode ==
> TFmode || mode == TDmode)))
>         return false;
> 
>   the crash is gone.

A lo_sum address isn't always offsettable.  You know that you can't
read or write the whole TFmode with one instruction, and that the mem
will be accessed at addr+0 and addr+8 (and possibly at addr+4 and
addr+12).  If it so happens that addr is n*65536+32760, then addr+8
will overflow the 16-bit lo_sum and you'll actually try to access
n*65536-32768 for the second part.

So I don't think your change is correct.

>   I don't understand what this check means and what comments ??? means too.

A lo_sum mem is only valid if you know it won't be offset (or that
offsetting will never cross a 64k+32k boundary).  If the access is
smaller than a word then the load or store can be done in one insn.
No offset required.  If the access is a DFmode *and* you are loading
or storing a floating point reg, then the access is also done in one
insn.  The ??? comment is referring to the fact that you don't know
for sure that the DFmode is in a floating point reg.  It usually is,
but may be in two general purpose regs.  Which then need an offset to
load/store the second reg.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-22 10:56             ` Alan Modra
@ 2013-04-22 20:57               ` Vladimir Makarov
  2013-04-22 20:59                 ` Michael Meissner
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-22 20:57 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bergner, Peter, aavrunin

On 13-04-22 12:35 AM, Alan Modra wrote:
> On Fri, Apr 19, 2013 at 05:16:43PM -0400, Vladimir Makarov wrote:
>>    I don't understand what this check means and what comments ??? means too.
> A lo_sum mem is only valid if you know it won't be offset (or that
> offsetting will never cross a 64k+32k boundary).  If the access is
> smaller than a word then the load or store can be done in one insn.
> No offset required.  If the access is a DFmode *and* you are loading
> or storing a floating point reg, then the access is also done in one
> insn.  The ??? comment is referring to the fact that you don't know
> for sure that the DFmode is in a floating point reg.  It usually is,
> but may be in two general purpose regs.  Which then need an offset to
> load/store the second reg.
>
Alan, thanks for the explanation. I'll search for another solution.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-22 20:57               ` Vladimir Makarov
@ 2013-04-22 20:59                 ` Michael Meissner
  2013-04-23  8:14                   ` Vladimir Makarov
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Meissner @ 2013-04-22 20:59 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Michael Meissner, gcc-patches, David Edelsohn, Bergner, Peter, aavrunin

On Mon, Apr 22, 2013 at 03:26:45PM -0400, Vladimir Makarov wrote:
> On 13-04-22 12:35 AM, Alan Modra wrote:
> >On Fri, Apr 19, 2013 at 05:16:43PM -0400, Vladimir Makarov wrote:
> >>   I don't understand what this check means and what comments ??? means too.
> >A lo_sum mem is only valid if you know it won't be offset (or that
> >offsetting will never cross a 64k+32k boundary).  If the access is
> >smaller than a word then the load or store can be done in one insn.
> >No offset required.  If the access is a DFmode *and* you are loading
> >or storing a floating point reg, then the access is also done in one
> >insn.  The ??? comment is referring to the fact that you don't know
> >for sure that the DFmode is in a floating point reg.  It usually is,
> >but may be in two general purpose regs.  Which then need an offset to
> >load/store the second reg.
> >
> Alan, thanks for the explanation. I'll search for another solution.

I'm suspecting secondary_reload needs more tuning for TF/TD modes.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-22 20:59                 ` Michael Meissner
@ 2013-04-23  8:14                   ` Vladimir Makarov
  2013-04-23  9:05                     ` David Edelsohn
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-23  8:14 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bergner, Peter, aavrunin

[-- Attachment #1: Type: text/plain, Size: 1804 bytes --]

On 13-04-22 3:31 PM, Michael Meissner wrote:
> On Mon, Apr 22, 2013 at 03:26:45PM -0400, Vladimir Makarov wrote:
>> On 13-04-22 12:35 AM, Alan Modra wrote:
>>> On Fri, Apr 19, 2013 at 05:16:43PM -0400, Vladimir Makarov wrote:
>>>>    I don't understand what this check means and what comments ??? means too.
>>> A lo_sum mem is only valid if you know it won't be offset (or that
>>> offsetting will never cross a 64k+32k boundary).  If the access is
>>> smaller than a word then the load or store can be done in one insn.
>>> No offset required.  If the access is a DFmode *and* you are loading
>>> or storing a floating point reg, then the access is also done in one
>>> insn.  The ??? comment is referring to the fact that you don't know
>>> for sure that the DFmode is in a floating point reg.  It usually is,
>>> but may be in two general purpose regs.  Which then need an offset to
>>> load/store the second reg.
>>>
>> Alan, thanks for the explanation. I'll search for another solution.
> I'm suspecting secondary_reload needs more tuning for TF/TD modes.
>
I've fixed dealII crash and commited the patch into the branch as rev. 
198169.

The dealII crash itself can be cured by treatment of lo_sum for LRA the 
same way as for reload (please see code checking modes for addressing 
more one word).  But in this case a few tests fail which is cured in LRA 
itself by trying to load address into pseudo using lo_sum.

The patch was successfully bootstrapped (--with-cpu=power7) and tested 
on GCC testsuite.

2013-04-22  Vladimir Makarov <vmakarov@redhat.com>

         * lra-constraints.c (process_address): Try to put lo_sum into
         register.
         * config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove
         lra_in_progress guard for addressing something bigger than word.



[-- Attachment #2: deal.patch --]
[-- Type: text/plain, Size: 1991 bytes --]

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 198101)
+++ ChangeLog	(working copy)
@@ -1,3 +1,10 @@
+2013-04-22  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* lra-constraints.c (process_address): Try to put lo_sum into
+	register.
+	* config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove
+	lra_in_progress guard for addressing something bigger than word.
+
 2013-04-18  Vladimir Makarov  <vmakarov@redhat.com>
 
 	* lra-constraints.c (check_and_process_move): Move code for move
Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c	(revision 198101)
+++ config/rs6000/rs6000.c	(working copy)
@@ -5736,7 +5736,7 @@ legitimate_lo_sum_address_p (enum machin
 	return false;
       if (GET_MODE_NUNITS (mode) != 1)
 	return false;
-      if (! lra_in_progress && GET_MODE_SIZE (mode) > UNITS_PER_WORD
+      if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
 	  && !(/* ??? Assume floating point reg based on mode?  */
 	       TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
 	       && (mode == DFmode || mode == DDmode)))
Index: lra-constraints.c
===================================================================
--- lra-constraints.c	(revision 198101)
+++ lra-constraints.c	(working copy)
@@ -2504,8 +2504,21 @@ process_address (int nop, rtx *before, r
 		*ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
 		if (! valid_address_p (ad.mode, *ad.outer, ad.as))
 		  {
-		    *ad.inner = addr;
-		    code = -1;
+		    /* Try to put lo_sum into register.  */
+		    insn = emit_insn (gen_rtx_SET
+				      (VOIDmode, new_reg,
+				       gen_rtx_LO_SUM (Pmode, new_reg, addr)));
+		    code = recog_memoized (insn);
+		    if (code >= 0)
+		      {
+			*ad.inner = new_reg;
+			if (! valid_address_p (ad.mode, *ad.outer, ad.as))
+			  {
+			    *ad.inner = addr;
+			    code = -1;
+			  }
+		      }
+		    
 		  }
 	      }
 	    if (code < 0)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-23  8:14                   ` Vladimir Makarov
@ 2013-04-23  9:05                     ` David Edelsohn
  2013-04-23 21:02                       ` Vladimir Makarov
  0 siblings, 1 reply; 33+ messages in thread
From: David Edelsohn @ 2013-04-23  9:05 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Michael Meissner, gcc-patches, Bergner, Peter, aavrunin

On Mon, Apr 22, 2013 at 8:43 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:

>         * config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove
>         lra_in_progress guard for addressing something bigger than word.

That will work, but I'm worried that it is too fragile.  Previously
the code uniformly returned consistent values from the various
legitimate address functions.  Changing the response based on
lra_in_progress for various modes seems like a problem waiting to
happen.

Thanks, David

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-23  9:05                     ` David Edelsohn
@ 2013-04-23 21:02                       ` Vladimir Makarov
  2013-04-23 21:32                         ` David Edelsohn
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-23 21:02 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, gcc-patches, Bergner, Peter, aavrunin

On 04/22/2013 09:55 PM, David Edelsohn wrote:
> On Mon, Apr 22, 2013 at 8:43 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>>          * config/rs6000/rs6000.c (legitimate_lo_sum_address_p): Remove
>>          lra_in_progress guard for addressing something bigger than word.
> That will work, but I'm worried that it is too fragile.  Previously
> the code uniformly returned consistent values from the various
> legitimate address functions.  Changing the response based on
> lra_in_progress for various modes seems like a problem waiting to
> happen.
>
   LRA approach is different from reload one.  First of all LRA can 
create pseudos (not assigned yet to anything) and secondly LRA makes 
changes incrementally opposite to reload which makes all final rewriting 
code when it decides that a final solution is found.  The difference in 
approaches means that LRA can not use all reload macros and hooks 
completely without changes.  Although I tried to minimize the changes, 
they are still present.  For example, LRA will never use hooks which 
call push_reload which is completely oriented to reload pass.

   I'd not say that "the code uniformly returned consistent values". One 
place where lra_in_progress is needed exactly because of discrepancy 
between legitimize reload address hook (which can call push_reload and 
can not be used by LRA) and legitimate address hook.  Two other places 
in rs6000_emit_move for SDmode where lra_in_progress is used are because 
LRA can create and use spilled pseudos instead of using concrete memory 
slot as reload does.  And two the rest places are in calls of 
legitimate_const_pool_address_p where LRA actually querying in strict 
sense.  So I think the changes are minimal.

David, thanks for expressing the concern.  I hope I answered it. The 
future will show the reality not just our speculations.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-23 21:02                       ` Vladimir Makarov
@ 2013-04-23 21:32                         ` David Edelsohn
  2013-04-23 21:45                           ` Vladimir Makarov
  0 siblings, 1 reply; 33+ messages in thread
From: David Edelsohn @ 2013-04-23 21:32 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Michael Meissner, gcc-patches, Bergner, Peter, aavrunin

On Tue, Apr 23, 2013 at 11:03 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:

>   LRA approach is different from reload one.  First of all LRA can create
> pseudos (not assigned yet to anything) and secondly LRA makes changes
> incrementally opposite to reload which makes all final rewriting code when
> it decides that a final solution is found.  The difference in approaches
> means that LRA can not use all reload macros and hooks completely without
> changes.  Although I tried to minimize the changes, they are still present.
> For example, LRA will never use hooks which call push_reload which is
> completely oriented to reload pass.
>
>   I'd not say that "the code uniformly returned consistent values". One
> place where lra_in_progress is needed exactly because of discrepancy between
> legitimize reload address hook (which can call push_reload and can not be
> used by LRA) and legitimate address hook.  Two other places in
> rs6000_emit_move for SDmode where lra_in_progress is used are because LRA
> can create and use spilled pseudos instead of using concrete memory slot as
> reload does.  And two the rest places are in calls of
> legitimate_const_pool_address_p where LRA actually querying in strict sense.
> So I think the changes are minimal.
>
> David, thanks for expressing the concern.  I hope I answered it. The future
> will show the reality not just our speculations.


Vlad,

I don't think that you understood my concern.  I am not concerned
about adding lra_in_progress.  I am concerned about combining
lra_in_progress with the mode.  I would like an explanation why it is
correct for the result of legitimate address functions to vary with
lra_in_progress combined with the mode.  Not simply that it works.

The issue should be if an offsettable address is valid and the size of
the offset.

Thanks, David

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-23 21:32                         ` David Edelsohn
@ 2013-04-23 21:45                           ` Vladimir Makarov
  2013-04-23 22:02                             ` David Edelsohn
  2013-04-24 22:51                             ` Michael Meissner
  0 siblings, 2 replies; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-23 21:45 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, gcc-patches, Bergner, Peter, aavrunin

On 04/23/2013 11:33 AM, David Edelsohn wrote:
> On Tue, Apr 23, 2013 at 11:03 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>>    LRA approach is different from reload one.  First of all LRA can create
>> pseudos (not assigned yet to anything) and secondly LRA makes changes
>> incrementally opposite to reload which makes all final rewriting code when
>> it decides that a final solution is found.  The difference in approaches
>> means that LRA can not use all reload macros and hooks completely without
>> changes.  Although I tried to minimize the changes, they are still present.
>> For example, LRA will never use hooks which call push_reload which is
>> completely oriented to reload pass.
>>
>>    I'd not say that "the code uniformly returned consistent values". One
>> place where lra_in_progress is needed exactly because of discrepancy between
>> legitimize reload address hook (which can call push_reload and can not be
>> used by LRA) and legitimate address hook.  Two other places in
>> rs6000_emit_move for SDmode where lra_in_progress is used are because LRA
>> can create and use spilled pseudos instead of using concrete memory slot as
>> reload does.  And two the rest places are in calls of
>> legitimate_const_pool_address_p where LRA actually querying in strict sense.
>> So I think the changes are minimal.
>>
>> David, thanks for expressing the concern.  I hope I answered it. The future
>> will show the reality not just our speculations.
>
> Vlad,
>
> I don't think that you understood my concern.  I am not concerned
> about adding lra_in_progress.  I am concerned about combining
> lra_in_progress with the mode.  I would like an explanation why it is
> correct for the result of legitimate address functions to vary with
> lra_in_progress combined with the mode.  Not simply that it works.
>
> The issue should be if an offsettable address is valid and the size of
> the offset.
>
Sorry, may be I missed some other places but
could you be more specific about the place where combining 
lra_in_progress and mode happens now for legitimate address querying.

I guess I explained in my previous emails why the following change is 
necessary in legitimate_lo_sum_address_p:

       toc_ok_p = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
                   && small_toc_ref (x, VOIDmode));

lra_in_progress was removed in my latest change which now looks like the 
original code:

       if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
           && !(/* ??? Assume floating point reg based on mode?  */
                TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
                && (mode == DFmode || mode == DDmode)))
         return false;

So I can not see other places.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-23 21:45                           ` Vladimir Makarov
@ 2013-04-23 22:02                             ` David Edelsohn
  2013-04-24 22:51                             ` Michael Meissner
  1 sibling, 0 replies; 33+ messages in thread
From: David Edelsohn @ 2013-04-23 22:02 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Michael Meissner, gcc-patches, Bergner, Peter, aavrunin

On Tue, Apr 23, 2013 at 11:37 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:

> Sorry, may be I missed some other places but
> could you be more specific about the place where combining lra_in_progress
> and mode happens now for legitimate address querying.
>
> I guess I explained in my previous emails why the following change is
> necessary in legitimate_lo_sum_address_p:
>
>       toc_ok_p = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
>                   && small_toc_ref (x, VOIDmode));
>
> lra_in_progress was removed in my latest change which now looks like the
> original code:
>
>       if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
>
>           && !(/* ??? Assume floating point reg based on mode?  */
>                TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
>                && (mode == DFmode || mode == DDmode)))
>         return false;
>
> So I can not see other places.

Okay. I misunderstood the change. I thought that you were adding
lra_in_progress tied to the mode.

Thanks, David

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-23 21:45                           ` Vladimir Makarov
  2013-04-23 22:02                             ` David Edelsohn
@ 2013-04-24 22:51                             ` Michael Meissner
  2013-04-25 19:27                               ` Vladimir Makarov
  1 sibling, 1 reply; 33+ messages in thread
From: Michael Meissner @ 2013-04-24 22:51 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: David Edelsohn, Michael Meissner, gcc-patches, Bergner, Peter, aavrunin

I'm seeing a lot of failures with these changes in make check.

The first two that I noticed on a build that did not use --with-cpu=power7:

1) c-c++-common/dfp/call-by-value.c (and others in the directory) fails with -O0
for all targets before power7 because it can't spill SDmode.  Note, in the
earlier targets, we need to have a wider spill slot because we don't have
32-bit integer load/store to the FPR registers.

FAIL: c-c++-common/dfp/call-by-value.c (internal compiler error)
FAIL: c-c++-common/dfp/call-by-value.c (test for excess errors)
Excess errors:
/home/meissner/fsf-src/meissner-lra/gcc/testsuite/c-c++-common/dfp/call-by-value.c:43:1: internal compiler error: in assign_by_spills, at lra-assigns.c:1268
0x104ceff7 assign_by_spills
        /home/meissner/fsf-src/meissner-lra/gcc/lra-assigns.c:1268
0x104cfe43 lra_assign()
        /home/meissner/fsf-src/meissner-lra/gcc/lra-assigns.c:1425
0x104ca837 lra(_IO_FILE*)
        /home/meissner/fsf-src/meissner-lra/gcc/lra.c:2309
0x1047d6eb do_reload
        /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4619
0x1047d6eb rest_of_handle_reload
        /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4731

2) A lot of fortran tests are failing for all optimization levels due to
segmentation violations at runtime:

AIL: gfortran.dg/advance_1.f90  -O0  execution test
Executing on host: /home/meissner/fsf-build-ppc64/meissner-lra/gcc/testsuite/gfortran1/../../gfortran -B/home/meissner/fsf-build-ppc64/meissner-lra/gcc/testsuite/gfortran1/../../ -B/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unkno
wn-linux-gnu/./libgfortran/ /home/meissner/fsf-src/meissner-lra/gcc/testsuite/gfortran.dg/advance_1.f90  -fno-diagnostics-show-caret   -O1   -pedantic-errors  -B/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libg
fortran/.libs -L/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs -L/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs  -lm   -m32 -o ./advance_1.exe    (t
imeout = 300)
spawn /home/meissner/fsf-build-ppc64/meissner-lra/gcc/testsuite/gfortran1/../../gfortran -B/home/meissner/fsf-build-ppc64/meissner-lra/gcc/testsuite/gfortran1/../../ -B/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/
./libgfortran/ /home/meissner/fsf-src/meissner-lra/gcc/testsuite/gfortran.dg/advance_1.f90 -fno-diagnostics-show-caret -O1 -pedantic-errors -B/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs -L/ho
me/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs -L/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs -lm -m32 -o ./advance_1.exe
PASS: gfortran.dg/advance_1.f90  -O1  (test for excess errors)
Setting LD_LIBRARY_PATH to .:/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs:/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs:/home/meissner/fsf-build-
ppc64/meissner-lra/gcc:/home/meissner/fsf-build-ppc64/meissner-lra/gcc/32:.:/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu/32/libgfortran/.libs:/home/meissner/fsf-build-ppc64/meissner-lra/powerpc64-unknown-linux-gnu
/32/libgfortran/.libs:/home/meissner/fsf-build-ppc64/meissner-lra/gcc:/home/meissner/fsf-build-ppc64/meissner-lra/gcc/32:/home/meissner/tools/ppc64/lib:/home/meissner/tools/ppc32/lib:/home/meissner/tools-binutils/ppc64/lib:/home/meissner/to
ols-binutils/ppc32/lib
spawn [open ...]

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-24 22:51                             ` Michael Meissner
@ 2013-04-25 19:27                               ` Vladimir Makarov
  2013-04-25 22:29                                 ` Vladimir Makarov
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-25 19:27 UTC (permalink / raw)
  To: Michael Meissner, David Edelsohn, gcc-patches, Bergner, Peter, aavrunin

On 04/24/2013 03:42 PM, Michael Meissner wrote:
> I'm seeing a lot of failures with these changes in make check.
>
I've reproduced it, Mike.  I'll work on it.  Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [patch for WRF]
  2013-04-25 19:27                               ` Vladimir Makarov
@ 2013-04-25 22:29                                 ` Vladimir Makarov
  2013-04-26 22:21                                   ` RFA: enable LRA for rs6000 Michael Meissner
  2013-04-27  0:34                                   ` RFA: enable LRA for rs6000 [32-bit fortran] Michael Meissner
  0 siblings, 2 replies; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-25 22:29 UTC (permalink / raw)
  To: Michael Meissner, David Edelsohn, gcc-patches, Bergner, Peter, aavrunin

On 04/25/2013 11:29 AM, Vladimir Makarov wrote:
> On 04/24/2013 03:42 PM, Michael Meissner wrote:
>> I'm seeing a lot of failures with these changes in make check.
>>
> I've reproduced it, Mike.  I'll work on it.  Thanks.
>
I've fixed it, Mike.  It is on the branch now.

2013-04-25  Vladimir Makarov  <vmakarov@redhat.com>

         * config/rs6000/rs6000.c (rs6000_emit_move): Use assigned hard reg
         for LRA SD moves.

Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c      (revision 198263)
+++ config/rs6000/rs6000.c      (working copy)
@@ -7377,9 +7377,14 @@ rs6000_emit_move (rtx dest, rtx source,

        if (regno >= FIRST_PSEUDO_REGISTER)
     {
-         cl = reg_preferred_class (regno);
-         gcc_assert (cl != NO_REGS);
-         regno = ira_class_hard_regs[cl][0];
+         if (reg_renumber[regno] >= 0)
+           regno = reg_renumber[regno];
+         else
+           {
+             cl = reg_preferred_class (regno);
+             gcc_assert (cl != NO_REGS);
+             regno = ira_class_hard_regs[cl][0];
+           }
         }
        if (FP_REGNO_P (regno))
     {
@@ -7407,9 +7412,14 @@ rs6000_emit_move (rtx dest, rtx source,

        if (regno >= FIRST_PSEUDO_REGISTER)
         {
-         cl = reg_preferred_class (regno);
-         gcc_assert (cl != NO_REGS);
-         regno = ira_class_hard_regs[cl][0];
+         if (reg_renumber[regno] >= 0)
+           regno = reg_renumber[regno];
+         else
+           {
+             cl = reg_preferred_class (regno);
+             gcc_assert (cl != NO_REGS);
+             regno = ira_class_hard_regs[cl][0];
+           }
         }
        if (FP_REGNO_P (regno))
         {



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000
  2013-04-25 22:29                                 ` Vladimir Makarov
@ 2013-04-26 22:21                                   ` Michael Meissner
  2013-04-27  8:07                                     ` Vladimir Makarov
  2013-04-27  0:34                                   ` RFA: enable LRA for rs6000 [32-bit fortran] Michael Meissner
  1 sibling, 1 reply; 33+ messages in thread
From: Michael Meissner @ 2013-04-26 22:21 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Michael Meissner, David Edelsohn, gcc-patches, Bergner, Peter, aavrunin

Vlad, in going through the LRA test differences, some of the bswap64 tests are
failing because LRA converts the swaps for register/register converts into
store/load.  For example, if gcc.target/powerpc/bswap64-4.c is compiled on
32-bit, for this function:

long long swap_reg (long long a) { return __builtin_bswap64 (a); }

LRA gives:

swap_reg:
        stwu 1,-16(1)
        li 9,4
        stw 3,8(1)
        stw 4,12(1)
        addi 10,1,8
        lwbrx 3,9,10
        lwbrx 4,0,10
        addi 1,1,16
        blr

And the traditional code generation is:

swap_reg:
        rlwinm 9,4,8,0xffffffff
        rlwinm 10,3,8,0xffffffff
        rlwimi 9,4,24,0,7
        rlwimi 10,3,24,0,7
        rlwimi 9,4,24,16,23
        rlwimi 10,3,24,16,23
        mr 4,10
        mr 3,9

I assume the rlwinm's are to be preferred because there is no LHS, and also in
this case, the 2 registers rlwinm's are done in parallel.

The test gcc.target/powerpc/vect-83_64.c is failing in LRA:

vect-83_64.c: In function ‘main1’:
vect-83_64.c:30:1: internal compiler error: Max. number of generated reload insns per insn is achieved (90)

 }
 ^
0x104dca7f lra_constraints(bool)
        /home/meissner/fsf-src/meissner-lra/gcc/lra-constraints.c:3613
0x104ca67b lra(_IO_FILE*)
        /home/meissner/fsf-src/meissner-lra/gcc/lra.c:2278
0x1047d6eb do_reload
        /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4619
0x1047d6eb rest_of_handle_reload
        /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4731
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

I'm also seeing quite a few Fortran failures for -m32:

gfortran.dg/PR19872.f
gfortran.dg/advance_1.f90
gfortran.dg/advance_4.f90
gfortran.dg/advance_5.f90
gfortran.dg/advance_6.f90
gfortran.dg/append_1.f90
gfortran.dg/associated_2.f90
gfortran.dg/assumed_rank_1.f90
gfortran.dg/assumed_rank_2.f90
gfortran.dg/assumed_rank_7.f90
gfortran.dg/assumed_type_2.f90
gfortran.dg/backspace_10.f90
gfortran.dg/backspace_2.f
gfortran.dg/backspace_8.f
gfortran.dg/backspace_9.f
gfortran.dg/bound_2.f90
gfortran.dg/bound_7.f90
gfortran.dg/bound_8.f90
gfortran.dg/char_cshift_1.f90
gfortran.dg/char_cshift_2.f90
gfortran.dg/char_cshift_3.f90
gfortran.dg/char_eoshift_1.f90
gfortran.dg/char_eoshift_2.f90
gfortran.dg/char_eoshift_3.f90
gfortran.dg/char_eoshift_4.f90
gfortran.dg/char_eoshift_5.f90
gfortran.dg/char_length_8.f90
gfortran.dg/chmod_1.f90
gfortran.dg/chmod_2.f90
gfortran.dg/chmod_3.f90
gfortran.dg/comma.f
gfortran.dg/convert_2.f90
gfortran.dg/convert_implied_open.f90
gfortran.dg/cr_lf.f90
gfortran.dg/cshift_bounds_1.f90
gfortran.dg/cshift_bounds_2.f90
gfortran.dg/cshift_bounds_3.f90
gfortran.dg/cshift_bounds_4.f90
gfortran.dg/cshift_nan_1.f90
gfortran.dg/dev_null.F90
gfortran.dg/direct_io_1.f90
gfortran.dg/direct_io_11.f90
gfortran.dg/direct_io_12.f90
gfortran.dg/direct_io_2.f90
gfortran.dg/direct_io_3.f90
gfortran.dg/direct_io_5.f90
gfortran.dg/direct_io_8.f90
gfortran.dg/endfile.f90
gfortran.dg/endfile_2.f90
gfortran.dg/eof_4.f90
gfortran.dg/eoshift.f90
gfortran.dg/eoshift_bounds_1.f90
gfortran.dg/error_format.f90
gfortran.dg/f2003_inquire_1.f03
gfortran.dg/f2003_io_1.f03
gfortran.dg/f2003_io_5.f03
gfortran.dg/f2003_io_7.f03
gfortran.dg/fmt_cache_1.f
gfortran.dg/fmt_error_4.f90
gfortran.dg/fmt_error_5.f90
gfortran.dg/fmt_t_5.f90
gfortran.dg/fmt_t_7.f
gfortran.dg/ftell_3.f90
gfortran.dg/hollerith4.f90
gfortran.dg/inquire_10.f90
gfortran.dg/inquire_13.f90
gfortran.dg/inquire_15.f90
gfortran.dg/inquire_9.f90
gfortran.dg/inquire_size.f90
gfortran.dg/iomsg_1.f90
gfortran.dg/iostat_2.f90
gfortran.dg/list_read_10.f90
gfortran.dg/list_read_6.f90
gfortran.dg/list_read_7.f90
gfortran.dg/list_read_9.f90
gfortran.dg/matmul_1.f90
gfortran.dg/matmul_5.f90
gfortran.dg/maxloc_bounds_1.f90
gfortran.dg/maxloc_bounds_2.f90
gfortran.dg/maxloc_bounds_3.f90
gfortran.dg/maxloc_bounds_6.f90
gfortran.dg/maxloc_bounds_8.f90
gfortran.dg/namelist_44.f90
gfortran.dg/namelist_45.f90
gfortran.dg/namelist_46.f90
gfortran.dg/namelist_66.f90
gfortran.dg/namelist_72.f
gfortran.dg/namelist_82.f90
gfortran.dg/negative_automatic_size.f90
gfortran.dg/negative_unit.f
gfortran.dg/negative_unit_int8.f
gfortran.dg/newunit_1.f90
gfortran.dg/newunit_3.f90
gfortran.dg/open_access_append_1.f90
gfortran.dg/open_errors.f90
gfortran.dg/open_negative_unit_1.f90
gfortran.dg/open_new.f90
gfortran.dg/open_readonly_1.f90
gfortran.dg/open_status_1.f90
gfortran.dg/open_status_2.f90
gfortran.dg/open_status_3.f90
gfortran.dg/optional_dim_2.f90
gfortran.dg/optional_dim_3.f90
gfortran.dg/overwrite_1.f
gfortran.dg/pr16597.f90
gfortran.dg/pr16935.f90
gfortran.dg/pr20954.f
gfortran.dg/pr39865.f90
gfortran.dg/pr46804.f90
gfortran.dg/pr47878.f90
gfortran.dg/read_comma.f
gfortran.dg/read_eof_4.f90
gfortran.dg/read_eof_8.f90
gfortran.dg/read_eof_all.f90
gfortran.dg/read_list_eof_1.f90
gfortran.dg/read_many_1.f
gfortran.dg/read_no_eor.f90
gfortran.dg/readwrite_unf_direct_eor_1.f90
gfortran.dg/realloc_on_assign_11.f90
gfortran.dg/realloc_on_assign_7.f03
gfortran.dg/record_marker_1.f90
gfortran.dg/record_marker_3.f90
gfortran.dg/runtime_warning_1.f90
gfortran.dg/selected_char_kind_1.f90
gfortran.dg/selected_char_kind_4.f90
gfortran.dg/shift-alloc.f90
gfortran.dg/shift-kind_2.f90
gfortran.dg/stat_1.f90
gfortran.dg/stat_2.f90
gfortran.dg/streamio_1.f90
gfortran.dg/streamio_10.f90
gfortran.dg/streamio_12.f90
gfortran.dg/streamio_14.f90
gfortran.dg/streamio_15.f90
gfortran.dg/streamio_16.f90
gfortran.dg/streamio_2.f90
gfortran.dg/streamio_3.f90
gfortran.dg/streamio_4.f90
gfortran.dg/streamio_5.f90
gfortran.dg/streamio_6.f90
gfortran.dg/streamio_7.f90
gfortran.dg/streamio_8.f90
gfortran.dg/streamio_9.f90
gfortran.dg/tl_editing.f90
gfortran.dg/unf_io_convert_1.f90
gfortran.dg/unf_io_convert_2.f90
gfortran.dg/unf_io_convert_3.f90
gfortran.dg/unf_io_convert_4.f90
gfortran.dg/unf_read_corrupted_1.f90
gfortran.dg/unf_short_record_1.f90
gfortran.dg/unformatted_subrecord_1.f90
gfortran.dg/unpack_bounds_1.f90
gfortran.dg/unpack_bounds_2.f90
gfortran.dg/unpack_bounds_3.f90
gfortran.dg/widechar_intrinsics_10.f90
gfortran.dg/widechar_intrinsics_5.f90
gfortran.dg/write_back.f
gfortran.dg/write_check.f90
gfortran.dg/write_check3.f90
gfortran.dg/write_direct_eor.f90
gfortran.dg/write_rewind_1.f
gfortran.dg/write_rewind_2.f
gfortran.dg/write_to_null.F90
gfortran.dg/x_slash_2.f
gfortran.dg/zero_sized_1.f90
gfortran.fortran-torture/execute/backspace.f90
gfortran.fortran-torture/execute/direct_io.f90
gfortran.fortran-torture/execute/inquire_1.f90
gfortran.fortran-torture/execute/inquire_2.f90
gfortran.fortran-torture/execute/inquire_3.f90
gfortran.fortran-torture/execute/inquire_4.f90
gfortran.fortran-torture/execute/inquire_5.f90
gfortran.fortran-torture/execute/intrinsic_associated.f90
gfortran.fortran-torture/execute/intrinsic_associated_2.f90
gfortran.fortran-torture/execute/intrinsic_cshift.f90
gfortran.fortran-torture/execute/intrinsic_eoshift.f90
gfortran.fortran-torture/execute/intrinsic_size.f90
gfortran.fortran-torture/execute/list_read_1.f90
gfortran.fortran-torture/execute/open_replace.f90
gfortran.fortran-torture/execute/seq_io.f90
gfortran.fortran-torture/execute/slash_edit.f90
gfortran.fortran-torture/execute/unopened_unit_1.f90

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [32-bit fortran]
  2013-04-25 22:29                                 ` Vladimir Makarov
  2013-04-26 22:21                                   ` RFA: enable LRA for rs6000 Michael Meissner
@ 2013-04-27  0:34                                   ` Michael Meissner
  2013-04-27  8:10                                     ` Vladimir Makarov
  1 sibling, 1 reply; 33+ messages in thread
From: Michael Meissner @ 2013-04-27  0:34 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Michael Meissner, David Edelsohn, gcc-patches, Bergner, Peter, aavrunin

In addition to all of the failures in the 32-bit gfortrain suite, I ran one run
of the 32-bit spec 2006 fortan tests, and the following benchmarks fail:

	410.bwaves	416.gamess	434.zeusmp
	437.leslie3d	454.calculix	459.GemsFDTD
	465.tonto	481.wrf

The following 2 benchmarks succeed:

	435.gromacs	436.cactusADM

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000
  2013-04-26 22:21                                   ` RFA: enable LRA for rs6000 Michael Meissner
@ 2013-04-27  8:07                                     ` Vladimir Makarov
  2013-04-27  8:33                                       ` Michael Meissner
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-27  8:07 UTC (permalink / raw)
  To: Michael Meissner, David Edelsohn, gcc-patches, Bergner, Peter, aavrunin

On 13-04-26 11:30 AM, Michael Meissner wrote:
> Vlad, in going through the LRA test differences, some of the bswap64 tests are
> failing because LRA converts the swaps for register/register converts into
> store/load.  For example, if gcc.target/powerpc/bswap64-4.c is compiled on
> 32-bit, for this function:
>
> long long swap_reg (long long a) { return __builtin_bswap64 (a); }
>
> LRA gives:
>
> swap_reg:
>          stwu 1,-16(1)
>          li 9,4
>          stw 3,8(1)
>          stw 4,12(1)
>          addi 10,1,8
>          lwbrx 3,9,10
>          lwbrx 4,0,10
>          addi 1,1,16
>          blr
>
> And the traditional code generation is:
>
> swap_reg:
>          rlwinm 9,4,8,0xffffffff
>          rlwinm 10,3,8,0xffffffff
>          rlwimi 9,4,24,0,7
>          rlwimi 10,3,24,0,7
>          rlwimi 9,4,24,16,23
>          rlwimi 10,3,24,16,23
>          mr 4,10
>          mr 3,9
>
> I assume the rlwinm's are to be preferred because there is no LHS, and also in
> this case, the 2 registers rlwinm's are done in parallel.
>
> The test gcc.target/powerpc/vect-83_64.c is failing in LRA:
>
> vect-83_64.c: In function ‘main1’:
> vect-83_64.c:30:1: internal compiler error: Max. number of generated reload insns per insn is achieved (90)
>
>   }
>   ^
> 0x104dca7f lra_constraints(bool)
>          /home/meissner/fsf-src/meissner-lra/gcc/lra-constraints.c:3613
> 0x104ca67b lra(_IO_FILE*)
>          /home/meissner/fsf-src/meissner-lra/gcc/lra.c:2278
> 0x1047d6eb do_reload
>          /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4619
> 0x1047d6eb rest_of_handle_reload
>          /home/meissner/fsf-src/meissner-lra/gcc/ira.c:4731
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <http://gcc.gnu.org/bugs.html> for instructions.

The patch below solved the above problems.  The problem was in that I 
increased alternatives with '?' (although manual says that it should 
affect only regclass).  It improves slightly x86/x86-64 generated code 
performance.  But power has some strange insns with ???? (four ?).  It 
looks a woodo to me.

I've committed the patch into the branch.


> I'm also seeing quite a few Fortran failures for -m32:
It seems fixed too.
> gfortran.dg/PR19872.f
> gfortran.dg/advance_1.f90
> gfortran.dg/advance_4.f90
> gfortran.dg/advance_5.f90
> gfortran.dg/advance_6.f90
> gfortran.dg/append_1.f90
> gfortran.dg/associated_2.f90
> gfortran.dg/assumed_rank_1.f90
> gfortran.dg/assumed_rank_2.f90
> gfortran.dg/assumed_rank_7.f90
> gfortran.dg/assumed_type_2.f90
> gfortran.dg/backspace_10.f90
> gfortran.dg/backspace_2.f
> gfortran.dg/backspace_8.f
> gfortran.dg/backspace_9.f
> gfortran.dg/bound_2.f90
> gfortran.dg/bound_7.f90
> gfortran.dg/bound_8.f90
> gfortran.dg/char_cshift_1.f90
> gfortran.dg/char_cshift_2.f90
> gfortran.dg/char_cshift_3.f90
> gfortran.dg/char_eoshift_1.f90
> gfortran.dg/char_eoshift_2.f90
> gfortran.dg/char_eoshift_3.f90
> gfortran.dg/char_eoshift_4.f90
> gfortran.dg/char_eoshift_5.f90
> gfortran.dg/char_length_8.f90
> gfortran.dg/chmod_1.f90
> gfortran.dg/chmod_2.f90
> gfortran.dg/chmod_3.f90
> gfortran.dg/comma.f
> gfortran.dg/convert_2.f90
> gfortran.dg/convert_implied_open.f90
> gfortran.dg/cr_lf.f90
> gfortran.dg/cshift_bounds_1.f90
> gfortran.dg/cshift_bounds_2.f90
> gfortran.dg/cshift_bounds_3.f90
> gfortran.dg/cshift_bounds_4.f90
> gfortran.dg/cshift_nan_1.f90
> gfortran.dg/dev_null.F90
> gfortran.dg/direct_io_1.f90
> gfortran.dg/direct_io_11.f90
> gfortran.dg/direct_io_12.f90
> gfortran.dg/direct_io_2.f90
> gfortran.dg/direct_io_3.f90
> gfortran.dg/direct_io_5.f90
> gfortran.dg/direct_io_8.f90
> gfortran.dg/endfile.f90
> gfortran.dg/endfile_2.f90
> gfortran.dg/eof_4.f90
> gfortran.dg/eoshift.f90
> gfortran.dg/eoshift_bounds_1.f90
> gfortran.dg/error_format.f90
> gfortran.dg/f2003_inquire_1.f03
> gfortran.dg/f2003_io_1.f03
> gfortran.dg/f2003_io_5.f03
> gfortran.dg/f2003_io_7.f03
> gfortran.dg/fmt_cache_1.f
> gfortran.dg/fmt_error_4.f90
> gfortran.dg/fmt_error_5.f90
> gfortran.dg/fmt_t_5.f90
> gfortran.dg/fmt_t_7.f
> gfortran.dg/ftell_3.f90
> gfortran.dg/hollerith4.f90
> gfortran.dg/inquire_10.f90
> gfortran.dg/inquire_13.f90
> gfortran.dg/inquire_15.f90
> gfortran.dg/inquire_9.f90
> gfortran.dg/inquire_size.f90
> gfortran.dg/iomsg_1.f90
> gfortran.dg/iostat_2.f90
> gfortran.dg/list_read_10.f90
> gfortran.dg/list_read_6.f90
> gfortran.dg/list_read_7.f90
> gfortran.dg/list_read_9.f90
> gfortran.dg/matmul_1.f90
> gfortran.dg/matmul_5.f90
> gfortran.dg/maxloc_bounds_1.f90
> gfortran.dg/maxloc_bounds_2.f90
> gfortran.dg/maxloc_bounds_3.f90
> gfortran.dg/maxloc_bounds_6.f90
> gfortran.dg/maxloc_bounds_8.f90
> gfortran.dg/namelist_44.f90
> gfortran.dg/namelist_45.f90
> gfortran.dg/namelist_46.f90
> gfortran.dg/namelist_66.f90
> gfortran.dg/namelist_72.f
> gfortran.dg/namelist_82.f90
> gfortran.dg/negative_automatic_size.f90
> gfortran.dg/negative_unit.f
> gfortran.dg/negative_unit_int8.f
> gfortran.dg/newunit_1.f90
> gfortran.dg/newunit_3.f90
> gfortran.dg/open_access_append_1.f90
> gfortran.dg/open_errors.f90
> gfortran.dg/open_negative_unit_1.f90
> gfortran.dg/open_new.f90
> gfortran.dg/open_readonly_1.f90
> gfortran.dg/open_status_1.f90
> gfortran.dg/open_status_2.f90
> gfortran.dg/open_status_3.f90
> gfortran.dg/optional_dim_2.f90
> gfortran.dg/optional_dim_3.f90
> gfortran.dg/overwrite_1.f
> gfortran.dg/pr16597.f90
> gfortran.dg/pr16935.f90
> gfortran.dg/pr20954.f
> gfortran.dg/pr39865.f90
> gfortran.dg/pr46804.f90
> gfortran.dg/pr47878.f90
> gfortran.dg/read_comma.f
> gfortran.dg/read_eof_4.f90
> gfortran.dg/read_eof_8.f90
> gfortran.dg/read_eof_all.f90
> gfortran.dg/read_list_eof_1.f90
> gfortran.dg/read_many_1.f
> gfortran.dg/read_no_eor.f90
> gfortran.dg/readwrite_unf_direct_eor_1.f90
> gfortran.dg/realloc_on_assign_11.f90
> gfortran.dg/realloc_on_assign_7.f03
> gfortran.dg/record_marker_1.f90
> gfortran.dg/record_marker_3.f90
> gfortran.dg/runtime_warning_1.f90
> gfortran.dg/selected_char_kind_1.f90
> gfortran.dg/selected_char_kind_4.f90
> gfortran.dg/shift-alloc.f90
> gfortran.dg/shift-kind_2.f90
> gfortran.dg/stat_1.f90
> gfortran.dg/stat_2.f90
> gfortran.dg/streamio_1.f90
> gfortran.dg/streamio_10.f90
> gfortran.dg/streamio_12.f90
> gfortran.dg/streamio_14.f90
> gfortran.dg/streamio_15.f90
> gfortran.dg/streamio_16.f90
> gfortran.dg/streamio_2.f90
> gfortran.dg/streamio_3.f90
> gfortran.dg/streamio_4.f90
> gfortran.dg/streamio_5.f90
> gfortran.dg/streamio_6.f90
> gfortran.dg/streamio_7.f90
> gfortran.dg/streamio_8.f90
> gfortran.dg/streamio_9.f90
> gfortran.dg/tl_editing.f90
> gfortran.dg/unf_io_convert_1.f90
> gfortran.dg/unf_io_convert_2.f90
> gfortran.dg/unf_io_convert_3.f90
> gfortran.dg/unf_io_convert_4.f90
> gfortran.dg/unf_read_corrupted_1.f90
> gfortran.dg/unf_short_record_1.f90
> gfortran.dg/unformatted_subrecord_1.f90
> gfortran.dg/unpack_bounds_1.f90
> gfortran.dg/unpack_bounds_2.f90
> gfortran.dg/unpack_bounds_3.f90
> gfortran.dg/widechar_intrinsics_10.f90
> gfortran.dg/widechar_intrinsics_5.f90
> gfortran.dg/write_back.f
> gfortran.dg/write_check.f90
> gfortran.dg/write_check3.f90
> gfortran.dg/write_direct_eor.f90
> gfortran.dg/write_rewind_1.f
> gfortran.dg/write_rewind_2.f
> gfortran.dg/write_to_null.F90
> gfortran.dg/x_slash_2.f
> gfortran.dg/zero_sized_1.f90
> gfortran.fortran-torture/execute/backspace.f90
> gfortran.fortran-torture/execute/direct_io.f90
> gfortran.fortran-torture/execute/inquire_1.f90
> gfortran.fortran-torture/execute/inquire_2.f90
> gfortran.fortran-torture/execute/inquire_3.f90
> gfortran.fortran-torture/execute/inquire_4.f90
> gfortran.fortran-torture/execute/inquire_5.f90
> gfortran.fortran-torture/execute/intrinsic_associated.f90
> gfortran.fortran-torture/execute/intrinsic_associated_2.f90
> gfortran.fortran-torture/execute/intrinsic_cshift.f90
> gfortran.fortran-torture/execute/intrinsic_eoshift.f90
> gfortran.fortran-torture/execute/intrinsic_size.f90
> gfortran.fortran-torture/execute/list_read_1.f90
> gfortran.fortran-torture/execute/open_replace.f90
> gfortran.fortran-torture/execute/seq_io.f90
> gfortran.fortran-torture/execute/slash_edit.f90
> gfortran.fortran-torture/execute/unopened_unit_1.f90
>
2013-04-26  Vladimir Makarov  <vmakarov@redhat.com>

         * lra.c (setup_operand_alternative): Ignore '?'.
         * lra-constraints.c (process_alt_operands): Print cost dump for
         alternatives.  Check only moves for cycling.
         (curr_insn_transform): Print insn name.

Index: lra-constraints.c
===================================================================
--- lra-constraints.c   (revision 198344)
+++ lra-constraints.c   (working copy)
@@ -2038,7 +2038,13 @@ process_alt_operands (int only_alternati
              or non-important thing to be worth to do it.  */
           overall = losers * LRA_LOSER_COST_FACTOR + reject;
           if ((best_losers == 0 || losers != 0) && best_overall < overall)
-           goto fail;
+           {
+             if (lra_dump_file != NULL)
+               fprintf (lra_dump_file, " alt=%d,overall=%d,losers=%d,"
+                        "small_class_ops=%d,rld_nregs=%d -- reject\n",
+                nalt, overall, losers, small_class_operands_num, 
reload_nregs);
+             goto fail;
+           }

           curr_alt[nop] = this_alternative;
           COPY_HARD_REG_SET (curr_alt_set[nop], this_alternative_set);
@@ -2055,6 +2061,9 @@ process_alt_operands (int only_alternati
             early_clobbered_nops[early_clobbered_regs_num++] = nop;
         }
        if (curr_insn_set != NULL_RTX && n_operands == 2
+          /* Prevent processing non-move insns.  */
+          && (GET_CODE (SET_SRC (curr_insn_set)) == SUBREG
+              || SET_SRC (curr_insn_set) == no_subreg_reg_operand[1])
           && ((! curr_alt_win[0] && ! curr_alt_win[1]
                && REG_P (no_subreg_reg_operand[0])
                && REG_P (no_subreg_reg_operand[1])
@@ -2162,6 +2171,10 @@ process_alt_operands (int only_alternati
         small_class_operands_num
           += SMALL_REGISTER_CLASS_P (curr_alt[nop]) ? 1 : 0;

+      if (lra_dump_file != NULL)
+       fprintf (lra_dump_file, " alt=%d,overall=%d,losers=%d,"
+                "small_class_ops=%d,rld_nregs=%d\n",
+                nalt, overall, losers, small_class_operands_num, 
reload_nregs);
        /* If this alternative can be made to work by reloading, and it
          needs less reloading than the others checked so far, record
          it as the chosen goal for reloading.  */
@@ -2518,7 +2531,6 @@ process_address (int nop, rtx *before, r
                             code = -1;
                           }
                       }
-
                   }
               }
             if (code < 0)
@@ -3007,6 +3019,9 @@ curr_insn_transform (void)
           for (; *p != '\0' && *p != ',' && *p != '#'; p++)
             fputc (*p, lra_dump_file);
         }
+      if (INSN_CODE (curr_insn) >= 0
+         && (p = get_insn_name (INSN_CODE (curr_insn))) != NULL)
+       fprintf (lra_dump_file, " {%s}", p);
        fprintf (lra_dump_file, "\n");
      }

Index: lra.c
===================================================================
--- lra.c       (revision 198344)
+++ lra.c       (working copy)
@@ -784,9 +784,6 @@ setup_operand_alternative (lra_insn_reco
                   lra_assert (i != nop - 1);
                   break;

-               case '?':
-                 op_alt->reject += LRA_LOSER_COST_FACTOR;
-                 break;
                 case '!':
                   op_alt->reject += LRA_MAX_REJECT;
                   break;

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [32-bit fortran]
  2013-04-27  0:34                                   ` RFA: enable LRA for rs6000 [32-bit fortran] Michael Meissner
@ 2013-04-27  8:10                                     ` Vladimir Makarov
  2013-04-28 14:45                                       ` RFA: enable LRA for rs6000 [lra-constraints] Michael Meissner
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-27  8:10 UTC (permalink / raw)
  To: Michael Meissner, David Edelsohn, gcc-patches, Bergner, Peter, aavrunin

On 13-04-26 2:04 PM, Michael Meissner wrote:
> In addition to all of the failures in the 32-bit gfortrain suite, I ran one run
> of the 32-bit spec 2006 fortan tests, and the following benchmarks fail:
>
> 	410.bwaves	416.gamess	434.zeusmp
> 	437.leslie3d	454.calculix	459.GemsFDTD
> 	465.tonto	481.wrf
I'll work on this on Monday.
> The following 2 benchmarks succeed:
>
> 	435.gromacs	436.cactusADM
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000
  2013-04-27  8:07                                     ` Vladimir Makarov
@ 2013-04-27  8:33                                       ` Michael Meissner
  2013-04-28 21:13                                         ` Vladimir Makarov
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Meissner @ 2013-04-27  8:33 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Michael Meissner, David Edelsohn, gcc-patches, Bergner, Peter, aavrunin

On Fri, Apr 26, 2013 at 07:00:37PM -0400, Vladimir Makarov wrote:
> 2013-04-26  Vladimir Makarov  <vmakarov@redhat.com>
> 
>         * lra.c (setup_operand_alternative): Ignore '?'.
>         * lra-constraints.c (process_alt_operands): Print cost dump for
>         alternatives.  Check only moves for cycling.
>         (curr_insn_transform): Print insn name.

I'm not sure I'm comfortable with ignoring the '?' altogether.  For example, if
you do something in the GPR unit, instructions run at one cycle, while if you
do it in the vector unit, it runs in two cycles.  In the past, I've seen cases
where it wanted to spill floating point values from the floating point
registers to the CTR.  And if you spill to the LR, it can interfere with the
call cache.

Admitily, when to use '!', '?', and '*' is unclear, and unfortunately it has
changed over time.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000 [lra-constraints]
  2013-04-27  8:10                                     ` Vladimir Makarov
@ 2013-04-28 14:45                                       ` Michael Meissner
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Meissner @ 2013-04-28 14:45 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Michael Meissner, David Edelsohn, gcc-patches, Bergner, Peter, aavrunin

Note, this last patch does not bootstrap on powerpc:

/home/meissner/fsf-src/meissner-lra/gcc/lra-constraints.c: In function ‘bool process_alt_operands(int)’:
/home/meissner/fsf-src/meissner-lra/gcc/lra-constraints.c:2045:66: error:
‘small_class_operands_num’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]

I have set it to 0 in the declaration in my sources, to do the bootstrap.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFA: enable LRA for rs6000
  2013-04-27  8:33                                       ` Michael Meissner
@ 2013-04-28 21:13                                         ` Vladimir Makarov
  0 siblings, 0 replies; 33+ messages in thread
From: Vladimir Makarov @ 2013-04-28 21:13 UTC (permalink / raw)
  To: Michael Meissner; +Cc: David Edelsohn, gcc-patches, Peter Bergner, aavrunin



----- Original Message -----
From: "Michael Meissner" <meissner@linux.vnet.ibm.com>
To: "Vladimir Makarov" <vmakarov@redhat.com>
Cc: "Michael Meissner" <meissner@linux.vnet.ibm.com>, "David Edelsohn" <dje.gcc@gmail.com>, "gcc-patches" <gcc-patches@gcc.gnu.org>, "Peter Bergner" <bergner@vnet.ibm.com>, aavrunin@redhat.com
Sent: Friday, April 26, 2013 7:13:55 PM
Subject: Re: RFA: enable LRA for rs6000

On Fri, Apr 26, 2013 at 07:00:37PM -0400, Vladimir Makarov wrote:
> 2013-04-26  Vladimir Makarov  <vmakarov@redhat.com>
> 
>         * lra.c (setup_operand_alternative): Ignore '?'.
>         * lra-constraints.c (process_alt_operands): Print cost dump for
>         alternatives.  Check only moves for cycling.
>         (curr_insn_transform): Print insn name.

I'm not sure I'm comfortable with ignoring the '?' altogether.  For example, if
you do something in the GPR unit, instructions run at one cycle, while if you
do it in the vector unit, it runs in two cycles.  In the past, I've seen cases
where it wanted to spill floating point values from the floating point
registers to the CTR.  And if you spill to the LR, it can interfere with the
call cache.

Admitily, when to use '!', '?', and '*' is unclear, and unfortunately it has
changed over time.

-----------

I don't like to change '?' semantics too.  So I found another solution.
I've committed it to the branch although it might be not final solution --
I'd like to see how it affects x86/x86-64.

Mike, could you send me the config file (if it is possible of course) for
spec2006 you are using in order to be in sync.

Thanks, Vlad.


2013-04-28  Vladimir Makarov  <vmakarov@redhat.com>

        * lra.c (setup_operand_alternative): Restore taking into account
        '?'.
        * lra-constraints.c (process_alt_operands): Discourage a bit more using
        memory for pseudos.  Remove printing undefined values.  Modify
        cost values for conflicts with early clobbers.


Index: lra.c
===================================================================
--- lra.c       (revision 198350)
+++ lra.c       (working copy)
@@ -784,6 +784,9 @@ setup_operand_alternative (lra_insn_reco
                  lra_assert (i != nop - 1);
                  break;

+               case '?':
+                 op_alt->reject += LRA_LOSER_COST_FACTOR;
+                 break;
                case '!':
                  op_alt->reject += LRA_MAX_REJECT;
                  break;
Index: lra-constraints.c
===================================================================
--- lra-constraints.c   (revision 198373)
+++ lra-constraints.c   (working copy)
@@ -2007,7 +2007,7 @@ process_alt_operands (int only_alternati
                 although it might takes the same number of
                 reloads.  */
              if (no_regs_p && REG_P (op))
-               reject++;
+               reject += 2;

 #ifdef SECONDARY_MEMORY_NEEDED
              /* If reload requires moving value through secondary
@@ -2040,9 +2040,9 @@ process_alt_operands (int only_alternati
          if ((best_losers == 0 || losers != 0) && best_overall < overall)
            {
              if (lra_dump_file != NULL)
-               fprintf (lra_dump_file, "          alt=%d,overall=%d,losers=%d,"
-                        "small_class_ops=%d,rld_nregs=%d -- reject\n",
-                nalt, overall, losers, small_class_operands_num, reload_nregs);
+               fprintf (lra_dump_file,
+                        "          alt=%d,overall=%d,losers=%d -- reject\n",
+                        nalt, overall, losers);
              goto fail;
            }

@@ -2139,7 +2139,10 @@ process_alt_operands (int only_alternati
              curr_alt_dont_inherit_ops[curr_alt_dont_inherit_ops_num++]
                = last_conflict_j;
              losers++;
-             overall += LRA_LOSER_COST_FACTOR;

+             /* Early clobber was already reflected in REJECT. */
+             lra_assert (reject > 0);
+             reject--;
+             overall += LRA_LOSER_COST_FACTOR - 1;
            }
          else
            {
@@ -2163,7 +2166,10 @@ process_alt_operands (int only_alternati
                }
              curr_alt_win[i] = curr_alt_match_win[i] = false;
              losers++;
-             overall += LRA_LOSER_COST_FACTOR;
+             /* Early clobber was already reflected in REJECT. */
+             lra_assert (reject > 0);
+             reject--;
+             overall += LRA_LOSER_COST_FACTOR - 1;
            }
        }
       small_class_operands_num = 0;

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2013-04-28 17:07 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-11 20:09 RFA: enable LRA for rs6000 Vladimir Makarov
2013-04-11 20:56 ` David Edelsohn
2013-04-11 22:48   ` Vladimir Makarov
2013-04-12 19:09     ` Vladimir Makarov
2013-04-13 15:02       ` David Edelsohn
2013-04-16  9:13 ` Michael Meissner
2013-04-16  9:19   ` Steven Bosscher
2013-04-16  9:27     ` Michael Meissner
2013-04-17 11:53   ` RFA: enable LRA for rs6000 [patch for WRF] Michael Meissner
2013-04-17 16:13     ` Vladimir Makarov
2013-04-17 19:43       ` Michael Meissner
2013-04-19  8:09         ` Vladimir Makarov
2013-04-19  8:13           ` Michael Meissner
2013-04-19 21:32           ` Vladimir Makarov
2013-04-22 10:56             ` Alan Modra
2013-04-22 20:57               ` Vladimir Makarov
2013-04-22 20:59                 ` Michael Meissner
2013-04-23  8:14                   ` Vladimir Makarov
2013-04-23  9:05                     ` David Edelsohn
2013-04-23 21:02                       ` Vladimir Makarov
2013-04-23 21:32                         ` David Edelsohn
2013-04-23 21:45                           ` Vladimir Makarov
2013-04-23 22:02                             ` David Edelsohn
2013-04-24 22:51                             ` Michael Meissner
2013-04-25 19:27                               ` Vladimir Makarov
2013-04-25 22:29                                 ` Vladimir Makarov
2013-04-26 22:21                                   ` RFA: enable LRA for rs6000 Michael Meissner
2013-04-27  8:07                                     ` Vladimir Makarov
2013-04-27  8:33                                       ` Michael Meissner
2013-04-28 21:13                                         ` Vladimir Makarov
2013-04-27  0:34                                   ` RFA: enable LRA for rs6000 [32-bit fortran] Michael Meissner
2013-04-27  8:10                                     ` Vladimir Makarov
2013-04-28 14:45                                       ` RFA: enable LRA for rs6000 [lra-constraints] Michael Meissner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).