public inbox for gcc-patches@gcc.gnu.org
* Re: Enable EBX for x86 in 32bits PIC code
       [not found]       ` <CAMbmDYZV_fx0jxmKHhLsC2pJ7pDzuu6toEAH72izOdpq6KGyfg@mail.gmail.com>
@ 2014-08-22 12:21         ` Ilya Enkovich
  2014-08-23  1:47           ` Hans-Peter Nilsson
                             ` (4 more replies)
  0 siblings, 5 replies; 49+ messages in thread
From: Ilya Enkovich @ 2014-08-22 12:21 UTC (permalink / raw)
  To: gcc, gcc-patches
  Cc: Evgeny Stupachenko, Richard Biener, Uros Bizjak, law, vmakarov

Hi,

At Cauldron 2014 we had a couple of talks about relaxing ebx usage in 32-bit PIC mode.  It was decided that the best approach would be to not fix the ebx register, use a pseudo register for the GOT base address and let the allocator do the rest.  This should be similar to how clang and icc handle the GOT base address.  I've been working on such a patch for some time and now want to share my results.

The idea of the patch was very simple and included a few things:
 1.  Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do not have any hard reg fixed for PIC.
 2.  Initialize pic_offset_table_rtx with a new pseudo register at the beginning of function expansion.
 3.  Change the ABI so that there is a possible implicit PIC argument for calls; pic_offset_table_rtx is used as the arg value if such an implicit arg exists.
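
Point 1 can be sketched as a plain C function that mirrors the decision the patched PIC_OFFSET_TABLE_REGNUM macro in i386.h makes.  This is only a toy model: struct pic_state and its field names are hypothetical stand-ins for the real GCC globals, and the register numbers are illustrative values, not taken from the GCC headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins, not the real GCC definitions.  */
#define INVALID_REGNUM   (~0u)
#define REAL_PIC_REGNUM  3u

/* Hypothetical snapshot of the state the macro consults.  */
struct pic_state
{
  bool target_64bit;
  bool small_pic_or_pecoff;  /* CM_SMALL_PIC or TARGET_PECOFF */
  bool flag_pic;
  bool relax_pic_reg;        /* models X86_TUNE_RELAX_PIC_REG */
  bool have_pic_rtx;         /* pic_offset_table_rtx != NULL */
  bool reload_completed;
  unsigned pic_rtx_regno;    /* REGNO (pic_offset_table_rtx) */
};

/* Same branch order as the patched macro: no PIC register at all,
   then the relaxed mode, then the pre-existing behaviour.  */
unsigned
pic_offset_table_regnum (const struct pic_state *s)
{
  assert (s != NULL);
  if ((s->target_64bit && s->small_pic_or_pecoff) || !s->flag_pic)
    return INVALID_REGNUM;
  if (s->relax_pic_reg)
    /* Once a pseudo holds the GOT base, no hard reg is fixed.  */
    return s->have_pic_rtx ? INVALID_REGNUM : REAL_PIC_REGNUM;
  if (s->reload_completed)
    return s->pic_rtx_regno;
  return REAL_PIC_REGNUM;
}
```

In this model, a 32-bit PIC function in relaxed mode reports the real PIC register only until pic_offset_table_rtx has been created; after that no hard register is reserved for PIC.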

This approach worked well on small tests, but when we tried to run some benchmarks we hit a problem with reload of address constants.  When we try to rematerialize an address constant or some constant memory reference, we have to use pic_offset_table_rtx.  This means we insert new usages of a pseudo register, and the allocator cannot handle that correctly.  The same problem applies to float and vector constants.

Rematerialization is not the only case that causes new pic_offset_table_rtx usages.  Another case is a split of instructions that use a constant but do not have proper constraints.  E.g. the pushtf pattern allows a push of a constant, but it has to be replaced with a push of memory in the reload pass, which causes an additional usage of pic_offset_table_rtx.

There are two ways to fix it.  The first one is to support modification of the pseudo register's live range during reload and to correctly allocate hard regs for its new usages (currently some hard reg is allocated for a new usage of the pseudo reg, but it may contain the value of some other pseudo reg; thus the problem shows up at runtime only).

The second way is to avoid all cases where new usages of pic_offset_table_rtx appear in reload.  That is the way I chose, because it appeared simpler to me and would let me get some performance data faster.  Also, rematerialization of address and float constants in PIC mode would mean higher register pressure, so having them on the stack should be even more efficient.  To achieve it I had to cut off reg equivs to all exprs using symbol references and to all constants living in memory.  I also had to avoid instructions that require a split in reload causing a load of a constant from memory (*push[txd]f).
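
The "cut off reg equivs" rule can be sketched with a toy expression tree.  This is not GCC's RTL: struct toy_expr, its codes and toy_keep_equiv are hypothetical names used only to illustrate the check, which walks an expression recursively and rejects the equivalence when the PIC register is a pseudo and the expression mentions a symbol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified expression codes.  */
enum toy_code { TOY_REG, TOY_CONST_INT, TOY_SYMBOL_REF, TOY_PLUS, TOY_MEM };

struct toy_expr
{
  enum toy_code code;
  const struct toy_expr *op[2];  /* unused operands are NULL */
};

/* Return true if X or any of its operands is a symbol reference.  */
bool
toy_contains_symbol_ref (const struct toy_expr *x)
{
  if (x == NULL)
    return false;
  if (x->code == TOY_SYMBOL_REF)
    return true;
  for (int i = 0; i < 2; i++)
    if (toy_contains_symbol_ref (x->op[i]))
      return true;
  return false;
}

/* Keep an equivalence unless the PIC register is a pseudo and using
   the equivalence later would need that register again.  */
bool
toy_keep_equiv (bool pic_reg_is_pseudo, const struct toy_expr *equiv)
{
  assert (equiv != NULL);
  return !(pic_reg_is_pseudo && toy_contains_symbol_ref (equiv));
}
```

With a fixed hard PIC register the equivalence is always kept in this model, which matches the intent that the old behaviour stays unchanged when the relaxation is off.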

The resulting compiler passes make check and compiles the EEMBC and SPEC2000 benchmarks.  I am not confident that I covered all cases; there may still be some templates that cause a split in reload with new pic_offset_table_rtx usages.  I think supporting reload with a pseudo PIC register would be a better and more general solution, but I don't know how difficult it is to implement.  Any ideas on resolving this reload issue?

I collected some performance numbers for the EEMBC and SPEC2000 benchmarks.  Here are the patch results for -Ofast with LTO, collected on an Avoton server:
AUTOmark +1,9%
TELECOMmark +4,0%
DENmark +10,0%
SPEC2000 -0,5%

There are few degradations on the EEMBC benchmarks, but on SPEC2000 the situation is different and we see more performance losses.  Some of them are caused by the disabled rematerialization of address constants.  In some cases relaxed ebx causes more spills/fills in places where the GOT is frequently used.  There are also some minor fixes required in the patch to allow a more efficient function prologue (avoid unnecessary GOT register initialization and allow its initialization without ebx usage).  I suppose some of the performance problems can be resolved, but a good fix for reload should come first.

Thanks,
Ilya
--
diff --git a/gcc/calls.c b/gcc/calls.c
index 4285ec1..85dae6b 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
     call_expr_arg_iterator iter;
     tree arg;
 
+    if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
+      {
+	gcc_assert (pic_offset_table_rtx);
+	args[j].tree_value = make_tree (ptr_type_node,
+					pic_offset_table_rtx);
+	j--;
+      }
+
     if (struct_value_addr_value)
       {
 	args[j].tree_value = struct_value_addr_value;
@@ -2520,6 +2528,10 @@ expand_call (tree exp, rtx target, int ignore)
     /* Treat all args as named.  */
     n_named_args = num_actuals;
 
+  /* Add implicit PIC arg.  */
+  if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
+    num_actuals++;
+
   /* Make a vector to hold all the information about each arg.  */
   args = XALLOCAVEC (struct arg_data, num_actuals);
   memset (args, 0, num_actuals * sizeof (struct arg_data));
@@ -3133,6 +3145,8 @@ expand_call (tree exp, rtx target, int ignore)
 	{
 	  int arg_nr = return_flags & ERF_RETURN_ARG_MASK;
 	  arg_nr = num_actuals - arg_nr - 1;
+	  if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
+	    arg_nr--;
 	  if (arg_nr >= 0
 	      && arg_nr < num_actuals
 	      && args[arg_nr].reg
@@ -3700,8 +3714,8 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
      of the full argument passing conventions to limit complexity here since
      library functions shouldn't have many args.  */
 
-  argvec = XALLOCAVEC (struct arg, nargs + 1);
-  memset (argvec, 0, (nargs + 1) * sizeof (struct arg));
+  argvec = XALLOCAVEC (struct arg, nargs + 2);
+  memset (argvec, 0, (nargs + 2) * sizeof (struct arg));
 
 #ifdef INIT_CUMULATIVE_LIBCALL_ARGS
   INIT_CUMULATIVE_LIBCALL_ARGS (args_so_far_v, outmode, fun);
@@ -3717,6 +3731,23 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
 
   push_temp_slots ();
 
+  if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
+    {
+      gcc_assert (pic_offset_table_rtx);
+
+      argvec[count].value = pic_offset_table_rtx;
+      argvec[count].mode = Pmode;
+      argvec[count].partial = 0;
+
+      argvec[count].reg = targetm.calls.function_arg (args_so_far,
+						      Pmode, NULL_TREE, true);
+
+      targetm.calls.function_arg_advance (args_so_far, Pmode, NULL_TREE, true);
+
+      count++;
+      nargs++;
+    }
+
   /* If there's a structure value address to be passed,
      either pass it in the special place, or pass it as an extra argument.  */
   if (mem_value && struct_value == 0 && ! pcc_struct_value)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cc4b0c7..cfafcdd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6133,6 +6133,21 @@ ix86_maybe_switch_abi (void)
     reinit_regs ();
 }
 
+/* Return reg in which implicit PIC base address
+   arg is passed.  */
+static rtx
+ix86_implicit_pic_arg (const_tree fntype_or_decl ATTRIBUTE_UNUSED)
+{
+  if ((TARGET_64BIT
+       && (ix86_cmodel == CM_SMALL_PIC
+	   || TARGET_PECOFF))
+      || !flag_pic
+      || !X86_TUNE_RELAX_PIC_REG)
+    return NULL_RTX;
+
+  return gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM);
+}
+
 /* Initialize a variable CUM of type CUMULATIVE_ARGS
    for a call to a function whose data type is FNTYPE.
    For a library call, FNTYPE is 0.  */
@@ -6198,6 +6213,11 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument info to initialize */
 		      ? (!prototype_p (fntype) || stdarg_p (fntype))
 		      : !libname);
 
+  if (caller)
+    cum->implicit_pic_arg = ix86_implicit_pic_arg (fndecl ? fndecl : fntype);
+  else
+    cum->implicit_pic_arg = NULL_RTX;
+
   if (!TARGET_64BIT)
     {
       /* If there are variable arguments, then we won't pass anything
@@ -7291,7 +7311,9 @@ ix86_function_arg_advance (cumulative_args_t cum_v, enum machine_mode mode,
   if (type)
     mode = type_natural_mode (type, NULL, false);
 
-  if (TARGET_64BIT && (cum ? cum->call_abi : ix86_abi) == MS_ABI)
+  if (cum->implicit_pic_arg)
+    cum->implicit_pic_arg = NULL_RTX;
+  else if (TARGET_64BIT && (cum ? cum->call_abi : ix86_abi) == MS_ABI)
     function_arg_advance_ms_64 (cum, bytes, words);
   else if (TARGET_64BIT)
     function_arg_advance_64 (cum, mode, type, words, named);
@@ -7542,7 +7564,9 @@ ix86_function_arg (cumulative_args_t cum_v, enum machine_mode omode,
   if (type && TREE_CODE (type) == VECTOR_TYPE)
     mode = type_natural_mode (type, cum, false);
 
-  if (TARGET_64BIT && (cum ? cum->call_abi : ix86_abi) == MS_ABI)
+  if (cum->implicit_pic_arg)
+    arg = cum->implicit_pic_arg;
+  else if (TARGET_64BIT && (cum ? cum->call_abi : ix86_abi) == MS_ABI)
     arg = function_arg_ms_64 (cum, mode, omode, named, bytes);
   else if (TARGET_64BIT)
     arg = function_arg_64 (cum, mode, omode, type, named);
@@ -9373,6 +9397,9 @@ gen_pop (rtx arg)
 static unsigned int
 ix86_select_alt_pic_regnum (void)
 {
+  if (ix86_implicit_pic_arg (NULL))
+    return INVALID_REGNUM;
+
   if (crtl->is_leaf
       && !crtl->profile
       && !ix86_current_function_calls_tls_descriptor)
@@ -11236,7 +11263,8 @@ ix86_expand_prologue (void)
 	}
       else
 	{
-          insn = emit_insn (gen_set_got (pic_offset_table_rtx));
+	  rtx reg = gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM);
+          insn = emit_insn (gen_set_got (reg));
 	  RTX_FRAME_RELATED_P (insn) = 1;
 	  add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
 	}
@@ -11789,7 +11817,8 @@ ix86_expand_epilogue (int style)
 static void
 ix86_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED, HOST_WIDE_INT)
 {
-  if (pic_offset_table_rtx)
+  if (pic_offset_table_rtx
+      && REGNO (pic_offset_table_rtx) < FIRST_PSEUDO_REGISTER)
     SET_REGNO (pic_offset_table_rtx, REAL_PIC_OFFSET_TABLE_REGNUM);
 #if TARGET_MACHO
   /* Mach-O doesn't support labels at the end of objects, so if
@@ -13107,6 +13136,15 @@ ix86_GOT_alias_set (void)
   return set;
 }
 
+/* Set regs_ever_live for PIC base address register
+   to true if required.  */
+static void
+set_pic_reg_ever_alive ()
+{
+  if (reload_in_progress)
+    df_set_regs_ever_live (REGNO (pic_offset_table_rtx), true);
+}
+
 /* Return a legitimate reference for ORIG (an address) using the
    register REG.  If REG is 0, a new pseudo is generated.
 
@@ -13157,8 +13195,7 @@ legitimize_pic_address (rtx orig, rtx reg)
       /* This symbol may be referenced via a displacement from the PIC
 	 base address (@GOTOFF).  */
 
-      if (reload_in_progress)
-	df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+      set_pic_reg_ever_alive ();
       if (GET_CODE (addr) == CONST)
 	addr = XEXP (addr, 0);
       if (GET_CODE (addr) == PLUS)
@@ -13190,8 +13227,7 @@ legitimize_pic_address (rtx orig, rtx reg)
       /* This symbol may be referenced via a displacement from the PIC
 	 base address (@GOTOFF).  */
 
-      if (reload_in_progress)
-	df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+      set_pic_reg_ever_alive ();
       if (GET_CODE (addr) == CONST)
 	addr = XEXP (addr, 0);
       if (GET_CODE (addr) == PLUS)
@@ -13252,8 +13288,7 @@ legitimize_pic_address (rtx orig, rtx reg)
 	  /* This symbol must be referenced via a load from the
 	     Global Offset Table (@GOT).  */
 
-	  if (reload_in_progress)
-	    df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+	  set_pic_reg_ever_alive ();
 	  new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), UNSPEC_GOT);
 	  new_rtx = gen_rtx_CONST (Pmode, new_rtx);
 	  if (TARGET_64BIT)
@@ -13305,8 +13340,7 @@ legitimize_pic_address (rtx orig, rtx reg)
 	    {
 	      if (!TARGET_64BIT)
 		{
-		  if (reload_in_progress)
-		    df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+		  set_pic_reg_ever_alive ();
 		  new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op0),
 					    UNSPEC_GOTOFF);
 		  new_rtx = gen_rtx_PLUS (Pmode, new_rtx, op1);
@@ -13601,8 +13635,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	}
       else if (flag_pic)
 	{
-	  if (reload_in_progress)
-	    df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+	  set_pic_reg_ever_alive ();
 	  pic = pic_offset_table_rtx;
 	  type = TARGET_ANY_GNU_TLS ? UNSPEC_GOTNTPOFF : UNSPEC_GOTTPOFF;
 	}
@@ -14233,6 +14266,8 @@ ix86_pic_register_p (rtx x)
   if (GET_CODE (x) == VALUE && CSELIB_VAL_PTR (x))
     return (pic_offset_table_rtx
 	    && rtx_equal_for_cselib_p (x, pic_offset_table_rtx));
+  else if (pic_offset_table_rtx)
+    return REG_P (x) && REGNO (x) == REGNO (pic_offset_table_rtx);
   else
     return REG_P (x) && REGNO (x) == PIC_OFFSET_TABLE_REGNUM;
 }
@@ -14408,7 +14443,9 @@ ix86_delegitimize_address (rtx x)
 	 ...
 	 movl foo@GOTOFF(%ecx), %edx
 	 in which case we return (%ecx - %ebx) + foo.  */
-      if (pic_offset_table_rtx)
+      if (pic_offset_table_rtx
+	  && (!reload_completed
+	      || REGNO (pic_offset_table_rtx) < FIRST_PSEUDO_REGISTER))
         result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend),
 						     pic_offset_table_rtx),
 			       result);
@@ -24915,7 +24952,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
 		  && DEFAULT_ABI != MS_ABI))
 	  && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
 	  && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
-	use_reg (&use, pic_offset_table_rtx);
+	use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
     }
 
   if (TARGET_64BIT && INTVAL (callarg2) >= 0)
@@ -47228,6 +47265,8 @@ ix86_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 #define TARGET_FUNCTION_ARG_ADVANCE ix86_function_arg_advance
 #undef TARGET_FUNCTION_ARG
 #define TARGET_FUNCTION_ARG ix86_function_arg
+#undef TARGET_IMPLICIT_PIC_ARG
+#define TARGET_IMPLICIT_PIC_ARG ix86_implicit_pic_arg
 #undef TARGET_FUNCTION_ARG_BOUNDARY
 #define TARGET_FUNCTION_ARG_BOUNDARY ix86_function_arg_boundary
 #undef TARGET_PASS_BY_REFERENCE
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 2c64162..d5fa250 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1243,11 +1243,13 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define REAL_PIC_OFFSET_TABLE_REGNUM  BX_REG
 
-#define PIC_OFFSET_TABLE_REGNUM				\
-  ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC	\
-                     || TARGET_PECOFF))		\
-   || !flag_pic ? INVALID_REGNUM			\
-   : reload_completed ? REGNO (pic_offset_table_rtx)	\
+#define PIC_OFFSET_TABLE_REGNUM						\
+  ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC			\
+                     || TARGET_PECOFF))					\
+   || !flag_pic ? INVALID_REGNUM					\
+   : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM	\
+			       : REAL_PIC_OFFSET_TABLE_REGNUM)		\
+   : reload_completed ? REGNO (pic_offset_table_rtx)			\
    : REAL_PIC_OFFSET_TABLE_REGNUM)
 
 #define GOT_SYMBOL_NAME "_GLOBAL_OFFSET_TABLE_"
@@ -1652,6 +1654,7 @@ typedef struct ix86_args {
   int float_in_sse;		/* Set to 1 or 2 for 32bit targets if
 				   SFmode/DFmode arguments should be passed
 				   in SSE registers.  Otherwise 0.  */
+  rtx implicit_pic_arg;         /* Implicit PIC base address arg if passed.  */
   enum calling_abi call_abi;	/* Set to SYSV_ABI for sysv abi. Otherwise
  				   MS_ABI for ms abi.  */
 } CUMULATIVE_ARGS;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8e74eab..27028ba 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2725,7 +2725,7 @@
 
 (define_insn "*pushtf"
   [(set (match_operand:TF 0 "push_operand" "=<,<")
-	(match_operand:TF 1 "general_no_elim_operand" "x,*roF"))]
+	(match_operand:TF 1 "nonimmediate_no_elim_operand" "x,*roF"))]
   "TARGET_64BIT || TARGET_SSE"
 {
   /* This insn should be already split before reg-stack.  */
@@ -2750,7 +2750,7 @@
 
 (define_insn "*pushxf"
   [(set (match_operand:XF 0 "push_operand" "=<,<")
-	(match_operand:XF 1 "general_no_elim_operand" "f,Yx*roF"))]
+	(match_operand:XF 1 "nonimmediate_no_elim_operand" "f,Yx*roF"))]
   ""
 {
   /* This insn should be already split before reg-stack.  */
@@ -2781,7 +2781,7 @@
 
 (define_insn "*pushdf"
   [(set (match_operand:DF 0 "push_operand" "=<,<,<,<")
-	(match_operand:DF 1 "general_no_elim_operand" "f,Yd*roF,rmF,x"))]
+	(match_operand:DF 1 "nonimmediate_no_elim_operand" "f,Yd*roF,rmF,x"))]
   ""
 {
   /* This insn should be already split before reg-stack.  */
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 62970be..56eca24 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -580,6 +580,12 @@
     (match_operand 0 "register_no_elim_operand")
     (match_operand 0 "general_operand")))
 
+;; Return false if this is any eliminable register.  Otherwise nonimmediate_operand.
+(define_predicate "nonimmediate_no_elim_operand"
+  (if_then_else (match_code "reg,subreg")
+    (match_operand 0 "register_no_elim_operand")
+    (match_operand 0 "nonimmediate_operand")))
+
 ;; Return false if this is any eliminable register.  Otherwise
 ;; register_operand or a constant.
 (define_predicate "nonmemory_no_elim_operand"
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 215c63c..ffb7a2d 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -537,3 +537,6 @@ DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "promote_qi_regs", 0)
    unrolling small loop less important. For, such architectures we adjust
    the unroll factor so that the unrolled loop fits the loop buffer.  */
 DEF_TUNE (X86_TUNE_ADJUST_UNROLL, "adjust_unroll_factor", m_BDVER3 | m_BDVER4)
+
+/* X86_TUNE_RELAX_PIC_REG: Do not fix hard register for GOT base usage.  */
+DEF_TUNE (X86_TUNE_RELAX_PIC_REG, "relax_pic_reg", ~0)
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 9dd8d68..33b36be 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3967,6 +3967,12 @@ If @code{TARGET_FUNCTION_INCOMING_ARG} is not defined,
 @code{TARGET_FUNCTION_ARG} serves both purposes.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_IMPLICIT_PIC_ARG (const_tree @var{fntype_or_decl})
+This hook returns register holding PIC base address for functions
+which do not fix hard register but handle it similar to function arg
+assigning a virtual reg for it.
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_ARG_PARTIAL_BYTES (cumulative_args_t @var{cum}, enum machine_mode @var{mode}, tree @var{type}, bool @var{named})
 This target hook returns the number of bytes at the beginning of an
 argument that must be put in registers.  The value must be zero for
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index dd72b98..3e6da2f 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3413,6 +3413,8 @@ the stack.
 
 @hook TARGET_FUNCTION_INCOMING_ARG
 
+@hook TARGET_IMPLICIT_PIC_ARG
+
 @hook TARGET_ARG_PARTIAL_BYTES
 
 @hook TARGET_PASS_BY_REFERENCE
diff --git a/gcc/function.c b/gcc/function.c
index 8156766..3a85c16 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -3456,6 +3456,15 @@ assign_parms (tree fndecl)
 
   fnargs.release ();
 
+  /* Handle implicit PIC arg if any.  */
+  if (targetm.calls.implicit_pic_arg (fndecl))
+    {
+      rtx old_reg = targetm.calls.implicit_pic_arg (fndecl);
+      rtx new_reg = gen_reg_rtx (GET_MODE (old_reg));
+      emit_move_insn (new_reg, old_reg);
+      pic_offset_table_rtx = new_reg;
+    }
+
   /* Output all parameter conversion instructions (possibly including calls)
      now that all parameters have been copied out of hard registers.  */
   emit_insn (all.first_conversion_insn);
diff --git a/gcc/hooks.c b/gcc/hooks.c
index 5c06562..47784e2 100644
--- a/gcc/hooks.c
+++ b/gcc/hooks.c
@@ -352,6 +352,13 @@ hook_rtx_rtx_null (rtx x ATTRIBUTE_UNUSED)
   return NULL;
 }
 
+/* Generic hook that takes a const_tree arg and returns NULL_RTX.  */
+rtx
+hook_rtx_const_tree_null (const_tree a ATTRIBUTE_UNUSED)
+{
+  return NULL;
+}
+
 /* Generic hook that takes a tree and an int and returns NULL_RTX.  */
 rtx
 hook_rtx_tree_int_null (tree a ATTRIBUTE_UNUSED, int b ATTRIBUTE_UNUSED)
diff --git a/gcc/hooks.h b/gcc/hooks.h
index ba42b6c..cf830ef 100644
--- a/gcc/hooks.h
+++ b/gcc/hooks.h
@@ -100,6 +100,7 @@ extern bool default_can_output_mi_thunk_no_vcall (const_tree, HOST_WIDE_INT,
 
 extern rtx hook_rtx_rtx_identity (rtx);
 extern rtx hook_rtx_rtx_null (rtx);
+extern rtx hook_rtx_const_tree_null (const_tree);
 extern rtx hook_rtx_tree_int_null (tree, int);
 
 extern const char *hook_constcharptr_void_null (void);
diff --git a/gcc/ira.c b/gcc/ira.c
index 3f41061..dc2eaed 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -3467,6 +3467,11 @@ update_equiv_regs (void)
 	  if (note && GET_CODE (XEXP (note, 0)) == EXPR_LIST)
 	    note = NULL_RTX;
 
+	  if (pic_offset_table_rtx
+	      && REGNO (pic_offset_table_rtx) >= FIRST_PSEUDO_REGISTER
+	      && contains_symbol_ref (insn))
+	    note = NULL_RTX;
+
 	  if (DF_REG_DEF_COUNT (regno) != 1
 	      && (! note
 		  || rtx_varies_p (XEXP (note, 0), 0)
@@ -3512,6 +3517,10 @@ update_equiv_regs (void)
 	      && MEM_P (SET_SRC (set))
 	      && validate_equiv_mem (insn, dest, SET_SRC (set)))
 	    note = set_unique_reg_note (insn, REG_EQUIV, copy_rtx (SET_SRC (set)));
+	  if (pic_offset_table_rtx
+	      && REGNO (pic_offset_table_rtx) >= FIRST_PSEUDO_REGISTER
+	      && contains_symbol_ref (insn))
+	    note = NULL_RTX;
 
 	  if (note)
 	    {
@@ -3886,11 +3895,19 @@ setup_reg_equiv (void)
 		      /* This is PLUS of frame pointer and a constant,
 			 or fp, or argp.  */
 		      ira_reg_equiv[i].invariant = x;
-		    else if (targetm.legitimate_constant_p (mode, x))
+		    else if (targetm.legitimate_constant_p (mode, x)
+			     && (!pic_offset_table_rtx
+				 || REGNO (pic_offset_table_rtx) < FIRST_PSEUDO_REGISTER
+				 || (GET_CODE (x) != CONST_DOUBLE
+				     && GET_CODE (x) != CONST_VECTOR)))
 		      ira_reg_equiv[i].constant = x;
 		    else
 		      {
 			ira_reg_equiv[i].memory = force_const_mem (mode, x);
+			if (pic_offset_table_rtx
+			    && REGNO (pic_offset_table_rtx) >= FIRST_PSEUDO_REGISTER
+			    && contains_symbol_ref (ira_reg_equiv[i].memory))
+			  ira_reg_equiv[i].memory = NULL_RTX;
 			if (ira_reg_equiv[i].memory == NULL_RTX)
 			  {
 			    ira_reg_equiv[i].defined_p = false;
diff --git a/gcc/rtl.h b/gcc/rtl.h
index b6a21b6..02fcf96 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2610,6 +2610,7 @@ extern int rtx_referenced_p (rtx, rtx);
 extern bool tablejump_p (const_rtx, rtx *, rtx_jump_table_data **);
 extern int computed_jump_p (const_rtx);
 extern bool tls_referenced_p (rtx);
+extern bool contains_symbol_ref (rtx);
 
 typedef int (*rtx_function) (rtx *, void *);
 extern int for_each_rtx (rtx *, rtx_function, void *);
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index bc16437..21f2872 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -110,7 +110,8 @@ rtx_unstable_p (const_rtx x)
       /* ??? When call-clobbered, the value is stable modulo the restore
 	 that must happen after a call.  This currently screws up local-alloc
 	 into believing that the restore is not needed.  */
-      if (!PIC_OFFSET_TABLE_REG_CALL_CLOBBERED && x == pic_offset_table_rtx)
+      if (!PIC_OFFSET_TABLE_REG_CALL_CLOBBERED && x == pic_offset_table_rtx
+	  && REGNO (pic_offset_table_rtx) < FIRST_PSEUDO_REGISTER)
 	return 0;
       return 1;
 
@@ -185,7 +186,9 @@ rtx_varies_p (const_rtx x, bool for_alias)
 	     that must happen after a call.  This currently screws up
 	     local-alloc into believing that the restore is not needed, so we
 	     must return 0 only if we are called from alias analysis.  */
-	  && (!PIC_OFFSET_TABLE_REG_CALL_CLOBBERED || for_alias))
+	  && ((!PIC_OFFSET_TABLE_REG_CALL_CLOBBERED
+	       && REGNO (pic_offset_table_rtx) < FIRST_PSEUDO_REGISTER)
+	      || for_alias))
 	return 0;
       return 1;
 
@@ -5978,6 +5981,42 @@ get_index_code (const struct address_info *info)
   return SCRATCH;
 }
 
+/* Return true if RTL X contains a SYMBOL_REF.  */
+
+bool
+contains_symbol_ref (rtx x)
+{
+  const char *fmt;
+  RTX_CODE code;
+  int i;
+
+  if (!x)
+    return false;
+
+  code = GET_CODE (x);
+  if (code == SYMBOL_REF)
+    return true;
+
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      if (fmt[i] == 'e')
+	{
+	  if (contains_symbol_ref (XEXP (x, i)))
+	    return true;
+	}
+      else if (fmt[i] == 'E')
+	{
+	  int j;
+	  for (j = 0; j < XVECLEN (x, i); j++)
+	    if (contains_symbol_ref (XVECEXP (x, i, j)))
+	      return true;
+	}
+    }
+
+  return false;
+}
+
 /* Return 1 if *X is a thread-local symbol.  */
 
 static int
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index 5c34fee..50de8d5 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -448,7 +448,7 @@ try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge,
     {
       HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge;
       struct hard_reg_set_container set_up_by_prologue;
-      rtx p_insn;
+      rtx p_insn, reg;
       vec<basic_block> vec;
       basic_block bb;
       bitmap_head bb_antic_flags;
@@ -494,9 +494,13 @@ try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge,
       if (frame_pointer_needed)
 	add_to_hard_reg_set (&set_up_by_prologue.set, Pmode,
 			     HARD_FRAME_POINTER_REGNUM);
-      if (pic_offset_table_rtx)
+      if (pic_offset_table_rtx
+	  && PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
 	add_to_hard_reg_set (&set_up_by_prologue.set, Pmode,
 			     PIC_OFFSET_TABLE_REGNUM);
+      if ((reg = targetm.calls.implicit_pic_arg (current_function_decl)))
+	add_to_hard_reg_set (&set_up_by_prologue.set,
+			     Pmode, REGNO (reg));
       if (crtl->drap_reg)
 	add_to_hard_reg_set (&set_up_by_prologue.set,
 			     GET_MODE (crtl->drap_reg),
diff --git a/gcc/target.def b/gcc/target.def
index 3a41db1..5c221b6 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3976,6 +3976,14 @@ If @code{TARGET_FUNCTION_INCOMING_ARG} is not defined,\n\
  default_function_incoming_arg)
 
 DEFHOOK
+(implicit_pic_arg,
+ "This hook returns register holding PIC base address for functions\n\
+which do not fix hard register but handle it similar to function arg\n\
+assigning a virtual reg for it.",
+ rtx, (const_tree fntype_or_decl),
+ hook_rtx_const_tree_null)
+
+DEFHOOK
 (function_arg_boundary,
  "This hook returns the alignment boundary, in bits, of an argument\n\
 with the specified mode and type.  The default hook returns\n\
diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index a458380..63d2be5 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -661,7 +661,6 @@ static bool variable_different_p (variable, variable);
 static bool dataflow_set_different (dataflow_set *, dataflow_set *);
 static void dataflow_set_destroy (dataflow_set *);
 
-static bool contains_symbol_ref (rtx);
 static bool track_expr_p (tree, bool);
 static bool same_variable_part_p (rtx, tree, HOST_WIDE_INT);
 static int add_uses (rtx *, void *);
@@ -5032,42 +5031,6 @@ dataflow_set_destroy (dataflow_set *set)
   set->vars = NULL;
 }
 
-/* Return true if RTL X contains a SYMBOL_REF.  */
-
-static bool
-contains_symbol_ref (rtx x)
-{
-  const char *fmt;
-  RTX_CODE code;
-  int i;
-
-  if (!x)
-    return false;
-
-  code = GET_CODE (x);
-  if (code == SYMBOL_REF)
-    return true;
-
-  fmt = GET_RTX_FORMAT (code);
-  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
-    {
-      if (fmt[i] == 'e')
-	{
-	  if (contains_symbol_ref (XEXP (x, i)))
-	    return true;
-	}
-      else if (fmt[i] == 'E')
-	{
-	  int j;
-	  for (j = 0; j < XVECLEN (x, i); j++)
-	    if (contains_symbol_ref (XVECEXP (x, i, j)))
-	      return true;
-	}
-    }
-
-  return false;
-}
-
 /* Shall EXPR be tracked?  */
 
 static bool


* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-22 12:21         ` Enable EBX for x86 in 32bits PIC code Ilya Enkovich
@ 2014-08-23  1:47           ` Hans-Peter Nilsson
  2014-08-25  9:25             ` Ilya Enkovich
  2014-08-25 15:09           ` Vladimir Makarov
                             ` (3 subsequent siblings)
  4 siblings, 1 reply; 49+ messages in thread
From: Hans-Peter Nilsson @ 2014-08-23  1:47 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches, Evgeny Stupachenko

(Dropping gcc@ and people known to subscribe to gcc-patches
from the CC.)

Sorry for the drive-by review, but...

On Fri, 22 Aug 2014, Ilya Enkovich wrote:
> Hi,
>
> On Cauldron 2014 we had a couple of talks about relaxation of
> ebx usage in 32bit PIC mode.  It was decided that the best
> approach would be to not fix ebx register, use pseudo register
> for GOT base address and let allocator do the rest.  This should
> be similar to how clang and icc work with GOT base address.
> I've been working for some time on such patch and now want to
> share my results.

...did you send the right version of the patch?
This one uses the RTX-returning hook only in boolean tests,
unless I misread.

Using the return value in boolean tests (non/NULL) here:

> diff --git a/gcc/calls.c b/gcc/calls.c
> index 4285ec1..85dae6b 100644
> --- a/gcc/calls.c
> +++ b/gcc/calls.c
> @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
>      call_expr_arg_iterator iter;
>      tree arg;
>
> +    if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
...
> +  /* Add implicit PIC arg.  */
> +  if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
> +    num_actuals++;
...
> +  if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))

but:

> +/* Return reg in which implicit PIC base address
> +   arg is passed.  */
> +static rtx
> +ix86_implicit_pic_arg (const_tree fntype_or_decl ATTRIBUTE_UNUSED)
...
> +#undef TARGET_IMPLICIT_PIC_ARG
> +#define TARGET_IMPLICIT_PIC_ARG ix86_implicit_pic_arg
>  #undef TARGET_FUNCTION_ARG_BOUNDARY

and:

> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -3967,6 +3967,12 @@ If @code{TARGET_FUNCTION_INCOMING_ARG} is not defined,
>  @code{TARGET_FUNCTION_ARG} serves both purposes.
>  @end deftypefn
>
> +@deftypefn {Target Hook} rtx TARGET_IMPLICIT_PIC_ARG (const_tree @var{fntype_or_decl})
> +This hook returns register holding PIC base address for functions
> +which do not fix hard register but handle it similar to function arg
> +assigning a virtual reg for it.
> +@end deftypefn

Also, the contains_symbol_ref removal seems like an independent
cleanup-patch.

> index a458380..63d2be5 100644
> --- a/gcc/var-tracking.c
> +++ b/gcc/var-tracking.c
> @@ -661,7 +661,6 @@ static bool variable_different_p (variable, variable);
>  static bool dataflow_set_different (dataflow_set *, dataflow_set *);
>  static void dataflow_set_destroy (dataflow_set *);
>
> -static bool contains_symbol_ref (rtx);

brgds, H-P


* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-23  1:47           ` Hans-Peter Nilsson
@ 2014-08-25  9:25             ` Ilya Enkovich
  2014-08-25 11:24               ` Hans-Peter Nilsson
  0 siblings, 1 reply; 49+ messages in thread
From: Ilya Enkovich @ 2014-08-25  9:25 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: gcc-patches, Evgeny Stupachenko

2014-08-23 5:47 GMT+04:00 Hans-Peter Nilsson <hp@bitrange.com>:
> (Dropping gcc@ and people known to subscribe to gcc-patches
> from the CC.)
>
> Sorry for the drive-by review, but...
>
> On Fri, 22 Aug 2014, Ilya Enkovich wrote:
>> Hi,
>>
>> On Cauldron 2014 we had a couple of talks about relaxation of
>> ebx usage in 32bit PIC mode.  It was decided that the best
>> approach would be to not fix ebx register, use pseudo register
>> for GOT base address and let allocator do the rest.  This should
>> be similar to how clang and icc work with GOT base address.
>> I've been working for some time on such patch and now want to
>> share my results.
>
> ...did you send the right version of the patch?
> This one uses the RTX-returning hook only in boolean tests,
> unless I misread.
>
> Using the return value in boolean tests (non/NULL) here:

A NULL return from the hook means there is no implicit PIC arg to
pass/receive, and some pieces of code should be executed only when
an implicit PIC arg exists.  Hence these boolean tests.
There are also non-boolean usages.  E.g.:

+      rtx old_reg = targetm.calls.implicit_pic_arg (fndecl);
+      rtx new_reg = gen_reg_rtx (GET_MODE (old_reg));
+      emit_move_insn (new_reg, old_reg);
+      pic_offset_table_rtx = new_reg;

>
>> diff --git a/gcc/calls.c b/gcc/calls.c
>> index 4285ec1..85dae6b 100644
>> --- a/gcc/calls.c
>> +++ b/gcc/calls.c
>> @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
>>      call_expr_arg_iterator iter;
>>      tree arg;
>>
>> +    if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
> ...
>> +  /* Add implicit PIC arg.  */
>> +  if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
>> +    num_actuals++;
> ...
>> +  if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
>
> but:
>
>> +/* Return reg in which implicit PIC base address
>> +   arg is passed.  */
>> +static rtx
>> +ix86_implicit_pic_arg (const_tree fntype_or_decl ATTRIBUTE_UNUSED)
> ...
>> +#undef TARGET_IMPLICIT_PIC_ARG
>> +#define TARGET_IMPLICIT_PIC_ARG ix86_implicit_pic_arg
>>  #undef TARGET_FUNCTION_ARG_BOUNDARY
>
> and:
>
>> --- a/gcc/doc/tm.texi
>> +++ b/gcc/doc/tm.texi
>> @@ -3967,6 +3967,12 @@ If @code{TARGET_FUNCTION_INCOMING_ARG} is not defined,
>>  @code{TARGET_FUNCTION_ARG} serves both purposes.
>>  @end deftypefn
>>
>> +@deftypefn {Target Hook} rtx TARGET_IMPLICIT_PIC_ARG (const_tree @var{fntype_or_decl})
>> +This hook returns register holding PIC base address for functions
>> +which do not fix hard register but handle it similar to function arg
>> +assigning a virtual reg for it.
>> +@end deftypefn
>
> Also, the contains_symbol_ref removal seems like an independent
> cleanup-patch.

It was not removed; it was just moved into rtlanal.c for shared usage
(I used it in ira.c).

Thanks,
Ilya

>
>> index a458380..63d2be5 100644
>> --- a/gcc/var-tracking.c
>> +++ b/gcc/var-tracking.c
>> @@ -661,7 +661,6 @@ static bool variable_different_p (variable, variable);
>>  static bool dataflow_set_different (dataflow_set *, dataflow_set *);
>>  static void dataflow_set_destroy (dataflow_set *);
>>
>> -static bool contains_symbol_ref (rtx);
>
> brgds, H-P


* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-25  9:25             ` Ilya Enkovich
@ 2014-08-25 11:24               ` Hans-Peter Nilsson
  2014-08-25 11:43                 ` Ilya Enkovich
  0 siblings, 1 reply; 49+ messages in thread
From: Hans-Peter Nilsson @ 2014-08-25 11:24 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches, Evgeny Stupachenko

On Mon, 25 Aug 2014, Ilya Enkovich wrote:
> 2014-08-23 5:47 GMT+04:00 Hans-Peter Nilsson <hp@bitrange.com>:
> > ...did you send the right version of the patch?
> > This one uses the RTX-returning hook only in boolean tests,
> > unless I misread.

(I did, but not by much.)

> NULL returned by hook means we do not have implicit pic arg to
> pass/receive and there are pieces of code which should be executed
> only when implicit pic arg exists.  This causes these boolean tests.

Well, obviously, but...

> There are also non boolean usages. E.g.:

I think singular ("usage") is more correct?
I saw only one such use. :)

> +      rtx old_reg = targetm.calls.implicit_pic_arg (fndecl);
> +      rtx new_reg = gen_reg_rtx (GET_MODE (old_reg));
> +      emit_move_insn (new_reg, old_reg);
> +      pic_offset_table_rtx = new_reg;

And before that, it's called as a boolean test, throwing away
the result!

I suggest you change the hook to return a boolean, with a
pointer argument to a variable to set, passed as NULL from
callers not interested in the actual value.

I.e. instead of:

> >> +@deftypefn {Target Hook} rtx TARGET_IMPLICIT_PIC_ARG (const_tree @var{fntype_or_decl})

make it a:

@deftypefn {Target Hook} bool TARGET_IMPLICIT_PIC_ARG
 (const_tree @var{fntype_or_decl}, rtx *@var{addr})
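
As a rough illustration of the two hook shapes, here is a minimal,
self-contained C sketch.  It does not use GCC's real types: rtx is
mocked as a pointer to a small struct, the fntype_or_decl argument is
left out, and the names implicit_pic_arg_rtx, implicit_pic_arg_p,
count_actuals, have_implicit_pic_arg and pic_arg_reg are made up for
the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-in for GCC's rtx: just an opaque register record.  */
typedef struct mock_rtx { int regno; } *rtx;

/* Hypothetical register carrying the implicit PIC argument.  */
struct mock_rtx pic_arg_reg = { 3 };

/* Set to false to model a function with no implicit PIC argument.  */
bool have_implicit_pic_arg = true;

/* Shape used by the posted patch: NULL means "no implicit PIC arg",
   so boolean callers compute an rtx and throw it away.  */
rtx
implicit_pic_arg_rtx (void)
{
  return have_implicit_pic_arg ? &pic_arg_reg : NULL;
}

/* Suggested shape: the return value answers the yes/no question and
   *ADDR is written only when the caller passes a non-NULL pointer.  */
bool
implicit_pic_arg_p (rtx *addr)
{
  if (!have_implicit_pic_arg)
    return false;
  if (addr)
    *addr = &pic_arg_reg;
  return true;
}

/* A caller that only needs the yes/no answer passes NULL.  */
int
count_actuals (int num_actuals)
{
  assert (num_actuals >= 0);
  if (implicit_pic_arg_p (NULL))
    num_actuals++;
  return num_actuals;
}
```

With this shape, callers that only count or skip the implicit argument
pass NULL, while a caller that copies the incoming value into a new
pseudo passes the address of a local rtx.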

brgds, H-P


* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-25 11:24               ` Hans-Peter Nilsson
@ 2014-08-25 11:43                 ` Ilya Enkovich
  0 siblings, 0 replies; 49+ messages in thread
From: Ilya Enkovich @ 2014-08-25 11:43 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: gcc-patches, Evgeny Stupachenko

2014-08-25 15:24 GMT+04:00 Hans-Peter Nilsson <hp@bitrange.com>:
> On Mon, 25 Aug 2014, Ilya Enkovich wrote:
>> 2014-08-23 5:47 GMT+04:00 Hans-Peter Nilsson <hp@bitrange.com>:
>> > ...did you send the right version of the patch?
>> > This one uses the RTX-returning hook only in boolean tests,
>> > unless I misread.
>
> (I did, but not by much.)
>
>> NULL returned by hook means we do not have implicit pic arg to
>> pass/receive and there are pieces of code which should be executed
>> only when implicit pic arg exists.  This causes these boolean tests.
>
> Well, obviously, but...
>
>> There are also non boolean usages. E.g.:
>
> I think singular ("usage") is more correct?
> I saw only one such use. :)

There is another one in i386.c :)

>
>> +      rtx old_reg = targetm.calls.implicit_pic_arg (fndecl);
>> +      rtx new_reg = gen_reg_rtx (GET_MODE (old_reg));
>> +      emit_move_insn (new_reg, old_reg);
>> +      pic_offset_table_rtx = new_reg;
>
> And before that, it's called as a boolean test, throwing away
> the result!
>
> I suggest you change the hook to return a boolean, with a
> pointer argument to a variable to set, passed as NULL from
> callers not interested in the actual value.
>
> I.e. instead of:
>
>> >> +@deftypefn {Target Hook} rtx TARGET_IMPLICIT_PIC_ARG (const_tree @var{fntype_or_decl})
>
> make it a:
>
> @deftypefn {Target Hook} bool TARGET_IMPLICIT_PIC_ARG
>  (const_tree @var{fntype_or_decl}, rtx *@var{addr})

OK.  I'll change this hook if this turns into a production-quality patch.
The current patch is posted to demonstrate the approach and show the
problem spots I have to deal with in reload.  There is no point in
cleaning it up until a decision about next steps is made.

Thanks,
Ilya

>
> brgds, H-P


* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-22 12:21         ` Enable EBX for x86 in 32bits PIC code Ilya Enkovich
  2014-08-23  1:47           ` Hans-Peter Nilsson
@ 2014-08-25 15:09           ` Vladimir Makarov
  2014-08-26  7:49             ` Ilya Enkovich
  2014-08-25 17:30           ` Jeff Law
                             ` (2 subsequent siblings)
  4 siblings, 1 reply; 49+ messages in thread
From: Vladimir Makarov @ 2014-08-25 15:09 UTC (permalink / raw)
  To: Ilya Enkovich, gcc, gcc-patches
  Cc: Evgeny Stupachenko, Richard Biener, Uros Bizjak, law

On 2014-08-22 8:21 AM, Ilya Enkovich wrote:
> Hi,
>
> On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode.  It was decided that the best approach would be to not fix ebx register, use pseudo register for GOT base address and let allocator do the rest.  This should be similar to how clang and icc work with GOT base address.  I've been working for some time on such patch and now want to share my results.
>
> The idea of the patch was very simple and included few things;
>   1.  Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do not have any hard reg fixed for PIC.
>   2.  Initialize pic_offset_table_rtx with a new pseudo register in the beginning of a function expand.
>   3.  Change ABI so that there is a possible implicit PIC argument for calls; pic_offset_table_rtx is used as an arg value if such implicit arg exist.
>
> Such approach worked well on small tests but trying to run some benchmarks we faced a problem with reload of address constants.  The problem is that when we try to rematerialize address constant or some constant memory reference, we have to use pic_offset_table_rtx.  It means we insert new usages of a pseudo register and allocator cannot handle it correctly.  Same problem also applies for float and vector constants.
>
> Rematerialization is not the only case causing new pic_offset_table_rtx usage.  Another case is a split of some instructions using constant but not having proper constraints.  E.g. pushtf pattern allows push of constant but it has to be replaced with push of memory in reload pass causing additional usage of pic_offset_table_rtx.
>
> There are two ways to fix it.  The first one is to support modifications of pseudo register live range during reload and correctly allocate hard regs for its new usages (currently we have some hard reg allocated for new usage of pseudo reg but it may contain value of some other pseudo reg; thus we reveal the problem at runtime only).
>

I believe there is already code to deal with this situation.  It is the
code for risky transformations (please check the flag
lra_risky_transformation_p).  If this flag is set, the next LRA assign
subpass runs and checks the correctness of assignments (e.g. it checks
for the situation where two different pseudos have intersecting live
ranges and the same assigned hard reg).  If such a dangerous situation
is found, it is fixed.
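
The kind of check described above can be sketched as follows.  This is
not LRA's code, only a simplified, self-contained model in C: each
pseudo is assumed to have a single closed live range, the struct and
function names (pseudo, live_ranges_intersect_p,
fix_conflicting_assignments) are made up for the example, and the "fix"
is assumed to be spilling the later pseudo of a conflicting pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified pseudo: one live range [start, end] in
   program points and the assigned hard register (-1 = spilled).  */
struct pseudo
{
  int start, end;
  int hard_regno;
};

/* True if the two closed intervals share at least one program point.  */
bool
live_ranges_intersect_p (const struct pseudo *a, const struct pseudo *b)
{
  return a->start <= b->end && b->start <= a->end;
}

/* Find pairs of pseudos that share a hard register while their live
   ranges intersect, spill the later pseudo of each such pair, and
   return the number of pseudos spilled.  */
size_t
fix_conflicting_assignments (struct pseudo *p, size_t n)
{
  size_t spilled = 0;

  assert (p != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      {
        /* Already spilled pseudos cannot conflict.  */
        if (p[i].hard_regno < 0 || p[j].hard_regno < 0)
          continue;
        if (p[i].hard_regno == p[j].hard_regno
            && live_ranges_intersect_p (&p[i], &p[j]))
          {
            p[j].hard_regno = -1;
            spilled++;
          }
      }
  return spilled;
}
```

In this model a new use of the PIC pseudo inserted during reload
extends its live range; if the extended range now overlaps another
pseudo in the same hard register, the pass above detects the pair and
spills one of them instead of leaving the conflict to show up at run
time.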

> The second way is to avoid all cases when new usages of pic_offset_table_rtx appear in reload.  That is a way I chose because it appeared simpler to me and would allow me to get some performance data faster.  Also having rematerialization of address and float constants in PIC mode would mean we have higher register pressure, thus having them on stack should be even more efficient.  To achieve it I had to cut off reg equivs to all exprs using symbol references and all constants living in the memory.  I also had to avoid instructions requiring split in reload causing load of constant from memory (*push[txd]f).
>
> Resulting compiler successfully passes make check, compiles EEMBC and SPEC2000 benchmarks.  There is no confidence I covered all cases and there still may be some templates causing split in reload with new pic_offset_table_rtx usages.  I think support of reload with pseudo PIC would be better and more general solution.  But I don't know how difficult is to implement it though.  Any ideas on resolving this reload issue?
>

Please see what I mentioned above.  Maybe it can fix the degradation.
Rematerialization is important for performance and switching it off
completely is not wise.


> I collected some performance numbers for EEMBC and SPEC2000 benchmarks.  Here are patch results for -Ofast optlevel with LTO collected on Avoton server:
> AUTOmark +1,9%
> TELECOMmark +4,0%
> DENmark +10,0%
> SPEC2000 -0,5%
>
> There are few degradations on EEMBC benchmarks but on SPEC2000 situation is different and we see more performance losses.  Some of them are caused by disabled rematerialization of address constants.  In some cases relaxed ebx causes more spills/fills in places where GOT is frequently used.  There are also some minor fixes required in the patch to allow more efficient function prolog (avoid unnecessary GOT register initialization and allow its initialization without ebx usage).  Suppose some performance problems may be resolved but a good fix for reload should go first.
>
>

Ilya, the optimization you are trying to implement is important in many
cases and should be in some way included in gcc.  If the degradations
can be solved in a way I mentioned above we could introduce a
machine-dependent flag.


* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-22 12:21         ` Enable EBX for x86 in 32bits PIC code Ilya Enkovich
  2014-08-23  1:47           ` Hans-Peter Nilsson
  2014-08-25 15:09           ` Vladimir Makarov
@ 2014-08-25 17:30           ` Jeff Law
  2014-08-28 13:01           ` Uros Bizjak
  2014-08-28 18:58           ` Uros Bizjak
  4 siblings, 0 replies; 49+ messages in thread
From: Jeff Law @ 2014-08-25 17:30 UTC (permalink / raw)
  To: Ilya Enkovich, gcc, gcc-patches
  Cc: Evgeny Stupachenko, Richard Biener, Uros Bizjak, vmakarov

On 08/22/14 06:21, Ilya Enkovich wrote:
>
> Such approach worked well on small tests but trying to run some
> benchmarks we faced a problem with reload of address constants.  The
> problem is that when we try to rematerialize address constant or some
> constant memory reference, we have to use pic_offset_table_rtx.  It
> means we insert new usages of a pseudo register and allocator cannot
> handle it correctly.  Same problem also applies for float and vector
> constants.
Isn't this typically handled with secondary reloads?   It's not an exact 
match, but if you look at the PA port, you can see cases where we need 
to have %r1 available when we rematerialize certain constants.  Several 
ports have secondary reloads that you may be able to refer back to.  LRA 
may handle things differently, so first check LRA's paths.



>
> Rematerialization is not the only case causing new
> pic_offset_table_rtx usage.  Another case is a split of some
> instructions using constant but not having proper constraints.  E.g.
> pushtf pattern allows push of constant but it has to be replaced with
> push of memory in reload pass causing additional usage of
> pic_offset_table_rtx.
Yup.  I think those would be handled the same way.


Jeff


* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-25 15:09           ` Vladimir Makarov
@ 2014-08-26  7:49             ` Ilya Enkovich
  2014-08-26  8:57               ` Ilya Enkovich
  0 siblings, 1 reply; 49+ messages in thread
From: Ilya Enkovich @ 2014-08-26  7:49 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener,
	Uros Bizjak, Jeff Law

2014-08-25 19:08 GMT+04:00 Vladimir Makarov <vmakarov@redhat.com>:
> On 2014-08-22 8:21 AM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in
>> 32bit PIC mode.  It was decided that the best approach would be to not fix
>> ebx register, use pseudo register for GOT base address and let allocator do
>> the rest.  This should be similar to how clang and icc work with GOT base
>> address.  I've been working for some time on such patch and now want to
>> share my results.
>>
>> The idea of the patch was very simple and included few things;
>>   1.  Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do
>> not have any hard reg fixed for PIC.
>>   2.  Initialize pic_offset_table_rtx with a new pseudo register in the
>> beginning of a function expand.
>>   3.  Change ABI so that there is a possible implicit PIC argument for
>> calls; pic_offset_table_rtx is used as an arg value if such implicit arg
>> exist.
>>
>> Such approach worked well on small tests but trying to run some benchmarks
>> we faced a problem with reload of address constants.  The problem is that
>> when we try to rematerialize address constant or some constant memory
>> reference, we have to use pic_offset_table_rtx.  It means we insert new
>> usages of a pseudo register and allocator cannot handle it correctly.  Same
>> problem also applies for float and vector constants.
>>
>> Rematerialization is not the only case causing new pic_offset_table_rtx
>> usage.  Another case is a split of some instructions using constant but not
>> having proper constraints.  E.g. pushtf pattern allows push of constant but
>> it has to be replaced with push of memory in reload pass causing additional
>> usage of pic_offset_table_rtx.
>>
>> There are two ways to fix it.  The first one is to support modifications
>> of pseudo register live range during reload and correctly allocate hard regs
>> for its new usages (currently we have some hard reg allocated for new usage
>> of pseudo reg but it may contain value of some other pseudo reg; thus we
>> reveal the problem at runtime only).
>>
>
> I believe there is already code to deal with this situation.  It is code for
> risky transformations (please check flag lra_risky_transformation_p).  If
> this flag is set, next lra assign subpass is running and checking
> correctness of assignments (e.g. checking situation when two different
> pseudos have intersected live ranges and the same assigned hard reg.  If
> such dangerous situation is found, it is fixed).

I tried removing my restrictions from setup_reg_equiv and instead
initializing lra_risky_transformation_p to 'true' in lra_constraints.  I
got only a 50% pass rate for SPEC2000 at -Ofast with LTO.  I will look
for the reason for the failures.

Ilya

>
>
>> The second way is to avoid all cases when new usages of
>> pic_offset_table_rtx appear in reload.  That is a way I chose because it
>> appeared simpler to me and would allow me to get some performance data
>> faster.  Also having rematerialization of address and float constants in PIC
>> mode would mean we have higher register pressure, thus having them on stack
>> should be even more efficient.  To achieve it I had to cut off reg equivs to
>> all exprs using symbol references and all constants living in the memory.  I
>> also had to avoid instructions requiring split in reload causing load of
>> constant from memory (*push[txd]f).
>>
>> Resulting compiler successfully passes make check, compiles EEMBC and
>> SPEC2000 benchmarks.  There is no confidence I covered all cases and there
>> still may be some templates causing split in reload with new
>> pic_offset_table_rtx usages.  I think support of reload with pseudo PIC
>> would be better and more general solution.  But I don't know how difficult
>> is to implement it though.  Any ideas on resolving this reload issue?
>>
>
> Please see what I mentioned above.  Maybe it can fix the degradation.
> Rematerialization is important for performance and switching it off
> completely is not wise.
>
>
>
>> I collected some performance numbers for EEMBC and SPEC2000 benchmarks.
>> Here are patch results for -Ofast optlevel with LTO collected on Avoton
>> server:
>> AUTOmark +1,9%
>> TELECOMmark +4,0%
>> DENmark +10,0%
>> SPEC2000 -0,5%
>>
>> There are few degradations on EEMBC benchmarks but on SPEC2000 situation
>> is different and we see more performance losses.  Some of them are caused by
>> disabled rematerialization of address constants.  In some cases relaxed ebx
>> causes more spills/fills in places where GOT is frequently used.  There are
>> also some minor fixes required in the patch to allow more efficient function
>> prolog (avoid unnecessary GOT register initialization and allow its
>> initialization without ebx usage).  Suppose some performance problems may be
>> resolved but a good fix for reload should go first.
>>
>>
>
> Ilya, the optimization you are trying to implement is important in many
> cases and should be in some way included in gcc.  If the degradations can be
> solved in a way I mentioned above we could introduce a machine-dependent
> flag.
>


* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-26  7:49             ` Ilya Enkovich
@ 2014-08-26  8:57               ` Ilya Enkovich
  2014-08-26 15:25                 ` Vladimir Makarov
  0 siblings, 1 reply; 49+ messages in thread
From: Ilya Enkovich @ 2014-08-26  8:57 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener,
	Uros Bizjak, Jeff Law

2014-08-26 11:49 GMT+04:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
> 2014-08-25 19:08 GMT+04:00 Vladimir Makarov <vmakarov@redhat.com>:
>> On 2014-08-22 8:21 AM, Ilya Enkovich wrote:
>>>
>>> Hi,
>>>
>>> On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in
>>> 32bit PIC mode.  It was decided that the best approach would be to not fix
>>> ebx register, use pseudo register for GOT base address and let allocator do
>>> the rest.  This should be similar to how clang and icc work with GOT base
>>> address.  I've been working for some time on such patch and now want to
>>> share my results.
>>>
>>> The idea of the patch was very simple and included few things;
>>>   1.  Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do
>>> not have any hard reg fixed for PIC.
>>>   2.  Initialize pic_offset_table_rtx with a new pseudo register in the
>>> beginning of a function expand.
>>>   3.  Change ABI so that there is a possible implicit PIC argument for
>>> calls; pic_offset_table_rtx is used as an arg value if such implicit arg
>>> exist.
>>>
>>> Such approach worked well on small tests but trying to run some benchmarks
>>> we faced a problem with reload of address constants.  The problem is that
>>> when we try to rematerialize address constant or some constant memory
>>> reference, we have to use pic_offset_table_rtx.  It means we insert new
>>> usages of a pseudo register and allocator cannot handle it correctly.  Same
>>> problem also applies for float and vector constants.
>>>
>>> Rematerialization is not the only case causing new pic_offset_table_rtx
>>> usage.  Another case is a split of some instructions using constant but not
>>> having proper constraints.  E.g. pushtf pattern allows push of constant but
>>> it has to be replaced with push of memory in reload pass causing additional
>>> usage of pic_offset_table_rtx.
>>>
>>> There are two ways to fix it.  The first one is to support modifications
>>> of pseudo register live range during reload and correctly allocate hard regs
>>> for its new usages (currently we have some hard reg allocated for new usage
>>> of pseudo reg but it may contain value of some other pseudo reg; thus we
>>> reveal the problem at runtime only).
>>>
>>
>> I believe there is already code to deal with this situation.  It is code for
>> risky transformations (please check flag lra_risky_transformation_p).  If
>> this flag is set, next lra assign subpass is running and checking
>> correctness of assignments (e.g. checking situation when two different
>> pseudos have intersected live ranges and the same assigned hard reg.  If
>> such dangerous situation is found, it is fixed).
>
> I tried to remove my restrictions from setup_reg_equiv and initialize
> lra_risky_transformation_p with 'true' in lra_constraints instead.  I
> got only 50% pass rate for SPEC2000 on Ofast with LTO.  Will search
> for fail reason.

I've looked into one of the failures.  There is still a problem with
allocation in reload.  Here is a piece of code which uses a float
constant:

(insn 1199 1198 1200 96 (set (reg:SI 3 bx)
        (reg:SI 1301 [528])) /usr/include/bits/stdlib-float.h:28 90 {*movsi_internal}
     (nil))
(call_insn 1200 1199 1201 96 (set (reg:DF 8 st)
        (call (mem:QI (symbol_ref:SI ("strtod") [flags 0x41]  <function_decl 0x2b29b8ea8900 strtod>) [0 strtod S1 A8])
            (const_int 8 [0x8]))) /usr/include/bits/stdlib-float.h:28 661 {*call_value}
     (expr_list:REG_DEAD (reg:SI 3 bx)
        (expr_list:REG_CALL_DECL (symbol_ref:SI ("strtod") [flags 0x41]  <function_decl 0x2b29b8ea8900 strtod>)
            (expr_list:REG_EH_REGION (const_int 0 [0])
                (nil))))
    (expr_list (use (reg:SI 3 bx))
        (expr_list:SI (use (reg:SI 3 bx))
            (expr_list:SI (use (mem/f:SI (reg/f:SI 7 sp) [0  S4 A32]))
                (expr_list:SI (use (mem/f:SI (plus:SI (reg/f:SI 7 sp)
                                (const_int 4 [0x4])) [0  S4 A32]))
                    (nil))))))
(insn 1201 1200 1202 96 (set (reg:DF 321 [ D.7817 ])
        (reg:DF 8 st)) /usr/include/bits/stdlib-float.h:28 128 {*movdf_internal}
     (expr_list:REG_DEAD (reg:DF 8 st)
        (nil)))
(insn 1202 1201 1203 96 (set (reg:SF 322 [ D.7804 ])
        (float_truncate:SF (reg:DF 321 [ D.7817 ]))) read_arch.c:700 157 {*truncdfsf_fast_sse}
     (expr_list:REG_DEAD (reg:DF 321 [ D.7817 ])
        (nil)))
(insn 1203 1202 1204 96 (set (mem:SF (reg/f:SI 198 [ D.7812 ]) [4 _130->frequency+0 S4 A32])
        (reg:SF 322 [ D.7804 ])) read_arch.c:700 129 {*movsf_internal}
     (nil))
(insn 1204 1203 1205 96 (set (reg:SF 1209)
        (mem/u/c:SF (plus:SI (reg:SI 1301 [528])
                (const:SI (unspec:SI [
                            (symbol_ref/u:SI ("*.LC12") [flags 0x2])
                        ] UNSPEC_GOTOFF))) [4  S4 A32])) read_arch.c:701 129 {*movsf_internal}
     (expr_list:REG_EQUAL (const_double:SF 0.0 [0x0.0p+0])
        (nil)))
(note 1205 1204 1206 96 NOTE_INSN_DELETED)
(note 1206 1205 1207 96 NOTE_INSN_DELETED)
(insn 1207 1206 1208 96 (set (reg:CCFP 17 flags)
        (compare:CCFP (reg:SF 1209)
            (reg:SF 322 [ D.7804 ]))) read_arch.c:701 53 {*cmpisf_sse}
     (nil))
(jump_insn 1208 1207 3075 96 (set (pc)
        (if_then_else (ge (reg:CCFP 17 flags)
                (const_int 0 [0]))
            (label_ref:SI 3114)
            (pc))) read_arch.c:701 606 {*jcc_1}
     (expr_list:REG_DEAD (reg:CCFP 17 flags)
        (int_list:REG_BR_PROB 2 (nil)))
 -> 3114)
(note 3075 1208 1209 97 [bb 97] NOTE_INSN_BASIC_BLOCK)
(insn 1209 3075 1210 97 (set (reg:SF 1208)
        (mem/u/c:SF (plus:SI (reg:SI 1301 [528])
                (const:SI (unspec:SI [
                            (symbol_ref/u:SI ("*.LC11") [flags 0x2])
                        ] UNSPEC_GOTOFF))) [4  S4 A32])) read_arch.c:701 129 {*movsf_internal}
     (expr_list:REG_EQUIV (const_double:SF 1.0e+0 [0x0.8p+1])
        (nil)))
(note 1210 1209 1211 97 NOTE_INSN_DELETED)
(note 1211 1210 1212 97 NOTE_INSN_DELETED)
(insn 1212 1211 1213 97 (set (reg:CCFP 17 flags)
        (compare:CCFP (reg:SF 322 [ D.7804 ])
            (reg:SF 1208))) read_arch.c:701 53 {*cmpisf_sse}
     (nil))

We have the PIC register r1301 (formerly r528) used for the constant load
(insn 1209).  This register was actually copied into bx (insn 1199), and
that hard reg could be used by insn 1209.  During reload insn 1209 is
removed and a new one is created instead:

(insn 3864 1211 1212 104 (set (reg:SI 0 ax [1468])
        (plus:SI (reg:SI 6 bp [528])
            (const:SI (unspec:SI [
                        (symbol_ref/u:SI ("*.LC11") [flags 0x2])
                    ] UNSPEC_GOTOFF)))) read_arch.c:701 213 {*leasi}
     (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC11") [flags 0x2])
        (nil)))
(insn 1212 3864 1213 104 (set (reg:CCFP 17 flags)
        (compare:CCFP (reg:SF 21 xmm0 [orig:322 D.7804 ] [322])
            (mem/u/c:SF (reg:SI 0 ax [1468]) [4  S4 A32]))) read_arch.c:701 53 {*cmpisf_sse}
     (nil))

This new instruction uses bp, which is wrong.  The required value is
actually in bx.  In the debugger I also checked that bp doesn't hold the
required value.  I suppose I enabled the flag correctly because I found
this in the log: "Spill r1301 after risky transformations".  Is it possible
that we are still not allowed to use the original PIC register (r528) and
should use the reg copy created for the particular region (in this case
r1301)?

Ilya

>
> Ilya
>
>>
>>
>>> The second way is to avoid all cases when new usages of
>>> pic_offset_table_rtx appear in reload.  That is a way I chose because it
>>> appeared simpler to me and would allow me to get some performance data
>>> faster.  Also having rematerialization of address and float constants in PIC
>>> mode would mean we have higher register pressure, thus having them on stack
>>> should be even more efficient.  To achieve it I had to cut off reg equivs to
>>> all exprs using symbol references and all constants living in the memory.  I
>>> also had to avoid instructions requiring split in reload causing load of
>>> constant from memory (*push[txd]f).
>>>
>>> Resulting compiler successfully passes make check, compiles EEMBC and
>>> SPEC2000 benchmarks.  There is no confidence I covered all cases and there
>>> still may be some templates causing split in reload with new
>>> pic_offset_table_rtx usages.  I think support of reload with pseudo PIC
>>> would be better and more general solution.  But I don't know how difficult
>>> is to implement it though.  Any ideas on resolving this reload issue?
>>>
>>
>> Please see what I mentioned above.  Maybe it can fix the degradation.
>> Rematerialization is important for performance and switching it off
>> completely is not wise.
>>
>>
>>
>>> I collected some performance numbers for EEMBC and SPEC2000 benchmarks.
>>> Here are patch results for -Ofast optlevel with LTO collected on Avoton
>>> server:
>>> AUTOmark +1,9%
>>> TELECOMmark +4,0%
>>> DENmark +10,0%
>>> SPEC2000 -0,5%
>>>
>>> There are few degradations on EEMBC benchmarks but on SPEC2000 situation
>>> is different and we see more performance losses.  Some of them are caused by
>>> disabled rematerialization of address constants.  In some cases relaxed ebx
>>> causes more spills/fills in places where GOT is frequently used.  There are
>>> also some minor fixes required in the patch to allow more efficient function
>>> prolog (avoid unnecessary GOT register initialization and allow its
>>> initialization without ebx usage).  Suppose some performance problems may be
>>> resolved but a good fix for reload should go first.
>>>
>>>
>>
>> Ilya, the optimization you are trying to implement is important in many
>> cases and should be in some way included in gcc.  If the degradations can be
>> solved in a way I mentioned above we could introduce a machine-dependent
>> flag.
>>


* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-26  8:57               ` Ilya Enkovich
@ 2014-08-26 15:25                 ` Vladimir Makarov
  2014-08-26 21:42                   ` Ilya Enkovich
  0 siblings, 1 reply; 49+ messages in thread
From: Vladimir Makarov @ 2014-08-26 15:25 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener,
	Uros Bizjak, Jeff Law

On 08/26/2014 04:57 AM, Ilya Enkovich wrote:
> 2014-08-26 11:49 GMT+04:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
>> 2014-08-25 19:08 GMT+04:00 Vladimir Makarov <vmakarov@redhat.com>:
>>> On 2014-08-22 8:21 AM, Ilya Enkovich wrote:
>>>> Hi,
>>>>
>>>> On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in
>>>> 32bit PIC mode.  It was decided that the best approach would be to not fix
>>>> ebx register, use pseudo register for GOT base address and let allocator do
>>>> the rest.  This should be similar to how clang and icc work with GOT base
>>>> address.  I've been working for some time on such patch and now want to
>>>> share my results.
>>>>
>>>> The idea of the patch was very simple and included few things;
>>>>   1.  Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do
>>>> not have any hard reg fixed for PIC.
>>>>   2.  Initialize pic_offset_table_rtx with a new pseudo register in the
>>>> beginning of a function expand.
>>>>   3.  Change ABI so that there is a possible implicit PIC argument for
>>>> calls; pic_offset_table_rtx is used as an arg value if such implicit arg
>>>> exist.
>>>>
>>>> Such approach worked well on small tests but trying to run some benchmarks
>>>> we faced a problem with reload of address constants.  The problem is that
>>>> when we try to rematerialize address constant or some constant memory
>>>> reference, we have to use pic_offset_table_rtx.  It means we insert new
>>>> usages of a pseudo register and allocator cannot handle it correctly.  Same
>>>> problem also applies for float and vector constants.
>>>>
>>>> Rematerialization is not the only case causing new pic_offset_table_rtx
>>>> usage.  Another case is a split of some instructions using constant but not
>>>> having proper constraints.  E.g. pushtf pattern allows push of constant but
>>>> it has to be replaced with push of memory in reload pass causing additional
>>>> usage of pic_offset_table_rtx.
>>>>
>>>> There are two ways to fix it.  The first one is to support modifications
>>>> of pseudo register live range during reload and correctly allocate hard regs
>>>> for its new usages (currently we have some hard reg allocated for new usage
>>>> of pseudo reg but it may contain value of some other pseudo reg; thus we
>>>> reveal the problem at runtime only).
>>>>
>>> I believe there is already code to deal with this situation.  It is code for
>>> risky transformations (please check flag lra_risky_transformation_p).  If
>>> this flag is set, next lra assign subpass is running and checking
>>> correctness of assignments (e.g. checking situation when two different
>>> pseudos have intersected live ranges and the same assigned hard reg.  If
>>> such dangerous situation is found, it is fixed).
>> I tried to remove my restrictions from setup_reg_equiv and initialize
>> lra_risky_transformation_p with 'true' in lra_constraints instead.  I
>> got only 50% pass rate for SPEC2000 on Ofast with LTO.  Will search
>> for fail reason.
> I've looked into one of fails.  There is still a problem with
> allocation in reload. Here is a piece of code which uses float
> constant:
>
> (insn 1199 1198 1200 96 (set (reg:SI 3 bx)
>         (reg:SI 1301 [528])) /usr/include/bits/stdlib-float.h:28 90
> {*movsi_internal}
>      (nil))
> (call_insn 1200 1199 1201 96 (set (reg:DF 8 st)
>         (call (mem:QI (symbol_ref:SI ("strtod") [flags 0x41]
> <function_decl 0x2b29b8ea8900 strtod>) [0 strtod S1 A8])
>             (const_int 8 [0x8]))) /usr/include/bits/stdlib-float.h:28
> 661 {*call_value}
>      (expr_list:REG_DEAD (reg:SI 3 bx)
>         (expr_list:REG_CALL_DECL (symbol_ref:SI ("strtod") [flags
> 0x41]  <function_decl 0x2b29b8ea8900 strtod>)
>             (expr_list:REG_EH_REGION (const_int 0 [0])
>                 (nil))))
>     (expr_list (use (reg:SI 3 bx))
>         (expr_list:SI (use (reg:SI 3 bx))
>             (expr_list:SI (use (mem/f:SI (reg/f:SI 7 sp) [0  S4 A32]))
>                 (expr_list:SI (use (mem/f:SI (plus:SI (reg/f:SI 7 sp)
>                                 (const_int 4 [0x4])) [0  S4 A32]))
>                     (nil))))))
> (insn 1201 1200 1202 96 (set (reg:DF 321 [ D.7817 ])
>         (reg:DF 8 st)) /usr/include/bits/stdlib-float.h:28 128 {*movdf_internal}
>      (expr_list:REG_DEAD (reg:DF 8 st)
>         (nil)))
> (insn 1202 1201 1203 96 (set (reg:SF 322 [ D.7804 ])
>         (float_truncate:SF (reg:DF 321 [ D.7817 ]))) read_arch.c:700
> 157 {*truncdfsf_fast_sse}
>      (expr_list:REG_DEAD (reg:DF 321 [ D.7817 ])
>         (nil)))
> (insn 1203 1202 1204 96 (set (mem:SF (reg/f:SI 198 [ D.7812 ]) [4
> _130->frequency+0 S4 A32])
>         (reg:SF 322 [ D.7804 ])) read_arch.c:700 129 {*movsf_internal}
>      (nil))
> (insn 1204 1203 1205 96 (set (reg:SF 1209)
>         (mem/u/c:SF (plus:SI (reg:SI 1301 [528])
>                 (const:SI (unspec:SI [
>                             (symbol_ref/u:SI ("*.LC12") [flags 0x2])
>                         ] UNSPEC_GOTOFF))) [4  S4 A32]))
> read_arch.c:701 129 {*movsf_internal}
>      (expr_list:REG_EQUAL (const_double:SF 0.0 [0x0.0p+0])
>         (nil)))
> (note 1205 1204 1206 96 NOTE_INSN_DELETED)
> (note 1206 1205 1207 96 NOTE_INSN_DELETED)
> (insn 1207 1206 1208 96 (set (reg:CCFP 17 flags)
>         (compare:CCFP (reg:SF 1209)
>             (reg:SF 322 [ D.7804 ]))) read_arch.c:701 53 {*cmpisf_sse}
>      (nil))
> (jump_insn 1208 1207 3075 96 (set (pc)
>         (if_then_else (ge (reg:CCFP 17 flags)
>                 (const_int 0 [0]))
>             (label_ref:SI 3114)
>             (pc))) read_arch.c:701 606 {*jcc_1}
>      (expr_list:REG_DEAD (reg:CCFP 17 flags)
>         (int_list:REG_BR_PROB 2 (nil)))
>  -> 3114)
> (note 3075 1208 1209 97 [bb 97] NOTE_INSN_BASIC_BLOCK)
> (insn 1209 3075 1210 97 (set (reg:SF 1208)
>         (mem/u/c:SF (plus:SI (reg:SI 1301 [528])
>                 (const:SI (unspec:SI [
>                             (symbol_ref/u:SI ("*.LC11") [flags 0x2])
>                         ] UNSPEC_GOTOFF))) [4  S4 A32]))
> read_arch.c:701 129 {*movsf_internal}
>      (expr_list:REG_EQUIV (const_double:SF 1.0e+0 [0x0.8p+1])
>         (nil)))
> (note 1210 1209 1211 97 NOTE_INSN_DELETED)
> (note 1211 1210 1212 97 NOTE_INSN_DELETED)
> (insn 1212 1211 1213 97 (set (reg:CCFP 17 flags)
>         (compare:CCFP (reg:SF 322 [ D.7804 ])
>             (reg:SF 1208))) read_arch.c:701 53 {*cmpisf_sse}
>      (nil))
>
> We have PIC register r1301 (former r528) used for constant load (insn
> 1209).  This register was actually loaded to bx (insn 1199) and this
> hard reg may be used by insn 1209.  During reload we have insn 1209
> removed and a new one created instead:
>
> (insn 3864 1211 1212 104 (set (reg:SI 0 ax [1468])
>         (plus:SI (reg:SI 6 bp [528])
>             (const:SI (unspec:SI [
>                         (symbol_ref/u:SI ("*.LC11") [flags 0x2])
>                     ] UNSPEC_GOTOFF)))) read_arch.c:701 213 {*leasi}
>      (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC11") [flags 0x2])
>         (nil)))
> (insn 1212 3864 1213 104 (set (reg:CCFP 17 flags)
>         (compare:CCFP (reg:SF 21 xmm0 [orig:322 D.7804 ] [322])
>             (mem/u/c:SF (reg:SI 0 ax [1468]) [4  S4 A32])))
> read_arch.c:701 53 {*cmpisf_sse}
>      (nil))
>
> In this new instruction bp is used which is wrong. We actually have
> required value in bx. In debugger I also checked that bp doesn't have
> required value.  I suppose I enabled flag correctly because found this
> in the log: "Spill r1301 after risky transformations".  Is it possible
> we are still not allowed to use the original PIC register (r528) and
> should use a reg copy created for particular region (in this case
> r1301)?
>
It is hard for me to say without the full patch and the test.  I can
only guess that 1301 gets a wrong class and is therefore assigned to
the wrong hard reg.

Could you send me the patch and the test?  I'll look at this and let
you know what is going on.
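
To illustrate the kind of check the risky-transformations subpass is meant to perform, here is a toy model in C.  It is only a sketch under simplifying assumptions (one contiguous live range per pseudo, measured in program points), not the actual LRA code; all names in it are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a pseudo register: one contiguous live range and
   the hard register it was assigned (-1 means spilled).  */
struct toy_pseudo
{
  int regno;      /* pseudo register number, e.g. 1301 */
  int hard_regno; /* assigned hard register, or -1 if spilled */
  int start;      /* first program point where the pseudo is live */
  int finish;     /* last program point where the pseudo is live */
};

/* Return true if the live ranges of A and B share a program point.  */
static bool
toy_ranges_intersect (const struct toy_pseudo *a, const struct toy_pseudo *b)
{
  assert (a->start <= a->finish && b->start <= b->finish);
  return a->start <= b->finish && b->start <= a->finish;
}

/* Find pairs of pseudos that live at the same time in the same hard
   register and spill the later one of each pair.  Return the number
   of pseudos spilled.  */
static int
toy_fix_conflicts (struct toy_pseudo *p, size_t n)
{
  int spilled = 0;

  for (size_t i = 0; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      if (p[i].hard_regno >= 0
	  && p[i].hard_regno == p[j].hard_regno
	  && toy_ranges_intersect (&p[i], &p[j]))
	{
	  p[j].hard_regno = -1;
	  spilled++;
	}

  return spilled;
}
```

In this model a pseudo that shares a hard register with another one is left alone as long as their ranges do not overlap; only a real overlap forces a spill.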




* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-26 15:25                 ` Vladimir Makarov
@ 2014-08-26 21:42                   ` Ilya Enkovich
  2014-08-27 20:19                     ` Vladimir Makarov
  2014-08-27 21:39                     ` Enable EBX for x86 in 32bits PIC code Jeff Law
  0 siblings, 2 replies; 49+ messages in thread
From: Ilya Enkovich @ 2014-08-26 21:42 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener,
	Uros Bizjak, Jeff Law

On 26 Aug 11:25, Vladimir Makarov wrote:
> On 08/26/2014 04:57 AM, Ilya Enkovich wrote:
> > I've looked into one of fails.  There is still a problem with
> > allocation in reload. Here is a piece of code which uses float
> > constant:
> >
> > (insn 1199 1198 1200 96 (set (reg:SI 3 bx)
> >         (reg:SI 1301 [528])) /usr/include/bits/stdlib-float.h:28 90
> > {*movsi_internal}
> >      (nil))
> > (call_insn 1200 1199 1201 96 (set (reg:DF 8 st)
> >         (call (mem:QI (symbol_ref:SI ("strtod") [flags 0x41]
> > <function_decl 0x2b29b8ea8900 strtod>) [0 strtod S1 A8])
> >             (const_int 8 [0x8]))) /usr/include/bits/stdlib-float.h:28
> > 661 {*call_value}
> >      (expr_list:REG_DEAD (reg:SI 3 bx)
> >         (expr_list:REG_CALL_DECL (symbol_ref:SI ("strtod") [flags
> > 0x41]  <function_decl 0x2b29b8ea8900 strtod>)
> >             (expr_list:REG_EH_REGION (const_int 0 [0])
> >                 (nil))))
> >     (expr_list (use (reg:SI 3 bx))
> >         (expr_list:SI (use (reg:SI 3 bx))
> >             (expr_list:SI (use (mem/f:SI (reg/f:SI 7 sp) [0  S4 A32]))
> >                 (expr_list:SI (use (mem/f:SI (plus:SI (reg/f:SI 7 sp)
> >                                 (const_int 4 [0x4])) [0  S4 A32]))
> >                     (nil))))))
> > (insn 1201 1200 1202 96 (set (reg:DF 321 [ D.7817 ])
> >         (reg:DF 8 st)) /usr/include/bits/stdlib-float.h:28 128 {*movdf_internal}
> >      (expr_list:REG_DEAD (reg:DF 8 st)
> >         (nil)))
> > (insn 1202 1201 1203 96 (set (reg:SF 322 [ D.7804 ])
> >         (float_truncate:SF (reg:DF 321 [ D.7817 ]))) read_arch.c:700
> > 157 {*truncdfsf_fast_sse}
> >      (expr_list:REG_DEAD (reg:DF 321 [ D.7817 ])
> >         (nil)))
> > (insn 1203 1202 1204 96 (set (mem:SF (reg/f:SI 198 [ D.7812 ]) [4
> > _130->frequency+0 S4 A32])
> >         (reg:SF 322 [ D.7804 ])) read_arch.c:700 129 {*movsf_internal}
> >      (nil))
> > (insn 1204 1203 1205 96 (set (reg:SF 1209)
> >         (mem/u/c:SF (plus:SI (reg:SI 1301 [528])
> >                 (const:SI (unspec:SI [
> >                             (symbol_ref/u:SI ("*.LC12") [flags 0x2])
> >                         ] UNSPEC_GOTOFF))) [4  S4 A32]))
> > read_arch.c:701 129 {*movsf_internal}
> >      (expr_list:REG_EQUAL (const_double:SF 0.0 [0x0.0p+0])
> >         (nil)))
> > (note 1205 1204 1206 96 NOTE_INSN_DELETED)
> > (note 1206 1205 1207 96 NOTE_INSN_DELETED)
> > (insn 1207 1206 1208 96 (set (reg:CCFP 17 flags)
> >         (compare:CCFP (reg:SF 1209)
> >             (reg:SF 322 [ D.7804 ]))) read_arch.c:701 53 {*cmpisf_sse}
> >      (nil))
> > (jump_insn 1208 1207 3075 96 (set (pc)
> >         (if_then_else (ge (reg:CCFP 17 flags)
> >                 (const_int 0 [0]))
> >             (label_ref:SI 3114)
> >             (pc))) read_arch.c:701 606 {*jcc_1}
> >      (expr_list:REG_DEAD (reg:CCFP 17 flags)
> >         (int_list:REG_BR_PROB 2 (nil)))
> >  -> 3114)
> > (note 3075 1208 1209 97 [bb 97] NOTE_INSN_BASIC_BLOCK)
> > (insn 1209 3075 1210 97 (set (reg:SF 1208)
> >         (mem/u/c:SF (plus:SI (reg:SI 1301 [528])
> >                 (const:SI (unspec:SI [
> >                             (symbol_ref/u:SI ("*.LC11") [flags 0x2])
> >                         ] UNSPEC_GOTOFF))) [4  S4 A32]))
> > read_arch.c:701 129 {*movsf_internal}
> >      (expr_list:REG_EQUIV (const_double:SF 1.0e+0 [0x0.8p+1])
> >         (nil)))
> > (note 1210 1209 1211 97 NOTE_INSN_DELETED)
> > (note 1211 1210 1212 97 NOTE_INSN_DELETED)
> > (insn 1212 1211 1213 97 (set (reg:CCFP 17 flags)
> >         (compare:CCFP (reg:SF 322 [ D.7804 ])
> >             (reg:SF 1208))) read_arch.c:701 53 {*cmpisf_sse}
> >      (nil))
> >
> > We have PIC register r1301 (former r528) used for constant load (insn
> > 1209).  This register was actually loaded to bx (insn 1199) and this
> > hard reg may be used by insn 1209.  During reload we have insn 1209
> > removed and a new one created instead:
> >
> > (insn 3864 1211 1212 104 (set (reg:SI 0 ax [1468])
> >         (plus:SI (reg:SI 6 bp [528])
> >             (const:SI (unspec:SI [
> >                         (symbol_ref/u:SI ("*.LC11") [flags 0x2])
> >                     ] UNSPEC_GOTOFF)))) read_arch.c:701 213 {*leasi}
> >      (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC11") [flags 0x2])
> >         (nil)))
> > (insn 1212 3864 1213 104 (set (reg:CCFP 17 flags)
> >         (compare:CCFP (reg:SF 21 xmm0 [orig:322 D.7804 ] [322])
> >             (mem/u/c:SF (reg:SI 0 ax [1468]) [4  S4 A32])))
> > read_arch.c:701 53 {*cmpisf_sse}
> >      (nil))
> >
> > In this new instruction bp is used which is wrong. We actually have
> > required value in bx. In debugger I also checked that bp doesn't have
> > required value.  I suppose I enabled flag correctly because found this
> > in the log: "Spill r1301 after risky transformations".  Is it possible
> > we are still not allowed to use the original PIC register (r528) and
> > should use a reg copy created for particular region (in this case
> > r1301)?
> >
> It is hard for me to say without the full patch and the test.  I can
> only guess that 1301 gets a wrong class and therefore assigned to the
> wrong hard ref.
> 
> Could you send me the patch and the test.  I'll look at this and inform
> you what is going on.
> 
> 
> 

Hi,

Here is the patch I tried.  I applied it on top of revision 214215.  Unfortunately I do not have a small reproducer, but the problem can be easily reproduced on the SPEC2000 benchmark 175.vpr.  The problem is at read_arch.c:701, where a float value is compared with the float constant 1.0.  The code is inlined into the read_arch function and can be easily found in the RTL dump of read_arch as a float comparison with 1.0 after the first call to strtod.

Here is the compilation command I use:

gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math -mfpmath=sse -m32  -march=slm -fPIE -pie -c -o read_arch.o       -DSPEC_CPU2000        read_arch.c

In my final assembly the comparison with 1.0 looks like this:

comiss  .LC11@GOTOFF(%ebp), %xmm0       # 1101  *cmpisf_sse     [length = 7]

and %ebp does not hold the proper value here.

I'll try to make a smaller reproducer if these instructions don't help.
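
Roughly, the source pattern involved has the shape sketched below: a value returned by strtod is truncated to float, stored, and then compared against float constants, which in 32-bit PIC code are loaded through the PIC register.  This is a hypothetical reduction written from the RTL dump (the struct and function names are made up), and it is not known to trigger the failure on its own.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the structure whose 'frequency' field
   is stored to in the RTL dump.  */
struct toy_segment
{
  float frequency;
};

/* Parse TEXT as a frequency and store it in SEG.  Return 0 if the
   value lies in (0, 1], -1 otherwise.  The comparisons against 0.0
   and 1.0 are the ones that need float constants from memory.  */
int
toy_read_frequency (const char *text, struct toy_segment *seg)
{
  assert (text != NULL && seg != NULL);

  seg->frequency = (float) strtod (text, NULL);

  if (seg->frequency <= 0.0f || seg->frequency > 1.0f)
    return -1;

  return 0;
}
```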

Thank you for your help!
Ilya
--
diff --git a/gcc/calls.c b/gcc/calls.c
index 4285ec1..85dae6b 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
     call_expr_arg_iterator iter;
     tree arg;
 
+    if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
+      {
+	gcc_assert (pic_offset_table_rtx);
+	args[j].tree_value = make_tree (ptr_type_node,
+					pic_offset_table_rtx);
+	j--;
+      }
+
     if (struct_value_addr_value)
       {
 	args[j].tree_value = struct_value_addr_value;
@@ -2520,6 +2528,10 @@ expand_call (tree exp, rtx target, int ignore)
     /* Treat all args as named.  */
     n_named_args = num_actuals;
 
+  /* Add implicit PIC arg.  */
+  if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
+    num_actuals++;
+
   /* Make a vector to hold all the information about each arg.  */
   args = XALLOCAVEC (struct arg_data, num_actuals);
   memset (args, 0, num_actuals * sizeof (struct arg_data));
@@ -3133,6 +3145,8 @@ expand_call (tree exp, rtx target, int ignore)
 	{
 	  int arg_nr = return_flags & ERF_RETURN_ARG_MASK;
 	  arg_nr = num_actuals - arg_nr - 1;
+	  if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
+	    arg_nr--;
 	  if (arg_nr >= 0
 	      && arg_nr < num_actuals
 	      && args[arg_nr].reg
@@ -3700,8 +3714,8 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
      of the full argument passing conventions to limit complexity here since
      library functions shouldn't have many args.  */
 
-  argvec = XALLOCAVEC (struct arg, nargs + 1);
-  memset (argvec, 0, (nargs + 1) * sizeof (struct arg));
+  argvec = XALLOCAVEC (struct arg, nargs + 2);
+  memset (argvec, 0, (nargs + 2) * sizeof (struct arg));
 
 #ifdef INIT_CUMULATIVE_LIBCALL_ARGS
   INIT_CUMULATIVE_LIBCALL_ARGS (args_so_far_v, outmode, fun);
@@ -3717,6 +3731,23 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
 
   push_temp_slots ();
 
+  if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
+    {
+      gcc_assert (pic_offset_table_rtx);
+
+      argvec[count].value = pic_offset_table_rtx;
+      argvec[count].mode = Pmode;
+      argvec[count].partial = 0;
+
+      argvec[count].reg = targetm.calls.function_arg (args_so_far,
+						      Pmode, NULL_TREE, true);
+
+      targetm.calls.function_arg_advance (args_so_far, Pmode, NULL_TREE, true);
+
+      count++;
+      nargs++;
+    }
+
   /* If there's a structure value address to be passed,
      either pass it in the special place, or pass it as an extra argument.  */
   if (mem_value && struct_value == 0 && ! pcc_struct_value)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cc4b0c7..cfafcdd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6133,6 +6133,21 @@ ix86_maybe_switch_abi (void)
     reinit_regs ();
 }
 
+/* Return reg in which implicit PIC base address
+   arg is passed.  */
+static rtx
+ix86_implicit_pic_arg (const_tree fntype_or_decl ATTRIBUTE_UNUSED)
+{
+  if ((TARGET_64BIT
+       && (ix86_cmodel == CM_SMALL_PIC
+	   || TARGET_PECOFF))
+      || !flag_pic
+      || !X86_TUNE_RELAX_PIC_REG)
+    return NULL_RTX;
+
+  return gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM);
+}
+
 /* Initialize a variable CUM of type CUMULATIVE_ARGS
    for a call to a function whose data type is FNTYPE.
    For a library call, FNTYPE is 0.  */
@@ -6198,6 +6213,11 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument info to initialize */
 		      ? (!prototype_p (fntype) || stdarg_p (fntype))
 		      : !libname);
 
+  if (caller)
+    cum->implicit_pic_arg = ix86_implicit_pic_arg (fndecl ? fndecl : fntype);
+  else
+    cum->implicit_pic_arg = NULL_RTX;
+
   if (!TARGET_64BIT)
     {
       /* If there are variable arguments, then we won't pass anything
@@ -7291,7 +7311,9 @@ ix86_function_arg_advance (cumulative_args_t cum_v, enum machine_mode mode,
   if (type)
     mode = type_natural_mode (type, NULL, false);
 
-  if (TARGET_64BIT && (cum ? cum->call_abi : ix86_abi) == MS_ABI)
+  if (cum->implicit_pic_arg)
+    cum->implicit_pic_arg = NULL_RTX;
+  else if (TARGET_64BIT && (cum ? cum->call_abi : ix86_abi) == MS_ABI)
     function_arg_advance_ms_64 (cum, bytes, words);
   else if (TARGET_64BIT)
     function_arg_advance_64 (cum, mode, type, words, named);
@@ -7542,7 +7564,9 @@ ix86_function_arg (cumulative_args_t cum_v, enum machine_mode omode,
   if (type && TREE_CODE (type) == VECTOR_TYPE)
     mode = type_natural_mode (type, cum, false);
 
-  if (TARGET_64BIT && (cum ? cum->call_abi : ix86_abi) == MS_ABI)
+  if (cum->implicit_pic_arg)
+    arg = cum->implicit_pic_arg;
+  else if (TARGET_64BIT && (cum ? cum->call_abi : ix86_abi) == MS_ABI)
     arg = function_arg_ms_64 (cum, mode, omode, named, bytes);
   else if (TARGET_64BIT)
     arg = function_arg_64 (cum, mode, omode, type, named);
@@ -9373,6 +9397,9 @@ gen_pop (rtx arg)
 static unsigned int
 ix86_select_alt_pic_regnum (void)
 {
+  if (ix86_implicit_pic_arg (NULL))
+    return INVALID_REGNUM;
+
   if (crtl->is_leaf
       && !crtl->profile
       && !ix86_current_function_calls_tls_descriptor)
@@ -11236,7 +11263,8 @@ ix86_expand_prologue (void)
 	}
       else
 	{
-          insn = emit_insn (gen_set_got (pic_offset_table_rtx));
+	  rtx reg = gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM);
+          insn = emit_insn (gen_set_got (reg));
 	  RTX_FRAME_RELATED_P (insn) = 1;
 	  add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
 	}
@@ -11789,7 +11817,8 @@ ix86_expand_epilogue (int style)
 static void
 ix86_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED, HOST_WIDE_INT)
 {
-  if (pic_offset_table_rtx)
+  if (pic_offset_table_rtx
+      && REGNO (pic_offset_table_rtx) < FIRST_PSEUDO_REGISTER)
     SET_REGNO (pic_offset_table_rtx, REAL_PIC_OFFSET_TABLE_REGNUM);
 #if TARGET_MACHO
   /* Mach-O doesn't support labels at the end of objects, so if
@@ -13107,6 +13136,15 @@ ix86_GOT_alias_set (void)
   return set;
 }
 
+/* Set regs_ever_live for PIC base address register
+   to true if required.  */
+static void
+set_pic_reg_ever_alive ()
+{
+  if (reload_in_progress)
+    df_set_regs_ever_live (REGNO (pic_offset_table_rtx), true);
+}
+
 /* Return a legitimate reference for ORIG (an address) using the
    register REG.  If REG is 0, a new pseudo is generated.
 
@@ -13157,8 +13195,7 @@ legitimize_pic_address (rtx orig, rtx reg)
       /* This symbol may be referenced via a displacement from the PIC
 	 base address (@GOTOFF).  */
 
-      if (reload_in_progress)
-	df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+      set_pic_reg_ever_alive ();
       if (GET_CODE (addr) == CONST)
 	addr = XEXP (addr, 0);
       if (GET_CODE (addr) == PLUS)
@@ -13190,8 +13227,7 @@ legitimize_pic_address (rtx orig, rtx reg)
       /* This symbol may be referenced via a displacement from the PIC
 	 base address (@GOTOFF).  */
 
-      if (reload_in_progress)
-	df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+      set_pic_reg_ever_alive ();
       if (GET_CODE (addr) == CONST)
 	addr = XEXP (addr, 0);
       if (GET_CODE (addr) == PLUS)
@@ -13252,8 +13288,7 @@ legitimize_pic_address (rtx orig, rtx reg)
 	  /* This symbol must be referenced via a load from the
 	     Global Offset Table (@GOT).  */
 
-	  if (reload_in_progress)
-	    df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+	  set_pic_reg_ever_alive ();
 	  new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), UNSPEC_GOT);
 	  new_rtx = gen_rtx_CONST (Pmode, new_rtx);
 	  if (TARGET_64BIT)
@@ -13305,8 +13340,7 @@ legitimize_pic_address (rtx orig, rtx reg)
 	    {
 	      if (!TARGET_64BIT)
 		{
-		  if (reload_in_progress)
-		    df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+		  set_pic_reg_ever_alive ();
 		  new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op0),
 					    UNSPEC_GOTOFF);
 		  new_rtx = gen_rtx_PLUS (Pmode, new_rtx, op1);
@@ -13601,8 +13635,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	}
       else if (flag_pic)
 	{
-	  if (reload_in_progress)
-	    df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+	  set_pic_reg_ever_alive ();
 	  pic = pic_offset_table_rtx;
 	  type = TARGET_ANY_GNU_TLS ? UNSPEC_GOTNTPOFF : UNSPEC_GOTTPOFF;
 	}
@@ -14233,6 +14266,8 @@ ix86_pic_register_p (rtx x)
   if (GET_CODE (x) == VALUE && CSELIB_VAL_PTR (x))
     return (pic_offset_table_rtx
 	    && rtx_equal_for_cselib_p (x, pic_offset_table_rtx));
+  else if (pic_offset_table_rtx)
+    return REG_P (x) && REGNO (x) == REGNO (pic_offset_table_rtx);
   else
     return REG_P (x) && REGNO (x) == PIC_OFFSET_TABLE_REGNUM;
 }
@@ -14408,7 +14443,9 @@ ix86_delegitimize_address (rtx x)
 	 ...
 	 movl foo@GOTOFF(%ecx), %edx
 	 in which case we return (%ecx - %ebx) + foo.  */
-      if (pic_offset_table_rtx)
+      if (pic_offset_table_rtx
+	  && (!reload_completed
+	      || REGNO (pic_offset_table_rtx) < FIRST_PSEUDO_REGISTER))
         result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend),
 						     pic_offset_table_rtx),
 			       result);
@@ -24915,7 +24952,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
 		  && DEFAULT_ABI != MS_ABI))
 	  && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
 	  && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
-	use_reg (&use, pic_offset_table_rtx);
+	use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
     }
 
   if (TARGET_64BIT && INTVAL (callarg2) >= 0)
@@ -47228,6 +47265,8 @@ ix86_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 #define TARGET_FUNCTION_ARG_ADVANCE ix86_function_arg_advance
 #undef TARGET_FUNCTION_ARG
 #define TARGET_FUNCTION_ARG ix86_function_arg
+#undef TARGET_IMPLICIT_PIC_ARG
+#define TARGET_IMPLICIT_PIC_ARG ix86_implicit_pic_arg
 #undef TARGET_FUNCTION_ARG_BOUNDARY
 #define TARGET_FUNCTION_ARG_BOUNDARY ix86_function_arg_boundary
 #undef TARGET_PASS_BY_REFERENCE
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 2c64162..d5fa250 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1243,11 +1243,13 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define REAL_PIC_OFFSET_TABLE_REGNUM  BX_REG
 
-#define PIC_OFFSET_TABLE_REGNUM				\
-  ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC	\
-                     || TARGET_PECOFF))		\
-   || !flag_pic ? INVALID_REGNUM			\
-   : reload_completed ? REGNO (pic_offset_table_rtx)	\
+#define PIC_OFFSET_TABLE_REGNUM						\
+  ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC			\
+                     || TARGET_PECOFF))					\
+   || !flag_pic ? INVALID_REGNUM					\
+   : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM	\
+			       : REAL_PIC_OFFSET_TABLE_REGNUM)		\
+   : reload_completed ? REGNO (pic_offset_table_rtx)			\
    : REAL_PIC_OFFSET_TABLE_REGNUM)
 
 #define GOT_SYMBOL_NAME "_GLOBAL_OFFSET_TABLE_"
@@ -1652,6 +1654,7 @@ typedef struct ix86_args {
   int float_in_sse;		/* Set to 1 or 2 for 32bit targets if
 				   SFmode/DFmode arguments should be passed
 				   in SSE registers.  Otherwise 0.  */
+  rtx implicit_pic_arg;         /* Implicit PIC base address arg if passed.  */
   enum calling_abi call_abi;	/* Set to SYSV_ABI for sysv abi. Otherwise
  				   MS_ABI for ms abi.  */
 } CUMULATIVE_ARGS;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8e74eab..27028ba 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2725,7 +2725,7 @@
 
 (define_insn "*pushtf"
   [(set (match_operand:TF 0 "push_operand" "=<,<")
-	(match_operand:TF 1 "general_no_elim_operand" "x,*roF"))]
+	(match_operand:TF 1 "nonimmediate_no_elim_operand" "x,*roF"))]
   "TARGET_64BIT || TARGET_SSE"
 {
   /* This insn should be already split before reg-stack.  */
@@ -2750,7 +2750,7 @@
 
 (define_insn "*pushxf"
   [(set (match_operand:XF 0 "push_operand" "=<,<")
-	(match_operand:XF 1 "general_no_elim_operand" "f,Yx*roF"))]
+	(match_operand:XF 1 "nonimmediate_no_elim_operand" "f,Yx*roF"))]
   ""
 {
   /* This insn should be already split before reg-stack.  */
@@ -2781,7 +2781,7 @@
 
 (define_insn "*pushdf"
   [(set (match_operand:DF 0 "push_operand" "=<,<,<,<")
-	(match_operand:DF 1 "general_no_elim_operand" "f,Yd*roF,rmF,x"))]
+	(match_operand:DF 1 "nonimmediate_no_elim_operand" "f,Yd*roF,rmF,x"))]
   ""
 {
   /* This insn should be already split before reg-stack.  */
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 62970be..56eca24 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -580,6 +580,12 @@
     (match_operand 0 "register_no_elim_operand")
     (match_operand 0 "general_operand")))
 
+;; Return false if this is any eliminable register.  Otherwise nonimmediate_operand.
+(define_predicate "nonimmediate_no_elim_operand"
+  (if_then_else (match_code "reg,subreg")
+    (match_operand 0 "register_no_elim_operand")
+    (match_operand 0 "nonimmediate_operand")))
+
 ;; Return false if this is any eliminable register.  Otherwise
 ;; register_operand or a constant.
 (define_predicate "nonmemory_no_elim_operand"
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 215c63c..ffb7a2d 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -537,3 +537,6 @@ DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "promote_qi_regs", 0)
    unrolling small loop less important. For, such architectures we adjust
    the unroll factor so that the unrolled loop fits the loop buffer.  */
 DEF_TUNE (X86_TUNE_ADJUST_UNROLL, "adjust_unroll_factor", m_BDVER3 | m_BDVER4)
+
+/* X86_TUNE_RELAX_PIC_REG: Do not fix hard register for GOT base usage.  */
+DEF_TUNE (X86_TUNE_RELAX_PIC_REG, "relax_pic_reg", ~0)
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 9dd8d68..33b36be 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3967,6 +3967,12 @@ If @code{TARGET_FUNCTION_INCOMING_ARG} is not defined,
 @code{TARGET_FUNCTION_ARG} serves both purposes.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_IMPLICIT_PIC_ARG (const_tree @var{fntype_or_decl})
+This hook returns register holding PIC base address for functions
+which do not fix hard register but handle it similar to function arg
+assigning a virtual reg for it.
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_ARG_PARTIAL_BYTES (cumulative_args_t @var{cum}, enum machine_mode @var{mode}, tree @var{type}, bool @var{named})
 This target hook returns the number of bytes at the beginning of an
 argument that must be put in registers.  The value must be zero for
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index dd72b98..3e6da2f 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3413,6 +3413,8 @@ the stack.
 
 @hook TARGET_FUNCTION_INCOMING_ARG
 
+@hook TARGET_IMPLICIT_PIC_ARG
+
 @hook TARGET_ARG_PARTIAL_BYTES
 
 @hook TARGET_PASS_BY_REFERENCE
diff --git a/gcc/function.c b/gcc/function.c
index 8156766..3a85c16 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -3456,6 +3456,15 @@ assign_parms (tree fndecl)
 
   fnargs.release ();
 
+  /* Handle implicit PIC arg if any.  */
+  if (targetm.calls.implicit_pic_arg (fndecl))
+    {
+      rtx old_reg = targetm.calls.implicit_pic_arg (fndecl);
+      rtx new_reg = gen_reg_rtx (GET_MODE (old_reg));
+      emit_move_insn (new_reg, old_reg);
+      pic_offset_table_rtx = new_reg;
+    }
+
   /* Output all parameter conversion instructions (possibly including calls)
      now that all parameters have been copied out of hard registers.  */
   emit_insn (all.first_conversion_insn);
diff --git a/gcc/hooks.c b/gcc/hooks.c
index 5c06562..47784e2 100644
--- a/gcc/hooks.c
+++ b/gcc/hooks.c
@@ -352,6 +352,13 @@ hook_rtx_rtx_null (rtx x ATTRIBUTE_UNUSED)
   return NULL;
 }
 
+/* Generic hook that takes a const_tree arg and returns NULL_RTX.  */
+rtx
+hook_rtx_const_tree_null (const_tree a ATTRIBUTE_UNUSED)
+{
+  return NULL;
+}
+
 /* Generic hook that takes a tree and an int and returns NULL_RTX.  */
 rtx
 hook_rtx_tree_int_null (tree a ATTRIBUTE_UNUSED, int b ATTRIBUTE_UNUSED)
diff --git a/gcc/hooks.h b/gcc/hooks.h
index ba42b6c..cf830ef 100644
--- a/gcc/hooks.h
+++ b/gcc/hooks.h
@@ -100,6 +100,7 @@ extern bool default_can_output_mi_thunk_no_vcall (const_tree, HOST_WIDE_INT,
 
 extern rtx hook_rtx_rtx_identity (rtx);
 extern rtx hook_rtx_rtx_null (rtx);
+extern rtx hook_rtx_const_tree_null (const_tree);
 extern rtx hook_rtx_tree_int_null (tree, int);
 
 extern const char *hook_constcharptr_void_null (void);
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index a43f8dc..253934b 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -4017,7 +4017,11 @@ lra_constraints (bool first_p)
       ("Maximum number of LRA constraint passes is achieved (%d)\n",
        LRA_MAX_CONSTRAINT_ITERATION_NUMBER);
   changed_p = false;
-  lra_risky_transformations_p = false;
+  if (pic_offset_table_rtx
+      && REGNO (pic_offset_table_rtx) >= FIRST_PSEUDO_REGISTER)
+    lra_risky_transformations_p = true;
+  else
+    lra_risky_transformations_p = false;
   new_insn_uid_start = get_max_uid ();
   new_regno_start = first_p ? lra_constraint_new_regno_start : max_reg_num ();
   /* Mark used hard regs for target stack size calulations.  */
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index bc16437..1cd7ea3 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -110,7 +110,8 @@ rtx_unstable_p (const_rtx x)
       /* ??? When call-clobbered, the value is stable modulo the restore
 	 that must happen after a call.  This currently screws up local-alloc
 	 into believing that the restore is not needed.  */
-      if (!PIC_OFFSET_TABLE_REG_CALL_CLOBBERED && x == pic_offset_table_rtx)
+      if (!PIC_OFFSET_TABLE_REG_CALL_CLOBBERED && x == pic_offset_table_rtx
+	  && REGNO (pic_offset_table_rtx) < FIRST_PSEUDO_REGISTER)
 	return 0;
       return 1;
 
@@ -185,7 +186,9 @@ rtx_varies_p (const_rtx x, bool for_alias)
 	     that must happen after a call.  This currently screws up
 	     local-alloc into believing that the restore is not needed, so we
 	     must return 0 only if we are called from alias analysis.  */
-	  && (!PIC_OFFSET_TABLE_REG_CALL_CLOBBERED || for_alias))
+	  && ((!PIC_OFFSET_TABLE_REG_CALL_CLOBBERED
+	       && REGNO (pic_offset_table_rtx) < FIRST_PSEUDO_REGISTER)
+	      || for_alias))
 	return 0;
       return 1;
 
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index 5c34fee..50de8d5 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -448,7 +448,7 @@ try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge,
     {
       HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge;
       struct hard_reg_set_container set_up_by_prologue;
-      rtx p_insn;
+      rtx p_insn, reg;
       vec<basic_block> vec;
       basic_block bb;
       bitmap_head bb_antic_flags;
@@ -494,9 +494,13 @@ try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge,
       if (frame_pointer_needed)
 	add_to_hard_reg_set (&set_up_by_prologue.set, Pmode,
 			     HARD_FRAME_POINTER_REGNUM);
-      if (pic_offset_table_rtx)
+      if (pic_offset_table_rtx
+	  && PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
 	add_to_hard_reg_set (&set_up_by_prologue.set, Pmode,
 			     PIC_OFFSET_TABLE_REGNUM);
+      if ((reg = targetm.calls.implicit_pic_arg (current_function_decl)))
+	add_to_hard_reg_set (&set_up_by_prologue.set,
+			     Pmode, REGNO (reg));
       if (crtl->drap_reg)
 	add_to_hard_reg_set (&set_up_by_prologue.set,
 			     GET_MODE (crtl->drap_reg),
diff --git a/gcc/target.def b/gcc/target.def
index 3a41db1..5c221b6 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3976,6 +3976,14 @@ If @code{TARGET_FUNCTION_INCOMING_ARG} is not defined,\n\
  default_function_incoming_arg)
 
 DEFHOOK
+(implicit_pic_arg,
+ "This hook returns register holding PIC base address for functions\n\
+which do not fix hard register but handle it similar to function arg\n\
+assigning a virtual reg for it.",
+ rtx, (const_tree fntype_or_decl),
+ hook_rtx_const_tree_null)
+
+DEFHOOK
 (function_arg_boundary,
  "This hook returns the alignment boundary, in bits, of an argument\n\
 with the specified mode and type.  The default hook returns\n\

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-26 21:42                   ` Ilya Enkovich
@ 2014-08-27 20:19                     ` Vladimir Makarov
  2014-08-28  8:28                       ` Ilya Enkovich
  2014-08-27 21:39                     ` Enable EBX for x86 in 32bits PIC code Jeff Law
  1 sibling, 1 reply; 49+ messages in thread
From: Vladimir Makarov @ 2014-08-27 20:19 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener,
	Uros Bizjak, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 1470 bytes --]

On 2014-08-26 5:42 PM, Ilya Enkovich wrote:
> Hi,
>
> Here is a patch I tried.  I apply it over revision 214215.  Unfortunately I do not have a small reproducer but the problem can be easily reproduced on SPEC2000 benchmark 175.vpr.  The problem is in read_arch.c:701 where float value is compared with float constant 1.0.  It is inlined into read_arch function and can be easily found in RTL dump of function read_arch as a float comparison with 1.0 after the first call to strtod function.
>
> Here is a compilation string I use:
>
> gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math -mfpmath=sse -m32  -march=slm -fPIE -pie -c -o read_arch.o       -DSPEC_CPU2000        read_arch.c
>
> In my final assembler comparison with 1.0 looks like:
>
> comiss  .LC11@GOTOFF(%ebp), %xmm0       # 1101  *cmpisf_sse     [length = 7]
>
> and %ebp here doesn't have a proper value.
>
> I'll try to make a smaller reproducer if these instructions don't help.

I've managed to reproduce it.  Although it would be better to send the 
patch as an attachment.

The problem is actually in IRA, not LRA.  IRA splits the pseudo used for
PIC.  Then, in a region where a *new* pseudo is used as the PIC register,
we rematerialize a constant that is transformed into a memory reference
addressed through the *original* PIC pseudo.

To solve the problem we should prevent such splitting and guarantee that
PIC pseudo allocnos in different regions get the same hard reg.
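
To make the failure mode concrete, here is a toy C model.  It is not GCC
code, and every name in it (pic_alloc, allocate_with_split,
allocate_without_split, remat_address_valid_p) is hypothetical.  It assumes
just two regions and records which hard register holds the GOT base in
each one; a rematerialized constant is then valid only where that register
matches the one chosen for the original pseudo.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not GCC code: every name below is hypothetical.  */
enum { NUM_REGIONS = 2 };

struct pic_alloc
{
  /* Hard register that holds the GOT base inside each region.
     Index 0 is the outer region, where the original pseudo lives.  */
  int hard_reg[NUM_REGIONS];
};

/* Allocation with splitting: the inner region (index 1) receives a fresh
   pseudo that may land in a different hard register.  */
struct pic_alloc
allocate_with_split (int outer_reg, int inner_reg)
{
  struct pic_alloc a = { { outer_reg, inner_reg } };
  return a;
}

/* Allocation without splitting: one pseudo, one hard register in every
   region.  */
struct pic_alloc
allocate_without_split (int reg)
{
  struct pic_alloc a = { { reg, reg } };
  return a;
}

/* A rematerialized constant is addressed through the original PIC pseudo,
   that is, through the register chosen for region 0.  The access is valid
   in REGION only if that register also holds the GOT base there.  */
bool
remat_address_valid_p (const struct pic_alloc *a, int region)
{
  assert (region >= 0 && region < NUM_REGIONS);
  return a->hard_reg[0] == a->hard_reg[region];
}
```

With a split allocation such as allocate_with_split (3, 6), the check fails
for the inner region, which mirrors the comiss instruction addressing
through a register that does not hold the GOT base.  Without the split the
check holds in every region.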

The following patch should solve the problem.



[-- Attachment #2: z --]
[-- Type: text/plain, Size: 1281 bytes --]

Index: ira-color.c
===================================================================
--- ira-color.c	(revision 214576)
+++ ira-color.c	(working copy)
@@ -3239,9 +3239,10 @@
 	  ira_assert (ALLOCNO_CLASS (subloop_allocno) == rclass);
 	  ira_assert (bitmap_bit_p (subloop_node->all_allocnos,
 				    ALLOCNO_NUM (subloop_allocno)));
-	  if ((flag_ira_region == IRA_REGION_MIXED)
-	      && (loop_tree_node->reg_pressure[pclass]
-		  <= ira_class_hard_regs_num[pclass]))
+	  if ((flag_ira_region == IRA_REGION_MIXED
+	       && (loop_tree_node->reg_pressure[pclass]
+		   <= ira_class_hard_regs_num[pclass]))
+	      || regno == (int) REGNO (pic_offset_table_rtx))
 	    {
 	      if (! ALLOCNO_ASSIGNED_P (subloop_allocno))
 		{
Index: ira-emit.c
===================================================================
--- ira-emit.c	(revision 214576)
+++ ira-emit.c	(working copy)
@@ -620,7 +620,8 @@
 		  /* don't create copies because reload can spill an
 		     allocno set by copy although the allocno will not
 		     get memory slot.  */
-		  || ira_equiv_no_lvalue_p (regno)))
+		  || ira_equiv_no_lvalue_p (regno)
+		  || ALLOCNO_REGNO (allocno) == REGNO (pic_offset_table_rtx)))
 	    continue;
 	  original_reg = allocno_emit_reg (allocno);
 	  if (parent_allocno == NULL

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-26 21:42                   ` Ilya Enkovich
  2014-08-27 20:19                     ` Vladimir Makarov
@ 2014-08-27 21:39                     ` Jeff Law
  2014-08-28  8:37                       ` Ilya Enkovich
  1 sibling, 1 reply; 49+ messages in thread
From: Jeff Law @ 2014-08-27 21:39 UTC (permalink / raw)
  To: Ilya Enkovich, Vladimir Makarov
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener, Uros Bizjak

On 08/26/14 15:42, Ilya Enkovich wrote:
> diff --git a/gcc/calls.c b/gcc/calls.c
> index 4285ec1..85dae6b 100644
> --- a/gcc/calls.c
> +++ b/gcc/calls.c
> @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
>       call_expr_arg_iterator iter;
>       tree arg;
>
> +    if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
> +      {
> +	gcc_assert (pic_offset_table_rtx);
> +	args[j].tree_value = make_tree (ptr_type_node,
> +					pic_offset_table_rtx);
> +	j--;
> +      }
> +
>       if (struct_value_addr_value)
>         {
>   	args[j].tree_value = struct_value_addr_value;
So why do you need this?  Can't this be handled in the call/call_value 
expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE 
from inside ix86_expand_call?  Basically I'm not seeing the need for 
another target hook here.  I think that would significantly simplify the
patch as well.


Jeff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-27 20:19                     ` Vladimir Makarov
@ 2014-08-28  8:28                       ` Ilya Enkovich
  2014-08-29  6:47                         ` Ilya Enkovich
  0 siblings, 1 reply; 49+ messages in thread
From: Ilya Enkovich @ 2014-08-28  8:28 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener,
	Uros Bizjak, Jeff Law

2014-08-28 0:19 GMT+04:00 Vladimir Makarov <vmakarov@redhat.com>:
> On 2014-08-26 5:42 PM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> Here is a patch I tried.  I apply it over revision 214215.  Unfortunately
>> I do not have a small reproducer but the problem can be easily reproduced on
>> SPEC2000 benchmark 175.vpr.  The problem is in read_arch.c:701 where float
>> value is compared with float constant 1.0.  It is inlined into read_arch
>> function and can be easily found in RTL dump of function read_arch as a
>> float comparison with 1.0 after the first call to strtod function.
>>
>> Here is a compilation string I use:
>>
>> gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math
>> -mfpmath=sse -m32  -march=slm -fPIE -pie -c -o read_arch.o
>> -DSPEC_CPU2000        read_arch.c
>>
>> In my final assembler comparison with 1.0 looks like:
>>
>> comiss  .LC11@GOTOFF(%ebp), %xmm0       # 1101  *cmpisf_sse     [length =
>> 7]
>>
>> and %ebp here doesn't have a proper value.
>>
>> I'll try to make a smaller reproducer if these instructions don't help.
>
>
> I've managed to reproduce it.  Although it would be better to send the patch
> as an attachment.
>
> The problem is actually in IRA not LRA.  IRA splits pseudo used for PIC.
> Then in a region when a *new* pseudo used as PIC we rematerialize a constant
> which transformed in memory addressed through *original* PIC pseudo.
>
> To solve the problem we should prevent such splitting and guarantee that PIC
> pseudo allocnos in different region gets the same hard reg.
>
> The following patch should solve the problem.
>

Thanks for the patch! I'll try it and be back with results.

Ilya
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-27 21:39                     ` Enable EBX for x86 in 32bits PIC code Jeff Law
@ 2014-08-28  8:37                       ` Ilya Enkovich
  2014-08-28 12:43                         ` Uros Bizjak
  2014-08-29 18:56                         ` Jeff Law
  0 siblings, 2 replies; 49+ messages in thread
From: Ilya Enkovich @ 2014-08-28  8:37 UTC (permalink / raw)
  To: Jeff Law
  Cc: Vladimir Makarov, gcc, gcc-patches, Evgeny Stupachenko,
	Richard Biener, Uros Bizjak

2014-08-28 1:39 GMT+04:00 Jeff Law <law@redhat.com>:
> On 08/26/14 15:42, Ilya Enkovich wrote:
>>
>> diff --git a/gcc/calls.c b/gcc/calls.c
>> index 4285ec1..85dae6b 100644
>> --- a/gcc/calls.c
>> +++ b/gcc/calls.c
>> @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
>> ATTRIBUTE_UNUSED,
>>       call_expr_arg_iterator iter;
>>       tree arg;
>>
>> +    if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
>> +      {
>> +       gcc_assert (pic_offset_table_rtx);
>> +       args[j].tree_value = make_tree (ptr_type_node,
>> +                                       pic_offset_table_rtx);
>> +       j--;
>> +      }
>> +
>>       if (struct_value_addr_value)
>>         {
>>         args[j].tree_value = struct_value_addr_value;
>
> So why do you need this?  Can't this be handled in the call/call_value
> expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
> inside ix86_expand_call?  Basically I'm not seeing the need for another
> target hook here.  I think that would significantly simplify the patch as
> well.

The GOT base address becomes an additional implicit arg with EBX relaxed,
and I handled it like all other args.  I can move EBX initialization into
ix86_expand_call.  We would still need some hint from the target to init
pic_offset_table_rtx with a proper value at the beginning of function
expand.

Thanks,
Ilya

>
>
> Jeff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-28  8:37                       ` Ilya Enkovich
@ 2014-08-28 12:43                         ` Uros Bizjak
  2014-08-28 12:54                           ` Ilya Enkovich
  2014-08-29 18:56                         ` Jeff Law
  1 sibling, 1 reply; 49+ messages in thread
From: Uros Bizjak @ 2014-08-28 12:43 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: Jeff Law, Vladimir Makarov, gcc, gcc-patches, Evgeny Stupachenko,
	Richard Biener

On Thu, Aug 28, 2014 at 10:37 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2014-08-28 1:39 GMT+04:00 Jeff Law <law@redhat.com>:
>> On 08/26/14 15:42, Ilya Enkovich wrote:
>>>
>>> diff --git a/gcc/calls.c b/gcc/calls.c
>>> index 4285ec1..85dae6b 100644
>>> --- a/gcc/calls.c
>>> +++ b/gcc/calls.c
>>> @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
>>> ATTRIBUTE_UNUSED,
>>>       call_expr_arg_iterator iter;
>>>       tree arg;
>>>
>>> +    if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
>>> +      {
>>> +       gcc_assert (pic_offset_table_rtx);
>>> +       args[j].tree_value = make_tree (ptr_type_node,
>>> +                                       pic_offset_table_rtx);
>>> +       j--;
>>> +      }
>>> +
>>>       if (struct_value_addr_value)
>>>         {
>>>         args[j].tree_value = struct_value_addr_value;
>>
>> So why do you need this?  Can't this be handled in the call/call_value
>> expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
>> inside ix86_expand_call?  Basically I'm not seeing the need for another
>> target hook here.  I think that would significantly simplify the patch as
>> well.
>
> GOT base address become an additional implicit arg with EBX relaxed
> and I handled it as all other args. I can move EBX initialization into
> ix86_expand_call. Would still need some hint from target to init
> pic_offset_table_rtx with proper value in the beginning of function
> expand.

Maybe you can use get_hard_reg_initial_val for this?

Uros.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-28 12:43                         ` Uros Bizjak
@ 2014-08-28 12:54                           ` Ilya Enkovich
  2014-08-28 13:08                             ` Uros Bizjak
  0 siblings, 1 reply; 49+ messages in thread
From: Ilya Enkovich @ 2014-08-28 12:54 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Jeff Law, Vladimir Makarov, gcc, gcc-patches, Evgeny Stupachenko,
	Richard Biener

2014-08-28 16:42 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
> On Thu, Aug 28, 2014 at 10:37 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> 2014-08-28 1:39 GMT+04:00 Jeff Law <law@redhat.com>:
>>> On 08/26/14 15:42, Ilya Enkovich wrote:
>>>>
>>>> diff --git a/gcc/calls.c b/gcc/calls.c
>>>> index 4285ec1..85dae6b 100644
>>>> --- a/gcc/calls.c
>>>> +++ b/gcc/calls.c
>>>> @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
>>>> ATTRIBUTE_UNUSED,
>>>>       call_expr_arg_iterator iter;
>>>>       tree arg;
>>>>
>>>> +    if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
>>>> +      {
>>>> +       gcc_assert (pic_offset_table_rtx);
>>>> +       args[j].tree_value = make_tree (ptr_type_node,
>>>> +                                       pic_offset_table_rtx);
>>>> +       j--;
>>>> +      }
>>>> +
>>>>       if (struct_value_addr_value)
>>>>         {
>>>>         args[j].tree_value = struct_value_addr_value;
>>>
>>> So why do you need this?  Can't this be handled in the call/call_value
>>> expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
>>> inside ix86_expand_call?  Basically I'm not seeing the need for another
>>> target hook here.  I think that would significantly simplify the patch as
>>> well.
>>
>> GOT base address become an additional implicit arg with EBX relaxed
>> and I handled it as all other args. I can move EBX initialization into
>> ix86_expand_call. Would still need some hint from target to init
>> pic_offset_table_rtx with proper value in the beginning of function
>> expand.
>
> Maybe you can use get_hard_reg_initial_val for this?

Actually there is no input hard reg holding the GOT address.  Currently I
initialize the pseudo from ebx, and ebx itself is then initialized in the
prolog_epilog pass.  But this is a temporary workaround.  It is
inefficient because it always uses a callee-saved reg to get the GOT
address.  I suppose we should generate a pseudo reg for
pic_offset_table_rtx and also emit set_got with this register as the
destination in the expand pass.  After register allocation set_got may be
transformed into a get_pc_thunk call with the proper hard reg.  But some
target hook has to be used for this.

Ilya

>
> Uros.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-22 12:21         ` Enable EBX for x86 in 32bits PIC code Ilya Enkovich
                             ` (2 preceding siblings ...)
  2014-08-25 17:30           ` Jeff Law
@ 2014-08-28 13:01           ` Uros Bizjak
  2014-08-28 13:13             ` Ilya Enkovich
                               ` (2 more replies)
  2014-08-28 18:58           ` Uros Bizjak
  4 siblings, 3 replies; 49+ messages in thread
From: Uros Bizjak @ 2014-08-28 13:01 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener, Jeff Law,
	Vladimir Makarov

On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> Hi,
>
> On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode.  It was decided that the best approach would be to not fix ebx register, use pseudo register for GOT base address and let allocator do the rest.  This should be similar to how clang and icc work with GOT base address.  I've been working for some time on such patch and now want to share my results.

+#define PIC_OFFSET_TABLE_REGNUM                                        \
+  ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC                       \
+                     || TARGET_PECOFF))                                \
+   || !flag_pic ? INVALID_REGNUM                                       \
+   : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM   \
+                              : REAL_PIC_OFFSET_TABLE_REGNUM)          \
+   : reload_completed ? REGNO (pic_offset_table_rtx)                   \
    : REAL_PIC_OFFSET_TABLE_REGNUM)

I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an
allocatable register.  This way, we can avoid all the mess with implicit
xchgs in atomic_compare_and_swap<dwi>_doubleword.  Also, having an
allocatable EBX would allow us to introduce a __builtin_cpuid builtin
and clean up cpuid.h.
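
For readers untangling the nested conditional in the macro quoted above,
the sketch below spells out the same decision chain as a plain C function.
It is only an illustration: struct pic_state, the SKETCH_* constants and
sketch_pic_offset_table_regnum are hypothetical names, the boolean fields
stand in for the GCC globals named in the comments, and the value 3 for
ebx is taken from the "(reg:SI 3 bx)" notation seen in RTL dumps.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC values; only the control flow is
   taken from the quoted macro.  */
#define SKETCH_INVALID_REGNUM (-1)
#define SKETCH_REAL_PIC_REGNUM 3   /* assumed: ebx, as in "(reg:SI 3 bx)" */

struct pic_state
{
  bool target_64bit;         /* TARGET_64BIT */
  bool small_pic_or_pecoff;  /* ix86_cmodel == CM_SMALL_PIC || TARGET_PECOFF */
  bool flag_pic;             /* flag_pic */
  bool relax_pic_reg;        /* X86_TUNE_RELAX_PIC_REG */
  bool have_pic_rtx;         /* pic_offset_table_rtx is non-null */
  bool reload_completed;     /* reload_completed */
  int pic_rtx_regno;         /* REGNO (pic_offset_table_rtx), if any */
};

int
sketch_pic_offset_table_regnum (const struct pic_state *s)
{
  /* The conditional operator binds more loosely than ||, so the first
     test covers both the 64-bit case and the non-PIC case.  */
  if ((s->target_64bit && s->small_pic_or_pecoff) || !s->flag_pic)
    return SKETCH_INVALID_REGNUM;

  /* Relaxed mode: no hard register is reserved once the PIC rtx exists.  */
  if (s->relax_pic_reg)
    return s->have_pic_rtx ? SKETCH_INVALID_REGNUM : SKETCH_REAL_PIC_REGNUM;

  /* Non-relaxed mode after reload: whatever register the rtx ended up in.
     The macro would dereference a null rtx here, hence the assertion.  */
  if (s->reload_completed)
    {
      assert (s->have_pic_rtx);
      return s->pic_rtx_regno;
    }

  return SKETCH_REAL_PIC_REGNUM;
}
```

Dropping the tune flag, as suggested above, would remove the middle branch
of this chain.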

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-28 12:54                           ` Ilya Enkovich
@ 2014-08-28 13:08                             ` Uros Bizjak
  2014-08-28 13:29                               ` Ilya Enkovich
  0 siblings, 1 reply; 49+ messages in thread
From: Uros Bizjak @ 2014-08-28 13:08 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: Jeff Law, Vladimir Makarov, gcc, gcc-patches, Evgeny Stupachenko,
	Richard Biener

On Thu, Aug 28, 2014 at 2:54 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:

>>>>> diff --git a/gcc/calls.c b/gcc/calls.c
>>>>> index 4285ec1..85dae6b 100644
>>>>> --- a/gcc/calls.c
>>>>> +++ b/gcc/calls.c
>>>>> @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
>>>>> ATTRIBUTE_UNUSED,
>>>>>       call_expr_arg_iterator iter;
>>>>>       tree arg;
>>>>>
>>>>> +    if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
>>>>> +      {
>>>>> +       gcc_assert (pic_offset_table_rtx);
>>>>> +       args[j].tree_value = make_tree (ptr_type_node,
>>>>> +                                       pic_offset_table_rtx);
>>>>> +       j--;
>>>>> +      }
>>>>> +
>>>>>       if (struct_value_addr_value)
>>>>>         {
>>>>>         args[j].tree_value = struct_value_addr_value;
>>>>
>>>> So why do you need this?  Can't this be handled in the call/call_value
>>>> expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
>>>> inside ix86_expand_call?  Basically I'm not seeing the need for another
>>>> target hook here.  I think that would significantly simplify the patch as
>>>> well.
>>>
>>> GOT base address become an additional implicit arg with EBX relaxed
>>> and I handled it as all other args. I can move EBX initialization into
>>> ix86_expand_call. Would still need some hint from target to init
>>> pic_offset_table_rtx with proper value in the beginning of function
>>> expand.
>>
>> Maybe you can use get_hard_reg_initial_val for this?
>
> Actually there is no input hard reg holding GOT address.  Currently I
> use initialization with ebx with following ebx initialization in
> prolog_epilog pass.  But this is a temporary workaround.  It is
> inefficient because always uses callee save reg to get GOT address.  I
> suppose we should generate pseudo reg for pic_offset_table_rtx and
> also set_got with this register as a destination in expand pass.
> After register allocation set_got may be transformed into get_pc_thunk
> call with proper hard reg.  But some target hook has to be used for
> this.

Let me expand my idea a bit.  IIRC, get_hard_reg_initial_val and
friends will automatically emit the initialization of a pseudo from the
pic_offset_table_rtx hard reg.  After reload, the real initialization of
the pic_offset_table_rtx hard reg is emitted in the pro_and_epilogue pass.
I don't know if this works with the current implementation of dynamic
pic_offset_table_rtx selection, though.

Uros.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-28 13:01           ` Uros Bizjak
@ 2014-08-28 13:13             ` Ilya Enkovich
  2014-08-28 18:30             ` Florian Weimer
  2014-08-29 18:48             ` Jeff Law
  2 siblings, 0 replies; 49+ messages in thread
From: Ilya Enkovich @ 2014-08-28 13:13 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener, Jeff Law,
	Vladimir Makarov

2014-08-28 17:01 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
> On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> Hi,
>>
>> On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode.  It was decided that the best approach would be to not fix ebx register, use pseudo register for GOT base address and let allocator do the rest.  This should be similar to how clang and icc work with GOT base address.  I've been working for some time on such patch and now want to share my results.
>
> +#define PIC_OFFSET_TABLE_REGNUM                                        \
> +  ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC                       \
> +                     || TARGET_PECOFF))                                \
> +   || !flag_pic ? INVALID_REGNUM                                       \
> +   : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM   \
> +                              : REAL_PIC_OFFSET_TABLE_REGNUM)          \
> +   : reload_completed ? REGNO (pic_offset_table_rtx)                   \
>     : REAL_PIC_OFFSET_TABLE_REGNUM)
>
> I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an
> allocatable register. This way, we can avoid all mess with implicit
> xchgs in atomic_compare_and_swap<dwi>_doubleword. Also, having
> allocatable EBX would allow us to introduce __builtin_cpuid builtin
> and cleanup cpuid.h.

We need to show good performance to have this feature enabled by
default.  Currently the patch causes a set of performance losses.  I have
a version of this patch where EBX is relaxed by a compiler flag, not a
tune flag.

Ilya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-28 13:08                             ` Uros Bizjak
@ 2014-08-28 13:29                               ` Ilya Enkovich
  2014-08-28 16:25                                 ` Uros Bizjak
  0 siblings, 1 reply; 49+ messages in thread
From: Ilya Enkovich @ 2014-08-28 13:29 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Jeff Law, Vladimir Makarov, gcc, gcc-patches, Evgeny Stupachenko,
	Richard Biener

2014-08-28 17:08 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
> On Thu, Aug 28, 2014 at 2:54 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>
>>>>>> diff --git a/gcc/calls.c b/gcc/calls.c
>>>>>> index 4285ec1..85dae6b 100644
>>>>>> --- a/gcc/calls.c
>>>>>> +++ b/gcc/calls.c
>>>>>> @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
>>>>>> ATTRIBUTE_UNUSED,
>>>>>>       call_expr_arg_iterator iter;
>>>>>>       tree arg;
>>>>>>
>>>>>> +    if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
>>>>>> +      {
>>>>>> +       gcc_assert (pic_offset_table_rtx);
>>>>>> +       args[j].tree_value = make_tree (ptr_type_node,
>>>>>> +                                       pic_offset_table_rtx);
>>>>>> +       j--;
>>>>>> +      }
>>>>>> +
>>>>>>       if (struct_value_addr_value)
>>>>>>         {
>>>>>>         args[j].tree_value = struct_value_addr_value;
>>>>>
>>>>> So why do you need this?  Can't this be handled in the call/call_value
>>>>> expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
>>>>> inside ix86_expand_call?  Basically I'm not seeing the need for another
>>>>> target hook here.  I think that would significantly simplify the patch as
>>>>> well.
>>>>
>>>> GOT base address become an additional implicit arg with EBX relaxed
>>>> and I handled it as all other args. I can move EBX initialization into
>>>> ix86_expand_call. Would still need some hint from target to init
>>>> pic_offset_table_rtx with proper value in the beginning of function
>>>> expand.
>>>
>>> Maybe you can use get_hard_reg_initial_val for this?
>>
>> Actually there is no input hard reg holding GOT address.  Currently I
>> use initialization with ebx with following ebx initialization in
>> prolog_epilog pass.  But this is a temporary workaround.  It is
>> inefficient because always uses callee save reg to get GOT address.  I
>> suppose we should generate pseudo reg for pic_offset_table_rtx and
>> also set_got with this register as a destination in expand pass.
>> After register allocation set_got may be transformed into get_pc_thunk
>> call with proper hard reg.  But some target hook has to be used for
>> this.
>
> Let me expand my idea a bit. IIRC, get_hard_reg_initial_val and
> friends will automatically emit initialization of a pseudo from
> pic_offset_table_rtx hard reg. After reload, real initialization of
> pic_offset_table_rtx hard reg is emitted in pro_and_epilogue pass. I
> don't know if this works with current implementation of dynamic
> pic_offset_table_rtx selection, though.

That means you have to choose, early and before register allocation, some
hard reg to be used for PIC reg initialization.  I do not like that we
have to do this; I want to just generate set_got with a pseudo reg and
not involve any additional hard reg.  That would look like

(insn/f 168 167 169 2 (parallel [
            (set (reg:SI 127)
                (unspec:SI [
                        (const_int 0 [0])
                    ] UNSPEC_SET_GOT))
            (clobber (reg:CC 17 flags))
        ]) test.cc:42 -1
     (expr_list:REG_CFA_FLUSH_QUEUE (nil)
        (nil)))

after the expand pass.  r127 is pic_offset_table_rtx here.  After
reload it would become:

(insn/f 168 167 169 2 (parallel [
            (set (reg:SI 3 bx)
                (unspec:SI [
                        (const_int 0 [0])
                    ] UNSPEC_SET_GOT))
            (clobber (reg:CC 17 flags))
        ]) test.cc:42 -1
     (expr_list:REG_CFA_FLUSH_QUEUE (nil)
        (nil)))

No additional actions are required in pro_and_epilogue.  It also
simplifies the analysis of whether we should generate set_got at all.
Currently we check whether the hard reg is ever live, which is wrong when
ebx is not fixed, because a use of the hard reg that initializes the GOT
does not necessarily mean a GOT use.  With my proposed scheme, an unused
GOT simply means that DCE removes the useless set_got.
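
The DCE argument can be illustrated with a small C sketch.  This is a toy
model rather than GCC's RTL or its DCE pass: struct toy_insn, pseudo_used_p
and toy_dce are hypothetical names, the analysis is flow-insensitive, and
set_got is treated as free of side effects (its flags clobber is ignored
for simplicity).  The point is only that a definition of the PIC pseudo
with no remaining use is deleted like any other dead definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction: defines at most one pseudo and uses up to two others.
   Hypothetical model, not GCC's RTL.  */
struct toy_insn
{
  int def;          /* pseudo defined, or -1 for none */
  int use[2];       /* pseudos used, -1 for an empty slot */
  bool side_effect; /* true for stores, calls, returns: never deleted */
  bool deleted;
};

/* Return true if any live instruction still uses pseudo REGNO.  */
static bool
pseudo_used_p (const struct toy_insn *insns, size_t n, int regno)
{
  for (size_t i = 0; i < n; i++)
    if (!insns[i].deleted
        && (insns[i].use[0] == regno || insns[i].use[1] == regno))
      return true;
  return false;
}

/* Delete side-effect-free definitions whose result has no remaining use,
   iterating until nothing changes.  Return the number of deletions.  */
size_t
toy_dce (struct toy_insn *insns, size_t n)
{
  size_t removed = 0;
  bool changed = true;

  while (changed)
    {
      changed = false;
      for (size_t i = 0; i < n; i++)
        if (!insns[i].deleted
            && !insns[i].side_effect
            && insns[i].def >= 0
            && !pseudo_used_p (insns, n, insns[i].def))
          {
            insns[i].deleted = true;
            removed++;
            changed = true;
          }
    }
  return removed;
}
```

If instruction 0 plays the role of set_got defining pseudo 127 and nothing
uses 127, toy_dce marks it deleted; as soon as one address computation
uses 127, it stays.  No separate "is the hard reg ever live" check is
needed in this model.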

Ilya

>
> Uros.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-28 13:29                               ` Ilya Enkovich
@ 2014-08-28 16:25                                 ` Uros Bizjak
  0 siblings, 0 replies; 49+ messages in thread
From: Uros Bizjak @ 2014-08-28 16:25 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: Jeff Law, Vladimir Makarov, gcc, gcc-patches, Evgeny Stupachenko,
	Richard Biener

On Thu, Aug 28, 2014 at 3:29 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:

>>>>>>> diff --git a/gcc/calls.c b/gcc/calls.c
>>>>>>> index 4285ec1..85dae6b 100644
>>>>>>> --- a/gcc/calls.c
>>>>>>> +++ b/gcc/calls.c
>>>>>>> @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
>>>>>>> ATTRIBUTE_UNUSED,
>>>>>>>       call_expr_arg_iterator iter;
>>>>>>>       tree arg;
>>>>>>>
>>>>>>> +    if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
>>>>>>> +      {
>>>>>>> +       gcc_assert (pic_offset_table_rtx);
>>>>>>> +       args[j].tree_value = make_tree (ptr_type_node,
>>>>>>> +                                       pic_offset_table_rtx);
>>>>>>> +       j--;
>>>>>>> +      }
>>>>>>> +
>>>>>>>       if (struct_value_addr_value)
>>>>>>>         {
>>>>>>>         args[j].tree_value = struct_value_addr_value;
>>>>>>
>>>>>> So why do you need this?  Can't this be handled in the call/call_value
>>>>>> expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
>>>>>> inside ix86_expand_call?  Basically I'm not seeing the need for another
>>>>>> target hook here.  I think that would significantly simplify the patch as
>>>>>> well.
>>>>>
>>>>> GOT base address become an additional implicit arg with EBX relaxed
>>>>> and I handled it as all other args. I can move EBX initialization into
>>>>> ix86_expand_call. Would still need some hint from target to init
>>>>> pic_offset_table_rtx with proper value in the beginning of function
>>>>> expand.
>>>>
>>>> Maybe you can use get_hard_reg_initial_val for this?
>>>
>>> Actually there is no input hard reg holding GOT address.  Currently I
>>> use initialization with ebx with following ebx initialization in
>>> prolog_epilog pass.  But this is a temporary workaround.  It is
>>> inefficient because always uses callee save reg to get GOT address.  I
>>> suppose we should generate pseudo reg for pic_offset_table_rtx and
>>> also set_got with this register as a destination in expand pass.
>>> After register allocation set_got may be transformed into get_pc_thunk
>>> call with proper hard reg.  But some target hook has to be used for
>>> this.
>>
>> Let me expand my idea a bit. IIRC, get_hard_reg_initial_val and
>> friends will automatically emit initialization of a pseudo from
>> pic_offset_table_rtx hard reg. After reload, real initialization of
>> pic_offset_table_rtx hard reg is emitted in pro_and_epilogue pass. I
>> don't know if this works with current implementation of dynamic
>> pic_offset_table_rtx selection, though.
>
> That means you should choose some hard reg early before register
> allocation to be used for PIC reg initialization.  I do not like we
> have to do this and want to just generate set_got with pseudo reg and
> do not involve any additional hard reg. That would look like
>
> (insn/f 168 167 169 2 (parallel [
>             (set (reg:SI 127)
>                 (unspec:SI [
>                         (const_int 0 [0])
>                     ] UNSPEC_SET_GOT))
>             (clobber (reg:CC 17 flags))
>         ]) test.cc:42 -1
>      (expr_list:REG_CFA_FLUSH_QUEUE (nil)
>         (nil)))
>
> after expand pass.  r127 is pic_offset_table_rtx here. And after
> reload it would become:
>
> (insn/f 168 167 169 2 (parallel [
>             (set (reg:SI 3 bx)
>                 (unspec:SI [
>                         (const_int 0 [0])
>                     ] UNSPEC_SET_GOT))
>             (clobber (reg:CC 17 flags))
>         ]) test.cc:42 -1
>      (expr_list:REG_CFA_FLUSH_QUEUE (nil)
>         (nil)))
>
> And no additional actions are required on pro_and_epilogue.  Also it
> simplifies analysis whether we should generate set_got at all.
> Current we check hard reg is ever live which is wrong with not fixed
> ebx because any usage of hard reg used to init GOT doesn't mean GOT
> usage.  And with my proposed scheme unused GOT would mean DCE just
> removes useless set_got.

Yes, this is better.  I was under the impression that you wanted to
retain the current initialization insertion in expand_prologue.

Uros.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-28 13:01           ` Uros Bizjak
  2014-08-28 13:13             ` Ilya Enkovich
@ 2014-08-28 18:30             ` Florian Weimer
  2014-08-29 18:48             ` Jeff Law
  2 siblings, 0 replies; 49+ messages in thread
From: Florian Weimer @ 2014-08-28 18:30 UTC (permalink / raw)
  To: Uros Bizjak, Ilya Enkovich
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener, Jeff Law,
	Vladimir Makarov

On 08/28/2014 03:01 PM, Uros Bizjak wrote:
> I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an
> allocatable register. This way, we can avoid all mess with implicit
> xchgs in atomic_compare_and_swap<dwi>_doubleword. Also, having
> allocatable EBX would allow us to introduce __builtin_cpuid builtin
> and cleanup cpuid.h.

It also makes writing solid inline assembly which has to use %ebx for 
some reason much easier.  We just fixed a glibc bug related to that.

-- 
Florian Weimer / Red Hat Product Security

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-22 12:21         ` Enable EBX for x86 in 32bits PIC code Ilya Enkovich
                             ` (3 preceding siblings ...)
  2014-08-28 13:01           ` Uros Bizjak
@ 2014-08-28 18:58           ` Uros Bizjak
  2014-08-29  6:51             ` Ilya Enkovich
  2014-08-29 18:45             ` Jeff Law
  4 siblings, 2 replies; 49+ messages in thread
From: Uros Bizjak @ 2014-08-28 18:58 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener, Jeff Law,
	Vladimir Makarov

On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:

> At Cauldron 2014 we had a couple of talks about relaxing ebx usage in 32-bit PIC mode.  It was decided that the best approach would be to not fix the ebx register, use a pseudo register for the GOT base address and let the allocator do the rest.  This should be similar to how clang and icc handle the GOT base address.  I've been working on such a patch for some time and now want to share my results.

>  (define_insn "*pushtf"
>    [(set (match_operand:TF 0 "push_operand" "=<,<")
> -       (match_operand:TF 1 "general_no_elim_operand" "x,*roF"))]
> +       (match_operand:TF 1 "nonimmediate_no_elim_operand" "x,*roF"))]

Can you please explain the reason for this change (and a couple of
similar changes to push patterns)?

Uros.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-28  8:28                       ` Ilya Enkovich
@ 2014-08-29  6:47                         ` Ilya Enkovich
  2014-09-02 14:29                           ` Vladimir Makarov
  2014-09-03 20:19                           ` Vladimir Makarov
  0 siblings, 2 replies; 49+ messages in thread
From: Ilya Enkovich @ 2014-08-29  6:47 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener,
	Uros Bizjak, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 3744 bytes --]

2014-08-28 12:28 GMT+04:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
> 2014-08-28 0:19 GMT+04:00 Vladimir Makarov <vmakarov@redhat.com>:
>> On 2014-08-26 5:42 PM, Ilya Enkovich wrote:
>>>
>>> Hi,
>>>
>>> Here is the patch I tried.  I applied it on top of revision 214215.
>>> Unfortunately I do not have a small reproducer, but the problem can be
>>> easily reproduced on SPEC2000 benchmark 175.vpr.  The problem is in
>>> read_arch.c:701, where a float value is compared with the float constant
>>> 1.0.  That code is inlined into the read_arch function and can be easily
>>> found in the RTL dump of read_arch as a float comparison with 1.0 after
>>> the first call to strtod.
>>>
>>> Here is a compilation string I use:
>>>
>>> gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math
>>> -mfpmath=sse -m32  -march=slm -fPIE -pie -c -o read_arch.o
>>> -DSPEC_CPU2000        read_arch.c
>>>
>>> In my final assembly the comparison with 1.0 looks like:
>>>
>>> comiss  .LC11@GOTOFF(%ebp), %xmm0       # 1101  *cmpisf_sse     [length = 7]
>>>
>>> and %ebp does not hold a proper value here.
>>>
>>> I'll try to make a smaller reproducer if these instructions don't help.
>>
>>
>> I've managed to reproduce it.  Although it would be better to send the patch
>> as an attachment.
>>
>> The problem is actually in IRA, not LRA.  IRA splits the pseudo used for PIC.
>> Then, in a region where a *new* pseudo is used as the PIC register, we
>> rematerialize a constant, which is transformed into a memory reference
>> addressed through the *original* PIC pseudo.
>>
>> To solve the problem we should prevent such splitting and guarantee that the
>> PIC pseudo's allocnos in different regions get the same hard reg.
>>
>> The following patch should solve the problem.
>>
>
> Thanks for the patch! I'll try it and be back with results.

It seems your patch doesn't cover all cases.  Attached is a modified
patch (with your changes included) and a test where a double constant is
wrongly rematerialized.  I also see in the IRA dump that a copy of the
PIC reg is still created:

Initialization of original PIC reg:
(insn 23 22 24 2 (set (reg:SI 127)
        (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 3 bx)
        (nil)))
...
Copy is created:
(insn 135 37 25 3 (set (reg:SI 138 [127])
        (reg:SI 127)) 90 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 127)
        (nil)))
...
Copy is used:
(insn 119 25 122 3 (set (reg:DF 134)
        (mem/u/c:DF (plus:SI (reg:SI 138 [127])
                (const:SI (unspec:SI [
                            (symbol_ref/u:SI ("*.LC0") [flags 0x2])
                        ] UNSPEC_GOTOFF))) [5  S8 A64])) 128 {*movdf_internal}
     (expr_list:REG_EQUIV (const_double:DF 2.9999999999999997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11])
        (nil)))

After reload we have a new use of r127, which is allocated to ecx, a
register that has no definition anywhere in this function.

(insn 151 42 44 4 (set (reg:SI 0 ax [147])
        (plus:SI (reg:SI 2 cx [127])
            (const:SI (unspec:SI [
                        (symbol_ref/u:SI ("*.LC0") [flags 0x2])
                    ] UNSPEC_GOTOFF)))) test.cc:44 213 {*leasi}
     (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC0") [flags 0x2])
        (nil)))
(insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129])
        (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
            (mem/u/c:DF (reg:SI 0 ax [147]) [5  S8 A64]))) test.cc:44 790 {*fop_df_comm_sse}
     (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
            (const_double:DF 2.9999999999999997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11]))
        (nil)))

Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc
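
One way to see the failure mechanically is to walk the post-reload insns
and flag any hard register that is read before it has been written.
Below is a toy sketch of such a check, not GCC code: the insn list is a
hand-simplified transcription of insns 151 and 44 from the dump, it
handles straight-line code only, and the live-on-entry set (bx as the
GOT base, xmm0 as computed earlier) is an assumption.  All names are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy register set.  ax/dx/cx/bx follow the numbers seen in the dump;
   the XMM0 value is arbitrary.  */
enum hard_reg { NO_REG = -1, AX = 0, DX = 1, CX = 2, BX = 3, XMM0 = 4,
                NUM_REGS = 5 };

struct toy_insn
{
  int uid;                 /* insn uid as printed in the dump */
  enum hard_reg uses[2];   /* registers read, NO_REG if unused */
  enum hard_reg def;       /* register written, NO_REG if none */
};

/* Simplified view of the two post-reload insns shown above.  */
static const struct toy_insn post_reload[] = {
  { 151, { CX, NO_REG }, AX },     /* lea: ax = cx + .LC0@GOTOFF */
  { 44,  { XMM0, AX },   XMM0 },   /* mulsd: xmm0 *= mem[ax] */
};

/* Return the uid of the first insn that reads a register with no
   earlier write, storing that register in *BAD, or -1 if there is
   none.  LIVE_IN marks registers assumed to be defined on entry.  */
static int
first_undefined_use (const struct toy_insn *insns, size_t n,
                     const bool live_in[NUM_REGS], enum hard_reg *bad)
{
  bool defined[NUM_REGS];
  for (int r = 0; r < NUM_REGS; r++)
    defined[r] = live_in[r];

  for (size_t i = 0; i < n; i++)
    {
      for (int u = 0; u < 2; u++)
        {
          enum hard_reg r = insns[i].uses[u];
          if (r != NO_REG && !defined[r])
            {
              *bad = r;
              return insns[i].uid;
            }
        }
      if (insns[i].def != NO_REG)
        defined[insns[i].def] = true;
    }
  return -1;
}
```

Under those assumptions the walk stops at insn 151 with cx as the
offending register, which matches what the dump shows.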

Thanks,
Ilya

>
> Ilya
>>

[-- Attachment #2: pie-2014-08-28.patch --]
[-- Type: application/octet-stream, Size: 21514 bytes --]

diff --git a/gcc/calls.c b/gcc/calls.c
index 4285ec1..85dae6b 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
     call_expr_arg_iterator iter;
     tree arg;
 
+    if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
+      {
+	gcc_assert (pic_offset_table_rtx);
+	args[j].tree_value = make_tree (ptr_type_node,
+					pic_offset_table_rtx);
+	j--;
+      }
+
     if (struct_value_addr_value)
       {
 	args[j].tree_value = struct_value_addr_value;
@@ -2520,6 +2528,10 @@ expand_call (tree exp, rtx target, int ignore)
     /* Treat all args as named.  */
     n_named_args = num_actuals;
 
+  /* Add implicit PIC arg.  */
+  if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
+    num_actuals++;
+
   /* Make a vector to hold all the information about each arg.  */
   args = XALLOCAVEC (struct arg_data, num_actuals);
   memset (args, 0, num_actuals * sizeof (struct arg_data));
@@ -3133,6 +3145,8 @@ expand_call (tree exp, rtx target, int ignore)
 	{
 	  int arg_nr = return_flags & ERF_RETURN_ARG_MASK;
 	  arg_nr = num_actuals - arg_nr - 1;
+	  if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
+	    arg_nr--;
 	  if (arg_nr >= 0
 	      && arg_nr < num_actuals
 	      && args[arg_nr].reg
@@ -3700,8 +3714,8 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
      of the full argument passing conventions to limit complexity here since
      library functions shouldn't have many args.  */
 
-  argvec = XALLOCAVEC (struct arg, nargs + 1);
-  memset (argvec, 0, (nargs + 1) * sizeof (struct arg));
+  argvec = XALLOCAVEC (struct arg, nargs + 2);
+  memset (argvec, 0, (nargs + 2) * sizeof (struct arg));
 
 #ifdef INIT_CUMULATIVE_LIBCALL_ARGS
   INIT_CUMULATIVE_LIBCALL_ARGS (args_so_far_v, outmode, fun);
@@ -3717,6 +3731,23 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
 
   push_temp_slots ();
 
+  if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
+    {
+      gcc_assert (pic_offset_table_rtx);
+
+      argvec[count].value = pic_offset_table_rtx;
+      argvec[count].mode = Pmode;
+      argvec[count].partial = 0;
+
+      argvec[count].reg = targetm.calls.function_arg (args_so_far,
+						      Pmode, NULL_TREE, true);
+
+      targetm.calls.function_arg_advance (args_so_far, Pmode, NULL_TREE, true);
+
+      count++;
+      nargs++;
+    }
+
   /* If there's a structure value address to be passed,
      either pass it in the special place, or pass it as an extra argument.  */
   if (mem_value && struct_value == 0 && ! pcc_struct_value)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cc4b0c7..cfafcdd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6133,6 +6133,21 @@ ix86_maybe_switch_abi (void)
     reinit_regs ();
 }
 
+/* Return reg in which implicit PIC base address
+   arg is passed.  */
+static rtx
+ix86_implicit_pic_arg (const_tree fntype_or_decl ATTRIBUTE_UNUSED)
+{
+  if ((TARGET_64BIT
+       && (ix86_cmodel == CM_SMALL_PIC
+	   || TARGET_PECOFF))
+      || !flag_pic
+      || !X86_TUNE_RELAX_PIC_REG)
+    return NULL_RTX;
+
+  return gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM);
+}
+
 /* Initialize a variable CUM of type CUMULATIVE_ARGS
    for a call to a function whose data type is FNTYPE.
    For a library call, FNTYPE is 0.  */
@@ -6198,6 +6213,11 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument info to initialize */
 		      ? (!prototype_p (fntype) || stdarg_p (fntype))
 		      : !libname);
 
+  if (caller)
+    cum->implicit_pic_arg = ix86_implicit_pic_arg (fndecl ? fndecl : fntype);
+  else
+    cum->implicit_pic_arg = NULL_RTX;
+
   if (!TARGET_64BIT)
     {
       /* If there are variable arguments, then we won't pass anything
@@ -7291,7 +7311,9 @@ ix86_function_arg_advance (cumulative_args_t cum_v, enum machine_mode mode,
   if (type)
     mode = type_natural_mode (type, NULL, false);
 
-  if (TARGET_64BIT && (cum ? cum->call_abi : ix86_abi) == MS_ABI)
+  if (cum->implicit_pic_arg)
+    cum->implicit_pic_arg = NULL_RTX;
+  else if (TARGET_64BIT && (cum ? cum->call_abi : ix86_abi) == MS_ABI)
     function_arg_advance_ms_64 (cum, bytes, words);
   else if (TARGET_64BIT)
     function_arg_advance_64 (cum, mode, type, words, named);
@@ -7542,7 +7564,9 @@ ix86_function_arg (cumulative_args_t cum_v, enum machine_mode omode,
   if (type && TREE_CODE (type) == VECTOR_TYPE)
     mode = type_natural_mode (type, cum, false);
 
-  if (TARGET_64BIT && (cum ? cum->call_abi : ix86_abi) == MS_ABI)
+  if (cum->implicit_pic_arg)
+    arg = cum->implicit_pic_arg;
+  else if (TARGET_64BIT && (cum ? cum->call_abi : ix86_abi) == MS_ABI)
     arg = function_arg_ms_64 (cum, mode, omode, named, bytes);
   else if (TARGET_64BIT)
     arg = function_arg_64 (cum, mode, omode, type, named);
@@ -9373,6 +9397,9 @@ gen_pop (rtx arg)
 static unsigned int
 ix86_select_alt_pic_regnum (void)
 {
+  if (ix86_implicit_pic_arg (NULL))
+    return INVALID_REGNUM;
+
   if (crtl->is_leaf
       && !crtl->profile
       && !ix86_current_function_calls_tls_descriptor)
@@ -11236,7 +11263,8 @@ ix86_expand_prologue (void)
 	}
       else
 	{
-          insn = emit_insn (gen_set_got (pic_offset_table_rtx));
+	  rtx reg = gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM);
+          insn = emit_insn (gen_set_got (reg));
 	  RTX_FRAME_RELATED_P (insn) = 1;
 	  add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
 	}
@@ -11789,7 +11817,8 @@ ix86_expand_epilogue (int style)
 static void
 ix86_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED, HOST_WIDE_INT)
 {
-  if (pic_offset_table_rtx)
+  if (pic_offset_table_rtx
+      && REGNO (pic_offset_table_rtx) < FIRST_PSEUDO_REGISTER)
     SET_REGNO (pic_offset_table_rtx, REAL_PIC_OFFSET_TABLE_REGNUM);
 #if TARGET_MACHO
   /* Mach-O doesn't support labels at the end of objects, so if
@@ -13107,6 +13136,15 @@ ix86_GOT_alias_set (void)
   return set;
 }
 
+/* Set regs_ever_live for PIC base address register
+   to true if required.  */
+static void
+set_pic_reg_ever_alive ()
+{
+  if (reload_in_progress)
+    df_set_regs_ever_live (REGNO (pic_offset_table_rtx), true);
+}
+
 /* Return a legitimate reference for ORIG (an address) using the
    register REG.  If REG is 0, a new pseudo is generated.
 
@@ -13157,8 +13195,7 @@ legitimize_pic_address (rtx orig, rtx reg)
       /* This symbol may be referenced via a displacement from the PIC
 	 base address (@GOTOFF).  */
 
-      if (reload_in_progress)
-	df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+      set_pic_reg_ever_alive ();
       if (GET_CODE (addr) == CONST)
 	addr = XEXP (addr, 0);
       if (GET_CODE (addr) == PLUS)
@@ -13190,8 +13227,7 @@ legitimize_pic_address (rtx orig, rtx reg)
       /* This symbol may be referenced via a displacement from the PIC
 	 base address (@GOTOFF).  */
 
-      if (reload_in_progress)
-	df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+      set_pic_reg_ever_alive ();
       if (GET_CODE (addr) == CONST)
 	addr = XEXP (addr, 0);
       if (GET_CODE (addr) == PLUS)
@@ -13252,8 +13288,7 @@ legitimize_pic_address (rtx orig, rtx reg)
 	  /* This symbol must be referenced via a load from the
 	     Global Offset Table (@GOT).  */
 
-	  if (reload_in_progress)
-	    df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+	  set_pic_reg_ever_alive ();
 	  new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), UNSPEC_GOT);
 	  new_rtx = gen_rtx_CONST (Pmode, new_rtx);
 	  if (TARGET_64BIT)
@@ -13305,8 +13340,7 @@ legitimize_pic_address (rtx orig, rtx reg)
 	    {
 	      if (!TARGET_64BIT)
 		{
-		  if (reload_in_progress)
-		    df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+		  set_pic_reg_ever_alive ();
 		  new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op0),
 					    UNSPEC_GOTOFF);
 		  new_rtx = gen_rtx_PLUS (Pmode, new_rtx, op1);
@@ -13601,8 +13635,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	}
       else if (flag_pic)
 	{
-	  if (reload_in_progress)
-	    df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+	  set_pic_reg_ever_alive ();
 	  pic = pic_offset_table_rtx;
 	  type = TARGET_ANY_GNU_TLS ? UNSPEC_GOTNTPOFF : UNSPEC_GOTTPOFF;
 	}
@@ -14233,6 +14266,8 @@ ix86_pic_register_p (rtx x)
   if (GET_CODE (x) == VALUE && CSELIB_VAL_PTR (x))
     return (pic_offset_table_rtx
 	    && rtx_equal_for_cselib_p (x, pic_offset_table_rtx));
+  else if (pic_offset_table_rtx)
+    return REG_P (x) && REGNO (x) == REGNO (pic_offset_table_rtx);
   else
     return REG_P (x) && REGNO (x) == PIC_OFFSET_TABLE_REGNUM;
 }
@@ -14408,7 +14443,9 @@ ix86_delegitimize_address (rtx x)
 	 ...
 	 movl foo@GOTOFF(%ecx), %edx
 	 in which case we return (%ecx - %ebx) + foo.  */
-      if (pic_offset_table_rtx)
+      if (pic_offset_table_rtx
+	  && (!reload_completed
+	      || REGNO (pic_offset_table_rtx) < FIRST_PSEUDO_REGISTER))
         result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend),
 						     pic_offset_table_rtx),
 			       result);
@@ -24915,7 +24952,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
 		  && DEFAULT_ABI != MS_ABI))
 	  && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
 	  && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
-	use_reg (&use, pic_offset_table_rtx);
+	use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
     }
 
   if (TARGET_64BIT && INTVAL (callarg2) >= 0)
@@ -47228,6 +47265,8 @@ ix86_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 #define TARGET_FUNCTION_ARG_ADVANCE ix86_function_arg_advance
 #undef TARGET_FUNCTION_ARG
 #define TARGET_FUNCTION_ARG ix86_function_arg
+#undef TARGET_IMPLICIT_PIC_ARG
+#define TARGET_IMPLICIT_PIC_ARG ix86_implicit_pic_arg
 #undef TARGET_FUNCTION_ARG_BOUNDARY
 #define TARGET_FUNCTION_ARG_BOUNDARY ix86_function_arg_boundary
 #undef TARGET_PASS_BY_REFERENCE
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 2c64162..d5fa250 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1243,11 +1243,13 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define REAL_PIC_OFFSET_TABLE_REGNUM  BX_REG
 
-#define PIC_OFFSET_TABLE_REGNUM				\
-  ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC	\
-                     || TARGET_PECOFF))		\
-   || !flag_pic ? INVALID_REGNUM			\
-   : reload_completed ? REGNO (pic_offset_table_rtx)	\
+#define PIC_OFFSET_TABLE_REGNUM						\
+  ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC			\
+                     || TARGET_PECOFF))					\
+   || !flag_pic ? INVALID_REGNUM					\
+   : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM	\
+			       : REAL_PIC_OFFSET_TABLE_REGNUM)		\
+   : reload_completed ? REGNO (pic_offset_table_rtx)			\
    : REAL_PIC_OFFSET_TABLE_REGNUM)
 
 #define GOT_SYMBOL_NAME "_GLOBAL_OFFSET_TABLE_"
@@ -1652,6 +1654,7 @@ typedef struct ix86_args {
   int float_in_sse;		/* Set to 1 or 2 for 32bit targets if
 				   SFmode/DFmode arguments should be passed
 				   in SSE registers.  Otherwise 0.  */
+  rtx implicit_pic_arg;         /* Implicit PIC base address arg if passed.  */
   enum calling_abi call_abi;	/* Set to SYSV_ABI for sysv abi. Otherwise
  				   MS_ABI for ms abi.  */
 } CUMULATIVE_ARGS;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8e74eab..27028ba 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2725,7 +2725,7 @@
 
 (define_insn "*pushtf"
   [(set (match_operand:TF 0 "push_operand" "=<,<")
-	(match_operand:TF 1 "general_no_elim_operand" "x,*roF"))]
+	(match_operand:TF 1 "nonimmediate_no_elim_operand" "x,*roF"))]
   "TARGET_64BIT || TARGET_SSE"
 {
   /* This insn should be already split before reg-stack.  */
@@ -2750,7 +2750,7 @@
 
 (define_insn "*pushxf"
   [(set (match_operand:XF 0 "push_operand" "=<,<")
-	(match_operand:XF 1 "general_no_elim_operand" "f,Yx*roF"))]
+	(match_operand:XF 1 "nonimmediate_no_elim_operand" "f,Yx*roF"))]
   ""
 {
   /* This insn should be already split before reg-stack.  */
@@ -2781,7 +2781,7 @@
 
 (define_insn "*pushdf"
   [(set (match_operand:DF 0 "push_operand" "=<,<,<,<")
-	(match_operand:DF 1 "general_no_elim_operand" "f,Yd*roF,rmF,x"))]
+	(match_operand:DF 1 "nonimmediate_no_elim_operand" "f,Yd*roF,rmF,x"))]
   ""
 {
   /* This insn should be already split before reg-stack.  */
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 62970be..56eca24 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -580,6 +580,12 @@
     (match_operand 0 "register_no_elim_operand")
     (match_operand 0 "general_operand")))
 
+;; Return false if this is any eliminable register.  Otherwise nonimmediate_operand.
+(define_predicate "nonimmediate_no_elim_operand"
+  (if_then_else (match_code "reg,subreg")
+    (match_operand 0 "register_no_elim_operand")
+    (match_operand 0 "nonimmediate_operand")))
+
 ;; Return false if this is any eliminable register.  Otherwise
 ;; register_operand or a constant.
 (define_predicate "nonmemory_no_elim_operand"
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 215c63c..ffb7a2d 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -537,3 +537,6 @@ DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "promote_qi_regs", 0)
    unrolling small loop less important. For, such architectures we adjust
    the unroll factor so that the unrolled loop fits the loop buffer.  */
 DEF_TUNE (X86_TUNE_ADJUST_UNROLL, "adjust_unroll_factor", m_BDVER3 | m_BDVER4)
+
+/* X86_TUNE_RELAX_PIC_REG: Do not fix hard register for GOT base usage.  */
+DEF_TUNE (X86_TUNE_RELAX_PIC_REG, "relax_pic_reg", ~0)
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 9dd8d68..33b36be 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3967,6 +3967,12 @@ If @code{TARGET_FUNCTION_INCOMING_ARG} is not defined,
 @code{TARGET_FUNCTION_ARG} serves both purposes.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_IMPLICIT_PIC_ARG (const_tree @var{fntype_or_decl})
+This hook returns register holding PIC base address for functions
+which do not fix hard register but handle it similar to function arg
+assigning a virtual reg for it.
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_ARG_PARTIAL_BYTES (cumulative_args_t @var{cum}, enum machine_mode @var{mode}, tree @var{type}, bool @var{named})
 This target hook returns the number of bytes at the beginning of an
 argument that must be put in registers.  The value must be zero for
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index dd72b98..3e6da2f 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3413,6 +3413,8 @@ the stack.
 
 @hook TARGET_FUNCTION_INCOMING_ARG
 
+@hook TARGET_IMPLICIT_PIC_ARG
+
 @hook TARGET_ARG_PARTIAL_BYTES
 
 @hook TARGET_PASS_BY_REFERENCE
diff --git a/gcc/function.c b/gcc/function.c
index 8156766..3a85c16 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -3456,6 +3456,15 @@ assign_parms (tree fndecl)
 
   fnargs.release ();
 
+  /* Handle implicit PIC arg if any.  */
+  if (targetm.calls.implicit_pic_arg (fndecl))
+    {
+      rtx old_reg = targetm.calls.implicit_pic_arg (fndecl);
+      rtx new_reg = gen_reg_rtx (GET_MODE (old_reg));
+      emit_move_insn (new_reg, old_reg);
+      pic_offset_table_rtx = new_reg;
+    }
+
   /* Output all parameter conversion instructions (possibly including calls)
      now that all parameters have been copied out of hard registers.  */
   emit_insn (all.first_conversion_insn);
diff --git a/gcc/hooks.c b/gcc/hooks.c
index 5c06562..47784e2 100644
--- a/gcc/hooks.c
+++ b/gcc/hooks.c
@@ -352,6 +352,13 @@ hook_rtx_rtx_null (rtx x ATTRIBUTE_UNUSED)
   return NULL;
 }
 
+/* Generic hook that takes a const_tree arg and returns NULL_RTX.  */
+rtx
+hook_rtx_const_tree_null (const_tree a ATTRIBUTE_UNUSED)
+{
+  return NULL;
+}
+
 /* Generic hook that takes a tree and an int and returns NULL_RTX.  */
 rtx
 hook_rtx_tree_int_null (tree a ATTRIBUTE_UNUSED, int b ATTRIBUTE_UNUSED)
diff --git a/gcc/hooks.h b/gcc/hooks.h
index ba42b6c..cf830ef 100644
--- a/gcc/hooks.h
+++ b/gcc/hooks.h
@@ -100,6 +100,7 @@ extern bool default_can_output_mi_thunk_no_vcall (const_tree, HOST_WIDE_INT,
 
 extern rtx hook_rtx_rtx_identity (rtx);
 extern rtx hook_rtx_rtx_null (rtx);
+extern rtx hook_rtx_const_tree_null (const_tree);
 extern rtx hook_rtx_tree_int_null (tree, int);
 
 extern const char *hook_constcharptr_void_null (void);
diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 36c3c87..493670c 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -3239,9 +3239,11 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
 	  ira_assert (ALLOCNO_CLASS (subloop_allocno) == rclass);
 	  ira_assert (bitmap_bit_p (subloop_node->all_allocnos,
 				    ALLOCNO_NUM (subloop_allocno)));
-	  if ((flag_ira_region == IRA_REGION_MIXED)
-	      && (loop_tree_node->reg_pressure[pclass]
-		  <= ira_class_hard_regs_num[pclass]))
+	  if ((flag_ira_region == IRA_REGION_MIXED
+	       && (loop_tree_node->reg_pressure[pclass]
+		   <= ira_class_hard_regs_num[pclass]))
+	      || (pic_offset_table_rtx
+		  && regno == (int) REGNO (pic_offset_table_rtx)))
 	    {
 	      if (! ALLOCNO_ASSIGNED_P (subloop_allocno))
 		{
diff --git a/gcc/ira-emit.c b/gcc/ira-emit.c
index 71dc6bc..6833e67 100644
--- a/gcc/ira-emit.c
+++ b/gcc/ira-emit.c
@@ -610,7 +610,9 @@ change_loop (ira_loop_tree_node_t node)
 		  /* don't create copies because reload can spill an
 		     allocno set by copy although the allocno will not
 		     get memory slot.  */
-		  || ira_equiv_no_lvalue_p (regno)))
+		  || ira_equiv_no_lvalue_p (regno)
+		  || (pic_offset_table_rtx
+		      && ALLOCNO_REGNO (allocno) == REGNO (pic_offset_table_rtx))))
 	    continue;
 	  original_reg = allocno_emit_reg (allocno);
 	  if (parent_allocno == NULL
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index a43f8dc..253934b 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -4017,7 +4017,11 @@ lra_constraints (bool first_p)
       ("Maximum number of LRA constraint passes is achieved (%d)\n",
        LRA_MAX_CONSTRAINT_ITERATION_NUMBER);
   changed_p = false;
-  lra_risky_transformations_p = false;
+  if (pic_offset_table_rtx
+      && REGNO (pic_offset_table_rtx) >= FIRST_PSEUDO_REGISTER)
+    lra_risky_transformations_p = true;
+  else
+    lra_risky_transformations_p = false;
   new_insn_uid_start = get_max_uid ();
   new_regno_start = first_p ? lra_constraint_new_regno_start : max_reg_num ();
   /* Mark used hard regs for target stack size calulations.  */
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index bc16437..1cd7ea3 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -110,7 +110,8 @@ rtx_unstable_p (const_rtx x)
       /* ??? When call-clobbered, the value is stable modulo the restore
 	 that must happen after a call.  This currently screws up local-alloc
 	 into believing that the restore is not needed.  */
-      if (!PIC_OFFSET_TABLE_REG_CALL_CLOBBERED && x == pic_offset_table_rtx)
+      if (!PIC_OFFSET_TABLE_REG_CALL_CLOBBERED && x == pic_offset_table_rtx
+	  && REGNO (pic_offset_table_rtx) < FIRST_PSEUDO_REGISTER)
 	return 0;
       return 1;
 
@@ -185,7 +186,9 @@ rtx_varies_p (const_rtx x, bool for_alias)
 	     that must happen after a call.  This currently screws up
 	     local-alloc into believing that the restore is not needed, so we
 	     must return 0 only if we are called from alias analysis.  */
-	  && (!PIC_OFFSET_TABLE_REG_CALL_CLOBBERED || for_alias))
+	  && ((!PIC_OFFSET_TABLE_REG_CALL_CLOBBERED
+	       && REGNO (pic_offset_table_rtx) < FIRST_PSEUDO_REGISTER)
+	      || for_alias))
 	return 0;
       return 1;
 
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index 5c34fee..50de8d5 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -448,7 +448,7 @@ try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge,
     {
       HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge;
       struct hard_reg_set_container set_up_by_prologue;
-      rtx p_insn;
+      rtx p_insn, reg;
       vec<basic_block> vec;
       basic_block bb;
       bitmap_head bb_antic_flags;
@@ -494,9 +494,13 @@ try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge,
       if (frame_pointer_needed)
 	add_to_hard_reg_set (&set_up_by_prologue.set, Pmode,
 			     HARD_FRAME_POINTER_REGNUM);
-      if (pic_offset_table_rtx)
+      if (pic_offset_table_rtx
+	  && PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
 	add_to_hard_reg_set (&set_up_by_prologue.set, Pmode,
 			     PIC_OFFSET_TABLE_REGNUM);
+      if ((reg = targetm.calls.implicit_pic_arg (current_function_decl)))
+	add_to_hard_reg_set (&set_up_by_prologue.set,
+			     Pmode, REGNO (reg));
       if (crtl->drap_reg)
 	add_to_hard_reg_set (&set_up_by_prologue.set,
 			     GET_MODE (crtl->drap_reg),
diff --git a/gcc/target.def b/gcc/target.def
index 3a41db1..5c221b6 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3976,6 +3976,14 @@ If @code{TARGET_FUNCTION_INCOMING_ARG} is not defined,\n\
  default_function_incoming_arg)
 
 DEFHOOK
+(implicit_pic_arg,
+ "This hook returns register holding PIC base address for functions\n\
+which do not fix hard register but handle it similar to function arg\n\
+assigning a virtual reg for it.",
+ rtx, (const_tree fntype_or_decl),
+ hook_rtx_const_tree_null)
+
+DEFHOOK
 (function_arg_boundary,
  "This hook returns the alignment boundary, in bits, of an argument\n\
 with the specified mode and type.  The default hook returns\n\

[-- Attachment #3: test.cc --]
[-- Type: application/octet-stream, Size: 611 bytes --]

extern long int my_rand();

template <class T>
class array {
public:
  int push(T item);

private:
  T *data;
  int used;
  int size;
};

template <class T>
int array<T>::push(T item)
{
  if (used == size) {
    size *= 2;
    T *temp = data;
    data = new T[size];
    for (int i = 0; i < used; i++)
      data[i] = temp[i];
    delete [] temp;
  }
  data[used++] = item;
  return 1;
}

class sample
{
public:
  void test();

protected:
  int n;
  double delta;
  array<double> data;
};


void sample::test()
{
  for (int i = 0; i < n; i++)
    if (!data.push((double)i + my_rand() * 0.0003))
      return;
}

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-28 18:58           ` Uros Bizjak
@ 2014-08-29  6:51             ` Ilya Enkovich
  2014-08-29 18:45             ` Jeff Law
  1 sibling, 0 replies; 49+ messages in thread
From: Ilya Enkovich @ 2014-08-29  6:51 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener, Jeff Law,
	Vladimir Makarov

2014-08-28 22:58 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
> On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>
>> At Cauldron 2014 we had a couple of talks about relaxing ebx usage in 32-bit PIC mode.  It was decided that the best approach would be to not fix the ebx register, use a pseudo register for the GOT base address and let the allocator do the rest.  This should be similar to how clang and icc handle the GOT base address.  I've been working on such a patch for some time and now want to share my results.
>
>>  (define_insn "*pushtf"
>>    [(set (match_operand:TF 0 "push_operand" "=<,<")
>> -       (match_operand:TF 1 "general_no_elim_operand" "x,*roF"))]
>> +       (match_operand:TF 1 "nonimmediate_no_elim_operand" "x,*roF"))]
>
> Can you please explain the reason for this change (and a couple of
> similar changes to push patterns) ?

This is a workaround for a stability problem with reload.  Immediate
operands cause new uses of the pseudo PIC register in reload, which
leads to wrong register allocation.  These changes won't be required
once the reload issue is resolved.
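
To make the effect of the predicate change concrete, here is a toy model
(not GCC code, all names hypothetical).  It assumes the usual meaning of
the two predicates: general_operand accepts registers, memory and
constants, while nonimmediate_operand accepts only registers and memory.
With the stricter predicate a constant source has to be turned into a
memory reference before the insn can match, that is, before reload.

```c
#include <stdbool.h>

/* Toy classification of a push source operand.  */
enum operand_kind { OP_REG, OP_MEM, OP_CONST };

/* Models general_operand: register, memory or constant.  */
static bool
toy_general_operand (enum operand_kind k)
{
  return k == OP_REG || k == OP_MEM || k == OP_CONST;
}

/* Models nonimmediate_operand: register or memory only.  */
static bool
toy_nonimmediate_operand (enum operand_kind k)
{
  return k == OP_REG || k == OP_MEM;
}

/* Kind of the operand once it satisfies PRED at expand time.  An
   operand the predicate rejects is assumed to be forced into memory
   (the constant pool), which is where the PIC register use appears.  */
static enum operand_kind
toy_legitimize_push_source (enum operand_kind k,
                            bool (*pred) (enum operand_kind))
{
  return pred (k) ? k : OP_MEM;
}
```

In this model the constant survives until reload with the old predicate
and becomes a memory operand early with the new one, so the PIC pseudo
use is already visible to the allocator.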

Ilya

>
> Uros.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-28 18:58           ` Uros Bizjak
  2014-08-29  6:51             ` Ilya Enkovich
@ 2014-08-29 18:45             ` Jeff Law
  1 sibling, 0 replies; 49+ messages in thread
From: Jeff Law @ 2014-08-29 18:45 UTC (permalink / raw)
  To: Uros Bizjak, Ilya Enkovich; +Cc: gcc, gcc-patches

On 08/28/14 12:58, Uros Bizjak wrote:
> On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>
>> At Cauldron 2014 we had a couple of talks about relaxing ebx usage in 32-bit PIC mode.  It was decided that the best approach would be to not fix the ebx register, use a pseudo register for the GOT base address and let the allocator do the rest.  This should be similar to how clang and icc handle the GOT base address.  I've been working on such a patch for some time and now want to share my results.
>
>>   (define_insn "*pushtf"
>>     [(set (match_operand:TF 0 "push_operand" "=<,<")
>> -       (match_operand:TF 1 "general_no_elim_operand" "x,*roF"))]
>> +       (match_operand:TF 1 "nonimmediate_no_elim_operand" "x,*roF"))]
>
> Can you please explain the reason for this change (and a couple of
> similar changes to push patterns) ?
I'd recommend dropping them from the WIP postings.

jeff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-28 13:01           ` Uros Bizjak
  2014-08-28 13:13             ` Ilya Enkovich
  2014-08-28 18:30             ` Florian Weimer
@ 2014-08-29 18:48             ` Jeff Law
  2 siblings, 0 replies; 49+ messages in thread
From: Jeff Law @ 2014-08-29 18:48 UTC (permalink / raw)
  To: Uros Bizjak, Ilya Enkovich
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener, Vladimir Makarov

On 08/28/14 07:01, Uros Bizjak wrote:
> On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> Hi,
>>
>> At Cauldron 2014 we had a couple of talks about relaxing ebx usage in 32-bit PIC mode.  It was decided that the best approach would be to not fix the ebx register, use a pseudo register for the GOT base address and let the allocator do the rest.  This should be similar to how clang and icc handle the GOT base address.  I've been working on such a patch for some time and now want to share my results.
>
> +#define PIC_OFFSET_TABLE_REGNUM                                        \
> +  ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC                       \
> +                     || TARGET_PECOFF))                                \
> +   || !flag_pic ? INVALID_REGNUM                                       \
> +   : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM   \
> +                              : REAL_PIC_OFFSET_TABLE_REGNUM)          \
> +   : reload_completed ? REGNO (pic_offset_table_rtx)                   \
>      : REAL_PIC_OFFSET_TABLE_REGNUM)
>
> I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an
> allocatable register. This way, we can avoid all the mess with implicit
> xchgs in atomic_compare_and_swap<dwi>_doubleword. Also, having
> allocatable EBX would allow us to introduce a __builtin_cpuid builtin
> and clean up cpuid.h.
I think for the initial WIP patch it was fine.  However I think we all 
agree that we want EBX as an allocatable register without any special 
conditions.  So I'd recommend pulling this out of the WIP patches as well.
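
For readers following along, here is a minimal sketch of how the nested conditional in the quoted PIC_OFFSET_TABLE_REGNUM macro resolves, written as a plain C function.  It is only a model: the struct, its field names and the MODEL_* constants are made up for illustration and are not GCC code.  The one thing taken from the thread is that hard register 3 is bx in the RTL dumps.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's register numbers (illustration only).  */
enum { MODEL_INVALID_REGNUM = -1, MODEL_EBX = 3 };

/* Hypothetical snapshot of the conditions the quoted macro tests.  */
struct pic_state {
  bool target_64bit;         /* TARGET_64BIT */
  bool small_pic_or_pecoff;  /* ix86_cmodel == CM_SMALL_PIC || TARGET_PECOFF */
  bool flag_pic;             /* flag_pic */
  bool relax_pic_reg;        /* X86_TUNE_RELAX_PIC_REG */
  bool have_pic_rtx;         /* pic_offset_table_rtx != NULL */
  bool reload_completed;     /* reload_completed */
  int pic_rtx_regno;         /* REGNO (pic_offset_table_rtx) */
};

/* Same decision order as the macro: "||" binds tighter than "?:", and
   the chained "?:" operators group from the right.  */
int model_pic_offset_table_regnum(const struct pic_state *s)
{
  if ((s->target_64bit && s->small_pic_or_pecoff) || !s->flag_pic)
    return MODEL_INVALID_REGNUM;
  if (s->relax_pic_reg)
    return s->have_pic_rtx ? MODEL_INVALID_REGNUM : MODEL_EBX;
  if (s->reload_completed)
    return s->pic_rtx_regno;
  return MODEL_EBX;
}
```

Dropping the tuning flag, as suggested above, would remove the relax_pic_reg branch from this model and leave a single rule for when EBX is reported as the PIC register.
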
Jeff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-28  8:37                       ` Ilya Enkovich
  2014-08-28 12:43                         ` Uros Bizjak
@ 2014-08-29 18:56                         ` Jeff Law
  1 sibling, 0 replies; 49+ messages in thread
From: Jeff Law @ 2014-08-29 18:56 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: Vladimir Makarov, gcc, gcc-patches, Evgeny Stupachenko,
	Richard Biener, Uros Bizjak

On 08/28/14 02:37, Ilya Enkovich wrote:
> 2014-08-28 1:39 GMT+04:00 Jeff Law <law@redhat.com>:
>> On 08/26/14 15:42, Ilya Enkovich wrote:
>>>
>>> diff --git a/gcc/calls.c b/gcc/calls.c
>>> index 4285ec1..85dae6b 100644
>>> --- a/gcc/calls.c
>>> +++ b/gcc/calls.c
>>> @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
>>>        call_expr_arg_iterator iter;
>>>        tree arg;
>>>
>>> +    if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
>>> +      {
>>> +       gcc_assert (pic_offset_table_rtx);
>>> +       args[j].tree_value = make_tree (ptr_type_node,
>>> +                                       pic_offset_table_rtx);
>>> +       j--;
>>> +      }
>>> +
>>>        if (struct_value_addr_value)
>>>          {
>>>          args[j].tree_value = struct_value_addr_value;
>>
>> So why do you need this?  Can't this be handled in the call/call_value
>> expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
>> inside ix86_expand_call?  Basically I'm not seeing the need for another
>> target hook here.  I think that would significantly simplify the patch as
>> well.
>
> The GOT base address becomes an additional implicit arg with EBX relaxed,
> and I handled it like all other args. I can move EBX initialization into
> ix86_expand_call. I would still need some hint from the target to init
> pic_offset_table_rtx with the proper value at the beginning of function
> expand.
It doesn't really need to be an argument in the traditional sense and 
adding it just complicates things with a target implementation detail as 
far as I can see.

I think you'll find that if you have the call pattern emit a copy from
pic_offset_table_rtx into EBX and attach a use of EBX to the call then
most of the code you've written to add the implicit argument just
disappears.
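
To make the shape of that suggestion concrete, here is a toy model in plain C.  It is a sketch under assumptions, not GCC code: struct insn, struct insn_stream and model_expand_call are hypothetical names, and the "uses" array merely stands in for what CALL_INSN_FUNCTION_USAGE records on a real call insn.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants; 3 matches "bx" in the RTL dumps in this thread.  */
enum { MODEL_EBX = 3, MAX_INSNS = 8, MAX_USES = 4 };

enum insn_kind { INSN_MOVE, INSN_CALL };

/* Toy instruction: either a register move or a call with a use list.  */
struct insn {
  enum insn_kind kind;
  int dest, src;        /* meaningful for INSN_MOVE only */
  int uses[MAX_USES];   /* hard regs the call reads (INSN_CALL only) */
  size_t n_uses;
};

struct insn_stream {
  struct insn insns[MAX_INSNS];
  size_t n;
};

/* Emit a call.  If the callee needs the GOT address in EBX, first copy
   the PIC pseudo into EBX and record EBX as used by the call.  No
   implicit argument is involved.  Returns false if the stream is full.  */
bool model_expand_call(struct insn_stream *out, bool needs_got_in_ebx,
                       int pic_pseudo)
{
  size_t needed = needs_got_in_ebx ? 2 : 1;
  if (out->n + needed > MAX_INSNS)
    return false;

  struct insn call = { .kind = INSN_CALL };
  if (needs_got_in_ebx) {
    struct insn move = { .kind = INSN_MOVE, .dest = MODEL_EBX,
                         .src = pic_pseudo };
    out->insns[out->n++] = move;
    call.uses[call.n_uses++] = MODEL_EBX;
  }
  out->insns[out->n++] = call;
  return true;
}
```

In this model the allocator only ever sees an ordinary copy plus a hard-register use on the call, which is why the argument-setup code would no longer need to know about the PIC register.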

jeff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-29  6:47                         ` Ilya Enkovich
@ 2014-09-02 14:29                           ` Vladimir Makarov
  2014-09-03 20:19                           ` Vladimir Makarov
  1 sibling, 0 replies; 49+ messages in thread
From: Vladimir Makarov @ 2014-09-02 14:29 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: gcc-patches, Evgeny Stupachenko, Richard Biener, Uros Bizjak, Jeff Law

On 08/29/2014 02:47 AM, Ilya Enkovich wrote:
> Seems your patch doesn't cover all cases.  Attached is a modified
> patch (with your changes included) and a test where double constant is
> wrongly rematerialized.  I also see in ira dump that there is still a
> copy of PIC reg created:
>
> Initialization of original PIC reg:
> (insn 23 22 24 2 (set (reg:SI 127)
>         (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 3 bx)
>         (nil)))
> ...
> Copy is created:
> (insn 135 37 25 3 (set (reg:SI 138 [127])
>         (reg:SI 127)) 90 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 127)
>         (nil)))
> ...
> Copy is used:
> (insn 119 25 122 3 (set (reg:DF 134)
>         (mem/u/c:DF (plus:SI (reg:SI 138 [127])
>                 (const:SI (unspec:SI [
>                             (symbol_ref/u:SI ("*.LC0") [flags 0x2])
>                         ] UNSPEC_GOTOFF))) [5  S8 A64])) 128 {*movdf_internal}
>      (expr_list:REG_EQUIV (const_double:DF
> 2.9999999999999997371893933895137251965934410691261292e-4
> [0x0.9d495182a99308p-11])
>         (nil)))
>
> After reload we have new usage of r127 which is allocated to ecx which
> actually does not have any definition in this function at all.
>
> (insn 151 42 44 4 (set (reg:SI 0 ax [147])
>         (plus:SI (reg:SI 2 cx [127])
>             (const:SI (unspec:SI [
>                         (symbol_ref/u:SI ("*.LC0") [flags 0x2])
>                     ] UNSPEC_GOTOFF)))) test.cc:44 213 {*leasi}
>      (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC0") [flags 0x2])
>         (nil)))
> (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129])
>         (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
>             (mem/u/c:DF (reg:SI 0 ax [147]) [5  S8 A64]))) test.cc:44
> 790 {*fop_df_comm_sse}
>      (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
>             (const_double:DF
> 2.9999999999999997371893933895137251965934410691261292e-4
> [0x0.9d495182a99308p-11]))
>         (nil)))
>
> Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc
>
>
Ok, Ilya.  I'll look at the problem this week.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-08-29  6:47                         ` Ilya Enkovich
  2014-09-02 14:29                           ` Vladimir Makarov
@ 2014-09-03 20:19                           ` Vladimir Makarov
       [not found]                             ` <0EFAB2BDD0F67E4FB6CCC8B9F87D756969B3A89D@IRSMSX101.ger.corp.intel.com>
  2014-09-23 13:54                             ` Ilya Enkovich
  1 sibling, 2 replies; 49+ messages in thread
From: Vladimir Makarov @ 2014-09-03 20:19 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener,
	Uros Bizjak, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 2355 bytes --]

On 2014-08-29 2:47 AM, Ilya Enkovich wrote:
> Seems your patch doesn't cover all cases.  Attached is a modified
> patch (with your changes included) and a test where double constant is
> wrongly rematerialized.  I also see in ira dump that there is still a
> copy of PIC reg created:
>
> Initialization of original PIC reg:
> (insn 23 22 24 2 (set (reg:SI 127)
>          (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
>       (expr_list:REG_DEAD (reg:SI 3 bx)
>          (nil)))
> ...
> Copy is created:
> (insn 135 37 25 3 (set (reg:SI 138 [127])
>          (reg:SI 127)) 90 {*movsi_internal}
>       (expr_list:REG_DEAD (reg:SI 127)
>          (nil)))
> ...
> Copy is used:
> (insn 119 25 122 3 (set (reg:DF 134)
>          (mem/u/c:DF (plus:SI (reg:SI 138 [127])
>                  (const:SI (unspec:SI [
>                              (symbol_ref/u:SI ("*.LC0") [flags 0x2])
>                          ] UNSPEC_GOTOFF))) [5  S8 A64])) 128 {*movdf_internal}
>       (expr_list:REG_EQUIV (const_double:DF
> 2.9999999999999997371893933895137251965934410691261292e-4
> [0x0.9d495182a99308p-11])
>          (nil)))
>

The copy is created by a newer IRA optimization for function prologues.

The patch in the attachment should solve the problem.  I also added
code to prevent spilling the PIC pseudo in LRA, which could theoretically
have happened before.
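
As a reading aid, the "do not spill" test that the attached spill_for hunk extends can be sketched as a standalone C predicate.  This is only a model of the condition as I read it from the diff: struct pseudo_info and model_must_not_spill are hypothetical names, the four flags stand in for the LRA bitmaps, and a negative pic_regno is an invented convention for "no PIC pseudo".

```c
#include <stdbool.h>

/* Hypothetical summary of what LRA's bitmaps say about one pseudo.  */
struct pseudo_info {
  int regno;
  bool is_inheritance;      /* in lra_inheritance_pseudos */
  bool is_split;            /* in lra_split_regs */
  bool is_subreg_reload;    /* in lra_subreg_reload_pseudos */
  bool is_optional_reload;  /* in lra_optional_reload_pseudos */
};

/* Return true when spilling P must be refused.  Before the patch only
   plain reload pseudos were protected; the patch adds the PIC pseudo.  */
bool model_must_not_spill(const struct pseudo_info *p, int new_regno_start,
                          int pic_regno)
{
  if (pic_regno >= 0 && p->regno == pic_regno)
    return true;
  return p->regno >= new_regno_start
         && !p->is_inheritance
         && !p->is_split
         && !p->is_subreg_reload
         && !p->is_optional_reload;
}
```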


> After reload we have new usage of r127 which is allocated to ecx which
> actually does not have any definition in this function at all.
>
> (insn 151 42 44 4 (set (reg:SI 0 ax [147])
>          (plus:SI (reg:SI 2 cx [127])
>              (const:SI (unspec:SI [
>                          (symbol_ref/u:SI ("*.LC0") [flags 0x2])
>                      ] UNSPEC_GOTOFF)))) test.cc:44 213 {*leasi}
>       (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC0") [flags 0x2])
>          (nil)))
> (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129])
>          (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
>              (mem/u/c:DF (reg:SI 0 ax [147]) [5  S8 A64]))) test.cc:44
> 790 {*fop_df_comm_sse}
>       (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
>              (const_double:DF
> 2.9999999999999997371893933895137251965934410691261292e-4
> [0x0.9d495182a99308p-11]))
>          (nil)))
>
> Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc


[-- Attachment #2: z2 --]
[-- Type: text/plain, Size: 2393 bytes --]

Index: ira.c
===================================================================
--- ira.c	(revision 214576)
+++ ira.c	(working copy)
@@ -4887,7 +4887,7 @@ split_live_ranges_for_shrink_wrap (void)
   FOR_BB_INSNS (first, insn)
     {
       rtx dest = interesting_dest_for_shprep (insn, call_dom);
-      if (!dest)
+      if (!dest || dest == pic_offset_table_rtx)
 	continue;
 
       rtx newreg = NULL_RTX;
Index: lra-assigns.c
===================================================================
--- lra-assigns.c	(revision 214576)
+++ lra-assigns.c	(working copy)
@@ -879,11 +879,13 @@ spill_for (int regno, bitmap spilled_pse
 	}
       /* Spill pseudos.	 */
       EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
-	if ((int) spill_regno >= lra_constraint_new_regno_start
-	    && ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
-	    && ! bitmap_bit_p (&lra_split_regs, spill_regno)
-	    && ! bitmap_bit_p (&lra_subreg_reload_pseudos, spill_regno)
-	    && ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno))
+	if ((pic_offset_table_rtx != NULL
+	     && spill_regno == REGNO (pic_offset_table_rtx))
+	    || ((int) spill_regno >= lra_constraint_new_regno_start
+		&& ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
+		&& ! bitmap_bit_p (&lra_split_regs, spill_regno)
+		&& ! bitmap_bit_p (&lra_subreg_reload_pseudos, spill_regno)
+		&& ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno)))
 	  goto fail;
       insn_pseudos_num = 0;
       if (lra_dump_file != NULL)
@@ -1053,7 +1055,9 @@ setup_live_pseudos_and_spill_after_risky
       return;
     }
   for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
-    if (reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
+    if ((pic_offset_table_rtx == NULL_RTX
+	 || i != (int) REGNO (pic_offset_table_rtx))
+	&& reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
       sorted_pseudos[n++] = i;
   qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func);
   for (i = n - 1; i >= 0; i--)
@@ -1360,6 +1364,8 @@ assign_by_spills (void)
 	}
       EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
 	{
+	  gcc_assert (pic_offset_table_rtx == NULL
+		      || conflict_regno != REGNO (pic_offset_table_rtx));
 	  if ((int) conflict_regno >= lra_constraint_new_regno_start)
 	    sorted_pseudos[nfails++] = conflict_regno;
 	  if (lra_dump_file != NULL)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
       [not found]                             ` <0EFAB2BDD0F67E4FB6CCC8B9F87D756969B3A89D@IRSMSX101.ger.corp.intel.com>
@ 2014-09-09 16:43                               ` Vladimir Makarov
  2014-09-11 19:57                                 ` Jeff Law
  0 siblings, 1 reply; 49+ messages in thread
From: Vladimir Makarov @ 2014-09-09 16:43 UTC (permalink / raw)
  To: Zamyatin, Igor; +Cc: Enkovich, Ilya, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4002 bytes --]

On 09/04/2014 10:30 AM, Zamyatin, Igor wrote:
>
>> -----Original Message-----
>> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
>> owner@gcc.gnu.org] On Behalf Of Vladimir Makarov
>> Sent: Thursday, September 04, 2014 12:19 AM
>> To: Ilya Enkovich
>> Cc: gcc@gnu.org; gcc-patches; Evgeny Stupachenko; Richard Biener; Uros
>> Bizjak; Jeff Law
>> Subject: Re: Enable EBX for x86 in 32bits PIC code
>>
>> On 2014-08-29 2:47 AM, Ilya Enkovich wrote:
>>> Seems your patch doesn't cover all cases.  Attached is a modified
>>> patch (with your changes included) and a test where double constant is
>>> wrongly rematerialized.  I also see in ira dump that there is still a
>>> copy of PIC reg created:
>>>
>>> Initialization of original PIC reg:
>>> (insn 23 22 24 2 (set (reg:SI 127)
>>>          (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
>>>       (expr_list:REG_DEAD (reg:SI 3 bx)
>>>          (nil)))
>>> ...
>>> Copy is created:
>>> (insn 135 37 25 3 (set (reg:SI 138 [127])
>>>          (reg:SI 127)) 90 {*movsi_internal}
>>>       (expr_list:REG_DEAD (reg:SI 127)
>>>          (nil)))
>>> ...
>>> Copy is used:
>>> (insn 119 25 122 3 (set (reg:DF 134)
>>>          (mem/u/c:DF (plus:SI (reg:SI 138 [127])
>>>                  (const:SI (unspec:SI [
>>>                              (symbol_ref/u:SI ("*.LC0") [flags 0x2])
>>>                          ] UNSPEC_GOTOFF))) [5  S8 A64])) 128 {*movdf_internal}
>>>       (expr_list:REG_EQUIV (const_double:DF
>>> 2.9999999999999997371893933895137251965934410691261292e-4
>>> [0x0.9d495182a99308p-11])
>>>          (nil)))
>>>
>> The copy is created by a newer IRA optimization for function prologues.
>>
>> The patch in the attachment should solve the problem.  I also added the code
>> to prevent spilling the pic pseudo in LRA which could happen before
>> theoretically.
> Hi, Vladimir!
>
> I applied the patch on top of your previous patch (was I right?) and unfortunately all spec2000 tests failed at runtime (segfault)
>
> Looking at 164.gzip I saw the following code in spec_init
>
>
> 00004bc0 <spec_init>:
>     4bc0:       55                      push   %ebp
>     4bc1:       57                      push   %edi
>     4bc2:       56                      push   %esi
>     4bc3:       e8 58 c6 ff ff          call   1220 <__x86.get_pc_thunk.si>
>     4bc8:       81 c6 38 84 00 00       add    $0x8438,%esi
>     4bce:       53                      push   %ebx
>     4bcf:       8d 64 24 e4             lea    -0x1c(%esp),%esp
>     4bd3:       83 be a0 03 00 00 03    cmpl   $0x3,0x3a0(%esi)
>     4bda:       7f 67                   jg     4c43 <spec_init+0x83>
>     4bdc:       8d ae 40 12 05 00       lea    0x51240(%esi),%ebp
>     4be2:       8d 45 30                lea    0x30(%ebp),%eax
>     4be5:       89 c6                   mov    %eax,%esi                             <---- incorrect move, GOT value is now lost (here was mov    %eax,0x1c(%esp) before this additional patch)
>     4be7:       8b 7d 00                mov    0x0(%ebp),%edi
>     4bea:       89 f3                   mov    %esi,%ebx                            <---- now ebx contains incorrect value so call to malloc will be executed wrongly
>     4bec:       c7 45 04 00 00 00 00    movl   $0x0,0x4(%ebp)
>     4bf3:       c7 45 08 00 00 00 00    movl   $0x0,0x8(%ebp)
>     4bfa:       c7 45 0c 00 00 00 00    movl   $0x0,0xc(%ebp)
>     4c01:       8d 87 00 90 01 00       lea    0x19000(%edi),%eax
>     4c07:       89 04 24                mov    %eax,(%esp)
>     4c0a:       e8 a1 be ff ff          call   ab0 <malloc@plt>
>
>
I've investigated the wrong code generation.  I made a mistake in my last
patch: it excluded the PIC pseudo from live-range analysis when risky
transformations are on.

Here is the right version of all IRA/LRA changes relative to trunk.  I
managed to compile and run successfully all 32-bit PIC SPECInt2000
programs with these changes.
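
One detail of the revised lra-assigns.c hunk is the ordering: the PIC pseudo is left out of the sort and appended afterwards, and the loop that follows walks the array from the last element down, so the PIC pseudo is looked at first (presumably so that it keeps its hard register).  Here is a minimal standalone sketch of that ordering.  The function names, the plain integer comparator and the "negative regno means no PIC pseudo" convention are assumptions for illustration; the real code sorts with pseudo_compare_func.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Placeholder comparator (ascending regno); the real LRA comparator
   is pseudo_compare_func, which is not modelled here.  */
static int compare_regnos(const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return (x > y) - (x < y);
}

/* Copy the live pseudos into ORDER, sorted, with the PIC pseudo (if it
   is live) appended last.  A consumer iterating from index n - 1 down
   to 0 therefore visits the PIC pseudo first.  ORDER needs room for
   N_LIVE entries.  Returns the number of entries written.  */
size_t model_order_pseudos(const int *live, size_t n_live, int pic_regno,
                           int *order)
{
  size_t n = 0;
  bool pic_is_live = false;

  for (size_t i = 0; i < n_live; i++) {
    if (pic_regno >= 0 && live[i] == pic_regno)
      pic_is_live = true;
    else
      order[n++] = live[i];
  }
  qsort(order, n, sizeof order[0], compare_regnos);
  if (pic_is_live)
    order[n++] = pic_regno;
  return n;
}
```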


[-- Attachment #2: z3 --]
[-- Type: text/plain, Size: 4623 bytes --]

Index: ira-color.c
===================================================================
--- ira-color.c	(revision 214576)
+++ ira-color.c	(working copy)
@@ -3239,9 +3239,11 @@
 	  ira_assert (ALLOCNO_CLASS (subloop_allocno) == rclass);
 	  ira_assert (bitmap_bit_p (subloop_node->all_allocnos,
 				    ALLOCNO_NUM (subloop_allocno)));
-	  if ((flag_ira_region == IRA_REGION_MIXED)
-	      && (loop_tree_node->reg_pressure[pclass]
-		  <= ira_class_hard_regs_num[pclass]))
+	  if ((flag_ira_region == IRA_REGION_MIXED
+	       && (loop_tree_node->reg_pressure[pclass]
+		   <= ira_class_hard_regs_num[pclass]))
+	      || (pic_offset_table_rtx != NULL
+		  && regno == (int) REGNO (pic_offset_table_rtx)))
 	    {
 	      if (! ALLOCNO_ASSIGNED_P (subloop_allocno))
 		{
Index: ira-emit.c
===================================================================
--- ira-emit.c	(revision 214576)
+++ ira-emit.c	(working copy)
@@ -620,7 +620,10 @@
 		  /* don't create copies because reload can spill an
 		     allocno set by copy although the allocno will not
 		     get memory slot.  */
-		  || ira_equiv_no_lvalue_p (regno)))
+		  || ira_equiv_no_lvalue_p (regno)
+		  || (pic_offset_table_rtx != NULL
+		      && (ALLOCNO_REGNO (allocno)
+			  == (int) REGNO (pic_offset_table_rtx)))))
 	    continue;
 	  original_reg = allocno_emit_reg (allocno);
 	  if (parent_allocno == NULL
Index: ira.c
===================================================================
--- ira.c	(revision 214576)
+++ ira.c	(working copy)
@@ -4887,7 +4887,7 @@
   FOR_BB_INSNS (first, insn)
     {
       rtx dest = interesting_dest_for_shprep (insn, call_dom);
-      if (!dest)
+      if (!dest || dest == pic_offset_table_rtx)
 	continue;
 
       rtx newreg = NULL_RTX;
Index: lra-assigns.c
===================================================================
--- lra-assigns.c	(revision 214576)
+++ lra-assigns.c	(working copy)
@@ -879,11 +879,13 @@
 	}
       /* Spill pseudos.	 */
       EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
-	if ((int) spill_regno >= lra_constraint_new_regno_start
-	    && ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
-	    && ! bitmap_bit_p (&lra_split_regs, spill_regno)
-	    && ! bitmap_bit_p (&lra_subreg_reload_pseudos, spill_regno)
-	    && ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno))
+	if ((pic_offset_table_rtx != NULL
+	     && spill_regno == REGNO (pic_offset_table_rtx))
+	    || ((int) spill_regno >= lra_constraint_new_regno_start
+		&& ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
+		&& ! bitmap_bit_p (&lra_split_regs, spill_regno)
+		&& ! bitmap_bit_p (&lra_subreg_reload_pseudos, spill_regno)
+		&& ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno)))
 	  goto fail;
       insn_pseudos_num = 0;
       if (lra_dump_file != NULL)
@@ -1053,9 +1055,15 @@
       return;
     }
   for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
-    if (reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
+    if ((pic_offset_table_rtx == NULL_RTX
+	 || i != (int) REGNO (pic_offset_table_rtx))
+	&& reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
       sorted_pseudos[n++] = i;
   qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func);
+  if (pic_offset_table_rtx != NULL_RTX
+      && (regno = REGNO (pic_offset_table_rtx)) >= FIRST_PSEUDO_REGISTER
+      && reg_renumber[regno] >= 0 && lra_reg_info[regno].nrefs > 0)
+    sorted_pseudos[n++] = regno;
   for (i = n - 1; i >= 0; i--)
     {
       regno = sorted_pseudos[i];
@@ -1360,6 +1368,8 @@
 	}
       EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
 	{
+	  gcc_assert (pic_offset_table_rtx == NULL
+		      || conflict_regno != REGNO (pic_offset_table_rtx));
 	  if ((int) conflict_regno >= lra_constraint_new_regno_start)
 	    sorted_pseudos[nfails++] = conflict_regno;
 	  if (lra_dump_file != NULL)
Index: lra-constraints.c
===================================================================
--- lra-constraints.c	(revision 214576)
+++ lra-constraints.c	(working copy)
@@ -4021,7 +4021,11 @@
       ("Maximum number of LRA constraint passes is achieved (%d)\n",
        LRA_MAX_CONSTRAINT_ITERATION_NUMBER);
   changed_p = false;
-  lra_risky_transformations_p = false;
+  if (pic_offset_table_rtx
+      && REGNO (pic_offset_table_rtx) >= FIRST_PSEUDO_REGISTER)
+    lra_risky_transformations_p = true;
+  else
+    lra_risky_transformations_p = false;
   new_insn_uid_start = get_max_uid ();
   new_regno_start = first_p ? lra_constraint_new_regno_start : max_reg_num ();
   /* Mark used hard regs for target stack size calulations.  */

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-09-09 16:43                               ` Vladimir Makarov
@ 2014-09-11 19:57                                 ` Jeff Law
  0 siblings, 0 replies; 49+ messages in thread
From: Jeff Law @ 2014-09-11 19:57 UTC (permalink / raw)
  To: Vladimir Makarov, Zamyatin, Igor; +Cc: Enkovich, Ilya, gcc-patches

On 09/09/14 10:43, Vladimir Makarov wrote:

> I've investigated the wrong code generation.  I made a mistake in my last
> patch: it excluded the PIC pseudo from live-range analysis when risky
> transformations are on.
>
> Here is the right version of all IRA/LRA changes relative to trunk.  I
> managed to compile and run successfully all 32-bit PIC SPECInt2000
> programs with these changes.
Thanks Vlad.  I'll leave final testing and committing these patches to you.

I did play around with a simpler patch to un-fix the PIC register for
32bit x86 -- without turning it into a pseudo.  The problem with that
intermediate step is that when we spill values with symbolic
equivalences, we fail to handle the new conflicts that spilling generates.
i.e., in the case of PIC, reloading those values should cause the unfixed
hard pic register to conflict with any live pseudos at the reload point.
But it doesn't :(  As a result we end up using the unfixed hard pic
reg to satisfy a reload and, well, you know what happens then.
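
The missing conflict can be illustrated with a tiny standalone model.  It is a sketch only: model_reload_candidates and its bitmask representation of free hard registers are hypothetical and do not correspond to any reload or LRA interface.

```c
#include <stdbool.h>

/* FREE_HARD_REGS is a bitmask with bit N set when hard register N is
   free at the reload point.  When PIC is on and the value being
   reloaded has a symbolic equivalence, rebuilding it needs the GOT
   base, so the hard PIC register must not be handed out as a scratch.  */
unsigned model_reload_candidates(unsigned free_hard_regs,
                                 bool value_is_symbolic, bool flag_pic,
                                 int pic_hard_regno)
{
  if (flag_pic && value_is_symbolic)
    free_hard_regs &= ~(1u << pic_hard_regno);
  return free_hard_regs;
}
```

The failure described above corresponds to skipping the masking step: the PIC register stays in the candidate set and can be picked for the very reload that needs its old contents.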

jeff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-09-03 20:19                           ` Vladimir Makarov
       [not found]                             ` <0EFAB2BDD0F67E4FB6CCC8B9F87D756969B3A89D@IRSMSX101.ger.corp.intel.com>
@ 2014-09-23 13:54                             ` Ilya Enkovich
  2014-09-23 14:23                               ` Uros Bizjak
  2014-09-23 14:34                               ` Jakub Jelinek
  1 sibling, 2 replies; 49+ messages in thread
From: Ilya Enkovich @ 2014-09-23 13:54 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: gcc, gcc-patches, Evgeny Stupachenko, Richard Biener,
	Uros Bizjak, Jeff Law

On 03 Sep 16:19, Vladimir Makarov wrote:
> On 2014-08-29 2:47 AM, Ilya Enkovich wrote:
> >Seems your patch doesn't cover all cases.  Attached is a modified
> >patch (with your changes included) and a test where double constant is
> >wrongly rematerialized.  I also see in ira dump that there is still a
> >copy of PIC reg created:
> >
> >Initialization of original PIC reg:
> >(insn 23 22 24 2 (set (reg:SI 127)
> >         (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
> >      (expr_list:REG_DEAD (reg:SI 3 bx)
> >         (nil)))
> >...
> >Copy is created:
> >(insn 135 37 25 3 (set (reg:SI 138 [127])
> >         (reg:SI 127)) 90 {*movsi_internal}
> >      (expr_list:REG_DEAD (reg:SI 127)
> >         (nil)))
> >...
> >Copy is used:
> >(insn 119 25 122 3 (set (reg:DF 134)
> >         (mem/u/c:DF (plus:SI (reg:SI 138 [127])
> >                 (const:SI (unspec:SI [
> >                             (symbol_ref/u:SI ("*.LC0") [flags 0x2])
> >                         ] UNSPEC_GOTOFF))) [5  S8 A64])) 128 {*movdf_internal}
> >      (expr_list:REG_EQUIV (const_double:DF
> >2.9999999999999997371893933895137251965934410691261292e-4
> >[0x0.9d495182a99308p-11])
> >         (nil)))
> >
> 
> The copy is created by a newer IRA optimization for function prologues.
> 
> The patch in the attachment should solve the problem.  I also added
> the code to prevent spilling the pic pseudo in LRA which could
> happen before theoretically.
> 
> 
> >After reload we have new usage of r127 which is allocated to ecx which
> >actually does not have any definition in this function at all.
> >
> >(insn 151 42 44 4 (set (reg:SI 0 ax [147])
> >         (plus:SI (reg:SI 2 cx [127])
> >             (const:SI (unspec:SI [
> >                         (symbol_ref/u:SI ("*.LC0") [flags 0x2])
> >                     ] UNSPEC_GOTOFF)))) test.cc:44 213 {*leasi}
> >      (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC0") [flags 0x2])
> >         (nil)))
> >(insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129])
> >         (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
> >             (mem/u/c:DF (reg:SI 0 ax [147]) [5  S8 A64]))) test.cc:44
> >790 {*fop_df_comm_sse}
> >      (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
> >             (const_double:DF
> >2.9999999999999997371893933895137251965934410691261292e-4
> >[0x0.9d495182a99308p-11]))
> >         (nil)))
> >
> >Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc
> 

> Index: ira.c
> ===================================================================
> --- ira.c	(revision 214576)
> +++ ira.c	(working copy)
> @@ -4887,7 +4887,7 @@ split_live_ranges_for_shrink_wrap (void)
>    FOR_BB_INSNS (first, insn)
>      {
>        rtx dest = interesting_dest_for_shprep (insn, call_dom);
> -      if (!dest)
> +      if (!dest || dest == pic_offset_table_rtx)
>  	continue;
>  
>        rtx newreg = NULL_RTX;
> Index: lra-assigns.c
> ===================================================================
> --- lra-assigns.c	(revision 214576)
> +++ lra-assigns.c	(working copy)
> @@ -879,11 +879,13 @@ spill_for (int regno, bitmap spilled_pse
>  	}
>        /* Spill pseudos.	 */
>        EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
> -	if ((int) spill_regno >= lra_constraint_new_regno_start
> -	    && ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
> -	    && ! bitmap_bit_p (&lra_split_regs, spill_regno)
> -	    && ! bitmap_bit_p (&lra_subreg_reload_pseudos, spill_regno)
> -	    && ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno))
> +	if ((pic_offset_table_rtx != NULL
> +	     && spill_regno == REGNO (pic_offset_table_rtx))
> +	    || ((int) spill_regno >= lra_constraint_new_regno_start
> +		&& ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
> +		&& ! bitmap_bit_p (&lra_split_regs, spill_regno)
> +		&& ! bitmap_bit_p (&lra_subreg_reload_pseudos, spill_regno)
> +		&& ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno)))
>  	  goto fail;
>        insn_pseudos_num = 0;
>        if (lra_dump_file != NULL)
> @@ -1053,7 +1055,9 @@ setup_live_pseudos_and_spill_after_risky
>        return;
>      }
>    for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
> -    if (reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
> +    if ((pic_offset_table_rtx == NULL_RTX
> +	 || i != (int) REGNO (pic_offset_table_rtx))
> +	&& reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
>        sorted_pseudos[n++] = i;
>    qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func);
>    for (i = n - 1; i >= 0; i--)
> @@ -1360,6 +1364,8 @@ assign_by_spills (void)
>  	}
>        EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
>  	{
> +	  gcc_assert (pic_offset_table_rtx == NULL
> +		      || conflict_regno != REGNO (pic_offset_table_rtx));
>  	  if ((int) conflict_regno >= lra_constraint_new_regno_start)
>  	    sorted_pseudos[nfails++] = conflict_regno;
>  	  if (lra_dump_file != NULL)

Hi,

Here is a patch which combines results of my and Vladimir's work on EBX enabling.

It works OK for SPEC2000 and SPEC2006 on -Ofast + LTO.  It passes bootstrap but there are a few new failures in make check.

gcc.target/i386/pic-1.c fails because it doesn't expect we can use EBX in 32bit PIC mode
gcc.target/i386/pr55458.c fails due to the same reason
gcc.target/i386/pr23098.c fails because compiler fails to use float constant as an immediate and loads it from GOT instead

Do we have a final decision about having a compiler flag to control enabling of the pseudo PIC register?  I think we should keep the possibility to use a fixed EBX at least until we make sure the pseudo PIC register doesn't harm debug info generation. If we have such an option then gcc.target/i386/pic-1.c and gcc.target/i386/pr55458.c should be modified, otherwise these tests should be removed.

@Vladimir: I didn't want to speculate about your changes and just put '??' for them in the ChangeLog description.  Could you please fill in proper comments?  Or maybe you would like to split this patch into two parts and commit the ira changes separately?

Thanks,
Ilya
--
2014-09-23  Ilya Enkovich  <ilya.enkovich@intel.com>

	* config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
	(ix86_init_pic_reg): New.
	(ix86_select_alt_pic_regnum): Support pseudo PIC register.
	(ix86_save_reg): Likewise.
	(ix86_output_function_epilogue): Likewise.
	(ix86_expand_prologue): Remove PIC register initialization
	now performed in ix86_init_pic_reg.
	(set_pic_reg_ever_alive): New.
	(legitimize_pic_address): Use set_pic_reg_ever_alive.
	(ix86_pic_register_p): Support pseudo PIC register.
	(ix86_delegitimize_address): Likewise.
	(ix86_expand_call): Fill REAL_PIC_OFFSET_TABLE_REGNUM
	with GOT address if required.
	(TARGET_INIT_PIC_REG): New.
	(TARGET_USE_PSEUDO_PIC_REG): New.
	(PIC_OFFSET_TABLE_REGNUM): Return INVALID_REGNUM if
	pic_offset_table_rtx exists.
	* doc/tm.texi.in (TARGET_USE_PSEUDO_PIC_REG): New.
	(TARGET_INIT_PIC_REG): New.
	* doc/tm.texi: Regenerated.
	* function.c (assign_parms): Create pseudo PIC register
	if required.
	* init-regs.c (initialize_uninitialized_regs): Don't
	initialize PIC register.
	* ira-color.c (color_pass): ??
	* ira-emit.c (change_loop): ??
	* ira.c (split_live_ranges_for_shrink_wrap): ??
	(ira): Call target hook to initialize PIC register.
	(do_reload): Avoid transformation of pic_offset_table_rtx
	into hard register.
	* lra-assigns.c (spill_for): ??
	(setup_live_pseudos_and_spill_after_risky_transforms): ??
	* lra-constraints.c (contains_symbol_ref_p): New.
	(lra_constraints): Pseudo PIC register means we make risky
	transformations.
	* shrink-wrap.c (try_shrink_wrapping): Support pseudo PIC
	register.
	* target.def (use_pseudo_pic_reg): New.
	(init_pic_reg): New.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6337aa5..a21ae25 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6134,6 +6134,68 @@ ix86_maybe_switch_abi (void)
     reinit_regs ();
 }
 
+/* Return 1 if pseudo register should be created and used to hold
+   GOT address for PIC code.  */
+static bool
+ix86_use_pseudo_pic_reg (void)
+{
+  if ((TARGET_64BIT
+       && (ix86_cmodel == CM_SMALL_PIC
+	   || TARGET_PECOFF))
+      || !flag_pic)
+    return false;
+  return true;
+}
+
+/* Create and initialize PIC register if required.  */
+static void
+ix86_init_pic_reg (void)
+{
+  edge entry_edge;
+  rtx_insn *seq;
+
+  if (!ix86_use_pseudo_pic_reg ())
+    return;
+
+  start_sequence ();
+
+  if (TARGET_64BIT)
+    {
+      if (ix86_cmodel == CM_LARGE_PIC)
+	{
+	  rtx_code_label *label;
+	  rtx tmp_reg;
+
+	  gcc_assert (Pmode == DImode);
+	  label = gen_label_rtx ();
+	  emit_label (label);
+	  LABEL_PRESERVE_P (label) = 1;
+	  tmp_reg = gen_rtx_REG (Pmode, R11_REG);
+	  gcc_assert (REGNO (pic_offset_table_rtx) != REGNO (tmp_reg));
+	  emit_insn (gen_set_rip_rex64 (pic_offset_table_rtx,
+					label));
+	  emit_insn (gen_set_got_offset_rex64 (tmp_reg, label));
+	  emit_insn (ix86_gen_add3 (pic_offset_table_rtx,
+				    pic_offset_table_rtx, tmp_reg));
+	}
+      else
+	emit_insn (gen_set_got_rex64 (pic_offset_table_rtx));
+    }
+  else
+    {
+      rtx insn = emit_insn (gen_set_got (pic_offset_table_rtx));
+      RTX_FRAME_RELATED_P (insn) = 1;
+      add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
+    }
+
+  seq = get_insns ();
+  end_sequence ();
+
+  entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+  insert_insn_on_edge (seq, entry_edge);
+  commit_one_edge_insertion (entry_edge);
+}
+
 /* Initialize a variable CUM of type CUMULATIVE_ARGS
    for a call to a function whose data type is FNTYPE.
    For a library call, FNTYPE is 0.  */
@@ -9376,6 +9438,9 @@ gen_pop (rtx arg)
 static unsigned int
 ix86_select_alt_pic_regnum (void)
 {
+  if (ix86_use_pseudo_pic_reg ())
+    return INVALID_REGNUM;
+
   if (crtl->is_leaf
       && !crtl->profile
       && !ix86_current_function_calls_tls_descriptor)
@@ -9400,6 +9465,7 @@ static bool
 ix86_save_reg (unsigned int regno, bool maybe_eh_return)
 {
   if (pic_offset_table_rtx
+      && !ix86_use_pseudo_pic_reg ()
       && regno == REAL_PIC_OFFSET_TABLE_REGNUM
       && (df_regs_ever_live_p (REAL_PIC_OFFSET_TABLE_REGNUM)
 	  || crtl->profile
@@ -10752,7 +10818,6 @@ ix86_expand_prologue (void)
 {
   struct machine_function *m = cfun->machine;
   rtx insn, t;
-  bool pic_reg_used;
   struct ix86_frame frame;
   HOST_WIDE_INT allocate;
   bool int_registers_saved;
@@ -11199,60 +11264,6 @@ ix86_expand_prologue (void)
   if (!sse_registers_saved)
     ix86_emit_save_sse_regs_using_mov (frame.sse_reg_save_offset);
 
-  pic_reg_used = false;
-  /* We don't use pic-register for pe-coff target.  */
-  if (pic_offset_table_rtx
-      && !TARGET_PECOFF
-      && (df_regs_ever_live_p (REAL_PIC_OFFSET_TABLE_REGNUM)
-	  || crtl->profile))
-    {
-      unsigned int alt_pic_reg_used = ix86_select_alt_pic_regnum ();
-
-      if (alt_pic_reg_used != INVALID_REGNUM)
-	SET_REGNO (pic_offset_table_rtx, alt_pic_reg_used);
-
-      pic_reg_used = true;
-    }
-
-  if (pic_reg_used)
-    {
-      if (TARGET_64BIT)
-	{
-	  if (ix86_cmodel == CM_LARGE_PIC)
-	    {
-	      rtx_code_label *label;
-	      rtx tmp_reg;
-
-	      gcc_assert (Pmode == DImode);
-	      label = gen_label_rtx ();
-	      emit_label (label);
-	      LABEL_PRESERVE_P (label) = 1;
-	      tmp_reg = gen_rtx_REG (Pmode, R11_REG);
-	      gcc_assert (REGNO (pic_offset_table_rtx) != REGNO (tmp_reg));
-	      insn = emit_insn (gen_set_rip_rex64 (pic_offset_table_rtx,
-						   label));
-	      insn = emit_insn (gen_set_got_offset_rex64 (tmp_reg, label));
-	      insn = emit_insn (ix86_gen_add3 (pic_offset_table_rtx,
-					       pic_offset_table_rtx, tmp_reg));
-	    }
-	  else
-            insn = emit_insn (gen_set_got_rex64 (pic_offset_table_rtx));
-	}
-      else
-	{
-          insn = emit_insn (gen_set_got (pic_offset_table_rtx));
-	  RTX_FRAME_RELATED_P (insn) = 1;
-	  add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
-	}
-    }
-
-  /* In the pic_reg_used case, make sure that the got load isn't deleted
-     when mcount needs it.  Blockage to avoid call movement across mcount
-     call is emitted in generic code after the NOTE_INSN_PROLOGUE_END
-     note.  */
-  if (crtl->profile && !flag_fentry && pic_reg_used)
-    emit_insn (gen_prologue_use (pic_offset_table_rtx));
-
   if (crtl->drap_reg && !crtl->stack_realign_needed)
     {
       /* vDRAP is setup but after reload it turns out stack realign
@@ -11793,7 +11804,8 @@ ix86_expand_epilogue (int style)
 static void
 ix86_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED, HOST_WIDE_INT)
 {
-  if (pic_offset_table_rtx)
+  if (pic_offset_table_rtx
+      && !ix86_use_pseudo_pic_reg ())
     SET_REGNO (pic_offset_table_rtx, REAL_PIC_OFFSET_TABLE_REGNUM);
 #if TARGET_MACHO
   /* Mach-O doesn't support labels at the end of objects, so if
@@ -13113,6 +13125,15 @@ ix86_GOT_alias_set (void)
   return set;
 }
 
+/* Set regs_ever_live for PIC base address register
+   to true if required.  */
+static void
+set_pic_reg_ever_alive ()
+{
+  if (reload_in_progress)
+    df_set_regs_ever_live (REGNO (pic_offset_table_rtx), true);
+}
+
 /* Return a legitimate reference for ORIG (an address) using the
    register REG.  If REG is 0, a new pseudo is generated.
 
@@ -13163,8 +13184,7 @@ legitimize_pic_address (rtx orig, rtx reg)
       /* This symbol may be referenced via a displacement from the PIC
 	 base address (@GOTOFF).  */
 
-      if (reload_in_progress)
-	df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+      set_pic_reg_ever_alive ();
       if (GET_CODE (addr) == CONST)
 	addr = XEXP (addr, 0);
       if (GET_CODE (addr) == PLUS)
@@ -13196,8 +13216,7 @@ legitimize_pic_address (rtx orig, rtx reg)
       /* This symbol may be referenced via a displacement from the PIC
 	 base address (@GOTOFF).  */
 
-      if (reload_in_progress)
-	df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+      set_pic_reg_ever_alive ();
       if (GET_CODE (addr) == CONST)
 	addr = XEXP (addr, 0);
       if (GET_CODE (addr) == PLUS)
@@ -13258,8 +13277,7 @@ legitimize_pic_address (rtx orig, rtx reg)
 	  /* This symbol must be referenced via a load from the
 	     Global Offset Table (@GOT).  */
 
-	  if (reload_in_progress)
-	    df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+	  set_pic_reg_ever_alive ();
 	  new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), UNSPEC_GOT);
 	  new_rtx = gen_rtx_CONST (Pmode, new_rtx);
 	  if (TARGET_64BIT)
@@ -13311,8 +13329,7 @@ legitimize_pic_address (rtx orig, rtx reg)
 	    {
 	      if (!TARGET_64BIT)
 		{
-		  if (reload_in_progress)
-		    df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+		  set_pic_reg_ever_alive ();
 		  new_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op0),
 					    UNSPEC_GOTOFF);
 		  new_rtx = gen_rtx_PLUS (Pmode, new_rtx, op1);
@@ -13608,8 +13625,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	}
       else if (flag_pic)
 	{
-	  if (reload_in_progress)
-	    df_set_regs_ever_live (PIC_OFFSET_TABLE_REGNUM, true);
+	  set_pic_reg_ever_alive ();
 	  pic = pic_offset_table_rtx;
 	  type = TARGET_ANY_GNU_TLS ? UNSPEC_GOTNTPOFF : UNSPEC_GOTTPOFF;
 	}
@@ -14240,6 +14256,8 @@ ix86_pic_register_p (rtx x)
   if (GET_CODE (x) == VALUE && CSELIB_VAL_PTR (x))
     return (pic_offset_table_rtx
 	    && rtx_equal_for_cselib_p (x, pic_offset_table_rtx));
+  else if (pic_offset_table_rtx)
+    return REG_P (x) && REGNO (x) == REGNO (pic_offset_table_rtx);
   else
     return REG_P (x) && REGNO (x) == PIC_OFFSET_TABLE_REGNUM;
 }
@@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x)
 	 ...
 	 movl foo@GOTOFF(%ecx), %edx
 	 in which case we return (%ecx - %ebx) + foo.  */
-      if (pic_offset_table_rtx)
+      if (pic_offset_table_rtx
+	  && (!reload_completed || !ix86_use_pseudo_pic_reg ()))
         result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend),
 						     pic_offset_table_rtx),
 			       result);
@@ -24891,7 +24910,12 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
 		  && DEFAULT_ABI != MS_ABI))
 	  && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
 	  && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
-	use_reg (&use, pic_offset_table_rtx);
+	{
+	  use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
+	  if (ix86_use_pseudo_pic_reg ())
+	    emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM),
+			    pic_offset_table_rtx);
+	}
     }
 
   if (TARGET_64BIT && INTVAL (callarg2) >= 0)
@@ -47300,6 +47324,10 @@ ix86_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 #define TARGET_FUNCTION_ARG_ADVANCE ix86_function_arg_advance
 #undef TARGET_FUNCTION_ARG
 #define TARGET_FUNCTION_ARG ix86_function_arg
+#undef TARGET_INIT_PIC_REG
+#define TARGET_INIT_PIC_REG ix86_init_pic_reg
+#undef TARGET_USE_PSEUDO_PIC_REG
+#define TARGET_USE_PSEUDO_PIC_REG ix86_use_pseudo_pic_reg
 #undef TARGET_FUNCTION_ARG_BOUNDARY
 #define TARGET_FUNCTION_ARG_BOUNDARY ix86_function_arg_boundary
 #undef TARGET_PASS_BY_REFERENCE
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 2c64162..a1be45e 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1243,11 +1243,11 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define REAL_PIC_OFFSET_TABLE_REGNUM  BX_REG
 
-#define PIC_OFFSET_TABLE_REGNUM				\
-  ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC	\
-                     || TARGET_PECOFF))		\
-   || !flag_pic ? INVALID_REGNUM			\
-   : reload_completed ? REGNO (pic_offset_table_rtx)	\
+#define PIC_OFFSET_TABLE_REGNUM						\
+  ((TARGET_64BIT && (ix86_cmodel == CM_SMALL_PIC			\
+                     || TARGET_PECOFF))					\
+   || !flag_pic ? INVALID_REGNUM					\
+   : pic_offset_table_rtx ? INVALID_REGNUM				\
    : REAL_PIC_OFFSET_TABLE_REGNUM)
 
 #define GOT_SYMBOL_NAME "_GLOBAL_OFFSET_TABLE_"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 396909f..0dd9b79 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3909,6 +3909,16 @@ If @code{TARGET_FUNCTION_INCOMING_ARG} is not defined,
 @code{TARGET_FUNCTION_ARG} serves both purposes.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_USE_PSEUDO_PIC_REG (void)
+This hook should return 1 in case pseudo register should be created
+for pic_offset_table_rtx during function expand.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_INIT_PIC_REG (void)
+Perform a target dependent initialization of pic_offset_table_rtx.
+This hook is called at the start of register allocation.
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_ARG_PARTIAL_BYTES (cumulative_args_t @var{cum}, enum machine_mode @var{mode}, tree @var{type}, bool @var{named})
 This target hook returns the number of bytes at the beginning of an
 argument that must be put in registers.  The value must be zero for
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 798c1aa..d6ee52a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3355,6 +3355,10 @@ the stack.
 
 @hook TARGET_FUNCTION_INCOMING_ARG
 
+@hook TARGET_USE_PSEUDO_PIC_REG
+
+@hook TARGET_INIT_PIC_REG
+
 @hook TARGET_ARG_PARTIAL_BYTES
 
 @hook TARGET_PASS_BY_REFERENCE
diff --git a/gcc/function.c b/gcc/function.c
index ac50f4a..cd7e42e 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -3459,6 +3459,11 @@ assign_parms (tree fndecl)
 
   fnargs.release ();
 
+  /* Initialize pic_offset_table_rtx with a pseudo register
+     if required.  */
+  if (targetm.use_pseudo_pic_reg ())
+    pic_offset_table_rtx = gen_reg_rtx (Pmode);
+
   /* Output all parameter conversion instructions (possibly including calls)
      now that all parameters have been copied out of hard registers.  */
   emit_insn (all.first_conversion_insn);
diff --git a/gcc/init-regs.c b/gcc/init-regs.c
index 91b123d..bf83e51 100644
--- a/gcc/init-regs.c
+++ b/gcc/init-regs.c
@@ -80,6 +80,11 @@ initialize_uninitialized_regs (void)
 	      if (regno < FIRST_PSEUDO_REGISTER)
 		continue;
 
+	      /* Ignore pseudo PIC register.  */
+	      if (pic_offset_table_rtx
+		  && regno == REGNO (pic_offset_table_rtx))
+		continue;
+
 	      /* Do not generate multiple moves for the same regno.
 		 This is common for sequences of subreg operations.
 		 They would be deleted during combine but there is no
diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 6846567..26b8ffe 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -3239,9 +3239,11 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
 	  ira_assert (ALLOCNO_CLASS (subloop_allocno) == rclass);
 	  ira_assert (bitmap_bit_p (subloop_node->all_allocnos,
 				    ALLOCNO_NUM (subloop_allocno)));
-	  if ((flag_ira_region == IRA_REGION_MIXED)
-	      && (loop_tree_node->reg_pressure[pclass]
-		  <= ira_class_hard_regs_num[pclass]))
+	  if ((flag_ira_region == IRA_REGION_MIXED
+	       && (loop_tree_node->reg_pressure[pclass]
+		   <= ira_class_hard_regs_num[pclass]))
+	      || (pic_offset_table_rtx != NULL
+		  && regno == (int) REGNO (pic_offset_table_rtx)))
 	    {
 	      if (! ALLOCNO_ASSIGNED_P (subloop_allocno))
 		{
diff --git a/gcc/ira-emit.c b/gcc/ira-emit.c
index a3bf41e..676ee1a 100644
--- a/gcc/ira-emit.c
+++ b/gcc/ira-emit.c
@@ -620,7 +620,10 @@ change_loop (ira_loop_tree_node_t node)
 		  /* don't create copies because reload can spill an
 		     allocno set by copy although the allocno will not
 		     get memory slot.  */
-		  || ira_equiv_no_lvalue_p (regno)))
+		  || ira_equiv_no_lvalue_p (regno)
+		  || (pic_offset_table_rtx != NULL
+		      && (ALLOCNO_REGNO (allocno)
+			  == (int) REGNO (pic_offset_table_rtx)))))
 	    continue;
 	  original_reg = allocno_emit_reg (allocno);
 	  if (parent_allocno == NULL
diff --git a/gcc/ira.c b/gcc/ira.c
index f377f7d..ae83aa5 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -4887,7 +4887,7 @@ split_live_ranges_for_shrink_wrap (void)
   FOR_BB_INSNS (first, insn)
     {
       rtx dest = interesting_dest_for_shprep (insn, call_dom);
-      if (!dest)
+      if (!dest || dest == pic_offset_table_rtx)
 	continue;
 
       rtx newreg = NULL_RTX;
@@ -5039,6 +5039,9 @@ ira (FILE *f)
   bool saved_flag_caller_saves = flag_caller_saves;
   enum ira_region saved_flag_ira_region = flag_ira_region;
 
+  /* Perform target specific PIC register initialization.  */
+  targetm.init_pic_reg ();
+
   ira_conflicts_p = optimize > 0;
 
   ira_use_lra_p = targetm.lra_p ();
@@ -5290,10 +5293,18 @@ do_reload (void)
 {
   basic_block bb;
   bool need_dce;
+  unsigned pic_offset_table_regno = INVALID_REGNUM;
 
   if (flag_ira_verbose < 10)
     ira_dump_file = dump_file;
 
+  /* If pic_offset_table_rtx is a pseudo register, then keep it so
+     after reload to avoid possible wrong usages of hard reg assigned
+     to it.  */
+  if (pic_offset_table_rtx
+      && REGNO (pic_offset_table_rtx) >= FIRST_PSEUDO_REGISTER)
+    pic_offset_table_regno = REGNO (pic_offset_table_rtx);
+
   timevar_push (TV_RELOAD);
   if (ira_use_lra_p)
     {
@@ -5398,6 +5409,9 @@ do_reload (void)
       inform (DECL_SOURCE_LOCATION (decl), "for %qD", decl);
     }
 
+  if (pic_offset_table_regno != INVALID_REGNUM)
+    pic_offset_table_rtx = gen_rtx_REG (Pmode, pic_offset_table_regno);
+
   timevar_pop (TV_IRA);
 }
 \f
diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index c7164cd..99ae00d 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -879,11 +879,13 @@ spill_for (int regno, bitmap spilled_pseudo_bitmap, bool first_p)
 	}
       /* Spill pseudos.	 */
       EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
-	if ((int) spill_regno >= lra_constraint_new_regno_start
-	    && ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
-	    && ! bitmap_bit_p (&lra_split_regs, spill_regno)
-	    && ! bitmap_bit_p (&lra_subreg_reload_pseudos, spill_regno)
-	    && ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno))
+	if ((pic_offset_table_rtx != NULL
+	     && spill_regno == REGNO (pic_offset_table_rtx))
+	    || ((int) spill_regno >= lra_constraint_new_regno_start
+		&& ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
+		&& ! bitmap_bit_p (&lra_split_regs, spill_regno)
+		&& ! bitmap_bit_p (&lra_subreg_reload_pseudos, spill_regno)
+		&& ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno)))
 	  goto fail;
       insn_pseudos_num = 0;
       if (lra_dump_file != NULL)
@@ -1053,9 +1055,15 @@ setup_live_pseudos_and_spill_after_risky_transforms (bitmap
       return;
     }
   for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
-    if (reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
+    if ((pic_offset_table_rtx == NULL_RTX
+	 || i != (int) REGNO (pic_offset_table_rtx))
+	&& reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
       sorted_pseudos[n++] = i;
   qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func);
+  if (pic_offset_table_rtx != NULL_RTX
+      && (regno = REGNO (pic_offset_table_rtx)) >= FIRST_PSEUDO_REGISTER
+      && reg_renumber[regno] >= 0 && lra_reg_info[regno].nrefs > 0)
+    sorted_pseudos[n++] = regno;
   for (i = n - 1; i >= 0; i--)
     {
       regno = sorted_pseudos[i];
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 5f68399..977e1db 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -3798,6 +3798,35 @@ contains_reg_p (rtx x, bool hard_reg_p, bool spilled_p)
   return false;
 }
 
+/* Return true if X contains a symbol ref.  */
+static bool
+contains_symbol_ref_p (rtx x)
+{
+  int i, j;
+  const char *fmt;
+  enum rtx_code code;
+
+  code = GET_CODE (x);
+  if (code == SYMBOL_REF)
+    return true;
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      if (fmt[i] == 'e')
+	{
+	  if (contains_symbol_ref_p (XEXP (x, i)))
+	    return true;
+	}
+      else if (fmt[i] == 'E')
+	{
+	  for (j = XVECLEN (x, i) - 1; j >= 0; j--)
+	    if (contains_symbol_ref_p (XVECEXP (x, i, j)))
+	      return true;
+	}
+    }
+  return false;
+}
+
 /* Process all regs in location *LOC and change them on equivalent
    substitution.  Return true if any change was done.  */
 static bool
@@ -4020,7 +4049,11 @@ lra_constraints (bool first_p)
       ("Maximum number of LRA constraint passes is achieved (%d)\n",
        LRA_MAX_CONSTRAINT_ITERATION_NUMBER);
   changed_p = false;
-  lra_risky_transformations_p = false;
+  if (pic_offset_table_rtx
+      && REGNO (pic_offset_table_rtx) >= FIRST_PSEUDO_REGISTER)
+    lra_risky_transformations_p = true;
+  else
+    lra_risky_transformations_p = false;
   new_insn_uid_start = get_max_uid ();
   new_regno_start = first_p ? lra_constraint_new_regno_start : max_reg_num ();
   /* Mark used hard regs for target stack size calulations.  */
@@ -4088,7 +4121,12 @@ lra_constraints (bool first_p)
 		   paradoxical subregs.  */
 		|| (MEM_P (x)
 		    && (GET_MODE_SIZE (lra_reg_info[i].biggest_mode)
-			> GET_MODE_SIZE (GET_MODE (x)))))
+			> GET_MODE_SIZE (GET_MODE (x))))
+		|| (pic_offset_table_rtx
+		    && ((CONST_POOL_OK_P (PSEUDO_REGNO_MODE (i), x)
+			 && (targetm.preferred_reload_class
+			     (x, lra_get_allocno_class (i)) == NO_REGS))
+			|| contains_symbol_ref_p (x))))
 	      ira_reg_equiv[i].defined_p = false;
 	    if (contains_reg_p (x, false, true))
 	      ira_reg_equiv[i].profitable_p = false;
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index fd24135..e1ecff7 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -495,7 +495,8 @@ try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge,
       if (frame_pointer_needed)
 	add_to_hard_reg_set (&set_up_by_prologue.set, Pmode,
 			     HARD_FRAME_POINTER_REGNUM);
-      if (pic_offset_table_rtx)
+      if (pic_offset_table_rtx
+	  && PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
 	add_to_hard_reg_set (&set_up_by_prologue.set, Pmode,
 			     PIC_OFFSET_TABLE_REGNUM);
       if (crtl->drap_reg)
diff --git a/gcc/target.def b/gcc/target.def
index ce11eae..4d90fc2 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4274,6 +4274,20 @@ DEFHOOK
 
 HOOK_VECTOR_END (calls)
 
+DEFHOOK
+(use_pseudo_pic_reg,
+ "This hook should return 1 in case pseudo register should be created\n\
+for pic_offset_table_rtx during function expand.",
+ bool, (void),
+ hook_bool_void_false)
+
+DEFHOOK
+(init_pic_reg,
+ "Perform a target dependent initialization of pic_offset_table_rtx.\n\
+This hook is called at the start of register allocation.",
+ void, (void),
+ hook_void_void)
+
 /* Return the diagnostic message string if conversion from FROMTYPE
    to TOTYPE is not allowed, NULL otherwise.  */
 DEFHOOK

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-09-23 13:54                             ` Ilya Enkovich
@ 2014-09-23 14:23                               ` Uros Bizjak
  2014-09-23 15:59                                 ` Jeff Law
  2014-09-23 14:34                               ` Jakub Jelinek
  1 sibling, 1 reply; 49+ messages in thread
From: Uros Bizjak @ 2014-09-23 14:23 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: Vladimir Makarov, gcc, gcc-patches, Evgeny Stupachenko,
	Richard Biener, Jeff Law

On Tue, Sep 23, 2014 at 3:54 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:

> Here is a patch which combines results of my and Vladimir's work on EBX enabling.
>
> It works OK for SPEC2000 and SPEC2006 with -Ofast + LTO.  It passes bootstrap but there are a few new failures in make check.
>
> gcc.target/i386/pic-1.c fails because it doesn't expect that we can use EBX in 32bit PIC mode
> gcc.target/i386/pr55458.c fails for the same reason
> gcc.target/i386/pr23098.c fails because the compiler fails to use a float constant as an immediate and loads it from the GOT instead
>
> Do we have a final decision about having a compiler flag to control enabling of the pseudo PIC register?  I think we should keep the possibility of using fixed EBX at least until we make sure the pseudo PIC register doesn't harm debug info generation. If we have such an option then gcc.target/i386/pic-1.c and gcc.target/i386/pr55458.c should be modified; otherwise these tests should be removed.

I think having this flag would be dangerous. In effect, this flag
would be a hidden -ffixed-bx, with unwanted consequences for asm code
that handles ebx. As an example, please see config/i386/cpuid.h - ATM,
we handle ebx in a special way when __PIC__ is defined. With your
patch, we would have to handle it in a special way when the new flag is
in effect, which is impossible unless another compiler-generated define
is emitted.
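
For readers unfamiliar with that special handling, here is a minimal sketch of the idea, assuming GCC-style inline asm. toy_cpuid is a hypothetical helper written for illustration, not the actual macro from config/i386/cpuid.h: when ebx may hold the GOT base (32-bit PIC), it swaps ebx with a scratch register around cpuid instead of naming ebx as an output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the real cpuid.h macro): run CPUID for LEAF
   and store eax, ebx, ecx, edx into REGS[0..3].  */
static inline void
toy_cpuid (uint32_t leaf, uint32_t regs[4])
{
  assert (regs != NULL);
#if defined (__i386__) && defined (__PIC__)
  /* ebx may be reserved for the GOT base, so it is not named as an
     output.  Swap it with a scratch register around the instruction;
     the second xchg restores ebx and leaves the result in %1.  */
  __asm__ volatile ("xchgl %%ebx, %1\n\t"
		    "cpuid\n\t"
		    "xchgl %%ebx, %1"
		    : "=a" (regs[0]), "=&r" (regs[1]),
		      "=c" (regs[2]), "=d" (regs[3])
		    : "0" (leaf), "2" (0));
#elif defined (__i386__) || defined (__x86_64__)
  /* ebx is an ordinary allocatable register here.  */
  __asm__ volatile ("cpuid"
		    : "=a" (regs[0]), "=b" (regs[1]),
		      "=c" (regs[2]), "=d" (regs[3])
		    : "0" (leaf), "2" (0));
#else
  /* Not an x86 target: report zeros so the sketch still compiles.  */
  (void) leaf;
  regs[0] = regs[1] = regs[2] = regs[3] = 0;
#endif
}
```

The choice between the two asm variants keys off __PIC__ alone, which is why a separate option that re-fixes ebx would need its own predefined macro for code like this to stay correct.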

So, I vote to change PIC reg to a pseudo unconditionally and adjust
testsuite for all (expected) fall-out.

Uros.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-09-23 13:54                             ` Ilya Enkovich
  2014-09-23 14:23                               ` Uros Bizjak
@ 2014-09-23 14:34                               ` Jakub Jelinek
  2014-09-23 15:59                                 ` Petr Machata
  2014-09-23 16:00                                 ` Jeff Law
  1 sibling, 2 replies; 49+ messages in thread
From: Jakub Jelinek @ 2014-09-23 14:34 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: Vladimir Makarov, gcc, gcc-patches, Evgeny Stupachenko,
	Richard Biener, Uros Bizjak, Jeff Law, Petr Machata

On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:
> use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
> info generation.  If we have such option then gcc.target/i386/pic-1.c and

For debug info, it seems you are already handling this in
delegitimize_address target hook, I'd suggest just building some very large
shared library at -O2 -g -fpic on i?86 and either look at the
sizes of .debug_info/.debug_loc sections with/without the patch,
or use the locstat utility from elfutils (talk to Petr Machata if needed).

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-09-23 14:23                               ` Uros Bizjak
@ 2014-09-23 15:59                                 ` Jeff Law
  0 siblings, 0 replies; 49+ messages in thread
From: Jeff Law @ 2014-09-23 15:59 UTC (permalink / raw)
  To: Uros Bizjak, Ilya Enkovich
  Cc: Vladimir Makarov, gcc, gcc-patches, Evgeny Stupachenko, Richard Biener

On 09/23/14 08:23, Uros Bizjak wrote:
> On Tue, Sep 23, 2014 at 3:54 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>
>> Here is a patch which combines results of my and Vladimir's work on EBX enabling.
>>
>> It works OK for SPEC2000 and SPEC2006 with -Ofast + LTO.  It passes bootstrap but there are a few new failures in make check.
>>
>> gcc.target/i386/pic-1.c fails because it doesn't expect that we can use EBX in 32bit PIC mode
>> gcc.target/i386/pr55458.c fails for the same reason
>> gcc.target/i386/pr23098.c fails because the compiler fails to use a float constant as an immediate and loads it from the GOT instead
>>
>> Do we have a final decision about having a compiler flag to control enabling of the pseudo PIC register?  I think we should keep the possibility of using fixed EBX at least until we make sure the pseudo PIC register doesn't harm debug info generation. If we have such an option then gcc.target/i386/pic-1.c and gcc.target/i386/pr55458.c should be modified; otherwise these tests should be removed.
>
> I think having this flag would be dangerous. In effect, this flag
> would be a hidden -ffixed-bx, with unwanted consequences for asm code
> that handles ebx. As an example, please see config/i386/cpuid.h - ATM,
> we handle ebx in a special way when __PIC__ is defined. With your
> patch, we would have to handle it in a special way when the new flag is
> in effect, which is impossible unless another compiler-generated define
> is emitted.
>
> So, I vote to change PIC reg to a pseudo unconditionally and adjust
> testsuite for all (expected) fall-out.
Agreed.  Continuing to support both modes just seems like a maintenance
nightmare and asking for problems at some point.  If there are performance
regressions, we just tackle them :-)

I suspect any performance regressions we find are going to point us at 
issues in IRA/LRA that we would want to look at anyway.

jeff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-09-23 14:34                               ` Jakub Jelinek
@ 2014-09-23 15:59                                 ` Petr Machata
  2014-09-23 16:00                                 ` Jeff Law
  1 sibling, 0 replies; 49+ messages in thread
From: Petr Machata @ 2014-09-23 15:59 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Ilya Enkovich, Vladimir Makarov, gcc, gcc-patches,
	Evgeny Stupachenko, Richard Biener, Uros Bizjak, Jeff Law

Jakub Jelinek <jakub@redhat.com> writes:

> look at the sizes of .debug_info/.debug_loc sections with/without the
> patch, or use the locstat utility from elfutils

Not actually part of elfutils, but available either here:
        https://github.com/pmachata/dwlocstat

... or packaged in Fedora.

Thanks,
PM

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-09-23 14:34                               ` Jakub Jelinek
  2014-09-23 15:59                                 ` Petr Machata
@ 2014-09-23 16:00                                 ` Jeff Law
  2014-09-23 16:03                                   ` Jakub Jelinek
  1 sibling, 1 reply; 49+ messages in thread
From: Jeff Law @ 2014-09-23 16:00 UTC (permalink / raw)
  To: Jakub Jelinek, Ilya Enkovich
  Cc: Vladimir Makarov, gcc, gcc-patches, Evgeny Stupachenko,
	Richard Biener, Uros Bizjak, Petr Machata

On 09/23/14 08:34, Jakub Jelinek wrote:
> On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:
>> use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
>> info generation.  If we have such option then gcc.target/i386/pic-1.c and
>
> For debug info, it seems you are already handling this in
> delegitimize_address target hook, I'd suggest just building some very large
> shared library at -O2 -g -fpic on i?86 and either look at the
> sizes of .debug_info/.debug_loc sections with/without the patch,
> or use the locstat utility from elfutils (talk to Petr Machata if needed).
Can't hurt, but I really don't see how changing from a fixed to an 
allocatable register is going to muck up debug info in any significant 
way.

jeff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-09-23 16:00                                 ` Jeff Law
@ 2014-09-23 16:03                                   ` Jakub Jelinek
  2014-09-23 16:10                                     ` Jeff Law
  0 siblings, 1 reply; 49+ messages in thread
From: Jakub Jelinek @ 2014-09-23 16:03 UTC (permalink / raw)
  To: Jeff Law
  Cc: Ilya Enkovich, Vladimir Makarov, gcc, gcc-patches,
	Evgeny Stupachenko, Richard Biener, Uros Bizjak, Petr Machata

On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:
> On 09/23/14 08:34, Jakub Jelinek wrote:
> >On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:
> >>use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
> >>info generation.  If we have such option then gcc.target/i386/pic-1.c and
> >
> >For debug info, it seems you are already handling this in
> >delegitimize_address target hook, I'd suggest just building some very large
> >shared library at -O2 -g -fpic on i?86 and either look at the
> >sizes of .debug_info/.debug_loc sections with/without the patch,
> >or use the locstat utility from elfutils (talk to Petr Machata if needed).
> Can't hurt, but I really don't see how changing from a fixed to an
> allocatable register is going to muck up debug info in any significant way.

What matters is if the delegitimize_address target hook is as efficient in
delegitimization as before.  E.g. if it previously matched only when seeing
%ebx + gotoff or similar, and wouldn't match anything now, some vars could
have debug locations including UNSPEC and be dropped on the floor.
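
A toy model of the failure mode described above, as a sketch only. The toy_rtx type and toy_delegitimize are hypothetical stand-ins for GCC's RTL and the real ix86_delegitimize_address; they only mimic the shape of the check. The GOTOFF wrapper is stripped when the base register is recognised as the PIC register, and otherwise the UNSPEC survives into the location expression.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified expression nodes.  */
enum toy_kind { TOY_REG, TOY_SYMBOL, TOY_UNSPEC_GOTOFF, TOY_PLUS };

struct toy_rtx
{
  enum toy_kind kind;
  int regno;			/* Used by TOY_REG.  */
  const char *name;		/* Used by TOY_SYMBOL.  */
  const struct toy_rtx *op0;	/* TOY_PLUS and TOY_UNSPEC_GOTOFF.  */
  const struct toy_rtx *op1;	/* TOY_PLUS only.  */
};

/* If X is (plus (reg PIC_REGNO) (unspec_gotoff SYM)), return SYM.
   Otherwise return X unchanged, UNSPEC included.  */
static const struct toy_rtx *
toy_delegitimize (const struct toy_rtx *x, int pic_regno)
{
  if (x->kind == TOY_PLUS
      && x->op0->kind == TOY_REG
      && x->op0->regno == pic_regno
      && x->op1->kind == TOY_UNSPEC_GOTOFF)
    return x->op1->op0;
  return x;
}

/* Return true if X still contains a GOTOFF wrapper, i.e. something a
   debug-info consumer could not describe.  */
static bool
toy_contains_unspec (const struct toy_rtx *x)
{
  if (x == NULL)
    return false;
  if (x->kind == TOY_UNSPEC_GOTOFF)
    return true;
  return toy_contains_unspec (x->op0) || toy_contains_unspec (x->op1);
}
```

When the caller passes the register that really holds the GOT base, the bare symbol comes back. If the hook no longer knows which register that is, the same address is returned unchanged with the UNSPEC still inside, which is the case where a variable's location would be dropped.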

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-09-23 16:03                                   ` Jakub Jelinek
@ 2014-09-23 16:10                                     ` Jeff Law
  2014-09-24  6:56                                       ` Ilya Enkovich
  0 siblings, 1 reply; 49+ messages in thread
From: Jeff Law @ 2014-09-23 16:10 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Ilya Enkovich, Vladimir Makarov, gcc, gcc-patches,
	Evgeny Stupachenko, Richard Biener, Uros Bizjak, Petr Machata

On 09/23/14 10:03, Jakub Jelinek wrote:
> On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:
>> On 09/23/14 08:34, Jakub Jelinek wrote:
>>> On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:
>>>> use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
>>>> info generation.  If we have such option then gcc.target/i386/pic-1.c and
>>>
>>> For debug info, it seems you are already handling this in
>>> delegitimize_address target hook, I'd suggest just building some very large
>>> shared library at -O2 -g -fpic on i?86 and either look at the
>>> sizes of .debug_info/.debug_loc sections with/without the patch,
>>> or use the locstat utility from elfutils (talk to Petr Machata if needed).
>> Can't hurt, but I really don't see how changing from a fixed to an
>> allocatable register is going to muck up debug info in any significant way.
>
> What matters is if the delegitimize_address target hook is as efficient in
> delegitimization as before.  E.g. if it previously matched only when seeing
> %ebx + gotoff or similar, and wouldn't match anything now, some vars could
> have debug locations including UNSPEC and be dropped on the floor.
Ah, yea, that makes sense.

jeff

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Enable EBX for x86 in 32bits PIC code
  2014-09-23 16:10                                     ` Jeff Law
@ 2014-09-24  6:56                                       ` Ilya Enkovich
  2014-09-24 15:27                                         ` Jeff Law
  0 siblings, 1 reply; 49+ messages in thread
From: Ilya Enkovich @ 2014-09-24  6:56 UTC (permalink / raw)
  To: Jeff Law
  Cc: Jakub Jelinek, Vladimir Makarov, gcc, gcc-patches,
	Evgeny Stupachenko, Richard Biener, Uros Bizjak, Petr Machata

2014-09-23 20:10 GMT+04:00 Jeff Law <law@redhat.com>:
> On 09/23/14 10:03, Jakub Jelinek wrote:
>>
>> On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:
>>>
>>> On 09/23/14 08:34, Jakub Jelinek wrote:
>>>>
>>>> On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:
>>>>>
>>>>> use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
>>>>> info generation.  If we have such option then gcc.target/i386/pic-1.c
>>>>> and
>>>>
>>>>
>>>> For debug info, it seems you are already handling this in
>>>> delegitimize_address target hook, I'd suggest just building some very
>>>> large
>>>> shared library at -O2 -g -fpic on i?86 and either look at the
>>>> sizes of .debug_info/.debug_loc sections with/without the patch,
>>>> or use the locstat utility from elfutils (talk to Petr Machata if
>>>> needed).
>>>
>>> Can't hurt, but I really don't see how changing from a fixed to an
>>> allocatable register is going to muck up debug info in any significant
>>> way.
>>
>>
>> What matters is if the delegitimize_address target hook is as efficient in
>> delegitimization as before.  E.g. if it previously matched only when
>> seeing
>> %ebx + gotoff or similar, and wouldn't match anything now, some vars could
>> have debug locations including UNSPEC and be dropped on the floor.
>
> Ah, yea, that makes sense.
>
> jeff


After register allocation we no longer know where the GOT address is,
so the delegitimize_address target hook becomes less effective and
cannot remove UNSPECs. This is what I see now when building GCC with
the patch applied:

../../../../gcc/libgfortran/generated/sum_r4.c: In function 'msum_r4':
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
 msum_r4 (gfc_array_r4 * const restrict retarray,
 ^
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r8.c: In function 'msum_r8':
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
 msum_r8 (gfc_array_r8 * const restrict retarray,
 ^
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location


Ilya


* Re: Enable EBX for x86 in 32bits PIC code
  2014-09-24  6:56                                       ` Ilya Enkovich
@ 2014-09-24 15:27                                         ` Jeff Law
  2014-09-24 20:32                                           ` Ilya Enkovich
  0 siblings, 1 reply; 49+ messages in thread
From: Jeff Law @ 2014-09-24 15:27 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: Jakub Jelinek, Vladimir Makarov, gcc, gcc-patches,
	Evgeny Stupachenko, Richard Biener, Uros Bizjak, Petr Machata

On 09/24/14 00:56, Ilya Enkovich wrote:
> 2014-09-23 20:10 GMT+04:00 Jeff Law <law@redhat.com>:
>> On 09/23/14 10:03, Jakub Jelinek wrote:
>>>
>>> On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:
>>>>
>>>> On 09/23/14 08:34, Jakub Jelinek wrote:
>>>>>
>>>>> On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:
>>>>>>
>>>>>> use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
>>>>>> info generation.  If we have such option then gcc.target/i386/pic-1.c
>>>>>> and
>>>>>
>>>>>
>>>>> For debug info, it seems you are already handling this in
>>>>> delegitimize_address target hook, I'd suggest just building some very
>>>>> large
>>>>> shared library at -O2 -g -fpic on i?86 and either look at the
>>>>> sizes of .debug_info/.debug_loc sections with/without the patch,
>>>>> or use the locstat utility from elfutils (talk to Petr Machata if
>>>>> needed).
>>>>
>>>> Can't hurt, but I really don't see how changing from a fixed to an
>>>> allocatable register is going to muck up debug info in any significant
>>>> way.
>>>
>>>
>>> What matters is if the delegitimize_address target hook is as efficient in
>>> delegitimization as before.  E.g. if it previously matched only when
>>> seeing
>>> %ebx + gotoff or similar, and wouldn't match anything now, some vars could
>>> have debug locations including UNSPEC and be dropped on the floor.
>>
>> Ah, yea, that makes sense.
>>
>> jeff
>
>
> After register allocation we have no idea where GOT address is and
> therefore delegitimize_address target hook becomes less efficient and
> cannot remove UNSPECs. That's what I see now when build GCC with patch
> applied:
In theory this shouldn't be too hard to fix.

I haven't looked at the code, but it might be something looking 
explicitly for ebx by register #, or something similar.  Which case 
within delegitimize_address isn't firing as it should after your changes?

jeff


* Re: Enable EBX for x86 in 32bits PIC code
  2014-09-24 15:27                                         ` Jeff Law
@ 2014-09-24 20:32                                           ` Ilya Enkovich
  2014-09-24 21:20                                             ` Jeff Law
  0 siblings, 1 reply; 49+ messages in thread
From: Ilya Enkovich @ 2014-09-24 20:32 UTC (permalink / raw)
  To: Jeff Law
  Cc: Jakub Jelinek, Vladimir Makarov, gcc, gcc-patches,
	Evgeny Stupachenko, Richard Biener, Uros Bizjak, Petr Machata

2014-09-24 19:27 GMT+04:00 Jeff Law <law@redhat.com>:
> On 09/24/14 00:56, Ilya Enkovich wrote:
>>
>> 2014-09-23 20:10 GMT+04:00 Jeff Law <law@redhat.com>:
>>>
>>> On 09/23/14 10:03, Jakub Jelinek wrote:
>>>>
>>>>
>>>> On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:
>>>>>
>>>>>
>>>>> On 09/23/14 08:34, Jakub Jelinek wrote:
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:
>>>>>>>
>>>>>>>
>>>>>>> use fixed EBX at least until we make sure pseudo PIC doesn't harm
>>>>>>> debug
>>>>>>> info generation.  If we have such option then gcc.target/i386/pic-1.c
>>>>>>> and
>>>>>>
>>>>>>
>>>>>>
>>>>>> For debug info, it seems you are already handling this in
>>>>>> delegitimize_address target hook, I'd suggest just building some very
>>>>>> large
>>>>>> shared library at -O2 -g -fpic on i?86 and either look at the
>>>>>> sizes of .debug_info/.debug_loc sections with/without the patch,
>>>>>> or use the locstat utility from elfutils (talk to Petr Machata if
>>>>>> needed).
>>>>>
>>>>>
>>>>> Can't hurt, but I really don't see how changing from a fixed to an
>>>>> allocatable register is going to muck up debug info in any significant
>>>>> way.
>>>>
>>>>
>>>>
>>>> What matters is if the delegitimize_address target hook is as efficient
>>>> in
>>>> delegitimization as before.  E.g. if it previously matched only when
>>>> seeing
>>>> %ebx + gotoff or similar, and wouldn't match anything now, some vars
>>>> could
>>>> have debug locations including UNSPEC and be dropped on the floor.
>>>
>>>
>>> Ah, yea, that makes sense.
>>>
>>> jeff
>>
>>
>>
>> After register allocation we have no idea where GOT address is and
>> therefore delegitimize_address target hook becomes less efficient and
>> cannot remove UNSPECs. That's what I see now when build GCC with patch
>> applied:
>
> In theory this shouldn't be too hard to fix.
>
> I haven't looked at the code, but it might be something looking explicitly
> for ebx by register #, or something similar.  Which case within
> delegitimize_address isn't firing as it should after your changes?

It is the case I had to fix:

@@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x)
         ...
         movl foo@GOTOFF(%ecx), %edx
         in which case we return (%ecx - %ebx) + foo.  */
-      if (pic_offset_table_rtx)
+      if (pic_offset_table_rtx
+         && (!reload_completed || !ix86_use_pseudo_pic_reg ()))
         result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend),
                                                     pic_offset_table_rtx),
                               result);

Originally, if there is an UNSPEC_GOTOFF but no EBX usage, we just
remove the UNSPEC and subtract the EBX value.  With a pseudo PIC
register we should use the PIC register instead of EBX, but it is
unclear what to use after register allocation.
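
To make the arithmetic behind that comment concrete, here is a small
standalone C sketch.  It is not GCC code: the function names and the
addresses are made up for illustration, and 32-bit wraparound arithmetic
is assumed.  It only shows why (%ecx - GOT base) + foo is the same
address as the one the foo@GOTOFF(%ecx) operand refers to, whichever
register or symbol supplies the GOT base.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of "movl foo@GOTOFF(%ecx), %edx".
   foo@GOTOFF is the link-time constant (foo - GOT base), so the
   instruction reads from ecx + (foo - got_base).  */
uint32_t
gotoff_effective_address (uint32_t ecx, uint32_t foo, uint32_t got_base)
{
  return ecx + (foo - got_base);
}

/* Delegitimized form from the comment: (%ecx - pic) + foo, where pic is
   whatever holds the GOT base (EBX, the PIC pseudo, or the value of the
   _GLOBAL_OFFSET_TABLE_ symbol).  */
uint32_t
delegitimized_address (uint32_t ecx, uint32_t pic, uint32_t foo)
{
  return (ecx - pic) + foo;
}

/* Check the identity for one made-up layout (all values hypothetical).  */
void
check_gotoff_identity (void)
{
  const uint32_t got_base = 0x08049ff4u;  /* value EBX would hold */
  const uint32_t foo = 0x0804a020u;       /* address of array foo */
  const uint32_t index = 3;

  /* leal (%ebx, %ecx, 4), %ecx  */
  uint32_t ecx = got_base + index * 4;

  assert (gotoff_effective_address (ecx, foo, got_base) == foo + index * 4);
  assert (delegitimized_address (ecx, got_base, foo)
          == gotoff_effective_address (ecx, foo, got_base));
}
```

The identity holds only if the subtracted value really is the GOT base,
which is exactly what becomes uncertain once the PIC pseudo has been
assigned to an arbitrary hard register.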

Ilya

>
> jeff
>


* Re: Enable EBX for x86 in 32bits PIC code
  2014-09-24 20:32                                           ` Ilya Enkovich
@ 2014-09-24 21:20                                             ` Jeff Law
  2014-09-29 11:09                                               ` Jakub Jelinek
  0 siblings, 1 reply; 49+ messages in thread
From: Jeff Law @ 2014-09-24 21:20 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: Jakub Jelinek, Vladimir Makarov, gcc, gcc-patches,
	Evgeny Stupachenko, Richard Biener, Uros Bizjak, Petr Machata

On 09/24/14 14:32, Ilya Enkovich wrote:
> 2014-09-24 19:27 GMT+04:00 Jeff Law <law@redhat.com>:
>> On 09/24/14 00:56, Ilya Enkovich wrote:

>>>
>>> After register allocation we have no idea where GOT address is and
>>> therefore delegitimize_address target hook becomes less efficient and
>>> cannot remove UNSPECs. That's what I see now when build GCC with patch
>>> applied:
>>
>> In theory this shouldn't be too hard to fix.
>>
>> I haven't looked at the code, but it might be something looking explicitly
>> for ebx by register #, or something similar.  Which case within
>> delegitimize_address isn't firing as it should after your changes?
>
> It is the case I had to fix:
>
> @@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x)
>           ...
>           movl foo@GOTOFF(%ecx), %edx
>           in which case we return (%ecx - %ebx) + foo.  */
> -      if (pic_offset_table_rtx)
> +      if (pic_offset_table_rtx
> +         && (!reload_completed || !ix86_use_pseudo_pic_reg ()))
>           result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend),
>                                                       pic_offset_table_rtx),
>                                 result);
>
> Originally if there is a UNSPEC_GOTOFFSET but no EBX usage then we
> just remove this UNSPEC and substract EBX value.  With pseudo PIC reg
> we should use PIC register instead of EBX but it is unclear what to
> use after register allocation.
What's the RTL before & after allocation?  Feel free to just pass along 
the dump files for sum_r4 that you referenced in a prior message.

jeff


* Re: Enable EBX for x86 in 32bits PIC code
  2014-09-24 21:20                                             ` Jeff Law
@ 2014-09-29 11:09                                               ` Jakub Jelinek
  2014-10-21 16:05                                                 ` [PATCH] Improve i?86 address delegitimization after 32-bit pic changes (PR target/63542) Jakub Jelinek
  0 siblings, 1 reply; 49+ messages in thread
From: Jakub Jelinek @ 2014-09-29 11:09 UTC (permalink / raw)
  To: Jeff Law
  Cc: Ilya Enkovich, Vladimir Makarov, gcc, gcc-patches,
	Evgeny Stupachenko, Richard Biener, Uros Bizjak, Petr Machata

On Wed, Sep 24, 2014 at 03:20:44PM -0600, Jeff Law wrote:
> On 09/24/14 14:32, Ilya Enkovich wrote:
> >2014-09-24 19:27 GMT+04:00 Jeff Law <law@redhat.com>:
> >>On 09/24/14 00:56, Ilya Enkovich wrote:
> 
> >>>
> >>>After register allocation we have no idea where GOT address is and
> >>>therefore delegitimize_address target hook becomes less efficient and
> >>>cannot remove UNSPECs. That's what I see now when build GCC with patch
> >>>applied:
> >>
> >>In theory this shouldn't be too hard to fix.
> >>
> >>I haven't looked at the code, but it might be something looking explicitly
> >>for ebx by register #, or something similar.  Which case within
> >>delegitimize_address isn't firing as it should after your changes?
> >
> >It is the case I had to fix:
> >
> >@@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x)
> >          ...
> >          movl foo@GOTOFF(%ecx), %edx
> >          in which case we return (%ecx - %ebx) + foo.  */
> >-      if (pic_offset_table_rtx)
> >+      if (pic_offset_table_rtx
> >+         && (!reload_completed || !ix86_use_pseudo_pic_reg ()))
> >          result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend),
> >                                                      pic_offset_table_rtx),
> >                                result);
> >
> >Originally if there is a UNSPEC_GOTOFFSET but no EBX usage then we
> >just remove this UNSPEC and substract EBX value.  With pseudo PIC reg
> >we should use PIC register instead of EBX but it is unclear what to
> >use after register allocation.
> What's the RTL before & after allocation?  Feel free to just pass along the
> dump files for sum_r4 that you referenced in a prior message.

I wonder if during/after reload we just couldn't look at
ORIGINAL_REGNO of hard regs if ix86_use_pseudo_pic_reg.  Or is that
the other case, where you don't have any PIC register replacement around,
and want to subtract something?  Perhaps in that case we could just
subtract the value of _GLOBAL_OFFSET_TABLE_ symbol if we have nothing better
around.

	Jakub


* [PATCH] Improve i?86 address delegitimization after 32-bit pic changes (PR target/63542)
  2014-09-29 11:09                                               ` Jakub Jelinek
@ 2014-10-21 16:05                                                 ` Jakub Jelinek
  2014-10-22  2:02                                                   ` Jeff Law
  2014-11-24 15:57                                                   ` H.J. Lu
  0 siblings, 2 replies; 49+ messages in thread
From: Jakub Jelinek @ 2014-10-21 16:05 UTC (permalink / raw)
  To: Uros Bizjak, Jeff Law
  Cc: Ilya Enkovich, Vladimir Makarov, gcc-patches, Evgeny Stupachenko,
	Richard Biener, Uros Bizjak, Petr Machata

On Mon, Sep 29, 2014 at 01:08:56PM +0200, Jakub Jelinek wrote:
> I wonder if during/after reload we just couldn't look at
> ORIGINAL_REGNO of hard regs if ix86_use_pseudo_pic_reg.  Or is that
> the other case, where you don't have any PIC register replacement around,
> and want to subtract something?  Perhaps in that case we could just
> subtract the value of _GLOBAL_OFFSET_TABLE_ symbol if we have nothing better
> around.

Here is a patch that implements both of these ideas.

The number of lines like:
note: non-delegitimized UNSPEC UNSPEC_GOT (0) found in variable location
note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
during i686-linux bootstrap (not including regtest) went down from
14165 to 19.
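
For anyone who wants to reproduce such a count from a saved build log,
a minimal sketch in C follows.  The function name is hypothetical, and
it assumes each note sits on a single log line (not wrapped the way
mail clients wrap the examples earlier in this thread).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count the lines of IN that contain a "non-delegitimized UNSPEC" note.
   Lines longer than the buffer are read in chunks; a marker that happens
   to straddle a chunk boundary is missed, which is acceptable for a
   rough count.  */
size_t
count_unspec_notes (FILE *in)
{
  char buf[4096];
  size_t count = 0;
  int counted_this_line = 0;

  assert (in != NULL);
  while (fgets (buf, sizeof buf, in) != NULL)
    {
      if (!counted_this_line
          && strstr (buf, "non-delegitimized UNSPEC") != NULL)
        {
          count++;
          counted_this_line = 1;
        }
      /* A newline in the chunk means the next chunk starts a new line.  */
      if (strchr (buf, '\n') != NULL)
        counted_this_line = 0;
    }
  return count;
}
```

Calling count_unspec_notes on the log from a build with and without the
patch gives the two numbers to compare.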

The patch trusts that a hard reg whose ORIGINAL_REGNO is the pic
pseudo contains the _GLOBAL_OFFSET_TABLE_ value of the current shared
library (or binary); I think that is a reasonable assumption.
For ELF, in the worst case the UNSPEC_GOTOFF handling can subtract the
_GLOBAL_OFFSET_TABLE_ symbol if it doesn't know which register to subtract.
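
As a rough illustration of those two decisions, here is a standalone C
model.  It does not use GCC's rtx types: struct model_reg, the
MODEL_FIRST_PSEUDO boundary and all function names are hypothetical
stand-ins, and got_symbol_usable stands for the
!TARGET_MACHO && !TARGET_VXWORKS_RTP condition in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical boundary between hard registers and pseudos in this
   model; the real value is target-specific.  */
#define MODEL_FIRST_PSEUDO 100u

/* Stand-in for a REG rtx: current number plus the number it had
   before register allocation.  */
struct model_reg
{
  unsigned regno;
  unsigned original_regno;
};

static bool
model_hard_register_p (const struct model_reg *r)
{
  return r->regno < MODEL_FIRST_PSEUDO;
}

/* Mirrors the patched register test: X counts as the PIC register if it
   is the PIC register itself, or a hard register that was allocated for
   the PIC pseudo.  */
bool
model_pic_register_p (const struct model_reg *x, const struct model_reg *pic)
{
  assert (x != NULL && pic != NULL);
  if (x->regno == pic->regno)
    return true;
  if (model_hard_register_p (x)
      && !model_hard_register_p (pic)
      && x->original_regno == pic->regno)
    return true;
  return false;
}

enum model_subtrahend
{
  MODEL_SUB_PIC_REG,     /* subtract pic_offset_table_rtx */
  MODEL_SUB_GOT_SYMBOL,  /* subtract _GLOBAL_OFFSET_TABLE_ */
  MODEL_SUB_NONE         /* give up and keep the original address */
};

/* Mirrors the order of the branches in the patched delegitimization.  */
enum model_subtrahend
model_choose_subtrahend (bool have_pic_rtx, bool reload_completed,
                         bool use_pseudo_pic_reg, bool got_symbol_usable)
{
  if (have_pic_rtx && (!reload_completed || !use_pseudo_pic_reg))
    return MODEL_SUB_PIC_REG;
  if (have_pic_rtx && got_symbol_usable)
    return MODEL_SUB_GOT_SYMBOL;
  return MODEL_SUB_NONE;
}
```

The symbol fallback is taken only once the pseudo PIC register can no
longer be named, i.e. after reload with ix86_use_pseudo_pic_reg () true.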

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-10-21  Jakub Jelinek  <jakub@redhat.com>

	PR target/63542
	* config/i386/i386.c (ix86_pic_register_p): Also return
	true if x is a hard register with ORIGINAL_REGNO equal to
	pic_offset_table_rtx pseudo REGNO.
	(ix86_delegitimize_address): For ix86_use_pseudo_pic_reg ()
	after reload, subtract GOT_SYMBOL_NAME symbol if possible.

	* gcc.target/i386/pr63542-1.c: New test.
	* gcc.target/i386/pr63542-2.c: New test.

--- gcc/config/i386/i386.c.jj	2014-10-21 11:51:30.000000000 +0200
+++ gcc/config/i386/i386.c	2014-10-21 13:06:55.621292368 +0200
@@ -14281,10 +14281,20 @@ ix86_pic_register_p (rtx x)
   if (GET_CODE (x) == VALUE && CSELIB_VAL_PTR (x))
     return (pic_offset_table_rtx
 	    && rtx_equal_for_cselib_p (x, pic_offset_table_rtx));
+  else if (!REG_P (x))
+    return false;
   else if (pic_offset_table_rtx)
-    return REG_P (x) && REGNO (x) == REGNO (pic_offset_table_rtx);
+    {
+      if (REGNO (x) == REGNO (pic_offset_table_rtx))
+	return true;
+      if (HARD_REGISTER_P (x)
+	  && !HARD_REGISTER_P (pic_offset_table_rtx)
+	  && ORIGINAL_REGNO (x) == REGNO (pic_offset_table_rtx))
+	return true;
+      return false;
+    }
   else
-    return REG_P (x) && REGNO (x) == PIC_OFFSET_TABLE_REGNUM;
+    return REGNO (x) == PIC_OFFSET_TABLE_REGNUM;
 }
 
 /* Helper function for ix86_delegitimize_address.
@@ -14457,15 +14467,20 @@ ix86_delegitimize_address (rtx x)
 	 leal (%ebx, %ecx, 4), %ecx
 	 ...
 	 movl foo@GOTOFF(%ecx), %edx
-	 in which case we return (%ecx - %ebx) + foo.
-
-	 Note that when pseudo_pic_reg is used we can generate it only
-	 before reload_completed.  */
+	 in which case we return (%ecx - %ebx) + foo
+	 or (%ecx - _GLOBAL_OFFSET_TABLE_) + foo if pseudo_pic_reg
+	 and reload has completed.  */
       if (pic_offset_table_rtx
 	  && (!reload_completed || !ix86_use_pseudo_pic_reg ()))
         result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend),
 						     pic_offset_table_rtx),
 			       result);
+      else if (pic_offset_table_rtx && !TARGET_MACHO && !TARGET_VXWORKS_RTP)
+	{
+	  rtx tmp = gen_rtx_SYMBOL_REF (Pmode, GOT_SYMBOL_NAME);
+	  tmp = gen_rtx_MINUS (Pmode, copy_rtx (addend), tmp);
+	  result = gen_rtx_PLUS (Pmode, tmp, result);
+	}
       else
 	return orig_x;
     }
--- gcc/testsuite/gcc.target/i386/pr63542-1.c.jj	2014-10-21 13:47:25.938470961 +0200
+++ gcc/testsuite/gcc.target/i386/pr63542-1.c	2014-10-21 13:48:26.227333649 +0200
@@ -0,0 +1,21 @@
+/* PR target/63542 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -g -dA" } */
+/* { dg-additional-options "-fpic" { target fpic } } */
+
+float
+foo (long long u)
+{
+  if (!(-(1LL << 53) < u && u < (1LL << 53)))
+    {
+      if ((unsigned long long) u & ((1ULL << 11) - 1))
+	{
+	  u &= ~((1ULL << 11) - 1);
+	  u |= (1ULL << 11);
+	}
+    }
+  double f = (int) (u >> (32));
+  f *= 0x1p32f;
+  f += (unsigned int) u;
+  return (float) f;
+}
--- gcc/testsuite/gcc.target/i386/pr63542-2.c.jj	2014-10-21 13:47:29.084411447 +0200
+++ gcc/testsuite/gcc.target/i386/pr63542-2.c	2014-10-21 13:48:38.779096466 +0200
@@ -0,0 +1,37 @@
+/* PR target/63542 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -g -dA" } */
+/* { dg-additional-options "-fpic" { target fpic } } */
+
+struct B { unsigned long c; unsigned char *d; };
+extern struct A { struct B *e[0x400]; } *f[128];
+extern void (*bar) (char *p, char *q);
+
+char *
+foo (char *p, char *q)
+{
+  struct B *g;
+  char *b, *l;
+  unsigned long s;
+
+  g = f[((unsigned long) p) >> 22]->e[(((unsigned long) p) >> 12) & 0x3ff];
+  s = g->c << 2;
+  int r = ((unsigned long) p) & 0xfff;
+  int m = g->d[r];
+  if (m > 0xfd)
+    {
+      m = (r >> 2) % (s >> 2);
+      if ((((unsigned long) p) & ~(unsigned long) 0xfff) != (((unsigned long) q) & ~(unsigned long) 0xfff))
+	goto fail;
+    }
+  b = (char *) ((unsigned long) p & ~(unsigned long) 3);
+  b -= m << 2;
+  l = b + s;
+
+  if ( q >= l || q < b)
+    goto fail;
+  return p;
+fail:
+  (*bar) (p, q);
+  return p;
+}


	Jakub


* Re: [PATCH] Improve i?86 address delegitimization after 32-bit pic changes (PR target/63542)
  2014-10-21 16:05                                                 ` [PATCH] Improve i?86 address delegitimization after 32-bit pic changes (PR target/63542) Jakub Jelinek
@ 2014-10-22  2:02                                                   ` Jeff Law
  2014-11-24 15:57                                                   ` H.J. Lu
  1 sibling, 0 replies; 49+ messages in thread
From: Jeff Law @ 2014-10-22  2:02 UTC (permalink / raw)
  To: Jakub Jelinek, Uros Bizjak
  Cc: Ilya Enkovich, Vladimir Makarov, gcc-patches, Evgeny Stupachenko,
	Richard Biener, Petr Machata

On 10/21/14 16:03, Jakub Jelinek wrote:
> On Mon, Sep 29, 2014 at 01:08:56PM +0200, Jakub Jelinek wrote:
>> I wonder if during/after reload we just couldn't look at
>> ORIGINAL_REGNO of hard regs if ix86_use_pseudo_pic_reg.  Or is that
>> the other case, where you don't have any PIC register replacement around,
>> and want to subtract something?  Perhaps in that case we could just
>> subtract the value of _GLOBAL_OFFSET_TABLE_ symbol if we have nothing better
>> around.
>
> Here is a patch that implements both of these ideas.
>
> The number of lines like:
> note: non-delegitimized UNSPEC UNSPEC_GOT (0) found in variable location
> note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
> during i686-linux bootstrap (not including regtest) went down from
> 14165 to 19.
>
> The patch trusts that a hard reg with ORIGINAL_REGNO containing the pic
> pseudo contains the _GLOBAL_OFFSET_TABLE_ value of the current shared
> library (or binary), I think that is reasonable assumption.
> And for ELF for the UNSPEC_GOTOFF it worse case can subtract
> _GLOBAL_OFFSET_TABLE_ symbol if it doesn't know what register to subtract.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2014-10-21  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR target/63542
> 	* config/i386/i386.c (ix86_pic_register_p): Also return
> 	true if x is a hard register with ORIGINAL_REGNO equal to
> 	pic_offset_table_rtx pseudo REGNO.
> 	(ix86_delegitimize_address): For ix86_use_pseudo_pic_reg ()
> 	after reload, subtract GOT_SYMBOL_NAME symbol if possible.
>
> 	* gcc.target/i386/pr63542-1.c: New test.
> 	* gcc.target/i386/pr63542-2.c: New test.
OK.
Jeff


* Re: [PATCH] Improve i?86 address delegitimization after 32-bit pic changes (PR target/63542)
  2014-10-21 16:05                                                 ` [PATCH] Improve i?86 address delegitimization after 32-bit pic changes (PR target/63542) Jakub Jelinek
  2014-10-22  2:02                                                   ` Jeff Law
@ 2014-11-24 15:57                                                   ` H.J. Lu
  1 sibling, 0 replies; 49+ messages in thread
From: H.J. Lu @ 2014-11-24 15:57 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Uros Bizjak, Jeff Law, Ilya Enkovich, Vladimir Makarov,
	gcc-patches, Evgeny Stupachenko, Richard Biener, Petr Machata

On Tue, Oct 21, 2014 at 9:03 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Sep 29, 2014 at 01:08:56PM +0200, Jakub Jelinek wrote:
>> I wonder if during/after reload we just couldn't look at
>> ORIGINAL_REGNO of hard regs if ix86_use_pseudo_pic_reg.  Or is that
>> the other case, where you don't have any PIC register replacement around,
>> and want to subtract something?  Perhaps in that case we could just
>> subtract the value of _GLOBAL_OFFSET_TABLE_ symbol if we have nothing better
>> around.
>
> Here is a patch that implements both of these ideas.
>
> The number of lines like:
> note: non-delegitimized UNSPEC UNSPEC_GOT (0) found in variable location
> note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
> during i686-linux bootstrap (not including regtest) went down from
> 14165 to 19.
>
> The patch trusts that a hard reg with ORIGINAL_REGNO containing the pic
> pseudo contains the _GLOBAL_OFFSET_TABLE_ value of the current shared
> library (or binary), I think that is reasonable assumption.
> And for ELF for the UNSPEC_GOTOFF it worse case can subtract
> _GLOBAL_OFFSET_TABLE_ symbol if it doesn't know what register to subtract.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2014-10-21  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/63542
>         * config/i386/i386.c (ix86_pic_register_p): Also return
>         true if x is a hard register with ORIGINAL_REGNO equal to
>         pic_offset_table_rtx pseudo REGNO.
>         (ix86_delegitimize_address): For ix86_use_pseudo_pic_reg ()
>         after reload, subtract GOT_SYMBOL_NAME symbol if possible.
>
>         * gcc.target/i386/pr63542-1.c: New test.
>         * gcc.target/i386/pr63542-2.c: New test.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64025


-- 
H.J.


end of thread, other threads:[~2014-11-24 14:58 UTC | newest]

Thread overview: 49+ messages
     [not found] <CAOvf_xxsQ_oYGqNAVQ1+BW+CuD3mzebZ2xma0jpF=WfyZMCRCA@mail.gmail.com>
     [not found] ` <CAFiYyc1mFtTezkTJORmJJq+yht=qPSwiN7KDn19+bSuSdaqvMQ@mail.gmail.com>
     [not found]   ` <CAOvf_xyeVeg2oB9Xxz8RMEQ6gyfJY5whd9s4ygoAAEaMU9efnA@mail.gmail.com>
     [not found]     ` <20140707114750.GB31640@tucnak.redhat.com>
     [not found]       ` <CAMbmDYZV_fx0jxmKHhLsC2pJ7pDzuu6toEAH72izOdpq6KGyfg@mail.gmail.com>
2014-08-22 12:21         ` Enable EBX for x86 in 32bits PIC code Ilya Enkovich
2014-08-23  1:47           ` Hans-Peter Nilsson
2014-08-25  9:25             ` Ilya Enkovich
2014-08-25 11:24               ` Hans-Peter Nilsson
2014-08-25 11:43                 ` Ilya Enkovich
2014-08-25 15:09           ` Vladimir Makarov
2014-08-26  7:49             ` Ilya Enkovich
2014-08-26  8:57               ` Ilya Enkovich
2014-08-26 15:25                 ` Vladimir Makarov
2014-08-26 21:42                   ` Ilya Enkovich
2014-08-27 20:19                     ` Vladimir Makarov
2014-08-28  8:28                       ` Ilya Enkovich
2014-08-29  6:47                         ` Ilya Enkovich
2014-09-02 14:29                           ` Vladimir Makarov
2014-09-03 20:19                           ` Vladimir Makarov
     [not found]                             ` <0EFAB2BDD0F67E4FB6CCC8B9F87D756969B3A89D@IRSMSX101.ger.corp.intel.com>
2014-09-09 16:43                               ` Vladimir Makarov
2014-09-11 19:57                                 ` Jeff Law
2014-09-23 13:54                             ` Ilya Enkovich
2014-09-23 14:23                               ` Uros Bizjak
2014-09-23 15:59                                 ` Jeff Law
2014-09-23 14:34                               ` Jakub Jelinek
2014-09-23 15:59                                 ` Petr Machata
2014-09-23 16:00                                 ` Jeff Law
2014-09-23 16:03                                   ` Jakub Jelinek
2014-09-23 16:10                                     ` Jeff Law
2014-09-24  6:56                                       ` Ilya Enkovich
2014-09-24 15:27                                         ` Jeff Law
2014-09-24 20:32                                           ` Ilya Enkovich
2014-09-24 21:20                                             ` Jeff Law
2014-09-29 11:09                                               ` Jakub Jelinek
2014-10-21 16:05                                                 ` [PATCH] Improve i?86 address delegitimization after 32-bit pic changes (PR target/63542) Jakub Jelinek
2014-10-22  2:02                                                   ` Jeff Law
2014-11-24 15:57                                                   ` H.J. Lu
2014-08-27 21:39                     ` Enable EBX for x86 in 32bits PIC code Jeff Law
2014-08-28  8:37                       ` Ilya Enkovich
2014-08-28 12:43                         ` Uros Bizjak
2014-08-28 12:54                           ` Ilya Enkovich
2014-08-28 13:08                             ` Uros Bizjak
2014-08-28 13:29                               ` Ilya Enkovich
2014-08-28 16:25                                 ` Uros Bizjak
2014-08-29 18:56                         ` Jeff Law
2014-08-25 17:30           ` Jeff Law
2014-08-28 13:01           ` Uros Bizjak
2014-08-28 13:13             ` Ilya Enkovich
2014-08-28 18:30             ` Florian Weimer
2014-08-29 18:48             ` Jeff Law
2014-08-28 18:58           ` Uros Bizjak
2014-08-29  6:51             ` Ilya Enkovich
2014-08-29 18:45             ` Jeff Law
