public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* PowerPC future machine, version 3
@ 2019-08-26 19:21 Michael Meissner
  2019-08-26 20:41 ` [PATCH V3, #1 of 10], Add basic pc-relative support Michael Meissner
                   ` (9 more replies)
  0 siblings, 10 replies; 42+ messages in thread
From: Michael Meissner @ 2019-08-26 19:21 UTC (permalink / raw)
  To: gcc-patches, segher, dje.gcc, meissner

Since there was a version 2 of one of the previous patches, I'll call this set
version 3.  These patches completely replace the previous patches.

As before, these are for a future PowerPC machine that we are looking at.  If a
real machine is announced that uses these instructions, we may change the name
used in the -mcpu=<machine> option to be the real machine, and drop the
'future' name.

I tried to make the changes suggested in the previous patch set, and keep
things that use the address masks (i.e. RELOAD_REG_*) confined to rs6000.c.

To recap:

    Patch #1:  Basic changes to enable pcrel addresses using PLA/PLD;
    Patch #2:  Optional rework rs6000_setup_reg_addr_mask;
    Patch #3:  Add prefixed RTL support;
    Patch #4:  Add prefixed load/store to all offset instructions;
    Patch #5:  Optionally enable pc-relative on Linux 64-bit ELFv2;
    Patch #6:  Fix a limitation with vector extracts & pcrel addresses;
    Patch #7:  Add PCREL_OPT support;
    Patch #8:  Misc. tests;
    Patch #9:  Prefixed load/store tests with large numeric offsets;
    Patch #10: Pc-relative load/store tests.

I have built each of patches 1-7 in succession, building a bootstrapped
compiler and running make check.  There were no regressions.  I then reran the
tests with patches 8-10 applied, and all of the new patches also pass.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH V3, #1 of 10], Add basic pc-relative support
  2019-08-26 19:21 PowerPC future machine, version 3 Michael Meissner
@ 2019-08-26 20:41 ` Michael Meissner
  2019-08-28 18:46   ` Segher Boessenkool
  2019-08-26 21:07 ` [PATCH, V3, #3 of 10], Add prefixed RTL insn attribute Michael Meissner
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-08-26 20:41 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, segher, dje.gcc

This patch adds basic pc-relative support.

I changed the public type used in the prefixed match function from insn_form to
trad_form.  This argument is used in matching prefixed addresses to say whether
the traditional instruction's offset uses the D instruction format, the DS
instruction format, or the DQ instruction format.  I am open to other name
suggestions (or going back to insn_form).

This patch adds a new RELOAD_REG_* mask for the DS instruction format.  It uses
the old code in to set this mask for the two places where we need to set the DS
flag (64-bit GPR registers and Altivec scalar registers).

In addition to a new mask bit, I removed the RELOAD_REG_ANY reload register
class.  This was always hard to explain why it was declared as a reload
register class.  Instead, I added a new field (any_addr_mask) that is the OR of
most of the address bits, and I changed the various mode_supports functions to
use it.

In addition, I added a new field (default_addr_mask) that is the default
address mask to use when you don't have the full load/store instruction, and
know which register is being loaded or stored.  I.e. scalar integers uses the
GPR address mask for the default address mask, scalar floating point (other
that IEEE 128-bit) uses the FPR address mask, vectors use either FPR or Altivec
address masks, and IEEE 128-bit uses the Altivec address mask.

This patch adds the support so that if you use the -mpcrel option, it will
enable using the pc-relative addressing by loading up the address into a base
register (in the previous patches, this code was in patch #2).  If the symbol
is a local symbol or a label, GCC will load up the address with:

	PLA reg,symbol@pcrel

If the symbol is not local to the current module, GCC will load up the address
with:

	PLD reg,symbol@got@pcrel

When the linker is linking the program, if it is linking the objects to be the
main program, and the symbol is defined in another module also in the main
program, the linker will transform this to:

	PLA reg,symbol@pcrel

If the symbol is defined in a shared library, or the code being linked is in a
shared library, the linker will allocate a 64-bit value in the .got section
that the loader will fill in the address when the program is loaded.  The
linker will transform this to:

	PLD reg,symbol.got@pcrel

Rather than just generating a SET instruction like I did previously, I used a
named insn to create loading up a local or external pc-relative symbol.

I have built a bootstrapped compiler with this patch and I ran make check on a
little endian power8 system, and there were no regressions.  Can I check this
into the trunk?

In addition, I built a big endian cross compiler and I used -mdebug=reg to dump
out the address masks for each of the modes that we care about.  I did:

	-mcpu=power5/power6/power7/power8/power9/future for big endian -m32
	-mcpu=power5/power6/power7/power8/power9/future for big endian -m64
	-mcpu=power8/power9/future for little endian -m64 -mabi=elfv2

All of the cpu/endian/bit-size targets use the same address masks as before
except for little endian future.  There the -mdebug=reg code prints whether a
mode needs to use the DS mode.  I didn't print the DS information earlier, so
that I could more easily do the comparison.  For the future/le combinations, I
verified that each of the modes that says it needs to use a DS instruction
format, did actually have that requirement for the traditional instruction.

2019-08-26  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/predicates.md (pcrel_local_address): Rename
	pcrel_address to pcrel_local_address.
	(pcrel_ext_address): Rename pcrel_external_address to
	pcrel_ext_address.
	(prefixed_mem_operand): Delete.
	(pcrel_external_mem_operand): Delete.
	* config/rs6000/rs6000-protos.h (rs6000_prefixed_address_mode_p):
	Delete.
	(trad_insn_type): New enumeration.
	(prefixed_local_addr_p): New declaration.
	* config/rs6000/rs6000.c (RELOAD_REG_ANY): Delete enum element.
	(reload_reg_map): Delete RELOAD_REG_ANY element.
	(addr_mask_type): Grow type to unsigned short.
	(RELOAD_REG_*): Add RELOAD_REG_DS_OFFSET mask.
	(struct rs6000_reg_addr): Add any_addr_mask and default_addr_mask
	fields.
	(mode_supports_pre_incdec_p): Use reg_addr[m].any_addr_mask
	field.
	(mode_supports_pre_modify_p): Use reg_addr[m].any_addr_mask
	field.
	(mode_supports_dq_form): Use reg_addr[m].any_addr_mask field.
	(rs6000_debug_addr_mask): Print out DS mask bit if -mcpu=future.
	(rs6000_debug_print_mode): Print the any_addr_mask and
	default_addr_mask fields.
	(rs6000_setup_reg_addr_masks): Mark where an offset instruction
	will need to use a DS offset.  Set up the any_addr_mask and
	default_addr_mask fields.
	(rs6000_emit_move): Emit code to load up pc-relative addresses if
	-mpcrel.
	(rs6000_secondary_reload_memory): Use any_addr_mask field.
	(print_operand_address): Handle both local and external
	pc-relative symbols.
	(mode_supports_prefixed_address_p): Delete.
	(rs6000_prefixed_address_mode_p): Delete.
	(addr_mask_to_trad_insn): New function.
	(prefixed_local_addr_p): New function that replaces the
	rs6000_prefixed_address_mode_p function.
	* config/rs6000/rs6000.md (pcrel_local_addr): New insn.
	(pcrel_ext_addr): New insn.

Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 274864)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -1626,8 +1626,8 @@ (define_predicate "small_toc_ref"
   return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL;
 })
 
-;; Return true if the operand is a pc-relative address.
-(define_predicate "pcrel_address"
+;; Return true if the operand is a pc-relative address to a local label.
+(define_predicate "pcrel_local_address"
   (match_code "label_ref,symbol_ref,const")
 {
   if (!rs6000_pcrel_p (cfun))
@@ -1662,7 +1662,7 @@ (define_predicate "pcrel_address"
 ;; defined locally in another module or a PLD of the address if the label is
 ;; defined in another module.
 
-(define_predicate "pcrel_external_address"
+(define_predicate "pcrel_ext_address"
   (match_code "symbol_ref,const")
 {
   if (!rs6000_pcrel_p (cfun))
@@ -1686,22 +1686,6 @@ (define_predicate "pcrel_external_addres
   return (SYMBOL_REF_P (op) && !SYMBOL_REF_LOCAL_P (op));
 })
 
-;; Return 1 if op is a prefixed memory operand.
-(define_predicate "prefixed_mem_operand"
-  (match_code "mem")
-{
-  return rs6000_prefixed_address_mode_p (XEXP (op, 0), GET_MODE (op));
-})
-
-;; Return 1 if op is a memory operand to an external variable when we
-;; support pc-relative addressing and the PCREL_OPT relocation to
-;; optimize references to it.
-(define_predicate "pcrel_external_mem_operand"
-  (match_code "mem")
-{
-  return pcrel_external_address (XEXP (op, 0), Pmode);
-})
-
 ;; Match the first insn (addis) in fusing the combination of addis and loads to
 ;; GPR registers on power8.
 (define_predicate "fusion_gpr_addis"
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 274864)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -154,7 +154,22 @@ extern align_flags rs6000_loop_align (rt
 extern void rs6000_split_logical (rtx [], enum rtx_code, bool, bool, bool);
 extern bool rs6000_pcrel_p (struct function *);
 extern bool rs6000_fndecl_pcrel_p (const_tree);
-extern bool rs6000_prefixed_address_mode_p (rtx, machine_mode);
+
+/* Enumeration giving the type of traditional addressing that would be used to
+   decide whether an instruction uses prefixed memory or not.  If the
+   traditional instruction uses the DS instruction format, and the bottom 2
+   bits of the offset are not 0, the traditional instruction cannot be used,
+   but a prefixed instruction can be used.  */
+
+typedef enum {
+  TRAD_INSN_DEFAULT,	/* Use the default for the mode.  */
+  TRAD_INSN_D,		/* Insn uses D format (all 16 bits).  */
+  TRAD_INSN_DS,		/* Insn uses DS format (bottom 2 bits clear).  */
+  TRAD_INSN_DQ,		/* Insn uses DQ format (bottom 4 bits clear).  */
+  TRAD_INSN_INVALID	/* Insn does not have an offsettable form.  */
+} trad_insn_type;
+
+extern bool prefixed_local_addr_p (rtx, machine_mode, trad_insn_type);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 274864)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -325,7 +325,6 @@ enum rs6000_reload_reg_type {
   RELOAD_REG_GPR,			/* General purpose registers.  */
   RELOAD_REG_FPR,			/* Traditional floating point regs.  */
   RELOAD_REG_VMX,			/* Altivec (VMX) registers.  */
-  RELOAD_REG_ANY,			/* OR of GPR, FPR, Altivec masks.  */
   N_RELOAD_REG
 };
 
@@ -345,22 +344,22 @@ static const struct reload_reg_map_type
   { "Gpr",	FIRST_GPR_REGNO },	/* RELOAD_REG_GPR.  */
   { "Fpr",	FIRST_FPR_REGNO },	/* RELOAD_REG_FPR.  */
   { "VMX",	FIRST_ALTIVEC_REGNO },	/* RELOAD_REG_VMX.  */
-  { "Any",	-1 },			/* RELOAD_REG_ANY.  */
 };
 
 /* Mask bits for each register class, indexed per mode.  Historically the
    compiler has been more restrictive which types can do PRE_MODIFY instead of
    PRE_INC and PRE_DEC, so keep track of sepaate bits for these two.  */
-typedef unsigned char addr_mask_type;
+typedef unsigned short addr_mask_type;
 
-#define RELOAD_REG_VALID	0x01	/* Mode valid in register..  */
-#define RELOAD_REG_MULTIPLE	0x02	/* Mode takes multiple registers.  */
-#define RELOAD_REG_INDEXED	0x04	/* Reg+reg addressing.  */
-#define RELOAD_REG_OFFSET	0x08	/* Reg+offset addressing. */
-#define RELOAD_REG_PRE_INCDEC	0x10	/* PRE_INC/PRE_DEC valid.  */
-#define RELOAD_REG_PRE_MODIFY	0x20	/* PRE_MODIFY valid.  */
-#define RELOAD_REG_AND_M16	0x40	/* AND -16 addressing.  */
-#define RELOAD_REG_QUAD_OFFSET	0x80	/* quad offset is limited.  */
+#define RELOAD_REG_VALID	0x001	/* Mode valid in register..  */
+#define RELOAD_REG_MULTIPLE	0x002	/* Mode takes multiple registers.  */
+#define RELOAD_REG_INDEXED	0x004	/* Reg+reg addressing.  */
+#define RELOAD_REG_OFFSET	0x008	/* Reg+offset addressing. */
+#define RELOAD_REG_PRE_INCDEC	0x010	/* PRE_INC/PRE_DEC valid.  */
+#define RELOAD_REG_PRE_MODIFY	0x020	/* PRE_MODIFY valid.  */
+#define RELOAD_REG_AND_M16	0x040	/* AND -16 addressing.  */
+#define RELOAD_REG_QUAD_OFFSET	0x080	/* DQ offset (bottom 4 bits 0).  */
+#define RELOAD_REG_DS_OFFSET	0x100	/* DS offset (bottom 2 bits 0).  */
 
 /* Register type masks based on the type, of valid addressing modes.  */
 struct rs6000_reg_addr {
@@ -370,6 +369,8 @@ struct rs6000_reg_addr {
   enum insn_code reload_gpr_vsx;	/* INSN to move from GPR to VSX.  */
   enum insn_code reload_vsx_gpr;	/* INSN to move from VSX to GPR.  */
   addr_mask_type addr_mask[(int)N_RELOAD_REG]; /* Valid address masks.  */
+  addr_mask_type any_addr_mask;		/* OR of GPR/FPR/VMX addr_masks.  */
+  addr_mask_type default_addr_mask;	/* Default addr_mask to use.  */
   bool scalar_in_vmx_p;			/* Scalar value can go in VMX.  */
 };
 
@@ -379,16 +380,14 @@ static struct rs6000_reg_addr reg_addr[N
 static inline bool
 mode_supports_pre_incdec_p (machine_mode mode)
 {
-  return ((reg_addr[mode].addr_mask[RELOAD_REG_ANY] & RELOAD_REG_PRE_INCDEC)
-	  != 0);
+  return ((reg_addr[mode].any_addr_mask & RELOAD_REG_PRE_INCDEC) != 0);
 }
 
 /* Helper function to say whether a mode supports PRE_MODIFY.  */
 static inline bool
 mode_supports_pre_modify_p (machine_mode mode)
 {
-  return ((reg_addr[mode].addr_mask[RELOAD_REG_ANY] & RELOAD_REG_PRE_MODIFY)
-	  != 0);
+  return ((reg_addr[mode].any_addr_mask & RELOAD_REG_PRE_MODIFY) != 0);
 }
 
 /* Return true if we have D-form addressing in altivec registers.  */
@@ -404,8 +403,7 @@ mode_supports_vmx_dform (machine_mode mo
 static inline bool
 mode_supports_dq_form (machine_mode mode)
 {
-  return ((reg_addr[mode].addr_mask[RELOAD_REG_ANY] & RELOAD_REG_QUAD_OFFSET)
-	  != 0);
+  return ((reg_addr[mode].any_addr_mask & RELOAD_REG_QUAD_OFFSET) != 0);
 }
 
 /* Given that there exists at least one variable that is set (produced)
@@ -2078,6 +2076,9 @@ rs6000_debug_addr_mask (addr_mask_type m
 
   if ((mask & RELOAD_REG_QUAD_OFFSET) != 0)
     *p++ = 'O';
+  /* To simplify comparing addr_masks, don't print DS for older machines.  */
+  else if ((mask & RELOAD_REG_DS_OFFSET) != 0 && TARGET_PREFIXED_ADDR)
+    *p++ = 's';
   else if ((mask & RELOAD_REG_OFFSET) != 0)
     *p++ = 'o';
   else if (keep_spaces)
@@ -2115,6 +2116,12 @@ rs6000_debug_print_mode (ssize_t m)
     fprintf (stderr, " %s: %s", reload_reg_map[rc].name,
 	     rs6000_debug_addr_mask (reg_addr[m].addr_mask[rc], true));
 
+  fprintf (stderr, " Any: %s",
+	   rs6000_debug_addr_mask (reg_addr[m].any_addr_mask, true));
+
+  fprintf (stderr, " Default: %s",
+	   rs6000_debug_addr_mask (reg_addr[m].default_addr_mask, true));
+
   if ((reg_addr[m].reload_store != CODE_FOR_nothing)
       || (reg_addr[m].reload_load != CODE_FOR_nothing))
     {
@@ -2660,11 +2667,74 @@ rs6000_setup_reg_addr_masks (void)
 	      && (addr_mask & RELOAD_REG_VALID) != 0)
 	    addr_mask |= RELOAD_REG_AND_M16;
 
+	  /* 64-bit and larger values on GPRs need DS format instructions.  All
+	     non-vector offset instructions in Altivec registers need the DS
+	     format instructions.  */
+	  const addr_mask_type quad_flags = (RELOAD_REG_OFFSET
+					     | RELOAD_REG_QUAD_OFFSET);
+
+	  if ((addr_mask & quad_flags) == RELOAD_REG_OFFSET
+	      && ((rc == RELOAD_REG_GPR && msize >= 8 && TARGET_POWERPC64)
+		  || (rc == RELOAD_REG_VMX)))
+	    addr_mask |= RELOAD_REG_DS_OFFSET;
+
 	  reg_addr[m].addr_mask[rc] = addr_mask;
-	  any_addr_mask |= addr_mask;
+	  any_addr_mask |= (addr_mask & ~RELOAD_REG_AND_M16);
 	}
 
-      reg_addr[m].addr_mask[RELOAD_REG_ANY] = any_addr_mask;
+      reg_addr[m].any_addr_mask = any_addr_mask;
+
+      /* Figure out what the default reload register set that should be used
+	 for each mode, that should mirror the expected usage (i.e. vectors in
+	 vector registers, ints in GPRs, etc).  Fall back to GPRs as a last
+	 resort if the mode isn't valid in the vector/floating point registers.
+	 In the case of vectors and FP, we want to test the reload register
+	 classes in the order of epxected use or in terms of functionality (the
+	 FPRs offer offsettable loads/stores in earlier ISAs).  */
+
+      int def_rc;
+      int rc_order[2];
+      int rc_max = 0;
+
+      /* IEEE 128-bit hardware floating point insns use Altivec registers.  */
+      if (TARGET_FLOAT128_HW && FLOAT128_IEEE_P (m))
+	rc_order[rc_max++] = RELOAD_REG_VMX;
+
+      /* Normal vectors and software IEEE 128-bit can use either floating point
+	 registers or Altivec registers.  */
+      else if (TARGET_VSX && (VECTOR_MODE_P (m) || FLOAT128_IEEE_P (m)))
+	{
+	  rc_order[rc_max++] = RELOAD_REG_FPR;
+	  rc_order[rc_max++] = RELOAD_REG_VMX;
+	}
+
+      /* Altivec only vectors use the Altivec registers.  */
+      else if (TARGET_ALTIVEC && !TARGET_VSX && VECTOR_MODE_P (m))
+	rc_order[rc_max++] = RELOAD_REG_VMX;
+
+      /* For scalar binary/decimal floating point, prefer FPRs over altivec
+	 registers.  */
+      else if (TARGET_HARD_FLOAT && SCALAR_FLOAT_MODE_P (m))
+	{
+	  rc_order[rc_max++] = RELOAD_REG_FPR;
+	  rc_order[rc_max++] = RELOAD_REG_VMX;
+	}
+
+      /* Default to GPRs if neither FPRs or Altivec registers is valid and
+	 preferred.  */
+      def_rc = RELOAD_REG_GPR;
+      for (int i = 0; i < rc_max; i++)
+	{
+	  int rc_num = rc_order[i];
+	  if ((reg_addr[m].addr_mask[rc_num] & RELOAD_REG_VALID) != 0)
+	    {
+	      def_rc = rc_num;
+	      break;
+	    }
+	}
+
+      reg_addr[m].default_addr_mask = (reg_addr[m].addr_mask[def_rc]
+				       & ~RELOAD_REG_AND_M16);
     }
 }
 
@@ -9634,6 +9704,21 @@ rs6000_emit_move (rtx dest, rtx source,
 	  return;
 	}
 
+      /* Handle loading up pc-relative addresses.  */
+      if (TARGET_PCREL && mode == E_DImode)
+	{
+	  if (pcrel_local_address (operands[1], Pmode))
+	    {
+	      emit_insn (gen_pcrel_local_addr (operands[0], operands[1]));
+	      return;
+	    }
+	  else if (pcrel_ext_address (operands[1], Pmode))
+	    {
+	      emit_insn (gen_pcrel_ext_addr (operands[0], operands[1]));
+	      return;
+	    }
+	}
+
       if (DEFAULT_ABI == ABI_V4
 	  && mode == Pmode && mode == SImode
 	  && flag_pic == 1 && got_operand (operands[1], mode))
@@ -10770,11 +10855,10 @@ rs6000_secondary_reload_memory (rtx addr
 		 & ~RELOAD_REG_AND_M16);
 
   /* If the register allocator hasn't made up its mind yet on the register
-     class to use, settle on defaults to use.  */
+     class to use, use the default address mask bits.  */
   else if (rclass == NO_REGS)
     {
-      addr_mask = (reg_addr[mode].addr_mask[RELOAD_REG_ANY]
-		   & ~RELOAD_REG_AND_M16);
+      addr_mask = reg_addr[mode].default_addr_mask;
 
       if ((addr_mask & RELOAD_REG_MULTIPLE) != 0)
 	addr_mask &= ~(RELOAD_REG_INDEXED
@@ -13074,7 +13158,7 @@ print_operand_address (FILE *file, rtx x
     fprintf (file, "0(%s)", reg_names[ REGNO (x) ]);
 
   /* Is it a pc-relative address?  */
-  else if (pcrel_address (x, Pmode))
+  else if (pcrel_local_address (x, Pmode) || pcrel_ext_address (x, Pmode))
     {
       HOST_WIDE_INT offset;
 
@@ -13094,6 +13178,9 @@ print_operand_address (FILE *file, rtx x
       if (offset)
 	fprintf (file, "%+" PRId64, offset);
 
+      if (SYMBOL_REF_P (x) && !SYMBOL_REF_LOCAL_P (x))
+	fputs ("@got", file);
+
       fputs ("@pcrel", file);
     }
   else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST
@@ -13579,29 +13666,68 @@ rs6000_pltseq_template (rtx *operands, i
   return str;
 }
 #endif
+\f
+/* Helper function to take a MODE and an ADDR_MASK and turn it into the
+   traditional instruction format (D/DS/DQ).  */
 
-/* Helper function to return whether a MODE can do prefixed loads/stores.
-   VOIDmode is used when we are loading the pc-relative address into a base
-   register, but we are not using it as part of a memory operation.  As modes
-   add support for prefixed memory, they will be added here.  */
-
-static bool
-mode_supports_prefixed_address_p (machine_mode mode)
+static trad_insn_type
+addr_mask_to_trad_insn (machine_mode mode, addr_mask_type addr_mask)
 {
-  return mode == VOIDmode;
+  const addr_mask_type flags = RELOAD_REG_MULTIPLE | RELOAD_REG_OFFSET;
+
+  /* If the mode does not support offset addressing directly, but it has
+     multiple registers, see if we can figure out a type that after splitting
+     the load/store, will be used (i.e. for a vector, use the element, for IBM
+     long double or TDmode use DFmode, etc.).  This is typically needed in the
+     early RTL stages before register allocation has been done.  */
+  if ((addr_mask & flags) == RELOAD_REG_MULTIPLE)
+    {
+      machine_mode inner = word_mode;
+
+      if (COMPLEX_MODE_P (mode))
+	{
+	  inner = GET_MODE_INNER (mode);
+	  if ((reg_addr[inner].default_addr_mask & RELOAD_REG_OFFSET) == 0)
+	    inner = word_mode;
+	}
+
+      if (FLOAT128_2REG_P (mode))
+	{
+	  inner = DFmode;
+	  if ((reg_addr[inner].default_addr_mask & RELOAD_REG_OFFSET) == 0)
+	    inner = word_mode;
+	}
+
+      addr_mask = reg_addr[inner].default_addr_mask;
+    }
+
+  if ((addr_mask & RELOAD_REG_OFFSET) == 0)
+    return TRAD_INSN_INVALID;
+
+  if ((addr_mask & RELOAD_REG_QUAD_OFFSET) != 0)
+    return TRAD_INSN_DQ;
+
+  if ((addr_mask & RELOAD_REG_DS_OFFSET) != 0)
+    return TRAD_INSN_DS;
+
+  return TRAD_INSN_D;
 }
 
 /* Function to return true if ADDR is a valid prefixed memory address that uses
-   mode MODE.  */
+   mode MODE, and the traditional instruction uses the TRAD_INSN format.  */
 
 bool
-rs6000_prefixed_address_mode_p (rtx addr, machine_mode mode)
+prefixed_local_addr_p (rtx addr,
+		       machine_mode mode,
+		       trad_insn_type trad_insn)
 {
-  if (!TARGET_PREFIXED_ADDR || !mode_supports_prefixed_address_p (mode))
+  /* Don't allow SDmode, because it only can be loaded into FPRs using LFIWZX
+     instruction.  */
+  if (!TARGET_PREFIXED_ADDR || mode == E_SDmode)
     return false;
 
-  /* Check for PC-relative addresses.  */
-  if (pcrel_address (addr, Pmode))
+  /* Check for local PC-relative addresses.  */
+  if (pcrel_local_address (addr, Pmode))
     return true;
 
   /* Check for prefixed memory addresses that have a large numeric offset,
@@ -13622,24 +13748,16 @@ rs6000_prefixed_address_mode_p (rtx addr
       if (!SIGNED_16BIT_OFFSET_P (value))
 	return true;
 
-      /* DQ instruction (bottom 4 bits must be 0) for vectors.  */
-      HOST_WIDE_INT mask;
-      if (GET_MODE_SIZE (mode) >= 16)
-	mask = 15;
-
-      /* DS instruction (bottom 2 bits must be 0).  For 32-bit integers, we
-	 need to use DS instructions if we are sign-extending the value with
-	 LWA.  For 32-bit floating point, we need DS instructions to load and
-	 store values to the traditional Altivec registers.  */
-      else if (GET_MODE_SIZE (mode) >= 4)
-	mask = 3;
+      /* If needed, figure out the traditional instruction format.  */
+      if (trad_insn == TRAD_INSN_DEFAULT)
+	trad_insn
+	  = addr_mask_to_trad_insn (mode, reg_addr[mode].default_addr_mask);
 
-      /* QImode/HImode has no restrictions.  */
-      else
-	return true;
+      if (trad_insn == TRAD_INSN_DS)
+	return (value & 3) != 0;
 
-      /* Return true if we must use a prefixed instruction.  */
-      return (value & mask) != 0;
+      if (trad_insn == TRAD_INSN_DQ)
+	return (value & 15) != 0;
     }
 
   return false;
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 274864)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -9877,6 +9877,28 @@ (define_expand "restore_stack_nonlocal"
   operands[6] = gen_rtx_PARALLEL (VOIDmode, p);
 })
 \f
+;; Load up a pc-relative address.  Print_operand_address will append a @pcrel
+;; to the symbol or label.
+(define_insn "pcrel_local_addr"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
+	(match_operand:DI 1 "pcrel_local_address"))]
+  "TARGET_PCREL"
+  "pla %0,%a1"
+  [(set_attr "length" "12")])
+
+;; Load up a pc-relative address to an external symbol.  If the symbol and the
+;; program are both defined in the main program, the linker will optimize this
+;; to a PADDI.  Otherwise, it will create a GOT address that is relocated by
+;; the dynamic linker and loaded up.  Print_operand_address will append a
+;; @got@pcrel to the symbol.
+(define_insn "pcrel_ext_addr"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
+	(match_operand:DI 1 "pcrel_ext_address"))]
+  "TARGET_PCREL"
+  "pld %0,%a1"
+  [(set_attr "length" "12")
+   (set_attr "type" "load")])
+
 ;; TOC register handling.
 
 ;; Code to initialize the TOC register...

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH, V3, #3 of 10], Add prefixed RTL insn attribute
  2019-08-26 19:21 PowerPC future machine, version 3 Michael Meissner
  2019-08-26 20:41 ` [PATCH V3, #1 of 10], Add basic pc-relative support Michael Meissner
@ 2019-08-26 21:07 ` Michael Meissner
  2019-08-30  1:58   ` Segher Boessenkool
  2019-08-26 21:12 ` [PATCH, V3, #2 of 10], Improve rs6000_setup_addr_mask Michael Meissner
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-08-26 21:07 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, segher, dje.gcc

This patch adds the basic RTL insn attribute for doing prefixed and
pc-relative addressing.

It is fairly similar to the version 1 patch #2.

This patch is an infrastructure patch that will be used by the next
patch (#4) to use prefixed loads and stores where it is appropriate.
In this patch, only the load pc-relative address operations will
actually be flagged as a prefixed instruction.

In this patch, I kept the changes in rs6000.c and I did not create a
new file rs6000-prefixed.c.  I used the 'trad_insn' type introduced in
v3 patch #1, instead of the insn_form that was used previously.

I changed the 3 functions used to return whether that the current insn
is prefixed (which are called from the prefixed RTL attribute) to use
the operands directly.  It checks whether the "indexed" and "update"
attributes are both "no" for load/store operands, and that there are at
least 2 operands.

In reworking this patch from the previous patch, I have changed the
method for forcing LWA to be a DS format instruction instead of D
format.

Since I reworked things and patch #1 now has the insns to load up both
local and external prefixed addresses, this patch modifies those two
insns to use the "prefixed" RTL attribute instead of manually printing
out the leading "p".

I made a minor change to patch #1 to deal with _Complex IBM long double
that I hadn't thought of when I was doing patch #1.

Aaron Sawdey had discovered that vector extracts with a variable
element number where the vector is a local pc-relative symbol (i.e. in
static), would fail because it only had one temporary register.  In
this patch, I just put an abort if the user tried to do this operation.
In a later patch (#6), I will have a patch that does not allow the
optimization of vector extract from memory to be combined with a
variable element number, and instead, it will load up the address into
a register, and do the extract with that address.

I have built the GCC compiler with this patch and the previous two
patches on a little endian power8 system.  It bootstrapped fine and
there no regressions in running make check.  Can I check this patch
into the trunk, once the previous two patches are checked in?

2019-08-26  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/rs6000-protos.h (prefixed_load_p): New
	declaration.
	(prefixed_store_p): New declaration.
	(prefixed_paddi_p): New declaration.
	(rs6000_asm_output_opcode): New declaration.
	(rs6000_final_prescan_insn): New declaration.
	* config/rs6000/rs6000.c (addr_mask_to_trad_insn): Rework to deal
	with things like _Complex __ibm128.
	(reg_to_trad_insn): New helper function.
	(prefixed_load_p): New function for prefixed memory.
	(prefixed_store_p): New function for prefixed memory.
	(prefixed_paddi_p): New function for prefixed memory.
	(next_insn_prefixed_p): New state static flag.
	(rs6000_final_prescan_insn): New function for prefixed memory.
	(rs6000_asm_output_opcode): New function for prifixed memory.
	* config/rs6000/rs6000.h (FINAL_PRESCAN_INSN): New target hook.
	(ASM_OUTPUT_OPCODE): New target hook.
	* config/rs6000/rs6000.md (prefixed RTL attribute): New attribute
	for prefixed memory support.
	(prefixed_length RTL attribute): New attribute for prefixed memory
	support.
	(non_prefixed_length RTL attribute): New attribute for prefixed
	memory support.
	(length RTL attribute): Use prefixed, prefixed_length, and
	non_prefixed_lengths to set the default instruction length.
	(pcrel_local_addr): Change to use the prefixed attribute.
	(pcrel_ext_addr): Change to use the prefixed attribute.

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 274870)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -170,6 +170,11 @@ typedef enum {
 } trad_insn_type;
 
 extern bool prefixed_local_addr_p (rtx, machine_mode, trad_insn_type);
+extern bool prefixed_load_p (rtx_insn *);
+extern bool prefixed_store_p (rtx_insn *);
+extern bool prefixed_paddi_p (rtx_insn *);
+extern void rs6000_asm_output_opcode (FILE *);
+void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 274871)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -13827,23 +13827,23 @@ addr_mask_to_trad_insn (machine_mode mod
      early RTL stages before register allocation has been done.  */
   if ((addr_mask & flags) == RELOAD_REG_MULTIPLE)
     {
-      machine_mode inner = word_mode;
+      machine_mode mode2 = mode;
 
-      if (COMPLEX_MODE_P (mode))
+      if (COMPLEX_MODE_P (mode2))
 	{
-	  inner = GET_MODE_INNER (mode);
-	  if ((reg_addr[inner].default_addr_mask & RELOAD_REG_OFFSET) == 0)
-	    inner = word_mode;
+	  machine_mode inner = GET_MODE_INNER (mode);
+	  if ((reg_addr[inner].default_addr_mask & RELOAD_REG_OFFSET) != 0)
+	    mode2 = inner;
 	}
 
-      if (FLOAT128_2REG_P (mode))
+      if (FLOAT128_2REG_P (mode2))
 	{
-	  inner = DFmode;
-	  if ((reg_addr[inner].default_addr_mask & RELOAD_REG_OFFSET) == 0)
-	    inner = word_mode;
+	  if ((reg_addr[E_DFmode].default_addr_mask & RELOAD_REG_OFFSET) != 0)
+	    mode = DFmode;
 	}
 
-      addr_mask = reg_addr[inner].default_addr_mask;
+      if (mode != mode2)
+	addr_mask = reg_addr[mode2].default_addr_mask;
     }
 
   if ((addr_mask & RELOAD_REG_OFFSET) == 0)
@@ -13858,6 +13858,49 @@ addr_mask_to_trad_insn (machine_mode mod
   return TRAD_INSN_D;
 }
 
+/* Helper function to take a REG and a MODE and turn it into the traditional
+   instruction format (D/DS/DQ) used for offset memory.  */
+
+static trad_insn_type
+reg_to_trad_insn (rtx reg, machine_mode mode)
+{
+  addr_mask_type addr_mask;
+
+  /* If it isn't a register, use the defaults.  */
+  if (!REG_P (reg) && !SUBREG_P (reg))
+    addr_mask = reg_addr[mode].default_addr_mask;
+
+  else
+    {
+      unsigned int r = reg_or_subregno (reg);
+
+      /* If we have a pseudo, use the default instruction format.  */
+      if (r >= FIRST_PSEUDO_REGISTER)
+	addr_mask = reg_addr[mode].default_addr_mask;
+
+      /* If we have a hard register, use the addr_mask of that hard
+	 register's reload register class.  */
+      else
+	{
+	  if (INT_REGNO_P (r))
+	    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_GPR];
+
+	  else if (FP_REGNO_P (r))
+	    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_FPR];
+
+	  else if (ALTIVEC_REGNO_P (r))
+	    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_VMX];
+
+	  /* Assume things like SPRs, CR, etc. will be loaded through the GPR
+	     registers.  */
+	  else
+	    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_GPR];
+	}
+    }
+
+  return addr_mask_to_trad_insn (mode, addr_mask);
+}
+
 /* Function to return true if ADDR is a valid prefixed memory address that uses
    mode MODE, and the traditional instruction uses the TRAD_INSN format.  */
 
@@ -13908,6 +13951,139 @@ prefixed_local_addr_p (rtx addr,
   return false;
 }
 \f
+/* Whether a load instruction is a prefixed instruction.  This is called from
+   the prefixed attribute processing.  */
+
+bool
+prefixed_load_p (rtx_insn *insn)
+{
+  /* Validate the insn to make sure it is a normal load insn.  */
+  extract_insn_cached (insn);
+  if (recog_data.n_operands < 2)
+    return false;
+
+  rtx reg = recog_data.operand[0];
+  rtx mem = recog_data.operand[1];
+
+  if (!REG_P (reg) && !SUBREG_P (reg))
+    return false;
+
+  if (!MEM_P (mem))
+    return false;
+
+  /* LWA uses the DS format instead of the D format that LWZ uses.  */
+  trad_insn_type trad_insn;
+  machine_mode reg_mode = GET_MODE (reg);
+  machine_mode mem_mode = GET_MODE (mem);
+
+  if (mem_mode == SImode && reg_mode == DImode
+      && get_attr_sign_extend (insn) == SIGN_EXTEND_YES)
+    trad_insn = TRAD_INSN_DS;
+
+  else
+    trad_insn = reg_to_trad_insn (reg, mem_mode);
+
+  return prefixed_local_addr_p (XEXP (mem, 0), mem_mode, trad_insn);
+}
+
+/* Whether a store instruction is a prefixed instruction.  This is called from
+   the prefixed attribute processing.  */
+
+bool
+prefixed_store_p (rtx_insn *insn)
+{
+  /* Validate the insn to make sure it is a normal store insn.  */
+  extract_insn_cached (insn);
+  if (recog_data.n_operands < 2)
+    return false;
+
+  rtx mem = recog_data.operand[0];
+  rtx reg = recog_data.operand[1];
+
+  if (!REG_P (reg) && !SUBREG_P (reg))
+    return false;
+
+  if (!MEM_P (mem))
+    return false;
+
+  machine_mode mem_mode = GET_MODE (mem);
+  trad_insn_type trad_insn = reg_to_trad_insn (reg, mem_mode);
+  return prefixed_local_addr_p (XEXP (mem, 0), mem_mode, trad_insn);
+}
+
+/* Whether a load immediate or add instruction is a prefixed instruction.  This
+   is called from the prefixed attribute processing.  */
+
+bool
+prefixed_paddi_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  /* Is this a load immediate that can't be done with a simple ADDI or
+     ADDIS?  */
+  if (CONST_INT_P (src))
+    return (satisfies_constraint_eI (src)
+	    && !satisfies_constraint_I (src)
+	    && !satisfies_constraint_L (src));
+
+  /* Is this a PADDI instruction that can't be done with a simple ADDI or
+     ADDIS?  */
+  if (GET_CODE (src) == PLUS)
+    {
+      rtx op1 = XEXP (src, 1);
+
+      return (CONST_INT_P (op1)
+	      && satisfies_constraint_eI (op1)
+	      && !satisfies_constraint_I (op1)
+	      && !satisfies_constraint_L (op1));
+    }
+
+  /* If not, is it a load of a pc-relative address?  */
+  if (!TARGET_PCREL || GET_MODE (dest) != Pmode)
+    return false;
+
+  if (!SYMBOL_REF_P (src) && !LABEL_REF_P (src) && GET_CODE (src) != CONST)
+    return false;
+
+  return (pcrel_local_address (src, Pmode) || pcrel_ext_address (src, Pmode));
+}
+
+\f
+/* Whether the next instruction needs a 'p' prefix issued before the
+   instruction is printed out.  */
+static bool next_insn_prefixed_p;
+
+/* Define FINAL_PRESCAN_INSN if some processing needs to be done before
+   outputting the assembler code.  On the PowerPC, we remember if the current
+   insn is a prefixed insn where we need to emit a 'p' before the insn.  */
+void
+rs6000_final_prescan_insn (rtx_insn *insn, rtx [], int)
+{
+  next_insn_prefixed_p = (get_attr_prefixed (insn) != PREFIXED_NO);
+  return;
+}
+
+/* Define ASM_OUTPUT_OPCODE to do anything special before emitting an opcode.
+   We use it to emit a 'p' for prefixed insns that is set in
+   FINAL_PRESCAN_INSN.  */
+void
+rs6000_asm_output_opcode (FILE *stream)
+{
+  if (next_insn_prefixed_p)
+    fputc ('p', stream);
+
+  return;
+}
+
+\f
 #if defined (HAVE_GAS_HIDDEN) && !TARGET_MACHO
 /* Emit an assembler directive to set symbol visibility for DECL to
    VISIBILITY_TYPE.  */
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 274864)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -2572,3 +2572,24 @@ typedef struct GTY(()) machine_function
   IN_RANGE ((VALUE),							\
 	    -(HOST_WIDE_INT_1 << 33),					\
 	    (HOST_WIDE_INT_1 << 33) - 1 - (EXTRA))
+
+/* Define this if some processing needs to be done before outputting the
+   assembler code.  On the PowerPC, we remember if the current insn is a normal
+   prefixed insn where we need to emit a 'p' before the insn.  */
+#define FINAL_PRESCAN_INSN(INSN, OPERANDS, NOPERANDS)			\
+do									\
+  {									\
+    if (TARGET_PREFIXED_ADDR)						\
+      rs6000_final_prescan_insn (INSN, OPERANDS, NOPERANDS);		\
+  }									\
+while (0)
+
+/* Do anything special before emitting an opcode.  We use it to emit a 'p' for
+   prefixed insns that is set in FINAL_PRESCAN_INSN.  */
+#define ASM_OUTPUT_OPCODE(STREAM, OPCODE)				\
+  do									\
+    {									\
+     if (TARGET_PREFIXED_ADDR)						\
+       rs6000_asm_output_opcode (STREAM);				\
+    }									\
+  while (0)
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 274870)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -258,8 +258,51 @@ (define_attr "var_shift" "no,yes"
 ;; Is copying of this instruction disallowed?
 (define_attr "cannot_copy" "no,yes" (const_string "no"))
 
-;; Length of the instruction (in bytes).
-(define_attr "length" "" (const_int 4))
+;; Whether an insn is a prefixed insn, and an initial 'p' should be printed
+;; before the instruction.  A prefixed instruction has a prefix instruction
+;; word that extends the immediate value of the instructions from 12-16 bits to
+;; 34 bits.  The macro ASM_OUTPUT_OPCODE emits a leading 'p' for prefixed
+;; insns.  The default "length" attribute will also be adjusted by default to
+;; be 12 bytes.
+(define_attr "prefixed" "no,yes"
+  (cond [(ior (match_test "!TARGET_PREFIXED_ADDR")
+	      (match_test "!NONJUMP_INSN_P (insn)"))
+	 (const_string "no")
+
+	 (eq_attr "type" "load,fpload,vecload")
+	 (if_then_else (and (eq_attr "indexed" "no")
+			    (eq_attr "update" "no")
+			    (match_test "prefixed_load_p (insn)"))
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "store,fpstore,vecstore")
+	 (if_then_else (and (eq_attr "indexed" "no")
+			    (eq_attr "update" "no")
+			    (match_test "prefixed_store_p (insn)"))
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "integer,add")
+	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))]
+	(const_string "no")))
+
+;; Length in bytes of instructions that use prefixed addressing and length in
+;; bytes of instructions that does not use prefixed addressing.  This allows
+;; both lengths to be defined as constants, and the length attribute can pick
+;; the size as appropriate.
+(define_attr "prefixed_length" "" (const_int 12))
+(define_attr "non_prefixed_length" "" (const_int 4))
+
+;; Length of the instruction (in bytes).  Prefixed insns are 8 bytes, but the
+;; assembler might issue need to issue a NOP so that the prefixed instruction
+;; does not cross a cache boundary, which makes them possibly 12 bytes.
+(define_attr "length" ""
+  (if_then_else (eq_attr "prefixed" "yes")
+		(attr "prefixed_length")
+		(attr "non_prefixed_length")))
 
 ;; Processor type -- this attribute must exactly match the processor_type
 ;; enumeration in rs6000-opts.h.
@@ -9883,8 +9926,8 @@ (define_insn "pcrel_local_addr"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
 	(match_operand:DI 1 "pcrel_local_address"))]
   "TARGET_PCREL"
-  "pla %0,%a1"
-  [(set_attr "length" "12")])
+  "la %0,%a1"
+  [(set_attr "prefixed" "yes")])
 
 ;; Load up a pc-relative address to an external symbol.  If the symbol and the
 ;; program are both defined in the main program, the linker will optimize this
@@ -9895,8 +9938,8 @@ (define_insn "pcrel_ext_addr"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
 	(match_operand:DI 1 "pcrel_ext_address"))]
   "TARGET_PCREL"
-  "pld %0,%a1"
-  [(set_attr "length" "12")
+  "ld %0,%a1"
+  [(set_attr "prefixed" "yes")
    (set_attr "type" "load")])
 
 ;; TOC register handling.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH, V3, #2 of 10], Improve rs6000_setup_addr_mask
  2019-08-26 19:21 PowerPC future machine, version 3 Michael Meissner
  2019-08-26 20:41 ` [PATCH V3, #1 of 10], Add basic pc-relative support Michael Meissner
  2019-08-26 21:07 ` [PATCH, V3, #3 of 10], Add prefixed RTL insn attribute Michael Meissner
@ 2019-08-26 21:12 ` Michael Meissner
  2019-08-29  2:59   ` Segher Boessenkool
  2019-08-26 21:23 ` [PATCH, V3, #4 of 10], Add general prefixed/pcrel support Michael Meissner
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-08-26 21:12 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, segher, dje.gcc

This patch is an optional patch.

In previous patches, people said that the rs6000_setup_addr_mask
function was hard to understand, exactly what address mask bits were
set.

This code attempts to make this clearer by moving the settings for
GPRs, FPRs, and traditional Altivec registers to separate functions.
Along the way, I discovered there were two things that arguably should
be changed.  In this patch, I opted to set the address masks to be the
same as the previous compiler, since this is supposed to be an optional
drop-in replacement.

The first weirdity is where systems without Altivec support still set
the address masks to inicate valid Altivec settings for the V1TImode
type, due to a missing test.  I have provided this fix as a separate
patch, and it is currently awaiting approval:
https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01432.html

The second issue is SDmode indicates that it can do PRE_INCREMENT,
PRE_DECREMENT, and PRE_MODIFY in the floating point registers.  It
can't since you need to use the LFIWZX instruction to load SDmode, and
that does not have an pre-increment format.  I was not able to make a
test case that actually failed with SDmode.  I opted to make my
comparison simpler by returning the same information that the current
compiler uses.  If you prefer, I can change it so the address mask does
not indicate that the mode can do pre increment, etc.

I tested this on a little endian power8 system, doing a bootstrap and
make check.  There were no regressions.  Can I check this into the
trunk once patch #1 has been checked in?

In addition, I did the same test as I did in patch #1, and verified
that all of the address masks were the same:

| In addition, I built a big endian cross compiler and I used
| -mdebug=reg to dump out the address masks for each of the modes that
| we care about.  I did:
|
|	-mcpu=power5/power6/power7/power8/power9/future for big endian -m32
|	-mcpu=power5/power6/power7/power8/power9/future for big endian -m64
|	-mcpu=power8/power9/future for little endian -m64 -mabi=elfv2
|
|
| All of the cpu/endian/bit-size targets use the same address masks as
| before except for little endian future.  There the -mdebug=reg code
| prints whether a mode needs to use the DS mode.  I didn't print the
| DS information earlier, so that I could more easily do the
| comparison.  For the future/le combinations, I verified that each of
| the modes that says it needs to use a DS instruction format, did
| actually have that requirement for the traditional instruction.

2019-08-26  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/rs6000.c (FIRST_RELOAD_REG_CLASS): Delete.
	(LAST_RELOAD_REG_CLASS): Delete.
	(mode_uses_full_vector_reg): New helper function.
	(setup_reg_addr_masks_pre_incdec): New helper function.
	(setup_reg_addr_masks_gpr): New helper function.
	(setup_reg_addr_masks_fpr): New helper function.
	(setup_reg_addr_masks_altivec): New helper function.
	(rs6000_setup_reg_addr_masks): Move most of the code into the 3
	specific helper functions that deals with the specific address
	masks for GPRs, FPRs, and traditional Altivec registers.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 274870)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -328,12 +328,6 @@ enum rs6000_reload_reg_type {
   N_RELOAD_REG
 };
 
-/* For setting up register classes, loop through the 3 register classes mapping
-   into real registers, and skip the ANY class, which is just an OR of the
-   bits.  */
-#define FIRST_RELOAD_REG_CLASS	RELOAD_REG_GPR
-#define LAST_RELOAD_REG_CLASS	RELOAD_REG_VMX
-
 /* Map reload register type to a register in the register class.  */
 struct reload_reg_map_type {
   const char *name;			/* Register class name.  */
@@ -2532,156 +2526,304 @@ rs6000_debug_reg_global (void)
 }
 
 \f
-/* Update the addr mask bits in reg_addr to help secondary reload and go if
-   legitimate address support to figure out the appropriate addressing to
-   use.  */
+/* Return true if the mode is a type that uses the full vector register (like
+   V2DImode or KFmode).  Do not return true for 128-bit types like TDmode or
+   IFmode.  */
 
-static void
-rs6000_setup_reg_addr_masks (void)
+static bool
+mode_uses_full_vector_reg (machine_mode mode)
 {
-  ssize_t rc, reg, m, nregs;
-  addr_mask_type any_addr_mask, addr_mask;
+  if (GET_MODE_SIZE (mode) < 16)
+    return false;
 
-  for (m = 0; m < NUM_MACHINE_MODES; ++m)
+  if (TARGET_VSX)
+    return (VECTOR_MODE_P (mode)
+	    || FLOAT128_VECTOR_P (mode)
+	    || mode == TImode);
+
+  if (TARGET_ALTIVEC)
+    return ALTIVEC_VECTOR_MODE (mode);
+
+  return false;
+}
+
+/* Figure out if we can do PRE_INC, PRE_DEC, or PRE_MODIFY addressing for a
+   given MODE.  If we allow scalars into Altivec registers, don't allow
+   PRE_INC, PRE_DEC, or PRE_MODIFY.
+
+   For VSX systems, we don't allow update addressing for DFmode/SFmode if those
+   registers can go in both the traditional floating point registers and
+   Altivec registers.  The load/store instructions for the Altivec registers do
+   not have update forms.  If we allowed update addressing, it seems to break
+   IV-OPT code using floating point if the index type is int instead of long
+   (PR target/81550 and target/84042).  */
+
+/* Return the address mask bits for whether we allow PRE_INCREMENT,
+   PRE_DECREMENT, and PRE_MODIFY for a given MODE.  */
+
+static addr_mask_type
+setup_reg_addr_masks_pre_incdec (machine_mode mode)
+{
+  addr_mask_type addr_mask = 0;
+
+  if (TARGET_UPDATE
+      && GET_MODE_SIZE (mode) <= 8
+      && !VECTOR_MODE_P (mode)
+      && !FLOAT128_VECTOR_P (mode)
+      && !COMPLEX_MODE_P (mode)
+      && (mode != E_DFmode || !TARGET_VSX)
+      && (mode != E_SFmode || !TARGET_P8_VECTOR))
+    {
+      addr_mask |= RELOAD_REG_PRE_INCDEC;
+
+      /* PRE_MODIFY is more restricted than PRE_INC/PRE_DEC in that we don't
+	 allow PRE_MODIFY for some multi-register operations.  */
+      switch (mode)
+	{
+	default:
+	  addr_mask |= RELOAD_REG_PRE_MODIFY;
+	  break;
+
+	case E_DImode:
+	  if (TARGET_POWERPC64)
+	    addr_mask |= RELOAD_REG_PRE_MODIFY;
+	  break;
+
+	case E_DFmode:
+	case E_DDmode:
+	  if (TARGET_HARD_FLOAT)
+	    addr_mask |= RELOAD_REG_PRE_MODIFY;
+	  break;
+	}
+    }
+
+  return addr_mask;
+}
+
+/* Helper function for rs6000_setup_reg_addr_masks to set up the address masks
+   for GPR registers.  */
+
+static addr_mask_type
+setup_reg_addr_masks_gpr (machine_mode mode)
+{
+  addr_mask_type addr_mask = 0;
+
+  /* Can mode values go in the GPR registers?  */
+  if (rs6000_hard_regno_mode_ok_p[mode][FIRST_GPR_REGNO])
     {
-      machine_mode m2 = (machine_mode) m;
+      size_t mode_size = GET_MODE_SIZE (mode);
+      size_t reg_size = TARGET_POWERPC64 ? 8 : 4;
+      machine_mode mode_inner = mode;
       bool complex_p = false;
-      bool small_int_p = (m2 == QImode || m2 == HImode || m2 == SImode);
-      size_t msize;
 
-      if (COMPLEX_MODE_P (m2))
+      if (COMPLEX_MODE_P (mode_inner))
 	{
 	  complex_p = true;
-	  m2 = GET_MODE_INNER (m2);
+	  mode_inner = GET_MODE_INNER (mode_inner);
 	}
 
-      msize = GET_MODE_SIZE (m2);
+      size_t mode_size_inner = GET_MODE_SIZE (mode_inner);
+      ssize_t nregs = rs6000_hard_regno_nregs[mode][FIRST_GPR_REGNO];
+
+      /* Indicate if the mode takes more than 1 physical register.  If it takes
+	 a single register, indicate it can do REG+REG addressing.  */
+      if (nregs > 1 || mode == BLKmode || complex_p)
+	addr_mask |= RELOAD_REG_MULTIPLE;
+      else if (mode_size <= reg_size)
+	addr_mask |= RELOAD_REG_INDEXED;
+
+      /* GPR registers can do REG+OFFSET addressing for small scalar types.
+	 For vectors, VSX registers can do REG+OFFSET addresssing if ISA 3.0
+	 instructions are enabled.  The offset for 128-bit VSX registers is
+	 only 12-bits.  While GPRs can handle the full offset range, VSX
+	 registers can only handle the restricted range.
+
+	 SDmode is special in that we want to access it only via REG+REG
+	 addressing on power7 and above.  This is because the natural way to
+	 load a SDmode is to use the indexed LFIWZX instruction.  In power6, we
+	 had to load it up in a GPR, store it on the stack and then load it up
+	 into a FPR.  Don't allow SDmode to use offset addressing in power7 or
+	 later, even in GPRs (to prevent the register allocator from using a
+	 GPR to load the value).  */
+
+      bool indexed_only = (mode == SDmode && TARGET_LFIWZX);
+
+      if (!indexed_only
+	  && (mode_size_inner <= 8
+	      || (mode_size_inner == 16 && TARGET_P9_VECTOR
+		  && mode_uses_full_vector_reg (mode_inner))))
+	{
+	  addr_mask |= RELOAD_REG_OFFSET;
 
-      /* SDmode is special in that we want to access it only via REG+REG
-	 addressing on power7 and above, since we want to use the LFIWZX and
-	 STFIWZX instructions to load it.  */
-      bool indexed_only_p = (m == SDmode && TARGET_NO_SDMODE_STACK);
+	  /* Set the DS format bit if we have 64-bit loads/stores on a 64-bit
+	     system.  */
+	  if (TARGET_POWERPC64 && mode_size_inner >= 8)
+	    addr_mask |= RELOAD_REG_DS_OFFSET;
+	}
+
+      /* Do we support pre_increment, pre_decrement, or pre_modify?  */
+      addr_mask |= setup_reg_addr_masks_pre_incdec (mode);
+
+      /* Set the valid bit.  */
+      addr_mask |= RELOAD_REG_VALID;
+    }
+
+  return addr_mask;
+}
 
-      any_addr_mask = 0;
-      for (rc = FIRST_RELOAD_REG_CLASS; rc <= LAST_RELOAD_REG_CLASS; rc++)
+/* Helper function for rs6000_setup_reg_addr_masks to set up the address masks
+   for traditional FPR registers.  */
+
+static addr_mask_type
+setup_reg_addr_masks_fpr (machine_mode mode)
+{
+  addr_mask_type addr_mask = 0;
+
+  /* Can mode values go in the FPR registers?  */
+  if (rs6000_hard_regno_mode_ok_p[mode][FIRST_FPR_REGNO])
+    {
+      size_t mode_size = GET_MODE_SIZE (mode);
+      machine_mode mode_inner = mode;
+      bool complex_p = false;
+
+      if (COMPLEX_MODE_P (mode_inner))
 	{
-	  addr_mask = 0;
-	  reg = reload_reg_map[rc].reg;
+	  complex_p = true;
+	  mode_inner = GET_MODE_INNER (mode_inner);
+	}
 
-	  /* Can mode values go in the GPR/FPR/Altivec registers?  */
-	  if (reg >= 0 && rs6000_hard_regno_mode_ok_p[m][reg])
-	    {
-	      bool small_int_vsx_p = (small_int_p
-				      && (rc == RELOAD_REG_FPR
-					  || rc == RELOAD_REG_VMX));
-
-	      nregs = rs6000_hard_regno_nregs[m][reg];
-	      addr_mask |= RELOAD_REG_VALID;
-
-	      /* Indicate if the mode takes more than 1 physical register.  If
-		 it takes a single register, indicate it can do REG+REG
-		 addressing.  Small integers in VSX registers can only do
-		 REG+REG addressing.  */
-	      if (small_int_vsx_p)
-		addr_mask |= RELOAD_REG_INDEXED;
-	      else if (nregs > 1 || m == BLKmode || complex_p)
-		addr_mask |= RELOAD_REG_MULTIPLE;
-	      else
-		addr_mask |= RELOAD_REG_INDEXED;
-
-	      /* Figure out if we can do PRE_INC, PRE_DEC, or PRE_MODIFY
-		 addressing.  If we allow scalars into Altivec registers,
-		 don't allow PRE_INC, PRE_DEC, or PRE_MODIFY.
-
-		 For VSX systems, we don't allow update addressing for
-		 DFmode/SFmode if those registers can go in both the
-		 traditional floating point registers and Altivec registers.
-		 The load/store instructions for the Altivec registers do not
-		 have update forms.  If we allowed update addressing, it seems
-		 to break IV-OPT code using floating point if the index type is
-		 int instead of long (PR target/81550 and target/84042).  */
-
-	      if (TARGET_UPDATE
-		  && (rc == RELOAD_REG_GPR || rc == RELOAD_REG_FPR)
-		  && msize <= 8
-		  && !VECTOR_MODE_P (m2)
-		  && !FLOAT128_VECTOR_P (m2)
-		  && !complex_p
-		  && (m != E_DFmode || !TARGET_VSX)
-		  && (m != E_SFmode || !TARGET_P8_VECTOR)
-		  && !small_int_vsx_p)
-		{
-		  addr_mask |= RELOAD_REG_PRE_INCDEC;
+      size_t mode_size_inner = GET_MODE_SIZE (mode_inner);
+      ssize_t nregs = rs6000_hard_regno_nregs[mode][FIRST_FPR_REGNO];
 
-		  /* PRE_MODIFY is more restricted than PRE_INC/PRE_DEC in that
-		     we don't allow PRE_MODIFY for some multi-register
-		     operations.  */
-		  switch (m)
-		    {
-		    default:
-		      addr_mask |= RELOAD_REG_PRE_MODIFY;
-		      break;
-
-		    case E_DImode:
-		      if (TARGET_POWERPC64)
-			addr_mask |= RELOAD_REG_PRE_MODIFY;
-		      break;
-
-		    case E_DFmode:
-		    case E_DDmode:
-		      if (TARGET_HARD_FLOAT)
-			addr_mask |= RELOAD_REG_PRE_MODIFY;
-		      break;
-		    }
-		}
-	    }
+      /* Indicate if the mode takes more than 1 physical register.  If it takes
+	 a single register, indicate it can do REG+REG addressing.  */
+      if (nregs > 1 || mode == BLKmode || complex_p)
+	addr_mask |= RELOAD_REG_MULTIPLE;
+
+      else if (mode == SFmode || mode_size == 8
+	       || mode_uses_full_vector_reg (mode_inner)
+	       || (TARGET_LFIWAX && (mode == SImode || mode == SDmode))
+	       || (TARGET_P9_VECTOR && (mode == QImode || mode == HImode)))
+	addr_mask |= RELOAD_REG_INDEXED;
 
-	  /* GPR and FPR registers can do REG+OFFSET addressing, except
-	     possibly for SDmode.  ISA 3.0 (i.e. power9) adds D-form addressing
-	     for 64-bit scalars and 32-bit SFmode to altivec registers.  */
-	  if ((addr_mask != 0) && !indexed_only_p
-	      && msize <= 8
-	      && (rc == RELOAD_REG_GPR
-		  || ((msize == 8 || m2 == SFmode)
-		      && (rc == RELOAD_REG_FPR
-			  || (rc == RELOAD_REG_VMX && TARGET_P9_VECTOR)))))
-	    addr_mask |= RELOAD_REG_OFFSET;
-
-	  /* VSX registers can do REG+OFFSET addresssing if ISA 3.0
-	     instructions are enabled.  The offset for 128-bit VSX registers is
-	     only 12-bits.  While GPRs can handle the full offset range, VSX
-	     registers can only handle the restricted range.  */
-	  else if ((addr_mask != 0) && !indexed_only_p
-		   && msize == 16 && TARGET_P9_VECTOR
-		   && (ALTIVEC_OR_VSX_VECTOR_MODE (m2)
-		       || (m2 == TImode && TARGET_VSX)))
-	    {
-	      addr_mask |= RELOAD_REG_OFFSET;
-	      if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
-		addr_mask |= RELOAD_REG_QUAD_OFFSET;
-	    }
+      /* FPR registers can do REG+OFFSET addressing for SFmode/DFmode.  */
+      if (mode_inner == SFmode || mode_size_inner == 8)
+	{
+	  addr_mask |= RELOAD_REG_OFFSET;
 
-	  /* VMX registers can do (REG & -16) and ((REG+REG) & -16)
-	     addressing on 128-bit types.  */
-	  if (rc == RELOAD_REG_VMX && msize == 16
-	      && (addr_mask & RELOAD_REG_VALID) != 0)
-	    addr_mask |= RELOAD_REG_AND_M16;
-
-	  /* 64-bit and larger values on GPRs need DS format instructions.  All
-	     non-vector offset instructions in Altivec registers need the DS
-	     format instructions.  */
-	  const addr_mask_type quad_flags = (RELOAD_REG_OFFSET
-					     | RELOAD_REG_QUAD_OFFSET);
-
-	  if ((addr_mask & quad_flags) == RELOAD_REG_OFFSET
-	      && ((rc == RELOAD_REG_GPR && msize >= 8 && TARGET_POWERPC64)
-		  || (rc == RELOAD_REG_VMX)))
-	    addr_mask |= RELOAD_REG_DS_OFFSET;
+	  /* Do we support pre_increment, pre_decrement, or pre_modify?  */
+	  addr_mask |= setup_reg_addr_masks_pre_incdec (mode);
+	}
+
+      /* It is weird that previous versions of GCC supported pre increment,
+	 etc. forms of addressing for SDmode, when you could only use an
+	 indexed instruction, but allow it for now.  Previous versions of GCC
+	 also set the indexed flag for SDmode, even though there was no direct
+	 instruction to load it.  */
+      else if (mode_inner == SDmode)
+	addr_mask |= (RELOAD_REG_INDEXED
+		      | RELOAD_REG_PRE_INCDEC
+		      | RELOAD_REG_PRE_MODIFY);
+
+      /* FPR registers can do REG+OFFSET addresssing for vectors if ISA 3.0
+	 instructions are enabled.  The offset for 128-bit VSX registers is
+	 only 12-bits.  */
+      else if (TARGET_P9_VECTOR && mode_uses_full_vector_reg (mode_inner))
+	addr_mask |= RELOAD_REG_OFFSET | RELOAD_REG_QUAD_OFFSET;
+
+      /* Set valid bit.  */
+      addr_mask |= RELOAD_REG_VALID;
+    }
+
+  return addr_mask;
+}
+
+/* Helper function for rs6000_setup_reg_addr_masks to set up the address masks
+   for traditional Altivec registers.  */
+
+static addr_mask_type
+setup_reg_addr_masks_altivec (machine_mode mode)
+{
+  addr_mask_type addr_mask = 0;
 
-	  reg_addr[m].addr_mask[rc] = addr_mask;
-	  any_addr_mask |= (addr_mask & ~RELOAD_REG_AND_M16);
+  /* Can mode values go in the Altivec registers?  */
+  if (rs6000_hard_regno_mode_ok_p[mode][FIRST_ALTIVEC_REGNO])
+    {
+      size_t mode_size = GET_MODE_SIZE (mode);
+      machine_mode mode_inner = mode;
+      bool complex_p = false;
+
+      if (COMPLEX_MODE_P (mode_inner))
+	{
+	  complex_p = true;
+	  mode_inner = GET_MODE_INNER (mode_inner);
 	}
 
+      size_t mode_size_inner = GET_MODE_SIZE (mode_inner);
+      bool vector_p = mode_uses_full_vector_reg (mode_inner);
+      ssize_t nregs = rs6000_hard_regno_nregs[mode][FIRST_ALTIVEC_REGNO];
+
+      /* Indicate if the mode takes more than 1 physical register.  If it takes
+	 a single register, indicate it can do REG+REG addressing.  */
+      if (nregs > 1 || mode == BLKmode || complex_p)
+	addr_mask |= RELOAD_REG_MULTIPLE;
+
+      else if (mode == SFmode || mode_size == 8 || vector_p
+	       || (TARGET_P8_VECTOR && mode == SImode)
+	       || (TARGET_P9_VECTOR && (mode == QImode || mode == HImode)))
+	addr_mask |= RELOAD_REG_INDEXED;
+
+      /* Starting with ISA 3.0, Altivec registers can do REG+OFFSET addressing
+	 for SFmode/DFmode.  Vectors also support REG+OFFSET, but the offset is
+	 limited to DQ format unless prefixed memory instructions are used.
+	 Pre increment/decrement/modify is not supported.
+
+	 All Altivec scalar load/store instructions with an offset use the DS
+	 format.  */
+      if (TARGET_P9_VECTOR)
+	{
+	  if (mode_inner == SFmode || mode_size_inner == 8)
+	    addr_mask |= RELOAD_REG_OFFSET | RELOAD_REG_DS_OFFSET;
+
+	  else if (vector_p)
+	    addr_mask |= RELOAD_REG_OFFSET | RELOAD_REG_QUAD_OFFSET;
+	}
+
+      /* Vectors can use Altivec memory instructions to support omitting the
+	 bottom 4 bits in addition to normal indexed addressing.  */
+      if (vector_p)
+	addr_mask |= RELOAD_REG_AND_M16;
+
+      /* Set valid bit.  */
+      addr_mask |= RELOAD_REG_VALID;
+    }
+
+  return addr_mask;
+}
+
+/* Update the addr mask bits in reg_addr to help secondary reload and go if
+   legitimate address support to figure out the appropriate addressing to
+   use.  */
+
+static void
+rs6000_setup_reg_addr_masks (void)
+{
+  for (ssize_t m = 0; m < NUM_MACHINE_MODES; ++m)
+    {
+      machine_mode mode = (machine_mode)m;
+
+      addr_mask_type addr_mask = setup_reg_addr_masks_gpr (mode);
+      addr_mask_type any_addr_mask = addr_mask;
+      reg_addr[m].addr_mask[RELOAD_REG_GPR] = addr_mask;
+
+      addr_mask = setup_reg_addr_masks_fpr (mode);
+      any_addr_mask |= addr_mask;
+      reg_addr[m].addr_mask[RELOAD_REG_FPR] = addr_mask;
+
+      addr_mask = setup_reg_addr_masks_altivec (mode);
+      any_addr_mask |= (addr_mask & ~RELOAD_REG_AND_M16);
+      reg_addr[m].addr_mask[RELOAD_REG_VMX] = addr_mask;
+
       reg_addr[m].any_addr_mask = any_addr_mask;
 
       /* Figure out what the default reload register set that should be used
@@ -2701,15 +2843,18 @@ rs6000_setup_reg_addr_masks (void)
 	rc_order[rc_max++] = RELOAD_REG_VMX;
 
       /* Normal vectors and software IEEE 128-bit can use either floating point
-	 registers or Altivec registers.  */
-      else if (TARGET_VSX && (VECTOR_MODE_P (m) || FLOAT128_IEEE_P (m)))
+	 registers or Altivec registers.  Don't favor TImode for vector
+	 registers at this time.  */
+      else if (TARGET_VSX && m != E_TImode
+	       && mode_uses_full_vector_reg ((machine_mode) m))
 	{
 	  rc_order[rc_max++] = RELOAD_REG_FPR;
 	  rc_order[rc_max++] = RELOAD_REG_VMX;
 	}
 
       /* Altivec only vectors use the Altivec registers.  */
-      else if (TARGET_ALTIVEC && !TARGET_VSX && VECTOR_MODE_P (m))
+      else if (TARGET_ALTIVEC && !TARGET_VSX
+	       && mode_uses_full_vector_reg ((machine_mode) m))
 	rc_order[rc_max++] = RELOAD_REG_VMX;
 
       /* For scalar binary/decimal floating point, prefer FPRs over altivec

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH, V3, #4 of 10], Add general prefixed/pcrel support
  2019-08-26 19:21 PowerPC future machine, version 3 Michael Meissner
                   ` (2 preceding siblings ...)
  2019-08-26 21:12 ` [PATCH, V3, #2 of 10], Improve rs6000_setup_addr_mask Michael Meissner
@ 2019-08-26 21:23 ` Michael Meissner
  2019-08-30 19:22   ` Segher Boessenkool
  2019-08-26 21:43 ` [PATCH, V3, #5 of 10], Make -mpcrel default on little endian Linux systems Michael Meissner
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-08-26 21:23 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, segher, dje.gcc

This patch (V3 patch #4) is a rework of the V1 patches #3 and #4.  It
adds support to generate prefixed (and local pc-relative) instructions
for all modes, except SDmode.  SDmode can't be used with a prefixed
offset instruction, because the default method to load up a SDmode
value is to use the LFIWZX instruction, which only has an indexed
format.

For the stack_protect_setdi and stack_protect_testdi insns, I reworked
them so that the expander will copy the prefixed memory address to a
register and use the indexed instruction format.  I added new
predicates to make sure nothing re-combined the insn to form a prefixed
insns.

I changed the logic previously using insn_form to now use trad_insn.

I think in the previoius patch, I mispoke, in that the logic for
pc-relative vector extract is here, and not in the previous patch.

I have built a bootstrap compiler on a little endian power8 system, and
there were no regressions when I ran make check.  Once the previous
patches are checked in, can I check in this patch?

2019-08-26  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/predicates.md (add_operand): Add support for the
	PADDI instruction.
	(non_add_cint_operand): Add support for the PADDI instruction.
	(lwa_operand): Add support for the PLWA instruction.
	(non_prefixed_mem_operand): New predicate.
	* config/rs6000/rs6000-protos.h (make_memory_non_prefixed): New
	declaration.
	* config/rs6000/rs6000.c (num_insns_constant_gpr): Add support for
	the PADDI instruction.
	(rs6000_adjust_vec_address): Add support for optimizing prefixed
	and pc-relative extracts with constant extraction elements.  Add a
	failure when we use pc-relative addressing and non-constant
	extraction elements.  Use SIGNED_16BIT_OFFSET_P.
	(quad_address_p): Add support for prefixed memory instructions.
	(mem_operand_gpr): Add support for prefixed memory instructions.
	Use SIGNED_16BIT_OFFSET_EXTRA_P.
	(mem_operand_ds_form): Add support for prefixed memory
	instructions.  Use SIGNED_16BIT_OFFSET_EXTRA_P.
	(rs6000_legitimate_offset_address_p): Add support for prefixed
	memory instructions.
	(rs6000_legitimate_address_p): Add support for prefixed memory
	instructions.
	(rs6000_mode_dependent_address): Add support for prefixed memory
	instructions.
	(make_memory_non_prefixed): New function.
	(prefixed_paddi_p): Fix thinkos in last patch.
	(rs6000_rtx_costs): Add support for the PADDI instruction.
	(rs6000_num_insns): Don't treat prefixed instructions as being
	slower because they have a larger length.
	(rs6000_insn_cost): Call rs6000_num_insns.
	* config/rs6000/rs6000.md (add<mode>3): Add support for the PADDI
	instruction.
	(movsi_low): Add support for the PADDI instruction.
	(movsi const int splitter): Add support for the PADDI
	instruction.
	(mov<mode>_64bit_dm): Add support for prefixed memory
	instructions. Split alternatives that had merged loading a
	constant with register moves.
	(movtd_64bit_nodm): Add support for prefixed memory instructions.
	(movdi_internal64): Add support for prefixed memory instructions.
	(movdi const int splitter): Add comment.
	(mov<mode>_ppc64): Add support for prefixed memory instructions.
	(stack_protect_setdi): Do not allow prefixed instructions.
	(stack_protect_testdi): Do not allow prefixed instructions.
	* config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
	prefixed memory instructions.

Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 274870)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -839,7 +839,8 @@ (define_special_predicate "indexed_addre
 (define_predicate "add_operand"
   (if_then_else (match_code "const_int")
     (match_test "satisfies_constraint_I (op)
-		 || satisfies_constraint_L (op)")
+		 || satisfies_constraint_L (op)
+		 || satisfies_constraint_eI (op)")
     (match_operand 0 "gpc_reg_operand")))
 
 ;; Return 1 if the operand is either a non-special register, or 0, or -1.
@@ -852,7 +853,8 @@ (define_predicate "adde_operand"
 (define_predicate "non_add_cint_operand"
   (and (match_code "const_int")
        (match_test "!satisfies_constraint_I (op)
-		    && !satisfies_constraint_L (op)")))
+		    && !satisfies_constraint_L (op)
+		    && !satisfies_constraint_eI (op)")))
 
 ;; Return 1 if the operand is a constant that can be used as the operand
 ;; of an AND, OR or XOR.
@@ -933,6 +935,13 @@ (define_predicate "lwa_operand"
     return false;
 
   addr = XEXP (inner, 0);
+
+  /* The LWA instruction uses the DS-form format where the bottom two bits of
+     the offset must be 0.  The prefixed PLWA does not have this
+     restriction.  */
+  if (prefixed_local_addr_p (addr, mode, TRAD_INSN_DS))
+    return true;
+
   if (GET_CODE (addr) == PRE_INC
       || GET_CODE (addr) == PRE_DEC
       || (GET_CODE (addr) == PRE_MODIFY
@@ -1686,6 +1695,17 @@ (define_predicate "pcrel_ext_address"
   return (SYMBOL_REF_P (op) && !SYMBOL_REF_LOCAL_P (op));
 })
 
+;; Return 1 if op is a memory operand that is not prefixed.
+(define_predicate "non_prefixed_mem_operand"
+  (match_code "mem")
+{
+  if (!memory_operand (op, mode))
+    return false;
+
+  return !prefixed_local_addr_p (XEXP (op, 0), GET_MODE (op),
+				 TRAD_INSN_DEFAULT);
+})
+
 ;; Match the first insn (addis) in fusing the combination of addis and loads to
 ;; GPR registers on power8.
 (define_predicate "fusion_gpr_addis"
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 274872)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -170,6 +170,7 @@ typedef enum {
 } trad_insn_type;
 
 extern bool prefixed_local_addr_p (rtx, machine_mode, trad_insn_type);
+extern rtx make_memory_non_prefixed (rtx);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 274872)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5727,7 +5727,7 @@ static int
 num_insns_constant_gpr (HOST_WIDE_INT value)
 {
   /* signed constant loadable with addi */
-  if (((unsigned HOST_WIDE_INT) value + 0x8000) < 0x10000)
+  if (SIGNED_16BIT_OFFSET_P (value))
     return 1;
 
   /* constant loadable with addis */
@@ -5735,6 +5735,10 @@ num_insns_constant_gpr (HOST_WIDE_INT va
 	   && (value >> 31 == -1 || value >> 31 == 0))
     return 1;
 
+  /* PADDI can support up to 34 bit signed integers.  */
+  else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (value))
+    return 1;
+
   else if (TARGET_POWERPC64)
     {
       HOST_WIDE_INT low  = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
@@ -6905,6 +6909,7 @@ rs6000_adjust_vec_address (rtx scalar_re
   rtx element_offset;
   rtx new_addr;
   bool valid_addr_p;
+  bool pcrel_p = TARGET_PCREL && pcrel_local_address (addr, Pmode);
 
   /* Vector addresses should not have PRE_INC, PRE_DEC, or PRE_MODIFY.  */
   gcc_assert (GET_RTX_CLASS (GET_CODE (addr)) != RTX_AUTOINC);
@@ -6942,6 +6947,41 @@ rs6000_adjust_vec_address (rtx scalar_re
   else if (REG_P (addr) || SUBREG_P (addr))
     new_addr = gen_rtx_PLUS (Pmode, addr, element_offset);
 
+
+  /* Optimize pc-relative addresses.  */
+  else if (pcrel_p)
+    {
+      if (CONST_INT_P (element_offset))
+	{
+	  rtx addr2 = addr;
+	  HOST_WIDE_INT offset = INTVAL (element_offset);
+
+	  if (GET_CODE (addr2) == CONST)
+	    addr2 = XEXP (addr2, 0);
+
+	  if (GET_CODE (addr2) == PLUS)
+	    {
+	      offset += INTVAL (XEXP (addr2, 1));
+	      addr2 = XEXP (addr2, 0);
+	    }
+
+	  gcc_assert (SIGNED_34BIT_OFFSET_P (offset));
+	  if (offset)
+	    {
+	      addr2 = gen_rtx_PLUS (Pmode, addr2, GEN_INT (offset));
+	      new_addr = gen_rtx_CONST (Pmode, addr2);
+	    }
+	  else
+	    new_addr = addr2;
+	}
+
+      /* Right now, the pc-relative support needs to be re-thought if you have
+	 a pc-relative address and a variable extract, due to having only have
+	 one base register tmp to use.  Fail until this is rewritten.  */
+      else
+	gcc_unreachable ();
+    }
+
   /* Optimize D-FORM addresses with constant offset with a constant element, to
      include the element offset in the address directly.  */
   else if (GET_CODE (addr) == PLUS)
@@ -6956,8 +6996,11 @@ rs6000_adjust_vec_address (rtx scalar_re
 	  HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset);
 	  rtx offset_rtx = GEN_INT (offset);
 
-	  if (IN_RANGE (offset, -32768, 32767)
-	      && (scalar_size < 8 || (offset & 0x3) == 0))
+	  if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (offset))
+	    new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+	  else if (SIGNED_16BIT_OFFSET_P (offset)
+		   && (scalar_size < 8 || (offset & 0x3) == 0))
 	    new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
 	  else
 	    {
@@ -7007,9 +7050,8 @@ rs6000_adjust_vec_address (rtx scalar_re
 
   /* If we have a PLUS, we need to see whether the particular register class
      allows for D-FORM or X-FORM addressing.  */
-  if (GET_CODE (new_addr) == PLUS)
+  if (GET_CODE (new_addr) == PLUS || pcrel_p)
     {
-      rtx op1 = XEXP (new_addr, 1);
       addr_mask_type addr_mask;
       unsigned int scalar_regno = reg_or_subregno (scalar_reg);
 
@@ -7026,7 +7068,10 @@ rs6000_adjust_vec_address (rtx scalar_re
       else
 	gcc_unreachable ();
 
-      if (REG_P (op1) || SUBREG_P (op1))
+      if (pcrel_p)
+	valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0;
+      else if (REG_P (XEXP (new_addr, 1))
+	       || SUBREG_P (XEXP (new_addr, 1)))
 	valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0;
       else
 	valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0;
@@ -7454,6 +7499,13 @@ quad_address_p (rtx addr, machine_mode m
   if (VECTOR_MODE_P (mode) && !mode_supports_dq_form (mode))
     return false;
 
+  /* Is this a valid prefixed address?  If the bottom four bits of the offset
+     are non-zero, we could use a prefixed instruction (which does not have the
+     DQ-form constraint that the traditional instruction had) instead of
+     forcing the unaligned offset to a GPR.  */
+  if (prefixed_local_addr_p (addr, mode, TRAD_INSN_DQ))
+    return true;
+
   if (GET_CODE (addr) != PLUS)
     return false;
 
@@ -7555,6 +7607,13 @@ mem_operand_gpr (rtx op, machine_mode mo
       && legitimate_indirect_address_p (XEXP (addr, 0), false))
     return true;
 
+  /* Allow prefixed instructions if supported.  If the bottom two bits of the
+     offset are non-zero, we could use a prefixed instruction (which does not
+     have the DS-form constraint that the traditional instruction had) instead
+     of forcing the unaligned offset to a GPR.  */
+  if (prefixed_local_addr_p (addr, mode, TRAD_INSN_DS))
+    return true;
+
   /* Don't allow non-offsettable addresses.  See PRs 83969 and 84279.  */
   if (!rs6000_offsettable_memref_p (op, mode, false))
     return false;
@@ -7576,7 +7635,7 @@ mem_operand_gpr (rtx op, machine_mode mo
        causes a wrap, so test only the low 16 bits.  */
     offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
 
-  return offset + 0x8000 < 0x10000u - extra;
+  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
 }
 
 /* As above, but for DS-FORM VSX insns.  Unlike mem_operand_gpr,
@@ -7589,6 +7648,13 @@ mem_operand_ds_form (rtx op, machine_mod
   int extra;
   rtx addr = XEXP (op, 0);
 
+  /* Allow prefixed instructions if supported.  If the bottom two bits of the
+     offset are non-zero, we could use a prefixed instruction (which does not
+     have the DS-form constraint that the traditional instruction had) instead
+     of forcing the unaligned offset to a GPR.  */
+  if (prefixed_local_addr_p (addr, mode, TRAD_INSN_DS))
+    return true;
+
   if (!offsettable_address_p (false, mode, addr))
     return false;
 
@@ -7609,7 +7675,7 @@ mem_operand_ds_form (rtx op, machine_mod
        causes a wrap, so test only the low 16 bits.  */
     offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
 
-  return offset + 0x8000 < 0x10000u - extra;
+  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
 }
 \f
 /* Subroutines of rs6000_legitimize_address and rs6000_legitimate_address_p.  */
@@ -7958,8 +8024,10 @@ rs6000_legitimate_offset_address_p (mach
       break;
     }
 
-  offset += 0x8000;
-  return offset < 0x10000 - extra;
+  if (TARGET_PREFIXED_ADDR)
+    return SIGNED_34BIT_OFFSET_EXTRA_P (offset, extra);
+  else
+    return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
 }
 
 bool
@@ -8856,6 +8924,11 @@ rs6000_legitimate_address_p (machine_mod
       && mode_supports_pre_incdec_p (mode)
       && legitimate_indirect_address_p (XEXP (x, 0), reg_ok_strict))
     return 1;
+
+  /* Handle prefixed addresses (pc-relative or 34-bit offset).  */
+  if (prefixed_local_addr_p (x, mode, TRAD_INSN_DEFAULT))
+    return 1;
+
   /* Handle restricted vector d-form offsets in ISA 3.0.  */
   if (quad_offset_p)
     {
@@ -8914,7 +8987,10 @@ rs6000_legitimate_address_p (machine_mod
 	  || (!avoiding_indexed_address_p (mode)
 	      && legitimate_indexed_address_p (XEXP (x, 1), reg_ok_strict)))
       && rtx_equal_p (XEXP (XEXP (x, 1), 0), XEXP (x, 0)))
-    return 1;
+    {
+      /* There is no prefixed version of the load/store with update.  */
+      return !prefixed_local_addr_p (XEXP (x, 1), mode, TRAD_INSN_DEFAULT);
+    }
   if (reg_offset_p && !quad_offset_p
       && legitimate_lo_sum_address_p (mode, x, reg_ok_strict))
     return 1;
@@ -8976,8 +9052,12 @@ rs6000_mode_dependent_address (const_rtx
 	  && XEXP (addr, 0) != arg_pointer_rtx
 	  && CONST_INT_P (XEXP (addr, 1)))
 	{
-	  unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
-	  return val + 0x8000 >= 0x10000 - (TARGET_POWERPC64 ? 8 : 12);
+	  HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
+	  HOST_WIDE_INT extra = TARGET_POWERPC64 ? 8 : 12;
+	  if (TARGET_PREFIXED_ADDR)
+	    return !SIGNED_34BIT_OFFSET_EXTRA_P (val, extra);
+	  else
+	    return !SIGNED_16BIT_OFFSET_EXTRA_P (val, extra);
 	}
       break;
 
@@ -13950,6 +14030,34 @@ prefixed_local_addr_p (rtx addr,
 
   return false;
 }
+
+/* Make a memory address non-prefixed if it is prefixed.  */
+
+rtx
+make_memory_non_prefixed (rtx mem)
+{
+  gcc_assert (MEM_P (mem));
+  if (prefixed_local_addr_p (XEXP (mem, 0), GET_MODE (mem), TRAD_INSN_DEFAULT))
+    {
+      rtx old_addr = XEXP (mem, 0);
+      rtx new_addr;
+
+      if (GET_CODE (old_addr) == PLUS
+	  && (REG_P (XEXP (old_addr, 0)) || SUBREG_P (XEXP (old_addr, 0)))
+	  && CONST_INT_P (XEXP (old_addr, 1)))
+	{
+	  rtx tmp_reg = force_reg (Pmode, XEXP (old_addr, 1));
+	  new_addr = gen_rtx_PLUS (Pmode, XEXP (old_addr, 0), tmp_reg);
+	}
+      else
+	new_addr = force_reg (Pmode, old_addr);
+
+      mem = change_address (mem, VOIDmode, new_addr);
+    }
+
+  return mem;
+}
+
 \f
 /* Whether a load instruction is a prefixed instruction.  This is called from
    the prefixed attribute processing.  */
@@ -21060,7 +21168,8 @@ rs6000_rtx_costs (rtx x, machine_mode mo
 	    || outer_code == PLUS
 	    || outer_code == MINUS)
 	   && (satisfies_constraint_I (x)
-	       || satisfies_constraint_L (x)))
+	       || satisfies_constraint_L (x)
+	       || satisfies_constraint_eI (x)))
 	  || (outer_code == AND
 	      && (satisfies_constraint_K (x)
 		  || (mode == SImode
@@ -21440,6 +21549,42 @@ rs6000_debug_rtx_costs (rtx x, machine_m
   return ret;
 }
 
+/* How many real instructions are generated for this insn?  This is slightly
+   different from the length attribute, in that the length attribute counts the
+   number of bytes.  With prefixed instructions, we don't want to count a
+   prefixed instruction (length 12 bytes including possible NOP) as taking 3
+   instructions, but just one.  */
+
+static int
+rs6000_num_insns (rtx_insn *insn)
+{
+  /* Try to figure it out based on the length and whether there are prefixed
+     instructions.  While prefixed instructions are only 8 bytes, we have to
+     use 12 as the size of the first prefixed instruction in case the
+     instruction needs to be aligned.  Back to back prefixed instructions would
+     only take 20 bytes, since it is guaranteed that one of the prefixed
+     instructions does not need the alignment.  */
+  int length = get_attr_length (insn);
+
+  if (length >= 12 && TARGET_PREFIXED_ADDR
+      && get_attr_prefixed (insn) == PREFIXED_YES)
+    {
+      /* Single prefixed instruction.  */
+      if (length == 12)
+	return 1;
+
+      /* A normal instruction and a prefixed instruction (16) or two back
+	 to back prefixed instructions (20).  */
+      if (length == 16 || length == 20)
+	return 2;
+
+      /* Guess for larger instruction sizes.  */
+      return 2 + (length - 20) / 4;
+    }
+
+  return length / 4;
+}
+
 static int
 rs6000_insn_cost (rtx_insn *insn, bool speed)
 {
@@ -21453,7 +21598,7 @@ rs6000_insn_cost (rtx_insn *insn, bool s
   if (cost > 0)
     return cost;
 
-  int n = get_attr_length (insn) / 4;
+  int n = rs6000_num_insns (insn);
   enum attr_type type = get_attr_type (insn);
 
   switch (type)
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 274872)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -1761,15 +1761,17 @@ (define_expand "add<mode>3"
 })
 
 (define_insn "*add<mode>3"
-  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r")
-	(plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b")
-		  (match_operand:GPR 2 "add_operand" "r,I,L")))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r,r")
+	(plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b,b")
+		  (match_operand:GPR 2 "add_operand" "r,I,L,eI")))]
   ""
   "@
    add %0,%1,%2
    addi %0,%1,%2
-   addis %0,%1,%v2"
-  [(set_attr "type" "add")])
+   addis %0,%1,%v2
+   addi %0,%1,%2"
+  [(set_attr "type" "add")
+   (set_attr "isa" "*,*,*,fut")])
 
 (define_insn "*addsi3_high"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=b")
@@ -6909,22 +6911,22 @@ (define_insn "movsi_low"
 
 ;;		MR           LA           LWZ          LFIWZX       LXSIWZX
 ;;		STW          STFIWX       STXSIWX      LI           LIS
-;;		#            XXLOR        XXSPLTIB 0   XXSPLTIB -1  VSPLTISW
-;;		XXLXOR 0     XXLORC -1    P9 const     MTVSRWZ      MFVSRWZ
-;;		MF%1         MT%0         NOP
+;;		PLI          #            XXLOR        XXSPLTIB 0   XXSPLTIB -1
+;;		VSPLTISW     XXLXOR 0     XXLORC -1    P9 const     MTVSRWZ
+;;		MFVSRWZ      MF%1         MT%0         NOP
 (define_insn "*movsi_internal1"
   [(set (match_operand:SI 0 "nonimmediate_operand"
 		"=r,         r,           r,           d,           v,
 		 m,          Z,           Z,           r,           r,
-		 r,          wa,          wa,          wa,          v,
-		 wa,         v,           v,           wa,          r,
-		 r,          *h,          *h")
+		 r,          r,           wa,          wa,          wa,
+		 v,          wa,          v,           v,           wa,
+		 r,          r,           *h,          *h")
 	(match_operand:SI 1 "input_operand"
 		"r,          U,           m,           Z,           Z,
 		 r,          d,           v,           I,           L,
-		 n,          wa,          O,           wM,          wB,
-		 O,          wM,          wS,          r,           wa,
-		 *h,         r,           0"))]
+		 eI,         n,           wa,          O,           wM,
+		 wB,         O,           wM,          wS,          r,
+		 wa,         *h,          r,           0"))]
   "gpc_reg_operand (operands[0], SImode)
    || gpc_reg_operand (operands[1], SImode)"
   "@
@@ -6938,6 +6940,7 @@ (define_insn "*movsi_internal1"
    stxsiwx %x1,%y0
    li %0,%1
    lis %0,%v1
+   li %0,%1
    #
    xxlor %x0,%x1,%x1
    xxspltib %x0,0
@@ -6954,21 +6957,21 @@ (define_insn "*movsi_internal1"
   [(set_attr "type"
 		"*,          *,           load,        fpload,      fpload,
 		 store,      fpstore,     fpstore,     *,           *,
-		 *,          veclogical,  vecsimple,   vecsimple,   vecsimple,
-		 veclogical, veclogical,  vecsimple,   mffgpr,      mftgpr,
-		 *,          *,           *")
+		 *,          *,           veclogical,  vecsimple,   vecsimple,
+		 vecsimple,  veclogical,  veclogical,  vecsimple,   mffgpr,
+		 mftgpr,     *,           *,           *")
    (set_attr "length"
 		"*,          *,           *,           *,           *,
 		 *,          *,           *,           *,           *,
-		 8,          *,           *,           *,           *,
-		 *,          *,           8,           *,           *,
-		 *,          *,           *")
+		 *,          8,           *,           *,           *,
+		 *,          *,           *,           8,           *,
+		 *,          *,           *,           *")
    (set_attr "isa"
 		"*,          *,           *,           p8v,         p8v,
 		 *,          p8v,         p8v,         *,           *,
-		 *,          p8v,         p9v,         p9v,         p8v,
-		 p9v,        p8v,         p9v,         p8v,         p8v,
-		 *,          *,           *")])
+		 fut,        *,           p8v,         p9v,         p9v,
+		 p8v,        p9v,         p8v,         p9v,         p8v,
+		 p8v,        *,           *,           *")])
 
 ;; Like movsi, but adjust a SF value to be used in a SI context, i.e.
 ;; (set (reg:SI ...) (subreg:SI (reg:SF ...) 0))
@@ -7113,14 +7116,15 @@ (define_insn "*movsi_from_df"
   "xscvdpsp %x0,%x1"
   [(set_attr "type" "fp")])
 
-;; Split a load of a large constant into the appropriate two-insn
-;; sequence.
+;; Split a load of a large constant into the appropriate two-insn sequence.  On
+;; systems that support PADDI (PLI), we can use PLI to load any 32-bit constant
+;; in one instruction.
 
 (define_split
   [(set (match_operand:SI 0 "gpc_reg_operand")
 	(match_operand:SI 1 "const_int_operand"))]
   "(unsigned HOST_WIDE_INT) (INTVAL (operands[1]) + 0x8000) >= 0x10000
-   && (INTVAL (operands[1]) & 0xffff) != 0"
+   && (INTVAL (operands[1]) & 0xffff) != 0 && !TARGET_PREFIXED_ADDR"
   [(set (match_dup 0)
 	(match_dup 2))
    (set (match_dup 0)
@@ -7759,9 +7763,18 @@ (define_expand "mov<mode>"
 ;; not swapped like they are for TImode or TFmode.  Subregs therefore are
 ;; problematical.  Don't allow direct move for this case.
 
+;;		FPR load    FPR store   FPR move    FPR zero    GPR load
+;;		GPR store   GPR move    GPR zero    MFVSRD      MTVSRD
+
 (define_insn_and_split "*mov<mode>_64bit_dm"
-  [(set (match_operand:FMOVE128_FPR 0 "nonimmediate_operand" "=m,d,d,d,Y,r,r,r,d")
-	(match_operand:FMOVE128_FPR 1 "input_operand" "d,m,d,<zero_fp>,r,<zero_fp>Y,r,d,r"))]
+  [(set (match_operand:FMOVE128_FPR 0 "nonimmediate_operand"
+		"=m,        d,          d,          d,          Y,
+		 r,         r,          r,          r,          d")
+
+	(match_operand:FMOVE128_FPR 1 "input_operand"
+		"d,         m,          d,          <zero_fp>,  r,
+		 <zero_fp>, Y,          r,          d,          r"))]
+
   "TARGET_HARD_FLOAT && TARGET_POWERPC64 && FLOAT128_2REG_P (<MODE>mode)
    && (<MODE>mode != TDmode || WORDS_BIG_ENDIAN)
    && (gpc_reg_operand (operands[0], <MODE>mode)
@@ -7769,9 +7782,13 @@ (define_insn_and_split "*mov<mode>_64bit
   "#"
   "&& reload_completed"
   [(pc)]
-{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
-  [(set_attr "length" "8,8,8,8,12,12,8,8,8")
-   (set_attr "isa" "*,*,*,*,*,*,*,p8v,p8v")])
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "isa" "*,*,*,*,*,*,*,*,p8v,p8v")
+   (set_attr "non_prefixed_length" "8")
+   (set_attr "prefixed_length" "20")])
 
 (define_insn_and_split "*movtd_64bit_nodm"
   [(set (match_operand:TD 0 "nonimmediate_operand" "=m,d,d,Y,r,r")
@@ -7782,8 +7799,12 @@ (define_insn_and_split "*movtd_64bit_nod
   "#"
   "&& reload_completed"
   [(pc)]
-{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
-  [(set_attr "length" "8,8,8,12,12,8")])
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "non_prefixed_length" "8")
+   (set_attr "prefixed_length" "20")])
 
 (define_insn_and_split "*mov<mode>_32bit"
   [(set (match_operand:FMOVE128_FPR 0 "nonimmediate_operand" "=m,d,d,d,Y,r,r")
@@ -8793,24 +8814,24 @@ (define_split
   [(pc)]
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; })
 
-;;              GPR store  GPR load   GPR move   GPR li     GPR lis     GPR #
-;;              FPR store  FPR load   FPR move   AVX store  AVX store   AVX load
-;;              AVX load   VSX move   P9 0       P9 -1      AVX 0/-1    VSX 0
-;;              VSX -1     P9 const   AVX const  From SPR   To SPR      SPR<->SPR
-;;              VSX->GPR   GPR->VSX
+;;              GPR store  GPR load   GPR move   GPR li     GPR lis     GPR pli
+;;              GPR #      FPR store  FPR load   FPR move   AVX store   AVX store
+;;              AVX load   AVX load   VSX move   P9 0       P9 -1       AVX 0/-1
+;;              VSX 0      VSX -1     P9 const   AVX const  From SPR    To SPR
+;;              SPR<->SPR  VSX->GPR   GPR->VSX
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
                "=YZ,       r,         r,         r,         r,          r,
-                m,         ^d,        ^d,        wY,        Z,          $v,
-                $v,        ^wa,       wa,        wa,        v,          wa,
-                wa,        v,         v,         r,         *h,         *h,
-                ?r,        ?wa")
+                r,         m,         ^d,        ^d,        wY,         Z,
+                $v,        $v,        ^wa,       wa,        wa,         v,
+                wa,        wa,        v,         v,         r,          *h,
+                *h,        ?r,        ?wa")
 	(match_operand:DI 1 "input_operand"
-               "r,         YZ,        r,         I,         L,          nF,
-                ^d,        m,         ^d,        ^v,        $v,         wY,
-                Z,         ^wa,       Oj,        wM,        OjwM,       Oj,
-                wM,        wS,        wB,        *h,        r,          0,
-                wa,        r"))]
+               "r,         YZ,        r,         I,         L,          eI,
+                nF,        ^d,        m,         ^d,        ^v,         $v,
+                wY,        Z,         ^wa,       Oj,        wM,         OjwM,
+                Oj,        wM,        wS,        wB,        *h,         r,
+                0,         wa,        r"))]
   "TARGET_POWERPC64
    && (gpc_reg_operand (operands[0], DImode)
        || gpc_reg_operand (operands[1], DImode))"
@@ -8820,6 +8841,7 @@ (define_insn "*movdi_internal64"
    mr %0,%1
    li %0,%1
    lis %0,%v1
+   li %0,%1
    #
    stfd%U0%X0 %1,%0
    lfd%U1%X1 %0,%1
@@ -8843,26 +8865,28 @@ (define_insn "*movdi_internal64"
    mtvsrd %x0,%1"
   [(set_attr "type"
                "store,      load,	*,         *,         *,         *,
-                fpstore,    fpload,     fpsimple,  fpstore,   fpstore,   fpload,
-                fpload,     veclogical, vecsimple, vecsimple, vecsimple, veclogical,
-                veclogical, vecsimple,  vecsimple, mfjmpr,    mtjmpr,    *,
-                mftgpr,    mffgpr")
+                *,          fpstore,    fpload,    fpsimple,  fpstore,   fpstore,
+                fpload,     fpload,     veclogical,vecsimple, vecsimple, vecsimple,
+                veclogical, veclogical, vecsimple,  vecsimple, mfjmpr,   mtjmpr,
+                *,          mftgpr,    mffgpr")
    (set_attr "size" "64")
    (set_attr "length"
-               "*,         *,         *,         *,         *,          20,
-                *,         *,         *,         *,         *,          *,
+               "*,         *,         *,         *,         *,          *,
+                20,        *,         *,         *,         *,          *,
                 *,         *,         *,         *,         *,          *,
-                *,         8,         *,         *,         *,          *,
-                *,         *")
+                *,         *,         8,         *,         *,          *,
+                *,         *,         *")
    (set_attr "isa"
-               "*,         *,         *,         *,         *,          *,
-                *,         *,         *,         p9v,       p7v,        p9v,
-                p7v,       *,         p9v,       p9v,       p7v,        *,
-                *,         p7v,       p7v,       *,         *,          *,
-                p8v,       p8v")])
+               "*,         *,         *,         *,         *,          fut,
+                *,         *,         *,         *,         p9v,        p7v,
+                p9v,       p7v,       *,         p9v,       p9v,        p7v,
+                *,         *,         p7v,       p7v,       *,          *,
+                *,         p8v,       p8v")])
 
 ; Some DImode loads are best done as a load of -1 followed by a mask
-; instruction.
+; instruction.  On systems that support the PADDI (PLI) instruction,
+; num_insns_constant returns 1, so these splitter would not be used for things
+; that be loaded with PLI.
 (define_split
   [(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
 	(match_operand:DI 1 "const_int_operand"))]
@@ -8980,7 +9004,8 @@ (define_insn "*mov<mode>_ppc64"
   return rs6000_output_move_128bit (operands);
 }
   [(set_attr "type" "store,store,load,load,*,*")
-   (set_attr "length" "8")])
+   (set_attr "non_prefixed_length" "8,8,8,8,8,40")
+   (set_attr "prefixed_length" "20,20,20,20,8,40")])
 
 (define_split
   [(set (match_operand:TI2 0 "int_reg_operand")
@@ -11497,9 +11522,25 @@ (define_insn "stack_protect_setsi"
   [(set_attr "type" "three")
    (set_attr "length" "12")])
 
-(define_insn "stack_protect_setdi"
-  [(set (match_operand:DI 0 "memory_operand" "=Y")
-	(unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET))
+(define_expand "stack_protect_setdi"
+  [(parallel [(set (match_operand:DI 0 "memory_operand")
+		   (unspec:DI [(match_operand:DI 1 "memory_operand")]
+		   UNSPEC_SP_SET))
+	      (set (match_scratch:DI 2)
+		   (const_int 0))])]
+  "TARGET_64BIT"
+{
+  if (TARGET_PREFIXED_ADDR)
+    {
+      operands[0] = make_memory_non_prefixed (operands[0]);
+      operands[1] = make_memory_non_prefixed (operands[1]);
+    }
+})
+
+(define_insn "*stack_protect_setdi"
+  [(set (match_operand:DI 0 "non_prefixed_mem_operand" "=YZ")
+	(unspec:DI [(match_operand:DI 1 "non_prefixed_mem_operand" "YZ")]
+		   UNSPEC_SP_SET))
    (set (match_scratch:DI 2 "=&r") (const_int 0))]
   "TARGET_64BIT"
   "ld%U1%X1 %2,%1\;std%U0%X0 %2,%0\;li %2,0"
@@ -11543,10 +11584,27 @@ (define_insn "stack_protect_testsi"
    lwz%U1%X1 %3,%1\;lwz%U2%X2 %4,%2\;cmplw %0,%3,%4\;li %3,0\;li %4,0"
   [(set_attr "length" "16,20")])
 
-(define_insn "stack_protect_testdi"
+(define_expand "stack_protect_testdi"
+  [(parallel [(set (match_operand:CCEQ 0 "cc_reg_operand")
+		   (unspec:CCEQ [(match_operand:DI 1 "memory_operand")
+				 (match_operand:DI 2 "memory_operand")]
+				UNSPEC_SP_TEST))
+	      (set (match_scratch:DI 4)
+		   (const_int 0))
+	      (clobber (match_scratch:DI 3))])]
+  "TARGET_64BIT"
+{
+  if (TARGET_PREFIXED_ADDR)
+    {
+      operands[0] = make_memory_non_prefixed (operands[0]);
+      operands[1] = make_memory_non_prefixed (operands[1]);
+    }
+})
+
+(define_insn "*stack_protect_testdi"
   [(set (match_operand:CCEQ 0 "cc_reg_operand" "=x,?y")
-        (unspec:CCEQ [(match_operand:DI 1 "memory_operand" "Y,Y")
-		      (match_operand:DI 2 "memory_operand" "Y,Y")]
+        (unspec:CCEQ [(match_operand:DI 1 "non_prefixed_mem_operand" "YZ,YZ")
+		      (match_operand:DI 2 "non_prefixed_mem_operand" "YZ,YZ")]
 		     UNSPEC_SP_TEST))
    (set (match_scratch:DI 4 "=r,r") (const_int 0))
    (clobber (match_scratch:DI 3 "=&r,&r"))]
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 274864)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -1149,10 +1149,30 @@ (define_insn "vsx_mov<mode>_64bit"
                "vecstore,  vecload,   vecsimple, mffgpr,    mftgpr,    load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
-   (set_attr "length"
-               "*,         *,         *,         8,         *,         8,
-                8,         8,         8,         8,         *,         *,
-                *,         20,        8,         *,         *")
+   (set (attr "non_prefixed_length")
+	(cond [(and (eq_attr "alternative" "4")		;; MTVSRDD
+		    (match_test "TARGET_P9_VECTOR"))
+	       (const_string "4")
+
+	       (eq_attr "alternative" "3,4")		;; GPR <-> VSX
+	       (const_string "8")
+
+	       (eq_attr "alternative" "5,6,7,8")	;; GPR load/store
+	       (const_string "8")]
+	      (const_string "*")))
+
+   (set (attr "prefixed_length")
+	(cond [(and (eq_attr "alternative" "4")		;; MTVSRDD
+		    (match_test "TARGET_P9_VECTOR"))
+	       (const_string "4")
+
+	       (eq_attr "alternative" "3,4")		;; GPR <-> VSX
+	       (const_string "8")
+
+	       (eq_attr "alternative" "5,6,7,8")	;; GPR load/store
+	       (const_string "20")]
+	      (const_string "*")))
+
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH, V3, #5 of 10], Make -mpcrel default on little endian Linux systems
  2019-08-26 19:21 PowerPC future machine, version 3 Michael Meissner
                   ` (3 preceding siblings ...)
  2019-08-26 21:23 ` [PATCH, V3, #4 of 10], Add general prefixed/pcrel support Michael Meissner
@ 2019-08-26 21:43 ` Michael Meissner
  2019-08-30 19:46   ` Segher Boessenkool
  2019-08-26 21:52 ` [PATCH, V3, #6 of 10], Fix vec_extract breakage Michael Meissner
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-08-26 21:43 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, segher, dje.gcc

I rewrote the simple previous patch (V1 #5) to enable pc-relative addressing by
default, to only enable by default on little endian Linux systems.  Other
systems like AIX and BSD will default to not supporting prefixed addressing
until they are configured to support these addressing modes.

I added checks for 32-bit, ELF v1, and reworked the checks for small/large code
model.  The check is now done in one place.

I built a bootstrap compiler on a little endian power8 system and there were no
regressions in running make check.

In addition, I built cross compilers on my Linux x86_64 system, and checked
that even on Linux 64-bit little endian systems, you can disable the default
behavior by defining TARGET_PREFIXED_ADDR_DEFAULT and TARGET_PCREL_DEFAULT to 0
(the first disables both prefixed addressing with numeric offsets and
pc-relative addressing, while the second disables pc-relative addressing).

I built a bootstrap compiler on a little endian power8 system, and there were
no regressions in make check.  Can I check this into the trunk once the
previous patches are checked in?

2019-08-26  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/linux64.h (TARGET_PREFIXED_ADDR_DEFAULT): On
	'future' systems, enable prefixed addressing by default.
	(TARGET_PCREL_DEFAULT): On 'future' systems, enable pc-relative
	addressing by default.
	* config/rs6000/rs6000-cpus.def (FUTURE_ADDRESS_MASKS): New macro
	of 'future' addressing masks.
	(OTHER_FUTURE_MASKS): Use FUTURE_ADDRESS_MASKS.
	* config/rs6000/rs6000.c (TARGET_PREFIXED_ADDR_DEFAULT): If not
	defined, don't enable prefixed addressing on 'future' systems.
	(TARGET_PCREL_DEFAULT): If not defined, don't enable pc-relative
	addressing on 'future' systems.
	(rs6000_debug_reg_global): Print TARGET_PREFIXED_ADDR_DEFAULT and
	TARGET_PCREL_DEFAULT.
	(rs6000_option_override_internal): Add checks for 32-bit systems
	and non ELFv2 systems trying to enable prefixed addressing.  If
	the target OS tm.h says it is safe to do, enable prefixed and
	pc-relative addressing.

Index: gcc/config/rs6000/linux64.h
===================================================================
--- gcc/config/rs6000/linux64.h	(revision 274864)
+++ gcc/config/rs6000/linux64.h	(working copy)
@@ -640,3 +640,13 @@ extern int dot_symbols;
    enabling the __float128 keyword.  */
 #undef	TARGET_FLOAT128_ENABLE_TYPE
 #define TARGET_FLOAT128_ENABLE_TYPE 1
+
+/* By default enable support for pc-relative and numeric prefixed addressing on
+   the 'future' system, unless it is overriden at build time.  */
+#ifndef TARGET_PREFIXED_ADDR_DEFAULT
+#define TARGET_PREFIXED_ADDR_DEFAULT	1
+#endif
+
+#if !defined (TARGET_PCREL_DEFAULT) && TARGET_PREFIXED_ADDR_DEFAULT
+#define TARGET_PCREL_DEFAULT		1
+#endif
Index: gcc/config/rs6000/rs6000-cpus.def
===================================================================
--- gcc/config/rs6000/rs6000-cpus.def	(revision 274864)
+++ gcc/config/rs6000/rs6000-cpus.def	(working copy)
@@ -75,15 +75,21 @@
 				 | OPTION_MASK_P8_VECTOR		\
 				 | OPTION_MASK_P9_VECTOR)
 
-/* Support for a future processor's features.  Do not enable -mpcrel until it
-   is fully functional.  */
+/* Support for a future processor's features.  The prefixed and pc-relative
+   addressing bits are not added here.  Instead, rs6000.c adds them if the OS
+   tm.h says that it supports the addressing modes.  */
 #define ISA_FUTURE_MASKS_SERVER	(ISA_3_0_MASKS_SERVER			\
-				 | OPTION_MASK_FUTURE			\
+				 | OPTION_MASK_FUTURE)
+
+/* Addressing related flags on a future processor.  These flags are broken out
+   because not all targets will support either pc-relative addressing, or even
+   prefixed addressing, and we want to clear all of the addressing bits
+   on targets that cannot support prefixed/pcrel addressing.  */
+#define ADDRESSING_FUTURE_MASKS	(OPTION_MASK_PCREL			\
 				 | OPTION_MASK_PREFIXED_ADDR)
 
 /* Flags that need to be turned off if -mno-future.  */
-#define OTHER_FUTURE_MASKS	(OPTION_MASK_PCREL			\
-				 | OPTION_MASK_PREFIXED_ADDR)
+#define OTHER_FUTURE_MASKS	ADDRESSING_FUTURE_MASKS
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS	(OPTION_MASK_FLOAT128_HW		\
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 274874)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -98,6 +98,16 @@
 #endif
 #endif
 
+/* Set up the defaults for whether prefixed addressing is used, and if it is
+   used, whether we want to turn on pc-relative support by default.  */
+#ifndef TARGET_PREFIXED_ADDR_DEFAULT
+#define TARGET_PREFIXED_ADDR_DEFAULT	0
+#endif
+
+#ifndef TARGET_PCREL_DEFAULT
+#define TARGET_PCREL_DEFAULT		0
+#endif
+
 /* Support targetm.vectorize.builtin_mask_for_load.  */
 GTY(()) tree altivec_builtin_mask_for_load;
 
@@ -2523,6 +2533,14 @@ rs6000_debug_reg_global (void)
   if (TARGET_DIRECT_MOVE_128)
     fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit mfvsrld element",
 	     (int)VECTOR_ELEMENT_MFVSRLD_64BIT);
+
+  if (TARGET_FUTURE)
+    {
+      fprintf (stderr, DEBUG_FMT_D, "TARGET_PREFIXED_ADDR_DEFAULT",
+	       TARGET_PREFIXED_ADDR_DEFAULT);
+      fprintf (stderr, DEBUG_FMT_D, "TARGET_PCREL_DEFAULT",
+	       TARGET_PCREL_DEFAULT);
+    }
 }
 
 \f
@@ -4217,26 +4235,6 @@ rs6000_option_override_internal (bool gl
       rs6000_isa_flags &= ~OPTION_MASK_FLOAT128_HW;
     }
 
-  /* -mprefixed-addr (and hence -mpcrel) requires -mcpu=future.  */
-  if (TARGET_PREFIXED_ADDR && !TARGET_FUTURE)
-    {
-      if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
-	error ("%qs requires %qs", "-mpcrel", "-mcpu=future");
-      else if ((rs6000_isa_flags_explicit & OPTION_MASK_PREFIXED_ADDR) != 0)
-	error ("%qs requires %qs", "-mprefixed-addr", "-mcpu=future");
-
-      rs6000_isa_flags &= ~(OPTION_MASK_PCREL | OPTION_MASK_PREFIXED_ADDR);
-    }
-
-  /* -mpcrel requires prefixed load/store addressing.  */
-  if (TARGET_PCREL && !TARGET_PREFIXED_ADDR)
-    {
-      if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
-	error ("%qs requires %qs", "-mpcrel", "-mprefixed-addr");
-
-      rs6000_isa_flags &= ~OPTION_MASK_PCREL;
-    }
-
   /* Print the options after updating the defaults.  */
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after defaults", rs6000_isa_flags);
@@ -4368,12 +4366,89 @@ rs6000_option_override_internal (bool gl
   SUB3TARGET_OVERRIDE_OPTIONS;
 #endif
 
-  /* -mpcrel requires -mcmodel=medium, but we can't check TARGET_CMODEL until
-      after the subtarget override options are done.  */
-  if (TARGET_PCREL && TARGET_CMODEL != CMODEL_MEDIUM)
+  /* Enable prefixed addressing and pc-relative addressing on 64-bit ELF v2
+     systems if the OS tm.h file says that it is supported and the user did not
+     explicitly use -mprefixed-addr or -mpcrel.  At the present time, only
+     64-bit Linux enables this.
+
+     Pc-relative support also requires the medium code model.
+
+     However, we can't check for ELFv2 or -mcmodel=medium until after the
+     subtarget macros are run.
+
+     If prefixed addressing is disabled by default, and the user does -mpcrel,
+     don't force them to also specify -mprefixed-addr.  */
+  if (TARGET_FUTURE)
+    {
+      bool explicit_prefixed = ((rs6000_isa_flags_explicit
+				 & OPTION_MASK_PREFIXED_ADDR) != 0);
+      bool explicit_pcrel = ((rs6000_isa_flags_explicit
+			      & OPTION_MASK_PCREL) != 0);
+
+      /* Prefixed addressing requires 64-bit registers.  */
+      if (!TARGET_POWERPC64)
+	{
+	  if (TARGET_PCREL && explicit_pcrel)
+	    error ("%qs requires %qs", "-mpcrel", "-m64");
+
+	  else if (TARGET_PREFIXED_ADDR && explicit_prefixed)
+	    error ("%qs requires %qs", "-mprefixed-addr", "-m64");
+
+	  rs6000_isa_flags &= ~ADDRESSING_FUTURE_MASKS;
+	}
+
+      /* Only ELFv2 currently supports prefixed/pcrel addressing.  */
+      else if (rs6000_current_abi != ABI_ELFv2)
+	{
+	  if (TARGET_PCREL && explicit_pcrel)
+	    error ("%qs requires %qs", "-mpcrel", "-mabi=elfv2");
+
+	  else if (TARGET_PREFIXED_ADDR && explicit_prefixed)
+	    error ("%qs requires %qs", "-mprefixed-addr", "-mabi=elfv2");
+
+	  rs6000_isa_flags &= ~ADDRESSING_FUTURE_MASKS;
+	}
+
+      /* Pc-relative requires the medium code model.  */
+      else if (TARGET_PCREL && TARGET_CMODEL != CMODEL_MEDIUM)
+	{
+	  if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
+	    error ("%qs requires %qs", "-mpcrel", "-mcmodel=medium");
+
+	  rs6000_isa_flags &= ~OPTION_MASK_PCREL;
+	}
+
+      /* Enable defaults if desired.  */
+      else
+	{
+	  if (!explicit_prefixed
+	      && (TARGET_PREFIXED_ADDR_DEFAULT
+		  || TARGET_PCREL
+		  || TARGET_PCREL_DEFAULT))
+	    rs6000_isa_flags |= OPTION_MASK_PREFIXED_ADDR;
+
+	  if (!explicit_pcrel && TARGET_PCREL_DEFAULT
+	      && TARGET_CMODEL == CMODEL_MEDIUM)
+	    rs6000_isa_flags |= OPTION_MASK_PCREL;
+	}
+    }
+
+  /* -mprefixed-addr (and hence -mpcrel) requires -mcpu=future.  */
+  if (TARGET_PREFIXED_ADDR && !TARGET_FUTURE)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
+	error ("%qs requires %qs", "-mpcrel", "-mcpu=future");
+      else if ((rs6000_isa_flags_explicit & OPTION_MASK_PREFIXED_ADDR) != 0)
+	error ("%qs requires %qs", "-mprefixed-addr", "-mcpu=future");
+
+      rs6000_isa_flags &= ~(OPTION_MASK_PCREL | OPTION_MASK_PREFIXED_ADDR);
+    }
+
+  /* -mpcrel requires prefixed load/store addressing.  */
+  if (TARGET_PCREL && !TARGET_PREFIXED_ADDR)
     {
       if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
-	error ("%qs requires %qs", "-mpcrel", "-mcmodel=medium");
+	error ("%qs requires %qs", "-mpcrel", "-mprefixed-addr");
 
       rs6000_isa_flags &= ~OPTION_MASK_PCREL;
     }

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH, V3, #6 of 10], Fix vec_extract breakage
  2019-08-26 19:21 PowerPC future machine, version 3 Michael Meissner
                   ` (4 preceding siblings ...)
  2019-08-26 21:43 ` [PATCH, V3, #5 of 10], Make -mpcrel default on little endian Linux systems Michael Meissner
@ 2019-08-26 21:52 ` Michael Meissner
  2019-09-03 19:49   ` Segher Boessenkool
  2019-08-26 22:06 ` [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization Michael Meissner
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-08-26 21:52 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, segher, dje.gcc

This patch fixes the bug I mentioned in a previous patch, i.e.

	#include <altivec.h>

	static vector double v;

	// ...

	double foo (int n)
	{
	    double x = vec_extract (v, n);
	    return x;
	}

would generate incorrect code because it only has one temporary register, and
it needs two temporary registers (one to hold the pc-relative address, and the
other to hold the index).  In the previous V3 patch #4 that added pc-relative
support, I put in an abort for this case.  This patch actually fixes it.

Originally, I solved it by just adding a predicate/condition to not allow a
pc-relative address to combine with the extract directly.  But I found the
reload pass was joining the two insns, so I added a new constraint (ep) to say
this memory insn must not involve a pc-relative address.

If you have an unused constraint pair that you would prefer instead of "ep", I
can easily switch to use that.

I built a boostrap compiler on a little endian power8 system, and there were no
regressions in running make check.  Can I check this change into the trunk once
the previous patches are checked in?

2019-08-26  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/constraints.md (ep constraint): New constraint.
	* config/rs6000/predicates.md (non_pcrel_mem_operand): New
	predicate.
	(reg_or_non_pcrel_operand): New predicate.
	* config/rs6000/vsx.md (vsx_extract_<mode>_var, VSX_D iterator):
	Don't allow pc-relative memory addresses.
	(vsx_extract_v4sf_var): Don't allow pc-relative memory addresses.
	(vsx_extract_<mode>_var, VSX_EXTRACT_I iterator): Don't allow
	pc-relative memory addresses.
	(vsx_extract_<mode>_<VS_scalar>mode_var): Don't allow pc-relative
	memory addresses.
	* doc/md.texi (PowerPC Constraints): Document ep constraint.

Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 274864)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -210,6 +210,11 @@ several times, or that might not access
   (and (match_code "mem")
        (match_test "GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) != RTX_AUTOINC")))
 
+(define_memory_constraint "ep"
+  "A memory operand that does not contain a pc-relative reference."
+  (and (match_code "mem")
+       (match_test "non_pcrel_mem_operand (op, mode)")))
+
 (define_memory_constraint "Q"
   "Memory operand that is an offset from a register (it is usually better
 to use @samp{m} or @samp{es} in @code{asm} statements)"
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 274874)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -1706,6 +1706,33 @@ (define_predicate "non_prefixed_mem_oper
 				 TRAD_INSN_DEFAULT);
 })
 
+;; Return 1 if op is a memory operand that does not contain a pc-relative
+;; address.
+(define_predicate "non_pcrel_mem_operand"
+  (match_code "mem")
+{
+  if (!memory_operand (op, mode))
+    return false;
+
+  return (!pcrel_local_address (XEXP (op, 0), Pmode)
+	  && !pcrel_ext_address (XEXP (op, 0), Pmode));
+})
+
+;; Return 1 if op is a register or a memory operand that does not contain a
+;; pc-relatve address.
+(define_predicate "reg_or_non_pcrel_operand"
+  (match_code "reg,subreg,mem")
+{
+  if (REG_P (op) || SUBREG_P (op))
+    return true;
+
+  if (!memory_operand (op, mode))
+    return false;
+
+  return (!pcrel_local_address (XEXP (op, 0), Pmode)
+	  && !pcrel_ext_address (XEXP (op, 0), Pmode));
+})
+
 ;; Match the first insn (addis) in fusing the combination of addis and loads to
 ;; GPR registers on power8.
 (define_predicate "fusion_gpr_addis"
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 274874)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -3249,9 +3249,10 @@ (define_insn "vsx_vslo_<mode>"
 ;; Variable V2DI/V2DF extract
 (define_insn_and_split "vsx_extract_<mode>_var"
   [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r")
-	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,m,m")
-			     (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
-			    UNSPEC_VSX_EXTRACT))
+	(unspec:<VS_scalar>
+	 [(match_operand:VSX_D 1 "reg_or_non_pcrel_operand" "v,ep,ep")
+	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+	 UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,&b,&b"))
    (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
   "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
@@ -3319,9 +3320,10 @@ (define_insn_and_split "*vsx_extract_v4s
 ;; Variable V4SF extract
 (define_insn_and_split "vsx_extract_v4sf_var"
   [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,wa,?r")
-	(unspec:SF [(match_operand:V4SF 1 "input_operand" "v,m,m")
-		    (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
-		   UNSPEC_VSX_EXTRACT))
+	(unspec:SF
+	 [(match_operand:V4SF 1 "reg_or_non_pcrel_operand" "v,ep,ep")
+	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+	 UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,&b,&b"))
    (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
   "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT"
@@ -3682,7 +3684,7 @@ (define_insn_and_split "*vsx_extract_<mo
 (define_insn_and_split "vsx_extract_<mode>_var"
   [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r")
 	(unspec:<VS_scalar>
-	 [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+	 [(match_operand:VSX_EXTRACT_I 1 "reg_or_non_pcrel_operand" "v,v,ep")
 	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
 	 UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,r,&b"))
@@ -3702,7 +3704,7 @@ (define_insn_and_split "*vsx_extract_<mo
   [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r")
 	(zero_extend:<VS_scalar>
 	 (unspec:<VSX_EXTRACT_I:VS_scalar>
-	  [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+	  [(match_operand:VSX_EXTRACT_I 1 "reg_or_non_pcrel_operand" "v,v,ep")
 	   (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
 	  UNSPEC_VSX_EXTRACT)))
    (clobber (match_scratch:DI 3 "=r,r,&b"))
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 274864)
+++ gcc/doc/md.texi	(working copy)
@@ -3343,6 +3343,9 @@ Constant whose negation is a signed 16-b
 @item eI
 Signed 34-bit integer constant if prefixed instructions are supported.
 
+@item ep
+A memory operand that does not include a pc-relative address.
+
 @item G
 Floating point constant that can be loaded into a register with one
 instruction per word

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization
  2019-08-26 19:21 PowerPC future machine, version 3 Michael Meissner
                   ` (5 preceding siblings ...)
  2019-08-26 21:52 ` [PATCH, V3, #6 of 10], Fix vec_extract breakage Michael Meissner
@ 2019-08-26 22:06 ` Michael Meissner
  2019-08-28 21:48   ` Michael Meissner
  2019-09-03 22:56   ` Segher Boessenkool
  2019-08-27  7:01 ` [PATCH, V3, #8 of 10], Miscellaneous prefixed addressing tests Michael Meissner
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 42+ messages in thread
From: Michael Meissner @ 2019-08-26 22:06 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, segher, dje.gcc

This patch is a slight rework on V1 patch #7 (V1 patch #6 is not going
to be re-submitted at this time).

This patch adds a new RTL pass that supports creating the optimization
and flagging the appropriate load of external pc-relative addresses and
the use of that address in the basic block.

Here is the comment from the beginning of rs6000-pcrel.c that describes
the optimization.

/* This file implements a RTL pass that looks for pc-relative loads of the
   address of an external variable using the PCREL_GOT relocation and a single
   load/store that uses that GOT pointer.  If that is found we create the
   PCREL_OPT relocation to possibly convert:

	pld b,var@pcrel@got(0),1

	# possibly other instructions that do not use the base register 'b' or
        # the result register 'r'.

	lwz r,0(b)

   into:

	plwz r,var@pcrel(0),1

	# possibly other instructions that do not use the base register 'b' or
        # the result register 'r'.

	nop

   If the variable is not defined in the main program or the code using it is
   not in the main program, the linker put the address in the .got section and
   do:

	.section .got
	.Lvar_got:	.dword var

	.section .text
	pld b,.Lvar_got@pcrel(0),1

	# possibly other instructions that do not use the base register 'b' or
        # the result register 'r'.

	lwz r,0(b)
	
   We only look for a single usage in the basic block where the GOT pointer is
   loaded.  Multiple uses or references in another basic block will force us to
   not use the PCREL_OPT relocation.  */

I have built a bootstrap compiler on a little endian power8 system, and
there wre no regressions when I ran make check.  Assuming the previous
patches are checked in, can I check this into the trunk?

[gcc]
2019-08-26  Michael Meissner  <meissner@linux.ibm.com>

	* config.gcc (powerpc*-*-*): Add rs6000-pcrel.c.
	(rs6000*-*-*): Add rs6000-pcrel.c.
	* config/rs6000/pcrel.md: New file.
	* config/rs6000/predicates.md (one_reg_memory_operand): New
	predicate.
	(pcrel_ext_mem_operand): New predicate.
	* config/rs6000/rs6000-cpus.def (ADDRESSING_FUTURE_MASKS): Add
	-mpcrel-opt.
	(POWERPC_MASKS): Add -mpcrel-opt.
	* config/rs6000/rs6000-passes.def: Add pcrel optimization pass.
	* config/rs6000/rs6000-pcrel.c: New file.
	* config/rs6000/rs6000-protos.h (make_pass_pcrel_opt): New
	declaration.
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
	-mpcrel-opt support.
	(pcrel_opt_label_num): New state static flag.
	(rs6000_final_prescan_insn): Add -mpcrel-opt support.
	(rs6000_asm_output_opcode): Add -mpcrel-opt support.
	(rs6000_opt_masks): Add -mpcrel-opt.
	* config/rs6000/rs6000.md: Include pcrel.md.
	(pcrel_opt RTL attribute): New RTL attribute.
	* config/rs6000/t-rs6000 (rs6000-pcrel.o): Add build rules.
	(MD_INCLUDES): Add pcrel.md.

[gcc/testsuite]
2019-08-26   Michael Meissner  <meissner@linux.ibm.com>

	* gcc.target/powerpc/pcrel-opt-di.c: New test for -mpcrel-opt.

Index: gcc/config/rs6000/pcrel.md
===================================================================
--- gcc/config/rs6000/pcrel.md	(revision 274877)
+++ gcc/config/rs6000/pcrel.md	(working copy)
@@ -0,0 +1,563 @@
+;; PC relative support.
+;; Copyright (C) 2019 Free Software Foundation, Inc.
+;; Contributed by Peter Bergner <bergner@linux.ibm.com> and
+;;		  Michael Meissner <meissner@linux.ibm.com>
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;;
+;; UNSPEC usage
+;;
+
+(define_c_enum "unspec"
+  [UNSPEC_PCREL_LD
+   UNSPEC_PCREL_ST
+  ])
+
+
+;; Optimize references to external variables to combine loading up the external
+;; address from the GOT and doing the load or store operation.
+;;
+;; A typical optimization looks like:
+;;
+;;		pld b,var@pcrel@got(0),1
+;;	100:
+;;		...
+;;		.reloc 100b-8,R_PPC64_PCREL_OPT,0
+;;		lwz r,0(b)
+;;
+;; If 'var' is an external variable defined in another module in the main
+;; program, and the code is being linked for the main program, then the
+;; linker can optimize this to:
+;;
+;;		plwz r,var(0),1
+;;	100:
+;;		...
+;;		nop
+;;
+;; If either the variable or the code being linked is defined in a shared
+;; library, then the linker puts the address in the GOT area, and the pld will
+;; load up the pointer, and then that pointer is used for the load or store.
+;; If there is more than one reference to the GOT pointer, the compiler will
+;; not do this optimization, and use the GOT pointer normally.
+;;
+;; Having the label after the pld instruction and using label-8 in the .reloc
+;; addresses the prefixed instruction properly.  If we put the label before the
+;; pld instruction, then the relocation might point to the NOP that is
+;; generated if the prefixed instruction is not aligned.
+;;
+;; We need to rewrite the normal GOT load operation before register allocation
+;; to include setting the eventual destination register for loads, or referring
+;; to the value being stored for store operations so that the proper register
+;; lifetime is set in case the optimization is done and the pld/lwz is
+;; converted to plwz/nop.
+
+(define_mode_iterator PO [QI HI SI DI SF DF
+			  V16QI V8HI V4SI V4SF V2DI V2DF V1TI KF
+			  (TF "FLOAT128_IEEE_P (TFmode)")])
+
+;; Vector types for pcrel optimization
+(define_mode_iterator POV [V16QI V8HI V4SI V4SF V2DI V2DF V1TI KF
+			   (TF "FLOAT128_IEEE_P (TFmode)")])
+
+;; Define the constraints for each mode for pcrel_opt.  The order of the
+;; constraints should have the most natural register class first.
+(define_mode_attr PO_constraint [(QI    "r,d,v")
+				 (HI    "r,d,v")
+				 (SI    "r,d,v")
+				 (DI    "r,d,v")
+				 (SF    "d,v,r")
+				 (DF    "d,v,r")
+				 (V16QI "wa,wn,wn")
+				 (V8HI  "wa,wn,wn")
+				 (V4SI  "wa,wn,wn")
+				 (V4SF  "wa,wn,wn")
+				 (V2DI  "wa,wn,wn")
+				 (V2DF  "wa,wn,wn")
+				 (V1TI  "wa,wn,wn")
+				 (KF    "wa,wn,wn")
+				 (TF    "wa,wn,wn")])
+
+;; Combiner pattern that combines the load of the GOT along with the load.  The
+;; first split pass before register allocation will split this into the load of
+;; the GOT that indicates the resultant value may be created if the PCREL_OPT
+;; relocation is done.
+;;
+;; The (set (match_dup 0)
+;;	    (unspec:<MODE> [(const_int 0)] UNSPEC_PCREL_LD))
+;;
+;; Is to signal to the register allocator that the destination register may be
+;; set by the GOT operation (if the linker does the optimization).
+;;
+;; We need to set the "cost" explicitly so that the instruction length is not
+;; used.  We return the same cost as a normal load (4 if we are not optimizing
+;; for speed, 8 if we are optimizing for speed)
+
+(define_insn_and_split "*mov<mode>_pcrel_opt_load"
+  [(set (match_operand:PO 0 "gpc_reg_operand")
+	(match_operand:PO 1 "pcrel_ext_mem_operand"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 2)
+		   (match_dup 3))
+	      (set (match_dup 0)
+		   (unspec:<MODE> [(const_int 0)] UNSPEC_PCREL_LD))
+	      (use (const_int 0))])
+   (parallel [(set (match_dup 0)
+		   (match_dup 4))
+	      (use (match_dup 0))
+	      (use (const_int 0))])]
+{
+  rtx mem = operands[1];
+  rtx got = gen_reg_rtx (DImode);
+
+  operands[2] = got;
+  operands[3] = XEXP (mem, 0);
+  operands[4] = change_address (mem, <MODE>mode, got);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "16")
+   (set (attr "cost")
+	(if_then_else (match_test "optimize_function_for_speed_p (cfun)")
+		      (const_string "8")
+		      (const_string "4")))
+   (set_attr "prefixed" "yes")])
+
+;; Zero extend combiner patterns
+(define_insn_and_split "*mov<mode>_pcrel_opt_zero_extend"
+  [(set (match_operand:DI 0 "gpc_reg_operand")
+	(zero_extend:DI
+	 (match_operand:QHSI 1 "pcrel_ext_mem_operand")))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 2)
+		   (match_dup 3))
+	      (set (match_dup 0)
+		   (unspec:DI [(const_int 0)] UNSPEC_PCREL_LD))
+	      (use (const_int 0))])
+   (parallel [(set (match_dup 0)
+		   (zero_extend:DI
+		    (match_dup 4)))
+	      (use (match_dup 0))
+	      (use (const_int 0))])]
+{
+  rtx mem = operands[1];
+  rtx got = gen_reg_rtx (DImode);
+
+  operands[2] = got;
+  operands[3] = XEXP (mem, 0);
+  operands[4] = change_address (mem, <MODE>mode, got);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "16")
+   (set (attr "cost")
+	(if_then_else (match_test "optimize_function_for_speed_p (cfun)")
+		      (const_string "8")
+		      (const_string "4")))
+   (set_attr "prefixed" "yes")])
+
+;; Sign extend combiner patterns
+(define_insn_and_split "*mov<mode>_pcrel_opt_sign_extend"
+  [(set (match_operand:DI 0 "gpc_reg_operand")
+	(sign_extend:DI
+	 (match_operand:HSI 1 "pcrel_ext_mem_operand")))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 2)
+		   (match_dup 3))
+	      (set (match_dup 0)
+		   (unspec:DI [(const_int 0)] UNSPEC_PCREL_LD))
+	      (use (const_int 0))])
+   (parallel [(set (match_dup 0)
+		   (sign_extend:DI
+		    (match_dup 4)))
+	      (use (match_dup 0))
+	      (use (const_int 0))])]
+{
+  rtx mem = operands[1];
+  rtx got = gen_reg_rtx (DImode);
+
+  operands[2] = got;
+  operands[3] = XEXP (mem, 0);
+  operands[4] = change_address (mem, <MODE>mode, got);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "16")
+   (set (attr "cost")
+	(if_then_else (match_test "optimize_function_for_speed_p (cfun)")
+		      (const_string "8")
+		      (const_string "4")))
+   (set_attr "prefixed" "yes")])
+
+;; Float extend combiner pattern
+(define_insn_and_split "*movdf_pcrel_opt_float_extend"
+  [(set (match_operand:DF 0 "gpc_reg_operand")
+	(float_extend:DF
+	 (match_operand:SF 1 "pcrel_ext_mem_operand")))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 2)
+		   (match_dup 3))
+	      (set (match_dup 0)
+		   (unspec:DF [(const_int 0)] UNSPEC_PCREL_LD))
+	      (use (const_int 0))])
+   (parallel [(set (match_dup 0)
+		   (float_extend:DF
+		    (match_dup 4)))
+	      (use (match_dup 0))
+	      (use (const_int 0))])]
+{
+  rtx mem = operands[1];
+  rtx got = gen_reg_rtx (DImode);
+
+  operands[2] = got;
+  operands[3] = XEXP (mem, 0);
+  operands[4] = change_address (mem, SFmode, got);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "16")
+   (set (attr "cost")
+	(if_then_else (match_test "optimize_function_for_speed_p (cfun)")
+		      (const_string "8")
+		      (const_string "4")))
+   (set_attr "prefixed" "yes")])
+
+;; Patterns to load up the GOT address that may be changed into the load of the
+;; actual variable.
+(define_insn "*mov<mode>_pcrel_opt_load_got"
+  [(set (match_operand:DI 0 "base_reg_operand" "=b,b,b")
+	(match_operand:DI 1 "pcrel_ext_address"))
+   (set (match_operand:PO 2 "gpc_reg_operand" "=<PO_constraint>")
+	(unspec:PO [(const_int 0)] UNSPEC_PCREL_LD))
+   (use (match_operand:DI 3 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+{
+  return (INTVAL (operands[3])) ? "ld %0,%a1\n.Lpcrel%3:" : "ld %0,%a1";
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "12")
+   (set_attr "pcrel_opt" "load_got")
+   (set (attr "cost")
+	(if_then_else (match_test "optimize_function_for_speed_p (cfun)")
+		      (const_string "8")
+		      (const_string "4")))
+   (set_attr "prefixed" "yes")])
+
+;; The secondary load insns that uses the GOT pointer that may become a NOP.
+(define_insn "*mov<mode>_pcrel_opt_load_mem"
+  [(set (match_operand:QHI 0 "gpc_reg_operand" "+r,wa")
+	(match_operand:QHI 1 "one_reg_memory_operand" "Q,Q"))
+   (use (match_operand:QHI 2 "gpc_reg_operand" "0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   l<wd>z %0,%1
+   lxsi<wd>zx %x0,%y1"
+  [(set_attr "type" "load,fpload")
+   (set_attr "pcrel_opt" "load,no")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movsi_pcrel_opt_load_mem"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "+r,d,v")
+	(match_operand:SI 1 "one_reg_memory_operand" "Q,Q,Q"))
+   (use (match_operand:SI 2 "gpc_reg_operand" "0,0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   lwz %0,%1
+   lfiwzx %0,%y1
+   lxsiwzx %x0,%y1"
+  [(set_attr "type" "load,fpload,fpload")
+   (set_attr "pcrel_opt" "load,no,no")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movdi_pcrel_opt_load_mem"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "+r,d,v")
+	(match_operand:DI 1 "one_reg_memory_operand" "Q,Q,Q"))
+   (use (match_operand:DI 2 "gpc_reg_operand" "0,0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   ld %0,%1
+   lfd %0,%1
+   lxsd %0,%1"
+  [(set_attr "type" "load,fpload,fpload")
+   (set_attr "pcrel_opt" "load")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movsf_pcrel_opt_load_mem"
+  [(set (match_operand:SF 0 "gpc_reg_operand" "+d,v,r")
+	(match_operand:SF 1 "one_reg_memory_operand" "Q,Q,Q"))
+   (use (match_operand:SF 2 "gpc_reg_operand" "0,0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   lfs %0,%1
+   lxssp %0,%1
+   lwz %0,%1"
+  [(set_attr "type" "fpload,fpload,load")
+   (set_attr "pcrel_opt" "load")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movdf_pcrel_opt_load_mem"
+  [(set (match_operand:DF 0 "gpc_reg_operand" "+d,v,r")
+	(match_operand:DF 1 "one_reg_memory_operand" "Q,Q,Q"))
+   (use (match_operand:DF 2 "gpc_reg_operand" "0,0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   lfd %0,%1
+   lxsd %0,%1
+   ld %0,%1"
+  [(set_attr "type" "fpload,fpload,load")
+   (set_attr "pcrel_opt" "load")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*mov<mode>_pcrel_opt_load_mem"
+  [(set (match_operand:POV 0 "gpc_reg_operand" "+wa")
+	(match_operand:POV 1 "one_reg_memory_operand" "Q"))
+   (use (match_operand:POV 2 "gpc_reg_operand" "0"))
+   (use (match_operand:DI 3 "const_int_operand" "n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "lxv %x0,%1"
+  [(set_attr "type" "vecload")
+   (set_attr "pcrel_opt" "load")
+   (set_attr "prefixed" "no")])
+
+;; Zero extend insns
+(define_insn "*mov<mode>_pcrel_opt_load_zero_extend2"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "+r,wa")
+	(zero_extend:DI
+	 (match_operand:QHI 1 "one_reg_memory_operand" "Q,Q")))
+   (use (match_operand:DI 2 "gpc_reg_operand" "0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   l<wd>z %0,%1
+   lxsi<wd>zx %x0,%y1"
+  [(set_attr "type" "load,fpload")
+   (set_attr "pcrel_opt" "load,no")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movsi_pcrel_opt_load_zero_extend2"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "+r,d,v")
+	(zero_extend:DI
+	 (match_operand:SI 1 "one_reg_memory_operand" "Q,Q,Q")))
+   (use (match_operand:DI 2 "gpc_reg_operand" "0,0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   lwz %0,%1
+   lfiwzx %0,%y1
+   lxsiwzx %x0,%y1"
+  [(set_attr "type" "load,fpload,fpload")
+   (set_attr "pcrel_opt" "load,no,no")
+   (set_attr "prefixed" "no")])
+
+;; Sign extend insns
+(define_insn "*movsi_pcrel_opt_load_sign_extend2"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "+r,d,v")
+	(sign_extend:DI
+	 (match_operand:SI 1 "one_reg_memory_operand" "Q,Q,Q")))
+   (use (match_operand:DI 2 "gpc_reg_operand" "0,0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   lwa %0,%1
+   lfiwax %0,%y1
+   lxsiwax %x0,%y1"
+  [(set_attr "type" "load,fpload,fpload")
+   (set_attr "pcrel_opt" "load,no,no")
+   (set_attr "prefixed" "no")])
+
+(define_insn_and_split "*movhi_pcrel_opt_load_sign_extend2"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "+r,v")
+	(sign_extend:DI
+	 (match_operand:HI 1 "one_reg_memory_operand" "Q,Q")))
+   (use (match_operand:DI 2 "gpc_reg_operand" "0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   lha %0,%1
+   #"
+  "&& reload_completed && altivec_register_operand (operands[0], HImode)"
+  [(parallel [(set (match_dup 4)
+		   (match_dup 1))
+	      (use (match_dup 4))
+	      (use (const_int 0))])
+   (set (match_dup 0)
+	(sign_extend:DI
+	 (match_dup 4)))]
+{
+  operands[4] = gen_rtx_REG (HImode, REGNO (operands[0]));
+}
+  [(set_attr "type" "load,fpload")
+   (set_attr "pcrel_opt" "load,no")
+   (set_attr "length" "4,8")
+   (set_attr "prefixed" "no")])
+
+;; Floating point extend insn
+(define_insn "*movsf_pcrel_opt_load_float_extend2"
+  [(set (match_operand:DF 0 "gpc_reg_operand" "+d,v")
+	(float_extend:DF
+	 (match_operand:SF 1 "one_reg_memory_operand" "Q,Q")))
+   (use (match_operand:DF 2 "gpc_reg_operand" "0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   lfs %0,%1
+   lxssp %0,%1"
+  [(set_attr "type" "fpload")
+   (set_attr "pcrel_opt" "load")
+   (set_attr "prefixed" "no")])
+
+; ;; Store combiner insns that merge together loading up the address of the
+; ;; external variable and doing the store.  This is split in the first split
+; ;; pass before register allocation.
+;;
+;; We need to set the "cost" explicitly so that the instruction length is not
+;; used.  We return the same cost as a normal store (4).
+(define_insn_and_split "*mov<mode>_pcrel_opt_store"
+  [(set (match_operand:PO 0 "pcrel_ext_mem_operand")
+ 	(match_operand:PO 1 "gpc_reg_operand"))]
+   "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64
+    && can_create_pseudo_p ()"
+   "#"
+   "&& 1"
+   [(set (match_dup 2)
+	 (unspec:DI [(match_dup 1)
+		     (match_dup 3)
+		     (const_int 0)] UNSPEC_PCREL_ST))
+    (parallel [(set (match_dup 4)
+		    (match_dup 1))
+	       (use (const_int 0))])]
+{
+  rtx mem = operands[0];
+  rtx addr = XEXP (mem, 0);
+  rtx got = gen_reg_rtx (DImode);
+
+  operands[2] = got;
+  operands[3] = addr;
+  operands[4] = change_address (mem, <MODE>mode, got);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "20")
+   (set_attr "pcrel_opt" "store_got")
+   (set_attr "cost" "4")
+   (set_attr "prefixed" "yes")])
+
+;; Load of the GOT address for a store operation that may be converted into a
+;; direct store.
+(define_insn "*mov<mode>_pcrel_opt_store_got"
+  [(set (match_operand:DI 0 "base_reg_operand" "=&b,&b,&b")
+	(unspec:DI [(match_operand:PO 1 "gpc_reg_operand" "<PO_constraint>")
+		    (match_operand:DI 2 "pcrel_ext_address")
+		    (match_operand:DI 3 "const_int_operand" "n,n,n")]
+		   UNSPEC_PCREL_ST))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+{
+  return (INTVAL (operands[3])) ? "ld %0,%a2\n.Lpcrel%3:" : "ld %0,%a2";
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "12")
+   (set_attr "pcrel_opt" "store_got")
+   (set_attr "cost" "4")
+   (set_attr "prefixed" "yes")])
+
+;; Secondary store instruction that uses the GOT pointer, and may be optimized
+;; into a NOP instruction.
+(define_insn "*mov<mode>_pcrel_opt_store_mem"
+  [(set (match_operand:QHI 0 "one_reg_memory_operand" "=Q,Q")
+	(match_operand:QHI 1 "gpc_reg_operand" "r,wa"))
+   (use (match_operand:DI 2 "const_int_operand" "n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+  st<wd> %1,%0
+  stxsi<wd>x %x1,%y0"
+  [(set_attr "type" "store,fpstore")
+   (set_attr "pcrel_opt" "store,no")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movsi_pcrel_opt_store_mem"
+  [(set (match_operand:SI 0 "one_reg_memory_operand" "=Q,Q,Q")
+	(match_operand:SI 1 "gpc_reg_operand" "r,d,v"))
+   (use (match_operand:DI 2 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+  stw %1,%0
+  stfiwx %1,%y0
+  stxsiwx %1,%y0"
+  [(set_attr "type" "store,fpstore,fpstore")
+   (set_attr "pcrel_opt" "store,no,no")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movdi_pcrel_opt_store_mem"
+  [(set (match_operand:DI 0 "one_reg_memory_operand" "=Q,Q,Q")
+	(match_operand:DI 1 "gpc_reg_operand" "r,d,v"))
+   (use (match_operand:DI 2 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+  std %1,%0
+  stfd %1,%0
+  stxsd %1,%0"
+  [(set_attr "type" "store,fpstore,fpstore")
+   (set_attr "pcrel_opt" "store")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movsf_pcrel_opt_store_mem"
+  [(set (match_operand:SF 0 "one_reg_memory_operand" "=Q,Q,Q")
+	(match_operand:SF 1 "gpc_reg_operand" "d,v,r"))
+   (use (match_operand:DI 2 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+  stfs %1,%0
+  stxssp %1,%0
+  stw %1,%0"
+  [(set_attr "type" "fpstore,fpstore,store")
+   (set_attr "pcrel_opt" "store")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movdf_pcrel_opt_store_mem"
+  [(set (match_operand:DF 0 "one_reg_memory_operand" "=Q,Q,Q")
+	(match_operand:DF 1 "gpc_reg_operand" "d,v,r"))
+   (use (match_operand:DI 2 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+  stfd %1,%0
+  stxsd %1,%0
+  std %1,%0"
+  [(set_attr "type" "fpstore,fpstore,store")
+   (set_attr "pcrel_opt" "store")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*mov<mode>_pcrel_opt_store_mem"
+  [(set (match_operand:POV 0 "one_reg_memory_operand" "=Q")
+	(match_operand:POV 1 "gpc_reg_operand" "wa"))
+   (use (match_operand:DI 2 "const_int_operand" "n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "stxv %x1,%0"
+  [(set_attr "type" "vecstore")
+   (set_attr "pcrel_opt" "store")
+   (set_attr "prefixed" "no")])
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 274876)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -775,6 +775,13 @@ (define_predicate "indexed_or_indirect_o
   return indexed_or_indirect_address (op, mode);
 })
 
+;; Return 1 if the operand uses a single register for the address.
+(define_predicate "one_reg_memory_operand"
+  (match_code "mem")
+{
+  return REG_P (XEXP (op, 0));
+})
+
 ;; Like indexed_or_indirect_operand, but also allow a GPR register if direct
 ;; moves are supported.
 (define_predicate "reg_or_indexed_operand"
@@ -1695,6 +1702,15 @@ (define_predicate "pcrel_ext_address"
   return (SYMBOL_REF_P (op) && !SYMBOL_REF_LOCAL_P (op));
 })
 
+;; Return 1 if op is a memory operand to an external variable when we
+;; support pc-relative addressing and the PCREL_OPT relocation to
+;; optimize references to it.
+(define_predicate "pcrel_ext_mem_operand"
+  (match_code "mem")
+{
+  return pcrel_ext_address (XEXP (op, 0), Pmode);
+})
+
 ;; Return 1 if op is a memory operand that is not prefixed.
 (define_predicate "non_prefixed_mem_operand"
   (match_code "mem")
Index: gcc/config/rs6000/rs6000-cpus.def
===================================================================
--- gcc/config/rs6000/rs6000-cpus.def	(revision 274875)
+++ gcc/config/rs6000/rs6000-cpus.def	(working copy)
@@ -86,6 +86,7 @@
    prefixed addressing, and we want to clear all of the addressing bits
    on targets that cannot support prefixed/pcrel addressing.  */
 #define ADDRESSING_FUTURE_MASKS	(OPTION_MASK_PCREL			\
+				 | OPTION_MASK_PCREL_OPT		\
 				 | OPTION_MASK_PREFIXED_ADDR)
 
 /* Flags that need to be turned off if -mno-future.  */
@@ -144,6 +145,7 @@
 				 | OPTION_MASK_P9_MISC			\
 				 | OPTION_MASK_P9_VECTOR		\
 				 | OPTION_MASK_PCREL			\
+				 | OPTION_MASK_PCREL_OPT		\
 				 | OPTION_MASK_POPCNTB			\
 				 | OPTION_MASK_POPCNTD			\
 				 | OPTION_MASK_POWERPC64		\
Index: gcc/config/rs6000/rs6000-passes.def
===================================================================
--- gcc/config/rs6000/rs6000-passes.def	(revision 274864)
+++ gcc/config/rs6000/rs6000-passes.def	(working copy)
@@ -25,3 +25,12 @@ along with GCC; see the file COPYING3.
  */
 
   INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
+
+/* The pcrel_opt pass must be the final pass before final.  This pass combines
+   references to external pc-relative variables with their use.  There must be
+   only one reference to the external pointer loaded in order to do the
+   optimization.  Otherwise we load up the addresses (either via PADDI if the
+   label is local or via a PLD from the got section if it is defined in another
+   module) and the value as a base pointer.  */
+
+  INSERT_PASS_BEFORE (pass_final, 1, pass_pcrel_opt);
Index: gcc/config/rs6000/rs6000-pcrel.c
===================================================================
--- gcc/config/rs6000/rs6000-pcrel.c	(revision 274877)
+++ gcc/config/rs6000/rs6000-pcrel.c	(working copy)
@@ -0,0 +1,463 @@
+/* Subroutines used support the pc-relative linker optimization.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file implements a RTL pass that looks for pc-relative loads of the
+   address of an external variable using the PCREL_GOT relocation and a single
+   load/store that uses that GOT pointer.  If that is found we create the
+   PCREL_OPT relocation to possibly convert:
+
+	pld b,var@pcrel@got(0),1
+
+	# possibly other instructions that do not use the base register 'b' or
+        # the result register 'r'.
+
+	lwz r,0(b)
+
+   into:
+
+	plwz r,var@pcrel(0),1
+
+	# possibly other instructions that do not use the base register 'b' or
+        # the result register 'r'.
+
+	nop
+
+   If the variable is not defined in the main program or the code using it is
+   not in the main program, the linker put the address in the .got section and
+   do:
+
+	.section .got
+	.Lvar_got:	.dword var
+
+	.section .text
+	pld b,.Lvar_got@pcrel(0),1
+
+	# possibly other instructions that do not use the base register 'b' or
+        # the result register 'r'.
+
+	lwz r,0(b)
+	
+   We only look for a single usage in the basic block where the GOT pointer is
+   loaded.  Multiple uses or references in another basic block will force us to
+   not use the PCREL_OPT relocation.
+
+   This file also contains the support function for prefixed memory to emit the
+   leading 'p' in front of prefixed instructions, and to create the necessary
+   relocations needed for PCREL_OPT.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "df.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"
+#include "expr.h"
+#include "output.h"
+#include "tree-pass.h"
+#include "rtx-vector-builder.h"
+#include "print-rtl.h"
+#include "insn-attr.h"
+
+\f
+// Optimize pc-relative references
+const pass_data pass_data_pcrel =
+{
+  RTL_PASS,			// type
+  "pcrel",			// name
+  OPTGROUP_NONE,		// optinfo_flags
+  TV_NONE,			// tv_id
+  0,				// properties_required
+  0,				// properties_provided
+  0,				// properties_destroyed
+  0,				// todo_flags_start
+  TODO_df_finish,		// todo_flags_finish
+};
+
+// Pass data structures
+class pcrel : public rtl_opt_pass
+{
+private:
+  // Function to optimize pc relative loads/stores
+  unsigned int do_pcrel_opt (function *);
+
+  // A GOT pointer used for a load
+  void load_got (rtx_insn *);
+
+  // A load insn that uses the GOT ponter
+  void load_insn (rtx_insn *);
+
+  // A GOT pointer used for a store
+  void store_got (rtx_insn *);
+
+  // A store insn that uses the GOT ponter
+  void store_insn (rtx_insn *);
+
+  // Record the number of loads and stores optimized
+  unsigned long num_got_loads;
+  unsigned long num_got_stores;
+  unsigned long num_loads;
+  unsigned long num_stores;
+  unsigned long num_opt_loads;
+  unsigned long num_opt_stores;
+
+  // We record the GOT insn for each register that sets a GOT for a load or a
+  // store instruction.
+  rtx_insn *got_reg[32];
+
+public:
+  pcrel (gcc::context *ctxt)
+  : rtl_opt_pass (pass_data_pcrel, ctxt),
+    num_got_loads (0),
+    num_got_stores (0),
+    num_loads (0),
+    num_stores (0),
+    num_opt_loads (0),
+    num_opt_stores (0)
+  {}
+
+  ~pcrel (void)
+  {}
+
+  // opt_pass methods:
+  virtual bool gate (function *)
+  {
+    return TARGET_PCREL && TARGET_PCREL_OPT && optimize;
+  }
+
+  virtual unsigned int execute (function *fun)
+  {
+    return do_pcrel_opt (fun);
+  }
+
+  opt_pass *clone ()
+  {
+    return new pcrel (m_ctxt);
+  }
+};
+
+\f
+/* Return a marker to create the backward pointing label that links the load or
+   store to the insn that loads the adddress of an external label with
+   PCREL_GOT.  This allows us to create the necessary R_PPC64_PCREL_OPT
+   relocation to link the two instructions.  */
+
+static rtx
+pcrel_marker (void)
+{
+  static unsigned int label_number = 0;
+
+  label_number++;
+  return GEN_INT (label_number);
+}
+
+\f
+// Save the current PCREL_OPT load GOT insn address in the register # of the
+// GOT pointer that is loaded.
+//
+// The PCREL_OPT LOAD_GOT insn looks like:
+//
+//	(parallel [(set (base) (addr))
+//		   (set (reg)  (unspec [(const_int 0)] UNSPEC_PCREL_LD))
+//		   (use (marker))])
+//
+// The base register is the GOT address, and the marker is a numeric label that
+// is created in this pass if the only use of the GOT load pointer is for a
+// single load.
+
+void
+pcrel::load_got (rtx_insn *insn)
+{
+  rtx pattern = PATTERN (insn);
+  rtx set = XVECEXP (pattern, 0, 0);
+  int got = REGNO (SET_DEST (set));
+
+  gcc_assert (IN_RANGE (got, FIRST_GPR_REGNO+1, LAST_GPR_REGNO));
+  got_reg[got] = insn;
+  num_got_loads++;
+}
+
+// See if the use of this load of a GOT pointer is the only usage.  If so,
+// allocate a marker to create a label.
+//
+// The PCREL_OPT LOAD insn looks like:
+//
+//	(parallel [(set (reg) (mem))
+//		   (use (reg)
+//		   (use (marker))])
+//
+// Between the reg and the memory might be a SIGN_EXTEND, ZERO_EXTEND, or
+// FLOAT_EXTEND:
+//
+//	(parallel [(set (reg) (sign_extend (mem)))
+//		   (use (reg)
+//		   (use (marker))])
+
+void
+pcrel::load_insn (rtx_insn *insn)
+{
+  num_loads++;
+
+  /* If the optimizer has changed the load instruction, just use the GOT
+     pointer as an address.  */
+  rtx pattern = PATTERN (insn);
+  if (GET_CODE (pattern) != PARALLEL || XVECLEN (pattern, 0) != 3)
+    return;
+
+  rtx set = XVECEXP (pattern, 0, 0);
+  if (GET_CODE (set) != SET
+      || GET_CODE (XVECEXP (pattern, 0, 1)) != USE
+      || GET_CODE (XVECEXP (pattern, 0, 2)) != USE)
+    return;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+
+  if (!rtx_equal_p (dest, XEXP (XVECEXP (pattern, 0, 1), 0)))
+    return;
+
+  if (GET_CODE (src) == SIGN_EXTEND || GET_CODE (src) == ZERO_EXTEND
+      || GET_CODE (src) == FLOAT_EXTEND)
+    src = XEXP (src, 0);
+
+  if (!MEM_P (src))
+    return;
+
+  rtx addr = XEXP (src, 0);
+  if (!REG_P (addr))
+    return;
+
+  int r = REGNO (addr);
+  if (!IN_RANGE (r, FIRST_GPR_REGNO+1, LAST_GPR_REGNO))
+    return;
+
+  rtx_insn *got_insn = got_reg[r];
+
+  // See if this is the only reference, and there is a set of the GOT pointer
+  // previously in the same basic block.  If this is the only reference,
+  // optimize it.
+  if (got_insn
+      && get_attr_pcrel_opt (got_insn) == PCREL_OPT_LOAD_GOT
+      && !reg_used_between_p (addr, got_insn, insn)
+      && (find_reg_note (insn, REG_DEAD, addr) || rtx_equal_p (dest, addr)))
+    {
+      rtx marker = pcrel_marker ();
+      rtx got_use = XVECEXP (PATTERN (got_insn), 0, 2);
+      rtx insn_use = XVECEXP (pattern, 0, 2);
+
+      gcc_checking_assert (rtx_equal_p (XEXP (got_use, 0), const0_rtx));
+      gcc_checking_assert (rtx_equal_p (XEXP (insn_use, 0), const0_rtx));
+
+      XEXP (got_use, 0) = marker;
+      XEXP (insn_use, 0) = marker;
+      num_opt_loads++;
+    }
+
+  // Forget the GOT now that we've used it.
+  got_reg[r] = (rtx_insn *)0;
+}
+
+// Save the current PCREL_OPT store GOT insn address in the register # of the
+// GOT pointer that is loaded.
+//
+// The PCREL_OPT STORE_GOT insn looks like:
+//
+//	(set (set (base)
+//	     (unspec:DI [(src)
+//			 (addr)
+//			 (marker)] UNSPEC_PCREL_ST))
+//
+// The base register is the GOT address, and the marker is a numeric label that
+// is created in this pass or 0 to indicate there are other uses of the GOT
+// pointer.
+
+void
+pcrel::store_got (rtx_insn *insn)
+{
+  rtx pattern = PATTERN (insn);
+  int got = REGNO (SET_DEST (pattern));
+
+  gcc_checking_assert (IN_RANGE (got, FIRST_GPR_REGNO+1, LAST_GPR_REGNO));
+  got_reg[got] = insn;
+  num_got_stores++;
+}
+
+// See if the use of this store using a GOT pointer is the only usage.  If so,
+// allocate a marker to create a label.
+//
+// The PCREL_OPT STORE insn looks like:
+//
+//	(parallel [(set (mem) (reg))
+//		   (use (marker))])
+
+void
+pcrel::store_insn (rtx_insn *insn)
+{
+  num_stores++;
+
+  /* If the optimizer has changed the store instruction, just use the GOT
+     pointer as an address.  */
+  rtx pattern = PATTERN (insn);
+  if (GET_CODE (pattern) != PARALLEL || XVECLEN (pattern, 0) != 2)
+    return;
+
+  rtx set = XVECEXP (pattern, 0, 0);
+  if (GET_CODE (set) != SET || GET_CODE (XVECEXP (pattern, 0, 1)) != USE)
+    return;
+
+  rtx dest = SET_DEST (set);
+
+  if (!MEM_P (dest))
+    return;
+
+  rtx addr = XEXP (dest, 0);
+  if (!REG_P (addr))
+    return;
+
+  int r = REGNO (addr);
+  if (!IN_RANGE (r, FIRST_GPR_REGNO+1, LAST_GPR_REGNO))
+    return;
+
+  rtx_insn *got_insn = got_reg[r];
+
+  // See if this is the only reference, and there is a GOT pointer previously.
+  // If this is the only reference, optimize it.
+  if (got_insn
+      && get_attr_pcrel_opt (got_insn) == PCREL_OPT_STORE_GOT
+      && !reg_used_between_p (addr, got_insn, insn)
+      && find_reg_note (insn, REG_DEAD, addr))
+    {
+      rtx marker = pcrel_marker ();
+      rtx got_src = SET_SRC (PATTERN (got_insn));
+      rtx insn_use = XVECEXP (pattern, 0, 1);
+
+      gcc_checking_assert (rtx_equal_p (XVECEXP (got_src, 0, 2), const0_rtx));
+      gcc_checking_assert (rtx_equal_p (XEXP (insn_use, 0), const0_rtx));
+
+      XVECEXP (got_src, 0, 2) = marker;
+      XEXP (insn_use, 0) = marker;
+      num_opt_stores++;
+    }
+
+  // Forget the GOT now
+  got_reg[r] = (rtx_insn *)0;
+}
+
+// Optimize pcrel external variable references
+
+unsigned int
+pcrel::do_pcrel_opt (function *fun)
+{
+  basic_block bb;
+  rtx_insn *insn, *curr_insn = 0;
+
+  // Dataflow analysis for use-def chains.
+  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
+  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
+  df_analyze ();
+  df_set_flags (DF_DEFER_INSN_RESCAN | DF_LR_RUN_DCE);
+
+  // Look at each basic block to see if there is a load of an external
+  // variable's GOT address, and a single load/store using that GOT address.
+  FOR_ALL_BB_FN (bb, fun)
+    {
+      bool clear_got_p = true;
+
+      FOR_BB_INSNS_SAFE (bb, insn, curr_insn)
+	{
+	  if (clear_got_p)
+	    {
+	      memset ((void *) &got_reg[0], 0, sizeof (got_reg));
+	      clear_got_p = false;
+	    }
+
+	  if (NONJUMP_INSN_P (insn))
+	    {
+	      rtx pattern = PATTERN (insn);
+	      if (GET_CODE (pattern) == SET || GET_CODE (pattern) == PARALLEL)
+		{
+		  switch (get_attr_pcrel_opt (insn))
+		    {
+		    case PCREL_OPT_NO:
+		      break;
+
+		    case PCREL_OPT_LOAD_GOT:
+		      load_got (insn);
+		      break;
+
+		    case PCREL_OPT_LOAD:
+		      load_insn (insn);
+		      break;
+
+		    case PCREL_OPT_STORE_GOT:
+		      store_got (insn);
+		      break;
+
+		    case PCREL_OPT_STORE:
+		      store_insn (insn);
+		      break;
+
+		    default:
+		      gcc_unreachable ();
+		    }
+		}
+	    }
+
+	  /* Don't let the GOT load be moved before a label, jump, or call and
+	     the dependent load/store after the label, jump, or call.  */
+	  else if (JUMP_P (insn) || CALL_P (insn) || LABEL_P (insn))
+	    clear_got_p = true;
+	}
+    }
+
+  // Rebuild ud chains.
+  df_remove_problem (df_chain);
+  df_process_deferred_rescans ();
+  df_set_flags (DF_RD_PRUNE_DEAD_DEFS | DF_LR_RUN_DCE);
+  df_chain_add_problem (DF_UD_CHAIN);
+  df_analyze ();
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "\npc-relative optimizations:\n");
+      fprintf (dump_file, "\tgot loads        = %lu\n", num_got_loads);
+      fprintf (dump_file, "\tpotential loads  = %lu\n", num_loads);
+      fprintf (dump_file, "\toptimized loads  = %lu\n", num_opt_loads);
+      fprintf (dump_file, "\tgot stores       = %lu\n", num_got_stores);
+      fprintf (dump_file, "\tpotential stores = %lu\n", num_stores);
+      fprintf (dump_file, "\toptimized stores = %lu\n\n", num_opt_stores);
+    }
+
+  return 0;
+}
+
+\f
+rtl_opt_pass *
+make_pass_pcrel_opt (gcc::context *ctxt)
+{
+  return new pcrel (ctxt);
+}
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 274874)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -266,6 +266,7 @@ extern bool rs6000_linux_float_exception
 namespace gcc { class context; }
 class rtl_opt_pass;
 
+extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *);
 extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
 extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
 extern bool rs6000_quadword_masked_address_p (const_rtx exp);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 274875)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -4415,7 +4415,7 @@ rs6000_option_override_internal (bool gl
 	  if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
 	    error ("%qs requires %qs", "-mpcrel", "-mcmodel=medium");
 
-	  rs6000_isa_flags &= ~OPTION_MASK_PCREL;
+	  rs6000_isa_flags &= ~(OPTION_MASK_PCREL | OPTION_MASK_PCREL_OPT);
 	}
 
       /* Enable defaults if desired.  */
@@ -4429,7 +4429,11 @@ rs6000_option_override_internal (bool gl
 
 	  if (!explicit_pcrel && TARGET_PCREL_DEFAULT
 	      && TARGET_CMODEL == CMODEL_MEDIUM)
-	    rs6000_isa_flags |= OPTION_MASK_PCREL;
+	    {
+	      rs6000_isa_flags |= OPTION_MASK_PCREL;
+	      if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL_OPT) == 0)
+		rs6000_isa_flags |= OPTION_MASK_PCREL_OPT;
+	    }
 	}
     }
 
@@ -4453,6 +4457,15 @@ rs6000_option_override_internal (bool gl
       rs6000_isa_flags &= ~OPTION_MASK_PCREL;
     }
 
+  /* Check -mfuture debug switches.  */
+  if (!TARGET_PCREL && TARGET_PCREL_OPT)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL_OPT) != 0)
+	error ("%qs requires %qs", "-mpcrel-opt", "-mpcrel");
+
+      rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
+    }
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
 
@@ -14244,13 +14257,40 @@ prefixed_paddi_p (rtx_insn *insn)
    instruction is printed out.  */
 static bool next_insn_prefixed_p;
 
+/* Numeric label that is the address of the GOT load instruction + 8 that we
+   link the R_PPC64_PCREL_OPT relocation to for on the next instruction.  */
+static unsigned int pcrel_opt_label_num;
+
 /* Define FINAL_PRESCAN_INSN if some processing needs to be done before
    outputting the assembler code.  On the PowerPC, we remember if the current
-   insn is a prefixed insn where we need to emit a 'p' before the insn.  */
+   insn is a prefixed insn where we need to emit a 'p' before the insn.
+
+   In addition, if the insn is part of a pc-relative reference to an external
+   label optimization, this is recorded also.  */
 void
-rs6000_final_prescan_insn (rtx_insn *insn, rtx [], int)
+rs6000_final_prescan_insn (rtx_insn *insn, rtx operands[], int noperands)
 {
   next_insn_prefixed_p = (get_attr_prefixed (insn) != PREFIXED_NO);
+
+  enum attr_pcrel_opt pcrel_attr = get_attr_pcrel_opt (insn);
+
+  /* For the load and store instructions that are tied to a GOT pointer, we
+     know that operand 3 contains a marker for loads and operand 2 contains
+     the marker for stores.  If it is non-zero, it is the numeric label where
+     we load the address + 8.  */
+  if (pcrel_attr == PCREL_OPT_LOAD)
+    {
+      gcc_assert (noperands >= 3);
+      pcrel_opt_label_num = INTVAL (operands[3]);
+    }
+  else if (pcrel_attr == PCREL_OPT_STORE)
+    {
+      gcc_assert (noperands >= 2);
+      pcrel_opt_label_num = INTVAL (operands[2]);
+    }
+  else
+    pcrel_opt_label_num = 0;
+
   return;
 }
 
@@ -14260,6 +14300,13 @@ rs6000_final_prescan_insn (rtx_insn *ins
 void
 rs6000_asm_output_opcode (FILE *stream)
 {
+  if (pcrel_opt_label_num)
+    {
+      fprintf (stream, ".reloc .Lpcrel%u-8,R_PPC64_PCREL_OPT,.-(.Lpcrel%u-8)\n\t",
+	       pcrel_opt_label_num, pcrel_opt_label_num);
+      pcrel_opt_label_num = 0;
+    }
+
   if (next_insn_prefixed_p)
     fputc ('p', stream);
 
@@ -23422,6 +23469,7 @@ static struct rs6000_opt_mask const rs60
   { "mulhw",			OPTION_MASK_MULHW,		false, true  },
   { "multiple",			OPTION_MASK_MULTIPLE,		false, true  },
   { "pcrel",			OPTION_MASK_PCREL,		false, true  },
+  { "pcrel-opt",		OPTION_MASK_PCREL_OPT,		false, true  },
   { "popcntb",			OPTION_MASK_POPCNTB,		false, true  },
   { "popcntd",			OPTION_MASK_POPCNTD,		false, true  },
   { "power8-fusion",		OPTION_MASK_P8_FUSION,		false, true  },
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 274874)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -258,6 +258,31 @@ (define_attr "var_shift" "no,yes"
 ;; Is copying of this instruction disallowed?
 (define_attr "cannot_copy" "no,yes" (const_string "no"))
 
+;; Whether this instruction is part of the two instruction sequence that
+;; supports PCREL_OPT optimizations, where the linker can change code of the
+;; form:
+;;
+;;		pld b,var@got@pcrel
+;;	100:
+;;		# possibly other instructions
+;;		.reloc 100b-8,R_PPC64_PCREL_OPT,0
+;;		lwz r,0(b)
+;;
+;; into the following if 'var' is in the main program:
+;;
+;;		plwz r,0(b)
+;;		# possibly other instructions
+;;		nop
+;;
+;; The states are:
+;;	no		-- insn is not involved with PCREL_OPT optimizations
+;;	load_got	-- insn loads up the GOT pointer for a load instruction
+;;	load		-- insn is an offsettable load that uses the GOT pointer
+;;	store_got	-- insn loads up the GOT pointer for a store instruction
+;;	store		-- insn is an offsettable store that uses the GOT pointer
+
+(define_attr "pcrel_opt" "no,load_got,load,store_got,store" (const_string "no"))
+
 ;; Whether an insn is a prefixed insn, and an initial 'p' should be printed
 ;; before the instruction.  A prefixed instruction has a prefix instruction
 ;; word that extends the immediate value of the instructions from 12-16 bits to
@@ -14726,6 +14751,7 @@ (define_insn "*cmpeqb_internal"
   [(set_attr "type" "logical")])
 \f
 
+(include "pcrel.md")
 (include "sync.md")
 (include "vector.md")
 (include "vsx.md")
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 274864)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -577,3 +577,7 @@ Generate (do not generate) prefixed memo
 mpcrel
 Target Report Mask(PCREL) Var(rs6000_isa_flags)
 Generate (do not generate) pc-relative memory addressing.
+
+mpcrel-opt
+Target Undocumented Mask(PCREL_OPT) Var(rs6000_isa_flags)
+Generate (do not generate) pc-relative memory optimizations for externals.
Index: gcc/config/rs6000/t-rs6000
===================================================================
--- gcc/config/rs6000/t-rs6000	(revision 274864)
+++ gcc/config/rs6000/t-rs6000	(working copy)
@@ -47,6 +47,10 @@ rs6000-call.o: $(srcdir)/config/rs6000/r
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+rs6000-pcrel.o: $(srcdir)/config/rs6000/rs6000-pcrel.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
 $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
   $(srcdir)/config/rs6000/rs6000-cpus.def
 	$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
@@ -79,6 +83,7 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs
 	$(srcdir)/config/rs6000/predicates.md \
 	$(srcdir)/config/rs6000/constraints.md \
 	$(srcdir)/config/rs6000/darwin.md \
+	$(srcdir)/config/rs6000/pcrel.md \
 	$(srcdir)/config/rs6000/sync.md \
 	$(srcdir)/config/rs6000/vector.md \
 	$(srcdir)/config/rs6000/vsx.md \


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH, V3, #8 of 10], Miscellaneous prefixed addressing tests
  2019-08-26 19:21 PowerPC future machine, version 3 Michael Meissner
                   ` (6 preceding siblings ...)
  2019-08-26 22:06 ` [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization Michael Meissner
@ 2019-08-27  7:01 ` Michael Meissner
  2019-09-03 23:17   ` Segher Boessenkool
  2019-08-27  7:14 ` [PATCH, V3, #10 of #10], Pc-relative tests Michael Meissner
  2019-08-27  7:55 ` [PATCH, V3, #9 of 10], Prefixed addressing tests with large offsets Michael Meissner
  9 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-08-27  7:01 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, segher, dje.gcc

This patch contains the miscellaneous tests for GCC to test some features of
prefixed addressing.  It is exactly the same as patch V1 #8.

When I add patches 8-10 to the testsuite, all of these tests now run.  Once I
have checked in the previous patches, can I check this patch into the trunk?

(note, the new files were created in the branch before this patch was made).

2019-08-26  Michael Meissner  <meissner@linux.ibm.com>

	* gcc/testsuite/gcc.target/powerpc/prefix-odd-memory.c: New test.
	* gcc/testsuite/gcc.target/powerpc/paddi-1.c: New test.
	* gcc/testsuite/gcc.target/powerpc/paddi-2.c: New test.
	* gcc/testsuite/gcc.target/powerpc/paddi-3.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-premodify.c: New test.

Index: gcc/testsuite/gcc.target/powerpc/paddi-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/paddi-1.c	(revision 274879)
+++ gcc/testsuite/gcc.target/powerpc/paddi-1.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PADDI is generated to add a large constant.  */
+unsigned long
+add (unsigned long a)
+{
+  return a + 0x12345678UL;
+}
+
+/* { dg-final { scan-assembler {\mpaddi\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/paddi-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/paddi-2.c	(revision 274879)
+++ gcc/testsuite/gcc.target/powerpc/paddi-2.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PLI (PADDI) is generated to load a large constant.  */
+unsigned long
+large (void)
+{
+  return 0x12345678UL;
+}
+
+/* { dg-final { scan-assembler {\mpli\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/paddi-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/paddi-3.c	(revision 274879)
+++ gcc/testsuite/gcc.target/powerpc/paddi-3.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PLI (PADDI) is generated to load a large constant for SImode.  */
+void
+large_si (unsigned int *p)
+{
+  *p = 0x12345U;
+}
+
+/* { dg-final { scan-assembler {\mpli\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-odd-memory.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-odd-memory.c	(revision 274879)
+++ gcc/testsuite/gcc.target/powerpc/prefix-odd-memory.c	(working copy)
@@ -0,0 +1,156 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests whether we can generate a prefixed load/store operation for addresses
+   that don't meet DS/DQ alignment constraints.  */
+
+unsigned long
+load_uc_odd (unsigned char *p)
+{
+  return p[1];				/* should generate LBZ.  */
+}
+
+long
+load_sc_odd (signed char *p)
+{
+  return p[1];				/* should generate LBZ + EXTSB.  */
+}
+
+unsigned long
+load_us_odd (unsigned char *p)
+{
+  return *(unsigned short *)(p + 1);	/* should generate LHZ.  */
+}
+
+long
+load_ss_odd (unsigned char *p)
+{
+  return *(short *)(p + 1);		/* should generate LHA.  */
+}
+
+unsigned long
+load_ui_odd (unsigned char *p)
+{
+  return *(unsigned int *)(p + 1);	/* should generate LWZ.  */
+}
+
+long
+load_si_odd (unsigned char *p)
+{
+  return *(int *)(p + 1);		/* should generate PLWA.  */
+}
+
+unsigned long
+load_ul_odd (unsigned char *p)
+{
+  return *(unsigned long *)(p + 1);	/* should generate PLD.  */
+}
+
+long
+load_sl_odd (unsigned char *p)
+{
+  return *(long *)(p + 1);	/* should generate PLD.  */
+}
+
+float
+load_float_odd (unsigned char *p)
+{
+  return *(float *)(p + 1);		/* should generate LFS.  */
+}
+
+double
+load_double_odd (unsigned char *p)
+{
+  return *(double *)(p + 1);		/* should generate LFD.  */
+}
+
+__ieee128
+load_ieee128_odd (unsigned char *p)
+{
+  return *(__ieee128 *)(p + 1);		/* should generate PLXV.  */
+}
+
+void
+store_uc_odd (unsigned char uc, unsigned char *p)
+{
+  p[1] = uc;				/* should generate STB.  */
+}
+
+void
+store_sc_odd (signed char sc, signed char *p)
+{
+  p[1] = sc;				/* should generate STB.  */
+}
+
+void
+store_us_odd (unsigned short us, unsigned char *p)
+{
+  *(unsigned short *)(p + 1) = us;	/* should generate STH.  */
+}
+
+void
+store_ss_odd (signed short ss, unsigned char *p)
+{
+  *(signed short *)(p + 1) = ss;	/* should generate STH.  */
+}
+
+void
+store_ui_odd (unsigned int ui, unsigned char *p)
+{
+  *(unsigned int *)(p + 1) = ui;	/* should generate STW.  */
+}
+
+void
+store_si_odd (signed int si, unsigned char *p)
+{
+  *(signed int *)(p + 1) = si;		/* should generate STW.  */
+}
+
+void
+store_ul_odd (unsigned long ul, unsigned char *p)
+{
+  *(unsigned long *)(p + 1) = ul;	/* should generate PSTD.  */
+}
+
+void
+store_sl_odd (signed long sl, unsigned char *p)
+{
+  *(signed long *)(p + 1) = sl;		/* should generate PSTD.  */
+}
+
+void
+store_float_odd (float f, unsigned char *p)
+{
+  *(float *)(p + 1) = f;		/* should generate STF.  */
+}
+
+void
+store_double_odd (double d, unsigned char *p)
+{
+  *(double *)(p + 1) = d;		/* should generate STD.  */
+}
+
+void
+store_ieee128_odd (__ieee128 ieee, unsigned char *p)
+{
+  *(__ieee128 *)(p + 1) = ieee;		/* should generate PSTXV.  */
+}
+
+/* { dg-final { scan-assembler-times {\mextsb\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlbz\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mlfd\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlfs\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlha\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlhz\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlwz\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mpld\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mplwa\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mplxv\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstxv\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstb\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mstfd\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mstfs\M}  1 } } */
+/* { dg-final { scan-assembler-times {\msth\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mstw\M}   2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-premodify.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-premodify.c	(revision 274879)
+++ gcc/testsuite/gcc.target/powerpc/prefix-premodify.c	(working copy)
@@ -0,0 +1,47 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Make sure that we don't try to generate a prefixed form of the load and
+   store with update instructions.  */
+
+#ifndef SIZE
+#define SIZE 50000
+#endif
+
+struct foo {
+  unsigned int field;
+  char pad[SIZE];
+};
+
+struct foo *inc_load (struct foo *p, unsigned int *q)
+{
+  *q = (++p)->field;
+  return p;
+}
+
+struct foo *dec_load (struct foo *p, unsigned int *q)
+{
+  *q = (--p)->field;
+  return p;
+}
+
+struct foo *inc_store (struct foo *p, unsigned int *q)
+{
+  (++p)->field = *q;
+  return p;
+}
+
+struct foo *dec_store (struct foo *p, unsigned int *q)
+{
+  (--p)->field = *q;
+  return p;
+}
+
+/* { dg-final { scan-assembler-times {\mpli\M|\mpla\M|\mpaddi\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mplwz\M}                  2 } } */
+/* { dg-final { scan-assembler-times {\mpstw\M}                  2 } } */
+/* { dg-final { scan-assembler-not   {\mp?lwzu\M}                  } } */
+/* { dg-final { scan-assembler-not   {\mp?stwzu\M}                 } } */
+/* { dg-final { scan-assembler-not   {\maddis\M}                   } } */
+/* { dg-final { scan-assembler-not   {\maddi\M}                    } } */

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH, V3, #10 of #10], Pc-relative tests
  2019-08-26 19:21 PowerPC future machine, version 3 Michael Meissner
                   ` (7 preceding siblings ...)
  2019-08-27  7:01 ` [PATCH, V3, #8 of 10], Miscellaneous prefixed addressing tests Michael Meissner
@ 2019-08-27  7:14 ` Michael Meissner
  2019-08-27  7:55 ` [PATCH, V3, #9 of 10], Prefixed addressing tests with large offsets Michael Meissner
  9 siblings, 0 replies; 42+ messages in thread
From: Michael Meissner @ 2019-08-27  7:14 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, segher, dje.gcc

This patch contains the pc-relative tests for GCC.  It is exactly the
same as patch V1 #10.

When I add patches 8-10 to the testsuite, all of these tests now run.  Once I
have checked in the previous patches, can I check this patch into the trunk?

(note, the new files were created in the branch before this patch was made).

2019-08-26  Michael Meissner  <meissner@linux.ibm.com>

	* gcc/testsuite/gcc.target/powerpc/prefix-large.h: New set of
	tests to test prefixed addressing on 'future' system with
	pc-relative addreses.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-qi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sd.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sf.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-si.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-udi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uhi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uqi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-usi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-v2df.c: New test.

Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for SImode.  */
+
+#define TYPE _Decimal64
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for DFmode.  */
+
+#define TYPE double
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for DImode.  */
+
+#define TYPE long
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mpld\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for HImode.  */
+
+#define TYPE short
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplh[az]\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpsth\M}     2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for KFmode.  */
+
+#define TYPE __float128
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplxv\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstxv\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-qi.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-qi.c	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-qi.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for QImode.  */
+
+#define TYPE signed char
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplbz\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstb\M}  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sd.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sd.c	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sd.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for SImode.  */
+
+#define TYPE _Decimal32
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mpaddi|\mpla\M} 3 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sf.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sf.c	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sf.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for SFmode.  */
+
+#define TYPE float
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplfs\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfs\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-si.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-si.c	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-si.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for SImode.  */
+
+#define TYPE int
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplw[az]\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstw\M}     2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-udi.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-udi.c	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-udi.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for unsigned DImode.  */
+
+#define TYPE unsigned long
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mpld\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uhi.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uhi.c	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uhi.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for unsigned HImode.  */
+
+#define TYPE unsigned short
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplhz\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpsth\M}  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uqi.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uqi.c	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uqi.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for unsigned QImode.  */
+
+#define TYPE unsigned char
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplbz\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstb\M}  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-usi.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-usi.c	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-usi.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for unsigned SImode.  */
+
+#define TYPE unsigned int
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplwz\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstw\M}  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-v2df.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-v2df.c	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-v2df.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for V2DFmode.  */
+
+#define TYPE vector double
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplxv\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstxv\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h	(revision 274883)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h	(working copy)
@@ -0,0 +1,58 @@
+/* Common tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for each type.  */
+
+typedef signed char	schar;
+typedef unsigned char	uchar;
+typedef unsigned short	ushort;
+typedef unsigned int	uint;
+typedef unsigned long	ulong;
+typedef long double	ldouble;
+typedef vector double	v2df;
+typedef vector long	v2di;
+typedef vector float	v4sf;
+typedef vector int	v4si;
+
+#ifndef TYPE
+#define TYPE ulong
+#endif
+
+#ifndef ITYPE
+#define ITYPE TYPE
+#endif
+
+#ifndef OTYPE
+#define OTYPE TYPE
+#endif
+
+static TYPE a;
+TYPE *p = &a;
+
+#if !defined(DO_ADD) && !defined(DO_VALUE) && !defined(DO_SET)
+#define DO_ADD		1
+#define DO_VALUE	1
+#define DO_SET		1
+#endif
+
+#if DO_ADD
+void
+add (TYPE b)
+{
+  a += b;
+}
+#endif
+
+#if DO_VALUE
+OTYPE
+value (void)
+{
+  return (OTYPE)a;
+}
+#endif
+
+#if DO_SET
+void
+set (ITYPE b)
+{
+  a = (TYPE)b;
+}
+#endif

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH, V3, #9 of 10], Prefixed addressing tests with large offsets
  2019-08-26 19:21 PowerPC future machine, version 3 Michael Meissner
                   ` (8 preceding siblings ...)
  2019-08-27  7:14 ` [PATCH, V3, #10 of #10], Pc-relative tests Michael Meissner
@ 2019-08-27  7:55 ` Michael Meissner
  2019-09-03 23:22   ` Segher Boessenkool
  9 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-08-27  7:55 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, segher, dje.gcc

This patch contains the tests for prefixed addressing with large
offsets.  It is exactly the same as patch V1 #9.

When I add patches 8-10 to the testsuite, all of these tests now run.  Once I
have checked in the previous patches, can I check this patch into the trunk?

(note, the new files were created in the branch before this patch was made).

2019-08-26  Michael Meissner  <meissner@linux.ibm.com>

	* gcc/testsuite/gcc.target/powerpc/prefix-large.h: New set of
	tests to test prefixed addressing on 'future' system with large
	numeric offsets.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-df.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-di.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-kf.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-qi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-sd.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-sf.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-si.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-udi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-uhi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-uqi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-usi.c: New test.
	* gcc/testsuite/gcc.target/powerpc/prefix-large-v2df.c: New test.

Index: gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE _Decimal64
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-df.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large-df.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-df.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE double
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-di.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large-di.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-di.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE long
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mpld\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE short
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplh[az]\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpsth\M}     2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-kf.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large-kf.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-kf.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE __float128
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplxv\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstxv\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-qi.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large-qi.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-qi.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE signed char
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplbz\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstb\M}  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-sd.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large-sd.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-sd.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE _Decimal32
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mpaddi\M|\mpli|\mpla\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mlfiwzx\M}              2 } } */
+/* { dg-final { scan-assembler-times {\mstfiwx\M}              2 } } */
+
+
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-sf.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large-sf.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-sf.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE float
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplfs\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfs\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-si.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large-si.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-si.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE int
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplw[az]\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstw\M}     2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-udi.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large-udi.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-udi.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE unsigned long
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mpld\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-uhi.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large-uhi.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-uhi.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE unsigned short
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplhz\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpsth\M}  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-uqi.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large-uqi.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-uqi.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE unsigned char
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplbz\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstb\M}  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-usi.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large-usi.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-usi.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE unsigned int
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplwz\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstw\M}  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-v2df.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large-v2df.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-v2df.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE vector double
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplxv\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstxv\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large.h
===================================================================
--- gcc/testsuite/gcc.target/powerpc/prefix-large.h	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large.h	(working copy)
@@ -0,0 +1,59 @@
+/* Common tests for prefixed instructions testing whether we can generate a
+   34-bit offset using 1 instruction.  */
+
+typedef signed char	schar;
+typedef unsigned char	uchar;
+typedef unsigned short	ushort;
+typedef unsigned int	uint;
+typedef unsigned long	ulong;
+typedef long double	ldouble;
+typedef vector double	v2df;
+typedef vector long	v2di;
+typedef vector float	v4sf;
+typedef vector int	v4si;
+
+#ifndef TYPE
+#define TYPE ulong
+#endif
+
+#ifndef ITYPE
+#define ITYPE TYPE
+#endif
+
+#ifndef OTYPE
+#define OTYPE TYPE
+#endif
+
+#if !defined(DO_ADD) && !defined(DO_VALUE) && !defined(DO_SET)
+#define DO_ADD		1
+#define DO_VALUE	1
+#define DO_SET		1
+#endif
+
+#ifndef CONSTANT
+#define CONSTANT	0x123450UL
+#endif
+
+#if DO_ADD
+void
+add (TYPE *p, TYPE a)
+{
+  p[CONSTANT] += a;
+}
+#endif
+
+#if DO_VALUE
+OTYPE
+value (TYPE *p)
+{
+  return p[CONSTANT];
+}
+#endif
+
+#if DO_SET
+void
+set (TYPE *p, ITYPE a)
+{
+  p[CONSTANT] = a;
+}
+#endif

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V3, #1 of 10], Add basic pc-relative support
  2019-08-26 20:41 ` [PATCH V3, #1 of 10], Add basic pc-relative support Michael Meissner
@ 2019-08-28 18:46   ` Segher Boessenkool
  2019-08-28 21:48     ` Michael Meissner
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-08-28 18:46 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

Hi Mike,

On Mon, Aug 26, 2019 at 03:54:14PM -0400, Michael Meissner wrote:
> @@ -1626,8 +1626,8 @@ (define_predicate "small_toc_ref"
>    return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL;
>  })
>  
> -;; Return true if the operand is a pc-relative address.
> -(define_predicate "pcrel_address"
> +;; Return true if the operand is a pc-relative address to a local label.
> +(define_predicate "pcrel_local_address"
>    (match_code "label_ref,symbol_ref,const")

Not just to a local label?  Please fix the comment.

> -(define_predicate "pcrel_external_address"
> +(define_predicate "pcrel_ext_address"

That is a much worse name.  "ext" can mean "extended" or many more things.

Having good names for these very basic things is important.  The names
shape the concepts, how you think about the code.

> +/* Enumeration giving the type of traditional addressing that would be used to
> +   decide whether an instruction uses prefixed memory or not.  If the
> +   traditional instruction uses the DS instruction format, and the bottom 2
> +   bits of the offset are not 0, the traditional instruction cannot be used,
> +   but a prefixed instruction can be used.  */

"Traditional" is a bad word for documentation.  What you mean is what was
supported before.  Before you know it "new" will be old as well.

What you mean is non-prefixed addressing?  Please say so, then.

(And it is "form", not "format", in all the insn descriptions etc...
Like "D-form".  Please use the same.)

> -#define RELOAD_REG_VALID	0x01	/* Mode valid in register..  */
> -#define RELOAD_REG_MULTIPLE	0x02	/* Mode takes multiple registers.  */
> -#define RELOAD_REG_INDEXED	0x04	/* Reg+reg addressing.  */
> -#define RELOAD_REG_OFFSET	0x08	/* Reg+offset addressing. */
> -#define RELOAD_REG_PRE_INCDEC	0x10	/* PRE_INC/PRE_DEC valid.  */
> -#define RELOAD_REG_PRE_MODIFY	0x20	/* PRE_MODIFY valid.  */
> -#define RELOAD_REG_AND_M16	0x40	/* AND -16 addressing.  */
> -#define RELOAD_REG_QUAD_OFFSET	0x80	/* quad offset is limited.  */
> +#define RELOAD_REG_VALID	0x001	/* Mode valid in register..  */
> +#define RELOAD_REG_MULTIPLE	0x002	/* Mode takes multiple registers.  */
> +#define RELOAD_REG_INDEXED	0x004	/* Reg+reg addressing.  */
> +#define RELOAD_REG_OFFSET	0x008	/* Reg+offset addressing. */
> +#define RELOAD_REG_PRE_INCDEC	0x010	/* PRE_INC/PRE_DEC valid.  */
> +#define RELOAD_REG_PRE_MODIFY	0x020	/* PRE_MODIFY valid.  */
> +#define RELOAD_REG_AND_M16	0x040	/* AND -16 addressing.  */
> +#define RELOAD_REG_QUAD_OFFSET	0x080	/* DQ offset (bottom 4 bits 0).  */
> +#define RELOAD_REG_DS_OFFSET	0x100	/* DS offset (bottom 2 bits 0).  */

As explained before, do not do this.  Do not use 3-digit hex numbers; don't
change existing definitions for no reason, not in the middle of an unrelated
patch, either.

> @@ -370,6 +369,8 @@ struct rs6000_reg_addr {

Can you fix this struct / arrays / whatever, instead of adding more to it?

>    enum insn_code reload_gpr_vsx;	/* INSN to move from GPR to VSX.  */
>    enum insn_code reload_vsx_gpr;	/* INSN to move from VSX to GPR.  */
>    addr_mask_type addr_mask[(int)N_RELOAD_REG]; /* Valid address masks.  */
> +  addr_mask_type any_addr_mask;		/* OR of GPR/FPR/VMX addr_masks.  */
> +  addr_mask_type default_addr_mask;	/* Default addr_mask to use.  */

And these "address masks" are bitmaps of random flags, one for each
"register class" (which is not related to the core GCC concept of "register
class", and the bits are called "RELOAD_REG_*" although this isn't for
reload at all?


Don't pile new stuff on without cleaning up the old stuff first.


> +	  /* 64-bit and larger values on GPRs need DS format instructions.  All

"Need"...  Well you want to use ld insns, please just say that?

> +	     non-vector offset instructions in Altivec registers need the DS
> +	     format instructions.  */

And this is talking about lxsd / lxssp?

> +	  const addr_mask_type quad_flags = (RELOAD_REG_OFFSET
> +					     | RELOAD_REG_QUAD_OFFSET);
> +
> +	  if ((addr_mask & quad_flags) == RELOAD_REG_OFFSET
> +	      && ((rc == RELOAD_REG_GPR && msize >= 8 && TARGET_POWERPC64)
> +		  || (rc == RELOAD_REG_VMX)))
> +	    addr_mask |= RELOAD_REG_DS_OFFSET;
> +
>  	  reg_addr[m].addr_mask[rc] = addr_mask;
> -	  any_addr_mask |= addr_mask;
> +	  any_addr_mask |= (addr_mask & ~RELOAD_REG_AND_M16);

Why do you need this last line?  Why was that flag set at all?  What does
"any mask" mean if it is not?

> +      /* Figure out what the default reload register set that should be used
> +	 for each mode,

"Figure out what register set should be used by default for each mode"?

And what does reload have to do with it?

> that should mirror the expected usage (i.e. vectors in
> +	 vector registers, ints in GPRs, etc).  Fall back to GPRs as a last
> +	 resort if the mode isn't valid in the vector/floating point registers.
> +	 In the case of vectors and FP, we want to test the reload register
> +	 classes in the order of epxected use or in terms of functionality (the

expected

> +	 FPRs offer offsettable loads/stores in earlier ISAs).  */

"If targetting ISA 2.07 or before, we need FPRs if we need to do a floating
point load using offset addressing".  Something like that?

> +      int def_rc;
> +      int rc_order[2];
> +      int rc_max = 0;
> +
> +      /* IEEE 128-bit hardware floating point insns use Altivec registers.  */
> +      if (TARGET_FLOAT128_HW && FLOAT128_IEEE_P (m))
> +	rc_order[rc_max++] = RELOAD_REG_VMX;
> +
> +      /* Normal vectors and software IEEE 128-bit can use either floating point
> +	 registers or Altivec registers.  */
> +      else if (TARGET_VSX && (VECTOR_MODE_P (m) || FLOAT128_IEEE_P (m)))
> +	{
> +	  rc_order[rc_max++] = RELOAD_REG_FPR;
> +	  rc_order[rc_max++] = RELOAD_REG_VMX;
> +	}
> +
> +      /* Altivec only vectors use the Altivec registers.  */
> +      else if (TARGET_ALTIVEC && !TARGET_VSX && VECTOR_MODE_P (m))
> +	rc_order[rc_max++] = RELOAD_REG_VMX;
> +
> +      /* For scalar binary/decimal floating point, prefer FPRs over altivec
> +	 registers.  */
> +      else if (TARGET_HARD_FLOAT && SCALAR_FLOAT_MODE_P (m))
> +	{
> +	  rc_order[rc_max++] = RELOAD_REG_FPR;
> +	  rc_order[rc_max++] = RELOAD_REG_VMX;
> +	}
> +
> +      /* Default to GPRs if neither FPRs or Altivec registers is valid and
> +	 preferred.  */
> +      def_rc = RELOAD_REG_GPR;
> +      for (int i = 0; i < rc_max; i++)
> +	{
> +	  int rc_num = rc_order[i];
> +	  if ((reg_addr[m].addr_mask[rc_num] & RELOAD_REG_VALID) != 0)
> +	    {
> +	      def_rc = rc_num;
> +	      break;
> +	    }
> +	}
> +
> +      reg_addr[m].default_addr_mask = (reg_addr[m].addr_mask[def_rc]
> +				       & ~RELOAD_REG_AND_M16);

Please factor this better.  You don't need a "default result code" variable
either, then.

> @@ -9634,6 +9704,21 @@ rs6000_emit_move (rtx dest, rtx source,
>  	  return;
>  	}
>  
> +      /* Handle loading up pc-relative addresses.  */
> +      if (TARGET_PCREL && mode == E_DImode)

(Why does this need E_, btw?)

> @@ -10770,11 +10855,10 @@ rs6000_secondary_reload_memory (rtx addr
>  		 & ~RELOAD_REG_AND_M16);
>  
>    /* If the register allocator hasn't made up its mind yet on the register
> -     class to use, settle on defaults to use.  */
> +     class to use, use the default address mask bits.  */
>    else if (rclass == NO_REGS)

And this *does* mean register class.

>    /* Is it a pc-relative address?  */
> -  else if (pcrel_address (x, Pmode))
> +  else if (pcrel_local_address (x, Pmode) || pcrel_ext_address (x, Pmode))

This sounds like something you want to test together all over the place.
Please make a helper function?

> +  /* If the mode does not support offset addressing directly, but it has
> +     multiple registers, see if we can figure out a type that after splitting
> +     the load/store, will be used (i.e. for a vector, use the element, for IBM
> +     long double or TDmode use DFmode, etc.).  This is typically needed in the
> +     early RTL stages before register allocation has been done.  */

s/type/mode/

> +  if ((addr_mask & flags) == RELOAD_REG_MULTIPLE)
> +    {
> +      machine_mode inner = word_mode;
> +
> +      if (COMPLEX_MODE_P (mode))
> +	{
> +	  inner = GET_MODE_INNER (mode);
> +	  if ((reg_addr[inner].default_addr_mask & RELOAD_REG_OFFSET) == 0)
> +	    inner = word_mode;
> +	}
> +
> +      if (FLOAT128_2REG_P (mode))
> +	{
> +	  inner = DFmode;
> +	  if ((reg_addr[inner].default_addr_mask & RELOAD_REG_OFFSET) == 0)
> +	    inner = word_mode;
> +	}
> +
> +      addr_mask = reg_addr[inner].default_addr_mask;
> +    }

Are these the only two cases?  Is there some better way to determine this?

You don't handle vectors, contrary to the comment.

The word_mode thing needs a comment (it isn't clear to me what it is really
trying to do).

I think this would all be much simpler with just a few lines of code instead
of all these tables, fwiw.

>  bool
> -rs6000_prefixed_address_mode_p (rtx addr, machine_mode mode)
> +prefixed_local_addr_p (rtx addr,
> +		       machine_mode mode,
> +		       trad_insn_type trad_insn)
>  {
> -  if (!TARGET_PREFIXED_ADDR || !mode_supports_prefixed_address_p (mode))
> +  /* Don't allow SDmode, because it only can be loaded into FPRs using LFIWZX
> +     instruction.  */
> +  if (!TARGET_PREFIXED_ADDR || mode == E_SDmode)
>      return false;

Write this as two separate conditionals please, the two are unrelated (and
the comment is only about the second).

> +;; Load up a pc-relative address.  Print_operand_address will append a @pcrel
> +;; to the symbol or label.
> +(define_insn "pcrel_local_addr"

This isn't used anywhere?  Not by name, that is?

> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
> +	(match_operand:DI 1 "pcrel_local_address"))]
> +  "TARGET_PCREL"
> +  "pla %0,%a1"
> +  [(set_attr "length" "12")])

I wonder if that whole "b*r" thing is useful at all these days, btw.


This patch changes a whole bunch of things.  You probably can split it
into smaller, self-contained pieces.


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization
  2019-08-26 22:06 ` [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization Michael Meissner
@ 2019-08-28 21:48   ` Michael Meissner
  2019-09-03 22:56   ` Segher Boessenkool
  1 sibling, 0 replies; 42+ messages in thread
From: Michael Meissner @ 2019-08-28 21:48 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, segher, dje.gcc

Note, there is a minor error in this patch.  However, since I will need to
create V4 patches shortly, I will fix the bug in those patches.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V3, #1 of 10], Add basic pc-relative support
  2019-08-28 18:46   ` Segher Boessenkool
@ 2019-08-28 21:48     ` Michael Meissner
  2019-08-30  0:08       ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-08-28 21:48 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Wed, Aug 28, 2019 at 12:14:58PM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Mon, Aug 26, 2019 at 03:54:14PM -0400, Michael Meissner wrote:
> > @@ -1626,8 +1626,8 @@ (define_predicate "small_toc_ref"
> >    return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL;
> >  })
> >  
> > -;; Return true if the operand is a pc-relative address.
> > -(define_predicate "pcrel_address"
> > +;; Return true if the operand is a pc-relative address to a local label.
> > +(define_predicate "pcrel_local_address"
> >    (match_code "label_ref,symbol_ref,const")
> 
> Not just to a local label?  Please fix the comment.

Ok.

> > -(define_predicate "pcrel_external_address"
> > +(define_predicate "pcrel_ext_address"
> 
> That is a much worse name.  "ext" can mean "extended" or many more things.

Ok.

> Having good names for these very basic things is important.  The names
> shape the concepts, how you think about the code.
> 
> > +/* Enumeration giving the type of traditional addressing that would be used to
> > +   decide whether an instruction uses prefixed memory or not.  If the
> > +   traditional instruction uses the DS instruction format, and the bottom 2
> > +   bits of the offset are not 0, the traditional instruction cannot be used,
> > +   but a prefixed instruction can be used.  */
> 
> "Traditional" is a bad word for documentation.  What you mean is what was
> supported before.  Before you know it "new" will be old as well.

Yeah, yeah, yeah.  I recall in Amsterdam there is the "Oude Kerk" (old church)
built in the 1200's and the "De Nieuwe Kerk" in Amsterdam (built in the 1500's)
and thinking then of the problems of calling something "new" and "old".


> What you mean is non-prefixed addressing?  Please say so, then.
> 
> (And it is "form", not "format", in all the insn descriptions etc...
> Like "D-form".  Please use the same.)

Ok.

> > -#define RELOAD_REG_VALID	0x01	/* Mode valid in register..  */
> > -#define RELOAD_REG_MULTIPLE	0x02	/* Mode takes multiple registers.  */
> > -#define RELOAD_REG_INDEXED	0x04	/* Reg+reg addressing.  */
> > -#define RELOAD_REG_OFFSET	0x08	/* Reg+offset addressing. */
> > -#define RELOAD_REG_PRE_INCDEC	0x10	/* PRE_INC/PRE_DEC valid.  */
> > -#define RELOAD_REG_PRE_MODIFY	0x20	/* PRE_MODIFY valid.  */
> > -#define RELOAD_REG_AND_M16	0x40	/* AND -16 addressing.  */
> > -#define RELOAD_REG_QUAD_OFFSET	0x80	/* quad offset is limited.  */
> > +#define RELOAD_REG_VALID	0x001	/* Mode valid in register..  */
> > +#define RELOAD_REG_MULTIPLE	0x002	/* Mode takes multiple registers.  */
> > +#define RELOAD_REG_INDEXED	0x004	/* Reg+reg addressing.  */
> > +#define RELOAD_REG_OFFSET	0x008	/* Reg+offset addressing. */
> > +#define RELOAD_REG_PRE_INCDEC	0x010	/* PRE_INC/PRE_DEC valid.  */
> > +#define RELOAD_REG_PRE_MODIFY	0x020	/* PRE_MODIFY valid.  */
> > +#define RELOAD_REG_AND_M16	0x040	/* AND -16 addressing.  */
> > +#define RELOAD_REG_QUAD_OFFSET	0x080	/* DQ offset (bottom 4 bits 0).  */
> > +#define RELOAD_REG_DS_OFFSET	0x100	/* DS offset (bottom 2 bits 0).  */
> 
> As explained before, do not do this.  Do not use 3-digit hex numbers; don't
> change existing definitions for no reason, not in the middle of an unrelated
> patch, either.

Ok, but I do hate things not lining up.  But I will keep the original masks as they are.

> > @@ -370,6 +369,8 @@ struct rs6000_reg_addr {
> 
> Can you fix this struct / arrays / whatever, instead of adding more to it?
> 
> >    enum insn_code reload_gpr_vsx;	/* INSN to move from GPR to VSX.  */
> >    enum insn_code reload_vsx_gpr;	/* INSN to move from VSX to GPR.  */
> >    addr_mask_type addr_mask[(int)N_RELOAD_REG]; /* Valid address masks.  */
> > +  addr_mask_type any_addr_mask;		/* OR of GPR/FPR/VMX addr_masks.  */
> > +  addr_mask_type default_addr_mask;	/* Default addr_mask to use.  */
> 
> And these "address masks" are bitmaps of random flags, one for each
> "register class" (which is not related to the core GCC concept of "register
> class", and the bits are called "RELOAD_REG_*" although this isn't for
> reload at all?

Actually no, they were created explicitly for the secondary reload handler when
I wrote this interface to add VSX support.  These masks were created for the
secondary reload handler to tell what type of addressing a mode has for the 3
hardware register classes (GPR, FPR, VMX) when you are given an address and
told to fix it up.

Now when I wrote them, I always meant to extend their use to the legitimate
address functions to replace all of the if this type is foo, then allow
increment or offset, etc.  I didn't get to other things as time, energy, and
deadlines came up.

> Don't pile new stuff on without cleaning up the old stuff first.
> 
> 
> > +	  /* 64-bit and larger values on GPRs need DS format instructions.  All
> 
> "Need"...  Well you want to use ld insns, please just say that?
> 
> > +	     non-vector offset instructions in Altivec registers need the DS
> > +	     format instructions.  */
> 
> And this is talking about lxsd / lxssp?

Yes.

> > +	  const addr_mask_type quad_flags = (RELOAD_REG_OFFSET
> > +					     | RELOAD_REG_QUAD_OFFSET);
> > +
> > +	  if ((addr_mask & quad_flags) == RELOAD_REG_OFFSET
> > +	      && ((rc == RELOAD_REG_GPR && msize >= 8 && TARGET_POWERPC64)
> > +		  || (rc == RELOAD_REG_VMX)))
> > +	    addr_mask |= RELOAD_REG_DS_OFFSET;
> > +
> >  	  reg_addr[m].addr_mask[rc] = addr_mask;
> > -	  any_addr_mask |= addr_mask;
> > +	  any_addr_mask |= (addr_mask & ~RELOAD_REG_AND_M16);
> 
> Why do you need this last line?  Why was that flag set at all?  What does
> "any mask" mean if it is not?

The flag is set to say this register class allows the funky (reg + reg) & -16
addressing used with the original Altivec instructions.  In retrospect, I
should have changed the Altivec memory to use an UNSPEC or some similar, but
that would be several years before we needed to clean that up.

> > +      /* Figure out what the default reload register set that should be used
> > +	 for each mode,
> 
> "Figure out what register set should be used by default for each mode"?
> 
> And what does reload have to do with it?

It is used when the register is still a pseudo to tell the appropriate defaults
to use (in rs6000_secondary_reload_memory).  Before this patch, it used the any
field.

In the original code, the any register class was used by the mode_supports
helper function to say whether any one of the three reload register classes
supports a feature.

But even if one feature is used in one reload register class, for making
decisions, you really want to use the default class instead of the any class.
Otherwise, with direct move, it means that reload will often times do it in an
alternate register and do a direct move.  SDmode for example, supports offset
addresses in GPRs, but the normal usage you want to only use indexed addresses.

> 
> > that should mirror the expected usage (i.e. vectors in
> > +	 vector registers, ints in GPRs, etc).  Fall back to GPRs as a last
> > +	 resort if the mode isn't valid in the vector/floating point registers.
> > +	 In the case of vectors and FP, we want to test the reload register
> > +	 classes in the order of epxected use or in terms of functionality (the
> 
> expected
> 
> > +	 FPRs offer offsettable loads/stores in earlier ISAs).  */
> 
> "If targetting ISA 2.07 or before, we need FPRs if we need to do a floating
> point load using offset addressing".  Something like that?
> 
> > +      int def_rc;
> > +      int rc_order[2];
> > +      int rc_max = 0;
> > +
> > +      /* IEEE 128-bit hardware floating point insns use Altivec registers.  */
> > +      if (TARGET_FLOAT128_HW && FLOAT128_IEEE_P (m))
> > +	rc_order[rc_max++] = RELOAD_REG_VMX;
> > +
> > +      /* Normal vectors and software IEEE 128-bit can use either floating point
> > +	 registers or Altivec registers.  */
> > +      else if (TARGET_VSX && (VECTOR_MODE_P (m) || FLOAT128_IEEE_P (m)))
> > +	{
> > +	  rc_order[rc_max++] = RELOAD_REG_FPR;
> > +	  rc_order[rc_max++] = RELOAD_REG_VMX;
> > +	}
> > +
> > +      /* Altivec only vectors use the Altivec registers.  */
> > +      else if (TARGET_ALTIVEC && !TARGET_VSX && VECTOR_MODE_P (m))
> > +	rc_order[rc_max++] = RELOAD_REG_VMX;
> > +
> > +      /* For scalar binary/decimal floating point, prefer FPRs over altivec
> > +	 registers.  */
> > +      else if (TARGET_HARD_FLOAT && SCALAR_FLOAT_MODE_P (m))
> > +	{
> > +	  rc_order[rc_max++] = RELOAD_REG_FPR;
> > +	  rc_order[rc_max++] = RELOAD_REG_VMX;
> > +	}
> > +
> > +      /* Default to GPRs if neither FPRs or Altivec registers is valid and
> > +	 preferred.  */
> > +      def_rc = RELOAD_REG_GPR;
> > +      for (int i = 0; i < rc_max; i++)
> > +	{
> > +	  int rc_num = rc_order[i];
> > +	  if ((reg_addr[m].addr_mask[rc_num] & RELOAD_REG_VALID) != 0)
> > +	    {
> > +	      def_rc = rc_num;
> > +	      break;
> > +	    }
> > +	}
> > +
> > +      reg_addr[m].default_addr_mask = (reg_addr[m].addr_mask[def_rc]
> > +				       & ~RELOAD_REG_AND_M16);
> 
> Please factor this better.  You don't need a "default result code" variable
> either, then.

The default addr mask is used in prefixed addressing to decide whether in the
abscense of the register class used to say whether D/DS/DQ-form instructions
would be used as non-prefixed instructions.

I can refactor it, but it will be a lot more code.

> > @@ -9634,6 +9704,21 @@ rs6000_emit_move (rtx dest, rtx source,
> >  	  return;
> >  	}
> >  
> > +      /* Handle loading up pc-relative addresses.  */
> > +      if (TARGET_PCREL && mode == E_DImode)
> 
> (Why does this need E_, btw?)

I can change it.

> > @@ -10770,11 +10855,10 @@ rs6000_secondary_reload_memory (rtx addr
> >  		 & ~RELOAD_REG_AND_M16);
> >  
> >    /* If the register allocator hasn't made up its mind yet on the register
> > -     class to use, settle on defaults to use.  */
> > +     class to use, use the default address mask bits.  */
> >    else if (rclass == NO_REGS)
> 
> And this *does* mean register class.

No, in the context of the code, it means reload register class.  The whole
point is to reduce all of the normal register classes just to the 3 hardware
register types.

> 
> >    /* Is it a pc-relative address?  */
> > -  else if (pcrel_address (x, Pmode))
> > +  else if (pcrel_local_address (x, Pmode) || pcrel_ext_address (x, Pmode))
> 
> This sounds like something you want to test together all over the place.
> Please make a helper function?

Actually I believe there are are only two places where I check for local and
external symbols.  Everywhere else only checks for local symbols (that is why
the previous patches used two booleans, to say whether you wanted local symbols
and/or external symbols).

> > +  /* If the mode does not support offset addressing directly, but it has
> > +     multiple registers, see if we can figure out a type that after splitting
> > +     the load/store, will be used (i.e. for a vector, use the element, for IBM
> > +     long double or TDmode use DFmode, etc.).  This is typically needed in the
> > +     early RTL stages before register allocation has been done.  */
> 
> s/type/mode/
> 
> > +  if ((addr_mask & flags) == RELOAD_REG_MULTIPLE)
> > +    {
> > +      machine_mode inner = word_mode;
> > +
> > +      if (COMPLEX_MODE_P (mode))
> > +	{
> > +	  inner = GET_MODE_INNER (mode);
> > +	  if ((reg_addr[inner].default_addr_mask & RELOAD_REG_OFFSET) == 0)
> > +	    inner = word_mode;
> > +	}
> > +
> > +      if (FLOAT128_2REG_P (mode))
> > +	{
> > +	  inner = DFmode;
> > +	  if ((reg_addr[inner].default_addr_mask & RELOAD_REG_OFFSET) == 0)
> > +	    inner = word_mode;
> > +	}
> > +
> > +      addr_mask = reg_addr[inner].default_addr_mask;
> > +    }
> 
> Are these the only two cases?  Is there some better way to determine this?
> 
> You don't handle vectors, contrary to the comment.

I will look at the comment.

> The word_mode thing needs a comment (it isn't clear to me what it is really
> trying to do).
> 
> I think this would all be much simpler with just a few lines of code instead
> of all these tables, fwiw.
> 
> >  bool
> > -rs6000_prefixed_address_mode_p (rtx addr, machine_mode mode)
> > +prefixed_local_addr_p (rtx addr,
> > +		       machine_mode mode,
> > +		       trad_insn_type trad_insn)
> >  {
> > -  if (!TARGET_PREFIXED_ADDR || !mode_supports_prefixed_address_p (mode))
> > +  /* Don't allow SDmode, because it only can be loaded into FPRs using LFIWZX
> > +     instruction.  */
> > +  if (!TARGET_PREFIXED_ADDR || mode == E_SDmode)
> >      return false;
> 
> Write this as two separate conditionals please, the two are unrelated (and
> the comment is only about the second).

Ok.

> > +;; Load up a pc-relative address.  Print_operand_address will append a @pcrel
> > +;; to the symbol or label.
> > +(define_insn "pcrel_local_addr"
> 
> This isn't used anywhere?  Not by name, that is?

Yes it is used in rs6000_emit_move.  Basically in the previous patch, you
complained about just creating the SET directly.  So I figured this was
clearer.  But I can go back to generating the SET and just add more comments,
and re-add the '*' in front of the name.'

> > +  [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
> > +	(match_operand:DI 1 "pcrel_local_address"))]
> > +  "TARGET_PCREL"
> > +  "pla %0,%a1"
> > +  [(set_attr "length" "12")])
> 
> I wonder if that whole "b*r" thing is useful at all these days, btw.

Yep.

> This patch changes a whole bunch of things.  You probably can split it
> into smaller, self-contained pieces.

Not really, but I will try.  However, then of course you have the issue that a
particular patch creates a function that isn't used for a few patches, and you
have to look at several patches all at once.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #2 of 10], Improve rs6000_setup_addr_mask
  2019-08-26 21:12 ` [PATCH, V3, #2 of 10], Improve rs6000_setup_addr_mask Michael Meissner
@ 2019-08-29  2:59   ` Segher Boessenkool
  0 siblings, 0 replies; 42+ messages in thread
From: Segher Boessenkool @ 2019-08-29  2:59 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

Hi!

Thanks for splitting things out, that makes it easier to understand.

On Mon, Aug 26, 2019 at 04:10:22PM -0400, Michael Meissner wrote:
> This code attempts to make this clearer by moving the settings for
> GPRs, FPRs, and traditional Altivec registers to separate functions.

They are called VRs.  They always were, but it really doesn't make much
sense to call them "traditional AltiVec registers" now; they are used
for so many more things.

> The second issue is SDmode indicates that it can do PRE_INCREMENT,
> PRE_DECREMENT, and PRE_MODIFY in the floating point registers.  It
> can't since you need to use the LFIWZX instruction to load SDmode, and

(Or lxsiwzx -- you can do all floating point in VRs, too!)

> that does not have an pre-increment format.  I was not able to make a
> test case that actually failed with SDmode.  I opted to make my
> comparison simpler by returning the same information that the current
> compiler uses.  If you prefer, I can change it so the address mask does
> not indicate that the mode can do pre increment, etc.

Does that result in better code?  Does it make the compiler simpler?  If
either of those is "yes", then yes please.

> +/* Return true if the mode is a type that uses the full vector register (like
> +   V2DImode or KFmode).  Do not return true for 128-bit types like TDmode or
> +   IFmode.  */

(_A_ full vector register, "the" makes no sense if the mode doesn't go in
vector registers at all).

As opposed to?  Using only (part of) dword 0?

> +static bool
> +mode_uses_full_vector_reg (machine_mode mode)
>  {
> +  if (GET_MODE_SIZE (mode) < 16)
> +    return false;

Is this needed, given the other conditions?  Or...

> +  if (TARGET_VSX)
> +    return (VECTOR_MODE_P (mode)
> +	    || FLOAT128_VECTOR_P (mode)
> +	    || mode == TImode);
> +
> +  if (TARGET_ALTIVEC)
> +    return ALTIVEC_VECTOR_MODE (mode);

... maybe it should be just ALTIVEC_OR_VSX_VECTOR_MODE instead?  And the
TImode test?

> +/* Figure out if we can do PRE_INC, PRE_DEC, or PRE_MODIFY addressing for a
> +   given MODE.  If we allow scalars into Altivec registers, don't allow
> +   PRE_INC, PRE_DEC, or PRE_MODIFY.
> +
> +   For VSX systems, we don't allow update addressing for DFmode/SFmode if those
> +   registers can go in both the traditional floating point registers and
> +   Altivec registers.  The load/store instructions for the Altivec registers do
> +   not have update forms.  If we allowed update addressing, it seems to break
> +   IV-OPT code using floating point if the index type is int instead of long
> +   (PR target/81550 and target/84042).  */

"Seems to break"...  Well, ivopts makes a different decision, which isn't
very surprising, and that leads to a loop that is *better* optimised?

Not that load/store-with-update is terribly interesting for floating
point, on modern cores anyway, or that it is ideal to have insns that
only exist for half of the allowed registers :-)

> +/* Return the address mask bits for whether we allow PRE_INCREMENT,
> +   PRE_DECREMENT, and PRE_MODIFY for a given MODE.  */

Why do we do decrement and increment separately here?  There are no insns
like that at all.  Not allowing pre_modify because you need to have an
"extra" like in other cases for bigger modes, well, just do that then?

> +static addr_mask_type
> +setup_reg_addr_masks_pre_incdec (machine_mode mode)
> +{
> +  addr_mask_type addr_mask = 0;
> +
> +  if (TARGET_UPDATE
> +      && GET_MODE_SIZE (mode) <= 8
> +      && !VECTOR_MODE_P (mode)

If it is at most 8 bytes, it cannot be a vector (all our vector modes
are 16 or 32 bytes).

> +      && !FLOAT128_VECTOR_P (mode)
> +      && !COMPLEX_MODE_P (mode)
> +      && (mode != E_DFmode || !TARGET_VSX)
> +      && (mode != E_SFmode || !TARGET_P8_VECTOR))

(Please use DFmode etc. where you can).

We probably should have some helper to say what modes can go in vector
regs.  Well, don't we have that already?  rs6000_vector_unit[mode]?

> +      bool indexed_only = (mode == SDmode && TARGET_LFIWZX);
> +
> +      if (!indexed_only
> +	  && (mode_size_inner <= 8
> +	      || (mode_size_inner == 16 && TARGET_P9_VECTOR
> +		  && mode_uses_full_vector_reg (mode_inner))))

The P9 part here needs a comment (and/or a nicer condition).

> +      else if (mode == SFmode || mode_size == 8

Newline before the || please.

> +	       || mode_uses_full_vector_reg (mode_inner)
> +	       || (TARGET_LFIWAX && (mode == SImode || mode == SDmode))
> +	       || (TARGET_P9_VECTOR && (mode == QImode || mode == HImode)))
> +	addr_mask |= RELOAD_REG_INDEXED;

These last two really only care about the size of the mode?  So write it
like that, too?

> +      /* It is weird that previous versions of GCC supported pre increment,
> +	 etc. forms of addressing for SDmode, when you could only use an
> +	 indexed instruction, but allow it for now.  Previous versions of GCC
> +	 also set the indexed flag for SDmode, even though there was no direct
> +	 instruction to load it.  */

Why not fix it now?  It's stage 1.

> +      /* FPR registers can do REG+OFFSET addresssing for vectors if ISA 3.0
> +	 instructions are enabled.  The offset for 128-bit VSX registers is
> +	 only 12-bits.  */

No, it's 16 bits; but is a DQ-form, so the low 4 bits of the offset are
zeroes.  (And it is not just FPRs, it is VRs as well).


So, I wonder if things will be a lot simpler if you do not precompute
anything, just have "mode_can_use_offset_addressing" etc. functions?


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V3, #1 of 10], Add basic pc-relative support
  2019-08-28 21:48     ` Michael Meissner
@ 2019-08-30  0:08       ` Segher Boessenkool
  2019-09-06  0:18         ` Michael Meissner
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-08-30  0:08 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Wed, Aug 28, 2019 at 05:26:55PM -0400, Michael Meissner wrote:
> On Wed, Aug 28, 2019 at 12:14:58PM -0500, Segher Boessenkool wrote:
> > > +/* Enumeration giving the type of traditional addressing that would be used to
> > > +   decide whether an instruction uses prefixed memory or not.  If the
> > > +   traditional instruction uses the DS instruction format, and the bottom 2
> > > +   bits of the offset are not 0, the traditional instruction cannot be used,
> > > +   but a prefixed instruction can be used.  */
> > 
> > "Traditional" is a bad word for documentation.  What you mean is what was
> > supported before.  Before you know it "new" will be old as well.
> 
> Yeah, yeah, yeah.  I recall in Amsterdam there is the "Oude Kerk" (old church)
> built in the 1200's and the "De Nieuwe Kerk" in Amsterdam (built in the 1500's)
> and thinking then of the problems of calling something "new" and "old".

:-)

> > Can you fix this struct / arrays / whatever, instead of adding more to it?

> > And these "address masks" are bitmaps of random flags, one for each
> > "register class" (which is not related to the core GCC concept of "register
> > class", and the bits are called "RELOAD_REG_*" although this isn't for
> > reload at all?
> 
> Actually no, they were created explicitly for the secondary reload handler when
> I wrote this interface to add VSX support.

This is not just for reload anymore, so please don't name it that.  Renaming
things isn't hard, this isn't a public API or anything :-)

> > > +	  if ((addr_mask & quad_flags) == RELOAD_REG_OFFSET
> > > +	      && ((rc == RELOAD_REG_GPR && msize >= 8 && TARGET_POWERPC64)
> > > +		  || (rc == RELOAD_REG_VMX)))
> > > +	    addr_mask |= RELOAD_REG_DS_OFFSET;
> > > +
> > >  	  reg_addr[m].addr_mask[rc] = addr_mask;
> > > -	  any_addr_mask |= addr_mask;
> > > +	  any_addr_mask |= (addr_mask & ~RELOAD_REG_AND_M16);
> > 
> > Why do you need this last line?  Why was that flag set at all?  What does
> > "any mask" mean if it is not?
> 
> The flag is set to say this register class allows the funky (reg + reg) & -16
> addressing used with the original Altivec instructions.

No, I understand that, but why was it set in some individual mask if you
need to clean it in the "any" mask?

> > > @@ -10770,11 +10855,10 @@ rs6000_secondary_reload_memory (rtx addr
> > >  		 & ~RELOAD_REG_AND_M16);
> > >  
> > >    /* If the register allocator hasn't made up its mind yet on the register
> > > -     class to use, settle on defaults to use.  */
> > > +     class to use, use the default address mask bits.  */
> > >    else if (rclass == NO_REGS)
> > 
> > And this *does* mean register class.
> 
> No, in the context of the code, it means reload register class.

rclass is a register class.  NO_REGS is a register class.  "rc" isn't.

> The whole
> point is to reduce all of the normal register classes just to the 3 hardware
> register types.

Yes, so don't call it register class.  Don't use the same word for two
different things, esp. when one is used all over the place already.

> > I think this would all be much simpler with just a few lines of code instead
> > of all these tables, fwiw.

That's the core of most of this.  All this precomputation is indirection
that makes things really hard to understand.

And a lot of the more problematic code is the *older* code.  If you improve
that first -- *first*, that is what the earlier patches in a series are
for -- then this all will be much much easier to read and understand and
review and comment on and accept.

> > > +;; Load up a pc-relative address.  Print_operand_address will append a @pcrel
> > > +;; to the symbol or label.
> > > +(define_insn "pcrel_local_addr"
> > 
> > This isn't used anywhere?  Not by name, that is?
> 
> Yes it is used in rs6000_emit_move.

Not in this patch though?

> > > +  [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
> > > +	(match_operand:DI 1 "pcrel_local_address"))]
> > > +  "TARGET_PCREL"
> > > +  "pla %0,%a1"
> > > +  [(set_attr "length" "12")])
> > 
> > I wonder if that whole "b*r" thing is useful at all these days, btw.
> 
> Yep.

You mean it is useful?  Or you question it too?

> > This patch changes a whole bunch of things.  You probably can split it
> > into smaller, self-contained pieces.
> 
> Not really, but I will try.  However, then of course you have the issue that a
> particular patch creates a function that isn't used for a few patches, and you
> have to look at several patches all at once.

No, not if you divide things properly.  You *never* need to introduce more
than one thing at once, if they all are unused!

Multiple concepts in one patch is a LOT of work to review.  It is MUCH
less work to review 50 focused patches than to review just 5 doing the
same, even if those 50 make up twice as many lines of patch total.


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #3 of 10], Add prefixed RTL insn attribute
  2019-08-26 21:07 ` [PATCH, V3, #3 of 10], Add prefixed RTL insn attribute Michael Meissner
@ 2019-08-30  1:58   ` Segher Boessenkool
  0 siblings, 0 replies; 42+ messages in thread
From: Segher Boessenkool @ 2019-08-30  1:58 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

Hi Mike,

On Mon, Aug 26, 2019 at 04:31:02PM -0400, Michael Meissner wrote:
> 	(rs6000_asm_output_opcode): New function for prifixed memory.

Typo.  Just say "New." or "New function." please.

> --- gcc/config/rs6000/rs6000.c	(revision 274871)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -13827,23 +13827,23 @@ addr_mask_to_trad_insn (machine_mode mod
>       early RTL stages before register allocation has been done.  */
>    if ((addr_mask & flags) == RELOAD_REG_MULTIPLE)
>      {
> -      machine_mode inner = word_mode;
> +      machine_mode mode2 = mode;

So what is "mode2" for?  A meaningful name and/or some comments would help.

> +	  if ((reg_addr[E_DFmode].default_addr_mask & RELOAD_REG_OFFSET) != 0)
> +	    mode = DFmode;

(Don't use E_ if you do not need it -- i.e. most of the time).

> +/* Helper function to take a REG and a MODE and turn it into the traditional
> +   instruction format (D/DS/DQ) used for offset memory.  */

Is this the form of the preferred insn to do this?  Or ths minimum required
to do it at all?  Something else?

> +  /* If it isn't a register, use the defaults.  */
> +  if (!REG_P (reg) && !SUBREG_P (reg))
> +    addr_mask = reg_addr[mode].default_addr_mask;
> +
> +  else
> +    {
> +      unsigned int r = reg_or_subregno (reg);

This ICEs if it is a subreg of something else than a reg.

You can just start with

  if (SUBREG_P (reg))
    reg = SUBREG_REG (reg);

  if (REG_P (reg))
   ... etc.

> +/* Whether a load instruction is a prefixed instruction.  This is called from
> +   the prefixed attribute processing.  */
> +
> +bool
> +prefixed_load_p (rtx_insn *insn)
> +{
> +  /* Validate the insn to make sure it is a normal load insn.  */
> +  extract_insn_cached (insn);
> +  if (recog_data.n_operands < 2)
> +    return false;

Why don't you handle this the same way "indexed" and "update" are already
handled?  That is *easy* and it *works*, it trivially verifiably works.
It also doesn't care whether something is a load or a store.  You hardcode
the few exceptions (okay, twenty or whatever update insns -- but all are
similar, so that is easy), and everything else just works.

The way you code it you just hope to exclude all of the exceptions,
instead of handling them directly.

> +void
> +rs6000_asm_output_opcode (FILE *stream)
> +{
> +  if (next_insn_prefixed_p)
> +    fputc ('p', stream);
> +
> +  return;
> +}

You can just write fprintf fwiw, the compile can optimise it for you
just fine.

> +#define ASM_OUTPUT_OPCODE(STREAM, OPCODE)				\
> +  do									\
> +    {									\
> +     if (TARGET_PREFIXED_ADDR)						\
> +       rs6000_asm_output_opcode (STREAM);				\
> +    }									\
> +  while (0)

(Indentation of the "if" is weird?)

> +;; Whether an insn is a prefixed insn, and an initial 'p' should be printed
> +;; before the instruction.  A prefixed instruction has a prefix instruction

Whether it is a prefixed insn, period.

> +;; word that extends the immediate value of the instructions from 12-16 bits to
> +;; 34 bits.  The macro ASM_OUTPUT_OPCODE emits a leading 'p' for prefixed
> +;; insns.  The default "length" attribute will also be adjusted by default to
> +;; be 12 bytes.

Don't say all the effects here, say that where you make it happen?

> +;; Length in bytes of instructions that use prefixed addressing and length in
> +;; bytes of instructions that does not use prefixed addressing.  This allows
> +;; both lengths to be defined as constants, and the length attribute can pick
> +;; the size as appropriate.
> +(define_attr "prefixed_length" "" (const_int 12))
> +(define_attr "non_prefixed_length" "" (const_int 4))

Do you mean a define_insn can override either to something else?  Then
say that, please?

> +;; Length of the instruction (in bytes).  Prefixed insns are 8 bytes, but the
> +;; assembler might issue need to issue a NOP so that the prefixed instruction
> +;; does not cross a cache boundary, which makes them possibly 12 bytes.

s/issue //

> @@ -9883,8 +9926,8 @@ (define_insn "pcrel_local_addr"
>    [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
>  	(match_operand:DI 1 "pcrel_local_address"))]
>    "TARGET_PCREL"
> -  "pla %0,%a1"
> -  [(set_attr "length" "12")])
> +  "la %0,%a1"
> +  [(set_attr "prefixed" "yes")])

And just like this you can set the few insns that do not have operands 0
and 1 as source and dest to "no", exactly like is already done for "update"
and "indexed".


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #4 of 10], Add general prefixed/pcrel support
  2019-08-26 21:23 ` [PATCH, V3, #4 of 10], Add general prefixed/pcrel support Michael Meissner
@ 2019-08-30 19:22   ` Segher Boessenkool
  2019-08-31  3:08     ` Alan Modra
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-08-30 19:22 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

Hi!

(Please split off paddi to a separate patch?)

On Mon, Aug 26, 2019 at 04:43:37PM -0400, Michael Meissner wrote:
> 	(prefixed_paddi_p): Fix thinkos in last patch.

Do that separately please.  Don't hide this in another patch like this.

Hrm, this is not in this patch at all?  Fix the changelog, then :-)

> --- gcc/config/rs6000/predicates.md	(revision 274870)
> +++ gcc/config/rs6000/predicates.md	(working copy)
> @@ -839,7 +839,8 @@ (define_special_predicate "indexed_addre
>  (define_predicate "add_operand"
>    (if_then_else (match_code "const_int")
>      (match_test "satisfies_constraint_I (op)
> -		 || satisfies_constraint_L (op)")
> +		 || satisfies_constraint_L (op)
> +		 || satisfies_constraint_eI (op)")
>      (match_operand 0 "gpc_reg_operand")))
>  
>  ;; Return 1 if the operand is either a non-special register, or 0, or -1.
> @@ -852,7 +853,8 @@ (define_predicate "adde_operand"
>  (define_predicate "non_add_cint_operand"
>    (and (match_code "const_int")
>         (match_test "!satisfies_constraint_I (op)
> -		    && !satisfies_constraint_L (op)")))
> +		    && !satisfies_constraint_L (op)
> +		    && !satisfies_constraint_eI (op)")))

(define_predicate "non_add_cint_operand"
  (and (match_code "const_int")
       (not (match_operand 0 "add_operand"))))

?  You can do that *now*, and it is pre-approved.  (This could use a better
name btw., I always have to look up what it means; a longer name is fine as
well of course, it is used only once or so).

> @@ -933,6 +935,13 @@ (define_predicate "lwa_operand"
>      return false;
>  
>    addr = XEXP (inner, 0);
> +
> +  /* The LWA instruction uses the DS-form format where the bottom two bits of
> +     the offset must be 0.  The prefixed PLWA does not have this
> +     restriction.  */
> +  if (prefixed_local_addr_p (addr, mode, TRAD_INSN_DS))
> +    return true;

Why does the decision whether something is a valid prefixed lwa_operand
need to know the non-prefixed lwa is a DS-form instruction?

And "local" is a head-scratcher for this condition, too.

> +;; Return 1 if op is a memory operand that is not prefixed.
> +(define_predicate "non_prefixed_mem_operand"
> +  (match_code "mem")
> +{
> +  if (!memory_operand (op, mode))
> +    return false;
> +
> +  return !prefixed_local_addr_p (XEXP (op, 0), GET_MODE (op),
> +				 TRAD_INSN_DEFAULT);
> +})

Use match_operand for the first condition please (and then match_test for
the second?)

This does make it seem like we need a prefixed_local_mem_p as well?  So
that we need neither that XEXP nor that GET_MODE.

> @@ -5735,6 +5735,10 @@ num_insns_constant_gpr (HOST_WIDE_INT va
>  	   && (value >> 31 == -1 || value >> 31 == 0))
>      return 1;
>  
> +  /* PADDI can support up to 34 bit signed integers.  */
> +  else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (value))
> +    return 1;

Write this earlier, together with the 16BIT one?

> @@ -6905,6 +6909,7 @@ rs6000_adjust_vec_address (rtx scalar_re
>    rtx element_offset;
>    rtx new_addr;
>    bool valid_addr_p;
> +  bool pcrel_p = TARGET_PCREL && pcrel_local_address (addr, Pmode);

This is used 159 lines later.  Please refactor things.  That would make
a separate patch *before* this one.

> +  /* Optimize pc-relative addresses.  */
> +  else if (pcrel_p)
> +    {
> +      if (CONST_INT_P (element_offset))
> +	{
> +	  rtx addr2 = addr;

This var needs a better name and/or comments.  Or maybe just factoring.

> @@ -7007,9 +7050,8 @@ rs6000_adjust_vec_address (rtx scalar_re
>  
>    /* If we have a PLUS, we need to see whether the particular register class
>       allows for D-FORM or X-FORM addressing.  */
> -  if (GET_CODE (new_addr) == PLUS)
> +  if (GET_CODE (new_addr) == PLUS || pcrel_p)

That second condition needs a comment.

> @@ -7609,7 +7675,7 @@ mem_operand_ds_form (rtx op, machine_mod
>         causes a wrap, so test only the low 16 bits.  */
>      offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
>  
> -  return offset + 0x8000 < 0x10000u - extra;
> +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);

Please do all these things first too, as a separate patch.

> -  offset += 0x8000;
> -  return offset < 0x10000 - extra;
> +  if (TARGET_PREFIXED_ADDR)
> +    return SIGNED_34BIT_OFFSET_EXTRA_P (offset, extra);
> +  else
> +    return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);

So this you could just do the 16BIT first, and then *this* patch will add
the 34BIT thing, in an easy-to-read patch.

> +    {
> +      /* There is no prefixed version of the load/store with update.  */
> +      return !prefixed_local_addr_p (XEXP (x, 1), mode, TRAD_INSN_DEFAULT);
> +    }

If you pass the actual MEM, the prefixed_local_mem_p function can return
false, itself.

> -	  unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> -	  return val + 0x8000 >= 0x10000 - (TARGET_POWERPC64 ? 8 : 12);
> +	  HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> +	  HOST_WIDE_INT extra = TARGET_POWERPC64 ? 8 : 12;

The 8 vs. 12 could use a comment (yes, I know it was there already).  Do you
know what this is about, why it is 8 and 12?

> +/* Make a memory address non-prefixed if it is prefixed.  */

"Return an RTX that is like MEM but does not need prefixed instructions
to access."?

> +rtx
> +make_memory_non_prefixed (rtx mem)
> +{
> +  gcc_assert (MEM_P (mem));
> +  if (prefixed_local_addr_p (XEXP (mem, 0), GET_MODE (mem), TRAD_INSN_DEFAULT))

Swap the condition and do an early-out, please.

> +    {
> +      rtx old_addr = XEXP (mem, 0);
> +      rtx new_addr;

You you also need to strip CONST from around the address here?

> @@ -21060,7 +21168,8 @@ rs6000_rtx_costs (rtx x, machine_mode mo
>  	    || outer_code == PLUS
>  	    || outer_code == MINUS)
>  	   && (satisfies_constraint_I (x)
> -	       || satisfies_constraint_L (x)))
> +	       || satisfies_constraint_L (x)
> +	       || satisfies_constraint_eI (x)))

Just use add_operand here, maybe?

OTOH, do we want to count prefixed insns as not more expensive than
prefixed ones?

> +/* How many real instructions are generated for this insn?  This is slightly

"How many machine instructions are generated for INSN".

> +static int
> +rs6000_num_insns (rtx_insn *insn)
> +{
> +  /* Try to figure it out based on the length and whether there are prefixed
> +     instructions.  While prefixed instructions are only 8 bytes, we have to
> +     use 12 as the size of the first prefixed instruction in case the
> +     instruction needs to be aligned.  Back to back prefixed instructions would
> +     only take 20 bytes, since it is guaranteed that one of the prefixed
> +     instructions does not need the alignment.  */
> +  int length = get_attr_length (insn);
> +
> +  if (length >= 12 && TARGET_PREFIXED_ADDR
> +      && get_attr_prefixed (insn) == PREFIXED_YES)
> +    {
> +      /* Single prefixed instruction.  */
> +      if (length == 12)
> +	return 1;
> +
> +      /* A normal instruction and a prefixed instruction (16) or two back
> +	 to back prefixed instructions (20).  */
> +      if (length == 16 || length == 20)
> +	return 2;
> +
> +      /* Guess for larger instruction sizes.  */
> +      return 2 + (length - 20) / 4;
> +    }
> +
> +  return length / 4;
> +}

Yuck.  It only needs an approximate answer, but why then handle all kinds
of cases that you cannot test (because they do not currently happen)?

Instead, handle prefixed insns one step up, in insn_cost itself?  It
knows more, it can make better estimates.  It can do it per instruction
type, importantly.

>  (define_insn "*add<mode>3"
> -  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r")
> -	(plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b")
> -		  (match_operand:GPR 2 "add_operand" "r,I,L")))]
> +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r,r")
> +	(plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b,b")
> +		  (match_operand:GPR 2 "add_operand" "r,I,L,eI")))]
>    ""
>    "@
>     add %0,%1,%2
>     addi %0,%1,%2
> -   addis %0,%1,%v2"
> -  [(set_attr "type" "add")])
> +   addis %0,%1,%v2
> +   addi %0,%1,%2"
> +  [(set_attr "type" "add")
> +   (set_attr "isa" "*,*,*,fut")])

Okay.

> @@ -6909,22 +6911,22 @@ (define_insn "movsi_low"
>  
>  ;;		MR           LA           LWZ          LFIWZX       LXSIWZX
>  ;;		STW          STFIWX       STXSIWX      LI           LIS
> -;;		#            XXLOR        XXSPLTIB 0   XXSPLTIB -1  VSPLTISW
> -;;		XXLXOR 0     XXLORC -1    P9 const     MTVSRWZ      MFVSRWZ
> -;;		MF%1         MT%0         NOP
> +;;		PLI          #            XXLOR        XXSPLTIB 0   XXSPLTIB -1
> +;;		VSPLTISW     XXLXOR 0     XXLORC -1    P9 const     MTVSRWZ
> +;;		MFVSRWZ      MF%1         MT%0         NOP

So this is adding just the PLI?  Put it on a line of its own then?  And
don't reformat the existing stuff.

It would be nice if this all can be formatted a bit nicer eventually, in
more logical groups, not strictly five per line, which isn't need for
anything and not helpful at all either.  So for this maybe

  mr la
  lwz lfiwzx lxsiwzx
  stw stfiwx stxsiwx
  li lis pli #

etc.  You also have the restriction that the order matters somewhat, but
there still is a lot of room to make it easier to read.

> -;; Split a load of a large constant into the appropriate two-insn
> -;; sequence.
> +;; Split a load of a large constant into the appropriate two-insn sequence.  On
> +;; systems that support PADDI (PLI), we can use PLI to load any 32-bit constant
> +;; in one instruction.
>  
>  (define_split
>    [(set (match_operand:SI 0 "gpc_reg_operand")
>  	(match_operand:SI 1 "const_int_operand"))]
>    "(unsigned HOST_WIDE_INT) (INTVAL (operands[1]) + 0x8000) >= 0x10000

Use the 16BIT thing here?

> @@ -7769,9 +7782,13 @@ (define_insn_and_split "*mov<mode>_64bit
>    "#"
>    "&& reload_completed"
>    [(pc)]
> -{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
> -  [(set_attr "length" "8,8,8,8,12,12,8,8,8")
> -   (set_attr "isa" "*,*,*,*,*,*,*,p8v,p8v")])
> +{
> +  rs6000_split_multireg_move (operands[0], operands[1]);
> +  DONE;
> +}
> +  [(set_attr "isa" "*,*,*,*,*,*,*,*,p8v,p8v")
> +   (set_attr "non_prefixed_length" "8")
> +   (set_attr "prefixed_length" "20")])

Should this have a separate alternative for prefixed addressing?

What happened to the 12's?

> @@ -1149,10 +1149,30 @@ (define_insn "vsx_mov<mode>_64bit"
>                 "vecstore,  vecload,   vecsimple, mffgpr,    mftgpr,    load,
>                  store,     load,      store,     *,         vecsimple, vecsimple,
>                  vecsimple, *,         *,         vecstore,  vecload")
> -   (set_attr "length"
> -               "*,         *,         *,         8,         *,         8,
> -                8,         8,         8,         8,         *,         *,
> -                *,         20,        8,         *,         *")
> +   (set (attr "non_prefixed_length")
> +	(cond [(and (eq_attr "alternative" "4")		;; MTVSRDD
> +		    (match_test "TARGET_P9_VECTOR"))
> +	       (const_string "4")
> +
> +	       (eq_attr "alternative" "3,4")		;; GPR <-> VSX
> +	       (const_string "8")
> +
> +	       (eq_attr "alternative" "5,6,7,8")	;; GPR load/store
> +	       (const_string "8")]
> +	      (const_string "*")))

Why handle alternative 4 separately like this?  Shouldn't there just be
a separate alternative for the p9 version?


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #5 of 10], Make -mpcrel default on little endian Linux systems
  2019-08-26 21:43 ` [PATCH, V3, #5 of 10], Make -mpcrel default on little endian Linux systems Michael Meissner
@ 2019-08-30 19:46   ` Segher Boessenkool
  2019-09-03 21:07     ` Michael Meissner
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-08-30 19:46 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Mon, Aug 26, 2019 at 05:07:25PM -0400, Michael Meissner wrote:
> +/* By default enable support for pc-relative and numeric prefixed addressing on
> +   the 'future' system, unless it is overriden at build time.  */
> +#ifndef TARGET_PREFIXED_ADDR_DEFAULT
> +#define TARGET_PREFIXED_ADDR_DEFAULT	1
> +#endif
> +
> +#if !defined (TARGET_PCREL_DEFAULT) && TARGET_PREFIXED_ADDR_DEFAULT
> +#define TARGET_PCREL_DEFAULT		1
> +#endif

Spelling ("overridden").

How can it be overridden at build time?

How can it be defined already, when linux64.h is included?  Don't put in
guards against things that cannot happen.


> +  if (TARGET_FUTURE)
> +    {
> +      bool explicit_prefixed = ((rs6000_isa_flags_explicit
> +				 & OPTION_MASK_PREFIXED_ADDR) != 0);
> +      bool explicit_pcrel = ((rs6000_isa_flags_explicit
> +			      & OPTION_MASK_PCREL) != 0);
> +
> +      /* Prefixed addressing requires 64-bit registers.  */

Does it?  Don't disable things just because you do not want to think
about if and how to support them.  Be much more exact in the comment here
if you do have a reason to disable it here.

> +      if (!TARGET_POWERPC64)
> +	{
> +	  if (TARGET_PCREL && explicit_pcrel)
> +	    error ("%qs requires %qs", "-mpcrel", "-m64");

TARGET_POWERPC64 is -mpowerpc64.  -m64 is TARGET_64BIT.

> +      /* Enable defaults if desired.  */
> +      else
> +	{
> +	  if (!explicit_prefixed
> +	      && (TARGET_PREFIXED_ADDR_DEFAULT
> +		  || TARGET_PCREL
> +		  || TARGET_PCREL_DEFAULT))
> +	    rs6000_isa_flags |= OPTION_MASK_PREFIXED_ADDR;
> +
> +	  if (!explicit_pcrel && TARGET_PCREL_DEFAULT
> +	      && TARGET_CMODEL == CMODEL_MEDIUM)
> +	    rs6000_isa_flags |= OPTION_MASK_PCREL;
> +	}

Should these be the other way around?


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #4 of 10], Add general prefixed/pcrel support
  2019-08-30 19:22   ` Segher Boessenkool
@ 2019-08-31  3:08     ` Alan Modra
  2019-08-31 14:13       ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Alan Modra @ 2019-08-31  3:08 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Fri, Aug 30, 2019 at 11:35:11AM -0500, Segher Boessenkool wrote:
> > -	  unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> > -	  return val + 0x8000 >= 0x10000 - (TARGET_POWERPC64 ? 8 : 12);
> > +	  HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> > +	  HOST_WIDE_INT extra = TARGET_POWERPC64 ? 8 : 12;
> 
> The 8 vs. 12 could use a comment (yes, I know it was there already).  Do you
> know what this is about, why it is 8 and 12?

"extra" here covers the increase in offset needed to access the memory
using multiple registers.  For example, when loading a TImode mem to
gprs you will load at offset+0 and offset+8 when powerpc64, and
offset+0, offset+4, offset+8, and offset+12 when powerpc32.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #4 of 10], Add general prefixed/pcrel support
  2019-08-31  3:08     ` Alan Modra
@ 2019-08-31 14:13       ` Segher Boessenkool
  0 siblings, 0 replies; 42+ messages in thread
From: Segher Boessenkool @ 2019-08-31 14:13 UTC (permalink / raw)
  To: Alan Modra; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Sat, Aug 31, 2019 at 10:36:00AM +0930, Alan Modra wrote:
> On Fri, Aug 30, 2019 at 11:35:11AM -0500, Segher Boessenkool wrote:
> > > -	  unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> > > -	  return val + 0x8000 >= 0x10000 - (TARGET_POWERPC64 ? 8 : 12);
> > > +	  HOST_WIDE_INT val = INTVAL (XEXP (addr, 1));
> > > +	  HOST_WIDE_INT extra = TARGET_POWERPC64 ? 8 : 12;
> > 
> > The 8 vs. 12 could use a comment (yes, I know it was there already).  Do you
> > know what this is about, why it is 8 and 12?
> 
> "extra" here covers the increase in offset needed to access the memory
> using multiple registers.  For example, when loading a TImode mem to
> gprs you will load at offset+0 and offset+8 when powerpc64, and
> offset+0, offset+4, offset+8, and offset+12 when powerpc32.

Ah, so it is the size of the mode minus the size of the accesses done to
get it...  16 - UNITS_PER_WORD may be a good way to express it?  I don't
see a way to say it that isn't still helped by a comment though.

Thanks,


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #6 of 10], Fix vec_extract breakage
  2019-08-26 21:52 ` [PATCH, V3, #6 of 10], Fix vec_extract breakage Michael Meissner
@ 2019-09-03 19:49   ` Segher Boessenkool
  2019-09-05 20:48     ` Michael Meissner
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-09-03 19:49 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

Hi!

On Mon, Aug 26, 2019 at 05:20:12PM -0400, Michael Meissner wrote:
> @@ -3249,9 +3249,10 @@ (define_insn "vsx_vslo_<mode>"
>  ;; Variable V2DI/V2DF extract
>  (define_insn_and_split "vsx_extract_<mode>_var"
>    [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r")
> -	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,m,m")
> -			     (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
> -			    UNSPEC_VSX_EXTRACT))
> +	(unspec:<VS_scalar>
> +	 [(match_operand:VSX_D 1 "reg_or_non_pcrel_operand" "v,ep,ep")
> +	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
> +	 UNSPEC_VSX_EXTRACT))
>     (clobber (match_scratch:DI 3 "=r,&b,&b"))
>     (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
>    "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"

After this patch, what happens if you have this instruction generated
with some pcrel memory?  This pattern will no longer match.  Or can that
not happen?  Many places call gen_vsx_extract_*.

I wouldn't use "ep" for *non*-pcrel.  The new constraints/predicates don't
need to do everything in a C block.  Looks good otherwise.


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #5 of 10], Make -mpcrel default on little endian Linux systems
  2019-08-30 19:46   ` Segher Boessenkool
@ 2019-09-03 21:07     ` Michael Meissner
  2019-09-03 22:25       ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-09-03 21:07 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Fri, Aug 30, 2019 at 01:32:57PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 26, 2019 at 05:07:25PM -0400, Michael Meissner wrote:
> > +/* By default enable support for pc-relative and numeric prefixed addressing on
> > +   the 'future' system, unless it is overriden at build time.  */
> > +#ifndef TARGET_PREFIXED_ADDR_DEFAULT
> > +#define TARGET_PREFIXED_ADDR_DEFAULT	1
> > +#endif
> > +
> > +#if !defined (TARGET_PCREL_DEFAULT) && TARGET_PREFIXED_ADDR_DEFAULT
> > +#define TARGET_PCREL_DEFAULT		1
> > +#endif
> 
> Spelling ("overridden").
> 
> How can it be overridden at build time?
> 
> How can it be defined already, when linux64.h is included?  Don't put in
> guards against things that cannot happen.

You can define TARGET_PREFIXED_ADDR_DEFAULT or TARGET_PCREL_DEFAULT in your
CFLAGS or via the make command line (which is how I tested it).

> 
> > +  if (TARGET_FUTURE)
> > +    {
> > +      bool explicit_prefixed = ((rs6000_isa_flags_explicit
> > +				 & OPTION_MASK_PREFIXED_ADDR) != 0);
> > +      bool explicit_pcrel = ((rs6000_isa_flags_explicit
> > +			      & OPTION_MASK_PCREL) != 0);
> > +
> > +      /* Prefixed addressing requires 64-bit registers.  */
> 
> Does it?  Don't disable things just because you do not want to think
> about if and how to support them.  Be much more exact in the comment here
> if you do have a reason to disable it here.
>
> > +      if (!TARGET_POWERPC64)
> > +	{
> > +	  if (TARGET_PCREL && explicit_pcrel)
> > +	    error ("%qs requires %qs", "-mpcrel", "-m64");
> 
> TARGET_POWERPC64 is -mpowerpc64.  -m64 is TARGET_64BIT.
> 
> > +      /* Enable defaults if desired.  */
> > +      else
> > +	{
> > +	  if (!explicit_prefixed
> > +	      && (TARGET_PREFIXED_ADDR_DEFAULT
> > +		  || TARGET_PCREL
> > +		  || TARGET_PCREL_DEFAULT))
> > +	    rs6000_isa_flags |= OPTION_MASK_PREFIXED_ADDR;
> > +
> > +	  if (!explicit_pcrel && TARGET_PCREL_DEFAULT
> > +	      && TARGET_CMODEL == CMODEL_MEDIUM)
> > +	    rs6000_isa_flags |= OPTION_MASK_PCREL;
> > +	}
> 
> Should these be the other way around?

I'm not sure I follow the question.  You want to enable pc-relative support if
prefixed addressing support is enabled, and the OS says that it supports
pc-relative addressing.

If you previously disabled prefixed addressing, you can't enable pc-relative by default.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #5 of 10], Make -mpcrel default on little endian Linux systems
  2019-09-03 21:07     ` Michael Meissner
@ 2019-09-03 22:25       ` Segher Boessenkool
  0 siblings, 0 replies; 42+ messages in thread
From: Segher Boessenkool @ 2019-09-03 22:25 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Tue, Sep 03, 2019 at 05:06:52PM -0400, Michael Meissner wrote:
> On Fri, Aug 30, 2019 at 01:32:57PM -0500, Segher Boessenkool wrote:
> > On Mon, Aug 26, 2019 at 05:07:25PM -0400, Michael Meissner wrote:
> > > +/* By default enable support for pc-relative and numeric prefixed addressing on
> > > +   the 'future' system, unless it is overriden at build time.  */
> > > +#ifndef TARGET_PREFIXED_ADDR_DEFAULT
> > > +#define TARGET_PREFIXED_ADDR_DEFAULT	1
> > > +#endif
> > > +
> > > +#if !defined (TARGET_PCREL_DEFAULT) && TARGET_PREFIXED_ADDR_DEFAULT
> > > +#define TARGET_PCREL_DEFAULT		1
> > > +#endif
> > 
> > Spelling ("overridden").
> > 
> > How can it be overridden at build time?
> > 
> > How can it be defined already, when linux64.h is included?  Don't put in
> > guards against things that cannot happen.
> 
> You can define TARGET_PREFIXED_ADDR_DEFAULT or TARGET_PCREL_DEFAULT in your
> CFLAGS or via the make command line (which is how I tested it).

Rebuilding GCC?  Yeah, that doesn't count, sorry :-)

It's fine to put some sanity check asserts in somewhere, if you are
worried about such things.  But please don't add unnecessary conditions
to macros like this.  If something unexpected happened, we want to hear
about it (either directly, e.g. via an assert, or indirectly, via crash
and burn), it should not be hidden.

> > > +      /* Enable defaults if desired.  */
> > > +      else
> > > +	{
> > > +	  if (!explicit_prefixed
> > > +	      && (TARGET_PREFIXED_ADDR_DEFAULT
> > > +		  || TARGET_PCREL
> > > +		  || TARGET_PCREL_DEFAULT))
> > > +	    rs6000_isa_flags |= OPTION_MASK_PREFIXED_ADDR;
> > > +
> > > +	  if (!explicit_pcrel && TARGET_PCREL_DEFAULT
> > > +	      && TARGET_CMODEL == CMODEL_MEDIUM)
> > > +	    rs6000_isa_flags |= OPTION_MASK_PCREL;
> > > +	}
> > 
> > Should these be the other way around?
> 
> I'm not sure I follow the question.  You want to enable pc-relative support if
> prefixed addressing support is enabled, and the OS says that it supports
> pc-relative addressing.
> 
> If you previously disabled prefixed addressing, you can't enable pc-relative by default.

I meant, should it instead be:

	  if (!explicit_pcrel && TARGET_PCREL_DEFAULT
	      && TARGET_CMODEL == CMODEL_MEDIUM)
	    rs6000_isa_flags |= OPTION_MASK_PCREL;
	}

	  if (!explicit_prefixed
	      && (TARGET_PREFIXED_ADDR_DEFAULT
		  || TARGET_PCREL
		  || TARGET_PCREL_DEFAULT))
	    rs6000_isa_flags |= OPTION_MASK_PREFIXED_ADDR;

i.e., if we enable PCREL, shouldn't we also enable PREFIXED_ADDR?


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization
  2019-08-26 22:06 ` [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization Michael Meissner
  2019-08-28 21:48   ` Michael Meissner
@ 2019-09-03 22:56   ` Segher Boessenkool
  2019-09-03 23:20     ` Michael Meissner
  1 sibling, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-09-03 22:56 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

Hi!

On Mon, Aug 26, 2019 at 05:43:41PM -0400, Michael Meissner wrote:
> /* This file implements a RTL pass that looks for pc-relative loads of the
>    address of an external variable using the PCREL_GOT relocation and a single
>    load/store that uses that GOT pointer.

Does this work better than having a peephole for it?  Is there some reason
you cannot do this with a peephole?


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #8 of 10], Miscellaneous prefixed addressing tests
  2019-08-27  7:01 ` [PATCH, V3, #8 of 10], Miscellaneous prefixed addressing tests Michael Meissner
@ 2019-09-03 23:17   ` Segher Boessenkool
  2019-09-05 21:01     ` Michael Meissner
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-09-03 23:17 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Mon, Aug 26, 2019 at 05:48:08PM -0400, Michael Meissner wrote:
> This patch contains the miscellaneous tests for GCC to test some features of
> --- gcc/testsuite/gcc.target/powerpc/paddi-1.c	(revision 274879)
> +++ gcc/testsuite/gcc.target/powerpc/paddi-1.c	(working copy)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */

Everything in gcc.target/powerpc implicitly has this.  You may want
  /* { dg-do compile } */
for documentation value, or simply because it is nicely symmetric if you
also have run tests, but there is no need for the target clause.

> +/* { dg-require-effective-target powerpc_future_ok } */

Why does this test return false if not on linux?  That doesn't make much
sense.  A user can select -mcpu=future whatever OS he is on.

We probably want a test saying if there is prefixed addressing, or the
like, instead?

> +/* { dg-options "-O2 -mdejagnu-cpu=future" } */

> --- gcc/testsuite/gcc.target/powerpc/prefix-odd-memory.c	(revision 274879)
> +++ gcc/testsuite/gcc.target/powerpc/prefix-odd-memory.c	(working copy)
> @@ -0,0 +1,156 @@
> +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */

All of these except the ld and std and lwa should work on -m32 as well,
right?

> +long
> +load_si_odd (unsigned char *p)
> +{
> +  return *(int *)(p + 1);		/* should generate PLWA.  */
> +}
> +
> +unsigned long
> +load_ul_odd (unsigned char *p)
> +{
> +  return *(unsigned long *)(p + 1);	/* should generate PLD.  */
> +}
> +
> +long
> +load_sl_odd (unsigned char *p)
> +{
> +  return *(long *)(p + 1);	/* should generate PLD.  */
> +}

> +void
> +store_ul_odd (unsigned long ul, unsigned char *p)
> +{
> +  *(unsigned long *)(p + 1) = ul;	/* should generate PSTD.  */
> +}
> +
> +void
> +store_sl_odd (signed long sl, unsigned char *p)
> +{
> +  *(signed long *)(p + 1) = sl;		/* should generate PSTD.  */
> +}

> +void
> +store_double_odd (double d, unsigned char *p)
> +{
> +  *(double *)(p + 1) = d;		/* should generate STD.  */
> +}

(PSTFD?)

So put an #if around those?

> +/* { dg-final { scan-assembler-times {\mpld\M}   2 } } */
> +/* { dg-final { scan-assembler-times {\mplwa\M}  1 } } */
> +/* { dg-final { scan-assembler-times {\mpstd\M}  2 } } */

And conditions on these.

Right now this all is only supported on powerpc64le-linux, but it won't
stay that way.  I'm not looking forward to having to change all the tests,
let's try to test for support of the actual feature we need, instead.


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization
  2019-09-03 22:56   ` Segher Boessenkool
@ 2019-09-03 23:20     ` Michael Meissner
  2019-09-03 23:33       ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-09-03 23:20 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Tue, Sep 03, 2019 at 05:56:03PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Aug 26, 2019 at 05:43:41PM -0400, Michael Meissner wrote:
> > /* This file implements a RTL pass that looks for pc-relative loads of the
> >    address of an external variable using the PCREL_GOT relocation and a single
> >    load/store that uses that GOT pointer.
> 
> Does this work better than having a peephole for it?  Is there some reason
> you cannot do this with a peephole?

Yes.  Peepholes only look at adjacent insns.  This optimization allows the load
of the GOT address to be separated from the eventual load or store.

Peephole2's are likely too early, because you really, really, really don't want
any other pass moving things around.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #9 of 10], Prefixed addressing tests with large offsets
  2019-08-27  7:55 ` [PATCH, V3, #9 of 10], Prefixed addressing tests with large offsets Michael Meissner
@ 2019-09-03 23:22   ` Segher Boessenkool
  0 siblings, 0 replies; 42+ messages in thread
From: Segher Boessenkool @ 2019-09-03 23:22 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

Hi!

On Mon, Aug 26, 2019 at 05:50:38PM -0400, Michael Meissner wrote:
> 	* gcc/testsuite/gcc.target/powerpc/prefix-large.h: New set of
> 	tests to test prefixed addressing on 'future' system with large
> 	numeric offsets.

In the changelog just say "New." or "New test."; explanation of what the
tests try to test for should be in the test themselves (and is much
appreciated!)

No other comments than those on 8/10; looks good otherwise, thanks.


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization
  2019-09-03 23:20     ` Michael Meissner
@ 2019-09-03 23:33       ` Segher Boessenkool
  2019-09-04 17:26         ` Michael Meissner
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-09-03 23:33 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Tue, Sep 03, 2019 at 07:20:13PM -0400, Michael Meissner wrote:
> On Tue, Sep 03, 2019 at 05:56:03PM -0500, Segher Boessenkool wrote:
> > Hi!
> > 
> > On Mon, Aug 26, 2019 at 05:43:41PM -0400, Michael Meissner wrote:
> > > /* This file implements a RTL pass that looks for pc-relative loads of the
> > >    address of an external variable using the PCREL_GOT relocation and a single
> > >    load/store that uses that GOT pointer.
> > 
> > Does this work better than having a peephole for it?  Is there some reason
> > you cannot do this with a peephole?
> 
> Yes.  Peepholes only look at adjacent insns.

Huh.  Wow.  Would you believe I never knew that (or I forgot)?  Well, that
explains why peepholes aren't very effective for us at all, alright!

> This optimization allows the load
> of the GOT address to be separated from the eventual load or store.
> 
> Peephole2's are likely too early, because you really, really, really don't want
> any other pass moving things around.

That is a bit worrying...  What can go wrong?


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization
  2019-09-03 23:33       ` Segher Boessenkool
@ 2019-09-04 17:26         ` Michael Meissner
  2019-09-06 12:09           ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-09-04 17:26 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Tue, Sep 03, 2019 at 06:33:26PM -0500, Segher Boessenkool wrote:
> On Tue, Sep 03, 2019 at 07:20:13PM -0400, Michael Meissner wrote:
> > On Tue, Sep 03, 2019 at 05:56:03PM -0500, Segher Boessenkool wrote:
> > > Hi!
> > > 
> > > On Mon, Aug 26, 2019 at 05:43:41PM -0400, Michael Meissner wrote:
> > > > /* This file implements a RTL pass that looks for pc-relative loads of the
> > > >    address of an external variable using the PCREL_GOT relocation and a single
> > > >    load/store that uses that GOT pointer.
> > > 
> > > Does this work better than having a peephole for it?  Is there some reason
> > > you cannot do this with a peephole?
> > 
> > Yes.  Peepholes only look at adjacent insns.
> 
> Huh.  Wow.  Would you believe I never knew that (or I forgot)?  Well, that
> explains why peepholes aren't very effective for us at all, alright!
> 
> > This optimization allows the load
> > of the GOT address to be separated from the eventual load or store.
> > 
> > Peephole2's are likely too early, because you really, really, really don't want
> > any other pass moving things around.
> 
> That is a bit worrying...  What can go wrong?

As I say in the comments, with PCREL_OPT, you must have exactly one load of the
address and one load or store that references the load of the address.  If
something duplicates one of the loads or stores, or adds another reference to
the address, or just moves it so we can't link the loading of the address to
the final load/store, it will not work.

For stores, the value being stored must be live at both the loading of the
address and the store.

For loads, the register being loaded must not be used between the loading of
the address and the final load.

I.e. in:

		PLD r1,foo@got@pcrel
	.Lpcrel1:

		# other instructions

		.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
		LWZ r2,0(r1)

If you get lucky and foo is defined in the same compilation unit, this will get
turned into:

		PLWZ r2,foo@pcrel

		# other instructions

		NOP

If foo is defined in a shared library (or you are linking for a shared library,
and foo is defined in the main program or another shared library), you get:

		PLD r1,.got.foo@pcrel

		# other instructions

		LWZ r2,0(r1)

		.section .got
	.got.foo: .quad	foo

So for loads, r2 must not be used between the PLD and LWZ instructions.

Similarly for stores:

		PLD r1,foo@got@pcrel
	.Lpcrel1:

		# other instructions

		.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
		stw r2,0(r1)

If you get lucky, this becomes:

		PSTW r2,foo@pcrel

		# other instructions

		NOP

If foo is defined in a shared library (or you are linking for a shared library,
and foo is defined in the main program or another shared library), you get:

		PLD r1,.got.foo@pcrel

		# other instructions

		STW r2,0(r1)

		.section .got
	.got.foo: .quad	foo

So as I said, r2 must be live betweent he PLD and STW, because you don't know
if the PLD will be replaced with a PSTW or not.

So to keep other passes from 'improving' things, I opted to do the pass as the
last pass before final.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #6 of 10], Fix vec_extract breakage
  2019-09-03 19:49   ` Segher Boessenkool
@ 2019-09-05 20:48     ` Michael Meissner
  2019-09-05 22:38       ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-09-05 20:48 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Tue, Sep 03, 2019 at 02:49:01PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Aug 26, 2019 at 05:20:12PM -0400, Michael Meissner wrote:
> > @@ -3249,9 +3249,10 @@ (define_insn "vsx_vslo_<mode>"
> >  ;; Variable V2DI/V2DF extract
> >  (define_insn_and_split "vsx_extract_<mode>_var"
> >    [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r")
> > -	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,m,m")
> > -			     (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
> > -			    UNSPEC_VSX_EXTRACT))
> > +	(unspec:<VS_scalar>
> > +	 [(match_operand:VSX_D 1 "reg_or_non_pcrel_operand" "v,ep,ep")
> > +	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
> > +	 UNSPEC_VSX_EXTRACT))
> >     (clobber (match_scratch:DI 3 "=r,&b,&b"))
> >     (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
> >    "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
> 
> After this patch, what happens if you have this instruction generated
> with some pcrel memory?  This pattern will no longer match.  Or can that
> not happen?  Many places call gen_vsx_extract_*.

No, the only place that calls it directly is rs6000_expand_vector_extract.
There are multiple references to gen_vsx_extract_<mode>_var because it has a
switch statement to call the appropriate generator based on the mode.  When
rs6000_expand_vector_extract is called, it is in the expand phase of RTL, and a
force_reg has been done to move the vector into a register.

What this patch does is prevent the combiner from merging the load and extract
in the one case where it is loading the value from a pc-relative address.  In
that case, the compiler will just load the vector into a register, and then do
the normal variable extract from a register.

> I wouldn't use "ep" for *non*-pcrel.  The new constraints/predicates don't
> need to do everything in a C block.  Looks good otherwise.

Any particular suggestions for the spelling?

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #8 of 10], Miscellaneous prefixed addressing tests
  2019-09-03 23:17   ` Segher Boessenkool
@ 2019-09-05 21:01     ` Michael Meissner
  2019-09-05 22:57       ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-09-05 21:01 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Tue, Sep 03, 2019 at 06:17:23PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 26, 2019 at 05:48:08PM -0400, Michael Meissner wrote:
> > This patch contains the miscellaneous tests for GCC to test some features of
> > --- gcc/testsuite/gcc.target/powerpc/paddi-1.c	(revision 274879)
> > +++ gcc/testsuite/gcc.target/powerpc/paddi-1.c	(working copy)
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile { target { powerpc*-*-* } } } */
> 
> Everything in gcc.target/powerpc implicitly has this.  You may want
>   /* { dg-do compile } */
> for documentation value, or simply because it is nicely symmetric if you
> also have run tests, but there is no need for the target clause.
> 
> > +/* { dg-require-effective-target powerpc_future_ok } */
> 
> Why does this test return false if not on linux?  That doesn't make much
> sense.  A user can select -mcpu=future whatever OS he is on.
>
> We probably want a test saying if there is prefixed addressing, or the
> like, instead?
> 
> > +/* { dg-options "-O2 -mdejagnu-cpu=future" } */
> 
> > --- gcc/testsuite/gcc.target/powerpc/prefix-odd-memory.c	(revision 274879)
> > +++ gcc/testsuite/gcc.target/powerpc/prefix-odd-memory.c	(working copy)
> > @@ -0,0 +1,156 @@
> > +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> 
> All of these except the ld and std and lwa should work on -m32 as well,
> right?

I don't believe so.  I don't think prefixed load/store instructions work on
32-bit at all.

> Right now this all is only supported on powerpc64le-linux, but it won't
> stay that way.  I'm not looking forward to having to change all the tests,
> let's try to test for support of the actual feature we need, instead.

It depends on whether the test just tests the instruction (as many of these
tests do), or whether they are testing things using ELF syntax (i.e. @got and
@got@pcrel).

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #6 of 10], Fix vec_extract breakage
  2019-09-05 20:48     ` Michael Meissner
@ 2019-09-05 22:38       ` Segher Boessenkool
  2019-09-06 10:26         ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-09-05 22:38 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

Hi Mike,

On Thu, Sep 05, 2019 at 04:48:28PM -0400, Michael Meissner wrote:
> On Tue, Sep 03, 2019 at 02:49:01PM -0500, Segher Boessenkool wrote:
> > On Mon, Aug 26, 2019 at 05:20:12PM -0400, Michael Meissner wrote:
> > > @@ -3249,9 +3249,10 @@ (define_insn "vsx_vslo_<mode>"
> > >  ;; Variable V2DI/V2DF extract
> > >  (define_insn_and_split "vsx_extract_<mode>_var"
> > >    [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r")
> > > -	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,m,m")
> > > -			     (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
> > > -			    UNSPEC_VSX_EXTRACT))
> > > +	(unspec:<VS_scalar>
> > > +	 [(match_operand:VSX_D 1 "reg_or_non_pcrel_operand" "v,ep,ep")
> > > +	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
> > > +	 UNSPEC_VSX_EXTRACT))
> > >     (clobber (match_scratch:DI 3 "=r,&b,&b"))
> > >     (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
> > >    "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
> > 
> > After this patch, what happens if you have this instruction generated
> > with some pcrel memory?  This pattern will no longer match.  Or can that
> > not happen?  Many places call gen_vsx_extract_*.
> 
> No, the only place that calls it directly is rs6000_expand_vector_extract.
> There are multiple references to gen_vsx_extract_<mode>_var because it has a
> switch statement to call the appropriate generator based on the mode.

My grep found many more places because I plainly forgot the _var.
Whoopsie.

So no other places that use UNSPEC_VSX_EXTRACT can get pcrel addressing
generated either, I guess.  Okay.  It's hard to ascertain this.

> > I wouldn't use "ep" for *non*-pcrel.  The new constraints/predicates don't
> > need to do everything in a C block.  Looks good otherwise.
> 
> Any particular suggestions for the spelling?

Not "p" because that suggests it *is* pcrel (or it is a pointer, perhaps).
Maybe "en"?  Most letters are still available, lots of choice :-)


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #8 of 10], Miscellaneous prefixed addressing tests
  2019-09-05 21:01     ` Michael Meissner
@ 2019-09-05 22:57       ` Segher Boessenkool
  0 siblings, 0 replies; 42+ messages in thread
From: Segher Boessenkool @ 2019-09-05 22:57 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Thu, Sep 05, 2019 at 05:01:25PM -0400, Michael Meissner wrote:
> On Tue, Sep 03, 2019 at 06:17:23PM -0500, Segher Boessenkool wrote:
> > > +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> > 
> > All of these except the ld and std and lwa should work on -m32 as well,
> > right?
> 
> I don't believe so.

I don't see why not, but...

> I don't think prefixed load/store instructions work on 32-bit at all.

Please test for prefixed insns, instead.  You can then use that test
wherever it is needed, and if you disable 32-bit in there for no reason
at all, we can re-enable it on all tests easily.

It is also good documentation value.

> > Right now this all is only supported on powerpc64le-linux, but it won't
> > stay that way.  I'm not looking forward to having to change all the tests,
> > let's try to test for support of the actual feature we need, instead.
> 
> It depends on whether the test just tests the instruction (as many of these
> tests do), or whether they are testing things using ELF syntax (i.e. @got and
> @got@pcrel).

I don't see why?  You obviously should not test things for ABI A when
testing ABI B.  That does not mean you should not run your tests wherever
possible; just that you should not run them where *not* possible.


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V3, #1 of 10], Add basic pc-relative support
  2019-08-30  0:08       ` Segher Boessenkool
@ 2019-09-06  0:18         ` Michael Meissner
  2019-09-06 12:50           ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-09-06  0:18 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Thu, Aug 29, 2019 at 04:32:07PM -0500, Segher Boessenkool wrote:
> This is not just for reload anymore, so please don't name it that.  Renaming
> things isn't hard, this isn't a public API or anything :-)

This hasn't just be for reload for several years now.  Do you have a name you
prefer?

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #6 of 10], Fix vec_extract breakage
  2019-09-05 22:38       ` Segher Boessenkool
@ 2019-09-06 10:26         ` Segher Boessenkool
  0 siblings, 0 replies; 42+ messages in thread
From: Segher Boessenkool @ 2019-09-06 10:26 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Thu, Sep 05, 2019 at 05:38:18PM -0500, Segher Boessenkool wrote:
> On Thu, Sep 05, 2019 at 04:48:28PM -0400, Michael Meissner wrote:
> > > I wouldn't use "ep" for *non*-pcrel.  The new constraints/predicates don't
> > > need to do everything in a C block.  Looks good otherwise.
> > 
> > Any particular suggestions for the spelling?
> 
> Not "p" because that suggests it *is* pcrel (or it is a pointer, perhaps).
> Maybe "en"?  Most letters are still available, lots of choice :-)

Or what about "em" even?  How often is this constraint used, is there
some better candidate that might want the "em" name?  It should
preferably be something close to "m", so this is feels like a good
choice, do you see anything against it?


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization
  2019-09-04 17:26         ` Michael Meissner
@ 2019-09-06 12:09           ` Segher Boessenkool
  2019-09-09 20:32             ` Michael Meissner
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-09-06 12:09 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Wed, Sep 04, 2019 at 01:26:27PM -0400, Michael Meissner wrote:

[snip]

> So to keep other passes from 'improving' things, I opted to do the pass as the
> last pass before final.

If the problem is that you do not properly analyse dependencies between
insns, well, fix that?

If this really needs to be done after everything else GCC does, that is
problematic.  What when you have two or more passes with that property?

If this really needs to be done after everything else GCC does, does it
belong in the compiler at all?  Should the assembler do it instead, or
the linker?


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V3, #1 of 10], Add basic pc-relative support
  2019-09-06  0:18         ` Michael Meissner
@ 2019-09-06 12:50           ` Segher Boessenkool
  2019-09-09 20:28             ` Michael Meissner
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-09-06 12:50 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Thu, Sep 05, 2019 at 08:18:02PM -0400, Michael Meissner wrote:
> On Thu, Aug 29, 2019 at 04:32:07PM -0500, Segher Boessenkool wrote:
> > This is not just for reload anymore, so please don't name it that.  Renaming
> > things isn't hard, this isn't a public API or anything :-)
> 
> This hasn't just be for reload for several years now.

Yes, and since you are extending it a lot now, it is high time it is fixed.

> Do you have a name you prefer?

As I said, I don't think all these things should be lumped together at
all, and also you shouldn't precompute everything (as fixed values
always, after that precompute) into arrays anyway.  Instead, use
functions for all accessors, which can have simple and clear logic what
they return when.

If it is hard to find good names for your interfaces, most likely your
interfaces aren't structured very well.


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V3, #1 of 10], Add basic pc-relative support
  2019-09-06 12:50           ` Segher Boessenkool
@ 2019-09-09 20:28             ` Michael Meissner
  0 siblings, 0 replies; 42+ messages in thread
From: Michael Meissner @ 2019-09-09 20:28 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Fri, Sep 06, 2019 at 07:50:51AM -0500, Segher Boessenkool wrote:
> On Thu, Sep 05, 2019 at 08:18:02PM -0400, Michael Meissner wrote:
> > On Thu, Aug 29, 2019 at 04:32:07PM -0500, Segher Boessenkool wrote:
> > > This is not just for reload anymore, so please don't name it that.  Renaming
> > > things isn't hard, this isn't a public API or anything :-)
> > 
> > This hasn't just be for reload for several years now.
> 
> Yes, and since you are extending it a lot now, it is high time it is fixed.
> 
> > Do you have a name you prefer?
> 
> As I said, I don't think all these things should be lumped together at
> all, and also you shouldn't precompute everything (as fixed values
> always, after that precompute) into arrays anyway.  Instead, use
> functions for all accessors, which can have simple and clear logic what
> they return when.
> 
> If it is hard to find good names for your interfaces, most likely your
> interfaces aren't structured very well.

Here is where I disagree.  I tend to think pre-computing the stuff saves time.
When you are using it with pre-computed masks and such, it takes 5-10
instructions to make the decision, while if you have to re-do the tests (for
example, checking size, checking whether it is int/fp/vector, it can involve
many more tests than a simple load/mask type test).

But if it is the only way to get things in, I can look at not using the address
masks any further, and instead have discreet tests.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization
  2019-09-06 12:09           ` Segher Boessenkool
@ 2019-09-09 20:32             ` Michael Meissner
  2019-09-09 20:56               ` Segher Boessenkool
  0 siblings, 1 reply; 42+ messages in thread
From: Michael Meissner @ 2019-09-09 20:32 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Fri, Sep 06, 2019 at 07:09:45AM -0500, Segher Boessenkool wrote:
> On Wed, Sep 04, 2019 at 01:26:27PM -0400, Michael Meissner wrote:
> 
> [snip]
> 
> > So to keep other passes from 'improving' things, I opted to do the pass as the
> > last pass before final.
> 
> If the problem is that you do not properly analyse dependencies between
> insns, well, fix that?
> 
> If this really needs to be done after everything else GCC does, that is
> problematic.  What when you have two or more passes with that property?
> 
> If this really needs to be done after everything else GCC does, does it
> belong in the compiler at all?  Should the assembler do it instead, or
> the linker?

No, with the definition of the PCREL_OPT there can be only one reference.
Yeah, there might be other ways to do it, but fundamentally you need to do this
as late as possible and prevent any other optimizations from messing things up.

This is similar to figuring out whether a conditional branch is short enough or
you have to do reverse the conditional branch and do an unconditional jump.  If
you add any more code at that point that changes the sizes, it makes the whole
calculation moot.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization
  2019-09-09 20:32             ` Michael Meissner
@ 2019-09-09 20:56               ` Segher Boessenkool
  2019-09-09 22:39                 ` Michael Meissner
  0 siblings, 1 reply; 42+ messages in thread
From: Segher Boessenkool @ 2019-09-09 20:56 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Mon, Sep 09, 2019 at 04:32:39PM -0400, Michael Meissner wrote:
> On Fri, Sep 06, 2019 at 07:09:45AM -0500, Segher Boessenkool wrote:
> > On Wed, Sep 04, 2019 at 01:26:27PM -0400, Michael Meissner wrote:
> > 
> > [snip]
> > 
> > > So to keep other passes from 'improving' things, I opted to do the pass as the
> > > last pass before final.
> > 
> > If the problem is that you do not properly analyse dependencies between
> > insns, well, fix that?
> > 
> > If this really needs to be done after everything else GCC does, that is
> > problematic.  What when you have two or more passes with that property?
> > 
> > If this really needs to be done after everything else GCC does, does it
> > belong in the compiler at all?  Should the assembler do it instead, or
> > the linker?
> 
> No, with the definition of the PCREL_OPT there can be only one reference.

I don't see why you think that argues for having to do it last?

> Yeah, there might be other ways to do it, but fundamentally you need to do this
> as late as possible and prevent any other optimizations from messing things up.

That is true for *everything*.


You haven't addressed the "if it should be after everything the compiler
does, does this belong in the compiler at all" question.


Segher

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization
  2019-09-09 20:56               ` Segher Boessenkool
@ 2019-09-09 22:39                 ` Michael Meissner
  0 siblings, 0 replies; 42+ messages in thread
From: Michael Meissner @ 2019-09-09 22:39 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Mon, Sep 09, 2019 at 03:56:52PM -0500, Segher Boessenkool wrote:
> On Mon, Sep 09, 2019 at 04:32:39PM -0400, Michael Meissner wrote:
> > On Fri, Sep 06, 2019 at 07:09:45AM -0500, Segher Boessenkool wrote:
> > > On Wed, Sep 04, 2019 at 01:26:27PM -0400, Michael Meissner wrote:
> > > 
> > > [snip]
> > > 
> > > > So to keep other passes from 'improving' things, I opted to do the pass as the
> > > > last pass before final.
> > > 
> > > If the problem is that you do not properly analyse dependencies between
> > > insns, well, fix that?
> > > 
> > > If this really needs to be done after everything else GCC does, that is
> > > problematic.  What when you have two or more passes with that property?
> > > 
> > > If this really needs to be done after everything else GCC does, does it
> > > belong in the compiler at all?  Should the assembler do it instead, or
> > > the linker?
> > 
> > No, with the definition of the PCREL_OPT there can be only one reference.
> 
> I don't see why you think that argues for having to do it last?
> 
> > Yeah, there might be other ways to do it, but fundamentally you need to do this
> > as late as possible and prevent any other optimizations from messing things up.
> 
> That is true for *everything*.
> 
> 
> You haven't addressed the "if it should be after everything the compiler
> does, does this belong in the compiler at all" question.

I believe it falls out of the basic PCREL_OPT description which I have in the
comments to the code.

For the load case, if you have:

		pld 4,esym@got@pcrel
		addi 6,6,1
		lwz 5,0(4)

I.e. load up the addresss of 'esym' into register 4.  If 'esym' is defined in
another module and both are in the main program, the linker converts the PLD
into:

		pla 4,esym@pcrel

If instead esym is defined in a shared library or you are linking a shared
library, the linker rewrites this as:

		pld 4,.esym.got
		.section .got
	.esym.got:
		.quad esym
		.section .text

I.e. load up the address of 'esym' from an address in the data section that has
an external relocation to 'esym' and the runtime loader will fill in the
address after loading any shared libraries.

And you want to use the PCREL_OPT optimization, the following must be true:

    1) Between the PLD and LWZ, register 4 must not be referenced;
    2) Register 4 dies on the LWZ instruction;
    3) Register 5 is not used between PLD and LWZ.

If these hold, you can modify it to use the PCREL_OPT optimization:

		pld 4,esym@got@pcrel
	.Lpcrel1:
		addi 6,6,1
		.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
		lwz 5,0(4)

Then if 'esym' is in the main program, and you are linking for the main
program, the linker can change this to:

		plwz 4,esym@pcrel
		addi 6,6,1
		nop

Thus if any other pass, duplicates the LWZ, uses the result of the PLD, or uses
register 5 in that sequence, it will be invalid.  Hence, why I think it should
be the last pass before final.

Similarly for the store case.  If you have:

		pld 4,esym@got@pcrel
		addi 6,6,1
		stw 5,0(4)

And you want to use the PCREL_OPT optimization, the following must be true:

    1) Between the PLD and STW, register 4 must not be referenced;
    2) Register 4 dies on the LWZ instruction;
    3) Register 5 must have the value in it at the time of the PLD, and it must
       not be modified between the PLD and STW.

The compiler would generate:

		pld 4,esym@got@pcrel
	.Lpcrel2:
		addi 6,6,1
		.reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8)
		stw 5,0(4)

And if the symbol is defined in the main program, and you are linking for the
main program, the linker will transform this to:

		pstw 5,esym@pcrel
		addi 6,6,1
		nop

The reason the .Lpcrel<x> label is defined after the PLD and we use
.Lpcrel<x>-8 is due to the prefixed instruction possibly having a NOP if it
otherwise would cross a 64-byte boundary, and you would have the relocation on
the wrong word.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2019-09-09 22:39 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-26 19:21 PowerPC future machine, version 3 Michael Meissner
2019-08-26 20:41 ` [PATCH V3, #1 of 10], Add basic pc-relative support Michael Meissner
2019-08-28 18:46   ` Segher Boessenkool
2019-08-28 21:48     ` Michael Meissner
2019-08-30  0:08       ` Segher Boessenkool
2019-09-06  0:18         ` Michael Meissner
2019-09-06 12:50           ` Segher Boessenkool
2019-09-09 20:28             ` Michael Meissner
2019-08-26 21:07 ` [PATCH, V3, #3 of 10], Add prefixed RTL insn attribute Michael Meissner
2019-08-30  1:58   ` Segher Boessenkool
2019-08-26 21:12 ` [PATCH, V3, #2 of 10], Improve rs6000_setup_addr_mask Michael Meissner
2019-08-29  2:59   ` Segher Boessenkool
2019-08-26 21:23 ` [PATCH, V3, #4 of 10], Add general prefixed/pcrel support Michael Meissner
2019-08-30 19:22   ` Segher Boessenkool
2019-08-31  3:08     ` Alan Modra
2019-08-31 14:13       ` Segher Boessenkool
2019-08-26 21:43 ` [PATCH, V3, #5 of 10], Make -mpcrel default on little endian Linux systems Michael Meissner
2019-08-30 19:46   ` Segher Boessenkool
2019-09-03 21:07     ` Michael Meissner
2019-09-03 22:25       ` Segher Boessenkool
2019-08-26 21:52 ` [PATCH, V3, #6 of 10], Fix vec_extract breakage Michael Meissner
2019-09-03 19:49   ` Segher Boessenkool
2019-09-05 20:48     ` Michael Meissner
2019-09-05 22:38       ` Segher Boessenkool
2019-09-06 10:26         ` Segher Boessenkool
2019-08-26 22:06 ` [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization Michael Meissner
2019-08-28 21:48   ` Michael Meissner
2019-09-03 22:56   ` Segher Boessenkool
2019-09-03 23:20     ` Michael Meissner
2019-09-03 23:33       ` Segher Boessenkool
2019-09-04 17:26         ` Michael Meissner
2019-09-06 12:09           ` Segher Boessenkool
2019-09-09 20:32             ` Michael Meissner
2019-09-09 20:56               ` Segher Boessenkool
2019-09-09 22:39                 ` Michael Meissner
2019-08-27  7:01 ` [PATCH, V3, #8 of 10], Miscellaneous prefixed addressing tests Michael Meissner
2019-09-03 23:17   ` Segher Boessenkool
2019-09-05 21:01     ` Michael Meissner
2019-09-05 22:57       ` Segher Boessenkool
2019-08-27  7:14 ` [PATCH, V3, #10 of #10], Pc-relative tests Michael Meissner
2019-08-27  7:55 ` [PATCH, V3, #9 of 10], Prefixed addressing tests with large offsets Michael Meissner
2019-09-03 23:22   ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).