public inbox for gcc-patches@gcc.gnu.org
* [RFC Patch], PowerPC memory support pre-gcc9, patch #1
@ 2018-03-14 23:01 Michael Meissner
  2018-03-15 17:09 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #2 Michael Meissner
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Michael Meissner @ 2018-03-14 23:01 UTC
  To: GCC Patches, Segher Boessenkool, David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 6669 bytes --]

I am starting to clean up the memory addressing support for the GCC 9 time
frame.  At the moment, I am upgrading the infrastructure so that in the future
we can avoid splitting memory references too early on 64-bit LE systems, rework
the fusion support, and provide a pathway for future processor support.

The first patch in the series moves most of the reg_addr structure from
rs6000.c to rs6000-protos.h, so that in the next patch, we can start splitting
some of the address code to other files.

In addition to just moving the reg_addr stuff to be global, there are a few
minor changes in this patch:

    1)	I was playing with making r12 be fixed with a new option (not in this
	set of patches), and I noticed it wasn't reflected in the -mdebug=reg
	debug dump, because the debug dump is done before the conditional
	registers are set up.  I made the debug dump set up the conditional
	registers first.

    2)	I renamed some of the mode_supports helper functions to be more
	consistent with the instruction documentation (i.e. there are helper
	functions for normal d-form register+offset instructions, ds-form where
	the bottom 2 bits must be 0, and dq-form where the bottom 4 bits must be
	0).  I added optional arguments to the helper functions, so that
	secondary reload in the future can narrow down whether a particular
	register class supports a particular addressing form.

    3)	I simplified setting up the reg_addr address masks: instead of 3
	branches, two of which set INDEXED and one of which sets MULTIPLE,
	there is now only 1 place that sets INDEXED.
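
As an illustration of the three offset forms in item 2 and the optional
register-class argument, here is a minimal standalone sketch.  It is not part
of the patch: the names (addr_form, offset_ok_for_form, mode_info,
supports_ds_form) are hypothetical, and only the mask values and the low-bit
rules are taken from the description above.  It assumes the usual signed 16-bit
displacement range for all three forms.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; none of these are in the patch.
enum addr_form { FORM_D, FORM_DS, FORM_DQ };

// Return true if OFFSET fits in a signed 16-bit displacement and has the
// low bits clear that the given instruction form requires.
static bool
offset_ok_for_form (int64_t offset, addr_form form)
{
  if (offset < -32768 || offset > 32767)
    return false;

  switch (form)
    {
    case FORM_D:  return true;			// any 16-bit offset
    case FORM_DS: return (offset & 3) == 0;	// bottom 2 bits must be 0
    case FORM_DQ: return (offset & 15) == 0;	// bottom 4 bits must be 0
    }
  return false;
}

// Simplified stand-in for the per-mode address masks, one per reload class.
enum reload_class { CLASS_GPR, CLASS_FPR, CLASS_VMX, CLASS_ANY, N_CLASS };

const unsigned short MASK_OFFSET = 0x008;	// reg+offset addressing
const unsigned short MASK_DS     = 0x100;	// bottom 2 bits must be 0

struct mode_info { unsigned short mask[N_CLASS]; };

// Query the ANY mask by default, or one specific register class on request.
static bool
supports_ds_form (const mode_info &mi, reload_class rc = CLASS_ANY)
{
  return (mi.mask[rc] & MASK_DS) != 0;
}
```

With a table entry whose GPR and ANY masks have the DS bit set but whose FPR
mask has only the plain offset bit, supports_ds_form (entry) is true while
supports_ds_form (entry, CLASS_FPR) is false, which is the kind of narrowing
secondary reload could use.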

I have tested this with full bootstrap and make check on a little endian power8
with no regressions.  Since we are in stage4 currently, I am not asking for
permission to check it in, but if you have any comments on how you would like
to see the eventual patches when stage1 opens up, let me know.  It would be
simpler to make the changes now, rather than after a number of patches have
accumulated.

The second patch that I will submit shortly will move the
rs6000_output_move_128bit function to a new file (rs6000-output.c).

The third patch that I will submit will be to move the movdi patterns to a
separate function (rs6000_output_move_64bit) also in rs6000-output.c, so that
in the future we can use C++ code to check on constraints, etc.

I haven't written it yet, but the fourth patch is likely to similarly move DF
and DD output templates to use the same function.  One thing I plan to do for
DF/DD is to structure the comments about the alternatives so that it is more
readable, much like I've done for movdi, etc.

I will likely remove the undocumented toc-fusion altogether, and eventually
rework the p8/p9 fusion support.

2018-03-14  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000-protos.h (regno_or_subregno): Add
	declaration.
	(enum rs6000_reg_type): Move the basic reg_addr support from
	rs6000.c to rs6000-protos.h, except for the parts that store
	insn_code's for the register allocator to allow future patches to
	move parts of rs6000.c to other files.  Change the bool flags to
	bit-fields.  Add a flag to indicate the mode/reload register class
	uses DS-form (14-bit offset) addresses.
	(reg_class_to_reg_type): Likewise.
	(IS_STD_REG_TYPE): Likwise.
	(IS_FP_VECT_REG_TYPE): Likwise.
	(enum rs6000_reload_reg_type): Likwise.
	(struct rs6000_reg_addr): Likwise.
	(reg_addr): Likwise.
	(RELOAD_REG_*): Likewise.
	(mode_supports_pre_incdec_p): Move the mode supports helper
	functions to rs6000-protos.h.  Add an optional argument to use a
	particular reload register class instead of RELOAD_REG_ANY.
	Rename mode_supports_vsx_dform_quad -> mode_supports_dq_form.  Add
	mode_supports_ds_form for DS-form addresses.  Add
	mode_supports_x_form for X-form (indexed) addresses.
	(mode_supports_pre_modify_p): Likewise.
	(mode_supports_d_form): Likewise.
	(mode_supports_ds_form): Likewise.
	(mode_supports_dq_form): Likewise.
	(mode_supports_x_form): Likewise.
	* config/rs6000/rs6000.c (enum rs6000_reg_type): Move basic
	reg_addr support to rs6000-protos.h.
	(IS_STD_REG_TYPE): Likewise.
	(IS_FP_VECT_REG_TYPE): Likewise.
	(enum rs6000_reload_reg_type): Likewise.
	(reg_class_to_reg_type): Make global.
	(addr_mask_type): Move basic reg_addr support to rs6000-protos.h.
	(reg_addr): Make global.
	(RELOAD_REG_VALID): Move basic reg_addr support to
	rs6000-protos.h.
	(RELOAD_REG_*): Likewise.
	(struct rs6000_insn_functions): New structure that includes the
	parts of the old reg_addr structure that did not move to
	rs6000-protos.h because it contains insn codes.
	(rs6000_insns): Likewise.
	(mode_supports_pre_incdec_p): Move to rs6000-protos.h.
	(mode_supports_pre_modify_p): Likewise.
	(mode_supports_vmx_dform): Likewise.
	(mode_supports_vsx_dform_quad): Likewise.
	(rs6000_conditional_register_usage): Add forward declaration.
	(rs6000_debug_addr_mask): Print whether the mode/reload register
	class uses DS-form memory instructions.
	(rs6000_debug_reg_print): Call rs6000_conditional_register_usage
	in order to print the status of the registers properly.
	(rs6000_debug_print_mode): Use rs6000_insns instead of reg_addr
	for the insn code elements.
	(rs6000_setup_reg_addr_masks): Simplify the code to set whether a
	mode uses multiple registers or provides indexed mode.  Don't
	allow update addresses on modes that only support indexed mode.
	Note that DI/SI in 64-bit use DS-form addresses as well as ISA 3.0
	scalar altivec offset references.
	(rs6000_init_hard_regno_mode_ok): Clear rs6000_insns.  Change to
	use rs6000_insns instead of reg_addr for saving the insn codes for
	secondary reload and fusion support.
	(regno_or_subregno): Make global.
	(quad_address_p): Rename mode_supports_vsx_dform_quad to
	mode_supports_dq_form.
	(reg_offset_addressing_ok_p): Likewise.
	(offsettable_ok_by_alignment): Likewise.
	(rs6000_legitimate_offset_address_p): Likewise.
	(legitimate_lo_sum_address_p): Likewise.
	(rs6000_legitimize_address): Likewise.
	(rs6000_legitimize_reload_address): Likewise.
	(rs6000_legitimate_address_p): Likewise.
	(rs6000_secondary_reload_direct_move): Use rs6000_insns to pick up
	secondary reload and fusion insn codes instead of reg_addr.
	(rs6000_secondary_reload): Likewise.
	(rs6000_secondary_reload_inner): Rename
	mode_supports_vsx_dform_quad to mode_supports_dq_form.  Use
	mode_supports_d_form with RELOAD_REG_VMX instead of calling
	mode_supports_vmx_dform.
	(rs6000_preferred_reload_class): Likewise.
	(rs6000_secondary_reload_class): Likewise.
	(rs6000_output_move_128bit): Likewise.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: ext-addr.patch01b --]
[-- Type: text/plain, Size: 38803 bytes --]

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 258530)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -61,6 +61,7 @@ extern void paired_expand_vector_init (r
 extern void rs6000_expand_vector_set (rtx, rtx, int);
 extern void rs6000_expand_vector_extract (rtx, rtx, rtx);
 extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
+extern int regno_or_subregno (rtx);
 extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
 extern void rs6000_split_v4si_init (rtx []);
 extern void altivec_expand_vec_perm_le (rtx op[4]);
@@ -258,4 +259,116 @@ extern bool rs6000_quadword_masked_addre
 extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx);
 extern rtx rs6000_gen_stvx (enum machine_mode, rtx, rtx);
 
+\f
+/* Simplify register classes into simpler classifications.  We assume
+   GPR_REG_TYPE - FPR_REG_TYPE are ordered so that we can use a simple range
+   check for standard register classes (gpr/floating/altivec/vsx) and
+   floating/vector classes (float/altivec/vsx).  */
+
+enum rs6000_reg_type {
+  NO_REG_TYPE,
+  PSEUDO_REG_TYPE,
+  GPR_REG_TYPE,
+  VSX_REG_TYPE,
+  ALTIVEC_REG_TYPE,
+  FPR_REG_TYPE,
+  SPR_REG_TYPE,
+  CR_REG_TYPE
+};
+
+/* Map register class to register type.  */
+extern enum rs6000_reg_type reg_class_to_reg_type[];
+
+/* First/last register type for the 'normal' register types (i.e. general
+   purpose, floating point, altivec, and VSX registers).  */
+#define IS_STD_REG_TYPE(RTYPE) IN_RANGE(RTYPE, GPR_REG_TYPE, FPR_REG_TYPE)
+
+#define IS_FP_VECT_REG_TYPE(RTYPE) IN_RANGE(RTYPE, VSX_REG_TYPE, FPR_REG_TYPE)
+
+/* Register classes we care about in secondary reload or go if legitimate
+   address.  We only need to worry about GPR, FPR, and Altivec registers here,
+   along with an ANY field that is the OR of the 3 register classes.  */
+
+enum rs6000_reload_reg_type {
+  RELOAD_REG_GPR,			/* General purpose registers.  */
+  RELOAD_REG_FPR,			/* Traditional floating point regs.  */
+  RELOAD_REG_VMX,			/* Altivec (VMX) registers.  */
+  RELOAD_REG_ANY,			/* OR of GPR, FPR, Altivec masks.  */
+  N_RELOAD_REG
+};
+
+/* Mask bits for each register class, indexed per mode.  Historically the
+   compiler has been more restrictive which types can do PRE_MODIFY instead of
+   PRE_INC and PRE_DEC, so keep track of separate bits for these two.  */
+typedef unsigned short addr_mask_type;
+
+#define RELOAD_REG_VALID	0x001	/* Mode valid in register..  */
+#define RELOAD_REG_MULTIPLE	0x002	/* Mode takes multiple registers.  */
+#define RELOAD_REG_INDEXED	0x004	/* Reg+reg addressing.  */
+#define RELOAD_REG_OFFSET	0x008	/* Reg+offset addressing. */
+#define RELOAD_REG_PRE_INCDEC	0x010	/* PRE_INC/PRE_DEC valid.  */
+#define RELOAD_REG_PRE_MODIFY	0x020	/* PRE_MODIFY valid.  */
+#define RELOAD_REG_AND_M16	0x040	/* AND -16 addressing.  */
+#define RELOAD_REG_QUAD_OFFSET	0x080	/* Bottom 4 bits must be 0.  */
+#define RELOAD_REG_DS_OFFSET	0x100	/* Bottom 2 bits must be 0.  */
+
+/* Register type masks based on the type, of valid addressing modes.  */
+struct rs6000_reg_addr {
+  addr_mask_type addr_mask[(int)N_RELOAD_REG];	/* Valid address masks.  */
+  unsigned char scalar_in_vmx_p	: 1;		/* Scalar can go in VMX.  */
+  unsigned char fused_toc	: 1;		/* Mode supports TOC fusion.  */
+};
+
+extern struct rs6000_reg_addr reg_addr[];
+
+/* Helper function to say whether a mode supports PRE_INC or PRE_DEC.  */
+static inline bool
+mode_supports_pre_incdec_p (machine_mode mode,
+			    enum rs6000_reload_reg_type rt = RELOAD_REG_ANY)
+{
+  return ((reg_addr[mode].addr_mask[rt] & RELOAD_REG_PRE_INCDEC) != 0);
+}
+
+/* Helper function to say whether a mode supports PRE_MODIFY.  */
+static inline bool
+mode_supports_pre_modify_p (machine_mode mode,
+			    enum rs6000_reload_reg_type rt = RELOAD_REG_ANY)
+{
+  return ((reg_addr[mode].addr_mask[rt] & RELOAD_REG_PRE_MODIFY) != 0);
+}
+
+/* Return true if we have offset addressing (d-form).  The offset may be 12
+   bits (dq-form), 14 bits (ds-form), or 16 bits (d-form).  */
+static inline bool
+mode_supports_d_form (machine_mode mode,
+		      enum rs6000_reload_reg_type rt = RELOAD_REG_ANY)
+{
+  return ((reg_addr[mode].addr_mask[rt] & RELOAD_REG_OFFSET) != 0);
+}
+
+/* Return true if we have DS-form addressing in any registers where the bottom
+   2 bits must be 0 (i.e. LD, ST, etc.).  */
+static inline bool
+mode_supports_ds_form (machine_mode mode,
+		       enum rs6000_reload_reg_type rt = RELOAD_REG_ANY)
+{
+  return ((reg_addr[mode].addr_mask[rt] & RELOAD_REG_DS_OFFSET) != 0);
+}
+
+/* Return true if we have DQ-form addressing.  The bottom 4 bits must be 0.  */
+static inline bool
+mode_supports_dq_form (machine_mode mode,
+		       enum rs6000_reload_reg_type rt = RELOAD_REG_ANY)
+{
+  return ((reg_addr[mode].addr_mask[rt] & RELOAD_REG_QUAD_OFFSET) != 0);
+}
+
+/* Return true if we have indexed addressing (x-form).  */
+static inline bool
+mode_supports_x_form (machine_mode mode,
+		      enum rs6000_reload_reg_type rt = RELOAD_REG_ANY)
+{
+  return ((reg_addr[mode].addr_mask[rt] & RELOAD_REG_INDEXED) != 0);
+}
+
 #endif  /* rs6000-protos.h */
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 258531)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -443,43 +443,8 @@ bool cpu_builtin_p;
    don't link in rs6000-c.c, so we can't call it directly.  */
 void (*rs6000_target_modify_macros_ptr) (bool, HOST_WIDE_INT, HOST_WIDE_INT);
 
-/* Simplfy register classes into simpler classifications.  We assume
-   GPR_REG_TYPE - FPR_REG_TYPE are ordered so that we can use a simple range
-   check for standard register classes (gpr/floating/altivec/vsx) and
-   floating/vector classes (float/altivec/vsx).  */
-
-enum rs6000_reg_type {
-  NO_REG_TYPE,
-  PSEUDO_REG_TYPE,
-  GPR_REG_TYPE,
-  VSX_REG_TYPE,
-  ALTIVEC_REG_TYPE,
-  FPR_REG_TYPE,
-  SPR_REG_TYPE,
-  CR_REG_TYPE
-};
-
 /* Map register class to register type.  */
-static enum rs6000_reg_type reg_class_to_reg_type[N_REG_CLASSES];
-
-/* First/last register type for the 'normal' register types (i.e. general
-   purpose, floating point, altivec, and VSX registers).  */
-#define IS_STD_REG_TYPE(RTYPE) IN_RANGE(RTYPE, GPR_REG_TYPE, FPR_REG_TYPE)
-
-#define IS_FP_VECT_REG_TYPE(RTYPE) IN_RANGE(RTYPE, VSX_REG_TYPE, FPR_REG_TYPE)
-
-
-/* Register classes we care about in secondary reload or go if legitimate
-   address.  We only need to worry about GPR, FPR, and Altivec registers here,
-   along an ANY field that is the OR of the 3 register classes.  */
-
-enum rs6000_reload_reg_type {
-  RELOAD_REG_GPR,			/* General purpose registers.  */
-  RELOAD_REG_FPR,			/* Traditional floating point regs.  */
-  RELOAD_REG_VMX,			/* Altivec (VMX) registers.  */
-  RELOAD_REG_ANY,			/* OR of GPR, FPR, Altivec masks.  */
-  N_RELOAD_REG
-};
+enum rs6000_reg_type reg_class_to_reg_type[N_REG_CLASSES];
 
 /* For setting up register classes, loop through the 3 register classes mapping
    into real registers, and skip the ANY class, which is just an OR of the
@@ -500,22 +465,12 @@ static const struct reload_reg_map_type 
   { "Any",	-1 },			/* RELOAD_REG_ANY.  */
 };
 
-/* Mask bits for each register class, indexed per mode.  Historically the
-   compiler has been more restrictive which types can do PRE_MODIFY instead of
-   PRE_INC and PRE_DEC, so keep track of sepaate bits for these two.  */
-typedef unsigned char addr_mask_type;
-
-#define RELOAD_REG_VALID	0x01	/* Mode valid in register..  */
-#define RELOAD_REG_MULTIPLE	0x02	/* Mode takes multiple registers.  */
-#define RELOAD_REG_INDEXED	0x04	/* Reg+reg addressing.  */
-#define RELOAD_REG_OFFSET	0x08	/* Reg+offset addressing. */
-#define RELOAD_REG_PRE_INCDEC	0x10	/* PRE_INC/PRE_DEC valid.  */
-#define RELOAD_REG_PRE_MODIFY	0x20	/* PRE_MODIFY valid.  */
-#define RELOAD_REG_AND_M16	0x40	/* AND -16 addressing.  */
-#define RELOAD_REG_QUAD_OFFSET	0x80	/* quad offset is limited.  */
-
 /* Register type masks based on the type, of valid addressing modes.  */
-struct rs6000_reg_addr {
+struct rs6000_reg_addr reg_addr[NUM_MACHINE_MODES];
+
+/* Insns to use for loading/storing various register types, and for creating
+   various combined fusion instructions.  */
+struct rs6000_insn_functions {
   enum insn_code reload_load;		/* INSN to reload for loading. */
   enum insn_code reload_store;		/* INSN to reload for storing.  */
   enum insn_code reload_fpr_gpr;	/* INSN to move from FPR to GPR.  */
@@ -530,28 +485,9 @@ struct rs6000_reg_addr {
 					   or stores for each reg. class.  */					   
   enum insn_code fusion_addis_ld[(int)N_RELOAD_REG];
   enum insn_code fusion_addis_st[(int)N_RELOAD_REG];
-  addr_mask_type addr_mask[(int)N_RELOAD_REG]; /* Valid address masks.  */
-  bool scalar_in_vmx_p;			/* Scalar value can go in VMX.  */
-  bool fused_toc;			/* Mode supports TOC fusion.  */
 };
 
-static struct rs6000_reg_addr reg_addr[NUM_MACHINE_MODES];
-
-/* Helper function to say whether a mode supports PRE_INC or PRE_DEC.  */
-static inline bool
-mode_supports_pre_incdec_p (machine_mode mode)
-{
-  return ((reg_addr[mode].addr_mask[RELOAD_REG_ANY] & RELOAD_REG_PRE_INCDEC)
-	  != 0);
-}
-
-/* Helper function to say whether a mode supports PRE_MODIFY.  */
-static inline bool
-mode_supports_pre_modify_p (machine_mode mode)
-{
-  return ((reg_addr[mode].addr_mask[RELOAD_REG_ANY] & RELOAD_REG_PRE_MODIFY)
-	  != 0);
-}
+static struct rs6000_insn_functions rs6000_insns[NUM_MACHINE_MODES];
 
 /* Given that there exists at least one variable that is set (produced)
    by OUT_INSN and read (consumed) by IN_INSN, return true iff
@@ -638,23 +574,6 @@ rs6000_store_data_bypass_p (rtx_insn *ou
   return store_data_bypass_p (out_insn, in_insn);
 }
 
-/* Return true if we have D-form addressing in altivec registers.  */
-static inline bool
-mode_supports_vmx_dform (machine_mode mode)
-{
-  return ((reg_addr[mode].addr_mask[RELOAD_REG_VMX] & RELOAD_REG_OFFSET) != 0);
-}
-
-/* Return true if we have D-form addressing in VSX registers.  This addressing
-   is more limited than normal d-form addressing in that the offset must be
-   aligned on a 16-byte boundary.  */
-static inline bool
-mode_supports_vsx_dform_quad (machine_mode mode)
-{
-  return ((reg_addr[mode].addr_mask[RELOAD_REG_ANY] & RELOAD_REG_QUAD_OFFSET)
-	  != 0);
-}
-
 \f
 /* Processor costs (relative to an add) */
 
@@ -1395,6 +1314,7 @@ static bool rs6000_debug_can_change_mode
 						reg_class_t);
 static bool rs6000_save_toc_in_prologue_p (void);
 static rtx rs6000_internal_arg_pointer (void);
+static void rs6000_conditional_register_usage (void);
 
 rtx (*rs6000_legitimize_reload_address_ptr) (rtx, machine_mode, int, int,
 					     int, int *)
@@ -2224,6 +2144,10 @@ rs6000_debug_reg_print (int first_regno,
 {
   int r, m;
 
+  /* Ensure the conditional registers are up to date when printing the debug
+     information.  */
+  rs6000_conditional_register_usage ();
+
   for (r = first_regno; r <= last_regno; ++r)
     {
       const char *comma = "";
@@ -2344,6 +2268,8 @@ rs6000_debug_addr_mask (addr_mask_type m
 
   if ((mask & RELOAD_REG_QUAD_OFFSET) != 0)
     *p++ = 'O';
+  else if ((mask & RELOAD_REG_DS_OFFSET) != 0)
+    *p++ = 'D';
   else if ((mask & RELOAD_REG_OFFSET) != 0)
     *p++ = 'o';
   else if (keep_spaces)
@@ -2382,11 +2308,11 @@ rs6000_debug_print_mode (ssize_t m)
     fprintf (stderr, " %s: %s", reload_reg_map[rc].name,
 	     rs6000_debug_addr_mask (reg_addr[m].addr_mask[rc], true));
 
-  if ((reg_addr[m].reload_store != CODE_FOR_nothing)
-      || (reg_addr[m].reload_load != CODE_FOR_nothing))
+  if ((rs6000_insns[m].reload_store != CODE_FOR_nothing)
+      || (rs6000_insns[m].reload_load != CODE_FOR_nothing))
     fprintf (stderr, "  Reload=%c%c",
-	     (reg_addr[m].reload_store != CODE_FOR_nothing) ? 's' : '*',
-	     (reg_addr[m].reload_load != CODE_FOR_nothing) ? 'l' : '*');
+	     (rs6000_insns[m].reload_store != CODE_FOR_nothing) ? 's' : '*',
+	     (rs6000_insns[m].reload_load != CODE_FOR_nothing) ? 'l' : '*');
   else
     spaces += sizeof ("  Reload=sl") - 1;
 
@@ -2398,7 +2324,7 @@ rs6000_debug_print_mode (ssize_t m)
   else
     spaces += sizeof ("  Upper=y") - 1;
 
-  fuse_extra_p = ((reg_addr[m].fusion_gpr_ld != CODE_FOR_nothing)
+  fuse_extra_p = (rs6000_insns[m].fusion_gpr_ld != CODE_FOR_nothing
 		  || reg_addr[m].fused_toc);
   if (!fuse_extra_p)
     {
@@ -2406,11 +2332,11 @@ rs6000_debug_print_mode (ssize_t m)
 	{
 	  if (rc != RELOAD_REG_ANY)
 	    {
-	      if (reg_addr[m].fusion_addi_ld[rc]     != CODE_FOR_nothing
-		  || reg_addr[m].fusion_addi_ld[rc]  != CODE_FOR_nothing
-		  || reg_addr[m].fusion_addi_st[rc]  != CODE_FOR_nothing
-		  || reg_addr[m].fusion_addis_ld[rc] != CODE_FOR_nothing
-		  || reg_addr[m].fusion_addis_st[rc] != CODE_FOR_nothing)
+	      if (rs6000_insns[m].fusion_addi_ld[rc]     != CODE_FOR_nothing
+		  || rs6000_insns[m].fusion_addi_ld[rc]  != CODE_FOR_nothing
+		  || rs6000_insns[m].fusion_addi_st[rc]  != CODE_FOR_nothing
+		  || rs6000_insns[m].fusion_addis_ld[rc] != CODE_FOR_nothing
+		  || rs6000_insns[m].fusion_addis_st[rc] != CODE_FOR_nothing)
 		{
 		  fuse_extra_p = true;
 		  break;
@@ -2430,16 +2356,16 @@ rs6000_debug_print_mode (ssize_t m)
 	    {
 	      char load, store;
 
-	      if (reg_addr[m].fusion_addis_ld[rc] != CODE_FOR_nothing)
+	      if (rs6000_insns[m].fusion_addis_ld[rc] != CODE_FOR_nothing)
 		load = 'l';
-	      else if (reg_addr[m].fusion_addi_ld[rc] != CODE_FOR_nothing)
+	      else if (rs6000_insns[m].fusion_addi_ld[rc] != CODE_FOR_nothing)
 		load = 'L';
 	      else
 		load = '-';
 
-	      if (reg_addr[m].fusion_addis_st[rc] != CODE_FOR_nothing)
+	      if (rs6000_insns[m].fusion_addis_st[rc] != CODE_FOR_nothing)
 		store = 's';
-	      else if (reg_addr[m].fusion_addi_st[rc] != CODE_FOR_nothing)
+	      else if (rs6000_insns[m].fusion_addi_st[rc] != CODE_FOR_nothing)
 		store = 'S';
 	      else
 		store = '-';
@@ -2455,7 +2381,7 @@ rs6000_debug_print_mode (ssize_t m)
 	    }
 	}
 
-      if (reg_addr[m].fusion_gpr_ld != CODE_FOR_nothing)
+      if (rs6000_insns[m].fusion_gpr_ld != CODE_FOR_nothing)
 	{
 	  fprintf (stderr, "%*sP8gpr", (spaces + 1), "");
 	  spaces = 0;
@@ -2951,8 +2877,11 @@ rs6000_setup_reg_addr_masks (void)
 
       /* SDmode is special in that we want to access it only via REG+REG
 	 addressing on power7 and above, since we want to use the LFIWZX and
-	 STFIWZX instructions to load it.  */
-      bool indexed_only_p = (m == SDmode && TARGET_NO_SDMODE_STACK);
+	 STFIWZX instructions to load it.  Paired floating point is also
+	 only indexed mode.  */
+      bool indexed_only_p = ((m == E_SDmode && TARGET_NO_SDMODE_STACK)
+			     || (TARGET_PAIRED_FLOAT
+				 && (m == E_V2SImode || m == E_V2SFmode)));
 
       any_addr_mask = 0;
       for (rc = FIRST_RELOAD_REG_CLASS; rc <= LAST_RELOAD_REG_CLASS; rc++)
@@ -2972,11 +2901,8 @@ rs6000_setup_reg_addr_masks (void)
 
 	      /* Indicate if the mode takes more than 1 physical register.  If
 		 it takes a single register, indicate it can do REG+REG
-		 addressing.  Small integers in VSX registers can only do
-		 REG+REG addressing.  */
-	      if (small_int_vsx_p)
-		addr_mask |= RELOAD_REG_INDEXED;
-	      else if (nregs > 1 || m == BLKmode || complex_p)
+		 addressing.  */
+	      if (nregs > 1 || m == BLKmode || complex_p)
 		addr_mask |= RELOAD_REG_MULTIPLE;
 	      else
 		addr_mask |= RELOAD_REG_INDEXED;
@@ -3001,7 +2927,8 @@ rs6000_setup_reg_addr_masks (void)
 		  && !complex_p
 		  && (m != E_DFmode || !TARGET_VSX)
 		  && (m != E_SFmode || !TARGET_P8_VECTOR)
-		  && !small_int_vsx_p)
+		  && !small_int_vsx_p
+		  && !indexed_only_p)
 		{
 		  addr_mask |= RELOAD_REG_PRE_INCDEC;
 
@@ -3053,6 +2980,22 @@ rs6000_setup_reg_addr_masks (void)
 		addr_mask |= RELOAD_REG_QUAD_OFFSET;
 	    }
 
+	  /* LD and STD are DS-form instructions, which must have the bottom 2
+	     bits be 0.  However, since DFmode is primarily used in the
+	     floating point/vector registers, don't restrict the offsets in ISA
+	     2.xx.  */
+	  if (rc == RELOAD_REG_GPR && msize == 8 && TARGET_POWERPC64
+	      && (addr_mask & RELOAD_REG_OFFSET) != 0
+	      && INTEGRAL_MODE_P (m2))
+	    addr_mask |= RELOAD_REG_DS_OFFSET;
+
+	  /* ISA 3.0 LXSD, LXSSP, STXSD, STXSSP altivec load/store instructions
+	     are DS-FORM.  */
+	  else if (rc == RELOAD_REG_VMX && TARGET_P9_VECTOR
+		   && (addr_mask & RELOAD_REG_OFFSET) != 0
+		   && (msize == 8 ||  m2 == SFmode))
+	    addr_mask |= RELOAD_REG_DS_OFFSET;
+
 	  /* VMX registers can do (REG & -16) and ((REG+REG) & -16)
 	     addressing on 128-bit types.  */
 	  if (rc == RELOAD_REG_VMX && msize == 16
@@ -3142,6 +3085,7 @@ rs6000_init_hard_regno_mode_ok (bool glo
 
   gcc_assert ((int)CODE_FOR_nothing == 0);
   memset ((void *) &reg_addr[0], '\0', sizeof (reg_addr));
+  memset ((void *) &rs6000_insns, '\0', sizeof (rs6000_insns));
 
   gcc_assert ((int)NO_REGS == 0);
   memset ((void *) &rs6000_constraints[0], '\0', sizeof (rs6000_constraints));
@@ -3394,142 +3338,167 @@ rs6000_init_hard_regno_mode_ok (bool glo
     {
       if (TARGET_64BIT)
 	{
-	  reg_addr[V16QImode].reload_store = CODE_FOR_reload_v16qi_di_store;
-	  reg_addr[V16QImode].reload_load  = CODE_FOR_reload_v16qi_di_load;
-	  reg_addr[V8HImode].reload_store  = CODE_FOR_reload_v8hi_di_store;
-	  reg_addr[V8HImode].reload_load   = CODE_FOR_reload_v8hi_di_load;
-	  reg_addr[V4SImode].reload_store  = CODE_FOR_reload_v4si_di_store;
-	  reg_addr[V4SImode].reload_load   = CODE_FOR_reload_v4si_di_load;
-	  reg_addr[V2DImode].reload_store  = CODE_FOR_reload_v2di_di_store;
-	  reg_addr[V2DImode].reload_load   = CODE_FOR_reload_v2di_di_load;
-	  reg_addr[V1TImode].reload_store  = CODE_FOR_reload_v1ti_di_store;
-	  reg_addr[V1TImode].reload_load   = CODE_FOR_reload_v1ti_di_load;
-	  reg_addr[V4SFmode].reload_store  = CODE_FOR_reload_v4sf_di_store;
-	  reg_addr[V4SFmode].reload_load   = CODE_FOR_reload_v4sf_di_load;
-	  reg_addr[V2DFmode].reload_store  = CODE_FOR_reload_v2df_di_store;
-	  reg_addr[V2DFmode].reload_load   = CODE_FOR_reload_v2df_di_load;
-	  reg_addr[DFmode].reload_store    = CODE_FOR_reload_df_di_store;
-	  reg_addr[DFmode].reload_load     = CODE_FOR_reload_df_di_load;
-	  reg_addr[DDmode].reload_store    = CODE_FOR_reload_dd_di_store;
-	  reg_addr[DDmode].reload_load     = CODE_FOR_reload_dd_di_load;
-	  reg_addr[SFmode].reload_store    = CODE_FOR_reload_sf_di_store;
-	  reg_addr[SFmode].reload_load     = CODE_FOR_reload_sf_di_load;
+	  rs6000_insns[V16QImode].reload_store = CODE_FOR_reload_v16qi_di_store;
+	  rs6000_insns[V16QImode].reload_load  = CODE_FOR_reload_v16qi_di_load;
+	  rs6000_insns[V8HImode].reload_store  = CODE_FOR_reload_v8hi_di_store;
+	  rs6000_insns[V8HImode].reload_load   = CODE_FOR_reload_v8hi_di_load;
+	  rs6000_insns[V4SImode].reload_store  = CODE_FOR_reload_v4si_di_store;
+	  rs6000_insns[V4SImode].reload_load   = CODE_FOR_reload_v4si_di_load;
+	  rs6000_insns[V2DImode].reload_store  = CODE_FOR_reload_v2di_di_store;
+	  rs6000_insns[V2DImode].reload_load   = CODE_FOR_reload_v2di_di_load;
+	  rs6000_insns[V1TImode].reload_store  = CODE_FOR_reload_v1ti_di_store;
+	  rs6000_insns[V1TImode].reload_load   = CODE_FOR_reload_v1ti_di_load;
+	  rs6000_insns[V4SFmode].reload_store  = CODE_FOR_reload_v4sf_di_store;
+	  rs6000_insns[V4SFmode].reload_load   = CODE_FOR_reload_v4sf_di_load;
+	  rs6000_insns[V2DFmode].reload_store  = CODE_FOR_reload_v2df_di_store;
+	  rs6000_insns[V2DFmode].reload_load   = CODE_FOR_reload_v2df_di_load;
+	  rs6000_insns[DFmode].reload_store    = CODE_FOR_reload_df_di_store;
+	  rs6000_insns[DFmode].reload_load     = CODE_FOR_reload_df_di_load;
+	  rs6000_insns[DDmode].reload_store    = CODE_FOR_reload_dd_di_store;
+	  rs6000_insns[DDmode].reload_load     = CODE_FOR_reload_dd_di_load;
+	  rs6000_insns[SFmode].reload_store    = CODE_FOR_reload_sf_di_store;
+	  rs6000_insns[SFmode].reload_load     = CODE_FOR_reload_sf_di_load;
 
 	  if (FLOAT128_VECTOR_P (KFmode))
 	    {
-	      reg_addr[KFmode].reload_store = CODE_FOR_reload_kf_di_store;
-	      reg_addr[KFmode].reload_load  = CODE_FOR_reload_kf_di_load;
+	      rs6000_insns[KFmode].reload_store = CODE_FOR_reload_kf_di_store;
+	      rs6000_insns[KFmode].reload_load  = CODE_FOR_reload_kf_di_load;
 	    }
 
 	  if (FLOAT128_VECTOR_P (TFmode))
 	    {
-	      reg_addr[TFmode].reload_store = CODE_FOR_reload_tf_di_store;
-	      reg_addr[TFmode].reload_load  = CODE_FOR_reload_tf_di_load;
+	      rs6000_insns[TFmode].reload_store = CODE_FOR_reload_tf_di_store;
+	      rs6000_insns[TFmode].reload_load  = CODE_FOR_reload_tf_di_load;
 	    }
 
 	  /* Only provide a reload handler for SDmode if lfiwzx/stfiwx are
 	     available.  */
 	  if (TARGET_NO_SDMODE_STACK)
 	    {
-	      reg_addr[SDmode].reload_store = CODE_FOR_reload_sd_di_store;
-	      reg_addr[SDmode].reload_load  = CODE_FOR_reload_sd_di_load;
+	      rs6000_insns[SDmode].reload_store = CODE_FOR_reload_sd_di_store;
+	      rs6000_insns[SDmode].reload_load  = CODE_FOR_reload_sd_di_load;
 	    }
 
 	  if (TARGET_VSX)
 	    {
-	      reg_addr[TImode].reload_store  = CODE_FOR_reload_ti_di_store;
-	      reg_addr[TImode].reload_load   = CODE_FOR_reload_ti_di_load;
+	      rs6000_insns[TImode].reload_store  = CODE_FOR_reload_ti_di_store;
+	      rs6000_insns[TImode].reload_load   = CODE_FOR_reload_ti_di_load;
 	    }
 
 	  if (TARGET_DIRECT_MOVE && !TARGET_DIRECT_MOVE_128)
 	    {
-	      reg_addr[TImode].reload_gpr_vsx    = CODE_FOR_reload_gpr_from_vsxti;
-	      reg_addr[V1TImode].reload_gpr_vsx  = CODE_FOR_reload_gpr_from_vsxv1ti;
-	      reg_addr[V2DFmode].reload_gpr_vsx  = CODE_FOR_reload_gpr_from_vsxv2df;
-	      reg_addr[V2DImode].reload_gpr_vsx  = CODE_FOR_reload_gpr_from_vsxv2di;
-	      reg_addr[V4SFmode].reload_gpr_vsx  = CODE_FOR_reload_gpr_from_vsxv4sf;
-	      reg_addr[V4SImode].reload_gpr_vsx  = CODE_FOR_reload_gpr_from_vsxv4si;
-	      reg_addr[V8HImode].reload_gpr_vsx  = CODE_FOR_reload_gpr_from_vsxv8hi;
-	      reg_addr[V16QImode].reload_gpr_vsx = CODE_FOR_reload_gpr_from_vsxv16qi;
-	      reg_addr[SFmode].reload_gpr_vsx    = CODE_FOR_reload_gpr_from_vsxsf;
-
-	      reg_addr[TImode].reload_vsx_gpr    = CODE_FOR_reload_vsx_from_gprti;
-	      reg_addr[V1TImode].reload_vsx_gpr  = CODE_FOR_reload_vsx_from_gprv1ti;
-	      reg_addr[V2DFmode].reload_vsx_gpr  = CODE_FOR_reload_vsx_from_gprv2df;
-	      reg_addr[V2DImode].reload_vsx_gpr  = CODE_FOR_reload_vsx_from_gprv2di;
-	      reg_addr[V4SFmode].reload_vsx_gpr  = CODE_FOR_reload_vsx_from_gprv4sf;
-	      reg_addr[V4SImode].reload_vsx_gpr  = CODE_FOR_reload_vsx_from_gprv4si;
-	      reg_addr[V8HImode].reload_vsx_gpr  = CODE_FOR_reload_vsx_from_gprv8hi;
-	      reg_addr[V16QImode].reload_vsx_gpr = CODE_FOR_reload_vsx_from_gprv16qi;
-	      reg_addr[SFmode].reload_vsx_gpr    = CODE_FOR_reload_vsx_from_gprsf;
+	      rs6000_insns[TImode].reload_gpr_vsx
+		= CODE_FOR_reload_gpr_from_vsxti;
+	      rs6000_insns[V1TImode].reload_gpr_vsx
+		= CODE_FOR_reload_gpr_from_vsxv1ti;
+	      rs6000_insns[V2DFmode].reload_gpr_vsx
+		= CODE_FOR_reload_gpr_from_vsxv2df;
+	      rs6000_insns[V2DImode].reload_gpr_vsx
+		= CODE_FOR_reload_gpr_from_vsxv2di;
+	      rs6000_insns[V4SFmode].reload_gpr_vsx
+		= CODE_FOR_reload_gpr_from_vsxv4sf;
+	      rs6000_insns[V4SImode].reload_gpr_vsx
+		= CODE_FOR_reload_gpr_from_vsxv4si;
+	      rs6000_insns[V8HImode].reload_gpr_vsx
+		= CODE_FOR_reload_gpr_from_vsxv8hi;
+	      rs6000_insns[V16QImode].reload_gpr_vsx
+		= CODE_FOR_reload_gpr_from_vsxv16qi;
+	      rs6000_insns[SFmode].reload_gpr_vsx
+		= CODE_FOR_reload_gpr_from_vsxsf;
+
+	      rs6000_insns[TImode].reload_vsx_gpr
+		= CODE_FOR_reload_vsx_from_gprti;
+	      rs6000_insns[V1TImode].reload_vsx_gpr
+		= CODE_FOR_reload_vsx_from_gprv1ti;
+	      rs6000_insns[V2DFmode].reload_vsx_gpr
+		= CODE_FOR_reload_vsx_from_gprv2df;
+	      rs6000_insns[V2DImode].reload_vsx_gpr
+		= CODE_FOR_reload_vsx_from_gprv2di;
+	      rs6000_insns[V4SFmode].reload_vsx_gpr
+		= CODE_FOR_reload_vsx_from_gprv4sf;
+	      rs6000_insns[V4SImode].reload_vsx_gpr
+		= CODE_FOR_reload_vsx_from_gprv4si;
+	      rs6000_insns[V8HImode].reload_vsx_gpr
+		= CODE_FOR_reload_vsx_from_gprv8hi;
+	      rs6000_insns[V16QImode].reload_vsx_gpr
+		= CODE_FOR_reload_vsx_from_gprv16qi;
+	      rs6000_insns[SFmode].reload_vsx_gpr
+		= CODE_FOR_reload_vsx_from_gprsf;
 
 	      if (FLOAT128_VECTOR_P (KFmode))
 		{
-		  reg_addr[KFmode].reload_gpr_vsx = CODE_FOR_reload_gpr_from_vsxkf;
-		  reg_addr[KFmode].reload_vsx_gpr = CODE_FOR_reload_vsx_from_gprkf;
+		  rs6000_insns[KFmode].reload_gpr_vsx
+		    = CODE_FOR_reload_gpr_from_vsxkf;
+		  rs6000_insns[KFmode].reload_vsx_gpr
+		    = CODE_FOR_reload_vsx_from_gprkf;
 		}
 
 	      if (FLOAT128_VECTOR_P (TFmode))
 		{
-		  reg_addr[TFmode].reload_gpr_vsx = CODE_FOR_reload_gpr_from_vsxtf;
-		  reg_addr[TFmode].reload_vsx_gpr = CODE_FOR_reload_vsx_from_gprtf;
+		  rs6000_insns[TFmode].reload_gpr_vsx
+		    = CODE_FOR_reload_gpr_from_vsxtf;
+		  rs6000_insns[TFmode].reload_vsx_gpr
+		    = CODE_FOR_reload_vsx_from_gprtf;
 		}
 	    }
 	}
       else
 	{
-	  reg_addr[V16QImode].reload_store = CODE_FOR_reload_v16qi_si_store;
-	  reg_addr[V16QImode].reload_load  = CODE_FOR_reload_v16qi_si_load;
-	  reg_addr[V8HImode].reload_store  = CODE_FOR_reload_v8hi_si_store;
-	  reg_addr[V8HImode].reload_load   = CODE_FOR_reload_v8hi_si_load;
-	  reg_addr[V4SImode].reload_store  = CODE_FOR_reload_v4si_si_store;
-	  reg_addr[V4SImode].reload_load   = CODE_FOR_reload_v4si_si_load;
-	  reg_addr[V2DImode].reload_store  = CODE_FOR_reload_v2di_si_store;
-	  reg_addr[V2DImode].reload_load   = CODE_FOR_reload_v2di_si_load;
-	  reg_addr[V1TImode].reload_store  = CODE_FOR_reload_v1ti_si_store;
-	  reg_addr[V1TImode].reload_load   = CODE_FOR_reload_v1ti_si_load;
-	  reg_addr[V4SFmode].reload_store  = CODE_FOR_reload_v4sf_si_store;
-	  reg_addr[V4SFmode].reload_load   = CODE_FOR_reload_v4sf_si_load;
-	  reg_addr[V2DFmode].reload_store  = CODE_FOR_reload_v2df_si_store;
-	  reg_addr[V2DFmode].reload_load   = CODE_FOR_reload_v2df_si_load;
-	  reg_addr[DFmode].reload_store    = CODE_FOR_reload_df_si_store;
-	  reg_addr[DFmode].reload_load     = CODE_FOR_reload_df_si_load;
-	  reg_addr[DDmode].reload_store    = CODE_FOR_reload_dd_si_store;
-	  reg_addr[DDmode].reload_load     = CODE_FOR_reload_dd_si_load;
-	  reg_addr[SFmode].reload_store    = CODE_FOR_reload_sf_si_store;
-	  reg_addr[SFmode].reload_load     = CODE_FOR_reload_sf_si_load;
+	  rs6000_insns[V16QImode].reload_store = CODE_FOR_reload_v16qi_si_store;
+	  rs6000_insns[V16QImode].reload_load  = CODE_FOR_reload_v16qi_si_load;
+	  rs6000_insns[V8HImode].reload_store  = CODE_FOR_reload_v8hi_si_store;
+	  rs6000_insns[V8HImode].reload_load   = CODE_FOR_reload_v8hi_si_load;
+	  rs6000_insns[V4SImode].reload_store  = CODE_FOR_reload_v4si_si_store;
+	  rs6000_insns[V4SImode].reload_load   = CODE_FOR_reload_v4si_si_load;
+	  rs6000_insns[V2DImode].reload_store  = CODE_FOR_reload_v2di_si_store;
+	  rs6000_insns[V2DImode].reload_load   = CODE_FOR_reload_v2di_si_load;
+	  rs6000_insns[V1TImode].reload_store  = CODE_FOR_reload_v1ti_si_store;
+	  rs6000_insns[V1TImode].reload_load   = CODE_FOR_reload_v1ti_si_load;
+	  rs6000_insns[V4SFmode].reload_store  = CODE_FOR_reload_v4sf_si_store;
+	  rs6000_insns[V4SFmode].reload_load   = CODE_FOR_reload_v4sf_si_load;
+	  rs6000_insns[V2DFmode].reload_store  = CODE_FOR_reload_v2df_si_store;
+	  rs6000_insns[V2DFmode].reload_load   = CODE_FOR_reload_v2df_si_load;
+	  rs6000_insns[DFmode].reload_store    = CODE_FOR_reload_df_si_store;
+	  rs6000_insns[DFmode].reload_load     = CODE_FOR_reload_df_si_load;
+	  rs6000_insns[DDmode].reload_store    = CODE_FOR_reload_dd_si_store;
+	  rs6000_insns[DDmode].reload_load     = CODE_FOR_reload_dd_si_load;
+	  rs6000_insns[SFmode].reload_store    = CODE_FOR_reload_sf_si_store;
+	  rs6000_insns[SFmode].reload_load     = CODE_FOR_reload_sf_si_load;
 
 	  if (FLOAT128_VECTOR_P (KFmode))
 	    {
-	      reg_addr[KFmode].reload_store = CODE_FOR_reload_kf_si_store;
-	      reg_addr[KFmode].reload_load  = CODE_FOR_reload_kf_si_load;
+	      rs6000_insns[KFmode].reload_store = CODE_FOR_reload_kf_si_store;
+	      rs6000_insns[KFmode].reload_load  = CODE_FOR_reload_kf_si_load;
 	    }
 
 	  if (FLOAT128_IEEE_P (TFmode))
 	    {
-	      reg_addr[TFmode].reload_store = CODE_FOR_reload_tf_si_store;
-	      reg_addr[TFmode].reload_load  = CODE_FOR_reload_tf_si_load;
+	      rs6000_insns[TFmode].reload_store = CODE_FOR_reload_tf_si_store;
+	      rs6000_insns[TFmode].reload_load  = CODE_FOR_reload_tf_si_load;
 	    }
 
 	  /* Only provide a reload handler for SDmode if lfiwzx/stfiwx are
 	     available.  */
 	  if (TARGET_NO_SDMODE_STACK)
 	    {
-	      reg_addr[SDmode].reload_store = CODE_FOR_reload_sd_si_store;
-	      reg_addr[SDmode].reload_load  = CODE_FOR_reload_sd_si_load;
+	      rs6000_insns[SDmode].reload_store = CODE_FOR_reload_sd_si_store;
+	      rs6000_insns[SDmode].reload_load  = CODE_FOR_reload_sd_si_load;
 	    }
 
 	  if (TARGET_VSX)
 	    {
-	      reg_addr[TImode].reload_store  = CODE_FOR_reload_ti_si_store;
-	      reg_addr[TImode].reload_load   = CODE_FOR_reload_ti_si_load;
+	      rs6000_insns[TImode].reload_store  = CODE_FOR_reload_ti_si_store;
+	      rs6000_insns[TImode].reload_load   = CODE_FOR_reload_ti_si_load;
 	    }
 
 	  if (TARGET_DIRECT_MOVE)
 	    {
-	      reg_addr[DImode].reload_fpr_gpr = CODE_FOR_reload_fpr_from_gprdi;
-	      reg_addr[DDmode].reload_fpr_gpr = CODE_FOR_reload_fpr_from_gprdd;
-	      reg_addr[DFmode].reload_fpr_gpr = CODE_FOR_reload_fpr_from_gprdf;
+	      rs6000_insns[DImode].reload_fpr_gpr
+		= CODE_FOR_reload_fpr_from_gprdi;
+	      rs6000_insns[DDmode].reload_fpr_gpr
+		= CODE_FOR_reload_fpr_from_gprdd;
+	      rs6000_insns[DFmode].reload_fpr_gpr
+		= CODE_FOR_reload_fpr_from_gprdf;
 	    }
 	}
 
@@ -3552,11 +3521,11 @@ rs6000_init_hard_regno_mode_ok (bool glo
   /* Setup the fusion operations.  */
   if (TARGET_P8_FUSION)
     {
-      reg_addr[QImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_qi;
-      reg_addr[HImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_hi;
-      reg_addr[SImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_si;
+      rs6000_insns[QImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_qi;
+      rs6000_insns[HImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_hi;
+      rs6000_insns[SImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_si;
       if (TARGET_64BIT)
-	reg_addr[DImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_di;
+	rs6000_insns[DImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_di;
     }
 
   if (TARGET_P9_FUSION)
@@ -3649,14 +3618,14 @@ rs6000_init_hard_regno_mode_ok (bool glo
 	  if (rtype == RELOAD_REG_FPR && !TARGET_HARD_FLOAT)
 	    continue;
 
-	  reg_addr[xmode].fusion_addis_ld[rtype] = addis_insns[i].load;
-	  reg_addr[xmode].fusion_addis_st[rtype] = addis_insns[i].store;
+	  rs6000_insns[xmode].fusion_addis_ld[rtype] = addis_insns[i].load;
+	  rs6000_insns[xmode].fusion_addis_st[rtype] = addis_insns[i].store;
 
 	  if (rtype == RELOAD_REG_FPR && TARGET_P9_VECTOR)
 	    {
-	      reg_addr[xmode].fusion_addis_ld[RELOAD_REG_VMX]
+	      rs6000_insns[xmode].fusion_addis_ld[RELOAD_REG_VMX]
 		= addis_insns[i].load;
-	      reg_addr[xmode].fusion_addis_st[RELOAD_REG_VMX]
+	      rs6000_insns[xmode].fusion_addis_st[RELOAD_REG_VMX]
 		= addis_insns[i].store;
 	    }
 	}
@@ -7427,7 +7396,7 @@ rs6000_expand_vector_extract (rtx target
 }
 
 /* Helper function to return the register number of a RTX.  */
-static inline int
+int
 regno_or_subregno (rtx op)
 {
   if (REG_P (op))
@@ -8107,7 +8076,7 @@ quad_address_p (rtx addr, machine_mode m
   if (legitimate_indirect_address_p (addr, strict))
     return true;
 
-  if (VECTOR_MODE_P (mode) && !mode_supports_vsx_dform_quad (mode))
+  if (VECTOR_MODE_P (mode) && !mode_supports_dq_form (mode))
     return false;
 
   if (GET_CODE (addr) != PLUS)
@@ -8289,7 +8258,7 @@ reg_offset_addressing_ok_p (machine_mode
 	 IEEE 128-bit floating point that is passed in a single vector
 	 register.  */
       if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode))
-	return mode_supports_vsx_dform_quad (mode);
+	return mode_supports_dq_form (mode);
       break;
 
     case E_V2SImode:
@@ -8356,7 +8325,7 @@ offsettable_ok_by_alignment (rtx op, HOS
 
   /* ISA 3.0 vector d-form addressing is restricted, don't allow
      SYMBOL_REF.  */
-  if (mode_supports_vsx_dform_quad (mode))
+  if (mode_supports_dq_form (mode))
     return false;
 
   dsize = GET_MODE_SIZE (mode);
@@ -8527,7 +8496,7 @@ rs6000_legitimate_offset_address_p (mach
     return false;
   if (!INT_REG_OK_FOR_BASE_P (XEXP (x, 0), strict))
     return false;
-  if (mode_supports_vsx_dform_quad (mode))
+  if (mode_supports_dq_form (mode))
     return quad_address_p (x, mode, strict);
   if (!reg_offset_addressing_ok_p (mode))
     return virtual_stack_registers_memory_p (x);
@@ -8645,7 +8614,7 @@ legitimate_lo_sum_address_p (machine_mod
   if (!INT_REG_OK_FOR_BASE_P (XEXP (x, 0), strict))
     return false;
   /* quad word addresses are restricted, and we can't use LO_SUM.  */
-  if (mode_supports_vsx_dform_quad (mode))
+  if (mode_supports_dq_form (mode))
     return false;
   x = XEXP (x, 1);
 
@@ -8710,7 +8679,7 @@ rs6000_legitimize_address (rtx x, rtx ol
   unsigned int extra;
 
   if (!reg_offset_addressing_ok_p (mode)
-      || mode_supports_vsx_dform_quad (mode))
+      || mode_supports_dq_form (mode))
     {
       if (virtual_stack_registers_memory_p (x))
 	return x;
@@ -9454,7 +9423,7 @@ rs6000_legitimize_reload_address (rtx x,
 				  int ind_levels ATTRIBUTE_UNUSED, int *win)
 {
   bool reg_offset_p = reg_offset_addressing_ok_p (mode);
-  bool quad_offset_p = mode_supports_vsx_dform_quad (mode);
+  bool quad_offset_p = mode_supports_dq_form (mode);
 
   /* Nasty hack for vsx_splat_v2df/v2di load from mem, which takes a
      DFmode/DImode MEM.  Ditto for ISA 3.0 vsx_splat_v4sf/v4si.  */
@@ -9742,7 +9711,7 @@ static bool
 rs6000_legitimate_address_p (machine_mode mode, rtx x, bool reg_ok_strict)
 {
   bool reg_offset_p = reg_offset_addressing_ok_p (mode);
-  bool quad_offset_p = mode_supports_vsx_dform_quad (mode);
+  bool quad_offset_p = mode_supports_dq_form (mode);
 
   /* If this is an unaligned stvx/ldvx type address, discard the outer AND.  */
   if (VECTOR_MEM_ALTIVEC_P (mode)
@@ -19913,7 +19882,7 @@ rs6000_secondary_reload_direct_move (enu
       if (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
 	{
 	  cost = 3;			/* 2 mtvsrd's, 1 xxpermdi.  */
-	  icode = reg_addr[mode].reload_vsx_gpr;
+	  icode = rs6000_insns[mode].reload_vsx_gpr;
 	}
 
       /* Handle moving 128-bit values from VSX point registers to GPRs on
@@ -19922,7 +19891,7 @@ rs6000_secondary_reload_direct_move (enu
       else if (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)
 	{
 	  cost = 3;			/* 2 mfvsrd's, 1 xxpermdi.  */
-	  icode = reg_addr[mode].reload_gpr_vsx;
+	  icode = rs6000_insns[mode].reload_gpr_vsx;
 	}
     }
 
@@ -19931,13 +19900,13 @@ rs6000_secondary_reload_direct_move (enu
       if (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)
 	{
 	  cost = 3;			/* xscvdpspn, mfvsrd, and.  */
-	  icode = reg_addr[mode].reload_gpr_vsx;
+	  icode = rs6000_insns[mode].reload_gpr_vsx;
 	}
 
       else if (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
 	{
 	  cost = 2;			/* mtvsrz, xscvspdpn.  */
-	  icode = reg_addr[mode].reload_vsx_gpr;
+	  icode = rs6000_insns[mode].reload_vsx_gpr;
 	}
     }
 
@@ -19953,7 +19922,7 @@ rs6000_secondary_reload_direct_move (enu
       if (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE && !altivec_p)
 	{
 	  cost = 3;			/* 2 mtvsrwz's, 1 fmrgow.  */
-	  icode = reg_addr[mode].reload_fpr_gpr;
+	  icode = rs6000_insns[mode].reload_fpr_gpr;
 	}
     }
 
@@ -20045,8 +20014,8 @@ rs6000_secondary_reload (bool in_p,
   sri->t_icode = CODE_FOR_nothing;
   sri->extra_cost = 0;
   icode = ((in_p)
-	   ? reg_addr[mode].reload_load
-	   : reg_addr[mode].reload_store);
+	   ? rs6000_insns[mode].reload_load
+	   : rs6000_insns[mode].reload_store);
 
   if (REG_P (x) || register_operand (x, mode))
     {
@@ -20081,7 +20050,7 @@ rs6000_secondary_reload (bool in_p,
      point register, unless we have D-form addressing.  Also make sure that
      non-zero constants use a FPR.  */
   if (!done_p && reg_addr[mode].scalar_in_vmx_p
-      && !mode_supports_vmx_dform (mode)
+      && !mode_supports_d_form (mode, RELOAD_REG_VMX)
       && (rclass == VSX_REGS || rclass == ALTIVEC_REGS)
       && (memory_p || (GET_CODE (x) == CONST_DOUBLE)))
     {
@@ -20409,7 +20378,7 @@ rs6000_secondary_reload_inner (rtx reg, 
 	    }
 	}
 
-      else if (mode_supports_vsx_dform_quad (mode) && CONST_INT_P (op1))
+      else if (mode_supports_dq_form (mode) && CONST_INT_P (op1))
 	{
 	  if (((addr_mask & RELOAD_REG_QUAD_OFFSET) == 0)
 	      || !quad_address_p (addr, mode, false))
@@ -20450,7 +20419,7 @@ rs6000_secondary_reload_inner (rtx reg, 
 	}
 
       /* Quad offsets are restricted and can't handle normal addresses.  */
-      else if (mode_supports_vsx_dform_quad (mode))
+      else if (mode_supports_dq_form (mode))
 	{
 	  emit_insn (gen_rtx_SET (scratch, addr));
 	  new_addr = scratch;
@@ -20644,8 +20613,8 @@ rs6000_preferred_reload_class (rtx x, en
 	}
 
       /* D-form addressing can easily reload the value.  */
-      if (mode_supports_vmx_dform (mode)
-	  || mode_supports_vsx_dform_quad (mode))
+      if (mode_supports_d_form (mode, RELOAD_REG_VMX)
+	  || mode_supports_dq_form (mode))
 	return rclass;
 
       /* If this is a scalar floating point value and we don't have D-form
@@ -20801,7 +20770,7 @@ rs6000_secondary_reload_class (enum reg_
      instead of reloading the secondary memory address for Altivec moves.  */
   if (TARGET_VSX
       && GET_MODE_SIZE (mode) < 16
-      && !mode_supports_vmx_dform (mode)
+      && !mode_supports_d_form (mode, RELOAD_REG_VMX)
       && (((rclass == GENERAL_REGS || rclass == BASE_REGS)
            && (regno >= 0 && ALTIVEC_REGNO_P (regno)))
           || ((rclass == VSX_REGS || rclass == ALTIVEC_REGS)
@@ -21048,7 +21017,7 @@ rs6000_output_move_128bit (rtx operands[
 
       else if (TARGET_VSX && dest_vsx_p)
 	{
-	  if (mode_supports_vsx_dform_quad (mode)
+	  if (mode_supports_dq_form (mode)
 	      && quad_address_p (XEXP (src, 0), mode, true))
 	    return "lxv %x0,%1";
 
@@ -21086,7 +21055,7 @@ rs6000_output_move_128bit (rtx operands[
 
       else if (TARGET_VSX && src_vsx_p)
 	{
-	  if (mode_supports_vsx_dform_quad (mode)
+	  if (mode_supports_dq_form (mode)
 	      && quad_address_p (XEXP (dest, 0), mode, true))
 	    return "stxv %x1,%0";
 


* Re: [RFC Patch], PowerPC memory support pre-gcc9, patch #2
  2018-03-14 23:01 [RFC Patch], PowerPC memory support pre-gcc9, patch #1 Michael Meissner
@ 2018-03-15 17:09 ` Michael Meissner
  2018-03-20 13:32   ` Segher Boessenkool
  2018-03-15 23:33 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #3 Michael Meissner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Michael Meissner @ 2018-03-15 17:09 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 1313 bytes --]

This is patch #2 of my series for improving the PowerPC internal memory
support.  It assumes patch #1 has been applied.

This patch moves the rs6000_output_move_128bit function, along with the
rs6000_move_128bit_ok_p and rs6000_split_128bit_ok_p helpers, from rs6000.c to
a new file, rs6000-output.c.

The third patch will create a rs6000_output_move_64bit function and change both
32-bit and 64-bit movdi to call it, instead of having all of the instructions
be literals.  I will also likely add improvements to setting the reg_addr
address masks for offsettable addresses.

The fourth patch will likely move movdd and movdf to call
rs6000_output_move_64bit as well.

I tested this on a little endian power8 system and there were no regressions.

2018-03-14  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config.gcc (powerpc*-*-*): Add rs6000-output.o to extra_objs.
	* config/rs6000/t-rs6000 (rs6000-output.o): Add build rule.
	* config/rs6000/rs6000.c (rs6000_output_move_128bit): Move to
	rs6000-output.c.
	(rs6000_move_128bit_ok_p): Likewise.
	(rs6000_split_128bit_ok_p): Likewise.
	* config/rs6000/rs6000-output.c (rs6000_output_move_128bit): Move
	here from rs6000.c.
	(rs6000_move_128bit_ok_p): Likewise.
	(rs6000_split_128bit_ok_p): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: ext-addr.patch02b --]
[-- Type: text/plain, Size: 12388 bytes --]

Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 258531)
+++ gcc/config.gcc	(working copy)
@@ -466,7 +466,7 @@ powerpc*-*-*spe*)
 	;;
 powerpc*-*-*)
 	cpu_type=rs6000
-	extra_objs="rs6000-string.o rs6000-p8swap.o"
+	extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-output.o"
 	extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
 	extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
 	extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
Index: gcc/config/rs6000/t-rs6000
===================================================================
--- gcc/config/rs6000/t-rs6000	(revision 258530)
+++ gcc/config/rs6000/t-rs6000	(working copy)
@@ -30,6 +30,10 @@ rs6000-string.o: $(srcdir)/config/rs6000
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+rs6000-output.o: $(srcdir)/config/rs6000/rs6000-output.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
 rs6000-p8swap.o: $(srcdir)/config/rs6000/rs6000-p8swap.c
 	$(COMPILE) $<
 	$(POSTCOMPILE)
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 258535)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -20921,205 +20921,6 @@ rs6000_debug_can_change_mode_class (mach
   return ret;
 }
 \f
-/* Return a string to do a move operation of 128 bits of data.  */
-
-const char *
-rs6000_output_move_128bit (rtx operands[])
-{
-  rtx dest = operands[0];
-  rtx src = operands[1];
-  machine_mode mode = GET_MODE (dest);
-  int dest_regno;
-  int src_regno;
-  bool dest_gpr_p, dest_fp_p, dest_vmx_p, dest_vsx_p;
-  bool src_gpr_p, src_fp_p, src_vmx_p, src_vsx_p;
-
-  if (REG_P (dest))
-    {
-      dest_regno = REGNO (dest);
-      dest_gpr_p = INT_REGNO_P (dest_regno);
-      dest_fp_p = FP_REGNO_P (dest_regno);
-      dest_vmx_p = ALTIVEC_REGNO_P (dest_regno);
-      dest_vsx_p = dest_fp_p | dest_vmx_p;
-    }
-  else
-    {
-      dest_regno = -1;
-      dest_gpr_p = dest_fp_p = dest_vmx_p = dest_vsx_p = false;
-    }
-
-  if (REG_P (src))
-    {
-      src_regno = REGNO (src);
-      src_gpr_p = INT_REGNO_P (src_regno);
-      src_fp_p = FP_REGNO_P (src_regno);
-      src_vmx_p = ALTIVEC_REGNO_P (src_regno);
-      src_vsx_p = src_fp_p | src_vmx_p;
-    }
-  else
-    {
-      src_regno = -1;
-      src_gpr_p = src_fp_p = src_vmx_p = src_vsx_p = false;
-    }
-
-  /* Register moves.  */
-  if (dest_regno >= 0 && src_regno >= 0)
-    {
-      if (dest_gpr_p)
-	{
-	  if (src_gpr_p)
-	    return "#";
-
-	  if (TARGET_DIRECT_MOVE_128 && src_vsx_p)
-	    return (WORDS_BIG_ENDIAN
-		    ? "mfvsrd %0,%x1\n\tmfvsrld %L0,%x1"
-		    : "mfvsrd %L0,%x1\n\tmfvsrld %0,%x1");
-
-	  else if (TARGET_VSX && TARGET_DIRECT_MOVE && src_vsx_p)
-	    return "#";
-	}
-
-      else if (TARGET_VSX && dest_vsx_p)
-	{
-	  if (src_vsx_p)
-	    return "xxlor %x0,%x1,%x1";
-
-	  else if (TARGET_DIRECT_MOVE_128 && src_gpr_p)
-	    return (WORDS_BIG_ENDIAN
-		    ? "mtvsrdd %x0,%1,%L1"
-		    : "mtvsrdd %x0,%L1,%1");
-
-	  else if (TARGET_DIRECT_MOVE && src_gpr_p)
-	    return "#";
-	}
-
-      else if (TARGET_ALTIVEC && dest_vmx_p && src_vmx_p)
-	return "vor %0,%1,%1";
-
-      else if (dest_fp_p && src_fp_p)
-	return "#";
-    }
-
-  /* Loads.  */
-  else if (dest_regno >= 0 && MEM_P (src))
-    {
-      if (dest_gpr_p)
-	{
-	  if (TARGET_QUAD_MEMORY && quad_load_store_p (dest, src))
-	    return "lq %0,%1";
-	  else
-	    return "#";
-	}
-
-      else if (TARGET_ALTIVEC && dest_vmx_p
-	       && altivec_indexed_or_indirect_operand (src, mode))
-	return "lvx %0,%y1";
-
-      else if (TARGET_VSX && dest_vsx_p)
-	{
-	  if (mode_supports_dq_form (mode)
-	      && quad_address_p (XEXP (src, 0), mode, true))
-	    return "lxv %x0,%1";
-
-	  else if (TARGET_P9_VECTOR)
-	    return "lxvx %x0,%y1";
-
-	  else if (mode == V16QImode || mode == V8HImode || mode == V4SImode)
-	    return "lxvw4x %x0,%y1";
-
-	  else
-	    return "lxvd2x %x0,%y1";
-	}
-
-      else if (TARGET_ALTIVEC && dest_vmx_p)
-	return "lvx %0,%y1";
-
-      else if (dest_fp_p)
-	return "#";
-    }
-
-  /* Stores.  */
-  else if (src_regno >= 0 && MEM_P (dest))
-    {
-      if (src_gpr_p)
-	{
- 	  if (TARGET_QUAD_MEMORY && quad_load_store_p (dest, src))
-	    return "stq %1,%0";
-	  else
-	    return "#";
-	}
-
-      else if (TARGET_ALTIVEC && src_vmx_p
-	       && altivec_indexed_or_indirect_operand (src, mode))
-	return "stvx %1,%y0";
-
-      else if (TARGET_VSX && src_vsx_p)
-	{
-	  if (mode_supports_dq_form (mode)
-	      && quad_address_p (XEXP (dest, 0), mode, true))
-	    return "stxv %x1,%0";
-
-	  else if (TARGET_P9_VECTOR)
-	    return "stxvx %x1,%y0";
-
-	  else if (mode == V16QImode || mode == V8HImode || mode == V4SImode)
-	    return "stxvw4x %x1,%y0";
-
-	  else
-	    return "stxvd2x %x1,%y0";
-	}
-
-      else if (TARGET_ALTIVEC && src_vmx_p)
-	return "stvx %1,%y0";
-
-      else if (src_fp_p)
-	return "#";
-    }
-
-  /* Constants.  */
-  else if (dest_regno >= 0
-	   && (GET_CODE (src) == CONST_INT
-	       || GET_CODE (src) == CONST_WIDE_INT
-	       || GET_CODE (src) == CONST_DOUBLE
-	       || GET_CODE (src) == CONST_VECTOR))
-    {
-      if (dest_gpr_p)
-	return "#";
-
-      else if ((dest_vmx_p && TARGET_ALTIVEC)
-	       || (dest_vsx_p && TARGET_VSX))
-	return output_vec_const_move (operands);
-    }
-
-  fatal_insn ("Bad 128-bit move", gen_rtx_SET (dest, src));
-}
-
-/* Validate a 128-bit move.  */
-bool
-rs6000_move_128bit_ok_p (rtx operands[])
-{
-  machine_mode mode = GET_MODE (operands[0]);
-  return (gpc_reg_operand (operands[0], mode)
-	  || gpc_reg_operand (operands[1], mode));
-}
-
-/* Return true if a 128-bit move needs to be split.  */
-bool
-rs6000_split_128bit_ok_p (rtx operands[])
-{
-  if (!reload_completed)
-    return false;
-
-  if (!gpr_or_gpr_p (operands[0], operands[1]))
-    return false;
-
-  if (quad_load_store_p (operands[0], operands[1]))
-    return false;
-
-  return true;
-}
-
-\f
 /* Given a comparison operation, return the bit number in CCR to test.  We
    know this is a valid comparison.
 
Index: gcc/config/rs6000/rs6000-output.c
===================================================================
--- gcc/config/rs6000/rs6000-output.c	(revision 0)
+++ gcc/config/rs6000/rs6000-output.c	(revision 0)
@@ -0,0 +1,246 @@
+/* Subroutines used to emit code and split insns for PowerPC.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Richard Kenner (kenner@vlsi1.ultra.nyu.edu)
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "tm_p.h"
+#include "expmed.h"
+#include "optabs.h"
+#include "regs.h"
+#include "ira.h"
+#include "recog.h"
+#include "insn-attr.h"
+#include "flags.h"
+#include "print-tree.h"
+#include "fold-const.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "varasm.h"
+#include "explow.h"
+#include "expr.h"
+#include "output.h"
+#include "target.h"
+#include "tm-constrs.h"
+
+\f
+/* Return a string to do a move operation of 128 bits of data.  */
+
+const char *
+rs6000_output_move_128bit (rtx operands[])
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  machine_mode mode = GET_MODE (dest);
+  int dest_regno;
+  int src_regno;
+  bool dest_gpr_p, dest_fp_p, dest_vmx_p, dest_vsx_p;
+  bool src_gpr_p, src_fp_p, src_vmx_p, src_vsx_p;
+
+  if (REG_P (dest))
+    {
+      dest_regno = REGNO (dest);
+      dest_gpr_p = INT_REGNO_P (dest_regno);
+      dest_fp_p = FP_REGNO_P (dest_regno);
+      dest_vmx_p = ALTIVEC_REGNO_P (dest_regno);
+      dest_vsx_p = dest_fp_p | dest_vmx_p;
+    }
+  else
+    {
+      dest_regno = -1;
+      dest_gpr_p = dest_fp_p = dest_vmx_p = dest_vsx_p = false;
+    }
+
+  if (REG_P (src))
+    {
+      src_regno = REGNO (src);
+      src_gpr_p = INT_REGNO_P (src_regno);
+      src_fp_p = FP_REGNO_P (src_regno);
+      src_vmx_p = ALTIVEC_REGNO_P (src_regno);
+      src_vsx_p = src_fp_p | src_vmx_p;
+    }
+  else
+    {
+      src_regno = -1;
+      src_gpr_p = src_fp_p = src_vmx_p = src_vsx_p = false;
+    }
+
+  /* Register moves.  */
+  if (dest_regno >= 0 && src_regno >= 0)
+    {
+      if (dest_gpr_p)
+	{
+	  if (src_gpr_p)
+	    return "#";
+
+	  if (TARGET_DIRECT_MOVE_128 && src_vsx_p)
+	    return (WORDS_BIG_ENDIAN
+		    ? "mfvsrd %0,%x1\n\tmfvsrld %L0,%x1"
+		    : "mfvsrd %L0,%x1\n\tmfvsrld %0,%x1");
+
+	  else if (TARGET_VSX && TARGET_DIRECT_MOVE && src_vsx_p)
+	    return "#";
+	}
+
+      else if (TARGET_VSX && dest_vsx_p)
+	{
+	  if (src_vsx_p)
+	    return "xxlor %x0,%x1,%x1";
+
+	  else if (TARGET_DIRECT_MOVE_128 && src_gpr_p)
+	    return (WORDS_BIG_ENDIAN
+		    ? "mtvsrdd %x0,%1,%L1"
+		    : "mtvsrdd %x0,%L1,%1");
+
+	  else if (TARGET_DIRECT_MOVE && src_gpr_p)
+	    return "#";
+	}
+
+      else if (TARGET_ALTIVEC && dest_vmx_p && src_vmx_p)
+	return "vor %0,%1,%1";
+
+      else if (dest_fp_p && src_fp_p)
+	return "#";
+    }
+
+  /* Loads.  */
+  else if (dest_regno >= 0 && MEM_P (src))
+    {
+      if (dest_gpr_p)
+	{
+	  if (TARGET_QUAD_MEMORY && quad_load_store_p (dest, src))
+	    return "lq %0,%1";
+	  else
+	    return "#";
+	}
+
+      else if (TARGET_ALTIVEC && dest_vmx_p
+	       && altivec_indexed_or_indirect_operand (src, mode))
+	return "lvx %0,%y1";
+
+      else if (TARGET_VSX && dest_vsx_p)
+	{
+	  if (mode_supports_dq_form (mode)
+	      && quad_address_p (XEXP (src, 0), mode, true))
+	    return "lxv %x0,%1";
+
+	  else if (TARGET_P9_VECTOR)
+	    return "lxvx %x0,%y1";
+
+	  else if (mode == V16QImode || mode == V8HImode || mode == V4SImode)
+	    return "lxvw4x %x0,%y1";
+
+	  else
+	    return "lxvd2x %x0,%y1";
+	}
+
+      else if (TARGET_ALTIVEC && dest_vmx_p)
+	return "lvx %0,%y1";
+
+      else if (dest_fp_p)
+	return "#";
+    }
+
+  /* Stores.  */
+  else if (src_regno >= 0 && MEM_P (dest))
+    {
+      if (src_gpr_p)
+	{
+ 	  if (TARGET_QUAD_MEMORY && quad_load_store_p (dest, src))
+	    return "stq %1,%0";
+	  else
+	    return "#";
+	}
+
+      else if (TARGET_ALTIVEC && src_vmx_p
+	       && altivec_indexed_or_indirect_operand (src, mode))
+	return "stvx %1,%y0";
+
+      else if (TARGET_VSX && src_vsx_p)
+	{
+	  if (mode_supports_dq_form (mode)
+	      && quad_address_p (XEXP (dest, 0), mode, true))
+	    return "stxv %x1,%0";
+
+	  else if (TARGET_P9_VECTOR)
+	    return "stxvx %x1,%y0";
+
+	  else if (mode == V16QImode || mode == V8HImode || mode == V4SImode)
+	    return "stxvw4x %x1,%y0";
+
+	  else
+	    return "stxvd2x %x1,%y0";
+	}
+
+      else if (TARGET_ALTIVEC && src_vmx_p)
+	return "stvx %1,%y0";
+
+      else if (src_fp_p)
+	return "#";
+    }
+
+  /* Constants.  */
+  else if (dest_regno >= 0
+	   && (GET_CODE (src) == CONST_INT
+	       || GET_CODE (src) == CONST_WIDE_INT
+	       || GET_CODE (src) == CONST_DOUBLE
+	       || GET_CODE (src) == CONST_VECTOR))
+    {
+      if (dest_gpr_p)
+	return "#";
+
+      else if ((dest_vmx_p && TARGET_ALTIVEC)
+	       || (dest_vsx_p && TARGET_VSX))
+	return output_vec_const_move (operands);
+    }
+
+  fatal_insn ("Bad 128-bit move", gen_rtx_SET (dest, src));
+}
+
+/* Validate a 128-bit move.  */
+bool
+rs6000_move_128bit_ok_p (rtx operands[])
+{
+  machine_mode mode = GET_MODE (operands[0]);
+  return (gpc_reg_operand (operands[0], mode)
+	  || gpc_reg_operand (operands[1], mode));
+}
+
+/* Return true if a 128-bit move needs to be split.  */
+bool
+rs6000_split_128bit_ok_p (rtx operands[])
+{
+  if (!reload_completed)
+    return false;
+
+  if (!gpr_or_gpr_p (operands[0], operands[1]))
+    return false;
+
+  if (quad_load_store_p (operands[0], operands[1]))
+    return false;
+
+  return true;
+}


* Re: [RFC Patch], PowerPC memory support pre-gcc9, patch #3
  2018-03-14 23:01 [RFC Patch], PowerPC memory support pre-gcc9, patch #1 Michael Meissner
  2018-03-15 17:09 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #2 Michael Meissner
@ 2018-03-15 23:33 ` Michael Meissner
  2018-03-16 17:27 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #4 Michael Meissner
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2018-03-15 23:33 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 2127 bytes --]

This patch moves the instructions for movdi (both 32-bit and 64-bit) into a
separate rs6000_output_move_64bit function.
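
As a rough illustration of the idea (not the code in this patch), an output
function of this kind picks an assembler template from the operand classes and
returns "#" when the insn has to be split later.  The sketch below is a toy
model: the toy_* names are hypothetical, the operand classification is reduced
to three kinds, and the templates are simplified compared with the real
rs6000.md literals.

```c
#include <assert.h>

/* Toy stand-ins for GCC's operand classification.  These names are
   hypothetical and are not part of GCC.  */
enum toy_kind { TOY_GPR, TOY_FPR, TOY_MEM };

/* Return a simplified assembler template for a 64-bit move, or "#" when
   the move would be left for a splitter.  The selection logic is an
   assumption made for illustration only.  */
static const char *
toy_output_move_64bit (enum toy_kind dest, enum toy_kind src)
{
  /* At least one side of the move must be a register.  */
  assert (dest != TOY_MEM || src != TOY_MEM);

  if (dest == TOY_GPR)
    {
      if (src == TOY_GPR)
	return "mr %0,%1";
      if (src == TOY_MEM)
	return "ld %0,%1";
      return "#";		/* FPR -> GPR: split later in this toy.  */
    }

  if (dest == TOY_FPR)
    {
      if (src == TOY_FPR)
	return "fmr %0,%1";
      if (src == TOY_MEM)
	return "lfd %0,%1";
      return "#";		/* GPR -> FPR: split later in this toy.  */
    }

  /* DEST is memory, so SRC is a register.  */
  return src == TOY_GPR ? "std %1,%0" : "stfd %1,%0";
}
```

Keeping the selection in one C function, rather than in per-alternative string
literals, means a new condition only has to be added in one place.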

As I'm starting to move more code to checking the addr_masks instead of doing
a lot of "if (mode == MODE1 || mode == MODE2)" tests, I realized that the
multi-register types (complex values, long double using IBM double double,
etc.) did not have the offset bits set correctly in reg_addr.  I also prevented
the Altivec load/stores (that give you the free AND with -16) from being
generated for multi-register values.
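
The difference between mode comparisons and mask checks can be sketched as
follows.  This is a minimal toy, assuming a per-mode table of flag bits loosely
modeled on the RELOAD_REG_* bits in rs6000.c; the TOY_* names, the three modes,
and the bit assignments are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mask bits, loosely modeled on the RELOAD_REG_* flags.  */
enum
{
  TOY_ADDR_VALID    = 0x01,	/* Mode is valid in this register class.  */
  TOY_ADDR_INDEXED  = 0x02,	/* reg+reg addressing.  */
  TOY_ADDR_OFFSET   = 0x04,	/* reg+offset addressing.  */
  TOY_ADDR_MULTIPLE = 0x08,	/* Value spans multiple registers.  */
  TOY_ADDR_AND_M16  = 0x10	/* Altivec-style (address & -16).  */
};

enum toy_mode { TOY_DImode, TOY_V4SImode, TOY_DCmode, TOY_NUM_MODES };

static unsigned char toy_addr_mask[TOY_NUM_MODES];

/* Fill in the table once, so later code tests bits instead of modes.  */
static void
toy_setup_addr_masks (void)
{
  toy_addr_mask[TOY_DImode]
    = TOY_ADDR_VALID | TOY_ADDR_INDEXED | TOY_ADDR_OFFSET;
  toy_addr_mask[TOY_V4SImode]
    = TOY_ADDR_VALID | TOY_ADDR_INDEXED | TOY_ADDR_AND_M16;
  /* Multi-register value: offset bit set, no AND -16 form.  */
  toy_addr_mask[TOY_DCmode]
    = TOY_ADDR_VALID | TOY_ADDR_MULTIPLE | TOY_ADDR_OFFSET;
}

/* Return true if MODE allows reg+offset addressing in this toy table.  */
static bool
toy_mode_supports_offset (enum toy_mode mode)
{
  assert (mode < TOY_NUM_MODES);
  return (toy_addr_mask[mode] & TOY_ADDR_OFFSET) != 0;
}
```

With a table like this, a caller asks one question of the mask and does not
need to know which modes happen to be multi-register.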

I added a function (rs6000_valid_move_p) that replaces the old "is operands[0]
a register or is operands[1] a register" tests.  Right now, it performs the
same tests, but I may need to add additional conditions in the future.
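
The shape of such a check, reduced to a toy, might look like the sketch below.
It assumes a simplified operand model with hypothetical toy_* names; the real
function takes rtx operands and uses GCC's register predicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operand model; GCC itself works on rtx values.  */
enum toy_operand_kind { TOY_OP_REG, TOY_OP_MEM, TOY_OP_CONST };

struct toy_operand
{
  enum toy_operand_kind kind;
};

/* A move is accepted when at least one side is a register.  Any further
   conditions would be added here, in one place, rather than in every
   insn pattern.  */
static bool
toy_valid_move_p (const struct toy_operand *dest,
		  const struct toy_operand *src)
{
  assert (dest != NULL && src != NULL);
  return dest->kind == TOY_OP_REG || src->kind == TOY_OP_REG;
}
```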

I have done a full bootstrap and make check on a little endian power8 system
with no regressions.

The next patch will change the MOVDF and MOVDD patterns to use
rs6000_output_move_64bit as well.

2018-03-15  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000-protos.h (rs6000_output_move_64bit): Add
	declaration.
	(rs6000_valid_move_p): Likewise.
	* config/rs6000/rs6000-output.c (addr_is_xform_p): New helper
	function to return if an address uses X-form (reg+reg).
	(reg_is_spr_p): New helper function to determine if a register is
	a SPR.
	(rs6000_output_move_64bit): New function to return the proper
	instruction to do a 64-bit move.
	* config/rs6000/rs6000.c (rs6000_setup_reg_addr_masks): Rework
	setting offset addresses to assume multi-register values have the
	proper offset bits set.  Do not enable Altivec & -16 on
	multi-register moves.
	(rs6000_valid_move_p): New function to validate moves.
	(reg_offset_addressing_ok_p): Add check if the mode and register
	class support offsettable instructions.
	* config/rs6000/rs6000.md (movdi_internal32): Move instruction
	literals to rs6000_output_move_64bit.  Check move validity with
	rs6000_valid_move_p.
	(movdi_internal64): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: ext-addr.patch03b --]
[-- Type: text/plain, Size: 13048 bytes --]

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 258535)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -52,6 +52,7 @@ extern rtx rs6000_got_register (rtx);
 extern rtx find_addr_reg (rtx);
 extern rtx gen_easy_altivec_constant (rtx);
 extern const char *output_vec_const_move (rtx *);
+extern const char *rs6000_output_move_64bit (rtx *);
 extern const char *rs6000_output_move_128bit (rtx *);
 extern bool rs6000_move_128bit_ok_p (rtx []);
 extern bool rs6000_split_128bit_ok_p (rtx []);
@@ -89,6 +90,7 @@ extern bool rs6000_is_valid_2insn_and (r
 extern void rs6000_emit_2insn_and (machine_mode, rtx *, bool, int);
 extern int registers_ok_for_quad_peep (rtx, rtx);
 extern int mems_ok_for_quad_peep (rtx, rtx);
+extern bool rs6000_valid_move_p (rtx, rtx);
 extern bool gpr_or_gpr_p (rtx, rtx);
 extern bool direct_move_p (rtx, rtx);
 extern bool quad_address_p (rtx, machine_mode, bool);
Index: gcc/config/rs6000/rs6000-output.c
===================================================================
--- gcc/config/rs6000/rs6000-output.c	(revision 258538)
+++ gcc/config/rs6000/rs6000-output.c	(working copy)
@@ -47,6 +47,215 @@
 #include "tm-constrs.h"
 
 \f
+/* Return whether an address is an x-form (reg or reg+reg) address.  This is
+   used when we know the instruction is not a traditional GPR or FPR
+   load/store, so check to make sure auto increment is not present in the
+   address.  */
+inline static bool
+addr_is_xform_p (rtx addr)
+{
+  gcc_assert (GET_RTX_CLASS (GET_CODE (addr)) != RTX_AUTOINC);
+
+  if (REG_P (addr) || SUBREG_P (addr))
+    return true;
+
+  if (GET_CODE (addr) != PLUS)
+    return false;
+
+  rtx op1 = XEXP (addr, 1);
+  return REG_P (op1) || SUBREG_P (op1);
+}
+
+/* Return whether a register is a SPR.  */
+inline static bool
+reg_is_spr_p (rtx reg)
+{
+  if (!REG_P (reg))
+    return false;
+
+  enum reg_class rclass = REGNO_REG_CLASS (REGNO (reg));
+  return reg_class_to_reg_type[(int)rclass] == SPR_REG_TYPE;
+}
+
+\f
+/* Return a string to do a move operation of 64 bits of data.  */
+
+const char *
+rs6000_output_move_64bit (rtx operands[])
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  machine_mode mode = GET_MODE (dest);
+  int dest_regno;
+  int src_regno;
+  bool dest_gpr_p, dest_fp_p, dest_vmx_p, dest_vsx_p;
+  bool src_gpr_p, src_fp_p, src_vmx_p, src_vsx_p;
+
+  if (REG_P (dest) || SUBREG_P (dest))
+    {
+      dest_regno = regno_or_subregno (dest);
+      dest_gpr_p = INT_REGNO_P (dest_regno);
+      dest_fp_p = FP_REGNO_P (dest_regno);
+      dest_vmx_p = ALTIVEC_REGNO_P (dest_regno);
+      dest_vsx_p = dest_fp_p | dest_vmx_p;
+    }
+  else
+    {
+      dest_regno = -1;
+      dest_gpr_p = dest_fp_p = dest_vmx_p = dest_vsx_p = false;
+    }
+
+  if (REG_P (src) || SUBREG_P (src))
+    {
+      src_regno = regno_or_subregno (src);
+      src_gpr_p = INT_REGNO_P (src_regno);
+      src_fp_p = FP_REGNO_P (src_regno);
+      src_vmx_p = ALTIVEC_REGNO_P (src_regno);
+      src_vsx_p = src_fp_p | src_vmx_p;
+    }
+  else
+    {
+      src_regno = -1;
+      src_gpr_p = src_fp_p = src_vmx_p = src_vsx_p = false;
+    }
+
+  /* Register moves.  */
+  if (dest_regno >= 0 && src_regno >= 0)
+    {
+      /* Moves to GPRs.  */
+      if (dest_gpr_p)
+	{
+	  if (!TARGET_POWERPC64)
+	    return "#";
+
+	  else if (src_gpr_p)
+	    return "mr %0,%1";
+
+	  else if (TARGET_DIRECT_MOVE && src_vsx_p)
+	    return "mfvsrd %0,%x1";
+
+	  else if (TARGET_MFPGPR && src_fp_p)
+	    return "mftgpr %0,%1";
+
+	  else if (reg_is_spr_p (src))
+	    return "mf%1 %0";
+	}
+
+      /* Moves to vector/floating point registers.  */
+      else if (dest_vsx_p)
+	{
+	  if (dest_fp_p && src_fp_p)
+	    return "fmr %0,%1";
+
+	  else if (TARGET_VSX && src_vsx_p)
+	    return "xxlor %x0,%x1,%x1";
+
+	  else if (TARGET_POWERPC64 && src_gpr_p)
+	    {
+	      if (TARGET_DIRECT_MOVE)
+		return "mtvsrd %x0,%1";
+
+	      else if (TARGET_MFPGPR && dest_fp_p)
+		return "mffgpr %0,%1";
+	    }
+	}
+
+      /* Moves to SPRs.  */
+      else if (reg_is_spr_p (dest))
+	return "mt%0 %1";
+    }
+
+  /* Loads.  */
+  else if (dest_regno >= 0 && MEM_P (src))
+    {
+      if (dest_gpr_p)
+	return TARGET_POWERPC64 ? "ld%U1%X1 %0,%1" : "#";
+
+      else if (dest_fp_p)
+	return "lfd%U1%X1 %0,%1";
+
+      else if (dest_vmx_p)
+	{
+	  if (TARGET_VSX && addr_is_xform_p (XEXP (src, 0)))
+	    return "lxsdx %x0,%y1";
+
+	  else if (TARGET_P9_VECTOR)
+	    return "lxsd %0,%1";
+	}
+    }
+
+  /* Stores.  */
+  else if (src_regno >= 0 && MEM_P (dest))
+    {
+      if (src_gpr_p)
+	return TARGET_POWERPC64 ? "std%U0%X0 %1,%0" : "#";
+
+      else if (src_fp_p)
+	return "stfd%U0%X0 %1,%0";
+
+      else if (src_vmx_p)
+	{
+	  if (TARGET_VSX && addr_is_xform_p (XEXP (dest, 0)))
+	    return "stxsdx %x1,%y0";
+
+	  else if (TARGET_P9_VECTOR)
+	    return "stxsd %1,%0";
+	}
+    }
+
+  /* Constants.  */
+  else if (dest_regno >= 0 && CONSTANT_P (src))
+    {
+      if (dest_gpr_p)
+	{
+	  if (satisfies_constraint_I (src))
+	    return "li %0,%1";
+
+	  if (satisfies_constraint_L (src))
+	    return "lis %0,%v1";
+
+	  return "#";
+	}
+
+      else if (TARGET_VSX && dest_vsx_p)
+	{
+	  /* We prefer to generate XXSPLTIB/VSPLTISW over XXLXOR/XXLORC to
+	     generate 0/-1, because the latter can potentially cause a stall if
+	     the previous use of the register did a long operation followed by
+	     a store.  This would cause this insn to wait for the previous
+	     operation to finish, even though it doesn't use any of the bits in
+	     the previous value.  */
+	  if (src == CONST0_RTX (mode))
+	    {
+	      /* Note 0.0 is not all zeros in IBM decimal format.  */
+	      gcc_assert (mode != DDmode);
+
+	      if (TARGET_P9_VECTOR)
+		return "xxspltib %x0,0";
+	      else if (dest_vmx_p)
+		return "vspltisw %0,0";
+	      else
+		return "xxlxor %x0,%x0,%x0";
+	    }
+	  else if (GET_MODE_CLASS (mode) == MODE_INT
+		   && src == CONSTM1_RTX (mode))
+	    {
+	      if (TARGET_P9_VECTOR)
+		return "xxspltib %x0,255";
+	      else if (dest_vmx_p)
+		return "vspltisw %0,-1";
+	      else if (TARGET_P8_VECTOR)
+		return "xxlorc %x0,%x0,%x0";
+	      /* XXX: We could generate xxlxor/xxlnor for power7 if
+		 desired.  */
+	    }
+	}
+    }
+
+  fatal_insn ("Bad 64-bit move", gen_rtx_SET (dest, src));
+}
+
+\f
 /* Return a string to do a move operation of 128 bits of data.  */
 
 const char *
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 258538)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -2957,49 +2957,52 @@ rs6000_setup_reg_addr_masks (void)
 
 	  /* GPR and FPR registers can do REG+OFFSET addressing, except
 	     possibly for SDmode.  ISA 3.0 (i.e. power9) adds D-form addressing
-	     for 64-bit scalars and 32-bit SFmode to altivec registers.  */
-	  if ((addr_mask != 0) && !indexed_only_p
-	      && msize <= 8
-	      && (rc == RELOAD_REG_GPR
-		  || ((msize == 8 || m2 == SFmode)
-		      && (rc == RELOAD_REG_FPR
-			  || (rc == RELOAD_REG_VMX && TARGET_P9_VECTOR)))))
-	    addr_mask |= RELOAD_REG_OFFSET;
-
-	  /* VSX registers can do REG+OFFSET addresssing if ISA 3.0
-	     instructions are enabled.  The offset for 128-bit VSX registers is
-	     only 12-bits.  While GPRs can handle the full offset range, VSX
-	     registers can only handle the restricted range.  */
-	  else if ((addr_mask != 0) && !indexed_only_p
-		   && msize == 16 && TARGET_P9_VECTOR
-		   && (ALTIVEC_OR_VSX_VECTOR_MODE (m2)
-		       || (m2 == TImode && TARGET_VSX)))
-	    {
-	      addr_mask |= RELOAD_REG_OFFSET;
-	      if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
-		addr_mask |= RELOAD_REG_QUAD_OFFSET;
-	    }
-
-	  /* LD and STD are DS-form instructions, which must have the bottom 2
-	     bits be 0.  However, since DFmode is primarily used in the
-	     floating point/vector registers, don't restrict the offsets in ISA
-	     2.xx.  */
-	  if (rc == RELOAD_REG_GPR && msize == 8 && TARGET_POWERPC64
-	      && (addr_mask & RELOAD_REG_OFFSET) != 0
-	      && INTEGRAL_MODE_P (m2))
-	    addr_mask |= RELOAD_REG_DS_OFFSET;
-
-	  /* ISA 3.0 LXSD, LXSSP, STXSD, STXSSP altivec load/store instructions
-	     are DS-FORM.  */
-	  else if (rc == RELOAD_REG_VMX && TARGET_P9_VECTOR
-		   && (addr_mask & RELOAD_REG_OFFSET) != 0
-		   && (msize == 8 ||  m2 == SFmode))
-	    addr_mask |= RELOAD_REG_DS_OFFSET;
+	     for 64-bit scalars and 32-bit SFmode to altivec registers.
+
+	     64-bit GPR offset memory references and Altivec offset memory
+	     references use DS-mode offsets where the bottom 2 bits are 0.
+
+	     128-bit vector offset memory references use DQ-mode offsets where
+	     the bottom 4 bits are 0.  */
+	  if ((addr_mask != 0) && !indexed_only_p)
+	    {
+	      if (rc == RELOAD_REG_GPR)
+		{
+		  /* LD/STD on 64-bit use DS-form addresses.  */
+		  addr_mask |= RELOAD_REG_OFFSET;
+		  if (msize >= 8 && TARGET_POWERPC64)
+		    addr_mask |= RELOAD_REG_DS_OFFSET;
+		}
+	      else if (msize >= 8 || m == E_SFmode)
+		{
+		  if (rc == RELOAD_REG_FPR)
+		    {
+		      /* LXV/STXV use DQ-form addresses.  */
+		      addr_mask |= RELOAD_REG_OFFSET;
+		      if (msize == 16
+			  && (addr_mask & RELOAD_REG_MULTIPLE) == 0
+			  && TARGET_P9_VECTOR)
+			addr_mask |= RELOAD_REG_QUAD_OFFSET;
+		    }
+		  else if (rc == RELOAD_REG_VMX && TARGET_P9_VECTOR)
+		    {
+		      /* LXV/STXV use DQ-form addresses, LXSD/LXSSP/STXSD/STXSSP
+			 use DS-form addresses. */
+		      addr_mask |= RELOAD_REG_OFFSET;
+		      if (msize == 16
+			  && (addr_mask & RELOAD_REG_MULTIPLE) == 0)
+			addr_mask |= RELOAD_REG_QUAD_OFFSET;
+		      else
+			addr_mask |= RELOAD_REG_DS_OFFSET;
+		    }
+		}
+	    }
 
 	  /* VMX registers can do (REG & -16) and ((REG+REG) & -16)
 	     addressing on 128-bit types.  */
 	  if (rc == RELOAD_REG_VMX && msize == 16
-	      && (addr_mask & RELOAD_REG_VALID) != 0)
+	      && ((addr_mask & (RELOAD_REG_VALID
+				| RELOAD_REG_MULTIPLE)) == RELOAD_REG_VALID))
 	    addr_mask |= RELOAD_REG_AND_M16;
 
 	  reg_addr[m].addr_mask[rc] = addr_mask;
@@ -8007,6 +8010,26 @@ small_data_operand (rtx op ATTRIBUTE_UNU
 #endif
 }
 
+/* Return true if a move is valid.  */
+
+bool
+rs6000_valid_move_p (rtx dest, rtx src)
+{
+  if (SUBREG_P (dest))
+    dest = SUBREG_REG (dest);
+
+  if (SUBREG_P (src))
+    src = SUBREG_REG (src);
+
+  if (REG_P (dest))
+    return true;
+
+  if (MEM_P (dest) && REG_P (src))
+    return true;
+
+  return false;
+}
+
 /* Return true if either operand is a general purpose register.  */
 
 bool
@@ -8239,6 +8262,9 @@ mem_operand_ds_form (rtx op, machine_mod
 static bool
 reg_offset_addressing_ok_p (machine_mode mode)
 {
+  if (!mode_supports_d_form (mode))
+    return false;
+
   switch (mode)
     {
     case E_V16QImode:
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 258531)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -8485,29 +8485,8 @@ (define_insn "*movdi_internal32"
            Oj,       wM,        OjwM,      Oj,        wM,         wS,
            wB"))]
 
-  "! TARGET_POWERPC64
-   && (gpc_reg_operand (operands[0], DImode)
-       || gpc_reg_operand (operands[1], DImode))"
-  "@
-   #
-   #
-   #
-   stfd%U0%X0 %1,%0
-   lfd%U1%X1 %0,%1
-   fmr %0,%1
-   #
-   stxsd %1,%0
-   stxsdx %x1,%y0
-   lxsd %0,%1
-   lxsdx %x0,%y1
-   xxlor %x0,%x1,%x1
-   xxspltib %x0,0
-   xxspltib %x0,255
-   vspltisw %0,%1
-   xxlxor %x0,%x0,%x0
-   xxlorc %x0,%x0,%x0
-   #
-   #"
+  "! TARGET_POWERPC64 && rs6000_valid_move_p (operands[0], operands[1])"
+  "* return rs6000_output_move_64bit (operands);"
   [(set_attr "type"
                "store,     load,      *,         fpstore,    fpload,     fpsimple,
                 *,         fpstore,   fpstore,   fpload,     fpload,     veclogical,
@@ -8562,38 +8541,8 @@ (define_insn "*movdi_internal64"
                  wM,       wS,        wB,        *h,        r,          0,
                  wg,       r,         wj,        r"))]
 
-  "TARGET_POWERPC64
-   && (gpc_reg_operand (operands[0], DImode)
-       || gpc_reg_operand (operands[1], DImode))"
-  "@
-   std%U0%X0 %1,%0
-   ld%U1%X1 %0,%1
-   mr %0,%1
-   li %0,%1
-   lis %0,%v1
-   #
-   stfd%U0%X0 %1,%0
-   lfd%U1%X1 %0,%1
-   fmr %0,%1
-   stxsd %1,%0
-   stxsdx %x1,%y0
-   lxsd %0,%1
-   lxsdx %x0,%y1
-   xxlor %x0,%x1,%x1
-   xxspltib %x0,0
-   xxspltib %x0,255
-   #
-   xxlxor %x0,%x0,%x0
-   xxlorc %x0,%x0,%x0
-   #
-   #
-   mf%1 %0
-   mt%0 %1
-   nop
-   mftgpr %0,%1
-   mffgpr %0,%1
-   mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+  "TARGET_POWERPC64 && rs6000_valid_move_p (operands[0], operands[1])"
+  "* return rs6000_output_move_64bit (operands);"
   [(set_attr "type"
                "store,      load,	*,         *,         *,         *,
                 fpstore,    fpload,     fpsimple,  fpstore,   fpstore,   fpload,


* Re: [RFC Patch], PowerPC memory support pre-gcc9, patch #4
  2018-03-14 23:01 [RFC Patch], PowerPC memory support pre-gcc9, patch #1 Michael Meissner
  2018-03-15 17:09 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #2 Michael Meissner
  2018-03-15 23:33 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #3 Michael Meissner
@ 2018-03-16 17:27 ` Michael Meissner
  2018-03-20 16:21   ` Segher Boessenkool
  2018-03-16 23:31 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #1 Michael Meissner
  2018-03-20 13:04 ` Segher Boessenkool
  4 siblings, 1 reply; 11+ messages in thread
From: Michael Meissner @ 2018-03-16 17:27 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 1388 bytes --]

Here is patch #4 that moves the MOVDF/MOVDD insns into calling C code.  I added
documentation to the various MOVD{F,D} patterns similar to the documentation
I've done on the other patterns to make it simpler to track which two
constraints match which instruction and which instruction type is used.

The next patch may tackle an instruction discrepancy that I've noticed in
building Spec 2006.  The tonto benchmark generates slightly different code with
these changes than without them.  It doesn't affect the runtime of the
benchmark, but since these are infrastructure changes, the generated code
should be the same.

After that, I will tackle the 32-bit moves and then the 8/16-bit moves.

2018-03-16  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000-output.c (rs6000_output_move_64bit): Deal
	with SPR<-SPR where the register is the same.
	* config/rs6000/rs6000.md (mov<mode>_hardfloat32): Add comments
	and spacing to allow easier understanding of which constraints are
	used for which alternative.  Use rs6000_valid_move_p to validate
	the move.  Use rs6000_output_move_64bit to print out the correct
	instruction.
	(mov<mode>_softfloat32): Likewise.
	(mov<mode>_hardfloat64): Likewise.
	(mov<mode>_softfloat64): Likewise.
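
The SPR change to rs6000_output_move_64bit can be read as a small decision
function.  The sketch below is a standalone model rather than the patch
itself: spr_dest_template and its plain bool/int parameters are made-up
stand-ins for the rtx tests in the real code, and a NULL return stands for
falling through to the fatal_insn call at the end of the function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the "moves to SPRs" branch.  Hypothetical helper, not in GCC.
   Return the output template, or NULL if no alternative applies.  */
const char *
spr_dest_template (bool src_gpr_p, int dest_regno, int src_regno)
{
  /* The register-move branch is only entered when both operands are
     registers, so both register numbers are non-negative here.  */
  assert (dest_regno >= 0 && src_regno >= 0);

  /* SPR <- GPR is an ordinary move-to-SPR instruction.  */
  if (src_gpr_p)
    return "mt%0 %1";

  /* SPR <- same SPR needs no real work.  */
  if (dest_regno == src_regno)
    return "nop";

  /* Anything else is not handled by this branch.  */
  return NULL;
}
```

The register numbers passed in are arbitrary in this model; only their
equality matters.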

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: ext-addr.patch04b --]
[-- Type: text/plain, Size: 7904 bytes --]

Index: gcc/config/rs6000/rs6000-output.c
===================================================================
--- gcc/config/rs6000/rs6000-output.c	(revision 258576)
+++ gcc/config/rs6000/rs6000-output.c	(working copy)
@@ -162,7 +162,13 @@ rs6000_output_move_64bit (rtx operands[]
 
       /* Moves to SPRs.  */
       else if (reg_is_spr_p (dest))
-	return "mt%0 %1";
+	{
+	  if (src_gpr_p)
+	    return "mt%0 %1";
+
+	  else if (dest_regno == src_regno)
+	    return "nop";
+	}
     }
 
   /* Loads.  */
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 258576)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -7398,92 +7398,108 @@ (define_split
 ;; If we have FPR registers, rs6000_emit_move has moved all constants to memory,
 ;; except for 0.0 which can be created on VSX with an xor instruction.
 
+;;           STFD         LFD         FMR         LXSD        STXSD
+;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
+;;           LWZ          STW         MR
+
 (define_insn "*mov<mode>_hardfloat32"
-  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m,d,d,<f64_p9>,wY,<f64_av>,Z,<f64_vsx>,<f64_vsx>,!r,Y,r,!r")
-	(match_operand:FMOVE64 1 "input_operand" "d,m,d,wY,<f64_p9>,Z,<f64_av>,<f64_vsx>,<zero_fp>,<zero_fp>,r,Y,r"))]
+  [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
+            "=m,          d,          d,          <f64_p9>,   wY,
+              <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
+              Y,          r,          !r")
+
+	(match_operand:FMOVE64 1 "input_operand"
+             "d,          m,          d,          wY,         <f64_p9>,
+              Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
+              r,          Y,          r"))]
+
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT 
-   && (gpc_reg_operand (operands[0], <MODE>mode)
-       || gpc_reg_operand (operands[1], <MODE>mode))"
-  "@
-   stfd%U0%X0 %1,%0
-   lfd%U1%X1 %0,%1
-   fmr %0,%1
-   lxsd %0,%1
-   stxsd %1,%0
-   lxsd%U1x %x0,%y1
-   stxsd%U0x %x1,%y0
-   xxlor %x0,%x1,%x1
-   xxlxor %x0,%x0,%x0
-   #
-   #
-   #
-   #"
-  [(set_attr "type" "fpstore,fpload,fpsimple,fpload,fpstore,fpload,fpstore,veclogical,veclogical,two,store,load,two")
+   && rs6000_valid_move_p (operands[0], operands[1])"
+  "* return rs6000_output_move_64bit (operands);"
+  [(set_attr "type"
+            "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
+             fpload,      fpstore,    veclogical, veclogical, two,
+             store,       load,       two")
+
    (set_attr "size" "64")
-   (set_attr "length" "4,4,4,4,4,4,4,4,4,8,8,8,8")])
+   (set_attr "length"
+            "4,           4,          4,          4,          4,
+             4,           4,          4,          4,          8,
+             8,           8,          8")])
+
+;;           STW      LWZ     MR      G-const H-const F-const
 
 (define_insn "*mov<mode>_softfloat32"
-  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=Y,r,r,r,r,r")
-	(match_operand:FMOVE64 1 "input_operand" "r,Y,r,G,H,F"))]
-  "! TARGET_POWERPC64 
-   && (TARGET_SINGLE_FLOAT || TARGET_SOFT_FLOAT)
-   && (gpc_reg_operand (operands[0], <MODE>mode)
-       || gpc_reg_operand (operands[1], <MODE>mode))"
-  "#"
-  [(set_attr "type" "store,load,two,*,*,*")
-   (set_attr "length" "8,8,8,8,12,16")])
+  [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
+           "=Y,       r,      r,      r,      r,      r")
+
+	(match_operand:FMOVE64 1 "input_operand"
+            "r,       Y,      r,      G,      H,      F"))]
+
+  "! TARGET_POWERPC64 && (TARGET_SINGLE_FLOAT || TARGET_SOFT_FLOAT)
+   && rs6000_valid_move_p (operands[0], operands[1])"
+  "* return rs6000_output_move_64bit (operands);"
+  [(set_attr "type"
+            "store,   load,   two,    *,      *,      *")
+
+   (set_attr "length"
+             "8,      8,      8,      8,      12,     16")])
 
 ; ld/std require word-aligned displacements -> 'Y' constraint.
 ; List Y->r and r->Y before r->r for reload.
+
+;;           STFD         LFD         FMR         LXSD        STXSD
+;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
+;;           STD          LD          MR          MT<SPR>     MF<SPR>
+;;           NOP          MFTGPR      MFFGPR      MTVSRD      MFVSRD
+
 (define_insn "*mov<mode>_hardfloat64"
-  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m,d,d,<f64_p9>,wY,<f64_av>,Z,<f64_vsx>,<f64_vsx>,!r,Y,r,!r,*c*l,!r,*h,r,wg,r,<f64_dm>")
-	(match_operand:FMOVE64 1 "input_operand" "d,m,d,wY,<f64_p9>,Z,<f64_av>,<f64_vsx>,<zero_fp>,<zero_fp>,r,Y,r,r,h,0,wg,r,<f64_dm>,r"))]
+  [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
+           "=m,           d,          d,          <f64_p9>,   wY,
+             <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
+             Y,           r,          !r,         *c*l,       !r,
+            *h,           r,          wg,         r,          <f64_dm>")
+
+	(match_operand:FMOVE64 1 "input_operand"
+            "d,           m,          d,          wY,         <f64_p9>,
+             Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
+             r,           Y,          r,          r,          h,
+             0,           wg,         r,          <f64_dm>,   r"))]
+
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT
-   && (gpc_reg_operand (operands[0], <MODE>mode)
-       || gpc_reg_operand (operands[1], <MODE>mode))"
-  "@
-   stfd%U0%X0 %1,%0
-   lfd%U1%X1 %0,%1
-   fmr %0,%1
-   lxsd %0,%1
-   stxsd %1,%0
-   lxsd%U1x %x0,%y1
-   stxsd%U0x %x1,%y0
-   xxlor %x0,%x1,%x1
-   xxlxor %x0,%x0,%x0
-   li %0,0
-   std%U0%X0 %1,%0
-   ld%U1%X1 %0,%1
-   mr %0,%1
-   mt%0 %1
-   mf%1 %0
-   nop
-   mftgpr %0,%1
-   mffgpr %0,%1
-   mfvsrd %0,%x1
-   mtvsrd %x0,%1"
-  [(set_attr "type" "fpstore,fpload,fpsimple,fpload,fpstore,fpload,fpstore,veclogical,veclogical,integer,store,load,*,mtjmpr,mfjmpr,*,mftgpr,mffgpr,mftgpr,mffgpr")
+   && rs6000_valid_move_p (operands[0], operands[1])"
+  "* return rs6000_output_move_64bit (operands);"
+  [(set_attr "type"
+            "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
+             fpload,      fpstore,    veclogical, veclogical, integer,
+             store,       load,       *,          mtjmpr,     mfjmpr,
+             *,           mftgpr,     mffgpr,     mftgpr,    mffgpr")
+
    (set_attr "size" "64")
    (set_attr "length" "4")])
 
+;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
+;;           H-const  F-const  Special
+
 (define_insn "*mov<mode>_softfloat64"
-  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=Y,r,r,cl,r,r,r,r,*h")
-	(match_operand:FMOVE64 1 "input_operand" "r,Y,r,r,h,G,H,F,0"))]
+  [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
+           "=Y,       r,      r,      cl,     r,      r,
+             r,       r,      *h")
+
+	(match_operand:FMOVE64 1 "input_operand"
+            "r,       Y,      r,      r,      h,      G,
+             H,       F,      0"))]
+
   "TARGET_POWERPC64 && TARGET_SOFT_FLOAT
-   && (gpc_reg_operand (operands[0], <MODE>mode)
-       || gpc_reg_operand (operands[1], <MODE>mode))"
-  "@
-   std%U0%X0 %1,%0
-   ld%U1%X1 %0,%1
-   mr %0,%1
-   mt%0 %1
-   mf%1 %0
-   #
-   #
-   #
-   nop"
-  [(set_attr "type" "store,load,*,mtjmpr,mfjmpr,*,*,*,*")
-   (set_attr "length" "4,4,4,4,4,8,12,16,4")])
+   && rs6000_valid_move_p (operands[0], operands[1])"
+  "* return rs6000_output_move_64bit (operands);"
+  [(set_attr "type"
+            "store,   load,   *,      mtjmpr, mfjmpr, *,
+             *,       *,      *")
+
+   (set_attr "length"
+            "4,       4,      4,      4,      4,      8,
+             12,      16,     4")])
 \f
 (define_expand "mov<mode>"
   [(set (match_operand:FMOVE128 0 "general_operand")


* Re: [RFC Patch], PowerPC memory support pre-gcc9, patch #1
  2018-03-14 23:01 [RFC Patch], PowerPC memory support pre-gcc9, patch #1 Michael Meissner
                   ` (2 preceding siblings ...)
  2018-03-16 17:27 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #4 Michael Meissner
@ 2018-03-16 23:31 ` Michael Meissner
  2018-03-20 16:31   ` Segher Boessenkool
  2018-03-20 13:04 ` Segher Boessenkool
  4 siblings, 1 reply; 11+ messages in thread
From: Michael Meissner @ 2018-03-16 23:31 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 747 bytes --]

In patch #4, I mentioned that the spec 2006 benchmark 'tonto' generated
different code with the patches applied.  I tracked it down: the call I
inserted in rs6000_debug_reg_print to update the conditional register usage
seemed to set the Altivec registers VS0..VS19 to call_used instead of
call_saved.  Since I no longer need to set the conditional register usage with
-mdebug=reg, it is simpler to just delete it.

2018-03-16  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_debug_reg_print): Eliminate call
	to rs6000_conditional_register_usage.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: ext-addr.patch05b --]
[-- Type: text/plain, Size: 846 bytes --]

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 258576)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -1314,7 +1314,6 @@ static bool rs6000_debug_can_change_mode
 						reg_class_t);
 static bool rs6000_save_toc_in_prologue_p (void);
 static rtx rs6000_internal_arg_pointer (void);
-static void rs6000_conditional_register_usage (void);
 
 rtx (*rs6000_legitimize_reload_address_ptr) (rtx, machine_mode, int, int,
 					     int, int *)
@@ -2144,10 +2143,6 @@ rs6000_debug_reg_print (int first_regno,
 {
   int r, m;
 
-  /* Insure the conditional registers are up to date when printing the debug
-     information.  */
-  rs6000_conditional_register_usage ();
-
   for (r = first_regno; r <= last_regno; ++r)
     {
       const char *comma = "";


* Re: [RFC Patch], PowerPC memory support pre-gcc9, patch #1
  2018-03-14 23:01 [RFC Patch], PowerPC memory support pre-gcc9, patch #1 Michael Meissner
                   ` (3 preceding siblings ...)
  2018-03-16 23:31 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #1 Michael Meissner
@ 2018-03-20 13:04 ` Segher Boessenkool
  2018-03-20 20:35   ` Michael Meissner
  4 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2018-03-20 13:04 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

Hi!  Some comments...

On Wed, Mar 14, 2018 at 06:54:08PM -0400, Michael Meissner wrote:
> The first patch in the series moves most of the reg_addr structure from
> rs6000.c to rs6000-protos.h, so that in the next patch, we can start splitting
> some of the address code to other files.

Is that the correct header?  It currently contains only function
prototypes, and the name indicates that is what it should be.

>     1)	I was playing with making r12 be fixed with a new option (not in this
> 	set of patches), and I noticed it wasn't reflected in the -mdebug=reg
> 	debug dump, due to the debug dump being done before the conditional
> 	registers are setup.  I made the debug dump set conditional registers.

Various ABIs use r12 for various things.  It's also used for split stack.
Besides that it is available for programs to do with as they please.

> I likely will remove the undocumented toc-fusion altogether, and eventually
> rework the p8/p9 fusion support.

Did it ever give any performance improvement?

> 2018-03-14  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 	* config/rs6000/rs6000-protos.h (regno_or_subregno): Add
> 	declaration.

There is a generic reg_or_subregno, how does this differ?  If we need
it please change the name so the difference is clear.

It is very hard to review these patches.  Please do patches that only
move or rename things, not changing functionality, as separate patches
(usually before everything else).


Segher


* Re: [RFC Patch], PowerPC memory support pre-gcc9, patch #2
  2018-03-15 17:09 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #2 Michael Meissner
@ 2018-03-20 13:32   ` Segher Boessenkool
  2018-03-20 20:27     ` Michael Meissner
  0 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2018-03-20 13:32 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

On Thu, Mar 15, 2018 at 01:04:30PM -0400, Michael Meissner wrote:
> This is patch #2 of my series for improving the PowerPC internal memory
> support.  It assumes patch #1 has been applied.
> 
> This patch moves the rs6000_move_128bit function from rs6000.c to a new file,
> rs6000-output.c.
> 
> The third patch will create a rs6000_move_64bit function and change both 32-bit
> and 64-bit movdi to call it, instead of having all of the instructions be
> literals.  I will also likely add improvements to setting the reg_addr address
> masks for offsetable addresses.
> 
> The fourth patch will likely move movdd and movdf to call rs6000_move_64bit
> as well.
> 
> I tested this on a little endian power8 system and there were no regressions.
> 
> 2018-03-14  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 
> 	* config.gcc (powerpc*-*-*): Add rs6000-output.o to extra_objs.
> 	* config/rs6000/t-rs6000 (rs6000-output.o): Add build rule.
> 	* config/rs6000/rs6000.c (rs6000_output_move_128bit): Move to
> 	rs6000-output.c.

I am not happy at all with this new file, and it won't even work as far
as I see (for multi-alternative define_insn's; splitting the strings to
a different file than the constraints and attributes is asking for
trouble, better keep it all together).

Files should bundle together code that conceptually belongs together,
not some arbitrary split ("these routines return strings that are
eventually output from the compiler as instructions").


Segher


* Re: [RFC Patch], PowerPC memory support pre-gcc9, patch #4
  2018-03-16 17:27 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #4 Michael Meissner
@ 2018-03-20 16:21   ` Segher Boessenkool
  0 siblings, 0 replies; 11+ messages in thread
From: Segher Boessenkool @ 2018-03-20 16:21 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

Hi Mike,

On Fri, Mar 16, 2018 at 12:50:45PM -0400, Michael Meissner wrote:
> --- gcc/config/rs6000/rs6000-output.c	(revision 258576)
> +++ gcc/config/rs6000/rs6000-output.c	(working copy)
> @@ -162,7 +162,13 @@ rs6000_output_move_64bit (rtx operands[]
>  
>        /* Moves to SPRs.  */
>        else if (reg_is_spr_p (dest))
> -	return "mt%0 %1";
> +	{
> +	  if (src_gpr_p)
> +	    return "mt%0 %1";
> +
> +	  else if (dest_regno == src_regno)
> +	    return "nop";
> +	}
>      }

Is this correct?  Many SPRs are not simple registers, they do something
when you write to them.  But I guess this is only for lr,ctr,vrsave (i.e.
regclass "h", "SPECIAL_REGS").  So maybe we want a better name?

> +;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
> +;;           H-const  F-const  Special
> +
>  (define_insn "*mov<mode>_softfloat64"
> -  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=Y,r,r,cl,r,r,r,r,*h")
> -	(match_operand:FMOVE64 1 "input_operand" "r,Y,r,r,h,G,H,F,0"))]
> +  [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
> +           "=Y,       r,      r,      cl,     r,      r,
> +             r,       r,      *h")
> +
> +	(match_operand:FMOVE64 1 "input_operand"
> +            "r,       Y,      r,      r,      h,      G,
> +             H,       F,      0"))]
> +
>    "TARGET_POWERPC64 && TARGET_SOFT_FLOAT
> -   && (gpc_reg_operand (operands[0], <MODE>mode)
> -       || gpc_reg_operand (operands[1], <MODE>mode))"
> -  "@
> -   std%U0%X0 %1,%0
> -   ld%U1%X1 %0,%1
> -   mr %0,%1
> -   mt%0 %1
> -   mf%1 %0
> -   #
> -   #
> -   #
> -   nop"
> -  [(set_attr "type" "store,load,*,mtjmpr,mfjmpr,*,*,*,*")
> -   (set_attr "length" "4,4,4,4,4,8,12,16,4")])
> +   && rs6000_valid_move_p (operands[0], operands[1])"
> +  "* return rs6000_output_move_64bit (operands);"
> +  [(set_attr "type"
> +            "store,   load,   *,      mtjmpr, mfjmpr, *,
> +             *,       *,      *")
> +
> +   (set_attr "length"
> +            "4,       4,      4,      4,      4,      8,
> +             12,      16,     4")])

Let's take this one as example.  The attributes depend on which alternative
is selected, but with your change the actual output insn does not.  That is
no good.

Maybe you can reduce the number of alternatives?  Make it just store,
load, and moves for example, and then select the attributes based on what
machine insns you actually output?  The ones that are split are the
problematic ones in that case, the rest is easy to handle.
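
The concern above can be pictured with a small self-contained C model (not
GCC code).  Each alternative's output template, "type" and "length" sit in one
table row, so the emitted insn and its attributes cannot drift apart.  The
struct and function names are hypothetical; the row contents are copied from
the original *mov<mode>_softfloat64 pattern quoted above.

```c
#include <stddef.h>

/* Hypothetical model of one define_insn alternative: constraints,
   output template and attributes kept together in a single row.  */
struct move_alt
{
  const char *dest;	/* destination constraint */
  const char *src;	/* source constraint */
  const char *templ;	/* output template, "#" means "split later" */
  const char *type;	/* "type" attribute */
  int length;		/* "length" attribute in bytes */
};

/* Rows taken from the pre-patch pattern in the quoted diff.  */
const struct move_alt softfloat64_alts[] = {
  { "Y",  "r", "std%U0%X0 %1,%0", "store",  4 },
  { "r",  "Y", "ld%U1%X1 %0,%1",  "load",   4 },
  { "r",  "r", "mr %0,%1",        "*",      4 },
  { "cl", "r", "mt%0 %1",         "mtjmpr", 4 },
  { "r",  "h", "mf%1 %0",         "mfjmpr", 4 },
  { "r",  "G", "#",               "*",      8 },
  { "r",  "H", "#",               "*",     12 },
  { "r",  "F", "#",               "*",     16 },
  { "*h", "0", "nop",             "*",      4 },
};

/* Number of alternatives in the table.  */
size_t
num_alternatives (void)
{
  return sizeof softfloat64_alts / sizeof softfloat64_alts[0];
}

/* Return the template for alternative WHICH, or NULL if out of range.  */
const char *
output_for_alternative (size_t which)
{
  return which < num_alternatives () ? softfloat64_alts[which].templ : NULL;
}

/* Return the length for alternative WHICH, or -1 if out of range.  */
int
length_for_alternative (size_t which)
{
  return which < num_alternatives () ? softfloat64_alts[which].length : -1;
}
```

In this model a lookup by alternative index returns both the string and its
attributes from the same row.  A single output function that ignores the
selected alternative loses that guarantee.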


Segher


* Re: [RFC Patch], PowerPC memory support pre-gcc9, patch #1
  2018-03-16 23:31 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #1 Michael Meissner
@ 2018-03-20 16:31   ` Segher Boessenkool
  0 siblings, 0 replies; 11+ messages in thread
From: Segher Boessenkool @ 2018-03-20 16:31 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

On Fri, Mar 16, 2018 at 07:01:18PM -0400, Michael Meissner wrote:
> In patch #4, I mentioned that the spec 2006 benchmark 'tonto' generated
> different code with the patches applied.  I tracked it down: the call I
> inserted in rs6000_debug_reg_print to update the conditional register
> usage seemed to set the Altivec registers VS0..VS19 to call_used instead of
> call_saved.  Since I no longer need to set the conditional register usage with
> -mdebug=reg, it is simpler to just delete it.
> 
> 2018-03-16  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 
> 	* config/rs6000/rs6000.c (rs6000_debug_reg_print): Eliminate call
> 	to rs6000_conditional_register_usage.

Yes, debug output should *never* change *any* state.
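
A minimal sketch of that rule in plain C, under the assumption of a made-up
register table (reg_state, N_REGS and debug_format_regs are hypothetical and
are not the rs6000 code): the dump routine takes a pointer to const, so it can
only read the state, and an accidental write fails to compile.

```c
#include <stdio.h>
#include <stddef.h>

#define N_REGS 4	/* hypothetical, tiny register file */

/* Hypothetical stand-in for per-register call_used flags.  */
struct reg_state
{
  unsigned char call_used[N_REGS];
};

/* Format the register state into BUF (LEN bytes) without modifying ST.
   Returns the number of characters written, not counting the NUL.  */
size_t
debug_format_regs (const struct reg_state *st, char *buf, size_t len)
{
  size_t used = 0;

  for (int i = 0; i < N_REGS; i++)
    {
      int n = snprintf (buf + used, len - used, "r%d:%s ", i,
			st->call_used[i] ? "used" : "saved");

      /* Stop on error or when the buffer is full.  */
      if (n < 0 || (size_t) n >= len - used)
	break;
      used += (size_t) n;
    }

  return used;
}
```

The dump only formats what it is given.  Any state update belongs in the code
path that owns that state, not in the debug routine.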

Could you fold this patch into the patch that created the problem please?


Segher


* Re: [RFC Patch], PowerPC memory support pre-gcc9, patch #2
  2018-03-20 13:32   ` Segher Boessenkool
@ 2018-03-20 20:27     ` Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2018-03-20 20:27 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

On Tue, Mar 20, 2018 at 08:30:57AM -0500, Segher Boessenkool wrote:
> On Thu, Mar 15, 2018 at 01:04:30PM -0400, Michael Meissner wrote:
> > This is patch #2 of my series for improving the PowerPC internal memory
> > support.  It assumes patch #1 has been applied.
> > 
> > This patch moves the rs6000_move_128bit function from rs6000.c to a new file,
> > rs6000-output.c.
> > 
> > The third patch will create a rs6000_move_64bit function and change both 32-bit
> > and 64-bit movdi to call it, instead of having all of the instructions be
> > literals.  I will also likely add improvements to setting the reg_addr address
> > masks for offsetable addresses.
> > 
> > The fourth patch will likely move movdd and movdf to call rs6000_move_64bit as
> > well.
> > 
> > I tested this on a little endian power8 system and there were no regressions.
> > 
> > 2018-03-14  Michael Meissner  <meissner@linux.vnet.ibm.com>
> > 
> > 	* config.gcc (powerpc*-*-*): Add rs6000-output.o to extra_objs.
> > 	* config/rs6000/t-rs6000 (rs6000-output.o): Add build rule.
> > 	* config/rs6000/rs6000.c (rs6000_output_move_128bit): Move to
> > 	rs6000-output.c.
> 
> I am not happy at all with this new file, and it won't even work as far
> as I see (for multi-alternative define_insn's; splitting the strings to
> a different file than the constraints and attributes is asking for
> trouble, better keep it all together).
> 
> Files should bundle together code that conceptually belongs together,
> not some arbitrary split ("these routines return strings that are
> eventually output from the compiler as instructions").

I was eventually planning to move the other functions that split insns and
output the strings there.  But I can keep it in rs6000.c if desired.  I was
just trying to keep the mechanical changes down, rather than move everything
all at once.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797


* Re: [RFC Patch], PowerPC memory support pre-gcc9, patch #1
  2018-03-20 13:04 ` Segher Boessenkool
@ 2018-03-20 20:35   ` Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2018-03-20 20:35 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

On Tue, Mar 20, 2018 at 08:01:57AM -0500, Segher Boessenkool wrote:
> Hi!  Some comments...
> 
> On Wed, Mar 14, 2018 at 06:54:08PM -0400, Michael Meissner wrote:
> > The first patch in the series moves most of the reg_addr structure from
> > rs6000.c to rs6000-protos.h, so that in the next patch, we can start splitting
> > some of the address code to other files.
> 
> Is that the correct header?  It currently contains only function
> prototypes, and the name indicates that is what it should be.
> 
> >     1)	I was playing with making r12 be fixed with a new option (not in this
> > 	set of patches), and I noticed it wasn't reflected in the -mdebug=reg
> > 	debug dump, due to the debug dump being done before the conditional
> > 	registers are setup.  I made the debug dump set conditional registers.
> 
> Various ABIs use r12 for various things.  It's also used for split stack.
> Besides that it is available for programs to do with as they please.
> 
> > I likely will remove the undocumented toc-fusion altogether, and eventually
> > rework the p8/p9 fusion support.
> 
> Did it ever give any performance improvement?
> 
> > 2018-03-14  Michael Meissner  <meissner@linux.vnet.ibm.com>
> > 	* config/rs6000/rs6000-protos.h (regno_or_subregno): Add
> > 	declaration.
> 
> There is a generic reg_or_subregno, how does this differ?  If we need
> it please change the name so the difference is clear.
> 
> It is very hard to review these patches.  Please do patches that only
> move or rename things, not changing functionality, as separate patches
> (usually before everything else).

Ok, but if you want me to shove everything back into rs6000.c that simplifies
things.  Some of the artifice is to support the reg_addr stuff in multiple
locations.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797


end of thread, other threads:[~2018-03-20 20:32 UTC | newest]

Thread overview: 11+ messages
2018-03-14 23:01 [RFC Patch], PowerPC memory support pre-gcc9, patch #1 Michael Meissner
2018-03-15 17:09 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #2 Michael Meissner
2018-03-20 13:32   ` Segher Boessenkool
2018-03-20 20:27     ` Michael Meissner
2018-03-15 23:33 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #3 Michael Meissner
2018-03-16 17:27 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #4 Michael Meissner
2018-03-20 16:21   ` Segher Boessenkool
2018-03-16 23:31 ` [RFC Patch], PowerPC memory support pre-gcc9, patch #1 Michael Meissner
2018-03-20 16:31   ` Segher Boessenkool
2018-03-20 13:04 ` Segher Boessenkool
2018-03-20 20:35   ` Michael Meissner
