public inbox for gcc-patches@gcc.gnu.org
* [PATCH], Add power9 support to GCC, patch #1
@ 2015-11-03 20:29 Michael Meissner
  2015-11-04 21:16 ` Segher Boessenkool
                   ` (9 more replies)
  0 siblings, 10 replies; 47+ messages in thread
From: Michael Meissner @ 2015-11-03 20:29 UTC (permalink / raw)
  To: gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 2488 bytes --]

This patch adds stub support for -mcpu=power9, so that in the future GCC can
generate code for Power9 systems (ISA 3.0).  For now, the stub only sets up
the switches; future patches to GCC 6.x, and later to GCC 7.x, will add
support for the various power9 features.

I have bootstrapped this on a big endian power7 system and a little endian
power8 system with no regressions.  Is this patch ok to install in the trunk?

I would also like to backport this initial support to GCC 5.x.  Is that ok
as well?

2015-11-03  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.opt (-mfusion-toc): Add new switches for
	ISA 3.0 (power9).
	(-mpower9-fusion): Likewise.
	(-mpower9-vector): Likewise.
	(-mmodulo): Likewise.
	(-mfloat128-hardware): Likewise.

	* config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Add option
	mask for ISA 3.0 (power9).
	(POWERPC_MASKS): Add new ISA 3.0 switches.
	(power9 cpu): Add power9 cpu.

	* config/rs6000/rs6000.h (ASM_CPU_POWER9_SPEC): Add support for
	power9.
	(ASM_CPU_SPEC): Likewise.
	(EXTRA_SPECS): Likewise.

	* config/rs6000/rs6000-opts.h (enum processor_type): Add
	PROCESSOR_POWER9.

	* config/rs6000/rs6000.c (power9_cost): Initial cost setup for
	power9.
	(rs6000_debug_reg_global): Add support for power9 fusion.
	(rs6000_setup_reg_addr_masks): Cache mode size.
	(rs6000_option_override_internal): Until real power9 tuning is
	added, use -mtune=power8 for -mcpu=power9.
	(rs6000_option_override_internal): Add support for ISA 3.0
	switches.
	(rs6000_loop_align): Add support for power9 cpu.
	(rs6000_file_start): Likewise.
	(rs6000_adjust_cost): Likewise.
	(rs6000_issue_rate): Likewise.
	(insn_must_be_first_in_group): Likewise.
	(insn_must_be_last_in_group): Likewise.
	(force_new_group): Likewise.
	(rs6000_register_move_cost): Likewise.
	(rs6000_opt_masks): Likewise.

	* config/rs6000/rs6000.md (cpu attribute): Add power9.
	* config/rs6000/rs6000-tables.opt: Regenerate.

	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
	_ARCH_PWR9 if power9 support is available.

	* config/rs6000/aix61.h (ASM_CPU_SPEC): Add power9.
	* config/rs6000/aix53.h (ASM_CPU_SPEC): Likewise.

	* configure.ac: Determine if the assembler supports the ISA 3.0
	instructions.
	* config.in (HAVE_AS_POWER9): Likewise.
	* configure: Regenerate.

	* doc/invoke.texi (RS/6000 and PowerPC Options): Document ISA 3.0
	switches.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power9.official-01b --]
[-- Type: text/plain, Size: 27619 bytes --]

Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 229674)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -561,6 +561,10 @@ mpower8-vector
 Target Report Mask(P8_VECTOR) Var(rs6000_isa_flags)
 Use/do not use vector and scalar instructions added in ISA 2.07.
 
+mfusion-toc
+Target Undocumented Mask(FUSION_TOC) Var(rs6000_isa_flags)
+Fuse medium/large code model toc references to the memory instruction.
+
 mcrypto
 Target Report Mask(CRYPTO) Var(rs6000_isa_flags)
 Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
@@ -601,6 +605,22 @@ moptimize-swaps
 Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save
 Analyze and remove doubleword swaps from VSX computations.
 
+mpower9-fusion
+Target Report Mask(P9_FUSION) Var(rs6000_isa_flags)
+Fuse certain operations together for better performance on power9.
+
+mpower9-vector
+Target Report Mask(P9_VECTOR) Var(rs6000_isa_flags)
+Use/do not use vector and scalar instructions added in ISA 3.0.
+
+mmodulo
+Target Report Mask(MODULO) Var(rs6000_isa_flags)
+Generate the integer modulo instructions.
+
 mfloat128
 Target Report Mask(FLOAT128) Var(rs6000_isa_flags)
 Enable/disable IEEE 128-bit floating point via the __float128 keyword.
+
+mfloat128-hardware
+Target Report Mask(FLOAT128_HW) Var(rs6000_isa_flags)
+Enable/disable using IEEE 128-bit floating point instructions.
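The new -mfusion-toc switch is not honored in every configuration: later in
this patch, rs6000_option_override_internal turns it off for non-64-bit code
or the small code model, and turns it on by default together with power8
fusion.  A minimal C sketch of those rules, assuming power8 fusion has
already been resolved; the enum and function names are hypothetical, and the
real code works on rs6000_isa_flags, rs6000_isa_flags_explicit and
TARGET_CMODEL instead:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the -mcmodel setting.  */
enum code_model { CODE_MODEL_SMALL, CODE_MODEL_MEDIUM, CODE_MODEL_LARGE };

/* Return whether TOC fusion ends up enabled.  REQUESTED is the flag's
   current value, EXPLICIT_SET is true if the user gave -mfusion-toc or
   -mno-fusion-toc, and P8_FUSION is the already-resolved power8 fusion
   setting.  */
static bool
toc_fusion_enabled (bool requested, bool explicit_set, bool p8_fusion,
                    bool powerpc64, enum code_model cmodel)
{
  assert (cmodel == CODE_MODEL_SMALL || cmodel == CODE_MODEL_MEDIUM
          || cmodel == CODE_MODEL_LARGE);

  /* TOC fusion needs 64-bit code and a medium or large code model.
     (The patch also warns here if -mfusion-toc was explicit.)  */
  if (!powerpc64 || cmodel == CODE_MODEL_SMALL)
    return false;

  /* An explicit -mfusion-toc or -mno-fusion-toc is respected.  */
  if (explicit_set)
    return requested;

  /* Otherwise it is switched on along with power8 fusion.  */
  return requested || p8_fusion;
}
```

For example, an explicit -mfusion-toc is dropped (with a warning) when
compiling without -mpowerpc64, while a 64-bit medium or large code model
build with power8 fusion gets TOC fusion without asking for it.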
Index: gcc/config/rs6000/rs6000-cpus.def
===================================================================
--- gcc/config/rs6000/rs6000-cpus.def	(revision 229674)
+++ gcc/config/rs6000/rs6000-cpus.def	(working copy)
@@ -60,6 +60,14 @@
   				 | OPTION_MASK_QUAD_MEMORY_ATOMIC	\
 				 | OPTION_MASK_UPPER_REGS_SF)
 
+/* Add ISEL back into ISA 3.0, since it is supposed to be a win.  */
+#define ISA_3_0_MASKS_SERVER	(ISA_2_7_MASKS_SERVER			\
+				 | OPTION_MASK_FLOAT128_HW		\
+				 | OPTION_MASK_ISEL			\
+				 | OPTION_MASK_MODULO			\
+				 | OPTION_MASK_P9_FUSION		\
+				 | OPTION_MASK_P9_VECTOR)
+
 #define POWERPC_7400_MASK	(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_ALTIVEC)
 
 /* Deal with ports that do not have -mstrict-align.  */
@@ -83,14 +91,18 @@
 				 | OPTION_MASK_EFFICIENT_UNALIGNED_VSX	\
 				 | OPTION_MASK_FLOAT128			\
 				 | OPTION_MASK_FPRND			\
+				 | OPTION_MASK_FUSION_TOC		\
 				 | OPTION_MASK_HTM			\
 				 | OPTION_MASK_ISEL			\
 				 | OPTION_MASK_MFCRF			\
 				 | OPTION_MASK_MFPGPR			\
+				 | OPTION_MASK_MODULO			\
 				 | OPTION_MASK_MULHW			\
 				 | OPTION_MASK_NO_UPDATE		\
 				 | OPTION_MASK_P8_FUSION		\
 				 | OPTION_MASK_P8_VECTOR		\
+				 | OPTION_MASK_P9_FUSION		\
+				 | OPTION_MASK_P9_VECTOR		\
 				 | OPTION_MASK_POPCNTB			\
 				 | OPTION_MASK_POPCNTD			\
 				 | OPTION_MASK_POWERPC64		\
@@ -195,6 +207,7 @@ RS6000_CPU ("power7", PROCESSOR_POWER7, 
 	    | MASK_POPCNTB | MASK_FPRND | MASK_CMPB | MASK_DFP | MASK_POPCNTD
 	    | MASK_VSX | MASK_RECIP_PRECISION | OPTION_MASK_UPPER_REGS_DF)
 RS6000_CPU ("power8", PROCESSOR_POWER8, MASK_POWERPC64 | ISA_2_7_MASKS_SERVER)
+RS6000_CPU ("power9", PROCESSOR_POWER9, MASK_POWERPC64 | ISA_3_0_MASKS_SERVER)
 RS6000_CPU ("powerpc", PROCESSOR_POWERPC, 0)
 RS6000_CPU ("powerpc64", PROCESSOR_POWERPC64, MASK_PPC_GFXOPT | MASK_POWERPC64)
 RS6000_CPU ("powerpc64le", PROCESSOR_POWER8, MASK_POWERPC64 | ISA_2_7_MASKS_SERVER)
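ISA_3_0_MASKS_SERVER is used by rs6000_option_override_internal (in rs6000.c
below) the same way the ISA 2.07 masks already are: turning on an ISA 3.0
switch (-mpower9-vector or -mmodulo) pulls in the rest of the ISA 3.0 set,
except for options the user set explicitly.  A minimal sketch of that
pattern, with made-up bit values (the real OPTION_MASK_* values are generated
from rs6000.opt) and only a handful of the masks:

```c
#include <assert.h>

typedef unsigned int isa_flags_t;

/* Hypothetical bit values for a few options.  */
enum
{
  M_VSX         = 1u << 0,
  M_P8_VECTOR   = 1u << 1,
  M_DIRECT_MOVE = 1u << 2,
  M_P9_VECTOR   = 1u << 3,
  M_MODULO      = 1u << 4,
  M_FLOAT128_HW = 1u << 5
};

/* Trimmed-down analogues of ISA_2_7_MASKS_SERVER and ISA_3_0_MASKS_SERVER.  */
#define ISA_2_7_MASKS (M_VSX | M_P8_VECTOR | M_DIRECT_MOVE)
#define ISA_3_0_MASKS (ISA_2_7_MASKS | M_P9_VECTOR | M_MODULO | M_FLOAT128_HW)

/* Add the options implied by the newest ISA level that is switched on,
   without touching any bit in EXPLICIT_MASK (options given as -m<option>
   or -mno-<option>).  */
static isa_flags_t
apply_isa_defaults (isa_flags_t flags, isa_flags_t explicit_mask)
{
  isa_flags_t result = flags;

  if (flags & (M_P9_VECTOR | M_MODULO))
    result |= ISA_3_0_MASKS & ~explicit_mask;
  else if (flags & (M_P8_VECTOR | M_DIRECT_MOVE))
    result |= ISA_2_7_MASKS & ~explicit_mask;

  /* Explicitly set options must come through unchanged.  */
  assert ((result & explicit_mask) == (flags & explicit_mask));
  return result;
}
```

So, in this model, -mmodulo -mno-float128-hardware enables everything in the
ISA 3.0 set except the IEEE 128-bit hardware instructions.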
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 229674)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -95,6 +95,12 @@
 #define ASM_CPU_POWER8_SPEC ASM_CPU_POWER7_SPEC
 #endif
 
+#ifdef HAVE_AS_POWER9
+#define ASM_CPU_POWER9_SPEC "-mpower9"
+#else
+#define ASM_CPU_POWER9_SPEC ASM_CPU_POWER8_SPEC
+#endif
+
 #ifdef HAVE_AS_DCI
 #define ASM_CPU_476_SPEC "-m476"
 #else
@@ -119,6 +125,7 @@
 %{mcpu=power6x: %(asm_cpu_power6) -maltivec} \
 %{mcpu=power7: %(asm_cpu_power7)} \
 %{mcpu=power8: %(asm_cpu_power8)} \
+%{mcpu=power9: %(asm_cpu_power9)} \
 %{mcpu=a2: -ma2} \
 %{mcpu=powerpc: -mppc} \
 %{mcpu=rs64a: -mppc64} \
@@ -193,6 +200,7 @@
   { "asm_cpu_power6",		ASM_CPU_POWER6_SPEC },			\
   { "asm_cpu_power7",		ASM_CPU_POWER7_SPEC },			\
   { "asm_cpu_power8",		ASM_CPU_POWER8_SPEC },			\
+  { "asm_cpu_power9",		ASM_CPU_POWER9_SPEC },			\
   { "asm_cpu_476",		ASM_CPU_476_SPEC },			\
   SUBTARGET_EXTRA_SPECS
 
Index: gcc/config/rs6000/rs6000-opts.h
===================================================================
--- gcc/config/rs6000/rs6000-opts.h	(revision 229674)
+++ gcc/config/rs6000/rs6000-opts.h	(working copy)
@@ -60,6 +60,7 @@ enum processor_type
    PROCESSOR_POWER6,
    PROCESSOR_POWER7,
    PROCESSOR_POWER8,
+   PROCESSOR_POWER9,
 
    PROCESSOR_RS64A,
    PROCESSOR_MPCCORE,
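With PROCESSOR_POWER9 added, rs6000_option_override_internal (in rs6000.c
below) still tunes -mcpu=power9 for power8 when no -mtune is given, by
searching processor_target_table for a PROCESSOR_POWER8 entry.  A small C
sketch of that lookup over a hypothetical four-entry table (the names and
layout are illustrative only):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of processor_target_table; the real table is
   generated from rs6000-cpus.def.  */
enum processor { PROC_POWER7, PROC_POWER8, PROC_POWER9, PROC_POWERPC64 };

struct cpu_entry
{
  const char *name;
  enum processor processor;
};

static const struct cpu_entry cpu_table[] =
{
  { "power7",    PROC_POWER7 },
  { "power8",    PROC_POWER8 },
  { "power9",    PROC_POWER9 },
  { "powerpc64", PROC_POWERPC64 },
};

#define N_CPUS (sizeof (cpu_table) / sizeof (cpu_table[0]))

/* Tuning index to use when only -mcpu was given.  Normally this is the
   -mcpu entry itself, but a power9 entry is redirected to the first
   power8 entry until power9 tuning exists.  Returns -1 if there is no
   power8 entry.  */
static int
default_tune_index (int cpu_index)
{
  size_t i;

  assert (cpu_index >= 0 && (size_t) cpu_index < N_CPUS);

  if (cpu_table[cpu_index].processor != PROC_POWER9)
    return cpu_index;

  for (i = 0; i < N_CPUS; i++)
    if (cpu_table[i].processor == PROC_POWER8)
      return (int) i;

  return -1;
}
```

Because the search stops at the first match, table order decides which entry
is used; in rs6000-cpus.def, "power8" comes before "powerpc64le", which also
maps to PROCESSOR_POWER8.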
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 229674)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -985,6 +985,26 @@ struct processor_costs power8_cost = {
   COSTS_N_INSNS (3),	/* SF->DF convert */
 };
 
+/* Instruction costs on POWER9 processors.  */
+static const
+struct processor_costs power9_cost = {
+  COSTS_N_INSNS (3),	/* mulsi */
+  COSTS_N_INSNS (3),	/* mulsi_const */
+  COSTS_N_INSNS (3),	/* mulsi_const9 */
+  COSTS_N_INSNS (3),	/* muldi */
+  COSTS_N_INSNS (19),	/* divsi */
+  COSTS_N_INSNS (35),	/* divdi */
+  COSTS_N_INSNS (3),	/* fp */
+  COSTS_N_INSNS (3),	/* dmul */
+  COSTS_N_INSNS (14),	/* sdiv */
+  COSTS_N_INSNS (17),	/* ddiv */
+  128,			/* cache line size */
+  32,			/* l1 cache */
+  256,			/* l2 cache */
+  12,			/* prefetch streams */
+  COSTS_N_INSNS (3),	/* SF->DF convert */
+};
+
 /* Instruction costs on POWER A2 processors.  */
 static const
 struct processor_costs ppca2_cost = {
@@ -2423,8 +2443,18 @@ rs6000_debug_reg_global (void)
     fprintf (stderr, DEBUG_FMT_S, "lra", "true");
 
   if (TARGET_P8_FUSION)
-    fprintf (stderr, DEBUG_FMT_S, "p8 fusion",
-	     (TARGET_P8_FUSION_SIGN) ? "zero+sign" : "zero");
+    {
+      char options[80];
+
+      strcpy (options, (TARGET_P9_FUSION) ? "power9" : "power8");
+      if (TARGET_FUSION_TOC)
+	strcat (options, ", toc");
+
+      if (TARGET_P8_FUSION_SIGN)
+	strcat (options, ", sign");
+
+      fprintf (stderr, DEBUG_FMT_S, "fusion", options);
+    }
 
   fprintf (stderr, DEBUG_FMT_S, "plt-format",
 	   TARGET_SECURE_PLT ? "secure" : "bss");
@@ -2463,6 +2493,7 @@ rs6000_setup_reg_addr_masks (void)
   for (m = 0; m < NUM_MACHINE_MODES; ++m)
     {
       machine_mode m2 = (machine_mode)m;
+      unsigned short msize = GET_MODE_SIZE (m2);
 
       /* SDmode is special in that we want to access it only via REG+REG
 	 addressing on power7 and above, since we want to use the LFIWZX and
@@ -2496,12 +2527,12 @@ rs6000_setup_reg_addr_masks (void)
 
 	      if (TARGET_UPDATE
 		  && (rc == RELOAD_REG_GPR || rc == RELOAD_REG_FPR)
-		  && GET_MODE_SIZE (m2) <= 8
+		  && msize <= 8
 		  && !VECTOR_MODE_P (m2)
 		  && !FLOAT128_VECTOR_P (m2)
 		  && !COMPLEX_MODE_P (m2)
 		  && !indexed_only_p
-		  && !(TARGET_E500_DOUBLE && GET_MODE_SIZE (m2) == 8))
+		  && !(TARGET_E500_DOUBLE && msize == 8))
 		{
 		  addr_mask |= RELOAD_REG_PRE_INCDEC;
 
@@ -3382,7 +3413,22 @@ rs6000_option_override_internal (bool gl
   if (rs6000_tune_index >= 0)
     tune_index = rs6000_tune_index;
   else if (have_cpu)
-    rs6000_tune_index = tune_index = cpu_index;
+    {
+      /* Until power9 tuning is available, use power8 tuning if -mcpu=power9.  */
+      if (processor_target_table[cpu_index].processor != PROCESSOR_POWER9)
+	rs6000_tune_index = tune_index = cpu_index;
+      else
+	{
+	  size_t i;
+	  tune_index = -1;
+	  for (i = 0; i < ARRAY_SIZE (processor_target_table); i++)
+	    if (processor_target_table[i].processor == PROCESSOR_POWER8)
+	      {
+		rs6000_tune_index = tune_index = i;
+		break;
+	      }
+	}
+    }
   else
     {
       size_t i;
@@ -3557,7 +3603,9 @@ rs6000_option_override_internal (bool gl
 
   /* For the newer switches (vsx, dfp, etc.) set some of the older options,
      unless the user explicitly used the -mno-<option> to disable the code.  */
-  if (TARGET_P8_VECTOR || TARGET_DIRECT_MOVE || TARGET_CRYPTO)
+  if (TARGET_P9_VECTOR || TARGET_MODULO)
+    rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit);
+  else if (TARGET_P8_VECTOR || TARGET_DIRECT_MOVE || TARGET_CRYPTO)
     rs6000_isa_flags |= (ISA_2_7_MASKS_SERVER & ~rs6000_isa_flags_explicit);
   else if (TARGET_VSX)
     rs6000_isa_flags |= (ISA_2_6_MASKS_SERVER & ~rs6000_isa_flags_explicit);
@@ -3703,6 +3751,41 @@ rs6000_option_override_internal (bool gl
     rs6000_isa_flags |= (processor_target_table[tune_index].target_enable
 			 & OPTION_MASK_P8_FUSION);
 
+  /* Setting additional fusion flags turns on base fusion.  */
+  if (!TARGET_P8_FUSION && (TARGET_P8_FUSION_SIGN || TARGET_FUSION_TOC))
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION)
+	{
+	  if (TARGET_P8_FUSION_SIGN)
+	    error ("-mpower8-fusion-sign requires -mpower8-fusion");
+
+	  if (TARGET_FUSION_TOC)
+	    error ("-mfusion-toc requires -mpower8-fusion");
+
+	  rs6000_isa_flags &= ~OPTION_MASK_P8_FUSION;
+	}
+      else
+	rs6000_isa_flags |= OPTION_MASK_P8_FUSION;
+    }
+
+  /* Power9 fusion is a superset over power8 fusion.  */
+  if (TARGET_P9_FUSION && !TARGET_P8_FUSION)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION)
+	{
+	  error ("-mpower9-fusion requires -mpower8-fusion");
+	  rs6000_isa_flags &= ~OPTION_MASK_P9_FUSION;
+	}
+      else
+	rs6000_isa_flags |= OPTION_MASK_P8_FUSION;
+    }
+
+  /* Enable power9 fusion if we are tuning for power9, even if we aren't
+     generating power9 instructions.  */
+  if (!(rs6000_isa_flags_explicit & OPTION_MASK_P9_FUSION))
+    rs6000_isa_flags |= (processor_target_table[tune_index].target_enable
+			 & OPTION_MASK_P9_FUSION);
+
   /* Power8 does not fuse sign extended loads with the addis.  If we are
      optimizing at high levels for speed, convert a sign extended load into a
      zero extending load, and an explicit sign extension.  */
@@ -3712,6 +3795,36 @@ rs6000_option_override_internal (bool gl
       && optimize >= 3)
     rs6000_isa_flags |= OPTION_MASK_P8_FUSION_SIGN;
 
+  /* TOC fusion requires 64-bit and medium/large code model.  */
+  if (TARGET_FUSION_TOC && !TARGET_POWERPC64)
+    {
+      rs6000_isa_flags &= ~OPTION_MASK_FUSION_TOC;
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_FUSION_TOC) != 0)
+	warning (0, N_("-mfusion-toc requires 64-bit"));
+    }
+
+  if (TARGET_FUSION_TOC && (TARGET_CMODEL == CMODEL_SMALL))
+    {
+      rs6000_isa_flags &= ~OPTION_MASK_FUSION_TOC;
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_FUSION_TOC) != 0)
+	warning (0, N_("-mfusion-toc requires medium/large code model"));
+    }
+
+  /* Turn on -mfusion-toc by default if p8-fusion and 64-bit medium/large code
+     model.  */
+  if (TARGET_P8_FUSION && !TARGET_FUSION_TOC && TARGET_POWERPC64
+      && (TARGET_CMODEL != CMODEL_SMALL)
+      && !(rs6000_isa_flags_explicit & OPTION_MASK_FUSION_TOC))
+    rs6000_isa_flags |= OPTION_MASK_FUSION_TOC;
+
+  /* ISA 2.08 vector instructions include ISA 2.07.  */
+  if (TARGET_P9_VECTOR && !TARGET_P8_VECTOR)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR)
+	error ("-mpower9-vector requires -mpower8-vector");
+      rs6000_isa_flags &= ~OPTION_MASK_P9_VECTOR;
+    }
+
   /* Set -mallow-movmisalign to explicitly on if we have full ISA 2.07
      support. If we only have ISA 2.06 support, and the user did not specify
      the switch, leave it set to -1 so the movmisalign patterns are enabled,
@@ -3757,9 +3870,32 @@ rs6000_option_override_internal (bool gl
       if ((rs6000_isa_flags_explicit & OPTION_MASK_FLOAT128) != 0)
 	error ("-mfloat128 requires VSX support");
 
-      rs6000_isa_flags &= ~OPTION_MASK_FLOAT128;
+      rs6000_isa_flags &= ~(OPTION_MASK_FLOAT128 | OPTION_MASK_FLOAT128_HW);
+    }
+
+  /* IEEE 128-bit floating point hardware instructions imply enabling
+     __float128.  */
+  if (TARGET_FLOAT128_HW
+      && (rs6000_isa_flags & (OPTION_MASK_P9_VECTOR
+			      | OPTION_MASK_DIRECT_MOVE
+			      | OPTION_MASK_UPPER_REGS_DF
+			      | OPTION_MASK_UPPER_REGS_SF)) == 0)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_FLOAT128_HW) != 0)
+	error ("-mfloat128-hardware requires full ISA 3.0 support");
+
+      rs6000_isa_flags &= ~OPTION_MASK_FLOAT128_HW;
     }
 
+  else if (TARGET_P9_VECTOR && !TARGET_FLOAT128_HW
+	   && (rs6000_isa_flags_explicit & OPTION_MASK_FLOAT128_HW) == 0)
+    rs6000_isa_flags |= OPTION_MASK_FLOAT128_HW;
+
+  if (TARGET_FLOAT128_HW
+      && (rs6000_isa_flags_explicit & OPTION_MASK_FLOAT128) == 0)
+    rs6000_isa_flags |= OPTION_MASK_FLOAT128;
+
+  /* Print the options after updating the defaults.  */
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after defaults", rs6000_isa_flags);
 
@@ -3957,18 +4093,21 @@ rs6000_option_override_internal (bool gl
 			&& rs6000_cpu != PROCESSOR_POWER6
 			&& rs6000_cpu != PROCESSOR_POWER7
 			&& rs6000_cpu != PROCESSOR_POWER8
+			&& rs6000_cpu != PROCESSOR_POWER9
 			&& rs6000_cpu != PROCESSOR_PPCA2
 			&& rs6000_cpu != PROCESSOR_CELL
 			&& rs6000_cpu != PROCESSOR_PPC476);
   rs6000_sched_groups = (rs6000_cpu == PROCESSOR_POWER4
 			 || rs6000_cpu == PROCESSOR_POWER5
 			 || rs6000_cpu == PROCESSOR_POWER7
-			 || rs6000_cpu == PROCESSOR_POWER8);
+			 || rs6000_cpu == PROCESSOR_POWER8
+			 || rs6000_cpu == PROCESSOR_POWER9);
   rs6000_align_branch_targets = (rs6000_cpu == PROCESSOR_POWER4
 				 || rs6000_cpu == PROCESSOR_POWER5
 				 || rs6000_cpu == PROCESSOR_POWER6
 				 || rs6000_cpu == PROCESSOR_POWER7
 				 || rs6000_cpu == PROCESSOR_POWER8
+				 || rs6000_cpu == PROCESSOR_POWER9
 				 || rs6000_cpu == PROCESSOR_PPCE500MC
 				 || rs6000_cpu == PROCESSOR_PPCE500MC64
 				 || rs6000_cpu == PROCESSOR_PPCE5500
@@ -4216,6 +4355,10 @@ rs6000_option_override_internal (bool gl
 	rs6000_cost = &power8_cost;
 	break;
 
+      case PROCESSOR_POWER9:
+	rs6000_cost = &power9_cost;
+	break;
+
       case PROCESSOR_PPCA2:
 	rs6000_cost = &ppca2_cost;
 	break;
@@ -4396,7 +4539,8 @@ rs6000_loop_align (rtx label)
 	  || rs6000_cpu == PROCESSOR_POWER5
 	  || rs6000_cpu == PROCESSOR_POWER6
 	  || rs6000_cpu == PROCESSOR_POWER7
-	  || rs6000_cpu == PROCESSOR_POWER8))
+	  || rs6000_cpu == PROCESSOR_POWER8
+	  || rs6000_cpu == PROCESSOR_POWER9))
     return 5;
   else
     return align_loops_log;
@@ -5213,7 +5357,9 @@ rs6000_file_start (void)
       || !global_options_set.x_rs6000_cpu_index)
     {
       fputs ("\t.machine ", asm_out_file);
-      if ((rs6000_isa_flags & OPTION_MASK_DIRECT_MOVE) != 0)
+      if ((rs6000_isa_flags & OPTION_MASK_MODULO) != 0)
+	fputs ("power9\n", asm_out_file);
+      else if ((rs6000_isa_flags & OPTION_MASK_DIRECT_MOVE) != 0)
 	fputs ("power8\n", asm_out_file);
       else if ((rs6000_isa_flags & OPTION_MASK_POPCNTD) != 0)
 	fputs ("power7\n", asm_out_file);
@@ -28006,6 +28152,7 @@ rs6000_adjust_cost (rtx_insn *insn, rtx 
                  || rs6000_cpu_attr == CPU_POWER5
 		 || rs6000_cpu_attr == CPU_POWER7
 		 || rs6000_cpu_attr == CPU_POWER8
+		 || rs6000_cpu_attr == CPU_POWER9
                  || rs6000_cpu_attr == CPU_CELL)
                 && recog_memoized (dep_insn)
                 && (INSN_CODE (dep_insn) >= 0))
@@ -28578,6 +28725,7 @@ rs6000_issue_rate (void)
   case CPU_POWER7:
     return 5;
   case CPU_POWER8:
+  case CPU_POWER9:
     return 7;
   default:
     return 1;
@@ -29211,6 +29359,7 @@ insn_must_be_first_in_group (rtx_insn *i
         }
       break;
     case PROCESSOR_POWER8:
+    case PROCESSOR_POWER9:
       type = get_attr_type (insn);
 
       switch (type)
@@ -29341,6 +29490,7 @@ insn_must_be_last_in_group (rtx_insn *in
     }
     break;
   case PROCESSOR_POWER8:
+  case PROCESSOR_POWER9:
     type = get_attr_type (insn);
 
     switch (type)
@@ -29459,7 +29609,7 @@ force_new_group (int sched_verbose, FILE
 
       /* Do we have a special group ending nop? */
       if (rs6000_cpu_attr == CPU_POWER6 || rs6000_cpu_attr == CPU_POWER7
-	  || rs6000_cpu_attr == CPU_POWER8)
+	  || rs6000_cpu_attr == CPU_POWER8 || rs6000_cpu_attr == CPU_POWER9)
 	{
 	  nop = gen_group_ending_nop ();
 	  emit_insn_before (nop, next_insn);
@@ -31959,7 +32109,8 @@ rs6000_register_move_cost (machine_mode 
          expensive than memory in order to bias spills to memory .*/
       else if ((rs6000_cpu == PROCESSOR_POWER6
 		|| rs6000_cpu == PROCESSOR_POWER7
-		|| rs6000_cpu == PROCESSOR_POWER8)
+		|| rs6000_cpu == PROCESSOR_POWER8
+		|| rs6000_cpu == PROCESSOR_POWER9)
 	       && reg_classes_intersect_p (rclass, LINK_OR_CTR_REGS))
         ret = 6 * hard_regno_nregs[0][mode];
 
@@ -33489,12 +33640,15 @@ static struct rs6000_opt_mask const rs60
   { "efficient-unaligned-vsx",	OPTION_MASK_EFFICIENT_UNALIGNED_VSX,
 								false, true  },
   { "float128",			OPTION_MASK_FLOAT128,		false, true  },
+  { "float128-hardware",	OPTION_MASK_FLOAT128_HW,	false, true  },
   { "fprnd",			OPTION_MASK_FPRND,		false, true  },
+  { "fusion-toc",		OPTION_MASK_FUSION_TOC,		false, true  },
   { "hard-dfp",			OPTION_MASK_DFP,		false, true  },
   { "htm",			OPTION_MASK_HTM,		false, true  },
   { "isel",			OPTION_MASK_ISEL,		false, true  },
   { "mfcrf",			OPTION_MASK_MFCRF,		false, true  },
   { "mfpgpr",			OPTION_MASK_MFPGPR,		false, true  },
+  { "modulo",			OPTION_MASK_MODULO,		false, true  },
   { "mulhw",			OPTION_MASK_MULHW,		false, true  },
   { "multiple",			OPTION_MASK_MULTIPLE,		false, true  },
   { "popcntb",			OPTION_MASK_POPCNTB,		false, true  },
@@ -33502,6 +33656,8 @@ static struct rs6000_opt_mask const rs60
   { "power8-fusion",		OPTION_MASK_P8_FUSION,		false, true  },
   { "power8-fusion-sign",	OPTION_MASK_P8_FUSION_SIGN,	false, true  },
   { "power8-vector",		OPTION_MASK_P8_VECTOR,		false, true  },
+  { "power9-fusion",		OPTION_MASK_P9_FUSION,		false, true  },
+  { "power9-vector",		OPTION_MASK_P9_VECTOR,		false, true  },
   { "powerpc-gfxopt",		OPTION_MASK_PPC_GFXOPT,		false, true  },
   { "powerpc-gpopt",		OPTION_MASK_PPC_GPOPT,		false, true  },
   { "quad-memory",		OPTION_MASK_QUAD_MEMORY,	false, true  },
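The fusion handling above treats power9 fusion as a superset of power8
fusion: -mpower9-fusion turns on power8 fusion unless the user explicitly
disabled it, in which case power9 fusion is dropped and the compiler reports
"-mpower9-fusion requires -mpower8-fusion".  A minimal sketch of just that
step, with hypothetical bit values and function name:

```c
#include <assert.h>

typedef unsigned int isa_flags_t;

/* Hypothetical bit values.  */
enum
{
  F_P8_FUSION = 1u << 0,
  F_P9_FUSION = 1u << 1
};

/* Resolve the power9 -> power8 fusion dependency.  EXPLICIT_MASK holds
   the options the user set explicitly.  */
static isa_flags_t
resolve_p9_fusion (isa_flags_t flags, isa_flags_t explicit_mask)
{
  if ((flags & F_P9_FUSION) && !(flags & F_P8_FUSION))
    {
      if (explicit_mask & F_P8_FUSION)
        flags &= ~F_P9_FUSION;   /* -mno-power8-fusion wins (error issued) */
      else
        flags |= F_P8_FUSION;    /* otherwise pull in power8 fusion */
    }

  /* This step never leaves power9 fusion on without power8 fusion.  */
  assert (!(flags & F_P9_FUSION) || (flags & F_P8_FUSION));
  return flags;
}
```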
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 229674)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -252,7 +252,7 @@ (define_attr "cpu"
    ppc750,ppc7400,ppc7450,
    ppc403,ppc405,ppc440,ppc476,
    ppc8540,ppc8548,ppce300c2,ppce300c3,ppce500mc,ppce500mc64,ppce5500,ppce6500,
-   power4,power5,power6,power7,power8,
+   power4,power5,power6,power7,power8,power9,
    rs64a,mpccore,cell,ppca2,titan"
   (const (symbol_ref "rs6000_cpu_attr")))
 
Index: gcc/config/rs6000/rs6000-tables.opt
===================================================================
--- gcc/config/rs6000/rs6000-tables.opt	(revision 229674)
+++ gcc/config/rs6000/rs6000-tables.opt	(working copy)
@@ -180,14 +180,17 @@ EnumValue
 Enum(rs6000_cpu_opt_value) String(power8) Value(50)
 
 EnumValue
-Enum(rs6000_cpu_opt_value) String(powerpc) Value(51)
+Enum(rs6000_cpu_opt_value) String(power9) Value(51)
 
 EnumValue
-Enum(rs6000_cpu_opt_value) String(powerpc64) Value(52)
+Enum(rs6000_cpu_opt_value) String(powerpc) Value(52)
 
 EnumValue
-Enum(rs6000_cpu_opt_value) String(powerpc64le) Value(53)
+Enum(rs6000_cpu_opt_value) String(powerpc64) Value(53)
 
 EnumValue
-Enum(rs6000_cpu_opt_value) String(rs64) Value(54)
+Enum(rs6000_cpu_opt_value) String(powerpc64le) Value(54)
+
+EnumValue
+Enum(rs6000_cpu_opt_value) String(rs64) Value(55)
 
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 229674)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -349,6 +349,8 @@ rs6000_target_modify_macros (bool define
     rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR7");
   if ((flags & OPTION_MASK_DIRECT_MOVE) != 0)
     rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR8");
+  if ((flags & OPTION_MASK_MODULO) != 0)
+    rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
   if ((flags & OPTION_MASK_SOFT_FLOAT) != 0)
     rs6000_define_or_undefine_macro (define_p, "_SOFT_FLOAT");
   if ((flags & OPTION_MASK_RECIP_PRECISION) != 0)
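Since _ARCH_PWR9 is keyed off OPTION_MASK_MODULO, user code can test it the
same way as the existing _ARCH_PWR8 and _ARCH_PWR7 macros.  A minimal sketch;
the function names and the 300/207/206 level encoding are hypothetical:

```c
#include <assert.h>

/* Newest POWER ISA level the compiler is targeting, from the predefined
   macros: 300 for ISA 3.0 (_ARCH_PWR9), 207 for ISA 2.07 (_ARCH_PWR8),
   206 for ISA 2.06 (_ARCH_PWR7), and 0 otherwise.  */
static int
power_isa_level (void)
{
#if defined (_ARCH_PWR9)
  return 300;
#elif defined (_ARCH_PWR8)
  return 207;
#elif defined (_ARCH_PWR7)
  return 206;
#else
  return 0;  /* older POWER, or not a PowerPC target at all */
#endif
}

/* Nonzero if the code is compiled for at least ISA level LEVEL.  */
static int
targets_isa_at_least (int level)
{
  assert (level >= 0);
  return power_isa_level () >= level;
}
```

With a compiler that includes this patch, -mcpu=power9 should give 300, and
-mcpu=power9 -mno-modulo should fall back to 207, since _ARCH_PWR8 depends on
direct move rather than on the modulo instructions.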
Index: gcc/config/rs6000/aix61.h
===================================================================
--- gcc/config/rs6000/aix61.h	(revision 229674)
+++ gcc/config/rs6000/aix61.h	(working copy)
@@ -80,6 +80,7 @@ do {									\
 %{mcpu=power6x: -mpwr6} \
 %{mcpu=power7: -mpwr7} \
 %{mcpu=power8: -mpwr8} \
+%{mcpu=power9: -mpwr9} \
 %{mcpu=powerpc: -mppc} \
 %{mcpu=rs64a: -mppc} \
 %{mcpu=603: -m603} \
Index: gcc/config/rs6000/aix53.h
===================================================================
--- gcc/config/rs6000/aix53.h	(revision 229674)
+++ gcc/config/rs6000/aix53.h	(working copy)
@@ -63,6 +63,7 @@ do {									\
 %{mcpu=power6x: -mpwr6} \
 %{mcpu=power7: -mpwr7} \
 %{mcpu=power8: -mpwr8} \
+%{mcpu=power9: -mpwr9} \
 %{mcpu=powerpc: -mppc} \
 %{mcpu=rs64a: -mppc} \
 %{mcpu=603: -m603} \
Index: gcc/configure.ac
===================================================================
--- gcc/configure.ac	(revision 229674)
+++ gcc/configure.ac	(working copy)
@@ -4326,6 +4326,19 @@ LCF0:
 	  [Define if your assembler supports POWER8 instructions.])])
 
     case $target in
+      *-*-aix*) conftest_s='	.machine "pwr9"
+	.csect .text[[PR]]';;
+      *) conftest_s='	.machine power9
+	.text';;
+    esac
+
+    gcc_GAS_CHECK_FEATURE([power9 support],
+      gcc_cv_as_powerpc_power9, [2,19,2], -a32,
+      [$conftest_s],,
+      [AC_DEFINE(HAVE_AS_POWER9, 1,
+	  [Define if your assembler supports POWER9 instructions.])])
+
+    case $target in
       *-*-aix*) conftest_s='	.csect .text[[PR]]
 	lwsync';;
       *) conftest_s='	.text
Index: gcc/configure
===================================================================
--- gcc/configure	(revision 229674)
+++ gcc/configure	(working copy)
@@ -26315,6 +26315,48 @@ $as_echo "#define HAVE_AS_POWER8 1" >>co
 fi
 
     case $target in
+      *-*-aix*) conftest_s='	.machine "pwr9"
+	.csect .text[PR]';;
+      *) conftest_s='	.machine power9
+	.text';;
+    esac
+
+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for power9 support" >&5
+$as_echo_n "checking assembler for power9 support... " >&6; }
+if test "${gcc_cv_as_powerpc_power9+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_powerpc_power9=no
+    if test $in_tree_gas = yes; then
+    if test $gcc_cv_gas_vers -ge `expr \( \( 2 \* 1000 \) + 19 \) \* 1000 + 2`
+  then gcc_cv_as_powerpc_power9=yes
+fi
+  elif test x$gcc_cv_as != x; then
+    $as_echo "$conftest_s" > conftest.s
+    if { ac_try='$gcc_cv_as $gcc_cv_as_flags -a32 -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+    then
+	gcc_cv_as_powerpc_power9=yes
+    else
+      echo "configure: failed program was" >&5
+      cat conftest.s >&5
+    fi
+    rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_powerpc_power9" >&5
+$as_echo "$gcc_cv_as_powerpc_power9" >&6; }
+if test $gcc_cv_as_powerpc_power9 = yes; then
+
+$as_echo "#define HAVE_AS_POWER9 1" >>confdefs.h
+
+fi
+
+    case $target in
       *-*-aix*) conftest_s='	.csect .text[PR]
 	lwsync';;
       *) conftest_s='	.text
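When gas is built in-tree, the generated check above does not assemble
anything; it compares $gcc_cv_gas_vers against the minimum version from
configure.ac ([2,19,2]), encoded with expr.  The arithmetic, sketched in C
(the function name is hypothetical):

```c
#include <assert.h>

/* Encode an assembler version as the in-tree check does:
   ((major * 1000) + minor) * 1000 + patchlevel, so 2.19.2 becomes
   2019002 and versions compare as plain integers.  */
static long
encode_gas_version (long major, long minor, long patchlevel)
{
  /* The encoding orders versions correctly only if MINOR and
     PATCHLEVEL fit in three decimal digits.  */
  assert (major >= 0 && minor >= 0 && minor < 1000
          && patchlevel >= 0 && patchlevel < 1000);
  return (major * 1000 + minor) * 1000 + patchlevel;
}
```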
Index: gcc/config.in
===================================================================
--- gcc/config.in	(revision 229674)
+++ gcc/config.in	(working copy)
@@ -563,6 +563,12 @@
 #endif
 
 
+/* Define if your assembler supports POWER9 instructions. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_POWER9
+#endif
+
+
 /* Define if your assembler supports .ref */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_REF
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 229674)
+++ gcc/doc/invoke.texi	(working copy)
@@ -946,8 +946,9 @@ See RS/6000 and PowerPC Options.
 -mquad-memory-atomic -mno-quad-memory-atomic @gol
 -mcompat-align-parm -mno-compat-align-parm @gol
 -mupper-regs-df -mno-upper-regs-df -mupper-regs-sf -mno-upper-regs-sf @gol
--mupper-regs -mno-upper-regs @gol
--mfloat128 -mno-float128}
+-mupper-regs -mno-upper-regs -mmodulo -mno-modulo @gol
+-mfloat128 -mno-float128 -mfloat128-hardware -mno-float128-hardware @gol
+-mpower9-fusion -mno-power9-fusion -mpower9-vector -mno-power9-vector}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles  -m32bit-doubles  -fpu  -nofpu@gol
@@ -19275,8 +19276,9 @@ Supported values for @var{cpu_type} are 
 @samp{e300c3}, @samp{e500mc}, @samp{e500mc64}, @samp{e5500},
 @samp{e6500}, @samp{ec603e}, @samp{G3}, @samp{G4}, @samp{G5},
 @samp{titan}, @samp{power3}, @samp{power4}, @samp{power5}, @samp{power5+},
-@samp{power6}, @samp{power6x}, @samp{power7}, @samp{power8}, @samp{powerpc},
-@samp{powerpc64}, @samp{powerpc64le}, and @samp{rs64}.
+@samp{power6}, @samp{power6x}, @samp{power7}, @samp{power8},
+@samp{power9}, @samp{powerpc}, @samp{powerpc64}, @samp{powerpc64le},
+and @samp{rs64}.
 
 @option{-mcpu=powerpc}, @option{-mcpu=powerpc64}, and
 @option{-mcpu=powerpc64le} specify pure 32-bit PowerPC (either
@@ -19296,7 +19298,8 @@ following options:
 -mpowerpc-gpopt  -mpowerpc-gfxopt  -msingle-float -mdouble-float @gol
 -msimple-fpu -mstring  -mmulhw  -mdlmzb  -mmfpgpr -mvsx @gol
 -mcrypto -mdirect-move -mpower8-fusion -mpower8-vector @gol
--mquad-memory -mquad-memory-atomic}
+-mquad-memory -mquad-memory-atomic -mmodulo -mfloat128 -mfloat128-hardware @gol
+-mpower9-fusion -mpower9-vector}
 
 The particular options set for any particular CPU varies between
 compiler versions, depending on what setting seems to produce optimal
@@ -19533,12 +19536,45 @@ If the @option{-mno-upper-regs} option i
 @opindex mfloat128
 @opindex mno-float128
 Enable/disable the @var{__float128} keyword for IEEE 128-bit floating point
-and use software emulation for IEEE 128-bit floating point.
+and use either software emulation for IEEE 128-bit floating point or
+hardware instructions.
 
 The VSX instruction set (@option{-mvsx}, @option{-mcpu=power7}, or
 @option{-mcpu=power8}) must be enabled to use the @option{-mfloat128}
 option.
 
+@item -mfloat128-hardware
+@itemx -mno-float128-hardware
+@opindex mfloat128-hardware
+@opindex mno-float128-hardware
+Enable/disable using ISA 3.0 hardware instructions to support the
+@var{__float128} data type.
+
+@item -mmodulo
+@itemx -mno-modulo
+@opindex mmodulo
+@opindex mno-module
+Generate code that uses (does not use) the ISA 2.08 integer modulo
+instructions.  The @option{-mmodulo} option is enabled by default
+with the @option{-mcpu=power9} option.
+
+@item -mpower9-fusion
+@itemx -mno-power9-fusion
+@opindex mpower9-fusion
+@opindex mno-power9-fusion
+Generate code that keeps (does not keeps) some operations adjacent so
+that the instructions can be fused together on power9 and later
+processors.
+
+@item -mpower9-vector
+@itemx -mno-power9-vector
+@opindex mpower9-vector
+@opindex mno-power9-vector
+Generate code that uses (does not use) the vector and scalar
+instructions that were added in version 2.07 of the PowerPC ISA.  Also
+enable the use of built-in functions that allow more direct access to
+the vector instructions.
+
 @item -mfloat-gprs=@var{yes/single/double/no}
 @itemx -mfloat-gprs
 @opindex mfloat-gprs


* Re: [PATCH], Add power9 support to GCC, patch #1
  2015-11-03 20:29 [PATCH], Add power9 support to GCC, patch #1 Michael Meissner
@ 2015-11-04 21:16 ` Segher Boessenkool
  2015-11-04 21:27   ` Michael Meissner
  2015-11-09  0:33 ` [PATCH], Add power9 support to GCC, patch #1 (revised) Michael Meissner
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Segher Boessenkool @ 2015-11-04 21:16 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

Hi,

Some minor things...

On Tue, Nov 03, 2015 at 03:29:11PM -0500, Michael Meissner wrote:
> 	* config/rs6000/rs6000.opt (-mfusion-toc): Add new switches for
> 	ISA 3.0 (power9).

"-mtoc-fusion" sounds more natural, and is more in line with the other
switches I think.

> +  /* ISA 2.08 vector instructions include ISA 2.07.  */

ISA 3.0

> +@item -mmodulo
> +@itemx -mno-modulo
> +@opindex mmodulo
> +@opindex mno-module
> +Generate code that uses (does not use) the ISA 2.08 integer modulo
> +instructions.  The @option{-mmodulo} option is enabled by default
> +with the @option{-mcpu=power9} option.

Again.  I think it was just these two, but please check.

> +@item -mpower9-fusion
> +@itemx -mno-power9-fusion
> +@opindex mpower9-fusion
> +@opindex mno-power9-fusion
> +Generate code that keeps (does not keeps) some operations adjacent so
> +that the instructions can be fused together on power9 and later
> +processors.

> +@item -mpower9-vector
> +@itemx -mno-power9-vector
> +@opindex mpower9-vector
> +@opindex mno-power9-vector
> +Generate code that uses (does not use) the vector and scalar
> +instructions that were added in version 2.07 of the PowerPC ISA.  Also
> +enable the use of built-in functions that allow more direct access to
> +the vector instructions.

3.0 here as well?


Segher


* Re: [PATCH], Add power9 support to GCC, patch #1
  2015-11-04 21:16 ` Segher Boessenkool
@ 2015-11-04 21:27   ` Michael Meissner
  0 siblings, 0 replies; 47+ messages in thread
From: Michael Meissner @ 2015-11-04 21:27 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Wed, Nov 04, 2015 at 03:15:53PM -0600, Segher Boessenkool wrote:
> Hi,
> 
> Some minor things...
> 
> On Tue, Nov 03, 2015 at 03:29:11PM -0500, Michael Meissner wrote:
> > 	* config/rs6000/rs6000.opt (-mfusion-toc): Add new switches for
> > 	ISA 3.0 (power9).
> 
> "-mtoc-fusion" sounds more natural, and is more in line with the other
> switches I think.

That's reasonable.  At present, -mfusion-toc is not documented, as it was
intended to be a debug switch.

David, do you have an opinion one way or the other?

> > +  /* ISA 2.08 vector instructions include ISA 2.07.  */
> 
> ISA 3.0

Thanks for catching that.  I missed a few places that were written before we
decided the new ISA would be 3.0 instead of 2.08.  I'll make those changes
before submitting.

> > +@item -mmodulo
> > +@itemx -mno-modulo
> > +@opindex mmodulo
> > +@opindex mno-module
> > +Generate code that uses (does not use) the ISA 2.08 integer modulo
> > +instructions.  The @option{-mmodulo} option is enabled by default
> > +with the @option{-mcpu=power9} option.
> 
> Again.  I think it was just these two, but please check.
> 
> > +@item -mpower9-fusion
> > +@itemx -mno-power9-fusion
> > +@opindex mpower9-fusion
> > +@opindex mno-power9-fusion
> > +Generate code that keeps (does not keeps) some operations adjacent so
> > +that the instructions can be fused together on power9 and later
> > +processors.
> 
> > +@item -mpower9-vector
> > +@itemx -mno-power9-vector
> > +@opindex mpower9-vector
> > +@opindex mno-power9-vector
> > +Generate code that uses (does not use) the vector and scalar
> > +instructions that were added in version 2.07 of the PowerPC ISA.  Also
> > +enable the use of built-in functions that allow more direct access to
> > +the vector instructions.
> 
> 3.0 here as well?

I only found 3 references to 2.08 in the patch.

Thanks.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797


* Re: [PATCH], Add power9 support to GCC, patch #1 (revised)
  2015-11-03 20:29 [PATCH], Add power9 support to GCC, patch #1 Michael Meissner
  2015-11-04 21:16 ` Segher Boessenkool
@ 2015-11-09  0:33 ` Michael Meissner
  2015-11-09 16:12   ` David Edelsohn
  2015-11-10 18:39   ` [PATCH], Add power9 support to GCC, patch #8 (add integer multiply/add) Michael Meissner
  2015-11-09  0:36 ` [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions) Michael Meissner
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 47+ messages in thread
From: Michael Meissner @ 2015-11-09  0:33 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 2634 bytes --]

This is the revised version of patch #1.  I changed -mfusion-toc to
-mtoc-fusion, and I changed the references to ISA 2.08 to 3.0.  I also added
two new debug switches (-mpower9-dform and -mpower9-minmax) for code in future
patches that is undergoing development and is not ready to be on by default.

I have done a bootstrap build on a little endian power8 system and there were
no regressions in this patch.  Is it ok to install in the trunk?

2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.opt (-mpower9-fusion): Add new switches for
	ISA 3.0 (power9).
	(-mpower9-vector): Likewise.
	(-mpower9-dform): Likewise.
	(-mpower9-minmax): Likewise.
	(-mtoc-fusion): Likewise.
	(-mmodulo): Likewise.
	(-mfloat128-hardware): Likewise.

	* config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Add option
	mask for ISA 3.0 (power9).
	(POWERPC_MASKS): Add new ISA 3.0 switches.
	(power9 cpu): Add power9 cpu.

	* config/rs6000/rs6000.h (ASM_CPU_POWER9_SPEC): Add support for
	power9.
	(ASM_CPU_SPEC): Likewise.
	(EXTRA_SPECS): Likewise.

	* config/rs6000/rs6000-opts.h (enum processor_type): Add
	PROCESSOR_POWER9.

	* config/rs6000/rs6000.c (power9_cost): Initial cost setup for
	power9.
	(rs6000_debug_reg_global): Add support for power9 fusion.
	(rs6000_setup_reg_addr_masks): Cache mode size.
	(rs6000_option_override_internal): Until real power9 tuning is
	added, use -mtune=power8 for -mcpu=power9.
	(rs6000_setup_reg_addr_masks): Do not allow pre-increment,
	pre-decrement, or pre-modify on SFmode/DFmode if we allow the use
	of Altivec registers.
	(rs6000_option_override_internal): Add support for ISA 3.0
	switches.
	(rs6000_loop_align): Add support for power9 cpu.
	(rs6000_file_start): Likewise.
	(rs6000_adjust_cost): Likewise.
	(rs6000_issue_rate): Likewise.
	(insn_must_be_first_in_group): Likewise.
	(insn_must_be_last_in_group): Likewise.
	(force_new_group): Likewise.
	(rs6000_register_move_cost): Likewise.
	(rs6000_opt_masks): Likewise.

	* config/rs6000/rs6000.md (cpu attribute): Add power9.
	* config/rs6000/rs6000-tables.opt: Regenerate.

	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
	_ARCH_PWR9 if power9 support is available.

	* config/rs6000/aix61.h (ASM_CPU_SPEC): Add power9.
	* config/rs6000/aix53.h (ASM_CPU_SPEC): Likewise.

	* configure.ac: Determine if the assembler supports the ISA 3.0
	instructions.
	* config.in (HAVE_AS_POWER9): Likewise.
	* configure: Regenerate.

	* doc/invoke.texi (RS/6000 and PowerPC Options): Document ISA 3.0
	switches.



[-- Attachment #2: gcc-power9.official-01b --]
[-- Type: text/plain, Size: 30011 bytes --]

Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 229970)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -601,6 +601,34 @@ moptimize-swaps
 Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save
 Analyze and remove doubleword swaps from VSX computations.
 
+mpower9-fusion
+Target Report Mask(P9_FUSION) Var(rs6000_isa_flags)
+Fuse certain operations together for better performance on power9.
+
+mpower9-vector
+Target Report Mask(P9_VECTOR) Var(rs6000_isa_flags)
+Use/do not use vector and scalar instructions added in ISA 3.0.
+
+mpower9-dform
+Target Undocumented Mask(P9_DFORM) Var(rs6000_isa_flags)
+Use/do not use vector and scalar instructions added in ISA 3.0.
+
+mpower9-minmax
+Target Undocumented Mask(P9_MINMAX) Var(rs6000_isa_flags)
+Use/do not use the new min/max instructions defined in ISA 3.0.
+
+mtoc-fusion
+Target Undocumented Mask(TOC_FUSION) Var(rs6000_isa_flags)
+Fuse medium/large code model toc references with the memory instruction.
+
+mmodulo
+Target Report Mask(MODULO) Var(rs6000_isa_flags)
+Generate the integer modulo instructions.
+
 mfloat128
 Target Report Mask(FLOAT128) Var(rs6000_isa_flags)
 Enable/disable IEEE 128-bit floating point via the __float128 keyword.
+
+mfloat128-hardware
+Target Report Mask(FLOAT128_HW) Var(rs6000_isa_flags)
+Enable/disable using IEEE 128-bit floating point instructions.
Index: gcc/config/rs6000/rs6000-cpus.def
===================================================================
--- gcc/config/rs6000/rs6000-cpus.def	(revision 229970)
+++ gcc/config/rs6000/rs6000-cpus.def	(working copy)
@@ -60,6 +60,15 @@
   				 | OPTION_MASK_QUAD_MEMORY_ATOMIC	\
 				 | OPTION_MASK_UPPER_REGS_SF)
 
+/* Add ISEL back into ISA 3.0, since it is supposed to be a win.  Do not add
+   P9_DFORM or P9_MINMAX until they are fully debugged.  */
+#define ISA_3_0_MASKS_SERVER	(ISA_2_7_MASKS_SERVER			\
+				 | OPTION_MASK_FLOAT128_HW		\
+				 | OPTION_MASK_ISEL			\
+				 | OPTION_MASK_MODULO			\
+				 | OPTION_MASK_P9_FUSION		\
+				 | OPTION_MASK_P9_VECTOR)
+
 #define POWERPC_7400_MASK	(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_ALTIVEC)
 
 /* Deal with ports that do not have -mstrict-align.  */
@@ -87,10 +96,15 @@
 				 | OPTION_MASK_ISEL			\
 				 | OPTION_MASK_MFCRF			\
 				 | OPTION_MASK_MFPGPR			\
+				 | OPTION_MASK_MODULO			\
 				 | OPTION_MASK_MULHW			\
 				 | OPTION_MASK_NO_UPDATE		\
 				 | OPTION_MASK_P8_FUSION		\
 				 | OPTION_MASK_P8_VECTOR		\
+				 | OPTION_MASK_P9_DFORM			\
+				 | OPTION_MASK_P9_FUSION		\
+				 | OPTION_MASK_P9_MINMAX		\
+				 | OPTION_MASK_P9_VECTOR		\
 				 | OPTION_MASK_POPCNTB			\
 				 | OPTION_MASK_POPCNTD			\
 				 | OPTION_MASK_POWERPC64		\
@@ -101,6 +115,7 @@
 				 | OPTION_MASK_RECIP_PRECISION		\
 				 | OPTION_MASK_SOFT_FLOAT		\
 				 | OPTION_MASK_STRICT_ALIGN_OPTIONAL	\
+				 | OPTION_MASK_TOC_FUSION		\
 				 | OPTION_MASK_UPPER_REGS_DF		\
 				 | OPTION_MASK_UPPER_REGS_SF		\
 				 | OPTION_MASK_VSX			\
@@ -195,6 +210,7 @@ RS6000_CPU ("power7", PROCESSOR_POWER7, 
 	    | MASK_POPCNTB | MASK_FPRND | MASK_CMPB | MASK_DFP | MASK_POPCNTD
 	    | MASK_VSX | MASK_RECIP_PRECISION | OPTION_MASK_UPPER_REGS_DF)
 RS6000_CPU ("power8", PROCESSOR_POWER8, MASK_POWERPC64 | ISA_2_7_MASKS_SERVER)
+RS6000_CPU ("power9", PROCESSOR_POWER9, MASK_POWERPC64 | ISA_3_0_MASKS_SERVER)
 RS6000_CPU ("powerpc", PROCESSOR_POWERPC, 0)
 RS6000_CPU ("powerpc64", PROCESSOR_POWERPC64, MASK_PPC_GFXOPT | MASK_POWERPC64)
 RS6000_CPU ("powerpc64le", PROCESSOR_POWER8, MASK_POWERPC64 | ISA_2_7_MASKS_SERVER)
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 229970)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -95,6 +95,12 @@
 #define ASM_CPU_POWER8_SPEC ASM_CPU_POWER7_SPEC
 #endif
 
+#ifdef HAVE_AS_POWER9
+#define ASM_CPU_POWER9_SPEC "-mpower9"
+#else
+#define ASM_CPU_POWER9_SPEC ASM_CPU_POWER8_SPEC
+#endif
+
 #ifdef HAVE_AS_DCI
 #define ASM_CPU_476_SPEC "-m476"
 #else
@@ -119,6 +125,7 @@
 %{mcpu=power6x: %(asm_cpu_power6) -maltivec} \
 %{mcpu=power7: %(asm_cpu_power7)} \
 %{mcpu=power8: %(asm_cpu_power8)} \
+%{mcpu=power9: %(asm_cpu_power9)} \
 %{mcpu=a2: -ma2} \
 %{mcpu=powerpc: -mppc} \
 %{mcpu=rs64a: -mppc64} \
@@ -193,6 +200,7 @@
   { "asm_cpu_power6",		ASM_CPU_POWER6_SPEC },			\
   { "asm_cpu_power7",		ASM_CPU_POWER7_SPEC },			\
   { "asm_cpu_power8",		ASM_CPU_POWER8_SPEC },			\
+  { "asm_cpu_power9",		ASM_CPU_POWER9_SPEC },			\
   { "asm_cpu_476",		ASM_CPU_476_SPEC },			\
   SUBTARGET_EXTRA_SPECS
 
Index: gcc/config/rs6000/rs6000-opts.h
===================================================================
--- gcc/config/rs6000/rs6000-opts.h	(revision 229970)
+++ gcc/config/rs6000/rs6000-opts.h	(working copy)
@@ -60,6 +60,7 @@ enum processor_type
    PROCESSOR_POWER6,
    PROCESSOR_POWER7,
    PROCESSOR_POWER8,
+   PROCESSOR_POWER9,
 
    PROCESSOR_RS64A,
    PROCESSOR_MPCCORE,
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 229970)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -985,6 +985,26 @@ struct processor_costs power8_cost = {
   COSTS_N_INSNS (3),	/* SF->DF convert */
 };
 
+/* Instruction costs on POWER9 processors.  */
+static const
+struct processor_costs power9_cost = {
+  COSTS_N_INSNS (3),	/* mulsi */
+  COSTS_N_INSNS (3),	/* mulsi_const */
+  COSTS_N_INSNS (3),	/* mulsi_const9 */
+  COSTS_N_INSNS (3),	/* muldi */
+  COSTS_N_INSNS (19),	/* divsi */
+  COSTS_N_INSNS (35),	/* divdi */
+  COSTS_N_INSNS (3),	/* fp */
+  COSTS_N_INSNS (3),	/* dmul */
+  COSTS_N_INSNS (14),	/* sdiv */
+  COSTS_N_INSNS (17),	/* ddiv */
+  128,			/* cache line size */
+  32,			/* l1 cache */
+  256,			/* l2 cache */
+  12,			/* prefetch streams */
+  COSTS_N_INSNS (3),	/* SF->DF convert */
+};
+
 /* Instruction costs on POWER A2 processors.  */
 static const
 struct processor_costs ppca2_cost = {
@@ -2423,8 +2443,18 @@ rs6000_debug_reg_global (void)
     fprintf (stderr, DEBUG_FMT_S, "lra", "true");
 
   if (TARGET_P8_FUSION)
-    fprintf (stderr, DEBUG_FMT_S, "p8 fusion",
-	     (TARGET_P8_FUSION_SIGN) ? "zero+sign" : "zero");
+    {
+      char options[80];
+
+      strcpy (options, (TARGET_P9_FUSION) ? "power9" : "power8");
+      if (TARGET_TOC_FUSION)
+	strcat (options, ", toc");
+
+      if (TARGET_P8_FUSION_SIGN)
+	strcat (options, ", sign");
+
+      fprintf (stderr, DEBUG_FMT_S, "fusion", options);
+    }
 
   fprintf (stderr, DEBUG_FMT_S, "plt-format",
 	   TARGET_SECURE_PLT ? "secure" : "bss");
@@ -2463,6 +2493,7 @@ rs6000_setup_reg_addr_masks (void)
   for (m = 0; m < NUM_MACHINE_MODES; ++m)
     {
       machine_mode m2 = (machine_mode)m;
+      unsigned short msize = GET_MODE_SIZE (m2);
 
       /* SDmode is special in that we want to access it only via REG+REG
 	 addressing on power7 and above, since we want to use the LFIWZX and
@@ -2492,16 +2523,18 @@ rs6000_setup_reg_addr_masks (void)
 	      /* Figure out if we can do PRE_INC, PRE_DEC, or PRE_MODIFY
 		 addressing.  Restrict addressing on SPE for 64-bit types
 		 because of the SUBREG hackery used to address 64-bit floats in
-		 '32-bit' GPRs.  */
+		 '32-bit' GPRs.  If we allow scalars into Altivec registers,
+		 don't allow PRE_INC, PRE_DEC, or PRE_MODIFY.  */
 
 	      if (TARGET_UPDATE
 		  && (rc == RELOAD_REG_GPR || rc == RELOAD_REG_FPR)
-		  && GET_MODE_SIZE (m2) <= 8
+		  && msize <= 8
 		  && !VECTOR_MODE_P (m2)
 		  && !FLOAT128_VECTOR_P (m2)
 		  && !COMPLEX_MODE_P (m2)
-		  && !indexed_only_p
-		  && !(TARGET_E500_DOUBLE && GET_MODE_SIZE (m2) == 8))
+		  && (m2 != DFmode || !TARGET_UPPER_REGS_DF)
+		  && (m2 != SFmode || !TARGET_UPPER_REGS_SF)
+		  && !(TARGET_E500_DOUBLE && msize == 8))
 		{
 		  addr_mask |= RELOAD_REG_PRE_INCDEC;
 
@@ -2536,7 +2569,7 @@ rs6000_setup_reg_addr_masks (void)
 
 	  /* VMX registers can do (REG & -16) and ((REG+REG) & -16)
 	     addressing on 128-bit types.  */
-	  if (rc == RELOAD_REG_VMX && GET_MODE_SIZE (m2) == 16
+	  if (rc == RELOAD_REG_VMX && msize == 16
 	      && (addr_mask & RELOAD_REG_VALID) != 0)
 	    addr_mask |= RELOAD_REG_AND_M16;
 
@@ -3382,7 +3415,22 @@ rs6000_option_override_internal (bool gl
   if (rs6000_tune_index >= 0)
     tune_index = rs6000_tune_index;
   else if (have_cpu)
-    rs6000_tune_index = tune_index = cpu_index;
+    {
+      /* Until power9 tuning is available, use power8 tuning if -mcpu=power9.  */
+      if (processor_target_table[cpu_index].processor != PROCESSOR_POWER9)
+	rs6000_tune_index = tune_index = cpu_index;
+      else
+	{
+	  size_t i;
+	  tune_index = -1;
+	  for (i = 0; i < ARRAY_SIZE (processor_target_table); i++)
+	    if (processor_target_table[i].processor == PROCESSOR_POWER8)
+	      {
+		rs6000_tune_index = tune_index = i;
+		break;
+	      }
+	}
+    }
   else
     {
       size_t i;
@@ -3557,7 +3605,9 @@ rs6000_option_override_internal (bool gl
 
   /* For the newer switches (vsx, dfp, etc.) set some of the older options,
      unless the user explicitly used the -mno-<option> to disable the code.  */
-  if (TARGET_P8_VECTOR || TARGET_DIRECT_MOVE || TARGET_CRYPTO)
+  if (TARGET_P9_VECTOR || TARGET_MODULO || TARGET_P9_DFORM || TARGET_P9_MINMAX)
+    rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit);
+  else if (TARGET_P8_VECTOR || TARGET_DIRECT_MOVE || TARGET_CRYPTO)
     rs6000_isa_flags |= (ISA_2_7_MASKS_SERVER & ~rs6000_isa_flags_explicit);
   else if (TARGET_VSX)
     rs6000_isa_flags |= (ISA_2_6_MASKS_SERVER & ~rs6000_isa_flags_explicit);
@@ -3703,6 +3753,41 @@ rs6000_option_override_internal (bool gl
     rs6000_isa_flags |= (processor_target_table[tune_index].target_enable
 			 & OPTION_MASK_P8_FUSION);
 
+  /* Setting additional fusion flags turns on base fusion.  */
+  if (!TARGET_P8_FUSION && (TARGET_P8_FUSION_SIGN || TARGET_TOC_FUSION))
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION)
+	{
+	  if (TARGET_P8_FUSION_SIGN)
+	    error ("-mpower8-fusion-sign requires -mpower8-fusion");
+
+	  if (TARGET_TOC_FUSION)
+	    error ("-mtoc-fusion requires -mpower8-fusion");
+
+	  rs6000_isa_flags &= ~OPTION_MASK_P8_FUSION;
+	}
+      else
+	rs6000_isa_flags |= OPTION_MASK_P8_FUSION;
+    }
+
+  /* Power9 fusion is a superset over power8 fusion.  */
+  if (TARGET_P9_FUSION && !TARGET_P8_FUSION)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION)
+	{
+	  error ("-mpower9-fusion requires -mpower8-fusion");
+	  rs6000_isa_flags &= ~OPTION_MASK_P9_FUSION;
+	}
+      else
+	rs6000_isa_flags |= OPTION_MASK_P8_FUSION;
+    }
+
+  /* Enable power9 fusion if we are tuning for power9, even if we aren't
+     generating power9 instructions.  */
+  if (!(rs6000_isa_flags_explicit & OPTION_MASK_P9_FUSION))
+    rs6000_isa_flags |= (processor_target_table[tune_index].target_enable
+			 & OPTION_MASK_P9_FUSION);
+
   /* Power8 does not fuse sign extended loads with the addis.  If we are
      optimizing at high levels for speed, convert a sign extended load into a
      zero extending load, and an explicit sign extension.  */
@@ -3712,6 +3797,58 @@ rs6000_option_override_internal (bool gl
       && optimize >= 3)
     rs6000_isa_flags |= OPTION_MASK_P8_FUSION_SIGN;
 
+  /* TOC fusion requires 64-bit and medium/large code model.  */
+  if (TARGET_TOC_FUSION && !TARGET_POWERPC64)
+    {
+      rs6000_isa_flags &= ~OPTION_MASK_TOC_FUSION;
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_TOC_FUSION) != 0)
+	warning (0, N_("-mtoc-fusion requires 64-bit"));
+    }
+
+  if (TARGET_TOC_FUSION && (TARGET_CMODEL == CMODEL_SMALL))
+    {
+      rs6000_isa_flags &= ~OPTION_MASK_TOC_FUSION;
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_TOC_FUSION) != 0)
+	warning (0, N_("-mtoc-fusion requires medium/large code model"));
+    }
+
+  /* Turn on -mtoc-fusion by default if p8-fusion and 64-bit medium/large code
+     model.  */
+  if (TARGET_P8_FUSION && !TARGET_TOC_FUSION && TARGET_POWERPC64
+      && (TARGET_CMODEL != CMODEL_SMALL)
+      && !(rs6000_isa_flags_explicit & OPTION_MASK_TOC_FUSION))
+    rs6000_isa_flags |= OPTION_MASK_TOC_FUSION;
+
+  /* ISA 3.0 D-form instructions require p9-vector and upper-regs.  */
+  if (TARGET_P9_DFORM && !TARGET_P9_VECTOR)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR)
+	error ("-mpower9-dform requires -mpower9-vector");
+      rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM;
+    }
+
+  if (TARGET_P9_DFORM && !TARGET_UPPER_REGS_DF)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_UPPER_REGS_DF)
+	error ("-mpower9-dform requires -mupper-regs-df");
+      rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM;
+    }
+
+  if (TARGET_P9_DFORM && !TARGET_UPPER_REGS_SF)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_UPPER_REGS_SF)
+	error ("-mpower9-dform requires -mupper-regs-sf");
+      rs6000_isa_flags &= ~OPTION_MASK_P9_DFORM;
+    }
+
+  /* ISA 3.0 vector instructions include ISA 2.07.  */
+  if (TARGET_P9_VECTOR && !TARGET_P8_VECTOR)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR)
+	error ("-mpower9-vector requires -mpower8-vector");
+      rs6000_isa_flags &= ~OPTION_MASK_P9_VECTOR;
+    }
+
   /* Set -mallow-movmisalign to explicitly on if we have full ISA 2.07
      support. If we only have ISA 2.06 support, and the user did not specify
      the switch, leave it set to -1 so the movmisalign patterns are enabled,
@@ -3757,9 +3894,32 @@ rs6000_option_override_internal (bool gl
       if ((rs6000_isa_flags_explicit & OPTION_MASK_FLOAT128) != 0)
 	error ("-mfloat128 requires VSX support");
 
-      rs6000_isa_flags &= ~OPTION_MASK_FLOAT128;
+      rs6000_isa_flags &= ~(OPTION_MASK_FLOAT128 | OPTION_MASK_FLOAT128_HW);
     }
 
+  /* IEEE 128-bit floating point hardware instructions imply enabling
+     __float128.  */
+  if (TARGET_FLOAT128_HW
+      && (rs6000_isa_flags & (OPTION_MASK_P9_VECTOR
+			      | OPTION_MASK_DIRECT_MOVE
+			      | OPTION_MASK_UPPER_REGS_DF
+			      | OPTION_MASK_UPPER_REGS_SF)) == 0)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_FLOAT128_HW) != 0)
+	error ("-mfloat128-hardware requires full ISA 3.0 support");
+
+      rs6000_isa_flags &= ~OPTION_MASK_FLOAT128_HW;
+    }
+
+  else if (TARGET_P9_VECTOR && !TARGET_FLOAT128_HW
+	   && (rs6000_isa_flags_explicit & OPTION_MASK_FLOAT128_HW) == 0)
+    rs6000_isa_flags |= OPTION_MASK_FLOAT128_HW;
+
+  if (TARGET_FLOAT128_HW
+      && (rs6000_isa_flags_explicit & OPTION_MASK_FLOAT128) == 0)
+    rs6000_isa_flags |= OPTION_MASK_FLOAT128;
+
+  /* Print the options after updating the defaults.  */
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after defaults", rs6000_isa_flags);
 
@@ -3957,18 +4117,21 @@ rs6000_option_override_internal (bool gl
 			&& rs6000_cpu != PROCESSOR_POWER6
 			&& rs6000_cpu != PROCESSOR_POWER7
 			&& rs6000_cpu != PROCESSOR_POWER8
+			&& rs6000_cpu != PROCESSOR_POWER9
 			&& rs6000_cpu != PROCESSOR_PPCA2
 			&& rs6000_cpu != PROCESSOR_CELL
 			&& rs6000_cpu != PROCESSOR_PPC476);
   rs6000_sched_groups = (rs6000_cpu == PROCESSOR_POWER4
 			 || rs6000_cpu == PROCESSOR_POWER5
 			 || rs6000_cpu == PROCESSOR_POWER7
-			 || rs6000_cpu == PROCESSOR_POWER8);
+			 || rs6000_cpu == PROCESSOR_POWER8
+			 || rs6000_cpu == PROCESSOR_POWER9);
   rs6000_align_branch_targets = (rs6000_cpu == PROCESSOR_POWER4
 				 || rs6000_cpu == PROCESSOR_POWER5
 				 || rs6000_cpu == PROCESSOR_POWER6
 				 || rs6000_cpu == PROCESSOR_POWER7
 				 || rs6000_cpu == PROCESSOR_POWER8
+				 || rs6000_cpu == PROCESSOR_POWER9
 				 || rs6000_cpu == PROCESSOR_PPCE500MC
 				 || rs6000_cpu == PROCESSOR_PPCE500MC64
 				 || rs6000_cpu == PROCESSOR_PPCE5500
@@ -4216,6 +4379,10 @@ rs6000_option_override_internal (bool gl
 	rs6000_cost = &power8_cost;
 	break;
 
+      case PROCESSOR_POWER9:
+	rs6000_cost = &power9_cost;
+	break;
+
       case PROCESSOR_PPCA2:
 	rs6000_cost = &ppca2_cost;
 	break;
@@ -4396,7 +4563,8 @@ rs6000_loop_align (rtx label)
 	  || rs6000_cpu == PROCESSOR_POWER5
 	  || rs6000_cpu == PROCESSOR_POWER6
 	  || rs6000_cpu == PROCESSOR_POWER7
-	  || rs6000_cpu == PROCESSOR_POWER8))
+	  || rs6000_cpu == PROCESSOR_POWER8
+	  || rs6000_cpu == PROCESSOR_POWER9))
     return 5;
   else
     return align_loops_log;
@@ -5213,7 +5381,9 @@ rs6000_file_start (void)
       || !global_options_set.x_rs6000_cpu_index)
     {
       fputs ("\t.machine ", asm_out_file);
-      if ((rs6000_isa_flags & OPTION_MASK_DIRECT_MOVE) != 0)
+      if ((rs6000_isa_flags & OPTION_MASK_MODULO) != 0)
+	fputs ("power9\n", asm_out_file);
+      else if ((rs6000_isa_flags & OPTION_MASK_DIRECT_MOVE) != 0)
 	fputs ("power8\n", asm_out_file);
       else if ((rs6000_isa_flags & OPTION_MASK_POPCNTD) != 0)
 	fputs ("power7\n", asm_out_file);
@@ -28006,6 +28176,7 @@ rs6000_adjust_cost (rtx_insn *insn, rtx 
                  || rs6000_cpu_attr == CPU_POWER5
 		 || rs6000_cpu_attr == CPU_POWER7
 		 || rs6000_cpu_attr == CPU_POWER8
+		 || rs6000_cpu_attr == CPU_POWER9
                  || rs6000_cpu_attr == CPU_CELL)
                 && recog_memoized (dep_insn)
                 && (INSN_CODE (dep_insn) >= 0))
@@ -28578,6 +28749,7 @@ rs6000_issue_rate (void)
   case CPU_POWER7:
     return 5;
   case CPU_POWER8:
+  case CPU_POWER9:
     return 7;
   default:
     return 1;
@@ -29211,6 +29383,7 @@ insn_must_be_first_in_group (rtx_insn *i
         }
       break;
     case PROCESSOR_POWER8:
+    case PROCESSOR_POWER9:
       type = get_attr_type (insn);
 
       switch (type)
@@ -29341,6 +29514,7 @@ insn_must_be_last_in_group (rtx_insn *in
     }
     break;
   case PROCESSOR_POWER8:
+  case PROCESSOR_POWER9:
     type = get_attr_type (insn);
 
     switch (type)
@@ -29459,7 +29633,7 @@ force_new_group (int sched_verbose, FILE
 
       /* Do we have a special group ending nop? */
       if (rs6000_cpu_attr == CPU_POWER6 || rs6000_cpu_attr == CPU_POWER7
-	  || rs6000_cpu_attr == CPU_POWER8)
+	  || rs6000_cpu_attr == CPU_POWER8 || rs6000_cpu_attr == CPU_POWER9)
 	{
 	  nop = gen_group_ending_nop ();
 	  emit_insn_before (nop, next_insn);
@@ -31959,7 +32133,8 @@ rs6000_register_move_cost (machine_mode 
          expensive than memory in order to bias spills to memory .*/
       else if ((rs6000_cpu == PROCESSOR_POWER6
 		|| rs6000_cpu == PROCESSOR_POWER7
-		|| rs6000_cpu == PROCESSOR_POWER8)
+		|| rs6000_cpu == PROCESSOR_POWER8
+		|| rs6000_cpu == PROCESSOR_POWER9)
 	       && reg_classes_intersect_p (rclass, LINK_OR_CTR_REGS))
         ret = 6 * hard_regno_nregs[0][mode];
 
@@ -33489,12 +33664,14 @@ static struct rs6000_opt_mask const rs60
   { "efficient-unaligned-vsx",	OPTION_MASK_EFFICIENT_UNALIGNED_VSX,
 								false, true  },
   { "float128",			OPTION_MASK_FLOAT128,		false, true  },
+  { "float128-hardware",	OPTION_MASK_FLOAT128_HW,	false, true  },
   { "fprnd",			OPTION_MASK_FPRND,		false, true  },
   { "hard-dfp",			OPTION_MASK_DFP,		false, true  },
   { "htm",			OPTION_MASK_HTM,		false, true  },
   { "isel",			OPTION_MASK_ISEL,		false, true  },
   { "mfcrf",			OPTION_MASK_MFCRF,		false, true  },
   { "mfpgpr",			OPTION_MASK_MFPGPR,		false, true  },
+  { "modulo",			OPTION_MASK_MODULO,		false, true  },
   { "mulhw",			OPTION_MASK_MULHW,		false, true  },
   { "multiple",			OPTION_MASK_MULTIPLE,		false, true  },
   { "popcntb",			OPTION_MASK_POPCNTB,		false, true  },
@@ -33502,6 +33679,10 @@ static struct rs6000_opt_mask const rs60
   { "power8-fusion",		OPTION_MASK_P8_FUSION,		false, true  },
   { "power8-fusion-sign",	OPTION_MASK_P8_FUSION_SIGN,	false, true  },
   { "power8-vector",		OPTION_MASK_P8_VECTOR,		false, true  },
+  { "power9-dform",		OPTION_MASK_P9_DFORM,		false, true  },
+  { "power9-fusion",		OPTION_MASK_P9_FUSION,		false, true  },
+  { "power9-minmax",		OPTION_MASK_P9_MINMAX,		false, true  },
+  { "power9-vector",		OPTION_MASK_P9_VECTOR,		false, true  },
   { "powerpc-gfxopt",		OPTION_MASK_PPC_GFXOPT,		false, true  },
   { "powerpc-gpopt",		OPTION_MASK_PPC_GPOPT,		false, true  },
   { "quad-memory",		OPTION_MASK_QUAD_MEMORY,	false, true  },
@@ -33509,6 +33690,7 @@ static struct rs6000_opt_mask const rs60
   { "recip-precision",		OPTION_MASK_RECIP_PRECISION,	false, true  },
   { "save-toc-indirect",	OPTION_MASK_SAVE_TOC_INDIRECT,	false, true  },
   { "string",			OPTION_MASK_STRING,		false, true  },
+  { "toc-fusion",		OPTION_MASK_TOC_FUSION,		false, true  },
   { "update",			OPTION_MASK_NO_UPDATE,		true , true  },
   { "upper-regs-df",		OPTION_MASK_UPPER_REGS_DF,	false, true  },
   { "upper-regs-sf",		OPTION_MASK_UPPER_REGS_SF,	false, true  },
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 229970)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -252,7 +252,7 @@ (define_attr "cpu"
    ppc750,ppc7400,ppc7450,
    ppc403,ppc405,ppc440,ppc476,
    ppc8540,ppc8548,ppce300c2,ppce300c3,ppce500mc,ppce500mc64,ppce5500,ppce6500,
-   power4,power5,power6,power7,power8,
+   power4,power5,power6,power7,power8,power9,
    rs64a,mpccore,cell,ppca2,titan"
   (const (symbol_ref "rs6000_cpu_attr")))
 
Index: gcc/config/rs6000/rs6000-tables.opt
===================================================================
--- gcc/config/rs6000/rs6000-tables.opt	(revision 229970)
+++ gcc/config/rs6000/rs6000-tables.opt	(working copy)
@@ -180,14 +180,17 @@ EnumValue
 Enum(rs6000_cpu_opt_value) String(power8) Value(50)
 
 EnumValue
-Enum(rs6000_cpu_opt_value) String(powerpc) Value(51)
+Enum(rs6000_cpu_opt_value) String(power9) Value(51)
 
 EnumValue
-Enum(rs6000_cpu_opt_value) String(powerpc64) Value(52)
+Enum(rs6000_cpu_opt_value) String(powerpc) Value(52)
 
 EnumValue
-Enum(rs6000_cpu_opt_value) String(powerpc64le) Value(53)
+Enum(rs6000_cpu_opt_value) String(powerpc64) Value(53)
 
 EnumValue
-Enum(rs6000_cpu_opt_value) String(rs64) Value(54)
+Enum(rs6000_cpu_opt_value) String(powerpc64le) Value(54)
+
+EnumValue
+Enum(rs6000_cpu_opt_value) String(rs64) Value(55)
 
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 229970)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -349,6 +349,8 @@ rs6000_target_modify_macros (bool define
     rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR7");
   if ((flags & OPTION_MASK_DIRECT_MOVE) != 0)
     rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR8");
+  if ((flags & OPTION_MASK_MODULO) != 0)
+    rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
   if ((flags & OPTION_MASK_SOFT_FLOAT) != 0)
     rs6000_define_or_undefine_macro (define_p, "_SOFT_FLOAT");
   if ((flags & OPTION_MASK_RECIP_PRECISION) != 0)
Index: gcc/config/rs6000/aix61.h
===================================================================
--- gcc/config/rs6000/aix61.h	(revision 229970)
+++ gcc/config/rs6000/aix61.h	(working copy)
@@ -80,6 +80,7 @@ do {									\
 %{mcpu=power6x: -mpwr6} \
 %{mcpu=power7: -mpwr7} \
 %{mcpu=power8: -mpwr8} \
+%{mcpu=power9: -mpwr9} \
 %{mcpu=powerpc: -mppc} \
 %{mcpu=rs64a: -mppc} \
 %{mcpu=603: -m603} \
Index: gcc/config/rs6000/aix53.h
===================================================================
--- gcc/config/rs6000/aix53.h	(revision 229970)
+++ gcc/config/rs6000/aix53.h	(working copy)
@@ -63,6 +63,7 @@ do {									\
 %{mcpu=power6x: -mpwr6} \
 %{mcpu=power7: -mpwr7} \
 %{mcpu=power8: -mpwr8} \
+%{mcpu=power9: -mpwr9} \
 %{mcpu=powerpc: -mppc} \
 %{mcpu=rs64a: -mppc} \
 %{mcpu=603: -m603} \
Index: gcc/configure.ac
===================================================================
--- gcc/configure.ac	(revision 229970)
+++ gcc/configure.ac	(working copy)
@@ -4323,6 +4323,19 @@ LCF0:
 	  [Define if your assembler supports POWER8 instructions.])])
 
     case $target in
+      *-*-aix*) conftest_s='	.machine "pwr9"
+	.csect .text[[PR]]';;
+      *) conftest_s='	.machine power9
+	.text';;
+    esac
+
+    gcc_GAS_CHECK_FEATURE([power9 support],
+      gcc_cv_as_powerpc_power9, [2,19,2], -a32,
+      [$conftest_s],,
+      [AC_DEFINE(HAVE_AS_POWER9, 1,
+	  [Define if your assembler supports POWER9 instructions.])])
+
+    case $target in
       *-*-aix*) conftest_s='	.csect .text[[PR]]
 	lwsync';;
       *) conftest_s='	.text
Index: gcc/configure
===================================================================
--- gcc/configure	(revision 229970)
+++ gcc/configure	(working copy)
@@ -26312,6 +26312,48 @@ $as_echo "#define HAVE_AS_POWER8 1" >>co
 fi
 
     case $target in
+      *-*-aix*) conftest_s='	.machine "pwr9"
+	.csect .text[PR]';;
+      *) conftest_s='	.machine power9
+	.text';;
+    esac
+
+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for power9 support" >&5
+$as_echo_n "checking assembler for power9 support... " >&6; }
+if test "${gcc_cv_as_powerpc_power9+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_powerpc_power9=no
+    if test $in_tree_gas = yes; then
+    if test $gcc_cv_gas_vers -ge `expr \( \( 2 \* 1000 \) + 19 \) \* 1000 + 2`
+  then gcc_cv_as_powerpc_power9=yes
+fi
+  elif test x$gcc_cv_as != x; then
+    $as_echo "$conftest_s" > conftest.s
+    if { ac_try='$gcc_cv_as $gcc_cv_as_flags -a32 -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+    then
+	gcc_cv_as_powerpc_power9=yes
+    else
+      echo "configure: failed program was" >&5
+      cat conftest.s >&5
+    fi
+    rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_powerpc_power9" >&5
+$as_echo "$gcc_cv_as_powerpc_power9" >&6; }
+if test $gcc_cv_as_powerpc_power9 = yes; then
+
+$as_echo "#define HAVE_AS_POWER9 1" >>confdefs.h
+
+fi
+
+    case $target in
       *-*-aix*) conftest_s='	.csect .text[PR]
 	lwsync';;
       *) conftest_s='	.text
Index: gcc/config.in
===================================================================
--- gcc/config.in	(revision 229970)
+++ gcc/config.in	(working copy)
@@ -556,6 +556,12 @@
 #endif
 
 
+/* Define if your assembler supports POWER9 instructions. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_POWER9
+#endif
+
+
 /* Define if your assembler supports .ref */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_REF
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 229970)
+++ gcc/doc/invoke.texi	(working copy)
@@ -949,8 +949,9 @@ See RS/6000 and PowerPC Options.
 -mquad-memory-atomic -mno-quad-memory-atomic @gol
 -mcompat-align-parm -mno-compat-align-parm @gol
 -mupper-regs-df -mno-upper-regs-df -mupper-regs-sf -mno-upper-regs-sf @gol
--mupper-regs -mno-upper-regs @gol
--mfloat128 -mno-float128}
+-mupper-regs -mno-upper-regs -mmodulo -mno-modulo @gol
+-mfloat128 -mno-float128 -mfloat128-hardware -mno-float128-hardware @gol
+-mpower9-fusion -mno-power9-fusion -mpower9-vector -mno-power9-vector}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles  -m32bit-doubles  -fpu  -nofpu@gol
@@ -19311,8 +19312,9 @@ Supported values for @var{cpu_type} are 
 @samp{e300c3}, @samp{e500mc}, @samp{e500mc64}, @samp{e5500},
 @samp{e6500}, @samp{ec603e}, @samp{G3}, @samp{G4}, @samp{G5},
 @samp{titan}, @samp{power3}, @samp{power4}, @samp{power5}, @samp{power5+},
-@samp{power6}, @samp{power6x}, @samp{power7}, @samp{power8}, @samp{powerpc},
-@samp{powerpc64}, @samp{powerpc64le}, and @samp{rs64}.
+@samp{power6}, @samp{power6x}, @samp{power7}, @samp{power8},
+@samp{power9}, @samp{powerpc}, @samp{powerpc64}, @samp{powerpc64le},
+and @samp{rs64}.
 
 @option{-mcpu=powerpc}, @option{-mcpu=powerpc64}, and
 @option{-mcpu=powerpc64le} specify pure 32-bit PowerPC (either
@@ -19332,7 +19334,8 @@ following options:
 -mpowerpc-gpopt  -mpowerpc-gfxopt  -msingle-float -mdouble-float @gol
 -msimple-fpu -mstring  -mmulhw  -mdlmzb  -mmfpgpr -mvsx @gol
 -mcrypto -mdirect-move -mpower8-fusion -mpower8-vector @gol
--mquad-memory -mquad-memory-atomic}
+-mquad-memory -mquad-memory-atomic -mmodulo -mfloat128 -mfloat128-hardware @gol
+-mpower9-fusion -mpower9-vector}
 
 The particular options set for any particular CPU varies between
 compiler versions, depending on what setting seems to produce optimal
@@ -19569,12 +19572,45 @@ If the @option{-mno-upper-regs} option i
 @opindex mfloat128
 @opindex mno-float128
 Enable/disable the @var{__float128} keyword for IEEE 128-bit floating point
-and use software emulation for IEEE 128-bit floating point.
+and use either software emulation or hardware instructions for IEEE
+128-bit floating point.
 
 The VSX instruction set (@option{-mvsx}, @option{-mcpu=power7}, or
 @option{-mcpu=power8}) must be enabled to use the @option{-mfloat128}
 option.
 
+@item -mfloat128-hardware
+@itemx -mno-float128-hardware
+@opindex mfloat128-hardware
+@opindex mno-float128-hardware
+Enable/disable using ISA 3.0 hardware instructions to support the
+@var{__float128} data type.
+
+@item -mmodulo
+@itemx -mno-modulo
+@opindex mmodulo
+@opindex mno-modulo
+Generate code that uses (does not use) the ISA 3.0 integer modulo
+instructions.  The @option{-mmodulo} option is enabled by default
+with the @option{-mcpu=power9} option.
+
+@item -mpower9-fusion
+@itemx -mno-power9-fusion
+@opindex mpower9-fusion
+@opindex mno-power9-fusion
+Generate code that keeps (does not keep) some operations adjacent so
+that the instructions can be fused together on power9 and later
+processors.
+
+@item -mpower9-vector
+@itemx -mno-power9-vector
+@opindex mpower9-vector
+@opindex mno-power9-vector
+Generate code that uses (does not use) the vector and scalar
+instructions that were added in version 3.0 of the PowerPC ISA.  Also
+enable the use of built-in functions that allow more direct access to
+the vector instructions.
+
 @item -mfloat-gprs=@var{yes/single/double/no}
 @itemx -mfloat-gprs
 @opindex mfloat-gprs

* Re: [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions)
From: Michael Meissner @ 2015-11-09  0:36 UTC
  To: Michael Meissner, gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 1214 bytes --]

This is patch #2.  It adds support for the new modulus instructions that are
being added in ISA 3.0 (power9).

I have built this patch (along with patches #3 and #4) with a bootstrap build
on a power8 little endian system.  There were no regressions in the test
suite.  Is this patch ok to install in the trunk once patch #1 has been
installed?
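
A minimal C sketch of the two remainder computations involved, assuming 32-bit operands; the helper names are hypothetical and not part of GCC. rem_via_divide follows the divide/multiply/subtract sequence the new peephole2 patterns substitute when a quotient and a remainder of the same operands are both needed, and rem_pow2_via_shift follows the divide/shift/subtract expansion that mod<mode>3 keeps for power-of-two divisors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: remainder computed the way the divmod peephole2
   rewrites it when the quotient is also needed: one divide, then a
   multiply and a subtract, instead of a divide plus a mods.  */
static int32_t
rem_via_divide (int32_t a, int32_t b, int32_t *quotient)
{
  /* The same cases that C leaves undefined for a / b.  */
  assert (b != 0 && !(a == INT32_MIN && b == -1));
  int32_t q = a / b;          /* divw */
  *quotient = q;
  return a - q * b;           /* mullw, subf; |q * b| <= |a|, no overflow */
}

/* Hypothetical helper: the divide/shift/subtract expansion that
   mod<mode>3 keeps for a power-of-two divisor (1 << log2).  The RTL
   shifts left; multiplying here avoids C's undefined left shift of a
   negative quotient.  */
static int32_t
rem_pow2_via_shift (int32_t a, int log2)
{
  assert (log2 >= 0 && log2 <= 30);
  int32_t divisor = (int32_t) 1 << log2;
  int32_t q = a / divisor;    /* truncates toward zero, like divw */
  return a - q * divisor;     /* equals a % divisor */
}
```

When only the remainder is live and TARGET_MODULO is set, the new *mod<mode>3 and umod<mode>3 patterns emit a single mods<wd> or modu<wd> instead, which is what mod-1.c and mod-2.c scan for.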

[gcc]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_rtx_costs): Update costs for
	modulus instructions if we have hardware support.

	* config/rs6000/rs6000.md (mod<mode>3): Add support for ISA 3.0
	modulus instructions.
	(umod<mode>3): Likewise.
	(divmod peephole): Likewise.
	(udivmod peephole): Likewise.

[gcc/testsuite]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* lib/target-supports.exp (check_p9vector_hw_available): Add
	checks for power9 availability.
	(check_effective_target_powerpc_p9vector_ok): Likewise.
	(check_vect_support_and_set_flags): Likewise.

	* gcc.target/powerpc/mod-1.c: New test.
	* gcc.target/powerpc/mod-2.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power9.official-02b --]
[-- Type: text/plain, Size: 10300 bytes --]

Index: gcc/testsuite/gcc.target/powerpc/mod-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/mod-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/mod-1.c	(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O3" } */
+
+int ismod (int a, int b) { return a%b; }
+long lsmod (long a, long b) { return a%b; }
+unsigned int iumod (unsigned int a, unsigned int b) { return a%b; }
+unsigned long lumod (unsigned long a, unsigned long b) { return a%b; }
+
+/* { dg-final { scan-assembler-times "modsw " 1 } } */
+/* { dg-final { scan-assembler-times "modsd " 1 } } */
+/* { dg-final { scan-assembler-times "moduw " 1 } } */
+/* { dg-final { scan-assembler-times "modud " 1 } } */
+/* { dg-final { scan-assembler-not   "mullw "   } } */
+/* { dg-final { scan-assembler-not   "mulld "   } } */
+/* { dg-final { scan-assembler-not   "divw "    } } */
+/* { dg-final { scan-assembler-not   "divd "    } } */
+/* { dg-final { scan-assembler-not   "divwu "   } } */
+/* { dg-final { scan-assembler-not   "divdu "   } } */
Index: gcc/testsuite/gcc.target/powerpc/mod-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/mod-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/mod-2.c	(revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { powerpc*-*-* && ilp32 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O3" } */
+
+int ismod (int a, int b) { return a%b; }
+unsigned int iumod (unsigned int a, unsigned int b) { return a%b; }
+
+/* { dg-final { scan-assembler-times "modsw " 1 } } */
+/* { dg-final { scan-assembler-times "moduw " 1 } } */
+/* { dg-final { scan-assembler-not   "mullw "   } } */
+/* { dg-final { scan-assembler-not   "divw "    } } */
+/* { dg-final { scan-assembler-not   "divwu "   } } */
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	(revision 229970)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -1635,6 +1635,30 @@ proc check_p8vector_hw_available { } {
     }]
 }
 
+# Return 1 if the target supports executing power9 vector instructions, 0
+# otherwise.  Cache the result.
+
+proc check_p9vector_hw_available { } {
+    return [check_cached_effective_target p9vector_hw_available {
+	# Some simulators are known to not support VSX/power9 instructions.
+	# For now, disable on Darwin.
+	if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || [istarget *-*-darwin*]} {
+	    expr 0
+	} else {
+	    set options "-mpower9-vector"
+	    check_runtime_nocache p9vector_hw_available {
+		int main()
+		{
+		  long e = -1;
+		  vector double v = (vector double) { 0.0, 0.0 };
+		  asm ("xsxexpdp %0,%1" : "+r" (e) : "wa" (v));
+		  return e;
+		}
+	    } $options
+	}
+    }]
+}
+
 # Return 1 if the target supports executing VSX instructions, 0
 # otherwise.  Cache the result.
 
@@ -3358,6 +3382,31 @@ proc check_effective_target_powerpc_p8ve
     }
 }
 
+# Return 1 if this is a PowerPC target supporting -mpower9-vector
+
+proc check_effective_target_powerpc_p9vector_ok { } {
+    if { ([istarget powerpc*-*-*]
+         && ![istarget powerpc-*-linux*paired*])
+	 || [istarget rs6000-*-*] } {
+	# AltiVec is not supported on AIX before 5.3.
+	if { [istarget powerpc*-*-aix4*]
+	     || [istarget powerpc*-*-aix5.1*]
+	     || [istarget powerpc*-*-aix5.2*] } {
+	    return 0
+	}
+	return [check_no_compiler_messages powerpc_p9vector_ok object {
+	    int main (void) {
+		long e = -1;
+		vector double v = (vector double) { 0.0, 0.0 };
+		asm ("xsxexpdp %0,%1" : "+r" (e) : "wa" (v));
+		return e;
+	    }
+	} "-mpower9-vector"]
+    } else {
+	return 0
+    }
+}
+
 # Return 1 if this is a PowerPC target supporting -mvsx
 
 proc check_effective_target_powerpc_vsx_ok { } {
@@ -5459,6 +5508,7 @@ proc is-effective-target { arg } {
 	  "vmx_hw"         { set selected [check_vmx_hw_available] }
 	  "vsx_hw"         { set selected [check_vsx_hw_available] }
 	  "p8vector_hw"    { set selected [check_p8vector_hw_available] }
+	  "p9vector_hw"    { set selected [check_p9vector_hw_available] }
 	  "ppc_recip_hw"   { set selected [check_ppc_recip_hw_available] }
 	  "dfp_hw"         { set selected [check_dfp_hw_available] }
 	  "htm_hw"         { set selected [check_htm_hw_available] }
@@ -5483,6 +5533,7 @@ proc is-effective-target-keyword { arg }
 	  "vmx_hw"         { return 1 }
 	  "vsx_hw"         { return 1 }
 	  "p8vector_hw"    { return 1 }
+	  "p9vector_hw"    { return 1 }
 	  "ppc_recip_hw"   { return 1 }
 	  "dfp_hw"         { return 1 }
 	  "htm_hw"         { return 1 }
@@ -6186,7 +6237,9 @@ proc check_vect_support_and_set_flags { 
         }
 
         lappend DEFAULT_VECTCFLAGS "-maltivec"
-        if [check_p8vector_hw_available] {
+        if [check_p9vector_hw_available] {
+            lappend DEFAULT_VECTCFLAGS "-mpower9-vector"
+        } elseif [check_p8vector_hw_available] {
             lappend DEFAULT_VECTCFLAGS "-mpower8-vector"
         } elseif [check_vsx_hw_available] {
             lappend DEFAULT_VECTCFLAGS "-mvsx" "-mno-allow-movmisalign"
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 229972)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -31844,8 +31844,8 @@ rs6000_rtx_costs (rtx x, machine_mode mo
 	  else
 	    *total = rs6000_cost->divsi;
 	}
-      /* Add in shift and subtract for MOD. */
-      if (code == MOD || code == UMOD)
+      /* Add in shift and subtract for MOD unless we have a mod instruction. */
+      if (!TARGET_MODULO && (code == MOD || code == UMOD))
 	*total += COSTS_N_INSNS (2);
       return false;
 
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 229972)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -2885,9 +2885,9 @@ (define_insn_and_split "*div<mode>3_sra_
    (set_attr "cell_micro" "not")])
 
 (define_expand "mod<mode>3"
-  [(use (match_operand:GPR 0 "gpc_reg_operand" ""))
-   (use (match_operand:GPR 1 "gpc_reg_operand" ""))
-   (use (match_operand:GPR 2 "reg_or_cint_operand" ""))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
+	(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
+		 (match_operand:GPR 2 "reg_or_cint_operand" "")))]
   ""
 {
   int i;
@@ -2897,16 +2897,93 @@ (define_expand "mod<mode>3"
   if (GET_CODE (operands[2]) != CONST_INT
       || INTVAL (operands[2]) <= 0
       || (i = exact_log2 (INTVAL (operands[2]))) < 0)
-    FAIL;
+    {
+      if (!TARGET_MODULO)
+	FAIL;
 
-  temp1 = gen_reg_rtx (<MODE>mode);
-  temp2 = gen_reg_rtx (<MODE>mode);
+      operands[2] = force_reg (<MODE>mode, operands[2]);
+    }
+  else
+    {
+      temp1 = gen_reg_rtx (<MODE>mode);
+      temp2 = gen_reg_rtx (<MODE>mode);
 
-  emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
-  emit_insn (gen_ashl<mode>3 (temp2, temp1, GEN_INT (i)));
-  emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
-  DONE;
+      emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
+      emit_insn (gen_ashl<mode>3 (temp2, temp1, GEN_INT (i)));
+      emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+      DONE;
+    }
 })
+
+;; In order to enable using a peephole2 for combining div/mod to eliminate the
+;; mod, prefer putting the result of mod into a different register.
+(define_insn "*mod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r")
+        (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
+		 (match_operand:GPR 2 "gpc_reg_operand" "r")))]
+  "TARGET_MODULO"
+  "mods<wd> %0,%1,%2"
+  [(set_attr "type" "div")
+   (set_attr "size" "<bits>")])
+
+
+(define_insn "umod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r")
+        (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
+		  (match_operand:GPR 2 "gpc_reg_operand" "r")))]
+  "TARGET_MODULO"
+  "modu<wd> %0,%1,%2"
+  [(set_attr "type" "div")
+   (set_attr "size" "<bits>")])
+
+;; On machines with modulo support, do a combined div/mod the old-fashioned
+;; way, since the multiply/subtract is faster than doing the mod instruction
+;; after a divide.
+
+(define_peephole2
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
+	(div:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
+		 (match_operand:GPR 2 "gpc_reg_operand" "")))
+   (set (match_operand:GPR 3 "gpc_reg_operand" "")
+	(mod:GPR (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_MODULO
+   && ! reg_mentioned_p (operands[0], operands[1])
+   && ! reg_mentioned_p (operands[0], operands[2])
+   && ! reg_mentioned_p (operands[3], operands[1])
+   && ! reg_mentioned_p (operands[3], operands[2])"
+  [(set (match_dup 0)
+	(div:GPR (match_dup 1)
+		 (match_dup 2)))
+   (set (match_dup 3)
+	(mult:GPR (match_dup 0)
+		  (match_dup 2)))
+   (set (match_dup 3)
+	(minus:GPR (match_dup 1)
+		   (match_dup 3)))])
+
+(define_peephole2
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
+	(udiv:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
+		  (match_operand:GPR 2 "gpc_reg_operand" "")))
+   (set (match_operand:GPR 3 "gpc_reg_operand" "")
+	(umod:GPR (match_dup 1)
+		  (match_dup 2)))]
+  "TARGET_MODULO
+   && ! reg_mentioned_p (operands[0], operands[1])
+   && ! reg_mentioned_p (operands[0], operands[2])
+   && ! reg_mentioned_p (operands[3], operands[1])
+   && ! reg_mentioned_p (operands[3], operands[2])"
+  [(set (match_dup 0)
+	(udiv:GPR (match_dup 1)
+		  (match_dup 2)))
+   (set (match_dup 3)
+	(mult:GPR (match_dup 0)
+		  (match_dup 2)))
+   (set (match_dup 3)
+	(minus:GPR (match_dup 1)
+		   (match_dup 3)))])
+
 \f
 ;; Logical instructions
 ;; The logical instructions are mostly combined by using match_operator,

* Re: [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros)
From: Michael Meissner @ 2015-11-09  0:38 UTC
  To: Michael Meissner, gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 1072 bytes --]

This patch adds support for the scalar count trailing zeros instructions
(cnttzw and cnttzd) that are being added to ISA 3.0 (power9).

I have built this patch (along with patches #2 and #4) with a bootstrap build
on a power8 little endian system.  There were no regressions in the test
suite.  Is this patch ok to install in the trunk once patch #1 has been
installed?
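
As a sketch of what the new ctz<mode>2_hw pattern replaces, the C below models the existing fallback in ctz<mode>2 (negate, AND to isolate the lowest set bit, count leading zeros, subtract from GET_MODE_BITSIZE - 1) next to a direct trailing-zero count. The function names are hypothetical, and the bit loops only stand in for cntlzw/cntlzd and cnttzw/cnttzd.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros in the low BITS bits of X, returning BITS when
   X is zero, like cntlzw/cntlzd.  */
static int
clz_bits (uint64_t x, int bits)
{
  int n = 0;
  for (int i = bits - 1; i >= 0 && !((x >> i) & 1); i--)
    n++;
  return n;
}

/* Hypothetical model of the pre-ISA 3.0 ctz<mode>2 expansion.
   BITS is 32 (SImode) or 64 (DImode); BITS - 1 plays the role of
   operands[5] in the expander.  */
static int
ctz_fallback (uint64_t x, int bits)
{
  assert (bits == 32 || bits == 64);
  x &= (bits == 64) ? UINT64_MAX : UINT32_MAX;
  uint64_t lowest = x & (0 - x);          /* neg + and */
  return (bits - 1) - clz_bits (lowest, bits);
}

/* Model of cnttzw/cnttzd: count trailing zeros directly, returning
   BITS for a zero input.  */
static int
ctz_hw (uint64_t x, int bits)
{
  assert (bits == 32 || bits == 64);
  int n = 0;
  while (n < bits && !((x >> n) & 1))
    n++;
  return n;
}
```

The two agree for every nonzero input.  For zero, the fallback yields -1 while the ISA 3.0 count instructions return the operand width; that difference is harmless here because __builtin_ctz and friends are undefined for a zero argument.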

[gcc]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_rtx_costs): Update costs for
	count trailing zero instruction if we have hardware support.

	* config/rs6000/rs6000.h (TARGET_CTZ): Add support for count
	trailing zero instruction in ISA 3.0.
	* config/rs6000/rs6000.md (ctz<mode>2): Likewise.
	(ctz<mode>2_hw): Likewise.

[gcc/testsuite]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/ctz-1.c: Add test for count trailing zero
	instruction support.
	* gcc.target/powerpc/ctz-2.c: Likewise.

[-- Attachment #2: gcc-power9.official-03b --]
[-- Type: text/plain, Size: 3628 bytes --]

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 229973)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -31850,6 +31850,9 @@ rs6000_rtx_costs (rtx x, machine_mode mo
       return false;
 
     case CTZ:
+      *total = COSTS_N_INSNS (TARGET_CTZ ? 1 : 4);
+      return false;
+
     case FFS:
       *total = COSTS_N_INSNS (4);
       return false;
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 229972)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -565,6 +565,7 @@ extern int rs6000_vector_align[];
 #define TARGET_FCFIDUS	TARGET_POPCNTD
 #define TARGET_FCTIDUZ	TARGET_POPCNTD
 #define TARGET_FCTIWUZ	TARGET_POPCNTD
+#define TARGET_CTZ	TARGET_MODULO
 
 #define TARGET_XSCVDPSPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_XSCVSPDPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 229973)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -2101,12 +2101,25 @@ (define_expand "ctz<mode>2"
 	      (clobber (reg:GPR CA_REGNO))])]
   ""
 {
+  if (TARGET_CTZ)
+    {
+      emit_insn (gen_ctz<mode>2_hw (operands[0], operands[1]));
+      DONE;
+    }
+
   operands[2] = gen_reg_rtx (<MODE>mode);
   operands[3] = gen_reg_rtx (<MODE>mode);
   operands[4] = gen_reg_rtx (<MODE>mode);
   operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - 1);
 })
 
+(define_insn "ctz<mode>2_hw"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+	(ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))]
+  "TARGET_CTZ"
+  "cnttz<wd> %0,%1"
+  [(set_attr "type" "cntlz")])
+
 (define_expand "ffs<mode>2"
   [(set (match_dup 2)
 	(neg:GPR (match_operand:GPR 1 "gpc_reg_operand" "")))
Index: gcc/testsuite/gcc.target/powerpc/ctz-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ctz-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ctz-1.c	(revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O3" } */
+
+int i_trailing_zero (int a) { return __builtin_ctz (a); }
+int l_trailing_zero (long a) { return __builtin_ctzl (a); }
+int ll_trailing_zero (long long a) { return __builtin_ctzll (a); }
+
+/* { dg-final { scan-assembler     "cnttzw " } } */
+/* { dg-final { scan-assembler     "cnttzd " } } */
+/* { dg-final { scan-assembler-not "cntlzw " } } */
+/* { dg-final { scan-assembler-not "cntlzd " } } */
Index: gcc/testsuite/gcc.target/powerpc/ctz-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ctz-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ctz-2.c	(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { powerpc*-*-* && ilp32 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O3" } */
+
+int i_trailing_zero (int a) { return __builtin_ctz (a); }
+
+/* { dg-final { scan-assembler     "cnttzw " } } */
+/* { dg-final { scan-assembler-not "cntlzw " } } */

* Re: [PATCH], Add power9 support to GCC, patch #4
From: Michael Meissner @ 2015-11-09  0:39 UTC
  To: Michael Meissner, gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 1224 bytes --]

This patch adds support for the EXTSWSLI instruction that is being added to
PowerPC ISA 3.0 (power9).

I have built this patch (along with patches #2 and #3) with a bootstrap build
on a power8 little endian system.  There were no regressions in the test
suite.  Is this patch ok to install in the trunk once patch #1 has been
installed?
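
EXTSWSLI fuses the extsw and sldi pair that the extswsli tests check is no longer emitted.  A minimal C model of its semantics, including the record form that sets CR0, might look like this; the function names are hypothetical, and the code assumes GCC's documented modulo behavior for out-of-range signed conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of extswsli RA,RS,SH: sign-extend the low 32 bits
   of RS to 64 bits, then shift left by SH.  The shift is done on the
   unsigned bit pattern because left-shifting a negative signed value
   is undefined in C.  */
static int64_t
extswsli (uint64_t rs, unsigned sh)
{
  assert (sh <= 63);
  int64_t extended = (int32_t) (uint32_t) rs;    /* extsw */
  uint64_t shifted = (uint64_t) extended << sh;  /* sldi  */
  return (int64_t) shifted;  /* GCC converts modulo 2^64 */
}

/* Hypothetical model of the record form extswsli.: the result plus
   the CR0 "equal to zero" condition that func1 in extswsli-2.c
   branches on.  */
static int64_t
extswsli_dot (uint64_t rs, unsigned sh, int *is_zero)
{
  int64_t result = extswsli (rs, sh);
  *is_zero = (result == 0);
  return result;
}
```

The sh <= 63 check mirrors the range that the new u6bit_cint_operand predicate accepts for the shift count.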

[gcc]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/predicates.md (u6bit_cint_operand): New
	predicate, recognize 0..63.

	* config/rs6000/rs6000.c (rs6000_rtx_costs): Adjust the costs if
	the EXTSWSLI instruction is generated.

	* config/rs6000/rs6000.h (TARGET_EXTSWSLI): Add support for ISA
	3.0 EXTSWSLI instruction.
	* config/rs6000/rs6000.md (ashdi3_extswsli): Likewise.
	(ashdi3_extswsli_dot): Likewise.
	(ashdi3_extswsli_dot2): Likewise.

[gcc/testsuite]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/extswsli-1.c: New file to test EXTSWSLI
	instruction generation.
	* gcc.target/powerpc/extswsli-2.c: Likewise.
	* gcc.target/powerpc/extswsli-3.c: Likewise.

[-- Attachment #2: gcc-power9.official-04b --]
[-- Type: text/plain, Size: 8814 bytes --]

Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 229970)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -142,6 +142,11 @@ (define_predicate "u5bit_cint_operand"
   (and (match_code "const_int")
        (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 31")))
 
+;; Return 1 if op is an unsigned 6-bit constant integer.
+(define_predicate "u6bit_cint_operand"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 63")))
+
 ;; Return 1 if op is a signed 8-bit constant integer.
 ;; Integer multiplication complete more quickly
 (define_predicate "s8bit_cint_operand"
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 229974)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -31927,6 +31927,17 @@ rs6000_rtx_costs (rtx x, machine_mode mo
       return false;
 
     case ASHIFT:
+      /* The EXTSWSLI instruction is a combined instruction.  Don't count both
+	 the sign extend and shift separately within the insn.  */
+      if (TARGET_EXTSWSLI && mode == DImode
+	  && GET_CODE (XEXP (x, 0)) == SIGN_EXTEND
+	  && GET_MODE (XEXP (XEXP (x, 0), 0)) == SImode)
+	{
+	  *total = 0;
+	  return false;
+	}
+      /* fall through */
+
     case ASHIFTRT:
     case LSHIFTRT:
     case ROTATE:
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 229974)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -566,6 +566,7 @@ extern int rs6000_vector_align[];
 #define TARGET_FCTIDUZ	TARGET_POPCNTD
 #define TARGET_FCTIWUZ	TARGET_POPCNTD
 #define TARGET_CTZ	TARGET_MODULO
+#define TARGET_EXTSWSLI	(TARGET_MODULO && TARGET_POWERPC64)
 
 #define TARGET_XSCVDPSPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_XSCVSPDPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 229974)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -3933,6 +3933,127 @@ (define_insn_and_split "*ashl<mode>3_dot
    (set_attr "dot" "yes")
    (set_attr "length" "4,8")])
 
+;; Pretend we have a memory form of extswsli until register allocation is done
+;; so that we use LWZ to load the value from memory, instead of LWA.
+(define_insn_and_split "ashdi3_extswsli"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r")
+	(ashift:DI
+	 (sign_extend:DI (match_operand:SI 1 "reg_or_mem_operand" "r,m"))
+	 (match_operand:DI 2 "u6bit_cint_operand" "n,n")))]
+  "TARGET_EXTSWSLI"
+  "@
+   extswsli %0,%1,%2
+   #"
+  "&& reload_completed && MEM_P (operands[1])"
+  [(set (match_dup 3)
+	(match_dup 1))
+   (set (match_dup 0)
+	(ashift:DI (sign_extend:DI (match_dup 3))
+		   (match_dup 2)))]
+{
+  operands[3] = gen_lowpart (SImode, operands[0]);
+}
+  [(set_attr "type" "shift")
+   (set_attr "maybe_var_shift" "no")])
+
+
+(define_insn_and_split "*ashdi3_extswsli_dot"
+  [(set (match_operand:CC 3 "cc_reg_operand" "=x,?y,?x,??y")
+	(compare:CC
+	 (ashift:DI
+	  (sign_extend:DI (match_operand:SI 1 "reg_or_mem_operand" "r,r,m,m"))
+	  (match_operand:DI 2 "u6bit_cint_operand" "n,n,n,n"))
+	 (const_int 0)))
+   (clobber (match_scratch:DI 0 "=r,r,r,r"))]
+  "TARGET_EXTSWSLI"
+  "@
+   extswsli. %0,%1,%2
+   #
+   #
+   #"
+  "&& reload_completed
+   && (cc_reg_not_cr0_operand (operands[3], CCmode)
+       || memory_operand (operands[1], SImode))"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx shift = operands[2];
+  rtx cr = operands[3];
+  rtx src2;
+
+  if (!MEM_P (src))
+    src2 = src;
+  else
+    {
+      src2 = gen_lowpart (SImode, dest);
+      emit_move_insn (src2, src);
+    }
+
+  if (REGNO (cr) == CR0_REGNO)
+    {
+      emit_insn (gen_ashdi3_extswsli_dot2 (dest, src2, shift, cr));
+      DONE;
+    }
+
+  emit_insn (gen_ashdi3_extswsli (dest, src2, shift));
+  emit_insn (gen_rtx_SET (cr, gen_rtx_COMPARE (CCmode, dest, const0_rtx)));
+  DONE;
+}
+  [(set_attr "type" "shift")
+   (set_attr "maybe_var_shift" "no")
+   (set_attr "dot" "yes")
+   (set_attr "length" "4,8,8,12")])
+
+(define_insn_and_split "ashdi3_extswsli_dot2"
+  [(set (match_operand:CC 3 "cc_reg_operand" "=x,?y,?x,??y")
+	(compare:CC
+	 (ashift:DI
+	  (sign_extend:DI (match_operand:SI 1 "reg_or_mem_operand" "r,r,m,m"))
+	  (match_operand:DI 2 "u6bit_cint_operand" "n,n,n,n"))
+	 (const_int 0)))
+   (set (match_operand:DI 0 "gpc_reg_operand" "=r,r,r,r")
+	(ashift:DI (sign_extend:DI (match_dup 1))
+		   (match_dup 2)))]
+  "TARGET_EXTSWSLI"
+  "@
+   extswsli. %0,%1,%2
+   #
+   #
+   #"
+  "&& reload_completed
+   && (cc_reg_not_cr0_operand (operands[3], CCmode)
+       || memory_operand (operands[1], SImode))"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx shift = operands[2];
+  rtx cr = operands[3];
+  rtx src2;
+
+  if (!MEM_P (src))
+    src2 = src;
+  else
+    {
+      src2 = gen_lowpart (SImode, dest);
+      emit_move_insn (src2, src);
+    }
+
+  if (REGNO (cr) == CR0_REGNO)
+    {
+      emit_insn (gen_ashdi3_extswsli_dot2 (dest, src2, shift, cr));
+      DONE;
+    }
+
+  emit_insn (gen_ashdi3_extswsli (dest, src2, shift));
+  emit_insn (gen_rtx_SET (cr, gen_rtx_COMPARE (CCmode, dest, const0_rtx)));
+  DONE;
+}
+  [(set_attr "type" "shift")
+   (set_attr "maybe_var_shift" "no")
+   (set_attr "dot" "yes")
+   (set_attr "length" "4,8,8,12")])
 
 (define_insn "lshr<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
Index: gcc/testsuite/gcc.target/powerpc/extswsli-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/extswsli-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/extswsli-1.c	(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O3" } */
+
+static int mem;
+int *ptr = &mem;
+
+long
+add (long *p, int reg)
+{
+  __asm__ (" #foo %0" : "+r" (reg));
+  return p[reg] + p[mem];
+}
+
+/* { dg-final { scan-assembler-times "extswsli " 2 } } */
+/* { dg-final { scan-assembler-times "lwz "      1 } } */
+/* { dg-final { scan-assembler-not   "lwa "        } } */
+/* { dg-final { scan-assembler-not   "sldi "       } } */
+/* { dg-final { scan-assembler-not   "extsw "      } } */
Index: gcc/testsuite/gcc.target/powerpc/extswsli-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/extswsli-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/extswsli-2.c	(revision 0)
@@ -0,0 +1,38 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O3" } */
+
+long
+func1 (int reg, int *is_zero)
+{
+  long value;
+
+  __asm__ (" #foo %0" : "+r" (reg));
+  value = ((long)reg) << 4;
+
+  if (!value)
+    *is_zero = 1;
+
+  return value;
+}
+
+long
+func2 (int *ptr, int *is_zero)
+{
+  int reg = *ptr;
+  long value = ((long)reg) << 4;
+
+  if (!value)
+    *is_zero = 1;
+
+  return value;
+}
+
+/* { dg-final { scan-assembler     "extswsli\\. " } } */
+/* { dg-final { scan-assembler     "lwz "         } } */
+/* { dg-final { scan-assembler-not "lwa "         } } */
+/* { dg-final { scan-assembler-not "sldi "        } } */
+/* { dg-final { scan-assembler-not "sldi\\. "     } } */
+/* { dg-final { scan-assembler-not "extsw "       } } */
Index: gcc/testsuite/gcc.target/powerpc/extswsli-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/extswsli-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/extswsli-3.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O3" } */
+
+long
+do_ext_add (int *p, long a, long b)
+{
+  long l = *p;
+  long l2 = l << 4;
+  return l2 + ((l2 == 0) ? a : b);
+}
+
+long
+do_ext (int *p, long a, long b)
+{
+  long l = *p;
+  long l2 = l << 4;
+  return ((l2 == 0) ? a : b);
+}
+
+/* { dg-final { scan-assembler "extswsli\\. "} } */
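
The three tests above expect GCC to turn a sign extension from int to long followed by a left shift into a single ISA 3.0 extswsli (Extend Sign Word and Shift Left Immediate). They also expect the record form extswsli. to be used when the shifted value is only compared against zero. As a rough illustration, the sketch below models what the instruction computes in portable C. The function names are hypothetical and are not part of GCC.

```c
#include <stdint.h>

/* Hypothetical model of ISA 3.0 "extswsli RA,RS,SH": sign-extend the low
   32 bits of RS to 64 bits, then shift left by SH (0-63).  */
static int64_t model_extswsli (uint64_t rs, unsigned sh)
{
  /* extsw part: reinterpret the low word as signed and widen it.  The
     narrowing conversion is implementation-defined in ISO C; GCC wraps
     modulo 2^32.  */
  int64_t ext = (int32_t) (uint32_t) rs;

  /* sldi part: shift as unsigned so negative values do not trigger
     undefined behaviour, then convert back (GCC again wraps).  */
  return (int64_t) ((uint64_t) ext << (sh & 63));
}

/* Hypothetical model of the record form "extswsli.": besides the result,
   the LT/GT/EQ bits of CR0 record how the result compares with zero.
   Returns -1 (LT), 0 (EQ) or 1 (GT).  */
static int model_extswsli_cr0 (uint64_t rs, unsigned sh)
{
  int64_t result = model_extswsli (rs, sh);
  return (result < 0) ? -1 : (result > 0) ? 1 : 0;
}
```

In extswsli-2.c, func1 computes ((long)reg) << 4 and then tests it for zero. With the record form, that test can reuse CR0 instead of needing a separate compare.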


* Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)
From: Michael Meissner @ 2015-11-09  0:42 UTC
  To: Michael Meissner, gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 3693 bytes --]

This patch adds support for new fusion forms in ISA 3.0 (power9).  In
particular, ISA 3.0 can fuse GPR loads of R0, FPR loads, GPR stores, FPR
stores, and some constant-generation sequences that ISA 2.07 (power8) could
not fuse.
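
For background: a 32-bit displacement is typically materialized as an addis of the high part (the @ha relocation), followed by a D-form load or store that supplies the sign-extended low 16 bits (the @l relocation). That addis plus memory pair is what these fusion forms target. The sketch below shows how such a displacement splits into the two halves. It is illustrative C rather than code from GCC, and split_ha_lo is a hypothetical name.

```c
#include <stdint.h>

/* Split VALUE into the parts consumed by "addis rt,ra,HI" and a D-form
   access "op rX,LO(rt)": LO is the sign-extended low 16 bits and HI is
   the adjusted high part, so that HI * 65536 + LO == VALUE.  */
static void split_ha_lo (int32_t value, int32_t *hi, int32_t *lo)
{
  int64_t v = value;

  /* Low half as a signed 16-bit displacement (two's-complement
     narrowing, as GCC implements it).  */
  int64_t low = (int16_t) (uint16_t) (v & 0xffff);

  /* V - LOW is an exact multiple of 65536, so the division is exact.
     When LOW is negative, HI is one larger than the plain upper half.  */
  *hi = (int32_t) ((v - low) / 65536);
  *lo = (int32_t) low;
}
```

The sketch does not check that HI fits in the signed 16-bit addis immediate. Values close to INT32_MAX would not fit.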

I bootstrapped this patch on a little endian power8 system, and there were
no regressions in the test suite.  Is this patch OK to install in the trunk
once patch #1 has been installed?

[gcc]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/constraints.md (wF constraint): New constraints
	for power9/toc fusion.
	(wG constraint): Likewise.

	* config/rs6000/predicates.md (upper16_cint_operand): New
	predicate for power9 and toc fusion.
	(fpr_reg_operand): Likewise.
	(toc_fusion_or_p9_reg_operand): Likewise.
	(toc_fusion_mem_raw): Likewise.
	(toc_fusion_mem_wrapped): Likewise.
	(fusion_gpr_addis): If power9 fusion, allow fusion for a larger
	address range.
	(fusion_gpr_mem_combo): Delete, use fusion_addis_mem_combo_load
	instead.
	(fusion_addis_mem_combo_load): Add support for power9 fusion of
	floating point loads, floating point stores, and gpr stores.
	(fusion_addis_mem_combo_store): Likewise.
	(fusion_offsettable_mem_operand): Likewise.

	* config/rs6000/rs6000-protos.h (emit_fusion_addis): Add
	declarations.
	(emit_fusion_load_store): Likewise.
	(fusion_p9_p): Likewise.
	(expand_fusion_p9_load): Likewise.
	(expand_fusion_p9_store): Likewise.
	(emit_fusion_p9_load): Likewise.
	(emit_fusion_p9_store): Likewise.
	(fusion_wrap_memory_address): Likewise.

	* config/rs6000/rs6000.c (struct rs6000_reg_addr): Add new
	elements for power9 fusion.
	(rs6000_debug_print_mode): Rework debug information to print more
	information about fusion.
	(rs6000_init_hard_regno_mode_ok): Setup for power9 fusion
	support.
	(rs6000_legitimate_address_p): Recognize toc fusion as a valid
	offsettable memory address.
	(emit_fusion_gpr_load): Move most of the code from
	emit_fusion_gpr_load into emit_fusion_addis, which handles both
	power8 and power9 fusion.
	(emit_fusion_addis): Likewise.
	(emit_fusion_load_store): Likewise.
	(fusion_wrap_memory_address): Add support for TOC fusion.
	(fusion_split_address): Likewise.
	(fusion_p9_p): Add support for power9 fusion.
	(expand_fusion_p9_load): Likewise.
	(expand_fusion_p9_store): Likewise.
	(emit_fusion_p9_load): Likewise.
	(emit_fusion_p9_store): Likewise.

	* config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): New macros for
	power9 fusion support.
	(TARGET_TOC_FUSION_FP): Likewise.

	* config/rs6000/rs6000.md (UNSPEC_FUSION_P9): New power9/toc
	fusion unspecs.
	(UNSPEC_FUSION_ADDIS): Likewise.
	(QHSI mode iterator): New iterator for power9 fusion.
	(GPR_FUSION): Likewise.
	(FPR_FUSION): Likewise.
	(power9 fusion splitter): New power9/toc fusion support.
	(toc_fusionload_<mode>): Likewise.
	(toc_fusionload_di): Likewise.
	(fusion_gpr_load_<mode>): Update predicate function.
	(power9 fusion peephole2s): New power9/toc fusion support.
	(fusion_gpr_<P:mode>_<GPR_FUSION:mode>_load): Likewise.
	(fusion_gpr_<P:mode>_<GPR_FUSION:mode>_store): Likewise.
	(fusion_fpr_<P:mode>_<FPR_FUSION:mode>_load): Likewise.
	(fusion_fpr_<P:mode>_<FPR_FUSION:mode>_store): Likewise.
	(fusion_p9_<mode>_constant): Likewise.
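
The entry for fusion_gpr_addis says that power9 fusion allows a larger address range. Going by the predicates.md hunk below, the integer-constant check in that predicate can be restated as standalone C roughly as follows. This is a sketch: the name addis_const_fusable is hypothetical, and the boolean parameter stands in for TARGET_P9_FUSION.

```c
#include <stdbool.h>
#include <stdint.h>

/* Restates the constant check in fusion_gpr_addis.  VALUE is the constant
   added by addis, so its low 16 bits must be zero.  */
static bool addis_const_fusable (int64_t value, bool p9_fusion)
{
  /* A pure addis constant has nothing in the low 16 bits.  */
  if ((value & 0xffff) != 0)
    return false;

  /* Bits 16-31 all zero: there is no addis to fuse.  */
  if ((value & (int64_t) 0xffff0000) == 0)
    return false;

  /* Power9 fusion drops the power8 range restriction.  */
  if (p9_fusion)
    return true;

  /* Power8: the 16-bit addis immediate must lie in [-32, 31], i.e. its
     top 11 bits are all 0s or all 1s.  VALUE is a multiple of 65536 here,
     so the division is exact and equals VALUE >> 16.  */
  int64_t imm = value / 65536;
  return imm >= -32 && imm <= 31;
}
```

For example, 0x200000 needs an addis immediate of 32. It is therefore rejected for power8 fusion but accepted when power9 fusion is enabled.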

[gcc/testsuite]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/fusion.c (fusion_vector): Move to fusion2.c
	and allow the test on PowerPC LE.
	* gcc.target/powerpc/fusion2.c (fusion_vector): Likewise.

	* gcc.target/powerpc/fusion3.c: New file, test power9 fusion.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power9.official-05b --]
[-- Type: text/plain, Size: 49753 bytes --]

Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 229970)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -137,6 +137,16 @@ (define_constraint "wD"
   (and (match_code "const_int")
        (match_test "TARGET_VSX && (ival == VECTOR_ELEMENT_SCALAR_64BIT)")))
 
+;; Extended (power9) fusion memory operand
+(define_memory_constraint "wF"
+  "Memory operand suitable for power9 fusion load/stores"
+  (match_operand 0 "fusion_addis_mem_combo_load"))
+
+;; Fusion gpr load.
+(define_memory_constraint "wG"
+  "Memory operand suitable for TOC fusion memory references"
+  (match_operand 0 "toc_fusion_mem_wrapped"))
+
 ;; Lq/stq validates the address for load/store quad
 (define_memory_constraint "wQ"
   "Memory operand suitable for the load/store quad instructions"
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 229975)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -168,6 +168,12 @@ (define_predicate "u_short_cint_operand"
   (and (match_code "const_int")
        (match_test "satisfies_constraint_K (op)")))
 
+;; Return 1 if op is a constant integer that is a signed 16-bit constant
+;; shifted left 16 bits
+(define_predicate "upper16_cint_operand"
+  (and (match_code "const_int")
+       (match_test "satisfies_constraint_L (op)")))
+
 ;; Return 1 if op is a constant integer that cannot fit in a signed D field.
 (define_predicate "non_short_cint_operand"
   (and (match_code "const_int")
@@ -276,6 +282,70 @@ (define_predicate "base_reg_operand"
   return (REGNO (op) != FIRST_GPR_REGNO);
 })
 
+
+;; Return true if this is a traditional floating point register
+(define_predicate "fpr_reg_operand"
+  (match_code "reg,subreg")
+{
+  HOST_WIDE_INT r;
+
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  r = REGNO (op);
+  if (r >= FIRST_PSEUDO_REGISTER)
+    return 1;
+
+  return FP_REGNO_P (r);
+})
+
+;; Return true if this is a register that can have D-form addressing (GPR and
+;; traditional FPR registers for scalars).  ISA 3.0 (power9) adds D-form
+;; addressing for scalars in Altivec registers.
+;;
+;; If this is a pseudo, only allow GPR fusion on power8.  With power9
+;; fusion, also allow the floating point types.
+(define_predicate "toc_fusion_or_p9_reg_operand"
+  (match_code "reg,subreg")
+{
+  HOST_WIDE_INT r;
+  bool gpr_p = (mode == QImode || mode == HImode || mode == SImode
+		|| mode == SFmode
+		|| (TARGET_POWERPC64 && (mode == DImode || mode == DFmode)));
+  bool fpr_p = (TARGET_P9_FUSION
+		&& (mode == DFmode || mode == SFmode
+		    || (TARGET_POWERPC64 && mode == DImode)));
+  bool vmx_p = (TARGET_P9_FUSION && TARGET_P9_VECTOR
+		&& (mode == DFmode || mode == SFmode));
+
+  if (!TARGET_P8_FUSION)
+    return 0;
+
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  r = REGNO (op);
+  if (r >= FIRST_PSEUDO_REGISTER)
+    return (gpr_p || fpr_p || vmx_p);
+
+  if (INT_REGNO_P (r))
+    return gpr_p;
+
+  if (FP_REGNO_P (r))
+    return fpr_p;
+
+  if (ALTIVEC_REGNO_P (r))
+    return vmx_p;
+
+  return 0;
+})
+
 ;; Return 1 if op is a HTM specific SPR register.
 (define_predicate "htm_spr_reg_operand"
   (match_operand 0 "register_operand")
@@ -1603,6 +1673,35 @@ (define_predicate "small_toc_ref"
   return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL;
 })
 
+;; Match the TOC memory operand that can be fused with an addis instruction.
+;; This is used in matching a potential fused address before register
+;; allocation.
+(define_predicate "toc_fusion_mem_raw"
+  (match_code "mem")
+{
+  if (!TARGET_TOC_FUSION_INT || !can_create_pseudo_p ())
+    return false;
+
+  return small_toc_ref (XEXP (op, 0), Pmode);
+})
+
+;; Match the memory operand that has been fused with an addis instruction and
+;; wrapped inside of an (unspec [...] UNSPEC_FUSION_ADDIS) wrapper.
+(define_predicate "toc_fusion_mem_wrapped"
+  (match_code "mem")
+{
+  rtx addr;
+
+  if (!TARGET_TOC_FUSION_INT)
+    return false;
+
+  if (!MEM_P (op))
+    return false;
+
+  addr = XEXP (op, 0);
+  return (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_FUSION_ADDIS);
+})
+
 ;; Match the first insn (addis) in fusing the combination of addis and loads to
 ;; GPR registers on power8.
 (define_predicate "fusion_gpr_addis"
@@ -1625,8 +1724,6 @@ (define_predicate "fusion_gpr_addis"
   else
     return 0;
 
-  /* Power8 currently will only do the fusion if the top 11 bits of the addis
-     value are all 1's or 0's.  */
   value = INTVAL (int_const);
   if ((value & (HOST_WIDE_INT)0xffff) != 0)
     return 0;
@@ -1634,6 +1731,12 @@ (define_predicate "fusion_gpr_addis"
   if ((value & (HOST_WIDE_INT)0xffff0000) == 0)
     return 0;
 
+  /* Power8 currently will only do the fusion if the top 11 bits of the addis
+     value are all 1's or 0's.  Ignore this restriction if we are testing
+     advanced fusion.  */
+  if (TARGET_P9_FUSION)
+    return 1;
+
   return (IN_RANGE (value >> 16, -32, 31));
 })
 
@@ -1699,13 +1802,14 @@ (define_predicate "fusion_gpr_mem_load"
 ;; Match a GPR load (lbz, lhz, lwz, ld) that uses a combined address in the
 ;; memory field with both the addis and the memory offset.  Sign extension
 ;; is not handled here, since lha and lwa are not fused.
-(define_predicate "fusion_gpr_mem_combo"
-  (match_code "mem,zero_extend")
+;; With extended fusion, also match an FPR load (lfd, lfs) and float_extend.
+(define_predicate "fusion_addis_mem_combo_load"
+  (match_code "mem,zero_extend,float_extend")
 {
   rtx addr, base, offset;
 
-  /* Handle zero extend.  */
-  if (GET_CODE (op) == ZERO_EXTEND)
+  /* Handle zero/float extend.  */
+  if (GET_CODE (op) == ZERO_EXTEND || GET_CODE (op) == FLOAT_EXTEND)
     {
       op = XEXP (op, 0);
       mode = GET_MODE (op);
@@ -1726,6 +1830,71 @@ (define_predicate "fusion_gpr_mem_combo"
 	return 0;
       break;
 
+    case SFmode:
+    case DFmode:
+      if (!TARGET_P9_FUSION)
+	return 0;
+      break;
+
+    default:
+      return 0;
+    }
+
+  addr = XEXP (op, 0);
+  if (GET_CODE (addr) != PLUS && GET_CODE (addr) != LO_SUM)
+    return 0;
+
+  base = XEXP (addr, 0);
+  if (!fusion_gpr_addis (base, GET_MODE (base)))
+    return 0;
+
+  offset = XEXP (addr, 1);
+  if (GET_CODE (addr) == PLUS)
+    return satisfies_constraint_I (offset);
+
+  else if (GET_CODE (addr) == LO_SUM)
+    {
+      if (TARGET_XCOFF || (TARGET_ELF && TARGET_POWERPC64))
+	return small_toc_ref (offset, GET_MODE (offset));
+
+      else if (TARGET_ELF && !TARGET_POWERPC64)
+	return CONSTANT_P (offset);
+    }
+
+  return 0;
+})
+
+;; Like fusion_addis_mem_combo_load, but for stores
+(define_predicate "fusion_addis_mem_combo_store"
+  (match_code "mem")
+{
+  rtx addr, base, offset;
+
+  if (!MEM_P (op) || !TARGET_P9_FUSION)
+    return 0;
+
+  switch (mode)
+    {
+    case QImode:
+    case HImode:
+    case SImode:
+      break;
+
+    case DImode:
+      if (!TARGET_POWERPC64)
+	return 0;
+      break;
+
+    case SFmode:
+      if (!TARGET_SF_FPR)
+	return 0;
+      break;
+
+    case DFmode:
+      if (!TARGET_DF_FPR)
+	return 0;
+      break;
+
     default:
       return 0;
     }
@@ -1753,3 +1922,20 @@ (define_predicate "fusion_gpr_mem_combo"
 
   return 0;
 })
+
+;; Return true if the operand is a float_extend or zero extend of an
+;; offsettable memory operand suitable for use in fusion
+(define_predicate "fusion_offsettable_mem_operand"
+  (match_code "mem,zero_extend,float_extend")
+{
+  if (GET_CODE (op) == ZERO_EXTEND || GET_CODE (op) == FLOAT_EXTEND)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE (op);
+    }
+
+  if (!memory_operand (op, mode))
+    return 0;
+
+  return offsettable_nonstrict_memref_p (op);
+})
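
The toc_fusion_or_p9_reg_operand predicate above decides, for each mode, which register files may take part in fusion. The following is a rough restatement in standalone C. Hypothetical enums stand in for GCC's machine modes and register classes, and a struct of booleans stands in for the TARGET_* flags. It is meant only to make the decision table easier to read.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the modes and register files the predicate
   distinguishes.  */
enum fmode { M_QI, M_HI, M_SI, M_DI, M_SF, M_DF };
enum regfile { R_PSEUDO, R_GPR, R_FPR, R_ALTIVEC, R_OTHER };

struct fusion_flags {
  bool p8_fusion;   /* stands in for TARGET_P8_FUSION */
  bool p9_fusion;   /* stands in for TARGET_P9_FUSION */
  bool p9_vector;   /* stands in for TARGET_P9_VECTOR */
  bool powerpc64;   /* stands in for TARGET_POWERPC64 */
};

static bool fusion_reg_ok (struct fusion_flags f, enum fmode m,
			   enum regfile r)
{
  bool gpr_p = (m == M_QI || m == M_HI || m == M_SI || m == M_SF
		|| (f.powerpc64 && (m == M_DI || m == M_DF)));
  bool fpr_p = f.p9_fusion && (m == M_DF || m == M_SF
			       || (f.powerpc64 && m == M_DI));
  bool vmx_p = f.p9_fusion && f.p9_vector && (m == M_DF || m == M_SF);

  if (!f.p8_fusion)
    return false;

  switch (r)
    {
    case R_PSEUDO:  return gpr_p || fpr_p || vmx_p;  /* not yet allocated */
    case R_GPR:     return gpr_p;
    case R_FPR:     return fpr_p;
    case R_ALTIVEC: return vmx_p;
    default:        return false;
    }
}
```

With only power8 fusion enabled, a DFmode value qualifies in a GPR on 64-bit targets but not in an FPR. Enabling power9 fusion adds the FPR cases and, with power9 vector support, SFmode and DFmode in Altivec registers.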
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 229970)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -87,7 +87,15 @@ extern bool direct_move_p (rtx, rtx);
 extern bool quad_load_store_p (rtx, rtx);
 extern bool fusion_gpr_load_p (rtx, rtx, rtx, rtx);
 extern void expand_fusion_gpr_load (rtx *);
+extern void emit_fusion_addis (rtx, rtx, const char *, const char *);
+extern void emit_fusion_load_store (rtx, rtx, rtx, const char *);
 extern const char *emit_fusion_gpr_load (rtx, rtx);
+extern bool fusion_p9_p (rtx, rtx, rtx, rtx);
+extern void expand_fusion_p9_load (rtx *);
+extern void expand_fusion_p9_store (rtx *);
+extern const char *emit_fusion_p9_load (rtx, rtx, rtx);
+extern const char *emit_fusion_p9_store (rtx, rtx, rtx);
+extern rtx fusion_wrap_memory_address (rtx);
 extern enum reg_class (*rs6000_preferred_reload_class_ptr) (rtx,
 							    enum reg_class);
 extern enum reg_class (*rs6000_secondary_reload_class_ptr) (enum reg_class,
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 229975)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -376,8 +376,18 @@ struct rs6000_reg_addr {
   enum insn_code reload_fpr_gpr;	/* INSN to move from FPR to GPR.  */
   enum insn_code reload_gpr_vsx;	/* INSN to move from GPR to VSX.  */
   enum insn_code reload_vsx_gpr;	/* INSN to move from VSX to GPR.  */
+  enum insn_code fusion_gpr_ld;		/* INSN for fusing gpr ADDIS/loads.  */
+					/* INSNs for fusing addi with loads
+					   or stores for each reg. class.  */					   
+  enum insn_code fusion_addi_ld[(int)N_RELOAD_REG];
+  enum insn_code fusion_addi_st[(int)N_RELOAD_REG];
+					/* INSNs for fusing addis with loads
+					   or stores for each reg. class.  */					   
+  enum insn_code fusion_addis_ld[(int)N_RELOAD_REG];
+  enum insn_code fusion_addis_st[(int)N_RELOAD_REG];
   addr_mask_type addr_mask[(int)N_RELOAD_REG]; /* Valid address masks.  */
   bool scalar_in_vmx_p;			/* Scalar value can go in VMX.  */
+  bool fused_toc;			/* Mode supports TOC fusion.  */
 };
 
 static struct rs6000_reg_addr reg_addr[NUM_MACHINE_MODES];
@@ -2026,25 +2036,113 @@ DEBUG_FUNCTION void
 rs6000_debug_print_mode (ssize_t m)
 {
   ssize_t rc;
+  int spaces = 0;
+  bool fuse_extra_p;
 
   fprintf (stderr, "Mode: %-5s", GET_MODE_NAME (m));
   for (rc = 0; rc < N_RELOAD_REG; rc++)
     fprintf (stderr, " %s: %s", reload_reg_map[rc].name,
 	     rs6000_debug_addr_mask (reg_addr[m].addr_mask[rc], true));
 
+  if ((reg_addr[m].reload_store != CODE_FOR_nothing)
+      || (reg_addr[m].reload_load != CODE_FOR_nothing))
+    fprintf (stderr, "  Reload=%c%c",
+	     (reg_addr[m].reload_store != CODE_FOR_nothing) ? 's' : '*',
+	     (reg_addr[m].reload_load != CODE_FOR_nothing) ? 'l' : '*');
+  else
+    spaces += sizeof ("  Reload=sl") - 1;
+
+  if (reg_addr[m].scalar_in_vmx_p)
+    {
+      fprintf (stderr, "%*s  Upper=y", spaces, "");
+      spaces = 0;
+    }
+  else
+    spaces += sizeof ("  Upper=y") - 1;
+
+  fuse_extra_p = ((reg_addr[m].fusion_gpr_ld != CODE_FOR_nothing)
+		  || reg_addr[m].fused_toc);
+  if (!fuse_extra_p)
+    {
+      for (rc = 0; rc < N_RELOAD_REG; rc++)
+	{
+	  if (rc != RELOAD_REG_ANY)
+	    {
+	      if (reg_addr[m].fusion_addi_ld[rc]     != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addi_ld[rc]  != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addi_st[rc]  != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addis_ld[rc] != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addis_st[rc] != CODE_FOR_nothing)
+		{
+		  fuse_extra_p = true;
+		  break;
+		}
+	    }
+	}
+    }
+
+  if (fuse_extra_p)
+    {
+      fprintf (stderr, "%*s  Fuse:", spaces, "");
+      spaces = 0;
+
+      for (rc = 0; rc < N_RELOAD_REG; rc++)
+	{
+	  if (rc != RELOAD_REG_ANY)
+	    {
+	      char load, store;
+
+	      if (reg_addr[m].fusion_addis_ld[rc] != CODE_FOR_nothing)
+		load = 'l';
+	      else if (reg_addr[m].fusion_addi_ld[rc] != CODE_FOR_nothing)
+		load = 'L';
+	      else
+		load = '-';
+
+	      if (reg_addr[m].fusion_addis_st[rc] != CODE_FOR_nothing)
+		store = 's';
+	      else if (reg_addr[m].fusion_addi_st[rc] != CODE_FOR_nothing)
+		store = 'S';
+	      else
+		store = '-';
+
+	      if (load == '-' && store == '-')
+		spaces += 5;
+	      else
+		{
+		  fprintf (stderr, "%*s%c=%c%c", (spaces + 1), "",
+			   reload_reg_map[rc].name[0], load, store);
+		  spaces = 0;
+		}
+	    }
+	}
+
+      if (reg_addr[m].fusion_gpr_ld != CODE_FOR_nothing)
+	{
+	  fprintf (stderr, "%*sP8gpr", (spaces + 1), "");
+	  spaces = 0;
+	}
+      else
+	spaces += sizeof (" P8gpr") - 1;
+
+      if (reg_addr[m].fused_toc)
+	{
+	  fprintf (stderr, "%*sToc", (spaces + 1), "");
+	  spaces = 0;
+	}
+      else
+	spaces += sizeof (" Toc") - 1;
+    }
+  else
+    spaces += sizeof ("  Fuse: G=ls F=ls v=ls P8gpr Toc") - 1;
+
   if (rs6000_vector_unit[m] != VECTOR_NONE
-      || rs6000_vector_mem[m] != VECTOR_NONE
-      || (reg_addr[m].reload_store != CODE_FOR_nothing)
-      || (reg_addr[m].reload_load != CODE_FOR_nothing)
-      || reg_addr[m].scalar_in_vmx_p)
+      || rs6000_vector_mem[m] != VECTOR_NONE)
     {
-      fprintf (stderr,
-	       "  Vector-arith=%-10s Vector-mem=%-10s Reload=%c%c Upper=%c",
+      fprintf (stderr, "%*s  vector: arith=%-10s mem=%s",
+	       spaces, "",
 	       rs6000_debug_vector_unit (rs6000_vector_unit[m]),
-	       rs6000_debug_vector_unit (rs6000_vector_mem[m]),
-	       (reg_addr[m].reload_store != CODE_FOR_nothing) ? 's' : '*',
-	       (reg_addr[m].reload_load != CODE_FOR_nothing) ? 'l' : '*',
-	       (reg_addr[m].scalar_in_vmx_p) ? 'y' : 'n');
+	       rs6000_debug_vector_unit (rs6000_vector_mem[m]));
     }
 
   fputs ("\n", stderr);
@@ -3019,6 +3117,130 @@ rs6000_init_hard_regno_mode_ok (bool glo
 	reg_addr[SFmode].scalar_in_vmx_p = true;
     }
 
+  /* Setup the fusion operations.  */
+  if (TARGET_P8_FUSION)
+    {
+      reg_addr[QImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_qi;
+      reg_addr[HImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_hi;
+      reg_addr[SImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_si;
+      if (TARGET_64BIT)
+	reg_addr[DImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_di;
+    }
+
+  if (TARGET_P9_FUSION)
+    {
+      struct fuse_insns {
+	enum machine_mode mode;			/* mode of the fused type.  */
+	enum machine_mode pmode;		/* pointer mode.  */
+	enum rs6000_reload_reg_type rtype;	/* register type.  */
+	enum insn_code load;			/* load insn.  */
+	enum insn_code store;			/* store insn.  */
+      };
+
+      static const struct fuse_insns addis_insns[] = {
+	{ SFmode, DImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_di_sf_load,
+	  CODE_FOR_fusion_fpr_di_sf_store },
+
+	{ SFmode, SImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_si_sf_load,
+	  CODE_FOR_fusion_fpr_si_sf_store },
+
+	{ DFmode, DImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_di_df_load,
+	  CODE_FOR_fusion_fpr_di_df_store },
+
+	{ DFmode, SImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_si_df_load,
+	  CODE_FOR_fusion_fpr_si_df_store },
+
+	{ DImode, DImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_di_di_load,
+	  CODE_FOR_fusion_fpr_di_di_store },
+
+	{ DImode, SImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_si_di_load,
+	  CODE_FOR_fusion_fpr_si_di_store },
+
+	{ QImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_qi_load,
+	  CODE_FOR_fusion_gpr_di_qi_store },
+
+	{ QImode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_qi_load,
+	  CODE_FOR_fusion_gpr_si_qi_store },
+
+	{ HImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_hi_load,
+	  CODE_FOR_fusion_gpr_di_hi_store },
+
+	{ HImode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_hi_load,
+	  CODE_FOR_fusion_gpr_si_hi_store },
+
+	{ SImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_si_load,
+	  CODE_FOR_fusion_gpr_di_si_store },
+
+	{ SImode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_si_load,
+	  CODE_FOR_fusion_gpr_si_si_store },
+
+	{ SFmode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_sf_load,
+	  CODE_FOR_fusion_gpr_di_sf_store },
+
+	{ SFmode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_sf_load,
+	  CODE_FOR_fusion_gpr_si_sf_store },
+
+	{ DImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_di_load,
+	  CODE_FOR_fusion_gpr_di_di_store },
+
+	{ DFmode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_df_load,
+	  CODE_FOR_fusion_gpr_di_df_store },
+      };
+
+      enum machine_mode cur_pmode = Pmode;
+      size_t i;
+
+      for (i = 0; i < ARRAY_SIZE (addis_insns); i++)
+	{
+	  enum machine_mode xmode = addis_insns[i].mode;
+	  enum rs6000_reload_reg_type rtype = addis_insns[i].rtype;
+
+	  if (addis_insns[i].pmode != cur_pmode)
+	    continue;
+
+	  if (rtype == RELOAD_REG_FPR
+	      && (!TARGET_HARD_FLOAT || !TARGET_FPRS))
+	    continue;
+
+	  reg_addr[xmode].fusion_addis_ld[rtype] = addis_insns[i].load;
+	  reg_addr[xmode].fusion_addis_st[rtype] = addis_insns[i].store;
+	}
+    }
+
+  /* Note which types we support fusing TOC setup plus memory insn.  We only do
+     fused TOCs for medium/large code models.  */
+  if (TARGET_P8_FUSION && TARGET_TOC_FUSION && TARGET_POWERPC64
+      && (TARGET_CMODEL != CMODEL_SMALL))
+    {
+      reg_addr[QImode].fused_toc = true;
+      reg_addr[HImode].fused_toc = true;
+      reg_addr[SImode].fused_toc = true;
+      reg_addr[DImode].fused_toc = true;
+      if (TARGET_HARD_FLOAT && TARGET_FPRS)
+	{
+	  if (TARGET_SINGLE_FLOAT)
+	    reg_addr[SFmode].fused_toc = true;
+	  if (TARGET_DOUBLE_FLOAT)
+	    reg_addr[DFmode].fused_toc = true;
+	}
+    }
+
   /* Precalculate HARD_REGNO_NREGS.  */
   for (r = 0; r < FIRST_PSEUDO_REGISTER; ++r)
     for (m = 0; m < NUM_MACHINE_MODES; ++m)
@@ -8127,6 +8349,8 @@ rs6000_legitimate_address_p (machine_mod
       && legitimate_constant_pool_address_p (x, mode,
 					     reg_ok_strict || lra_in_progress))
     return 1;
+  if (reg_offset_p && reg_addr[mode].fused_toc && toc_fusion_mem_wrapped (x, mode))
+    return 1;
   /* For TImode, if we have load/store quad and TImode in VSX registers, only
      allow register indirect addresses.  This will allow the values to go in
      either GPRs or VSX registers without reloading.  The vector types would
@@ -35209,72 +35433,21 @@ expand_fusion_gpr_load (rtx *operands)
   return;
 }
 
-/* Return a string to fuse an addis instruction with a gpr load to the same
-   register that we loaded up the addis instruction.  The address that is used
-   is the logical address that was formed during peephole2:
-	(lo_sum (high) (low-part))
-
-   The code is complicated, so we call output_asm_insn directly, and just
-   return "".  */
+/* Emit the addis instruction that will be part of a fused instruction
+   sequence.  */
 
-const char *
-emit_fusion_gpr_load (rtx target, rtx mem)
+void
+emit_fusion_addis (rtx target, rtx addis_value, const char *comment,
+		   const char *mode_name)
 {
-  rtx addis_value;
   rtx fuse_ops[10];
-  rtx addr;
-  rtx load_offset;
-  const char *addis_str = NULL;
-  const char *load_str = NULL;
-  const char *mode_name = NULL;
   char insn_template[80];
-  machine_mode mode;
+  const char *addis_str = NULL;
   const char *comment_str = ASM_COMMENT_START;
 
-  if (GET_CODE (mem) == ZERO_EXTEND)
-    mem = XEXP (mem, 0);
-
-  gcc_assert (REG_P (target) && MEM_P (mem));
-
   if (*comment_str == ' ')
     comment_str++;
 
-  addr = XEXP (mem, 0);
-  if (GET_CODE (addr) != PLUS && GET_CODE (addr) != LO_SUM)
-    gcc_unreachable ();
-
-  addis_value = XEXP (addr, 0);
-  load_offset = XEXP (addr, 1);
-
-  /* Now emit the load instruction to the same register.  */
-  mode = GET_MODE (mem);
-  switch (mode)
-    {
-    case QImode:
-      mode_name = "char";
-      load_str = "lbz";
-      break;
-
-    case HImode:
-      mode_name = "short";
-      load_str = "lhz";
-      break;
-
-    case SImode:
-      mode_name = "int";
-      load_str = "lwz";
-      break;
-
-    case DImode:
-      gcc_assert (TARGET_POWERPC64);
-      mode_name = "long";
-      load_str = "ld";
-      break;
-
-    default:
-      gcc_unreachable ();
-    }
-
   /* Emit the addis instruction.  */
   fuse_ops[0] = target;
   if (satisfies_constraint_L (addis_value))
@@ -35353,68 +35526,531 @@ emit_fusion_gpr_load (rtx target, rtx me
   if (!addis_str)
     fatal_insn ("Could not generate addis value for fusion", addis_value);
 
-  sprintf (insn_template, "%s\t\t%s gpr load fusion, type %s", addis_str,
-	   comment_str, mode_name);
+  sprintf (insn_template, "%s\t\t%s %s, type %s", addis_str, comment_str,
+	   comment, mode_name);
   output_asm_insn (insn_template, fuse_ops);
+}
 
-  /* Emit the D-form load instruction.  */
-  if (CONST_INT_P (load_offset) && satisfies_constraint_I (load_offset))
+/* Emit a D-form load or store instruction that is the second instruction
+   of a fusion sequence.  */
+
+void
+emit_fusion_load_store (rtx load_store_reg, rtx addis_reg, rtx offset,
+			const char *insn_str)
+{
+  rtx fuse_ops[10];
+  char insn_template[80];
+
+  fuse_ops[0] = load_store_reg;
+  fuse_ops[1] = addis_reg;
+
+  if (CONST_INT_P (offset) && satisfies_constraint_I (offset))
     {
-      sprintf (insn_template, "%s %%0,%%1(%%0)", load_str);
-      fuse_ops[1] = load_offset;
+      sprintf (insn_template, "%s %%0,%%2(%%1)", insn_str);
+      fuse_ops[2] = offset;
       output_asm_insn (insn_template, fuse_ops);
     }
 
-  else if (GET_CODE (load_offset) == UNSPEC
-	   && XINT (load_offset, 1) == UNSPEC_TOCREL)
+  else if (GET_CODE (offset) == UNSPEC
+	   && XINT (offset, 1) == UNSPEC_TOCREL)
     {
       if (TARGET_ELF)
-	sprintf (insn_template, "%s %%0,%%1@toc@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2@toc@l(%%1)", insn_str);
 
       else if (TARGET_XCOFF)
-	sprintf (insn_template, "%s %%0,%%1@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2@l(%%1)", insn_str);
 
       else
 	gcc_unreachable ();
 
-      fuse_ops[1] = XVECEXP (load_offset, 0, 0);
+      fuse_ops[2] = XVECEXP (offset, 0, 0);
       output_asm_insn (insn_template, fuse_ops);
     }
 
-  else if (GET_CODE (load_offset) == PLUS
-	   && GET_CODE (XEXP (load_offset, 0)) == UNSPEC
-	   && XINT (XEXP (load_offset, 0), 1) == UNSPEC_TOCREL
-	   && CONST_INT_P (XEXP (load_offset, 1)))
+  else if (GET_CODE (offset) == PLUS
+	   && GET_CODE (XEXP (offset, 0)) == UNSPEC
+	   && XINT (XEXP (offset, 0), 1) == UNSPEC_TOCREL
+	   && CONST_INT_P (XEXP (offset, 1)))
     {
-      rtx tocrel_unspec = XEXP (load_offset, 0);
+      rtx tocrel_unspec = XEXP (offset, 0);
       if (TARGET_ELF)
-	sprintf (insn_template, "%s %%0,%%1+%%2@toc@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2+%%3@toc@l(%%1)", insn_str);
 
       else if (TARGET_XCOFF)
-	sprintf (insn_template, "%s %%0,%%1+%%2@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2+%%3@l(%%1)", insn_str);
 
       else
 	gcc_unreachable ();
 
-      fuse_ops[1] = XVECEXP (tocrel_unspec, 0, 0);
-      fuse_ops[2] = XEXP (load_offset, 1);
+      fuse_ops[2] = XVECEXP (tocrel_unspec, 0, 0);
+      fuse_ops[3] = XEXP (offset, 1);
       output_asm_insn (insn_template, fuse_ops);
     }
 
-  else if (TARGET_ELF && !TARGET_POWERPC64 && CONSTANT_P (load_offset))
+  else if (TARGET_ELF && !TARGET_POWERPC64 && CONSTANT_P (offset))
     {
-      sprintf (insn_template, "%s %%0,%%1@l(%%0)", load_str);
+      sprintf (insn_template, "%s %%0,%%2@l(%%1)", insn_str);
 
-      fuse_ops[1] = load_offset;
+      fuse_ops[2] = offset;
       output_asm_insn (insn_template, fuse_ops);
     }
 
   else
-    fatal_insn ("Unable to generate load offset for fusion", load_offset);
+    fatal_insn ("Unable to generate load/store offset for fusion", offset);
+
+  return;
+}
+
+/* Wrap a TOC address that can be fused to indicate that special fusion
+   processing is needed.  */
+
+rtx
+fusion_wrap_memory_address (rtx old_mem)
+{
+  rtx old_addr = XEXP (old_mem, 0);
+  rtvec v = gen_rtvec (1, old_addr);
+  rtx new_addr = gen_rtx_UNSPEC (Pmode, v, UNSPEC_FUSION_ADDIS);
+  return replace_equiv_address_nv (old_mem, new_addr, false);
+}
+
+/* Given an address, convert it into the addis and load offset parts.  Addresses
+   created during the peephole2 process look like:
+	(lo_sum (high (unspec [(sym)] UNSPEC_TOCREL))
+		(unspec [(...)] UNSPEC_TOCREL))
+
+   Addresses created via toc fusion look like:
+	(unspec [(unspec [(...)] UNSPEC_TOCREL)] UNSPEC_FUSION_ADDIS))  */
+
+static void
+fusion_split_address (rtx addr, rtx *p_hi, rtx *p_lo)
+{
+  rtx hi, lo;
+
+  if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_FUSION_ADDIS)
+    {
+      lo = XVECEXP (addr, 0, 0);
+      hi = gen_rtx_HIGH (Pmode, lo);
+    }
+  else if (GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
+    {
+      hi = XEXP (addr, 0);
+      lo = XEXP (addr, 1);
+    }
+  else
+    gcc_unreachable ();
+
+  *p_hi = hi;
+  *p_lo = lo;
+}
+
+/* Return a string to fuse an addis instruction with a gpr load to the same
+   register that we loaded up the addis instruction.  The address that is used
+   is the logical address that was formed during peephole2:
+	(lo_sum (high) (low-part))
+
+   Or the address is the TOC address that is wrapped before register allocation:
+	(unspec [(addr) (toc-reg)] UNSPEC_FUSION_ADDIS)
+
+   The code is complicated, so we call output_asm_insn directly, and just
+   return "".  */
+
+const char *
+emit_fusion_gpr_load (rtx target, rtx mem)
+{
+  rtx addis_value;
+  rtx addr;
+  rtx load_offset;
+  const char *load_str = NULL;
+  const char *mode_name = NULL;
+  machine_mode mode;
+
+  if (GET_CODE (mem) == ZERO_EXTEND)
+    mem = XEXP (mem, 0);
+
+  gcc_assert (REG_P (target) && MEM_P (mem));
+
+  addr = XEXP (mem, 0);
+  fusion_split_address (addr, &addis_value, &load_offset);
+
+  /* Now emit the load instruction to the same register.  */
+  mode = GET_MODE (mem);
+  switch (mode)
+    {
+    case QImode:
+      mode_name = "char";
+      load_str = "lbz";
+      break;
+
+    case HImode:
+      mode_name = "short";
+      load_str = "lhz";
+      break;
+
+    case SImode:
+    case SFmode:
+      mode_name = (mode == SFmode) ? "float" : "int";
+      load_str = "lwz";
+      break;
+
+    case DImode:
+    case DFmode:
+      gcc_assert (TARGET_POWERPC64);
+      mode_name = (mode == DFmode) ? "double" : "long";
+      load_str = "ld";
+      break;
+
+    default:
+      fatal_insn ("Bad GPR fusion", gen_rtx_SET (target, mem));
+    }
+
+  /* Emit the addis instruction.  */
+  emit_fusion_addis (target, addis_value, "gpr load fusion", mode_name);
+
+  /* Emit the D-form load instruction.  */
+  emit_fusion_load_store (target, target, load_offset, load_str);
 
   return "";
 }
 \f
+
+/* Return true if the peephole2 can combine a load/store involving a
+   combination of an addis instruction and the memory operation.  This was
+   added to the ISA 3.0 (power9) hardware.  */
+
+bool
+fusion_p9_p (rtx addis_reg,		/* register set via addis.  */
+	     rtx addis_value,		/* addis value.  */
+	     rtx dest,			/* destination (memory or register). */
+	     rtx src)			/* source (register or memory).  */
+{
+  rtx addr, mem, offset;
+  enum machine_mode mode = GET_MODE (src);
+
+  /* Validate arguments.  */
+  if (!base_reg_operand (addis_reg, GET_MODE (addis_reg)))
+    return false;
+
+  if (!fusion_gpr_addis (addis_value, GET_MODE (addis_value)))
+    return false;
+
+  /* Ignore extend operations that are part of the load.  */
+  if (GET_CODE (src) == FLOAT_EXTEND || GET_CODE (src) == ZERO_EXTEND)
+    src = XEXP (src, 0);
+
+  /* Test for memory<-register or register<-memory.  */
+  if (fpr_reg_operand (src, mode) || int_reg_operand (src, mode))
+    {
+      if (!MEM_P (dest))
+	return false;
+
+      mem = dest;
+    }
+
+  else if (MEM_P (src))
+    {
+      if (!fpr_reg_operand (dest, mode) && !int_reg_operand (dest, mode))
+	return false;
+
+      mem = src;
+    }
+
+  else
+    return false;
+
+  addr = XEXP (mem, 0);			/* either PLUS or LO_SUM.  */
+  if (GET_CODE (addr) == PLUS)
+    {
+      if (!rtx_equal_p (addis_reg, XEXP (addr, 0)))
+	return false;
+
+      return satisfies_constraint_I (XEXP (addr, 1));
+    }
+
+  else if (GET_CODE (addr) == LO_SUM)
+    {
+      if (!rtx_equal_p (addis_reg, XEXP (addr, 0)))
+	return false;
+
+      offset = XEXP (addr, 1);
+      if (TARGET_XCOFF || (TARGET_ELF && TARGET_POWERPC64))
+	return small_toc_ref (offset, GET_MODE (offset));
+
+      else if (TARGET_ELF && !TARGET_POWERPC64)
+	return CONSTANT_P (offset);
+    }
+
+  return false;
+}
+
+/* During the peephole2 pass, adjust and expand the insns for an extended fusion
+   load sequence.
+
+   The operands are:
+	operands[0]	register set with addis
+	operands[1]	value set via addis
+	operands[2]	target register being loaded
+	operands[3]	D-form memory reference using operands[0].
+
+  This is similar to the fusion introduced with power8, except it applies to
+  both loads and stores and does not require the result register to be the
+  same as the base register.  At the moment, we only do this if the register
+  set with addis is dead.  */
+
+void
+expand_fusion_p9_load (rtx *operands)
+{
+  rtx tmp_reg = operands[0];
+  rtx addis_value = operands[1];
+  rtx target = operands[2];
+  rtx orig_mem = operands[3];
+  rtx  new_addr, new_mem, orig_addr, offset, set, clobber, insn;
+  enum rtx_code plus_or_lo_sum;
+  machine_mode target_mode = GET_MODE (target);
+  machine_mode extend_mode = target_mode;
+  machine_mode ptr_mode = Pmode;
+  enum rtx_code extend = UNKNOWN;
+
+  if (GET_CODE (orig_mem) == FLOAT_EXTEND || GET_CODE (orig_mem) == ZERO_EXTEND)
+    {
+      extend = GET_CODE (orig_mem);
+      orig_mem = XEXP (orig_mem, 0);
+      target_mode = GET_MODE (orig_mem);
+    }
+
+  gcc_assert (MEM_P (orig_mem));
+
+  orig_addr = XEXP (orig_mem, 0);
+  plus_or_lo_sum = GET_CODE (orig_addr);
+  gcc_assert (plus_or_lo_sum == PLUS || plus_or_lo_sum == LO_SUM);
+
+  offset = XEXP (orig_addr, 1);
+  new_addr = gen_rtx_fmt_ee (plus_or_lo_sum, ptr_mode, addis_value, offset);
+  new_mem = replace_equiv_address_nv (orig_mem, new_addr, false);
+
+  if (extend != UNKNOWN)
+    new_mem = gen_rtx_fmt_e (extend, extend_mode, new_mem);
+
+  new_mem = gen_rtx_UNSPEC (extend_mode, gen_rtvec (1, new_mem),
+			    UNSPEC_FUSION_P9);
+
+  set = gen_rtx_SET (target, new_mem);
+  clobber = gen_rtx_CLOBBER (VOIDmode, tmp_reg);
+  insn = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set, clobber));
+  emit_insn (insn);
+
+  return;
+}
+
+/* During the peephole2 pass, adjust and expand the insns for an extended fusion
+   store sequence.
+
+   The operands are:
+	operands[0]	register set with addis
+	operands[1]	value set via addis
+	operands[2]	target D-form memory being stored to
+	operands[3]	register being stored
+
+  This is similar to the fusion introduced with power8, except it applies to
+  both loads and stores and does not require the result register to be the
+  same as the base register.  At the moment, we only do this if the register
+  set with addis is dead.  */
+
+void
+expand_fusion_p9_store (rtx *operands)
+{
+  rtx tmp_reg = operands[0];
+  rtx addis_value = operands[1];
+  rtx orig_mem = operands[2];
+  rtx src = operands[3];
+  rtx  new_addr, new_mem, orig_addr, offset, set, clobber, insn, new_src;
+  enum rtx_code plus_or_lo_sum;
+  machine_mode target_mode = GET_MODE (orig_mem);
+  machine_mode ptr_mode = Pmode;
+
+  gcc_assert (MEM_P (orig_mem));
+
+  orig_addr = XEXP (orig_mem, 0);
+  plus_or_lo_sum = GET_CODE (orig_addr);
+  gcc_assert (plus_or_lo_sum == PLUS || plus_or_lo_sum == LO_SUM);
+
+  offset = XEXP (orig_addr, 1);
+  new_addr = gen_rtx_fmt_ee (plus_or_lo_sum, ptr_mode, addis_value, offset);
+  new_mem = replace_equiv_address_nv (orig_mem, new_addr, false);
+
+  new_src = gen_rtx_UNSPEC (target_mode, gen_rtvec (1, src),
+			    UNSPEC_FUSION_P9);
+
+  set = gen_rtx_SET (new_mem, new_src);
+  clobber = gen_rtx_CLOBBER (VOIDmode, tmp_reg);
+  insn = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set, clobber));
+  emit_insn (insn);
+
+  return;
+}
+
+/* Return a string to fuse an addis instruction with a load using extended
+   fusion.  The address that is used is the logical address that was formed
+   during peephole2: (lo_sum (high) (low-part))
+
+   The code is complicated, so we call output_asm_insn directly, and just
+   return "".  */
+
+const char *
+emit_fusion_p9_load (rtx reg, rtx mem, rtx tmp_reg)
+{
+  enum machine_mode mode = GET_MODE (reg);
+  rtx hi;
+  rtx lo;
+  rtx addr;
+  const char *load_string;
+  int r;
+
+  if (GET_CODE (mem) == FLOAT_EXTEND || GET_CODE (mem) == ZERO_EXTEND)
+    {
+      mem = XEXP (mem, 0);
+      mode = GET_MODE (mem);
+    }
+
+  if (GET_CODE (reg) == SUBREG)
+    {
+      gcc_assert (SUBREG_BYTE (reg) == 0);
+      reg = SUBREG_REG (reg);
+    }
+
+  if (!REG_P (reg))
+    fatal_insn ("emit_fusion_p9_load, bad reg #1", reg);
+
+  r = REGNO (reg);
+  if (FP_REGNO_P (r))
+    {
+      if (mode == SFmode)
+	load_string = "lfs";
+      else if (mode == DFmode || mode == DImode)
+	load_string = "lfd";
+      else
+	gcc_unreachable ();
+    }
+  else if (INT_REGNO_P (r))
+    {
+      switch (mode)
+	{
+	case QImode:
+	  load_string = "lbz";
+	  break;
+	case HImode:
+	  load_string = "lhz";
+	  break;
+	case SImode:
+	case SFmode:
+	  load_string = "lwz";
+	  break;
+	case DImode:
+	case DFmode:
+	  if (!TARGET_POWERPC64)
+	    gcc_unreachable ();
+	  load_string = "ld";
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  else
+    fatal_insn ("emit_fusion_p9_load, bad reg #2", reg);
+
+  if (!MEM_P (mem))
+    fatal_insn ("emit_fusion_p9_load not MEM", mem);
+
+  addr = XEXP (mem, 0);
+  fusion_split_address (addr, &hi, &lo);
+
+  /* Emit the addis instruction.  */
+  emit_fusion_addis (tmp_reg, hi, "power9 load fusion", GET_MODE_NAME (mode));
+
+  /* Emit the D-form load instruction.  */
+  emit_fusion_load_store (reg, tmp_reg, lo, load_string);
+
+  return "";
+}
+
+/* Return a string to fuse an addis instruction with a store using extended
+   fusion.  The address that is used is the logical address that was formed
+   during peephole2: (lo_sum (high) (low-part))
+
+   The code is complicated, so we call output_asm_insn directly, and just
+   return "".  */
+
+const char *
+emit_fusion_p9_store (rtx mem, rtx reg, rtx tmp_reg)
+{
+  enum machine_mode mode = GET_MODE (reg);
+  rtx hi;
+  rtx lo;
+  rtx addr;
+  const char *store_string;
+  int r;
+
+  if (GET_CODE (reg) == SUBREG)
+    {
+      gcc_assert (SUBREG_BYTE (reg) == 0);
+      reg = SUBREG_REG (reg);
+    }
+
+  if (!REG_P (reg))
+    fatal_insn ("emit_fusion_p9_store, bad reg #1", reg);
+
+  r = REGNO (reg);
+  if (FP_REGNO_P (r))
+    {
+      if (mode == SFmode)
+	store_string = "stfs";
+      else if (mode == DFmode)
+	store_string = "stfd";
+      else
+	gcc_unreachable ();
+    }
+  else if (INT_REGNO_P (r))
+    {
+      switch (mode)
+	{
+	case QImode:
+	  store_string = "stb";
+	  break;
+	case HImode:
+	  store_string = "sth";
+	  break;
+	case SImode:
+	case SFmode:
+	  store_string = "stw";
+	  break;
+	case DImode:
+	case DFmode:
+	  if (!TARGET_POWERPC64)
+	    gcc_unreachable ();
+	  store_string = "std";
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  else
+    fatal_insn ("emit_fusion_p9_store, bad reg #2", reg);
+
+  if (!MEM_P (mem))
+    fatal_insn ("emit_fusion_p9_store not MEM", mem);
+
+  addr = XEXP (mem, 0);
+  fusion_split_address (addr, &hi, &lo);
+
+  /* Emit the addis instruction.  */
+  emit_fusion_addis (tmp_reg, hi, "power9 store fusion", GET_MODE_NAME (mode));
+
+  /* Emit the D-form store instruction.  */
+  emit_fusion_load_store (reg, tmp_reg, lo, store_string);
+
+  return "";
+}
+
+\f
 /* Analyze vector computations and remove unnecessary doubleword
    swaps (xxswapdi instructions).  This pass is performed only
    for little-endian VSX code generation.
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 229975)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -703,6 +703,22 @@ extern int rs6000_vector_align[];
 			 && TARGET_DOUBLE_FLOAT \
 			 && (TARGET_PPC_GFXOPT || VECTOR_UNIT_VSX_P (DFmode)))
 
+/* Conditions to allow TOC fusion for loading/storing integers.  */
+#define TARGET_TOC_FUSION_INT	(TARGET_P8_FUSION			\
+				 && TARGET_TOC_FUSION			\
+				 && (TARGET_CMODEL != CMODEL_SMALL)	\
+				 && TARGET_POWERPC64)
+
+/* Conditions to allow TOC fusion for loading/storing floating point.  */
+#define TARGET_TOC_FUSION_FP	(TARGET_P9_FUSION			\
+				 && TARGET_TOC_FUSION			\
+				 && (TARGET_CMODEL != CMODEL_SMALL)	\
+				 && TARGET_POWERPC64			\
+				 && TARGET_HARD_FLOAT			\
+				 && TARGET_FPRS				\
+				 && TARGET_SINGLE_FLOAT			\
+				 && TARGET_DOUBLE_FLOAT)
+
 /* Whether the various reciprocal divide/square root estimate instructions
    exist, and whether we should automatically generate code for the instruction
    by default.  */
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 229975)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -141,6 +141,8 @@ (define_c_enum "unspec"
    UNSPEC_LSQ
    UNSPEC_FUSION_GPR
    UNSPEC_STACK_CHECK
+   UNSPEC_FUSION_P9
+   UNSPEC_FUSION_ADDIS
   ])
 
 ;;
@@ -327,12 +329,28 @@ (define_mode_iterator EXTSI [(DI "TARGET
 ; QImode or HImode for small atomic ops
 (define_mode_iterator QHI [QI HI])
 
+; QImode, HImode, SImode for fused ops only for GPR loads
+(define_mode_iterator QHSI [QI HI SI])
+
 ; HImode or SImode for sign extended fusion ops
 (define_mode_iterator HSI [HI SI])
 
 ; SImode or DImode, even if DImode doesn't fit in GPRs.
 (define_mode_iterator SDI [SI DI])
 
+; Types that can be fused with an ADDIS instruction to load or store a GPR
+; register that has reg+offset addressing.
+(define_mode_iterator GPR_FUSION [QI
+				  HI
+				  SI
+				  (DI	"TARGET_POWERPC64")
+				  SF
+				  (DF	"TARGET_POWERPC64")])
+
+; Types that can be fused with an ADDIS instruction to load or store an FPR
+; register that has reg+offset addressing.
+(define_mode_iterator FPR_FUSION [DI SF DF])
+
 ; The size of a pointer.  Also, the size of the value that a record-condition
 ; (one with a '.') will compare; and the size used for arithmetic carries.
 (define_mode_iterator P [(SI "TARGET_32BIT") (DI "TARGET_64BIT")])
@@ -12592,6 +12610,66 @@ (define_insn "rs6000_mtfsf"
 ;; a GPR.  The addis instruction must be adjacent to the load, and use the same
 ;; register that is being loaded.  The fused ops must be physically adjacent.
 
+;; There are two parts to addis fusion.  The support for fused TOCs occurs
+;; before register allocation, and is meant to reduce the lifetime of the
+;; temporary register that holds the ADDIS result.  On Power8 GPR loads, we try
+;; to use the register that is being loaded.  The peephole2 then gathers any
+;; other fused possibilities that it can find after register allocation.  If
+;; power9 fusion is selected, we also fuse floating point loads/stores.
+
+;; Fused TOC support: Replace simple GPR loads with a fused form.  This is done
+;; before register allocation, so that we can avoid allocating a temporary base
+;; register that won't be used, and that we try to load into base registers,
+;; and not register 0.  If we can't get a fused GPR load, generate a P9 fusion
+;; (addis followed by load) even on power8.
+
+(define_split
+  [(set (match_operand:INT1 0 "toc_fusion_or_p9_reg_operand" "")
+	(match_operand:INT1 1 "toc_fusion_mem_raw" ""))]
+  "TARGET_TOC_FUSION_INT && can_create_pseudo_p ()"
+  [(parallel [(set (match_dup 0) (match_dup 2))
+	      (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
+	      (use (match_dup 3))
+	      (clobber (scratch:DI))])]
+{
+  operands[2] = fusion_wrap_memory_address (operands[1]);
+  operands[3] = gen_rtx_REG (Pmode, TOC_REGISTER);
+})
+
+(define_insn "*toc_fusionload_<mode>"
+  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
+	(match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
+   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
+   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
+   (clobber (match_scratch:DI 3 "=X,&b"))]
+  "TARGET_TOC_FUSION_INT"
+{
+  if (base_reg_operand (operands[0], <MODE>mode))
+    return emit_fusion_gpr_load (operands[0], operands[1]);
+
+  return emit_fusion_p9_load (operands[0], operands[1], operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+(define_insn "*toc_fusionload_di"
+  [(set (match_operand:DI 0 "int_reg_operand" "=&b,??r,?d")
+	(match_operand:DI 1 "toc_fusion_mem_wrapped" "wG,wG,wG"))
+   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
+   (use (match_operand:DI 2 "base_reg_operand" "r,r,r"))
+   (clobber (match_scratch:DI 3 "=X,&b,&b"))]
+  "TARGET_TOC_FUSION_INT && TARGET_POWERPC64
+   && (MEM_P (operands[1]) || int_reg_operand (operands[0], DImode))"
+{
+  if (base_reg_operand (operands[0], DImode))
+    return emit_fusion_gpr_load (operands[0], operands[1]);
+
+  return emit_fusion_p9_load (operands[0], operands[1], operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+\f
 ;; Find cases where the addis that feeds into a load instruction is either used
 ;; once or is the same as the target register, and replace it with the fusion
 ;; insn
@@ -12615,7 +12693,7 @@ (define_peephole2
 
 (define_insn "fusion_gpr_load_<mode>"
   [(set (match_operand:INT1 0 "base_reg_operand" "=&b")
-	(unspec:INT1 [(match_operand:INT1 1 "fusion_gpr_mem_combo" "")]
+	(unspec:INT1 [(match_operand:INT1 1 "fusion_addis_mem_combo_load" "")]
 		     UNSPEC_FUSION_GPR))]
   "TARGET_P8_FUSION"
 {
@@ -12625,6 +12703,133 @@ (define_insn "fusion_gpr_load_<mode>"
    (set_attr "length" "8")])
 
 \f
+;; ISA 3.0 (power9) fusion support
+;; Merge addis with floating load/store to FPRs (or GPRs).
+(define_peephole2
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:SFDF 2 "toc_fusion_or_p9_reg_operand" "")
+	(match_operand:SFDF 3 "fusion_offsettable_mem_operand" ""))]
+  "TARGET_P9_FUSION && peep2_reg_dead_p (2, operands[0])
+   && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])"
+  [(const_int 0)]
+{
+  expand_fusion_p9_load (operands);
+  DONE;
+})
+
+(define_peephole2
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:SFDF 2 "offsettable_mem_operand" "")
+	(match_operand:SFDF 3 "toc_fusion_or_p9_reg_operand" ""))]
+  "TARGET_P9_FUSION && peep2_reg_dead_p (2, operands[0])
+   && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])"
+  [(const_int 0)]
+{
+  expand_fusion_p9_store (operands);
+  DONE;
+})
+
+(define_peephole2
+  [(set (match_operand:SDI 0 "int_reg_operand" "")
+	(match_operand:SDI 1 "upper16_cint_operand" ""))
+   (set (match_dup 0)
+	(ior:SDI (match_dup 0)
+		 (match_operand:SDI 2 "u_short_cint_operand" "")))]
+  "TARGET_P9_FUSION"
+  [(set (match_dup 0)
+	(unspec:SDI [(match_dup 1)
+		     (match_dup 2)] UNSPEC_FUSION_P9))])
+
+(define_peephole2
+  [(set (match_operand:SDI 0 "int_reg_operand" "")
+	(match_operand:SDI 1 "upper16_cint_operand" ""))
+   (set (match_operand:SDI 2 "int_reg_operand" "")
+	(ior:SDI (match_dup 0)
+		 (match_operand:SDI 3 "u_short_cint_operand" "")))]
+  "TARGET_P9_FUSION
+   && !rtx_equal_p (operands[0], operands[2])
+   && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 2)
+	(unspec:SDI [(match_dup 1)
+		     (match_dup 3)] UNSPEC_FUSION_P9))])
+
+;; Fusion insns, created by the define_peephole2 above (and eventually by
+;; reload).  Because we want to eventually have secondary_reload generate
+;; these, they have to have a single alternative that gives the register
+;; classes.  This means we need to have separate gpr/fpr/altivec versions.
+(define_insn "fusion_gpr_<P:mode>_<GPR_FUSION:mode>_load"
+  [(set (match_operand:GPR_FUSION 0 "int_reg_operand" "=r")
+	(unspec:GPR_FUSION
+	 [(match_operand:GPR_FUSION 1 "fusion_addis_mem_combo_load" "wF")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=&b"))]
+  "TARGET_P9_FUSION"
+{
+  /* This insn is a secondary reload insn, which cannot have alternatives.
+     If we are not loading up register 0, use the power8 fusion instead.  */
+  if (base_reg_operand (operands[0], <GPR_FUSION:MODE>mode))
+    return emit_fusion_gpr_load (operands[0], operands[1]);
+
+  return emit_fusion_p9_load (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+(define_insn "fusion_gpr_<P:mode>_<GPR_FUSION:mode>_store"
+  [(set (match_operand:GPR_FUSION 0 "fusion_addis_mem_combo_store" "=wF")
+	(unspec:GPR_FUSION
+	 [(match_operand:GPR_FUSION 1 "int_reg_operand" "r")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=&b"))]
+  "TARGET_P9_FUSION"
+{
+  return emit_fusion_p9_store (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "store")
+   (set_attr "length" "8")])
+
+(define_insn "fusion_fpr_<P:mode>_<FPR_FUSION:mode>_load"
+  [(set (match_operand:FPR_FUSION 0 "fpr_reg_operand" "=d")
+	(unspec:FPR_FUSION
+	 [(match_operand:FPR_FUSION 1 "fusion_addis_mem_combo_load" "wF")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=b"))]
+  "TARGET_P9_FUSION"
+{
+  return emit_fusion_p9_load (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "fpload")
+   (set_attr "length" "8")])
+
+(define_insn "fusion_fpr_<P:mode>_<FPR_FUSION:mode>_store"
+  [(set (match_operand:FPR_FUSION 0 "fusion_addis_mem_combo_store" "=wF")
+	(unspec:FPR_FUSION
+	 [(match_operand:FPR_FUSION 1 "fpr_reg_operand" "d")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=b"))]
+  "TARGET_P9_FUSION"
+{
+  return emit_fusion_p9_store (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "fpstore")
+   (set_attr "length" "8")])
+
+(define_insn "*fusion_p9_<mode>_constant"
+  [(set (match_operand:SDI 0 "int_reg_operand" "=r")
+	(unspec:SDI [(match_operand:SDI 1 "upper16_cint_operand" "L")
+		     (match_operand:SDI 2 "u_short_cint_operand" "K")]
+		    UNSPEC_FUSION_P9))]
+  "TARGET_P9_FUSION"
+{
+  emit_fusion_addis (operands[0], operands[1], "constant", "<MODE>");
+  return "ori %0,%0,%2";
+}
+  [(set_attr "type" "two")
+   (set_attr "length" "8")])
+
+\f
 ;; Miscellaneous ISA 2.06 (power7) instructions
 (define_insn "addg6s"
   [(set (match_operand:SI 0 "register_operand" "=r")
@@ -12791,6 +12996,7 @@ (define_insn "pack<mode>"
   "xxpermdi %x0,%x1,%x2,0"
   [(set_attr "type" "vecperm")])
 
+
 \f
 
 (include "sync.md")
Index: gcc/testsuite/gcc.target/powerpc/fusion2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fusion2.c	(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
+
+vector double fusion_vector (vector double *p) { return p[2]; }
+
+/* { dg-final { scan-assembler-times "vector load fusion" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/fusion3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fusion3.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -mtune=power9 -O3" } */
+
+#define LARGE 0x12345
+
+int fusion_float_read (float *p){ return p[LARGE]; }
+int fusion_double_read (double *p){ return p[LARGE]; }
+
+void fusion_float_write (float *p, float f){ p[LARGE] = f; }
+void fusion_double_write (double *p, double d){ p[LARGE] = d; }
+
+/* { dg-final { scan-assembler "load fusion, type SF"  } } */
+/* { dg-final { scan-assembler "load fusion, type DF"  } } */
+/* { dg-final { scan-assembler "store fusion, type SF" } } */
+/* { dg-final { scan-assembler "store fusion, type DF" } } */
Index: gcc/testsuite/gcc.target/powerpc/fusion.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion.c	(revision 229970)
+++ gcc/testsuite/gcc.target/powerpc/fusion.c	(working copy)
@@ -1,6 +1,5 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
-/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
 /* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
@@ -14,10 +13,7 @@ int fusion_short (short *p){ return p[LA
 int fusion_int (int *p){ return p[LARGE]; }
 unsigned fusion_uns (unsigned *p){ return p[LARGE]; }
 
-vector double fusion_vector (vector double *p) { return p[2]; }
-
 /* { dg-final { scan-assembler-times "gpr load fusion"    6 } } */
-/* { dg-final { scan-assembler-times "vector load fusion" 1 } } */
 /* { dg-final { scan-assembler-times "lbz"                2 } } */
 /* { dg-final { scan-assembler-times "extsb"              1 } } */
 /* { dg-final { scan-assembler-times "lhz"                2 } } */
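
The power9 fusion added in this patch pairs an addis, which supplies the high part of an address, with a D-form load or store whose signed 16-bit displacement supplies the low part; for the PLUS form, fusion_p9_p only accepts the displacement if it satisfies the 'I' constraint. The C sketch below is an illustration, not code from the patch: the helper names split_displacement and check_split are hypothetical, and it assumes the usual @ha/@l convention in which the high part is rounded up whenever bit 15 of the offset is set, because the hardware sign-extends the low half.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: split a byte offset into the
   high part that an addis adds (in units of 0x10000) and the signed 16-bit
   displacement used by the D-form load/store that follows it.  */
static void
split_displacement (int32_t offset, int32_t *high, int32_t *low)
{
  /* D-form displacements are sign-extended, so interpret the low 16 bits
     as a signed value...  */
  int32_t lo = offset & 0xffff;
  if (lo >= 0x8000)
    lo -= 0x10000;

  /* ...and compensate in the high part (the @ha adjustment).  OFFSET - LO
     is an exact multiple of 0x10000, so the division is exact.  */
  *high = (offset - lo) / 0x10000;
  *low = lo;
}

/* Check that a split reassembles to the original offset and that the low
   part is a signed 16-bit value, i.e. what the 'I' constraint accepts.  */
static void
check_split (int32_t offset)
{
  int32_t high, low;

  split_displacement (offset, &high, &low);
  assert (low >= -0x8000 && low <= 0x7fff);
  assert (high * 0x10000 + low == offset);
}
```

With the offsets that fusion3.c produces for p[0x12345], the float case (byte offset 0x48D14) splits into a high part of 5 and a displacement of -29420, because bit 15 is set and the high part is rounded up; the double case (0x91A28) splits into 9 and 6696.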


* Re: [PATCH], Add power9 support to GCC, patch #6 (IEEE 128-bit hardware support)
  2015-11-03 20:29 [PATCH], Add power9 support to GCC, patch #1 Michael Meissner
                   ` (5 preceding siblings ...)
  2015-11-09  0:42 ` [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion) Michael Meissner
@ 2015-11-09  0:45 ` Michael Meissner
  2015-11-09 19:29   ` Segher Boessenkool
                     ` (2 more replies)
  2015-11-09  0:49 ` [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements) Michael Meissner
                   ` (2 subsequent siblings)
  9 siblings, 3 replies; 47+ messages in thread
From: Michael Meissner @ 2015-11-09  0:45 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 3229 bytes --]

This patch adds support for the IEEE 128-bit hardware instructions that are
being added to the PowerPC ISA 3.0 (power9).  With this patch, users on power7
and power8 will continue to use the software emulation functions that have
already been committed, which still need some enhancement.  On ISA 3.0/power9,
they will be able to use the hardware instructions directly.

I have built this patch with a bootstrap build on a power8 little endian
system.  There were no regressions in the test suite.  Is this patch ok to
install in the trunk?
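
When TARGET_FLOAT128_HW is not set, rs6000_generate_compare in the attached patch keeps the library path, in which __cmpokf2 and __cmpukf2 return a value in 0..15 laid out like the CR field that fcmpo/fcmpu would set. The C sketch below shows how such a value can be tested. It assumes the standard CR field order (LT, GT, EQ, UN from the most significant bit down), and the enum and function names are hypothetical, not taken from GCC or libgcc. It also illustrates why LE and GE have to combine two bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed 4-bit encoding of one CR field, most significant bit first.  */
enum cr_bits { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_UN = 0x1 };

/* Hypothetical comparison codes for this sketch.  */
enum cmp_code { CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_EQ, CMP_NE, CMP_UNORDERED };

/* Return true if the comparison CODE holds for a CR-style RESULT.  */
static bool
cr_field_test (unsigned result, enum cmp_code code)
{
  unsigned mask;

  switch (code)
    {
    case CMP_LT: mask = CR_LT; break;
    case CMP_LE: mask = CR_LT | CR_EQ; break;	/* Needs two bits.  */
    case CMP_GT: mask = CR_GT; break;
    case CMP_GE: mask = CR_GT | CR_EQ; break;	/* Needs two bits.  */
    case CMP_EQ: mask = CR_EQ; break;
    case CMP_NE: return (result & CR_EQ) == 0;	/* True for unordered too.  */
    case CMP_UNORDERED: mask = CR_UN; break;
    default:
      assert (!"unhandled comparison code");
      return false;
    }

  return (result & mask) != 0;
}
```

A NaN operand sets only the UN bit, so under this encoding LE and GE are false while NE is true, as IEEE semantics require.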

[gcc]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000-protos.h (convert_float128_to_int): Add
	declaration.
	(convert_int_to_float128): Likewise.

	* config/rs6000/rs6000.c (rs6000_generate_compare): Add support for
	ISA 3.0 (power9) hardware IEEE 128-bit floating point.
	(rs6000_expand_float128_convert): Likewise.
	(convert_float128_to_int): Likewise.
	(convert_int_to_float128): Likewise.

	* config/rs6000/rs6000.md (UNSPEC_ROUND_TO_ODD): New unspecs for
	ISA 3.0 hardware IEEE 128-bit floating point.
	(UNSPEC_IEEE128_MOVE): Likewise.
	(UNSPEC_IEEE128_CONVERT): Likewise.
	(FMA_F): Add support for IEEE 128-bit floating point hardware
	support.
	(Ff): Add support for DImode.
	(Fv): Likewise.
	(any_fix code iterator): New and updated iterators for IEEE
	128-bit floating point hardware support.
	(any_float code iterator): Likewise.
	(s code attribute): Likewise.
	(su code attribute): Likewise.
	(az code attribute): Likewise.
	(neg<mode>2, FLOAT128 iterator): Add support for IEEE 128-bit
	floating point hardware support.
	(abs<mode>2, FLOAT128 iterator): Likewise.
	(add<mode>3, IEEE128 iterator): New insns for IEEE 128-bit
	floating point hardware.
	(sub<mode>3, IEEE128 iterator): Likewise.
	(mul<mode>3, IEEE128 iterator): Likewise.
	(div<mode>3, IEEE128 iterator): Likewise.
	(copysign<mode>3, IEEE128 iterator): Likewise.
	(sqrt<mode>2, IEEE128 iterator): Likewise.
	(neg<mode>2, IEEE128 iterator): Likewise.
	(abs<mode>2, IEEE128 iterator): Likewise.
	(nabs<mode>2, IEEE128 iterator): Likewise.
	(fma<mode>4_hw, IEEE128 iterator): Likewise.
	(fms<mode>4_hw, IEEE128 iterator): Likewise.
	(nfma<mode>4_hw, IEEE128 iterator): Likewise.
	(nfms<mode>4_hw, IEEE128 iterator): Likewise.
	(extend<SFDF:mode><IEEE128:mode>2_hw): Likewise.
	(trunc<mode>df2_hw, IEEE128 iterator): Likewise.
	(trunc<mode>sf2_hw, IEEE128 iterator): Likewise.
	(fix_fixuns code attribute): Likewise.
	(float_floatuns code attribute): Likewise.
	(<fix_fixuns>_<mode>si2_hw): Likewise.
	(<fix_fixuns>_<mode>di2_hw): Likewise.
	(<float_floatuns>_<mode>si2_hw): Likewise.
	(<float_floatuns>_<mode>di2_hw): Likewise.
	(xscvqp<su>wz_<mode>): Likewise.
	(xscvqp<su>dz_<mode>): Likewise.
	(xscv<su>dqp_<mode>): Likewise.
	(ieee128_mfvsrd): Likewise.
	(ieee128_mfvsrwz): Likewise.
	(ieee128_mtvsrw): Likewise.
	(ieee128_mtvsrd): Likewise.
	(trunc<mode>df2_odd): Likewise.
	(cmp<mode>_h): Likewise.

[gcc/testsuite]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/float128-hw.c: New test for IEEE 128-bit
	hardware floating point support.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power9.official-06b --]
[-- Type: text/plain, Size: 28857 bytes --]

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 229976)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -55,6 +55,8 @@ extern const char *rs6000_output_move_12
 extern bool rs6000_move_128bit_ok_p (rtx []);
 extern bool rs6000_split_128bit_ok_p (rtx []);
 extern void rs6000_expand_float128_convert (rtx, rtx, bool);
+extern void convert_float128_to_int (rtx *, enum rtx_code);
+extern void convert_int_to_float128 (rtx *, enum rtx_code);
 extern void rs6000_expand_vector_init (rtx, rtx);
 extern void paired_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, int);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 229976)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -20504,11 +20504,12 @@ rs6000_generate_compare (rtx cmp, machin
       emit_insn (cmp);
     }
 
-  /* IEEE 128-bit support in VSX registers.  The comparison functions
-     (__cmpokf2 and __cmpukf2) returns 0..15 that is laid out the same way as
-     the PowerPC CR register would for a normal floating point comparison from
-     the fcmpo and fcmpu instructions.  */
-  else if (FLOAT128_IEEE_P (mode))
+  /* IEEE 128-bit support in VSX registers.  If we do not have IEEE 128-bit
+     hardware, the comparison functions (__cmpokf2 and __cmpukf2) return 0..15
+     that is laid out the same way as the PowerPC CR register would for a
+     normal floating point comparison from the fcmpo and fcmpu
+     instructions.  */
+  else if (!TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode))
     {
       rtx and_reg = gen_reg_rtx (SImode);
       rtx dest = gen_reg_rtx (SImode);
@@ -20647,7 +20648,7 @@ rs6000_generate_compare (rtx cmp, machin
   /* Some kinds of FP comparisons need an OR operation;
      under flag_finite_math_only we don't bother.  */
   if (FLOAT_MODE_P (mode)
-      && !FLOAT128_IEEE_P (mode)
+      && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)
       && !flag_finite_math_only
       && !(TARGET_HARD_FLOAT && !TARGET_FPRS)
       && (code == LE || code == GE
@@ -20740,6 +20741,56 @@ rs6000_expand_float128_convert (rtx dest
   bool do_move = false;
   rtx libfunc = NULL_RTX;
   rtx dest2;
+  typedef rtx (*rtx_2func_t) (rtx, rtx);
+  rtx_2func_t hw_convert = (rtx_2func_t)0;
+  size_t kf_or_tf;
+
+  struct hw_conv_t {
+    rtx_2func_t from_df;
+    rtx_2func_t from_sf;
+    rtx_2func_t from_si_sign;
+    rtx_2func_t from_si_uns;
+    rtx_2func_t from_di_sign;
+    rtx_2func_t from_di_uns;
+    rtx_2func_t to_df;
+    rtx_2func_t to_sf;
+    rtx_2func_t to_si_sign;
+    rtx_2func_t to_si_uns;
+    rtx_2func_t to_di_sign;
+    rtx_2func_t to_di_uns;
+  } hw_conversions[2] = {
+    /* conversions to/from KFmode */
+    {
+      gen_extenddfkf2_hw,		/* KFmode <- DFmode.  */
+      gen_extendsfkf2_hw,		/* KFmode <- SFmode.  */
+      gen_float_kfsi2_hw,		/* KFmode <- SImode (signed).  */
+      gen_floatuns_kfsi2_hw,		/* KFmode <- SImode (unsigned).  */
+      gen_float_kfdi2_hw,		/* KFmode <- DImode (signed).  */
+      gen_floatuns_kfdi2_hw,		/* KFmode <- DImode (unsigned).  */
+      gen_trunckfdf2_hw,		/* DFmode <- KFmode.  */
+      gen_trunckfsf2_hw,		/* SFmode <- KFmode.  */
+      gen_fix_kfsi2_hw,			/* SImode <- KFmode (signed).  */
+      gen_fixuns_kfsi2_hw,		/* SImode <- KFmode (unsigned).  */
+      gen_fix_kfdi2_hw,			/* DImode <- KFmode (signed).  */
+      gen_fixuns_kfdi2_hw,		/* DImode <- KFmode (unsigned).  */
+    },
+
+    /* conversions to/from TFmode */
+    {
+      gen_extenddftf2_hw,		/* TFmode <- DFmode.  */
+      gen_extendsftf2_hw,		/* TFmode <- SFmode.  */
+      gen_float_tfsi2_hw,		/* TFmode <- SImode (signed).  */
+      gen_floatuns_tfsi2_hw,		/* TFmode <- SImode (unsigned).  */
+      gen_float_tfdi2_hw,		/* TFmode <- DImode (signed).  */
+      gen_floatuns_tfdi2_hw,		/* TFmode <- DImode (unsigned).  */
+      gen_trunctfdf2_hw,		/* DFmode <- TFmode.  */
+      gen_trunctfsf2_hw,		/* SFmode <- TFmode.  */
+      gen_fix_tfsi2_hw,			/* SImode <- TFmode (signed).  */
+      gen_fixuns_tfsi2_hw,		/* SImode <- TFmode (unsigned).  */
+      gen_fix_tfdi2_hw,			/* DImode <- TFmode (signed).  */
+      gen_fixuns_tfdi2_hw,		/* DImode <- TFmode (unsigned).  */
+    },
+  };
 
   if (dest_mode == src_mode)
     gcc_unreachable ();
@@ -20759,14 +20810,23 @@ rs6000_expand_float128_convert (rtx dest
   /* Convert to IEEE 128-bit floating point.  */
   if (FLOAT128_IEEE_P (dest_mode))
     {
+      if (dest_mode == KFmode)
+	kf_or_tf = 0;
+      else if (dest_mode == TFmode)
+	kf_or_tf = 1;
+      else
+	gcc_unreachable ();
+
       switch (src_mode)
 	{
 	case DFmode:
 	  cvt = sext_optab;
+	  hw_convert = hw_conversions[kf_or_tf].from_df;
 	  break;
 
 	case SFmode:
 	  cvt = sext_optab;
+	  hw_convert = hw_conversions[kf_or_tf].from_sf;
 	  break;
 
 	case KFmode:
@@ -20779,8 +20839,29 @@ rs6000_expand_float128_convert (rtx dest
 	  break;
 
 	case SImode:
+	  if (unsigned_p)
+	    {
+	      cvt = ufloat_optab;
+	      hw_convert = hw_conversions[kf_or_tf].from_si_uns;
+	    }
+	  else
+	    {
+	      cvt = sfloat_optab;
+	      hw_convert = hw_conversions[kf_or_tf].from_si_sign;
+	    }
+	  break;
+
 	case DImode:
-	  cvt = (unsigned_p) ? ufloat_optab : sfloat_optab;
+	  if (unsigned_p)
+	    {
+	      cvt = ufloat_optab;
+	      hw_convert = hw_conversions[kf_or_tf].from_di_uns;
+	    }
+	  else
+	    {
+	      cvt = sfloat_optab;
+	      hw_convert = hw_conversions[kf_or_tf].from_di_sign;
+	    }
 	  break;
 
 	default:
@@ -20791,14 +20872,23 @@ rs6000_expand_float128_convert (rtx dest
   /* Convert from IEEE 128-bit floating point.  */
   else if (FLOAT128_IEEE_P (src_mode))
     {
+      if (src_mode == KFmode)
+	kf_or_tf = 0;
+      else if (src_mode == TFmode)
+	kf_or_tf = 1;
+      else
+	gcc_unreachable ();
+
       switch (dest_mode)
 	{
 	case DFmode:
 	  cvt = trunc_optab;
+	  hw_convert = hw_conversions[kf_or_tf].to_df;
 	  break;
 
 	case SFmode:
 	  cvt = trunc_optab;
+	  hw_convert = hw_conversions[kf_or_tf].to_sf;
 	  break;
 
 	case KFmode:
@@ -20811,8 +20901,29 @@ rs6000_expand_float128_convert (rtx dest
 	  break;
 
 	case SImode:
+	  if (unsigned_p)
+	    {
+	      cvt = ufix_optab;
+	      hw_convert = hw_conversions[kf_or_tf].to_si_uns;
+	    }
+	  else
+	    {
+	      cvt = sfix_optab;
+	      hw_convert = hw_conversions[kf_or_tf].to_si_sign;
+	    }
+	  break;
+
 	case DImode:
-	  cvt = (unsigned_p) ? ufix_optab : sfix_optab;
+	  if (unsigned_p)
+	    {
+	      cvt = ufix_optab;
+	      hw_convert = hw_conversions[kf_or_tf].to_di_uns;
+	    }
+	  else
+	    {
+	      cvt = sfix_optab;
+	      hw_convert = hw_conversions[kf_or_tf].to_di_sign;
+	    }
 	  break;
 
 	default:
@@ -20831,6 +20942,10 @@ rs6000_expand_float128_convert (rtx dest
   if (do_move)
     emit_move_insn (dest, gen_lowpart (dest_mode, src));
 
+  /* Handle conversion if we have hardware support.  */
+  else if (TARGET_FLOAT128_HW && hw_convert)
+    emit_insn ((hw_convert) (dest, src));
+
   /* Call an external function to do the conversion.  */
   else if (cvt != unknown_optab)
     {
@@ -20851,6 +20966,92 @@ rs6000_expand_float128_convert (rtx dest
   return;
 }
 
+/* Split a conversion from __float128 to an integer type into separate insns.
+   OPERANDS points to the destination, source, and V2DI temporary
+   register. CODE is either FIX or UNSIGNED_FIX.  */
+
+void
+convert_float128_to_int (rtx *operands, enum rtx_code code)
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx tmp = operands[2];
+  rtx cvt;
+  rtvec cvt_vec;
+  rtx cvt_unspec;
+  rtvec move_vec;
+  rtx move_unspec;
+
+  if (GET_CODE (tmp) == SCRATCH)
+    tmp = gen_reg_rtx (V2DImode);
+
+  if (MEM_P (dest))
+    dest = rs6000_address_for_fpconvert (dest);
+
+  /* Generate the actual convert insn of the form:
+     (set (tmp) (unspec:V2DI [(fix:SI (reg:KF))] UNSPEC_IEEE128_CONVERT)).  */
+  cvt = gen_rtx_fmt_e (code, GET_MODE (dest), src);
+  cvt_vec = gen_rtvec (1, cvt);
+  cvt_unspec = gen_rtx_UNSPEC (V2DImode, cvt_vec, UNSPEC_IEEE128_CONVERT);
+  emit_insn (gen_rtx_SET (tmp, cvt_unspec));
+
+  /* Generate the move insn of the form:
+     (set (dest:SI) (unspec:SI [(tmp:V2DI)] UNSPEC_IEEE128_MOVE)).  */
+  move_vec = gen_rtvec (1, tmp);
+  move_unspec = gen_rtx_UNSPEC (GET_MODE (dest), move_vec, UNSPEC_IEEE128_MOVE);
+  emit_insn (gen_rtx_SET (dest, move_unspec));
+}
+
+/* Split a conversion from an integer type to __float128 into separate insns.
+   OPERANDS points to the destination, source, and V2DI temporary
+   register. CODE is either FLOAT or UNSIGNED_FLOAT.  */
+
+void
+convert_int_to_float128 (rtx *operands, enum rtx_code code)
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx tmp = operands[2];
+  rtx cvt;
+  rtvec cvt_vec;
+  rtx cvt_unspec;
+  rtvec move_vec;
+  rtx move_unspec;
+  rtx unsigned_flag;
+
+  if (GET_CODE (tmp) == SCRATCH)
+    tmp = gen_reg_rtx (V2DImode);
+
+  if (MEM_P (src))
+    src = rs6000_address_for_fpconvert (src);
+
+  /* Generate the move of the integer into the Altivec register of the form:
+     (set (tmp:V2DI) (unspec:V2DI [(src:SI)
+				   (const_int 0)] UNSPEC_IEEE128_MOVE)).
+
+     or:
+     (set (tmp:V2DI) (unspec:V2DI [(src:DI)] UNSPEC_IEEE128_MOVE)).  */
+
+  if (GET_MODE (src) == SImode)
+    {
+      unsigned_flag = (code == UNSIGNED_FLOAT) ? const1_rtx : const0_rtx;
+      move_vec = gen_rtvec (2, src, unsigned_flag);
+    }
+  else
+    move_vec = gen_rtvec (1, src);
+
+  move_unspec = gen_rtx_UNSPEC (V2DImode, move_vec, UNSPEC_IEEE128_MOVE);
+  emit_insn (gen_rtx_SET (tmp, move_unspec));
+
+  /* Generate the actual convert insn of the form:
+     (set (dest:KF) (float:KF (unspec:DI [(tmp:V2DI)]
+					 UNSPEC_IEEE128_CONVERT))).  */
+  cvt_vec = gen_rtvec (1, tmp);
+  cvt_unspec = gen_rtx_UNSPEC (DImode, cvt_vec, UNSPEC_IEEE128_CONVERT);
+  cvt = gen_rtx_fmt_e (code, GET_MODE (dest), cvt_unspec);
+  emit_insn (gen_rtx_SET (dest, cvt));
+}
+
 \f
 /* Emit the RTL for an sISEL pattern.  */
 
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 229976)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -143,6 +143,9 @@ (define_c_enum "unspec"
    UNSPEC_STACK_CHECK
    UNSPEC_FUSION_P9
    UNSPEC_FUSION_ADDIS
+   UNSPEC_ROUND_TO_ODD
+   UNSPEC_IEEE128_MOVE
+   UNSPEC_IEEE128_CONVERT
   ])
 
 ;;
@@ -381,6 +384,8 @@ (define_mode_iterator FMA_F [
   (V2SF "TARGET_PAIRED_FLOAT")
   (V4SF "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)")
   (V2DF "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)")
+  (KF "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (KFmode)")
+  (TF "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (TFmode)")
   ])
 
 ; Floating point move iterators to combine binary and decimal moves
@@ -485,10 +490,10 @@ (define_mode_attr Ftrad		[(SF "s") (DF "
 (define_mode_attr Fvsx		[(SF "sp") (DF	"dp")])
 
 ; SF/DF constraint for arithmetic on traditional floating point registers
-(define_mode_attr Ff		[(SF "f") (DF "d")])
+(define_mode_attr Ff		[(SF "f") (DF "d") (DI "d")])
 
 ; SF/DF constraint for arithmetic on VSX registers
-(define_mode_attr Fv		[(SF "wy") (DF "ws")])
+(define_mode_attr Fv		[(SF "wy") (DF "ws") (DI "wi")])
 
 ; SF/DF constraint for arithmetic on altivec registers
 (define_mode_attr Fa		[(SF "wu") (DF "wv")])
@@ -510,9 +515,26 @@ (define_code_attr return_str [(return ""
 (define_code_iterator iorxor [ior xor])
 
 ; Signed/unsigned variants of ops.
-(define_code_iterator any_extend [sign_extend zero_extend])
-(define_code_attr u [(sign_extend "") (zero_extend "u")])
-(define_code_attr su [(sign_extend "s") (zero_extend "u")])
+(define_code_iterator any_extend	[sign_extend zero_extend])
+(define_code_iterator any_fix		[fix unsigned_fix])
+(define_code_iterator any_float		[float unsigned_float])
+
+(define_code_attr u  [(sign_extend	"")
+		      (zero_extend	"u")])
+
+(define_code_attr su [(sign_extend	"s")
+		      (zero_extend	"u")
+		      (fix		"s")
+		      (unsigned_fix	"u")
+		      (float		"s")
+		      (unsigned_float	"u")])
+
+(define_code_attr az [(sign_extend	"a")
+		      (zero_extend	"z")
+		      (fix		"a")
+		      (unsigned_fix	"z")
+		      (float		"a")
+		      (unsigned_float	"z")])
 
 ; Various instructions that come in SI and DI forms.
 ; A generic w/d attribute, for things like cmpw/cmpd.
@@ -7003,7 +7025,16 @@ (define_expand "neg<mode>2"
 {
   if (FLOAT128_IEEE_P (<MODE>mode))
     {
-      if (TARGET_FLOAT128)
+      if (TARGET_FLOAT128_HW)
+	{
+	  if (<MODE>mode == TFmode)
+	    emit_insn (gen_negtf2_hw (operands[0], operands[1]));
+	  else if (<MODE>mode == KFmode)
+	    emit_insn (gen_negkf2_hw (operands[0], operands[1]));
+	  else
+	    gcc_unreachable ();
+	}
+      else if (TARGET_FLOAT128)
 	{
 	  if (<MODE>mode == TFmode)
 	    emit_insn (gen_ieee_128bit_vsx_negtf2 (operands[0], operands[1]));
@@ -7053,7 +7084,17 @@ (define_expand "abs<mode>2"
 
   if (FLOAT128_IEEE_P (<MODE>mode))
     {
-      if (TARGET_FLOAT128)
+      if (TARGET_FLOAT128_HW)
+	{
+	  if (<MODE>mode == TFmode)
+	    emit_insn (gen_abstf2_hw (operands[0], operands[1]));
+	  else if (<MODE>mode == KFmode)
+	    emit_insn (gen_abskf2_hw (operands[0], operands[1]));
+	  else
+	    FAIL;
+	  DONE;
+	}
+      else if (TARGET_FLOAT128)
 	{
 	  if (<MODE>mode == TFmode)
 	    emit_insn (gen_ieee_128bit_vsx_abstf2 (operands[0], operands[1]));
@@ -7140,7 +7181,7 @@ (define_insn_and_split "ieee_128bit_vsx_
   [(set (match_operand:IEEE128 0 "register_operand" "=wa")
 	(neg:IEEE128 (match_operand:IEEE128 1 "register_operand" "wa")))
    (clobber (match_scratch:V16QI 2 "=v"))]
-  "TARGET_FLOAT128"
+  "TARGET_FLOAT128 && !TARGET_FLOAT128_HW"
   "#"
   "&& 1"
   [(parallel [(set (match_dup 0)
@@ -7160,7 +7201,7 @@ (define_insn "*ieee_128bit_vsx_neg<mode>
   [(set (match_operand:IEEE128 0 "register_operand" "=wa")
 	(neg:IEEE128 (match_operand:IEEE128 1 "register_operand" "wa")))
    (use (match_operand:V16QI 2 "register_operand" "=v"))]
-  "TARGET_FLOAT128"
+  "TARGET_FLOAT128 && !TARGET_FLOAT128_HW"
   "xxlxor %x0,%x1,%x2"
   [(set_attr "type" "vecsimple")])
 
@@ -7169,7 +7210,7 @@ (define_insn_and_split "ieee_128bit_vsx_
   [(set (match_operand:IEEE128 0 "register_operand" "=wa")
 	(abs:IEEE128 (match_operand:IEEE128 1 "register_operand" "wa")))
    (clobber (match_scratch:V16QI 2 "=v"))]
-  "TARGET_FLOAT128 && FLOAT128_IEEE_P (<MODE>mode)"
+  "TARGET_FLOAT128 && !TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "#"
   "&& 1"
   [(parallel [(set (match_dup 0)
@@ -7189,7 +7230,7 @@ (define_insn "*ieee_128bit_vsx_abs<mode>
   [(set (match_operand:IEEE128 0 "register_operand" "=wa")
 	(abs:IEEE128 (match_operand:IEEE128 1 "register_operand" "wa")))
    (use (match_operand:V16QI 2 "register_operand" "=v"))]
-  "TARGET_FLOAT128"
+  "TARGET_FLOAT128 && !TARGET_FLOAT128_HW"
   "xxlandc %x0,%x1,%x2"
   [(set_attr "type" "vecsimple")])
 
@@ -7200,7 +7241,7 @@ (define_insn_and_split "*ieee_128bit_vsx
 	 (abs:IEEE128
 	  (match_operand:IEEE128 1 "register_operand" "wa"))))
    (clobber (match_scratch:V16QI 2 "=v"))]
-  "TARGET_FLOAT128 && FLOAT128_IEEE_P (<MODE>mode)"
+  "TARGET_FLOAT128 && !TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "#"
   "&& 1"
   [(parallel [(set (match_dup 0)
@@ -7222,7 +7263,7 @@ (define_insn "*ieee_128bit_vsx_nabs<mode
 	 (abs:IEEE128
 	  (match_operand:IEEE128 1 "register_operand" "wa"))))
    (use (match_operand:V16QI 2 "register_operand" "=v"))]
-  "TARGET_FLOAT128"
+  "TARGET_FLOAT128 && !TARGET_FLOAT128_HW"
   "xxlor %x0,%x1,%x2"
   [(set_attr "type" "vecsimple")])
 
@@ -12998,6 +13039,335 @@ (define_insn "pack<mode>"
 
 
 \f
+;; ISA 3.0 IEEE 128-bit floating point support.
+
+(define_insn "add<mode>3"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(plus:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")
+	 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsaddqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "sub<mode>3"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(minus:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")
+	 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xssubqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "mul<mode>3"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(mult:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")
+	 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsmulqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "div<mode>3"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(div:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")
+	 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsdivqp %0,%1,%2"
+  [(set_attr "type" "vecdiv")])
+
+(define_insn "sqrt<mode>2"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(sqrt:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xssqrtqp %0,%1"
+  [(set_attr "type" "vecdiv")])
+
+(define_insn "copysign<mode>3"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(unspec:IEEE128
+	 [(match_operand:IEEE128 1 "altivec_register_operand" "v")
+	  (match_operand:IEEE128 2 "altivec_register_operand" "v")]
+	 UNSPEC_COPYSIGN))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xscpsgnqp %0,%2,%1"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "neg<mode>2_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(neg:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsnegqp %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+
+(define_insn "abs<mode>2_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(abs:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsabsqp %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+
+(define_insn "*nabs<mode>2_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(neg:IEEE128
+	 (abs:IEEE128
+	  (match_operand:IEEE128 1 "altivec_register_operand" "v"))))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsnabsqp %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+;; Initially don't worry about doing fusion
+(define_insn "*fma<mode>4_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(fma:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "%v")
+	 (match_operand:IEEE128 2 "altivec_register_operand" "v")
+	 (match_operand:IEEE128 3 "altivec_register_operand" "0")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsmaddqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*fms<mode>4_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(fma:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "%v")
+	 (match_operand:IEEE128 2 "altivec_register_operand" "v")
+	 (neg:IEEE128
+	  (match_operand:IEEE128 3 "altivec_register_operand" "0"))))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsmsubqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*nfma<mode>4_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(neg:IEEE128
+	 (fma:IEEE128
+	  (match_operand:IEEE128 1 "altivec_register_operand" "%v")
+	  (match_operand:IEEE128 2 "altivec_register_operand" "v")
+	  (match_operand:IEEE128 3 "altivec_register_operand" "0"))))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsnmaddqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*nfms<mode>4_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(neg:IEEE128
+	 (fma:IEEE128
+	  (match_operand:IEEE128 1 "altivec_register_operand" "%v")
+	  (match_operand:IEEE128 2 "altivec_register_operand" "v")
+	  (neg:IEEE128
+	   (match_operand:IEEE128 3 "altivec_register_operand" "0")))))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsnmsubqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "extend<SFDF:mode><IEEE128:mode>2_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(float_extend:IEEE128
+	 (match_operand:SFDF 1 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<IEEE128:MODE>mode)"
+  "xscvdpqp %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "trunc<mode>df2_hw"
+  [(set (match_operand:DF 0 "altivec_register_operand" "=v")
+	(float_truncate:DF
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xscvqpdp %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+;; There is no KFmode -> SFmode instruction.  Avoid double rounding by doing
+;; the KFmode -> DFmode conversion using round to odd rather than the normal
+;; rounding mode, and then truncating the DFmode value to SFmode.
+(define_insn_and_split "trunc<mode>sf2_hw"
+  [(set (match_operand:SF 0 "vsx_register_operand" "=wy")
+	(float_truncate:SF
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")))
+   (clobber (match_scratch:DF 2 "=v"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "#"
+  "&& 1"
+  [(set (match_dup 2)
+	(unspec:DF [(match_dup 1)] UNSPEC_ROUND_TO_ODD))
+   (set (match_dup 0)
+	(float_truncate:SF (match_dup 2)))]
+{
+  if (GET_CODE (operands[2]) == SCRATCH)
+    operands[2] = gen_reg_rtx (DFmode);
+}
+  [(set_attr "type" "vecfloat")
+   (set_attr "length" "8")])
+
+;; At present SImode is not allowed in VSX registers at all, and DImode is only
+;; allowed in the traditional floating point registers. Use V2DImode so that
+;; we can get a value in an Altivec register.
+
+(define_code_attr fix_fixuns	 [(fix   "fix")   (unsigned_fix   "fixuns")])
+(define_code_attr float_floatuns [(float "float") (unsigned_float "floatuns")])
+
+(define_insn_and_split "<fix_fixuns>_<mode>si2_hw"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,Z")
+	(any_fix:SI (match_operand:IEEE128 1 "altivec_register_operand" "v,v")))
+   (clobber (match_scratch:V2DI 2 "=v,v"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "#"
+  "&& 1"
+  [(pc)]
+{
+  convert_float128_to_int (operands, <CODE>);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "mftgpr,fpstore")])
+
+(define_insn_and_split "<fix_fixuns>_<mode>di2_hw"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=wr,wi,Z")
+	(any_fix:DI (match_operand:IEEE128 1 "altivec_register_operand" "v,v,v")))
+   (clobber (match_scratch:V2DI 2 "=v,v,v"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "#"
+  "&& 1"
+  [(pc)]
+{
+  convert_float128_to_int (operands, <CODE>);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "mftgpr,vecsimple,fpstore")])
+
+(define_insn_and_split "<float_floatuns>_<mode>si2_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v,v")
+	(any_float:IEEE128 (match_operand:SI 1 "nonimmediate_operand" "r,Z")))
+   (clobber (match_scratch:V2DI 2 "=v,v"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "#"
+  "&& 1"
+  [(pc)]
+{
+  convert_int_to_float128 (operands, <CODE>);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "vecfloat")])
+
+(define_insn_and_split "<float_floatuns>_<mode>di2_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v,v,v")
+	(any_float:IEEE128 (match_operand:DI 1 "nonimmediate_operand" "wi,wr,Z")))
+   (clobber (match_scratch:V2DI 2 "=v,v,v"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "#"
+  "&& 1"
+  [(pc)]
+{
+  convert_int_to_float128 (operands, <CODE>);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "vecfloat")])
+
+;; Integer conversion instructions, using V2DImode to get an Altivec register
+(define_insn "*xscvqp<su>wz_<mode>"
+  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
+	(unspec:V2DI
+	 [(any_fix:SI
+	   (match_operand:IEEE128 1 "altivec_register_operand" "v"))]
+	 UNSPEC_IEEE128_CONVERT))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xscvqp<su>wz %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*xscvqp<su>dz_<mode>"
+  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
+	(unspec:V2DI
+	 [(any_fix:DI
+	   (match_operand:IEEE128 1 "altivec_register_operand" "v"))]
+	 UNSPEC_IEEE128_CONVERT))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xscvqp<su>dz %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*xscv<su>dqp_<mode>"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(any_float:IEEE128
+	 (unspec:DI [(match_operand:V2DI 1 "altivec_register_operand" "v")]
+		    UNSPEC_IEEE128_CONVERT)))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xscv<su>dqp %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*ieee128_mfvsrd"
+  [(set (match_operand:DI 0 "reg_or_indexed_operand" "=wr,Z,wi")
+	(unspec:DI [(match_operand:V2DI 1 "altivec_register_operand" "v,v,v")]
+		   UNSPEC_IEEE128_MOVE))]
+  "TARGET_FLOAT128_HW && TARGET_POWERPC64"
+  "@
+   mfvsrd %0,%x1
+   stxsdx %x1,%y0
+   xxlor %x0,%x1,%x1"
+  [(set_attr "type" "mftgpr,fpstore,vecsimple")])
+
+(define_insn "*ieee128_mfvsrwz"
+  [(set (match_operand:SI 0 "reg_or_indexed_operand" "=r,Z")
+	(unspec:SI [(match_operand:V2DI 1 "altivec_register_operand" "v,v")]
+		   UNSPEC_IEEE128_MOVE))]
+  "TARGET_FLOAT128_HW"
+  "@
+   mfvsrwz %0,%x1
+   stxsiwx %x1,%y0"
+  [(set_attr "type" "mftgpr,fpstore")])
+
+;; 0 says do sign-extension, 1 says zero-extension
+(define_insn "*ieee128_mtvsrw"
+  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v,v,v,v")
+	(unspec:V2DI [(match_operand:SI 1 "nonimmediate_operand" "r,Z,r,Z")
+		      (match_operand:SI 2 "const_0_to_1_operand" "O,O,n,n")]
+		     UNSPEC_IEEE128_MOVE))]
+  "TARGET_FLOAT128_HW"
+  "@
+   mtvsrwa %x0,%1
+   lxsiwax %x0,%y1
+   mtvsrwz %x0,%1
+   lxsiwzx %x0,%y1"
+  [(set_attr "type" "mffgpr,fpload,mffgpr,fpload")])
+
+
+(define_insn "*ieee128_mtvsrd"
+  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v,v,v")
+	(unspec:V2DI [(match_operand:DI 1 "nonimmediate_operand" "wr,Z,wi")]
+		     UNSPEC_IEEE128_MOVE))]
+  "TARGET_FLOAT128_HW"
+  "@
+   mtvsrd %x0,%1
+   lxsdx %x0,%y1
+   xxlor %x0,%x1,%x1"
+  [(set_attr "type" "mffgpr,fpload,vecsimple")])
+
+;; IEEE 128-bit instructions with round to odd semantics
+(define_insn "*trunc<mode>df2_odd"
+  [(set (match_operand:DF 0 "vsx_register_operand" "=v")
+	(unspec:DF [(match_operand:IEEE128 1 "altivec_register_operand" "v")]
+		   UNSPEC_ROUND_TO_ODD))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xscvqpdpo %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+;; IEEE 128-bit comparisons
+(define_insn "*cmp<mode>_hw"
+  [(set (match_operand:CCFP 0 "cc_reg_operand" "=y")
+	(compare:CCFP (match_operand:IEEE128 1 "altivec_register_operand" "v")
+		      (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xscmpuqp %0,%1,%2"
+  [(set_attr "type" "fpcompare")])
+
+\f
 
 (include "sync.md")
 (include "vector.md")
Index: gcc/testsuite/gcc.target/powerpc/float128-hw.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/float128-hw.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/float128-hw.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O3" } */
+
+__float128 f128_add (__float128 a, __float128 b) { return a+b; }
+__float128 f128_sub (__float128 a, __float128 b) { return a-b; }
+__float128 f128_mul (__float128 a, __float128 b) { return a*b; }
+__float128 f128_div (__float128 a, __float128 b) { return a/b; }
+__float128 f128_fma (__float128 a, __float128 b, __float128 c) { return (a*b)+c; }
+long f128_cmove (__float128 a, __float128 b, long c, long d) { return (a == b) ? c : d; }
+
+/* { dg-final { scan-assembler "xsaddqp"  } } */
+/* { dg-final { scan-assembler "xssubqp"  } } */
+/* { dg-final { scan-assembler "xsmulqp"  } } */
+/* { dg-final { scan-assembler "xsdivqp"  } } */
+/* { dg-final { scan-assembler "xsmaddqp" } } */
+/* { dg-final { scan-assembler "xscmpuqp" } } */
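
The `hw_conversions` table in the patch selects a generator by row (`kf_or_tf`: 0 for KFmode, 1 for TFmode) and by field (the conversion kind), and `rs6000_expand_float128_convert` emits the hardware pattern only when `TARGET_FLOAT128_HW` is set and an entry exists, otherwise falling back to the library call. A reduced, standalone C sketch of that table-driven dispatch; the stub generators and enum names are hypothetical stand-ins for the real `gen_*_hw` functions, and only the DFmode column is shown:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's gen_*_hw pattern generators; each
   one just reports which pattern it stands for.  */
typedef const char *(*gen_fn) (void);

static const char *stub_extenddfkf2_hw (void) { return "extenddfkf2_hw"; }
static const char *stub_extenddftf2_hw (void) { return "extenddftf2_hw"; }
static const char *stub_trunckfdf2_hw (void) { return "trunckfdf2_hw"; }
static const char *stub_trunctfdf2_hw (void) { return "trunctfdf2_hw"; }

/* Row index, mirroring kf_or_tf in the patch: 0 = KFmode, 1 = TFmode.  */
enum f128_row { ROW_KF = 0, ROW_TF = 1 };

/* Which conversion is wanted (only the DFmode pair is modeled here).  */
enum conv_kind { CONV_FROM_DF, CONV_TO_DF };

struct conv_row
{
  gen_fn from_df;   /* 128-bit <- DFmode.  */
  gen_fn to_df;     /* DFmode <- 128-bit.  */
};

static const struct conv_row conversions[2] = {
  { stub_extenddfkf2_hw, stub_trunckfdf2_hw },   /* ROW_KF */
  { stub_extenddftf2_hw, stub_trunctfdf2_hw },   /* ROW_TF */
};

/* Return the generator for the hardware path, or NULL when the caller
   should fall back to a library call (no hardware, or no table entry).  */
static gen_fn
pick_conversion (enum f128_row row, enum conv_kind kind, int have_hw)
{
  assert (row == ROW_KF || row == ROW_TF);
  gen_fn fn = (kind == CONV_FROM_DF) ? conversions[row].from_df
                                     : conversions[row].to_df;
  return (have_hw && fn) ? fn : NULL;
}
```

Keeping one row per 128-bit mode means the per-mode `switch` cases only have to pick a field, which is why the patch computes `kf_or_tf` once before each `switch`.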

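The `trunc<mode>sf2_hw` splitter relies on a standard property of round-to-odd: rounding first to odd at an intermediate precision with at least two more bits than the target, then rounding to nearest-even, gives the same result as a single rounding. DFmode has far more than two extra bits over SFmode, so `xscvqpdpo` followed by a normal DF->SF truncation should avoid double-rounding errors. A small integer model illustrates this; it drops low bits of an integer rather than using real floating-point formats, and the function names are illustrative, not from GCC:

```c
#include <assert.h>
#include <stdint.h>

/* Integer model of rounding: drop the low N bits of X, keeping fewer
   "significand" bits.  Illustration only, not GCC code.  */

/* Round to nearest, ties to even.  */
static uint64_t
rne_shift (uint64_t x, unsigned n)
{
  assert (n >= 1 && n < 64);
  uint64_t q = x >> n;
  uint64_t rem = x & ((UINT64_C (1) << n) - 1);
  uint64_t half = UINT64_C (1) << (n - 1);
  if (rem > half || (rem == half && (q & 1)))
    q++;
  return q;
}

/* Round to odd: truncate, then force the low bit to 1 if anything
   nonzero was discarded (a "sticky" bit).  */
static uint64_t
rto_shift (uint64_t x, unsigned n)
{
  assert (n >= 1 && n < 64);
  uint64_t q = x >> n;
  if (x & ((UINT64_C (1) << n) - 1))
    q |= 1;
  return q;
}
```

With x = 23 (binary 10111), rounding away 4 bits at once gives 1; two round-to-nearest steps of 2 bits each give 2, because the first step manufactures a false tie; round-to-odd followed by round-to-nearest gives 1 again.
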
^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements)
  2015-11-03 20:29 [PATCH], Add power9 support to GCC, patch #1 Michael Meissner
                   ` (6 preceding siblings ...)
  2015-11-09  0:45 ` [PATCH], Add power9 support to GCC, patch #6 (IEEE 128-bit hardware support) Michael Meissner
@ 2015-11-09  0:49 ` Michael Meissner
  2015-11-09 20:00   ` Segher Boessenkool
                     ` (2 more replies)
  2015-11-10 20:56 ` [PATCH, applied], Add power9 support to GCC, patch #9 (config.gcc) Michael Meissner
  2015-11-10 21:56 ` [PATCH], Add power9 support to GCC, patch #10 (SFmode/DFmode d-form addressing) Michael Meissner
  9 siblings, 3 replies; 47+ messages in thread
From: Michael Meissner @ 2015-11-09  0:49 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

This patch adds support for the new direct move instructions (MFVSRLD and
MTVSRDD) that simplify moving 128-bit data between GPRs and vector registers.
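
For readers unfamiliar with the two instructions: `mtvsrdd` builds a 128-bit VSX register from two GPRs, the first supplying doubleword 0 (the high-order half) and the second doubleword 1, while `mfvsrld` reads doubleword 1 back, complementing the existing `mfvsrd`, which reads doubleword 0. A portable C model of that data movement, assuming `unsigned __int128` (a GCC/Clang extension on 64-bit targets) as the stand-in for a vector register; the `model_*` names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 128-bit VSX register: doubleword 0 is the high half,
   doubleword 1 the low half.  */
typedef unsigned __int128 vsr128;

/* Model of mtvsrdd: high doubleword from the first GPR, low from the
   second.  */
static vsr128
model_mtvsrdd (uint64_t hi, uint64_t lo)
{
  return ((vsr128) hi << 64) | lo;
}

/* Model of mfvsrd: read doubleword 0 (the high half).  */
static uint64_t
model_mfvsrd (vsr128 v)
{
  return (uint64_t) (v >> 64);
}

/* Model of mfvsrld: read doubleword 1 (the low half).  */
static uint64_t
model_mfvsrld (vsr128 v)
{
  return (uint64_t) v;
}
```

A round trip through `model_mtvsrdd` and back through `model_mfvsrd`/`model_mfvsrld` returns the original pair of GPR values, which is the GPR<->vector 128-bit move the patch enables without scratch registers.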

I have built previous versions of this patch with no regressions.  So far, I
have done a non-bootstrap build of this version and run the PowerPC tests, with
no regressions.  Assuming the bootstrap build that I've started has no
regressions, is it ok to install in the trunk?

[gcc]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/constraints.md (we constraint): New constraint for
	64-bit power9 vector support.
	(wL constraint): New constraint for the element in a vector that
	can be addressed by the MFVSRLD instruction.

	* config/rs6000/rs6000.c (rs6000_debug_reg_global): Add ISA 3.0
	debugging.
	(rs6000_init_hard_regno_mode_ok): If ISA 3.0 and 64-bit, enable we
	constraint.  Disable the VSX<->GPR direct move helpers if we have
	the MFVSRLD and MTVSRDD instructions.
	(rs6000_secondary_reload_simple_move): Add support for doing
	vector direct moves directly without additional scratch registers
	if we have ISA 3.0 instructions.
	(rs6000_secondary_reload_direct_move): Update comments.
	(rs6000_output_move_128bit): Add support for ISA 3.0 vector
	instructions.

	* config/rs6000/vsx.md (vsx_mov<mode>): Add support for ISA 3.0
	direct move instructions.
	(vsx_movti_64bit): Likewise.
	(vsx_extract_<mode>): Likewise.

	* config/rs6000/rs6000.h (VECTOR_ELEMENT_MFVSRLD_64BIT): New
	macros for ISA 3.0 direct move instructions.
	(TARGET_DIRECT_MOVE_128): Likewise.

	* config/rs6000/rs6000.md (128-bit GPR splitters): Don't split a
	128-bit move that is a direct move between GPR and vector
	registers using ISA 3.0 direct move instructions.

	* doc/md.texi (RS/6000 constraints): Document we, wF, wG, wL
	constraints.  Update wa documentation to say not to use %x<n> on
	instructions that only take Altivec registers.

[gcc/testsuite]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/direct-move-vector.c: New test for 128-bit
	vector direct move instructions.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797


* Re: [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions)
  2015-11-09  0:36 ` [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions) Michael Meissner
@ 2015-11-09 15:48   ` Segher Boessenkool
  2015-11-09 18:07     ` Michael Meissner
  2015-11-09 16:14   ` David Edelsohn
  2015-11-10  0:17   ` [PATCH], Add power9 support to GCC, patches #2-5 committed Michael Meissner
  2 siblings, 1 reply; 47+ messages in thread
From: Segher Boessenkool @ 2015-11-09 15:48 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

Hi,

On Sun, Nov 08, 2015 at 07:36:16PM -0500, Michael Meissner wrote:
> [gcc/testsuite]
> 	* lib/target-supports.exp (check_p9vector_hw_available): Add
> 	checks for power9 availability.
> 	(check_effective_target_powerpc_p9vector_ok): Likewise.

It's probably better not to use this for modulo; it is confusing, and if
you later need to untangle it, that is much more work.

> +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */

Lose this line?  If Darwin cannot support modulo, the next line will
catch that.

> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
> +/* { dg-options "-mcpu=power9 -O3" } */

Is -O3 needed?  Why won't -O2 work?

> +proc check_p9vector_hw_available { } {
> +    return [check_cached_effective_target p9vector_hw_available {
> +	# Some simulators are known to not support VSX/power8 instructions.
> +	# For now, disable on Darwin
> +	if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || [istarget *-*-darwin*]} {

Long line.

> Index: gcc/config/rs6000/rs6000.md
> ===================================================================
> --- gcc/config/rs6000/rs6000.md	(revision 229972)
> +++ gcc/config/rs6000/rs6000.md	(working copy)
> @@ -2885,9 +2885,9 @@ (define_insn_and_split "*div<mode>3_sra_
>     (set_attr "cell_micro" "not")])
>  
>  (define_expand "mod<mode>3"
> -  [(use (match_operand:GPR 0 "gpc_reg_operand" ""))
> -   (use (match_operand:GPR 1 "gpc_reg_operand" ""))
> -   (use (match_operand:GPR 2 "reg_or_cint_operand" ""))]
> +  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
> +	(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
> +		 (match_operand:GPR 2 "reg_or_cint_operand" "")))]

You could delete the empty constraint strings while you're at it.

> +;; On machines with modulo support, do a combined div/mod the old fashioned
> +;; method, since the multiply/subtract is faster than doing the mod instruction
> +;; after a divide.

You can instead have a "divmod" insn that is split to either of div, mod,
or div+mul+sub depending on which of the outputs is unused.  Peepholes
do not get all cases.

This can be a later improvement of course.
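
The trade-off under discussion: when both quotient and remainder are needed, the remainder can be recovered from the quotient with a multiply and subtract instead of issuing a separate modulo instruction. A minimal C sketch of that identity (the function name is illustrative); C99 truncating division guarantees `a == (a / b) * b + a % b`, so the two forms agree for signed operands as well, excluding division by zero and the most negative value divided by -1:

```c
#include <assert.h>
#include <stdint.h>

/* Compute quotient and remainder with a single divide; the remainder
   comes from multiply/subtract rather than a second, independent modulo
   operation.  */
static int64_t
divmod_mulsub (int64_t a, int64_t b, int64_t *rem)
{
  assert (b != 0 && !(a == INT64_MIN && b == -1));  /* UB otherwise.  */
  int64_t q = a / b;
  *rem = a - q * b;   /* Equals a % b under C99 truncating division.  */
  return q;
}
```

For example, -17 / 5 yields a quotient of -3 and a remainder of -2, matching `%`.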


Segher


* Re: [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros)
  2015-11-09  0:38 ` [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros) Michael Meissner
@ 2015-11-09 15:59   ` Segher Boessenkool
  2015-11-09 17:18     ` Michael Meissner
  2015-11-09 18:02   ` David Edelsohn
  1 sibling, 1 reply; 47+ messages in thread
From: Segher Boessenkool @ 2015-11-09 15:59 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Sun, Nov 08, 2015 at 07:37:53PM -0500, Michael Meissner wrote:
> This patch adds support for scalar count trailing zeros instruction that is
> being added to ISA 3.0 (power9).

I bet you should change CTZ_DEFINED_VALUE_AT_ZERO as well.
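
Context for the `CTZ_DEFINED_VALUE_AT_ZERO` remark: GCC's `__builtin_ctz` family has an undefined result for a zero argument, whereas the ISA 3.0 `cnttzw`/`cnttzd` instructions produce the operand width (32 or 64) for zero, and that target macro is how a port tells GCC what its ctz pattern yields at zero. A portable model of the instruction semantics (function names hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Model of cnttzw: trailing-zero count with a defined result (32) for
   a zero input, unlike __builtin_ctz, which is undefined for 0.  */
static unsigned
model_cnttzw (uint32_t x)
{
  return x == 0 ? 32 : (unsigned) __builtin_ctz (x);
}

/* Model of cnttzd: the doubleword form, defined as 64 for zero.  */
static unsigned
model_cnttzd (uint64_t x)
{
  return x == 0 ? 64 : (unsigned) __builtin_ctzll (x);
}
```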

> +(define_insn "ctz<mode>2_hw"
> +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> +	(ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))]
> +  "TARGET_CTZ"
> +  "cnttz<wd> %0,%1"
> +  [(set_attr "type" "cntlz")])

We should probably rename this attr value now.  "cntz" maybe?  Could be
later of course.


Segher


* Re: [PATCH], Add power9 support to GCC, patch #1 (revised)
  2015-11-09  0:33 ` [PATCH], Add power9 support to GCC, patch #1 (revised) Michael Meissner
@ 2015-11-09 16:12   ` David Edelsohn
  2015-11-10 18:39   ` [PATCH], Add power9 support to GCC, patch #8 (add integer multiply/add) Michael Meissner
  1 sibling, 0 replies; 47+ messages in thread
From: David Edelsohn @ 2015-11-09 16:12 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches

On Sun, Nov 8, 2015 at 4:33 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This is patch #1 that I revised.  I changed -mfusion-toc to -mtoc-fusion.  I
> changed the references to ISA 2.08 to 3.0.  I added two new debug switches for
> code in future patches that is undergoing development and is not ready to be on
> by default.
>
> I have done a bootstrap build on a little endian power8 system and there were
> no regressions in this patch.  Is it ok to install in the trunk?
>
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.opt (-mpower9-fusion): Add new switches for
>         ISA 3.0 (power9).
>         (-mpower9-vector): Likewise.
>         (-mpower9-dform): Likewise.
>         (-mpower9-minmax): Likewise.
>         (-mtoc-fusion): Likewise.
>         (-mmodulo): Likewise.
>         (-mfloat128-hardware): Likewise.
>
>         * config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Add option
>         mask for ISA 3.0 (power9).
>         (POWERPC_MASKS): Add new ISA 3.0 switches.
>         (power9 cpu): Add power9 cpu.
>
>         * config/rs6000/rs6000.h (ASM_CPU_POWER9_SPEC): Add support for
>         power9.
>         (ASM_CPU_SPEC): Likewise.
>         (EXTRA_SPECS): Likewise.
>
>         * config/rs6000/rs6000-opts.h (enum processor_type): Add
>         PROCESSOR_POWER9.
>
>         * config/rs6000/rs6000.c (power9_cost): Initial cost setup for
>         power9.
>         (rs6000_debug_reg_global): Add support for power9 fusion.
>         (rs6000_setup_reg_addr_masks): Cache mode size.
>         (rs6000_option_override_internal): Until real power9 tuning is
>         added, use -mtune=power8 for -mcpu=power9.
>         (rs6000_setup_reg_addr_masks): Do not allow pre-increment,
>         pre-decrement, or pre-modify on SFmode/DFmode if we allow the use
>         of Altivec registers.
>         (rs6000_option_override_internal): Add support for ISA 3.0
>         switches.
>         (rs6000_loop_align): Add support for power9 cpu.
>         (rs6000_file_start): Likewise.
>         (rs6000_adjust_cost): Likewise.
>         (rs6000_issue_rate): Likewise.
>         (insn_must_be_first_in_group): Likewise.
>         (insn_must_be_last_in_group): Likewise.
>         (force_new_group): Likewise.
>         (rs6000_register_move_cost): Likewise.
>         (rs6000_opt_masks): Likewise.
>
>         * config/rs6000/rs6000.md (cpu attribute): Add power9.
>         * config/rs6000/rs6000-tables.opt: Regenerate.
>
>         * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
>         _ARCH_PWR9 if power9 support is available.
>
>         * config/rs6000/aix61.h (ASM_CPU_SPEC): Add power9.
>         * config/rs6000/aix53.h (ASM_CPU_SPEC): Likewise.
>
>         * configure.ac: Determine if the assembler supports the ISA 3.0
>         instructions.
>         * config.in (HAVE_AS_POWER9): Likewise.
>         * configure: Regenerate.
>
>         * doc/invoke.texi (RS/6000 and PowerPC Options): Document ISA 3.0
>         switches.

Okay.

Thanks, David

* Re: [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions)
  2015-11-09  0:36 ` [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions) Michael Meissner
  2015-11-09 15:48   ` Segher Boessenkool
@ 2015-11-09 16:14   ` David Edelsohn
  2015-11-10  0:17   ` [PATCH], Add power9 support to GCC, patches #2-5 committed Michael Meissner
  2 siblings, 0 replies; 47+ messages in thread
From: David Edelsohn @ 2015-11-09 16:14 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches

On Sun, Nov 8, 2015 at 4:36 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This is patch #2.  It adds support for the new modulus instructions that are
> being added in ISA 3.0 (power9):
>
> I have built this patch (along with patches #3 and #4) with a bootstrap build
> on a power8 little endian system.  There were no regressions in the test
> suite.  Is this patch ok to install in the trunk once patch #1 has been
> installed.
>
> [gcc]
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_rtx_costs): Update costs for
>         modulus instructions if we have hardware support.
>
>         * config/rs6000/rs6000.md (mod<mode>3): Add support for ISA 3.0
>         modulus instructions.
>         (umod<mode>3): Likewise.
>         (divmod peephole): Likewise.
>         (udivmod peephole): Likewise.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * lib/target-supports.exp (check_p9vector_hw_available): Add
>         checks for power9 availability.
>         (check_effective_target_powerpc_p9vector_ok): Likewise.
>         (check_vect_support_and_set_flags): Likewise.
>
>         * gcc.target/powerpc/mod-1.c: New test.
>         * gcc.target/powerpc/mod-2.c: Likewise.

This is okay, but let's wait for revised #3 since you tested 2, 3, 4 together.

Thanks, David

* Re: [PATCH], Add power9 support to GCC, patch #4
  2015-11-09  0:39 ` [PATCH], Add power9 support to GCC, patch #4 Michael Meissner
@ 2015-11-09 16:29   ` Segher Boessenkool
  2015-11-09 17:27     ` Michael Meissner
  2015-11-09 18:03   ` David Edelsohn
  1 sibling, 1 reply; 47+ messages in thread
From: Segher Boessenkool @ 2015-11-09 16:29 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Sun, Nov 08, 2015 at 07:39:14PM -0500, Michael Meissner wrote:
> +;; Pretend we have a memory form of extswsli until register allocation is done
> +;; so that we use LWZ to load the value from memory, instead of LWA.

We generate sign_extend loads for many cases where zero_extend would be
preferable.  We should deal with that generically, and then we can lose
this hack.
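A C model showing why the quoted trick is safe. extswsli reads only the low 32 bits of its source: it sign-extends that word and then shifts left. As a result, it does not matter whether the word was loaded zero-extended (lwz) or sign-extended (lwa). All three functions are hypothetical models of the instructions' register results. This sketch does not say why LWZ is preferred over LWA.

```c
#include <assert.h>
#include <stdint.h>

/* extswsli RA,RS,SH: sign-extend the low word of RS to 64 bits, shift left by SH.  */
static uint64_t extswsli_model(uint64_t rs, unsigned sh)
{
  assert(sh < 64);   /* SH is a 6-bit field, 0..63 */
  /* (int32_t) of a value above INT32_MAX is implementation-defined in C;
     GCC and Clang wrap it in two's complement, which is what we want here.  */
  int64_t ext = (int64_t)(int32_t)(uint32_t)rs;
  return (uint64_t)ext << sh;  /* shift as unsigned to avoid signed overflow */
}

/* 64-bit register contents after loading word W with lwz (zero-extend)
   and with lwa (sign-extend).  */
static uint64_t lwz_model(uint32_t w) { return (uint64_t)w; }
static uint64_t lwa_model(uint32_t w) { return (uint64_t)(int64_t)(int32_t)w; }
```

For a word with the top bit set, `lwz_model` and `lwa_model` differ, but `extswsli_model` produces the same result from either one.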

> +(define_insn_and_split "*ashdi3_extswsli_dot"

...

> +  if (REGNO (cr) == CR0_REGNO)
> +    {
> +      emit_insn (gen_ashdi3_extswsli_dot2 (dest, src2, shift, cr));
> +      DONE;
> +    }

s/dot2/dot/

> +/* { dg-final { scan-assembler     "extswsli\\. " } } */
> +/* { dg-final { scan-assembler     "lwz "         } } */
> +/* { dg-final { scan-assembler-not "lwa "         } } */

"lwa" is a nasty string to search for ("always").  You can write this as
{\mlwa\M} for more sanity.

> +/* { dg-final { scan-assembler-not "sldi "        } } */
> +/* { dg-final { scan-assembler-not "sldi\\. "     } } */

Similarly {\msldi\M} catches both.
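scan-assembler patterns are Tcl regular expressions, in which `\m` matches only at the start of a word and `\M` only at the end. Letters, digits and underscore count as word characters. Here is a small C model of that rule. It shows why `{\mlwa\M}` does not fire on a substring such as "always", and why `{\msldi\M}` matches both `sldi ` and `sldi. `, since '.' is not a word character. The helper names are hypothetical. This models the matching rule only and is not how DejaGnu implements it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Word characters as in Tcl regexps: alphanumerics and underscore.  */
static bool is_word_char(char c)
{
  return isalnum((unsigned char)c) || c == '_';
}

/* True if WORD occurs in TEXT with no word character immediately before
   or after it, i.e. what {\mWORD\M} would accept.  */
static bool contains_whole_word(const char *text, const char *word)
{
  size_t len = strlen(word);
  assert(len > 0);
  for (const char *p = strstr(text, word); p != NULL; p = strstr(p + 1, word))
    {
      bool starts = (p == text) || !is_word_char(p[-1]);
      bool ends = !is_word_char(p[len]);   /* '\0' is not a word character */
      if (starts && ends)
        return true;
    }
  return false;
}
```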


Segher

* Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)
  2015-11-09  0:42 ` [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion) Michael Meissner
@ 2015-11-09 17:16   ` Segher Boessenkool
  2015-11-09 17:34     ` Michael Meissner
  2015-11-09 18:57   ` David Edelsohn
  2015-11-14 22:58   ` Segher Boessenkool
  2 siblings, 1 reply; 47+ messages in thread
From: Segher Boessenkool @ 2015-11-09 17:16 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Sun, Nov 08, 2015 at 07:42:04PM -0500, Michael Meissner wrote:
> -  /* Power8 currently will only do the fusion if the top 11 bits of the addis
> -     value are all 1's or 0's.  */
>    value = INTVAL (int_const);
>    if ((value & (HOST_WIDE_INT)0xffff) != 0)

Space after cast, like  (HOST_WIDE_INT) 0xffff  .

> +  /* Power8 currently will only do the fusion if the top 11 bits of the addis
> +     value are all 1's or 0's.  Ignore this restriction if we are testing
> +     advanced fusion.  */
> +  if (TARGET_P9_FUSION)
> +    return 1;

This comment seems out of date?
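A sketch, in plain C, of the check the quoted comment describes. The value must be a pure addis constant: the low 16 bits are zero and the high part is nonzero. For power8 fusion, the top 11 bits of the 16-bit addis immediate must also be all 0s or all 1s, which is the same as the immediate lying in [-32, 31]. With power9 fusion that restriction is dropped. The function name and the assertion that the value fits a single addis are assumptions of this sketch. The real predicate in predicates.md may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the addis-value test for GPR load fusion.  */
static bool addis_value_fusable(int64_t value, bool p9_fusion)
{
  if ((value & 0xffff) != 0)   /* low half must be zero for a pure addis */
    return false;
  int64_t imm = value / 65536;   /* exact, since the low 16 bits are zero */
  assert(imm >= -32768 && imm <= 32767);   /* assumed: fits one addis */
  if (imm == 0)
    return false;
  if (p9_fusion)
    return true;   /* power9 fuses any addis immediate */
  /* Top 11 bits of the 16-bit immediate all equal <=> imm in [-32, 31].  */
  return imm >= -32 && imm <= 31;
}
```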

>  ;; Match a GPR load (lbz, lhz, lwz, ld) that uses a combined address in the
>  ;; memory field with both the addis and the memory offset.  Sign extension
>  ;; is not handled here, since lha and lwa are not fused.
> -(define_predicate "fusion_gpr_mem_combo"
> -  (match_code "mem,zero_extend")
> +;; With extended fusion, also match a FPR load (lfd, lfs) and float_extend

And here?

> --- gcc/config/rs6000/rs6000.c	(revision 229975)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -376,8 +376,18 @@ struct rs6000_reg_addr {
>    enum insn_code reload_fpr_gpr;	/* INSN to move from FPR to GPR.  */
>    enum insn_code reload_gpr_vsx;	/* INSN to move from GPR to VSX.  */
>    enum insn_code reload_vsx_gpr;	/* INSN to move from VSX to GPR.  */
> +  enum insn_code fusion_gpr_ld;		/* INSN for fusing gpr ADDIS/loads.  */
> +					/* INSNs for fusing addi with loads
> +					   or stores for each reg. class.  */					   
> +  enum insn_code fusion_addi_ld[(int)N_RELOAD_REG];
> +  enum insn_code fusion_addi_st[(int)N_RELOAD_REG];
> +					/* INSNs for fusing addis with loads
> +					   or stores for each reg. class.  */					   

Trailing tabs.

> +/* Return true if the peephole2 can combine a load/store involving a
> +   combination of an addis instruction and the memory operation.  This was
> +   added to the ISA 3.0 (power9) hardware.  */
> +
> +bool
> +fusion_p9_p (rtx addis_reg,		/* register set via addis.  */
> +	     rtx addis_value,		/* addis value.  */
> +	     rtx dest,			/* destination (memory or register). */
> +	     rtx src)			/* source (register or memory).  */

The function header comment should explain the params, after which you
can use the normal style for the function declaration itself.

> +(define_insn "*toc_fusionload_<mode>"
> +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
> +	(match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
> +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
> +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
> +   (clobber (match_scratch:DI 3 "=X,&b"))]
> +  "TARGET_TOC_FUSION_INT"

Do you need that "??r" alternative?  Same for the next define_insn.

Big patch, most looks good :-)


Segher

* Re: [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros)
  2015-11-09 15:59   ` Segher Boessenkool
@ 2015-11-09 17:18     ` Michael Meissner
  2015-11-09 19:33       ` Segher Boessenkool
  0 siblings, 1 reply; 47+ messages in thread
From: Michael Meissner @ 2015-11-09 17:18 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Mon, Nov 09, 2015 at 09:59:43AM -0600, Segher Boessenkool wrote:
> On Sun, Nov 08, 2015 at 07:37:53PM -0500, Michael Meissner wrote:
> > This patch adds support for scalar count trailing zeros instruction that is
> > being added to ISA 3.0 (power9).
> 
> I bet you should change CTZ_DEFINED_VALUE_AT_ZERO as well.
> 
> > +(define_insn "ctz<mode>2_hw"
> > +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> > +	(ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))]
> > +  "TARGET_CTZ"
> > +  "cnttz<wd> %0,%1"
> > +  [(set_attr "type" "cntlz")])
> 
> We should probably rename this attr value now.  "cntz" maybe?  Could be
> later of course.

I don't see a need to add another type attribute for count trailing zeros
unless count leading zeros has a different timing than count trailing zeros.
The cntlz attribute was added because in Power7 the CNTLZ instruction became a
2 cycle instruction, and we wanted to model this in power7.md (and hence cntlz
was split from the simple integer attribute).

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

* Re: [PATCH], Add power9 support to GCC, patch #4
  2015-11-09 16:29   ` Segher Boessenkool
@ 2015-11-09 17:27     ` Michael Meissner
  2015-11-09 19:48       ` Segher Boessenkool
  0 siblings, 1 reply; 47+ messages in thread
From: Michael Meissner @ 2015-11-09 17:27 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Mon, Nov 09, 2015 at 10:29:10AM -0600, Segher Boessenkool wrote:
> On Sun, Nov 08, 2015 at 07:39:14PM -0500, Michael Meissner wrote:
> > +;; Pretend we have a memory form of extswsli until register allocation is done
> > +;; so that we use LWZ to load the value from memory, instead of LWA.
> 
> We generate sign_extend loads for many cases where zero_extend would be
> preferable.  We should deal with that generically, and then we can lose
> this hack.

Well it would be nice in theory.  But since we don't have that generic pass, I
need to use the combiner to generate the instruction.

> > +(define_insn_and_split "*ashdi3_extswsli_dot"
> 
> ...
> 
> > +  if (REGNO (cr) == CR0_REGNO)
> > +    {
> > +      emit_insn (gen_ashdi3_extswsli_dot2 (dest, src2, shift, cr));
> > +      DONE;
> > +    }
> 
> s/dot2/dot/

No, it will recurse endlessly until the stack overflows if you use dot
(since it will call itself, generating the same pattern over and over again).

> > +/* { dg-final { scan-assembler     "extswsli\\. " } } */
> > +/* { dg-final { scan-assembler     "lwz "         } } */
> > +/* { dg-final { scan-assembler-not "lwa "         } } */
> 
> "lwa" is a nasty string to search for ("always").  You can write this as
> {\mlwa\M} for more sanity.
> 
> > +/* { dg-final { scan-assembler-not "sldi "        } } */
> > +/* { dg-final { scan-assembler-not "sldi\\. "     } } */
> 
> Similarly {\msldi\M} catches both.

Thanks.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

* Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)
  2015-11-09 17:16   ` Segher Boessenkool
@ 2015-11-09 17:34     ` Michael Meissner
  2015-11-09 19:57       ` Segher Boessenkool
  0 siblings, 1 reply; 47+ messages in thread
From: Michael Meissner @ 2015-11-09 17:34 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Mon, Nov 09, 2015 at 11:16:27AM -0600, Segher Boessenkool wrote:
> On Sun, Nov 08, 2015 at 07:42:04PM -0500, Michael Meissner wrote:
> > -  /* Power8 currently will only do the fusion if the top 11 bits of the addis
> > -     value are all 1's or 0's.  */
> >    value = INTVAL (int_const);
> >    if ((value & (HOST_WIDE_INT)0xffff) != 0)
> 
> Space after cast, like  (HOST_WIDE_INT) 0xffff  .

Thanks.

> > +  /* Power8 currently will only do the fusion if the top 11 bits of the addis
> > +     value are all 1's or 0's.  Ignore this restriction if we are testing
> > +     advanced fusion.  */
> > +  if (TARGET_P9_FUSION)
> > +    return 1;
> 
> This comment seems out of date?

Yeah, when I first coded it, while the fusion semantics were still being nailed
down, I couldn't reference power9 in the branch that was kept on the FSF
servers, so I just called it advanced fusion.  I evidently missed a few places
that needed the name changed when doing the merge.

> >  ;; Match a GPR load (lbz, lhz, lwz, ld) that uses a combined address in the
> >  ;; memory field with both the addis and the memory offset.  Sign extension
> >  ;; is not handled here, since lha and lwa are not fused.
> > -(define_predicate "fusion_gpr_mem_combo"
> > -  (match_code "mem,zero_extend")
> > +;; With extended fusion, also match a FPR load (lfd, lfs) and float_extend
> 
> And here?

Yes.

> > --- gcc/config/rs6000/rs6000.c	(revision 229975)
> > +++ gcc/config/rs6000/rs6000.c	(working copy)
> > @@ -376,8 +376,18 @@ struct rs6000_reg_addr {
> >    enum insn_code reload_fpr_gpr;	/* INSN to move from FPR to GPR.  */
> >    enum insn_code reload_gpr_vsx;	/* INSN to move from GPR to VSX.  */
> >    enum insn_code reload_vsx_gpr;	/* INSN to move from VSX to GPR.  */
> > +  enum insn_code fusion_gpr_ld;		/* INSN for fusing gpr ADDIS/loads.  */
> > +					/* INSNs for fusing addi with loads
> > +					   or stores for each reg. class.  */					   
> > +  enum insn_code fusion_addi_ld[(int)N_RELOAD_REG];
> > +  enum insn_code fusion_addi_st[(int)N_RELOAD_REG];
> > +					/* INSNs for fusing addis with loads
> > +					   or stores for each reg. class.  */					   
> 
> Trailing tabs.

Ok.

> > +/* Return true if the peephole2 can combine a load/store involving a
> > +   combination of an addis instruction and the memory operation.  This was
> > +   added to the ISA 3.0 (power9) hardware.  */
> > +
> > +bool
> > +fusion_p9_p (rtx addis_reg,		/* register set via addis.  */
> > +	     rtx addis_value,		/* addis value.  */
> > +	     rtx dest,			/* destination (memory or register). */
> > +	     rtx src)			/* source (register or memory).  */
> 
> The function header comment should explain the params, after which you
> can use the normal style for the function declaration itself.

Ok.

> > +(define_insn "*toc_fusionload_<mode>"
> > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
> > +	(match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
> > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
> > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
> > +   (clobber (match_scratch:DI 3 "=X,&b"))]
> > +  "TARGET_TOC_FUSION_INT"
> 
> Do you need that "??r" alternative?  Same for the next define_insn.

Yes, unfortunately.  The ??r alternative catches the case where r0 is chosen.
R0 is not a base register, so it can't be used for power8 gpr fusion (where
the register being loaded is also the target of the ADDIS instruction), but it
can be used for power9 fusion (where the ADDIS must still be adjacent, but its
target no longer has to be the register being loaded).

> Big patch, most looks good :-)

Thanks.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

* Re: [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros)
  2015-11-09  0:38 ` [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros) Michael Meissner
  2015-11-09 15:59   ` Segher Boessenkool
@ 2015-11-09 18:02   ` David Edelsohn
  1 sibling, 0 replies; 47+ messages in thread
From: David Edelsohn @ 2015-11-09 18:02 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches

On Sun, Nov 8, 2015 at 4:37 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch adds support for scalar count trailing zeros instruction that is
> being added to ISA 3.0 (power9).
>
> I have built this patch (along with patches #2 and #4) with a bootstrap build
> on a power8 little endian system.  There were no regressions in the test
> suite.  Is this patch ok to install in the trunk once patch #1 has been
> installed.
>
> [gcc]
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_rtx_costs): Update costs for
>         count trailing zero instruction if we have hardware support.
>
>         * config/rs6000/rs6000.h (TARGET_CTZ): Add support for count
>         trailing zero instruction in ISA 3.0.
>         * config/rs6000/rs6000.md (ctz<mode>2): Likewise.
>         (ctz<mode>2_hw): Likewise.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/ctz-1.c: Add test for count trailing zero
>         instruction support.
>         * gcc.target/powerpc/ctz-2.c: Likewise.

This is okay.  We can address the attribute at a later time if necessary.

Please re-check CTZ_DEFINED_VALUE_AT_ZERO.

Thanks, David

* Re: [PATCH], Add power9 support to GCC, patch #4
  2015-11-09  0:39 ` [PATCH], Add power9 support to GCC, patch #4 Michael Meissner
  2015-11-09 16:29   ` Segher Boessenkool
@ 2015-11-09 18:03   ` David Edelsohn
  1 sibling, 0 replies; 47+ messages in thread
From: David Edelsohn @ 2015-11-09 18:03 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches

On Sun, Nov 8, 2015 at 4:39 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch adds support for the EXTSWSLI instruction that is being added to
> PowerPC ISA 3.0 (power9).
>
> I have built this patch (along with patches #2 and #3) with a bootstrap build
> on a power8 little endian system.  There were no regressions in the test
> suite.  Is this patch ok to install in the trunk once patch #1 has been
> installed.
>
> [gcc]
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * config/rs6000/predicates.md (u6bit_cint_operand): New
>         predicate, recognize 0..63.
>
>         * config/rs6000/rs6000.c (rs6000_rtx_costs): Adjust the costs if
>         the EXTSWSLI instruction is generated.
>
>         * config/rs6000/rs6000.h (TARGET_EXTSWSLI): Add support for ISA
>         3.0 EXTSWSLI instruction.
>         * config/rs6000/rs6000.md (ashdi3_extswsli): Likewise.
>         (ashdi3_extswsli_dot): Likewise.
>         (ashdi3_extswsli_dot2): Likewise.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/extswsli-1.c: New file to test EXTSWSLI
>         instruction generation.
>         * gcc.target/powerpc/extswsli-2.c: Likewise.
>         * gcc.target/powerpc/extswsli-3.c: Likewise.

Okay.

Thanks, David

* Re: [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions)
  2015-11-09 15:48   ` Segher Boessenkool
@ 2015-11-09 18:07     ` Michael Meissner
  0 siblings, 0 replies; 47+ messages in thread
From: Michael Meissner @ 2015-11-09 18:07 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Mon, Nov 09, 2015 at 09:48:50AM -0600, Segher Boessenkool wrote:
> Hi,
> 
> On Sun, Nov 08, 2015 at 07:36:16PM -0500, Michael Meissner wrote:
> > [gcc/testsuite]
> > 	* lib/target-supports.exp (check_p9vector_hw_available): Add
> > 	checks for power9 availability.
> > 	(check_effective_target_powerpc_p9vector_ok): Likewise.
> 
> It's probably better not to use this for modulo; it is confusing and if
> you'll later need to untangle it it is much more work.
> 
> > +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> 
> Lose this line?  If Darwin cannot support modulo, the next line will
> catch that.
> 
> > +/* { dg-require-effective-target powerpc_p9vector_ok } */
> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
> > +/* { dg-options "-mcpu=power9 -O3" } */
> 
> Is -O3 needed?  Why won't -O2 work?

Just habit.

> > +proc check_p9vector_hw_available { } {
> > +    return [check_cached_effective_target p9vector_hw_available {
> > +	# Some simulators are known to not support VSX/power8 instructions.
> > +	# For now, disable on Darwin
> > +	if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || [istarget *-*-darwin*]} {
> 
> Long line.

Cut and paste from other tests.

> > Index: gcc/config/rs6000/rs6000.md
> > ===================================================================
> > --- gcc/config/rs6000/rs6000.md	(revision 229972)
> > +++ gcc/config/rs6000/rs6000.md	(working copy)
> > @@ -2885,9 +2885,9 @@ (define_insn_and_split "*div<mode>3_sra_
> >     (set_attr "cell_micro" "not")])
> >  
> >  (define_expand "mod<mode>3"
> > -  [(use (match_operand:GPR 0 "gpc_reg_operand" ""))
> > -   (use (match_operand:GPR 1 "gpc_reg_operand" ""))
> > -   (use (match_operand:GPR 2 "reg_or_cint_operand" ""))]
> > +  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
> > +	(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
> > +		 (match_operand:GPR 2 "reg_or_cint_operand" "")))]
> 
> You could delete the empty constraint strings while you're at it.
> 
> > +;; On machines with modulo support, do a combined div/mod the old fashioned
> > +;; method, since the multiply/subtract is faster than doing the mod instruction
> > +;; after a divide.
> 
> You can instead have a "divmod" insn that is split to either of div, mod,
> or div+mul+sub depending on which of the outputs is unused.  Peepholes
> do not get all cases.

Yes, though as I recall, I couldn't get it to do what I wanted, and moved on to
other targets.

> This can be a later improvement of course.

Yep.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

* Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)
  2015-11-09  0:42 ` [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion) Michael Meissner
  2015-11-09 17:16   ` Segher Boessenkool
@ 2015-11-09 18:57   ` David Edelsohn
  2015-11-14 22:58   ` Segher Boessenkool
  2 siblings, 0 replies; 47+ messages in thread
From: David Edelsohn @ 2015-11-09 18:57 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches

On Sun, Nov 8, 2015 at 4:42 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch adds support for new fusion forms in ISA 3.0 (power9).  In
> particular, ISA 3.0 can fuse GPR loads of R0, FPR loads, GPR stores, FPR
> stores, and some constant generation that ISA 2.07 (power8) could not
> generate.
>
> I have built this patch with a bootstrap build on a power8 little endian
> system.  There were no regressions in the test suite.  Is this patch ok to
> install in the trunk once patch #1 has been installed.
>
> [gcc]
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * config/rs6000/constraints.md (wF constraint): New constraints
>         for power9/toc fusion.
>         (wG constraint): Likewise.
>
>         * config/rs6000/predicates.md (upper16_cint_operand): New
>         predicate for power9 and toc fusion.
>         (fpr_reg_operand): Likewise.
>         (toc_fusion_or_p9_reg_operand): Likewise.
>         (toc_fusion_mem_raw): Likewise.
>         (toc_fusion_mem_wrapped): Likewise.
>         (fusion_gpr_addis): If power9 fusion, allow fusion for a larger
>         address range.
>         (fusion_gpr_mem_combo): Delete, use fusion_addis_mem_combo_load
>         instead.
>         (fusion_addis_mem_combo_load): Add support for power9 fusion of
>         floating point loads, floating point stores, and gpr stores.
>         (fusion_addis_mem_combo_store): Likewise.
>         (fusion_offsettable_mem_operand): Likewise.
>
>         * config/rs6000/rs6000-protos.h (emit_fusion_addis): Add
>         declarations.
>         (emit_fusion_load_store): Likewise.
>         (fusion_p9_p): Likewise.
>         (expand_fusion_p9_load): Likewise.
>         (expand_fusion_p9_store): Likewise.
>         (emit_fusion_p9_load): Likewise.
>         (emit_fusion_p9_store): Likewise.
>         (fusion_wrap_memory_address): Likewise.
>
>         * config/rs6000/rs6000.c (struct rs6000_reg_addr): Add new
>         elements for power9 fusion.
>         (rs6000_debug_print_mode): Rework debug information to print more
>         information about fusion.
>         (rs6000_init_hard_regno_mode_ok): Setup for power9 fusion
>         support.
>         (rs6000_legitimate_address_p): Recognize toc fusion as a valid
>         offsettable memory address.
>         (emit_fusion_gpr_load): Move most of the code from
>         emit_fusion_gpr_load into emit_fusion-addis that handles both
>         power8 and power9 fusion.
>         (emit_fusion_addis): Likewise.
>         (emit_fusion_load_store): Likewise.
>         (fusion_wrap_memory_address): Add support for TOC fusion.
>         (fusion_split_address): Likewise.
>         (fusion_p9_p): Add support for power9 fusion.
>         (expand_fusion_p9_load): Likewise.
>         (expand_fusion_p9_store): Likewise.
>         (emit_fusion_p9_load): Likewise.
>         (emit_fusion_p9_store): Likewise.
>
>         * config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): New macros for
>         power9 fusion support.
>         (TARGET_TOC_FUSION_FP): Likewise.
>
>         * config/rs6000/rs6000.md (UNSPEC_FUSION_P9): New power9/toc
>         fusion unspecs.
>         (UNSPEC_FUSION_ADDIS): Likewise.
>         (QHSI mode iterator): New iterator for power9 fusion.
>         (GPR_FUSION): Likewise.
>         (FPR_FUSION): Likewise.
>         (power9 fusion splitter): New power9/toc fusion support.
>         (toc_fusionload_<mode>): Likewise.
>         (toc_fusionload_di): Likewise.
>         (fusion_gpr_load_<mode>): Update predicate function.
>         (power9 fusion peephole2s): New power9/toc fusion support.
>         (fusion_gpr_<P:mode>_<GPR_FUSION:mode>_load): Likewise.
>         (fusion_gpr_<P:mode>_<GPR_FUSION:mode>_store): Likewise.
>         (fusion_fpr_<P:mode>_<FPR_FUSION:mode>_load): Likewise.
>         (fusion_fpr_<P:mode>_<FPR_FUSION:mode>_store): Likewise.
>         (fusion_p9_<mode>_constant): Likewise.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/fusion.c (fusion_vector): Move to fusion2.c
>         and allow the test on PowerPC LE.
>         * gcc.target/powerpc/fusion2.c (fusion_vector): Likewise.
>
>         * gcc.target/powerpc/fusion3.c: New file, test power9 fusion.

Okay, with the changes that you and Segher discussed.

Thanks, David

* Re: [PATCH], Add power9 support to GCC, patch #6 (IEEE 128-bit hardware support)
  2015-11-09  0:45 ` [PATCH], Add power9 support to GCC, patch #6 (IEEE 128-bit hardware support) Michael Meissner
@ 2015-11-09 19:29   ` Segher Boessenkool
  2015-11-10  0:41   ` Joseph Myers
  2015-11-12 20:47   ` David Edelsohn
  2 siblings, 0 replies; 47+ messages in thread
From: Segher Boessenkool @ 2015-11-09 19:29 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Sun, Nov 08, 2015 at 07:44:52PM -0500, Michael Meissner wrote:
> +/* Split a conversion from __float128 to an integer type into separate insns.
> +   OPERANDS points to the destination, source, and V2DI temporary
> +   register. CODE is either FIX or UNSIGNED_FIX.  */

dot space space

> +;; ISA 2.08 IEEE 128-bit floating point support.

3.0

> +(define_code_attr fix_fixuns	 [(fix   "fix")   (unsigned_fix   "fixuns")])
> +(define_code_attr float_floatuns [(float "float") (unsigned_float "floatuns")])

You could instead do an "uns" attribute so you would write fix<uns> etc.

> +;; 0 says do sign-extension, 1 says zero-extension
> +(define_insn "*ieee128_mtvsrw"
> +  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v,v,v,v")
> +	(unspec:V2DI [(match_operand:SI 1 "nonimmediate_operand" "r,Z,r,Z")
> +		      (match_operand:SI 2 "const_0_to_1_operand" "O,O,n,n")]
> +		     UNSPEC_IEEE128_MOVE))]
> +  "TARGET_FLOAT128_HW"
> +  "@
> +   mtvsrwa %x0,%1
> +   lxsiwax %x0,%y1
> +   mtvsrwz %x0,%1
> +   lxsiwzx %x0,%y1"
> +  [(set_attr "type" "mffgpr,fpload,mffgpr,fpload")])

Tricky, is there no cleaner way to do this?
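For reference, here is what the operand-2 selector in `*ieee128_mtvsrw` decides. A value of 0 picks sign extension of the 32-bit source (mtvsrwa / lxsiwax) and 1 picks zero extension (mtvsrwz / lxsiwzx). A C model of the resulting 64-bit value follows. The function name is hypothetical, and the model ignores which part of the vector register receives it.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit value formed from word W: ZERO_EXTEND == 0 sign-extends (the
   mtvsrwa/lxsiwax alternatives), ZERO_EXTEND == 1 zero-extends
   (mtvsrwz/lxsiwzx), mirroring const_0_to_1_operand.  */
static uint64_t word_to_doubleword(uint32_t w, unsigned zero_extend)
{
  assert(zero_extend <= 1);
  if (zero_extend)
    return (uint64_t)w;
  return (uint64_t)(int64_t)(int32_t)w;   /* two's-complement wrap, as on GCC */
}
```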


Segher

* Re: [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros)
  2015-11-09 17:18     ` Michael Meissner
@ 2015-11-09 19:33       ` Segher Boessenkool
  0 siblings, 0 replies; 47+ messages in thread
From: Segher Boessenkool @ 2015-11-09 19:33 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Mon, Nov 09, 2015 at 12:17:49PM -0500, Michael Meissner wrote:
> > > +  "TARGET_CTZ"
> > > +  "cnttz<wd> %0,%1"
> > > +  [(set_attr "type" "cntlz")])
> > 
> > We should probably rename this attr value now.  "cntz" maybe?  Could be
> > later of course.
> 
> I don't see a need to add another type attribute for count trailing zeros
> unless count leading zeros has a different timing than count trailing zeros.

I didn't suggest adding a "cnttz"; I suggested renaming "cntlz".  Maybe
"ctz" is better, that's what the target flag is as well.

Cheers,


Segher


* Re: [PATCH], Add power9 support to GCC, patch #4
  2015-11-09 17:27     ` Michael Meissner
@ 2015-11-09 19:48       ` Segher Boessenkool
  0 siblings, 0 replies; 47+ messages in thread
From: Segher Boessenkool @ 2015-11-09 19:48 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Mon, Nov 09, 2015 at 12:27:34PM -0500, Michael Meissner wrote:
> On Mon, Nov 09, 2015 at 10:29:10AM -0600, Segher Boessenkool wrote:
> > On Sun, Nov 08, 2015 at 07:39:14PM -0500, Michael Meissner wrote:
> > > +;; Pretend we have a memory form of extswsli until register allocation is done
> > > +;; so that we use LWZ to load the value from memory, instead of LWA.
> > 
> > We generate sign_extend loads for many cases where zero_extend would be
> > preferable.  We should deal with that generically, and then we can lose
> > this hack.
> 
> Well it would be nice in theory.  But since we don't have that generic pass, I
> need to use the combiner to generate the instruction.

Yes, it's for a todo list.  And it doesn't have to be a separate pass,
just a bit of tuning here or there.

This is a lot of complex work to treat a special case of a more general
problem.

> > > +(define_insn_and_split "*ashdi3_extswsli_dot"
> > 
> > ...
> > 
> > > +  if (REGNO (cr) == CR0_REGNO)
> > > +    {
> > > +      emit_insn (gen_ashdi3_extswsli_dot2 (dest, src2, shift, cr));
> > > +      DONE;
> > > +    }
> > 
> > s/dot2/dot/
> 
> No, it will endless recurse until there is a stack overflow if you use dot
> (since it will call itself, generating the same pattern over and over again).

Generating dot2 from dot does not make much sense, and dot2 calls itself
as well.  Are you sure?  Something is off here.

Cheers,


Segher


* Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)
  2015-11-09 17:34     ` Michael Meissner
@ 2015-11-09 19:57       ` Segher Boessenkool
  2015-11-09 21:11         ` David Edelsohn
  0 siblings, 1 reply; 47+ messages in thread
From: Segher Boessenkool @ 2015-11-09 19:57 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
> > > +(define_insn "*toc_fusionload_<mode>"
> > > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
> > > +	(match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
> > > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
> > > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
> > > +   (clobber (match_scratch:DI 3 "=X,&b"))]
> > > +  "TARGET_TOC_FUSION_INT"
> > 
> > Do you need that "??r" alternative?  Same for the next define_insn.
> 
> Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
> base register, and it can't be used for power8 gpr fusion (where you use the
> value being loaded for the ADDIS instruction), but it can be used for power9
> fusion (where the ADDIS must be adjacent, but it no longer has to be the
> register being loaded).

If you have only "b", r0 will not be chosen.  Does that help?  Or are
you generating this pattern from somewhere else where you put in r0?


Segher


* Re: [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements)
  2015-11-09  0:49 ` [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements) Michael Meissner
@ 2015-11-09 20:00   ` Segher Boessenkool
  2015-11-09 21:06   ` Michael Meissner
  2015-11-12 20:43   ` David Edelsohn
  2 siblings, 0 replies; 47+ messages in thread
From: Segher Boessenkool @ 2015-11-09 20:00 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Sun, Nov 08, 2015 at 07:48:56PM -0500, Michael Meissner wrote:
> This patch adds support for the new direct move instructions (MFVSRLD and
> MTVSRDD) that simplify moving 128-bit data between GPRs and vector registers.

You forgot to attach the patch :-)


Segher


* Re: [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements)
  2015-11-09  0:49 ` [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements) Michael Meissner
  2015-11-09 20:00   ` Segher Boessenkool
@ 2015-11-09 21:06   ` Michael Meissner
  2015-11-12 20:43   ` David Edelsohn
  2 siblings, 0 replies; 47+ messages in thread
From: Michael Meissner @ 2015-11-09 21:06 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 1851 bytes --]

I evidently forgot to attach the patch.

[gcc]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/constraints.md (we constraint): New constraint for
	64-bit power9 vector support.
	(wL constraint): New constraint for the element in a vector that
	can be addressed by the MFVSRLD instruction.

	* config/rs6000/rs6000.c (rs6000_debug_reg_global): Add ISA 3.0
	debugging.
	(rs6000_init_hard_regno_mode_ok): If ISA 3.0 and 64-bit, enable we
	constraint.  Disable the VSX<->GPR direct move helpers if we have
	the MFVSRLD and MTVSRDD instructions.
	(rs6000_secondary_reload_simple_move): Add support for doing
	vector direct moves directly without additional scratch registers
	if we have ISA 3.0 instructions.
	(rs6000_secondary_reload_direct_move): Update comments.
	(rs6000_output_move_128bit): Add support for ISA 3.0 vector
	instructions.

	* config/rs6000/vsx.md (vsx_mov<mode>): Add support for ISA 3.0
	direct move instructions.
	(vsx_movti_64bit): Likewise.
	(vsx_extract_<mode>): Likewise.

	* config/rs6000/rs6000.h (VECTOR_ELEMENT_MFVSRLD_64BIT): New
	macros for ISA 3.0 direct move instructions.
	(TARGET_DIRECT_MOVE_128): Likewise.

	* config/rs6000/rs6000.md (128-bit GPR splitters): Don't split a
	128-bit move that is a direct move between GPR and vector
	registers using ISA 3.0 direct move instructions.

	* doc/md.texi (RS/6000 constraints): Document we, wF, wG, wL
	constraints.  Update wa documentation to say not to use %x<n> on
	instructions that only take Altivec registers.

[gcc/testsuite]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/direct-move-vector.c: New test for 128-bit
	vector direct move instructions.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power9.official-07b --]
[-- Type: text/plain, Size: 16104 bytes --]

Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 229976)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -64,7 +64,8 @@ (define_register_constraint "wa" "rs6000
 (define_register_constraint "wd" "rs6000_constraints[RS6000_CONSTRAINT_wd]"
   "VSX vector register to hold vector double data or NO_REGS.")
 
-;; we is not currently used
+(define_register_constraint "we" "rs6000_constraints[RS6000_CONSTRAINT_we]"
+  "VSX register if the -mpower9-vector -m64 options were used or NO_REGS.")
 
 (define_register_constraint "wf" "rs6000_constraints[RS6000_CONSTRAINT_wf]"
   "VSX vector register to hold vector float data or NO_REGS.")
@@ -147,6 +148,12 @@ (define_memory_constraint "wG"
   "Memory operand suitable for TOC fusion memory references"
   (match_operand 0 "toc_fusion_mem_wrapped"))
 
+(define_constraint "wL"
+  "Int constant that is the element number mfvsrld accesses in a vector."
+  (and (match_code "const_int")
+       (and (match_test "TARGET_DIRECT_MOVE_128")
+	    (match_test "(ival == VECTOR_ELEMENT_MFVSRLD_64BIT)"))))
+
 ;; Lq/stq validates the address for load/store quad
 (define_memory_constraint "wQ"
   "Memory operand suitable for the load/store quad instructions"
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 229977)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -2575,6 +2575,10 @@ rs6000_debug_reg_global (void)
   if (TARGET_VSX)
     fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit scalar element",
 	     (int)VECTOR_ELEMENT_SCALAR_64BIT);
+
+  if (TARGET_DIRECT_MOVE_128)
+    fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit mfvsrld element",
+	     (int)VECTOR_ELEMENT_MFVSRLD_64BIT);
 }
 
 \f
@@ -2986,6 +2990,10 @@ rs6000_init_hard_regno_mode_ok (bool glo
 	rs6000_constraints[RS6000_CONSTRAINT_wp] = VSX_REGS;	/* TFmode  */
     }
 
+  /* Support for new direct moves.  */
+  if (TARGET_DIRECT_MOVE_128)
+    rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS;
+
   /* Set up the reload helper and direct move functions.  */
   if (TARGET_VSX || TARGET_ALTIVEC)
     {
@@ -3034,7 +3042,7 @@ rs6000_init_hard_regno_mode_ok (bool glo
 	      reg_addr[TImode].reload_load   = CODE_FOR_reload_ti_di_load;
 	    }
 
-	  if (TARGET_DIRECT_MOVE)
+	  if (TARGET_DIRECT_MOVE && !TARGET_DIRECT_MOVE_128)
 	    {
 	      reg_addr[TImode].reload_gpr_vsx    = CODE_FOR_reload_gpr_from_vsxti;
 	      reg_addr[V1TImode].reload_gpr_vsx  = CODE_FOR_reload_gpr_from_vsxv1ti;
@@ -18081,6 +18089,11 @@ rs6000_secondary_reload_simple_move (enu
 	  || (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)))
     return true;
 
+  else if (TARGET_DIRECT_MOVE_128 && size == 16
+	   && ((to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
+	       || (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)))
+    return true;
+
   else if (TARGET_MFPGPR && TARGET_POWERPC64 && size == 8
 	   && ((to_type == GPR_REG_TYPE && from_type == FPR_REG_TYPE)
 	       || (to_type == FPR_REG_TYPE && from_type == GPR_REG_TYPE)))
@@ -18094,7 +18107,7 @@ rs6000_secondary_reload_simple_move (enu
   return false;
 }
 
-/* Power8 helper function for rs6000_secondary_reload, handle all of the
+/* Direct move helper function for rs6000_secondary_reload, handle all of the
    special direct moves that involve allocating an extra register, return the
    insn code of the helper function if there is such a function or
    CODE_FOR_nothing if not.  */
@@ -18116,8 +18129,8 @@ rs6000_secondary_reload_direct_move (enu
       if (size == 16)
 	{
 	  /* Handle moving 128-bit values from GPRs to VSX point registers on
-	     power8 when running in 64-bit mode using XXPERMDI to glue the two
-	     64-bit values back together.  */
+	     ISA 2.07 (power8, power9) when running in 64-bit mode using
+	     XXPERMDI to glue the two 64-bit values back together.  */
 	  if (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
 	    {
 	      cost = 3;			/* 2 mtvsrd's, 1 xxpermdi.  */
@@ -18125,7 +18138,7 @@ rs6000_secondary_reload_direct_move (enu
 	    }
 
 	  /* Handle moving 128-bit values from VSX point registers to GPRs on
-	     power8 when running in 64-bit mode using XXPERMDI to get access to the
+	     ISA 2.07 when running in 64-bit mode using XXPERMDI to get access to the
 	     bottom 64-bit value.  */
 	  else if (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)
 	    {
@@ -18153,7 +18166,7 @@ rs6000_secondary_reload_direct_move (enu
   if (TARGET_POWERPC64 && size == 16)
     {
       /* Handle moving 128-bit values from GPRs to VSX point registers on
-	 power8 when running in 64-bit mode using XXPERMDI to glue the two
+	 ISA 2.07 when running in 64-bit mode using XXPERMDI to glue the two
 	 64-bit values back together.  */
       if (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
 	{
@@ -18162,7 +18175,7 @@ rs6000_secondary_reload_direct_move (enu
 	}
 
       /* Handle moving 128-bit values from VSX point registers to GPRs on
-	 power8 when running in 64-bit mode using XXPERMDI to get access to the
+	 ISA 2.07 when running in 64-bit mode using XXPERMDI to get access to the
 	 bottom 64-bit value.  */
       else if (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)
 	{
@@ -18174,8 +18187,8 @@ rs6000_secondary_reload_direct_move (enu
   else if (!TARGET_POWERPC64 && size == 8)
     {
       /* Handle moving 64-bit values from GPRs to floating point registers on
-	 power8 when running in 32-bit mode using FMRGOW to glue the two 32-bit
-	 values back together.  Altivec register classes must be handled
+	 ISA 2.07 when running in 32-bit mode using FMRGOW to glue the two
+	 32-bit values back together.  Altivec register classes must be handled
 	 specially since a different instruction is used, and the secondary
 	 reload support requires a single instruction class in the scratch
 	 register constraint.  However, right now TFmode is not allowed in
@@ -18202,7 +18215,7 @@ rs6000_secondary_reload_direct_move (enu
 
 /* Return whether a move between two register classes can be done either
    directly (simple move) or via a pattern that uses a single extra temporary
-   (using power8's direct move in this case.  */
+   (using ISA 2.07's direct move in this case).  */
 
 static bool
 rs6000_secondary_reload_move (enum rs6000_reg_type to_type,
@@ -19241,6 +19254,11 @@ rs6000_output_move_128bit (rtx operands[
 	  if (src_gpr_p)
 	    return "#";
 
+	  if (TARGET_DIRECT_MOVE_128 && src_vsx_p)
+	    return (WORDS_BIG_ENDIAN
+		    ? "mfvsrd %0,%x1\n\tmfvsrld %L0,%x1"
+		    : "mfvsrd %L0,%x1\n\tmfvsrld %0,%x1");
+
 	  else if (TARGET_VSX && TARGET_DIRECT_MOVE && src_vsx_p)
 	    return "#";
 	}
@@ -19250,6 +19268,11 @@ rs6000_output_move_128bit (rtx operands[
 	  if (src_vsx_p)
 	    return "xxlor %x0,%x1,%x1";
 
+	  else if (TARGET_DIRECT_MOVE_128 && src_gpr_p)
+	    return (WORDS_BIG_ENDIAN
+		    ? "mtvsrdd %x0,%1,%L1"
+		    : "mtvsrdd %x0,%L1,%1");
+
 	  else if (TARGET_DIRECT_MOVE && src_gpr_p)
 	    return "#";
 	}
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 229970)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -760,31 +760,31 @@ (define_split
   "")
 
 (define_insn "*vsx_mov<mode>"
-  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?<VSa>,?<VSa>,wQ,?&r,??Y,??r,??r,<VSr>,?<VSa>,*r,v,wZ, v")
-	(match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,<VSa>,Z,<VSa>,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]
+  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?<VSa>,?<VSa>,r,we,wQ,?&r,??Y,??r,??r,<VSr>,?<VSa>,*r,v,wZ,v")
+	(match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,<VSa>,Z,<VSa>,we,b,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode) 
        || register_operand (operands[1], <MODE>mode))"
 {
   return rs6000_output_move_128bit (operands);
 }
-  [(set_attr "type" "vecstore,vecload,vecsimple,vecstore,vecload,vecsimple,load,store,store,load, *,vecsimple,vecsimple,*, *,vecstore,vecload")
-   (set_attr "length" "4,4,4,4,4,4,12,12,12,12,16,4,4,*,16,4,4")])
+  [(set_attr "type" "vecstore,vecload,vecsimple,vecstore,vecload,vecsimple,mffgpr,mftgpr,load,store,store,load, *,vecsimple,vecsimple,*, *,vecstore,vecload")
+   (set_attr "length" "4,4,4,4,4,4,8,4,12,12,12,12,16,4,4,*,16,4,4")])
 
 ;; Unlike other VSX moves, allow the GPRs even for reloading, since a normal
 ;; use of TImode is for unions.  However for plain data movement, slightly
 ;; favor the vector loads
 (define_insn "*vsx_movti_64bit"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v,v,wZ,wQ,&r,Y,r,r,?r")
-	(match_operand:TI 1 "input_operand" "wa,Z,wa,O,W,wZ,v,r,wQ,r,Y,r,n"))]
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,r,we,v,v,wZ,wQ,&r,Y,r,r,?r")
+	(match_operand:TI 1 "input_operand" "wa,Z,wa,O,we,b,W,wZ,v,r,wQ,r,Y,r,n"))]
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (TImode)
    && (register_operand (operands[0], TImode) 
        || register_operand (operands[1], TImode))"
 {
   return rs6000_output_move_128bit (operands);
 }
-  [(set_attr "type" "vecstore,vecload,vecsimple,vecsimple,vecsimple,vecstore,vecload,store,load,store,load,*,*")
-   (set_attr "length" "4,4,4,4,16,4,4,8,8,8,8,8,8")])
+  [(set_attr "type" "vecstore,vecload,vecsimple,vecsimple,mffgpr,mftgpr,vecsimple,vecstore,vecload,store,load,store,load,*,*")
+   (set_attr "length" "4,4,4,4,8,4,16,4,4,8,8,8,8,8,8")])
 
 (define_insn "*vsx_movti_32bit"
   [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,Q,Y,????r,????r,????r,r")
@@ -1909,11 +1909,11 @@ (define_expand "vsx_extract_<mode>"
 ;; Optimize cases were we can do a simple or direct move.
 ;; Or see if we can avoid doing the move at all
 (define_insn "*vsx_extract_<mode>_internal1"
-  [(set (match_operand:<VS_scalar> 0 "register_operand" "=d,<VS_64reg>,r")
+  [(set (match_operand:<VS_scalar> 0 "register_operand" "=d,<VS_64reg>,r,r")
 	(vec_select:<VS_scalar>
-	 (match_operand:VSX_D 1 "register_operand" "d,<VS_64reg>,<VS_64dm>")
+	 (match_operand:VSX_D 1 "register_operand" "d,<VS_64reg>,<VS_64dm>,<VS_64dm>")
 	 (parallel
-	  [(match_operand:QI 2 "vsx_scalar_64bit" "wD,wD,wD")])))]
+	  [(match_operand:QI 2 "vsx_scalar_64bit" "wD,wD,wD,wL")])))]
   "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
 {
   int op0_regno = REGNO (operands[0]);
@@ -1923,14 +1923,16 @@ (define_insn "*vsx_extract_<mode>_intern
     return "nop";
 
   if (INT_REGNO_P (op0_regno))
-    return "mfvsrd %0,%x1";
+    return ((INTVAL (operands[2]) == VECTOR_ELEMENT_MFVSRLD_64BIT)
+	    ? "mfvsrld %0,%x1"
+	    : "mfvsrd %0,%x1");
 
   if (FP_REGNO_P (op0_regno) && FP_REGNO_P (op1_regno))
     return "fmr %0,%1";
 
   return "xxlor %x0,%x1,%x1";
 }
-  [(set_attr "type" "fp,vecsimple,mftgpr")
+  [(set_attr "type" "fp,vecsimple,mftgpr,mftgpr")
    (set_attr "length" "4")])
 
 (define_insn "*vsx_extract_<mode>_internal2"
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 229976)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -516,6 +516,10 @@ extern int rs6000_vector_align[];
    with scalar instructions.  */
 #define VECTOR_ELEMENT_SCALAR_64BIT	((BYTES_BIG_ENDIAN) ? 0 : 1)
 
+/* Element number of the 64-bit value in a 128-bit vector that can be accessed
+   with the ISA 3.0 MFVSRLD instructions.  */
+#define VECTOR_ELEMENT_MFVSRLD_64BIT	((BYTES_BIG_ENDIAN) ? 1 : 0)
+
 /* Alignment options for fields in structures for sub-targets following
    AIX-like ABI.
    ALIGN_POWER word-aligns FP doubles (default AIX ABI).
@@ -571,6 +575,8 @@ extern int rs6000_vector_align[];
 #define TARGET_XSCVDPSPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_XSCVSPDPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_VADDUQM		(TARGET_P8_VECTOR && TARGET_POWERPC64)
+#define TARGET_DIRECT_MOVE_128	(TARGET_P9_VECTOR && TARGET_DIRECT_MOVE \
+				 && TARGET_POWERPC64)
 
 /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
    in power7, so conditionalize them on p8 features.  TImode syncs need quad
@@ -1517,6 +1523,7 @@ enum r6000_reg_class_enum {
   RS6000_CONSTRAINT_v,		/* Altivec registers */
   RS6000_CONSTRAINT_wa,		/* Any VSX register */
   RS6000_CONSTRAINT_wd,		/* VSX register for V2DF */
+  RS6000_CONSTRAINT_we,		/* VSX register if ISA 3.0 vector. */
   RS6000_CONSTRAINT_wf,		/* VSX register for V4SF */
   RS6000_CONSTRAINT_wg,		/* FPR register for -mmfpgpr */
   RS6000_CONSTRAINT_wh,		/* FPR register for direct moves.  */
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 229977)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -7521,7 +7521,10 @@ (define_split
 	(match_operand:FMOVE128_GPR 1 "input_operand" ""))]
   "reload_completed
    && (int_reg_operand (operands[0], <MODE>mode)
-       || int_reg_operand (operands[1], <MODE>mode))"
+       || int_reg_operand (operands[1], <MODE>mode))
+   && (!TARGET_DIRECT_MOVE_128
+       || (!vsx_register_operand (operands[0], <MODE>mode)
+           && !vsx_register_operand (operands[1], <MODE>mode)))"
   [(pc)]
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; })
 
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 229970)
+++ gcc/doc/md.texi	(working copy)
@@ -3121,9 +3121,28 @@ asm ("xvadddp %0,%1,%2" : "=wa" (v1) : "
 
 is not correct.
 
+If an instruction only takes Altivec registers, you do not want to use
+@code{%x<n>}.
+
+@smallexample
+asm ("xsaddqp %0,%1,%2" : "=v" (v1) : "v" (v2), "v" (v3));
+@end smallexample
+
+is correct because the @code{xsaddqp} instruction only takes Altivec
+registers, while:
+
+@smallexample
+asm ("xsaddqp %x0,%x1,%x2" : "=v" (v1) : "v" (v2), "v" (v3));
+@end smallexample
+
+is incorrect.
+
 @item wd
 VSX vector register to hold vector double data or NO_REGS.
 
+@item we
+VSX register if the -mpower9-vector -m64 options were used or NO_REGS.
+
 @item wf
 VSX vector register to hold vector float data or NO_REGS.
 
@@ -3187,6 +3206,16 @@ Floating point register if the LFIWZX in
 @item wD
 Int constant that is the element number of the 64-bit scalar in a vector.
 
+@item wF
+Memory operand suitable for power9 fusion load/stores.
+
+@item wG
+Memory operand suitable for TOC fusion memory references.
+
+@item wL
+Int constant that is the element number that the MFVSRLD instruction
+targets.
+
 @item wQ
 A memory address that will work with the @code{lq} and @code{stq}
 instructions.
Index: gcc/testsuite/gcc.target/powerpc/direct-move-vector.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-vector.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-vector.c	(revision 0)
@@ -0,0 +1,35 @@
+/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+/* Check code generation for direct move for long types.  */
+
+void
+test (vector double *p)
+{
+  vector double v1 = *p;
+  vector double v2;
+  vector double v3;
+  vector double v4;
+
+  /* Force memory -> FPR load.  */
+  __asm__ (" # reg %x0" : "+d" (v1));
+
+  /* force VSX -> GPR direct move.  */
+  v2 = v1;
+  __asm__ (" # reg %0" : "+r" (v2));
+
+  /* Force GPR -> Altivec direct move.  */
+  v3 = v2;
+  __asm__ (" # reg %x0" : "+v" (v3));
+  *p = v3;
+}
+
+/* { dg-final { scan-assembler "mfvsrd"  } } */
+/* { dg-final { scan-assembler "mfvsrld" } } */
+/* { dg-final { scan-assembler "mtvsrdd" } } */
+
+


* Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)
  2015-11-09 19:57       ` Segher Boessenkool
@ 2015-11-09 21:11         ` David Edelsohn
  2015-11-09 22:17           ` Michael Meissner
  0 siblings, 1 reply; 47+ messages in thread
From: David Edelsohn @ 2015-11-09 21:11 UTC (permalink / raw)
  To: Segher Boessenkool, Michael Meissner; +Cc: GCC Patches

On Mon, Nov 9, 2015 at 11:57 AM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
>> > > +(define_insn "*toc_fusionload_<mode>"
>> > > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
>> > > + (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
>> > > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
>> > > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
>> > > +   (clobber (match_scratch:DI 3 "=X,&b"))]
>> > > +  "TARGET_TOC_FUSION_INT"
>> >
>> > Do you need that "??r" alternative?  Same for the next define_insn.
>>
>> Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
>> base register, and it can't be used for power8 gpr fusion (where you use the
>> value being loaded for the ADDIS instruction), but it can be used for power9
>> fusion (where the ADDIS must be adjacent, but it no longer has to be the
>> register being loaded).
>
> If you have only "b", r0 will not be chosen.  Does that help?  Or are
> you generating this pattern from somewhere else where you put in r0?

Mike,

What happens if you leave out the "r" alternative?  Does other code
explicitly generate that pattern with r0?

Thanks, David


* Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)
  2015-11-09 21:11         ` David Edelsohn
@ 2015-11-09 22:17           ` Michael Meissner
  2015-11-09 22:33             ` David Edelsohn
  0 siblings, 1 reply; 47+ messages in thread
From: Michael Meissner @ 2015-11-09 22:17 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Segher Boessenkool, Michael Meissner, GCC Patches

On Mon, Nov 09, 2015 at 01:11:41PM -0800, David Edelsohn wrote:
> On Mon, Nov 9, 2015 at 11:57 AM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
> >> > > +(define_insn "*toc_fusionload_<mode>"
> >> > > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
> >> > > + (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
> >> > > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
> >> > > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
> >> > > +   (clobber (match_scratch:DI 3 "=X,&b"))]
> >> > > +  "TARGET_TOC_FUSION_INT"
> >> >
> >> > Do you need that "??r" alternative?  Same for the next define_insn.
> >>
> >> Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
> >> base register, and it can't be used for power8 gpr fusion (where you use the
> >> value being loaded for the ADDIS instruction), but it can be used for power9
> >> fusion (where the ADDIS must be adjacent, but it no longer has to be the
> >> register being loaded).
> >
> > If you have only "b", r0 will not be chosen.  Does that help?  Or are
> > you generating this pattern from somewhere else where you put in r0?
> 
> Mike,
> 
> What happens if you leave out the "r" alternative?  Does other code
> explicitly generate that pattern with r0?

Sometimes, one of the passes after reload (usually -fgcse-after-reload) decides
to redo the register allocation, and I would see a failure in building things
like Spec 2006.  I have tried not putting the "r" in there, or using
base_reg_operand instead of gpc_reg_operand, but I still got failures.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797


* Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)
  2015-11-09 22:17           ` Michael Meissner
@ 2015-11-09 22:33             ` David Edelsohn
  0 siblings, 0 replies; 47+ messages in thread
From: David Edelsohn @ 2015-11-09 22:33 UTC (permalink / raw)
  To: Michael Meissner, Segher Boessenkool, GCC Patches

On Mon, Nov 9, 2015 at 2:17 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> On Mon, Nov 09, 2015 at 01:11:41PM -0800, David Edelsohn wrote:
>> On Mon, Nov 9, 2015 at 11:57 AM, Segher Boessenkool
>> <segher@kernel.crashing.org> wrote:
>> > On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
>> >> > > +(define_insn "*toc_fusionload_<mode>"
>> >> > > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
>> >> > > + (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
>> >> > > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
>> >> > > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
>> >> > > +   (clobber (match_scratch:DI 3 "=X,&b"))]
>> >> > > +  "TARGET_TOC_FUSION_INT"
>> >> >
>> >> > Do you need that "??r" alternative?  Same for the next define_insn.
>> >>
>> >> Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
>> >> base register, and it can't be used for power8 gpr fusion (where you use the
>> >> value being loaded for the ADDIS instruction), but it can be used for power9
>> >> fusion (where the ADDIS must be adjacent, but it no longer has to be the
>> >> register being loaded).
>> >
>> > If you have only "b", r0 will not be chosen.  Does that help?  Or are
>> > you generating this pattern from somewhere else where you put in r0?
>>
>> Mike,
>>
>> What happens if you leave out the "r" alternative?  Does other code
>> explicitly generate that pattern with r0?
>
> Sometimes, one of the passes after reload (usually -fgcse-after-reload) decides
> to redo the register allocation, and I would see a failure in building things
> like Spec 2006.  I have tried not putting the "r" in there, or using
> base_reg_operand instead of gpc_reg_operand, but I still got failures.

This seems like a bug in those other passes that should be tracked down.

Thanks, David


* Re: [PATCH], Add power9 support to GCC, patches #2-5 committed
  2015-11-09  0:36 ` [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions) Michael Meissner
  2015-11-09 15:48   ` Segher Boessenkool
  2015-11-09 16:14   ` David Edelsohn
@ 2015-11-10  0:17   ` Michael Meissner
  2015-11-10  0:20     ` Michael Meissner
  2 siblings, 1 reply; 47+ messages in thread
From: Michael Meissner @ 2015-11-10  0:17 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 6032 bytes --]

David said I could commit patches 2-5 after fixing the points that Segher
Boessenkool raised.  I think I have addressed most of them; if not, let me
know.  I just remembered that I have not yet fixed the 'advance fusion' vs.
'power9 fusion' wording in the comments, and I will get to that shortly.

I updated the tests, adding separate tests for the integer power9 instructions
(modulus, count trailing zeros, extswsli) and for the power9 vector
instructions.  I also added new tests for float128, both via software
emulation and via the power9 instructions.

I updated CTZ_DEFINED_VALUE_AT_ZERO to be 32/64 depending on whether you are
running in 32/64-bit mode.

I removed the empty constraints from the mod define_expand.

Inside ashdi3_extswsli_dot, if the move has been split and the instruction
needs to be re-issued, it now calls ashdi3_extswsli_dot instead of
ashdi3_extswsli_dot2.

I'm including the patch file for the changes I checked in.

[gcc]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/constraints.md (wF constraint): New constraints
	for power9/toc fusion.
	(wG constraint): Likewise.

	* config/rs6000/predicates.md (u6bit_cint_operand): New
	predicate, recognize 0..63.
	(upper16_cint_operand): New predicate for power9 and toc fusion.
	(fpr_reg_operand): Likewise.
	(toc_fusion_or_p9_reg_operand): Likewise.
	(toc_fusion_mem_raw): Likewise.
	(toc_fusion_mem_wrapped): Likewise.
	(fusion_gpr_addis): If power9 fusion, allow fusion for a larger
	address range.
	(fusion_gpr_mem_combo): Delete, use fusion_addis_mem_combo_load
	instead.
	(fusion_addis_mem_combo_load): Add support for power9 fusion of
	floating point loads, floating point stores, and gpr stores.
	(fusion_addis_mem_combo_store): Likewise.
	(fusion_offsettable_mem_operand): Likewise.

	* config/rs6000/rs6000-protos.h (emit_fusion_addis): Add
	declarations.
	(emit_fusion_load_store): Likewise.
	(fusion_p9_p): Likewise.
	(expand_fusion_p9_load): Likewise.
	(expand_fusion_p9_store): Likewise.
	(emit_fusion_p9_load): Likewise.
	(emit_fusion_p9_store): Likewise.
	(fusion_wrap_memory_address): Likewise.

	* config/rs6000/rs6000.c (struct rs6000_reg_addr): Add new
	elements for power9 fusion.
	(rs6000_debug_print_mode): Rework debug information to print more
	information about fusion.
	(rs6000_init_hard_regno_mode_ok): Setup for power9 fusion
	support.
	(rs6000_legitimate_address_p): Recognize toc fusion as a valid
	offsettable memory address.
	(rs6000_rtx_costs): Update costs for new ISA 3.0 instructions.
	(emit_fusion_gpr_load): Move most of the code from
	emit_fusion_gpr_load into emit_fusion_addis that handles both
	power8 and power9 fusion.
	(emit_fusion_addis): Likewise.
	(emit_fusion_load_store): Likewise.
	(fusion_wrap_memory_address): Add support for TOC fusion.
	(fusion_split_address): Likewise.
	(fusion_p9_p): Add support for power9 fusion.
	(expand_fusion_p9_load): Likewise.
	(expand_fusion_p9_store): Likewise.
	(emit_fusion_p9_load): Likewise.
	(emit_fusion_p9_store): Likewise.

	* config/rs6000/rs6000.h (TARGET_EXTSWSLI): Macros to support
	new instructions in ISA 3.0.
	(TARGET_CTZ): Likewise.
	(TARGET_TOC_FUSION_INT): Macros for power9 fusion support.
	(TARGET_TOC_FUSION_FP): Likewise.

	* config/rs6000/rs6000.md (UNSPEC_FUSION_P9): New power9/toc
	fusion unspecs.
	(UNSPEC_FUSION_ADDIS): Likewise.
	(QHSI mode iterator): New iterator for power9 fusion.
	(GPR_FUSION): Likewise.
	(FPR_FUSION): Likewise.
	(mod<mode>3): Add support for ISA 3.0
	modulus instructions.
	(umod<mode>3): Likewise.
	(divmod peephole): Likewise.
	(udivmod peephole): Likewise.
	(ctz<mode>2): Add support for ISA 3.0 count trailing zeros scalar
	instructions.
	(ctz<mode>2_h): Likewise.
	(ashdi3_extswsli): Add support for ISA 3.0 EXTSWSLI instruction.
	(ashdi3_extswsli_dot): Likewise.
	(ashdi3_extswsli_dot2): Likewise.
	(power9 fusion splitter): New power9/toc fusion support.
	(toc_fusionload_<mode>): Likewise.
	(toc_fusionload_di): Likewise.
	(fusion_gpr_load_<mode>): Update predicate function.
	(power9 fusion peephole2s): New power9/toc fusion support.
	(fusion_gpr_<P:mode>_<GPR_FUSION:mode>_load): Likewise.
	(fusion_gpr_<P:mode>_<GPR_FUSION:mode>_store): Likewise.
	(fusion_fpr_<P:mode>_<FPR_FUSION:mode>_load): Likewise.
	(fusion_fpr_<P:mode>_<FPR_FUSION:mode>_store): Likewise.
	(fusion_p9_<mode>_constant): Likewise.

[gcc/testsuite]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* lib/target-supports.exp (check_p8vector_hw_available): Split
	long line.
	(check_vsx_hw_available): Likewise.
	(check_p9vector_hw_available): Add new checks for ISA 3.0 hardware
	support and for PowerPC float128 support.
	(check_p9modulo_hw_available): Likewise.
	(check_ppc_float128_sw_available): Likewise.
	(check_ppc_float128_hw_available): Likewise.
	(check_effective_target_powerpc_p9vector_ok): Likewise.
	(check_effective_target_powerpc_p9modulo_ok): Likewise.
	(check_effective_target_powerpc_float128_sw_ok): Likewise.
	(check_effective_target_powerpc_float128_hw_ok): Likewise.
	(is-effective-target): Add new PowerPC targets.
	(is-effective-target-keyword): Likewise.
	(check_vect_support_and_set_flags): If we have ISA 3.0 vector
	instructions, use them.

	* gcc.target/powerpc/mod-1.c: New test for ISA 3.0 instructions.
	* gcc.target/powerpc/mod-2.c: Likewise.
	* gcc.target/powerpc/ctz-1.c: Likewise.
	* gcc.target/powerpc/ctz-2.c: Likewise.
	* gcc.target/powerpc/extswsli-1.c: Likewise.
	* gcc.target/powerpc/extswsli-2.c: Likewise.
	* gcc.target/powerpc/extswsli-3.c: Likewise.

	* gcc.target/powerpc/fusion.c (fusion_vector): Move to fusion2.c
	and allow the test on PowerPC LE.
	* gcc.target/powerpc/fusion2.c (fusion_vector): Likewise.
	* gcc.target/powerpc/fusion3.c: New file, test power9 fusion.

	* gcc.target/powerpc/float128-call.c: Use powerpc_float128_sw_ok
	check instead of powerpc_vsx_ok.
	* gcc.target/powerpc/float128-mix.c: Likewise.
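
The fusion_gpr_addis change above works like this. An address is split into an addis immediate (the sign-adjusted high part, as with ELF @ha) and a signed 16-bit D-form offset (@l). Power8 fuses only when the addis immediate is in -32..31, that is, when its top 11 bits are all 0's or all 1's. The patch drops that limit when power9 fusion is enabled. The C sketch below models the split and the range test. The helpers split_disp and addis_fusable are hypothetical, not GCC functions. The test is expressed on the 16-bit addis immediate rather than on the shifted constant the predicate actually examines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: split a 32-bit displacement so that
   disp == ha * 65536 + lo, where lo is a signed 16-bit D-form offset
   (like @l) and ha is the sign-adjusted high part (like @ha).
   64-bit intermediates avoid signed overflow.  */
static void
split_disp (int32_t disp, int32_t *ha, int32_t *lo)
{
  int64_t d = disp;
  int64_t low = ((d & 0xffff) ^ 0x8000) - 0x8000;  /* Sign-extend low 16 bits.  */
  *lo = (int32_t) low;
  *ha = (int32_t) ((d - low) / 65536);              /* Exact: d - low is a multiple of 65536.  */
}

/* Hypothetical mirror of the range test in fusion_gpr_addis: an addis of
   0 is not needed at all; power8 fuses only immediates in -32..31, and
   power9 fusion accepts any immediate.  */
static bool
addis_fusable (int32_t ha, bool p9_fusion)
{
  if (ha == 0)
    return false;
  if (p9_fusion)
    return true;
  return ha >= -32 && ha <= 31;
}
```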

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power9.official-02-05b --]
[-- Type: text/plain, Size: 77791 bytes --]

Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 230064)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -137,6 +137,16 @@ (define_constraint "wD"
   (and (match_code "const_int")
        (match_test "TARGET_VSX && (ival == VECTOR_ELEMENT_SCALAR_64BIT)")))
 
+;; Extended (power9) fusion load/store
+(define_memory_constraint "wF"
+  "Memory operand suitable for power9 fusion load/stores"
+  (match_operand 0 "fusion_addis_mem_combo_load"))
+
+;; TOC fusion memory reference.
+(define_memory_constraint "wG"
+  "Memory operand suitable for TOC fusion memory references"
+  (match_operand 0 "toc_fusion_mem_wrapped"))
+
 ;; Lq/stq validates the address for load/store quad
 (define_memory_constraint "wQ"
   "Memory operand suitable for the load/store quad instructions"
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 230064)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -142,6 +142,11 @@ (define_predicate "u5bit_cint_operand"
   (and (match_code "const_int")
        (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 31")))
 
+;; Return 1 if op is an unsigned 6-bit constant integer.
+(define_predicate "u6bit_cint_operand"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 63")))
+
 ;; Return 1 if op is a signed 8-bit constant integer.
 ;; Integer multiplication complete more quickly
 (define_predicate "s8bit_cint_operand"
@@ -163,6 +168,12 @@ (define_predicate "u_short_cint_operand"
   (and (match_code "const_int")
        (match_test "satisfies_constraint_K (op)")))
 
+;; Return 1 if op is a constant integer that is a signed 16-bit constant
+;; shifted left 16 bits
+(define_predicate "upper16_cint_operand"
+  (and (match_code "const_int")
+       (match_test "satisfies_constraint_L (op)")))
+
 ;; Return 1 if op is a constant integer that cannot fit in a signed D field.
 (define_predicate "non_short_cint_operand"
   (and (match_code "const_int")
@@ -271,6 +282,70 @@ (define_predicate "base_reg_operand"
   return (REGNO (op) != FIRST_GPR_REGNO);
 })
 
+
+;; Return true if this is a traditional floating point register
+(define_predicate "fpr_reg_operand"
+  (match_code "reg,subreg")
+{
+  HOST_WIDE_INT r;
+
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  r = REGNO (op);
+  if (r >= FIRST_PSEUDO_REGISTER)
+    return 1;
+
+  return FP_REGNO_P (r);
+})
+
+;; Return true if this is a register that can have D-form addressing (GPR and
+;; traditional FPR registers for scalars).  ISA 3.0 (power9) adds D-form
+;; addressing for scalars in Altivec registers.
+;;
+;; If this is a pseudo, only allow GPR fusion on power8.  If we have
+;; power9 fusion, also allow the floating point types.
+(define_predicate "toc_fusion_or_p9_reg_operand"
+  (match_code "reg,subreg")
+{
+  HOST_WIDE_INT r;
+  bool gpr_p = (mode == QImode || mode == HImode || mode == SImode
+		|| mode == SFmode
+		|| (TARGET_POWERPC64 && (mode == DImode || mode == DFmode)));
+  bool fpr_p = (TARGET_P9_FUSION
+		&& (mode == DFmode || mode == SFmode
+		    || (TARGET_POWERPC64 && mode == DImode)));
+  bool vmx_p = (TARGET_P9_FUSION && TARGET_P9_VECTOR
+		&& (mode == DFmode || mode == SFmode));
+
+  if (!TARGET_P8_FUSION)
+    return 0;
+
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  r = REGNO (op);
+  if (r >= FIRST_PSEUDO_REGISTER)
+    return (gpr_p || fpr_p || vmx_p);
+
+  if (INT_REGNO_P (r))
+    return gpr_p;
+
+  if (FP_REGNO_P (r))
+    return fpr_p;
+
+  if (ALTIVEC_REGNO_P (r))
+    return vmx_p;
+
+  return 0;
+})
+
 ;; Return 1 if op is a HTM specific SPR register.
 (define_predicate "htm_spr_reg_operand"
   (match_operand 0 "register_operand")
@@ -1598,6 +1673,35 @@ (define_predicate "small_toc_ref"
   return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL;
 })
 
+;; Match the TOC memory operand that can be fused with an addis instruction.
+;; This is used in matching a potential fused address before register
+;; allocation.
+(define_predicate "toc_fusion_mem_raw"
+  (match_code "mem")
+{
+  if (!TARGET_TOC_FUSION_INT || !can_create_pseudo_p ())
+    return false;
+
+  return small_toc_ref (XEXP (op, 0), Pmode);
+})
+
+;; Match the memory operand that has been fused with an addis instruction and
+;; wrapped inside of an (unspec [...] UNSPEC_FUSION_ADDIS) wrapper.
+(define_predicate "toc_fusion_mem_wrapped"
+  (match_code "mem")
+{
+  rtx addr;
+
+  if (!TARGET_TOC_FUSION_INT)
+    return false;
+
+  if (!MEM_P (op))
+    return false;
+
+  addr = XEXP (op, 0);
+  return (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_FUSION_ADDIS);
+})
+
 ;; Match the first insn (addis) in fusing the combination of addis and loads to
 ;; GPR registers on power8.
 (define_predicate "fusion_gpr_addis"
@@ -1620,8 +1724,6 @@ (define_predicate "fusion_gpr_addis"
   else
     return 0;
 
-  /* Power8 currently will only do the fusion if the top 11 bits of the addis
-     value are all 1's or 0's.  */
   value = INTVAL (int_const);
   if ((value & (HOST_WIDE_INT)0xffff) != 0)
     return 0;
@@ -1629,6 +1731,12 @@ (define_predicate "fusion_gpr_addis"
   if ((value & (HOST_WIDE_INT)0xffff0000) == 0)
     return 0;
 
+  /* Power8 currently will only do the fusion if the top 11 bits of the addis
+     value are all 1's or 0's.  Ignore this restriction if we are testing
+     advanced fusion.  */
+  if (TARGET_P9_FUSION)
+    return 1;
+
   return (IN_RANGE (value >> 16, -32, 31));
 })
 
@@ -1694,13 +1802,14 @@ (define_predicate "fusion_gpr_mem_load"
 ;; Match a GPR load (lbz, lhz, lwz, ld) that uses a combined address in the
 ;; memory field with both the addis and the memory offset.  Sign extension
 ;; is not handled here, since lha and lwa are not fused.
-(define_predicate "fusion_gpr_mem_combo"
-  (match_code "mem,zero_extend")
+;; With extended fusion, also match an FPR load (lfd, lfs) and float_extend.
+(define_predicate "fusion_addis_mem_combo_load"
+  (match_code "mem,zero_extend,float_extend")
 {
   rtx addr, base, offset;
 
-  /* Handle zero extend.  */
-  if (GET_CODE (op) == ZERO_EXTEND)
+  /* Handle zero/float extend.  */
+  if (GET_CODE (op) == ZERO_EXTEND || GET_CODE (op) == FLOAT_EXTEND)
     {
       op = XEXP (op, 0);
       mode = GET_MODE (op);
@@ -1721,6 +1830,71 @@ (define_predicate "fusion_gpr_mem_combo"
 	return 0;
       break;
 
+    case SFmode:
+    case DFmode:
+      if (!TARGET_P9_FUSION)
+	return 0;
+      break;
+
+    default:
+      return 0;
+    }
+
+  addr = XEXP (op, 0);
+  if (GET_CODE (addr) != PLUS && GET_CODE (addr) != LO_SUM)
+    return 0;
+
+  base = XEXP (addr, 0);
+  if (!fusion_gpr_addis (base, GET_MODE (base)))
+    return 0;
+
+  offset = XEXP (addr, 1);
+  if (GET_CODE (addr) == PLUS)
+    return satisfies_constraint_I (offset);
+
+  else if (GET_CODE (addr) == LO_SUM)
+    {
+      if (TARGET_XCOFF || (TARGET_ELF && TARGET_POWERPC64))
+	return small_toc_ref (offset, GET_MODE (offset));
+
+      else if (TARGET_ELF && !TARGET_POWERPC64)
+	return CONSTANT_P (offset);
+    }
+
+  return 0;
+})
+
+;; Like fusion_addis_mem_combo_load, but for stores
+(define_predicate "fusion_addis_mem_combo_store"
+  (match_code "mem")
+{
+  rtx addr, base, offset;
+
+  if (!MEM_P (op) || !TARGET_P9_FUSION)
+    return 0;
+
+  switch (mode)
+    {
+    case QImode:
+    case HImode:
+    case SImode:
+      break;
+
+    case DImode:
+      if (!TARGET_POWERPC64)
+	return 0;
+      break;
+
+    case SFmode:
+      if (!TARGET_SF_FPR)
+	return 0;
+      break;
+
+    case DFmode:
+      if (!TARGET_DF_FPR)
+	return 0;
+      break;
+
     default:
       return 0;
     }
@@ -1748,3 +1922,20 @@ (define_predicate "fusion_gpr_mem_combo"
 
   return 0;
 })
+
+;; Return true if the operand is a float_extend or zero extend of an
+;; offsettable memory operand suitable for use in fusion
+(define_predicate "fusion_offsettable_mem_operand"
+  (match_code "mem,zero_extend,float_extend")
+{
+  if (GET_CODE (op) == ZERO_EXTEND || GET_CODE (op) == FLOAT_EXTEND)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE (op);
+    }
+
+  if (!memory_operand (op, mode))
+    return 0;
+
+  return offsettable_nonstrict_memref_p (op);
+})
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 230064)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -87,7 +87,15 @@ extern bool direct_move_p (rtx, rtx);
 extern bool quad_load_store_p (rtx, rtx);
 extern bool fusion_gpr_load_p (rtx, rtx, rtx, rtx);
 extern void expand_fusion_gpr_load (rtx *);
+extern void emit_fusion_addis (rtx, rtx, const char *, const char *);
+extern void emit_fusion_load_store (rtx, rtx, rtx, const char *);
 extern const char *emit_fusion_gpr_load (rtx, rtx);
+extern bool fusion_p9_p (rtx, rtx, rtx, rtx);
+extern void expand_fusion_p9_load (rtx *);
+extern void expand_fusion_p9_store (rtx *);
+extern const char *emit_fusion_p9_load (rtx, rtx, rtx);
+extern const char *emit_fusion_p9_store (rtx, rtx, rtx);
+extern rtx fusion_wrap_memory_address (rtx);
 extern enum reg_class (*rs6000_preferred_reload_class_ptr) (rtx,
 							    enum reg_class);
 extern enum reg_class (*rs6000_secondary_reload_class_ptr) (enum reg_class,
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 230064)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -376,8 +376,18 @@ struct rs6000_reg_addr {
   enum insn_code reload_fpr_gpr;	/* INSN to move from FPR to GPR.  */
   enum insn_code reload_gpr_vsx;	/* INSN to move from GPR to VSX.  */
   enum insn_code reload_vsx_gpr;	/* INSN to move from VSX to GPR.  */
+  enum insn_code fusion_gpr_ld;		/* INSN for fusing gpr ADDIS/loads.  */
+					/* INSNs for fusing addi with loads
+					   or stores for each reg. class.  */					   
+  enum insn_code fusion_addi_ld[(int)N_RELOAD_REG];
+  enum insn_code fusion_addi_st[(int)N_RELOAD_REG];
+					/* INSNs for fusing addis with loads
+					   or stores for each reg. class.  */					   
+  enum insn_code fusion_addis_ld[(int)N_RELOAD_REG];
+  enum insn_code fusion_addis_st[(int)N_RELOAD_REG];
   addr_mask_type addr_mask[(int)N_RELOAD_REG]; /* Valid address masks.  */
   bool scalar_in_vmx_p;			/* Scalar value can go in VMX.  */
+  bool fused_toc;			/* Mode supports TOC fusion.  */
 };
 
 static struct rs6000_reg_addr reg_addr[NUM_MACHINE_MODES];
@@ -2026,25 +2036,113 @@ DEBUG_FUNCTION void
 rs6000_debug_print_mode (ssize_t m)
 {
   ssize_t rc;
+  int spaces = 0;
+  bool fuse_extra_p;
 
   fprintf (stderr, "Mode: %-5s", GET_MODE_NAME (m));
   for (rc = 0; rc < N_RELOAD_REG; rc++)
     fprintf (stderr, " %s: %s", reload_reg_map[rc].name,
 	     rs6000_debug_addr_mask (reg_addr[m].addr_mask[rc], true));
 
+  if ((reg_addr[m].reload_store != CODE_FOR_nothing)
+      || (reg_addr[m].reload_load != CODE_FOR_nothing))
+    fprintf (stderr, "  Reload=%c%c",
+	     (reg_addr[m].reload_store != CODE_FOR_nothing) ? 's' : '*',
+	     (reg_addr[m].reload_load != CODE_FOR_nothing) ? 'l' : '*');
+  else
+    spaces += sizeof ("  Reload=sl") - 1;
+
+  if (reg_addr[m].scalar_in_vmx_p)
+    {
+      fprintf (stderr, "%*s  Upper=y", spaces, "");
+      spaces = 0;
+    }
+  else
+    spaces += sizeof ("  Upper=y") - 1;
+
+  fuse_extra_p = ((reg_addr[m].fusion_gpr_ld != CODE_FOR_nothing)
+		  || reg_addr[m].fused_toc);
+  if (!fuse_extra_p)
+    {
+      for (rc = 0; rc < N_RELOAD_REG; rc++)
+	{
+	  if (rc != RELOAD_REG_ANY)
+	    {
+	      if (reg_addr[m].fusion_addi_ld[rc]     != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addi_ld[rc]  != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addi_st[rc]  != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addis_ld[rc] != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addis_st[rc] != CODE_FOR_nothing)
+		{
+		  fuse_extra_p = true;
+		  break;
+		}
+	    }
+	}
+    }
+
+  if (fuse_extra_p)
+    {
+      fprintf (stderr, "%*s  Fuse:", spaces, "");
+      spaces = 0;
+
+      for (rc = 0; rc < N_RELOAD_REG; rc++)
+	{
+	  if (rc != RELOAD_REG_ANY)
+	    {
+	      char load, store;
+
+	      if (reg_addr[m].fusion_addis_ld[rc] != CODE_FOR_nothing)
+		load = 'l';
+	      else if (reg_addr[m].fusion_addi_ld[rc] != CODE_FOR_nothing)
+		load = 'L';
+	      else
+		load = '-';
+
+	      if (reg_addr[m].fusion_addis_st[rc] != CODE_FOR_nothing)
+		store = 's';
+	      else if (reg_addr[m].fusion_addi_st[rc] != CODE_FOR_nothing)
+		store = 'S';
+	      else
+		store = '-';
+
+	      if (load == '-' && store == '-')
+		spaces += 5;
+	      else
+		{
+		  fprintf (stderr, "%*s%c=%c%c", (spaces + 1), "",
+			   reload_reg_map[rc].name[0], load, store);
+		  spaces = 0;
+		}
+	    }
+	}
+
+      if (reg_addr[m].fusion_gpr_ld != CODE_FOR_nothing)
+	{
+	  fprintf (stderr, "%*sP8gpr", (spaces + 1), "");
+	  spaces = 0;
+	}
+      else
+	spaces += sizeof (" P8gpr") - 1;
+
+      if (reg_addr[m].fused_toc)
+	{
+	  fprintf (stderr, "%*sToc", (spaces + 1), "");
+	  spaces = 0;
+	}
+      else
+	spaces += sizeof (" Toc") - 1;
+    }
+  else
+    spaces += sizeof ("  Fuse: G=ls F=ls v=ls P8gpr Toc") - 1;
+
   if (rs6000_vector_unit[m] != VECTOR_NONE
-      || rs6000_vector_mem[m] != VECTOR_NONE
-      || (reg_addr[m].reload_store != CODE_FOR_nothing)
-      || (reg_addr[m].reload_load != CODE_FOR_nothing)
-      || reg_addr[m].scalar_in_vmx_p)
+      || rs6000_vector_mem[m] != VECTOR_NONE)
     {
-      fprintf (stderr,
-	       "  Vector-arith=%-10s Vector-mem=%-10s Reload=%c%c Upper=%c",
+      fprintf (stderr, "%*s  vector: arith=%-10s mem=%s",
+	       spaces, "",
 	       rs6000_debug_vector_unit (rs6000_vector_unit[m]),
-	       rs6000_debug_vector_unit (rs6000_vector_mem[m]),
-	       (reg_addr[m].reload_store != CODE_FOR_nothing) ? 's' : '*',
-	       (reg_addr[m].reload_load != CODE_FOR_nothing) ? 'l' : '*',
-	       (reg_addr[m].scalar_in_vmx_p) ? 'y' : 'n');
+	       rs6000_debug_vector_unit (rs6000_vector_mem[m]));
     }
 
   fputs ("\n", stderr);
@@ -3019,6 +3117,130 @@ rs6000_init_hard_regno_mode_ok (bool glo
 	reg_addr[SFmode].scalar_in_vmx_p = true;
     }
 
+  /* Setup the fusion operations.  */
+  if (TARGET_P8_FUSION)
+    {
+      reg_addr[QImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_qi;
+      reg_addr[HImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_hi;
+      reg_addr[SImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_si;
+      if (TARGET_64BIT)
+	reg_addr[DImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_di;
+    }
+
+  if (TARGET_P9_FUSION)
+    {
+      struct fuse_insns {
+	enum machine_mode mode;			/* mode of the fused type.  */
+	enum machine_mode pmode;		/* pointer mode.  */
+	enum rs6000_reload_reg_type rtype;	/* register type.  */
+	enum insn_code load;			/* load insn.  */
+	enum insn_code store;			/* store insn.  */
+      };
+
+      static const struct fuse_insns addis_insns[] = {
+	{ SFmode, DImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_di_sf_load,
+	  CODE_FOR_fusion_fpr_di_sf_store },
+
+	{ SFmode, SImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_si_sf_load,
+	  CODE_FOR_fusion_fpr_si_sf_store },
+
+	{ DFmode, DImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_di_df_load,
+	  CODE_FOR_fusion_fpr_di_df_store },
+
+	{ DFmode, SImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_si_df_load,
+	  CODE_FOR_fusion_fpr_si_df_store },
+
+	{ DImode, DImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_di_di_load,
+	  CODE_FOR_fusion_fpr_di_di_store },
+
+	{ DImode, SImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_si_di_load,
+	  CODE_FOR_fusion_fpr_si_di_store },
+
+	{ QImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_qi_load,
+	  CODE_FOR_fusion_gpr_di_qi_store },
+
+	{ QImode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_qi_load,
+	  CODE_FOR_fusion_gpr_si_qi_store },
+
+	{ HImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_hi_load,
+	  CODE_FOR_fusion_gpr_di_hi_store },
+
+	{ HImode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_hi_load,
+	  CODE_FOR_fusion_gpr_si_hi_store },
+
+	{ SImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_si_load,
+	  CODE_FOR_fusion_gpr_di_si_store },
+
+	{ SImode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_si_load,
+	  CODE_FOR_fusion_gpr_si_si_store },
+
+	{ SFmode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_sf_load,
+	  CODE_FOR_fusion_gpr_di_sf_store },
+
+	{ SFmode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_sf_load,
+	  CODE_FOR_fusion_gpr_si_sf_store },
+
+	{ DImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_di_load,
+	  CODE_FOR_fusion_gpr_di_di_store },
+
+	{ DFmode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_df_load,
+	  CODE_FOR_fusion_gpr_di_df_store },
+      };
+
+      enum machine_mode cur_pmode = Pmode;
+      size_t i;
+
+      for (i = 0; i < ARRAY_SIZE (addis_insns); i++)
+	{
+	  enum machine_mode xmode = addis_insns[i].mode;
+	  enum rs6000_reload_reg_type rtype = addis_insns[i].rtype;
+
+	  if (addis_insns[i].pmode != cur_pmode)
+	    continue;
+
+	  if (rtype == RELOAD_REG_FPR
+	      && (!TARGET_HARD_FLOAT || !TARGET_FPRS))
+	    continue;
+
+	  reg_addr[xmode].fusion_addis_ld[rtype] = addis_insns[i].load;
+	  reg_addr[xmode].fusion_addis_st[rtype] = addis_insns[i].store;
+	}
+    }
+
+  /* Note which types we support fusing TOC setup plus memory insn.  We only do
+     fused TOCs for medium/large code models.  */
+  if (TARGET_P8_FUSION && TARGET_TOC_FUSION && TARGET_POWERPC64
+      && (TARGET_CMODEL != CMODEL_SMALL))
+    {
+      reg_addr[QImode].fused_toc = true;
+      reg_addr[HImode].fused_toc = true;
+      reg_addr[SImode].fused_toc = true;
+      reg_addr[DImode].fused_toc = true;
+      if (TARGET_HARD_FLOAT && TARGET_FPRS)
+	{
+	  if (TARGET_SINGLE_FLOAT)
+	    reg_addr[SFmode].fused_toc = true;
+	  if (TARGET_DOUBLE_FLOAT)
+	    reg_addr[DFmode].fused_toc = true;
+	}
+    }
+
   /* Precalculate HARD_REGNO_NREGS.  */
   for (r = 0; r < FIRST_PSEUDO_REGISTER; ++r)
     for (m = 0; m < NUM_MACHINE_MODES; ++m)
@@ -8127,6 +8349,8 @@ rs6000_legitimate_address_p (machine_mod
       && legitimate_constant_pool_address_p (x, mode,
 					     reg_ok_strict || lra_in_progress))
     return 1;
+  if (reg_offset_p && reg_addr[mode].fused_toc && toc_fusion_mem_wrapped (x, mode))
+    return 1;
   /* For TImode, if we have load/store quad and TImode in VSX registers, only
      allow register indirect addresses.  This will allow the values to go in
      either GPRs or VSX registers without reloading.  The vector types would
@@ -31851,12 +32075,15 @@ rs6000_rtx_costs (rtx x, machine_mode mo
 	  else
 	    *total = rs6000_cost->divsi;
 	}
-      /* Add in shift and subtract for MOD. */
-      if (code == MOD || code == UMOD)
+      /* Add in shift and subtract for MOD unless we have a mod instruction. */
+      if (!TARGET_MODULO && (code == MOD || code == UMOD))
 	*total += COSTS_N_INSNS (2);
       return false;
 
     case CTZ:
+      *total = COSTS_N_INSNS (TARGET_CTZ ? 1 : 4);
+      return false;
+
     case FFS:
       *total = COSTS_N_INSNS (4);
       return false;
@@ -31931,6 +32158,17 @@ rs6000_rtx_costs (rtx x, machine_mode mo
       return false;
 
     case ASHIFT:
+      /* The EXTSWSLI instruction is a combined instruction.  Don't count both
+	 the sign extend and shift separately within the insn.  */
+      if (TARGET_EXTSWSLI && mode == DImode
+	  && GET_CODE (XEXP (x, 0)) == SIGN_EXTEND
+	  && GET_MODE (XEXP (XEXP (x, 0), 0)) == SImode)
+	{
+	  *total = 0;
+	  return false;
+	}
+      /* fall through */
+	  
     case ASHIFTRT:
     case LSHIFTRT:
     case ROTATE:
@@ -35202,72 +35440,21 @@ expand_fusion_gpr_load (rtx *operands)
   return;
 }
 
-/* Return a string to fuse an addis instruction with a gpr load to the same
-   register that we loaded up the addis instruction.  The address that is used
-   is the logical address that was formed during peephole2:
-	(lo_sum (high) (low-part))
-
-   The code is complicated, so we call output_asm_insn directly, and just
-   return "".  */
+/* Emit the addis instruction that will be part of a fused instruction
+   sequence.  */
 
-const char *
-emit_fusion_gpr_load (rtx target, rtx mem)
+void
+emit_fusion_addis (rtx target, rtx addis_value, const char *comment,
+		   const char *mode_name)
 {
-  rtx addis_value;
   rtx fuse_ops[10];
-  rtx addr;
-  rtx load_offset;
-  const char *addis_str = NULL;
-  const char *load_str = NULL;
-  const char *mode_name = NULL;
   char insn_template[80];
-  machine_mode mode;
+  const char *addis_str = NULL;
   const char *comment_str = ASM_COMMENT_START;
 
-  if (GET_CODE (mem) == ZERO_EXTEND)
-    mem = XEXP (mem, 0);
-
-  gcc_assert (REG_P (target) && MEM_P (mem));
-
   if (*comment_str == ' ')
     comment_str++;
 
-  addr = XEXP (mem, 0);
-  if (GET_CODE (addr) != PLUS && GET_CODE (addr) != LO_SUM)
-    gcc_unreachable ();
-
-  addis_value = XEXP (addr, 0);
-  load_offset = XEXP (addr, 1);
-
-  /* Now emit the load instruction to the same register.  */
-  mode = GET_MODE (mem);
-  switch (mode)
-    {
-    case QImode:
-      mode_name = "char";
-      load_str = "lbz";
-      break;
-
-    case HImode:
-      mode_name = "short";
-      load_str = "lhz";
-      break;
-
-    case SImode:
-      mode_name = "int";
-      load_str = "lwz";
-      break;
-
-    case DImode:
-      gcc_assert (TARGET_POWERPC64);
-      mode_name = "long";
-      load_str = "ld";
-      break;
-
-    default:
-      gcc_unreachable ();
-    }
-
   /* Emit the addis instruction.  */
   fuse_ops[0] = target;
   if (satisfies_constraint_L (addis_value))
@@ -35346,67 +35533,530 @@ emit_fusion_gpr_load (rtx target, rtx me
   if (!addis_str)
     fatal_insn ("Could not generate addis value for fusion", addis_value);
 
-  sprintf (insn_template, "%s\t\t%s gpr load fusion, type %s", addis_str,
-	   comment_str, mode_name);
+  sprintf (insn_template, "%s\t\t%s %s, type %s", addis_str, comment_str,
+	   comment, mode_name);
   output_asm_insn (insn_template, fuse_ops);
+}
 
-  /* Emit the D-form load instruction.  */
-  if (CONST_INT_P (load_offset) && satisfies_constraint_I (load_offset))
+/* Emit a D-form load or store instruction that is the second instruction
+   of a fusion sequence.  */
+
+void
+emit_fusion_load_store (rtx load_store_reg, rtx addis_reg, rtx offset,
+			const char *insn_str)
+{
+  rtx fuse_ops[10];
+  char insn_template[80];
+
+  fuse_ops[0] = load_store_reg;
+  fuse_ops[1] = addis_reg;
+
+  if (CONST_INT_P (offset) && satisfies_constraint_I (offset))
     {
-      sprintf (insn_template, "%s %%0,%%1(%%0)", load_str);
-      fuse_ops[1] = load_offset;
+      sprintf (insn_template, "%s %%0,%%2(%%1)", insn_str);
+      fuse_ops[2] = offset;
       output_asm_insn (insn_template, fuse_ops);
     }
 
-  else if (GET_CODE (load_offset) == UNSPEC
-	   && XINT (load_offset, 1) == UNSPEC_TOCREL)
+  else if (GET_CODE (offset) == UNSPEC
+	   && XINT (offset, 1) == UNSPEC_TOCREL)
     {
       if (TARGET_ELF)
-	sprintf (insn_template, "%s %%0,%%1@toc@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2@toc@l(%%1)", insn_str);
 
       else if (TARGET_XCOFF)
-	sprintf (insn_template, "%s %%0,%%1@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2@l(%%1)", insn_str);
 
       else
 	gcc_unreachable ();
 
-      fuse_ops[1] = XVECEXP (load_offset, 0, 0);
+      fuse_ops[2] = XVECEXP (offset, 0, 0);
       output_asm_insn (insn_template, fuse_ops);
     }
 
-  else if (GET_CODE (load_offset) == PLUS
-	   && GET_CODE (XEXP (load_offset, 0)) == UNSPEC
-	   && XINT (XEXP (load_offset, 0), 1) == UNSPEC_TOCREL
-	   && CONST_INT_P (XEXP (load_offset, 1)))
+  else if (GET_CODE (offset) == PLUS
+	   && GET_CODE (XEXP (offset, 0)) == UNSPEC
+	   && XINT (XEXP (offset, 0), 1) == UNSPEC_TOCREL
+	   && CONST_INT_P (XEXP (offset, 1)))
     {
-      rtx tocrel_unspec = XEXP (load_offset, 0);
+      rtx tocrel_unspec = XEXP (offset, 0);
       if (TARGET_ELF)
-	sprintf (insn_template, "%s %%0,%%1+%%2@toc@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2+%%3@toc@l(%%1)", insn_str);
 
       else if (TARGET_XCOFF)
-	sprintf (insn_template, "%s %%0,%%1+%%2@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2+%%3@l(%%1)", insn_str);
 
       else
 	gcc_unreachable ();
 
-      fuse_ops[1] = XVECEXP (tocrel_unspec, 0, 0);
-      fuse_ops[2] = XEXP (load_offset, 1);
+      fuse_ops[2] = XVECEXP (tocrel_unspec, 0, 0);
+      fuse_ops[3] = XEXP (offset, 1);
       output_asm_insn (insn_template, fuse_ops);
     }
 
-  else if (TARGET_ELF && !TARGET_POWERPC64 && CONSTANT_P (load_offset))
+  else if (TARGET_ELF && !TARGET_POWERPC64 && CONSTANT_P (offset))
     {
-      sprintf (insn_template, "%s %%0,%%1@l(%%0)", load_str);
+      sprintf (insn_template, "%s %%0,%%2@l(%%1)", insn_str);
 
-      fuse_ops[1] = load_offset;
+      fuse_ops[2] = offset;
       output_asm_insn (insn_template, fuse_ops);
     }
 
   else
-    fatal_insn ("Unable to generate load offset for fusion", load_offset);
+    fatal_insn ("Unable to generate load/store offset for fusion", offset);
+
+  return;
+}
+
+/* Wrap a TOC address that can be fused to indicate that special fusion
+   processing is needed.  */
+
+rtx
+fusion_wrap_memory_address (rtx old_mem)
+{
+  rtx old_addr = XEXP (old_mem, 0);
+  rtvec v = gen_rtvec (1, old_addr);
+  rtx new_addr = gen_rtx_UNSPEC (Pmode, v, UNSPEC_FUSION_ADDIS);
+  return replace_equiv_address_nv (old_mem, new_addr, false);
+}
+
+/* Given an address, convert it into the addis and load offset parts.  Addresses
+   created during the peephole2 process look like:
+	(lo_sum (high (unspec [(sym)] UNSPEC_TOCREL))
+		(unspec [(...)] UNSPEC_TOCREL))
+
+   Addresses created via toc fusion look like:
+	(unspec [(unspec [(...)] UNSPEC_TOCREL)] UNSPEC_FUSION_ADDIS))  */
+
+static void
+fusion_split_address (rtx addr, rtx *p_hi, rtx *p_lo)
+{
+  rtx hi, lo;
+
+  if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_FUSION_ADDIS)
+    {
+      lo = XVECEXP (addr, 0, 0);
+      hi = gen_rtx_HIGH (Pmode, lo);
+    }
+  else if (GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
+    {
+      hi = XEXP (addr, 0);
+      lo = XEXP (addr, 1);
+    }
+  else
+    gcc_unreachable ();
+
+  *p_hi = hi;
+  *p_lo = lo;
+}
+
+/* Return a string to fuse an addis instruction with a gpr load to the same
+   register that we loaded up the addis instruction.  The address that is used
+   is the logical address that was formed during peephole2:
+	(lo_sum (high) (low-part))
+
+   Or the address is the TOC address that is wrapped before register allocation:
+	(unspec [(addr) (toc-reg)] UNSPEC_FUSION_ADDIS)
+
+   The code is complicated, so we call output_asm_insn directly, and just
+   return "".  */
+
+const char *
+emit_fusion_gpr_load (rtx target, rtx mem)
+{
+  rtx addis_value;
+  rtx addr;
+  rtx load_offset;
+  const char *load_str = NULL;
+  const char *mode_name = NULL;
+  machine_mode mode;
+
+  if (GET_CODE (mem) == ZERO_EXTEND)
+    mem = XEXP (mem, 0);
+
+  gcc_assert (REG_P (target) && MEM_P (mem));
+
+  addr = XEXP (mem, 0);
+  fusion_split_address (addr, &addis_value, &load_offset);
+
+  /* Now emit the load instruction to the same register.  */
+  mode = GET_MODE (mem);
+  switch (mode)
+    {
+    case QImode:
+      mode_name = "char";
+      load_str = "lbz";
+      break;
+
+    case HImode:
+      mode_name = "short";
+      load_str = "lhz";
+      break;
+
+    case SImode:
+    case SFmode:
+      mode_name = (mode == SFmode) ? "float" : "int";
+      load_str = "lwz";
+      break;
+
+    case DImode:
+    case DFmode:
+      gcc_assert (TARGET_POWERPC64);
+      mode_name = (mode == DFmode) ? "double" : "long";
+      load_str = "ld";
+      break;
+
+    default:
+      fatal_insn ("Bad GPR fusion", gen_rtx_SET (target, mem));
+    }
+
+  /* Emit the addis instruction.  */
+  emit_fusion_addis (target, addis_value, "gpr load fusion", mode_name);
+
+  /* Emit the D-form load instruction.  */
+  emit_fusion_load_store (target, target, load_offset, load_str);
+
+  return "";
+}
+\f
+
+/* Return true if the peephole2 can combine a load/store involving a
+   combination of an addis instruction and the memory operation.  This was
+   added to the ISA 3.0 (power9) hardware.  */
+
+bool
+fusion_p9_p (rtx addis_reg,		/* register set via addis.  */
+	     rtx addis_value,		/* addis value.  */
+	     rtx dest,			/* destination (memory or register). */
+	     rtx src)			/* source (register or memory).  */
+{
+  rtx addr, mem, offset;
+  enum machine_mode mode = GET_MODE (src);
+
+  /* Validate arguments.  */
+  if (!base_reg_operand (addis_reg, GET_MODE (addis_reg)))
+    return false;
+
+  if (!fusion_gpr_addis (addis_value, GET_MODE (addis_value)))
+    return false;
+
+  /* Ignore extend operations that are part of the load.  */
+  if (GET_CODE (src) == FLOAT_EXTEND || GET_CODE (src) == ZERO_EXTEND)
+    src = XEXP (src, 0);
+
+  /* Test for memory<-register or register<-memory.  */
+  if (fpr_reg_operand (src, mode) || int_reg_operand (src, mode))
+    {
+      if (!MEM_P (dest))
+	return false;
+
+      mem = dest;
+    }
+
+  else if (MEM_P (src))
+    {
+      if (!fpr_reg_operand (dest, mode) && !int_reg_operand (dest, mode))
+	return false;
+
+      mem = src;
+    }
+
+  else
+    return false;
+
+  addr = XEXP (mem, 0);			/* either PLUS or LO_SUM.  */
+  if (GET_CODE (addr) == PLUS)
+    {
+      if (!rtx_equal_p (addis_reg, XEXP (addr, 0)))
+	return false;
+
+      return satisfies_constraint_I (XEXP (addr, 1));
+    }
+
+  else if (GET_CODE (addr) == LO_SUM)
+    {
+      if (!rtx_equal_p (addis_reg, XEXP (addr, 0)))
+	return false;
+
+      offset = XEXP (addr, 1);
+      if (TARGET_XCOFF || (TARGET_ELF && TARGET_POWERPC64))
+	return small_toc_ref (offset, GET_MODE (offset));
+
+      else if (TARGET_ELF && !TARGET_POWERPC64)
+	return CONSTANT_P (offset);
+    }
+
+  return false;
+}
+
+/* During the peephole2 pass, adjust and expand the insns for an extended fusion
+   load sequence.
+
+   The operands are:
+	operands[0]	register set with addis
+	operands[1]	value set via addis
+	operands[2]	target register being loaded
+	operands[3]	D-form memory reference using operands[0].
+
+   This is similar to the fusion introduced with power8, except it scales to
+   both loads and stores, and does not require the result register to be the
+   same as the base register.  At the moment, we only do this if the register
+   set with addis is dead.  */
+
+void
+expand_fusion_p9_load (rtx *operands)
+{
+  rtx tmp_reg = operands[0];
+  rtx addis_value = operands[1];
+  rtx target = operands[2];
+  rtx orig_mem = operands[3];
+  rtx new_addr, new_mem, orig_addr, offset, set, clobber, insn;
+  enum rtx_code plus_or_lo_sum;
+  machine_mode target_mode = GET_MODE (target);
+  machine_mode extend_mode = target_mode;
+  machine_mode ptr_mode = Pmode;
+  enum rtx_code extend = UNKNOWN;
+
+  if (GET_CODE (orig_mem) == FLOAT_EXTEND || GET_CODE (orig_mem) == ZERO_EXTEND)
+    {
+      extend = GET_CODE (orig_mem);
+      orig_mem = XEXP (orig_mem, 0);
+      target_mode = GET_MODE (orig_mem);
+    }
+
+  gcc_assert (MEM_P (orig_mem));
+
+  orig_addr = XEXP (orig_mem, 0);
+  plus_or_lo_sum = GET_CODE (orig_addr);
+  gcc_assert (plus_or_lo_sum == PLUS || plus_or_lo_sum == LO_SUM);
+
+  offset = XEXP (orig_addr, 1);
+  new_addr = gen_rtx_fmt_ee (plus_or_lo_sum, ptr_mode, addis_value, offset);
+  new_mem = replace_equiv_address_nv (orig_mem, new_addr, false);
+
+  if (extend != UNKNOWN)
+    new_mem = gen_rtx_fmt_e (extend, extend_mode, new_mem);
+
+  new_mem = gen_rtx_UNSPEC (extend_mode, gen_rtvec (1, new_mem),
+			    UNSPEC_FUSION_P9);
+
+  set = gen_rtx_SET (target, new_mem);
+  clobber = gen_rtx_CLOBBER (VOIDmode, tmp_reg);
+  insn = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set, clobber));
+  emit_insn (insn);
+
+  return;
+}
+
+/* During the peephole2 pass, adjust and expand the insns for an extended fusion
+   store sequence.
+
+   The operands are:
+	operands[0]	register set with addis
+	operands[1]	value set via addis
+	operands[2]	target D-form memory being stored to
+	operands[3]	register being stored
+
+   This is similar to the fusion introduced with power8, except it scales to
+   both loads and stores, and does not require the result register to be the
+   same as the base register.  At the moment, we only do this if the register
+   set with addis is dead.  */
+
+void
+expand_fusion_p9_store (rtx *operands)
+{
+  rtx tmp_reg = operands[0];
+  rtx addis_value = operands[1];
+  rtx orig_mem = operands[2];
+  rtx src = operands[3];
+  rtx new_addr, new_mem, orig_addr, offset, set, clobber, insn, new_src;
+  enum rtx_code plus_or_lo_sum;
+  machine_mode target_mode = GET_MODE (orig_mem);
+  machine_mode ptr_mode = Pmode;
+
+  gcc_assert (MEM_P (orig_mem));
+
+  orig_addr = XEXP (orig_mem, 0);
+  plus_or_lo_sum = GET_CODE (orig_addr);
+  gcc_assert (plus_or_lo_sum == PLUS || plus_or_lo_sum == LO_SUM);
+
+  offset = XEXP (orig_addr, 1);
+  new_addr = gen_rtx_fmt_ee (plus_or_lo_sum, ptr_mode, addis_value, offset);
+  new_mem = replace_equiv_address_nv (orig_mem, new_addr, false);
+
+  new_src = gen_rtx_UNSPEC (target_mode, gen_rtvec (1, src),
+			    UNSPEC_FUSION_P9);
+
+  set = gen_rtx_SET (new_mem, new_src);
+  clobber = gen_rtx_CLOBBER (VOIDmode, tmp_reg);
+  insn = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set, clobber));
+  emit_insn (insn);
+
+  return;
+}
+
+/* Return a string to fuse an addis instruction with a load using extended
+   fusion.  The address that is used is the logical address that was formed
+   during peephole2: (lo_sum (high) (low-part))
+
+   The code is complicated, so we call output_asm_insn directly, and just
+   return "".  */
+
+const char *
+emit_fusion_p9_load (rtx reg, rtx mem, rtx tmp_reg)
+{
+  enum machine_mode mode = GET_MODE (reg);
+  rtx hi;
+  rtx lo;
+  rtx addr;
+  const char *load_string;
+  int r;
+
+  if (GET_CODE (mem) == FLOAT_EXTEND || GET_CODE (mem) == ZERO_EXTEND)
+    {
+      mem = XEXP (mem, 0);
+      mode = GET_MODE (mem);
+    }
+
+  if (GET_CODE (reg) == SUBREG)
+    {
+      gcc_assert (SUBREG_BYTE (reg) == 0);
+      reg = SUBREG_REG (reg);
+    }
+
+  if (!REG_P (reg))
+    fatal_insn ("emit_fusion_p9_load, bad reg #1", reg);
+
+  r = REGNO (reg);
+  if (FP_REGNO_P (r))
+    {
+      if (mode == SFmode)
+	load_string = "lfs";
+      else if (mode == DFmode || mode == DImode)
+	load_string = "lfd";
+      else
+	gcc_unreachable ();
+    }
+  else if (INT_REGNO_P (r))
+    {
+      switch (mode)
+	{
+	case QImode:
+	  load_string = "lbz";
+	  break;
+	case HImode:
+	  load_string = "lhz";
+	  break;
+	case SImode:
+	case SFmode:
+	  load_string = "lwz";
+	  break;
+	case DImode:
+	case DFmode:
+	  if (!TARGET_POWERPC64)
+	    gcc_unreachable ();
+	  load_string = "ld";
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  else
+    fatal_insn ("emit_fusion_p9_load, bad reg #2", reg);
+
+  if (!MEM_P (mem))
+    fatal_insn ("emit_fusion_p9_load not MEM", mem);
+
+  addr = XEXP (mem, 0);
+  fusion_split_address (addr, &hi, &lo);
+
+  /* Emit the addis instruction.  */
+  emit_fusion_addis (tmp_reg, hi, "power9 load fusion", GET_MODE_NAME (mode));
+
+  /* Emit the D-form load instruction.  */
+  emit_fusion_load_store (reg, tmp_reg, lo, load_string);
 
   return "";
 }
+
+/* Return a string to fuse an addis instruction with a store using extended
+   fusion.  The address that is used is the logical address that was formed
+   during peephole2: (lo_sum (high) (low-part))
+
+   The code is complicated, so we call output_asm_insn directly, and just
+   return "".  */
+
+const char *
+emit_fusion_p9_store (rtx mem, rtx reg, rtx tmp_reg)
+{
+  enum machine_mode mode = GET_MODE (reg);
+  rtx hi;
+  rtx lo;
+  rtx addr;
+  const char *store_string;
+  int r;
+
+  if (GET_CODE (reg) == SUBREG)
+    {
+      gcc_assert (SUBREG_BYTE (reg) == 0);
+      reg = SUBREG_REG (reg);
+    }
+
+  if (!REG_P (reg))
+    fatal_insn ("emit_fusion_p9_store, bad reg #1", reg);
+
+  r = REGNO (reg);
+  if (FP_REGNO_P (r))
+    {
+      if (mode == SFmode)
+	store_string = "stfs";
+      else if (mode == DFmode || mode == DImode)
+	store_string = "stfd";
+      else
+	gcc_unreachable ();
+    }
+  else if (INT_REGNO_P (r))
+    {
+      switch (mode)
+	{
+	case QImode:
+	  store_string = "stb";
+	  break;
+	case HImode:
+	  store_string = "sth";
+	  break;
+	case SImode:
+	case SFmode:
+	  store_string = "stw";
+	  break;
+	case DImode:
+	case DFmode:
+	  if (!TARGET_POWERPC64)
+	    gcc_unreachable ();
+	  store_string = "std";
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  else
+    fatal_insn ("emit_fusion_p9_store, bad reg #2", reg);
+
+  if (!MEM_P (mem))
+    fatal_insn ("emit_fusion_p9_store not MEM", mem);
+
+  addr = XEXP (mem, 0);
+  fusion_split_address (addr, &hi, &lo);
+
+  /* Emit the addis instruction.  */
+  emit_fusion_addis (tmp_reg, hi, "power9 store fusion", GET_MODE_NAME (mode));
+
+  /* Emit the D-form store instruction.  */
+  emit_fusion_load_store (reg, tmp_reg, lo, store_string);
+
+  return "";
+}
+
 \f
 /* Analyze vector computations and remove unnecessary doubleword
    swaps (xxswapdi instructions).  This pass is performed only
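For reviewers less familiar with addis fusion, the pair splits a 32-bit displacement into a "high adjusted" part for addis and a signed 16-bit low part for the D-form load or store. For a medium code model TOC reference this is typically `addis tmp,2,sym@toc@ha` followed by `lwz reg,sym@toc@l(tmp)`. The stand-alone C model below only illustrates that arithmetic. split_ha_lo and combine_ha_lo are hypothetical names, not functions in rs6000.c.

```c
#include <assert.h>
#include <stdint.h>

/* Split DISP into the addis immediate (*HA, the "high adjusted" half that the
   @ha operator produces) and the signed 16-bit D-form displacement (*LO).
   The hardware sign-extends the low half, so the high half is rounded up
   whenever bit 15 of DISP is set.  */
static void
split_ha_lo (int32_t disp, int16_t *ha, int16_t *lo)
{
  int64_t d = disp;
  int64_t low;

  /* Largest displacement whose high-adjusted half still fits in 16 bits.  */
  assert (d <= INT64_C (0x7fff7fff));

  low = d & 0xffff;
  if (low >= 0x8000)
    low -= 0x10000;		/* sign-extend, as the D-form field does */

  *lo = (int16_t) low;
  *ha = (int16_t) ((d - low) / 65536);	/* exact: d - low is a multiple of 2^16 */
}

/* Recombine the halves the way the fused pair does: addis adds the
   sign-extended immediate shifted left by 16, then the load or store adds
   the sign-extended 16-bit displacement.  */
static int64_t
combine_ha_lo (int16_t ha, int16_t lo)
{
  return (int64_t) ha * 65536 + lo;
}
```

Because the low half is sign-extended, a displacement such as 0x12348000 needs an addis immediate of 0x1235 rather than 0x1234. Power9 fusion lets the addis target a scratch register (tmp_reg in emit_fusion_p9_load above) instead of the load's destination. That is why the p9 emitters take the extra operand.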
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 230064)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -565,6 +565,8 @@ extern int rs6000_vector_align[];
 #define TARGET_FCFIDUS	TARGET_POPCNTD
 #define TARGET_FCTIDUZ	TARGET_POPCNTD
 #define TARGET_FCTIWUZ	TARGET_POPCNTD
+#define TARGET_CTZ	TARGET_MODULO
+#define TARGET_EXTSWSLI	(TARGET_MODULO && TARGET_POWERPC64)
 
 #define TARGET_XSCVDPSPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_XSCVSPDPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
@@ -701,6 +703,22 @@ extern int rs6000_vector_align[];
 			 && TARGET_DOUBLE_FLOAT \
 			 && (TARGET_PPC_GFXOPT || VECTOR_UNIT_VSX_P (DFmode)))
 
+/* Conditions to allow TOC fusion for loading/storing integers.  */
+#define TARGET_TOC_FUSION_INT	(TARGET_P8_FUSION			\
+				 && TARGET_TOC_FUSION			\
+				 && (TARGET_CMODEL != CMODEL_SMALL)	\
+				 && TARGET_POWERPC64)
+
+/* Conditions to allow TOC fusion for loading/storing floating point.  */
+#define TARGET_TOC_FUSION_FP	(TARGET_P9_FUSION			\
+				 && TARGET_TOC_FUSION			\
+				 && (TARGET_CMODEL != CMODEL_SMALL)	\
+				 && TARGET_POWERPC64			\
+				 && TARGET_HARD_FLOAT			\
+				 && TARGET_FPRS				\
+				 && TARGET_SINGLE_FLOAT			\
+				 && TARGET_DOUBLE_FLOAT)
+
 /* Whether the various reciprocal divide/square root estimate instructions
    exist, and whether we should automatically generate code for the instruction
    by default.  */
@@ -2095,8 +2113,12 @@ do {									     \
 #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
   ((VALUE) = ((MODE) == SImode ? 32 : 64), 1)
 
-/* The CTZ patterns return -1 for input of zero.  */
-#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) ((VALUE) = -1, 1)
+/* The CTZ patterns that are implemented in terms of CLZ return -1 for input of
+   zero.  The hardware instructions added in Power9 return 32 or 64.  */
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE)				\
+  ((!TARGET_CTZ)							\
+   ? ((VALUE) = -1, 1)							\
+   : ((VALUE) = ((MODE) == SImode ? 32 : 64), 1))
 
 /* Specify the machine mode that pointers have.
    After generation of rtl, the compiler makes no further distinction
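As a sanity check on the new CTZ_DEFINED_VALUE_AT_ZERO definition, here is a small C model of the two ctz strategies for SImode. It assumes the existing ctz<mode>2 expansion computes (bitsize - 1) - clz (x & -x). Its operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - 1) suggests this. The model_* functions are hypothetical and are not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Model of cntlzw: number of leading zero bits, 32 when X is zero.  */
static int
model_cntlzw (uint32_t x)
{
  int n = 0;
  for (uint32_t bit = UINT32_C (1) << 31; bit != 0 && (x & bit) == 0; bit >>= 1)
    n++;
  return n;
}

/* CLZ-based ctz: isolate the lowest set bit with x & -x, then return
   31 - clz.  For X == 0 the clz is 32, so the result is -1.  */
static int
model_ctz_via_clz (uint32_t x)
{
  uint32_t lowest = x & (0u - x);

  /* LOWEST is either zero or a single set bit.  */
  assert ((lowest & (lowest - 1)) == 0);
  return 31 - model_cntlzw (lowest);
}

/* Model of the power9 cnttzw instruction: trailing zero bits, 32 for zero.  */
static int
model_cnttzw (uint32_t x)
{
  int n = 0;
  while (n < 32 && (x & (UINT32_C (1) << n)) == 0)
    n++;
  return n;
}
```

The two models agree on every nonzero input. They differ only on zero, giving -1 versus 32. That is exactly the case the macro now describes differently depending on TARGET_CTZ.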
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 230064)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -141,6 +141,8 @@ (define_c_enum "unspec"
    UNSPEC_LSQ
    UNSPEC_FUSION_GPR
    UNSPEC_STACK_CHECK
+   UNSPEC_FUSION_P9
+   UNSPEC_FUSION_ADDIS
   ])
 
 ;;
@@ -327,12 +329,28 @@ (define_mode_iterator EXTSI [(DI "TARGET
 ; QImode or HImode for small atomic ops
 (define_mode_iterator QHI [QI HI])
 
+; QImode, HImode, SImode for fused ops only for GPR loads
+(define_mode_iterator QHSI [QI HI SI])
+
 ; HImode or SImode for sign extended fusion ops
 (define_mode_iterator HSI [HI SI])
 
 ; SImode or DImode, even if DImode doesn't fit in GPRs.
 (define_mode_iterator SDI [SI DI])
 
+; Types that can be fused with an ADDIS instruction to load or store a GPR
+; register that has reg+offset addressing.
+(define_mode_iterator GPR_FUSION [QI
+				  HI
+				  SI
+				  (DI	"TARGET_POWERPC64")
+				  SF
+				  (DF	"TARGET_POWERPC64")])
+
+; Types that can be fused with an ADDIS instruction to load or store an FPR
+; register that has reg+offset addressing.
+(define_mode_iterator FPR_FUSION [DI SF DF])
+
 ; The size of a pointer.  Also, the size of the value that a record-condition
 ; (one with a '.') will compare; and the size used for arithmetic carries.
 (define_mode_iterator P [(SI "TARGET_32BIT") (DI "TARGET_64BIT")])
@@ -2101,12 +2119,25 @@ (define_expand "ctz<mode>2"
 	      (clobber (reg:GPR CA_REGNO))])]
   ""
 {
+  if (TARGET_CTZ)
+    {
+      emit_insn (gen_ctz<mode>2_hw (operands[0], operands[1]));
+      DONE;
+    }
+
   operands[2] = gen_reg_rtx (<MODE>mode);
   operands[3] = gen_reg_rtx (<MODE>mode);
   operands[4] = gen_reg_rtx (<MODE>mode);
   operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - 1);
 })
 
+(define_insn "ctz<mode>2_hw"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+	(ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))]
+  "TARGET_CTZ"
+  "cnttz<wd> %0,%1"
+  [(set_attr "type" "cntlz")])
+
 (define_expand "ffs<mode>2"
   [(set (match_dup 2)
 	(neg:GPR (match_operand:GPR 1 "gpc_reg_operand" "")))
@@ -2885,9 +2916,9 @@ (define_insn_and_split "*div<mode>3_sra_
    (set_attr "cell_micro" "not")])
 
 (define_expand "mod<mode>3"
-  [(use (match_operand:GPR 0 "gpc_reg_operand" ""))
-   (use (match_operand:GPR 1 "gpc_reg_operand" ""))
-   (use (match_operand:GPR 2 "reg_or_cint_operand" ""))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+	(mod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+		 (match_operand:GPR 2 "reg_or_cint_operand")))]
   ""
 {
   int i;
@@ -2897,16 +2928,93 @@ (define_expand "mod<mode>3"
   if (GET_CODE (operands[2]) != CONST_INT
       || INTVAL (operands[2]) <= 0
       || (i = exact_log2 (INTVAL (operands[2]))) < 0)
-    FAIL;
+    {
+      if (!TARGET_MODULO)
+	FAIL;
 
-  temp1 = gen_reg_rtx (<MODE>mode);
-  temp2 = gen_reg_rtx (<MODE>mode);
+      operands[2] = force_reg (<MODE>mode, operands[2]);
+    }
+  else
+    {
+      temp1 = gen_reg_rtx (<MODE>mode);
+      temp2 = gen_reg_rtx (<MODE>mode);
 
-  emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
-  emit_insn (gen_ashl<mode>3 (temp2, temp1, GEN_INT (i)));
-  emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
-  DONE;
+      emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
+      emit_insn (gen_ashl<mode>3 (temp2, temp1, GEN_INT (i)));
+      emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+      DONE;
+    }
 })
+
+;; In order to enable using a peephole2 for combining div/mod to eliminate the
+;; mod, prefer putting the result of mod into a different register.
+(define_insn "*mod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r")
+        (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
+		 (match_operand:GPR 2 "gpc_reg_operand" "r")))]
+  "TARGET_MODULO"
+  "mods<wd> %0,%1,%2"
+  [(set_attr "type" "div")
+   (set_attr "size" "<bits>")])
+
+
+(define_insn "umod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r")
+        (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
+		  (match_operand:GPR 2 "gpc_reg_operand" "r")))]
+  "TARGET_MODULO"
+  "modu<wd> %0,%1,%2"
+  [(set_attr "type" "div")
+   (set_attr "size" "<bits>")])
+
+;; On machines with modulo support, do a combined div/mod the old-fashioned
+;; way, since the multiply/subtract is faster than doing the mod instruction
+;; after a divide.
+
+(define_peephole2
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
+	(div:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
+		 (match_operand:GPR 2 "gpc_reg_operand" "")))
+   (set (match_operand:GPR 3 "gpc_reg_operand" "")
+	(mod:GPR (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_MODULO
+   && ! reg_mentioned_p (operands[0], operands[1])
+   && ! reg_mentioned_p (operands[0], operands[2])
+   && ! reg_mentioned_p (operands[3], operands[1])
+   && ! reg_mentioned_p (operands[3], operands[2])"
+  [(set (match_dup 0)
+	(div:GPR (match_dup 1)
+		 (match_dup 2)))
+   (set (match_dup 3)
+	(mult:GPR (match_dup 0)
+		  (match_dup 2)))
+   (set (match_dup 3)
+	(minus:GPR (match_dup 1)
+		   (match_dup 3)))])
+
+(define_peephole2
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
+	(udiv:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
+		  (match_operand:GPR 2 "gpc_reg_operand" "")))
+   (set (match_operand:GPR 3 "gpc_reg_operand" "")
+	(umod:GPR (match_dup 1)
+		  (match_dup 2)))]
+  "TARGET_MODULO
+   && ! reg_mentioned_p (operands[0], operands[1])
+   && ! reg_mentioned_p (operands[0], operands[2])
+   && ! reg_mentioned_p (operands[3], operands[1])
+   && ! reg_mentioned_p (operands[3], operands[2])"
+  [(set (match_dup 0)
+	(udiv:GPR (match_dup 1)
+		  (match_dup 2)))
+   (set (match_dup 3)
+	(mult:GPR (match_dup 0)
+		  (match_dup 2)))
+   (set (match_dup 3)
+	(minus:GPR (match_dup 1)
+		   (match_dup 3)))])
+
 \f
 ;; Logical instructions
 ;; The logical instructions are mostly combined by using match_operator,
@@ -3843,6 +3951,127 @@ (define_insn_and_split "*ashl<mode>3_dot
    (set_attr "dot" "yes")
    (set_attr "length" "4,8")])
 
+;; Pretend we have a memory form of extswsli until register allocation is done
+;; so that we use LWZ to load the value from memory, instead of LWA.
+(define_insn_and_split "ashdi3_extswsli"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r")
+	(ashift:DI
+	 (sign_extend:DI (match_operand:SI 1 "reg_or_mem_operand" "r,m"))
+	 (match_operand:DI 2 "u6bit_cint_operand" "n,n")))]
+  "TARGET_EXTSWSLI"
+  "@
+   extswsli %0,%1,%2
+   #"
+  "&& reload_completed && MEM_P (operands[1])"
+  [(set (match_dup 3)
+	(match_dup 1))
+   (set (match_dup 0)
+	(ashift:DI (sign_extend:DI (match_dup 3))
+		   (match_dup 2)))]
+{
+  operands[3] = gen_lowpart (SImode, operands[0]);
+}
+  [(set_attr "type" "shift")
+   (set_attr "maybe_var_shift" "no")])
+
+
+(define_insn_and_split "ashdi3_extswsli_dot"
+  [(set (match_operand:CC 3 "cc_reg_operand" "=x,?y,?x,??y")
+	(compare:CC
+	 (ashift:DI
+	  (sign_extend:DI (match_operand:SI 1 "reg_or_mem_operand" "r,r,m,m"))
+	  (match_operand:DI 2 "u6bit_cint_operand" "n,n,n,n"))
+	 (const_int 0)))
+   (clobber (match_scratch:DI 0 "=r,r,r,r"))]
+  "TARGET_EXTSWSLI"
+  "@
+   extswsli. %0,%1,%2
+   #
+   #
+   #"
+  "&& reload_completed
+   && (cc_reg_not_cr0_operand (operands[3], CCmode)
+       || memory_operand (operands[1], SImode))"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx shift = operands[2];
+  rtx cr = operands[3];
+  rtx src2;
+
+  if (!MEM_P (src))
+    src2 = src;
+  else
+    {
+      src2 = gen_lowpart (SImode, dest);
+      emit_move_insn (src2, src);
+    }
+
+  if (REGNO (cr) == CR0_REGNO)
+    {
+      emit_insn (gen_ashdi3_extswsli_dot (dest, src2, shift, cr));
+      DONE;
+    }
+
+  emit_insn (gen_ashdi3_extswsli (dest, src2, shift));
+  emit_insn (gen_rtx_SET (cr, gen_rtx_COMPARE (CCmode, dest, const0_rtx)));
+  DONE;
+}
+  [(set_attr "type" "shift")
+   (set_attr "maybe_var_shift" "no")
+   (set_attr "dot" "yes")
+   (set_attr "length" "4,8,8,12")])
+
+(define_insn_and_split "ashdi3_extswsli_dot2"
+  [(set (match_operand:CC 3 "cc_reg_operand" "=x,?y,?x,??y")
+	(compare:CC
+	 (ashift:DI
+	  (sign_extend:DI (match_operand:SI 1 "reg_or_mem_operand" "r,r,m,m"))
+	  (match_operand:DI 2 "u6bit_cint_operand" "n,n,n,n"))
+	 (const_int 0)))
+   (set (match_operand:DI 0 "gpc_reg_operand" "=r,r,r,r")
+	(ashift:DI (sign_extend:DI (match_dup 1))
+		   (match_dup 2)))]
+  "TARGET_EXTSWSLI"
+  "@
+   extswsli. %0,%1,%2
+   #
+   #
+   #"
+  "&& reload_completed
+   && (cc_reg_not_cr0_operand (operands[3], CCmode)
+       || memory_operand (operands[1], SImode))"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx shift = operands[2];
+  rtx cr = operands[3];
+  rtx src2;
+
+  if (!MEM_P (src))
+    src2 = src;
+  else
+    {
+      src2 = gen_lowpart (SImode, dest);
+      emit_move_insn (src2, src);
+    }
+
+  if (REGNO (cr) == CR0_REGNO)
+    {
+      emit_insn (gen_ashdi3_extswsli_dot2 (dest, src2, shift, cr));
+      DONE;
+    }
+
+  emit_insn (gen_ashdi3_extswsli (dest, src2, shift));
+  emit_insn (gen_rtx_SET (cr, gen_rtx_COMPARE (CCmode, dest, const0_rtx)));
+  DONE;
+}
+  [(set_attr "type" "shift")
+   (set_attr "maybe_var_shift" "no")
+   (set_attr "dot" "yes")
+   (set_attr "length" "4,8,8,12")])
 
 (define_insn "lshr<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
@@ -12381,6 +12610,66 @@ (define_insn "rs6000_mtfsf"
 ;; a GPR.  The addis instruction must be adjacent to the load, and use the same
 ;; register that is being loaded.  The fused ops must be physically adjacent.
 
+;; There are two parts to addis fusion.  The support for fused TOCs occurs
+;; before register allocation, and is meant to reduce the lifetime of the
+;; temporary register that holds the ADDIS result.  On Power8 GPR loads, we try
+;; to use the register that is being loaded.  The peephole2 then gathers any
+;; other fused possibilities that it can find after register allocation.  If
+;; power9 fusion is selected, we also fuse floating point loads/stores.
+
+;; Fused TOC support: Replace simple GPR loads with a fused form.  This is done
+;; before register allocation, so that we can avoid allocating a temporary base
+;; register that won't be used, and so that we prefer loading into base
+;; registers rather than register 0.  If we can't get a fused GPR load, generate
+;; a P9 fusion (addis followed by load) even on power8.
+
+(define_split
+  [(set (match_operand:INT1 0 "toc_fusion_or_p9_reg_operand" "")
+	(match_operand:INT1 1 "toc_fusion_mem_raw" ""))]
+  "TARGET_TOC_FUSION_INT && can_create_pseudo_p ()"
+  [(parallel [(set (match_dup 0) (match_dup 2))
+	      (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
+	      (use (match_dup 3))
+	      (clobber (scratch:DI))])]
+{
+  operands[2] = fusion_wrap_memory_address (operands[1]);
+  operands[3] = gen_rtx_REG (Pmode, TOC_REGISTER);
+})
+
+(define_insn "*toc_fusionload_<mode>"
+  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
+	(match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
+   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
+   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
+   (clobber (match_scratch:DI 3 "=X,&b"))]
+  "TARGET_TOC_FUSION_INT"
+{
+  if (base_reg_operand (operands[0], <MODE>mode))
+    return emit_fusion_gpr_load (operands[0], operands[1]);
+
+  return emit_fusion_p9_load (operands[0], operands[1], operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+(define_insn "*toc_fusionload_di"
+  [(set (match_operand:DI 0 "int_reg_operand" "=&b,??r,?d")
+	(match_operand:DI 1 "toc_fusion_mem_wrapped" "wG,wG,wG"))
+   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
+   (use (match_operand:DI 2 "base_reg_operand" "r,r,r"))
+   (clobber (match_scratch:DI 3 "=X,&b,&b"))]
+  "TARGET_TOC_FUSION_INT && TARGET_POWERPC64
+   && (MEM_P (operands[1]) || int_reg_operand (operands[0], DImode))"
+{
+  if (base_reg_operand (operands[0], DImode))
+    return emit_fusion_gpr_load (operands[0], operands[1]);
+
+  return emit_fusion_p9_load (operands[0], operands[1], operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+\f
 ;; Find cases where the addis that feeds into a load instruction is either used
 ;; once or is the same as the target register, and replace it with the fusion
 ;; insn
@@ -12404,7 +12693,7 @@ (define_peephole2
 
 (define_insn "fusion_gpr_load_<mode>"
   [(set (match_operand:INT1 0 "base_reg_operand" "=&b")
-	(unspec:INT1 [(match_operand:INT1 1 "fusion_gpr_mem_combo" "")]
+	(unspec:INT1 [(match_operand:INT1 1 "fusion_addis_mem_combo_load" "")]
 		     UNSPEC_FUSION_GPR))]
   "TARGET_P8_FUSION"
 {
@@ -12414,6 +12703,133 @@ (define_insn "fusion_gpr_load_<mode>"
    (set_attr "length" "8")])
 
 \f
+;; ISA 3.0 (power9) fusion support
+;; Merge addis with floating load/store to FPRs (or GPRs).
+(define_peephole2
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:SFDF 2 "toc_fusion_or_p9_reg_operand" "")
+	(match_operand:SFDF 3 "fusion_offsettable_mem_operand" ""))]
+  "TARGET_P9_FUSION && peep2_reg_dead_p (2, operands[0])
+   && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])"
+  [(const_int 0)]
+{
+  expand_fusion_p9_load (operands);
+  DONE;
+})
+
+(define_peephole2
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:SFDF 2 "offsettable_mem_operand" "")
+	(match_operand:SFDF 3 "toc_fusion_or_p9_reg_operand" ""))]
+  "TARGET_P9_FUSION && peep2_reg_dead_p (2, operands[0])
+   && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])"
+  [(const_int 0)]
+{
+  expand_fusion_p9_store (operands);
+  DONE;
+})
+
+(define_peephole2
+  [(set (match_operand:SDI 0 "int_reg_operand" "")
+	(match_operand:SDI 1 "upper16_cint_operand" ""))
+   (set (match_dup 0)
+	(ior:SDI (match_dup 0)
+		 (match_operand:SDI 2 "u_short_cint_operand" "")))]
+  "TARGET_P9_FUSION"
+  [(set (match_dup 0)
+	(unspec:SDI [(match_dup 1)
+		     (match_dup 2)] UNSPEC_FUSION_P9))])
+
+(define_peephole2
+  [(set (match_operand:SDI 0 "int_reg_operand" "")
+	(match_operand:SDI 1 "upper16_cint_operand" ""))
+   (set (match_operand:SDI 2 "int_reg_operand" "")
+	(ior:SDI (match_dup 0)
+		 (match_operand:SDI 3 "u_short_cint_operand" "")))]
+  "TARGET_P9_FUSION
+   && !rtx_equal_p (operands[0], operands[2])
+   && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 2)
+	(unspec:SDI [(match_dup 1)
+		     (match_dup 3)] UNSPEC_FUSION_P9))])
+
+;; Fusion insns, created by the define_peephole2 above (and eventually by
+;; reload).  Because we want to eventually have secondary_reload generate
+;; these, they have to have a single alternative that gives the register
+;; classes.  This means we need to have separate gpr/fpr/altivec versions.
+(define_insn "fusion_gpr_<P:mode>_<GPR_FUSION:mode>_load"
+  [(set (match_operand:GPR_FUSION 0 "int_reg_operand" "=r")
+	(unspec:GPR_FUSION
+	 [(match_operand:GPR_FUSION 1 "fusion_addis_mem_combo_load" "wF")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=&b"))]
+  "TARGET_P9_FUSION"
+{
+  /* This insn is a secondary reload insn, which cannot have alternatives.
+     If we are not loading up register 0, use the power8 fusion instead.  */
+  if (base_reg_operand (operands[0], <GPR_FUSION:MODE>mode))
+    return emit_fusion_gpr_load (operands[0], operands[1]);
+
+  return emit_fusion_p9_load (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+(define_insn "fusion_gpr_<P:mode>_<GPR_FUSION:mode>_store"
+  [(set (match_operand:GPR_FUSION 0 "fusion_addis_mem_combo_store" "=wF")
+	(unspec:GPR_FUSION
+	 [(match_operand:GPR_FUSION 1 "int_reg_operand" "r")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=&b"))]
+  "TARGET_P9_FUSION"
+{
+  return emit_fusion_p9_store (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "store")
+   (set_attr "length" "8")])
+
+(define_insn "fusion_fpr_<P:mode>_<FPR_FUSION:mode>_load"
+  [(set (match_operand:FPR_FUSION 0 "fpr_reg_operand" "=d")
+	(unspec:FPR_FUSION
+	 [(match_operand:FPR_FUSION 1 "fusion_addis_mem_combo_load" "wF")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=b"))]
+  "TARGET_P9_FUSION"
+{
+  return emit_fusion_p9_load (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "fpload")
+   (set_attr "length" "8")])
+
+(define_insn "fusion_fpr_<P:mode>_<FPR_FUSION:mode>_store"
+  [(set (match_operand:FPR_FUSION 0 "fusion_addis_mem_combo_store" "=wF")
+	(unspec:FPR_FUSION
+	 [(match_operand:FPR_FUSION 1 "fpr_reg_operand" "d")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=b"))]
+  "TARGET_P9_FUSION"
+{
+  return emit_fusion_p9_store (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "fpstore")
+   (set_attr "length" "8")])
+
+(define_insn "*fusion_p9_<mode>_constant"
+  [(set (match_operand:SDI 0 "int_reg_operand" "=r")
+	(unspec:SDI [(match_operand:SDI 1 "upper16_cint_operand" "L")
+		     (match_operand:SDI 2 "u_short_cint_operand" "K")]
+		    UNSPEC_FUSION_P9))]
+  "TARGET_P9_FUSION"
+{
+  emit_fusion_addis (operands[0], operands[1], "constant", "<MODE>");
+  return "ori %0,%0,%2";
+}
+  [(set_attr "type" "two")
+   (set_attr "length" "8")])
+
+\f
 ;; Miscellaneous ISA 2.06 (power7) instructions
 (define_insn "addg6s"
   [(set (match_operand:SI 0 "register_operand" "=r")
@@ -12580,6 +12996,7 @@ (define_insn "pack<mode>"
   "xxpermdi %x0,%x1,%x2,0"
   [(set_attr "type" "vecperm")])
 
+
 \f
 
 (include "sync.md")
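The new modulo patterns, peephole2s and extswsli patterns can be checked against a plain C model of the arithmetic they rely on. This sketch covers three things:

- the div+mod rewrite, where the remainder is recomputed as a - q*b
- the shift-based sequence that mod<mode>3 keeps for positive power-of-two constant divisors
- the extswsli semantics assumed by ashdi3_extswsli: sign-extend the low word, then shift left by 0..63

All function names are hypothetical. This is an illustration, not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Signed div+mod as the peephole2 rewrites it: keep the quotient, then
   recover the remainder with one multiply and one subtract.  C's / and %
   truncate toward zero, like the signed divide and modsw instructions.  */
static void
divmod_s32 (int32_t a, int32_t b, int32_t *q, int32_t *r)
{
  assert (b != 0 && !(a == INT32_MIN && b == -1));
  *q = a / b;
  *r = a - *q * b;		/* |q * b| <= |a|, so no overflow */
}

/* Unsigned variant: both the quotient and the product are unsigned.  */
static void
divmod_u32 (uint32_t a, uint32_t b, uint32_t *q, uint32_t *r)
{
  assert (b != 0);
  *q = a / b;
  *r = a - *q * b;
}

/* mod<mode>3 for a divisor of 1 << I: temp1 = a / (1 << i),
   temp2 = temp1 << i, result = a - temp2.  The shift is written as a
   multiply because left-shifting a negative value is undefined in C.  */
static int32_t
mod_pow2_s32 (int32_t a, int i)
{
  assert (i >= 0 && i < 31);
  int32_t temp1 = a / (INT32_C (1) << i);
  int32_t temp2 = temp1 * (INT32_C (1) << i);
  return a - temp2;
}

/* extswsli: sign-extend the low 32 bits of RS to 64 bits, then shift left
   by SH.  The shift is done on the unsigned representation to avoid
   undefined behavior for negative values.  */
static int64_t
model_extswsli (uint64_t rs, unsigned sh)
{
  assert (sh < 64);
  int64_t ext = (int32_t) (uint32_t) rs;
  return (int64_t) ((uint64_t) ext << sh);
}
```

The unsigned case has to use an unsigned divide for the quotient as well. A signed divide treats 0xfffffff0 as -16, which yields a different quotient and therefore a wrong remainder.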
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	(revision 230064)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -1616,7 +1616,9 @@ proc check_p8vector_hw_available { } {
     return [check_cached_effective_target p8vector_hw_available {
 	# Some simulators are known to not support VSX/power8 instructions.
 	# For now, disable on Darwin
-	if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || [istarget *-*-darwin*]} {
+	if { [istarget powerpc-*-eabi]
+	     || [istarget powerpc*-*-eabispe]
+	     || [istarget *-*-darwin*]} {
 	    expr 0
 	} else {
 	    set options "-mpower8-vector"
@@ -1635,6 +1637,112 @@ proc check_p8vector_hw_available { } {
     }]
 }
 
+# Return 1 if the target supports executing power9 vector instructions, 0
+# otherwise.  Cache the result.
+
+proc check_p9vector_hw_available { } {
+    return [check_cached_effective_target p9vector_hw_available {
+	# Some simulators are known to not support VSX/power8/power9
+	# instructions.  For now, disable on Darwin.
+	if { [istarget powerpc-*-eabi]
+	     || [istarget powerpc*-*-eabispe]
+	     || [istarget *-*-darwin*]} {
+	    expr 0
+	} else {
+	    set options "-mpower9-vector"
+	    check_runtime_nocache p9vector_hw_available {
+		int main()
+		{
+		    long e = -1;
+		    vector double v = (vector double) { 0.0, 0.0 };
+		    asm ("xsxexpdp %0,%1" : "+r" (e) : "wa" (v));
+		    return e;
+		}
+	    } $options
+	}
+    }]
+}
+
+# Return 1 if the target supports executing power9 modulo instructions, 0
+# otherwise.  Cache the result.
+
+proc check_p9modulo_hw_available { } {
+    return [check_cached_effective_target p9modulo_hw_available {
+	# Some simulators are known to not support VSX/power8/power9
+	# instructions.  For now, disable on Darwin.
+	if { [istarget powerpc-*-eabi]
+	     || [istarget powerpc*-*-eabispe]
+	     || [istarget *-*-darwin*]} {
+	    expr 0
+	} else {
+	    set options "-mmodulo"
+	    check_runtime_nocache p9modulo_hw_available {
+		int main()
+		{
+		    int i = 5, j = 3, r = -1;
+		    asm ("modsw %0,%1,%2" : "+r" (r) : "r" (i), "r" (j));
+		    return (r != 2);
+		}
+	    } $options
+	}
+    }]
+}
+
+# Return 1 if the target supports executing __float128 on PowerPC via software
+# emulation, 0 otherwise.  Cache the result.
+
+proc check_ppc_float128_sw_available { } {
+    return [check_cached_effective_target ppc_float128_sw_available {
+	# Some simulators are known to not support VSX/power8/power9
+	# instructions.  For now, disable on Darwin.
+	if { [istarget powerpc-*-eabi]
+	     || [istarget powerpc*-*-eabispe]
+	     || [istarget *-*-darwin*]} {
+	    expr 0
+	} else {
+	    set options "-mfloat128 -mvsx"
+	    check_runtime_nocache ppc_float128_sw_available {
+		volatile __float128 x = 1.0q;
+		volatile __float128 y = 2.0q;
+		int main()
+		{
+		    __float128 z = x + y;
+		    return (z != 3.0q);
+		}
+	    } $options
+	}
+    }]
+}
+
+# Return 1 if the target supports executing __float128 on PowerPC via power9
+# hardware instructions, 0 otherwise.  Cache the result.
+
+proc check_ppc_float128_hw_available { } {
+    return [check_cached_effective_target ppc_float128_hw_available {
+	# Some simulators are known to not support VSX/power8/power9
+	# instructions.	For now, disable on Darwin.
+	if { [istarget powerpc-*-eabi]
+	     || [istarget powerpc*-*-eabispe]
+	     || [istarget *-*-darwin*]} {
+	    expr 0
+	} else {
+	    set options "-mfloat128-hardware"
+	    check_runtime_nocache ppc_float128_hw_available {
+		volatile __float128 x = 1.0q;
+		volatile __float128 y = 2.0q;
+		int main()
+		{
+		    __float128 z = x + y;
+		    __float128 w = -1.0q;
+
+		    __asm__ ("xsaddqp %0,%1,%2" : "+v" (w) : "v" (x), "v" (y));
+		    return ((z != 3.0q) || (z != w));
+		}
+	    } $options
+	}
+    }]
+}
+
 # Return 1 if the target supports executing VSX instructions, 0
 # otherwise.  Cache the result.
 
@@ -1642,7 +1750,9 @@ proc check_vsx_hw_available { } {
     return [check_cached_effective_target vsx_hw_available {
 	# Some simulators are known to not support VSX instructions.
 	# For now, disable on Darwin
-	if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || [istarget *-*-darwin*]} {
+	if { [istarget powerpc-*-eabi]
+	     || [istarget powerpc*-*-eabispe]
+	     || [istarget *-*-darwin*]} {
 	    expr 0
 	} else {
 	    set options "-mvsx"
@@ -3358,6 +3468,108 @@ proc check_effective_target_powerpc_p8ve
     }
 }
 
+# Return 1 if this is a PowerPC target supporting -mpower9-vector
+
+proc check_effective_target_powerpc_p9vector_ok { } {
+    if { ([istarget powerpc*-*-*]
+         && ![istarget powerpc-*-linux*paired*])
+	 || [istarget rs6000-*-*] } {
+	# AltiVec is not supported on AIX before 5.3.
+	if { [istarget powerpc*-*-aix4*]
+	     || [istarget powerpc*-*-aix5.1*] 
+	     || [istarget powerpc*-*-aix5.2*] } {
+	    return 0
+	}
+	return [check_no_compiler_messages powerpc_p9vector_ok object {
+	    int main (void) {
+		long e = -1;
+		vector double v = (vector double) { 0.0, 0.0 };
+		asm ("xsxexpdp %0,%1" : "+r" (e) : "wa" (v));
+		return e;
+	    }
+	} "-mpower9-vector"]
+    } else {
+	return 0
+    }
+}
+
+# Return 1 if this is a PowerPC target supporting -mmodulo
+
+proc check_effective_target_powerpc_p9modulo_ok { } {
+    if { ([istarget powerpc*-*-*]
+         && ![istarget powerpc-*-linux*paired*])
+	 || [istarget rs6000-*-*] } {
+	# AltiVec is not supported on AIX before 5.3.
+	if { [istarget powerpc*-*-aix4*]
+	     || [istarget powerpc*-*-aix5.1*] 
+	     || [istarget powerpc*-*-aix5.2*] } {
+	    return 0
+	}
+	return [check_no_compiler_messages powerpc_p9modulo_ok object {
+	    int main (void) {
+		int i = 5, j = 3, r = -1;
+		asm ("modsw %0,%1,%2" : "+r" (r) : "r" (i), "r" (j));
+		return (r == 2);
+	    }
+	} "-mmodulo"]
+    } else {
+	return 0
+    }
+}
+
+# Return 1 if this is a PowerPC target supporting -mfloat128 via either
+# software emulation on power7/power8 systems or hardware support on power9.
+
+proc check_effective_target_powerpc_float128_sw_ok { } {
+    if { ([istarget powerpc*-*-*]
+         && ![istarget powerpc-*-linux*paired*])
+	 || [istarget rs6000-*-*] } {
+	# AltiVec is not supported on AIX before 5.3.
+	if { [istarget powerpc*-*-aix4*]
+	     || [istarget powerpc*-*-aix5.1*] 
+	     || [istarget powerpc*-*-aix5.2*] } {
+	    return 0
+	}
+	return [check_no_compiler_messages powerpc_float128_sw_ok object {
+		volatile __float128 x = 1.0q;
+		volatile __float128 y = 2.0q;
+		int main() {
+		    __float128 z = x + y;
+		    return (z == 3.0q);
+		}
+	    } "-mfloat128 -mvsx"]
+    } else {
+	return 0
+    }
+}
+
+# Return 1 if this is a PowerPC target supporting -mfloat128 via hardware
+# support on power9.
+
+proc check_effective_target_powerpc_float128_hw_ok { } {
+    if { ([istarget powerpc*-*-*]
+         && ![istarget powerpc-*-linux*paired*])
+	 || [istarget rs6000-*-*] } {
+	# AltiVec is not supported on AIX before 5.3.
+	if { [istarget powerpc*-*-aix4*]
+	     || [istarget powerpc*-*-aix5.1*] 
+	     || [istarget powerpc*-*-aix5.2*] } {
+	    return 0
+	}
+	return [check_no_compiler_messages powerpc_float128_hw_ok object {
+		volatile __float128 x = 1.0q;
+		volatile __float128 y = 2.0q;
+		int main() {
+		    __float128 z;
+		    __asm__ ("xsaddqp %0,%1,%2" : "=v" (z) : "v" (x), "v" (y));
+		    return (z == 3.0q);
+		}
+	} "-mfloat128-hardware"]
+    } else {
+	return 0
+    }
+}
+
 # Return 1 if this is a PowerPC target supporting -mvsx
 
 proc check_effective_target_powerpc_vsx_ok { } {
@@ -5459,6 +5671,10 @@ proc is-effective-target { arg } {
 	  "vmx_hw"         { set selected [check_vmx_hw_available] }
 	  "vsx_hw"         { set selected [check_vsx_hw_available] }
 	  "p8vector_hw"    { set selected [check_p8vector_hw_available] }
+	  "p9vector_hw"    { set selected [check_p9vector_hw_available] }
+	  "p9modulo_hw"    { set selected [check_p9modulo_hw_available] }
+	  "ppc_float128_sw" { set selected [check_ppc_float128_sw_available] }
+	  "ppc_float128_hw" { set selected [check_ppc_float128_hw_available] }
 	  "ppc_recip_hw"   { set selected [check_ppc_recip_hw_available] }
 	  "dfp_hw"         { set selected [check_dfp_hw_available] }
 	  "htm_hw"         { set selected [check_htm_hw_available] }
@@ -5483,6 +5699,10 @@ proc is-effective-target-keyword { arg }
 	  "vmx_hw"         { return 1 }
 	  "vsx_hw"         { return 1 }
 	  "p8vector_hw"    { return 1 }
+	  "p9vector_hw"    { return 1 }
+	  "p9modulo_hw"    { return 1 }
+	  "ppc_float128_sw" { return 1 }
+	  "ppc_float128_hw" { return 1 }
 	  "ppc_recip_hw"   { return 1 }
 	  "dfp_hw"         { return 1 }
 	  "htm_hw"         { return 1 }
@@ -6186,7 +6406,9 @@ proc check_vect_support_and_set_flags { 
         }
 
         lappend DEFAULT_VECTCFLAGS "-maltivec"
-        if [check_p8vector_hw_available] {
+        if [check_p9vector_hw_available] {
+            lappend DEFAULT_VECTCFLAGS "-mpower9-vector"
+        } elseif [check_p8vector_hw_available] {
             lappend DEFAULT_VECTCFLAGS "-mpower8-vector"
         } elseif [check_vsx_hw_available] {
             lappend DEFAULT_VECTCFLAGS "-mvsx" "-mno-allow-movmisalign"
Index: gcc/testsuite/gcc.target/powerpc/extswsli-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/extswsli-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/extswsli-1.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+static int mem;
+int *ptr = &mem;
+
+long
+add (long *p, int reg)
+{
+  __asm__ (" #foo %0" : "+r" (reg));
+  return p[reg] + p[mem];
+}
+
+/* { dg-final { scan-assembler-times "extswsli " 2 } } */
+/* { dg-final { scan-assembler-times "lwz "      1 } } */
+/* { dg-final { scan-assembler-not   "lwa "        } } */
+/* { dg-final { scan-assembler-not   "sldi "       } } */
+/* { dg-final { scan-assembler-not   "extsw "      } } */
Index: gcc/testsuite/gcc.target/powerpc/extswsli-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/extswsli-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/extswsli-2.c	(revision 0)
@@ -0,0 +1,37 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+long
+func1 (int reg, int *is_zero)
+{
+  long value;
+
+  __asm__ (" #foo %0" : "+r" (reg));
+  value = ((long)reg) << 4;
+
+  if (!value)
+    *is_zero = 1;
+
+  return value;
+}
+
+long
+func2 (int *ptr, int *is_zero)
+{
+  int reg = *ptr;
+  long value = ((long)reg) << 4;
+
+  if (!value)
+    *is_zero = 1;
+
+  return value;
+}
+
+/* { dg-final { scan-assembler     "extswsli\\. " } } */
+/* { dg-final { scan-assembler     "lwz "         } } */
+/* { dg-final { scan-assembler-not "lwa "         } } */
+/* { dg-final { scan-assembler-not "sldi "        } } */
+/* { dg-final { scan-assembler-not "sldi\\. "     } } */
+/* { dg-final { scan-assembler-not "extsw "       } } */
Index: gcc/testsuite/gcc.target/powerpc/extswsli-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/extswsli-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/extswsli-3.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+long
+do_ext_add (int *p, long a, long b)
+{
+  long l = *p;
+  long l2 = l << 4;
+  return l2 + ((l2 == 0) ? a : b);
+}
+
+long
+do_ext (int *p, long a, long b)
+{
+  long l = *p;
+  long l2 = l << 4;
+  return ((l2 == 0) ? a : b);
+}
+
+/* { dg-final { scan-assembler "extswsli\\. "} } */
Index: gcc/testsuite/gcc.target/powerpc/ctz-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ctz-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ctz-1.c	(revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+int i_trailing_zero (int a) { return __builtin_ctz (a); }
+int l_trailing_zero (long a) { return __builtin_ctzl (a); }
+int ll_trailing_zero (long long a) { return __builtin_ctzll (a); }
+
+/* { dg-final { scan-assembler     "cnttzw " } } */
+/* { dg-final { scan-assembler     "cnttzd " } } */
+/* { dg-final { scan-assembler-not "cntlzw " } } */
+/* { dg-final { scan-assembler-not "cntlzd " } } */
Index: gcc/testsuite/gcc.target/powerpc/ctz-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ctz-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ctz-2.c	(revision 0)
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { powerpc*-*-* && ilp32 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+int i_trailing_zero (int a) { return __builtin_ctz (a); }
+
+/* { dg-final { scan-assembler     "cnttzw " } } */
+/* { dg-final { scan-assembler-not "cntlzw " } } */
Index: gcc/testsuite/gcc.target/powerpc/float128-mix.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/float128-mix.c	(revision 230064)
+++ gcc/testsuite/gcc.target/powerpc/float128-mix.c	(working copy)
@@ -1,6 +1,5 @@
 /* { dg-do compile { target { powerpc*-*-linux* } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target powerpc_float128_sw_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
 /* { dg-options "-O2 -mcpu=power7 -mfloat128" } */
 
Index: gcc/testsuite/gcc.target/powerpc/mod-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/mod-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/mod-1.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+int ismod (int a, int b) { return a%b; }
+long lsmod (long a, long b) { return a%b; }
+unsigned int iumod (unsigned int a, unsigned int b) { return a%b; }
+unsigned long lumod (unsigned long a, unsigned long b) { return a%b; }
+
+/* { dg-final { scan-assembler-times "modsw " 1 } } */
+/* { dg-final { scan-assembler-times "modsd " 1 } } */
+/* { dg-final { scan-assembler-times "moduw " 1 } } */
+/* { dg-final { scan-assembler-times "modud " 1 } } */
+/* { dg-final { scan-assembler-not   "mullw "   } } */
+/* { dg-final { scan-assembler-not   "mulld "   } } */
+/* { dg-final { scan-assembler-not   "divw "    } } */
+/* { dg-final { scan-assembler-not   "divd "    } } */
+/* { dg-final { scan-assembler-not   "divwu "   } } */
+/* { dg-final { scan-assembler-not   "divdu "   } } */
Index: gcc/testsuite/gcc.target/powerpc/mod-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/mod-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/mod-2.c	(revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* && ilp32 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+int ismod (int a, int b) { return a%b; }
+unsigned int iumod (unsigned int a, unsigned int b) { return a%b; }
+
+/* { dg-final { scan-assembler-times "modsw " 1 } } */
+/* { dg-final { scan-assembler-times "moduw " 1 } } */
+/* { dg-final { scan-assembler-not   "mullw "   } } */
+/* { dg-final { scan-assembler-not   "divw "    } } */
+/* { dg-final { scan-assembler-not   "divwu "   } } */
Index: gcc/testsuite/gcc.target/powerpc/fusion2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fusion2.c	(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
+
+vector double fusion_vector (vector double *p) { return p[2]; }
+
+/* { dg-final { scan-assembler-times "vector load fusion" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/fusion3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fusion3.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -mtune=power9 -O3" } */
+
+#define LARGE 0x12345
+
+int fusion_float_read (float *p){ return p[LARGE]; }
+int fusion_double_read (double *p){ return p[LARGE]; }
+
+void fusion_float_write (float *p, float f){ p[LARGE] = f; }
+void fusion_double_write (double *p, double d){ p[LARGE] = d; }
+
+/* { dg-final { scan-assembler "load fusion, type SF"  } } */
+/* { dg-final { scan-assembler "load fusion, type DF"  } } */
+/* { dg-final { scan-assembler "store fusion, type SF" } } */
+/* { dg-final { scan-assembler "store fusion, type DF" } } */
Index: gcc/testsuite/gcc.target/powerpc/fusion.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion.c	(revision 230064)
+++ gcc/testsuite/gcc.target/powerpc/fusion.c	(working copy)
@@ -1,6 +1,5 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
-/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
 /* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
@@ -14,10 +13,7 @@ int fusion_short (short *p){ return p[LA
 int fusion_int (int *p){ return p[LARGE]; }
 unsigned fusion_uns (unsigned *p){ return p[LARGE]; }
 
-vector double fusion_vector (vector double *p) { return p[2]; }
-
 /* { dg-final { scan-assembler-times "gpr load fusion"    6 } } */
-/* { dg-final { scan-assembler-times "vector load fusion" 1 } } */
 /* { dg-final { scan-assembler-times "lbz"                2 } } */
 /* { dg-final { scan-assembler-times "extsb"              1 } } */
 /* { dg-final { scan-assembler-times "lhz"                2 } } */
Index: gcc/testsuite/gcc.target/powerpc/float128-call.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/float128-call.c	(revision 230064)
+++ gcc/testsuite/gcc.target/powerpc/float128-call.c	(working copy)
@@ -1,6 +1,5 @@
 /* { dg-do compile { target { powerpc*-*-linux* } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target powerpc_float128_sw_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
 /* { dg-options "-O2 -mcpu=power7 -mfloat128 -mno-regnames" } */
 

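For reference, the *_hw_available probes above go through check_runtime, which (as I understand DejaGnu) reports a feature as available only when the probe program exits with status 0. A probe's main should therefore return 0 when the instruction produced the expected result. Below is a host-portable sketch of the modulo probe's logic, with the modsw inline asm replaced by C's % operator so it runs anywhere; the helper name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper mirroring the body of the p9modulo probe.
   Returns the value main() would use as its exit status:
   0 = "modulo gave the expected remainder" (probe passes),
   nonzero = failure.  */
static int
p9modulo_probe_status (int i, int j)
{
  int r = -1;

  /* C's % with a zero divisor is undefined behaviour.  */
  assert (j != 0);

  r = i % j;            /* Stands in for: modsw r,i,j  */
  return (r != 2);      /* 0 on success, 1 on failure.  */
}
```

With i = 5 and j = 3 the remainder is 2, so the sketch returns 0 and the probe would count as passing.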
* Re: [PATCH], Add power9 support to GCC, patches #2-5 committed
  2015-11-10  0:17   ` [PATCH], Add power9 support to GCC, patches #2-5 committed Michael Meissner
@ 2015-11-10  0:20     ` Michael Meissner
  0 siblings, 0 replies; 47+ messages in thread
From: Michael Meissner @ 2015-11-10  0:20 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

Actually, it looks like I changed advanced fusion -> power9 fusion.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

* Re: [PATCH], Add power9 support to GCC, patch #6 (IEEE 128-bit hardware support)
  2015-11-09  0:45 ` [PATCH], Add power9 support to GCC, patch #6 (IEEE 128-bit hardware support) Michael Meissner
  2015-11-09 19:29   ` Segher Boessenkool
@ 2015-11-10  0:41   ` Joseph Myers
  2015-11-10 18:41     ` Michael Meissner
  2015-11-12 20:47   ` David Edelsohn
  2 siblings, 1 reply; 47+ messages in thread
From: Joseph Myers @ 2015-11-10  0:41 UTC (permalink / raw)
  To: Michael Meissner; +Cc: gcc-patches, dje.gcc

I don't see any conversions between KFmode and TImode (in either 
direction, signed or unsigned) here - I suppose there are no instructions 
for that?

If so, I would guess (without having tested it) that it is more efficient 
to use the libgcc2 implementations of those functions (whether copied, or 
with some logic to build selected libgcc2.c functions for KFmode), which 
implement them using a few hardware operations on DImode [note that where 
libgcc2.c has e.g. __floatditf, that gets mapped to __floattitf for 64-bit 
systems], than to use the soft-fp implementations doing everything with 
integer arithmetic.  (There are IEEE exceptions issues with the libgcc2.c 
conversions from double-word integers to floating-point - see bug 59412 - 
but since that's a preexisting issue for all architectures using this 
code, it's clearly not your problem to fix.)
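A width-reduced sketch of the libgcc2-style technique described above, building a double-word integer conversion out of two single-word conversions. This is not libgcc2's actual code: it converts a 64-bit unsigned integer to double using two 32-bit halves, assuming IEEE double with a 53-bit significand. Each half converts exactly and scaling the high half by 2^32 is exact, so only the final addition rounds. The same argument carries over to unsigned TImode to KFmode, since 64-bit halves fit in the 113-bit significand of IEEE 128-bit floating point.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 64-bit -> double via two exact
   32-bit conversions and one rounding addition.  */
static double
u64_to_double_via_halves (uint64_t x)
{
  uint32_t hi = (uint32_t) (x >> 32);
  uint32_t lo = (uint32_t) x;
  double dhi = (double) hi * 4294967296.0;  /* hi * 2^32, exact.  */
  double dlo = (double) lo;                 /* exact.  */

  return dhi + dlo;                         /* the only rounding step.  */
}
```

Because the exact sum equals x and it is rounded once, the result matches a direct (double) x conversion under the same rounding mode.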

Ideally, I'd think that for optimal efficiency if objects built for power8 
are linked with libgcc built for power9, or if an executable using shared 
libgcc that was built for power8 gets run with shared libgcc for power9, 
you'd want power9 libgcc to contain t-hardfp versions of all the functions 
that can be expanded inline for power9, and libgcc2 versions of those 
(such as TImode comparisons) that aren't expanded inline, but not to 
contain soft-fp versions of any of those KFmode functions.  Cf. how 
config.host ensures various 32-bit powerpc variants use the right mixture 
of hardfp and soft-fp functions.  It's a bit fiddly to make sure you get 
the preferred implementation of every function and that the ABI doesn't 
change depending on the configured processor, but not that hard.

Since none of the libgcc pieces for KFmode support are yet in, and the 
proposed changes are optimizations rather than a matter of correctness, 
none of the above should directly affect this patch in any way - it simply 
indicates desirable followup once both the libgcc soft-fp KFmode support, 
and this patch, are in.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH], Add power9 support to GCC, patch #8 (add integer multiply/add)
  2015-11-09  0:33 ` [PATCH], Add power9 support to GCC, patch #1 (revised) Michael Meissner
  2015-11-09 16:12   ` David Edelsohn
@ 2015-11-10 18:39   ` Michael Meissner
  2015-11-12 20:39     ` David Edelsohn
  1 sibling, 1 reply; 47+ messages in thread
From: Michael Meissner @ 2015-11-10 18:39 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 883 bytes --]

This patch adds support for the MADDLD instruction, which is a fused
multiply/add instruction for integers.  At this time, it is used for 64-bit
multiplies only.  Eventually, we will restructure 128-bit multiply so that we
can use the 64x64 + 64 high-bit variants.
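A C sketch of what these instructions compute, based on my reading of the ISA 3.0 description. The *_ref helper names are hypothetical (they are not GCC builtins), and the high-half variant uses the GCC/Clang unsigned __int128 extension.

```c
#include <stdint.h>

/* maddld: low 64 bits of a * b + c.  Only the low half is kept, so the
   result bits are the same for signed and unsigned operands, which is
   why both s_madd and u_madd in maddld.c can use one instruction.  */
static uint64_t
maddld_ref (uint64_t a, uint64_t b, uint64_t c)
{
  return a * b + c;             /* unsigned arithmetic wraps mod 2^64.  */
}

/* maddhdu (one of the "high" variants): high 64 bits of the full
   128-bit unsigned a * b + c.  The sum cannot overflow 128 bits:
   (2^64-1)^2 + (2^64-1) = 2^128 - 2^64.  */
static uint64_t
maddhdu_ref (uint64_t a, uint64_t b, uint64_t c)
{
  unsigned __int128 sum = (unsigned __int128) a * b + c;
  return (uint64_t) (sum >> 64);
}
```

For example, maddld_ref (3, 4, 5) is 17, and with all three operands equal to UINT64_MAX the low half is 0 while the high half is UINT64_MAX.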

I have bootstrapped a compiler with this change, and there were no
regressions.  Is it ok to apply to the trunk?

[gcc]
2015-11-10  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.h (TARGET_MADDLD): Add support for the ISA
	3.0 integer multiply-add instruction.
	* config/rs6000/rs6000.md (<u>mul<mode><dmode>3): Likewise.

[gcc/testsuite]
2015-11-10  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/maddld.c: New test.


[-- Attachment #2: gcc-power9.official-08b --]
[-- Type: text/plain, Size: 2112 bytes --]

Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 230078)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -571,6 +571,7 @@ extern int rs6000_vector_align[];
 #define TARGET_FCTIWUZ	TARGET_POPCNTD
 #define TARGET_CTZ	TARGET_MODULO
 #define TARGET_EXTSWSLI	(TARGET_MODULO && TARGET_POWERPC64)
+#define TARGET_MADDLD	(TARGET_MODULO && TARGET_POWERPC64)
 
 #define TARGET_XSCVDPSPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_XSCVSPDPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 230078)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -2837,6 +2837,14 @@ (define_expand "<u>mul<mode><dmode>3"
   DONE;
 })
 
+(define_insn "*maddld4"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+	(plus:DI (mult:DI (match_operand:DI 1 "gpc_reg_operand" "r")
+			  (match_operand:DI 2 "gpc_reg_operand" "r"))
+		 (match_operand:DI 3 "gpc_reg_operand" "r")))]
+  "TARGET_MADDLD"
+  "maddld %0,%1,%2,%3"
+  [(set_attr "type" "mul")])
 
 (define_insn "udiv<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
Index: gcc/testsuite/gcc.target/powerpc/maddld.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/maddld.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/maddld.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+long
+s_madd (long a, long b, long c)
+{
+  return (a * b) + c;
+}
+
+unsigned long
+u_madd (unsigned long a, unsigned long b, unsigned long c)
+{
+  return (a * b) + c;
+}
+
+/* { dg-final { scan-assembler-times "maddld " 2 } } */
+/* { dg-final { scan-assembler-not   "mulld "    } } */
+/* { dg-final { scan-assembler-not   "add "      } } */

* Re: [PATCH], Add power9 support to GCC, patch #6 (IEEE 128-bit hardware support)
  2015-11-10  0:41   ` Joseph Myers
@ 2015-11-10 18:41     ` Michael Meissner
  0 siblings, 0 replies; 47+ messages in thread
From: Michael Meissner @ 2015-11-10 18:41 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Tue, Nov 10, 2015 at 12:41:07AM +0000, Joseph Myers wrote:
> I don't see any conversions between KFmode and TImode (in either 
> direction, signed or unsigned) here - I suppose there are no instructions 
> for that?

No, power9 has no instruction that converts a 128-bit integer to IEEE
128-bit floating point or vice versa.

> If so, I would guess (without having tested it) that it is more efficient 
> to use the libgcc2 implementations of those functions (whether copied, or 
> with some logic to build selected libgcc2.c functions for KFmode), which 
> implement them using a few hardware operations on DImode [note that where 
> libgcc2.c has e.g. __floatditf, that gets mapped to __floattitf for 64-bit 
> systems], than to use the soft-fp implementations doing everything with 
> integer arithmetic.  (There are IEEE exceptions issues with the libgcc2.c 
> conversions from double-word integers to floating-point - see bug 59412 - 
> but since that's a preexisting issue for all architectures using this 
> code, it's clearly not your problem to fix.)
> 
> Ideally, I'd think that for optimal efficiency if objects built for power8 
> are linked with libgcc built for power9, or if an executable using shared 
> libgcc that was built for power8 gets run with shared libgcc for power9, 
> you'd want power9 libgcc to contain t-hardfp versions of all the functions 
> that can be expanded inline for power9, and libgcc2 versions of those 
> (such as TImode comparisons) that aren't expanded inline, but not to 
> contain soft-fp versions of any of those KFmode functions.  Cf. how 
> config.host ensures various 32-bit powerpc variants use the right mixture 
> of hardfp and soft-fp functions.  It's a bit fiddly to make sure you get 
> the preferred implementation of every function and that the ABI doesn't 
> change depending on the configured processor, but not that hard.

Yep, that is my thinking. 

> Since none of the libgcc pieces for KFmode support are yet in, and the 
> proposed changes are optimizations rather than a matter of correctness, 
> none of the above should directly affect this patch in any way - it simply 
> indicates desirable followup once both the libgcc soft-fp KFmode support, 
> and this patch, are in.

* Re: [PATCH, applied], Add power9 support to GCC, patch #9 (config.gcc)
  2015-11-03 20:29 [PATCH], Add power9 support to GCC, patch #1 Michael Meissner
                   ` (7 preceding siblings ...)
  2015-11-09  0:49 ` [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements) Michael Meissner
@ 2015-11-10 20:56 ` Michael Meissner
  2015-11-10 21:56 ` [PATCH], Add power9 support to GCC, patch #10 (SFmode/DFmode d-form addressing) Michael Meissner
  9 siblings, 0 replies; 47+ messages in thread
From: Michael Meissner @ 2015-11-10 20:56 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 459 bytes --]

I applied this patch as obvious.  I missed submitting it in my original patch
for the power9 support (it was in the sandbox I was testing power9 support on).

2015-11-10  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config.gcc (powerpc*-*-*, rs6000*-*-*): Add power9 to hosts that
	default to 64-bit.
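A hypothetical C model of the config.gcc case test this entry changes, using POSIX fnmatch() to apply the same glob patterns the shell uses. This is an illustration only, not code from GCC.

```c
#include <fnmatch.h>
#include <stdio.h>

/* Hypothetical model of "case x$with_cpu in ... cpu_is_64bit=yes"
   after the patch.  The shell tries each '|'-separated alternative in
   turn; fnmatch() applies the same bracket-glob rules.  */
static int
cpu_is_64bit (const char *with_cpu)
{
  static const char *const patterns[] = {
    "xpowerpc64", "xdefault64", "x6[23]0", "x970", "xG5",
    "xpower[3456789]", "xpower6x", "xrs64a", "xcell", "xa2",
    "xe500mc64", "xe5500", "xe6500"
  };
  char subject[64];

  /* Mirror the shell's x$with_cpu prefix.  */
  snprintf (subject, sizeof subject, "x%s", with_cpu);

  for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++)
    if (fnmatch (patterns[i], subject, 0) == 0)
      return 1;
  return 0;
}
```

A bracket expression matches exactly one character, so xpower[3456789] accepts power9 but not a hypothetical multi-digit name such as power10.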

[-- Attachment #2: gcc-power9.official-09b --]
[-- Type: text/plain, Size: 1019 bytes --]

Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 230072)
+++ gcc/config.gcc	(working copy)
@@ -439,7 +439,7 @@ powerpc*-*-*)
 	cpu_type=rs6000
 	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
 	case x$with_cpu in
-	    xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[345678]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
+	    xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
 		cpu_is_64bit=yes
 		;;
 	esac
@@ -4131,7 +4131,7 @@ case "${target}" in
 				eval "with_$which=405"
 				;;
 			"" | common | native \
-			| power | power[2345678] | power6x | powerpc | powerpc64 \
+			| power | power[23456789] | power6x | powerpc | powerpc64 \
 			| rios | rios1 | rios2 | rsc | rsc1 | rs64a \
 			| 401 | 403 | 405 | 405fp | 440 | 440fp | 464 | 464fp \
 			| 476 | 476fp | 505 | 601 | 602 | 603 | 603e | ec603e \

* Re: [PATCH], Add power9 support to GCC, patch #10 (SFmode/DFmode d-form addressing)
  2015-11-03 20:29 [PATCH], Add power9 support to GCC, patch #1 Michael Meissner
                   ` (8 preceding siblings ...)
  2015-11-10 20:56 ` [PATCH, applied], Add power9 support to GCC, patch #9 (config.gcc) Michael Meissner
@ 2015-11-10 21:56 ` Michael Meissner
  2015-11-11  0:19   ` Segher Boessenkool
                     ` (2 more replies)
  9 siblings, 3 replies; 47+ messages in thread
From: Michael Meissner @ 2015-11-10 21:56 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

This patch adds d-form addressing for float/double scalars on the PowerPC,
which was added in ISA 3.0 (power9).  This patch does not yet turn on D-form
addressing by default.  It is likely that patch #11, which will add limited
d-form addressing to vector registers, will enable it by default.
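For readers unfamiliar with the terminology: a D-form access addresses memory as a base register plus a signed 16-bit byte displacement. As I read the ISA 3.0 spec, the scalar loads and stores that can target Altivec registers (lxsd, lxssp, stxsd, stxssp) are DS-form, so the displacement must additionally be a multiple of 4 (the low two bits of the field hold part of the opcode). A sketch of the two range checks; the helper names are hypothetical and not GCC's predicates.

```c
#include <stdbool.h>

/* D-form: signed 16-bit byte displacement.  */
static bool
fits_d_form (long offset)
{
  return offset >= -32768 && offset <= 32767;
}

/* DS-form: same range, but the offset must be a multiple of 4,
   giving -32768 .. 32764 in steps of 4.  */
static bool
fits_ds_form (long offset)
{
  return fits_d_form (offset) && offset % 4 == 0;
}
```

An offset such as 6 is fine for lfd (D-form) but would need a different addressing form for lxsd under this reading.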

I have bootstrapped the compiler with these changes, and there were no
regressions in the testsuite.

In addition, I built all of the SPEC 2006 benchmarks with my normal options
(-ffast-math -O3 -mveclibabi=mass -mcpu=power9 -mpower9-dform -mrecip=rsqrt
-fpeel-loops -funroll-loops -fvect-cost-model -msave-toc-indirect
-fno-aggressive-loop-optimizations -mno-pointers-to-nested-functions) and there
were no compiler failures (and various power9 instructions were generated,
including d-form addressing).

Are these patches ok to check in?

[gcc]
2015-11-10  Michael Meissner  <meissner@linux.vnet.ibm.com>

	<patch #10>
	* config/rs6000/constraints.md (wb constraint): New constraint for
	ISA 3.0 d-form scalar addressing.

	* config/rs6000/rs6000.c (mode_supports_vmx_dform): Add support
	for ISA 3.0 D-form addressing to load SFmode/DFmode scalars into
	Altivec registers.  Add wb constraint for Altivec registers with
	D-form addressing.  If we have ISA 3.0 d-form support, undo
	secondary reload support for using FPR registers if we want to do
	D-form addressing.
	(rs6000_debug_reg_global): Likewise.
	(rs6000_setup_reg_addr_masks): Likewise.
	(rs6000_init_hard_regno_mode_ok): Likewise.
	(rs6000_secondary_reload): Likewise.
	(rs6000_preferred_reload_class): Likewise.
	(rs6000_secondary_reload_class): Likewise.

	* config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add wb
	constraint.

	* config/rs6000/rs6000.md (f32_lr2 mode attribute): Add support
	for ISA 3.0 SFmode/DFmode d-form addressing to Altivec registers.
	(f32_lm2): Likewise.
	(f32_li2): Likewise.
	(f32_sr2): Likewise.
	(f32_sm2): Likewise.
	(f32_si2): Likewise.
	(f64_p9): Likewise.
	(extendsfdf2_fpr): Likewise.
	(mov<mode>_hardfloat): Likewise.
	(mov<mode>_hardfloat32): Likewise.
	(mov<mode>_hardfloat64): Likewise.

	* doc/md.texi (RS/6000 constraints): Document wb constraint.
	Fixup we constraint documentation.

[gcc/testsuite]
2015-11-10  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/dform-1.c: New test.
	* gcc.target/powerpc/dform-2.c: Likewise.

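The rs6000.c entries above boil down to two decisions: when a register class may use REG+OFFSET addressing for a mode, and when a scalar bound for an Altivec register must instead be bounced through an FPR.  The following is a simplified model written for illustration; the enum and function names are hypothetical stand-ins for GCC's internal types, the addr_mask != 0 and done_p checks are assumed to have already passed, and the conditions mirror the hunks in rs6000_setup_reg_addr_masks and rs6000_secondary_reload.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GCC's reload register classes
   and the scalar modes of interest.  */
enum reg_kind { KIND_GPR, KIND_FPR, KIND_VMX };
enum fp_mode { FP_SF, FP_DF, FP_OTHER };

/* Mirrors the new REG+OFFSET test: GPRs and FPRs always qualify (for
   modes of at most 8 bytes), Altivec registers only for SFmode/DFmode
   and only when ISA 3.0 D-form support (TARGET_P9_DFORM) is on.  */
static bool
reg_offset_ok (enum reg_kind rc, enum fp_mode m, size_t msize,
               bool indexed_only, bool p9_dform)
{
  if (indexed_only || msize > 8)
    return false;
  if (rc == KIND_GPR || rc == KIND_FPR)
    return true;
  return rc == KIND_VMX && p9_dform && (m == FP_SF || m == FP_DF);
}

/* Mirrors the guard in rs6000_secondary_reload: a scalar moving between
   memory (or a CONST_DOUBLE) and a VSX/Altivec class is routed through
   an FPR only if the Altivec registers lack D-form for that mode.  */
static bool
bounce_through_fpr (bool scalar_in_vmx, bool vmx_dform,
                    bool vsx_or_altivec_class, bool mem_or_const_double)
{
  return scalar_in_vmx && !vmx_dform && vsx_or_altivec_class
         && mem_or_const_double;
}
```

In this model, turning on D-form support makes reg_offset_ok true for KIND_VMX with SFmode/DFmode, which in turn switches off the FPR bounce.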

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797


* Re: [PATCH], Add power9 support to GCC, patch #10 (SFmode/DFmode d-form addressing)
  2015-11-10 21:56 ` [PATCH], Add power9 support to GCC, patch #10 (SFmode/DFmode d-form addressing) Michael Meissner
@ 2015-11-11  0:19   ` Segher Boessenkool
  2015-11-11  0:26   ` Michael Meissner
  2015-11-24 18:08   ` David Edelsohn
  2 siblings, 0 replies; 47+ messages in thread
From: Segher Boessenkool @ 2015-11-11  0:19 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Tue, Nov 10, 2015 at 04:56:15PM -0500, Michael Meissner wrote:
> This patch adds D-form (register+offset) addressing for float/double scalars in
> Altivec registers, a feature added in ISA 3.0 (power9).  This patch does not yet
> turn on D-form addressing by default.  It is likely that patch #11, which will
> add limited D-form addressing to vector registers, will enable it by default.
> 
> I have bootstrapped the compiler with these changes, and there were no
> regressions in the testsuite.
> 
> In addition, I built all of the SPEC 2006 benchmarks with my normal options
> (-ffast-math -O3 -mveclibabi=mass -mcpu=power9 -mpower9-dform -mrecip=rsqrt
> -fpeel-loops -funroll-loops -fvect-cost-model -msave-toc-indirect
> -fno-aggressive-loop-optimizations -mno-pointers-to-nested-functions) and there
> were no compiler failures (and various power9 instructions were generated,
> including d-form addressing).
> 
> Are these patches ok to check in?

You forgot the patch again; it must be a curse ;-)


Segher

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH], Add power9 support to GCC, patch #10 (SFmode/DFmode d-form addressing)
  2015-11-10 21:56 ` [PATCH], Add power9 support to GCC, patch #10 (SFmode/DFmode d-form addressing) Michael Meissner
  2015-11-11  0:19   ` Segher Boessenkool
@ 2015-11-11  0:26   ` Michael Meissner
  2015-11-24 18:08   ` David Edelsohn
  2 siblings, 0 replies; 47+ messages in thread
From: Michael Meissner @ 2015-11-11  0:26 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 1712 bytes --]

Arghh, forgot the patch once again.

[gcc]
2015-11-10  Michael Meissner  <meissner@linux.vnet.ibm.com>

	<patch #10>
	* config/rs6000/constraints.md (wb constraint): New constraint for
	ISA 3.0 d-form scalar addressing.

	* config/rs6000/rs6000.c (mode_supports_vmx_dform): Add support
	for ISA 3.0 D-form addressing to load SFmode/DFmode scalars into
	Altivec registers.  Add wb constraint for Altivec registers with
	D-form addressing.  If we have ISA 3.0 d-form support, undo
	secondary reload support for using FPR registers if we want to do
	D-form addressing.
	(rs6000_debug_reg_global): Likewise.
	(rs6000_setup_reg_addr_masks): Likewise.
	(rs6000_init_hard_regno_mode_ok): Likewise.
	(rs6000_secondary_reload): Likewise.
	(rs6000_preferred_reload_class): Likewise.
	(rs6000_secondary_reload_class): Likewise.

	* config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add wb
	constraint.

	* config/rs6000/rs6000.md (f32_lr2 mode attribute): Add support
	for ISA 3.0 SFmode/DFmode d-form addressing to Altivec registers.
	(f32_lm2): Likewise.
	(f32_li2): Likewise.
	(f32_sr2): Likewise.
	(f32_sm2): Likewise.
	(f32_si2): Likewise.
	(f64_p9): Likewise.
	(extendsfdf2_fpr): Likewise.
	(mov<mode>_hardfloat): Likewise.
	(mov<mode>_hardfloat32): Likewise.
	(mov<mode>_hardfloat64): Likewise.

	* doc/md.texi (RS/6000 constraints): Document wb constraint.
	Fixup we constraint documentation.

[gcc/testsuite]
2015-11-10  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/dform-1.c: New test.
	* gcc.target/powerpc/dform-2.c: Likewise.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power9.official-10b --]
[-- Type: text/plain, Size: 26485 bytes --]

Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 230078)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -56,7 +56,8 @@ (define_register_constraint "z" "CA_REGS
 (define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]"
   "Any VSX register if the -mvsx option was used or NO_REGS.")
 
-;; wb is not currently used
+(define_register_constraint "wb" "rs6000_constraints[RS6000_CONSTRAINT_wb]"
+  "Altivec register if the -mpower9-dform option was used or NO_REGS.")
 
 ;; NOTE: For compatibility, "wc" is reserved to represent individual CR bits.
 ;; It is currently used for that purpose in LLVM.
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 230078)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -408,6 +408,13 @@ mode_supports_pre_modify_p (machine_mode
 	  != 0);
 }
 
+/* Return true if we have D-form addressing in altivec registers.  */
+static inline bool
+mode_supports_vmx_dform (machine_mode mode)
+{
+  return ((reg_addr[mode].addr_mask[RELOAD_REG_VMX] & RELOAD_REG_OFFSET) != 0);
+}
+
 \f
 /* Target cpu costs.  */
 
@@ -2258,7 +2265,9 @@ rs6000_debug_reg_global (void)
 	   "f  reg_class = %s\n"
 	   "v  reg_class = %s\n"
 	   "wa reg_class = %s\n"
+	   "wb reg_class = %s\n"
 	   "wd reg_class = %s\n"
+	   "we reg_class = %s\n"
 	   "wf reg_class = %s\n"
 	   "wg reg_class = %s\n"
 	   "wh reg_class = %s\n"
@@ -2283,7 +2292,9 @@ rs6000_debug_reg_global (void)
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wb]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wd]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_we]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wf]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wg]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wh]],
@@ -2664,9 +2675,15 @@ rs6000_setup_reg_addr_masks (void)
 	    }
 
 	  /* GPR and FPR registers can do REG+OFFSET addressing, except
-	     possibly for SDmode.  */
+	     possibly for SDmode.  ISA 3.0 (i.e. power9) adds D-form
+	     addressing for scalars to altivec registers.  */
 	  if ((addr_mask != 0) && !indexed_only_p
-	      && (rc == RELOAD_REG_GPR || rc == RELOAD_REG_FPR))
+	      && msize <= 8
+	      && (rc == RELOAD_REG_GPR
+		  || rc == RELOAD_REG_FPR
+		  || (rc == RELOAD_REG_VMX
+		      && TARGET_P9_DFORM
+		      && (m2 == DFmode || m2 == SFmode))))
 	    addr_mask |= RELOAD_REG_OFFSET;
 
 	  /* VMX registers can do (REG & -16) and ((REG+REG) & -16)
@@ -2990,6 +3007,10 @@ rs6000_init_hard_regno_mode_ok (bool glo
 	rs6000_constraints[RS6000_CONSTRAINT_wp] = VSX_REGS;	/* TFmode  */
     }
 
+  /* Support for new D-form instructions.  */
+  if (TARGET_P9_DFORM)
+    rs6000_constraints[RS6000_CONSTRAINT_wb] = ALTIVEC_REGS;
+
   /* Support for new direct moves.  */
   if (TARGET_DIRECT_MOVE_128)
     rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS;
@@ -18324,8 +18345,10 @@ rs6000_secondary_reload (bool in_p,
 
   /* If this is a scalar floating point value and we want to load it into the
      traditional Altivec registers, do it via a move via a traditional floating
-     point register.  Also make sure that non-zero constants use a FPR.  */
+     point register, unless we have D-form addressing.  Also make sure that
+     non-zero constants use a FPR.  */
   if (!done_p && reg_addr[mode].scalar_in_vmx_p
+      && !mode_supports_vmx_dform (mode)
       && (rclass == VSX_REGS || rclass == ALTIVEC_REGS)
       && (memory_p || (GET_CODE (x) == CONST_DOUBLE)))
     {
@@ -18889,10 +18912,14 @@ rs6000_preferred_reload_class (rtx x, en
 	  return NO_REGS;
 	}
 
-      /* If this is a scalar floating point value, prefer the traditional
-	 floating point registers so that we can use D-form (register+offset)
-	 addressing.  */
-      if (GET_MODE_SIZE (mode) < 16)
+      /* D-form addressing can easily reload the value.  */
+      if (mode_supports_vmx_dform (mode))
+	return rclass;
+
+      /* If this is a scalar floating point value and we don't have D-form
+	 addressing, prefer the traditional floating point registers so that we
+	 can use D-form (register+offset) addressing.  */
+      if (GET_MODE_SIZE (mode) < 16 && rclass == VSX_REGS)
 	return FLOAT_REGS;
 
       /* Prefer the Altivec registers if Altivec is handling the vector
@@ -19041,6 +19068,7 @@ rs6000_secondary_reload_class (enum reg_
      instead of reloading the secondary memory address for Altivec moves.  */
   if (TARGET_VSX
       && GET_MODE_SIZE (mode) < 16
+      && !mode_supports_vmx_dform (mode)
       && (((rclass == GENERAL_REGS || rclass == BASE_REGS)
            && (regno >= 0 && ALTIVEC_REGNO_P (regno)))
           || ((rclass == VSX_REGS || rclass == ALTIVEC_REGS)
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 230079)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -1523,6 +1523,7 @@ enum r6000_reg_class_enum {
   RS6000_CONSTRAINT_f,		/* fpr registers for single values */
   RS6000_CONSTRAINT_v,		/* Altivec registers */
   RS6000_CONSTRAINT_wa,		/* Any VSX register */
+  RS6000_CONSTRAINT_wb,		/* Altivec register if ISA 3.0 vector. */
   RS6000_CONSTRAINT_wd,		/* VSX register for V2DF */
   RS6000_CONSTRAINT_we,		/* VSX register if ISA 3.0 vector. */
   RS6000_CONSTRAINT_wf,		/* VSX register for V4SF */
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 230079)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -429,16 +429,22 @@ (define_mode_attr real_value_to_target [
 					(DD "REAL_VALUE_TO_TARGET_DECIMAL64")])
 
 ; Definitions for load to 32-bit fpr register
-(define_mode_attr f32_lr [(SF "f")		 (SD "wz")])
-(define_mode_attr f32_lm [(SF "m")		 (SD "Z")])
-(define_mode_attr f32_li [(SF "lfs%U1%X1 %0,%1") (SD "lfiwzx %0,%y1")])
-(define_mode_attr f32_lv [(SF "lxsspx %x0,%y1")	 (SD "lxsiwzx %x0,%y1")])
+(define_mode_attr f32_lr  [(SF "f")		  (SD "wz")])
+(define_mode_attr f32_lr2 [(SF "wb")		  (SD "wn")])
+(define_mode_attr f32_lm  [(SF "m")		  (SD "Z")])
+(define_mode_attr f32_lm2 [(SF "o")		  (SD "wn")])
+(define_mode_attr f32_li  [(SF "lfs%U1%X1 %0,%1") (SD "lfiwzx %0,%y1")])
+(define_mode_attr f32_li2 [(SF "lxssp %0,%1")     (SD "lfiwzx %0,%y1")])
+(define_mode_attr f32_lv  [(SF "lxsspx %x0,%y1")  (SD "lxsiwzx %x0,%y1")])
 
 ; Definitions for store from 32-bit fpr register
-(define_mode_attr f32_sr [(SF "f")		  (SD "wx")])
-(define_mode_attr f32_sm [(SF "m")		  (SD "Z")])
-(define_mode_attr f32_si [(SF "stfs%U0%X0 %1,%0") (SD "stfiwx %1,%y0")])
-(define_mode_attr f32_sv [(SF "stxsspx %x1,%y0")  (SD "stxsiwzx %x1,%y0")])
+(define_mode_attr f32_sr  [(SF "f")		   (SD "wx")])
+(define_mode_attr f32_sr2 [(SF "wb")		   (SD "wn")])
+(define_mode_attr f32_sm  [(SF "m")		   (SD "Z")])
+(define_mode_attr f32_sm2 [(SF "o")		   (SD "wn")])
+(define_mode_attr f32_si  [(SF "stfs%U0%X0 %1,%0") (SD "stfiwx %1,%y0")])
+(define_mode_attr f32_si2 [(SF "stxssp %1,%0")     (SD "stfiwx %1,%y0")])
+(define_mode_attr f32_sv  [(SF "stxsspx %x1,%y0")  (SD "stxsiwzx %x1,%y0")])
 
 ; Definitions for 32-bit fpr direct move
 ; At present, the decimal modes are not allowed in the traditional altivec
@@ -460,6 +466,9 @@ (define_mode_attr f64_dm  [(DF "wk") (DD
 ; Definitions for 64-bit use of altivec registers
 (define_mode_attr f64_av  [(DF "wv") (DD "wn")])
 
+; Definitions for 64-bit access to ISA 3.0 (power9) vector
+(define_mode_attr f64_p9  [(DF "wb") (DD "wn")])
+
 ; These modes do not fit in integer registers in 32-bit mode.
 ; but on e500v2, the gpr are 64 bit registers
 (define_mode_iterator DIFD [DI (DF "!TARGET_E500_DOUBLE") DD])
@@ -4455,8 +4464,8 @@ (define_expand "extendsfdf2"
   "")
 
 (define_insn_and_split "*extendsfdf2_fpr"
-  [(set (match_operand:DF 0 "gpc_reg_operand" "=d,?d,d,ws,?ws,wu")
-	(float_extend:DF (match_operand:SF 1 "reg_or_mem_operand" "0,f,m,0,wy,Z")))]
+  [(set (match_operand:DF 0 "gpc_reg_operand" "=d,?d,d,ws,?ws,wu,wb")
+	(float_extend:DF (match_operand:SF 1 "reg_or_mem_operand" "0,f,m,0,wy,Z,o")))]
   "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT"
   "@
    #
@@ -4464,14 +4473,15 @@ (define_insn_and_split "*extendsfdf2_fpr
    lfs%U1%X1 %0,%1
    #
    xscpsgndp %x0,%x1,%x1
-   lxsspx %x0,%y1"
+   lxsspx %x0,%y1
+   lxssp %0,%1"
   "&& reload_completed && REG_P (operands[1]) && REGNO (operands[0]) == REGNO (operands[1])"
   [(const_int 0)]
 {
   emit_note (NOTE_INSN_DELETED);
   DONE;
 }
-  [(set_attr "type" "fp,fp,fpload,fp,fp,fpload")])
+  [(set_attr "type" "fp,fp,fpload,fp,fp,fpload,fpload")])
 
 (define_expand "truncdfsf2"
   [(set (match_operand:SF 0 "gpc_reg_operand" "")
@@ -6435,8 +6445,8 @@ (define_split
 }")
 
 (define_insn "mov<mode>_hardfloat"
-  [(set (match_operand:FMOVE32 0 "nonimmediate_operand" "=!r,!r,m,f,<f32_vsx>,<f32_vsx>,!r,<f32_lr>,<f32_sm>,<f32_av>,Z,?<f32_dm>,?r,*c*l,!r,*h")
-	(match_operand:FMOVE32 1 "input_operand" "r,m,r,f,<f32_vsx>,j,j,<f32_lm>,<f32_sr>,Z,<f32_av>,r,<f32_dm>,r,h,0"))]
+  [(set (match_operand:FMOVE32 0 "nonimmediate_operand" "=!r,!r,m,f,<f32_vsx>,<f32_vsx>,!r,<f32_lr>,<f32_lr2>,<f32_sm>,<f32_sm2>,<f32_av>,Z,?<f32_dm>,?r,*c*l,!r,*h")
+	(match_operand:FMOVE32 1 "input_operand" "r,m,r,f,<f32_vsx>,j,j,<f32_lm>,<f32_lm2>,<f32_sr>,<f32_sr2>,Z,<f32_av>,r,<f32_dm>,r,h,0"))]
   "(gpc_reg_operand (operands[0], <MODE>mode)
    || gpc_reg_operand (operands[1], <MODE>mode))
    && (TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_SINGLE_FLOAT)"
@@ -6449,7 +6459,9 @@ (define_insn "mov<mode>_hardfloat"
    xxlxor %x0,%x0,%x0
    li %0,0
    <f32_li>
+   <f32_li2>
    <f32_si>
+   <f32_si2>
    <f32_lv>
    <f32_sv>
    mtvsrwz %x0,%1
@@ -6457,7 +6469,7 @@ (define_insn "mov<mode>_hardfloat"
    mt%0 %1
    mf%1 %0
    nop"
-  [(set_attr "type" "*,load,store,fp,fp,vecsimple,integer,fpload,fpstore,fpload,fpstore,mftgpr,mffgpr,mtjmpr,mfjmpr,*")
+  [(set_attr "type" "*,load,store,fp,fp,vecsimple,integer,fpload,fpload,fpstore,fpstore,fpload,fpstore,mftgpr,mffgpr,mtjmpr,mfjmpr,*")
    (set_attr "length" "4")])
 
 (define_insn "*mov<mode>_softfloat"
@@ -6566,14 +6578,15 @@ (define_split
 ;; into a floating point register when it is needed for a floating point
 ;; operation.  Prefer traditional floating point registers over VSX registers,
 ;; since the D-form version of the memory instructions does not need a GPR for
-;; reloading.
+;; reloading.  ISA 3.0 (power9) adds D-form addressing for scalars to Altivec
+;; registers.
 
 ;; If we have FPR registers, rs6000_emit_move has moved all constants to memory,
 ;; except for 0.0 which can be created on VSX with an xor instruction.
 
 (define_insn "*mov<mode>_hardfloat32"
-  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m,d,d,<f64_av>,Z,<f64_vsx>,<f64_vsx>,!r,Y,r,!r")
-	(match_operand:FMOVE64 1 "input_operand" "d,m,d,Z,<f64_av>,<f64_vsx>,j,j,r,Y,r"))]
+  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m,d,d,<f64_av>,Z,<f64_p9>,o,<f64_vsx>,<f64_vsx>,!r,Y,r,!r")
+	(match_operand:FMOVE64 1 "input_operand" "d,m,d,Z,<f64_av>,o,<f64_p9>,<f64_vsx>,j,j,r,Y,r"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT 
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -6583,14 +6596,16 @@ (define_insn "*mov<mode>_hardfloat32"
    fmr %0,%1
    lxsd%U1x %x0,%y1
    stxsd%U0x %x1,%y0
+   lxsd %0,%1
+   stxsd %1,%0
    xxlor %x0,%x1,%x1
    xxlxor %x0,%x0,%x0
    #
    #
    #
    #"
-  [(set_attr "type" "fpstore,fpload,fp,fpload,fpstore,vecsimple,vecsimple,two,store,load,two")
-   (set_attr "length" "4,4,4,4,4,4,4,8,8,8,8")])
+  [(set_attr "type" "fpstore,fpload,fp,fpload,fpstore,fpload,fpstore,vecsimple,vecsimple,two,store,load,two")
+   (set_attr "length" "4,4,4,4,4,4,4,4,4,8,8,8,8")])
 
 (define_insn "*mov<mode>_softfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=Y,r,r,r,r,r")
@@ -6608,8 +6623,8 @@ (define_insn "*mov<mode>_softfloat32"
 ; ld/std require word-aligned displacements -> 'Y' constraint.
 ; List Y->r and r->Y before r->r for reload.
 (define_insn "*mov<mode>_hardfloat64"
-  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m,d,d,<f64_av>,Z,<f64_vsx>,<f64_vsx>,!r,Y,r,!r,*c*l,!r,*h,r,wg,r,<f64_dm>")
-	(match_operand:FMOVE64 1 "input_operand" "d,m,d,Z,<f64_av>,<f64_vsx>,j,j,r,Y,r,r,h,0,wg,r,<f64_dm>,r"))]
+  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m,d,d,<f64_p9>,o,<f64_av>,Z,<f64_vsx>,<f64_vsx>,!r,Y,r,!r,*c*l,!r,*h,r,wg,r,<f64_dm>")
+	(match_operand:FMOVE64 1 "input_operand" "d,m,d,o,<f64_p9>,Z,<f64_av>,<f64_vsx>,j,j,r,Y,r,r,h,0,wg,r,<f64_dm>,r"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -6617,6 +6632,8 @@ (define_insn "*mov<mode>_hardfloat64"
    stfd%U0%X0 %1,%0
    lfd%U1%X1 %0,%1
    fmr %0,%1
+   lxsd %0,%1
+   stxsd %1,%0
    lxsd%U1x %x0,%y1
    stxsd%U0x %x1,%y0
    xxlor %x0,%x1,%x1
@@ -6632,7 +6649,7 @@ (define_insn "*mov<mode>_hardfloat64"
    mffgpr %0,%1
    mfvsrd %0,%x1
    mtvsrd %x0,%1"
-  [(set_attr "type" "fpstore,fpload,fp,fpload,fpstore,vecsimple,vecsimple,integer,store,load,*,mtjmpr,mfjmpr,*,mftgpr,mffgpr,mftgpr,mffgpr")
+  [(set_attr "type" "fpstore,fpload,fp,fpload,fpstore,fpload,fpstore,vecsimple,vecsimple,integer,store,load,*,mtjmpr,mfjmpr,*,mftgpr,mffgpr,mftgpr,mffgpr")
    (set_attr "length" "4")])
 
 (define_insn "*mov<mode>_softfloat64"
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 230078)
+++ gcc/doc/md.texi	(working copy)
@@ -3137,11 +3137,15 @@ asm ("xsaddqp %x0,%x1,%x2" : "=v" (v1) :
 
 is incorrect.
 
+@item wb
+Altivec register if @option{-mpower9-dform} is used or NO_REGS.
+
 @item wd
 VSX vector register to hold vector double data or NO_REGS.
 
 @item we
-VSX register if the -mpower9-vector -m64 options were used or NO_REGS.
+VSX register if the @option{-mpower9-vector} and @option{-m64} options
+were used or NO_REGS.
 
 @item wf
 VSX vector register to hold vector float data or NO_REGS.
Index: gcc/testsuite/gcc.target/powerpc/dform-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/dform-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/dform-1.c	(revision 0)
@@ -0,0 +1,207 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -mpower9-dform -O2" } */
+
+#ifndef TYPE
+#define TYPE double
+#endif
+
+#ifndef TYPE_IN
+#define TYPE_IN TYPE
+#endif
+
+#ifndef TYPE_OUT
+#define TYPE_OUT TYPE
+#endif
+
+#ifndef ITYPE
+#define ITYPE long
+#endif
+
+#ifdef DO_CALL
+extern ITYPE get_bits (ITYPE);
+
+#else
+#define get_bits(X) (X)
+#endif
+
+void test (ITYPE *bits, ITYPE n, TYPE one, TYPE_IN *p, TYPE_OUT *q)
+{
+  TYPE x_00 = p[ 0];
+  TYPE x_01 = p[ 1];
+  TYPE x_02 = p[ 2];
+  TYPE x_03 = p[ 3];
+  TYPE x_04 = p[ 4];
+  TYPE x_05 = p[ 5];
+  TYPE x_06 = p[ 6];
+  TYPE x_07 = p[ 7];
+  TYPE x_08 = p[ 8];
+  TYPE x_09 = p[ 9];
+
+  TYPE x_10 = p[10];
+  TYPE x_11 = p[11];
+  TYPE x_12 = p[12];
+  TYPE x_13 = p[13];
+  TYPE x_14 = p[14];
+  TYPE x_15 = p[15];
+  TYPE x_16 = p[16];
+  TYPE x_17 = p[17];
+  TYPE x_18 = p[18];
+  TYPE x_19 = p[19];
+
+  TYPE x_20 = p[20];
+  TYPE x_21 = p[21];
+  TYPE x_22 = p[22];
+  TYPE x_23 = p[23];
+  TYPE x_24 = p[24];
+  TYPE x_25 = p[25];
+  TYPE x_26 = p[26];
+  TYPE x_27 = p[27];
+  TYPE x_28 = p[28];
+  TYPE x_29 = p[29];
+
+  TYPE x_30 = p[30];
+  TYPE x_31 = p[31];
+  TYPE x_32 = p[32];
+  TYPE x_33 = p[33];
+  TYPE x_34 = p[34];
+  TYPE x_35 = p[35];
+  TYPE x_36 = p[36];
+  TYPE x_37 = p[37];
+  TYPE x_38 = p[38];
+  TYPE x_39 = p[39];
+
+  TYPE x_40 = p[40];
+  TYPE x_41 = p[41];
+  TYPE x_42 = p[42];
+  TYPE x_43 = p[43];
+  TYPE x_44 = p[44];
+  TYPE x_45 = p[45];
+  TYPE x_46 = p[46];
+  TYPE x_47 = p[47];
+  TYPE x_48 = p[48];
+  TYPE x_49 = p[49];
+
+  ITYPE i;
+
+  for (i = 0; i < n; i++)
+    {
+      ITYPE bit = get_bits (bits[i]);
+
+      if ((bit & ((ITYPE)1) << 	0) != 0) x_00 += one;
+      if ((bit & ((ITYPE)1) << 	1) != 0) x_01 += one;
+      if ((bit & ((ITYPE)1) << 	2) != 0) x_02 += one;
+      if ((bit & ((ITYPE)1) << 	3) != 0) x_03 += one;
+      if ((bit & ((ITYPE)1) << 	4) != 0) x_04 += one;
+      if ((bit & ((ITYPE)1) << 	5) != 0) x_05 += one;
+      if ((bit & ((ITYPE)1) << 	6) != 0) x_06 += one;
+      if ((bit & ((ITYPE)1) << 	7) != 0) x_07 += one;
+      if ((bit & ((ITYPE)1) << 	8) != 0) x_08 += one;
+      if ((bit & ((ITYPE)1) << 	9) != 0) x_09 += one;
+
+      if ((bit & ((ITYPE)1) << 10) != 0) x_10 += one;
+      if ((bit & ((ITYPE)1) << 11) != 0) x_11 += one;
+      if ((bit & ((ITYPE)1) << 12) != 0) x_12 += one;
+      if ((bit & ((ITYPE)1) << 13) != 0) x_13 += one;
+      if ((bit & ((ITYPE)1) << 14) != 0) x_14 += one;
+      if ((bit & ((ITYPE)1) << 15) != 0) x_15 += one;
+      if ((bit & ((ITYPE)1) << 16) != 0) x_16 += one;
+      if ((bit & ((ITYPE)1) << 17) != 0) x_17 += one;
+      if ((bit & ((ITYPE)1) << 18) != 0) x_18 += one;
+      if ((bit & ((ITYPE)1) << 19) != 0) x_19 += one;
+
+      if ((bit & ((ITYPE)1) << 20) != 0) x_20 += one;
+      if ((bit & ((ITYPE)1) << 21) != 0) x_21 += one;
+      if ((bit & ((ITYPE)1) << 22) != 0) x_22 += one;
+      if ((bit & ((ITYPE)1) << 23) != 0) x_23 += one;
+      if ((bit & ((ITYPE)1) << 24) != 0) x_24 += one;
+      if ((bit & ((ITYPE)1) << 25) != 0) x_25 += one;
+      if ((bit & ((ITYPE)1) << 26) != 0) x_26 += one;
+      if ((bit & ((ITYPE)1) << 27) != 0) x_27 += one;
+      if ((bit & ((ITYPE)1) << 28) != 0) x_28 += one;
+      if ((bit & ((ITYPE)1) << 29) != 0) x_29 += one;
+
+      if ((bit & ((ITYPE)1) << 30) != 0) x_30 += one;
+      if ((bit & ((ITYPE)1) << 31) != 0) x_31 += one;
+      if ((bit & ((ITYPE)1) << 32) != 0) x_32 += one;
+      if ((bit & ((ITYPE)1) << 33) != 0) x_33 += one;
+      if ((bit & ((ITYPE)1) << 34) != 0) x_34 += one;
+      if ((bit & ((ITYPE)1) << 35) != 0) x_35 += one;
+      if ((bit & ((ITYPE)1) << 36) != 0) x_36 += one;
+      if ((bit & ((ITYPE)1) << 37) != 0) x_37 += one;
+      if ((bit & ((ITYPE)1) << 38) != 0) x_38 += one;
+      if ((bit & ((ITYPE)1) << 39) != 0) x_39 += one;
+
+      if ((bit & ((ITYPE)1) << 40) != 0) x_40 += one;
+      if ((bit & ((ITYPE)1) << 41) != 0) x_41 += one;
+      if ((bit & ((ITYPE)1) << 42) != 0) x_42 += one;
+      if ((bit & ((ITYPE)1) << 43) != 0) x_43 += one;
+      if ((bit & ((ITYPE)1) << 44) != 0) x_44 += one;
+      if ((bit & ((ITYPE)1) << 45) != 0) x_45 += one;
+      if ((bit & ((ITYPE)1) << 46) != 0) x_46 += one;
+      if ((bit & ((ITYPE)1) << 47) != 0) x_47 += one;
+      if ((bit & ((ITYPE)1) << 48) != 0) x_48 += one;
+      if ((bit & ((ITYPE)1) << 49) != 0) x_49 += one;
+    }
+
+  q[ 0] = x_00;
+  q[ 1] = x_01;
+  q[ 2] = x_02;
+  q[ 3] = x_03;
+  q[ 4] = x_04;
+  q[ 5] = x_05;
+  q[ 6] = x_06;
+  q[ 7] = x_07;
+  q[ 8] = x_08;
+  q[ 9] = x_09;
+
+  q[10] = x_10;
+  q[11] = x_11;
+  q[12] = x_12;
+  q[13] = x_13;
+  q[14] = x_14;
+  q[15] = x_15;
+  q[16] = x_16;
+  q[17] = x_17;
+  q[18] = x_18;
+  q[19] = x_19;
+
+  q[20] = x_20;
+  q[21] = x_21;
+  q[22] = x_22;
+  q[23] = x_23;
+  q[24] = x_24;
+  q[25] = x_25;
+  q[26] = x_26;
+  q[27] = x_27;
+  q[28] = x_28;
+  q[29] = x_29;
+
+  q[30] = x_30;
+  q[31] = x_31;
+  q[32] = x_32;
+  q[33] = x_33;
+  q[34] = x_34;
+  q[35] = x_35;
+  q[36] = x_36;
+  q[37] = x_37;
+  q[38] = x_38;
+  q[39] = x_39;
+
+  q[40] = x_40;
+  q[41] = x_41;
+  q[42] = x_42;
+  q[43] = x_43;
+  q[44] = x_44;
+  q[45] = x_45;
+  q[46] = x_46;
+  q[47] = x_47;
+  q[48] = x_48;
+  q[49] = x_49;
+}
+
+/* { dg-final { scan-assembler     "lxsd "   } } */
+/* { dg-final { scan-assembler     "stxsd "  } } */
+/* { dg-final { scan-assembler-not "mfvsrd " } } */
+/* { dg-final { scan-assembler-not "mtvsrd " } } */
Index: gcc/testsuite/gcc.target/powerpc/dform-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/dform-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/dform-2.c	(revision 0)
@@ -0,0 +1,209 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -mpower9-dform -O2" } */
+
+#ifndef TYPE
+#define TYPE float
+#endif
+
+#ifndef TYPE_IN
+#define TYPE_IN TYPE
+#endif
+
+#ifndef TYPE_OUT
+#define TYPE_OUT TYPE
+#endif
+
+#ifndef ITYPE
+#define ITYPE long
+#endif
+
+#ifdef DO_CALL
+extern ITYPE get_bits (ITYPE);
+
+#else
+#define get_bits(X) (X)
+#endif
+
+void test (ITYPE *bits, ITYPE n, TYPE one, TYPE_IN *p, TYPE_OUT *q)
+{
+  TYPE x_00 = p[ 0];
+  TYPE x_01 = p[ 1];
+  TYPE x_02 = p[ 2];
+  TYPE x_03 = p[ 3];
+  TYPE x_04 = p[ 4];
+  TYPE x_05 = p[ 5];
+  TYPE x_06 = p[ 6];
+  TYPE x_07 = p[ 7];
+  TYPE x_08 = p[ 8];
+  TYPE x_09 = p[ 9];
+
+  TYPE x_10 = p[10];
+  TYPE x_11 = p[11];
+  TYPE x_12 = p[12];
+  TYPE x_13 = p[13];
+  TYPE x_14 = p[14];
+  TYPE x_15 = p[15];
+  TYPE x_16 = p[16];
+  TYPE x_17 = p[17];
+  TYPE x_18 = p[18];
+  TYPE x_19 = p[19];
+
+  TYPE x_20 = p[20];
+  TYPE x_21 = p[21];
+  TYPE x_22 = p[22];
+  TYPE x_23 = p[23];
+  TYPE x_24 = p[24];
+  TYPE x_25 = p[25];
+  TYPE x_26 = p[26];
+  TYPE x_27 = p[27];
+  TYPE x_28 = p[28];
+  TYPE x_29 = p[29];
+
+  TYPE x_30 = p[30];
+  TYPE x_31 = p[31];
+  TYPE x_32 = p[32];
+  TYPE x_33 = p[33];
+  TYPE x_34 = p[34];
+  TYPE x_35 = p[35];
+  TYPE x_36 = p[36];
+  TYPE x_37 = p[37];
+  TYPE x_38 = p[38];
+  TYPE x_39 = p[39];
+
+  TYPE x_40 = p[40];
+  TYPE x_41 = p[41];
+  TYPE x_42 = p[42];
+  TYPE x_43 = p[43];
+  TYPE x_44 = p[44];
+  TYPE x_45 = p[45];
+  TYPE x_46 = p[46];
+  TYPE x_47 = p[47];
+  TYPE x_48 = p[48];
+  TYPE x_49 = p[49];
+
+  ITYPE i;
+
+  for (i = 0; i < n; i++)
+    {
+      ITYPE bit = get_bits (bits[i]);
+
+      if ((bit & ((ITYPE)1) << 	0) != 0) x_00 += one;
+      if ((bit & ((ITYPE)1) << 	1) != 0) x_01 += one;
+      if ((bit & ((ITYPE)1) << 	2) != 0) x_02 += one;
+      if ((bit & ((ITYPE)1) << 	3) != 0) x_03 += one;
+      if ((bit & ((ITYPE)1) << 	4) != 0) x_04 += one;
+      if ((bit & ((ITYPE)1) << 	5) != 0) x_05 += one;
+      if ((bit & ((ITYPE)1) << 	6) != 0) x_06 += one;
+      if ((bit & ((ITYPE)1) << 	7) != 0) x_07 += one;
+      if ((bit & ((ITYPE)1) << 	8) != 0) x_08 += one;
+      if ((bit & ((ITYPE)1) << 	9) != 0) x_09 += one;
+
+      if ((bit & ((ITYPE)1) << 10) != 0) x_10 += one;
+      if ((bit & ((ITYPE)1) << 11) != 0) x_11 += one;
+      if ((bit & ((ITYPE)1) << 12) != 0) x_12 += one;
+      if ((bit & ((ITYPE)1) << 13) != 0) x_13 += one;
+      if ((bit & ((ITYPE)1) << 14) != 0) x_14 += one;
+      if ((bit & ((ITYPE)1) << 15) != 0) x_15 += one;
+      if ((bit & ((ITYPE)1) << 16) != 0) x_16 += one;
+      if ((bit & ((ITYPE)1) << 17) != 0) x_17 += one;
+      if ((bit & ((ITYPE)1) << 18) != 0) x_18 += one;
+      if ((bit & ((ITYPE)1) << 19) != 0) x_19 += one;
+
+      if ((bit & ((ITYPE)1) << 20) != 0) x_20 += one;
+      if ((bit & ((ITYPE)1) << 21) != 0) x_21 += one;
+      if ((bit & ((ITYPE)1) << 22) != 0) x_22 += one;
+      if ((bit & ((ITYPE)1) << 23) != 0) x_23 += one;
+      if ((bit & ((ITYPE)1) << 24) != 0) x_24 += one;
+      if ((bit & ((ITYPE)1) << 25) != 0) x_25 += one;
+      if ((bit & ((ITYPE)1) << 26) != 0) x_26 += one;
+      if ((bit & ((ITYPE)1) << 27) != 0) x_27 += one;
+      if ((bit & ((ITYPE)1) << 28) != 0) x_28 += one;
+      if ((bit & ((ITYPE)1) << 29) != 0) x_29 += one;
+
+      if ((bit & ((ITYPE)1) << 30) != 0) x_30 += one;
+      if ((bit & ((ITYPE)1) << 31) != 0) x_31 += one;
+      if ((bit & ((ITYPE)1) << 32) != 0) x_32 += one;
+      if ((bit & ((ITYPE)1) << 33) != 0) x_33 += one;
+      if ((bit & ((ITYPE)1) << 34) != 0) x_34 += one;
+      if ((bit & ((ITYPE)1) << 35) != 0) x_35 += one;
+      if ((bit & ((ITYPE)1) << 36) != 0) x_36 += one;
+      if ((bit & ((ITYPE)1) << 37) != 0) x_37 += one;
+      if ((bit & ((ITYPE)1) << 38) != 0) x_38 += one;
+      if ((bit & ((ITYPE)1) << 39) != 0) x_39 += one;
+
+      if ((bit & ((ITYPE)1) << 40) != 0) x_40 += one;
+      if ((bit & ((ITYPE)1) << 41) != 0) x_41 += one;
+      if ((bit & ((ITYPE)1) << 42) != 0) x_42 += one;
+      if ((bit & ((ITYPE)1) << 43) != 0) x_43 += one;
+      if ((bit & ((ITYPE)1) << 44) != 0) x_44 += one;
+      if ((bit & ((ITYPE)1) << 45) != 0) x_45 += one;
+      if ((bit & ((ITYPE)1) << 46) != 0) x_46 += one;
+      if ((bit & ((ITYPE)1) << 47) != 0) x_47 += one;
+      if ((bit & ((ITYPE)1) << 48) != 0) x_48 += one;
+      if ((bit & ((ITYPE)1) << 49) != 0) x_49 += one;
+    }
+
+  q[ 0] = x_00;
+  q[ 1] = x_01;
+  q[ 2] = x_02;
+  q[ 3] = x_03;
+  q[ 4] = x_04;
+  q[ 5] = x_05;
+  q[ 6] = x_06;
+  q[ 7] = x_07;
+  q[ 8] = x_08;
+  q[ 9] = x_09;
+
+  q[10] = x_10;
+  q[11] = x_11;
+  q[12] = x_12;
+  q[13] = x_13;
+  q[14] = x_14;
+  q[15] = x_15;
+  q[16] = x_16;
+  q[17] = x_17;
+  q[18] = x_18;
+  q[19] = x_19;
+
+  q[20] = x_20;
+  q[21] = x_21;
+  q[22] = x_22;
+  q[23] = x_23;
+  q[24] = x_24;
+  q[25] = x_25;
+  q[26] = x_26;
+  q[27] = x_27;
+  q[28] = x_28;
+  q[29] = x_29;
+
+  q[30] = x_30;
+  q[31] = x_31;
+  q[32] = x_32;
+  q[33] = x_33;
+  q[34] = x_34;
+  q[35] = x_35;
+  q[36] = x_36;
+  q[37] = x_37;
+  q[38] = x_38;
+  q[39] = x_39;
+
+  q[40] = x_40;
+  q[41] = x_41;
+  q[42] = x_42;
+  q[43] = x_43;
+  q[44] = x_44;
+  q[45] = x_45;
+  q[46] = x_46;
+  q[47] = x_47;
+  q[48] = x_48;
+  q[49] = x_49;
+}
+
+/* { dg-final { scan-assembler     "lxssp "     } } */
+/* { dg-final { scan-assembler     "stxssp "    } } */
+/* { dg-final { scan-assembler-not "mfvsrd "    } } */
+/* { dg-final { scan-assembler-not "mtvsrd "    } } */
+/* { dg-final { scan-assembler-not "xscvdpspn " } } */
+


* Re: [PATCH], Add power9 support to GCC, patch #8 (add integer multiply/add)
  2015-11-10 18:39   ` [PATCH], Add power9 support to GCC, patch #8 (add integer multiply/add) Michael Meissner
@ 2015-11-12 20:39     ` David Edelsohn
  0 siblings, 0 replies; 47+ messages in thread
From: David Edelsohn @ 2015-11-12 20:39 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches

On Tue, Nov 10, 2015 at 1:39 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch adds support for the MADDLD instruction, which is a fused
> multiply/add instruction for integers.  At this time, it is for 64-bit
> multiplies only.  Eventually, we will restructure 128-bit multiply so that we
> can use the 64x64 + 64 high-bit variants.
>
> I have bootstrapped a compiler with this change and there were no
> regressions.  Is it ok to apply to the trunk?
>
> [gcc]
> 2015-11-10  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.h (TARGET_MADDLD): Add support for the ISA
>         3.0 integer multiply-add instruction.
>         * config/rs6000/rs6000.md (<u>mul<mode><dmode>3): Likewise.
>
> [gcc/testsuite]
> 2015-11-10  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/maddld.c: New test.

Okay.

Thanks, David
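For readers unfamiliar with the new instruction, the following C sketch models what MADDLD computes: the low 64 bits of RA * RB + RC. The semantics come from my reading of the ISA 3.0 description, not from this thread. The high-part unsigned variant mentioned above (MADDHDU) is modelled the same way with a 128-bit intermediate, using the GCC/Clang unsigned __int128 extension. The function names are hypothetical. Whether GCC actually emits maddld for a plain a * b + c depends on the compiler version and on -mcpu=power9 -m64.

```c
#include <stdint.h>

/* Hypothetical reference model of MADDLD: low 64 bits of a * b + c.
   Unsigned arithmetic wraps modulo 2^64, so the low half is the same
   whether the operands are viewed as signed or unsigned.  */
static uint64_t
maddld_model (uint64_t a, uint64_t b, uint64_t c)
{
  return a * b + c;
}

/* Hypothetical reference model of MADDHDU: high 64 bits of the full
   128-bit unsigned result a * b + c.  The sum cannot overflow 128 bits,
   since (2^64-1)^2 + (2^64-1) < 2^128.  */
static uint64_t
maddhdu_model (uint64_t a, uint64_t b, uint64_t c)
{
  unsigned __int128 sum = (unsigned __int128) a * b + c;
  return (uint64_t) (sum >> 64);
}

/* A 64-bit multiply/add of the shape the new pattern targets.  Unsigned
   types avoid signed-overflow undefined behavior in the example.  */
uint64_t
mul_add (uint64_t a, uint64_t b, uint64_t c)
{
  return a * b + c;
}
```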


* Re: [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements)
  2015-11-09  0:49 ` [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements) Michael Meissner
  2015-11-09 20:00   ` Segher Boessenkool
  2015-11-09 21:06   ` Michael Meissner
@ 2015-11-12 20:43   ` David Edelsohn
  2 siblings, 0 replies; 47+ messages in thread
From: David Edelsohn @ 2015-11-12 20:43 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches

On Sun, Nov 8, 2015 at 7:48 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch adds support for the new direct move instructions (MFVSRLD and
> MTVSRDD) that simplify moving 128-bit data between GPRs and vector registers.
>
> I have built previous versions of this patch with no regressions.  At the
> moment, I have done a non-bootstrap build and run the PowerPC tests, with no
> regressions.  Assuming the bootstrap build that I've started has no
> regressions, is it ok to install in the trunk?
>
> [gcc]
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * config/rs6000/constraints.md (we constraint): New constraint for
>         64-bit power9 vector support.
>         (wL constraint): New constraint for the element in a vector that
>         can be addressed by the MFVSRLD instruction.
>
>         * config/rs6000/rs6000.c (rs6000_debug_reg_global): Add ISA 3.0
>         debugging.
>         (rs6000_init_hard_regno_mode_ok): If ISA 3.0 and 64-bit, enable we
>         constraint.  Disable the VSX<->GPR direct move helpers if we have
>         the MFVSRLD and MTVSRDD instructions.
>         (rs6000_secondary_reload_simple_move): Add support for doing
>         vector direct moves directly without additional scratch registers
>         if we have ISA 3.0 instructions.
>         (rs6000_secondary_reload_direct_move): Update comments.
>         (rs6000_output_move_128bit): Add support for ISA 3.0 vector
>         instructions.
>
>         * config/rs6000/vsx.md (vsx_mov<mode>): Add support for ISA 3.0
>         direct move instructions.
>         (vsx_movti_64bit): Likewise.
>         (vsx_extract_<mode>): Likewise.
>
>         * config/rs6000/rs6000.h (VECTOR_ELEMENT_MFVSRLD_64BIT): New
>         macros for ISA 3.0 direct move instructions.
>         (TARGET_DIRECT_MOVE_128): Likewise.
>
>         * config/rs6000/rs6000.md (128-bit GPR splitters): Don't split a
>         128-bit move that is a direct move between GPR and vector
>         registers using ISA 3.0 direct move instructions.
>
>         * doc/md.texi (RS/6000 constraints): Document we, wF, wG, wL
>         constraints.  Update wa documentation to say not to use %x<n> on
>         instructions that only take Altivec registers.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/direct-move-vector.c: New test for 128-bit
>         vector direct move instructions.

This is okay.

Thanks, David
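As a rough illustration of what patch #7 relies on, here is a small C model of the three instructions involved. It treats a VSX register as two doublewords, with doubleword 0 as the most significant half, which is what the new output templates in rs6000_output_move_128bit imply. MTVSRDD builds the vector from two GPRs, while MFVSRD and MFVSRLD read back doubleword 0 and doubleword 1. The type and function names are hypothetical. The RA=0 behavior of MTVSRDD (doubleword 0 is set to zero rather than read from r0) is my reading of ISA 3.0. If that reading is right, it would explain why the new move alternatives pair "we" with the b constraint, which excludes r0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit VSX register: dw[0] is doubleword 0
   (most significant half), dw[1] is doubleword 1.  */
typedef struct { uint64_t dw[2]; } vsr128;

/* Hypothetical model of the 64-bit GPR file; r[0] plays the role of r0.  */
typedef struct { uint64_t r[32]; } gpr_file;

/* mtvsrdd XT,RA,RB: dw[0] <- (RA == 0 ? 0 : GPR[RA]), dw[1] <- GPR[RB].  */
static vsr128
mtvsrdd (const gpr_file *g, unsigned ra, unsigned rb)
{
  vsr128 v = { { ra == 0 ? 0 : g->r[ra], g->r[rb] } };
  return v;
}

/* mfvsrd RA,XS reads doubleword 0; mfvsrld RA,XS reads doubleword 1.  */
static uint64_t mfvsrd (vsr128 v)  { return v.dw[0]; }
static uint64_t mfvsrld (vsr128 v) { return v.dw[1]; }

/* GPR pair (rn, rn+1) -> VSX, following the WORDS_BIG_ENDIAN choice in
   the templates: the first register of the pair holds the high half on
   big endian and the low half on little endian.  */
static vsr128
gpr_pair_to_vsx (const gpr_file *g, unsigned rn, bool big_endian)
{
  return big_endian ? mtvsrdd (g, rn, rn + 1)   /* mtvsrdd %x0,%1,%L1 */
                    : mtvsrdd (g, rn + 1, rn);  /* mtvsrdd %x0,%L1,%1 */
}

/* VSX -> GPR pair (rn, rn+1), again mirroring the two templates.  */
static void
vsx_to_gpr_pair (gpr_file *g, vsr128 v, unsigned rn, bool big_endian)
{
  if (big_endian)
    {
      g->r[rn] = mfvsrd (v);        /* mfvsrd %0,%x1 */
      g->r[rn + 1] = mfvsrld (v);   /* mfvsrld %L0,%x1 */
    }
  else
    {
      g->r[rn + 1] = mfvsrd (v);    /* mfvsrd %L0,%x1 */
      g->r[rn] = mfvsrld (v);       /* mfvsrld %0,%x1 */
    }
}
```

In both endiannesses a round trip through the vector register leaves the 128-bit value unchanged. Only the GPR-pair ordering differs.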


* Re: [PATCH], Add power9 support to GCC, patch #6 (IEEE 128-bit hardware support)
  2015-11-09  0:45 ` [PATCH], Add power9 support to GCC, patch #6 (IEEE 128-bit hardware support) Michael Meissner
  2015-11-09 19:29   ` Segher Boessenkool
  2015-11-10  0:41   ` Joseph Myers
@ 2015-11-12 20:47   ` David Edelsohn
  2015-11-13 22:13     ` [PATCH applied], Power9 patches #6-8 (IEEE 128-bit h/w, 128-bit direct move, integer mult/add) Michael Meissner
  2 siblings, 1 reply; 47+ messages in thread
From: David Edelsohn @ 2015-11-12 20:47 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches

On Sun, Nov 8, 2015 at 7:44 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch adds support for the IEEE 128-bit hardware instructions that are
> being added to the PowerPC ISA 3.0 (power9).  With this patch, users on power7
> and power8 will use the software emulation functions that are committed, but
> still need some enhancement.  On ISA 3.0/power9, they would be able to use the
> direct instructions.
>
> I have built this patch with a bootstrap build on a power8 little endian
> system.  There were no regressions in the test suite.  Is this patch ok to
> install in the trunk?
>
> [gcc]
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000-protos.h (convert_float128_to_int): Add
>         declaration.
>         (convert_int_to_float128): Likewise.
>         (rs6000_generate_compare): Add support for ISA 3.0 (power9)
>         hardware support for IEEE 128-bit floating point.
>         (rs6000_expand_float128_convert): Likewise.
>         (convert_float128_to_int): Likewise.
>         (convert_int_to_float128): Likewise.
>
>         * config/rs6000/rs6000.md (UNSPEC_ROUND_TO_ODD): New unspecs for
>         ISA 3.0 hardware IEEE 128-bit floating point.
>         (UNSPEC_IEEE128_MOVE): Likewise.
>         (UNSPEC_IEEE128_CONVERT): Likewise.
>         (FMA_F): Add support for IEEE 128-bit floating point hardware
>         support.
>         (Ff): Add support for DImode.
>         (Fv): Likewise.
>         (any_fix code iterator): New and updated iterators for IEEE
>         128-bit floating point hardware support.
>         (any_float code iterator): Likewise.
>         (s code attribute): Likewise.
>         (su code attribute): Likewise.
>         (az code attribute): Likewise.
>         (neg<mode>2, FLOAT128 iterator): Add support for IEEE 128-bit
>         floating point hardware support.
>         (abs<mode>2, FLOAT128 iterator): Likewise.
>         (add<mode>3, IEEE128 iterator): New insns for IEEE 128-bit
>         floating point hardware.
>         (sub<mode>3, IEEE128 iterator): Likewise.
>         (mul<mode>3, IEEE128 iterator): Likewise.
>         (div<mode>3, IEEE128 iterator): Likewise.
>         (copysign<mode>3, IEEE128 iterator): Likewise.
>         (sqrt<mode>2, IEEE128 iterator): Likewise.
>         (neg<mode>2, IEEE128 iterator): Likewise.
>         (abs<mode>2, IEEE128 iterator): Likewise.
>         (nabs<mode>2, IEEE128 iterator): Likewise.
>         (fma<mode>4_hw, IEEE128 iterator): Likewise.
>         (fms<mode>4_hw, IEEE128 iterator): Likewise.
>         (nfma<mode>4_hw, IEEE128 iterator): Likewise.
>         (nfms<mode>4_hw, IEEE128 iterator): Likewise.
>         (extend<SFDF:mode><IEEE128:mode>2_hw): Likewise.
>         (trunc<mode>df2_hw, IEEE128 iterator): Likewise.
>         (trunc<mode>sf2_hw, IEEE128 iterator): Likewise.
>         (fix_fixuns code attribute): Likewise.
>         (float_floatuns code attribute): Likewise.
>         (<fix_fixuns>_<mode>si2_hw): Likewise.
>         (<fix_fixuns>_<mode>di2_hw): Likewise.
>         (<float_floatuns>_<mode>si2_hw): Likewise.
>         (<float_floatuns>_<mode>di2_hw): Likewise.
>         (xscvqp<su>wz_<mode>): Likewise.
>         (xscvqp<su>dz_<mode>): Likewise.
>         (xscv<su>dqp_<mode>): Likewise.
>         (ieee128_mfvsrd): Likewise.
>         (ieee128_mfvsrwz): Likewise.
>         (ieee128_mtvsrw): Likewise.
>         (ieee128_mtvsrd): Likewise.
>         (trunc<mode>df2_odd): Likewise.
>         (cmp<mode>_h): Likewise.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/float128-hw.c: New test for IEEE 128-bit
>         hardware floating point support.

Please change the attribute to "uns" as suggested by Segher.

    > +(define_code_attr fix_fixuns  [(fix   "fix")   (unsigned_fix "fixuns")])
    > +(define_code_attr float_floatuns [(float "float") (unsigned_float "floatuns")])

    You could instead do an "uns" attribute so you would write fix<uns> etc.

Okay with that change.

We need to think more about the ieee128_mtvsrw pattern.

Thanks, David
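For context on which source constructs the new conversion patterns cover, the following C sketch lists the __float128 conversions that rs6000_expand_float128_convert maps to hardware insns when TARGET_FLOAT128_HW is set, instead of calling a library routine. The comments name the pattern families from the ChangeLog (xscvqp<su>dz, xscvqp<su>wz, xscv<su>dqp, extend/trunc). The function names are hypothetical, and the exact code generated depends on the compiler version and on options such as -mcpu=power9. The code itself is portable to any GCC or Clang target that provides __float128, so its behavior can be checked on a host without power9 hardware.

```c
#include <stdint.h>

/* __float128 -> integer truncates toward zero (FIX / UNSIGNED_FIX).
   64-bit results correspond to xscvqp<su>dz, 32-bit ones to
   xscvqp<su>wz.  */
int64_t  kf_to_s64 (__float128 x) { return (int64_t) x; }
uint64_t kf_to_u64 (__float128 x) { return (uint64_t) x; }
int32_t  kf_to_s32 (__float128 x) { return (int32_t) x; }
uint32_t kf_to_u32 (__float128 x) { return (uint32_t) x; }

/* integer -> __float128 (FLOAT / UNSIGNED_FLOAT) corresponds to
   xscv<su>dqp.  In convert_int_to_float128, a 32-bit source is first
   moved into a vector register with a signed/unsigned flag, then
   converted as a 64-bit value.  */
__float128 s64_to_kf (int64_t i)  { return (__float128) i; }
__float128 u64_to_kf (uint64_t u) { return (__float128) u; }
__float128 s32_to_kf (int32_t i)  { return (__float128) i; }
__float128 u32_to_kf (uint32_t u) { return (__float128) u; }

/* Widening and narrowing between double and __float128
   (extend<SFDF:mode><IEEE128:mode>2_hw, trunc<mode>df2_hw).  */
__float128 df_to_kf (double d)     { return d; }
double     kf_to_df (__float128 x) { return (double) x; }
```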


* Re: [PATCH applied], Power9 patches #6-8 (IEEE 128-bit h/w, 128-bit direct move, integer mult/add)
  2015-11-12 20:47   ` David Edelsohn
@ 2015-11-13 22:13     ` Michael Meissner
  0 siblings, 0 replies; 47+ messages in thread
From: Michael Meissner @ 2015-11-13 22:13 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 4601 bytes --]

Here is the combo patch for patches #6, #7, and #8 that was applied.  As David
requested, I redid the code attributes to use a single <uns> attribute for both
the fix and float patterns.

[gcc]
2015-11-13  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/constraints.md (we constraint): New constraint for
	64-bit power9 vector support.
	(wL constraint): New constraint for the element in a vector that
	can be addressed by the MFVSRLD instruction.

	* config/rs6000/rs6000-protos.h (convert_float128_to_int): Add
	declaration.
	(convert_int_to_float128): Likewise.
	(rs6000_generate_compare): Add support for ISA 3.0 (power9)
	hardware support for IEEE 128-bit floating point.
	(rs6000_expand_float128_convert): Likewise.
	(convert_float128_to_int): Likewise.
	(convert_int_to_float128): Likewise.

	* config/rs6000/rs6000.md (UNSPEC_ROUND_TO_ODD): New unspecs for
	ISA 3.0 hardware IEEE 128-bit floating point.
	(UNSPEC_IEEE128_MOVE): Likewise.
	(UNSPEC_IEEE128_CONVERT): Likewise.
	(FMA_F): Add support for IEEE 128-bit floating point hardware
	support.
	(Ff): Add support for DImode.
	(Fv): Likewise.
	(any_fix code iterator): New and updated iterators for IEEE
	128-bit floating point hardware support.
	(any_float code iterator): Likewise.
	(s code attribute): Likewise.
	(su code attribute): Likewise.
	(az code attribute): Likewise.
	(uns code attribute): Likewise.
	(neg<mode>2, FLOAT128 iterator): Add support for IEEE 128-bit
	floating point hardware support.
	(abs<mode>2, FLOAT128 iterator): Likewise.
	(add<mode>3, IEEE128 iterator): New insns for IEEE 128-bit
	floating point hardware.
	(sub<mode>3, IEEE128 iterator): Likewise.
	(mul<mode>3, IEEE128 iterator): Likewise.
	(div<mode>3, IEEE128 iterator): Likewise.
	(copysign<mode>3, IEEE128 iterator): Likewise.
	(sqrt<mode>2, IEEE128 iterator): Likewise.
	(neg<mode>2, IEEE128 iterator): Likewise.
	(abs<mode>2, IEEE128 iterator): Likewise.
	(nabs<mode>2, IEEE128 iterator): Likewise.
	(fma<mode>4_hw, IEEE128 iterator): Likewise.
	(fms<mode>4_hw, IEEE128 iterator): Likewise.
	(nfma<mode>4_hw, IEEE128 iterator): Likewise.
	(nfms<mode>4_hw, IEEE128 iterator): Likewise.
	(extend<SFDF:mode><IEEE128:mode>2_hw): Likewise.
	(trunc<mode>df2_hw, IEEE128 iterator): Likewise.
	(trunc<mode>sf2_hw, IEEE128 iterator): Likewise.
	(fix_fixuns code attribute): Likewise.
	(float_floatuns code attribute): Likewise.
	(fix<uns>_<mode>si2_hw): Likewise.
	(fix<uns>_<mode>di2_hw): Likewise.
	(float<uns>_<mode>si2_hw): Likewise.
	(float<uns>_<mode>di2_hw): Likewise.
	(xscvqp<su>wz_<mode>): Likewise.
	(xscvqp<su>dz_<mode>): Likewise.
	(xscv<su>dqp_<mode>): Likewise.
	(ieee128_mfvsrd): Likewise.
	(ieee128_mfvsrwz): Likewise.
	(ieee128_mtvsrw): Likewise.
	(ieee128_mtvsrd): Likewise.
	(trunc<mode>df2_odd): Likewise.
	(cmp<mode>_h): Likewise.
	(128-bit GPR splitters): Don't split a 128-bit move that is a
	direct move between GPR and vector registers using ISA 3.0 direct
	move instructions.
	(maddld4): Add support for the ISA 3.0 integer multiply-add
	instruction.

	* config/rs6000/rs6000.c (rs6000_debug_reg_global): Add ISA 3.0
	debugging.
	(rs6000_init_hard_regno_mode_ok): If ISA 3.0 and 64-bit, enable we
	constraint.  Disable the VSX<->GPR direct move helpers if we have
	the MFVSRLD and MTVSRDD instructions.
	(rs6000_secondary_reload_simple_move): Add support for doing
	vector direct moves directly without additional scratch registers
	if we have ISA 3.0 instructions.
	(rs6000_secondary_reload_direct_move): Update comments.
	(rs6000_output_move_128bit): Add support for ISA 3.0 vector
	instructions.

	* config/rs6000/vsx.md (vsx_mov<mode>): Add support for ISA 3.0
	direct move instructions.
	(vsx_movti_64bit): Likewise.
	(vsx_extract_<mode>): Likewise.

	* config/rs6000/rs6000.h (VECTOR_ELEMENT_MFVSRLD_64BIT): New
	macros for ISA 3.0 direct move instructions.
	(TARGET_DIRECT_MOVE_128): Likewise.
	(TARGET_MADDLD): Add support for the ISA 3.0 integer multiply-add
	instruction.

	* doc/md.texi (RS/6000 constraints): Document we, wF, wG, wL
	constraints.  Update wa documentation to say not to use %x<n> on
	instructions that only take Altivec registers.

[gcc/testsuite]
2015-11-13  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/float128-hw.c: New test for IEEE 128-bit
	hardware floating point support.

	* gcc.target/powerpc/direct-move-vector.c: New test for 128-bit
	vector direct move instructions.

	* gcc.target/powerpc/maddld.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power9.official-06-08b --]
[-- Type: text/plain, Size: 45812 bytes --]

Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 230335)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -64,7 +64,8 @@ (define_register_constraint "wa" "rs6000
 (define_register_constraint "wd" "rs6000_constraints[RS6000_CONSTRAINT_wd]"
   "VSX vector register to hold vector double data or NO_REGS.")
 
-;; we is not currently used
+(define_register_constraint "we" "rs6000_constraints[RS6000_CONSTRAINT_we]"
+  "VSX register if the -mpower9-vector -m64 options were used or NO_REGS.")
 
 (define_register_constraint "wf" "rs6000_constraints[RS6000_CONSTRAINT_wf]"
   "VSX vector register to hold vector float data or NO_REGS.")
@@ -147,6 +148,12 @@ (define_memory_constraint "wG"
   "Memory operand suitable for TOC fusion memory references"
   (match_operand 0 "toc_fusion_mem_wrapped"))
 
+(define_constraint "wL"
+  "Int constant that is the element number mfvsrld accesses in a vector."
+  (and (match_code "const_int")
+       (and (match_test "TARGET_DIRECT_MOVE_128")
+	    (match_test "(ival == VECTOR_ELEMENT_MFVSRLD_64BIT)"))))
+
 ;; Lq/stq validates the address for load/store quad
 (define_memory_constraint "wQ"
   "Memory operand suitable for the load/store quad instructions"
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 230335)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -55,6 +55,8 @@ extern const char *rs6000_output_move_12
 extern bool rs6000_move_128bit_ok_p (rtx []);
 extern bool rs6000_split_128bit_ok_p (rtx []);
 extern void rs6000_expand_float128_convert (rtx, rtx, bool);
+extern void convert_float128_to_int (rtx *, enum rtx_code);
+extern void convert_int_to_float128 (rtx *, enum rtx_code);
 extern void rs6000_expand_vector_init (rtx, rtx);
 extern void paired_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, int);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 230335)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -2575,6 +2575,10 @@ rs6000_debug_reg_global (void)
   if (TARGET_VSX)
     fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit scalar element",
 	     (int)VECTOR_ELEMENT_SCALAR_64BIT);
+
+  if (TARGET_DIRECT_MOVE_128)
+    fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit mfvsrld element",
+	     (int)VECTOR_ELEMENT_MFVSRLD_64BIT);
 }
 
 \f
@@ -2986,6 +2990,10 @@ rs6000_init_hard_regno_mode_ok (bool glo
 	rs6000_constraints[RS6000_CONSTRAINT_wp] = VSX_REGS;	/* TFmode  */
     }
 
+  /* Support for new direct moves.  */
+  if (TARGET_DIRECT_MOVE_128)
+    rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS;
+
   /* Set up the reload helper and direct move functions.  */
   if (TARGET_VSX || TARGET_ALTIVEC)
     {
@@ -3034,7 +3042,7 @@ rs6000_init_hard_regno_mode_ok (bool glo
 	      reg_addr[TImode].reload_load   = CODE_FOR_reload_ti_di_load;
 	    }
 
-	  if (TARGET_DIRECT_MOVE)
+	  if (TARGET_DIRECT_MOVE && !TARGET_DIRECT_MOVE_128)
 	    {
 	      reg_addr[TImode].reload_gpr_vsx    = CODE_FOR_reload_gpr_from_vsxti;
 	      reg_addr[V1TImode].reload_gpr_vsx  = CODE_FOR_reload_gpr_from_vsxv1ti;
@@ -18081,6 +18089,11 @@ rs6000_secondary_reload_simple_move (enu
 	  || (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)))
     return true;
 
+  else if (TARGET_DIRECT_MOVE_128 && size == 16
+	   && ((to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
+	       || (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)))
+    return true;
+
   else if (TARGET_MFPGPR && TARGET_POWERPC64 && size == 8
 	   && ((to_type == GPR_REG_TYPE && from_type == FPR_REG_TYPE)
 	       || (to_type == FPR_REG_TYPE && from_type == GPR_REG_TYPE)))
@@ -18094,7 +18107,7 @@ rs6000_secondary_reload_simple_move (enu
   return false;
 }
 
-/* Power8 helper function for rs6000_secondary_reload, handle all of the
+/* Direct move helper function for rs6000_secondary_reload, handle all of the
    special direct moves that involve allocating an extra register, return the
    insn code of the helper function if there is such a function or
    CODE_FOR_nothing if not.  */
@@ -18116,8 +18129,8 @@ rs6000_secondary_reload_direct_move (enu
       if (size == 16)
 	{
 	  /* Handle moving 128-bit values from GPRs to VSX point registers on
-	     power8 when running in 64-bit mode using XXPERMDI to glue the two
-	     64-bit values back together.  */
+	     ISA 2.07 (power8, power9) when running in 64-bit mode using
+	     XXPERMDI to glue the two 64-bit values back together.  */
 	  if (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
 	    {
 	      cost = 3;			/* 2 mtvsrd's, 1 xxpermdi.  */
@@ -18125,7 +18138,7 @@ rs6000_secondary_reload_direct_move (enu
 	    }
 
 	  /* Handle moving 128-bit values from VSX point registers to GPRs on
-	     power8 when running in 64-bit mode using XXPERMDI to get access to the
+	     ISA 2.07 when running in 64-bit mode using XXPERMDI to get access to the
 	     bottom 64-bit value.  */
 	  else if (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)
 	    {
@@ -18150,11 +18163,32 @@ rs6000_secondary_reload_direct_move (enu
 	}
     }
 
-  else if (size == 8)
+  if (TARGET_POWERPC64 && size == 16)
+    {
+      /* Handle moving 128-bit values from GPRs to VSX point registers on
+	 ISA 2.07 when running in 64-bit mode using XXPERMDI to glue the two
+	 64-bit values back together.  */
+      if (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
+	{
+	  cost = 3;			/* 2 mtvsrd's, 1 xxpermdi.  */
+	  icode = reg_addr[mode].reload_vsx_gpr;
+	}
+
+      /* Handle moving 128-bit values from VSX point registers to GPRs on
+	 ISA 2.07 when running in 64-bit mode using XXPERMDI to get access to the
+	 bottom 64-bit value.  */
+      else if (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)
+	{
+	  cost = 3;			/* 2 mfvsrd's, 1 xxpermdi.  */
+	  icode = reg_addr[mode].reload_gpr_vsx;
+	}
+    }
+
+  else if (!TARGET_POWERPC64 && size == 8)
     {
       /* Handle moving 64-bit values from GPRs to floating point registers on
-	 power8 when running in 32-bit mode using FMRGOW to glue the two 32-bit
-	 values back together.  Altivec register classes must be handled
+	 ISA 2.07 when running in 32-bit mode using FMRGOW to glue the two
+	 32-bit values back together.  Altivec register classes must be handled
 	 specially since a different instruction is used, and the secondary
 	 reload support requires a single instruction class in the scratch
 	 register constraint.  However, right now TFmode is not allowed in
@@ -18181,7 +18215,7 @@ rs6000_secondary_reload_direct_move (enu
 
 /* Return whether a move between two register classes can be done either
    directly (simple move) or via a pattern that uses a single extra temporary
-   (using power8's direct move in this case.  */
+   (using ISA 2.07's direct move in this case.  */
 
 static bool
 rs6000_secondary_reload_move (enum rs6000_reg_type to_type,
@@ -19220,6 +19254,11 @@ rs6000_output_move_128bit (rtx operands[
 	  if (src_gpr_p)
 	    return "#";
 
+	  if (TARGET_DIRECT_MOVE_128 && src_vsx_p)
+	    return (WORDS_BIG_ENDIAN
+		    ? "mfvsrd %0,%x1\n\tmfvsrld %L0,%x1"
+		    : "mfvsrd %L0,%x1\n\tmfvsrld %0,%x1");
+
 	  else if (TARGET_VSX && TARGET_DIRECT_MOVE && src_vsx_p)
 	    return "#";
 	}
@@ -19229,6 +19268,11 @@ rs6000_output_move_128bit (rtx operands[
 	  if (src_vsx_p)
 	    return "xxlor %x0,%x1,%x1";
 
+	  else if (TARGET_DIRECT_MOVE_128 && src_gpr_p)
+	    return (WORDS_BIG_ENDIAN
+		    ? "mtvsrdd %x0,%1,%L1"
+		    : "mtvsrdd %x0,%L1,%1");
+
 	  else if (TARGET_DIRECT_MOVE && src_gpr_p)
 	    return "#";
 	}
@@ -20490,11 +20534,12 @@ rs6000_generate_compare (rtx cmp, machin
       emit_insn (cmp);
     }
 
-  /* IEEE 128-bit support in VSX registers.  The comparison functions
-     (__cmpokf2 and __cmpukf2) returns 0..15 that is laid out the same way as
-     the PowerPC CR register would for a normal floating point comparison from
-     the fcmpo and fcmpu instructions.  */
-  else if (FLOAT128_IEEE_P (mode))
+  /* IEEE 128-bit support in VSX registers.  If we do not have IEEE 128-bit
+     hardware, the comparison functions (__cmpokf2 and __cmpukf2) returns 0..15
+     that is laid out the same way as the PowerPC CR register would for a
+     normal floating point comparison from the fcmpo and fcmpu
+     instructions.  */
+  else if (!TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode))
     {
       rtx and_reg = gen_reg_rtx (SImode);
       rtx dest = gen_reg_rtx (SImode);
@@ -20633,7 +20678,7 @@ rs6000_generate_compare (rtx cmp, machin
   /* Some kinds of FP comparisons need an OR operation;
      under flag_finite_math_only we don't bother.  */
   if (FLOAT_MODE_P (mode)
-      && !FLOAT128_IEEE_P (mode)
+      && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)
       && !flag_finite_math_only
       && !(TARGET_HARD_FLOAT && !TARGET_FPRS)
       && (code == LE || code == GE
@@ -20726,6 +20771,56 @@ rs6000_expand_float128_convert (rtx dest
   bool do_move = false;
   rtx libfunc = NULL_RTX;
   rtx dest2;
+  typedef rtx (*rtx_2func_t) (rtx, rtx);
+  rtx_2func_t hw_convert = (rtx_2func_t)0;
+  size_t kf_or_tf;
+
+  struct hw_conv_t {
+    rtx_2func_t	from_df;
+    rtx_2func_t from_sf;
+    rtx_2func_t from_si_sign;
+    rtx_2func_t from_si_uns;
+    rtx_2func_t from_di_sign;
+    rtx_2func_t from_di_uns;
+    rtx_2func_t to_df;
+    rtx_2func_t to_sf;
+    rtx_2func_t to_si_sign;
+    rtx_2func_t to_si_uns;
+    rtx_2func_t to_di_sign;
+    rtx_2func_t to_di_uns;
+  } hw_conversions[2] = {
+    /* conversions to/from KFmode */
+    {
+      gen_extenddfkf2_hw,		/* KFmode <- DFmode.  */
+      gen_extendsfkf2_hw,		/* KFmode <- SFmode.  */
+      gen_float_kfsi2_hw,		/* KFmode <- SImode (signed).  */
+      gen_floatuns_kfsi2_hw,		/* KFmode <- SImode (unsigned).  */
+      gen_float_kfdi2_hw,		/* KFmode <- DImode (signed).  */
+      gen_floatuns_kfdi2_hw,		/* KFmode <- DImode (unsigned).  */
+      gen_trunckfdf2_hw,		/* DFmode <- KFmode.  */
+      gen_trunckfsf2_hw,		/* SFmode <- KFmode.  */
+      gen_fix_kfsi2_hw,			/* SImode <- KFmode (signed).  */
+      gen_fixuns_kfsi2_hw,		/* SImode <- KFmode (unsigned).  */
+      gen_fix_kfdi2_hw,			/* DImode <- KFmode (signed).  */
+      gen_fixuns_kfdi2_hw,		/* DImode <- KFmode (unsigned).  */
+    },
+
+    /* conversions to/from TFmode */
+    {
+      gen_extenddftf2_hw,		/* TFmode <- DFmode.  */
+      gen_extendsftf2_hw,		/* TFmode <- SFmode.  */
+      gen_float_tfsi2_hw,		/* TFmode <- SImode (signed).  */
+      gen_floatuns_tfsi2_hw,		/* TFmode <- SImode (unsigned).  */
+      gen_float_tfdi2_hw,		/* TFmode <- DImode (signed).  */
+      gen_floatuns_tfdi2_hw,		/* TFmode <- DImode (unsigned).  */
+      gen_trunctfdf2_hw,		/* DFmode <- TFmode.  */
+      gen_trunctfsf2_hw,		/* SFmode <- TFmode.  */
+      gen_fix_tfsi2_hw,			/* SImode <- TFmode (signed).  */
+      gen_fixuns_tfsi2_hw,		/* SImode <- TFmode (unsigned).  */
+      gen_fix_tfdi2_hw,			/* DImode <- TFmode (signed).  */
+      gen_fixuns_tfdi2_hw,		/* DImode <- TFmode (unsigned).  */
+    },
+  };
 
   if (dest_mode == src_mode)
     gcc_unreachable ();
@@ -20745,14 +20840,23 @@ rs6000_expand_float128_convert (rtx dest
   /* Convert to IEEE 128-bit floating point.  */
   if (FLOAT128_IEEE_P (dest_mode))
     {
+      if (dest_mode == KFmode)
+	kf_or_tf = 0;
+      else if (dest_mode == TFmode)
+	kf_or_tf = 1;
+      else
+	gcc_unreachable ();
+
       switch (src_mode)
 	{
 	case DFmode:
 	  cvt = sext_optab;
+	  hw_convert = hw_conversions[kf_or_tf].from_df;
 	  break;
 
 	case SFmode:
 	  cvt = sext_optab;
+	  hw_convert = hw_conversions[kf_or_tf].from_sf;
 	  break;
 
 	case KFmode:
@@ -20765,8 +20869,29 @@ rs6000_expand_float128_convert (rtx dest
 	  break;
 
 	case SImode:
+	  if (unsigned_p)
+	    {
+	      cvt = ufloat_optab;
+	      hw_convert = hw_conversions[kf_or_tf].from_si_uns;
+	    }
+	  else
+	    {
+	      cvt = sfloat_optab;
+	      hw_convert = hw_conversions[kf_or_tf].from_si_sign;
+	    }
+	  break;
+
 	case DImode:
-	  cvt = (unsigned_p) ? ufloat_optab : sfloat_optab;
+	  if (unsigned_p)
+	    {
+	      cvt = ufloat_optab;
+	      hw_convert = hw_conversions[kf_or_tf].from_di_uns;
+	    }
+	  else
+	    {
+	      cvt = sfloat_optab;
+	      hw_convert = hw_conversions[kf_or_tf].from_di_sign;
+	    }
 	  break;
 
 	default:
@@ -20777,14 +20902,23 @@ rs6000_expand_float128_convert (rtx dest
   /* Convert from IEEE 128-bit floating point.  */
   else if (FLOAT128_IEEE_P (src_mode))
     {
+      if (src_mode == KFmode)
+	kf_or_tf = 0;
+      else if (src_mode == TFmode)
+	kf_or_tf = 1;
+      else
+	gcc_unreachable ();
+
       switch (dest_mode)
 	{
 	case DFmode:
 	  cvt = trunc_optab;
+	  hw_convert = hw_conversions[kf_or_tf].to_df;
 	  break;
 
 	case SFmode:
 	  cvt = trunc_optab;
+	  hw_convert = hw_conversions[kf_or_tf].to_sf;
 	  break;
 
 	case KFmode:
@@ -20797,8 +20931,29 @@ rs6000_expand_float128_convert (rtx dest
 	  break;
 
 	case SImode:
+	  if (unsigned_p)
+	    {
+	      cvt = ufix_optab;
+	      hw_convert = hw_conversions[kf_or_tf].to_si_uns;
+	    }
+	  else
+	    {
+	      cvt = sfix_optab;
+	      hw_convert = hw_conversions[kf_or_tf].to_si_sign;
+	    }
+	  break;
+
 	case DImode:
-	  cvt = (unsigned_p) ? ufix_optab : sfix_optab;
+	  if (unsigned_p)
+	    {
+	      cvt = ufix_optab;
+	      hw_convert = hw_conversions[kf_or_tf].to_di_uns;
+	    }
+	  else
+	    {
+	      cvt = sfix_optab;
+	      hw_convert = hw_conversions[kf_or_tf].to_di_sign;
+	    }
 	  break;
 
 	default:
@@ -20817,6 +20972,10 @@ rs6000_expand_float128_convert (rtx dest
   if (do_move)
     emit_move_insn (dest, gen_lowpart (dest_mode, src));
 
+  /* Handle conversion if we have hardware support.  */
+  else if (TARGET_FLOAT128_HW && hw_convert)
+    emit_insn ((hw_convert) (dest, src));
+
   /* Call an external function to do the conversion.  */
   else if (cvt != unknown_optab)
     {
@@ -20837,6 +20996,92 @@ rs6000_expand_float128_convert (rtx dest
   return;
 }
 
+/* Split a conversion from __float128 to an integer type into separate insns.
+   OPERANDS points to the destination, source, and V2DI temporary
+   register. CODE is either FIX or UNSIGNED_FIX.  */
+
+void
+convert_float128_to_int (rtx *operands, enum rtx_code code)
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx tmp = operands[2];
+  rtx cvt;
+  rtvec cvt_vec;
+  rtx cvt_unspec;
+  rtvec move_vec;
+  rtx move_unspec;
+
+  if (GET_CODE (tmp) == SCRATCH)
+    tmp = gen_reg_rtx (V2DImode);
+
+  if (MEM_P (dest))
+    dest = rs6000_address_for_fpconvert (dest);
+
+  /* Generate the actual convert insn of the form:
+     (set (tmp) (unspec:V2DI [(fix:SI (reg:KF))] UNSPEC_IEEE128_CONVERT)).  */
+  cvt = gen_rtx_fmt_e (code, GET_MODE (dest), src);
+  cvt_vec = gen_rtvec (1, cvt);
+  cvt_unspec = gen_rtx_UNSPEC (V2DImode, cvt_vec, UNSPEC_IEEE128_CONVERT);
+  emit_insn (gen_rtx_SET (tmp, cvt_unspec));
+
+  /* Generate the move insn of the form:
+     (set (dest:SI) (unspec:SI [(tmp:V2DI)] UNSPEC_IEEE128_MOVE)).  */
+  move_vec = gen_rtvec (1, tmp);
+  move_unspec = gen_rtx_UNSPEC (GET_MODE (dest), move_vec, UNSPEC_IEEE128_MOVE);
+  emit_insn (gen_rtx_SET (dest, move_unspec));
+}
+
+/* Split a conversion from an integer type to __float128 into separate insns.
+   OPERANDS points to the destination, source, and V2DI temporary
+   register. CODE is either FLOAT or UNSIGNED_FLOAT.  */
+
+void
+convert_int_to_float128 (rtx *operands, enum rtx_code code)
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx tmp = operands[2];
+  rtx cvt;
+  rtvec cvt_vec;
+  rtx cvt_unspec;
+  rtvec move_vec;
+  rtx move_unspec;
+  rtx unsigned_flag;
+
+  if (GET_CODE (tmp) == SCRATCH)
+    tmp = gen_reg_rtx (V2DImode);
+
+  if (MEM_P (src))
+    src = rs6000_address_for_fpconvert (src);
+
+  /* Generate the move of the integer into the Altivec register of the form:
+     (set (tmp:V2DI) (unspec:V2DI [(src:SI)
+				   (const_int 0)] UNSPEC_IEEE128_MOVE)).
+
+     or:
+     (set (tmp:V2DI) (unspec:V2DI [(src:DI)] UNSPEC_IEEE128_MOVE)).  */
+
+  if (GET_MODE (src) == SImode)
+    {
+      unsigned_flag = (code == UNSIGNED_FLOAT) ? const1_rtx : const0_rtx;
+      move_vec = gen_rtvec (2, src, unsigned_flag);
+    }
+  else
+    move_vec = gen_rtvec (1, src);
+
+  move_unspec = gen_rtx_UNSPEC (V2DImode, move_vec, UNSPEC_IEEE128_MOVE);
+  emit_insn (gen_rtx_SET (tmp, move_unspec));
+
+  /* Generate the actual convert insn of the form:
+     (set (dest:KF) (float:KF (unspec:DI [(tmp:V2DI)]
+					 UNSPEC_IEEE128_CONVERT))).  */
+  cvt_vec = gen_rtvec (1, tmp);
+  cvt_unspec = gen_rtx_UNSPEC (DImode, cvt_vec, UNSPEC_IEEE128_CONVERT);
+  cvt = gen_rtx_fmt_e (code, GET_MODE (dest), cvt_unspec);
+  emit_insn (gen_rtx_SET (dest, cvt));
+}
+
 \f
 /* Emit the RTL for an sISEL pattern.  */
 
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 230335)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -760,31 +760,31 @@ (define_split
   "")
 
 (define_insn "*vsx_mov<mode>"
-  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?<VSa>,?<VSa>,wQ,?&r,??Y,??r,??r,<VSr>,?<VSa>,*r,v,wZ, v")
-	(match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,<VSa>,Z,<VSa>,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]
+  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?<VSa>,?<VSa>,r,we,wQ,?&r,??Y,??r,??r,<VSr>,?<VSa>,*r,v,wZ,v")
+	(match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,<VSa>,Z,<VSa>,we,b,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode) 
        || register_operand (operands[1], <MODE>mode))"
 {
   return rs6000_output_move_128bit (operands);
 }
-  [(set_attr "type" "vecstore,vecload,vecsimple,vecstore,vecload,vecsimple,load,store,store,load, *,vecsimple,vecsimple,*, *,vecstore,vecload")
-   (set_attr "length" "4,4,4,4,4,4,12,12,12,12,16,4,4,*,16,4,4")])
+  [(set_attr "type" "vecstore,vecload,vecsimple,vecstore,vecload,vecsimple,mffgpr,mftgpr,load,store,store,load, *,vecsimple,vecsimple,*, *,vecstore,vecload")
+   (set_attr "length" "4,4,4,4,4,4,8,4,12,12,12,12,16,4,4,*,16,4,4")])
 
 ;; Unlike other VSX moves, allow the GPRs even for reloading, since a normal
 ;; use of TImode is for unions.  However for plain data movement, slightly
 ;; favor the vector loads
 (define_insn "*vsx_movti_64bit"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v,v,wZ,wQ,&r,Y,r,r,?r")
-	(match_operand:TI 1 "input_operand" "wa,Z,wa,O,W,wZ,v,r,wQ,r,Y,r,n"))]
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,r,we,v,v,wZ,wQ,&r,Y,r,r,?r")
+	(match_operand:TI 1 "input_operand" "wa,Z,wa,O,we,b,W,wZ,v,r,wQ,r,Y,r,n"))]
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (TImode)
    && (register_operand (operands[0], TImode) 
        || register_operand (operands[1], TImode))"
 {
   return rs6000_output_move_128bit (operands);
 }
-  [(set_attr "type" "vecstore,vecload,vecsimple,vecsimple,vecsimple,vecstore,vecload,store,load,store,load,*,*")
-   (set_attr "length" "4,4,4,4,16,4,4,8,8,8,8,8,8")])
+  [(set_attr "type" "vecstore,vecload,vecsimple,vecsimple,mffgpr,mftgpr,vecsimple,vecstore,vecload,store,load,store,load,*,*")
+   (set_attr "length" "4,4,4,4,8,4,16,4,4,8,8,8,8,8,8")])
 
 (define_insn "*vsx_movti_32bit"
   [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,Q,Y,????r,????r,????r,r")
@@ -1909,11 +1909,11 @@ (define_expand "vsx_extract_<mode>"
 ;; Optimize cases were we can do a simple or direct move.
 ;; Or see if we can avoid doing the move at all
 (define_insn "*vsx_extract_<mode>_internal1"
-  [(set (match_operand:<VS_scalar> 0 "register_operand" "=d,<VS_64reg>,r")
+  [(set (match_operand:<VS_scalar> 0 "register_operand" "=d,<VS_64reg>,r,r")
 	(vec_select:<VS_scalar>
-	 (match_operand:VSX_D 1 "register_operand" "d,<VS_64reg>,<VS_64dm>")
+	 (match_operand:VSX_D 1 "register_operand" "d,<VS_64reg>,<VS_64dm>,<VS_64dm>")
 	 (parallel
-	  [(match_operand:QI 2 "vsx_scalar_64bit" "wD,wD,wD")])))]
+	  [(match_operand:QI 2 "vsx_scalar_64bit" "wD,wD,wD,wL")])))]
   "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
 {
   int op0_regno = REGNO (operands[0]);
@@ -1923,14 +1923,16 @@ (define_insn "*vsx_extract_<mode>_intern
     return "nop";
 
   if (INT_REGNO_P (op0_regno))
-    return "mfvsrd %0,%x1";
+    return ((INTVAL (operands[2]) == VECTOR_ELEMENT_MFVSRLD_64BIT)
+	    ? "mfvsrld %0,%x1"
+	    : "mfvsrd %0,%x1");
 
   if (FP_REGNO_P (op0_regno) && FP_REGNO_P (op1_regno))
     return "fmr %0,%1";
 
   return "xxlor %x0,%x1,%x1";
 }
-  [(set_attr "type" "fp,vecsimple,mftgpr")
+  [(set_attr "type" "fp,vecsimple,mftgpr,mftgpr")
    (set_attr "length" "4")])
 
 (define_insn "*vsx_extract_<mode>_internal2"
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 230335)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -516,6 +516,10 @@ extern int rs6000_vector_align[];
    with scalar instructions.  */
 #define VECTOR_ELEMENT_SCALAR_64BIT	((BYTES_BIG_ENDIAN) ? 0 : 1)
 
+/* Element number of the 64-bit value in a 128-bit vector that can be accessed
+   with the ISA 3.0 MFVSRLD instructions.  */
+#define VECTOR_ELEMENT_MFVSRLD_64BIT	((BYTES_BIG_ENDIAN) ? 1 : 0)
+
 /* Alignment options for fields in structures for sub-targets following
    AIX-like ABI.
    ALIGN_POWER word-aligns FP doubles (default AIX ABI).
@@ -567,10 +571,13 @@ extern int rs6000_vector_align[];
 #define TARGET_FCTIWUZ	TARGET_POPCNTD
 #define TARGET_CTZ	TARGET_MODULO
 #define TARGET_EXTSWSLI	(TARGET_MODULO && TARGET_POWERPC64)
+#define TARGET_MADDLD	(TARGET_MODULO && TARGET_POWERPC64)
 
 #define TARGET_XSCVDPSPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_XSCVSPDPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_VADDUQM		(TARGET_P8_VECTOR && TARGET_POWERPC64)
+#define TARGET_DIRECT_MOVE_128	(TARGET_P9_VECTOR && TARGET_DIRECT_MOVE \
+				 && TARGET_POWERPC64)
 
 /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
    in power7, so conditionalize them on p8 features.  TImode syncs need quad
@@ -1517,6 +1524,7 @@ enum r6000_reg_class_enum {
   RS6000_CONSTRAINT_v,		/* Altivec registers */
   RS6000_CONSTRAINT_wa,		/* Any VSX register */
   RS6000_CONSTRAINT_wd,		/* VSX register for V2DF */
+  RS6000_CONSTRAINT_we,		/* VSX register if ISA 3.0 vector. */
   RS6000_CONSTRAINT_wf,		/* VSX register for V4SF */
   RS6000_CONSTRAINT_wg,		/* FPR register for -mmfpgpr */
   RS6000_CONSTRAINT_wh,		/* FPR register for direct moves.  */
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 230335)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -143,6 +143,9 @@ (define_c_enum "unspec"
    UNSPEC_STACK_CHECK
    UNSPEC_FUSION_P9
    UNSPEC_FUSION_ADDIS
+   UNSPEC_ROUND_TO_ODD
+   UNSPEC_IEEE128_MOVE
+   UNSPEC_IEEE128_CONVERT
   ])
 
 ;;
@@ -381,6 +384,8 @@ (define_mode_iterator FMA_F [
   (V2SF "TARGET_PAIRED_FLOAT")
   (V4SF "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)")
   (V2DF "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V2DFmode)")
+  (KF "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (KFmode)")
+  (TF "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (TFmode)")
   ])
 
 ; Floating point move iterators to combine binary and decimal moves
@@ -485,10 +490,10 @@ (define_mode_attr Ftrad		[(SF "s") (DF "
 (define_mode_attr Fvsx		[(SF "sp") (DF	"dp")])
 
 ; SF/DF constraint for arithmetic on traditional floating point registers
-(define_mode_attr Ff		[(SF "f") (DF "d")])
+(define_mode_attr Ff		[(SF "f") (DF "d") (DI "d")])
 
 ; SF/DF constraint for arithmetic on VSX registers
-(define_mode_attr Fv		[(SF "wy") (DF "ws")])
+(define_mode_attr Fv		[(SF "wy") (DF "ws") (DI "wi")])
 
 ; SF/DF constraint for arithmetic on altivec registers
 (define_mode_attr Fa		[(SF "wu") (DF "wv")])
@@ -510,9 +515,31 @@ (define_code_attr return_str [(return ""
 (define_code_iterator iorxor [ior xor])
 
 ; Signed/unsigned variants of ops.
-(define_code_iterator any_extend [sign_extend zero_extend])
-(define_code_attr u [(sign_extend "") (zero_extend "u")])
-(define_code_attr su [(sign_extend "s") (zero_extend "u")])
+(define_code_iterator any_extend	[sign_extend zero_extend])
+(define_code_iterator any_fix		[fix unsigned_fix])
+(define_code_iterator any_float		[float unsigned_float])
+
+(define_code_attr u  [(sign_extend	"")
+		      (zero_extend	"u")])
+
+(define_code_attr su [(sign_extend	"s")
+		      (zero_extend	"u")
+		      (fix		"s")
+		      (unsigned_fix	"u")
+		      (float		"s")
+		      (unsigned_float	"u")])
+
+(define_code_attr az [(sign_extend	"a")
+		      (zero_extend	"z")
+		      (fix		"a")
+		      (unsigned_fix	"z")
+		      (float		"a")
+		      (unsigned_float	"z")])
+
+(define_code_attr uns [(fix		"")
+		       (unsigned_fix	"uns")
+		       (float		"")
+		       (unsigned_float	"uns")])
 
 ; Various instructions that come in SI and DI forms.
 ; A generic w/d attribute, for things like cmpw/cmpd.
@@ -2815,6 +2842,14 @@ (define_expand "<u>mul<mode><dmode>3"
   DONE;
 })
 
+(define_insn "*maddld4"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+	(plus:DI (mult:DI (match_operand:DI 1 "gpc_reg_operand" "r")
+			  (match_operand:DI 2 "gpc_reg_operand" "r"))
+		 (match_operand:DI 3 "gpc_reg_operand" "r")))]
+  "TARGET_MADDLD"
+  "maddld %0,%1,%2,%3"
+  [(set_attr "type" "mul")])
 
 (define_insn "udiv<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
@@ -7003,7 +7038,16 @@ (define_expand "neg<mode>2"
 {
   if (FLOAT128_IEEE_P (<MODE>mode))
     {
-      if (TARGET_FLOAT128)
+      if (TARGET_FLOAT128_HW)
+	{
+	  if (<MODE>mode == TFmode)
+	    emit_insn (gen_negtf2_hw (operands[0], operands[1]));
+	  else if (<MODE>mode == KFmode)
+	    emit_insn (gen_negkf2_hw (operands[0], operands[1]));
+	  else
+	    gcc_unreachable ();
+	}
+      else if (TARGET_FLOAT128)
 	{
 	  if (<MODE>mode == TFmode)
 	    emit_insn (gen_ieee_128bit_vsx_negtf2 (operands[0], operands[1]));
@@ -7053,7 +7097,17 @@ (define_expand "abs<mode>2"
 
   if (FLOAT128_IEEE_P (<MODE>mode))
     {
-      if (TARGET_FLOAT128)
+      if (TARGET_FLOAT128_HW)
+	{
+	  if (<MODE>mode == TFmode)
+	    emit_insn (gen_abstf2_hw (operands[0], operands[1]));
+	  else if (<MODE>mode == KFmode)
+	    emit_insn (gen_abskf2_hw (operands[0], operands[1]));
+	  else
+	    FAIL;
+	  DONE;
+	}
+      else if (TARGET_FLOAT128)
 	{
 	  if (<MODE>mode == TFmode)
 	    emit_insn (gen_ieee_128bit_vsx_abstf2 (operands[0], operands[1]));
@@ -7140,7 +7194,7 @@ (define_insn_and_split "ieee_128bit_vsx_
   [(set (match_operand:IEEE128 0 "register_operand" "=wa")
 	(neg:IEEE128 (match_operand:IEEE128 1 "register_operand" "wa")))
    (clobber (match_scratch:V16QI 2 "=v"))]
-  "TARGET_FLOAT128"
+  "TARGET_FLOAT128 && !TARGET_FLOAT128_HW"
   "#"
   "&& 1"
   [(parallel [(set (match_dup 0)
@@ -7160,7 +7214,7 @@ (define_insn "*ieee_128bit_vsx_neg<mode>
   [(set (match_operand:IEEE128 0 "register_operand" "=wa")
 	(neg:IEEE128 (match_operand:IEEE128 1 "register_operand" "wa")))
    (use (match_operand:V16QI 2 "register_operand" "=v"))]
-  "TARGET_FLOAT128"
+  "TARGET_FLOAT128 && !TARGET_FLOAT128_HW"
   "xxlxor %x0,%x1,%x2"
   [(set_attr "type" "vecsimple")])
 
@@ -7169,7 +7223,7 @@ (define_insn_and_split "ieee_128bit_vsx_
   [(set (match_operand:IEEE128 0 "register_operand" "=wa")
 	(abs:IEEE128 (match_operand:IEEE128 1 "register_operand" "wa")))
    (clobber (match_scratch:V16QI 2 "=v"))]
-  "TARGET_FLOAT128 && FLOAT128_IEEE_P (<MODE>mode)"
+  "TARGET_FLOAT128 && !TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "#"
   "&& 1"
   [(parallel [(set (match_dup 0)
@@ -7189,7 +7243,7 @@ (define_insn "*ieee_128bit_vsx_abs<mode>
   [(set (match_operand:IEEE128 0 "register_operand" "=wa")
 	(abs:IEEE128 (match_operand:IEEE128 1 "register_operand" "wa")))
    (use (match_operand:V16QI 2 "register_operand" "=v"))]
-  "TARGET_FLOAT128"
+  "TARGET_FLOAT128 && !TARGET_FLOAT128_HW"
   "xxlandc %x0,%x1,%x2"
   [(set_attr "type" "vecsimple")])
 
@@ -7200,7 +7254,7 @@ (define_insn_and_split "*ieee_128bit_vsx
 	 (abs:IEEE128
 	  (match_operand:IEEE128 1 "register_operand" "wa"))))
    (clobber (match_scratch:V16QI 2 "=v"))]
-  "TARGET_FLOAT128 && FLOAT128_IEEE_P (<MODE>mode)"
+  "TARGET_FLOAT128 && !TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "#"
   "&& 1"
   [(parallel [(set (match_dup 0)
@@ -7222,7 +7276,7 @@ (define_insn "*ieee_128bit_vsx_nabs<mode
 	 (abs:IEEE128
 	  (match_operand:IEEE128 1 "register_operand" "wa"))))
    (use (match_operand:V16QI 2 "register_operand" "=v"))]
-  "TARGET_FLOAT128"
+  "TARGET_FLOAT128 && !TARGET_FLOAT128_HW"
   "xxlor %x0,%x1,%x2"
   [(set_attr "type" "vecsimple")])
 
@@ -7480,7 +7534,10 @@ (define_split
 	(match_operand:FMOVE128_GPR 1 "input_operand" ""))]
   "reload_completed
    && (int_reg_operand (operands[0], <MODE>mode)
-       || int_reg_operand (operands[1], <MODE>mode))"
+       || int_reg_operand (operands[1], <MODE>mode))
+   && (!TARGET_DIRECT_MOVE_128
+       || (!vsx_register_operand (operands[0], <MODE>mode)
+           && !vsx_register_operand (operands[1], <MODE>mode)))"
   [(pc)]
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; })
 
@@ -12998,6 +13055,332 @@ (define_insn "pack<mode>"
 
 
 \f
+;; ISA 3.0 IEEE 128-bit floating point support.
+
+(define_insn "add<mode>3"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(plus:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")
+	 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsaddqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "sub<mode>3"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(minus:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")
+	 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xssubqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "mul<mode>3"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(mult:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")
+	 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsmulqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "div<mode>3"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(div:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")
+	 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsdivqp %0,%1,%2"
+  [(set_attr "type" "vecdiv")])
+
+(define_insn "sqrt<mode>2"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(sqrt:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+   "xssqrtqp %0,%1"
+  [(set_attr "type" "vecdiv")])
+
+(define_insn "copysign<mode>3"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(unspec:IEEE128
+	 [(match_operand:IEEE128 1 "altivec_register_operand" "v")
+	  (match_operand:IEEE128 2 "altivec_register_operand" "v")]
+	 UNSPEC_COPYSIGN))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+   "xscpsgnqp %0,%2,%1"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "neg<mode>2_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(neg:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsnegqp %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+
+(define_insn "abs<mode>2_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(abs:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsabsqp %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+
+(define_insn "*nabs<mode>2_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(neg:IEEE128
+	 (abs:IEEE128
+	  (match_operand:IEEE128 1 "altivec_register_operand" "v"))))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsnabsqp %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+;; Initially don't worry about doing fusion
+(define_insn "*fma<mode>4_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(fma:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "%v")
+	 (match_operand:IEEE128 2 "altivec_register_operand" "v")
+	 (match_operand:IEEE128 3 "altivec_register_operand" "0")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsmaddqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*fms<mode>4_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(fma:IEEE128
+	 (match_operand:IEEE128 1 "altivec_register_operand" "%v")
+	 (match_operand:IEEE128 2 "altivec_register_operand" "v")
+	 (neg:IEEE128
+	  (match_operand:IEEE128 3 "altivec_register_operand" "0"))))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsmsubqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*nfma<mode>4_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(neg:IEEE128
+	 (fma:IEEE128
+	  (match_operand:IEEE128 1 "altivec_register_operand" "%v")
+	  (match_operand:IEEE128 2 "altivec_register_operand" "v")
+	  (match_operand:IEEE128 3 "altivec_register_operand" "0"))))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsnmaddqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*nfms<mode>4_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(neg:IEEE128
+	 (fma:IEEE128
+	  (match_operand:IEEE128 1 "altivec_register_operand" "%v")
+	  (match_operand:IEEE128 2 "altivec_register_operand" "v")
+	  (neg:IEEE128
+	   (match_operand:IEEE128 3 "altivec_register_operand" "0")))))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xsnmsubqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "extend<SFDF:mode><IEEE128:mode>2_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(float_extend:IEEE128
+	 (match_operand:SFDF 1 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<IEEE128:MODE>mode)"
+  "xscvdpqp %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "trunc<mode>df2_hw"
+  [(set (match_operand:DF 0 "altivec_register_operand" "=v")
+	(float_truncate:DF
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xscvqpdp %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+;; There is no KFmode -> SFmode instruction. Preserve the accuracy by doing
+;; the KFmode -> DFmode conversion using round to odd rather than the normal
+;; conversion
+(define_insn_and_split "trunc<mode>sf2_hw"
+  [(set (match_operand:SF 0 "vsx_register_operand" "=wy")
+	(float_truncate:SF
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")))
+   (clobber (match_scratch:DF 2 "=v"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "#"
+  "&& 1"
+  [(set (match_dup 2)
+	(unspec:DF [(match_dup 1)] UNSPEC_ROUND_TO_ODD))
+   (set (match_dup 0)
+	(float_truncate:SF (match_dup 2)))]
+{
+  if (GET_CODE (operands[2]) == SCRATCH)
+    operands[2] = gen_reg_rtx (DFmode);
+}
+  [(set_attr "type" "vecfloat")
+   (set_attr "length" "8")])
+
+;; At present SImode is not allowed in VSX registers at all, and DImode is only
+;; allowed in the traditional floating point registers. Use V2DImode so that
+;; we can get a value in an Altivec register.
+
+(define_insn_and_split "fix<uns>_<mode>si2_hw"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,Z")
+	(any_fix:SI (match_operand:IEEE128 1 "altivec_register_operand" "v,v")))
+   (clobber (match_scratch:V2DI 2 "=v,v"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "#"
+  "&& 1"
+  [(pc)]
+{
+  convert_float128_to_int (operands, <CODE>);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "mftgpr,fpstore")])
+
+(define_insn_and_split "fix<uns>_<mode>di2_hw"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=wr,wi,Z")
+	(any_fix:DI (match_operand:IEEE128 1 "altivec_register_operand" "v,v,v")))
+   (clobber (match_scratch:V2DI 2 "=v,v,v"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "#"
+  "&& 1"
+  [(pc)]
+{
+  convert_float128_to_int (operands, <CODE>);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "mftgpr,vecsimple,fpstore")])
+
+(define_insn_and_split "float<uns>_<mode>si2_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v,v")
+	(any_float:IEEE128 (match_operand:SI 1 "nonimmediate_operand" "r,Z")))
+   (clobber (match_scratch:V2DI 2 "=v,v"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "#"
+  "&& 1"
+  [(pc)]
+{
+  convert_int_to_float128 (operands, <CODE>);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "vecfloat")])
+
+(define_insn_and_split "float<uns>_<mode>di2_hw"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v,v,v")
+	(any_float:IEEE128 (match_operand:DI 1 "nonimmediate_operand" "wi,wr,Z")))
+   (clobber (match_scratch:V2DI 2 "=v,v,v"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "#"
+  "&& 1"
+  [(pc)]
+{
+  convert_int_to_float128 (operands, <CODE>);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "vecfloat")])
+
+;; Integer conversion instructions, using V2DImode to get an Altivec register
+(define_insn "*xscvqp<su>wz_<mode>"
+  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
+	(unspec:V2DI
+	 [(any_fix:SI
+	   (match_operand:IEEE128 1 "altivec_register_operand" "v"))]
+	 UNSPEC_IEEE128_CONVERT))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xscvqp<su>wz %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*xscvqp<su>dz_<mode>"
+  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
+	(unspec:V2DI
+	 [(any_fix:DI
+	   (match_operand:IEEE128 1 "altivec_register_operand" "v"))]
+	 UNSPEC_IEEE128_CONVERT))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xscvqp<su>dz %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*xscv<su>dqp_<mode>"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+	(any_float:IEEE128
+	 (unspec:DI [(match_operand:V2DI 1 "altivec_register_operand" "v")]
+		    UNSPEC_IEEE128_CONVERT)))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xscv<su>dqp %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "*ieee128_mfvsrd"
+  [(set (match_operand:DI 0 "reg_or_indexed_operand" "=wr,Z,wi")
+	(unspec:DI [(match_operand:V2DI 1 "altivec_register_operand" "v,v,v")]
+		   UNSPEC_IEEE128_MOVE))]
+  "TARGET_FLOAT128_HW && TARGET_POWERPC64"
+  "@
+   mfvsrd %0,%x1
+   stxsdx %x1,%y0
+   xxlor %x0,%x1,%x1"
+  [(set_attr "type" "mftgpr,fpstore,vecsimple")])
+
+(define_insn "*ieee128_mfvsrwz"
+  [(set (match_operand:SI 0 "reg_or_indexed_operand" "=r,Z")
+	(unspec:SI [(match_operand:V2DI 1 "altivec_register_operand" "v,v")]
+		   UNSPEC_IEEE128_MOVE))]
+  "TARGET_FLOAT128_HW"
+  "@
+   mfvsrwz %0,%x1
+   stxsiwx %x1,%y0"
+  [(set_attr "type" "mftgpr,fpstore")])
+
+;; 0 says do sign-extension, 1 says zero-extension
+(define_insn "*ieee128_mtvsrw"
+  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v,v,v,v")
+	(unspec:V2DI [(match_operand:SI 1 "nonimmediate_operand" "r,Z,r,Z")
+		      (match_operand:SI 2 "const_0_to_1_operand" "O,O,n,n")]
+		     UNSPEC_IEEE128_MOVE))]
+  "TARGET_FLOAT128_HW"
+  "@
+   mtvsrwa %x0,%1
+   lxsiwax %x0,%y1
+   mtvsrwz %x0,%1
+   lxsiwzx %x0,%y1"
+  [(set_attr "type" "mffgpr,fpload,mffgpr,fpload")])
+
+
+(define_insn "*ieee128_mtvsrd"
+  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v,v,v")
+	(unspec:V2DI [(match_operand:DI 1 "nonimmediate_operand" "wr,Z,wi")]
+		     UNSPEC_IEEE128_MOVE))]
+  "TARGET_FLOAT128_HW"
+  "@
+   mtvsrd %x0,%1
+   lxsdx %x0,%y1
+   xxlor %x0,%x1,%x1"
+  [(set_attr "type" "mffgpr,fpload,vecsimple")])
+
+;; IEEE 128-bit instructions with round to odd semantics
+(define_insn "*trunc<mode>df2_odd"
+  [(set (match_operand:DF 0 "vsx_register_operand" "=v")
+	(unspec:DF [(match_operand:IEEE128 1 "altivec_register_operand" "v")]
+		   UNSPEC_ROUND_TO_ODD))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "xscvqpdpo %0,%1"
+  [(set_attr "type" "vecfloat")])
+
+;; IEEE 128-bit comparisons
+(define_insn "*cmp<mode>_hw"
+  [(set (match_operand:CCFP 0 "cc_reg_operand" "=y")
+	(compare:CCFP (match_operand:IEEE128 1 "altivec_register_operand" "v")
+		      (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+   "xscmpuqp %0,%1,%2"
+  [(set_attr "type" "fpcompare")])
+
+\f
 
 (include "sync.md")
 (include "vector.md")
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 230335)
+++ gcc/doc/md.texi	(working copy)
@@ -3121,9 +3121,28 @@ asm ("xvadddp %0,%1,%2" : "=wa" (v1) : "
 
 is not correct.
 
+If an instruction only takes Altivec registers, you do not want to use
+@code{%x<n>}.
+
+@smallexample
+asm ("xsaddqp %0,%1,%2" : "=v" (v1) : "v" (v2), "v" (v3));
+@end smallexample
+
+is correct because the @code{xsaddqp} instruction only takes Altivec
+registers, while:
+
+@smallexample
+asm ("xsaddqp %x0,%x1,%x2" : "=v" (v1) : "v" (v2), "v" (v3));
+@end smallexample
+
+is incorrect.
+
 @item wd
 VSX vector register to hold vector double data or NO_REGS.
 
+@item we
+VSX register if the @option{-mpower9-vector} and @option{-m64} options were used, or NO_REGS.
+
 @item wf
 VSX vector register to hold vector float data or NO_REGS.
 
@@ -3187,6 +3206,16 @@ Floating point register if the LFIWZX in
 @item wD
 Int constant that is the element number of the 64-bit scalar in a vector.
 
+@item wF
+Memory operand suitable for power9 fusion load/stores.
+
+@item wG
+Memory operand suitable for TOC fusion memory references.
+
+@item wL
+Int constant that is the element number that the MFVSRLD instruction
+targets.
+
 @item wQ
 A memory address that will work with the @code{lq} and @code{stq}
 instructions.
Index: gcc/testsuite/gcc.target/powerpc/direct-move-vector.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-vector.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-vector.c	(revision 0)
@@ -0,0 +1,33 @@
+/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+/* Check code generation for direct move for vector types.  */
+
+void
+test (vector double *p)
+{
+  vector double v1 = *p;
+  vector double v2;
+  vector double v3;
+  vector double v4;
+
+  /* Force memory -> FPR load.  */
+  __asm__ (" # reg %x0" : "+d" (v1));
+
+  /* Force VSX -> GPR direct move.  */
+  v2 = v1;
+  __asm__ (" # reg %0" : "+r" (v2));
+
+  /* Force GPR -> Altivec direct move.  */
+  v3 = v2;
+  __asm__ (" # reg %x0" : "+v" (v3));
+  *p = v3;
+}
+
+/* { dg-final { scan-assembler "mfvsrd"  } } */
+/* { dg-final { scan-assembler "mfvsrld" } } */
+/* { dg-final { scan-assembler "mtvsrdd" } } */
+
+
Index: gcc/testsuite/gcc.target/powerpc/float128-hw.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/float128-hw.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/float128-hw.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_float128_hw_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+__float128 f128_add (__float128 a, __float128 b) { return a+b; }
+__float128 f128_sub (__float128 a, __float128 b) { return a-b; }
+__float128 f128_mul (__float128 a, __float128 b) { return a*b; }
+__float128 f128_div (__float128 a, __float128 b) { return a/b; }
+__float128 f128_fma (__float128 a, __float128 b, __float128 c) { return (a*b)+c; }
+long f128_cmove (__float128 a, __float128 b, long c, long d) { return (a == b) ? c : d; }
+
+/* { dg-final { scan-assembler "xsaddqp"  } } */
+/* { dg-final { scan-assembler "xssubqp"  } } */
+/* { dg-final { scan-assembler "xsmulqp"  } } */
+/* { dg-final { scan-assembler "xsdivqp"  } } */
+/* { dg-final { scan-assembler "xsmaddqp" } } */
+/* { dg-final { scan-assembler "xscmpuqp" } } */
Index: gcc/testsuite/gcc.target/powerpc/maddld.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/maddld.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/maddld.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+long
+s_madd (long a, long b, long c)
+{
+  return (a * b) + c;
+}
+
+unsigned long
+u_madd (unsigned long a, unsigned long b, unsigned long c)
+{
+  return (a * b) + c;
+}
+
+/* { dg-final { scan-assembler-times "maddld " 2 } } */
+/* { dg-final { scan-assembler-not   "mulld "    } } */
+/* { dg-final { scan-assembler-not   "add "      } } */

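The new *maddld4 pattern, enabled by TARGET_MADDLD, folds a 64-bit multiply and an add into a single instruction. The C model below is a minimal sketch that assumes only the RTL semantics the pattern shows: the low 64 bits of (a * b) + c. The function names are hypothetical and are not part of GCC. The low 64 bits of a product-plus-sum are the same for signed and unsigned operands, which is why maddld.c expects maddld for both s_madd and u_madd.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the (plus:DI (mult:DI a b) c) RTL matched by
   *maddld4.  Only the low 64 bits of (a * b) + c are kept.  Unsigned
   arithmetic in C wraps modulo 2^64, which yields exactly those bits.  */
static inline uint64_t
maddld_model (uint64_t a, uint64_t b, uint64_t c)
{
  return a * b + c;
}

/* The signed case produces the same bit pattern: convert to unsigned, do
   the wrapping arithmetic, then convert back (two's complement on GCC's
   targets).  */
static inline int64_t
maddld_model_signed (int64_t a, int64_t b, int64_t c)
{
  return (int64_t) maddld_model ((uint64_t) a, (uint64_t) b, (uint64_t) c);
}
```

For example, maddld_model (UINT64_MAX, 2, 3) wraps around to 1, and maddld_model_signed (-3, 5, 7) gives -8.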
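The trunc<mode>sf2_hw split depends on the round-to-odd technique. It first rounds KFmode to DFmode with round-to-odd (xscvqpdpo), then rounds DFmode to SFmode in the normal way. Because DFmode keeps many more than two extra significand bits beyond SFmode, this two-step path should agree with a single correctly rounded conversion. The sketch below models the idea on integer significands rather than real floating-point values, so it does not represent exponents, subnormals, or the hardware; the helper names are hypothetical. It shows that rounding to nearest-even twice can double-round, whereas round-to-odd followed by round-to-nearest-even does not, provided the second step drops at least two bits.

```c
#include <assert.h>
#include <stdint.h>

/* Round integer significand X to nearest, ties to even, dropping the
   S low bits (S >= 1).  */
static uint32_t
round_nearest_even (uint32_t x, unsigned s)
{
  uint32_t q = x >> s;
  uint32_t r = x & ((1u << s) - 1);
  uint32_t half = 1u << (s - 1);

  if (r > half || (r == half && (q & 1)))
    q++;
  return q;
}

/* Round to odd: truncate, then set the low bit if any dropped bit was
   nonzero, so the result remembers that it was inexact.  */
static uint32_t
round_to_odd (uint32_t x, unsigned s)
{
  uint32_t q = x >> s;

  if (x & ((1u << s) - 1))
    q |= 1;
  return q;
}

/* Drop S1 + S2 bits in two steps, first with round-to-odd.  When
   S2 >= 2 this matches a single round_nearest_even (x, s1 + s2).  */
static uint32_t
two_step_via_odd (uint32_t x, unsigned s1, unsigned s2)
{
  return round_nearest_even (round_to_odd (x, s1), s2);
}

/* Drop S1 + S2 bits by rounding to nearest twice; this can double-round.  */
static uint32_t
two_step_via_nearest (uint32_t x, unsigned s1, unsigned s2)
{
  return round_nearest_even (round_nearest_even (x, s1), s2);
}
```

With x = 0x178 and two 4-bit steps, a single rounding and the round-to-odd path both give 1, while rounding to nearest twice gives 2.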
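VECTOR_ELEMENT_MFVSRLD_64BIT is the mirror image of VECTOR_ELEMENT_SCALAR_64BIT. mfvsrd reads the high half of a VSX register (ISA doubleword 0), and the ISA 3.0 mfvsrld reads the low half (doubleword 1). GCC's element numbering for two-element vectors, however, flips with endianness. The sketch below models that mapping and the GPR-destination choice made in *vsx_extract_<mode>_internal1. The helper names are hypothetical, endianness is passed explicitly rather than read from BYTES_BIG_ENDIAN, and the FPR, VSX, and "nop" cases of the pattern are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirror the two rs6000.h macros, with endianness as a parameter.  */
static int
element_scalar_64bit (bool big_endian)
{
  return big_endian ? 0 : 1;
}

static int
element_mfvsrld_64bit (bool big_endian)
{
  return big_endian ? 1 : 0;
}

/* Hypothetical helper: which ISA doubleword (0 = high, 1 = low) of the
   128-bit register holds GCC element ELEM of a two-element vector.  */
static int
element_to_isa_dword (int elem, bool big_endian)
{
  return big_endian ? elem : 1 - elem;
}

/* Hypothetical helper: true if extracting ELEM into a GPR would use
   mfvsrld rather than mfvsrd.  */
static bool
gpr_extract_uses_mfvsrld (int elem, bool big_endian)
{
  return elem == element_mfvsrld_64bit (big_endian);
}
```

On either endianness, the "scalar" element maps to doubleword 0 (mfvsrd) and the MFVSRLD element maps to doubleword 1.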
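The new any_fix/any_float code iterators and the su and uns code attributes let one template stand for both the signed and the unsigned variant of an instruction. The C sketch below is a rough analogy only; it is not how GCC's machine-description readers implement substitution. It replaces <su> and <uns> in a template for the float/unsigned_float pair, using the values from the define_code_attr entries in the patch. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum conv_code { CODE_FLOAT, CODE_UNSIGNED_FLOAT };

/* Values of <su> and <uns> for the two float codes, as in the patch.  */
static const char *
su_attr (enum conv_code c)
{
  return c == CODE_FLOAT ? "s" : "u";
}

static const char *
uns_attr (enum conv_code c)
{
  return c == CODE_FLOAT ? "" : "uns";
}

/* Copy TMPL into OUT (OUT_SIZE >= 1 bytes), replacing each "<su>" and
   "<uns>" with its value for code C.  Output is truncated if needed
   and is always NUL-terminated.  */
static void
expand_template (const char *tmpl, enum conv_code c,
		 char *out, size_t out_size)
{
  size_t n = 0;

  while (*tmpl && n + 1 < out_size)
    {
      const char *subst = NULL;
      size_t skip = 0;

      if (strncmp (tmpl, "<su>", 4) == 0)
	{
	  subst = su_attr (c);
	  skip = 4;
	}
      else if (strncmp (tmpl, "<uns>", 5) == 0)
	{
	  subst = uns_attr (c);
	  skip = 5;
	}

      if (subst)
	{
	  while (*subst && n + 1 < out_size)
	    out[n++] = *subst++;
	  tmpl += skip;
	}
      else
	out[n++] = *tmpl++;
    }
  out[n] = '\0';
}
```

For instance, "xscv<su>dqp" expands to xscvsdqp or xscvudqp, and the pattern name "float<uns>_kfdi2_hw" expands to float_kfdi2_hw or floatuns_kfdi2_hw.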
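The md.texi addition says not to use %x<n> with Altivec-only instructions such as xsaddqp. The reason lies in register numbering. The 32 Altivec registers overlap VSX registers 32 through 63, and %x<n> prints the VSX number. xsaddqp, however, encodes each register in a 5-bit field that expects the Altivec number (0 to 31). The sketch below models that arithmetic; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: Altivec register vN is the same physical register
   as VSX register 32 + N, and the %x<n> modifier prints that VSX number.  */
static int
vsx_regno_of_altivec (int vr)
{
  return 32 + vr;
}

/* Hypothetical helper: Altivec-only instructions such as xsaddqp encode
   each register operand in a 5-bit field, so only 0..31 is valid.  */
static bool
fits_5bit_reg_field (int regno)
{
  return regno >= 0 && regno < 32;
}
```

If an operand is allocated to Altivec register 2, the plain operand prints 2, which fits the field. %x on the same operand would print 34, which does not fit.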

* Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)
  2015-11-09  0:42 ` [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion) Michael Meissner
  2015-11-09 17:16   ` Segher Boessenkool
  2015-11-09 18:57   ` David Edelsohn
@ 2015-11-14 22:58   ` Segher Boessenkool
  2 siblings, 0 replies; 47+ messages in thread
From: Segher Boessenkool @ 2015-11-14 22:58 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Sun, Nov 08, 2015 at 07:42:04PM -0500, Michael Meissner wrote:
> This patch adds support for new fusion forms in ISA 3.0 (power9).  In
> particular, ISA 3.0 can fuse GPR loads of R0, FPR loads, GPR stores, FPR
> stores, and some constant generation that ISA 2.07 (power8) could not
> generate.

TOC fusion breaks thousands of testcases with -mlra -flto.

What happens is that LRA tries to reload the memory address (the unspec
FUSION_ADDIS) into a register, but there is no pattern that will let it
do that.

This can be fixed temporarily by not enabling TOC fusion if LRA is
enabled.

It seems that without -flto TOC fusion doesn't do much at all, btw?


Segher


* Re: [PATCH], Add power9 support to GCC, patch #10 (SFmode/DFmode d-form addressing)
  2015-11-10 21:56 ` [PATCH], Add power9 support to GCC, patch #10 (SFmode/DFmode d-form addressing) Michael Meissner
  2015-11-11  0:19   ` Segher Boessenkool
  2015-11-11  0:26   ` Michael Meissner
@ 2015-11-24 18:08   ` David Edelsohn
  2 siblings, 0 replies; 47+ messages in thread
From: David Edelsohn @ 2015-11-24 18:08 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches

On Tue, Nov 10, 2015 at 4:56 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch adds d-form addressing to float/double scalars for the PowerPC that was
> added in ISA 3.0 (power9).  This patch does not yet turn on D-form addressing
> as default.  It is likely that patch #11, which will add limited d-form
> addressing to vector registers will enable it by default.
>
> I have bootstrapped the compiler with these changes, and there were no
> regressions to the testsuite.
>
> In addition, I built all of the SPEC 2006 benchmarks with my normal options
> (-ffast-math -O3 -mveclibabi=mass -mcpu=power9 -mpower9-dform -mrecip=rsqrt
> -fpeel-loops -funroll-loops -fvect-cost-model -msave-toc-indirect
> -fno-aggressive-loop-optimizations -mno-pointers-to-nested-functions) and there
> were no compiler failures (and various power9 instructions were generated,
> including d-form addressing).
>
> Are these patches ok to check in?
>
> [gcc]
> 2015-11-10  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         <patch #10>
>         * config/rs6000/constraints.md (wb constraint): New constraint for
>         ISA 3.0 d-form scalar addressing.
>
>         * config/rs6000/rs6000.c (mode_supports_vmx_dform): Add support
>         for ISA 3.0 D-form addressing to load SFmode/DFmode scalars into
>         Altivec registers.  Add wb constraint for Altivec registers with
>         D-form addressing.  If we have ISA 3.0 d-form support, undo
>         secondary reload support for using FPR registers if we want to do
>         D-form addressing.
>         (rs6000_debug_reg_global): Likewise.
>         (rs6000_setup_reg_addr_masks): Likewise.
>         (rs6000_init_hard_regno_mode_ok): Likewise.
>         (rs6000_secondary_reload): Likewise.
>         (rs6000_preferred_reload_class): Likewise.
>         (rs6000_secondary_reload_class): Likewise.
>
>         * config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add wb
>         constraint.
>
>         * config/rs6000/rs6000.md (f32_lr2 mode attribute): Add support
>         for ISA 3.0 SFmode/DFmode d-form addressing to Altivec registers.
>         (f32_lm2): Likewise.
>         (f32_li2): Likewise.
>         (f32_sr2): Likewise.
>         (f32_sm2): Likewise.
>         (f32_si2): Likewise.
>         (f64_p9): Likewise.
>         (extendsfdf2_fpr): Likewise.
>         (mov<mode>_hardfloat): Likewise.
>         (mov<mode>_hardfloat32): Likewise.
>         (mov<mode>_hardfloat64): Likewise.
>
>         * doc/md.texi (RS/6000 constraints): Document wb constraint.
>         Fixup we constraint documentation.
>
> [gcc/testsuite]
> 2015-11-10  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/dform-1.c: New test.
>         * gcc.target/powerpc/dform-2.c: Likewise.

This is okay.

I don't know if you want to apply it now or wait until after the
holidays in case there is any fallout.

Thanks, David


end of thread, other threads:[~2015-11-24 17:52 UTC | newest]

Thread overview: 47+ messages
2015-11-03 20:29 [PATCH], Add power9 support to GCC, patch #1 Michael Meissner
2015-11-04 21:16 ` Segher Boessenkool
2015-11-04 21:27   ` Michael Meissner
2015-11-09  0:33 ` [PATCH], Add power9 support to GCC, patch #1 (revised) Michael Meissner
2015-11-09 16:12   ` David Edelsohn
2015-11-10 18:39   ` [PATCH], Add power9 support to GCC, patch #8 (add integer multiply/add) Michael Meissner
2015-11-12 20:39     ` David Edelsohn
2015-11-09  0:36 ` [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions) Michael Meissner
2015-11-09 15:48   ` Segher Boessenkool
2015-11-09 18:07     ` Michael Meissner
2015-11-09 16:14   ` David Edelsohn
2015-11-10  0:17   ` [PATCH], Add power9 support to GCC, patches #2-5 committed Michael Meissner
2015-11-10  0:20     ` Michael Meissner
2015-11-09  0:38 ` [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros) Michael Meissner
2015-11-09 15:59   ` Segher Boessenkool
2015-11-09 17:18     ` Michael Meissner
2015-11-09 19:33       ` Segher Boessenkool
2015-11-09 18:02   ` David Edelsohn
2015-11-09  0:39 ` [PATCH], Add power9 support to GCC, patch #4 Michael Meissner
2015-11-09 16:29   ` Segher Boessenkool
2015-11-09 17:27     ` Michael Meissner
2015-11-09 19:48       ` Segher Boessenkool
2015-11-09 18:03   ` David Edelsohn
2015-11-09  0:42 ` [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion) Michael Meissner
2015-11-09 17:16   ` Segher Boessenkool
2015-11-09 17:34     ` Michael Meissner
2015-11-09 19:57       ` Segher Boessenkool
2015-11-09 21:11         ` David Edelsohn
2015-11-09 22:17           ` Michael Meissner
2015-11-09 22:33             ` David Edelsohn
2015-11-09 18:57   ` David Edelsohn
2015-11-14 22:58   ` Segher Boessenkool
2015-11-09  0:45 ` [PATCH], Add power9 support to GCC, patch #6 (IEEE 128-bit hardware support) Michael Meissner
2015-11-09 19:29   ` Segher Boessenkool
2015-11-10  0:41   ` Joseph Myers
2015-11-10 18:41     ` Michael Meissner
2015-11-12 20:47   ` David Edelsohn
2015-11-13 22:13     ` [PATCH applied], Power9 patches #6-8 (IEEE 128-bit h/w, 128-bit direct move, integer mult/add) Michael Meissner
2015-11-09  0:49 ` [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements) Michael Meissner
2015-11-09 20:00   ` Segher Boessenkool
2015-11-09 21:06   ` Michael Meissner
2015-11-12 20:43   ` David Edelsohn
2015-11-10 20:56 ` [PATCH, applied], Add power9 support to GCC, patch #9 (config.gcc) Michael Meissner
2015-11-10 21:56 ` [PATCH], Add power9 support to GCC, patch #10 (SFmode/DFmode d-form addressing) Michael Meissner
2015-11-11  0:19   ` Segher Boessenkool
2015-11-11  0:26   ` Michael Meissner
2015-11-24 18:08   ` David Edelsohn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).