public inbox for gcc-patches@gcc.gnu.org
* [PATCH, rs6000] power8 patches
@ 2013-05-20 20:41 Michael Meissner
  2013-05-20 20:49 ` [PATCH, rs6000] power8 patch #1, infrastructure changes Michael Meissner
                   ` (9 more replies)
  0 siblings, 10 replies; 52+ messages in thread
From: Michael Meissner @ 2013-05-20 20:41 UTC (permalink / raw)
  To: gcc-patches, dje.gcc, pthaugen, bergner

On May 10th, the Power Architecture Advisory Council announced the public
availability of Power ISA 2.07.
https://www.power.org/documentation/power-isa-version-2-07/

I will shortly start submitting the patches that provide our initial support
for the future power8 cpu, which will implement the ISA 2.07 instructions.

Changes that will involve GCC support will include:

1) Add new builtins for cryptography support;

2) Add new builtins for vector operations (population count, count leading
zeros, new vector logical instructions);

3) Support for vector long long (i.e. V2DI) add, subtract, and compare
operations (a short example follows this list);

4) Support for new instructions to move data between the general purpose
registers and the VSX registers (direct move);

5) Support for quad memory atomic operations, and the ability to use the quad
memory instructions (lq/stq) in user space;

6) VSX versions of the scalar single precision floating point instructions;

7) Hardware transactional memory support;

8) 32-bit vector integer (i.e. V4SI) multiply operations;

9) Better support for unaligned vector memory operations;

10) Support for fusion, where the hardware can fuse certain add immediate
instructions with the dependent load instruction;

11) Power8 scheduling support.
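
As a quick illustration of item 3 above, consider element-wise 64-bit adds
written with GCC's generic vector extensions.  This is only a sketch of the
intent, not a claim about the final code generation: the expectation is that
with -mcpu=power8 the V2DI add below can map onto the new ISA 2.07 vector
doubleword add, while older targets fall back to scalar adds.

typedef long long v2di __attribute__ ((vector_size (16)));

/* Element-wise 64-bit add; expected to be a single V2DI vector
   instruction on power8, scalar code elsewhere.  */
v2di
add_v2di (v2di a, v2di b)
{
  return a + b;
}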

Note that in order to build code for power8, you will need a power8-capable
assembler; the assembler patches will shortly be submitted to the binutils
mailing lists.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797


* Re: [PATCH, rs6000] power8 patch #1, infrastructure changes
  2013-05-20 20:41 [PATCH, rs6000] power8 patches Michael Meissner
@ 2013-05-20 20:49 ` Michael Meissner
  2013-05-20 21:34   ` [PATCH, rs6000] power8 patch #1, infrastructure changes (revised patch) Michael Meissner
  2013-05-20 23:13 ` [PATCH, rs6000] power8 patches, patch #2, add crypto builtins Michael Meissner
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-05-20 20:49 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, pthaugen, bergner

[-- Attachment #1: Type: text/plain, Size: 5038 bytes --]

This patch is primarily an infrastructure patch that adds the switches the
following patches will use.  I also added the new constraints and predicates
that will be used by future patches.

At this point in development, I have multiple switches for the different
sub-features.  If desired, I could reduce the number of documented switches to
just one or two, as we did in the power7 time frame.
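
As a hedged sketch of how the new switches are intended to be used: the
option spellings below are the ones this patch documents, and because
rs6000_opt_masks gains entries for them, the same names (without the -m)
should also work in attribute((target)).  For example, building a file for
power7 while letting one hot function opt in to the ISA 2.07 vector
instructions:

typedef long long v2di __attribute__ ((vector_size (16)));

/* "power8-vector" matches the rs6000_opt_masks entry added below; this
   is an illustration of the mechanism, not tested output of this patch.  */
__attribute__ ((target ("power8-vector")))
v2di
sub_p8 (v2di a, v2di b)
{
  return a - b;
}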

2013-05-17  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Pat Haugen <pthaugen@us.ibm.com>
	    Peter Bergner <bergner@vnet.ibm.com>

	* doc/invoke.texi (Option Summary): Add power8 options.
	(RS/6000 and PowerPC Options): Likewise.

	* doc/md.texi (PowerPC and IBM RS6000 constraints): Update to use
	constraints.md instead of rs6000.h.  Reorder w* constraints.  Add
	wm, wn, wr documentation.

	* gcc/config/rs6000/constraints.md (wm): New constraint for VSX
	registers if direct move instructions are enabled.
	(wn): New constraint for no registers.
	(wq): New constraint for quad word even GPR registers.
	(wr): New constraint if 64-bit instructions are enabled.
	(wv): New constraint if power8 vector instructions are enabled.
	(wQ): New constraint for quad word memory locations.

	* gcc/config/rs6000/predicates.md (const_0_to_15_operand): New
	predicate to match 0..15 for crypto instructions.
	(gpc_reg_operand): If VSX is enabled, allow VSX registers as well
	as GPR and floating point registers.
	(int_reg_operand): New predicate to match only GPR registers.
	(base_reg_operand): New predicate to match base registers.
	(quad_int_reg_operand): New predicate to match even GPR registers
	for quad memory operations.
	(vsx_reg_or_cint_operand): New predicate to allow vector logical
	operations in both GPR and VSX registers.
	(quad_memory_operand): New predicate for quad memory operations.
	(reg_or_indexed_operand): New predicate for direct move support.

	* gcc/config/rs6000/rs6000-cpus.def (ISA_2_5_MASKS_EMBEDDED):
	Inherit from ISA_2_4_MASKS, not ISA_2_2_MASKS.
	(ISA_2_7_MASKS_SERVER): New mask for ISA 2.07 (i.e. power8).
	(POWERPC_MASKS): Add power8 options.
	(power8 cpu): Use ISA_2_7_MASKS_SERVER instead of specifying the
	various options.

	* gcc/config/rs6000/rs6000-c.c (rs6000_target_modify_macros):
	Define _ARCH_PWR8 and __POWER8_VECTOR__ for power8.

	* gcc/config/rs6000/rs6000.opt (-mvsx-timode): Add documentation.
	(-mpower8-fusion): New power8 options.
	(-mpower8-fusion-sign): Likewise.
	(-mpower8-vector): Likewise.
	(-mcrypto): Likewise.
	(-mdirect-move): Likewise.
	(-mquad-memory): Likewise.

	* gcc/config/rs6000/rs6000.c (power8_cost): Initial definition for
	power8.
	(rs6000_hard_regno_mode_ok): Make PTImode only match even GPR
	registers.
	(rs6000_debug_reg_print): Print the base register class if
	-mdebug=reg.
	(rs6000_debug_vector_unit): Add p8_vector.
	(rs6000_debug_reg_global): If -mdebug=reg, print power8 constraint
	definitions.  Also print fusion state.
	(rs6000_init_hard_regno_mode_ok): Set up power8 constraints.
	(rs6000_builtin_mask_calculate): Add power8 builtin support.
	(rs6000_option_override_internal): Add support for power8.
	(rs6000_common_init_builtins): Add debugging for skipped builtins
	if -mdebug=builtin.
	(rs6000_adjust_cost): Add power8 support.
	(rs6000_issue_rate): Likewise.
	(insn_must_be_first_in_group): Likewise.
	(insn_must_be_last_in_group): Likewise.
	(force_new_group): Likewise.
	(rs6000_register_move_cost): Likewise.
	(rs6000_opt_masks): Likewise.

	* config/rs6000/rs6000.h (ASM_CPU_POWER8_SPEC): If we don't have a
	power8 capable assembler, default to power7 options.
	(TARGET_DIRECT_MOVE): Likewise.
	(TARGET_CRYPTO): Likewise.
	(TARGET_P8_VECTOR): Likewise.
	(VECTOR_UNIT_P8_VECTOR_P): Define power8 vector support.
	(VECTOR_UNIT_VSX_OR_P8_VECTOR_P): Likewise.
	(VECTOR_MEM_P8_VECTOR_P): Likewise.
	(VECTOR_MEM_VSX_OR_P8_VECTOR_P): Likewise.
	(VECTOR_MEM_ALTIVEC_OR_VSX_P): Likewise.
	(TARGET_XSCVDPSPN): Likewise.
	(TARGET_XSCVSPDPN): Likewise.
	(TARGET_SYNC_HI_QI): Likewise.
	(TARGET_SYNC_TI): Likewise.
	(MASK_CRYPTO): Likewise.
	(MASK_DIRECT_MOVE): Likewise.
	(MASK_P8_FUSION): Likewise.
	(MASK_P8_VECTOR): Likewise.
	(REG_ALLOC_ORDER): Move fr13 to be lower in priority so that the
	TFmode temporary used by some of the direct move instructions to
	get two FP temporary registers does not force creation of a stack
	frame.
	(VLOGICAL_REGNO_P): Allow vector logical operations in GPRs.
	(MODES_TIEABLE_P): Move the VSX tests above the Altivec tests so
	that any VSX registers are tieable, even if they are also an
	Altivec vector mode.
	(r6000_reg_class_enum): Add wm, wq, wr, wv constraints.
	(RS6000_BTM_P8_VECTOR): Power8 builtin support.
	(RS6000_BTM_CRYPTO): Likewise.
	(RS6000_BTM_COMMON): Likewise.

	* config/rs6000/rs6000.md (cpu attribute): Add power8.
	* config/rs6000/rs6000-opts.h (PROCESSOR_POWER8): Likewise.
	(enum rs6000_vector): Add power8 vector support.
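
To make the quad memory entries above concrete: quad_memory_operand accepts
a 16-byte, 128-bit aligned memory operand whose address is either a single
GPR, or a GPR plus a 16-bit signed offset that is a multiple of 16.  A hedged
C-level sketch of the kind of access this is aimed at (illustration only, not
a guarantee of the generated code):

typedef __int128 i128 __attribute__ ((aligned (16)));

/* A 16-byte aligned copy; the sort of access that could become lq/stq
   under -mquad-memory in 64-bit mode.  */
void
copy128 (i128 *dst, const i128 *src)
{
  *dst = *src;
}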

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.official-01b --]
[-- Type: text/plain, Size: 47203 bytes --]

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 199037)
+++ gcc/doc/invoke.texi	(working copy)
@@ -860,7 +860,10 @@ See RS/6000 and PowerPC Options.
 -mno-recip-precision @gol
 -mveclibabi=@var{type} -mfriz -mno-friz @gol
 -mpointers-to-nested-functions -mno-pointers-to-nested-functions @gol
--msave-toc-indirect -mno-save-toc-indirect}
+-msave-toc-indirect -mno-save-toc-indirect @gol
+-mpower8-fusion -mno-power8-fusion -mpower8-vector -mno-power8-vector @gol
+-mcrypto -mno-crypto -mdirect-move -mno-direct-move @gol
+-mquad-memory -mno-quad-memory}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles  -m32bit-doubles  -fpu  -nofpu@gol
@@ -17341,7 +17344,8 @@ following options:
 @gccoptlist{-maltivec  -mfprnd  -mhard-float  -mmfcrf  -mmultiple @gol
 -mpopcntb -mpopcntd  -mpowerpc64 @gol
 -mpowerpc-gpopt  -mpowerpc-gfxopt  -msingle-float -mdouble-float @gol
--msimple-fpu -mstring  -mmulhw  -mdlmzb  -mmfpgpr -mvsx}
+-msimple-fpu -mstring  -mmulhw  -mdlmzb  -mmfpgpr -mvsx @gol
+-mcrypto -mdirect-move -mpower8-fusion -mpower8-vector -mquad-memory}
 
 The particular options set for any particular CPU varies between
 compiler versions, depending on what setting seems to produce optimal
@@ -17459,6 +17463,47 @@ Generate code that uses (does not use) v
 instructions, and also enable the use of built-in functions that allow
 more direct access to the VSX instruction set.
 
+@item -mcrypto
+@itemx -mno-crypto
+@opindex mcrypto
+@opindex mno-crypto
+Enable (disable) the use of the built-in functions that allow direct
+access to the cryptographic instructions that were added in version
+2.07 of the PowerPC ISA.
+
+@item -mdirect-move
+@itemx -mno-direct-move
+@opindex mdirect-move
+@opindex mno-direct-move
+Generate code that uses (does not use) the instructions to move data
+between the general purpose registers and the vector/scalar (VSX)
+registers that were added in version 2.07 of the PowerPC ISA.
+
+@item -mpower8-fusion
+@itemx -mno-power8-fusion
+@opindex mpower8-fusion
+@opindex mno-power8-fusion
+Generate code that keeps (does not keep) some integer operations
+adjacent so that the instructions can be fused together on power8 and
+later processors.
+
+@item -mpower8-vector
+@itemx -mno-power8-vector
+@opindex mpower8-vector
+@opindex mno-power8-vector
+Generate code that uses (does not use) the vector and scalar
+instructions that were added in version 2.07 of the PowerPC ISA.  Also
+enable the use of built-in functions that allow more direct access to
+the vector instructions.
+
+@item -mquad-memory
+@itemx -mno-quad-memory
+@opindex mquad-memory
+@opindex mno-quad-memory
+Generate code that uses (does not use) the quad word memory
+instructions.  The @option{-mquad-memory} option requires use of
+64-bit mode.
+
 @item -mfloat-gprs=@var{yes/single/double/no}
 @itemx -mfloat-gprs
 @opindex mfloat-gprs
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 199037)
+++ gcc/doc/md.texi	(working copy)
@@ -2055,7 +2055,7 @@ Any constant whose absolute value is no 
 
 @end table
 
-@item PowerPC and IBM RS6000---@file{config/rs6000/rs6000.h}
+@item PowerPC and IBM RS6000---@file{config/rs6000/constraints.md}
 @table @code
 @item b
 Address base register
@@ -2069,6 +2069,9 @@ Floating point register (containing 32-b
 @item v
 Altivec vector register
 
+@item wa
+Any VSX register
+
 @item wd
 VSX vector register to hold vector double data
 
@@ -2081,6 +2084,18 @@ If @option{-mmfpgpr} was used, a floatin
 @item wl
 If the LFIWAX instruction is enabled, a floating point register
 
+@item wm
+If direct moves are enabled, a VSX register.
+
+@item wn
+No register.
+
+@item wq
+Even general purpose register to use with load/store quad instructions
+
+@item wr
+General purpose register if 64-bit mode is used
+
 @item ws
 VSX vector register to hold scalar float data
 
@@ -2093,8 +2108,9 @@ If the STFIWX instruction is enabled, a 
 @item wz
 If the LFIWZX instruction is enabled, a floating point register
 
-@item wa
-Any VSX register
+@item wQ
+A memory address that will work with the @code{lq} and @code{stq}
+instructions.
 
 @item h
 @samp{MQ}, @samp{CTR}, or @samp{LINK} register
Index: gcc/ChangeLog.ibm
===================================================================
--- gcc/ChangeLog.ibm	(revision 199038)
+++ gcc/ChangeLog.ibm	(working copy)
@@ -1,5 +1,108 @@
 2013-05-17  Michael Meissner  <meissner@linux.vnet.ibm.com>
 
+	* doc/invoke.texi (Option Summary): Add power8 options.
+	(RS/6000 and PowerPC Options): Likewise.
+
+	* doc/md.texi (PowerPC and IBM RS6000 constraints): Update to use
+	constraints.md instead of rs6000.h.  Reorder w* constraints.  Add
+	wm, wn, wr documentation.
+
+	* gcc/config/rs6000/constraints.md (wm): New constraint for VSX
+	registers if direct move instructions are enabled.
+	(wn): New constraint for no registers.
+	(wq): New constraint for quad word even GPR registers.
+	(wr): New constraint if 64-bit instructions are enabled.
+	(wv): New constraint if power8 vector instructions are enabled.
+	(wQ): New constraint for quad word memory locations.
+
+	* gcc/config/rs6000/predicates.md (const_0_to_15_operand): New
+	predicate to match 0..15 for crypto instructions.
+	(gpc_reg_operand): If VSX is enabled, allow VSX registers as well
+	as GPR and floating point registers.
+	(int_reg_operand): New predicate to match only GPR registers.
+	(base_reg_operand): New predicate to match base registers.
+	(quad_int_reg_operand): New predicate to match even GPR registers
+	for quad memory operations.
+	(vsx_reg_or_cint_operand): New predicate to allow vector logical
+	operations in both GPR and VSX registers.
+	(quad_memory_operand): New predicate for quad memory operations.
+	(reg_or_indexed_operand): New predicate for direct move support.
+
+	* gcc/config/rs6000/rs6000-cpus.def (ISA_2_5_MASKS_EMBEDDED):
+	Inherit from ISA_2_4_MASKS, not ISA_2_2_MASKS.
+	(ISA_2_7_MASKS_SERVER): New mask for ISA 2.07 (i.e. power8).
+	(POWERPC_MASKS): Add power8 options.
+	(power8 cpu): Use ISA_2_7_MASKS_SERVER instead of specifying the
+	various options.
+
+	* gcc/config/rs6000/rs6000-c.c (rs6000_target_modify_macros):
+	Define _ARCH_PWR8 and __POWER8_VECTOR__ for power8.
+
+	* gcc/config/rs6000/rs6000.opt (-mvsx-timode): Add documentation.
+	(-mpower8-fusion): New power8 options.
+	(-mpower8-fusion-sign): Likewise.
+	(-mpower8-vector): Likewise.
+	(-mcrypto): Likewise.
+	(-mdirect-move): Likewise.
+	(-mquad-memory): Likewise.
+
+	* gcc/config/rs6000/rs6000.c (power8_cost): Initial definition for
+	power8.
+	(rs6000_hard_regno_mode_ok): Make PTImode only match even GPR
+	registers.
+	(rs6000_debug_reg_print): Print the base register class if
+	-mdebug=reg.
+	(rs6000_debug_vector_unit): Add p8_vector.
+	(rs6000_debug_reg_global): If -mdebug=reg, print power8 constraint
+	definitions.  Also print fusion state.
+	(rs6000_init_hard_regno_mode_ok): Set up power8 constraints.
+	(rs6000_builtin_mask_calculate): Add power8 builtin support.
+	(rs6000_option_override_internal): Add support for power8.
+	(rs6000_common_init_builtins): Add debugging for skipped builtins
+	if -mdebug=builtin.
+	(rs6000_adjust_cost): Add power8 support.
+	(rs6000_issue_rate): Likewise.
+	(insn_must_be_first_in_group): Likewise.
+	(insn_must_be_last_in_group): Likewise.
+	(force_new_group): Likewise.
+	(rs6000_register_move_cost): Likewise.
+	(rs6000_opt_masks): Likewise.
+
+	* config/rs6000/rs6000.h (ASM_CPU_POWER8_SPEC): If we don't have a
+	power8 capable assembler, default to power7 options.
+	(TARGET_DIRECT_MOVE): Likewise.
+	(TARGET_CRYPTO): Likewise.
+	(TARGET_P8_VECTOR): Likewise.
+	(VECTOR_UNIT_P8_VECTOR_P): Define power8 vector support.
+	(VECTOR_UNIT_VSX_OR_P8_VECTOR_P): Likewise.
+	(VECTOR_MEM_P8_VECTOR_P): Likewise.
+	(VECTOR_MEM_VSX_OR_P8_VECTOR_P): Likewise.
+	(VECTOR_MEM_ALTIVEC_OR_VSX_P): Likewise.
+	(TARGET_XSCVDPSPN): Likewise.
+	(TARGET_XSCVSPDPN): Likewise.
+	(TARGET_SYNC_HI_QI): Likewise.
+	(TARGET_SYNC_TI): Likewise.
+	(MASK_CRYPTO): Likewise.
+	(MASK_DIRECT_MOVE): Likewise.
+	(MASK_P8_FUSION): Likewise.
+	(MASK_P8_VECTOR): Likewise.
+	(REG_ALLOC_ORDER): Move fr13 to be lower in priority so that the
+	TFmode temporary used by some of the direct move instructions to
+	get two FP temporary registers does not force creation of a stack
+	frame.
+	(VLOGICAL_REGNO_P): Allow vector logical operations in GPRs.
+	(MODES_TIEABLE_P): Move the VSX tests above the Altivec tests so
+	that any VSX registers are tieable, even if they are also an
+	Altivec vector mode.
+	(r6000_reg_class_enum): Add wm, wq, wr, wv constraints.
+	(RS6000_BTM_P8_VECTOR): Power8 builtin support.
+	(RS6000_BTM_CRYPTO): Likewise.
+	(RS6000_BTM_COMMON): Likewise.
+
+	* config/rs6000/rs6000.md (cpu attribute): Add power8.
+	* config/rs6000/rs6000-opts.h (PROCESSOR_POWER8): Likewise.
+	(enum rs6000_vector): Add power8 vector support.
+
 	Clone branch from subversion id 199028.
 	* REVISION: New file to track subversion id.
 
Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 199037)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -79,12 +79,35 @@ (define_register_constraint "wg" "rs6000
 (define_register_constraint "wl" "rs6000_constraints[RS6000_CONSTRAINT_wl]"
   "Floating point register if the LFIWAX instruction is enabled or NO_REGS.")
 
+(define_register_constraint "wm" "rs6000_constraints[RS6000_CONSTRAINT_wm]"
+  "VSX register if direct move instructions are enabled, or NO_REGS.")
+
+(define_constraint "wq"
+  "Even general purpose register to use with load/store quad instructions."
+  (match_operand 0 "quad_int_reg_operand"))
+
+(define_register_constraint "wr" "rs6000_constraints[RS6000_CONSTRAINT_wr]"
+  "General purpose register if 64-bit instructions are enabled or NO_REGS.")
+
+(define_register_constraint "wv" "rs6000_constraints[RS6000_CONSTRAINT_wv]"
+  "Altivec register if -mpower8-vector is used or NO_REGS.")
+
 (define_register_constraint "wx" "rs6000_constraints[RS6000_CONSTRAINT_wx]"
   "Floating point register if the STFIWX instruction is enabled or NO_REGS.")
 
 (define_register_constraint "wz" "rs6000_constraints[RS6000_CONSTRAINT_wz]"
   "Floating point register if the LFIWZX instruction is enabled or NO_REGS.")
 
+;; NO_REGS register constraint, used to merge the mov{sd,sf} patterns: movsd
+;; can use direct move to move between the register sets, but movsf can't.
+;; There is a mode_attr that resolves to wm for SDmode and wn for SFmode
+(define_register_constraint "wn" "NO_REGS")
+
+;; Lq/stq validates the address for load/store quad
+(define_memory_constraint "wQ"
+  "Memory operand suitable for the load/store quad instructions"
+  (match_operand 0 "quad_memory_operand"))
+
 ;; Altivec style load/store that ignores the bottom bits of the address
 (define_memory_constraint "wZ"
   "Indexed or indirect memory operand, ignoring the bottom 4 bits"
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 199037)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -166,6 +166,11 @@ (define_predicate "const_2_to_3_operand"
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 2, 3)")))
 
+;; Match op = 0..15
+(define_predicate "const_0_to_15_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 15)")))
+
 ;; Return 1 if op is a register that is not special.
 (define_predicate "gpc_reg_operand"
   (match_operand 0 "register_operand")
@@ -182,9 +187,68 @@ (define_predicate "gpc_reg_operand"
   if (REGNO (op) >= ARG_POINTER_REGNUM && !CA_REGNO_P (REGNO (op)))
     return 1;
 
+  if (TARGET_VSX && VSX_REGNO_P (REGNO (op)))
+    return 1;
+
   return INT_REGNO_P (REGNO (op)) || FP_REGNO_P (REGNO (op));
 })
 
+;; Return 1 if op is a general purpose register.  Unlike gpc_reg_operand, don't
+;; allow floating point or vector registers.
+(define_predicate "int_reg_operand"
+  (match_operand 0 "register_operand")
+{
+  if ((TARGET_E500_DOUBLE || TARGET_SPE) && invalid_e500_subreg (op, mode))
+    return 0;
+
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  if (REGNO (op) >= ARG_POINTER_REGNUM && !CA_REGNO_P (REGNO (op)))
+    return 1;
+
+  return INT_REGNO_P (REGNO (op));
+})
+
+;; Like int_reg_operand, but only return true for base registers
+(define_predicate "base_reg_operand"
+  (match_operand 0 "int_reg_operand")
+{
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  return (REGNO (op) != FIRST_GPR_REGNO);
+})
+
+;; Return 1 if op is a general purpose register that is an even register
+;; suitable for a load/store quad operation
+(define_predicate "quad_int_reg_operand"
+  (match_operand 0 "register_operand")
+{
+  HOST_WIDE_INT r;
+
+  if (!TARGET_QUAD_MEMORY)
+    return 0;
+
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  r = REGNO (op);
+  if (r >= FIRST_PSEUDO_REGISTER)
+    return 1;
+
+  return (INT_REGNO_P (r) && ((r & 1) == 0));
+})
+
 ;; Return 1 if op is a register that is a condition register field.
 (define_predicate "cc_reg_operand"
   (match_operand 0 "register_operand")
@@ -302,6 +366,11 @@ (define_predicate "reg_or_logical_cint_o
 		      & (~ (unsigned HOST_WIDE_INT) 0xffffffff)) == 0)")
     (match_operand 0 "gpc_reg_operand")))
 
+;; Like reg_or_logical_cint_operand, but allow vsx registers
+(define_predicate "vsx_reg_or_cint_operand"
+  (ior (match_operand 0 "vsx_register_operand")
+       (match_operand 0 "reg_or_logical_cint_operand")))
+
 ;; Return 1 if operand is a CONST_DOUBLE that can be set in a register
 ;; with no more than one instruction per word.
 (define_predicate "easy_fp_constant"
@@ -507,6 +576,54 @@ (define_predicate "offsettable_mem_opera
   (and (match_operand 0 "memory_operand")
        (match_test "offsettable_nonstrict_memref_p (op)")))
 
+;; Return 1 if the operand is suitable for load/store quad memory.
+(define_predicate "quad_memory_operand"
+  (match_code "mem")
+{
+  rtx addr, op0, op1;
+  int ret;
+
+  if (!TARGET_QUAD_MEMORY)
+    ret = 0;
+
+  else if (!memory_operand (op, mode))
+    ret = 0;
+
+  else if (GET_MODE_SIZE (GET_MODE (op)) != 16)
+    ret = 0;
+
+  else if (MEM_ALIGN (op) < 128)
+    ret = 0;
+
+  else
+    {
+      addr = XEXP (op, 0);
+      if (int_reg_operand (addr, Pmode))
+	ret = 1;
+
+      else if (GET_CODE (addr) != PLUS)
+	ret = 0;
+
+      else
+	{
+	  op0 = XEXP (addr, 0);
+	  op1 = XEXP (addr, 1);
+	  ret = (int_reg_operand (op0, Pmode)
+		 && GET_CODE (op1) == CONST_INT
+		 && IN_RANGE (INTVAL (op1), -32768, 32767)
+		 && (INTVAL (op1) & 15) == 0);
+	}
+    }
+
+  if (TARGET_DEBUG_ADDR)
+    {
+      fprintf (stderr, "\nquad_memory_operand, ret = %s\n", ret ? "true" : "false");
+      debug_rtx (op);
+    }
+
+  return ret;
+})
+
 ;; Return 1 if the operand is an indexed or indirect memory operand.
 (define_predicate "indexed_or_indirect_operand"
   (match_code "mem")
@@ -521,6 +638,19 @@ (define_predicate "indexed_or_indirect_o
   return indexed_or_indirect_address (op, mode);
 })
 
+;; Like indexed_or_indirect_operand, but also allow a GPR register if direct
+;; moves are supported.
+(define_predicate "reg_or_indexed_operand"
+  (match_code "mem,reg")
+{
+  if (MEM_P (op))
+    return indexed_or_indirect_operand (op, mode);
+  else if (TARGET_DIRECT_MOVE)
+    return register_operand (op, mode);
+  return
+    0;
+})
+
 ;; Return 1 if the operand is an indexed or indirect memory operand with an
 ;; AND -16 in it, used to recognize when we need to switch to Altivec loads
 ;; to realign loops instead of VSX (altivec silently ignores the bottom bits,
Index: gcc/config/rs6000/rs6000-cpus.def
===================================================================
--- gcc/config/rs6000/rs6000-cpus.def	(revision 199037)
+++ gcc/config/rs6000/rs6000-cpus.def	(working copy)
@@ -28,7 +28,7 @@
      ALTIVEC, since in general it isn't a win on power6.  In ISA 2.04, fsel,
      fre, fsqrt, etc. were no longer documented as optional.  Group masks by
      server and embedded. */
-#define ISA_2_5_MASKS_EMBEDDED	(ISA_2_2_MASKS				\
+#define ISA_2_5_MASKS_EMBEDDED	(ISA_2_4_MASKS				\
 				 | OPTION_MASK_CMPB			\
 				 | OPTION_MASK_RECIP_PRECISION		\
 				 | OPTION_MASK_PPC_GFXOPT		\
@@ -45,6 +45,14 @@
 				 | OPTION_MASK_VSX			\
 				 | OPTION_MASK_VSX_TIMODE)
 
+/* For now, don't provide an embedded version of ISA 2.07.  */
+#define ISA_2_7_MASKS_SERVER	(ISA_2_6_MASKS_SERVER			\
+				 | OPTION_MASK_P8_FUSION		\
+				 | OPTION_MASK_P8_VECTOR		\
+				 | OPTION_MASK_CRYPTO			\
+				 | OPTION_MASK_DIRECT_MOVE		\
+				 | OPTION_MASK_QUAD_MEMORY)
+
 #define POWERPC_7400_MASK	(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_ALTIVEC)
 
 /* Deal with ports that do not have -mstrict-align.  */
@@ -61,7 +69,9 @@
 /* Mask of all options to set the default isa flags based on -mcpu=<xxx>.  */
 #define POWERPC_MASKS		(OPTION_MASK_ALTIVEC			\
 				 | OPTION_MASK_CMPB			\
+				 | OPTION_MASK_CRYPTO			\
 				 | OPTION_MASK_DFP			\
+				 | OPTION_MASK_DIRECT_MOVE		\
 				 | OPTION_MASK_DLMZB			\
 				 | OPTION_MASK_FPRND			\
 				 | OPTION_MASK_ISEL			\
@@ -69,11 +79,14 @@
 				 | OPTION_MASK_MFPGPR			\
 				 | OPTION_MASK_MULHW			\
 				 | OPTION_MASK_NO_UPDATE		\
+				 | OPTION_MASK_P8_FUSION		\
+				 | OPTION_MASK_P8_VECTOR		\
 				 | OPTION_MASK_POPCNTB			\
 				 | OPTION_MASK_POPCNTD			\
 				 | OPTION_MASK_POWERPC64		\
 				 | OPTION_MASK_PPC_GFXOPT		\
 				 | OPTION_MASK_PPC_GPOPT		\
+				 | OPTION_MASK_QUAD_MEMORY		\
 				 | OPTION_MASK_RECIP_PRECISION		\
 				 | OPTION_MASK_SOFT_FLOAT		\
 				 | OPTION_MASK_STRICT_ALIGN_OPTIONAL	\
@@ -168,10 +181,7 @@ RS6000_CPU ("power7", PROCESSOR_POWER7, 
 	    POWERPC_7400_MASK | MASK_POWERPC64 | MASK_PPC_GPOPT | MASK_MFCRF
 	    | MASK_POPCNTB | MASK_FPRND | MASK_CMPB | MASK_DFP | MASK_POPCNTD
 	    | MASK_VSX | MASK_RECIP_PRECISION | MASK_VSX_TIMODE)
-RS6000_CPU ("power8", PROCESSOR_POWER7,   /* Don't add MASK_ISEL by default */
-	    POWERPC_7400_MASK | MASK_POWERPC64 | MASK_PPC_GPOPT | MASK_MFCRF
-	    | MASK_POPCNTB | MASK_FPRND | MASK_CMPB | MASK_DFP | MASK_POPCNTD
-	    | MASK_VSX | MASK_RECIP_PRECISION | MASK_VSX_TIMODE)
+RS6000_CPU ("power8", PROCESSOR_POWER7, MASK_POWERPC64 | ISA_2_7_MASKS_SERVER)
 RS6000_CPU ("powerpc", PROCESSOR_POWERPC, 0)
 RS6000_CPU ("powerpc64", PROCESSOR_POWERPC64, MASK_PPC_GFXOPT | MASK_POWERPC64)
 RS6000_CPU ("rs64", PROCESSOR_RS64A, MASK_PPC_GFXOPT | MASK_POWERPC64)
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 199037)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -315,6 +315,8 @@ rs6000_target_modify_macros (bool define
     rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR6X");
   if ((flags & OPTION_MASK_POPCNTD) != 0)
     rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR7");
+  if ((flags & OPTION_MASK_DIRECT_MOVE) != 0)
+    rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR8");
   if ((flags & OPTION_MASK_SOFT_FLOAT) != 0)
     rs6000_define_or_undefine_macro (define_p, "_SOFT_FLOAT");
   if ((flags & OPTION_MASK_RECIP_PRECISION) != 0)
@@ -331,6 +333,8 @@ rs6000_target_modify_macros (bool define
     }
   if ((flags & OPTION_MASK_VSX) != 0)
     rs6000_define_or_undefine_macro (define_p, "__VSX__");
+  if ((flags & OPTION_MASK_P8_VECTOR) != 0)
+    rs6000_define_or_undefine_macro (define_p, "__POWER8_VECTOR__");
 
   /* options from the builtin masks.  */
   if ((bu_mask & RS6000_BTM_SPE) != 0)
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 199037)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -517,4 +517,28 @@ Control whether we save the TOC in the p
 
 mvsx-timode
 Target Undocumented Mask(VSX_TIMODE) Var(rs6000_isa_flags)
-; Allow/disallow TImode in VSX registers
+Allow 128-bit integers in VSX registers
+
+mpower8-fusion
+Target Report Mask(P8_FUSION) Var(rs6000_isa_flags)
+Fuse certain integer operations together for better performance on power8
+
+mpower8-fusion-sign
+Target Undocumented Mask(P8_FUSION_SIGN) Var(rs6000_isa_flags)
+Allow sign extension in fusion operations
+
+mpower8-vector
+Target Report Mask(P8_VECTOR) Var(rs6000_isa_flags)
+Use/do not use vector and scalar instructions added in ISA 2.07.
+
+mcrypto
+Target Report Mask(CRYPTO) Var(rs6000_isa_flags)
+Use ISA 2.07 crypto instructions
+
+mdirect-move
+Target Report Mask(DIRECT_MOVE) Var(rs6000_isa_flags)
+Use ISA 2.07 direct move between GPR & VSX register instructions
+
+mquad-memory
+Target Report Mask(QUAD_MEMORY) Var(rs6000_isa_flags)
+Generate the quad word memory instructions (lq/stq/lqarx/stqcx).
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 199037)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -831,6 +831,25 @@ struct processor_costs power7_cost = {
   12,			/* prefetch streams */
 };
 
+/* Instruction costs on POWER8 processors.  */
+static const
+struct processor_costs power8_cost = {
+  COSTS_N_INSNS (3),	/* mulsi */
+  COSTS_N_INSNS (3),	/* mulsi_const */
+  COSTS_N_INSNS (3),	/* mulsi_const9 */
+  COSTS_N_INSNS (3),	/* muldi */
+  COSTS_N_INSNS (19),	/* divsi */
+  COSTS_N_INSNS (35),	/* divdi */
+  COSTS_N_INSNS (3),	/* fp */
+  COSTS_N_INSNS (3),	/* dmul */
+  COSTS_N_INSNS (14),	/* sdiv */
+  COSTS_N_INSNS (17),	/* ddiv */
+  128,			/* cache line size */
+  32,			/* l1 cache */
+  256,			/* l2 cache */
+  12,			/* prefetch streams */
+};
+
 /* Instruction costs on POWER A2 processors.  */
 static const
 struct processor_costs ppca2_cost = {
@@ -1547,6 +1566,15 @@ rs6000_hard_regno_mode_ok (int regno, en
 {
   int last_regno = regno + rs6000_hard_regno_nregs[mode][regno] - 1;
 
+  /* PTImode can only go in GPRs.  Quad word memory operations require even/odd
+     register combinations, and use PTImode where we need to deal with quad
+     word memory operations.  Don't allow quad words in the argument or frame
+     pointer registers, just registers 0..31.  */
+  if (mode == PTImode)
+    return (IN_RANGE (regno, FIRST_GPR_REGNO, LAST_GPR_REGNO)
+	    && IN_RANGE (last_regno, FIRST_GPR_REGNO, LAST_GPR_REGNO)
+	    && ((regno & 1) == 0));
+
   /* VSX registers that overlap the FPR registers are larger than for non-VSX
      implementations.  Don't allow an item to be split between a FP register
      and an Altivec register.  */
@@ -1678,6 +1706,16 @@ rs6000_debug_reg_print (int first_regno,
 	  comma = "";
 	}
 
+      len += fprintf (stderr, "%sreg-class = %s", comma,
+		      reg_class_names[(int)rs6000_regno_regclass[r]]);
+      comma = ", ";
+
+      if (len > 70)
+	{
+	  fprintf (stderr, ",\n\t");
+	  comma = "";
+	}
+
       fprintf (stderr, "%sregno = %d\n", comma, r);
     }
 }
@@ -1710,6 +1748,7 @@ rs6000_debug_reg_global (void)
     "none",
     "altivec",
     "vsx",
+    "p8_vector",
     "paired",
     "spe",
     "other"
@@ -1802,8 +1841,11 @@ rs6000_debug_reg_global (void)
 	   "wf reg_class = %s\n"
 	   "wg reg_class = %s\n"
 	   "wl reg_class = %s\n"
+	   "wm reg_class = %s\n"
+	   "wr reg_class = %s\n"
 	   "ws reg_class = %s\n"
 	   "wt reg_class = %s\n"
+	   "wv reg_class = %s\n"
 	   "wx reg_class = %s\n"
 	   "wz reg_class = %s\n"
 	   "\n",
@@ -1815,8 +1857,11 @@ rs6000_debug_reg_global (void)
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wf]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wg]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wl]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wm]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wr]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_ws]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wt]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wv]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wx]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wz]]);
 
@@ -2050,6 +2095,10 @@ rs6000_debug_reg_global (void)
   if (targetm.lra_p ())
     fprintf (stderr, DEBUG_FMT_S, "lra", "true");
 
+  if (TARGET_P8_FUSION)
+    fprintf (stderr, DEBUG_FMT_S, "p8 fusion",
+	     (TARGET_P8_FUSION_SIGN) ? "zero+sign" : "zero");
+
   fprintf (stderr, DEBUG_FMT_S, "plt-format",
 	   TARGET_SECURE_PLT ? "secure" : "bss");
   fprintf (stderr, DEBUG_FMT_S, "struct-return",
@@ -2240,6 +2289,15 @@ rs6000_init_hard_regno_mode_ok (bool glo
   if (TARGET_LFIWAX)
     rs6000_constraints[RS6000_CONSTRAINT_wl] = FLOAT_REGS;
 
+  if (TARGET_DIRECT_MOVE)
+    rs6000_constraints[RS6000_CONSTRAINT_wm] = VSX_REGS;
+
+  if (TARGET_POWERPC64)
+    rs6000_constraints[RS6000_CONSTRAINT_wr] = GENERAL_REGS;
+
+  if (TARGET_P8_VECTOR)
+    rs6000_constraints[RS6000_CONSTRAINT_wv] = ALTIVEC_REGS;
+
   if (TARGET_STFIWX)
     rs6000_constraints[RS6000_CONSTRAINT_wx] = FLOAT_REGS;
 
@@ -2520,16 +2578,18 @@ darwin_rs6000_override_options (void)
 HOST_WIDE_INT
 rs6000_builtin_mask_calculate (void)
 {
-  return (((TARGET_ALTIVEC)		    ? RS6000_BTM_ALTIVEC  : 0)
-	  | ((TARGET_VSX)		    ? RS6000_BTM_VSX	  : 0)
-	  | ((TARGET_SPE)		    ? RS6000_BTM_SPE	  : 0)
-	  | ((TARGET_PAIRED_FLOAT)	    ? RS6000_BTM_PAIRED	  : 0)
-	  | ((TARGET_FRE)		    ? RS6000_BTM_FRE	  : 0)
-	  | ((TARGET_FRES)		    ? RS6000_BTM_FRES	  : 0)
-	  | ((TARGET_FRSQRTE)		    ? RS6000_BTM_FRSQRTE  : 0)
-	  | ((TARGET_FRSQRTES)		    ? RS6000_BTM_FRSQRTES : 0)
-	  | ((TARGET_POPCNTD)		    ? RS6000_BTM_POPCNTD  : 0)
-	  | ((rs6000_cpu == PROCESSOR_CELL) ? RS6000_BTM_CELL     : 0));
+  return (((TARGET_ALTIVEC)		    ? RS6000_BTM_ALTIVEC   : 0)
+	  | ((TARGET_VSX)		    ? RS6000_BTM_VSX	   : 0)
+	  | ((TARGET_SPE)		    ? RS6000_BTM_SPE	   : 0)
+	  | ((TARGET_PAIRED_FLOAT)	    ? RS6000_BTM_PAIRED	   : 0)
+	  | ((TARGET_FRE)		    ? RS6000_BTM_FRE	   : 0)
+	  | ((TARGET_FRES)		    ? RS6000_BTM_FRES	   : 0)
+	  | ((TARGET_FRSQRTE)		    ? RS6000_BTM_FRSQRTE   : 0)
+	  | ((TARGET_FRSQRTES)		    ? RS6000_BTM_FRSQRTES  : 0)
+	  | ((TARGET_POPCNTD)		    ? RS6000_BTM_POPCNTD   : 0)
+	  | ((rs6000_cpu == PROCESSOR_CELL) ? RS6000_BTM_CELL      : 0)
+	  | ((TARGET_P8_VECTOR)		    ? RS6000_BTM_P8_VECTOR : 0)
+	  | ((TARGET_CRYPTO)		    ? RS6000_BTM_CRYPTO	   : 0));
 }
 
 /* Override command line options.  Mostly we process the processor type and
@@ -2803,7 +2863,9 @@ rs6000_option_override_internal (bool gl
 
   /* For the newer switches (vsx, dfp, etc.) set some of the older options,
      unless the user explicitly used the -mno-<option> to disable the code.  */
-  if (TARGET_VSX)
+  if (TARGET_P8_VECTOR || TARGET_DIRECT_MOVE || TARGET_CRYPTO)
+    rs6000_isa_flags |= (ISA_2_7_MASKS_SERVER & ~rs6000_isa_flags_explicit);
+  else if (TARGET_VSX)
     rs6000_isa_flags |= (ISA_2_6_MASKS_SERVER & ~rs6000_isa_flags_explicit);
   else if (TARGET_POPCNTD)
     rs6000_isa_flags |= (ISA_2_6_MASKS_EMBEDDED & ~rs6000_isa_flags_explicit);
@@ -2818,6 +2880,34 @@ rs6000_option_override_internal (bool gl
   else if (TARGET_ALTIVEC)
     rs6000_isa_flags |= (OPTION_MASK_PPC_GFXOPT & ~rs6000_isa_flags_explicit);
 
+  if (TARGET_CRYPTO && !TARGET_ALTIVEC)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_CRYPTO)
+	error ("-mcrypto requires -maltivec");
+      rs6000_isa_flags &= ~OPTION_MASK_CRYPTO;
+    }
+
+  if (TARGET_DIRECT_MOVE && !TARGET_VSX)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_DIRECT_MOVE)
+	error ("-mdirect-move requires -mvsx");
+      rs6000_isa_flags &= ~OPTION_MASK_DIRECT_MOVE;
+    }
+
+  if (TARGET_P8_VECTOR && !TARGET_ALTIVEC)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR)
+	error ("-mpower8-vector requires -maltivec");
+      rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR;
+    }
+
+  if (TARGET_P8_VECTOR && !TARGET_VSX)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR)
+	error ("-mpower8-vector requires -mvsx");
+      rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR;
+    }
+
   if (TARGET_VSX_TIMODE && !TARGET_VSX)
     {
       if (rs6000_isa_flags_explicit & OPTION_MASK_VSX_TIMODE)
@@ -3019,16 +3109,19 @@ rs6000_option_override_internal (bool gl
 			&& rs6000_cpu != PROCESSOR_POWER5
 			&& rs6000_cpu != PROCESSOR_POWER6
 			&& rs6000_cpu != PROCESSOR_POWER7
+			&& rs6000_cpu != PROCESSOR_POWER8
 			&& rs6000_cpu != PROCESSOR_PPCA2
 			&& rs6000_cpu != PROCESSOR_CELL
 			&& rs6000_cpu != PROCESSOR_PPC476);
   rs6000_sched_groups = (rs6000_cpu == PROCESSOR_POWER4
 			 || rs6000_cpu == PROCESSOR_POWER5
-			 || rs6000_cpu == PROCESSOR_POWER7);
+			 || rs6000_cpu == PROCESSOR_POWER7
+			 || rs6000_cpu == PROCESSOR_POWER8);
   rs6000_align_branch_targets = (rs6000_cpu == PROCESSOR_POWER4
 				 || rs6000_cpu == PROCESSOR_POWER5
 				 || rs6000_cpu == PROCESSOR_POWER6
 				 || rs6000_cpu == PROCESSOR_POWER7
+				 || rs6000_cpu == PROCESSOR_POWER8
 				 || rs6000_cpu == PROCESSOR_PPCE500MC
 				 || rs6000_cpu == PROCESSOR_PPCE500MC64
 				 || rs6000_cpu == PROCESSOR_PPCE5500
@@ -3272,6 +3365,10 @@ rs6000_option_override_internal (bool gl
 	rs6000_cost = &power7_cost;
 	break;
 
+      case PROCESSOR_POWER8:
+	rs6000_cost = &power8_cost;
+	break;
+
       case PROCESSOR_PPCA2:
 	rs6000_cost = &ppca2_cost;
 	break;
@@ -3444,7 +3541,8 @@ rs6000_loop_align (rtx label)
       && (rs6000_cpu == PROCESSOR_POWER4
 	  || rs6000_cpu == PROCESSOR_POWER5
 	  || rs6000_cpu == PROCESSOR_POWER6
-	  || rs6000_cpu == PROCESSOR_POWER7))
+	  || rs6000_cpu == PROCESSOR_POWER7
+	  || rs6000_cpu == PROCESSOR_POWER8))
     return 5;
   else
     return align_loops_log;
@@ -12891,8 +12989,23 @@ rs6000_common_init_builtins (void)
       else
 	{
 	  enum insn_code icode = d->icode;
-          if (d->name == 0 || icode == CODE_FOR_nothing)
-	    continue;
+	  if (d->name == 0)
+	    {
+	      if (TARGET_DEBUG_BUILTIN)
+		fprintf (stderr, "rs6000_builtin, bdesc_3arg[%ld] no name\n",
+			 (long unsigned)i);
+
+	      continue;
+	    }
+
+          if (icode == CODE_FOR_nothing)
+	    {
+	      if (TARGET_DEBUG_BUILTIN)
+		fprintf (stderr, "rs6000_builtin, skip ternary %s (no code)\n",
+			 d->name);
+
+	      continue;
+	    }
 
 	  type = builtin_function_type (insn_data[icode].operand[0].mode,
 					insn_data[icode].operand[1].mode,
@@ -12931,8 +13044,23 @@ rs6000_common_init_builtins (void)
       else
 	{
 	  enum insn_code icode = d->icode;
-          if (d->name == 0 || icode == CODE_FOR_nothing)
-	    continue;
+	  if (d->name == 0)
+	    {
+	      if (TARGET_DEBUG_BUILTIN)
+		fprintf (stderr, "rs6000_builtin, bdesc_2arg[%ld] no name\n",
+			 (long unsigned)i);
+
+	      continue;
+	    }
+
+          if (icode == CODE_FOR_nothing)
+	    {
+	      if (TARGET_DEBUG_BUILTIN)
+		fprintf (stderr, "rs6000_builtin, skip binary %s (no code)\n",
+			 d->name);
+
+	      continue;
+	    }
 
           mode0 = insn_data[icode].operand[0].mode;
           mode1 = insn_data[icode].operand[1].mode;
@@ -12993,8 +13121,23 @@ rs6000_common_init_builtins (void)
       else
         {
 	  enum insn_code icode = d->icode;
-          if (d->name == 0 || icode == CODE_FOR_nothing)
-	    continue;
+	  if (d->name == 0)
+	    {
+	      if (TARGET_DEBUG_BUILTIN)
+		fprintf (stderr, "rs6000_builtin, bdesc_1arg[%ld] no name\n",
+			 (long unsigned)i);
+
+	      continue;
+	    }
+
+          if (icode == CODE_FOR_nothing)
+	    {
+	      if (TARGET_DEBUG_BUILTIN)
+		fprintf (stderr, "rs6000_builtin, skip unary %s (no code)\n",
+			 d->name);
+
+	      continue;
+	    }
 
           mode0 = insn_data[icode].operand[0].mode;
           mode1 = insn_data[icode].operand[1].mode;
@@ -22951,6 +23094,7 @@ rs6000_adjust_cost (rtx insn, rtx link, 
                  || rs6000_cpu_attr == CPU_POWER4
                  || rs6000_cpu_attr == CPU_POWER5
 		 || rs6000_cpu_attr == CPU_POWER7
+		 || rs6000_cpu_attr == CPU_POWER8
                  || rs6000_cpu_attr == CPU_CELL)
                 && recog_memoized (dep_insn)
                 && (INSN_CODE (dep_insn) >= 0))
@@ -23537,6 +23681,8 @@ rs6000_issue_rate (void)
   case CPU_POWER6:
   case CPU_POWER7:
     return 5;
+  case CPU_POWER8:
+    return 7;
   default:
     return 1;
   }
@@ -24130,6 +24276,7 @@ insn_must_be_first_in_group (rtx insn)
         }
       break;
     case PROCESSOR_POWER7:
+    case PROCESSOR_POWER8:	/* FIXME */
       type = get_attr_type (insn);
 
       switch (type)
@@ -24226,6 +24373,7 @@ insn_must_be_last_in_group (rtx insn)
     }
     break;
   case PROCESSOR_POWER7:
+  case PROCESSOR_POWER8:	/* FIXME */
     type = get_attr_type (insn);
 
     switch (type)
@@ -24332,7 +24480,8 @@ force_new_group (int sched_verbose, FILE
 	can_issue_more--;
 
       /* Power6 and Power7 have special group ending nop. */
-      if (rs6000_cpu_attr == CPU_POWER6 || rs6000_cpu_attr == CPU_POWER7)
+      if (rs6000_cpu_attr == CPU_POWER6 || rs6000_cpu_attr == CPU_POWER7
+	  || rs6000_cpu_attr == CPU_POWER8)
 	{
 	  nop = gen_group_ending_nop ();
 	  emit_insn_before (nop, next_insn);
@@ -26513,7 +26662,8 @@ rs6000_register_move_cost (enum machine_
       /* For those processors that have slow LR/CTR moves, make them more
          expensive than memory in order to bias spills to memory .*/
       else if ((rs6000_cpu == PROCESSOR_POWER6
-		|| rs6000_cpu == PROCESSOR_POWER7)
+		|| rs6000_cpu == PROCESSOR_POWER7
+		|| rs6000_cpu == PROCESSOR_POWER8)
 	       && reg_classes_intersect_p (rclass, LINK_OR_CTR_REGS))
         ret = 6 * hard_regno_nregs[0][mode];
 
@@ -27742,6 +27892,8 @@ static struct rs6000_opt_mask const rs60
 {
   { "altivec",			OPTION_MASK_ALTIVEC,		false, true  },
   { "cmpb",			OPTION_MASK_CMPB,		false, true  },
+  { "crypto",			OPTION_MASK_CRYPTO,		false, true  },
+  { "direct-move",		OPTION_MASK_DIRECT_MOVE,	false, true  },
   { "dlmzb",			OPTION_MASK_DLMZB,		false, true  },
   { "fprnd",			OPTION_MASK_FPRND,		false, true  },
   { "hard-dfp",			OPTION_MASK_DFP,		false, true  },
@@ -27750,13 +27902,17 @@ static struct rs6000_opt_mask const rs60
   { "mfpgpr",			OPTION_MASK_MFPGPR,		false, true  },
   { "mulhw",			OPTION_MASK_MULHW,		false, true  },
   { "multiple",			OPTION_MASK_MULTIPLE,		false, true  },
-  { "update",			OPTION_MASK_NO_UPDATE,		true , true  },
   { "popcntb",			OPTION_MASK_POPCNTB,		false, true  },
   { "popcntd",			OPTION_MASK_POPCNTD,		false, true  },
+  { "power8-fusion",		OPTION_MASK_P8_FUSION,		false, true  },
+  { "power8-fusion-sign",	OPTION_MASK_P8_FUSION_SIGN,	false, true  },
+  { "power8-vector",		OPTION_MASK_P8_VECTOR,		false, true  },
   { "powerpc-gfxopt",		OPTION_MASK_PPC_GFXOPT,		false, true  },
   { "powerpc-gpopt",		OPTION_MASK_PPC_GPOPT,		false, true  },
+  { "quad-memory",		OPTION_MASK_QUAD_MEMORY,	false, true  },
   { "recip-precision",		OPTION_MASK_RECIP_PRECISION,	false, true  },
   { "string",			OPTION_MASK_STRING,		false, true  },
+  { "update",			OPTION_MASK_NO_UPDATE,		true , true  },
   { "vsx",			OPTION_MASK_VSX,		false, true  },
   { "vsx-timode",		OPTION_MASK_VSX_TIMODE,		false, true  },
 #ifdef OPTION_MASK_64BIT
@@ -27798,6 +27954,8 @@ static struct rs6000_opt_mask const rs60
   { "frsqrtes",		 RS6000_BTM_FRSQRTES,	false, false },
   { "popcntd",		 RS6000_BTM_POPCNTD,	false, false },
   { "cell",		 RS6000_BTM_CELL,	false, false },
+  { "power8-vector",	 RS6000_BTM_P8_VECTOR,	false, false },
+  { "crypto",		 RS6000_BTM_CRYPTO,	false, false },
 };
 
 /* Option variables that we want to support inside attribute((target)) and
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 199037)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -92,7 +92,7 @@
 #ifdef HAVE_AS_POWER8
 #define ASM_CPU_POWER8_SPEC "-mpower8"
 #else
-#define ASM_CPU_POWER8_SPEC "-mpower4 -maltivec"
+#define ASM_CPU_POWER8_SPEC ASM_CPU_POWER7_SPEC
 #endif
 
 #ifdef HAVE_AS_DCI
@@ -164,6 +164,7 @@
 %{mcpu=e6500: -me6500} \
 %{maltivec: -maltivec} \
 %{mvsx: -mvsx %{!maltivec: -maltivec} %{!mcpu*: %(asm_cpu_power7)}} \
+%{mpower8-vector|mcrypto|mdirect-move: %{!mcpu*: %(asm_cpu_power8)}} \
 -many"
 
 #define CPP_DEFAULT_SPEC ""
@@ -277,6 +278,19 @@ extern const char *host_detect_local_cpu
 #define TARGET_POPCNTD 0
 #endif
 
+/* Define the ISA 2.07 flags as 0 if the target assembler does not support the
+   waitasecond instruction.  Allow -mpower8-fusion, since it does not add new
+   instructions.  */
+
+#ifndef HAVE_AS_POWER8
+#undef  TARGET_DIRECT_MOVE
+#undef  TARGET_CRYPTO
+#undef  TARGET_P8_VECTOR
+#define TARGET_DIRECT_MOVE 0
+#define TARGET_CRYPTO 0
+#define TARGET_P8_VECTOR 0
+#endif
+
 /* Define TARGET_LWSYNC_INSTRUCTION if the assembler knows about lwsync.  If
    not, generate the lwsync code as an integer constant.  */
 #ifdef HAVE_AS_LWSYNC
@@ -386,6 +400,7 @@ extern const char *host_detect_local_cpu
 #define TARGET_DEBUG_TARGET	(rs6000_debug & MASK_DEBUG_TARGET)
 #define TARGET_DEBUG_BUILTIN	(rs6000_debug & MASK_DEBUG_BUILTIN)
 
+/* Describe the vector unit used for arithmetic operations.  */
 extern enum rs6000_vector rs6000_vector_unit[];
 
 #define VECTOR_UNIT_NONE_P(MODE)			\
@@ -394,12 +409,25 @@ extern enum rs6000_vector rs6000_vector_
 #define VECTOR_UNIT_VSX_P(MODE)				\
   (rs6000_vector_unit[(MODE)] == VECTOR_VSX)
 
+#define VECTOR_UNIT_P8_VECTOR_P(MODE)			\
+  (rs6000_vector_unit[(MODE)] == VECTOR_P8_VECTOR)
+
 #define VECTOR_UNIT_ALTIVEC_P(MODE)			\
   (rs6000_vector_unit[(MODE)] == VECTOR_ALTIVEC)
 
+#define VECTOR_UNIT_VSX_OR_P8_VECTOR_P(MODE)		\
+  (IN_RANGE ((int)rs6000_vector_unit[(MODE)],		\
+	     (int)VECTOR_VSX,				\
+	     (int)VECTOR_P8_VECTOR))
+
+/* VECTOR_UNIT_ALTIVEC_OR_VSX_P is used in places where we are using either
+   altivec (VMX) or VSX vector instructions.  P8 vector support is upwards
+   compatible, so allow it as well, rather than changing all of the uses of the
+   macro.  */
 #define VECTOR_UNIT_ALTIVEC_OR_VSX_P(MODE)		\
-  (rs6000_vector_unit[(MODE)] == VECTOR_ALTIVEC 	\
-   || rs6000_vector_unit[(MODE)] == VECTOR_VSX)
+  (IN_RANGE ((int)rs6000_vector_unit[(MODE)],		\
+	     (int)VECTOR_ALTIVEC,			\
+	     (int)VECTOR_P8_VECTOR))
 
 /* Describe whether to use VSX loads or Altivec loads.  For now, just use the
    same unit as the vector unit we are using, but we may want to migrate to
@@ -412,12 +440,21 @@ extern enum rs6000_vector rs6000_vector_
 #define VECTOR_MEM_VSX_P(MODE)				\
   (rs6000_vector_mem[(MODE)] == VECTOR_VSX)
 
+#define VECTOR_MEM_P8_VECTOR_P(MODE)			\
+  (rs6000_vector_mem[(MODE)] == VECTOR_VSX)
+
 #define VECTOR_MEM_ALTIVEC_P(MODE)			\
   (rs6000_vector_mem[(MODE)] == VECTOR_ALTIVEC)
 
+#define VECTOR_MEM_VSX_OR_P8_VECTOR_P(MODE)		\
+  (IN_RANGE ((int)rs6000_vector_mem[(MODE)],		\
+	     (int)VECTOR_VSX,				\
+	     (int)VECTOR_P8_VECTOR))
+
 #define VECTOR_MEM_ALTIVEC_OR_VSX_P(MODE)		\
-  (rs6000_vector_mem[(MODE)] == VECTOR_ALTIVEC 	\
-   || rs6000_vector_mem[(MODE)] == VECTOR_VSX)
+  (IN_RANGE ((int)rs6000_vector_mem[(MODE)],		\
+	     (int)VECTOR_ALTIVEC,			\
+	     (int)VECTOR_P8_VECTOR))
 
 /* Return the alignment of a given vector type, which is set based on the
    vector unit use.  VSX for instance can load 32 or 64 bit aligned words
@@ -479,6 +516,15 @@ extern int rs6000_vector_align[];
 #define TARGET_FCTIDUZ	TARGET_POPCNTD
 #define TARGET_FCTIWUZ	TARGET_POPCNTD
 
+#define TARGET_XSCVDPSPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
+#define TARGET_XSCVSPDPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
+
+/* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
+   in power7, so conditionalize them on p8 features.  TImode syncs need quad
+   memory support.  */
+#define TARGET_SYNC_HI_QI	(TARGET_QUAD_MEMORY || TARGET_DIRECT_MOVE)
+#define TARGET_SYNC_TI		TARGET_QUAD_MEMORY
+
 /* Power7 has both 32-bit load and store integer for the FPRs, so we don't need
    to allocate the SDmode stack slot to get the value into the proper location
    in the register.  */
@@ -489,10 +535,13 @@ extern int rs6000_vector_align[];
    OPTION_MASK_<xxx> back into MASK_<xxx>.  */
 #define MASK_ALTIVEC			OPTION_MASK_ALTIVEC
 #define MASK_CMPB			OPTION_MASK_CMPB
+#define MASK_CRYPTO			OPTION_MASK_CRYPTO
 #define MASK_DFP			OPTION_MASK_DFP
+#define MASK_DIRECT_MOVE		OPTION_MASK_DIRECT_MOVE
 #define MASK_DLMZB			OPTION_MASK_DLMZB
 #define MASK_EABI			OPTION_MASK_EABI
 #define MASK_FPRND			OPTION_MASK_FPRND
+#define MASK_P8_FUSION			OPTION_MASK_P8_FUSION
 #define MASK_HARD_FLOAT			OPTION_MASK_HARD_FLOAT
 #define MASK_ISEL			OPTION_MASK_ISEL
 #define MASK_MFCRF			OPTION_MASK_MFCRF
@@ -500,6 +549,7 @@ extern int rs6000_vector_align[];
 #define MASK_MULHW			OPTION_MASK_MULHW
 #define MASK_MULTIPLE			OPTION_MASK_MULTIPLE
 #define MASK_NO_UPDATE			OPTION_MASK_NO_UPDATE
+#define MASK_P8_VECTOR			OPTION_MASK_P8_VECTOR
 #define MASK_POPCNTB			OPTION_MASK_POPCNTB
 #define MASK_POPCNTD			OPTION_MASK_POPCNTD
 #define MASK_PPC_GFXOPT			OPTION_MASK_PPC_GFXOPT
@@ -1002,7 +1052,9 @@ extern unsigned rs6000_pointer_size;
 
 #define REG_ALLOC_ORDER						\
   {32,								\
-   45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34,		\
+   /* move fr13 (ie 45) later, so if we need TFmode, it does */	\
+   /* not use fr14 which is a saved register.  */		\
+   44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 45,		\
    33,								\
    63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51,		\
    50, 49, 48, 47, 46,						\
@@ -1062,8 +1114,14 @@ extern unsigned rs6000_pointer_size;
 #define VINT_REGNO_P(N) ALTIVEC_REGNO_P (N)
 
 /* Alternate name for any vector register supporting logical operations, no
-   matter which instruction set(s) are available.  */
-#define VLOGICAL_REGNO_P(N) VFLOAT_REGNO_P (N)
+   matter which instruction set(s) are available.  Under VSX, we allow GPRs as
+   well as vector registers on 64-bit systems.  We don't allow this on 32-bit
+   systems, due to the number of registers involved and the number of
+   instructions needed to load/store the values.  */
+#define VLOGICAL_REGNO_P(N)						\
+  (ALTIVEC_REGNO_P (N)							\
+   || (TARGET_VSX && FP_REGNO_P (N))					\
+   || (TARGET_VSX && TARGET_POWERPC64 && INT_REGNO_P (N)))
 
 /* Return number of consecutive hard regs needed starting at reg REGNO
    to hold something of mode MODE.  */
@@ -1124,7 +1182,7 @@ extern unsigned rs6000_pointer_size;
    when one has mode MODE1 and one has mode MODE2.
    If HARD_REGNO_MODE_OK could produce different values for MODE1 and MODE2,
    for any hard reg, then this must be 0 for correct output.  */
-#define MODES_TIEABLE_P(MODE1, MODE2) \
+#define MODES_TIEABLE_P(MODE1, MODE2)		\
   (SCALAR_FLOAT_MODE_P (MODE1)			\
    ? SCALAR_FLOAT_MODE_P (MODE2)		\
    : SCALAR_FLOAT_MODE_P (MODE2)		\
@@ -1137,14 +1195,14 @@ extern unsigned rs6000_pointer_size;
    ? SPE_VECTOR_MODE (MODE2)			\
    : SPE_VECTOR_MODE (MODE2)			\
    ? SPE_VECTOR_MODE (MODE1)			\
-   : ALTIVEC_VECTOR_MODE (MODE1)		\
-   ? ALTIVEC_VECTOR_MODE (MODE2)		\
-   : ALTIVEC_VECTOR_MODE (MODE2)		\
-   ? ALTIVEC_VECTOR_MODE (MODE1)		\
    : ALTIVEC_OR_VSX_VECTOR_MODE (MODE1)		\
    ? ALTIVEC_OR_VSX_VECTOR_MODE (MODE2)		\
    : ALTIVEC_OR_VSX_VECTOR_MODE (MODE2)		\
    ? ALTIVEC_OR_VSX_VECTOR_MODE (MODE1)		\
+   : ALTIVEC_VECTOR_MODE (MODE1)		\
+   ? ALTIVEC_VECTOR_MODE (MODE2)		\
+   : ALTIVEC_VECTOR_MODE (MODE2)		\
+   ? ALTIVEC_VECTOR_MODE (MODE1)		\
    : 1)
 
 /* Post-reload, we can't use any new AltiVec registers, as we already
@@ -1337,8 +1395,11 @@ enum r6000_reg_class_enum {
   RS6000_CONSTRAINT_wg,		/* FPR register for -mmfpgpr */
   RS6000_CONSTRAINT_wf,		/* VSX register for V4SF */
   RS6000_CONSTRAINT_wl,		/* FPR register for LFIWAX */
+  RS6000_CONSTRAINT_wm,		/* VSX register for direct move */
+  RS6000_CONSTRAINT_wr,		/* GPR register if 64-bit  */
   RS6000_CONSTRAINT_ws,		/* VSX register for DF */
   RS6000_CONSTRAINT_wt,		/* VSX register for TImode */
+  RS6000_CONSTRAINT_wv,		/* Altivec register for power8 vector */
   RS6000_CONSTRAINT_wx,		/* FPR register for STFIWX */
   RS6000_CONSTRAINT_wz,		/* FPR register for LFIWZX */
   RS6000_CONSTRAINT_MAX
@@ -2365,6 +2426,8 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_ALWAYS	0		/* Always enabled.  */
 #define RS6000_BTM_ALTIVEC	MASK_ALTIVEC	/* VMX/altivec vectors.  */
 #define RS6000_BTM_VSX		MASK_VSX	/* VSX (vector/scalar).  */
+#define RS6000_BTM_P8_VECTOR	MASK_P8_VECTOR	/* ISA 2.07 vector.  */
+#define RS6000_BTM_CRYPTO	MASK_CRYPTO	/* crypto funcs.  */
 #define RS6000_BTM_SPE		MASK_STRING	/* E500 */
 #define RS6000_BTM_PAIRED	MASK_MULHW	/* 750CL paired insns.  */
 #define RS6000_BTM_FRE		MASK_POPCNTB	/* FRE instruction.  */
@@ -2376,6 +2439,8 @@ extern int frame_pointer_needed;
 
 #define RS6000_BTM_COMMON	(RS6000_BTM_ALTIVEC			\
 				 | RS6000_BTM_VSX			\
+				 | RS6000_BTM_P8_VECTOR			\
+				 | RS6000_BTM_CRYPTO			\
 				 | RS6000_BTM_FRE			\
 				 | RS6000_BTM_FRES			\
 				 | RS6000_BTM_FRSQRTE			\
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 199037)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -168,7 +168,7 @@ (define_attr "length" ""
 ;; Processor type -- this attribute must exactly match the processor_type
 ;; enumeration in rs6000.h.
 
-(define_attr "cpu" "rs64a,mpccore,ppc403,ppc405,ppc440,ppc476,ppc601,ppc603,ppc604,ppc604e,ppc620,ppc630,ppc750,ppc7400,ppc7450,ppc8540,ppc8548,ppce300c2,ppce300c3,ppce500mc,ppce500mc64,ppce5500,ppce6500,power4,power5,power6,power7,cell,ppca2,titan"
+(define_attr "cpu" "rs64a,mpccore,ppc403,ppc405,ppc440,ppc476,ppc601,ppc603,ppc604,ppc604e,ppc620,ppc630,ppc750,ppc7400,ppc7450,ppc8540,ppc8548,ppce300c2,ppce300c3,ppce500mc,ppce500mc64,ppce5500,ppce6500,power4,power5,power6,power7,cell,ppca2,titan,power8"
   (const (symbol_ref "rs6000_cpu_attr")))
 
 
Index: gcc/config/rs6000/rs6000-opts.h
===================================================================
--- gcc/config/rs6000/rs6000-opts.h	(revision 199037)
+++ gcc/config/rs6000/rs6000-opts.h	(working copy)
@@ -59,7 +59,8 @@ enum processor_type
    PROCESSOR_POWER7,
    PROCESSOR_CELL,
    PROCESSOR_PPCA2,
-   PROCESSOR_TITAN
+   PROCESSOR_TITAN,
+   PROCESSOR_POWER8
 };
 
 /* FP processor type.  */
@@ -131,11 +132,14 @@ enum rs6000_cmodel {
   CMODEL_LARGE
 };
 
-/* Describe which vector unit to use for a given machine mode.  */
+/* Describe which vector unit to use for a given machine mode.  The
+   VECTOR_MEM_* and VECTOR_UNIT_* macros assume that Altivec, VSX, and
+   P8_VECTOR are contiguous.  */
 enum rs6000_vector {
   VECTOR_NONE,			/* Type is not  a vector or not supported */
   VECTOR_ALTIVEC,		/* Use altivec for vector processing */
   VECTOR_VSX,			/* Use VSX for vector processing */
+  VECTOR_P8_VECTOR,		/* Use ISA 2.07 VSX for vector processing */
   VECTOR_PAIRED,		/* Use paired floating point for vectors */
   VECTOR_SPE,			/* Use SPE for vector processing */
   VECTOR_OTHER			/* Some other vector unit */


* Re: [PATCH, rs6000] power8 patch #1, infrastructure changes (revised patch)
  2013-05-20 20:49 ` [PATCH, rs6000] power8 patch #1, infrastructure changes Michael Meissner
@ 2013-05-20 21:34   ` Michael Meissner
  2013-05-22  3:29     ` David Edelsohn
  0 siblings, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-05-20 21:34 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, pthaugen, bergner

[-- Attachment #1: Type: text/plain, Size: 645 bytes --]

After submitting the patch, I realized I had submitted a previous version of
the patch, one that still had the wq constraint initially added for the quad
memory operations, and that also had the changes for ChangeLog.ibm, which I
keep on the branch.  However, the wq constraint was always equal to the r
constraint, so I have removed it and used the 'r' constraint once again.

I have also bootstrapped and run make check with the submitted patches, with
no regressions found.  Can I check in the revised patch?

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.official-01b --]
[-- Type: text/plain, Size: 42216 bytes --]

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 199121)
+++ gcc/doc/invoke.texi	(revision 199122)
@@ -860,7 +860,10 @@ See RS/6000 and PowerPC Options.
 -mno-recip-precision @gol
 -mveclibabi=@var{type} -mfriz -mno-friz @gol
 -mpointers-to-nested-functions -mno-pointers-to-nested-functions @gol
--msave-toc-indirect -mno-save-toc-indirect}
+-msave-toc-indirect -mno-save-toc-indirect @gol
+-mpower8-fusion -mno-power8-fusion -mpower8-vector -mno-power8-vector @gol
+-mcrypto -mno-crypto -mdirect-move -mno-direct-move @gol
+-mquad-memory -mno-quad-memory}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles  -m32bit-doubles  -fpu  -nofpu@gol
@@ -17341,7 +17344,8 @@ following options:
 @gccoptlist{-maltivec  -mfprnd  -mhard-float  -mmfcrf  -mmultiple @gol
 -mpopcntb -mpopcntd  -mpowerpc64 @gol
 -mpowerpc-gpopt  -mpowerpc-gfxopt  -msingle-float -mdouble-float @gol
--msimple-fpu -mstring  -mmulhw  -mdlmzb  -mmfpgpr -mvsx}
+-msimple-fpu -mstring  -mmulhw  -mdlmzb  -mmfpgpr -mvsx @gol
+-mcrypto -mdirect-move -mpower8-fusion -mpower8-vector -mquad-memory}
 
 The particular options set for any particular CPU varies between
 compiler versions, depending on what setting seems to produce optimal
@@ -17459,6 +17463,47 @@ Generate code that uses (does not use) v
 instructions, and also enable the use of built-in functions that allow
 more direct access to the VSX instruction set.
 
+@item -mcrypto
+@itemx -mno-crypto
+@opindex mcrypto
+@opindex mno-crypto
+Enable (disable) the use of the built-in functions that allow direct
+access to the cryptographic instructions that were added in version
+2.07 of the PowerPC ISA.
+
+@item -mdirect-move
+@itemx -mno-direct-move
+@opindex mdirect-move
+@opindex mno-direct-move
+Generate code that uses (does not use) the instructions to move data
+between the general purpose registers and the vector/scalar (VSX)
+registers that were added in version 2.07 of the PowerPC ISA.
+
+@item -mpower8-fusion
+@itemx -mno-power8-fusion
+@opindex mpower8-fusion
+@opindex mno-power8-fusion
+Generate code that keeps (does not keep) some integer operations
+adjacent so that the instructions can be fused together on power8 and
+later processors.
+
+@item -mpower8-vector
+@itemx -mno-power8-vector
+@opindex mpower8-vector
+@opindex mno-power8-vector
+Generate code that uses (does not use) the vector and scalar
+instructions that were added in version 2.07 of the PowerPC ISA.  Also
+enable the use of built-in functions that allow more direct access to
+the vector instructions.
+
+@item -mquad-memory
+@itemx -mno-quad-memory
+@opindex mquad-memory
+@opindex mno-quad-memory
+Generate code that uses (does not use) the quad word memory
+instructions.  The @option{-mquad-memory} option requires use of
+64-bit mode.
+
 @item -mfloat-gprs=@var{yes/single/double/no}
 @itemx -mfloat-gprs
 @opindex mfloat-gprs
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 199121)
+++ gcc/doc/md.texi	(revision 199122)
@@ -2055,7 +2055,7 @@ Any constant whose absolute value is no 
 
 @end table
 
-@item PowerPC and IBM RS6000---@file{config/rs6000/rs6000.h}
+@item PowerPC and IBM RS6000---@file{config/rs6000/constraints.md}
 @table @code
 @item b
 Address base register
@@ -2069,6 +2069,9 @@ Floating point register (containing 32-b
 @item v
 Altivec vector register
 
+@item wa
+Any VSX register
+
 @item wd
 VSX vector register to hold vector double data
 
@@ -2081,6 +2084,15 @@ If @option{-mmfpgpr} was used, a floatin
 @item wl
 If the LFIWAX instruction is enabled, a floating point register
 
+@item wm
+If direct moves are enabled, a VSX register.
+
+@item wn
+No register.
+
+@item wr
+General purpose register if 64-bit mode is used
+
 @item ws
 VSX vector register to hold scalar float data
 
@@ -2093,8 +2105,9 @@ If the STFIWX instruction is enabled, a 
 @item wz
 If the LFIWZX instruction is enabled, a floating point register
 
-@item wa
-Any VSX register
+@item wQ
+A memory address that will work with the @code{lq} and @code{stq}
+instructions.
 
 @item h
 @samp{MQ}, @samp{CTR}, or @samp{LINK} register
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 199121)
+++ gcc/config/rs6000/rs6000.opt	(revision 199122)
@@ -517,4 +517,28 @@ Control whether we save the TOC in the p
 
 mvsx-timode
 Target Undocumented Mask(VSX_TIMODE) Var(rs6000_isa_flags)
-; Allow/disallow TImode in VSX registers
+Allow 128-bit integers in VSX registers
+
+mpower8-fusion
+Target Report Mask(P8_FUSION) Var(rs6000_isa_flags)
+Fuse certain integer operations together for better performance on power8
+
+mpower8-fusion-sign
+Target Undocumented Mask(P8_FUSION_SIGN) Var(rs6000_isa_flags)
+Allow sign extension in fusion operations
+
+mpower8-vector
+Target Report Mask(P8_VECTOR) Var(rs6000_isa_flags)
+Use/do not use vector and scalar instructions added in ISA 2.07.
+
+mcrypto
+Target Report Mask(CRYPTO) Var(rs6000_isa_flags)
+Use ISA 2.07 crypto instructions
+
+mdirect-move
+Target Report Mask(DIRECT_MOVE) Var(rs6000_isa_flags)
+Use the ISA 2.07 direct move instructions between the GPRs and VSX registers
+
+mquad-memory
+Target Report Mask(QUAD_MEMORY) Var(rs6000_isa_flags)
+Generate the quad word memory instructions (lq/stq/lqarx/stqcx).
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 199121)
+++ gcc/config/rs6000/rs6000-c.c	(revision 199122)
@@ -315,6 +315,8 @@ rs6000_target_modify_macros (bool define
     rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR6X");
   if ((flags & OPTION_MASK_POPCNTD) != 0)
     rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR7");
+  if ((flags & OPTION_MASK_DIRECT_MOVE) != 0)
+    rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR8");
   if ((flags & OPTION_MASK_SOFT_FLOAT) != 0)
     rs6000_define_or_undefine_macro (define_p, "_SOFT_FLOAT");
   if ((flags & OPTION_MASK_RECIP_PRECISION) != 0)
@@ -331,6 +333,8 @@ rs6000_target_modify_macros (bool define
     }
   if ((flags & OPTION_MASK_VSX) != 0)
     rs6000_define_or_undefine_macro (define_p, "__VSX__");
+  if ((flags & OPTION_MASK_P8_VECTOR) != 0)
+    rs6000_define_or_undefine_macro (define_p, "__POWER8_VECTOR__");
 
   /* options from the builtin masks.  */
   if ((bu_mask & RS6000_BTM_SPE) != 0)
Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 199121)
+++ gcc/config/rs6000/constraints.md	(revision 199122)
@@ -79,12 +79,31 @@ (define_register_constraint "wg" "rs6000
 (define_register_constraint "wl" "rs6000_constraints[RS6000_CONSTRAINT_wl]"
   "Floating point register if the LFIWAX instruction is enabled or NO_REGS.")
 
+(define_register_constraint "wm" "rs6000_constraints[RS6000_CONSTRAINT_wm]"
+  "VSX register if direct move instructions are enabled, or NO_REGS.")
+
+(define_register_constraint "wr" "rs6000_constraints[RS6000_CONSTRAINT_wr]"
+  "General purpose register if 64-bit instructions are enabled or NO_REGS.")
+
+(define_register_constraint "wv" "rs6000_constraints[RS6000_CONSTRAINT_wv]"
+  "Altivec register if -mpower8-vector is used or NO_REGS.")
+
 (define_register_constraint "wx" "rs6000_constraints[RS6000_CONSTRAINT_wx]"
   "Floating point register if the STFIWX instruction is enabled or NO_REGS.")
 
 (define_register_constraint "wz" "rs6000_constraints[RS6000_CONSTRAINT_wz]"
   "Floating point register if the LFIWZX instruction is enabled or NO_REGS.")
 
+;; NO_REGS register constraint, used to merge mov{sd,sf}: movsd can use the
+;; direct move instructions to move between the register sets, but movsf
+;; can't.  There is a mode_attr that resolves to wm for SDmode and wn for SFmode.
+(define_register_constraint "wn" "NO_REGS")
+
+;; lq/stq validate the address for load/store quad
+(define_memory_constraint "wQ"
+  "Memory operand suitable for the load/store quad instructions"
+  (match_operand 0 "quad_memory_operand"))
+
 ;; Altivec style load/store that ignores the bottom bits of the address
 (define_memory_constraint "wZ"
   "Indexed or indirect memory operand, ignoring the bottom 4 bits"
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 199121)
+++ gcc/config/rs6000/rs6000.c	(revision 199122)
@@ -831,6 +831,25 @@ struct processor_costs power7_cost = {
   12,			/* prefetch streams */
 };
 
+/* Instruction costs on POWER8 processors.  */
+static const
+struct processor_costs power8_cost = {
+  COSTS_N_INSNS (3),	/* mulsi */
+  COSTS_N_INSNS (3),	/* mulsi_const */
+  COSTS_N_INSNS (3),	/* mulsi_const9 */
+  COSTS_N_INSNS (3),	/* muldi */
+  COSTS_N_INSNS (19),	/* divsi */
+  COSTS_N_INSNS (35),	/* divdi */
+  COSTS_N_INSNS (3),	/* fp */
+  COSTS_N_INSNS (3),	/* dmul */
+  COSTS_N_INSNS (14),	/* sdiv */
+  COSTS_N_INSNS (17),	/* ddiv */
+  128,			/* cache line size */
+  32,			/* l1 cache */
+  256,			/* l2 cache */
+  12,			/* prefetch streams */
+};
+
 /* Instruction costs on POWER A2 processors.  */
 static const
 struct processor_costs ppca2_cost = {
@@ -1547,6 +1566,15 @@ rs6000_hard_regno_mode_ok (int regno, en
 {
   int last_regno = regno + rs6000_hard_regno_nregs[mode][regno] - 1;
 
+  /* PTImode can only go in GPRs.  Quad word memory operations require even/odd
+     register combinations, and use PTImode where we need to deal with quad
+     word memory operations.  Don't allow quad words in the argument or frame
+     pointer registers, just registers 0..31.  */
+  if (mode == PTImode)
+    return (IN_RANGE (regno, FIRST_GPR_REGNO, LAST_GPR_REGNO)
+	    && IN_RANGE (last_regno, FIRST_GPR_REGNO, LAST_GPR_REGNO)
+	    && ((regno & 1) == 0));
+
   /* VSX registers that overlap the FPR registers are larger than for non-VSX
      implementations.  Don't allow an item to be split between a FP register
      and an Altivec register.  */
@@ -1678,6 +1706,16 @@ rs6000_debug_reg_print (int first_regno,
 	  comma = "";
 	}
 
+      len += fprintf (stderr, "%sreg-class = %s", comma,
+		      reg_class_names[(int)rs6000_regno_regclass[r]]);
+      comma = ", ";
+
+      if (len > 70)
+	{
+	  fprintf (stderr, ",\n\t");
+	  comma = "";
+	}
+
       fprintf (stderr, "%sregno = %d\n", comma, r);
     }
 }
@@ -1710,6 +1748,7 @@ rs6000_debug_reg_global (void)
     "none",
     "altivec",
     "vsx",
+    "p8_vector",
     "paired",
     "spe",
     "other"
@@ -1802,8 +1841,11 @@ rs6000_debug_reg_global (void)
 	   "wf reg_class = %s\n"
 	   "wg reg_class = %s\n"
 	   "wl reg_class = %s\n"
+	   "wm reg_class = %s\n"
+	   "wr reg_class = %s\n"
 	   "ws reg_class = %s\n"
 	   "wt reg_class = %s\n"
+	   "wv reg_class = %s\n"
 	   "wx reg_class = %s\n"
 	   "wz reg_class = %s\n"
 	   "\n",
@@ -1815,8 +1857,11 @@ rs6000_debug_reg_global (void)
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wf]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wg]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wl]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wm]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wr]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_ws]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wt]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wv]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wx]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wz]]);
 
@@ -2050,6 +2095,10 @@ rs6000_debug_reg_global (void)
   if (targetm.lra_p ())
     fprintf (stderr, DEBUG_FMT_S, "lra", "true");
 
+  if (TARGET_P8_FUSION)
+    fprintf (stderr, DEBUG_FMT_S, "p8 fusion",
+	     (TARGET_P8_FUSION_SIGN) ? "zero+sign" : "zero");
+
   fprintf (stderr, DEBUG_FMT_S, "plt-format",
 	   TARGET_SECURE_PLT ? "secure" : "bss");
   fprintf (stderr, DEBUG_FMT_S, "struct-return",
@@ -2240,6 +2289,15 @@ rs6000_init_hard_regno_mode_ok (bool glo
   if (TARGET_LFIWAX)
     rs6000_constraints[RS6000_CONSTRAINT_wl] = FLOAT_REGS;
 
+  if (TARGET_DIRECT_MOVE)
+    rs6000_constraints[RS6000_CONSTRAINT_wm] = VSX_REGS;
+
+  if (TARGET_POWERPC64)
+    rs6000_constraints[RS6000_CONSTRAINT_wr] = GENERAL_REGS;
+
+  if (TARGET_P8_VECTOR)
+    rs6000_constraints[RS6000_CONSTRAINT_wv] = ALTIVEC_REGS;
+
   if (TARGET_STFIWX)
     rs6000_constraints[RS6000_CONSTRAINT_wx] = FLOAT_REGS;
 
@@ -2520,16 +2578,18 @@ darwin_rs6000_override_options (void)
 HOST_WIDE_INT
 rs6000_builtin_mask_calculate (void)
 {
-  return (((TARGET_ALTIVEC)		    ? RS6000_BTM_ALTIVEC  : 0)
-	  | ((TARGET_VSX)		    ? RS6000_BTM_VSX	  : 0)
-	  | ((TARGET_SPE)		    ? RS6000_BTM_SPE	  : 0)
-	  | ((TARGET_PAIRED_FLOAT)	    ? RS6000_BTM_PAIRED	  : 0)
-	  | ((TARGET_FRE)		    ? RS6000_BTM_FRE	  : 0)
-	  | ((TARGET_FRES)		    ? RS6000_BTM_FRES	  : 0)
-	  | ((TARGET_FRSQRTE)		    ? RS6000_BTM_FRSQRTE  : 0)
-	  | ((TARGET_FRSQRTES)		    ? RS6000_BTM_FRSQRTES : 0)
-	  | ((TARGET_POPCNTD)		    ? RS6000_BTM_POPCNTD  : 0)
-	  | ((rs6000_cpu == PROCESSOR_CELL) ? RS6000_BTM_CELL     : 0));
+  return (((TARGET_ALTIVEC)		    ? RS6000_BTM_ALTIVEC   : 0)
+	  | ((TARGET_VSX)		    ? RS6000_BTM_VSX	   : 0)
+	  | ((TARGET_SPE)		    ? RS6000_BTM_SPE	   : 0)
+	  | ((TARGET_PAIRED_FLOAT)	    ? RS6000_BTM_PAIRED	   : 0)
+	  | ((TARGET_FRE)		    ? RS6000_BTM_FRE	   : 0)
+	  | ((TARGET_FRES)		    ? RS6000_BTM_FRES	   : 0)
+	  | ((TARGET_FRSQRTE)		    ? RS6000_BTM_FRSQRTE   : 0)
+	  | ((TARGET_FRSQRTES)		    ? RS6000_BTM_FRSQRTES  : 0)
+	  | ((TARGET_POPCNTD)		    ? RS6000_BTM_POPCNTD   : 0)
+	  | ((rs6000_cpu == PROCESSOR_CELL) ? RS6000_BTM_CELL      : 0)
+	  | ((TARGET_P8_VECTOR)		    ? RS6000_BTM_P8_VECTOR : 0)
+	  | ((TARGET_CRYPTO)		    ? RS6000_BTM_CRYPTO	   : 0));
 }
 
 /* Override command line options.  Mostly we process the processor type and
@@ -2803,7 +2863,9 @@ rs6000_option_override_internal (bool gl
 
   /* For the newer switches (vsx, dfp, etc.) set some of the older options,
      unless the user explicitly used the -mno-<option> to disable the code.  */
-  if (TARGET_VSX)
+  if (TARGET_P8_VECTOR || TARGET_DIRECT_MOVE || TARGET_CRYPTO)
+    rs6000_isa_flags |= (ISA_2_7_MASKS_SERVER & ~rs6000_isa_flags_explicit);
+  else if (TARGET_VSX)
     rs6000_isa_flags |= (ISA_2_6_MASKS_SERVER & ~rs6000_isa_flags_explicit);
   else if (TARGET_POPCNTD)
     rs6000_isa_flags |= (ISA_2_6_MASKS_EMBEDDED & ~rs6000_isa_flags_explicit);
@@ -2818,6 +2880,34 @@ rs6000_option_override_internal (bool gl
   else if (TARGET_ALTIVEC)
     rs6000_isa_flags |= (OPTION_MASK_PPC_GFXOPT & ~rs6000_isa_flags_explicit);
 
+  if (TARGET_CRYPTO && !TARGET_ALTIVEC)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_CRYPTO)
+	error ("-mcrypto requires -maltivec");
+      rs6000_isa_flags &= ~OPTION_MASK_CRYPTO;
+    }
+
+  if (TARGET_DIRECT_MOVE && !TARGET_VSX)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_DIRECT_MOVE)
+	error ("-mdirect-move requires -mvsx");
+      rs6000_isa_flags &= ~OPTION_MASK_DIRECT_MOVE;
+    }
+
+  if (TARGET_P8_VECTOR && !TARGET_ALTIVEC)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR)
+	error ("-mpower8-vector requires -maltivec");
+      rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR;
+    }
+
+  if (TARGET_P8_VECTOR && !TARGET_VSX)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR)
+	error ("-mpower8-vector requires -mvsx");
+      rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR;
+    }
+
   if (TARGET_VSX_TIMODE && !TARGET_VSX)
     {
       if (rs6000_isa_flags_explicit & OPTION_MASK_VSX_TIMODE)
@@ -3019,16 +3109,19 @@ rs6000_option_override_internal (bool gl
 			&& rs6000_cpu != PROCESSOR_POWER5
 			&& rs6000_cpu != PROCESSOR_POWER6
 			&& rs6000_cpu != PROCESSOR_POWER7
+			&& rs6000_cpu != PROCESSOR_POWER8
 			&& rs6000_cpu != PROCESSOR_PPCA2
 			&& rs6000_cpu != PROCESSOR_CELL
 			&& rs6000_cpu != PROCESSOR_PPC476);
   rs6000_sched_groups = (rs6000_cpu == PROCESSOR_POWER4
 			 || rs6000_cpu == PROCESSOR_POWER5
-			 || rs6000_cpu == PROCESSOR_POWER7);
+			 || rs6000_cpu == PROCESSOR_POWER7
+			 || rs6000_cpu == PROCESSOR_POWER8);
   rs6000_align_branch_targets = (rs6000_cpu == PROCESSOR_POWER4
 				 || rs6000_cpu == PROCESSOR_POWER5
 				 || rs6000_cpu == PROCESSOR_POWER6
 				 || rs6000_cpu == PROCESSOR_POWER7
+				 || rs6000_cpu == PROCESSOR_POWER8
 				 || rs6000_cpu == PROCESSOR_PPCE500MC
 				 || rs6000_cpu == PROCESSOR_PPCE500MC64
 				 || rs6000_cpu == PROCESSOR_PPCE5500
@@ -3272,6 +3365,10 @@ rs6000_option_override_internal (bool gl
 	rs6000_cost = &power7_cost;
 	break;
 
+      case PROCESSOR_POWER8:
+	rs6000_cost = &power8_cost;
+	break;
+
       case PROCESSOR_PPCA2:
 	rs6000_cost = &ppca2_cost;
 	break;
@@ -3444,7 +3541,8 @@ rs6000_loop_align (rtx label)
       && (rs6000_cpu == PROCESSOR_POWER4
 	  || rs6000_cpu == PROCESSOR_POWER5
 	  || rs6000_cpu == PROCESSOR_POWER6
-	  || rs6000_cpu == PROCESSOR_POWER7))
+	  || rs6000_cpu == PROCESSOR_POWER7
+	  || rs6000_cpu == PROCESSOR_POWER8))
     return 5;
   else
     return align_loops_log;
@@ -12891,8 +12989,23 @@ rs6000_common_init_builtins (void)
       else
 	{
 	  enum insn_code icode = d->icode;
-          if (d->name == 0 || icode == CODE_FOR_nothing)
-	    continue;
+	  if (d->name == 0)
+	    {
+	      if (TARGET_DEBUG_BUILTIN)
+		fprintf (stderr, "rs6000_builtin, bdesc_3arg[%ld] no name\n",
+			 (long unsigned)i);
+
+	      continue;
+	    }
+
+          if (icode == CODE_FOR_nothing)
+	    {
+	      if (TARGET_DEBUG_BUILTIN)
+		fprintf (stderr, "rs6000_builtin, skip ternary %s (no code)\n",
+			 d->name);
+
+	      continue;
+	    }
 
 	  type = builtin_function_type (insn_data[icode].operand[0].mode,
 					insn_data[icode].operand[1].mode,
@@ -12931,8 +13044,23 @@ rs6000_common_init_builtins (void)
       else
 	{
 	  enum insn_code icode = d->icode;
-          if (d->name == 0 || icode == CODE_FOR_nothing)
-	    continue;
+	  if (d->name == 0)
+	    {
+	      if (TARGET_DEBUG_BUILTIN)
+		fprintf (stderr, "rs6000_builtin, bdesc_2arg[%ld] no name\n",
+			 (long unsigned)i);
+
+	      continue;
+	    }
+
+          if (icode == CODE_FOR_nothing)
+	    {
+	      if (TARGET_DEBUG_BUILTIN)
+		fprintf (stderr, "rs6000_builtin, skip binary %s (no code)\n",
+			 d->name);
+
+	      continue;
+	    }
 
           mode0 = insn_data[icode].operand[0].mode;
           mode1 = insn_data[icode].operand[1].mode;
@@ -12993,8 +13121,23 @@ rs6000_common_init_builtins (void)
       else
         {
 	  enum insn_code icode = d->icode;
-          if (d->name == 0 || icode == CODE_FOR_nothing)
-	    continue;
+	  if (d->name == 0)
+	    {
+	      if (TARGET_DEBUG_BUILTIN)
+		fprintf (stderr, "rs6000_builtin, bdesc_1arg[%ld] no name\n",
+			 (long unsigned)i);
+
+	      continue;
+	    }
+
+          if (icode == CODE_FOR_nothing)
+	    {
+	      if (TARGET_DEBUG_BUILTIN)
+		fprintf (stderr, "rs6000_builtin, skip unary %s (no code)\n",
+			 d->name);
+
+	      continue;
+	    }
 
           mode0 = insn_data[icode].operand[0].mode;
           mode1 = insn_data[icode].operand[1].mode;
@@ -22951,6 +23094,7 @@ rs6000_adjust_cost (rtx insn, rtx link, 
                  || rs6000_cpu_attr == CPU_POWER4
                  || rs6000_cpu_attr == CPU_POWER5
 		 || rs6000_cpu_attr == CPU_POWER7
+		 || rs6000_cpu_attr == CPU_POWER8
                  || rs6000_cpu_attr == CPU_CELL)
                 && recog_memoized (dep_insn)
                 && (INSN_CODE (dep_insn) >= 0))
@@ -23537,6 +23681,8 @@ rs6000_issue_rate (void)
   case CPU_POWER6:
   case CPU_POWER7:
     return 5;
+  case CPU_POWER8:
+    return 7;
   default:
     return 1;
   }
@@ -24130,6 +24276,7 @@ insn_must_be_first_in_group (rtx insn)
         }
       break;
     case PROCESSOR_POWER7:
+    case PROCESSOR_POWER8:	/* FIXME */
       type = get_attr_type (insn);
 
       switch (type)
@@ -24226,6 +24373,7 @@ insn_must_be_last_in_group (rtx insn)
     }
     break;
   case PROCESSOR_POWER7:
+  case PROCESSOR_POWER8:	/* FIXME */
     type = get_attr_type (insn);
 
     switch (type)
@@ -24332,7 +24480,8 @@ force_new_group (int sched_verbose, FILE
 	can_issue_more--;
 
       /* Power6 and Power7 have special group ending nop. */
-      if (rs6000_cpu_attr == CPU_POWER6 || rs6000_cpu_attr == CPU_POWER7)
+      if (rs6000_cpu_attr == CPU_POWER6 || rs6000_cpu_attr == CPU_POWER7
+	  || rs6000_cpu_attr == CPU_POWER8)
 	{
 	  nop = gen_group_ending_nop ();
 	  emit_insn_before (nop, next_insn);
@@ -26513,7 +26662,8 @@ rs6000_register_move_cost (enum machine_
       /* For those processors that have slow LR/CTR moves, make them more
          expensive than memory in order to bias spills to memory .*/
       else if ((rs6000_cpu == PROCESSOR_POWER6
-		|| rs6000_cpu == PROCESSOR_POWER7)
+		|| rs6000_cpu == PROCESSOR_POWER7
+		|| rs6000_cpu == PROCESSOR_POWER8)
 	       && reg_classes_intersect_p (rclass, LINK_OR_CTR_REGS))
         ret = 6 * hard_regno_nregs[0][mode];
 
@@ -27742,6 +27892,8 @@ static struct rs6000_opt_mask const rs60
 {
   { "altivec",			OPTION_MASK_ALTIVEC,		false, true  },
   { "cmpb",			OPTION_MASK_CMPB,		false, true  },
+  { "crypto",			OPTION_MASK_CRYPTO,		false, true  },
+  { "direct-move",		OPTION_MASK_DIRECT_MOVE,	false, true  },
   { "dlmzb",			OPTION_MASK_DLMZB,		false, true  },
   { "fprnd",			OPTION_MASK_FPRND,		false, true  },
   { "hard-dfp",			OPTION_MASK_DFP,		false, true  },
@@ -27750,13 +27902,17 @@ static struct rs6000_opt_mask const rs60
   { "mfpgpr",			OPTION_MASK_MFPGPR,		false, true  },
   { "mulhw",			OPTION_MASK_MULHW,		false, true  },
   { "multiple",			OPTION_MASK_MULTIPLE,		false, true  },
-  { "update",			OPTION_MASK_NO_UPDATE,		true , true  },
   { "popcntb",			OPTION_MASK_POPCNTB,		false, true  },
   { "popcntd",			OPTION_MASK_POPCNTD,		false, true  },
+  { "power8-fusion",		OPTION_MASK_P8_FUSION,		false, true  },
+  { "power8-fusion-sign",	OPTION_MASK_P8_FUSION_SIGN,	false, true  },
+  { "power8-vector",		OPTION_MASK_P8_VECTOR,		false, true  },
   { "powerpc-gfxopt",		OPTION_MASK_PPC_GFXOPT,		false, true  },
   { "powerpc-gpopt",		OPTION_MASK_PPC_GPOPT,		false, true  },
+  { "quad-memory",		OPTION_MASK_QUAD_MEMORY,	false, true  },
   { "recip-precision",		OPTION_MASK_RECIP_PRECISION,	false, true  },
   { "string",			OPTION_MASK_STRING,		false, true  },
+  { "update",			OPTION_MASK_NO_UPDATE,		true , true  },
   { "vsx",			OPTION_MASK_VSX,		false, true  },
   { "vsx-timode",		OPTION_MASK_VSX_TIMODE,		false, true  },
 #ifdef OPTION_MASK_64BIT
@@ -27798,6 +27954,8 @@ static struct rs6000_opt_mask const rs60
   { "frsqrtes",		 RS6000_BTM_FRSQRTES,	false, false },
   { "popcntd",		 RS6000_BTM_POPCNTD,	false, false },
   { "cell",		 RS6000_BTM_CELL,	false, false },
+  { "power8-vector",	 RS6000_BTM_P8_VECTOR,	false, false },
+  { "crypto",		 RS6000_BTM_CRYPTO,	false, false },
 };
 
 /* Option variables that we want to support inside attribute((target)) and
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 199121)
+++ gcc/config/rs6000/rs6000.h	(revision 199122)
@@ -92,7 +92,7 @@
 #ifdef HAVE_AS_POWER8
 #define ASM_CPU_POWER8_SPEC "-mpower8"
 #else
-#define ASM_CPU_POWER8_SPEC "-mpower4 -maltivec"
+#define ASM_CPU_POWER8_SPEC ASM_CPU_POWER7_SPEC
 #endif
 
 #ifdef HAVE_AS_DCI
@@ -164,6 +164,7 @@
 %{mcpu=e6500: -me6500} \
 %{maltivec: -maltivec} \
 %{mvsx: -mvsx %{!maltivec: -maltivec} %{!mcpu*: %(asm_cpu_power7)}} \
+%{mpower8-vector|mcrypto|mdirect-move: %{!mcpu*: %(asm_cpu_power8)}} \
 -many"
 
 #define CPP_DEFAULT_SPEC ""
@@ -277,6 +278,19 @@ extern const char *host_detect_local_cpu
 #define TARGET_POPCNTD 0
 #endif
 
+/* Define the ISA 2.07 flags as 0 if the target assembler does not support
+   the ISA 2.07 instructions.  Allow -mpower8-fusion, since it does not add
+   new instructions.  */
+
+#ifndef HAVE_AS_POWER8
+#undef  TARGET_DIRECT_MOVE
+#undef  TARGET_CRYPTO
+#undef  TARGET_P8_VECTOR
+#define TARGET_DIRECT_MOVE 0
+#define TARGET_CRYPTO 0
+#define TARGET_P8_VECTOR 0
+#endif
+
 /* Define TARGET_LWSYNC_INSTRUCTION if the assembler knows about lwsync.  If
    not, generate the lwsync code as an integer constant.  */
 #ifdef HAVE_AS_LWSYNC
@@ -386,6 +400,7 @@ extern const char *host_detect_local_cpu
 #define TARGET_DEBUG_TARGET	(rs6000_debug & MASK_DEBUG_TARGET)
 #define TARGET_DEBUG_BUILTIN	(rs6000_debug & MASK_DEBUG_BUILTIN)
 
+/* Describe the vector unit used for arithmetic operations.  */
 extern enum rs6000_vector rs6000_vector_unit[];
 
 #define VECTOR_UNIT_NONE_P(MODE)			\
@@ -394,12 +409,25 @@ extern enum rs6000_vector rs6000_vector_
 #define VECTOR_UNIT_VSX_P(MODE)				\
   (rs6000_vector_unit[(MODE)] == VECTOR_VSX)
 
+#define VECTOR_UNIT_P8_VECTOR_P(MODE)			\
+  (rs6000_vector_unit[(MODE)] == VECTOR_P8_VECTOR)
+
 #define VECTOR_UNIT_ALTIVEC_P(MODE)			\
   (rs6000_vector_unit[(MODE)] == VECTOR_ALTIVEC)
 
+#define VECTOR_UNIT_VSX_OR_P8_VECTOR_P(MODE)		\
+  (IN_RANGE ((int)rs6000_vector_unit[(MODE)],		\
+	     (int)VECTOR_VSX,				\
+	     (int)VECTOR_P8_VECTOR))
+
+/* VECTOR_UNIT_ALTIVEC_OR_VSX_P is used in places where we are using either
+   altivec (VMX) or VSX vector instructions.  P8 vector support is upwards
+   compatible, so allow it as well, rather than changing all of the uses of the
+   macro.  */
 #define VECTOR_UNIT_ALTIVEC_OR_VSX_P(MODE)		\
-  (rs6000_vector_unit[(MODE)] == VECTOR_ALTIVEC 	\
-   || rs6000_vector_unit[(MODE)] == VECTOR_VSX)
+  (IN_RANGE ((int)rs6000_vector_unit[(MODE)],		\
+	     (int)VECTOR_ALTIVEC,			\
+	     (int)VECTOR_P8_VECTOR))
 
 /* Describe whether to use VSX loads or Altivec loads.  For now, just use the
    same unit as the vector unit we are using, but we may want to migrate to
@@ -412,12 +440,21 @@ extern enum rs6000_vector rs6000_vector_
 #define VECTOR_MEM_VSX_P(MODE)				\
   (rs6000_vector_mem[(MODE)] == VECTOR_VSX)
 
+#define VECTOR_MEM_P8_VECTOR_P(MODE)			\
+  (rs6000_vector_mem[(MODE)] == VECTOR_P8_VECTOR)
+
 #define VECTOR_MEM_ALTIVEC_P(MODE)			\
   (rs6000_vector_mem[(MODE)] == VECTOR_ALTIVEC)
 
+#define VECTOR_MEM_VSX_OR_P8_VECTOR_P(MODE)		\
+  (IN_RANGE ((int)rs6000_vector_mem[(MODE)],		\
+	     (int)VECTOR_VSX,				\
+	     (int)VECTOR_P8_VECTOR))
+
 #define VECTOR_MEM_ALTIVEC_OR_VSX_P(MODE)		\
-  (rs6000_vector_mem[(MODE)] == VECTOR_ALTIVEC 	\
-   || rs6000_vector_mem[(MODE)] == VECTOR_VSX)
+  (IN_RANGE ((int)rs6000_vector_mem[(MODE)],		\
+	     (int)VECTOR_ALTIVEC,			\
+	     (int)VECTOR_P8_VECTOR))
 
 /* Return the alignment of a given vector type, which is set based on the
    vector unit use.  VSX for instance can load 32 or 64 bit aligned words
@@ -479,6 +516,15 @@ extern int rs6000_vector_align[];
 #define TARGET_FCTIDUZ	TARGET_POPCNTD
 #define TARGET_FCTIWUZ	TARGET_POPCNTD
 
+#define TARGET_XSCVDPSPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
+#define TARGET_XSCVSPDPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
+
+/* Byte/char syncs were phased in for ISA 2.06B, but are not present in
+   power7, so conditionalize them on p8 features.  TImode syncs need quad
+   memory support.  */
+#define TARGET_SYNC_HI_QI	(TARGET_QUAD_MEMORY || TARGET_DIRECT_MOVE)
+#define TARGET_SYNC_TI		TARGET_QUAD_MEMORY
+
 /* Power7 has both 32-bit load and store integer for the FPRs, so we don't need
    to allocate the SDmode stack slot to get the value into the proper location
    in the register.  */
@@ -489,10 +535,13 @@ extern int rs6000_vector_align[];
    OPTION_MASK_<xxx> back into MASK_<xxx>.  */
 #define MASK_ALTIVEC			OPTION_MASK_ALTIVEC
 #define MASK_CMPB			OPTION_MASK_CMPB
+#define MASK_CRYPTO			OPTION_MASK_CRYPTO
 #define MASK_DFP			OPTION_MASK_DFP
+#define MASK_DIRECT_MOVE		OPTION_MASK_DIRECT_MOVE
 #define MASK_DLMZB			OPTION_MASK_DLMZB
 #define MASK_EABI			OPTION_MASK_EABI
 #define MASK_FPRND			OPTION_MASK_FPRND
+#define MASK_P8_FUSION			OPTION_MASK_P8_FUSION
 #define MASK_HARD_FLOAT			OPTION_MASK_HARD_FLOAT
 #define MASK_ISEL			OPTION_MASK_ISEL
 #define MASK_MFCRF			OPTION_MASK_MFCRF
@@ -500,6 +549,7 @@ extern int rs6000_vector_align[];
 #define MASK_MULHW			OPTION_MASK_MULHW
 #define MASK_MULTIPLE			OPTION_MASK_MULTIPLE
 #define MASK_NO_UPDATE			OPTION_MASK_NO_UPDATE
+#define MASK_P8_VECTOR			OPTION_MASK_P8_VECTOR
 #define MASK_POPCNTB			OPTION_MASK_POPCNTB
 #define MASK_POPCNTD			OPTION_MASK_POPCNTD
 #define MASK_PPC_GFXOPT			OPTION_MASK_PPC_GFXOPT
@@ -1002,7 +1052,9 @@ extern unsigned rs6000_pointer_size;
 
 #define REG_ALLOC_ORDER						\
   {32,								\
-   45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34,		\
+   /* Move fr13 (i.e. 45) later, so if we need TFmode, it does */ \
+   /* not use fr14 which is a saved register.  */ \
+   44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 45,		\
    33,								\
    63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51,		\
    50, 49, 48, 47, 46,						\
@@ -1062,8 +1114,14 @@ extern unsigned rs6000_pointer_size;
 #define VINT_REGNO_P(N) ALTIVEC_REGNO_P (N)
 
 /* Alternate name for any vector register supporting logical operations, no
-   matter which instruction set(s) are available.  */
-#define VLOGICAL_REGNO_P(N) VFLOAT_REGNO_P (N)
+   matter which instruction set(s) are available.  Under VSX, we allow GPRs
+   as well as vector registers on 64-bit systems.  We don't allow this on
+   32-bit systems, due to the number of registers involved and the number of
+   instructions needed to load/store the values.  */
+#define VLOGICAL_REGNO_P(N)						\
+  (ALTIVEC_REGNO_P (N)							\
+   || (TARGET_VSX && FP_REGNO_P (N))					\
+   || (TARGET_VSX && TARGET_POWERPC64 && INT_REGNO_P (N)))
 
 /* Return number of consecutive hard regs needed starting at reg REGNO
    to hold something of mode MODE.  */
@@ -1124,7 +1182,7 @@ extern unsigned rs6000_pointer_size;
    when one has mode MODE1 and one has mode MODE2.
    If HARD_REGNO_MODE_OK could produce different values for MODE1 and MODE2,
    for any hard reg, then this must be 0 for correct output.  */
-#define MODES_TIEABLE_P(MODE1, MODE2) \
+#define MODES_TIEABLE_P(MODE1, MODE2)		\
   (SCALAR_FLOAT_MODE_P (MODE1)			\
    ? SCALAR_FLOAT_MODE_P (MODE2)		\
    : SCALAR_FLOAT_MODE_P (MODE2)		\
@@ -1137,14 +1195,14 @@ extern unsigned rs6000_pointer_size;
    ? SPE_VECTOR_MODE (MODE2)			\
    : SPE_VECTOR_MODE (MODE2)			\
    ? SPE_VECTOR_MODE (MODE1)			\
-   : ALTIVEC_VECTOR_MODE (MODE1)		\
-   ? ALTIVEC_VECTOR_MODE (MODE2)		\
-   : ALTIVEC_VECTOR_MODE (MODE2)		\
-   ? ALTIVEC_VECTOR_MODE (MODE1)		\
    : ALTIVEC_OR_VSX_VECTOR_MODE (MODE1)		\
    ? ALTIVEC_OR_VSX_VECTOR_MODE (MODE2)		\
    : ALTIVEC_OR_VSX_VECTOR_MODE (MODE2)		\
    ? ALTIVEC_OR_VSX_VECTOR_MODE (MODE1)		\
+   : ALTIVEC_VECTOR_MODE (MODE1)		\
+   ? ALTIVEC_VECTOR_MODE (MODE2)		\
+   : ALTIVEC_VECTOR_MODE (MODE2)		\
+   ? ALTIVEC_VECTOR_MODE (MODE1)		\
    : 1)
 
 /* Post-reload, we can't use any new AltiVec registers, as we already
@@ -1337,8 +1395,11 @@ enum r6000_reg_class_enum {
   RS6000_CONSTRAINT_wg,		/* FPR register for -mmfpgpr */
   RS6000_CONSTRAINT_wf,		/* VSX register for V4SF */
   RS6000_CONSTRAINT_wl,		/* FPR register for LFIWAX */
+  RS6000_CONSTRAINT_wm,		/* VSX register for direct move */
+  RS6000_CONSTRAINT_wr,		/* GPR register if 64-bit  */
   RS6000_CONSTRAINT_ws,		/* VSX register for DF */
   RS6000_CONSTRAINT_wt,		/* VSX register for TImode */
+  RS6000_CONSTRAINT_wv,		/* Altivec register for power8 vector */
   RS6000_CONSTRAINT_wx,		/* FPR register for STFIWX */
   RS6000_CONSTRAINT_wz,		/* FPR register for LFIWZX */
   RS6000_CONSTRAINT_MAX
@@ -2365,6 +2426,8 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_ALWAYS	0		/* Always enabled.  */
 #define RS6000_BTM_ALTIVEC	MASK_ALTIVEC	/* VMX/altivec vectors.  */
 #define RS6000_BTM_VSX		MASK_VSX	/* VSX (vector/scalar).  */
+#define RS6000_BTM_P8_VECTOR	MASK_P8_VECTOR	/* ISA 2.07 vector.  */
+#define RS6000_BTM_CRYPTO	MASK_CRYPTO	/* crypto funcs.  */
 #define RS6000_BTM_SPE		MASK_STRING	/* E500 */
 #define RS6000_BTM_PAIRED	MASK_MULHW	/* 750CL paired insns.  */
 #define RS6000_BTM_FRE		MASK_POPCNTB	/* FRE instruction.  */
@@ -2376,6 +2439,8 @@ extern int frame_pointer_needed;
 
 #define RS6000_BTM_COMMON	(RS6000_BTM_ALTIVEC			\
 				 | RS6000_BTM_VSX			\
+				 | RS6000_BTM_P8_VECTOR			\
+				 | RS6000_BTM_CRYPTO			\
 				 | RS6000_BTM_FRE			\
 				 | RS6000_BTM_FRES			\
 				 | RS6000_BTM_FRSQRTE			\
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 199121)
+++ gcc/config/rs6000/predicates.md	(revision 199122)
@@ -166,6 +166,11 @@ (define_predicate "const_2_to_3_operand"
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 2, 3)")))
 
+;; Match op = 0..15
+(define_predicate "const_0_to_15_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 15)")))
+
 ;; Return 1 if op is a register that is not special.
 (define_predicate "gpc_reg_operand"
   (match_operand 0 "register_operand")
@@ -182,9 +187,68 @@ (define_predicate "gpc_reg_operand"
   if (REGNO (op) >= ARG_POINTER_REGNUM && !CA_REGNO_P (REGNO (op)))
     return 1;
 
+  if (TARGET_VSX && VSX_REGNO_P (REGNO (op)))
+    return 1;
+
   return INT_REGNO_P (REGNO (op)) || FP_REGNO_P (REGNO (op));
 })
 
+;; Return 1 if op is a general purpose register.  Unlike gpc_reg_operand, don't
+;; allow floating point or vector registers.
+(define_predicate "int_reg_operand"
+  (match_operand 0 "register_operand")
+{
+  if ((TARGET_E500_DOUBLE || TARGET_SPE) && invalid_e500_subreg (op, mode))
+    return 0;
+
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  if (REGNO (op) >= ARG_POINTER_REGNUM && !CA_REGNO_P (REGNO (op)))
+    return 1;
+
+  return INT_REGNO_P (REGNO (op));
+})
+
+;; Like int_reg_operand, but only return true for base registers
+(define_predicate "base_reg_operand"
+  (match_operand 0 "int_reg_operand")
+{
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  return (REGNO (op) != FIRST_GPR_REGNO);
+})
+
+;; Return 1 if op is a general purpose register that is an even register,
+;; suitable for a load/store quad operation
+(define_predicate "quad_int_reg_operand"
+  (match_operand 0 "register_operand")
+{
+  HOST_WIDE_INT r;
+
+  if (!TARGET_QUAD_MEMORY)
+    return 0;
+
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  r = REGNO (op);
+  if (r >= FIRST_PSEUDO_REGISTER)
+    return 1;
+
+  return (INT_REGNO_P (r) && ((r & 1) == 0));
+})
+
 ;; Return 1 if op is a register that is a condition register field.
 (define_predicate "cc_reg_operand"
   (match_operand 0 "register_operand")
@@ -302,6 +366,11 @@ (define_predicate "reg_or_logical_cint_o
 		      & (~ (unsigned HOST_WIDE_INT) 0xffffffff)) == 0)")
     (match_operand 0 "gpc_reg_operand")))
 
+;; Like reg_or_logical_cint_operand, but allow vsx registers
+(define_predicate "vsx_reg_or_cint_operand"
+  (ior (match_operand 0 "vsx_register_operand")
+       (match_operand 0 "reg_or_logical_cint_operand")))
+
 ;; Return 1 if operand is a CONST_DOUBLE that can be set in a register
 ;; with no more than one instruction per word.
 (define_predicate "easy_fp_constant"
@@ -507,6 +576,54 @@ (define_predicate "offsettable_mem_opera
   (and (match_operand 0 "memory_operand")
        (match_test "offsettable_nonstrict_memref_p (op)")))
 
+;; Return 1 if the operand is suitable for load/store quad memory.
+(define_predicate "quad_memory_operand"
+  (match_code "mem")
+{
+  rtx addr, op0, op1;
+  int ret;
+
+  if (!TARGET_QUAD_MEMORY)
+    ret = 0;
+
+  else if (!memory_operand (op, mode))
+    ret = 0;
+
+  else if (GET_MODE_SIZE (GET_MODE (op)) != 16)
+    ret = 0;
+
+  else if (MEM_ALIGN (op) < 128)
+    ret = 0;
+
+  else
+    {
+      addr = XEXP (op, 0);
+      if (int_reg_operand (addr, Pmode))
+	ret = 1;
+
+      else if (GET_CODE (addr) != PLUS)
+	ret = 0;
+
+      else
+	{
+	  op0 = XEXP (addr, 0);
+	  op1 = XEXP (addr, 1);
+	  ret = (int_reg_operand (op0, Pmode)
+		 && GET_CODE (op1) == CONST_INT
+		 && IN_RANGE (INTVAL (op1), -32768, 32767)
+		 && (INTVAL (op1) & 15) == 0);
+	}
+    }
+
+  if (TARGET_DEBUG_ADDR)
+    {
+      fprintf (stderr, "\nquad_memory_operand, ret = %s\n", ret ? "true" : "false");
+      debug_rtx (op);
+    }
+
+  return ret;
+})
+
 ;; Return 1 if the operand is an indexed or indirect memory operand.
 (define_predicate "indexed_or_indirect_operand"
   (match_code "mem")
@@ -521,6 +638,19 @@ (define_predicate "indexed_or_indirect_o
   return indexed_or_indirect_address (op, mode);
 })
 
+;; Like indexed_or_indirect_operand, but also allow a GPR register if direct
+;; moves are supported.
+(define_predicate "reg_or_indexed_operand"
+  (match_code "mem,reg")
+{
+  if (MEM_P (op))
+    return indexed_or_indirect_operand (op, mode);
+  else if (TARGET_DIRECT_MOVE)
+    return register_operand (op, mode);
+  else
+    return 0;
+})
+
 ;; Return 1 if the operand is an indexed or indirect memory operand with an
 ;; AND -16 in it, used to recognize when we need to switch to Altivec loads
 ;; to realign loops instead of VSX (altivec silently ignores the bottom bits,
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 199121)
+++ gcc/config/rs6000/rs6000.md	(revision 199122)
@@ -168,7 +168,7 @@ (define_attr "length" ""
 ;; Processor type -- this attribute must exactly match the processor_type
 ;; enumeration in rs6000.h.
 
-(define_attr "cpu" "rs64a,mpccore,ppc403,ppc405,ppc440,ppc476,ppc601,ppc603,ppc604,ppc604e,ppc620,ppc630,ppc750,ppc7400,ppc7450,ppc8540,ppc8548,ppce300c2,ppce300c3,ppce500mc,ppce500mc64,ppce5500,ppce6500,power4,power5,power6,power7,cell,ppca2,titan"
+(define_attr "cpu" "rs64a,mpccore,ppc403,ppc405,ppc440,ppc476,ppc601,ppc603,ppc604,ppc604e,ppc620,ppc630,ppc750,ppc7400,ppc7450,ppc8540,ppc8548,ppce300c2,ppce300c3,ppce500mc,ppce500mc64,ppce5500,ppce6500,power4,power5,power6,power7,cell,ppca2,titan,power8"
   (const (symbol_ref "rs6000_cpu_attr")))
 
 
Index: gcc/config/rs6000/rs6000-cpus.def
===================================================================
--- gcc/config/rs6000/rs6000-cpus.def	(revision 199121)
+++ gcc/config/rs6000/rs6000-cpus.def	(revision 199122)
@@ -28,7 +28,7 @@
      ALTIVEC, since in general it isn't a win on power6.  In ISA 2.04, fsel,
      fre, fsqrt, etc. were no longer documented as optional.  Group masks by
      server and embedded. */
-#define ISA_2_5_MASKS_EMBEDDED	(ISA_2_2_MASKS				\
+#define ISA_2_5_MASKS_EMBEDDED	(ISA_2_4_MASKS				\
 				 | OPTION_MASK_CMPB			\
 				 | OPTION_MASK_RECIP_PRECISION		\
 				 | OPTION_MASK_PPC_GFXOPT		\
@@ -45,6 +45,14 @@
 				 | OPTION_MASK_VSX			\
 				 | OPTION_MASK_VSX_TIMODE)
 
+/* For now, don't provide an embedded version of ISA 2.07.  */
+#define ISA_2_7_MASKS_SERVER	(ISA_2_6_MASKS_SERVER			\
+				 | OPTION_MASK_P8_FUSION		\
+				 | OPTION_MASK_P8_VECTOR		\
+				 | OPTION_MASK_CRYPTO			\
+				 | OPTION_MASK_DIRECT_MOVE		\
+				 | OPTION_MASK_QUAD_MEMORY)
+
 #define POWERPC_7400_MASK	(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_ALTIVEC)
 
 /* Deal with ports that do not have -mstrict-align.  */
@@ -61,7 +69,9 @@
 /* Mask of all options to set the default isa flags based on -mcpu=<xxx>.  */
 #define POWERPC_MASKS		(OPTION_MASK_ALTIVEC			\
 				 | OPTION_MASK_CMPB			\
+				 | OPTION_MASK_CRYPTO			\
 				 | OPTION_MASK_DFP			\
+				 | OPTION_MASK_DIRECT_MOVE		\
 				 | OPTION_MASK_DLMZB			\
 				 | OPTION_MASK_FPRND			\
 				 | OPTION_MASK_ISEL			\
@@ -69,11 +79,14 @@
 				 | OPTION_MASK_MFPGPR			\
 				 | OPTION_MASK_MULHW			\
 				 | OPTION_MASK_NO_UPDATE		\
+				 | OPTION_MASK_P8_FUSION		\
+				 | OPTION_MASK_P8_VECTOR		\
 				 | OPTION_MASK_POPCNTB			\
 				 | OPTION_MASK_POPCNTD			\
 				 | OPTION_MASK_POWERPC64		\
 				 | OPTION_MASK_PPC_GFXOPT		\
 				 | OPTION_MASK_PPC_GPOPT		\
+				 | OPTION_MASK_QUAD_MEMORY		\
 				 | OPTION_MASK_RECIP_PRECISION		\
 				 | OPTION_MASK_SOFT_FLOAT		\
 				 | OPTION_MASK_STRICT_ALIGN_OPTIONAL	\
@@ -168,10 +181,7 @@ RS6000_CPU ("power7", PROCESSOR_POWER7, 
 	    POWERPC_7400_MASK | MASK_POWERPC64 | MASK_PPC_GPOPT | MASK_MFCRF
 	    | MASK_POPCNTB | MASK_FPRND | MASK_CMPB | MASK_DFP | MASK_POPCNTD
 	    | MASK_VSX | MASK_RECIP_PRECISION | MASK_VSX_TIMODE)
-RS6000_CPU ("power8", PROCESSOR_POWER7,   /* Don't add MASK_ISEL by default */
-	    POWERPC_7400_MASK | MASK_POWERPC64 | MASK_PPC_GPOPT | MASK_MFCRF
-	    | MASK_POPCNTB | MASK_FPRND | MASK_CMPB | MASK_DFP | MASK_POPCNTD
-	    | MASK_VSX | MASK_RECIP_PRECISION | MASK_VSX_TIMODE)
+RS6000_CPU ("power8", PROCESSOR_POWER7, MASK_POWERPC64 | ISA_2_7_MASKS_SERVER)
 RS6000_CPU ("powerpc", PROCESSOR_POWERPC, 0)
 RS6000_CPU ("powerpc64", PROCESSOR_POWERPC64, MASK_PPC_GFXOPT | MASK_POWERPC64)
 RS6000_CPU ("rs64", PROCESSOR_RS64A, MASK_PPC_GFXOPT | MASK_POWERPC64)
Index: gcc/config/rs6000/rs6000-opts.h
===================================================================
--- gcc/config/rs6000/rs6000-opts.h	(revision 199121)
+++ gcc/config/rs6000/rs6000-opts.h	(revision 199122)
@@ -59,7 +59,8 @@ enum processor_type
    PROCESSOR_POWER7,
    PROCESSOR_CELL,
    PROCESSOR_PPCA2,
-   PROCESSOR_TITAN
+   PROCESSOR_TITAN,
+   PROCESSOR_POWER8
 };
 
 /* FP processor type.  */
@@ -131,11 +132,14 @@ enum rs6000_cmodel {
   CMODEL_LARGE
 };
 
-/* Describe which vector unit to use for a given machine mode.  */
+/* Describe which vector unit to use for a given machine mode.  The
+   VECTOR_MEM_* and VECTOR_UNIT_* macros assume that Altivec, VSX, and
+   P8_VECTOR are contiguous.  */
 enum rs6000_vector {
   VECTOR_NONE,			/* Type is not  a vector or not supported */
   VECTOR_ALTIVEC,		/* Use altivec for vector processing */
   VECTOR_VSX,			/* Use VSX for vector processing */
+  VECTOR_P8_VECTOR,		/* Use ISA 2.07 VSX for vector processing */
   VECTOR_PAIRED,		/* Use paired floating point for vectors */
   VECTOR_SPE,			/* Use SPE for vector processing */
   VECTOR_OTHER			/* Some other vector unit */
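
One detail worth spelling out from the rs6000.h and rs6000-opts.h hunks
above: the VECTOR_UNIT_*_P and VECTOR_MEM_*_P range macros rely on
VECTOR_ALTIVEC, VECTOR_VSX, and VECTOR_P8_VECTOR staying adjacent in the
enum, so a single range check replaces a chain of equality tests.  A
standalone sketch of the idiom (the enum and the IN_RANGE definition below
are simplified stand-ins, not GCC's own; GCC's IN_RANGE lives in system.h
and differs in the details):

#include <assert.h>

/* Simplified stand-in for GCC's IN_RANGE macro.  */
#define IN_RANGE(VALUE, LOWER, UPPER) \
  ((unsigned) ((VALUE) - (LOWER)) <= (unsigned) ((UPPER) - (LOWER)))

/* Invented enum mirroring the contiguity requirement on rs6000_vector.  */
enum vec_unit { UNIT_NONE, UNIT_ALTIVEC, UNIT_VSX, UNIT_P8_VECTOR, UNIT_OTHER };

int
main (void)
{
  /* True for ALTIVEC, VSX, and P8_VECTOR; false for everything else.  */
  assert (IN_RANGE (UNIT_VSX, UNIT_ALTIVEC, UNIT_P8_VECTOR));
  assert (!IN_RANGE (UNIT_NONE, UNIT_ALTIVEC, UNIT_P8_VECTOR));
  assert (!IN_RANGE (UNIT_OTHER, UNIT_ALTIVEC, UNIT_P8_VECTOR));
  return 0;
}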

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #2, add crypto builtins
  2013-05-20 20:41 [PATCH, rs6000] power8 patches Michael Meissner
  2013-05-20 20:49 ` [PATCH, rs6000] power8 patch #1, infrastructure changes Michael Meissner
@ 2013-05-20 23:13 ` Michael Meissner
  2013-05-22  3:30   ` David Edelsohn
  2013-05-21  2:11 ` [PATCH, rs6000] power8 patches Peter Bergner
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-05-20 23:13 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, pthaugen, bergner

[-- Attachment #1: Type: text/plain, Size: 2088 bytes --]

This patch adds the built-in functions for the new ISA 2.07 crypto
instructions.  It bootstraps and causes no regressions; is it OK to install
after patch #1 has been applied?
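
As a quick sketch of what user code looks like with the new builtins (based
on the extend.texi additions below; compile with -mcpu=power8, or -maltivec
-mcrypto; the wrapper function names here are invented for illustration):

/* One middle round and the final round of AES encryption.  vcipher and
   vcipherlast are taken from the builtin list documented below.  */
typedef vector unsigned long long state_t;

state_t
aes_round (state_t state, state_t round_key)
{
  return __builtin_crypto_vcipher (state, round_key);
}

state_t
aes_final_round (state_t state, state_t round_key)
{
  return __builtin_crypto_vcipherlast (state, round_key);
}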

[gcc]
2013-05-20  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions): Add
	documentation for the power8 crypto builtins.

	* config/rs6000/t-rs6000 (MD_INCLUDES): Add crypto.md.

	* config/rs6000/rs6000-builtin.def (BU_P8V_AV_1): Add support
	macros for defining power8 builtin functions.
	(BU_P8V_AV_2): Likewise.
	(BU_P8V_AV_P): Likewise.
	(BU_P8V_VSX_1): Likewise.
	(BU_P8V_OVERLOAD_1): Likewise.
	(BU_P8V_OVERLOAD_2): Likewise.
	(BU_CRYPTO_1): Likewise.
	(BU_CRYPTO_2): Likewise.
	(BU_CRYPTO_3): Likewise.
	(BU_CRYPTO_OVERLOAD_1): Likewise.
	(BU_CRYPTO_OVERLOAD_2): Likewise.
	(XSCVSPDP): Fix typo, point to the correct instruction.
	(VCIPHER): Add power8 crypto builtins.
	(VCIPHERLAST): Likewise.
	(VNCIPHER): Likewise.
	(VNCIPHERLAST): Likewise.
	(VPMSUMB): Likewise.
	(VPMSUMH): Likewise.
	(VPMSUMW): Likewise.
	(VPERMXOR_V2DI): Likewise.
	(VPERMXOR_V4SI): Likewise.
	(VPERMXOR_V8HI): Likewise.
	(VPERMXOR_V16QI): Likewise.
	(VSHASIGMAW): Likewise.
	(VSHASIGMAD): Likewise.
	(VPMSUM): Likewise.
	(VPERMXOR): Likewise.
	(VSHASIGMA): Likewise.

	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
	__CRYPTO__ if the crypto instructions are available.
	(altivec_overloaded_builtins): Add support for overloaded power8
	builtins.

	* config/rs6000/rs6000.c (rs6000_expand_ternop_builtin): Add
	support for power8 crypto builtins.
	(builtin_function_type): Likewise.
	(altivec_init_builtins): Add support for builtins that take vector
	long long (V2DI) arguments.

	* config/rs6000/crypto.md: New file, define power8 crypto
	instructions.

[gcc/testsuite]
2013-05-20  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/crypto-builtin-1.c: New file, test for power8
	crypto builtins.
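
Since rs6000-c.c now defines __CRYPTO__ whenever the crypto instructions are
enabled, user code can guard its use of the builtins.  A minimal sketch (the
wrapper name is invented; the selector constants reuse values from the new
crypto-builtin-1.c test):

#ifdef __CRYPTO__
vector unsigned int
sha_sigma_w (vector unsigned int x)
{
  /* Both selectors must be integer constants, 0..1 and 0..15
     respectively, per the documentation added in this patch.  */
  return __builtin_crypto_vshasigmaw (x, 1, 15);
}
#endif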

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.official-02b --]
[-- Type: text/plain, Size: 27172 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 199037)
+++ gcc/doc/extend.texi	(working copy)
@@ -13937,6 +13937,66 @@ if the VSX instruction set is available.
 @samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
 @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
 
+If the cryptographic instructions are enabled (@option{-mcrypto} or
+@option{-mcpu=power8}), the following built-in functions are available.
+
+@smallexample
+vector unsigned long long __builtin_crypto_vsbox (vector unsigned long long);
+
+vector unsigned long long __builtin_crypto_vcipher (vector unsigned long long,
+                                                    vector unsigned long long);
+
+vector unsigned long long __builtin_crypto_vcipherlast
+                                     (vector unsigned long long,
+                                      vector unsigned long long);
+
+vector unsigned long long __builtin_crypto_vncipher (vector unsigned long long,
+                                                     vector unsigned long long);
+
+vector unsigned long long __builtin_crypto_vncipherlast
+                                     (vector unsigned long long,
+                                      vector unsigned long long);
+
+vector unsigned char __builtin_crypto_vpermxor (vector unsigned char,
+                                                vector unsigned char,
+                                                vector unsigned char);
+
+vector unsigned short __builtin_crypto_vpermxor (vector unsigned short,
+                                                 vector unsigned short,
+                                                 vector unsigned short);
+
+vector unsigned int __builtin_crypto_vpermxor (vector unsigned int,
+                                               vector unsigned int,
+                                               vector unsigned int);
+
+vector unsigned long long __builtin_crypto_vpermxor (vector unsigned long long,
+                                                     vector unsigned long long,
+                                                     vector unsigned long long);
+
+vector unsigned char __builtin_crypto_vpmsumb (vector unsigned char,
+                                               vector unsigned char);
+
+vector unsigned short __builtin_crypto_vpmsumh (vector unsigned short,
+                                                vector unsigned short);
+
+vector unsigned int __builtin_crypto_vpmsumw (vector unsigned int,
+                                              vector unsigned int);
+
+vector unsigned long long __builtin_crypto_vpmsumd (vector unsigned long long,
+                                                    vector unsigned long long);
+
+vector unsigned long long __builtin_crypto_vshasigmad
+                               (vector unsigned long long, int, int);
+
+vector unsigned int __builtin_crypto_vshasigmaw (vector unsigned int,
+                                                 int, int);
+@end smallexample
+
+The second argument to the @code{__builtin_crypto_vshasigmad} and
+@code{__builtin_crypto_vshasigmaw} built-in functions must be a constant
+integer that is 0 or 1.  The third argument to these built-in functions
+must be a constant integer in the range of 0 to 15.
+
 @node RX Built-in Functions
 @subsection RX Built-in Functions
 GCC supports some of the RX instructions which cannot be expressed in
Index: gcc/testsuite/gcc.target/powerpc/crypto-builtin-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/crypto-builtin-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/crypto-builtin-1.c	(revision 0)
@@ -0,0 +1,130 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-mcpu=power8 -O2 -ftree-vectorize -fvect-cost-model -fno-unroll-loops -fno-unroll-all-loops" } */
+
+typedef vector unsigned long long	crypto_t;
+typedef vector unsigned long long	v2di_t;
+typedef vector unsigned int		v4si_t;
+typedef vector unsigned short		v8hi_t;
+typedef vector unsigned char		v16qi_t;
+
+crypto_t crypto1 (crypto_t a)
+{
+  return __builtin_crypto_vsbox (a);
+}
+
+crypto_t crypto2 (crypto_t a, crypto_t b)
+{
+  return __builtin_crypto_vcipher (a, b);
+}
+
+crypto_t crypto3 (crypto_t a, crypto_t b)
+{
+  return __builtin_crypto_vcipherlast (a, b);
+}
+
+crypto_t crypto4 (crypto_t a, crypto_t b)
+{
+  return __builtin_crypto_vncipher (a, b);
+}
+
+crypto_t crypto5 (crypto_t a, crypto_t b)
+{
+  return __builtin_crypto_vncipherlast (a, b);
+}
+
+v16qi_t crypto6a (v16qi_t a, v16qi_t b, v16qi_t c)
+{
+  return __builtin_crypto_vpermxor (a, b, c);
+}
+
+v8hi_t crypto6b (v8hi_t a, v8hi_t b, v8hi_t c)
+{
+  return __builtin_crypto_vpermxor (a, b, c);
+}
+
+v4si_t crypto6c (v4si_t a, v4si_t b, v4si_t c)
+{
+  return __builtin_crypto_vpermxor (a, b, c);
+}
+
+v2di_t crypto6d (v2di_t a, v2di_t b, v2di_t c)
+{
+  return __builtin_crypto_vpermxor (a, b, c);
+}
+
+v16qi_t crypto7a (v16qi_t a, v16qi_t b)
+{
+  return __builtin_crypto_vpmsumb (a, b);
+}
+
+v16qi_t crypto7b (v16qi_t a, v16qi_t b)
+{
+  return __builtin_crypto_vpmsum (a, b);
+}
+
+v8hi_t crypto7c (v8hi_t a, v8hi_t b)
+{
+  return __builtin_crypto_vpmsumh (a, b);
+}
+
+v8hi_t crypto7d (v8hi_t a, v8hi_t b)
+{
+  return __builtin_crypto_vpmsum (a, b);
+}
+
+v4si_t crypto7e (v4si_t a, v4si_t b)
+{
+  return __builtin_crypto_vpmsumw (a, b);
+}
+
+v4si_t crypto7f (v4si_t a, v4si_t b)
+{
+  return __builtin_crypto_vpmsum (a, b);
+}
+
+v2di_t crypto7g (v2di_t a, v2di_t b)
+{
+  return __builtin_crypto_vpmsumd (a, b);
+}
+
+v2di_t crypto7h (v2di_t a, v2di_t b)
+{
+  return __builtin_crypto_vpmsum (a, b);
+}
+
+v2di_t crypto8a (v2di_t a)
+{
+  return __builtin_crypto_vshasigmad (a, 0, 8);
+}
+
+v2di_t crypto8b (v2di_t a)
+{
+  return __builtin_crypto_vshasigma (a, 0, 8);
+}
+
+v4si_t crypto8c (v4si_t a)
+{
+  return __builtin_crypto_vshasigmaw (a, 1, 15);
+}
+
+v4si_t crypto8d (v4si_t a)
+{
+  return __builtin_crypto_vshasigma (a, 1, 15);
+}
+
+/* Note a space is used after each instruction name so that vcipherlast does
+   not match vcipher.  */
+/* { dg-final { scan-assembler-times "vcipher "      1 } } */
+/* { dg-final { scan-assembler-times "vcipherlast "  1 } } */
+/* { dg-final { scan-assembler-times "vncipher "     1 } } */
+/* { dg-final { scan-assembler-times "vncipherlast " 1 } } */
+/* { dg-final { scan-assembler-times "vpermxor "     4 } } */
+/* { dg-final { scan-assembler-times "vpmsumb "      2 } } */
+/* { dg-final { scan-assembler-times "vpmsumd "      2 } } */
+/* { dg-final { scan-assembler-times "vpmsumh "      2 } } */
+/* { dg-final { scan-assembler-times "vpmsumw "      2 } } */
+/* { dg-final { scan-assembler-times "vsbox "        1 } } */
+/* { dg-final { scan-assembler-times "vshasigmad "   2 } } */
+/* { dg-final { scan-assembler-times "vshasigmaw "   2 } } */
Index: gcc/config/rs6000/t-rs6000
===================================================================
--- gcc/config/rs6000/t-rs6000	(revision 199037)
+++ gcc/config/rs6000/t-rs6000	(working copy)
@@ -70,6 +70,7 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs
 	$(srcdir)/config/rs6000/vector.md \
 	$(srcdir)/config/rs6000/vsx.md \
 	$(srcdir)/config/rs6000/altivec.md \
+	$(srcdir)/config/rs6000/crypto.md \
 	$(srcdir)/config/rs6000/spe.md \
 	$(srcdir)/config/rs6000/dfp.md \
 	$(srcdir)/config/rs6000/paired.md
Index: gcc/config/rs6000/rs6000-builtin.def
===================================================================
--- gcc/config/rs6000/rs6000-builtin.def	(revision 199037)
+++ gcc/config/rs6000/rs6000-builtin.def	(working copy)
@@ -30,7 +30,7 @@
    RS6000_BUILTIN_A -- ABS builtins
    RS6000_BUILTIN_D -- DST builtins
    RS6000_BUILTIN_E -- SPE EVSEL builtins.
-   RS6000_BUILTIN_P -- Altivec and VSX predicate builtins
+   RS6000_BUILTIN_P -- Altivec, VSX, and Power8 vector predicate builtins
    RS6000_BUILTIN_Q -- Paired floating point VSX predicate builtins
    RS6000_BUILTIN_S -- SPE predicate builtins
    RS6000_BUILTIN_X -- special builtins
@@ -301,6 +301,108 @@
 		     | RS6000_BTC_SPECIAL),				\
 		    CODE_FOR_nothing)			/* ICODE */
 
+/* Power8 vector convenience macros.  */
+/* For the instructions that are encoded as altivec instructions use
+   __builtin_altivec_ as the builtin name.  */
+#define BU_P8V_AV_1(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_1 (P8V_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_altivec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P8_VECTOR,		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_UNARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_P8V_AV_2(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_2 (P8V_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_altivec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P8_VECTOR,		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_BINARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_P8V_AV_P(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_P (P8V_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_altivec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P8_VECTOR,		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_PREDICATE),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+/* For the instructions encoded as VSX instructions use __builtin_vsx_ as the
+   builtin name.  */
+#define BU_P8V_VSX_1(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_1 (P8V_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_vsx_" NAME,		/* NAME */	\
+		    RS6000_BTM_P8_VECTOR,		/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_UNARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_P8V_OVERLOAD_1(ENUM, NAME)					\
+  RS6000_BUILTIN_1 (P8V_BUILTIN_VEC_ ## ENUM,		/* ENUM */	\
+		    "__builtin_vec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P8_VECTOR,		/* MASK */	\
+		    (RS6000_BTC_OVERLOADED		/* ATTR */	\
+		     | RS6000_BTC_UNARY),				\
+		    CODE_FOR_nothing)			/* ICODE */
+
+#define BU_P8V_OVERLOAD_2(ENUM, NAME)					\
+  RS6000_BUILTIN_2 (P8V_BUILTIN_VEC_ ## ENUM,		/* ENUM */	\
+		    "__builtin_vec_" NAME,		/* NAME */	\
+		    RS6000_BTM_P8_VECTOR,		/* MASK */	\
+		    (RS6000_BTC_OVERLOADED		/* ATTR */	\
+		     | RS6000_BTC_BINARY),				\
+		    CODE_FOR_nothing)			/* ICODE */
+
+/* Crypto convenience macros.  */
+#define BU_CRYPTO_1(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_1 (CRYPTO_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_crypto_" NAME,		/* NAME */	\
+		    RS6000_BTM_CRYPTO,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_UNARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_CRYPTO_2(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_2 (CRYPTO_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_crypto_" NAME,		/* NAME */	\
+		    RS6000_BTM_CRYPTO,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_BINARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_CRYPTO_3(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_3 (CRYPTO_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_crypto_" NAME,		/* NAME */	\
+		    RS6000_BTM_CRYPTO,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_TERNARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_CRYPTO_OVERLOAD_1(ENUM, NAME)				\
+  RS6000_BUILTIN_1 (CRYPTO_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_crypto_" NAME,		/* NAME */	\
+		    RS6000_BTM_CRYPTO,			/* MASK */	\
+		    (RS6000_BTC_OVERLOADED		/* ATTR */	\
+		     | RS6000_BTC_UNARY),				\
+		    CODE_FOR_nothing)			/* ICODE */
+
+#define BU_CRYPTO_OVERLOAD_2(ENUM, NAME)				\
+  RS6000_BUILTIN_2 (CRYPTO_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_crypto_" NAME,		/* NAME */	\
+		    RS6000_BTM_CRYPTO,			/* MASK */	\
+		    (RS6000_BTC_OVERLOADED		/* ATTR */	\
+		     | RS6000_BTC_BINARY),				\
+		    CODE_FOR_nothing)			/* ICODE */
+
+#define BU_CRYPTO_OVERLOAD_3(ENUM, NAME)				\
+  RS6000_BUILTIN_3 (CRYPTO_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_crypto_" NAME,		/* NAME */	\
+		    RS6000_BTM_CRYPTO,			/* MASK */	\
+		    (RS6000_BTC_OVERLOADED		/* ATTR */	\
+		     | RS6000_BTC_TERNARY),				\
+		    CODE_FOR_nothing)			/* ICODE */
+
 /* SPE convenience macros.  */
 #define BU_SPE_1(ENUM, NAME, ATTR, ICODE)				\
   RS6000_BUILTIN_1 (SPE_BUILTIN_ ## ENUM,		/* ENUM */	\
@@ -1012,7 +1114,7 @@ BU_VSX_1 (XVTSQRTSP_FG,	      "xvtsqrtsp
 BU_VSX_1 (XVRESP,	      "xvresp",		CONST,	vsx_frev4sf2)
 
 BU_VSX_1 (XSCVDPSP,	      "xscvdpsp",	CONST,	vsx_xscvdpsp)
-BU_VSX_1 (XSCVSPDP,	      "xscvspdp",	CONST,	vsx_xscvdpsp)
+BU_VSX_1 (XSCVSPDP,	      "xscvspdp",	CONST,	vsx_xscvspdp)
 BU_VSX_1 (XVCVDPSP,	      "xvcvdpsp",	CONST,	vsx_xvcvdpsp)
 BU_VSX_1 (XVCVSPDP,	      "xvcvspdp",	CONST,	vsx_xvcvspdp)
 BU_VSX_1 (XSTSQRTDP_FE,	      "xstsqrtdp_fe",	CONST,	vsx_tsqrtdf2_fe)
@@ -1132,6 +1234,35 @@ BU_VSX_OVERLOAD_2 (XXSPLTW,  "xxspltw")
 BU_VSX_OVERLOAD_X (LD,	     "ld")
 BU_VSX_OVERLOAD_X (ST,	     "st")
 \f
+/* 1 argument crypto functions.  */
+BU_CRYPTO_1 (VSBOX,		"vsbox",	  CONST, crypto_vsbox)
+
+/* 2 argument crypto functions.  */
+BU_CRYPTO_2 (VCIPHER,		"vcipher",	  CONST, crypto_vcipher)
+BU_CRYPTO_2 (VCIPHERLAST,	"vcipherlast",	  CONST, crypto_vcipherlast)
+BU_CRYPTO_2 (VNCIPHER,		"vncipher",	  CONST, crypto_vncipher)
+BU_CRYPTO_2 (VNCIPHERLAST,	"vncipherlast",	  CONST, crypto_vncipherlast)
+BU_CRYPTO_2 (VPMSUMB,		"vpmsumb",	  CONST, crypto_vpmsumb)
+BU_CRYPTO_2 (VPMSUMH,		"vpmsumh",	  CONST, crypto_vpmsumh)
+BU_CRYPTO_2 (VPMSUMW,		"vpmsumw",	  CONST, crypto_vpmsumw)
+BU_CRYPTO_2 (VPMSUMD,		"vpmsumd",	  CONST, crypto_vpmsumd)
+
+/* 3 argument crypto functions.  */
+BU_CRYPTO_3 (VPERMXOR_V2DI,	"vpermxor_v2di",  CONST, crypto_vpermxor_v2di)
+BU_CRYPTO_3 (VPERMXOR_V4SI,	"vpermxor_v4si",  CONST, crypto_vpermxor_v4si)
+BU_CRYPTO_3 (VPERMXOR_V8HI,	"vpermxor_v8hi",  CONST, crypto_vpermxor_v8hi)
+BU_CRYPTO_3 (VPERMXOR_V16QI,	"vpermxor_v16qi", CONST, crypto_vpermxor_v16qi)
+BU_CRYPTO_3 (VSHASIGMAW,	"vshasigmaw",	  CONST, crypto_vshasigmaw)
+BU_CRYPTO_3 (VSHASIGMAD,	"vshasigmad",	  CONST, crypto_vshasigmad)
+
+/* 2 argument crypto overloaded functions.  */
+BU_CRYPTO_OVERLOAD_2 (VPMSUM,	 "vpmsum")
+
+/* 3 argument crypto overloaded functions.  */
+BU_CRYPTO_OVERLOAD_3 (VPERMXOR,	 "vpermxor")
+BU_CRYPTO_OVERLOAD_3 (VSHASIGMA, "vshasigma")
+
+\f
 /* 3 argument paired floating point builtins.  */
 BU_PAIRED_3 (MSUB,            "msub",           FP, 	fmsv2sf4)
 BU_PAIRED_3 (MADD,            "madd",           FP, 	fmav2sf4)
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 199122)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -335,6 +335,8 @@ rs6000_target_modify_macros (bool define
     rs6000_define_or_undefine_macro (define_p, "__VSX__");
   if ((flags & OPTION_MASK_P8_VECTOR) != 0)
     rs6000_define_or_undefine_macro (define_p, "__POWER8_VECTOR__");
+  if ((flags & OPTION_MASK_CRYPTO) != 0)
+    rs6000_define_or_undefine_macro (define_p, "__CRYPTO__");
 
   /* options from the builtin masks.  */
   if ((bu_mask & RS6000_BTM_SPE) != 0)
@@ -3381,6 +3383,40 @@ const struct altivec_builtin_types altiv
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, VSX_BUILTIN_XVCMPGEDP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DF, RS6000_BTI_V2DF },
 
+  /* Crypto builtins.  */
+  { CRYPTO_BUILTIN_VPERMXOR, CRYPTO_BUILTIN_VPERMXOR_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI },
+  { CRYPTO_BUILTIN_VPERMXOR, CRYPTO_BUILTIN_VPERMXOR_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI },
+  { CRYPTO_BUILTIN_VPERMXOR, CRYPTO_BUILTIN_VPERMXOR_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI },
+  { CRYPTO_BUILTIN_VPERMXOR, CRYPTO_BUILTIN_VPERMXOR_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+
+  { CRYPTO_BUILTIN_VPMSUM, CRYPTO_BUILTIN_VPMSUMB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { CRYPTO_BUILTIN_VPMSUM, CRYPTO_BUILTIN_VPMSUMH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { CRYPTO_BUILTIN_VPMSUM, CRYPTO_BUILTIN_VPMSUMW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { CRYPTO_BUILTIN_VPMSUM, CRYPTO_BUILTIN_VPMSUMD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { CRYPTO_BUILTIN_VSHASIGMA, CRYPTO_BUILTIN_VSHASIGMAW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI },
+  { CRYPTO_BUILTIN_VSHASIGMA, CRYPTO_BUILTIN_VSHASIGMAD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI },
+
   { (enum rs6000_builtins) 0, (enum rs6000_builtins) 0, 0, 0, 0, 0 }
 };
 \f
@@ -3828,7 +3864,8 @@ altivec_resolve_overloaded_builtin (loca
 	&& (desc->op2 == RS6000_BTI_NOT_OPAQUE
 	    || rs6000_builtin_type_compatible (types[1], desc->op2))
 	&& (desc->op3 == RS6000_BTI_NOT_OPAQUE
-	    || rs6000_builtin_type_compatible (types[2], desc->op3)))
+	    || rs6000_builtin_type_compatible (types[2], desc->op3))
+	&& rs6000_builtin_decls[desc->overloaded_code] != NULL_TREE)
       return altivec_build_resolved_builtin (args, n, desc);
 
  bad:
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 199122)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -10676,6 +10676,27 @@ rs6000_expand_ternop_builtin (enum insn_
 	  return const0_rtx;
 	}
     }
+  else if (icode == CODE_FOR_crypto_vshasigmaw
+	   || icode == CODE_FOR_crypto_vshasigmad)
+    {
+      /* Check whether the 2nd and 3rd arguments are integer constants and in
+	 range and prepare arguments.  */
+      STRIP_NOPS (arg1);
+      if (TREE_CODE (arg1) != INTEGER_CST
+	  || !IN_RANGE (TREE_INT_CST_LOW (arg1), 0, 1))
+	{
+	  error ("argument 2 must be 0 or 1");
+	  return const0_rtx;
+	}
+
+      STRIP_NOPS (arg2);
+      if (TREE_CODE (arg2) != INTEGER_CST
+	  || !IN_RANGE (TREE_INT_CST_LOW (arg2), 0, 15))
+	{
+	  error ("argument 3 must be in the range 0..15");
+	  return const0_rtx;
+	}
+    }
 
   if (target == 0
       || GET_MODE (target) != tmode
@@ -12366,6 +12387,10 @@ altivec_init_builtins (void)
     = build_function_type_list (integer_type_node,
 				integer_type_node, V4SI_type_node,
 				V4SI_type_node, NULL_TREE);
+  tree int_ftype_int_v2di_v2di
+    = build_function_type_list (integer_type_node,
+				integer_type_node, V2DI_type_node,
+				V2DI_type_node, NULL_TREE);
   tree void_ftype_v4si
     = build_function_type_list (void_type_node, V4SI_type_node, NULL_TREE);
   tree v8hi_ftype_void
@@ -12448,6 +12473,8 @@ altivec_init_builtins (void)
     = build_function_type_list (integer_type_node,
 				integer_type_node, V2DF_type_node,
 				V2DF_type_node, NULL_TREE);
+  tree v2di_ftype_v2di
+    = build_function_type_list (V2DI_type_node, V2DI_type_node, NULL_TREE);
   tree v4si_ftype_v4si
     = build_function_type_list (V4SI_type_node, V4SI_type_node, NULL_TREE);
   tree v8hi_ftype_v8hi
@@ -12583,6 +12610,9 @@ altivec_init_builtins (void)
 	case VOIDmode:
 	  type = int_ftype_int_opaque_opaque;
 	  break;
+	case V2DImode:
+	  type = int_ftype_int_v2di_v2di;
+	  break;
 	case V4SImode:
 	  type = int_ftype_int_v4si_v4si;
 	  break;
@@ -12616,6 +12646,9 @@ altivec_init_builtins (void)
 
       switch (mode0)
 	{
+	case V2DImode:
+	  type = v2di_ftype_v2di;
+	  break;
 	case V4SImode:
 	  type = v4si_ftype_v4si;
 	  break;
@@ -12821,11 +12854,26 @@ builtin_function_type (enum machine_mode
      are type correct.  */
   switch (builtin)
     {
+      /* unsigned 1 argument functions.  */
+    case CRYPTO_BUILTIN_VSBOX:
+      h.uns_p[0] = 1;
+      h.uns_p[1] = 1;
+      break;
+
       /* unsigned 2 argument functions.  */
     case ALTIVEC_BUILTIN_VMULEUB_UNS:
     case ALTIVEC_BUILTIN_VMULEUH_UNS:
     case ALTIVEC_BUILTIN_VMULOUB_UNS:
     case ALTIVEC_BUILTIN_VMULOUH_UNS:
+    case CRYPTO_BUILTIN_VCIPHER:
+    case CRYPTO_BUILTIN_VCIPHERLAST:
+    case CRYPTO_BUILTIN_VNCIPHER:
+    case CRYPTO_BUILTIN_VNCIPHERLAST:
+    case CRYPTO_BUILTIN_VPMSUMB:
+    case CRYPTO_BUILTIN_VPMSUMH:
+    case CRYPTO_BUILTIN_VPMSUMW:
+    case CRYPTO_BUILTIN_VPMSUMD:
+    case CRYPTO_BUILTIN_VPMSUM:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
@@ -12848,6 +12896,14 @@ builtin_function_type (enum machine_mode
     case VSX_BUILTIN_XXSEL_8HI_UNS:
     case VSX_BUILTIN_XXSEL_4SI_UNS:
     case VSX_BUILTIN_XXSEL_2DI_UNS:
+    case CRYPTO_BUILTIN_VPERMXOR:
+    case CRYPTO_BUILTIN_VPERMXOR_V2DI:
+    case CRYPTO_BUILTIN_VPERMXOR_V4SI:
+    case CRYPTO_BUILTIN_VPERMXOR_V8HI:
+    case CRYPTO_BUILTIN_VPERMXOR_V16QI:
+    case CRYPTO_BUILTIN_VSHASIGMAW:
+    case CRYPTO_BUILTIN_VSHASIGMAD:
+    case CRYPTO_BUILTIN_VSHASIGMA:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       h.uns_p[2] = 1;
Index: gcc/config/rs6000/crypto.md
===================================================================
--- gcc/config/rs6000/crypto.md	(revision 0)
+++ gcc/config/rs6000/crypto.md	(revision 0)
@@ -0,0 +1,101 @@
+;; Cryptographic instructions added in ISA 2.07
+;; Copyright (C) 2012-2013 Free Software Foundation, Inc.
+;; Contributed by Michael Meissner (meissner@linux.vnet.ibm.com)
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_c_enum "unspec"
+  [UNSPEC_VCIPHER
+   UNSPEC_VNCIPHER
+   UNSPEC_VCIPHERLAST
+   UNSPEC_VNCIPHERLAST
+   UNSPEC_VSBOX
+   UNSPEC_VSHASIGMA
+   UNSPEC_VPERMXOR
+   UNSPEC_VPMSUM])
+
+;; Iterator for VPMSUM/VPERMXOR
+(define_mode_iterator CR_mode [V16QI V8HI V4SI V2DI])
+
+(define_mode_attr CR_char [(V16QI "b")
+			   (V8HI  "h")
+			   (V4SI  "w")
+			   (V2DI  "d")])
+
+;; Iterator for VSHASIGMAD/VSHASIGMAW
+(define_mode_iterator CR_hash [V4SI V2DI])
+
+;; Iterator for the other crypto functions
+(define_int_iterator CR_code   [UNSPEC_VCIPHER
+				UNSPEC_VNCIPHER
+				UNSPEC_VCIPHERLAST
+				UNSPEC_VNCIPHERLAST])
+
+(define_int_attr CR_insn [(UNSPEC_VCIPHER      "vcipher")
+			  (UNSPEC_VNCIPHER     "vncipher")
+			  (UNSPEC_VCIPHERLAST  "vcipherlast")
+			  (UNSPEC_VNCIPHERLAST "vncipherlast")])
+
+;; 2 operand crypto instructions
+(define_insn "crypto_<CR_insn>"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")
+		      (match_operand:V2DI 2 "register_operand" "v")]
+		     CR_code))]
+  "TARGET_CRYPTO"
+  "<CR_insn> %0,%1,%2"
+  [(set_attr "type" "crypto")])
+
+(define_insn "crypto_vpmsum<CR_char>"
+  [(set (match_operand:CR_mode 0 "register_operand" "=v")
+	(unspec:CR_mode [(match_operand:CR_mode 1 "register_operand" "v")
+			 (match_operand:CR_mode 2 "register_operand" "v")]
+			UNSPEC_VPMSUM))]
+  "TARGET_CRYPTO"
+  "vpmsum<CR_char> %0,%1,%2"
+  [(set_attr "type" "crypto")])
+
+;; 3 operand crypto instructions
+(define_insn "crypto_vpermxor_<mode>"
+  [(set (match_operand:CR_mode 0 "register_operand" "=v")
+	(unspec:CR_mode [(match_operand:CR_mode 1 "register_operand" "v")
+			 (match_operand:CR_mode 2 "register_operand" "v")
+			 (match_operand:CR_mode 3 "register_operand" "v")]
+			UNSPEC_VPERMXOR))]
+  "TARGET_CRYPTO"
+  "vpermxor %0,%1,%2,%3"
+  [(set_attr "type" "crypto")])
+
+;; 1 operand crypto instruction
+(define_insn "crypto_vsbox"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")]
+		     UNSPEC_VSBOX))]
+  "TARGET_CRYPTO"
+  "vsbox %0,%1"
+  [(set_attr "type" "crypto")])
+
+;; Hash crypto instructions
+(define_insn "crypto_vshasigma<CR_char>"
+  [(set (match_operand:CR_hash 0 "register_operand" "=v")
+	(unspec:CR_hash [(match_operand:CR_hash 1 "register_operand" "v")
+			 (match_operand:SI 2 "const_0_to_1_operand" "n")
+			 (match_operand:SI 3 "const_0_to_15_operand" "n")]
+			UNSPEC_VSHASIGMA))]
+  "TARGET_CRYPTO"
+  "vshasigma<CR_char> %0,%1,%2,%3"
+  [(set_attr "type" "crypto")])
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 199122)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -146,7 +146,7 @@ (define_c_enum "unspecv"
 \f
 ;; Define an insn type attribute.  This is used in function unit delay
 ;; computations.
-(define_attr "type" "integer,two,three,load,load_ext,load_ext_u,load_ext_ux,load_ux,load_u,store,store_ux,store_u,fpload,fpload_ux,fpload_u,fpstore,fpstore_ux,fpstore_u,vecload,vecstore,imul,imul2,imul3,lmul,idiv,ldiv,insert_word,branch,cmp,fast_compare,compare,var_delayed_compare,delayed_compare,imul_compare,lmul_compare,fpcompare,cr_logical,delayed_cr,mfcr,mfcrf,mtcr,mfjmpr,mtjmpr,fp,fpsimple,dmul,sdiv,ddiv,ssqrt,dsqrt,jmpreg,brinc,vecsimple,veccomplex,vecdiv,veccmp,veccmpsimple,vecperm,vecfloat,vecfdiv,vecdouble,isync,sync,load_l,store_c,shift,trap,insert_dword,var_shift_rotate,cntlz,exts,mffgpr,mftgpr,isel,popcnt"
+(define_attr "type" "integer,two,three,load,load_ext,load_ext_u,load_ext_ux,load_ux,load_u,store,store_ux,store_u,fpload,fpload_ux,fpload_u,fpstore,fpstore_ux,fpstore_u,vecload,vecstore,imul,imul2,imul3,lmul,idiv,ldiv,insert_word,branch,cmp,fast_compare,compare,var_delayed_compare,delayed_compare,imul_compare,lmul_compare,fpcompare,cr_logical,delayed_cr,mfcr,mfcrf,mtcr,mfjmpr,mtjmpr,fp,fpsimple,dmul,sdiv,ddiv,ssqrt,dsqrt,jmpreg,brinc,vecsimple,veccomplex,vecdiv,veccmp,veccmpsimple,vecperm,vecfloat,vecfdiv,vecdouble,isync,sync,load_l,store_c,shift,trap,insert_dword,var_shift_rotate,cntlz,exts,mffgpr,mftgpr,isel,popcnt,crypto"
   (const_string "integer"))
 
 ;; Define floating point instruction sub-types for use with Xfpu.md
@@ -14788,7 +14788,7 @@ (define_insn "bpermd_<mode>"
 		   (match_operand:P 2 "gpc_reg_operand" "r")] UNSPEC_BPERM))]
   "TARGET_POPCNTD"
   "bpermd %0,%1,%2"
-  [(set_attr "type" "integer")])
+  [(set_attr "type" "popcnt")])
 
 \f
 ;; Builtin fma support.  Handle 
@@ -14931,3 +14931,4 @@ (define_insn "rs6000_mftb_<mode>"
 (include "spe.md")
 (include "dfp.md")
 (include "paired.md")
+(include "crypto.md")

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches
  2013-05-20 20:41 [PATCH, rs6000] power8 patches Michael Meissner
  2013-05-20 20:49 ` [PATCH, rs6000] power8 patch #1, infrastructure changes Michael Meissner
  2013-05-20 23:13 ` [PATCH, rs6000] power8 patches, patch #2, add crypto builtins Michael Meissner
@ 2013-05-21  2:11 ` Peter Bergner
  2013-05-21 15:51 ` [PATCH, rs6000] power8 patches, patch #3, add V2DI vector support Michael Meissner
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 52+ messages in thread
From: Peter Bergner @ 2013-05-21  2:11 UTC (permalink / raw)
  To: Michael Meissner; +Cc: gcc-patches, dje.gcc, pthaugen

On Mon, 2013-05-20 at 16:40 -0400, Michael Meissner wrote:
> Note, in order to build code for power8, you will need a power8 assembler,
> which will shortly be submitted to the binutils mailing lists.

Already submitted and committed upstream:

  http://sourceware.org/ml/binutils/2013-05/msg00235.html

Peter



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #3, add V2DI vector support
  2013-05-20 20:41 [PATCH, rs6000] power8 patches Michael Meissner
                   ` (2 preceding siblings ...)
  2013-05-21  2:11 ` [PATCH, rs6000] power8 patches Peter Bergner
@ 2013-05-21 15:51 ` Michael Meissner
  2013-05-23 16:31   ` David Edelsohn
  2013-05-21 23:47 ` [PATCH, rs6000] power8 patches, patch #4, new power8 builtins Michael Meissner
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-05-21 15:51 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, pthaugen, bergner

[-- Attachment #1: Type: text/plain, Size: 5360 bytes --]

This is patch #3 of our power8 changes.  It adds support for vectorizing 64-bit
integer types (V2DI) for plus, subtract, absolute value, minimum, maximum,
shift, rotate, and comparison.  Like the other patches, I have bootstrapped
these patches and had no regressions.  The test gcc.dg/vect/vect-96.c now
passes (it had failed on trunk, for compilers built with --with-cpu=power7).
Are the patches ok to commit to the tree?

Due to size issues, I will submit the tests for the testsuite either as part of
patch #4 or #5.
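
As a concrete illustration (mine, not part of the patch), with -mcpu=power8 a
simple 64-bit loop such as the one below should now be vectorizable using the
V2DI operations, and the generic overloads accept vector long long operands
directly:

#include <altivec.h>

/* Illustrative only: element-wise 64-bit max, which previously could
   not be vectorized because Altivec lacked V2DI integer operations.  */
void
max64 (long long *restrict out, const long long *restrict a,
       const long long *restrict b, int n)
{
  int i;
  for (i = 0; i < n; i++)
    out[i] = (a[i] > b[i]) ? a[i] : b[i];
}

/* The new overloads can also be used directly; vec_add on
   vector long long maps to the new vaddudm instruction.  */
vector long long
add_v2di (vector long long a, vector long long b)
{
  return vec_add (a, b);
}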

2013-05-20  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Pat Haugen <pthaugen@us.ibm.com>
	    Peter Bergner <bergner@vnet.ibm.com>

	* config/rs6000/vector.md (VEC_I): Add support for new power8 V2DI
	instructions.
	(VEC_A): Likewise.
	(VEC_C): Likewise.
	(vrotl<mode>3): Likewise.
	(vashl<mode>3): Likewise.
	(vlshr<mode>3): Likewise.
	(vashr<mode>3): Likewise.

	* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
	support for power8 V2DI builtins.

	* config/rs6000/rs6000-builtin.def (abs_v2di): Add support for
	power8 V2DI builtins.
	(vupkhsw): Likewise.
	(vupklsw): Likewise.
	(vaddudm): Likewise.
	(vminsd): Likewise.
	(vmaxsd): Likewise.
	(vminud): Likewise.
	(vmaxud): Likewise.
	(vpkudum): Likewise.
	(vpksdss): Likewise.
	(vpkudus): Likewise.
	(vpksdus): Likewise.
	(vrld): Likewise.
	(vsld): Likewise.
	(vsrd): Likewise.
	(vsrad): Likewise.
	(vsubudm): Likewise.
	(vcmpequd): Likewise.
	(vcmpgtsd): Likewise.
	(vcmpgtud): Likewise.
	(vcmpequd_p): Likewise.
	(vcmpgtsd_p): Likewise.
	(vcmpgtud_p): Likewise.
	(vupkhsw): Likewise.
	(vupklsw): Likewise.
	(vaddudm): Likewise.
	(vmaxsd): Likewise.
	(vmaxud): Likewise.
	(vminsd): Likewise.
	(vminud): Likewise.
	(vpksdss): Likewise.
	(vpksdus): Likewise.
	(vpkudum): Likewise.
	(vpkudus): Likewise.
	(vrld): Likewise.
	(vsld): Likewise.
	(vsrad): Likewise.
	(vsrd): Likewise.
	(vsubudm): Likewise.

	* config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Add
	support for power8 V2DI instructions.

	* config/rs6000/altivec.md (UNSPEC_VPKUHUM): Add support for
	power8 V2DI instructions.  Combine pack and unpack insns to use an
	iterator for each mode.  Check whether a particular mode supports
	Altivec instructions instead of just checking TARGET_ALTIVEC.
	(UNSPEC_VPKUWUM): Likewise.
	(UNSPEC_VPKSHSS): Likewise.
	(UNSPEC_VPKSWSS): Likewise.
	(UNSPEC_VPKUHUS): Likewise.
	(UNSPEC_VPKSHUS): Likewise.
	(UNSPEC_VPKUWUS): Likewise.
	(UNSPEC_VPKSWUS): Likewise.
	(UNSPEC_VPACK_SIGN_SIGN_SAT): Likewise.
	(UNSPEC_VPACK_SIGN_UNS_SAT): Likewise.
	(UNSPEC_VPACK_UNS_UNS_SAT): Likewise.
	(UNSPEC_VPACK_UNS_UNS_MOD): Likewise.
	(UNSPEC_VUPKHSB): Likewise.
	(UNSPEC_VUNPACK_HI_SIGN): Likewise.
	(UNSPEC_VUNPACK_LO_SIGN): Likewise.
	(UNSPEC_VUPKHSH): Likewise.
	(UNSPEC_VUPKLSB): Likewise.
	(UNSPEC_VUPKLSH): Likewise.
	(VI2): Likewise.
	(VI_char): Likewise.
	(VI_scalar): Likewise.
	(VI_unit): Likewise.
	(VP): Likewise.
	(VP_small): Likewise.
	(VP_small_lc): Likewise.
	(VU_char): Likewise.
	(add<mode>3): Likewise.
	(altivec_vaddcuw): Likewise.
	(altivec_vaddu<VI_char>s): Likewise.
	(altivec_vadds<VI_char>s): Likewise.
	(sub<mode>3): Likewise.
	(altivec_vsubcuw): Likewise.
	(altivec_vsubu<VI_char>s): Likewise.
	(altivec_vsubs<VI_char>s): Likewise.
	(altivec_vavgs<VI_char>): Likewise.
	(altivec_vcmpbfp): Likewise.
	(altivec_eq<mode>): Likewise.
	(altivec_gt<mode>): Likewise.
	(altivec_gtu<mode>): Likewise.
	(umax<mode>3): Likewise.
	(smax<mode>3): Likewise.
	(umin<mode>3): Likewise.
	(smin<mode>3): Likewise.
	(altivec_vpkuhum): Likewise.
	(altivec_vpkuwum): Likewise.
	(altivec_vpkshss): Likewise.
	(altivec_vpkswss): Likewise.
	(altivec_vpkuhus): Likewise.
	(altivec_vpkshus): Likewise.
	(altivec_vpkuwus): Likewise.
	(altivec_vpkswus): Likewise.
	(altivec_vpks<VI_char>ss): Likewise.
	(altivec_vpks<VI_char>us): Likewise.
	(altivec_vpku<VI_char>us): Likewise.
	(altivec_vpku<VI_char>um): Likewise.
	(altivec_vrl<VI_char>): Likewise.
	(altivec_vsl<VI_char>): Likewise.
	(altivec_vsr<VI_char>): Likewise.
	(altivec_vsra<VI_char>): Likewise.
	(altivec_vsldoi_<mode>): Likewise.
	(altivec_vupkhsb): Likewise.
	(altivec_vupkhs<VU_char>): Likewise.
	(altivec_vupkls<VU_char>): Likewise.
	(altivec_vupkhsh): Likewise.
	(altivec_vupklsb): Likewise.
	(altivec_vupklsh): Likewise.
	(altivec_vcmpequ<VI_char>_p): Likewise.
	(altivec_vcmpgts<VI_char>_p): Likewise.
	(altivec_vcmpgtu<VI_char>_p): Likewise.
	(abs<mode>2): Likewise.
	(vec_unpacks_hi_v16qi): Likewise.
	(vec_unpacks_hi_v8hi): Likewise.
	(vec_unpacks_lo_v16qi): Likewise.
	(vec_unpacks_hi_<VP_small_lc>): Likewise.
	(vec_unpacks_lo_v8hi): Likewise.
	(vec_unpacks_lo_<VP_small_lc>): Likewise.
	(vec_pack_trunc_v8h): Likewise.
	(vec_pack_trunc_v4si): Likewise.
	(vec_pack_trunc_<mode>): Likewise.

	* config/rs6000/altivec.h (vec_vaddudm): Add defines for power8
	V2DI builtins.
	(vec_vmaxsd): Likewise.
	(vec_vmaxud): Likewise.
	(vec_vminsd): Likewise.
	(vec_vminud): Likewise.
	(vec_vpksdss): Likewise.
	(vec_vpksdus): Likewise.
	(vec_vpkudum): Likewise.
	(vec_vpkudus): Likewise.
	(vec_vrld): Likewise.
	(vec_vsld): Likewise.
	(vec_vsrad): Likewise.
	(vec_vsrd): Likewise.
	(vec_vsubudm): Likewise.
	(vec_vupkhsw): Likewise.
	(vec_vupklsw): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.official-03b --]
[-- Type: text/plain, Size: 56864 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 199128)
+++ gcc/doc/extend.texi	(working copy)
@@ -13937,6 +13937,143 @@ if the VSX instruction set is available.
 @samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
 @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
 
+If the ISA 2.07 additions to the vector/scalar (power8-vector)
+instruction set are available, the following additional functions are
+available for both 32-bit and 64-bit targets.  For 64-bit targets, you
+can use @var{vector long} instead of @var{vector long long},
+@var{vector bool long} instead of @var{vector bool long long}, and
+@var{vector unsigned long} instead of @var{vector unsigned long long}.
+
+@smallexample
+vector long long vec_abs (vector long long);
+
+vector long long vec_add (vector long long, vector long long);
+vector unsigned long long vec_add (vector unsigned long long,
+                                   vector unsigned long long);
+
+int vec_all_eq (vector long long, vector long long);
+int vec_all_ge (vector long long, vector long long);
+int vec_all_gt (vector long long, vector long long);
+int vec_all_le (vector long long, vector long long);
+int vec_all_lt (vector long long, vector long long);
+int vec_all_ne (vector long long, vector long long);
+int vec_any_eq (vector long long, vector long long);
+int vec_any_ge (vector long long, vector long long);
+int vec_any_gt (vector long long, vector long long);
+int vec_any_le (vector long long, vector long long);
+int vec_any_lt (vector long long, vector long long);
+int vec_any_ne (vector long long, vector long long);
+
+vector long long vec_max (vector long long, vector long long);
+vector unsigned long long vec_max (vector unsigned long long,
+                                   vector unsigned long long);
+
+vector long long vec_min (vector long long, vector long long);
+vector unsigned long long vec_min (vector unsigned long long,
+                                   vector unsigned long long);
+
+vector int vec_pack (vector long long, vector long long);
+vector unsigned int vec_pack (vector unsigned long long,
+                              vector unsigned long long);
+vector bool int vec_pack (vector bool long long, vector bool long long);
+
+vector int vec_packs (vector long long, vector long long);
+vector unsigned int vec_packs (vector unsigned long long,
+                               vector unsigned long long);
+
+vector unsigned int vec_packsu (vector long long, vector long long);
+
+vector long long vec_rl (vector long long,
+                         vector unsigned long long);
+vector long long vec_rl (vector unsigned long long,
+                         vector unsigned long long);
+
+vector long long vec_sl (vector long long, vector unsigned long long);
+vector long long vec_sl (vector unsigned long long,
+                         vector unsigned long long);
+
+vector long long vec_sr (vector long long, vector unsigned long long);
+vector unsigned long long vec_sr (vector unsigned long long,
+                                  vector unsigned long long);
+
+vector long long vec_sra (vector long long, vector unsigned long long);
+vector unsigned long long vec_sra (vector unsigned long long,
+                                   vector unsigned long long);
+
+vector long long vec_sub (vector long long, vector long long);
+vector unsigned long long vec_sub (vector unsigned long long,
+                                   vector unsigned long long);
+
+vector long long vec_unpackh (vector int);
+vector unsigned long long vec_unpackh (vector unsigned int);
+
+vector long long vec_unpackl (vector int);
+vector unsigned long long vec_unpackl (vector unsigned int);
+
+vector long long vec_vaddudm (vector long long, vector long long);
+vector long long vec_vaddudm (vector bool long long, vector long long);
+vector long long vec_vaddudm (vector long long, vector bool long long);
+vector unsigned long long vec_vaddudm (vector unsigned long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vaddudm (vector bool long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vaddudm (vector unsigned long long,
+                                       vector bool long long);
+
+vector long long vec_vmaxsd (vector long long, vector long long);
+
+vector unsigned long long vec_vmaxud (vector unsigned long long,
+                                      vector unsigned long long);
+
+vector long long vec_vminsd (vector long long, vector long long);
+
+vector unsigned long long vec_vminud (vector unsigned long long,
+                                      vector unsigned long long);
+
+vector int vec_vpksdss (vector long long, vector long long);
+vector unsigned int vec_vpksdss (vector long long, vector long long);
+
+vector unsigned int vec_vpkudus (vector unsigned long long,
+                                 vector unsigned long long);
+
+vector int vec_vpkudum (vector long long, vector long long);
+vector unsigned int vec_vpkudum (vector unsigned long long,
+                                 vector unsigned long long);
+vector bool int vec_vpkudum (vector bool long long, vector bool long long);
+
+vector long long vec_vrld (vector long long, vector unsigned long long);
+vector unsigned long long vec_vrld (vector unsigned long long,
+                                    vector unsigned long long);
+
+vector long long vec_vsld (vector long long, vector unsigned long long);
+vector long long vec_vsld (vector unsigned long long,
+                           vector unsigned long long);
+
+vector long long vec_vsrad (vector long long, vector unsigned long long);
+vector unsigned long long vec_vsrad (vector unsigned long long,
+                                     vector unsigned long long);
+
+vector long long vec_vsrd (vector long long, vector unsigned long long);
+vector unsigned long long vec_vsrd (vector unsigned long long,
+                                    vector unsigned long long);
+
+vector long long vec_vsubudm (vector long long, vector long long);
+vector long long vec_vsubudm (vector bool long long, vector long long);
+vector long long vec_vsubudm (vector long long, vector bool long long);
+vector unsigned long long vec_vsubudm (vector unsigned long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vsubudm (vector bool long long,
+                                       vector unsigned long long);
+vector unsigned long long vec_vsubudm (vector unsigned long long,
+                                       vector bool long long);
+
+vector long long vec_vupkhsw (vector int);
+vector unsigned long long vec_vupkhsw (vector unsigned int);
+
+vector long long vec_vupklsw (vector int);
+vector unsigned long long vec_vupklsw (vector unsigned int);
+@end smallexample
+
 If the cryptographic instructions are enabled (@option{-mcrypto} or
 @option{-mcpu=power8}), the following builtins are enabled.
 
Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 199037)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -24,13 +24,13 @@
 
 
 ;; Vector int modes
-(define_mode_iterator VEC_I [V16QI V8HI V4SI])
+(define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
 
 ;; Vector float modes
 (define_mode_iterator VEC_F [V4SF V2DF])
 
 ;; Vector arithmetic modes
-(define_mode_iterator VEC_A [V16QI V8HI V4SI V4SF V2DF])
+(define_mode_iterator VEC_A [V16QI V8HI V4SI V2DI V4SF V2DF])
 
;; Vector modes that need alignment via permutes
 (define_mode_iterator VEC_K [V16QI V8HI V4SI V4SF])
@@ -45,7 +45,7 @@ (define_mode_iterator VEC_M [V16QI V8HI 
 (define_mode_iterator VEC_N [V4SI V4SF V2DI V2DF])
 
 ;; Vector comparison modes
-(define_mode_iterator VEC_C [V16QI V8HI V4SI V4SF V2DF])
+(define_mode_iterator VEC_C [V16QI V8HI V4SI V2DI V4SF V2DF])
 
 ;; Vector init/extract modes
 (define_mode_iterator VEC_E [V16QI V8HI V4SI V2DI V4SF V2DF])
@@ -1074,7 +1074,7 @@ (define_expand "vrotl<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand" "")
 	(rotate:VEC_I (match_operand:VEC_I 1 "vint_operand" "")
 		      (match_operand:VEC_I 2 "vint_operand" "")))]
-  "TARGET_ALTIVEC"
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
 ;; Expanders for arithmetic shift left on each vector element
@@ -1082,7 +1082,7 @@ (define_expand "vashl<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand" "")
 	(ashift:VEC_I (match_operand:VEC_I 1 "vint_operand" "")
 		      (match_operand:VEC_I 2 "vint_operand" "")))]
-  "TARGET_ALTIVEC"
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
 ;; Expanders for logical shift right on each vector element
@@ -1090,7 +1090,7 @@ (define_expand "vlshr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand" "")
 	(lshiftrt:VEC_I (match_operand:VEC_I 1 "vint_operand" "")
 			(match_operand:VEC_I 2 "vint_operand" "")))]
-  "TARGET_ALTIVEC"
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
 ;; Expanders for arithmetic shift right on each vector element
@@ -1098,7 +1098,7 @@ (define_expand "vashr<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand" "")
 	(ashiftrt:VEC_I (match_operand:VEC_I 1 "vint_operand" "")
 			(match_operand:VEC_I 2 "vint_operand" "")))]
-  "TARGET_ALTIVEC"
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 \f
 ;; Vector reduction expanders for VSX
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 199128)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -511,6 +511,8 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_ABS, ALTIVEC_BUILTIN_ABS_V4SI,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { ALTIVEC_BUILTIN_VEC_ABS, P8V_BUILTIN_ABS_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_ABS, ALTIVEC_BUILTIN_ABS_V4SF,
     RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_ABS, VSX_BUILTIN_XVABSDP,
@@ -583,12 +585,24 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_UNPACKH, ALTIVEC_BUILTIN_VUPKHSH,
     RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V8HI, 0, 0 },
+  { ALTIVEC_BUILTIN_VEC_UNPACKH, P8V_BUILTIN_VUPKHSW,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+  { ALTIVEC_BUILTIN_VEC_UNPACKH, P8V_BUILTIN_VUPKHSW,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_UNPACKH, ALTIVEC_BUILTIN_VUPKHPX,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_pixel_V8HI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKHSH, ALTIVEC_BUILTIN_VUPKHSH,
     RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKHSH, ALTIVEC_BUILTIN_VUPKHSH,
     RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V8HI, 0, 0 },
+  { ALTIVEC_BUILTIN_VEC_UNPACKH, P8V_BUILTIN_VUPKHSW,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+  { ALTIVEC_BUILTIN_VEC_UNPACKH, P8V_BUILTIN_VUPKHSW,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
+  { ALTIVEC_BUILTIN_VEC_VUPKHSH, P8V_BUILTIN_VUPKHSW,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+  { ALTIVEC_BUILTIN_VEC_VUPKHSH, P8V_BUILTIN_VUPKHSW,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKHPX, ALTIVEC_BUILTIN_VUPKHPX,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V8HI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKHPX, ALTIVEC_BUILTIN_VUPKHPX,
@@ -607,6 +621,10 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_UNPACKL, ALTIVEC_BUILTIN_VUPKLSH,
     RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V8HI, 0, 0 },
+  { ALTIVEC_BUILTIN_VEC_UNPACKL, P8V_BUILTIN_VUPKLSW,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+  { ALTIVEC_BUILTIN_VEC_UNPACKL, P8V_BUILTIN_VUPKLSW,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKLPX, ALTIVEC_BUILTIN_VUPKLPX,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V8HI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKLPX, ALTIVEC_BUILTIN_VUPKLPX,
@@ -657,6 +675,18 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_bool_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_ADD, ALTIVEC_BUILTIN_VADDUWM,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_ADD, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_ADD, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_ADD, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_ADD, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_ADD, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_ADD, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_ADD, ALTIVEC_BUILTIN_VADDFP,
     RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_ADD, VSX_BUILTIN_XVADDDP,
@@ -943,6 +973,10 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUW,
     RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, VSX_BUILTIN_XVCMPEQDP,
@@ -981,6 +1015,10 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTSW,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTUD,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPGT, P8V_BUILTIN_VCMPGTSD,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, ALTIVEC_BUILTIN_VCMPGTFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPGT, VSX_BUILTIN_XVCMPGTDP,
@@ -1027,6 +1065,10 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTSW,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPLT, P8V_BUILTIN_VCMPGTUD,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPLT, P8V_BUILTIN_VCMPGTSD,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPLT, ALTIVEC_BUILTIN_VCMPGTFP,
     RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPLT, VSX_BUILTIN_XVCMPGTDP,
@@ -1424,6 +1466,18 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_MAX, ALTIVEC_BUILTIN_VMAXSW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MAX, P8V_BUILTIN_VMAXUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MAX, P8V_BUILTIN_VMAXUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MAX, P8V_BUILTIN_VMAXUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MAX, P8V_BUILTIN_VMAXSD,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MAX, P8V_BUILTIN_VMAXSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MAX, P8V_BUILTIN_VMAXSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_MAX, ALTIVEC_BUILTIN_VMAXFP,
     RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_MAX, VSX_BUILTIN_XVMAXDP,
@@ -1610,6 +1664,18 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_MIN, ALTIVEC_BUILTIN_VMINSW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MIN, P8V_BUILTIN_VMINUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MIN, P8V_BUILTIN_VMINUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MIN, P8V_BUILTIN_VMINUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MIN, P8V_BUILTIN_VMINSD,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MIN, P8V_BUILTIN_VMINSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_MIN, P8V_BUILTIN_VMINSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_MIN, ALTIVEC_BUILTIN_VMINFP,
     RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_MIN, VSX_BUILTIN_XVMINDP,
@@ -1792,6 +1858,12 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_PACK, ALTIVEC_BUILTIN_VPKUWUM,
     RS6000_BTI_bool_V8HI, RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM,
+    RS6000_BTI_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM,
+    RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_VPKUWUM, ALTIVEC_BUILTIN_VPKUWUM,
     RS6000_BTI_V8HI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VPKUWUM, ALTIVEC_BUILTIN_VPKUWUM,
@@ -1818,6 +1890,10 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_V8HI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VPKUWUS, ALTIVEC_BUILTIN_VPKUWUS,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_PACKS, P8V_BUILTIN_VPKUDUS,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_PACKS, P8V_BUILTIN_VPKSDSS,
+    RS6000_BTI_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_VPKSHSS, ALTIVEC_BUILTIN_VPKSHSS,
     RS6000_BTI_V16QI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
   { ALTIVEC_BUILTIN_VEC_VPKUHUS, ALTIVEC_BUILTIN_VPKUHUS,
@@ -1830,6 +1906,8 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_PACKSU, ALTIVEC_BUILTIN_VPKSWUS,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_PACKSU, P8V_BUILTIN_VPKSDUS,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_VPKSWUS, ALTIVEC_BUILTIN_VPKSWUS,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VPKSHUS, ALTIVEC_BUILTIN_VPKSHUS,
@@ -1850,6 +1928,10 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_RL, ALTIVEC_BUILTIN_VRLW,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_RL, P8V_BUILTIN_VRLD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_RL, P8V_BUILTIN_VRLD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLW, ALTIVEC_BUILTIN_VRLW,
@@ -1874,6 +1956,10 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLW,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SL, P8V_BUILTIN_VSLD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SL, P8V_BUILTIN_VSLD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTDP,
     RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_SQRT, VSX_BUILTIN_XVSQRTSP,
@@ -2038,6 +2124,10 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_SR, ALTIVEC_BUILTIN_VSRW,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SR, P8V_BUILTIN_VSRD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SR, P8V_BUILTIN_VSRD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRW, ALTIVEC_BUILTIN_VSRW,
@@ -2062,6 +2152,10 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_SRA, ALTIVEC_BUILTIN_VSRAW,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SRA, P8V_BUILTIN_VSRAD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SRA, P8V_BUILTIN_VSRAD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VSRAW, ALTIVEC_BUILTIN_VSRAW,
@@ -2202,6 +2296,18 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_bool_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_SUB, ALTIVEC_BUILTIN_VSUBUWM,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SUB, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SUB, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SUB, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SUB, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SUB, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_SUB, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SUB, ALTIVEC_BUILTIN_VSUBFP,
     RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_SUB, VSX_BUILTIN_XVSUBDP,
@@ -3333,6 +3439,20 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SI, RS6000_BTI_V4SI },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, ALTIVEC_BUILTIN_VCMPEQUW_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, P8V_BUILTIN_VCMPEQUD_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, ALTIVEC_BUILTIN_VCMPEQFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPEQ_P, VSX_BUILTIN_XVCMPEQDP_P,
@@ -3378,6 +3498,18 @@ const struct altivec_builtin_types altiv
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, ALTIVEC_BUILTIN_VCMPGTSW_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SI, RS6000_BTI_V4SI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTUD_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTUD_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTUD_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI },
+  { ALTIVEC_BUILTIN_VEC_VCMPGE_P, P8V_BUILTIN_VCMPGTSD_P,
+    RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, ALTIVEC_BUILTIN_VCMPGEFP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF },
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, VSX_BUILTIN_XVCMPGEDP_P,
Index: gcc/config/rs6000/rs6000-builtin.def
===================================================================
--- gcc/config/rs6000/rs6000-builtin.def	(revision 199128)
+++ gcc/config/rs6000/rs6000-builtin.def	(working copy)
@@ -1234,6 +1234,58 @@ BU_VSX_OVERLOAD_2 (XXSPLTW,  "xxspltw")
 BU_VSX_OVERLOAD_X (LD,	     "ld")
 BU_VSX_OVERLOAD_X (ST,	     "st")
 \f
+/* 1 argument altivec instructions added in ISA 2.07.  */
+BU_P8V_AV_1 (ABS_V2DI,	      "abs_v2di",	CONST,	absv2di2)
+BU_P8V_AV_1 (VUPKHSW,	      "vupkhsw",	CONST,	altivec_vupkhsw)
+BU_P8V_AV_1 (VUPKLSW,	      "vupklsw",	CONST,	altivec_vupklsw)
+
+/* 2 argument altivec instructions added in ISA 2.07.  */
+BU_P8V_AV_2 (VADDUDM,		"vaddudm",	CONST,	addv2di3)
+BU_P8V_AV_2 (VMINSD,		"vminsd",	CONST,	sminv2di3)
+BU_P8V_AV_2 (VMAXSD,		"vmaxsd",	CONST,	smaxv2di3)
+BU_P8V_AV_2 (VMINUD,		"vminud",	CONST,	uminv2di3)
+BU_P8V_AV_2 (VMAXUD,		"vmaxud",	CONST,	umaxv2di3)
+BU_P8V_AV_2 (VPKUDUM,		"vpkudum",	CONST,	altivec_vpkudum)
+BU_P8V_AV_2 (VPKSDSS,		"vpksdss",	CONST,	altivec_vpksdss)
+BU_P8V_AV_2 (VPKUDUS,		"vpkudus",	CONST,	altivec_vpkudus)
+BU_P8V_AV_2 (VPKSDUS,		"vpksdus",	CONST,	altivec_vpksdus)
+BU_P8V_AV_2 (VRLD,		"vrld",		CONST,	vrotlv2di3)
+BU_P8V_AV_2 (VSLD,		"vsld",		CONST,	vashlv2di3)
+BU_P8V_AV_2 (VSRD,		"vsrd",		CONST,	vlshrv2di3)
+BU_P8V_AV_2 (VSRAD,		"vsrad",	CONST,	vashrv2di3)
+BU_P8V_AV_2 (VSUBUDM,		"vsubudm",	CONST,	subv2di3)
+
+/* Vector comparison instructions added in ISA 2.07.  */
+BU_P8V_AV_2 (VCMPEQUD,		"vcmpequd",	CONST,	vector_eqv2di)
+BU_P8V_AV_2 (VCMPGTSD,		"vcmpgtsd",	CONST,	vector_gtv2di)
+BU_P8V_AV_2 (VCMPGTUD,		"vcmpgtud",	CONST,	vector_gtuv2di)
+
+/* Vector comparison predicate instructions added in ISA 2.07.  */
+BU_P8V_AV_P (VCMPEQUD_P,	"vcmpequd_p",	CONST,	vector_eq_v2di_p)
+BU_P8V_AV_P (VCMPGTSD_P,	"vcmpgtsd_p",	CONST,	vector_gt_v2di_p)
+BU_P8V_AV_P (VCMPGTUD_P,	"vcmpgtud_p",	CONST,	vector_gtu_v2di_p)
+
+/* Power8 vector overloaded 1 argument functions.  */
+BU_P8V_OVERLOAD_1 (VUPKHSW,	"vupkhsw")
+BU_P8V_OVERLOAD_1 (VUPKLSW,	"vupklsw")
+
+/* Power8 vector overloaded 2 argument functions.  */
+BU_P8V_OVERLOAD_2 (VADDUDM,	"vaddudm")
+BU_P8V_OVERLOAD_2 (VMAXSD,	"vmaxsd")
+BU_P8V_OVERLOAD_2 (VMAXUD,	"vmaxud")
+BU_P8V_OVERLOAD_2 (VMINSD,	"vminsd")
+BU_P8V_OVERLOAD_2 (VMINUD,	"vminud")
+BU_P8V_OVERLOAD_2 (VPKSDSS,	"vpksdss")
+BU_P8V_OVERLOAD_2 (VPKSDUS,	"vpksdus")
+BU_P8V_OVERLOAD_2 (VPKUDUM,	"vpkudum")
+BU_P8V_OVERLOAD_2 (VPKUDUS,	"vpkudus")
+BU_P8V_OVERLOAD_2 (VRLD,	"vrld")
+BU_P8V_OVERLOAD_2 (VSLD,	"vsld")
+BU_P8V_OVERLOAD_2 (VSRAD,	"vsrad")
+BU_P8V_OVERLOAD_2 (VSRD,	"vsrd")
+BU_P8V_OVERLOAD_2 (VSUBUDM,	"vsubudm")
+
+\f
 /* 1 argument crypto functions.  */
 BU_CRYPTO_1 (VSBOX,		"vsbox",	  CONST, crypto_vsbox)
 
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 199128)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -2226,12 +2226,13 @@ rs6000_init_hard_regno_mode_ok (bool glo
 	}
     }
 
-  /* V2DImode, only allow under VSX, which can do V2DI insert/splat/extract.
-     Altivec doesn't have 64-bit support.  */
+  /* V2DImode, full mode depends on power8 vector mode.  Allow under VSX to do
+     insert/splat/extract.  Altivec doesn't have 64-bit integer support.  */
   if (TARGET_VSX)
     {
       rs6000_vector_mem[V2DImode] = VECTOR_VSX;
-      rs6000_vector_unit[V2DImode] = VECTOR_NONE;
+      rs6000_vector_unit[V2DImode]
+	= (TARGET_P8_VECTOR) ? VECTOR_P8_VECTOR : VECTOR_NONE;
       rs6000_vector_align[V2DImode] = align64;
     }
 
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 199037)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -41,15 +41,11 @@ (define_c_enum "unspec"
    UNSPEC_VMULOSB
    UNSPEC_VMULOUH
    UNSPEC_VMULOSH
-   UNSPEC_VPKUHUM
-   UNSPEC_VPKUWUM
    UNSPEC_VPKPX
-   UNSPEC_VPKSHSS
-   UNSPEC_VPKSWSS
-   UNSPEC_VPKUHUS
-   UNSPEC_VPKSHUS
-   UNSPEC_VPKUWUS
-   UNSPEC_VPKSWUS
+   UNSPEC_VPACK_SIGN_SIGN_SAT
+   UNSPEC_VPACK_SIGN_UNS_SAT
+   UNSPEC_VPACK_UNS_UNS_SAT
+   UNSPEC_VPACK_UNS_UNS_MOD
    UNSPEC_VSLV4SI
    UNSPEC_VSLO
    UNSPEC_VSR
@@ -71,12 +67,10 @@ (define_c_enum "unspec"
    UNSPEC_VLOGEFP
    UNSPEC_VEXPTEFP
    UNSPEC_VLSDOI
-   UNSPEC_VUPKHSB
+   UNSPEC_VUNPACK_HI_SIGN
+   UNSPEC_VUNPACK_LO_SIGN
    UNSPEC_VUPKHPX
-   UNSPEC_VUPKHSH
-   UNSPEC_VUPKLSB
    UNSPEC_VUPKLPX
-   UNSPEC_VUPKLSH
    UNSPEC_DST
    UNSPEC_DSTT
    UNSPEC_DSTST
@@ -146,6 +140,8 @@ (define_c_enum "unspecv"
 
 ;; Vec int modes
 (define_mode_iterator VI [V4SI V8HI V16QI])
+;; Like VI, but add ISA 2.07 integer vector ops
+(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
 ;; Short vec in modes
 (define_mode_iterator VIshort [V8HI V16QI])
 ;; Vec float modes
@@ -159,8 +155,18 @@ (define_mode_iterator VM [V4SI V8HI V16Q
 ;; Like VM, except don't do TImode
 (define_mode_iterator VM2 [V4SI V8HI V16QI V4SF V2DF V2DI])
 
-(define_mode_attr VI_char [(V4SI "w") (V8HI "h") (V16QI "b")])
-(define_mode_attr VI_scalar [(V4SI "SI") (V8HI "HI") (V16QI "QI")])
+(define_mode_attr VI_char [(V2DI "d") (V4SI "w") (V8HI "h") (V16QI "b")])
+(define_mode_attr VI_scalar [(V2DI "DI") (V4SI "SI") (V8HI "HI") (V16QI "QI")])
+(define_mode_attr VI_unit [(V16QI "VECTOR_UNIT_ALTIVEC_P (V16QImode)")
+			   (V8HI "VECTOR_UNIT_ALTIVEC_P (V8HImode)")
+			   (V4SI "VECTOR_UNIT_ALTIVEC_P (V4SImode)")
+			   (V2DI "VECTOR_UNIT_P8_VECTOR_P (V2DImode)")])
+
+;; Vector pack/unpack
+(define_mode_iterator VP [V2DI V4SI V8HI])
+(define_mode_attr VP_small [(V2DI "V4SI") (V4SI "V8HI") (V8HI "V16QI")])
+(define_mode_attr VP_small_lc [(V2DI "v4si") (V4SI "v8hi") (V8HI "v16qi")])
+(define_mode_attr VU_char [(V2DI "w") (V4SI "h") (V8HI "b")])
 
 ;; Vector move instructions.
 (define_insn "*altivec_mov<mode>"
@@ -378,10 +384,10 @@ (define_insn "*restore_vregs_<mode>_r12"
 
 ;; add
 (define_insn "add<mode>3"
-  [(set (match_operand:VI 0 "register_operand" "=v")
-        (plus:VI (match_operand:VI 1 "register_operand" "v")
-                 (match_operand:VI 2 "register_operand" "v")))]
-  "TARGET_ALTIVEC"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (plus:VI2 (match_operand:VI2 1 "register_operand" "v")
+		  (match_operand:VI2 2 "register_operand" "v")))]
+  "<VI_unit>"
   "vaddu<VI_char>m %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
@@ -398,17 +404,17 @@ (define_insn "altivec_vaddcuw"
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
                       (match_operand:V4SI 2 "register_operand" "v")]
 		     UNSPEC_VADDCUW))]
-  "TARGET_ALTIVEC"
+  "VECTOR_UNIT_ALTIVEC_P (V4SImode)"
   "vaddcuw %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
 (define_insn "altivec_vaddu<VI_char>s"
   [(set (match_operand:VI 0 "register_operand" "=v")
         (unspec:VI [(match_operand:VI 1 "register_operand" "v")
-                    (match_operand:VI 2 "register_operand" "v")]
+		    (match_operand:VI 2 "register_operand" "v")]
 		   UNSPEC_VADDU))
    (set (reg:SI 110) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
-  "TARGET_ALTIVEC"
+  "<VI_unit>"
   "vaddu<VI_char>s %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
@@ -418,16 +424,16 @@ (define_insn "altivec_vadds<VI_char>s"
                     (match_operand:VI 2 "register_operand" "v")]
 		   UNSPEC_VADDS))
    (set (reg:SI 110) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
-  "TARGET_ALTIVEC"
+  "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
   "vadds<VI_char>s %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
 ;; sub
 (define_insn "sub<mode>3"
-  [(set (match_operand:VI 0 "register_operand" "=v")
-        (minus:VI (match_operand:VI 1 "register_operand" "v")
-                  (match_operand:VI 2 "register_operand" "v")))]
-  "TARGET_ALTIVEC"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (minus:VI2 (match_operand:VI2 1 "register_operand" "v")
+		   (match_operand:VI2 2 "register_operand" "v")))]
+  "<VI_unit>"
   "vsubu<VI_char>m %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
@@ -444,7 +450,7 @@ (define_insn "altivec_vsubcuw"
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
                       (match_operand:V4SI 2 "register_operand" "v")]
 		     UNSPEC_VSUBCUW))]
-  "TARGET_ALTIVEC"
+  "VECTOR_UNIT_ALTIVEC_P (V4SImode)"
   "vsubcuw %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
@@ -454,7 +460,7 @@ (define_insn "altivec_vsubu<VI_char>s"
                     (match_operand:VI 2 "register_operand" "v")]
 		   UNSPEC_VSUBU))
    (set (reg:SI 110) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
-  "TARGET_ALTIVEC"
+  "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
   "vsubu<VI_char>s %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
@@ -464,7 +470,7 @@ (define_insn "altivec_vsubs<VI_char>s"
                     (match_operand:VI 2 "register_operand" "v")]
 		   UNSPEC_VSUBS))
    (set (reg:SI 110) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
-  "TARGET_ALTIVEC"
+  "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
   "vsubs<VI_char>s %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
@@ -483,7 +489,7 @@ (define_insn "altivec_vavgs<VI_char>"
         (unspec:VI [(match_operand:VI 1 "register_operand" "v")
                     (match_operand:VI 2 "register_operand" "v")]
 		   UNSPEC_VAVGS))]
-  "TARGET_ALTIVEC"
+  "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
   "vavgs<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
@@ -492,31 +498,31 @@ (define_insn "altivec_vcmpbfp"
         (unspec:V4SI [(match_operand:V4SF 1 "register_operand" "v")
                       (match_operand:V4SF 2 "register_operand" "v")] 
                       UNSPEC_VCMPBFP))]
-  "TARGET_ALTIVEC"
+  "VECTOR_UNIT_ALTIVEC_P (V4SImode)"
   "vcmpbfp %0,%1,%2"
   [(set_attr "type" "veccmp")])
 
 (define_insn "*altivec_eq<mode>"
-  [(set (match_operand:VI 0 "altivec_register_operand" "=v")
-	(eq:VI (match_operand:VI 1 "altivec_register_operand" "v")
-	       (match_operand:VI 2 "altivec_register_operand" "v")))]
-  "TARGET_ALTIVEC"
+  [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
+	(eq:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
+		(match_operand:VI2 2 "altivec_register_operand" "v")))]
+  "<VI_unit>"
   "vcmpequ<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmp")])
 
 (define_insn "*altivec_gt<mode>"
-  [(set (match_operand:VI 0 "altivec_register_operand" "=v")
-	(gt:VI (match_operand:VI 1 "altivec_register_operand" "v")
-	       (match_operand:VI 2 "altivec_register_operand" "v")))]
-  "TARGET_ALTIVEC"
+  [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
+	(gt:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
+		(match_operand:VI2 2 "altivec_register_operand" "v")))]
+  "<VI_unit>"
   "vcmpgts<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmp")])
 
 (define_insn "*altivec_gtu<mode>"
-  [(set (match_operand:VI 0 "altivec_register_operand" "=v")
-	(gtu:VI (match_operand:VI 1 "altivec_register_operand" "v")
-		(match_operand:VI 2 "altivec_register_operand" "v")))]
-  "TARGET_ALTIVEC"
+  [(set (match_operand:VI2 0 "altivec_register_operand" "=v")
+	(gtu:VI2 (match_operand:VI2 1 "altivec_register_operand" "v")
+		 (match_operand:VI2 2 "altivec_register_operand" "v")))]
+  "<VI_unit>"
   "vcmpgtu<VI_char> %0,%1,%2"
   [(set_attr "type" "veccmp")])
 
@@ -744,18 +750,18 @@ (define_insn "altivec_vmsumshs"
 ;; max
 
 (define_insn "umax<mode>3"
-  [(set (match_operand:VI 0 "register_operand" "=v")
-        (umax:VI (match_operand:VI 1 "register_operand" "v")
-                 (match_operand:VI 2 "register_operand" "v")))]
-  "TARGET_ALTIVEC"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (umax:VI2 (match_operand:VI2 1 "register_operand" "v")
+		  (match_operand:VI2 2 "register_operand" "v")))]
+  "<VI_unit>"
   "vmaxu<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
 (define_insn "smax<mode>3"
-  [(set (match_operand:VI 0 "register_operand" "=v")
-        (smax:VI (match_operand:VI 1 "register_operand" "v")
-                 (match_operand:VI 2 "register_operand" "v")))]
-  "TARGET_ALTIVEC"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (smax:VI2 (match_operand:VI2 1 "register_operand" "v")
+		  (match_operand:VI2 2 "register_operand" "v")))]
+  "<VI_unit>"
   "vmaxs<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
@@ -768,18 +774,18 @@ (define_insn "*altivec_smaxv4sf3"
   [(set_attr "type" "veccmp")])
 
 (define_insn "umin<mode>3"
-  [(set (match_operand:VI 0 "register_operand" "=v")
-        (umin:VI (match_operand:VI 1 "register_operand" "v")
-                 (match_operand:VI 2 "register_operand" "v")))]
-  "TARGET_ALTIVEC"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (umin:VI2 (match_operand:VI2 1 "register_operand" "v")
+		  (match_operand:VI2 2 "register_operand" "v")))]
+  "<VI_unit>"
   "vminu<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
 (define_insn "smin<mode>3"
-  [(set (match_operand:VI 0 "register_operand" "=v")
-        (smin:VI (match_operand:VI 1 "register_operand" "v")
-                 (match_operand:VI 2 "register_operand" "v")))]
-  "TARGET_ALTIVEC"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (smin:VI2 (match_operand:VI2 1 "register_operand" "v")
+		  (match_operand:VI2 2 "register_operand" "v")))]
+  "<VI_unit>"
   "vmins<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
@@ -1058,24 +1064,6 @@ (define_insn "*altivec_andc<mode>3"
   "vandc %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vpkuhum"
-  [(set (match_operand:V16QI 0 "register_operand" "=v")
-        (unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v")
-                       (match_operand:V8HI 2 "register_operand" "v")]
-		      UNSPEC_VPKUHUM))]
-  "TARGET_ALTIVEC"
-  "vpkuhum %0,%1,%2"
-  [(set_attr "type" "vecperm")])
-
-(define_insn "altivec_vpkuwum"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (unspec:V8HI [(match_operand:V4SI 1 "register_operand" "v")
-                      (match_operand:V4SI 2 "register_operand" "v")]
-		     UNSPEC_VPKUWUM))]
-  "TARGET_ALTIVEC"
-  "vpkuwum %0,%1,%2"
-  [(set_attr "type" "vecperm")])
-
 (define_insn "altivec_vpkpx"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V4SI 1 "register_operand" "v")
@@ -1085,71 +1073,47 @@ (define_insn "altivec_vpkpx"
   "vpkpx %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-(define_insn "altivec_vpkshss"
-  [(set (match_operand:V16QI 0 "register_operand" "=v")
-        (unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v")
-                       (match_operand:V8HI 2 "register_operand" "v")]
-		      UNSPEC_VPKSHSS))
-   (set (reg:SI 110) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
-  "TARGET_ALTIVEC"
-  "vpkshss %0,%1,%2"
-  [(set_attr "type" "vecperm")])
-
-(define_insn "altivec_vpkswss"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (unspec:V8HI [(match_operand:V4SI 1 "register_operand" "v")
-                      (match_operand:V4SI 2 "register_operand" "v")]
-		     UNSPEC_VPKSWSS))
-   (set (reg:SI 110) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
-  "TARGET_ALTIVEC"
-  "vpkswss %0,%1,%2"
-  [(set_attr "type" "vecperm")])
-
-(define_insn "altivec_vpkuhus"
-  [(set (match_operand:V16QI 0 "register_operand" "=v")
-        (unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v")
-                       (match_operand:V8HI 2 "register_operand" "v")]
-		      UNSPEC_VPKUHUS))
-   (set (reg:SI 110) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
-  "TARGET_ALTIVEC"
-  "vpkuhus %0,%1,%2"
-  [(set_attr "type" "vecperm")])
-
-(define_insn "altivec_vpkshus"
-  [(set (match_operand:V16QI 0 "register_operand" "=v")
-        (unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v")
-                       (match_operand:V8HI 2 "register_operand" "v")]
-		      UNSPEC_VPKSHUS))
-   (set (reg:SI 110) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
-  "TARGET_ALTIVEC"
-  "vpkshus %0,%1,%2"
-  [(set_attr "type" "vecperm")])
-
-(define_insn "altivec_vpkuwus"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (unspec:V8HI [(match_operand:V4SI 1 "register_operand" "v")
-                      (match_operand:V4SI 2 "register_operand" "v")]
-		     UNSPEC_VPKUWUS))
-   (set (reg:SI 110) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
-  "TARGET_ALTIVEC"
-  "vpkuwus %0,%1,%2"
-  [(set_attr "type" "vecperm")])
-
-(define_insn "altivec_vpkswus"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (unspec:V8HI [(match_operand:V4SI 1 "register_operand" "v")
-                      (match_operand:V4SI 2 "register_operand" "v")]
-		     UNSPEC_VPKSWUS))
-   (set (reg:SI 110) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
-  "TARGET_ALTIVEC"
-  "vpkswus %0,%1,%2"
+(define_insn "altivec_vpks<VI_char>ss"
+  [(set (match_operand:<VP_small> 0 "register_operand" "=v")
+	(unspec:<VP_small> [(match_operand:VP 1 "register_operand" "v")
+			    (match_operand:VP 2 "register_operand" "v")]
+			   UNSPEC_VPACK_SIGN_SIGN_SAT))]
+  "<VI_unit>"
+  "vpks<VI_char>ss %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vpks<VI_char>us"
+  [(set (match_operand:<VP_small> 0 "register_operand" "=v")
+	(unspec:<VP_small> [(match_operand:VP 1 "register_operand" "v")
+			    (match_operand:VP 2 "register_operand" "v")]
+			   UNSPEC_VPACK_SIGN_UNS_SAT))]
+  "<VI_unit>"
+  "vpks<VI_char>us %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vpku<VI_char>us"
+  [(set (match_operand:<VP_small> 0 "register_operand" "=v")
+	(unspec:<VP_small> [(match_operand:VP 1 "register_operand" "v")
+			    (match_operand:VP 2 "register_operand" "v")]
+			   UNSPEC_VPACK_UNS_UNS_SAT))]
+  "<VI_unit>"
+  "vpku<VI_char>us %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vpku<VI_char>um"
+  [(set (match_operand:<VP_small> 0 "register_operand" "=v")
+	(unspec:<VP_small> [(match_operand:VP 1 "register_operand" "v")
+			    (match_operand:VP 2 "register_operand" "v")]
+			   UNSPEC_VPACK_UNS_UNS_MOD))]
+  "<VI_unit>"
+  "vpku<VI_char>um %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
 (define_insn "*altivec_vrl<VI_char>"
-  [(set (match_operand:VI 0 "register_operand" "=v")
-        (rotate:VI (match_operand:VI 1 "register_operand" "v")
-		   (match_operand:VI 2 "register_operand" "v")))]
-  "TARGET_ALTIVEC"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (rotate:VI2 (match_operand:VI2 1 "register_operand" "v")
+		    (match_operand:VI2 2 "register_operand" "v")))]
+  "<VI_unit>"
   "vrl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
@@ -1172,26 +1136,26 @@ (define_insn "altivec_vslo"
   [(set_attr "type" "vecperm")])
 
 (define_insn "*altivec_vsl<VI_char>"
-  [(set (match_operand:VI 0 "register_operand" "=v")
-        (ashift:VI (match_operand:VI 1 "register_operand" "v")
-		   (match_operand:VI 2 "register_operand" "v")))]
-  "TARGET_ALTIVEC"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (ashift:VI2 (match_operand:VI2 1 "register_operand" "v")
+		    (match_operand:VI2 2 "register_operand" "v")))]
+  "<VI_unit>"
   "vsl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
 (define_insn "*altivec_vsr<VI_char>"
-  [(set (match_operand:VI 0 "register_operand" "=v")
-        (lshiftrt:VI (match_operand:VI 1 "register_operand" "v")
-		     (match_operand:VI 2 "register_operand" "v")))]
-  "TARGET_ALTIVEC"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (lshiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
+		      (match_operand:VI2 2 "register_operand" "v")))]
+  "<VI_unit>"
   "vsr<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
 (define_insn "*altivec_vsra<VI_char>"
-  [(set (match_operand:VI 0 "register_operand" "=v")
-        (ashiftrt:VI (match_operand:VI 1 "register_operand" "v")
-		     (match_operand:VI 2 "register_operand" "v")))]
-  "TARGET_ALTIVEC"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (ashiftrt:VI2 (match_operand:VI2 1 "register_operand" "v")
+		      (match_operand:VI2 2 "register_operand" "v")))]
+  "<VI_unit>"
   "vsra<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
@@ -1476,12 +1440,20 @@ (define_insn "altivec_vsldoi_<mode>"
   "vsldoi %0,%1,%2,%3"
   [(set_attr "type" "vecperm")])
 
-(define_insn "altivec_vupkhsb"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-	(unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
-		     UNSPEC_VUPKHSB))]
-  "TARGET_ALTIVEC"
-  "vupkhsb %0,%1"
+(define_insn "altivec_vupkhs<VU_char>"
+  [(set (match_operand:VP 0 "register_operand" "=v")
+	(unspec:VP [(match_operand:<VP_small> 1 "register_operand" "v")]
+		     UNSPEC_VUNPACK_HI_SIGN))]
+  "<VI_unit>"
+  "vupkhs<VU_char> %0,%1"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vupkls<VU_char>"
+  [(set (match_operand:VP 0 "register_operand" "=v")
+	(unspec:VP [(match_operand:<VP_small> 1 "register_operand" "v")]
+		     UNSPEC_VUNPACK_LO_SIGN))]
+  "<VI_unit>"
+  "vupkls<VU_char> %0,%1"
   [(set_attr "type" "vecperm")])
 
 (define_insn "altivec_vupkhpx"
@@ -1492,22 +1464,6 @@ (define_insn "altivec_vupkhpx"
   "vupkhpx %0,%1"
   [(set_attr "type" "vecperm")])
 
-(define_insn "altivec_vupkhsh"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-	(unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
-		     UNSPEC_VUPKHSH))]
-  "TARGET_ALTIVEC"
-  "vupkhsh %0,%1"
-  [(set_attr "type" "vecperm")])
-
-(define_insn "altivec_vupklsb"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-	(unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
-		     UNSPEC_VUPKLSB))]
-  "TARGET_ALTIVEC"
-  "vupklsb %0,%1"
-  [(set_attr "type" "vecperm")])
-
 (define_insn "altivec_vupklpx"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 	(unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
@@ -1516,49 +1472,41 @@ (define_insn "altivec_vupklpx"
   "vupklpx %0,%1"
   [(set_attr "type" "vecperm")])
 
-(define_insn "altivec_vupklsh"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-	(unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
-		     UNSPEC_VUPKLSH))]
-  "TARGET_ALTIVEC"
-  "vupklsh %0,%1"
-  [(set_attr "type" "vecperm")])
-
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "*altivec_vcmpequ<VI_char>_p"
   [(set (reg:CC 74)
-	(unspec:CC [(eq:CC (match_operand:VI 1 "register_operand" "v")
-			   (match_operand:VI 2 "register_operand" "v"))]
+	(unspec:CC [(eq:CC (match_operand:VI2 1 "register_operand" "v")
+			   (match_operand:VI2 2 "register_operand" "v"))]
 		   UNSPEC_PREDICATE))
-   (set (match_operand:VI 0 "register_operand" "=v")
-	(eq:VI (match_dup 1)
-	       (match_dup 2)))]
-  "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
+   (set (match_operand:VI2 0 "register_operand" "=v")
+	(eq:VI2 (match_dup 1)
+		(match_dup 2)))]
+  "<VI_unit>"
   "vcmpequ<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmp")])
 
 (define_insn "*altivec_vcmpgts<VI_char>_p"
   [(set (reg:CC 74)
-	(unspec:CC [(gt:CC (match_operand:VI 1 "register_operand" "v")
-			   (match_operand:VI 2 "register_operand" "v"))]
+	(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
+			   (match_operand:VI2 2 "register_operand" "v"))]
 		   UNSPEC_PREDICATE))
-   (set (match_operand:VI 0 "register_operand" "=v")
-	(gt:VI (match_dup 1)
-	       (match_dup 2)))]
-  "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
+   (set (match_operand:VI2 0 "register_operand" "=v")
+	(gt:VI2 (match_dup 1)
+		(match_dup 2)))]
+  "<VI_unit>"
   "vcmpgts<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmp")])
 
 (define_insn "*altivec_vcmpgtu<VI_char>_p"
   [(set (reg:CC 74)
-	(unspec:CC [(gtu:CC (match_operand:VI 1 "register_operand" "v")
-			    (match_operand:VI 2 "register_operand" "v"))]
+	(unspec:CC [(gtu:CC (match_operand:VI2 1 "register_operand" "v")
+			    (match_operand:VI2 2 "register_operand" "v"))]
 		   UNSPEC_PREDICATE))
-   (set (match_operand:VI 0 "register_operand" "=v")
-	(gtu:VI (match_dup 1)
-		(match_dup 2)))]
-  "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
+   (set (match_operand:VI2 0 "register_operand" "=v")
+	(gtu:VI2 (match_dup 1)
+		 (match_dup 2)))]
+  "<VI_unit>"
   "vcmpgtu<VI_char>. %0,%1,%2"
   [(set_attr "type" "veccmp")])
 
@@ -1779,20 +1727,28 @@ (define_insn "*altivec_stvesfx"
   [(set_attr "type" "vecstore")])
 
 ;; Generate
-;;    vspltis? SCRATCH0,0
+;;    xxlxor/vxor SCRATCH1,SCRATCH1,SCRATCH1
 ;;    vsubu?m SCRATCH2,SCRATCH1,%1
 ;;    vmaxs? %0,%1,SCRATCH2"
 (define_expand "abs<mode>2"
-  [(set (match_dup 2) (vec_duplicate:VI (const_int 0)))
-   (set (match_dup 3)
-        (minus:VI (match_dup 2)
-                  (match_operand:VI 1 "register_operand" "v")))
-   (set (match_operand:VI 0 "register_operand" "=v")
-        (smax:VI (match_dup 1) (match_dup 3)))]
-  "TARGET_ALTIVEC"
-{
-  operands[2] = gen_reg_rtx (GET_MODE (operands[0]));
-  operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4)
+        (minus:VI2 (match_dup 2)
+		   (match_operand:VI2 1 "register_operand" "v")))
+   (set (match_operand:VI2 0 "register_operand" "=v")
+        (smax:VI2 (match_dup 1) (match_dup 4)))]
+  "<VI_unit>"
+{
+  int i, n_elt = GET_MODE_NUNITS (<MODE>mode);
+  rtvec v = rtvec_alloc (n_elt);
+
+  /* Create an all 0 constant.  */
+  for (i = 0; i < n_elt; ++i)
+    RTVEC_ELT (v, i) = const0_rtx;
+
+  operands[2] = gen_reg_rtx (<MODE>mode);
+  operands[3] = gen_rtx_CONST_VECTOR (<MODE>mode, v);
+  operands[4] = gen_reg_rtx (<MODE>mode);
 })
 
 ;; Generate
@@ -1950,49 +1906,19 @@ (define_expand "widen_ssumv8hi3"
   DONE;
 }")
 
-(define_expand "vec_unpacks_hi_v16qi"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
-                     UNSPEC_VUPKHSB))]
-  "TARGET_ALTIVEC"
-  "
-{
-  emit_insn (gen_altivec_vupkhsb (operands[0], operands[1]));
-  DONE;
-}")
-
-(define_expand "vec_unpacks_hi_v8hi"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
-                     UNSPEC_VUPKHSH))]
-  "TARGET_ALTIVEC"
-  "
-{
-  emit_insn (gen_altivec_vupkhsh (operands[0], operands[1]));
-  DONE;
-}")
-
-(define_expand "vec_unpacks_lo_v16qi"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
-                     UNSPEC_VUPKLSB))]
-  "TARGET_ALTIVEC"
-  "
-{
-  emit_insn (gen_altivec_vupklsb (operands[0], operands[1]));
-  DONE;
-}")
+(define_expand "vec_unpacks_hi_<VP_small_lc>"
+  [(set (match_operand:VP 0 "register_operand" "=v")
+        (unspec:VP [(match_operand:<VP_small> 1 "register_operand" "v")]
+		   UNSPEC_VUNPACK_HI_SIGN))]
+  "<VI_unit>"
+  "")
 
-(define_expand "vec_unpacks_lo_v8hi"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
-                     UNSPEC_VUPKLSH))]
-  "TARGET_ALTIVEC"
-  "
-{
-  emit_insn (gen_altivec_vupklsh (operands[0], operands[1]));
-  DONE;
-}")
+(define_expand "vec_unpacks_lo_<VP_small_lc>"
+  [(set (match_operand:VP 0 "register_operand" "=v")
+        (unspec:VP [(match_operand:<VP_small> 1 "register_operand" "v")]
+		   UNSPEC_VUNPACK_LO_SIGN))]
+  "<VI_unit>"
+  "")
 
 (define_insn "vperm_v8hiv4si"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
@@ -2291,29 +2217,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
   DONE;
 }")
 
-(define_expand "vec_pack_trunc_v8hi"
-  [(set (match_operand:V16QI 0 "register_operand" "=v")
-        (unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v")
-                       (match_operand:V8HI 2 "register_operand" "v")]
-                      UNSPEC_VPKUHUM))]
-  "TARGET_ALTIVEC"
-  "
-{
-  emit_insn (gen_altivec_vpkuhum (operands[0], operands[1], operands[2]));
-  DONE;
-}")
-                                                                                
-(define_expand "vec_pack_trunc_v4si"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (unspec:V8HI [(match_operand:V4SI 1 "register_operand" "v")
-                      (match_operand:V4SI 2 "register_operand" "v")]
-                     UNSPEC_VPKUWUM))]
-  "TARGET_ALTIVEC"
-  "
-{
-  emit_insn (gen_altivec_vpkuwum (operands[0], operands[1], operands[2]));
-  DONE;
-}")
+(define_expand "vec_pack_trunc_<mode>"
+  [(set (match_operand:<VP_small> 0 "register_operand" "=v")
+        (unspec:<VP_small> [(match_operand:VP 1 "register_operand" "v")
+			    (match_operand:VP 2 "register_operand" "v")]
+                      UNSPEC_VPACK_UNS_UNS_MOD))]
+  "<VI_unit>"
+  "")
 
 (define_expand "altivec_negv4sf2"
   [(use (match_operand:V4SF 0 "register_operand" ""))
Index: gcc/config/rs6000/altivec.h
===================================================================
--- gcc/config/rs6000/altivec.h	(revision 199037)
+++ gcc/config/rs6000/altivec.h	(working copy)
@@ -321,6 +321,26 @@
 #define vec_vsx_st __builtin_vec_vsx_st
 #endif
 
+#ifdef _ARCH_PWR8
+/* Vector additions added in Power8/ISA 2.07.  */
+#define vec_vaddudm __builtin_vec_vaddudm
+#define vec_vmaxsd __builtin_vec_vmaxsd
+#define vec_vmaxud __builtin_vec_vmaxud
+#define vec_vminsd __builtin_vec_vminsd
+#define vec_vminud __builtin_vec_vminud
+#define vec_vpksdss __builtin_vec_vpksdss
+#define vec_vpksdus __builtin_vec_vpksdus
+#define vec_vpkudum __builtin_vec_vpkudum
+#define vec_vpkudus __builtin_vec_vpkudus
+#define vec_vrld __builtin_vec_vrld
+#define vec_vsld __builtin_vec_vsld
+#define vec_vsrad __builtin_vec_vsrad
+#define vec_vsrd __builtin_vec_vsrd
+#define vec_vsubudm __builtin_vec_vsubudm
+#define vec_vupkhsw __builtin_vec_vupkhsw
+#define vec_vupklsw __builtin_vec_vupklsw
+#endif
+
 /* Predicates.
    For C++, we use templates in order to allow non-parenthesized arguments.
    For C, instead, we use macros since non-parenthesized arguments were

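As a quick usage sketch (not part of the patch): with the defines above and a
compiler built from this series at -mcpu=power8, the new V2DI intrinsics can
be used from C as below.  The function name is invented for illustration.

#include <altivec.h>

/* Element-wise 64-bit add via vaddudm, then signed 64-bit maximum via
   vmaxsd against the first input.  */
vector long long
add_then_max (vector long long a, vector long long b)
{
  vector long long sum = vec_vaddudm (a, b);
  return vec_vmaxsd (sum, a);
}
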
^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4, new power8 builtins
  2013-05-20 20:41 [PATCH, rs6000] power8 patches Michael Meissner
                   ` (3 preceding siblings ...)
  2013-05-21 15:51 ` [PATCH, rs6000] power8 patches, patch #3, add V2DI vector support Michael Meissner
@ 2013-05-21 23:47 ` Michael Meissner
  2013-05-25  4:03   ` David Edelsohn
  2013-06-04 18:49   ` [PATCH, rs6000] power8 patches, patch #4 (revised), " Michael Meissner
  2013-05-21 23:49 ` [PATCH, rs6000] power8 patches, patch #5, new vector tests Michael Meissner
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 52+ messages in thread
From: Michael Meissner @ 2013-05-21 23:47 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, pthaugen, bergner

[-- Attachment #1: Type: text/plain, Size: 5420 bytes --]

This patch adds new builtins for power8.

Code generation support for the new VSX logical instructions (xxleqv, xxlnand,
and xxlorc) is added.

In reworking the patch for posting in smaller chunks, I discovered a bug in
the int_reg_operand predicate added in a previous patch, and I have fixed it
here.

In addition, it reworks the VSX logical operations so that code can still be
generated when a GPR is selected instead of a VSX register, which can happen
once a future patch enables the quad memory instructions.
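
A rough sketch of how the new logical and count builtins look from C (not
part of the patch; the function name is invented, and -mcpu=power8 plus the
overloads added below are assumed):

#include <altivec.h>

vector unsigned int
logical_demo (vector unsigned int a, vector unsigned int b)
{
  vector unsigned int e = vec_eqv (a, b);     /* xxleqv:  ~(a ^ b).  */
  vector unsigned int n = vec_nand (a, b);    /* xxlnand: ~(a & b).  */
  vector unsigned int o = vec_orc (a, b);     /* xxlorc:  a | ~b.    */
  return vec_vpopcnt (e ^ n ^ o);             /* vpopcntw, per word.  */
}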

The next patch will provide the tests for this patch (#4) and the previous
patch (#3).

As before, it bootstraps and has no regressions.  Is it OK to check in once
the previous 3 patches have been applied?

2013-05-21  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
	Document new power8 builtins.

	* config/rs6000/vector.md (gpr move splitter): Do not split direct
	moves or quad word load/stores.
	(and<mode>3): Add a clobber/scratch of a condition code register,
	so TImode logical operations can be done either in VSX registers
	or GPRs.
	(eqv<mode>3): Add expanders for power8 xxleqv, xxlnand, xxlorc,
	vclz*, and vpopcnt* vector instructions.

	* config/rs6000/rs6000-builtin.def (xscvspdpn): Add new power8
	builtin functions.
	(xscvdpspn): Likewise.
	(vclzb): Likewise.
	(vclzh): Likewise.
	(vclzw): Likewise.
	(vclzd): Likewise.
	(vpopcntb): Likewise.
	(vpopcnth): Likewise.
	(vpopcntw): Likewise.
	(vpopcntd): Likewise.
	(vgbbd): Likewise.
	(vmrgew): Likewise.
	(vmrgow): Likewise.
	(eqv_v16qi3): Likewise.
	(eqv_v8hi3): Likewise.
	(eqv_v4si3): Likewise.
	(eqv_v2di3): Likewise.
	(eqv_v4sf3): Likewise.
	(eqv_v2df3): Likewise.
	(nand_v16qi3): Likewise.
	(nand_v8hi3): Likewise.
	(nand_v4si3): Likewise.
	(nand_v2di3): Likewise.
	(nand_v4sf3): Likewise.
	(nand_v2df3): Likewise.
	(orc_v16qi3): Likewise.
	(orc_v8hi3): Likewise.
	(orc_v4si3): Likewise.
	(orc_v2di3): Likewise.
	(orc_v4sf3): Likewise.
	(orc_v2df3): Likewise.
	(vclz): Likewise.
	(vclzb): Likewise.
	(vclzh): Likewise.
	(vclzw): Likewise.
	(vclzd): Likewise.
	(vpopcnt): Likewise.
	(vpopcntb): Likewise.
	(vpopcnth): Likewise.
	(vpopcntw): Likewise.
	(vpopcntd): Likewise.
	(vgbbd): Likewise.
	(eqv): Likewise.
	(nand): Likewise.
	(orc): Likewise.
	(vmrgew): Likewise.
	(vmrgow): Likewise.

	* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
	support for new power8 builtins.

	* config/rs6000/rs6000.c (rs6000_option_override_internal): Only
	allow power8 quad mode in 64-bit.  Turn off splitting wide types
	if we have quad mode.
	(rs6000_builtin_vectorized_function): Vectorize count leading
	zeros, population count builtins.
	(rs6000_expand_vector_init): On power8 use xscvdpspn to form V4SF
	vectors instead of xscvdpsp to avoid IEEE related traps.
	(builtin_function_type): Add vgbbd builtin function which takes an
	unsigned argument.
	(altivec_expand_vec_perm_const): Add support for new power8 merge
	instructions.

	* config/rs6000/vsx.md (VSX_M2): New iterator that includes
	TImode.
	(UNSPEC_VSX_CVSPDPN): Support for power8 xscvdpspn and xscvspdpn
	instructions.
	(UNSPEC_VSX_CVDPSPN): Likewise.
	(vsx_xscvdpspn): Likewise.
	(vsx_xscvspdpn): Likewise.
	(vsx_xscvdpspn_scalar): Likewise.
	(vsx_xscvspdpn_directmove): Likewise.
	(vsx_and<mode>3): Add support to do logical operations on TImode
	as well as VSX vector types.  Allow logical operations to be done
	in either VSX registers or in general purpose registers if we
	support quad mode in GPRs.  Add splitters if GPRs were used.  For
	and, add clobber of CCmode to allow use of ANDI on GPRs.
	(vsx_or_gpr_and<mode>3): Likewise.
	(vsx_ior<mode>3): Likewise.
	(vsx_or_gpr_ior<mode>3): Likewise.
	(vsx_xor<mode>3): Likewise.
	(vsx_or_gpr_xor<mode>3): Likewise.
	(vsx_one_cmpl<mode>2): Likewise.
	(vsx_or_gpr_one_cmpl<mode>2): Likewise.
	(vsx_nor<mode>3): Likewise.
	(vsx_or_gpr_nor<mode>3): Likewise.
	(vsx_andc<mode>3): Likewise.
	(vsx_or_gpr_andc<mode>3): Likewise.
	(vsx_eqv<mode>3): Add support for power8 xxleqv, xxlnand, and
	xxlorc instructions.
	(vsx_or_gpr_eqv<mode>3): Likewise.
	(vsx_nand<mode>3): Likewise.
	(vsx_or_gpr_nand<mode>3): Likewise.
	(vsx_orc<mode>3): Likewise.
	(vsx_or_gpr_orc<mode>3): Likewise.

	* config/rs6000/altivec.md (p8_vmrgew): Add power8 vmrgew and
	vmrgow instructions.
	(p8_vmrgow): Likewise.
	(altivec_and<mode>3): Add clobber of CCmode to allow AND using
	GPRs to be split under VSX.
	(p8v_clz<mode>2): Add power8 count leading zero support.
	(p8v_popcount<mode>2): Add power8 population count support.
	(p8v_vgbbd): Add power8 gather bits by bytes by doubleword
	support.

	* config/rs6000/altivec.h (vec_eqv): Add defines to export power8
	builtin functions.
	(vec_nand): Likewise.
	(vec_vclz): Likewise.
	(vec_vclzb): Likewise.
	(vec_vclzd): Likewise.
	(vec_vclzh): Likewise.
	(vec_vclzw): Likewise.
	(vec_vgbbd): Likewise.
	(vec_vmrgew): Likewise.
	(vec_vmrgow): Likewise.
	(vec_vpopcnt): Likewise.
	(vec_vpopcntb): Likewise.
	(vec_vpopcntd): Likewise.
	(vec_vpopcnth): Likewise.
	(vec_vpopcntw): Likewise.

	* config/rs6000/predicates.md (int_reg_operand): Rework tests so
	that only the GPRs are recognized.

	* config/rs6000/rs6000.h (VLOGICAL_REGNO_P): Only allow logical
	operations in GPRs if we are supporting quad memory mode.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.official-04b --]
[-- Type: text/plain, Size: 69480 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 199149)
+++ gcc/doc/extend.texi	(working copy)
@@ -13964,6 +13964,38 @@ int vec_any_le (vector long long, vector
 int vec_any_lt (vector long long, vector long long);
 int vec_any_ne (vector long long, vector long long);
 
+vector long long vec_eqv (vector long long, vector long long);
+vector long long vec_eqv (vector bool long long, vector long long);
+vector long long vec_eqv (vector long long, vector bool long long);
+vector unsigned long long vec_eqv (vector unsigned long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_eqv (vector bool long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_eqv (vector unsigned long long,
+                                   vector bool long long);
+vector int vec_eqv (vector int, vector int);
+vector int vec_eqv (vector bool int, vector int);
+vector int vec_eqv (vector int, vector bool int);
+vector unsigned int vec_eqv (vector unsigned int, vector unsigned int);
+vector unsigned int vec_eqv (vector bool int,
+                             vector unsigned int);
+vector unsigned int vec_eqv (vector unsigned int,
+                             vector bool int);
+vector short vec_eqv (vector short, vector short);
+vector short vec_eqv (vector bool short, vector short);
+vector short vec_eqv (vector short, vector bool short);
+vector unsigned short vec_eqv (vector unsigned short, vector unsigned short);
+vector unsigned short vec_eqv (vector bool short,
+                               vector unsigned short);
+vector unsigned short vec_eqv (vector unsigned short,
+                               vector bool short);
+vector signed char vec_eqv (vector signed char, vector signed char);
+vector signed char vec_eqv (vector bool char, vector signed char);
+vector signed char vec_eqv (vector signed char, vector bool char);
+vector unsigned char vec_eqv (vector unsigned char, vector unsigned char);
+vector unsigned char vec_eqv (vector bool char, vector unsigned char);
+vector unsigned char vec_eqv (vector unsigned char, vector bool char);
+
 vector long long vec_max (vector long long, vector long long);
 vector unsigned long long vec_max (vector unsigned long long,
                                    vector unsigned long long);
@@ -13972,6 +14004,70 @@ vector long long vec_min (vector long lo
 vector unsigned long long vec_min (vector unsigned long long,
                                    vector unsigned long long);
 
+vector long long vec_nand (vector long long, vector long long);
+vector long long vec_nand (vector bool long long, vector long long);
+vector long long vec_nand (vector long long, vector bool long long);
+vector unsigned long long vec_nand (vector unsigned long long,
+                                    vector unsigned long long);
+vector unsigned long long vec_nand (vector bool long long,
+                                    vector unsigned long long);
+vector unsigned long long vec_nand (vector unsigned long long,
+                                    vector bool long long);
+vector int vec_nand (vector int, vector int);
+vector int vec_nand (vector bool int, vector int);
+vector int vec_nand (vector int, vector bool int);
+vector unsigned int vec_nand (vector unsigned int, vector unsigned int);
+vector unsigned int vec_nand (vector bool int,
+                              vector unsigned int);
+vector unsigned int vec_nand (vector unsigned int,
+                              vector bool int);
+vector short vec_nand (vector short, vector short);
+vector short vec_nand (vector bool short, vector short);
+vector short vec_nand (vector short, vector bool short);
+vector unsigned short vec_nand (vector unsigned short, vector unsigned short);
+vector unsigned short vec_nand (vector bool short,
+                                vector unsigned short);
+vector unsigned short vec_nand (vector unsigned short,
+                                vector bool short);
+vector signed char vec_nand (vector signed char, vector signed char);
+vector signed char vec_nand (vector bool char, vector signed char);
+vector signed char vec_nand (vector signed char, vector bool char);
+vector unsigned char vec_nand (vector unsigned char, vector unsigned char);
+vector unsigned char vec_nand (vector bool char, vector unsigned char);
+vector unsigned char vec_nand (vector unsigned char, vector bool char);
+
+vector long long vec_orc (vector long long, vector long long);
+vector long long vec_orc (vector bool long long, vector long long);
+vector long long vec_orc (vector long long, vector bool long long);
+vector unsigned long long vec_orc (vector unsigned long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_orc (vector bool long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_orc (vector unsigned long long,
+                                   vector bool long long);
+vector int vec_orc (vector int, vector int);
+vector int vec_orc (vector bool int, vector int);
+vector int vec_orc (vector int, vector bool int);
+vector unsigned int vec_orc (vector unsigned int, vector unsigned int);
+vector unsigned int vec_orc (vector bool int,
+                             vector unsigned int);
+vector unsigned int vec_orc (vector unsigned int,
+                             vector bool int);
+vector short vec_orc (vector short, vector short);
+vector short vec_orc (vector bool short, vector short);
+vector short vec_orc (vector short, vector bool short);
+vector unsigned short vec_orc (vector unsigned short, vector unsigned short);
+vector unsigned short vec_orc (vector bool short,
+                               vector unsigned short);
+vector unsigned short vec_orc (vector unsigned short,
+                               vector bool short);
+vector signed char vec_orc (vector signed char, vector signed char);
+vector signed char vec_orc (vector bool char, vector signed char);
+vector signed char vec_orc (vector signed char, vector bool char);
+vector unsigned char vec_orc (vector unsigned char, vector unsigned char);
+vector unsigned char vec_orc (vector bool char, vector unsigned char);
+vector unsigned char vec_orc (vector unsigned char, vector bool char);
+
 vector int vec_pack (vector long long, vector long long);
 vector unsigned int vec_pack (vector unsigned long long,
                               vector unsigned long long);
@@ -14020,6 +14116,27 @@ vector unsigned long long vec_vaddudm (v
 vector unsigned long long vec_vaddudm (vector unsigned long long,
                                        vector bool long long);
 
+vector long long vec_vclz (vector long long);
+vector unsigned long long vec_vclz (vector unsigned long long);
+vector int vec_vclz (vector int);
+vector unsigned int vec_vclz (vector unsigned int);
+vector short vec_vclz (vector short);
+vector unsigned short vec_vclz (vector unsigned short);
+vector signed char vec_vclz (vector signed char);
+vector unsigned char vec_vclz (vector unsigned char);
+
+vector signed char vec_vclzb (vector signed char);
+vector unsigned char vec_vclzb (vector unsigned char);
+
+vector long long vec_vclzd (vector long long);
+vector unsigned long long vec_vclzd (vector unsigned long long);
+
+vector short vec_vclzh (vector short);
+vector unsigned short vec_vclzh (vector unsigned short);
+
+vector int vec_vclzw (vector int);
+vector unsigned int vec_vclzw (vector unsigned int);
+
 vector long long vec_vmaxsd (vector long long, vector long long);
 
 vector unsigned long long vec_vmaxud (vector unsigned long long,
@@ -14041,6 +14158,27 @@ vector unsigned int vec_vpkudum (vector 
                                  vector unsigned long long);
 vector bool int vec_vpkudum (vector bool long long, vector bool long long);
 
+vector long long vec_vpopcnt (vector long long);
+vector unsigned long long vec_vpopcnt (vector unsigned long long);
+vector int vec_vpopcnt (vector int);
+vector unsigned int vec_vpopcnt (vector unsigned int);
+vector short vec_vpopcnt (vector short);
+vector unsigned short vec_vpopcnt (vector unsigned short);
+vector signed char vec_vpopcnt (vector signed char);
+vector unsigned char vec_vpopcnt (vector unsigned char);
+
+vector signed char vec_vpopcntb (vector signed char);
+vector unsigned char vec_vpopcntb (vector unsigned char);
+
+vector long long vec_vpopcntd (vector long long);
+vector unsigned long long vec_vpopcntd (vector unsigned long long);
+
+vector short vec_vpopcnth (vector short);
+vector unsigned short vec_vpopcnth (vector unsigned short);
+
+vector int vec_vpopcntw (vector int);
+vector unsigned int vec_vpopcntw (vector unsigned int);
+
 vector long long vec_vrld (vector long long, vector unsigned long long);
 vector unsigned long long vec_vrld (vector unsigned long long,
                                     vector unsigned long long);
Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 199149)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -730,9 +730,10 @@ (define_expand "ior<mode>3"
   "")
 
 (define_expand "and<mode>3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (and:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
-		   (match_operand:VEC_L 2 "vlogical_operand" "")))]
+  [(parallel [(set (match_operand:VEC_L 0 "vlogical_operand" "")
+		   (and:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
+			      (match_operand:VEC_L 2 "vlogical_operand" "")))
+	      (clobber (match_scratch:CC 3 ""))])]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
    && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "")
@@ -760,6 +761,47 @@ (define_expand "andc<mode>3"
    && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "")
 
+;; Power8 vector logical instructions.
+(define_expand "eqv<mode>3"
+  [(set (match_operand:VEC_L 0 "register_operand" "")
+	(not:VEC_L
+	 (xor:VEC_L (match_operand:VEC_L 1 "register_operand" "")
+		    (match_operand:VEC_L 2 "register_operand" ""))))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)")
+
+;; Rewrite nand into canonical form
+(define_expand "nand<mode>3"
+  [(set (match_operand:VEC_L 0 "register_operand" "")
+	(ior:VEC_L
+	 (not:VEC_L (match_operand:VEC_L 1 "register_operand" ""))
+	 (not:VEC_L (match_operand:VEC_L 2 "register_operand" ""))))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)")
+
+;; The canonical form is to have the negated element first, so we need to
+;; reverse the arguments.
+(define_expand "orc<mode>3"
+  [(set (match_operand:VEC_L 0 "register_operand" "")
+	(ior:VEC_L
+	 (not:VEC_L (match_operand:VEC_L 1 "register_operand" ""))
+	 (match_operand:VEC_L 2 "register_operand" "")))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)")
+
+;; Vector count leading zeros
+(define_expand "clz<mode>2"
+  [(set (match_operand:VEC_I 0 "register_operand" "")
+	(clz:VEC_I (match_operand:VEC_I 1 "register_operand" "")))]
+  "TARGET_P8_VECTOR")
+
+;; Vector population count
+(define_expand "popcount<mode>2"
+  [(set (match_operand:VEC_I 0 "register_operand" "")
+        (popcount:VEC_I (match_operand:VEC_I 1 "register_operand" "")))]
+  "TARGET_P8_VECTOR")
+
+\f
 ;; Same size conversions
 (define_expand "float<VEC_int><mode>2"
   [(set (match_operand:VEC_F 0 "vfloat_operand" "")
Index: gcc/config/rs6000/rs6000-builtin.def
===================================================================
--- gcc/config/rs6000/rs6000-builtin.def	(revision 199149)
+++ gcc/config/rs6000/rs6000-builtin.def	(working copy)
@@ -1234,10 +1234,24 @@ BU_VSX_OVERLOAD_2 (XXSPLTW,  "xxspltw")
 BU_VSX_OVERLOAD_X (LD,	     "ld")
 BU_VSX_OVERLOAD_X (ST,	     "st")
 \f
+/* 1 argument instructions added in ISA 2.07 that are classified as VSX
+   instructions.  */
+BU_P8V_VSX_1 (XSCVSPDPN,      "xscvspdpn",	CONST,	vsx_xscvspdpn)
+BU_P8V_VSX_1 (XSCVDPSPN,      "xscvdpspn",	CONST,	vsx_xscvdpspn)
+
 /* 1 argument altivec instructions added in ISA 2.07.  */
 BU_P8V_AV_1 (ABS_V2DI,	      "abs_v2di",	CONST,	absv2di2)
 BU_P8V_AV_1 (VUPKHSW,	      "vupkhsw",	CONST,	altivec_vupkhsw)
 BU_P8V_AV_1 (VUPKLSW,	      "vupklsw",	CONST,	altivec_vupklsw)
+BU_P8V_AV_1 (VCLZB,	      "vclzb",		CONST,  clzv16qi2)
+BU_P8V_AV_1 (VCLZH,	      "vclzh",		CONST,  clzv8hi2)
+BU_P8V_AV_1 (VCLZW,	      "vclzw",		CONST,  clzv4si2)
+BU_P8V_AV_1 (VCLZD,	      "vclzd",		CONST,  clzv2di2)
+BU_P8V_AV_1 (VPOPCNTB,	      "vpopcntb",	CONST,  popcountv16qi2)
+BU_P8V_AV_1 (VPOPCNTH,	      "vpopcnth",	CONST,  popcountv8hi2)
+BU_P8V_AV_1 (VPOPCNTW,	      "vpopcntw",	CONST,  popcountv4si2)
+BU_P8V_AV_1 (VPOPCNTD,	      "vpopcntd",	CONST,  popcountv2di2)
+BU_P8V_AV_1 (VGBBD,	      "vgbbd",		CONST,  p8v_vgbbd)
 
 /* 2 argument altivec instructions added in ISA 2.07.  */
 BU_P8V_AV_2 (VADDUDM,		"vaddudm",	CONST,	addv2di3)
@@ -1245,6 +1259,8 @@ BU_P8V_AV_2 (VMINSD,		"vminsd",	CONST,	s
 BU_P8V_AV_2 (VMAXSD,		"vmaxsd",	CONST,	smaxv2di3)
 BU_P8V_AV_2 (VMINUD,		"vminud",	CONST,	uminv2di3)
 BU_P8V_AV_2 (VMAXUD,		"vmaxud",	CONST,	umaxv2di3)
+BU_P8V_AV_2 (VMRGEW,		"vmrgew",	CONST,	p8_vmrgew)
+BU_P8V_AV_2 (VMRGOW,		"vmrgow",	CONST,	p8_vmrgow)
 BU_P8V_AV_2 (VPKUDUM,		"vpkudum",	CONST,	altivec_vpkudum)
 BU_P8V_AV_2 (VPKSDSS,		"vpksdss",	CONST,	altivec_vpksdss)
 BU_P8V_AV_2 (VPKUDUS,		"vpkudus",	CONST,	altivec_vpkudus)
@@ -1255,6 +1271,29 @@ BU_P8V_AV_2 (VSRD,		"vsrd",		CONST,	vlsh
 BU_P8V_AV_2 (VSRAD,		"vsrad",	CONST,	vashrv2di3)
 BU_P8V_AV_2 (VSUBUDM,		"vsubudm",	CONST,	subv2di3)
 
+/* 2 argument VSX instructions added in ISA 2.07.  For the logical
+   instructions, we define a builtin for each vector type.  */
+BU_P8V_AV_2 (EQV_V16QI,		"eqv_v16qi",	CONST,	eqvv16qi3)
+BU_P8V_AV_2 (EQV_V8HI,		"eqv_v8hi",	CONST,	eqvv8hi3)
+BU_P8V_AV_2 (EQV_V4SI,		"eqv_v4si",	CONST,	eqvv4si3)
+BU_P8V_AV_2 (EQV_V2DI,		"eqv_v2di",	CONST,	eqvv2di3)
+BU_P8V_AV_2 (EQV_V4SF,		"eqv_v4sf",	CONST,	eqvv4sf3)
+BU_P8V_AV_2 (EQV_V2DF,		"eqv_v2df",	CONST,	eqvv2df3)
+
+BU_P8V_AV_2 (NAND_V16QI,	"nand_v16qi",	CONST,	nandv16qi3)
+BU_P8V_AV_2 (NAND_V8HI,		"nand_v8hi",	CONST,	nandv8hi3)
+BU_P8V_AV_2 (NAND_V4SI,		"nand_v4si",	CONST,	nandv4si3)
+BU_P8V_AV_2 (NAND_V2DI,		"nand_v2di",	CONST,	nandv2di3)
+BU_P8V_AV_2 (NAND_V4SF,		"nand_v4sf",	CONST,	nandv4sf3)
+BU_P8V_AV_2 (NAND_V2DF,		"nand_v2df",	CONST,	nandv2df3)
+
+BU_P8V_AV_2 (ORC_V16QI,		"orc_v16qi",	CONST,	orcv16qi3)
+BU_P8V_AV_2 (ORC_V8HI,		"orc_v8hi",	CONST,	orcv8hi3)
+BU_P8V_AV_2 (ORC_V4SI,		"orc_v4si",	CONST,	orcv4si3)
+BU_P8V_AV_2 (ORC_V2DI,		"orc_v2di",	CONST,	orcv2di3)
+BU_P8V_AV_2 (ORC_V4SF,		"orc_v4sf",	CONST,	orcv4sf3)
+BU_P8V_AV_2 (ORC_V2DF,		"orc_v2df",	CONST,	orcv2df3)
+
 /* Vector comparison instructions added in ISA 2.07.  */
 BU_P8V_AV_2 (VCMPEQUD,		"vcmpequd",	CONST,	vector_eqv2di)
 BU_P8V_AV_2 (VCMPGTSD,		"vcmpgtsd",	CONST,	vector_gtv2di)
@@ -1268,13 +1307,29 @@ BU_P8V_AV_P (VCMPGTUD_P,	"vcmpgtud_p",	C
 /* Power8 vector overloaded 1 argument functions.  */
 BU_P8V_OVERLOAD_1 (VUPKHSW,	"vupkhsw")
 BU_P8V_OVERLOAD_1 (VUPKLSW,	"vupklsw")
+BU_P8V_OVERLOAD_1 (VCLZ,	"vclz")
+BU_P8V_OVERLOAD_1 (VCLZB,	"vclzb")
+BU_P8V_OVERLOAD_1 (VCLZH,	"vclzh")
+BU_P8V_OVERLOAD_1 (VCLZW,	"vclzw")
+BU_P8V_OVERLOAD_1 (VCLZD,	"vclzd")
+BU_P8V_OVERLOAD_1 (VPOPCNT,	"vpopcnt")
+BU_P8V_OVERLOAD_1 (VPOPCNTB,	"vpopcntb")
+BU_P8V_OVERLOAD_1 (VPOPCNTH,	"vpopcnth")
+BU_P8V_OVERLOAD_1 (VPOPCNTW,	"vpopcntw")
+BU_P8V_OVERLOAD_1 (VPOPCNTD,	"vpopcntd")
+BU_P8V_OVERLOAD_1 (VGBBD,	"vgbbd")
 
 /* Power8 vector overloaded 2 argument functions.  */
+BU_P8V_OVERLOAD_2 (EQV,		"eqv")
+BU_P8V_OVERLOAD_2 (NAND,	"nand")
+BU_P8V_OVERLOAD_2 (ORC,		"orc")
 BU_P8V_OVERLOAD_2 (VADDUDM,	"vaddudm")
 BU_P8V_OVERLOAD_2 (VMAXSD,	"vmaxsd")
 BU_P8V_OVERLOAD_2 (VMAXUD,	"vmaxud")
 BU_P8V_OVERLOAD_2 (VMINSD,	"vminsd")
 BU_P8V_OVERLOAD_2 (VMINUD,	"vminud")
+BU_P8V_OVERLOAD_2 (VMRGEW,	"vmrgew")
+BU_P8V_OVERLOAD_2 (VMRGOW,	"vmrgow")
 BU_P8V_OVERLOAD_2 (VPKSDSS,	"vpksdss")
 BU_P8V_OVERLOAD_2 (VPKSDUS,	"vpksdus")
 BU_P8V_OVERLOAD_2 (VPKUDUM,	"vpkudum")
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 199149)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -3515,6 +3515,404 @@ const struct altivec_builtin_types altiv
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, VSX_BUILTIN_XVCMPGEDP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DF, RS6000_BTI_V2DF },
 
+  /* Power8 vector overloaded functions.  */
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_bool_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_bool_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
+
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_bool_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_bool_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
+
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_bool_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_bool_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
+
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZB,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZH,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZB, P8V_BUILTIN_VCLZB,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZB, P8V_BUILTIN_VCLZB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZH, P8V_BUILTIN_VCLZH,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZH, P8V_BUILTIN_VCLZH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZW, P8V_BUILTIN_VCLZW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZW, P8V_BUILTIN_VCLZW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZD, P8V_BUILTIN_VCLZD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZD, P8V_BUILTIN_VCLZD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VMINSD, P8V_BUILTIN_VMINSD,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMINSD, P8V_BUILTIN_VMINSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMINSD, P8V_BUILTIN_VMINSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VMAXSD, P8V_BUILTIN_VMAXSD,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMAXSD, P8V_BUILTIN_VMAXSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMAXSD, P8V_BUILTIN_VMAXSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VMINUD, P8V_BUILTIN_VMINUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMINUD, P8V_BUILTIN_VMINUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMINUD, P8V_BUILTIN_VMINUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VMAXUD, P8V_BUILTIN_VMAXUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMAXUD, P8V_BUILTIN_VMAXUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMAXUD, P8V_BUILTIN_VMAXUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+
+  { P8V_BUILTIN_VEC_VMRGOW, P8V_BUILTIN_VMRGOW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_VMRGOW, P8V_BUILTIN_VMRGOW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTB,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTH,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNTB, P8V_BUILTIN_VPOPCNTB,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNTB, P8V_BUILTIN_VPOPCNTB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNTH, P8V_BUILTIN_VPOPCNTH,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNTH, P8V_BUILTIN_VPOPCNTH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNTW, P8V_BUILTIN_VPOPCNTW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNTW, P8V_BUILTIN_VPOPCNTW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNTD, P8V_BUILTIN_VPOPCNTD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNTD, P8V_BUILTIN_VPOPCNTD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPKUDUM, P8V_BUILTIN_VPKUDUM,
+    RS6000_BTI_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VPKUDUM, P8V_BUILTIN_VPKUDUM,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VPKUDUM, P8V_BUILTIN_VPKUDUM,
+    RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VPKSDSS, P8V_BUILTIN_VPKSDSS,
+    RS6000_BTI_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VPKUDUS, P8V_BUILTIN_VPKUDUS,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VPKSDUS, P8V_BUILTIN_VPKSDUS,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VRLD, P8V_BUILTIN_VRLD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VRLD, P8V_BUILTIN_VRLD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VSLD, P8V_BUILTIN_VSLD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSLD, P8V_BUILTIN_VSLD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VSRD, P8V_BUILTIN_VSRD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSRD, P8V_BUILTIN_VSRD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VSRAD, P8V_BUILTIN_VSRAD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSRAD, P8V_BUILTIN_VSRAD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VUPKHSW, P8V_BUILTIN_VUPKHSW,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VUPKHSW, P8V_BUILTIN_VUPKHSW,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VUPKLSW, P8V_BUILTIN_VUPKLSW,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VUPKLSW, P8V_BUILTIN_VUPKLSW,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
+    RS6000_BTI_V16QI, 0, 0, 0 },
+  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
+    RS6000_BTI_unsigned_V16QI, 0, 0, 0 },
+
   /* Crypto builtins.  */
   { CRYPTO_BUILTIN_VPERMXOR, CRYPTO_BUILTIN_VPERMXOR_V16QI,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 199149)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -2859,6 +2859,37 @@ rs6000_option_override_internal (bool gl
 	}
     }
 
+  /* Quad memory only works in 64-bit mode; if the user did -mcpu=power8 -m32,
+     turn off quad memory mode (warning if -mquad-memory was explicit).
+
+     Also, we must not split wide types with quad memory, because it will mess
+     up the quad word atomic operations combined with a logical operation.  The
+     logical operations splitters will handle splitting up the operation if it
+     is done in a GPR instead of a VSX register.  */
+  if (TARGET_QUAD_MEMORY)
+    {
+      if (!TARGET_64BIT)
+	{
+	  if ((rs6000_isa_flags_explicit & OPTION_MASK_QUAD_MEMORY) != 0)
+	    warning (0, "-mquad-memory requires 64-bit mode");
+
+	  rs6000_isa_flags &= ~OPTION_MASK_QUAD_MEMORY;
+	}
+
+      else if (flag_split_wide_types)
+	{
+	  if (global_options_set.x_flag_split_wide_types)
+	    {
+	      warning (0, "-mquad-memory is incompatible with "
+			  "-fsplit-wide-types");
+
+	      rs6000_isa_flags &= ~OPTION_MASK_QUAD_MEMORY;
+	    }
+	  else
+	    flag_split_wide_types = 0;
+	}
+    }
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "before defaults", rs6000_isa_flags);
 
@@ -4082,6 +4113,22 @@ rs6000_builtin_vectorized_function (tree
       enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
       switch (fn)
 	{
+	case BUILT_IN_CLZIMAX:
+	case BUILT_IN_CLZLL:
+	case BUILT_IN_CLZL:
+	case BUILT_IN_CLZ:
+	  if (TARGET_P8_VECTOR && in_mode == out_mode && out_n == in_n)
+	    {
+	      if (out_mode == QImode && out_n == 16)
+		return rs6000_builtin_decls[P8V_BUILTIN_VCLZB];
+	      else if (out_mode == HImode && out_n == 8)
+		return rs6000_builtin_decls[P8V_BUILTIN_VCLZH];
+	      else if (out_mode == SImode && out_n == 4)
+		return rs6000_builtin_decls[P8V_BUILTIN_VCLZW];
+	      else if (out_mode == DImode && out_n == 2)
+		return rs6000_builtin_decls[P8V_BUILTIN_VCLZD];
+	    }
+	  break;
 	case BUILT_IN_COPYSIGN:
 	  if (VECTOR_UNIT_VSX_P (V2DFmode)
 	      && out_mode == DFmode && out_n == 2
@@ -4097,6 +4144,22 @@ rs6000_builtin_vectorized_function (tree
 	  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode))
 	    return rs6000_builtin_decls[ALTIVEC_BUILTIN_COPYSIGN_V4SF];
 	  break;
+	case BUILT_IN_POPCOUNTIMAX:
+	case BUILT_IN_POPCOUNTLL:
+	case BUILT_IN_POPCOUNTL:
+	case BUILT_IN_POPCOUNT:
+	  if (TARGET_P8_VECTOR && in_mode == out_mode && out_n == in_n)
+	    {
+	      if (out_mode == QImode && out_n == 16)
+		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTB];
+	      else if (out_mode == HImode && out_n == 8)
+		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTH];
+	      else if (out_mode == SImode && out_n == 4)
+		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTW];
+	      else if (out_mode == DImode && out_n == 2)
+		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTD];
+	    }
+	  break;
 	case BUILT_IN_SQRT:
 	  if (VECTOR_UNIT_VSX_P (V2DFmode)
 	      && out_mode == DFmode && out_n == 2
@@ -4955,8 +5018,11 @@ rs6000_expand_vector_init (rtx target, r
 	{
 	  rtx freg = gen_reg_rtx (V4SFmode);
 	  rtx sreg = force_reg (SFmode, XVECEXP (vals, 0, 0));
+	  rtx cvt  = ((TARGET_XSCVDPSPN)
+		      ? gen_vsx_xscvdpspn_scalar (freg, sreg)
+		      : gen_vsx_xscvdpsp_scalar (freg, sreg));
 
-	  emit_insn (gen_vsx_xscvdpsp_scalar (freg, sreg));
+	  emit_insn (cvt);
 	  emit_insn (gen_vsx_xxspltw_v4sf (target, freg, const0_rtx));
 	}
       else
@@ -12857,6 +12923,7 @@ builtin_function_type (enum machine_mode
     {
       /* unsigned 1 argument functions.  */
     case CRYPTO_BUILTIN_VSBOX:
+    case P8V_BUILTIN_VGBBD:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       break;
@@ -27197,26 +27264,31 @@ bool
 altivec_expand_vec_perm_const (rtx operands[4])
 {
   struct altivec_perm_insn {
+    HOST_WIDE_INT mask;
     enum insn_code impl;
     unsigned char perm[16];
   };
   static const struct altivec_perm_insn patterns[] = {
-    { CODE_FOR_altivec_vpkuhum,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuhum,
       {  1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } },
-    { CODE_FOR_altivec_vpkuwum,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum,
       {  2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } },
-    { CODE_FOR_altivec_vmrghb,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb,
       {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },
-    { CODE_FOR_altivec_vmrghh,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh,
       {  0,  1, 16, 17,  2,  3, 18, 19,  4,  5, 20, 21,  6,  7, 22, 23 } },
-    { CODE_FOR_altivec_vmrghw,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw,
       {  0,  1,  2,  3, 16, 17, 18, 19,  4,  5,  6,  7, 20, 21, 22, 23 } },
-    { CODE_FOR_altivec_vmrglb,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb,
       {  8, 24,  9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } },
-    { CODE_FOR_altivec_vmrglh,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh,
       {  8,  9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } },
-    { CODE_FOR_altivec_vmrglw,
-      {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } }
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw,
+      {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
+    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew,
+      {  0,  1,  2,  3, 16, 17, 18, 19,  8,  9, 10, 11, 24, 25, 26, 27 } },
+    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgow,
+      {  4,  5,  6,  7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31 } }
   };
 
   unsigned int i, j, elt, which;
@@ -27316,6 +27388,9 @@ altivec_expand_vec_perm_const (rtx opera
     {
       bool swapped;
 
+      if ((patterns[j].mask & rs6000_isa_flags) == 0)
+	continue;
+
       elt = patterns[j].perm[0];
       if (perm[0] == elt)
 	swapped = false;
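
To make the intent of the new rs6000_builtin_vectorized_function entries
concrete, here is a small example (mine, not part of the patch).  Assuming
-mcpu=power8 -O3 -ftree-vectorize on a 64-bit target, loops like these
should now be vectorizable with the new instructions:

/* Example only: scalar clz/popcount over 64-bit elements, which the
   vectorizer can now map to the vclzd/vpopcntd builtins.  */
#define N 1024
unsigned long long a[N], b[N];

void
count_bits (void)
{
  unsigned long i;
  for (i = 0; i < N; i++)
    a[i] = __builtin_popcountll (b[i]);   /* ideally vpopcntd.  */
}

void
count_zeros (void)
{
  unsigned long i;
  for (i = 0; i < N; i++)
    a[i] = __builtin_clzll (b[i]);        /* ideally vclzd.  */
}
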
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 199149)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -40,6 +40,14 @@ (define_mode_iterator VSX_L [V16QI V8HI 
 ;; it to use gprs as well as vsx registers.
 (define_mode_iterator VSX_M [V16QI V8HI V4SI V2DI V4SF V2DF])
 
+(define_mode_iterator VSX_M2 [V16QI
+			      V8HI
+			      V4SI
+			      V2DI
+			      V4SF
+			      V2DF
+			      (TI	"TARGET_VSX_TIMODE")])
+
 ;; Map into the appropriate load/store name based on the type
 (define_mode_attr VSm  [(V16QI "vw4")
 			(V8HI  "vw4")
@@ -191,6 +199,8 @@ (define_c_enum "unspec"
    UNSPEC_VSX_CVDPSXWS
    UNSPEC_VSX_CVDPUXWS
    UNSPEC_VSX_CVSPDP
+   UNSPEC_VSX_CVSPDPN
+   UNSPEC_VSX_CVDPSPN
    UNSPEC_VSX_CVSXWDP
    UNSPEC_VSX_CVUXWDP
    UNSPEC_VSX_CVSXDSP
@@ -1003,6 +1013,40 @@ (define_insn "vsx_xscvspdp_scalar2"
   "xscvspdp %x0,%x1"
   [(set_attr "type" "fp")])
 
+;; Power8 versions using xscvdpspn/xscvspdpn
+(define_insn "vsx_xscvdpspn"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=ws,?wa")
+	(unspec:V4SF [(match_operand:DF 1 "vsx_register_operand" "wd,wa")]
+		     UNSPEC_VSX_CVDPSPN))]
+  "TARGET_XSCVDPSPN"
+  "xscvdpspn %x0,%x1"
+  [(set_attr "type" "fp")])
+
+(define_insn "vsx_xscvspdpn"
+  [(set (match_operand:DF 0 "vsx_register_operand" "=ws,?wa")
+	(unspec:DF [(match_operand:V4SF 1 "vsx_register_operand" "wa,wa")]
+		   UNSPEC_VSX_CVSPDPN))]
+  "TARGET_XSCVSPDPN"
+  "xscvspdpn %x0,%x1"
+  [(set_attr "type" "fp")])
+
+(define_insn "vsx_xscvdpspn_scalar"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
+	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "f")]
+		     UNSPEC_VSX_CVDPSPN))]
+  "TARGET_XSCVDPSPN"
+  "xscvdpspn %x0,%x1"
+  [(set_attr "type" "fp")])
+
+;; Used by direct move of SFmode from gpr to VSX register
+(define_insn "vsx_xscvspdpn_directmove"
+  [(set (match_operand:SF 0 "vsx_register_operand" "=wa")
+	(unspec:SF [(match_operand:SF 1 "vsx_register_operand" "wa")]
+		   UNSPEC_VSX_CVSPDPN))]
+  "TARGET_XSCVSPDPN"
+  "xscvspdpn %x0,%x1"
+  [(set_attr "type" "fp")])
+
 ;; Convert from 64-bit to 32-bit types
 ;; Note, favor the Altivec registers since the usual use of these instructions
 ;; is in vector converts and we need to use the Altivec vperm instruction.
@@ -1088,70 +1132,397 @@ (define_insn "*vsx_float_fix_<mode>2"
    (set_attr "fp_type" "<VSfptype_simple>")])
 
 \f
-;; Logical operations
+;; Logical operations.  If we have quad memory, we also allow these operations
+;; in GPRs so that the quad word atomic load/store operations can be used.
 ;; Do not support TImode logical instructions on 32-bit at present, because the
 ;; compiler will see that we have a TImode and when it wanted DImode, and
 ;; convert the DImode to TImode, store it on the stack, and load it in a VSX
 ;; register.
+
+;; When we are splitting the operations to GPRs, we use three GPR
+;; alternatives: two where the first/second input and the output are in the
+;; same register, and a third where the output specifies an early clobber so
+;; that we don't have to worry about overlapping registers.
+
 (define_insn "*vsx_and<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (and:VSX_L
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")
-	 (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa")
+        (and:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "%wa")
+		   (match_operand:VSX_L 2 "vlogical_operand" "wa")))
+   (clobber (match_scratch:CC 3 "X"))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && !TARGET_QUAD_MEMORY
    && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "xxland %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_or_gpr_and<mode>3"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?wr,?wr,&?wr")
+        (and:VSX_L
+	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,wr,wr")
+	 (match_operand:VSX_L 2 "vlogical_operand" "wa,wr,0,wr")))
+   (clobber (match_scratch:CC 3 "X,X,X,X"))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+  "@
+   xxland %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(parallel [(set (match_dup 4) (and:DI (match_dup 5) (match_dup 6)))
+	      (clobber (match_dup 3))])
+   (parallel [(set (match_dup 7) (and:DI (match_dup 8) (match_dup 9)))
+	      (clobber (match_dup 3))])]
+{
+  operands[4] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[7] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[9] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
 
 (define_insn "*vsx_ior<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (ior:VSX_L (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")
-		   (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa")
+        (ior:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "%wa")
+		   (match_operand:VSX_L 2 "vlogical_operand" "wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && !TARGET_QUAD_MEMORY
    && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "xxlor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_or_gpr_ior<mode>3"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?wr,?wr,&?wr,?wr,&?wr")
+        (ior:VSX_L
+	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,wr,wr,0,wr")
+	 (match_operand:VSX_L 2 "vsx_reg_or_cint_operand" "wa,wr,0,wr,n,n")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+  "@
+   xxlor %x0,%x1,%x2
+   #
+   #
+   #
+   #
+   #"
+  "reload_completed && VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+
+  if (operands[5] == constm1_rtx)
+    emit_move_insn (operands[3], constm1_rtx);
+
+  else if (operands[5] == const0_rtx)
+    {
+      if (!rtx_equal_p (operands[3], operands[4]))
+	emit_move_insn (operands[3], operands[4]);
+    }
+  else
+    emit_insn (gen_iordi3 (operands[3], operands[4], operands[5]));
+
+  if (operands[8] == constm1_rtx)
+    emit_move_insn (operands[6], constm1_rtx);
+
+  else if (operands[8] == const0_rtx)
+    {
+      if (!rtx_equal_p (operands[6], operands[7]))
+	emit_move_insn (operands[6], operands[7]);
+    }
+  else
+    emit_insn (gen_iordi3 (operands[6], operands[7], operands[8]));
+  DONE;
+}
+  [(set_attr "type" "vecsimple,two,two,two,three,three")
+   (set_attr "length" "4,8,8,8,16,16")])
 
 (define_insn "*vsx_xor<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (xor:VSX_L
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")
-	 (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa")
+        (xor:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "%wa")
+		   (match_operand:VSX_L 2 "vlogical_operand" "wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && !TARGET_QUAD_MEMORY
    && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "xxlxor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_or_gpr_xor<mode>3"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?wr,?wr,&?wr,?wr,&?wr")
+        (xor:VSX_L
+	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,wr,wr,0,wr")
+	 (match_operand:VSX_L 2 "vsx_reg_or_cint_operand" "wa,wr,0,wr,n,n")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+  "@
+   xxlxor %x0,%x1,%x2
+   #
+   #
+   #
+   #
+   #"
+  "reload_completed && VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (xor:DI (match_dup 4) (match_dup 5)))
+   (set (match_dup 6) (xor:DI (match_dup 7) (match_dup 8)))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two,three,three")
+   (set_attr "length" "4,8,8,8,16,16")])
 
 (define_insn "*vsx_one_cmpl<mode>2"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (not:VSX_L
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa")
+        (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && !TARGET_QUAD_MEMORY
    && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "xxlnor %x0,%x1,%x1"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_one_cmpl<mode>2"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?wr,&?wr")
+        (not:VSX_L
+	 (match_operand:VSX_L 1 "vlogical_operand" "wa,0,wr")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+  "@
+   xxlnor %x0,%x1,%x1
+   #
+   #"
+  "reload_completed && VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 2) (not:DI (match_dup 3)))
+   (set (match_dup 4) (not:DI (match_dup 5)))]
+{
+  operands[2] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[3] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[5] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two")
+   (set_attr "length" "4,8,8")])
   
 (define_insn "*vsx_nor<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa")
         (not:VSX_L
-	 (ior:VSX_L
-	  (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,?wa")
-	  (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,?wa"))))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
+	 (ior:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "%wa")
+		    (match_operand:VSX_L 2 "vlogical_operand" "wa"))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && !TARGET_QUAD_MEMORY
    && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "xxlnor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_or_gpr_nor<mode>3"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?wr,?wr,&?wr")
+        (not:VSX_L
+	 (ior:VSX_L
+	  (match_operand:VSX_L 1 "vlogical_operand" "wa,0,wr,wr")
+	  (match_operand:VSX_L 2 "vlogical_operand" "wa,wr,0,wr"))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+  "@
+   xxlnor %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (not:DI (ior:DI (match_dup 4) (match_dup 5))))
+   (set (match_dup 6) (not:DI (ior:DI (match_dup 7) (match_dup 8))))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
 
 (define_insn "*vsx_andc<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa")
         (and:VSX_L
 	 (not:VSX_L
-	  (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,?wa"))
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,?wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
+	  (match_operand:VSX_L 2 "vlogical_operand" "wa"))
+	 (match_operand:VSX_L 1 "vlogical_operand" "wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && !TARGET_QUAD_MEMORY
    && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "xxlandc %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_or_gpr_andc<mode>3"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?wr,?wr,?wr")
+        (and:VSX_L
+	 (not:VSX_L
+	  (match_operand:VSX_L 2 "vlogical_operand" "wa,0,wr,wr"))
+	 (match_operand:VSX_L 1 "vlogical_operand" "wa,wr,0,wr")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+  "@
+   xxlandc %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (and:DI (not:DI (match_dup 4)) (match_dup 5)))
+   (set (match_dup 6) (and:DI (not:DI (match_dup 7)) (match_dup 8)))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
+
+;; Power8 vector logical instructions.  We only generate the VSX form of the
+;; instruction (xxl<xxx> vs. v<xxx>).
+(define_insn "*vsx_eqv<mode>3"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa")
+	(not:VSX_L
+	 (xor:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa")
+		    (match_operand:VSX_L 2 "vlogical_operand" "wa"))))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode) && !TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+  "xxleqv %x0,%x1,%x2"
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_or_gpr_eqv<mode>3"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?wr,?wr,?wr")
+	(not:VSX_L
+	 (xor:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,wr,wr")
+		    (match_operand:VSX_L 2 "vlogical_operand" "wa,wr,0,wr"))))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+  "@
+   xxleqv %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (not:DI (xor:DI (match_dup 4) (match_dup 5))))
+   (set (match_dup 6) (not:DI (xor:DI (match_dup 7) (match_dup 8))))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
+
+;; Rewrite nand into canonical form
+(define_insn "*vsx_nand<mode>3"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa")
+	(ior:VSX_L
+	 (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa"))
+	 (not:VSX_L (match_operand:VSX_L 2 "vlogical_operand" "wa"))))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode) && !TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+  "xxlnand %x0,%x1,%x2"
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_or_gpr_nand<mode>3"
+  [(set (match_operand:VSX_L 0 "register_operand" "=wa,?wr,?wr,?wr")
+	(ior:VSX_L
+	 (not:VSX_L (match_operand:VSX_L 1 "register_operand" "wa,0,wr,wr"))
+	 (not:VSX_L (match_operand:VSX_L 2 "register_operand" "wa,wr,0,wr"))))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+  "@
+   xxlnand %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (ior:DI (not:DI (match_dup 4)) (not:DI (match_dup 5))))
+   (set (match_dup 6) (ior:DI (not:DI (match_dup 7)) (not:DI (match_dup 8))))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
+
+;; The canonical form is to have the negated element first, so we need to
+;; reverse the arguments.
+(define_insn "*vsx_orc<mode>3"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa")
+	(ior:VSX_L
+	 (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa"))
+	 (match_operand:VSX_L 2 "vlogical_operand" "wa")))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode) && !TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+  "xxlorc %x0,%x2,%x1"
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_or_gpr_orc<mode>3"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?wr,?wr,?wr")
+	(ior:VSX_L
+	 (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,wr,wr"))
+	 (match_operand:VSX_L 2 "vlogical_operand" "wa,wr,0,wr")))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+  "@
+   xxlorc %x0,%x2,%x1
+   #
+   #
+   #"
+  "reload_completed && VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_QUAD_MEMORY
+   && (<MODE>mode != TImode || TARGET_POWERPC64)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (ior:DI (not:DI (match_dup 4)) (match_dup 5)))
+   (set (match_dup 6) (ior:DI (not:DI (match_dup 7)) (match_dup 8)))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
 
 \f
 ;; Permute operations
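
To illustrate what the new GPR alternatives and splitters are for (example
mine, not part of the patch): with -mcpu=power8 -mquad-memory on a 64-bit
target, a 128-bit logical operation can now stay in GPRs and be split into
two doubleword operations after reload, instead of being forced into a VSX
register:

/* Example only: under the assumptions above, this AND can be done as
   two 64-bit "and" instructions, one per doubleword, in GPRs.  */
__int128
and128 (__int128 a, __int128 b)
{
  return a & b;
}
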
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 199149)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -128,6 +128,7 @@ (define_c_enum "unspec"
    UNSPEC_VUPKLS_V4SF
    UNSPEC_VUPKHU_V4SF
    UNSPEC_VUPKLU_V4SF
+   UNSPEC_VGBBD
 ])
 
 (define_c_enum "unspecv"
@@ -941,6 +942,31 @@ (define_insn "*altivec_vmrglsf"
   "vmrglw %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
+;; Power8 vector merge even/odd
+(define_insn "p8_vmrgew"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+	(vec_select:V4SI
+	  (vec_concat:V8SI
+	    (match_operand:V4SI 1 "register_operand" "v")
+	    (match_operand:V4SI 2 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 4)
+		     (const_int 2) (const_int 6)])))]
+  "TARGET_P8_VECTOR"
+  "vmrgew %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "p8_vmrgow"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+	(vec_select:V4SI
+	  (vec_concat:V8SI
+	    (match_operand:V4SI 1 "register_operand" "v")
+	    (match_operand:V4SI 2 "register_operand" "v"))
+	  (parallel [(const_int 1) (const_int 5)
+		     (const_int 3) (const_int 7)])))]
+  "TARGET_P8_VECTOR"
+  "vmrgow %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "vec_widen_umult_even_v16qi"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
@@ -1017,10 +1043,13 @@ (define_insn "vec_widen_smult_odd_v8hi"
 ;; logical ops.  Have the logical ops follow the memory ops in
 ;; terms of whether to prefer VSX or Altivec
 
+;; For and, add the clobber to be consistent with VSX, which adds splitters for
+;; using the GPR registers.
 (define_insn "*altivec_and<mode>3"
   [(set (match_operand:VM 0 "register_operand" "=v")
         (and:VM (match_operand:VM 1 "register_operand" "v")
-		(match_operand:VM 2 "register_operand" "v")))]
+		(match_operand:VM 2 "register_operand" "v")))
+   (clobber (match_scratch:CC 3 "=X"))]
   "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
   "vand %0,%1,%2"
   [(set_attr "type" "vecsimple")])
@@ -2370,3 +2399,34 @@ (define_expand "vec_unpacku_float_lo_v8h
   emit_insn (gen_altivec_vcfux (operands[0], tmp, const0_rtx));
   DONE;
 }")
+
+\f
+;; Power8 vector instructions encoded as Altivec instructions
+
+;; Vector count leading zeros
+(define_insn "*p8v_clz<mode>2"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+	(clz:VI2 (match_operand:VI2 1 "register_operand" "v")))]
+  "TARGET_P8_VECTOR"
+  "vclz<wd> %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
+
+;; Vector population count
+(define_insn "*p8v_popcount<mode>2"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (popcount:VI2 (match_operand:VI2 1 "register_operand" "v")))]
+  "TARGET_P8_VECTOR"
+  "vpopcnt<wd> %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
+
+;; Vector Gather Bits by Bytes by Doubleword
+(define_insn "p8v_vgbbd"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")]
+		      UNSPEC_VGBBD))]
+  "TARGET_P8_VECTOR"
+  "vgbbd %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
Index: gcc/config/rs6000/altivec.h
===================================================================
--- gcc/config/rs6000/altivec.h	(revision 199149)
+++ gcc/config/rs6000/altivec.h	(working copy)
@@ -323,15 +323,31 @@
 
 #ifdef _ARCH_PWR8
 /* Vector additions added in Power8/ISA 2.07.  */
+#define vec_eqv __builtin_vec_eqv
+#define vec_nand __builtin_vec_nand
+#define vec_orc __builtin_vec_orc
 #define vec_vaddudm __builtin_vec_vaddudm
+#define vec_vclz __builtin_vec_vclz
+#define vec_vclzb __builtin_vec_vclzb
+#define vec_vclzd __builtin_vec_vclzd
+#define vec_vclzh __builtin_vec_vclzh
+#define vec_vclzw __builtin_vec_vclzw
+#define vec_vgbbd __builtin_vec_vgbbd
 #define vec_vmaxsd __builtin_vec_vmaxsd
 #define vec_vmaxud __builtin_vec_vmaxud
 #define vec_vminsd __builtin_vec_vminsd
 #define vec_vminud __builtin_vec_vminud
+#define vec_vmrgew __builtin_vec_vmrgew
+#define vec_vmrgow __builtin_vec_vmrgow
 #define vec_vpksdss __builtin_vec_vpksdss
 #define vec_vpksdus __builtin_vec_vpksdus
 #define vec_vpkudum __builtin_vec_vpkudum
 #define vec_vpkudus __builtin_vec_vpkudus
+#define vec_vpopcnt __builtin_vec_vpopcnt
+#define vec_vpopcntb __builtin_vec_vpopcntb
+#define vec_vpopcntd __builtin_vec_vpopcntd
+#define vec_vpopcnth __builtin_vec_vpopcnth
+#define vec_vpopcntw __builtin_vec_vpopcntw
 #define vec_vrld __builtin_vec_vrld
 #define vec_vsld __builtin_vec_vsld
 #define vec_vsrad __builtin_vec_vsrad
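
For reference, a usage sketch of the new intrinsics above (example mine, not
part of the patch; it assumes the eqv/nand/orc overloads for vector unsigned
long long from the overload table):

#include <altivec.h>

/* Example only: exercising the new ISA 2.07 intrinsics.  Compile with
   -mcpu=power8.  */
vector unsigned long long
new_ops (vector unsigned long long a, vector unsigned long long b)
{
  vector unsigned long long x = vec_eqv (a, b);    /* xxleqv.   */
  vector unsigned long long y = vec_nand (a, b);   /* xxlnand.  */
  vector unsigned long long z = vec_orc (a, b);    /* xxlorc.   */
  return vec_vaddudm (vec_vpopcnt (x), vec_vclz (vec_vaddudm (y, z)));
}
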
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 199122)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -207,7 +207,7 @@ (define_predicate "int_reg_operand"
   if (!REG_P (op))
     return 0;
 
-  if (REGNO (op) >= ARG_POINTER_REGNUM && !CA_REGNO_P (REGNO (op)))
+  if (REGNO (op) >= FIRST_PSEUDO_REGISTER)
     return 1;
 
   return INT_REGNO_P (REGNO (op));
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 199122)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -1114,14 +1114,13 @@ extern unsigned rs6000_pointer_size;
 #define VINT_REGNO_P(N) ALTIVEC_REGNO_P (N)
 
 /* Alternate name for any vector register supporting logical operations, no
-   matter which instruction set(s) are available.  Under VSX, we allow GPRs as
-   well as vector registers on 64-bit systems.  We don't allow 32-bit systems,
-   due to the number of registers involved, and the number of instructions to
-   load/store the values..  */
+   matter which instruction set(s) are available.  If we have quad memory
+   support, we also allow logical operations in the GPRs, so that the atomic
+   quad word builtins do not need the VSX registers for lqarx/stqcx.  */
 #define VLOGICAL_REGNO_P(N)						\
   (ALTIVEC_REGNO_P (N)							\
    || (TARGET_VSX && FP_REGNO_P (N))					\
-   || (TARGET_VSX && TARGET_POWERPC64 && INT_REGNO_P (N)))
+   || (TARGET_VSX && TARGET_QUAD_MEMORY && INT_REGNO_P (N)))
 
 /* Return number of consecutive hard regs needed starting at reg REGNO
    to hold something of mode MODE.  */

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #5, new vector tests
  2013-05-20 20:41 [PATCH, rs6000] power8 patches Michael Meissner
                   ` (4 preceding siblings ...)
  2013-05-21 23:47 ` [PATCH, rs6000] power8 patches, patch #4, new power8 builtins Michael Meissner
@ 2013-05-21 23:49 ` Michael Meissner
  2013-06-06 21:51   ` Michael Meissner
  2013-05-22 14:26 ` [PATCH, rs6000] power8 patches, patch #6, direct move & basic quad load/store Michael Meissner
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-05-21 23:49 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, pthaugen, bergner

[-- Attachment #1: Type: text/plain, Size: 1679 bytes --]

This patch provides the tests for the new vector instructions added in patches
3 and 4.  In addition, it provides the testsuite infrastructure to detect
power8 targets.
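
For the record, check_p8vector_hw_available works by compiling and running a
small probe on the target, roughly like the following sketch (hand-written
here; the real check is the Tcl code in target-supports.exp):

/* Hypothetical probe: execute a power8-only VSX instruction.  This traps
   on older hardware, so a clean exit means power8 is present.  */
int
main (void)
{
  asm volatile ("xxlorc 0,0,0");
  return 0;
}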

Is this patch OK to check in once the previous four patches have been
applied?

2013-05-21  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/p8vector-builtin-1.c: New test for power8
	builtin functions.
	* gcc.target/powerpc/p8vector-builtin-2.c: Likewise.
	* gcc.target/powerpc/p8vector-builtin-3.c: Likewise.
	* gcc.target/powerpc/p8vector-builtin-4.c: Likewise.
	* gcc.target/powerpc/p8vector-builtin-5.c: Likewise.
	* gcc.target/powerpc/p8vector-builtin-6.c: Likewise.
	* gcc.target/powerpc/p8vector-builtin-7.c: Likewise.
	* gcc.target/powerpc/p8vector-vectorize-1.c: New tests for power8
	auto-vectorization.
	* gcc.target/powerpc/p8vector-vectorize-2.c: Likewise.
	* gcc.target/powerpc/p8vector-vectorize-3.c: Likewise.
	* gcc.target/powerpc/p8vector-vectorize-4.c: Likewise.
	* gcc.target/powerpc/p8vector-vectorize-5.c: Likewise.

	* lib/target-supports.exp (check_p8vector_hw_available): New
	function, check if we are running on a power8.
	(check_effective_target_powerpc_p8vector_ok): New function, check
	if target can compile for power8.
	(is-effective-target): Add power8 support.
	(is-effective-target-keyword): Likewise.
	(check_vect_support_and_set_flags): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.official-05b --]
[-- Type: text/plain, Size: 31245 bytes --]

Index: gcc/testsuite/gcc.target/powerpc/p8vector-builtin-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p8vector-builtin-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-builtin-1.c	(revision 0)
@@ -0,0 +1,65 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2 -ftree-vectorize -fvect-cost-model -fno-unroll-loops -fno-unroll-all-loops" } */
+
+#ifndef TYPE
+#define TYPE long long
+#endif
+
+#ifndef SIGN_TYPE
+#define SIGN_TYPE signed TYPE
+#endif
+
+#ifndef UNS_TYPE
+#define UNS_TYPE unsigned TYPE
+#endif
+
+typedef vector SIGN_TYPE v_sign;
+typedef vector UNS_TYPE  v_uns;
+
+v_sign sign_add (v_sign a, v_sign b)
+{
+  return a + b;
+}
+
+v_sign sign_sub (v_sign a, v_sign b)
+{
+  return a - b;
+}
+
+v_sign sign_shift_left (v_sign a, v_sign b)
+{
+  return a << b;
+}
+
+v_sign sign_shift_right (v_sign a, v_sign b)
+{
+  return a >> b;
+}
+
+v_uns uns_add (v_uns a, v_uns b)
+{
+  return a + b;
+}
+
+v_uns uns_sub (v_uns a, v_uns b)
+{
+  return a - b;
+}
+
+v_uns uns_shift_left (v_uns a, v_uns b)
+{
+  return a << b;
+}
+
+v_uns uns_shift_right (v_uns a, v_uns b)
+{
+  return a >> b;
+}
+
+/* { dg-final { scan-assembler-times "vaddudm" 2 } } */
+/* { dg-final { scan-assembler-times "vsubudm" 2 } } */
+/* { dg-final { scan-assembler-times "vsld"    2 } } */
+/* { dg-final { scan-assembler-times "vsrad"   1 } } */
+/* { dg-final { scan-assembler-times "vsrd"    1 } } */
Index: gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-1.c	(revision 0)
@@ -0,0 +1,200 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2 -ftree-vectorize -fvect-cost-model -fno-unroll-loops -fno-unroll-all-loops" } */
+
+#ifndef SIZE
+#define SIZE 1024
+#endif
+
+#ifndef ALIGN
+#define ALIGN 32
+#endif
+
+#ifndef TYPE
+#define TYPE long long
+#endif
+
+#ifndef SIGN_TYPE
+#define SIGN_TYPE signed TYPE
+#endif
+
+#ifndef UNS_TYPE
+#define UNS_TYPE unsigned TYPE
+#endif
+
+#define ALIGN_ATTR __attribute__((__aligned__(ALIGN)))
+
+SIGN_TYPE	sa[SIZE] ALIGN_ATTR;
+SIGN_TYPE	sb[SIZE] ALIGN_ATTR;
+SIGN_TYPE	sc[SIZE] ALIGN_ATTR;
+
+UNS_TYPE	ua[SIZE] ALIGN_ATTR;
+UNS_TYPE	ub[SIZE] ALIGN_ATTR;
+UNS_TYPE	uc[SIZE] ALIGN_ATTR;
+
+void
+sign_add (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    sa[i] = sb[i] + sc[i];
+}
+
+void
+sign_sub (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    sa[i] = sb[i] - sc[i];
+}
+
+void
+sign_shift_left (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    sa[i] = sb[i] << sc[i];
+}
+
+void
+sign_shift_right (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    sa[i] = sb[i] >> sc[i];
+}
+
+void
+sign_max (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    sa[i] = (sb[i] > sc[i]) ? sb[i] : sc[i];
+}
+
+void
+sign_min (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    sa[i] = (sb[i] < sc[i]) ? sb[i] : sc[i];
+}
+
+void
+sign_abs (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    sa[i] = (sb[i] < 0) ? -sb[i] : sb[i];	/* xor, vsubudm, vmaxsd.  */
+}
+
+void
+sign_eq (SIGN_TYPE val1, SIGN_TYPE val2)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    sa[i] = (sb[i] == sc[i]) ? val1 : val2;
+}
+
+void
+sign_lt (SIGN_TYPE val1, SIGN_TYPE val2)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    sa[i] = (sb[i] < sc[i]) ? val1 : val2;
+}
+
+void
+uns_add (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    ua[i] = ub[i] + uc[i];
+}
+
+void
+uns_sub (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    ua[i] = ub[i] - uc[i];
+}
+
+void
+uns_shift_left (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    ua[i] = ub[i] << uc[i];
+}
+
+void
+uns_shift_right (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    ua[i] = ub[i] >> uc[i];
+}
+
+void
+uns_max (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    ua[i] = (ub[i] > uc[i]) ? ub[i] : uc[i];
+}
+
+void
+uns_min (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    ua[i] = (ub[i] < uc[i]) ? ub[i] : uc[i];
+}
+
+void
+uns_eq (UNS_TYPE val1, UNS_TYPE val2)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    ua[i] = (ub[i] == uc[i]) ? val1 : val2;
+}
+
+void
+uns_lt (UNS_TYPE val1, UNS_TYPE val2)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    ua[i] = (ub[i] < uc[i]) ? val1 : val2;
+}
+
+/* { dg-final { scan-assembler-times "\[\t \]vaddudm\[\t \]"  2 } } */
+/* { dg-final { scan-assembler-times "\[\t \]vsubudm\[\t \]"  3 } } */
+/* { dg-final { scan-assembler-times "\[\t \]vmaxsd\[\t \]"   2 } } */
+/* { dg-final { scan-assembler-times "\[\t \]vmaxud\[\t \]"   1 } } */
+/* { dg-final { scan-assembler-times "\[\t \]vminsd\[\t \]"   1 } } */
+/* { dg-final { scan-assembler-times "\[\t \]vminud\[\t \]"   1 } } */
+/* { dg-final { scan-assembler-times "\[\t \]vsld\[\t \]"     2 } } */
+/* { dg-final { scan-assembler-times "\[\t \]vsrad\[\t \]"    1 } } */
+/* { dg-final { scan-assembler-times "\[\t \]vsrd\[\t \]"     1 } } */
+/* { dg-final { scan-assembler-times "\[\t \]vcmpequd\[\t \]" 2 } } */
+/* { dg-final { scan-assembler-times "\[\t \]vcmpgtsd\[\t \]" 1 } } */
+/* { dg-final { scan-assembler-times "\[\t \]vcmpgtud\[\t \]" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/p8vector-builtin-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p8vector-builtin-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-builtin-2.c	(revision 0)
@@ -0,0 +1,204 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2 -ftree-vectorize -fvect-cost-model -fno-unroll-loops -fno-unroll-all-loops" } */
+
+#include <altivec.h>
+
+typedef vector long long		v_sign;
+typedef vector unsigned long long	v_uns;
+typedef vector bool long long		v_bool;
+
+v_sign sign_add_1 (v_sign a, v_sign b)
+{
+  return __builtin_altivec_vaddudm (a, b);
+}
+
+v_sign sign_add_2 (v_sign a, v_sign b)
+{
+  return vec_add (a, b);
+}
+
+v_sign sign_add_3 (v_sign a, v_sign b)
+{
+  return vec_vaddudm (a, b);
+}
+
+v_sign sign_sub_1 (v_sign a, v_sign b)
+{
+  return __builtin_altivec_vsubudm (a, b);
+}
+
+v_sign sign_sub_2 (v_sign a, v_sign b)
+{
+  return vec_sub (a, b);
+}
+
+v_sign sign_sub_3 (v_sign a, v_sign b)
+{
+  return vec_vsubudm (a, b);
+}
+
+v_sign sign_min_1 (v_sign a, v_sign b)
+{
+  return __builtin_altivec_vminsd (a, b);
+}
+
+v_sign sign_min_2 (v_sign a, v_sign b)
+{
+  return vec_min (a, b);
+}
+
+v_sign sign_min_3 (v_sign a, v_sign b)
+{
+  return vec_vminsd (a, b);
+}
+
+v_sign sign_max_1 (v_sign a, v_sign b)
+{
+  return __builtin_altivec_vmaxsd (a, b);
+}
+
+v_sign sign_max_2 (v_sign a, v_sign b)
+{
+  return vec_max (a, b);
+}
+
+v_sign sign_max_3 (v_sign a, v_sign b)
+{
+  return vec_vmaxsd (a, b);
+}
+
+v_sign sign_abs (v_sign a)
+{
+  return vec_abs (a);		/* xor, vsubudm, vmaxsd.  */
+}
+
+v_bool sign_eq (v_sign a, v_sign b)
+{
+  return vec_cmpeq (a, b);
+}
+
+v_bool sign_lt (v_sign a, v_sign b)
+{
+  return vec_cmplt (a, b);
+}
+
+v_uns uns_add_2 (v_uns a, v_uns b)
+{
+  return vec_add (a, b);
+}
+
+v_uns uns_add_3 (v_uns a, v_uns b)
+{
+  return vec_vaddudm (a, b);
+}
+
+v_uns uns_sub_2 (v_uns a, v_uns b)
+{
+  return vec_sub (a, b);
+}
+
+v_uns uns_sub_3 (v_uns a, v_uns b)
+{
+  return vec_vsubudm (a, b);
+}
+
+v_uns uns_min_2 (v_uns a, v_uns b)
+{
+  return vec_min (a, b);
+}
+
+v_uns uns_min_3 (v_uns a, v_uns b)
+{
+  return vec_vminud (a, b);
+}
+
+v_uns uns_max_2 (v_uns a, v_uns b)
+{
+  return vec_max (a, b);
+}
+
+v_uns uns_max_3 (v_uns a, v_uns b)
+{
+  return vec_vmaxud (a, b);
+}
+
+v_bool uns_eq (v_uns a, v_uns b)
+{
+  return vec_cmpeq (a, b);
+}
+
+v_bool uns_lt (v_uns a, v_uns b)
+{
+  return vec_cmplt (a, b);
+}
+
+v_sign sign_rl_1 (v_sign a, v_sign b)
+{
+  return __builtin_altivec_vrld (a, b);
+}
+
+v_sign sign_rl_2 (v_sign a, v_uns b)
+{
+  return vec_rl (a, b);
+}
+
+v_uns uns_rl_2 (v_uns a, v_uns b)
+{
+  return vec_rl (a, b);
+}
+
+v_sign sign_sl_1 (v_sign a, v_sign b)
+{
+  return __builtin_altivec_vsld (a, b);
+}
+
+v_sign sign_sl_2 (v_sign a, v_uns b)
+{
+  return vec_sl (a, b);
+}
+
+v_sign sign_sl_3 (v_sign a, v_uns b)
+{
+  return vec_vsld (a, b);
+}
+
+v_uns uns_sl_2 (v_uns a, v_uns b)
+{
+  return vec_sl (a, b);
+}
+
+v_uns uns_sl_3 (v_uns a, v_uns b)
+{
+  return vec_vsld (a, b);
+}
+
+v_sign sign_sra_1 (v_sign a, v_sign b)
+{
+  return __builtin_altivec_vsrad (a, b);
+}
+
+v_sign sign_sra_2 (v_sign a, v_uns b)
+{
+  return vec_sra (a, b);
+}
+
+v_sign sign_sra_3 (v_sign a, v_uns b)
+{
+  return vec_vsrad (a, b);
+}
+
+/* { dg-final { scan-assembler-times "vaddudm" 	5 } } */
+/* { dg-final { scan-assembler-times "vsubudm" 	6 } } */
+/* { dg-final { scan-assembler-times "vmaxsd"  	4 } } */
+/* { dg-final { scan-assembler-times "vminsd"  	3 } } */
+/* { dg-final { scan-assembler-times "vmaxud"  	2 } } */
+/* { dg-final { scan-assembler-times "vminud"  	2 } } */
+/* { dg-final { scan-assembler-times "vcmpequd" 2 } } */
+/* { dg-final { scan-assembler-times "vcmpgtsd" 1 } } */
+/* { dg-final { scan-assembler-times "vcmpgtud" 1 } } */
+/* { dg-final { scan-assembler-times "vrld"     3 } } */
+/* { dg-final { scan-assembler-times "vsld"     5 } } */
+/* { dg-final { scan-assembler-times "vsrad"    3 } } */
Index: gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-2.c	(revision 0)
@@ -0,0 +1,30 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2 -ftree-vectorize -fvect-cost-model" } */
+
+#include <stddef.h>
+
+#ifndef SIZE
+#define SIZE 1024
+#endif
+
+#ifndef ALIGN
+#define ALIGN 32
+#endif
+
+#define ALIGN_ATTR __attribute__((__aligned__(ALIGN)))
+
+long long sign_ll[SIZE]	ALIGN_ATTR;
+int	  sign_i [SIZE]	ALIGN_ATTR;
+
+void copy_int_to_long_long (void)
+{
+  size_t i;
+
+  for (i = 0; i < SIZE; i++)
+    sign_ll[i] = sign_i[i];
+}
+
+/* { dg-final { scan-assembler "vupkhsw" } } */
+/* { dg-final { scan-assembler "vupklsw" } } */
Index: gcc/testsuite/gcc.target/powerpc/p8vector-builtin-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p8vector-builtin-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-builtin-3.c	(revision 0)
@@ -0,0 +1,104 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O3 -ftree-vectorize -fvect-cost-model" } */
+
+#include <altivec.h>
+
+typedef vector long long		vll_sign;
+typedef vector unsigned long long	vll_uns;
+typedef vector bool long long		vll_bool;
+
+typedef vector int			vi_sign;
+typedef vector unsigned int		vi_uns;
+typedef vector bool int			vi_bool;
+
+typedef vector short			vs_sign;
+typedef vector unsigned short		vs_uns;
+typedef vector bool short		vs_bool;
+
+typedef vector signed char		vc_sign;
+typedef vector unsigned char		vc_uns;
+typedef vector bool char		vc_bool;
+
+
+vi_sign vi_pack_1 (vll_sign a, vll_sign b)
+{
+  return __builtin_altivec_vpkudum (a, b);
+}
+
+vi_sign vi_pack_2 (vll_sign a, vll_sign b)
+{
+  return vec_pack (a, b);
+}
+
+vi_sign vi_pack_3 (vll_sign a, vll_sign b)
+{
+  return vec_vpkudum (a, b);
+}
+
+vs_sign vs_pack_1 (vi_sign a, vi_sign b)
+{
+  return __builtin_altivec_vpkuwum (a, b);
+}
+
+vs_sign vs_pack_2 (vi_sign a, vi_sign b)
+{
+  return vec_pack (a, b);
+}
+
+vs_sign vs_pack_3 (vi_sign a, vi_sign b)
+{
+  return vec_vpkuwum (a, b);
+}
+
+vc_sign vc_pack_1 (vs_sign a, vs_sign b)
+{
+  return __builtin_altivec_vpkuhum (a, b);
+}
+
+vc_sign vc_pack_2 (vs_sign a, vs_sign b)
+{
+  return vec_pack (a, b);
+}
+
+vc_sign vc_pack_3 (vs_sign a, vs_sign b)
+{
+  return vec_vpkuhum (a, b);
+}
+
+vll_sign vll_unpack_hi_1 (vi_sign a)
+{
+  return __builtin_altivec_vupkhsw (a);
+}
+
+vll_sign vll_unpack_hi_2 (vi_sign a)
+{
+  return vec_unpackh (a);
+}
+
+vll_sign vll_unpack_hi_3 (vi_sign a)
+{
+  return __builtin_vec_vupkhsw (a);
+}
+
+vll_sign vll_unpack_lo_1 (vi_sign a)
+{
+  return vec_vupklsw (a);
+}
+
+vll_sign vll_unpack_lo_2 (vi_sign a)
+{
+  return vec_unpackl (a);
+}
+
+vll_sign vll_unpack_lo_3 (vi_sign a)
+{
+  return vec_vupklsw (a);
+}
+
+/* { dg-final { scan-assembler-times "vpkudum" 3 } } */
+/* { dg-final { scan-assembler-times "vpkuwum" 3 } } */
+/* { dg-final { scan-assembler-times "vpkuhum" 3 } } */
+/* { dg-final { scan-assembler-times "vupklsw" 3 } } */
+/* { dg-final { scan-assembler-times "vupkhsw" 3 } } */
Index: gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-3.c	(revision 0)
@@ -0,0 +1,29 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2 -ftree-vectorize -fvect-cost-model" } */
+
+#include <stddef.h>
+
+#ifndef SIZE
+#define SIZE 1024
+#endif
+
+#ifndef ALIGN
+#define ALIGN 32
+#endif
+
+#define ALIGN_ATTR __attribute__((__aligned__(ALIGN)))
+
+long long sign_ll[SIZE]	ALIGN_ATTR;
+int	  sign_i [SIZE]	ALIGN_ATTR;
+
+void copy_long_long_to_int (void)
+{
+  size_t i;
+
+  for (i = 0; i < SIZE; i++)
+    sign_i[i] = sign_ll[i];
+}
+
+/* { dg-final { scan-assembler "vpkudum" } } */
Index: gcc/testsuite/gcc.target/powerpc/p8vector-builtin-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p8vector-builtin-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-builtin-4.c	(revision 0)
@@ -0,0 +1,249 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O3 -ftree-vectorize -fvect-cost-model" } */
+
+#include <altivec.h>
+
+typedef vector long long		vll_sign;
+typedef vector unsigned long long	vll_uns;
+typedef vector bool long long		vll_bool;
+
+typedef vector int			vi_sign;
+typedef vector unsigned int		vi_uns;
+typedef vector bool int			vi_bool;
+
+typedef vector short			vs_sign;
+typedef vector unsigned short		vs_uns;
+typedef vector bool short		vs_bool;
+
+typedef vector signed char		vc_sign;
+typedef vector unsigned char		vc_uns;
+typedef vector bool char		vc_bool;
+
+vll_sign vll_clz_1 (vll_sign a)
+{
+  return __builtin_altivec_vclzd (a);
+}
+
+vll_sign vll_clz_2 (vll_sign a)
+{
+  return vec_vclz (a);
+}
+
+vll_sign vll_clz_3 (vll_sign a)
+{
+  return vec_vclzd (a);
+}
+
+vll_uns vll_clz_4 (vll_uns a)
+{
+  return vec_vclz (a);
+}
+
+vll_uns vll_clz_5 (vll_uns a)
+{
+  return vec_vclzd (a);
+}
+
+vi_sign vi_clz_1 (vi_sign a)
+{
+  return __builtin_altivec_vclzw (a);
+}
+
+vi_sign vi_clz_2 (vi_sign a)
+{
+  return vec_vclz (a);
+}
+
+vi_sign vi_clz_3 (vi_sign a)
+{
+  return vec_vclzw (a);
+}
+
+vi_uns vi_clz_4 (vi_uns a)
+{
+  return vec_vclz (a);
+}
+
+vi_uns vi_clz_5 (vi_uns a)
+{
+  return vec_vclzw (a);
+}
+
+vs_sign vs_clz_1 (vs_sign a)
+{
+  return __builtin_altivec_vclzh (a);
+}
+
+vs_sign vs_clz_2 (vs_sign a)
+{
+  return vec_vclz (a);
+}
+
+vs_sign vs_clz_3 (vs_sign a)
+{
+  return vec_vclzh (a);
+}
+
+vs_uns vs_clz_4 (vs_uns a)
+{
+  return vec_vclz (a);
+}
+
+vs_uns vs_clz_5 (vs_uns a)
+{
+  return vec_vclzh (a);
+}
+
+vc_sign vc_clz_1 (vc_sign a)
+{
+  return __builtin_altivec_vclzb (a);
+}
+
+vc_sign vc_clz_2 (vc_sign a)
+{
+  return vec_vclz (a);
+}
+
+vc_sign vc_clz_3 (vc_sign a)
+{
+  return vec_vclzb (a);
+}
+
+vc_uns vc_clz_4 (vc_uns a)
+{
+  return vec_vclz (a);
+}
+
+vc_uns vc_clz_5 (vc_uns a)
+{
+  return vec_vclzb (a);
+}
+
+vll_sign vll_popcnt_1 (vll_sign a)
+{
+  return __builtin_altivec_vpopcntd (a);
+}
+
+vll_sign vll_popcnt_2 (vll_sign a)
+{
+  return vec_vpopcnt (a);
+}
+
+vll_sign vll_popcnt_3 (vll_sign a)
+{
+  return vec_vpopcntd (a);
+}
+
+vll_uns vll_popcnt_4 (vll_uns a)
+{
+  return vec_vpopcnt (a);
+}
+
+vll_uns vll_popcnt_5 (vll_uns a)
+{
+  return vec_vpopcntd (a);
+}
+
+vi_sign vi_popcnt_1 (vi_sign a)
+{
+  return __builtin_altivec_vpopcntw (a);
+}
+
+vi_sign vi_popcnt_2 (vi_sign a)
+{
+  return vec_vpopcnt (a);
+}
+
+vi_sign vi_popcnt_3 (vi_sign a)
+{
+  return vec_vpopcntw (a);
+}
+
+vi_uns vi_popcnt_4 (vi_uns a)
+{
+  return vec_vpopcnt (a);
+}
+
+vi_uns vi_popcnt_5 (vi_uns a)
+{
+  return vec_vpopcntw (a);
+}
+
+vs_sign vs_popcnt_1 (vs_sign a)
+{
+  return __builtin_altivec_vpopcnth (a);
+}
+
+vs_sign vs_popcnt_2 (vs_sign a)
+{
+  return vec_vpopcnt (a);
+}
+
+vs_sign vs_popcnt_3 (vs_sign a)
+{
+  return vec_vpopcnth (a);
+}
+
+vs_uns vs_popcnt_4 (vs_uns a)
+{
+  return vec_vpopcnt (a);
+}
+
+vs_uns vs_popcnt_5 (vs_uns a)
+{
+  return vec_vpopcnth (a);
+}
+
+vc_sign vc_popcnt_1 (vc_sign a)
+{
+  return __builtin_altivec_vpopcntb (a);
+}
+
+vc_sign vc_popcnt_2 (vc_sign a)
+{
+  return vec_vpopcnt (a);
+}
+
+vc_sign vc_popcnt_3 (vc_sign a)
+{
+  return vec_vpopcntb (a);
+}
+
+vc_uns vc_popcnt_4 (vc_uns a)
+{
+  return vec_vpopcnt (a);
+}
+
+vc_uns vc_popcnt_5 (vc_uns a)
+{
+  return vec_vpopcntb (a);
+}
+
+vc_uns vc_gbb_1 (vc_uns a)
+{
+  return __builtin_altivec_vgbbd (a);
+}
+
+vc_sign vc_gbb_2 (vc_sign a)
+{
+  return vec_vgbbd (a);
+}
+
+vc_uns vc_gbb_3 (vc_uns a)
+{
+  return vec_vgbbd (a);
+}
+
+/* { dg-final { scan-assembler-times "vclzd" 	5 } } */
+/* { dg-final { scan-assembler-times "vclzw" 	5 } } */
+/* { dg-final { scan-assembler-times "vclzh" 	5 } } */
+/* { dg-final { scan-assembler-times "vclzb" 	5 } } */
+
+/* { dg-final { scan-assembler-times "vpopcntd" 5 } } */
+/* { dg-final { scan-assembler-times "vpopcntw" 5 } } */
+/* { dg-final { scan-assembler-times "vpopcnth" 5 } } */
+/* { dg-final { scan-assembler-times "vpopcntb" 5 } } */
+
+/* { dg-final { scan-assembler-times "vgbbd"    3 } } */
Index: gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-4.c	(revision 0)
@@ -0,0 +1,69 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2 -ftree-vectorize -fvect-cost-model -fno-unroll-loops -fno-unroll-all-loops" } */
+
+#ifndef SIZE
+#define SIZE 1024
+#endif
+
+#ifndef ALIGN
+#define ALIGN 32
+#endif
+
+#define ALIGN_ATTR __attribute__((__aligned__(ALIGN)))
+
+#define DO_BUILTIN(PREFIX, TYPE, CLZ, POPCNT)				\
+TYPE PREFIX ## _a[SIZE] ALIGN_ATTR;					\
+TYPE PREFIX ## _b[SIZE] ALIGN_ATTR;					\
+									\
+void									\
+PREFIX ## _clz (void)							\
+{									\
+  unsigned long i;							\
+									\
+  for (i = 0; i < SIZE; i++)						\
+    PREFIX ## _a[i] = CLZ (PREFIX ## _b[i]);				\
+}									\
+									\
+void									\
+PREFIX ## _popcnt (void)						\
+{									\
+  unsigned long i;							\
+									\
+  for (i = 0; i < SIZE; i++)						\
+    PREFIX ## _a[i] = POPCNT (PREFIX ## _b[i]);				\
+}
+
+#if !defined(DO_LONG_LONG) && !defined(DO_LONG) && !defined(DO_INT) && !defined(DO_SHORT) && !defined(DO_CHAR)
+#define DO_INT 1
+#endif
+
+#if DO_LONG_LONG
+/* At the moment, only int is auto-vectorized.  */
+DO_BUILTIN (sll, long long,		__builtin_clzll, __builtin_popcountll)
+DO_BUILTIN (ull, unsigned long long,	__builtin_clzll, __builtin_popcountll)
+#endif
+
+#if defined(_ARCH_PPC64) && DO_LONG
+DO_BUILTIN (sl,  long,			__builtin_clzl,  __builtin_popcountl)
+DO_BUILTIN (ul,  unsigned long,		__builtin_clzl,  __builtin_popcountl)
+#endif
+
+#if DO_INT
+DO_BUILTIN (si,  int,			__builtin_clz,   __builtin_popcount)
+DO_BUILTIN (ui,  unsigned int,		__builtin_clz,   __builtin_popcount)
+#endif
+
+#if DO_SHORT
+DO_BUILTIN (ss,  short,			__builtin_clz,   __builtin_popcount)
+DO_BUILTIN (us,  unsigned short,	__builtin_clz,   __builtin_popcount)
+#endif
+
+#if DO_CHAR
+DO_BUILTIN (sc,  signed char,		__builtin_clz,   __builtin_popcount)
+DO_BUILTIN (uc,  unsigned char,		__builtin_clz,   __builtin_popcount)
+#endif
+
+/* { dg-final { scan-assembler-times "vclzw"     2 } } */
+/* { dg-final { scan-assembler-times "vpopcntw"  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/p8vector-builtin-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p8vector-builtin-5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-builtin-5.c	(revision 0)
@@ -0,0 +1,105 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2 -ftree-vectorize -fvect-cost-model -fno-unroll-loops -fno-unroll-all-loops" } */
+
+#include <altivec.h>
+
+#ifndef SIZE
+#define SIZE 1024
+#endif
+
+#ifndef ALIGN
+#define ALIGN 32
+#endif
+
+#ifndef ATTR_ALIGN
+#define ATTR_ALIGN __attribute__((__aligned__(ALIGN)))
+#endif
+
+#define DOIT(TYPE, PREFIX)						\
+TYPE PREFIX ## _eqv_builtin (TYPE a, TYPE b)				\
+{									\
+  return vec_eqv (a, b);						\
+}									\
+									\
+TYPE PREFIX ## _eqv_arith (TYPE a, TYPE b)				\
+{									\
+  return ~(a ^ b);							\
+}									\
+									\
+TYPE PREFIX ## _nand_builtin (TYPE a, TYPE b)				\
+{									\
+  return vec_nand (a, b);						\
+}									\
+									\
+TYPE PREFIX ## _nand_arith1 (TYPE a, TYPE b)				\
+{									\
+  return ~(a & b);							\
+}									\
+									\
+TYPE PREFIX ## _nand_arith2 (TYPE a, TYPE b)				\
+{									\
+  return (~a) | (~b);							\
+}									\
+									\
+TYPE PREFIX ## _orc_builtin (TYPE a, TYPE b)				\
+{									\
+  return vec_orc (a, b);						\
+}									\
+									\
+TYPE PREFIX ## _orc_arith1 (TYPE a, TYPE b)				\
+{									\
+  return (~ a) | b;							\
+}									\
+									\
+TYPE PREFIX ## _orc_arith2 (TYPE a, TYPE b)				\
+{									\
+  return a | (~ b);							\
+}
+
+#define DOIT_FLOAT(TYPE, PREFIX)					\
+TYPE PREFIX ## _eqv_builtin (TYPE a, TYPE b)				\
+{									\
+  return vec_eqv (a, b);						\
+}									\
+									\
+TYPE PREFIX ## _nand_builtin (TYPE a, TYPE b)				\
+{									\
+  return vec_nand (a, b);						\
+}									\
+									\
+TYPE PREFIX ## _orc_builtin (TYPE a, TYPE b)				\
+{									\
+  return vec_orc (a, b);						\
+}
+
+typedef vector signed char		sign_char_vec;
+typedef vector short			sign_short_vec;
+typedef vector int			sign_int_vec;
+typedef vector long long		sign_llong_vec;
+
+typedef vector unsigned char		uns_char_vec;
+typedef vector unsigned short		uns_short_vec;
+typedef vector unsigned int		uns_int_vec;
+typedef vector unsigned long long	uns_llong_vec;
+
+typedef vector float			float_vec;
+typedef vector double			double_vec;
+
+DOIT(sign_char_vec,	sign_char)
+DOIT(sign_short_vec,	sign_short)
+DOIT(sign_int_vec,	sign_int)
+DOIT(sign_llong_vec,	sign_llong)
+
+DOIT(uns_char_vec,	uns_char)
+DOIT(uns_short_vec,	uns_short)
+DOIT(uns_int_vec,	uns_int)
+DOIT(uns_llong_vec,	uns_llong)
+
+DOIT_FLOAT(float_vec,	float)
+DOIT_FLOAT(double_vec,	double)
+
+/* { dg-final { scan-assembler-times "xxleqv"  18 } } */
+/* { dg-final { scan-assembler-times "xxlnand" 26 } } */
+/* { dg-final { scan-assembler-times "xxlorc"  26 } } */
Index: gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-vectorize-5.c	(revision 0)
@@ -0,0 +1,87 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2 -ftree-vectorize -fvect-cost-model -fno-unroll-loops -fno-unroll-all-loops" } */
+
+#ifndef SIZE
+#define SIZE 1024
+#endif
+
+#ifndef ALIGN
+#define ALIGN 32
+#endif
+
+#ifndef ATTR_ALIGN
+#define ATTR_ALIGN __attribute__((__aligned__(ALIGN)))
+#endif
+
+#ifndef TYPE
+#define TYPE unsigned int
+#endif
+
+TYPE in1  [SIZE] ATTR_ALIGN;
+TYPE in2  [SIZE] ATTR_ALIGN;
+TYPE eqv  [SIZE] ATTR_ALIGN;
+TYPE nand1[SIZE] ATTR_ALIGN;
+TYPE nand2[SIZE] ATTR_ALIGN;
+TYPE orc1 [SIZE] ATTR_ALIGN;
+TYPE orc2 [SIZE] ATTR_ALIGN;
+
+void
+do_eqv (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      eqv[i] = ~(in1[i] ^ in2[i]);
+    }
+}
+
+void
+do_nand1 (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      nand1[i] = ~(in1[i] & in2[i]);
+    }
+}
+
+void
+do_nand2 (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      nand2[i] = (~in1[i]) | (~in2[i]);
+    }
+}
+
+void
+do_orc1 (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      orc1[i] = (~in1[i]) | in2[i];
+    }
+}
+
+void
+do_orc2 (void)
+{
+  unsigned long i;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      orc2[i] = in1[i] | (~in2[i]);
+    }
+}
+
+/* { dg-final { scan-assembler-times "xxleqv"  1 } } */
+/* { dg-final { scan-assembler-times "xxlnand" 2 } } */
+/* { dg-final { scan-assembler-times "xxlorc"  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/p8vector-builtin-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p8vector-builtin-6.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-builtin-6.c	(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2" } */
+
+vector float dbl_to_float_p8 (double x) { return __builtin_vsx_xscvdpspn (x); }
+double float_to_dbl_p8 (vector float x) { return __builtin_vsx_xscvspdpn (x); }
+
+/* { dg-final { scan-assembler "xscvdpspn" } } */
+/* { dg-final { scan-assembler "xscvspdpn" } } */
Index: gcc/testsuite/gcc.target/powerpc/p8vector-builtin-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p8vector-builtin-7.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p8vector-builtin-7.c	(revision 0)
@@ -0,0 +1,32 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2" } */
+
+#include <altivec.h>
+
+typedef vector int		v_sign;
+typedef vector unsigned int	v_uns;
+
+v_sign even_sign (v_sign a, v_sign b)
+{
+  return vec_vmrgew (a, b);
+}
+
+v_uns even_uns (v_uns a, v_uns b)
+{
+  return vec_vmrgew (a, b);
+}
+
+v_sign odd_sign (v_sign a, v_sign b)
+{
+  return vec_vmrgow (a, b);
+}
+
+v_uns odd_uns (v_uns a, v_uns b)
+{
+  return vec_vmrgow (a, b);
+}
+
+/* { dg-final { scan-assembler-times "vmrgew" 2 } } */
+/* { dg-final { scan-assembler-times "vmrgow" 2 } } */
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	(revision 199037)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -1311,6 +1311,32 @@ proc check_effective_target_avx_runtime 
     return 0
 }
 
+# Return 1 if the target supports executing power8 vector instructions, 0
+# otherwise.  Cache the result.
+
+proc check_p8vector_hw_available { } {
+    return [check_cached_effective_target p8vector_hw_available {
+	# Some simulators are known to not support VSX/power8 instructions.
+	# For now, disable on Darwin.
+	if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || [istarget *-*-darwin*]} {
+	    expr 0
+	} else {
+	    set options "-mpower8-vector"
+	    check_runtime_nocache p8vector_hw_available {
+		int main()
+		{
+		#ifdef __MACH__
+		  asm volatile ("xxlorc vs0,vs0,vs0");
+		#else
+		  asm volatile ("xxlorc 0,0,0");
+		#endif
+		  return 0;
+		}
+	    } $options
+	}
+    }]
+}
+
 # Return 1 if the target supports executing VSX instructions, 0
 # otherwise.  Cache the result.
 
@@ -2749,6 +2775,33 @@ proc check_effective_target_powerpc_alti
     }
 }
 
+# Return 1 if this is a PowerPC target supporting -mpower8-vector
+
+proc check_effective_target_powerpc_p8vector_ok { } {
+    if { ([istarget powerpc*-*-*]
+         && ![istarget powerpc-*-linux*paired*])
+	 || [istarget rs6000-*-*] } {
+	# AltiVec is not supported on AIX before 5.3.
+	if { [istarget powerpc*-*-aix4*]
+	     || [istarget powerpc*-*-aix5.1*] 
+	     || [istarget powerpc*-*-aix5.2*] } {
+	    return 0
+	}
+	return [check_no_compiler_messages powerpc_p8vector_ok object {
+	    int main (void) {
+#ifdef __MACH__
+		asm volatile ("xxlorc vs0,vs0,vs0");
+#else
+		asm volatile ("xxlorc 0,0,0");
+#endif
+		return 0;
+	    }
+	} "-mpower8-vector"]
+    } else {
+	return 0
+    }
+}
+
 # Return 1 if this is a PowerPC target supporting -mvsx
 
 proc check_effective_target_powerpc_vsx_ok { } {
@@ -4576,6 +4629,7 @@ proc is-effective-target { arg } {
 	switch $arg {
 	  "vmx_hw"         { set selected [check_vmx_hw_available] }
 	  "vsx_hw"         { set selected [check_vsx_hw_available] }
+	  "p8vector_hw"    { set selected [check_p8vector_hw_available] }
 	  "ppc_recip_hw"   { set selected [check_ppc_recip_hw_available] }
 	  "named_sections" { set selected [check_named_sections_available] }
 	  "gc_sections"    { set selected [check_gc_sections_available] }
@@ -4597,6 +4651,7 @@ proc is-effective-target-keyword { arg }
 	switch $arg {
 	  "vmx_hw"         { return 1 }
 	  "vsx_hw"         { return 1 }
+	  "p8vector_hw"    { return 1 }
 	  "ppc_recip_hw"   { return 1 }
 	  "named_sections" { return 1 }
 	  "gc_sections"    { return 1 }
@@ -5181,7 +5236,9 @@ proc check_vect_support_and_set_flags { 
         }
 
         lappend DEFAULT_VECTCFLAGS "-maltivec"
-        if [check_vsx_hw_available] {
+        if [check_p8vector_hw_available] {
+            lappend DEFAULT_VECTCFLAGS "-mpower8-vector" "-mno-allow-movmisalign"
+        } elseif [check_vsx_hw_available] {
             lappend DEFAULT_VECTCFLAGS "-mvsx" "-mno-allow-movmisalign"
         }
 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patch #1, infrastructure changes (revised patch)
  2013-05-20 21:34   ` [PATCH, rs6000] power8 patch #1, infrastructure changes (revised patch) Michael Meissner
@ 2013-05-22  3:29     ` David Edelsohn
  0 siblings, 0 replies; 52+ messages in thread
From: David Edelsohn @ 2013-05-22  3:29 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Mon, May 20, 2013 at 5:34 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> After submitting the patch, I realized I had submitted a previous version of
> the patch, one that still had the wq constraint initially intended for the quad
> memory operations, and also had the changes for ChangeLog.ibm, which I keep on
> the branch.  However, the wq constraint was always equal to the 'r' constraint,
> so I have removed it and used the 'r' constraint once again.
>
> I have also done bootstraps and make check with the patches submitted, with no
> regressions found.  Can I check in the revised patch?

Patch #1 is okay.

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #2, add crypto builtins
  2013-05-20 23:13 ` [PATCH, rs6000] power8 patches, patch #2, add crypto builtins Michael Meissner
@ 2013-05-22  3:30   ` David Edelsohn
  2013-05-23  3:41     ` David Edelsohn
  0 siblings, 1 reply; 52+ messages in thread
From: David Edelsohn @ 2013-05-22  3:30 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Mon, May 20, 2013 at 7:13 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch adds the builtins for the new ISA 2.07 crypto instructions.  It
> bootstraps and causes no regressions; is it ok to install after patch #1 has
> been applied?
>
> [gcc]
> 2013-05-20  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions): Add
>         documentation for the power8 crypto builtins.
>
>         * config/rs6000/t-rs6000 (MD_INCLUDES): Add crypto.md.
>
>         * config/rs6000/rs6000-builtin.def (BU_P8V_AV_1): Add support
>         macros for defining power8 builtin functions.
>         (BU_P8V_AV_2): Likewise.
>         (BU_P8V_AV_P): Likewise.
>         (BU_P8V_VSX_1): Likewise.
>         (BU_P8V_OVERLOAD_1): Likewise.
>         (BU_P8V_OVERLOAD_2): Likewise.
>         (BU_CRYPTO_1): Likewise.
>         (BU_CRYPTO_2): Likewise.
>         (BU_CRYPTO_3): Likewise.
>         (BU_CRYPTO_OVERLOAD_1): Likewise.
>         (BU_CRYPTO_OVERLOAD_2): Likewise.
>         (XSCVSPDP): Fix typo, point to the correct instruction.
>         (VCIPHER): Add power8 crypto builtins.
>         (VCIPHERLAST): Likewise.
>         (VNCIPHER): Likewise.
>         (VNCIPHERLAST): Likewise.
>         (VPMSUMB): Likewise.
>         (VPMSUMH): Likewise.
>         (VPMSUMW): Likewise.
>         (VPERMXOR_V2DI): Likewise.
>         (VPERMXOR_V4SI): Likewise.
>         (VPERMXOR_V8HI): Likewise.
>         (VPERMXOR_V16QI): Likewise.
>         (VSHASIGMAW): Likewise.
>         (VSHASIGMAD): Likewise.
>         (VPMSUM): Likewise.
>         (VPERMXOR): Likewise.
>         (VSHASIGMA): Likewise.
>
>         * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
>         __CRYPTO__ if the crypto instructions are available.
>         (altivec_overloaded_builtins): Add support for overloaded power8
>         builtins.
>
>         * config/rs6000/rs6000.c (rs6000_expand_ternop_builtin): Add
>         support for power8 crypto builtins.
>         (builtin_function_type): Likewise.
>         (altivec_init_builtins): Add support for builtins that take vector
>         long long (V2DI) arguments.
>
>         * config/rs6000/crypto.md: New file, define power8 crypto
>         instructions.
>
> [gcc/testsuite]
> 2013-05-20  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/crypto-builtin-1.c: New file, test for power8
>         crypto builtins.

Patch #2 is okay.

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread
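
For a concrete sense of patch #2, here is a minimal usage sketch, added for
illustration only and not part of the submitted testsuite.  It assumes the
user-level __builtin_crypto_* names documented in the patch's extend.texi
hunk; compiled with -mcpu=power8, each call should map to the corresponding
ISA 2.07 instruction.

#include <altivec.h>

/* One AES encryption round: vcipher performs SubBytes, ShiftRows,
   MixColumns and the round-key xor in a single instruction.  */
vector unsigned long long
aes_round (vector unsigned long long state, vector unsigned long long key)
{
  return __builtin_crypto_vcipher (state, key);
}

/* SHA-512 sigma function (vshasigmad).  The two integer arguments
   select the function and mask; they must be literal constants, and
   the values used here are illustrative only.  */
vector unsigned long long
sha512_sigma (vector unsigned long long x)
{
  return __builtin_crypto_vshasigmad (x, 1, 0xf);
}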

* Re: [PATCH, rs6000] power8 patches, patch #6, direct move & basic quad load/store
  2013-05-20 20:41 [PATCH, rs6000] power8 patches Michael Meissner
                   ` (5 preceding siblings ...)
  2013-05-21 23:49 ` [PATCH, rs6000] power8 patches, patch #5, new vector tests Michael Meissner
@ 2013-05-22 14:26 ` Michael Meissner
  2013-05-29 19:53   ` David Edelsohn
  2013-05-22 16:51 ` [PATCH, rs6000] power8 patches, patch #7, quad/byte/half-word atomic instructions Michael Meissner
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-05-22 14:26 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, pthaugen, bergner

[-- Attachment #1: Type: text/plain, Size: 5513 bytes --]

This patch adds support for the power8 direct move instructions, which allow
data transfer between general purpose registers and VSX registers.  In
addition, it adds basic support for quad word load and store instructions in
insns that can use them.  The next patch will add support for the atomic
quad word load and store instructions.

This patch passes the basic bootstrap and adds no regressions.  Is it
acceptable to check in once the previous patches are checked in?
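
As a concrete illustration (not part of the patch or its testsuite), a
minimal sketch of code that can take the new path: with -mcpu=power8 -O2 on
a 64-bit target, the GPR/VSX transfers below may become mtvsrd/mfvsrd direct
moves instead of a store and reload through the stack.

#include <altivec.h>

/* Scalar to vector: the 64-bit GPR value can reach a VSX register via
   a direct move (mtvsrd) rather than through memory.  */
vector long long
gpr_to_vsx (long long x)
{
  return (vector long long) { x, x };
}

/* Vector to scalar: extracting element 0 can use a direct move
   (mfvsrd) in the other direction.  */
long long
vsx_to_gpr (vector long long v)
{
  return v[0];
}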

[gcc]
2013-05-21  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Pat Haugen <pthaugen@us.ibm.com>
	    Peter Bergner <bergner@vnet.ibm.com>

	* config/rs6000/vector.md (GPR move splitter): Do not split moves
	of vectors in GPRS if they are direct moves or quad word load or
	store moves.

	* config/rs6000/rs6000-protos.h (rs6000_output_move_128bit): Add
	declaration.
	(direct_move_p): Likewise.
	(quad_load_store_p): Likewise.

	* config/rs6000/rs6000.c (enum rs6000_reg_type): Simplify register
	classes into bins based on the physical register type.
	(reg_class_to_reg_type): Likewise.
	(IS_STD_REG_TYPE): Likewise.
	(IS_FP_VECT_REG_TYPE): Likewise.
	(reload_fpr_gpr): Arrays to determine what insn to use if we can
	use direct move instructions.
	(reload_gpr_vsx): Likewise.
	(reload_vsx_gpr): Likewise.
	(rs6000_init_hard_regno_mode_ok): Precalculate the register type
	information that is a simplification of register classes.  Also
	precalculate direct move reload helpers.
	(direct_move_p): New function to return true if the operation can
	be done as a direct move instruction.
	(quad_load_store_p): New function to return true if the operation
	is a quad memory operation.
	(rs6000_legitimize_address): If quad memory, only allow register
	indirect for TImode addresses.
	(rs6000_legitimate_address_p): Likewise.
	(enum reload_reg_type): Delete, replace with rs6000_reg_type.
	(rs6000_reload_register_type): Likewise.
	(register_to_reg_type): Return register type.
	(rs6000_secondary_reload_simple_move): New helper function for
	secondary reload and secondary memory needed to identify anything
	that is a simple move, and does not need reloading.
	(rs6000_secondary_reload_direct_move): New helper function for
	secondary reload to identify cases that can be done with several
	instructions via the direct move instructions.
	(rs6000_secondary_reload_move): New helper function for secondary
	reload to identify moves between register types that can be done.
	(rs6000_secondary_reload): Add support for quad memory operations
	and for direct move.
	(rs6000_secondary_memory_needed): Likewise.
	(rs6000_debug_secondary_memory_needed): Change argument names.
	(rs6000_output_move_128bit): New function to return the move to
	use for 128-bit moves, including knowing about the various
	limitations of quad memory operations.

	* config/rs6000/vsx.md (vsx_mov<mode>): Add support for quad
	memory operations.  Call rs6000_output_move_128bit for the actual
	instruction(s) to generate.
	(vsx_movti_64bit): Likewise.

	* config/rs6000/rs6000.md (UNSPEC_P8V_FMRGOW): New unspec values.
	(UNSPEC_P8V_MTVSRWZ): Likewise.
	(UNSPEC_P8V_RELOAD_FROM_GPR): Likewise.
	(UNSPEC_P8V_MTVSRD): Likewise.
	(UNSPEC_P8V_XXPERMDI): Likewise.
	(UNSPEC_P8V_RELOAD_FROM_VSX): Likewise.
	(UNSPEC_FUSION_GPR): Likewise.
	(FMOVE128_GPR): New iterator for direct move.
	(f32_lv): New mode attribute for load/store of SFmode/SDmode
	values.
	(f32_sv): Likewise.
	(f32_dm): Likewise.
	(zero_extend<mode>di2_internal1): Add support for power8 32-bit
	loads and direct move instructions.
	(zero_extendsidi2_lfiwzx): Likewise.
	(extendsidi2_lfiwax): Likewise.
	(extendsidi2_nocell): Likewise.
	(floatsi<mode>2_lfiwax): Likewise.
	(lfiwax): Likewise.
	(floatunssi<mode>2_lfiwzx): Likewise.
	(lfiwzx): Likewise.
	(fix_trunc<mode>_stfiwx): Likewise.
	(fixuns_trunc<mode>_stfiwx): Likewise.
	(mov<mode>_hardfloat, 32-bit floating point): Likewise.
	(mov<mode>_hardfloat64, 64-bit floating point): Likewise.
	(parity<mode>2_cmpb): Set length/type attr.
	(unnamed shift right patterns, mov<mode>_internal2): Change type attr
	for 'mr.' to fast_compare.
	(bpermd_<mode>): Change type attr to popcnt.
	(p8_fmrgow_<mode>): New insns for power8 direct move support.
	(p8_mtvsrwz_1): Likewise.
	(p8_mtvsrwz_2): Likewise.
	(reload_fpr_from_gpr<mode>): Likewise.
	(p8_mtvsrd_1): Likewise.
	(p8_mtvsrd_2): Likewise.
	(p8_xxpermdi_<mode>): Likewise.
	(reload_vsx_from_gpr<mode>): Likewise.
	(reload_vsx_from_gprsf): Likewise.
	(p8_mfvsrd_3_<mode>): Likewise.
	(reload_gpr_from_vsx<mode>): Likewise.
	(reload_gpr_from_vsxsf): Likewise.
	(p8_mfvsrd_4_disf): Likewise.
	(multi-word GPR splits): Do not split direct moves or quad memory
	operations.

[gcc/testsuite]
2013-05-21  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Pat Haugen <pthaugen@us.ibm.com>
	    Peter Bergner <bergner@vnet.ibm.com>

	* gcc.target/powerpc/direct-move-vint1.c: New tests for power8
	direct move instructions.
	* gcc.target/powerpc/direct-move-vint2.c: Likewise.
	* gcc.target/powerpc/direct-move.h: Likewise.
	* gcc.target/powerpc/direct-move-float1.c: Likewise.
	* gcc.target/powerpc/direct-move-float2.c: Likewise.
	* gcc.target/powerpc/direct-move-double1.c: Likewise.
	* gcc.target/powerpc/direct-move-double2.c: Likewise.
	* gcc.target/powerpc/direct-move-long1.c: Likewise.
	* gcc.target/powerpc/direct-move-long2.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.official-06b --]
[-- Type: text/plain, Size: 69836 bytes --]

Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 199168)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -126,7 +126,9 @@ (define_split
         (match_operand:VEC_L 1 "input_operand" ""))]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
    && reload_completed
-   && gpr_or_gpr_p (operands[0], operands[1])"
+   && gpr_or_gpr_p (operands[0], operands[1])
+   && !direct_move_p (operands[0], operands[1])
+   && !quad_load_store_p (operands[0], operands[1])"
   [(pc)]
 {
   rs6000_split_multireg_move (operands[0], operands[1]);
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 199037)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -50,6 +50,7 @@ extern rtx rs6000_got_register (rtx);
 extern rtx find_addr_reg (rtx);
 extern rtx gen_easy_altivec_constant (rtx);
 extern const char *output_vec_const_move (rtx *);
+extern const char *rs6000_output_move_128bit (rtx *);
 extern void rs6000_expand_vector_init (rtx, rtx);
 extern void paired_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, int);
@@ -70,6 +71,8 @@ extern int insvdi_rshift_rlwimi_p (rtx, 
 extern int registers_ok_for_quad_peep (rtx, rtx);
 extern int mems_ok_for_quad_peep (rtx, rtx);
 extern bool gpr_or_gpr_p (rtx, rtx);
+extern bool direct_move_p (rtx, rtx);
+extern bool quad_load_store_p (rtx, rtx);
 extern enum reg_class (*rs6000_preferred_reload_class_ptr) (rtx,
 							    enum reg_class);
 extern enum reg_class (*rs6000_secondary_reload_class_ptr) (enum reg_class,
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 199168)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -292,6 +292,39 @@ typedef rtx (*gen_2arg_fn_t) (rtx, rtx, 
    don't link in rs6000-c.c, so we can't call it directly.  */
 void (*rs6000_target_modify_macros_ptr) (bool, HOST_WIDE_INT, HOST_WIDE_INT);
 
+/* Map register classes into simpler classifications.  We assume
+   GPR_REG_TYPE - FPR_REG_TYPE are ordered so that we can use a simple range
+   check for standard register classes (gpr/floating/altivec/vsx) and
+   floating/vector classes (float/altivec/vsx).  */
+
+enum rs6000_reg_type {
+  NO_REG_TYPE,
+  PSEUDO_REG_TYPE,
+  GPR_REG_TYPE,
+  VSX_REG_TYPE,
+  ALTIVEC_REG_TYPE,
+  FPR_REG_TYPE,
+  SPR_REG_TYPE,
+  CR_REG_TYPE,
+  SPE_ACC_TYPE,
+  SPEFSCR_REG_TYPE
+};
+
+/* Map register class to register type.  */
+static enum rs6000_reg_type reg_class_to_reg_type[N_REG_CLASSES];
+
+/* First/last register type for the 'normal' register types (i.e. general
+   purpose, floating point, altivec, and VSX registers).  */
+#define IS_STD_REG_TYPE(RTYPE) IN_RANGE(RTYPE, GPR_REG_TYPE, FPR_REG_TYPE)
+
+#define IS_FP_VECT_REG_TYPE(RTYPE) IN_RANGE(RTYPE, VSX_REG_TYPE, FPR_REG_TYPE)
+
+/* Direct moves to/from vsx/gpr registers that need an additional register to
+   do the move.  */
+static enum insn_code reload_fpr_gpr[NUM_MACHINE_MODES];
+static enum insn_code reload_gpr_vsx[NUM_MACHINE_MODES];
+static enum insn_code reload_vsx_gpr[NUM_MACHINE_MODES];
+
 \f
 /* Target cpu costs.  */
 
@@ -1042,6 +1075,13 @@ static void rs6000_print_isa_options (FI
 static void rs6000_print_builtin_options (FILE *, int, const char *,
 					  HOST_WIDE_INT);
 
+static enum rs6000_reg_type register_to_reg_type (rtx, bool *);
+static bool rs6000_secondary_reload_move (enum rs6000_reg_type,
+					  enum rs6000_reg_type,
+					  enum machine_mode,
+					  secondary_reload_info *,
+					  bool);
+
 /* Hash table stuff for keeping track of TOC entries.  */
 
 struct GTY(()) toc_hash_struct
@@ -1587,8 +1627,7 @@ rs6000_hard_regno_mode_ok (int regno, en
 	return ALTIVEC_REGNO_P (last_regno);
     }
 
-  /* Allow TImode in all VSX registers if the user asked for it.  Note, PTImode
-     can only go in GPRs.  */
+  /* Allow TImode in all VSX registers if the user asked for it.  */
   if (mode == TImode && TARGET_VSX_TIMODE && VSX_REGNO_P (regno))
     return 1;
 
@@ -2154,6 +2193,36 @@ rs6000_init_hard_regno_mode_ok (bool glo
   rs6000_regno_regclass[ARG_POINTER_REGNUM] = BASE_REGS;
   rs6000_regno_regclass[FRAME_POINTER_REGNUM] = BASE_REGS;
 
+  /* Precalculate the mapping from register class to the simpler register
+     type.  We don't need all of the register classes that are combinations of
+     different classes, just the simple ones that have constraint letters.  */
+  for (c = 0; c < N_REG_CLASSES; c++)
+    reg_class_to_reg_type[c] = NO_REG_TYPE;
+
+  reg_class_to_reg_type[(int)GENERAL_REGS] = GPR_REG_TYPE;
+  reg_class_to_reg_type[(int)BASE_REGS] = GPR_REG_TYPE;
+  reg_class_to_reg_type[(int)VSX_REGS] = VSX_REG_TYPE;
+  reg_class_to_reg_type[(int)VRSAVE_REGS] = SPR_REG_TYPE;
+  reg_class_to_reg_type[(int)VSCR_REGS] = SPR_REG_TYPE;
+  reg_class_to_reg_type[(int)LINK_REGS] = SPR_REG_TYPE;
+  reg_class_to_reg_type[(int)CTR_REGS] = SPR_REG_TYPE;
+  reg_class_to_reg_type[(int)LINK_OR_CTR_REGS] = SPR_REG_TYPE;
+  reg_class_to_reg_type[(int)CR_REGS] = CR_REG_TYPE;
+  reg_class_to_reg_type[(int)CR0_REGS] = CR_REG_TYPE;
+  reg_class_to_reg_type[(int)SPE_ACC_REGS] = SPE_ACC_TYPE;
+  reg_class_to_reg_type[(int)SPEFSCR_REGS] = SPEFSCR_REG_TYPE;
+
+  if (TARGET_VSX)
+    {
+      reg_class_to_reg_type[(int)FLOAT_REGS] = VSX_REG_TYPE;
+      reg_class_to_reg_type[(int)ALTIVEC_REGS] = VSX_REG_TYPE;
+    }
+  else
+    {
+      reg_class_to_reg_type[(int)FLOAT_REGS] = FPR_REG_TYPE;
+      reg_class_to_reg_type[(int)ALTIVEC_REGS] = ALTIVEC_REG_TYPE;
+    }
+
   /* Precalculate vector information, this must be set up before the
      rs6000_hard_regno_nregs_internal below.  */
   for (m = 0; m < NUM_MACHINE_MODES; ++m)
@@ -2305,7 +2374,15 @@ rs6000_init_hard_regno_mode_ok (bool glo
   if (TARGET_LFIWZX)
     rs6000_constraints[RS6000_CONSTRAINT_wz] = FLOAT_REGS;
 
-  /* Set up the reload helper functions.  */
+  /* Setup the direct move combinations.  */
+  for (m = 0; m < NUM_MACHINE_MODES; ++m)
+    {
+      reload_fpr_gpr[m] = CODE_FOR_nothing;
+      reload_gpr_vsx[m] = CODE_FOR_nothing;
+      reload_vsx_gpr[m] = CODE_FOR_nothing;
+    }
+
+  /* Set up the reload helper and direct move functions.  */
   if (TARGET_VSX || TARGET_ALTIVEC)
     {
       if (TARGET_64BIT)
@@ -2329,11 +2406,47 @@ rs6000_init_hard_regno_mode_ok (bool glo
 	      rs6000_vector_reload[DDmode][0]  = CODE_FOR_reload_dd_di_store;
 	      rs6000_vector_reload[DDmode][1]  = CODE_FOR_reload_dd_di_load;
 	    }
+	  if (TARGET_P8_VECTOR)
+	    {
+	      rs6000_vector_reload[SFmode][0]  = CODE_FOR_reload_sf_di_store;
+	      rs6000_vector_reload[SFmode][1]  = CODE_FOR_reload_sf_di_load;
+	      rs6000_vector_reload[SDmode][0]  = CODE_FOR_reload_sd_di_store;
+	      rs6000_vector_reload[SDmode][1]  = CODE_FOR_reload_sd_di_load;
+	    }
 	  if (TARGET_VSX_TIMODE)
 	    {
 	      rs6000_vector_reload[TImode][0]  = CODE_FOR_reload_ti_di_store;
 	      rs6000_vector_reload[TImode][1]  = CODE_FOR_reload_ti_di_load;
 	    }
+	  if (TARGET_DIRECT_MOVE)
+	    {
+	      if (TARGET_POWERPC64)
+		{
+		  reload_gpr_vsx[TImode]    = CODE_FOR_reload_gpr_from_vsxti;
+		  reload_gpr_vsx[V2DFmode]  = CODE_FOR_reload_gpr_from_vsxv2df;
+		  reload_gpr_vsx[V2DImode]  = CODE_FOR_reload_gpr_from_vsxv2di;
+		  reload_gpr_vsx[V4SFmode]  = CODE_FOR_reload_gpr_from_vsxv4sf;
+		  reload_gpr_vsx[V4SImode]  = CODE_FOR_reload_gpr_from_vsxv4si;
+		  reload_gpr_vsx[V8HImode]  = CODE_FOR_reload_gpr_from_vsxv8hi;
+		  reload_gpr_vsx[V16QImode] = CODE_FOR_reload_gpr_from_vsxv16qi;
+		  reload_gpr_vsx[SFmode]    = CODE_FOR_reload_gpr_from_vsxsf;
+
+		  reload_vsx_gpr[TImode]    = CODE_FOR_reload_vsx_from_gprti;
+		  reload_vsx_gpr[V2DFmode]  = CODE_FOR_reload_vsx_from_gprv2df;
+		  reload_vsx_gpr[V2DImode]  = CODE_FOR_reload_vsx_from_gprv2di;
+		  reload_vsx_gpr[V4SFmode]  = CODE_FOR_reload_vsx_from_gprv4sf;
+		  reload_vsx_gpr[V4SImode]  = CODE_FOR_reload_vsx_from_gprv4si;
+		  reload_vsx_gpr[V8HImode]  = CODE_FOR_reload_vsx_from_gprv8hi;
+		  reload_vsx_gpr[V16QImode] = CODE_FOR_reload_vsx_from_gprv16qi;
+		  reload_vsx_gpr[SFmode]    = CODE_FOR_reload_vsx_from_gprsf;
+		}
+	      else
+		{
+		  reload_fpr_gpr[DImode] = CODE_FOR_reload_fpr_from_gprdi;
+		  reload_fpr_gpr[DDmode] = CODE_FOR_reload_fpr_from_gprdd;
+		  reload_fpr_gpr[DFmode] = CODE_FOR_reload_fpr_from_gprdf;
+		}
+	    }
 	}
       else
 	{
@@ -2356,6 +2469,13 @@ rs6000_init_hard_regno_mode_ok (bool glo
 	      rs6000_vector_reload[DDmode][0]  = CODE_FOR_reload_dd_si_store;
 	      rs6000_vector_reload[DDmode][1]  = CODE_FOR_reload_dd_si_load;
 	    }
+	  if (TARGET_P8_VECTOR)
+	    {
+	      rs6000_vector_reload[SFmode][0]  = CODE_FOR_reload_sf_si_store;
+	      rs6000_vector_reload[SFmode][1]  = CODE_FOR_reload_sf_si_load;
+	      rs6000_vector_reload[SDmode][0]  = CODE_FOR_reload_sd_si_store;
+	      rs6000_vector_reload[SDmode][1]  = CODE_FOR_reload_sd_si_load;
+	    }
 	  if (TARGET_VSX_TIMODE)
 	    {
 	      rs6000_vector_reload[TImode][0]  = CODE_FOR_reload_ti_si_store;
@@ -5405,6 +5525,72 @@ gpr_or_gpr_p (rtx op0, rtx op1)
 	  || (REG_P (op1) && INT_REGNO_P (REGNO (op1))));
 }
 
+/* Return true if this is a move direct operation between GPR registers and
+   floating point/VSX registers.  */
+
+bool
+direct_move_p (rtx op0, rtx op1)
+{
+  int regno0, regno1;
+
+  if (!REG_P (op0) || !REG_P (op1))
+    return false;
+
+  if (!TARGET_DIRECT_MOVE && !TARGET_MFPGPR)
+    return false;
+
+  regno0 = REGNO (op0);
+  regno1 = REGNO (op1);
+  if (regno0 >= FIRST_PSEUDO_REGISTER || regno1 >= FIRST_PSEUDO_REGISTER)
+    return false;
+
+  if (INT_REGNO_P (regno0))
+    return (TARGET_DIRECT_MOVE) ? VSX_REGNO_P (regno1) : FP_REGNO_P (regno1);
+
+  else if (INT_REGNO_P (regno1))
+    {
+      if (TARGET_MFPGPR && FP_REGNO_P (regno0))
+	return true;
+
+      else if (TARGET_DIRECT_MOVE && VSX_REGNO_P (regno0))
+	return true;
+    }
+
+  return false;
+}
+
+/* Return true if this is a load or store quad operation.  */
+
+bool
+quad_load_store_p (rtx op0, rtx op1)
+{
+  bool ret;
+
+  if (!TARGET_QUAD_MEMORY)
+    ret = false;
+
+  else if (REG_P (op0) && MEM_P (op1))
+    ret = (quad_int_reg_operand (op0, GET_MODE (op0))
+	   && quad_memory_operand (op1, GET_MODE (op1))
+	   && !reg_overlap_mentioned_p (op0, op1));
+
+  else if (MEM_P (op0) && REG_P (op1))
+    ret = (quad_memory_operand (op0, GET_MODE (op0))
+	   && quad_int_reg_operand (op1, GET_MODE (op1)));
+
+  else
+    ret = false;
+
+  if (TARGET_DEBUG_ADDR)
+    {
+      fprintf (stderr, "\n========== quad_load_store, return %s\n",
+	       ret ? "true" : "false");
+      debug_rtx (gen_rtx_SET (VOIDmode, op0, op1));
+    }
+
+  return ret;
+}
+
 /* Given an address, return a constant offset term if one exists.  */
 
 static rtx
@@ -5912,8 +6098,11 @@ rs6000_legitimize_address (rtx x, rtx ol
       if (GET_CODE (x) == PLUS && XEXP (x, 1) == const0_rtx)
 	return force_reg (Pmode, XEXP (x, 0));
 
+      /* For TImode with load/store quad, restrict addresses to just a single
+	 base register, so the address works with both GPRs and VSX registers.  */
       /* Make sure both operands are registers.  */
-      else if (GET_CODE (x) == PLUS)
+      else if (GET_CODE (x) == PLUS
+	       && (mode != TImode || !TARGET_QUAD_MEMORY))
 	return gen_rtx_PLUS (Pmode,
 			     force_reg (Pmode, XEXP (x, 0)),
 			     force_reg (Pmode, XEXP (x, 1)));
@@ -6868,6 +7057,13 @@ rs6000_legitimate_address_p (enum machin
   if (reg_offset_p
       && legitimate_constant_pool_address_p (x, mode, reg_ok_strict))
     return 1;
+  /* For TImode, if we have load/store quad, only allow register indirect
+     addresses.  This will allow the values to go in either GPRs or VSX
+     registers without reloading.  The vector types would tend to go into VSX
+     registers, so we allow REG+REG, while TImode seems somewhat split, in that
+     some uses are GPR based, and some VSX based.  */
+  if (mode == TImode && TARGET_QUAD_MEMORY)
+    return 0;
   /* If not REG_OK_STRICT (before reload) let pass any stack offset.  */
   if (! reg_ok_strict
       && reg_offset_p
@@ -14014,29 +14210,226 @@ rs6000_check_sdmode (tree *tp, int *walk
   return NULL_TREE;
 }
 
-enum reload_reg_type {
-  GPR_REGISTER_TYPE,
-  VECTOR_REGISTER_TYPE,
-  OTHER_REGISTER_TYPE
-};
+/* Classify a register type.  Because the FMRGOW/FMRGEW instructions only work
+   on traditional floating point registers, and the VMRGOW/VMRGEW instructions
+   only work on the traditional altivec registers, note if an altivec register
+   was chosen.  */
 
-static enum reload_reg_type
-rs6000_reload_register_type (enum reg_class rclass)
+static enum rs6000_reg_type
+register_to_reg_type (rtx reg, bool *is_altivec)
 {
-  switch (rclass)
+  HOST_WIDE_INT regno;
+  enum reg_class rclass;
+
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+
+  if (!REG_P (reg))
+    return NO_REG_TYPE;
+
+  regno = REGNO (reg);
+  if (regno >= FIRST_PSEUDO_REGISTER)
     {
-    case GENERAL_REGS:
-    case BASE_REGS:
-      return GPR_REGISTER_TYPE;
+      if (!lra_in_progress && !reload_in_progress && !reload_completed)
+	return PSEUDO_REG_TYPE;
 
-    case FLOAT_REGS:
-    case ALTIVEC_REGS:
-    case VSX_REGS:
-      return VECTOR_REGISTER_TYPE;
+      regno = true_regnum (reg);
+      if (regno < 0 || regno >= FIRST_PSEUDO_REGISTER)
+	return PSEUDO_REG_TYPE;
+    }	
 
-    default:
-      return OTHER_REGISTER_TYPE;
+  gcc_assert (regno >= 0);
+
+  if (is_altivec && ALTIVEC_REGNO_P (regno))
+    *is_altivec = true;
+
+  rclass = rs6000_regno_regclass[regno];
+  return reg_class_to_reg_type[(int)rclass];
+}
+
+/* Helper function for rs6000_secondary_reload to return true if a move to a
+   different register class is really a simple move.  */
+
+static bool
+rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type,
+				     enum rs6000_reg_type from_type,
+				     enum machine_mode mode)
+{
+  int size;
+
+  /* Add support for various direct moves available.  In this function, we only
+     look at cases where we don't need any extra registers, and one or more
+     simple move insns are issued.  At present, 32-bit integers are not allowed
+     in FPR/VSX registers.  Single precision binary floating is not a simple
+     move because we need to convert to the single precision memory layout.
+     The 4-byte SDmode can be moved.  */
+  size = GET_MODE_SIZE (mode);
+  if (TARGET_DIRECT_MOVE
+      && ((mode == SDmode) || (TARGET_POWERPC64 && size == 8))
+      && ((to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)
+	  || (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)))
+    return true;
+
+  else if (TARGET_MFPGPR && TARGET_POWERPC64 && size == 8
+	   && ((to_type == GPR_REG_TYPE && from_type == FPR_REG_TYPE)
+	       || (to_type == FPR_REG_TYPE && from_type == GPR_REG_TYPE)))
+    return true;
+
+  else if ((size == 4 || (TARGET_POWERPC64 && size == 8))
+	   && ((to_type == GPR_REG_TYPE && from_type == SPR_REG_TYPE)
+	       || (to_type == SPR_REG_TYPE && from_type == GPR_REG_TYPE)))
+    return true;
+
+  return false;
+}
+
+/* Power8 helper function for rs6000_secondary_reload, handle all of the
+   special direct moves that involve allocating an extra register.  Return
+   true if such a direct move is available, and if SRI is non-null record
+   the helper's insn code and extra cost in it.  */
+
+static bool
+rs6000_secondary_reload_direct_move (enum rs6000_reg_type to_type,
+				     enum rs6000_reg_type from_type,
+				     enum machine_mode mode,
+				     secondary_reload_info *sri,
+				     bool altivec_p)
+{
+  bool ret = false;
+  enum insn_code icode = CODE_FOR_nothing;
+  int cost = 0;
+  int size = GET_MODE_SIZE (mode);
+
+  if (TARGET_POWERPC64)
+    {
+      if (size == 16)
+	{
+	  /* Handle moving 128-bit values from GPRs to VSX registers on
+	     power8 when running in 64-bit mode using XXPERMDI to glue the two
+	     64-bit values back together.  */
+	  if (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
+	    {
+	      cost = 3;			/* 2 mtvsrd's, 1 xxpermdi.  */
+	      icode = reload_vsx_gpr[(int)mode];
+	    }
+
+	  /* Handle moving 128-bit values from VSX registers to GPRs on
+	     power8 when running in 64-bit mode using XXPERMDI to get access to the
+	     bottom 64-bit value.  */
+	  else if (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)
+	    {
+	      cost = 3;			/* 2 mfvsrd's, 1 xxpermdi.  */
+	      icode = reload_gpr_vsx[(int)mode];
+	    }
+	}
+
+      else if (mode == SFmode)
+	{
+	  if (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)
+	    {
+	      cost = 3;			/* xscvdpspn, mfvsrd, and.  */
+	      icode = reload_gpr_vsx[(int)mode];
+	    }
+
+	  else if (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
+	    {
+	      cost = 2;			/* mtvsrz, xscvspdpn.  */
+	      icode = reload_vsx_gpr[(int)mode];
+	    }
+	}
+    }
+
+  if (TARGET_POWERPC64 && size == 16)
+    {
+      /* Handle moving 128-bit values from GPRs to VSX registers on
+	 power8 when running in 64-bit mode using XXPERMDI to glue the two
+	 64-bit values back together.  */
+      if (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
+	{
+	  cost = 3;			/* 2 mtvsrd's, 1 xxpermdi.  */
+	  icode = reload_vsx_gpr[(int)mode];
+	}
+
+      /* Handle moving 128-bit values from VSX registers to GPRs on
+	 power8 when running in 64-bit mode using XXPERMDI to get access to the
+	 bottom 64-bit value.  */
+      else if (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)
+	{
+	  cost = 3;			/* 2 mfvsrd's, 1 xxpermdi.  */
+	  icode = reload_gpr_vsx[(int)mode];
+	}
+    }
+
+  else if (!TARGET_POWERPC64 && size == 8)
+    {
+      /* Handle moving 64-bit values from GPRs to floating point registers on
+	 power8 when running in 32-bit mode using FMRGOW to glue the two 32-bit
+	 values back together.  Altivec register classes must be handled
+	 specially since a different instruction is used, and the secondary
+	 reload support requires a single instruction class in the scratch
+	 register constraint.  However, right now TFmode is not allowed in
+	 Altivec registers, so the pattern will never match.  */
+      if (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE && !altivec_p)
+	{
+	  cost = 3;			/* 2 mtvsrwz's, 1 fmrgow.  */
+	  icode = reload_fpr_gpr[(int)mode];
+	}
     }
+
+  if (icode != CODE_FOR_nothing)
+    {
+      ret = true;
+      if (sri)
+	{
+	  sri->icode = icode;
+	  sri->extra_cost = cost;
+	}
+    }
+
+  return ret;
+}
+
+/* Return whether a move between two register classes can be done either
+   directly (simple move) or via a pattern that uses a single extra temporary
+   (using power8's direct move in this case).  */
+
+static bool
+rs6000_secondary_reload_move (enum rs6000_reg_type to_type,
+			      enum rs6000_reg_type from_type,
+			      enum machine_mode mode,
+			      secondary_reload_info *sri,
+			      bool altivec_p)
+{
+  /* Fall back to load/store reloads if either type is not a register.  */
+  if (to_type == NO_REG_TYPE || from_type == NO_REG_TYPE)
+    return false;
+
+  /* If we haven't allocated registers yet, assume the move can be done for the
+     standard register types.  */
+  if ((to_type == PSEUDO_REG_TYPE && from_type == PSEUDO_REG_TYPE)
+      || (to_type == PSEUDO_REG_TYPE && IS_STD_REG_TYPE (from_type))
+      || (from_type == PSEUDO_REG_TYPE && IS_STD_REG_TYPE (to_type)))
+    return true;
+
+  /* A move within the same set of registers is a simple move for
+     non-specialized registers.  */
+  if (to_type == from_type && IS_STD_REG_TYPE (to_type))
+    return true;
+
+  /* Check whether a simple move can be done directly.  */
+  if (rs6000_secondary_reload_simple_move (to_type, from_type, mode))
+    {
+      if (sri)
+	{
+	  sri->icode = CODE_FOR_nothing;
+	  sri->extra_cost = 0;
+	}
+      return true;
+    }
+
+  /* Now check if we can do it in a few steps.  */
+  return rs6000_secondary_reload_direct_move (to_type, from_type, mode, sri,
+					      altivec_p);
 }
 
 /* Inform reload about cases where moving X with a mode MODE to a register in
@@ -14062,11 +14455,32 @@ rs6000_secondary_reload (bool in_p,
   bool default_p = false;
 
   sri->icode = CODE_FOR_nothing;
-
-  /* Convert vector loads and stores into gprs to use an additional base
-     register.  */
   icode = rs6000_vector_reload[mode][in_p != false];
-  if (icode != CODE_FOR_nothing)
+
+  if (REG_P (x) || register_operand (x, mode))
+    {
+      enum rs6000_reg_type to_type = reg_class_to_reg_type[(int)rclass];
+      bool altivec_p = (rclass == ALTIVEC_REGS);
+      enum rs6000_reg_type from_type = register_to_reg_type (x, &altivec_p);
+
+      if (!in_p)
+	{
+	  enum rs6000_reg_type exchange = to_type;
+	  to_type = from_type;
+	  from_type = exchange;
+	}
+
+      if (rs6000_secondary_reload_move (to_type, from_type, mode, sri,
+					altivec_p))
+	{
+	  icode = (enum insn_code)sri->icode;
+	  default_p = false;
+	  ret = NO_REGS;
+	}
+    }
+
+  /* Handle vector moves with reload helper functions.  */
+  if (ret == ALL_REGS && icode != CODE_FOR_nothing)
     {
       ret = NO_REGS;
       sri->icode = CODE_FOR_nothing;
@@ -14078,12 +14492,21 @@ rs6000_secondary_reload (bool in_p,
 
 	  /* Loads to and stores from gprs can do reg+offset, and wouldn't need
 	     an extra register in that case, but it would need an extra
-	     register if the addressing is reg+reg or (reg+reg)&(-16).  */
+	     register if the addressing is reg+reg or (reg+reg)&(-16).  Special
+	     case load/store quad.  */
 	  if (rclass == GENERAL_REGS || rclass == BASE_REGS)
 	    {
-	      if (!legitimate_indirect_address_p (addr, false)
-		  && !rs6000_legitimate_offset_address_p (PTImode, addr,
-							  false, true))
+	      if (TARGET_POWERPC64 && TARGET_QUAD_MEMORY
+		  && GET_MODE_SIZE (mode) == 16
+		  && quad_memory_operand (x, mode))
+		{
+		  sri->icode = icode;
+		  sri->extra_cost = 2;
+		}
+
+	      else if (!legitimate_indirect_address_p (addr, false)
+		       && !rs6000_legitimate_offset_address_p (PTImode, addr,
+							       false, true))
 		{
 		  sri->icode = icode;
 		  /* account for splitting the loads, and converting the
@@ -14097,7 +14520,7 @@ rs6000_secondary_reload (bool in_p,
          else if ((rclass == FLOAT_REGS || rclass == NO_REGS)
                   && (GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8)
                   && (legitimate_indirect_address_p (addr, false)
-                      || legitimate_indirect_address_p (XEXP (addr, 0), false)
+                      || legitimate_indirect_address_p (addr, false)
                       || rs6000_legitimate_offset_address_p (mode, addr,
                                                              false, true)))
 
@@ -14149,12 +14572,12 @@ rs6000_secondary_reload (bool in_p,
 	  else
 	    {
 	      enum reg_class xclass = REGNO_REG_CLASS (regno);
-	      enum reload_reg_type rtype1 = rs6000_reload_register_type (rclass);
-	      enum reload_reg_type rtype2 = rs6000_reload_register_type (xclass);
+	      enum rs6000_reg_type rtype1 = reg_class_to_reg_type[(int)rclass];
+	      enum rs6000_reg_type rtype2 = reg_class_to_reg_type[(int)xclass];
 
 	      /* If memory is needed, use default_secondary_reload to create the
 		 stack slot.  */
-	      if (rtype1 != rtype2 || rtype1 == OTHER_REGISTER_TYPE)
+	      if (rtype1 != rtype2 || !IS_STD_REG_TYPE (rtype1))
 		default_p = true;
 	      else
 		ret = NO_REGS;
@@ -14164,7 +14587,7 @@ rs6000_secondary_reload (bool in_p,
 	default_p = true;
     }
   else if (TARGET_POWERPC64
-	   && rs6000_reload_register_type (rclass) == GPR_REGISTER_TYPE
+	   && reg_class_to_reg_type[(int)rclass] == GPR_REG_TYPE
 	   && MEM_P (x)
 	   && GET_MODE_SIZE (GET_MODE (x)) >= UNITS_PER_WORD)
     {
@@ -14203,7 +14626,7 @@ rs6000_secondary_reload (bool in_p,
 	default_p = true;
     }
   else if (!TARGET_POWERPC64
-	   && rs6000_reload_register_type (rclass) == GPR_REGISTER_TYPE
+	   && reg_class_to_reg_type[(int)rclass] == GPR_REG_TYPE
 	   && MEM_P (x)
 	   && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD)
     {
@@ -14766,42 +15189,25 @@ rs6000_debug_preferred_reload_class (rtx
    set and vice versa.  */
 
 static bool
-rs6000_secondary_memory_needed (enum reg_class class1,
-				enum reg_class class2,
+rs6000_secondary_memory_needed (enum reg_class from_class,
+				enum reg_class to_class,
 				enum machine_mode mode)
 {
-  if (class1 == class2)
-    return false;
-
-  /* Under VSX, there are 3 register classes that values could be in (VSX_REGS,
-     ALTIVEC_REGS, and FLOAT_REGS).  We don't need to use memory to copy
-     between these classes.  But we need memory for other things that can go in
-     FLOAT_REGS like SFmode.  */
-  if (TARGET_VSX
-      && (VECTOR_MEM_VSX_P (mode) || VECTOR_UNIT_VSX_P (mode))
-      && (class1 == VSX_REGS || class1 == ALTIVEC_REGS
-	  || class1 == FLOAT_REGS))
-    return (class2 != VSX_REGS && class2 != ALTIVEC_REGS
-	    && class2 != FLOAT_REGS);
-
-  if (class1 == VSX_REGS || class2 == VSX_REGS)
-    return true;
-
-  if (class1 == FLOAT_REGS
-      && (!TARGET_MFPGPR || !TARGET_POWERPC64
-	  || ((mode != DFmode)
-	      && (mode != DDmode)
-	      && (mode != DImode))))
-    return true;
+  enum rs6000_reg_type from_type, to_type;
+  bool altivec_p = ((from_class == ALTIVEC_REGS)
+		    || (to_class == ALTIVEC_REGS));
+
+  /* If a simple/direct move is available, we don't need secondary memory.  */
+  from_type = reg_class_to_reg_type[(int)from_class];
+  to_type = reg_class_to_reg_type[(int)to_class];
 
-  if (class2 == FLOAT_REGS
-      && (!TARGET_MFPGPR || !TARGET_POWERPC64
-	  || ((mode != DFmode)
-	      && (mode != DDmode)
-	      && (mode != DImode))))
-    return true;
+  if (rs6000_secondary_reload_move (to_type, from_type, mode,
+				    (secondary_reload_info *)0, altivec_p))
+    return false;
 
-  if (class1 == ALTIVEC_REGS || class2 == ALTIVEC_REGS)
+  /* If we have a floating point or vector register class, we need to use
+     memory to transfer the data.  */
+  if (IS_FP_VECT_REG_TYPE (from_type) || IS_FP_VECT_REG_TYPE (to_type))
     return true;
 
   return false;
@@ -14809,17 +15215,19 @@ rs6000_secondary_memory_needed (enum reg
 
 /* Debug version of rs6000_secondary_memory_needed.  */
 static bool
-rs6000_debug_secondary_memory_needed (enum reg_class class1,
-				      enum reg_class class2,
+rs6000_debug_secondary_memory_needed (enum reg_class from_class,
+				      enum reg_class to_class,
 				      enum machine_mode mode)
 {
-  bool ret = rs6000_secondary_memory_needed (class1, class2, mode);
+  bool ret = rs6000_secondary_memory_needed (from_class, to_class, mode);
 
   fprintf (stderr,
-	   "rs6000_secondary_memory_needed, return: %s, class1 = %s, "
-	   "class2 = %s, mode = %s\n",
-	   ret ? "true" : "false", reg_class_names[class1],
-	   reg_class_names[class2], GET_MODE_NAME (mode));
+	   "rs6000_secondary_memory_needed, return: %s, from_class = %s, "
+	   "to_class = %s, mode = %s\n",
+	   ret ? "true" : "false",
+	   reg_class_names[from_class],
+	   reg_class_names[to_class],
+	   GET_MODE_NAME (mode));
 
   return ret;
 }
@@ -15025,6 +15433,170 @@ rs6000_debug_cannot_change_mode_class (e
   return ret;
 }
 \f
+/* Return a string to do a move operation of 128 bits of data.  */
+
+const char *
+rs6000_output_move_128bit (rtx operands[])
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  enum machine_mode mode = GET_MODE (dest);
+  int dest_regno;
+  int src_regno;
+  bool dest_gpr_p, dest_fp_p, dest_av_p, dest_vsx_p;
+  bool src_gpr_p, src_fp_p, src_av_p, src_vsx_p;
+
+  if (REG_P (dest))
+    {
+      dest_regno = REGNO (dest);
+      dest_gpr_p = INT_REGNO_P (dest_regno);
+      dest_fp_p = FP_REGNO_P (dest_regno);
+      dest_av_p = ALTIVEC_REGNO_P (dest_regno);
+      dest_vsx_p = dest_fp_p | dest_av_p;
+    }
+  else
+    {
+      dest_regno = -1;
+      dest_gpr_p = dest_fp_p = dest_av_p = dest_vsx_p = false;
+    }
+
+  if (REG_P (src))
+    {
+      src_regno = REGNO (src);
+      src_gpr_p = INT_REGNO_P (src_regno);
+      src_fp_p = FP_REGNO_P (src_regno);
+      src_av_p = ALTIVEC_REGNO_P (src_regno);
+      src_vsx_p = src_fp_p | src_av_p;
+    }
+  else
+    {
+      src_regno = -1;
+      src_gpr_p = src_fp_p = src_av_p = src_vsx_p = false;
+    }
+
+  /* Register moves.  */
+  if (dest_regno >= 0 && src_regno >= 0)
+    {
+      if (dest_gpr_p)
+	{
+	  if (src_gpr_p)
+	    return "#";
+
+	  else if (TARGET_VSX && TARGET_DIRECT_MOVE && src_vsx_p)
+	    return "#";
+	}
+
+      else if (TARGET_VSX && dest_vsx_p)
+	{
+	  if (src_vsx_p)
+	    return "xxlor %x0,%x1,%x1";
+
+	  else if (TARGET_DIRECT_MOVE && src_gpr_p)
+	    return "#";
+	}
+
+      else if (TARGET_ALTIVEC && dest_av_p && src_av_p)
+	return "vor %0,%1,%1";
+
+      else if (dest_fp_p && src_fp_p)
+	return "#";
+    }
+
+  /* Loads.  */
+  else if (dest_regno >= 0 && MEM_P (src))
+    {
+      if (dest_gpr_p)
+	{
+	  if (TARGET_QUAD_MEMORY && (dest_regno & 1) == 0
+	      && quad_memory_operand (src, mode)
+	      && !reg_overlap_mentioned_p (dest, src))
+	    {
+	      /* lq/stq only have DQ-form, so avoid the X-form that %y produces.  */
+	      return REG_P (XEXP (src, 0)) ? "lq %0,%1" : "lq %0,%y1";
+	    }
+	  else
+	    return "#";
+	}
+
+      else if (TARGET_ALTIVEC && dest_av_p
+	       && altivec_indexed_or_indirect_operand (src, mode))
+	return "lvx %0,%y1";
+
+      else if (TARGET_VSX && dest_vsx_p)
+	{
+	  if (mode == V16QImode || mode == V8HImode || mode == V4SImode)
+	    return "lxvw4x %x0,%y1";
+	  else
+	    return "lxvd2x %x0,%y1";
+	}
+
+      else if (TARGET_ALTIVEC && dest_av_p)
+	return "lvx %0,%y1";
+
+      else if (dest_fp_p)
+	return "#";
+    }
+
+  /* Stores.  */
+  else if (src_regno >= 0 && MEM_P (dest))
+    {
+      if (src_gpr_p)
+	{
+	  if (TARGET_QUAD_MEMORY && (src_regno & 1) == 0
+	      && quad_memory_operand (dest, mode))
+	    {
+	      /* lq/stq only have DQ-form, so avoid the X-form that %y produces.  */
+	      return REG_P (XEXP (dest, 0)) ? "stq %1,%0" : "stq %1,%y0";
+	    }
+	  else
+	    return "#";
+	}
+
+      else if (TARGET_ALTIVEC && src_av_p
+	       && altivec_indexed_or_indirect_operand (src, mode))
+	return "stvx %1,%y0";
+
+      else if (TARGET_VSX && src_vsx_p)
+	{
+	  if (mode == V16QImode || mode == V8HImode || mode == V4SImode)
+	    return "stxvw4x %x1,%y0";
+	  else
+	    return "stxvd2x %x1,%y0";
+	}
+
+      else if (TARGET_ALTIVEC && src_av_p)
+	return "stvx %1,%y0";
+
+      else if (src_fp_p)
+	return "#";
+    }
+
+  /* Constants.  */
+  else if (dest_regno >= 0
+	   && (GET_CODE (src) == CONST_INT
+	       || GET_CODE (src) == CONST_DOUBLE
+	       || GET_CODE (src) == CONST_VECTOR))
+    {
+      if (dest_gpr_p)
+	return "#";
+
+      else if (TARGET_VSX && dest_vsx_p && zero_constant (src, mode))
+	return "xxlxor %x0,%x0,%x0";
+
+      else if (TARGET_ALTIVEC && dest_av_p)
+	return output_vec_const_move (operands);
+    }
+
+  if (TARGET_DEBUG_ADDR)
+    {
+      fprintf (stderr, "\n===== Bad 128 bit move:\n");
+      debug_rtx (gen_rtx_SET (VOIDmode, dest, src));
+    }
+
+  gcc_unreachable ();
+}
+
+\f
 /* Given a comparison operation, return the bit number in CCR to test.  We
    know this is a valid comparison.
 
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 199168)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -217,112 +217,31 @@ (define_c_enum "unspec"
 
 ;; VSX moves
 (define_insn "*vsx_mov<mode>"
-  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?wa,?wa,*Y,*r,*r,<VSr>,?wa,*r,v,wZ,v")
-	(match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,wa,Z,wa,r,Y,r,j,j,j,W,v,wZ"))]
+  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?wa,?wa,wQ,?&r,??Y,??r,??r,<VSr>,?wa,*r,v,wZ, v")
+	(match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,wa,Z,wa,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode) 
        || register_operand (operands[1], <MODE>mode))"
 {
-  switch (which_alternative)
-    {
-    case 0:
-    case 3:
-      gcc_assert (MEM_P (operands[0])
-		  && GET_CODE (XEXP (operands[0], 0)) != PRE_INC
-		  && GET_CODE (XEXP (operands[0], 0)) != PRE_DEC
-		  && GET_CODE (XEXP (operands[0], 0)) != PRE_MODIFY);
-      return "stx<VSm>x %x1,%y0";
-
-    case 1:
-    case 4:
-      gcc_assert (MEM_P (operands[1])
-		  && GET_CODE (XEXP (operands[1], 0)) != PRE_INC
-		  && GET_CODE (XEXP (operands[1], 0)) != PRE_DEC
-		  && GET_CODE (XEXP (operands[1], 0)) != PRE_MODIFY);
-      return "lx<VSm>x %x0,%y1";
-
-    case 2:
-    case 5:
-      return "xxlor %x0,%x1,%x1";
-
-    case 6:
-    case 7:
-    case 8:
-    case 11:
-      return "#";
-
-    case 9:
-    case 10:
-      return "xxlxor %x0,%x0,%x0";
-
-    case 12:
-      return output_vec_const_move (operands);
-
-    case 13:
-      gcc_assert (MEM_P (operands[0])
-		  && GET_CODE (XEXP (operands[0], 0)) != PRE_INC
-		  && GET_CODE (XEXP (operands[0], 0)) != PRE_DEC
-		  && GET_CODE (XEXP (operands[0], 0)) != PRE_MODIFY);
-      return "stvx %1,%y0";
-
-    case 14:
-      gcc_assert (MEM_P (operands[0])
-		  && GET_CODE (XEXP (operands[0], 0)) != PRE_INC
-		  && GET_CODE (XEXP (operands[0], 0)) != PRE_DEC
-		  && GET_CODE (XEXP (operands[0], 0)) != PRE_MODIFY);
-      return "lvx %0,%y1";
-
-    default:
-      gcc_unreachable ();
-    }
+  return rs6000_output_move_128bit (operands);
 }
-  [(set_attr "type" "vecstore,vecload,vecsimple,vecstore,vecload,vecsimple,*,*,*,vecsimple,vecsimple,*,*,vecstore,vecload")])
+  [(set_attr "type" "vecstore,vecload,vecsimple,vecstore,vecload,vecsimple,store,load,store,load, *,vecsimple,vecsimple,*, *,vecstore,vecload")
+   (set_attr "length" "4,4,4,4,4,4,12,12,12,12,16,4,4,*,16,4,4")])
 
 ;; Unlike other VSX moves, allow the GPRs even for reloading, since a normal
 ;; use of TImode is for unions.  However for plain data movement, slightly
 ;; favor the vector loads
 (define_insn "*vsx_movti_64bit"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,?Y,?r,?r,?r")
-	(match_operand:TI 1 "input_operand"        "wa, Z,wa, O,W,wZ, v, r, Y, r, n"))]
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v,v,wZ,wQ,&r,Y,r,r,?r")
+	(match_operand:TI 1 "input_operand" "wa,Z,wa,O,W,wZ,v,r,wQ,r,Y,r,n"))]
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (TImode)
    && (register_operand (operands[0], TImode) 
        || register_operand (operands[1], TImode))"
 {
-  switch (which_alternative)
-    {
-    case 0:
-      return "stxvd2x %x1,%y0";
-
-    case 1:
-      return "lxvd2x %x0,%y1";
-
-    case 2:
-      return "xxlor %x0,%x1,%x1";
-
-    case 3:
-      return "xxlxor %x0,%x0,%x0";
-
-    case 4:
-      return output_vec_const_move (operands);
-
-    case 5:
-      return "stvx %1,%y0";
-
-    case 6:
-      return "lvx %0,%y1";
-
-    case 7:
-    case 8:
-    case 9:
-    case 10:
-      return "#";
-
-    default:
-      gcc_unreachable ();
-    }
+  return rs6000_output_move_128bit (operands);
 }
-  [(set_attr "type" "vecstore,vecload,vecsimple,vecsimple,vecsimple,vecstore,vecload,*,*,*,*")
-   (set_attr "length" "     4,      4,        4,       4,         8,       4,      4,8,8,8,8")])
+  [(set_attr "type" "vecstore,vecload,vecsimple,vecsimple,vecsimple,vecstore,vecload,store,load,store,load,*,*")
+   (set_attr "length" "4,4,4,4,16,4,4,8,8,8,8,8,8")])
 
 (define_insn "*vsx_movti_32bit"
   [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,Q,Y,????r,????r,????r,r")
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 199128)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -127,6 +127,13 @@ (define_c_enum "unspec"
    UNSPEC_LFIWZX
    UNSPEC_FCTIWUZ
    UNSPEC_GRP_END_NOP
+   UNSPEC_P8V_FMRGOW
+   UNSPEC_P8V_MTVSRWZ
+   UNSPEC_P8V_RELOAD_FROM_GPR
+   UNSPEC_P8V_MTVSRD
+   UNSPEC_P8V_XXPERMDI
+   UNSPEC_P8V_RELOAD_FROM_VSX
+   UNSPEC_FUSION_GPR
   ])
 
 ;;
@@ -268,6 +275,15 @@ (define_mode_iterator FMOVE64X [DI DF DD
 (define_mode_iterator FMOVE128 [(TF "!TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128")
 				(TD "TARGET_HARD_FLOAT && TARGET_FPRS")])
 
+; Iterators for 128 bit types for direct move
+(define_mode_iterator FMOVE128_GPR [(TI    "TARGET_VSX_TIMODE")
+				    (V16QI "")
+				    (V8HI  "")
+				    (V4SI  "")
+				    (V4SF  "")
+				    (V2DI  "")
+				    (V2DF  "")])
+
 ; Whether a floating point move is ok, don't allow SD without hardware FP
 (define_mode_attr fmove_ok [(SF "")
 			    (DF "")
@@ -284,11 +300,16 @@ (define_mode_attr real_value_to_target [
 (define_mode_attr f32_lr [(SF "f")		 (SD "wz")])
 (define_mode_attr f32_lm [(SF "m")		 (SD "Z")])
 (define_mode_attr f32_li [(SF "lfs%U1%X1 %0,%1") (SD "lfiwzx %0,%y1")])
+(define_mode_attr f32_lv [(SF "lxsspx %0,%y1")	 (SD "lxsiwzx %0,%y1")])
 
 ; Definitions for store from 32-bit fpr register
 (define_mode_attr f32_sr [(SF "f")		  (SD "wx")])
 (define_mode_attr f32_sm [(SF "m")		  (SD "Z")])
 (define_mode_attr f32_si [(SF "stfs%U0%X0 %1,%0") (SD "stfiwx %1,%y0")])
+(define_mode_attr f32_sv [(SF "stxsspx %1,%y0")	  (SD "stxsiwzx %1,%y0")])
+
+; Definitions for 32-bit fpr direct move
+(define_mode_attr f32_dm [(SF "wn") (SD "wm")])
 
 ; These modes do not fit in integer registers in 32-bit mode.
 ; but on e500v2, the gpr are 64 bit registers
@@ -368,7 +389,7 @@ (define_expand "zero_extend<mode>di2"
 (define_insn "*zero_extend<mode>di2_internal1"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r")
 	(zero_extend:DI (match_operand:QHSI 1 "reg_or_mem_operand" "m,r")))]
-  "TARGET_POWERPC64"
+  "TARGET_POWERPC64 && (<MODE>mode != SImode || !TARGET_LFIWZX)"
   "@
    l<wd>z%U1%X1 %0,%1
    rldicl %0,%1,0,<dbits>"
@@ -434,6 +455,29 @@ (define_split
 		    (const_int 0)))]
   "")
 
+(define_insn "*zero_extendsidi2_lfiwzx"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r,??wm,!wz,!wm")
+	(zero_extend:DI (match_operand:SI 1 "reg_or_mem_operand" "m,r,r,Z,Z")))]
+  "TARGET_POWERPC64 && TARGET_LFIWZX"
+  "@
+   lwz%U1%X1 %0,%1
+   rldicl %0,%1,0,32
+   mtvsrwz %x0,%1
+   lfiwzx %0,%y1
+   lxsiwzx %x0,%y1"
+  [(set_attr_alternative "type"
+      [(if_then_else
+	 (match_test "update_indexed_address_mem (operands[1], VOIDmode)")
+	 (const_string "load_ux")
+	 (if_then_else
+	   (match_test "update_address_mem (operands[1], VOIDmode)")
+	   (const_string "load_u")
+	   (const_string "load")))
+       (const_string "*")
+       (const_string "mffgpr")
+       (const_string "fpload")
+       (const_string "fpload")])])
+
 (define_insn "extendqidi2"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
 	(sign_extend:DI (match_operand:QI 1 "gpc_reg_operand" "r")))]
@@ -581,10 +625,33 @@ (define_expand "extendsidi2"
   "TARGET_POWERPC64"
   "")
 
-(define_insn ""
+(define_insn "*extendsidi2_lfiwax"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r,??wm,!wl,!wm")
+	(sign_extend:DI (match_operand:SI 1 "lwa_operand" "m,r,r,Z,Z")))]
+  "TARGET_POWERPC64 && TARGET_LFIWAX"
+  "@
+   lwa%U1%X1 %0,%1
+   extsw %0,%1
+   mtvsrwa %x0,%1
+   lfiwax %0,%y1
+   lxsiwax %x0,%y1"
+  [(set_attr_alternative "type"
+      [(if_then_else
+	 (match_test "update_indexed_address_mem (operands[1], VOIDmode)")
+	 (const_string "load_ext_ux")
+	 (if_then_else
+	   (match_test "update_address_mem (operands[1], VOIDmode)")
+	   (const_string "load_ext_u")
+	   (const_string "load_ext")))
+       (const_string "exts")
+       (const_string "mffgpr")
+       (const_string "fpload")
+       (const_string "fpload")])])
+
+(define_insn "*extendsidi2_nocell"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r")
 	(sign_extend:DI (match_operand:SI 1 "lwa_operand" "m,r")))]
-  "TARGET_POWERPC64 && rs6000_gen_cell_microcode"
+  "TARGET_POWERPC64 && rs6000_gen_cell_microcode && !TARGET_LFIWAX"
   "@
    lwa%U1%X1 %0,%1
    extsw %0,%1"
@@ -598,7 +665,7 @@ (define_insn ""
 	   (const_string "load_ext")))
        (const_string "exts")])])
 
-(define_insn ""
+(define_insn "*extendsidi2_nocell"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
 	(sign_extend:DI (match_operand:SI 1 "gpc_reg_operand" "r")))]
   "TARGET_POWERPC64 && !rs6000_gen_cell_microcode"
@@ -2035,7 +2102,9 @@ (define_insn "parity<mode>2_cmpb"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
 	(unspec:GPR [(match_operand:GPR 1 "gpc_reg_operand" "r")] UNSPEC_PARITY))]
   "TARGET_CMPB && TARGET_POPCNTB"
-  "prty<wd> %0,%1")
+  "prty<wd> %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "popcnt")])
 
 (define_expand "parity<mode>2"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "")
@@ -4316,7 +4385,7 @@ (define_insn ""
    #
    #
    #"
-  [(set_attr "type" "delayed_compare,var_delayed_compare,delayed_compare,delayed_compare,var_delayed_compare,delayed_compare")
+  [(set_attr "type" "fast_compare,var_delayed_compare,delayed_compare,delayed_compare,var_delayed_compare,delayed_compare")
    (set_attr "length" "4,4,4,8,8,8")])
 
 (define_split
@@ -4348,7 +4417,7 @@ (define_insn ""
    #
    #
    #"
-  [(set_attr "type" "delayed_compare,var_delayed_compare,delayed_compare,delayed_compare,var_delayed_compare,delayed_compare")
+  [(set_attr "type" "fast_compare,var_delayed_compare,delayed_compare,delayed_compare,var_delayed_compare,delayed_compare")
    (set_attr "length" "4,4,4,8,8,8")])
 
 (define_split
@@ -5553,12 +5622,15 @@ (define_insn "*fselsfdf4"
 ; We don't define lfiwax/lfiwzx with the normal definition, because we
 ; don't want to support putting SImode in FPR registers.
 (define_insn "lfiwax"
-  [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
-	(unspec:DI [(match_operand:SI 1 "indexed_or_indirect_operand" "Z")]
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wm,!wm")
+	(unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r")]
 		   UNSPEC_LFIWAX))]
   "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT && TARGET_LFIWAX"
-  "lfiwax %0,%y1"
-  [(set_attr "type" "fpload")])
+  "@
+   lfiwax %0,%y1
+   lxsiwax %x0,%y1
+   mtvsrwa %x0,%1"
+  [(set_attr "type" "fpload,fpload,mffgpr")])
 
 ; This split must be run before register allocation because it allocates the
 ; memory slot that is needed to move values to/from the FPR.  We don't allocate
@@ -5580,7 +5652,8 @@ (define_insn_and_split "floatsi<mode>2_l
   rtx src = operands[1];
   rtx tmp;
 
-  if (!MEM_P (src) && TARGET_MFPGPR && TARGET_POWERPC64)
+  if (!MEM_P (src) && TARGET_POWERPC64
+      && (TARGET_MFPGPR || TARGET_DIRECT_MOVE))
     tmp = convert_to_mode (DImode, src, false);
   else
     {
@@ -5629,12 +5702,15 @@ (define_insn_and_split "floatsi<mode>2_l
    (set_attr "type" "fpload")])
 
 (define_insn "lfiwzx"
-  [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
-	(unspec:DI [(match_operand:SI 1 "indexed_or_indirect_operand" "Z")]
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wm,!wm")
+	(unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r")]
 		   UNSPEC_LFIWZX))]
   "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT && TARGET_LFIWZX"
-  "lfiwzx %0,%y1"
-  [(set_attr "type" "fpload")])
+  "@
+   lfiwzx %0,%y1
+   lxsiwzx %x0,%y1
+   mtvsrwz %x0,%1"
+  [(set_attr "type" "fpload,fpload,mffgpr")])
 
 (define_insn_and_split "floatunssi<mode>2_lfiwzx"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d")
@@ -5651,7 +5727,8 @@ (define_insn_and_split "floatunssi<mode>
   rtx src = operands[1];
   rtx tmp;
 
-  if (!MEM_P (src) && TARGET_MFPGPR && TARGET_POWERPC64)
+  if (!MEM_P (src) && TARGET_POWERPC64
+      && (TARGET_MFPGPR || TARGET_DIRECT_MOVE))
     tmp = convert_to_mode (DImode, src, true);
   else
     {
@@ -5942,7 +6019,7 @@ (define_insn_and_split "fix_trunc<mode>s
       emit_insn (gen_stfiwx (dest, tmp));
       DONE;
     }
-  else if (TARGET_MFPGPR && TARGET_POWERPC64)
+  else if (TARGET_POWERPC64 && (TARGET_MFPGPR || TARGET_DIRECT_MOVE))
     {
       dest = gen_lowpart (DImode, dest);
       emit_move_insn (dest, tmp);
@@ -6036,7 +6113,7 @@ (define_insn_and_split "fixuns_trunc<mod
       emit_insn (gen_stfiwx (dest, tmp));
       DONE;
     }
-  else if (TARGET_MFPGPR && TARGET_POWERPC64)
+  else if (TARGET_POWERPC64 && (TARGET_MFPGPR || TARGET_DIRECT_MOVE))
     {
       dest = gen_lowpart (DImode, dest);
       emit_move_insn (dest, tmp);
@@ -8490,7 +8567,7 @@ (define_insn "*mov<mode>_internal2"
    cmp<wd>i %2,%0,0
    mr. %0,%1
    #"
-  [(set_attr "type" "cmp,compare,cmp")
+  [(set_attr "type" "cmp,fast_compare,cmp")
    (set_attr "length" "4,4,8")])
 
 (define_split
@@ -8680,8 +8757,8 @@ (define_split
 }")
 
 (define_insn "mov<mode>_hardfloat"
-  [(set (match_operand:FMOVE32 0 "nonimmediate_operand" "=!r,!r,m,f,wa,wa,<f32_lr>,<f32_sm>,*c*l,!r,*h,!r,!r")
-	(match_operand:FMOVE32 1 "input_operand" "r,m,r,f,wa,j,<f32_lm>,<f32_sr>,r,h,0,G,Fn"))]
+  [(set (match_operand:FMOVE32 0 "nonimmediate_operand" "=!r,!r,m,f,wa,wa,<f32_lr>,<f32_sm>,wm,Z,?<f32_dm>,?r,*c*l,!r,*h,!r,!r")
+	(match_operand:FMOVE32 1 "input_operand" "r,m,r,f,wa,j,<f32_lm>,<f32_sr>,Z,wm,r,<f32_dm>,r,h,0,G,Fn"))]
   "(gpc_reg_operand (operands[0], <MODE>mode)
    || gpc_reg_operand (operands[1], <MODE>mode))
    && (TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_SINGLE_FLOAT)"
@@ -8694,6 +8771,10 @@ (define_insn "mov<mode>_hardfloat"
    xxlxor %x0,%x0,%x0
    <f32_li>
    <f32_si>
+   <f32_lv>
+   <f32_sv>
+   mtvsrwz %x0,%1
+   mfvsrwz %0,%x1
    mt%0 %1
    mf%1 %0
    nop
@@ -8732,16 +8813,20 @@ (define_insn "mov<mode>_hardfloat"
 	   (match_test "update_address_mem (operands[0], VOIDmode)")
 	   (const_string "fpstore_u")
 	   (const_string "fpstore")))
+       (const_string "fpload")
+       (const_string "fpstore")
+       (const_string "mffgpr")
+       (const_string "mftgpr")
        (const_string "mtjmpr")
        (const_string "mfjmpr")
        (const_string "*")
        (const_string "*")
        (const_string "*")])
-   (set_attr "length" "4,4,4,4,4,4,4,4,4,4,4,4,8")])
+   (set_attr "length" "4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,8")])
 
 (define_insn "*mov<mode>_softfloat"
   [(set (match_operand:FMOVE32 0 "nonimmediate_operand" "=r,cl,r,r,m,r,r,r,r,*h")
-	(match_operand:FMOVE32 1 "input_operand" "r, r,h,m,r,I,L,G,Fn,0"))]
+	(match_operand:FMOVE32 1 "input_operand" "r,r,h,m,r,I,L,G,Fn,0"))]
   "(gpc_reg_operand (operands[0], <MODE>mode)
    || gpc_reg_operand (operands[1], <MODE>mode))
    && (TARGET_SOFT_FLOAT || !TARGET_FPRS)"
@@ -8954,8 +9039,8 @@ (define_insn "*mov<mode>_softfloat32"
 ; ld/std require word-aligned displacements -> 'Y' constraint.
 ; List Y->r and r->Y before r->r for reload.
 (define_insn "*mov<mode>_hardfloat64"
-  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m,d,d,ws,?wa,Z,?Z,ws,?wa,wa,Y,r,!r,*c*l,!r,*h,!r,!r,!r,r,wg")
-	(match_operand:FMOVE64 1 "input_operand" "d,m,d,Z,Z,ws,wa,ws,wa,j,r,Y,r,r,h,0,G,H,F,wg,r"))]
+  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m,d,d,ws,?wa,Z,?Z,ws,?wa,wa,Y,r,!r,*c*l,!r,*h,!r,!r,!r,r,wg,r,wm")
+	(match_operand:FMOVE64 1 "input_operand" "d,m,d,Z,Z,ws,wa,ws,wa,j,r,Y,r,r,h,0,G,H,F,wg,r,wm,r"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8980,7 +9065,9 @@ (define_insn "*mov<mode>_hardfloat64"
    #
    #
    mftgpr %0,%1
-   mffgpr %0,%1"
+   mffgpr %0,%1
+   mfvsrd %0,%x1
+   mtvsrd %x0,%1"
   [(set_attr_alternative "type"
       [(if_then_else
 	 (match_test "update_indexed_address_mem (operands[0], VOIDmode)")
@@ -9038,8 +9125,10 @@ (define_insn "*mov<mode>_hardfloat64"
        (const_string "*")
        (const_string "*")
        (const_string "mftgpr")
+       (const_string "mffgpr")
+       (const_string "mftgpr")
        (const_string "mffgpr")])
-   (set_attr "length" "4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,8,12,16,4,4")])
+   (set_attr "length" "4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,8,12,16,4,4,4,4")])
 
 (define_insn "*mov<mode>_softfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=Y,r,r,cl,r,r,r,r,*h")
@@ -9419,6 +9508,216 @@ (define_expand "reload_<mode>_load"
 })
 
 \f
+;; Power8 merge instructions to allow direct move to/from floating point
+;; registers in 32-bit mode.  We use TF mode to get two registers to move the
+;; individual 32-bit parts across.  Subreg doesn't work too well on the TF
+;; value, since it is allocated in reload and not all of the flow information
+;; is set up for it.  We have two patterns to do the two moves between gprs and
+;; fprs.  There isn't a dependency between the two, but we could potentially
+;; schedule other instructions between the two instructions.  TFmode is
+;; currently limited to traditional FPR registers.  If/when this is changed, we
+;; will need to revisit %L to make sure it works with VSX registers, or add an
+;; %x version of %L.
+
+(define_insn "p8_fmrgow_<mode>"
+  [(set (match_operand:FMOVE64X 0 "register_operand" "=d")
+	(unspec:FMOVE64X [(match_operand:TF 1 "register_operand" "d")]
+			 UNSPEC_P8V_FMRGOW))]
+  "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
+  "fmrgow %0,%1,%L1"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "p8_mtvsrwz_1"
+  [(set (match_operand:TF 0 "register_operand" "=d")
+	(unspec:TF [(match_operand:SI 1 "register_operand" "r")]
+		   UNSPEC_P8V_MTVSRWZ))]
+  "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
+  "mtvsrwz %x0,%1"
+  [(set_attr "type" "mffgpr")])
+
+(define_insn "p8_mtvsrwz_2"
+  [(set (match_operand:TF 0 "register_operand" "+d")
+	(unspec:TF [(match_dup 0)
+		    (match_operand:SI 1 "register_operand" "r")]
+		   UNSPEC_P8V_MTVSRWZ))]
+  "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
+  "mtvsrwz %L0,%1"
+  [(set_attr "type" "mffgpr")])
+
+(define_insn_and_split "reload_fpr_from_gpr<mode>"
+  [(set (match_operand:FMOVE64X 0 "register_operand" "=ws")
+	(unspec:FMOVE64X [(match_operand:FMOVE64X 1 "register_operand" "r")]
+			 UNSPEC_P8V_RELOAD_FROM_GPR))
+   (clobber (match_operand:TF 2 "register_operand" "=d"))]
+  "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx tmp = operands[2];
+  rtx gpr_hi_reg = gen_highpart (SImode, src);
+  rtx gpr_lo_reg = gen_lowpart (SImode, src);
+
+  emit_insn (gen_p8_mtvsrwz_1 (tmp, gpr_hi_reg));
+  emit_insn (gen_p8_mtvsrwz_2 (tmp, gpr_lo_reg));
+  emit_insn (gen_p8_fmrgow_<mode> (dest, tmp));
+  DONE;
+}
+  [(set_attr "length" "12")
+   (set_attr "type" "three")])
+
+;; Move 128-bit values from GPRs to VSX registers in 64-bit mode
+(define_insn "p8_mtvsrd_1"
+  [(set (match_operand:TF 0 "register_operand" "=ws")
+	(unspec:TF [(match_operand:DI 1 "register_operand" "r")]
+		   UNSPEC_P8V_MTVSRD))]
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
+  "mtvsrd %0,%1"
+  [(set_attr "type" "mffgpr")])
+
+(define_insn "p8_mtvsrd_2"
+  [(set (match_operand:TF 0 "register_operand" "+ws")
+	(unspec:TF [(match_dup 0)
+		    (match_operand:DI 1 "register_operand" "r")]
+		   UNSPEC_P8V_MTVSRD))]
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
+  "mtvsrd %L0,%1"
+  [(set_attr "type" "mffgpr")])
+
+(define_insn "p8_xxpermdi_<mode>"
+  [(set (match_operand:FMOVE128_GPR 0 "register_operand" "=wa")
+	(unspec:FMOVE128_GPR [(match_operand:TF 1 "register_operand" "ws")]
+			     UNSPEC_P8V_XXPERMDI))]
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
+  "xxpermdi %x0,%1,%L1,0"
+  [(set_attr "type" "vecperm")])
+
+(define_insn_and_split "reload_vsx_from_gpr<mode>"
+  [(set (match_operand:FMOVE128_GPR 0 "register_operand" "=wa")
+	(unspec:FMOVE128_GPR
+	 [(match_operand:FMOVE128_GPR 1 "register_operand" "r")]
+	 UNSPEC_P8V_RELOAD_FROM_GPR))
+   (clobber (match_operand:TF 2 "register_operand" "=ws"))]
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx tmp = operands[2];
+  rtx gpr_hi_reg = gen_highpart (DImode, src);
+  rtx gpr_lo_reg = gen_lowpart (DImode, src);
+
+  emit_insn (gen_p8_mtvsrd_1 (tmp, gpr_hi_reg));
+  emit_insn (gen_p8_mtvsrd_2 (tmp, gpr_lo_reg));
+  emit_insn (gen_p8_xxpermdi_<mode> (dest, tmp));
+}
+  [(set_attr "length" "12")
+   (set_attr "type" "three")])
+
+;; Move SFmode to a VSX register from a GPR.  Because the scalar floating
+;; point type is stored internally as double precision in the VSX registers,
+;; we have to convert it from the vector format.
+
+(define_insn_and_split "reload_vsx_from_gprsf"
+  [(set (match_operand:SF 0 "register_operand" "=wa")
+	(unspec:SF [(match_operand:SF 1 "register_operand" "r")]
+		   UNSPEC_P8V_RELOAD_FROM_GPR))
+   (clobber (match_operand:DI 2 "register_operand" "=r"))]
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  rtx op0_di = simplify_gen_subreg (DImode, op0, SFmode, 0);
+  rtx op1_di = simplify_gen_subreg (DImode, op1, SFmode, 0);
+
+  /* Move SF value to upper 32-bits for xscvspdpn.  */
+  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
+  emit_move_insn (op0_di, op2);
+  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "two")])
+
+;; Move 128-bit values from VSX registers to GPRs in 64-bit mode by doing a
+;; normal 64-bit move, followed by an xxpermdi to get the bottom 64-bit value,
+;; and then doing a move of that.
+(define_insn "p8_mfvsrd_3_<mode>"
+  [(set (match_operand:DF 0 "register_operand" "=r")
+	(unspec:DF [(match_operand:FMOVE128_GPR 1 "register_operand" "wa")]
+		   UNSPEC_P8V_RELOAD_FROM_VSX))]
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN"
+  "mfvsrd %0,%x1"
+  [(set_attr "type" "mftgpr")])
+
+(define_insn_and_split "reload_gpr_from_vsx<mode>"
+  [(set (match_operand:FMOVE128_GPR 0 "register_operand" "=r")
+	(unspec:FMOVE128_GPR
+	 [(match_operand:FMOVE128_GPR 1 "register_operand" "wa")]
+	 UNSPEC_P8V_RELOAD_FROM_VSX))
+   (clobber (match_operand:FMOVE128_GPR 2 "register_operand" "=wa"))]
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx tmp = operands[2];
+  rtx gpr_hi_reg = gen_highpart (DFmode, dest);
+  rtx gpr_lo_reg = gen_lowpart (DFmode, dest);
+
+  emit_insn (gen_p8_mfvsrd_3_<mode> (gpr_hi_reg, src));
+  emit_insn (gen_vsx_xxpermdi_<mode> (tmp, src, src, GEN_INT (3)));
+  emit_insn (gen_p8_mfvsrd_3_<mode> (gpr_lo_reg, tmp));
+}
+  [(set_attr "length" "12")
+   (set_attr "type" "three")])
+
+;; Move SFmode to a GPR from a VSX register.  Because the scalar floating
+;; point type is stored internally as double precision, we have to convert it
+;; to the vector format.
+
+(define_insn_and_split "reload_gpr_from_vsxsf"
+  [(set (match_operand:SF 0 "register_operand" "=r")
+	(unspec:SF [(match_operand:SF 1 "register_operand" "wa")]
+		   UNSPEC_P8V_RELOAD_FROM_VSX))
+   (clobber (match_operand:V4SF 2 "register_operand" "=wa"))]
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  rtx diop0 = simplify_gen_subreg (DImode, op0, SFmode, 0);
+
+  emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
+  emit_insn (gen_p8_mfvsrd_4_disf (diop0, op2));
+  emit_insn (gen_lshrdi3 (diop0, diop0, GEN_INT (32)));
+  DONE;
+}
+  [(set_attr "length" "12")
+   (set_attr "type" "three")])
+
+(define_insn "p8_mfvsrd_4_disf"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(unspec:DI [(match_operand:V4SF 1 "register_operand" "wa")]
+		   UNSPEC_P8V_RELOAD_FROM_VSX))]
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN"
+  "mfvsrd %0,%x1"
+  [(set_attr "type" "mftgpr")])
+
+\f
 ;; Next come the multi-word integer load and store and the load and store
 ;; multiple insns.
 
@@ -9467,7 +9766,8 @@ (define_split
   [(set (match_operand:DI 0 "gpc_reg_operand" "")
 	(match_operand:DI 1 "const_int_operand" ""))]
   "! TARGET_POWERPC64 && reload_completed
-   && gpr_or_gpr_p (operands[0], operands[1])"
+   && gpr_or_gpr_p (operands[0], operands[1])
+   && !direct_move_p (operands[0], operands[1])"
   [(set (match_dup 2) (match_dup 4))
    (set (match_dup 3) (match_dup 1))]
   "
@@ -9485,13 +9785,14 @@ (define_split
   [(set (match_operand:DIFD 0 "rs6000_nonimmediate_operand" "")
         (match_operand:DIFD 1 "input_operand" ""))]
   "reload_completed && !TARGET_POWERPC64
-   && gpr_or_gpr_p (operands[0], operands[1])"
+   && gpr_or_gpr_p (operands[0], operands[1])
+   && !direct_move_p (operands[0], operands[1])"
   [(pc)]
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; })
 
 (define_insn "*movdi_internal64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=Y,r,r,r,r,r,?m,?*d,?*d,?Z,?wa,?wa,r,*h,*h,?wa,r,?*wg")
-	(match_operand:DI 1 "input_operand" "r,Y,r,I,L,nF,d,m,d,wa,Z,wa,*h,r,0,O,*wg,r"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=Y,r,r,r,r,r,?m,?*d,?*d,?Z,?wa,?wa,r,*h,*h,?wa,r,?*wg,r,?*wm")
+	(match_operand:DI 1 "input_operand" "r,Y,r,I,L,nF,d,m,d,wa,Z,wa,*h,r,0,O,*wg,r,*wm,r"))]
   "TARGET_POWERPC64
    && (gpc_reg_operand (operands[0], DImode)
        || gpc_reg_operand (operands[1], DImode))"
@@ -9513,7 +9814,9 @@ (define_insn "*movdi_internal64"
    nop
    xxlxor %x0,%x0,%x0
    mftgpr %0,%1
-   mffgpr %0,%1"
+   mffgpr %0,%1
+   mfvsrd %0,%x1
+   mtvsrd %x0,%1"
   [(set_attr_alternative "type"
       [(if_then_else
 	 (match_test "update_indexed_address_mem (operands[0], VOIDmode)")
@@ -9562,8 +9865,10 @@ (define_insn "*movdi_internal64"
        (const_string "*")
        (const_string "vecsimple")
        (const_string "mftgpr")
+       (const_string "mffgpr")
+       (const_string "mftgpr")
        (const_string "mffgpr")])
-   (set_attr "length" "4,4,4,4,4,20,4,4,4,4,4,4,4,4,4,4,4,4")])
+   (set_attr "length" "4,4,4,4,4,20,4,4,4,4,4,4,4,4,4,4,4,4,4,4")])
 
 ;; Generate all one-bits and clear left or right.
 ;; Use (and:DI (rotate:DI ...)) to avoid anddi3 unnecessary clobber.
@@ -9652,19 +9957,20 @@ (define_insn "*mov<mode>_string"
 					  (const_string "conditional")))])
 
 (define_insn "*mov<mode>_ppc64"
-  [(set (match_operand:TI2 0 "nonimmediate_operand" "=Y,r,r")
-	(match_operand:TI2 1 "input_operand" "r,Y,r"))]
-  "(TARGET_POWERPC64
-   && (<MODE>mode != TImode || VECTOR_MEM_NONE_P (TImode))
+  [(set (match_operand:TI2 0 "nonimmediate_operand" "=Y,r,r,r")
+	(match_operand:TI2 1 "input_operand" "r,Y,r,F"))]
+  "(TARGET_POWERPC64 && VECTOR_MEM_NONE_P (<MODE>mode)
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode)))"
   "#"
-  [(set_attr "type" "store,load,*")])
+  [(set_attr "type" "store,load,*,*")])
 
 (define_split
-  [(set (match_operand:TI2 0 "gpc_reg_operand" "")
+  [(set (match_operand:TI2 0 "int_reg_operand" "")
 	(match_operand:TI2 1 "const_double_operand" ""))]
-  "TARGET_POWERPC64"
+  "TARGET_POWERPC64
+   && (VECTOR_MEM_NONE_P (<MODE>mode)
+       || (reload_completed && INT_REGNO_P (REGNO (operands[0]))))"
   [(set (match_dup 2) (match_dup 4))
    (set (match_dup 3) (match_dup 5))]
   "
@@ -9691,7 +9997,9 @@ (define_split
   [(set (match_operand:TI2 0 "nonimmediate_operand" "")
         (match_operand:TI2 1 "input_operand" ""))]
   "reload_completed
-   && gpr_or_gpr_p (operands[0], operands[1])"
+   && gpr_or_gpr_p (operands[0], operands[1])
+   && !direct_move_p (operands[0], operands[1])
+   && !quad_load_store_p (operands[0], operands[1])"
   [(pc)]
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; })
 \f
Index: gcc/testsuite/gcc.target/powerpc/direct-move-vint1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-vint1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-vint1.c	(revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2" } */
+/* { dg-final { scan-assembler-times "mtvsrd" 4 } } */
+/* { dg-final { scan-assembler-times "mfvsrd" 4 } } */
+
+/* Check code generation for direct move for vector int types.  */
+
+#define TYPE vector int
+
+#include "direct-move.h"
Index: gcc/testsuite/gcc.target/powerpc/direct-move-vint2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-vint2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-vint2.c	(revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do run { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } { "*" } { "" } } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-mcpu=power8 -O2" } */
+
+/* Check whether we get the right bits for direct move at runtime.  */
+
+#define TYPE vector int
+#define DO_MAIN
+
+#include "direct-move.h"
Index: gcc/testsuite/gcc.target/powerpc/direct-move.h
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move.h	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move.h	(revision 0)
@@ -0,0 +1,183 @@
+/* Test functions for direct move support.  */
+
+/* For size_t, memcmp, abort, random, and NAN/INFINITY used below.  */
+#include <stddef.h>
+#include <string.h>
+#include <stdlib.h>
+#include <math.h>
+
+void __attribute__((__noinline__))
+copy (TYPE *a, TYPE *b)
+{
+  *b = *a;
+}
+
+#ifndef NO_GPR
+void __attribute__((__noinline__))
+load_gpr (TYPE *a, TYPE *b)
+{
+  TYPE c = *a;
+  __asm__ ("# gpr, reg = %0" : "+b" (c));
+  *b = c;
+}
+#endif
+
+#ifndef NO_FPR
+void __attribute__((__noinline__))
+load_fpr (TYPE *a, TYPE *b)
+{
+  TYPE c = *a;
+  __asm__ ("# fpr, reg = %0" : "+d" (c));
+  *b = c;
+}
+#endif
+
+#ifndef NO_ALTIVEC
+void __attribute__((__noinline__))
+load_altivec (TYPE *a, TYPE *b)
+{
+  TYPE c = *a;
+  __asm__ ("# altivec, reg = %0" : "+v" (c));
+  *b = c;
+}
+#endif
+
+#ifndef NO_VSX
+void __attribute__((__noinline__))
+load_vsx (TYPE *a, TYPE *b)
+{
+  TYPE c = *a;
+  __asm__ ("# vsx, reg = %x0" : "+wa" (c));
+  *b = c;
+}
+#endif
+
+#ifndef NO_GPR_TO_VSX
+void __attribute__((__noinline__))
+load_gpr_to_vsx (TYPE *a, TYPE *b)
+{
+  TYPE c = *a;
+  TYPE d;
+  __asm__ ("# gpr, reg = %0" : "+b" (c));
+  d = c;
+  __asm__ ("# vsx, reg = %x0" : "+wa" (d));
+  *b = d;
+}
+#endif
+
+#ifndef NO_VSX_TO_GPR
+void __attribute__((__noinline__))
+load_vsx_to_gpr (TYPE *a, TYPE *b)
+{
+  TYPE c = *a;
+  TYPE d;
+  __asm__ ("# vsx, reg = %x0" : "+wa" (c));
+  d = c;
+  __asm__ ("# gpr, reg = %0" : "+b" (d));
+  *b = d;
+}
+#endif
+
+#ifdef DO_MAIN
+typedef void (fn_type (TYPE *, TYPE *));
+
+struct test_struct {
+  fn_type *func;
+  const char *name;
+};
+
+const struct test_struct test_functions[] = {
+  { copy,		"copy"		  },
+#ifndef NO_GPR
+  { load_gpr,		"load_gpr"	  },
+#endif
+#ifndef NO_FPR
+  { load_fpr,		"load_fpr"	  },
+#endif
+#ifndef NO_ALTIVEC
+  { load_altivec,	"load_altivec"	  },
+#endif
+#ifndef NO_VSX
+  { load_vsx,		"load_vsx"	  },
+#endif
+#ifndef NO_GPR_TO_VSX
+  { load_gpr_to_vsx,	"load_gpr_to_vsx" },
+#endif
+#ifndef NO_VSX_TO_GPR
+  { load_vsx_to_gpr,	"load_vsx_to_gpr" },
+#endif
+};
+
+/* Test a given value for each of the functions.  */
+void __attribute__((__noinline__))
+test_value (TYPE a)
+{
+  size_t i;
+
+  for (i = 0; i < sizeof (test_functions) / sizeof (test_functions[0]); i++)
+    {
+      TYPE b;
+
+      test_functions[i].func (&a, &b);
+      if (memcmp ((void *)&a, (void *)&b, sizeof (TYPE)) != 0)
+	abort ();
+    }
+}
+
+/* Main program.  */
+int
+main (void)
+{
+  size_t i;
+  long j;
+  union {
+    TYPE value;
+    unsigned char bytes[sizeof (TYPE)];
+  } u;
+
+#if IS_INT
+  TYPE value = (TYPE)-5;
+  for (i = 0; i < 12; i++)
+    {
+      test_value (value);
+      value++;
+    }
+
+  for (i = 0; i < 8*sizeof (TYPE); i++)
+    test_value (((TYPE)1) << i);
+
+#elif IS_UNS
+  TYPE value = (TYPE)0;
+  for (i = 0; i < 10; i++)
+    {
+      test_value (value);
+      test_value (~ value);
+      value++;
+    }
+
+  for (i = 0; i < 8*sizeof (TYPE); i++)
+    test_value (((TYPE)1) << i);
+
+#elif IS_FLOAT
+  TYPE value = (TYPE)-5;
+  for (i = 0; i < 12; i++)
+    {
+      test_value (value);
+      value++;
+    }
+
+  test_value ((TYPE)3.1415926535);
+  test_value ((TYPE)1.23456);
+  test_value ((TYPE)(-0.0));
+  test_value ((TYPE)NAN);
+  test_value ((TYPE)+INFINITY);
+  test_value ((TYPE)-INFINITY);
+#else
+
+  for (j = 0; j < 10; j++)
+    {
+      for (i = 0; i < sizeof (TYPE); i++)
+	u.bytes[i] = (unsigned char) (random () >> 4);
+
+      test_value (u.value);
+    }
+#endif
+
+  return 0;
+}
+#endif
Index: gcc/testsuite/gcc.target/powerpc/direct-move-float1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-float1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-float1.c	(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2" } */
+/* { dg-final { scan-assembler-times "mtvsrd" 2 } } */
+/* { dg-final { scan-assembler-times "mfvsrd" 2 } } */
+/* { dg-final { scan-assembler-times "xscvdpspn" 2 } } */
+/* { dg-final { scan-assembler-times "xscvspdpn" 2 } } */
+
+/* Check code generation for direct move for float types.  */
+
+#define TYPE float
+#define IS_FLOAT 1
+#define NO_ALTIVEC 1
+
+#include "direct-move.h"
Index: gcc/testsuite/gcc.target/powerpc/direct-move-float2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-float2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-float2.c	(revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do run { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } { "*" } { "" } } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-mcpu=power8 -O2" } */
+
+/* Check whether we get the right bits for direct move at runtime.  */
+
+#define TYPE float
+#define IS_FLOAT 1
+#define NO_ALTIVEC 1
+#define DO_MAIN
+
+#include "direct-move.h"
Index: gcc/testsuite/gcc.target/powerpc/direct-move-double1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-double1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-double1.c	(revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2" } */
+/* { dg-final { scan-assembler-times "mtvsrd" 1 } } */
+/* { dg-final { scan-assembler-times "mfvsrd" 1 } } */
+
+/* Check code generation for direct move for double types.  */
+
+#define TYPE double
+#define IS_FLOAT 1
+#define NO_ALTIVEC 1
+
+#include "direct-move.h"
Index: gcc/testsuite/gcc.target/powerpc/direct-move-long1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-long1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-long1.c	(revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2" } */
+/* { dg-final { scan-assembler-times "mtvsrd" 1 } } */
+/* { dg-final { scan-assembler-times "mfvsrd" 2 } } */
+
+/* Check code generation for direct move for long types.  */
+
+#define TYPE long
+#define IS_INT 1
+#define NO_ALTIVEC 1
+
+#include "direct-move.h"
Index: gcc/testsuite/gcc.target/powerpc/direct-move-double2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-double2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-double2.c	(revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do run { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } { "*" } { "" } } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-mcpu=power8 -O2" } */
+
+/* Check whether we get the right bits for direct move at runtime.  */
+
+#define TYPE double
+#define IS_FLOAT 1
+#define NO_ALTIVEC 1
+#define DO_MAIN
+
+#include "direct-move.h"
Index: gcc/testsuite/gcc.target/powerpc/direct-move-long2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-long2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-long2.c	(revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do run { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } { "*" } { "" } } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-mcpu=power8 -O2" } */
+
+/* Check whether we get the right bits for direct move at runtime.  */
+
+#define TYPE long
+#define IS_INT 1
+#define NO_ALTIVEC 1
+#define DO_MAIN
+
+#include "direct-move.h"

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #7, quad/byte/half-word atomic instructions
  2013-05-20 20:41 [PATCH, rs6000] power8 patches Michael Meissner
                   ` (6 preceding siblings ...)
  2013-05-22 14:26 ` [PATCH, rs6000] power8 patches, patch #6, direct move & basic quad load/store Michael Meissner
@ 2013-05-22 16:51 ` Michael Meissner
  2013-05-29 20:29   ` David Edelsohn
  2013-05-22 20:53 ` [PATCH, rs6000] power8 patches, patch #8, power8 load fusion + misc Michael Meissner
  2013-06-07 19:22 ` [PATCH, rs6000] power8 patches, patch #9, power8 scheduling Pat Haugen
  9 siblings, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-05-22 16:51 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, pthaugen, bergner

[-- Attachment #1: Type: text/plain, Size: 2340 bytes --]

This patch adds support for the byte, half-word, and quad-word atomic memory
operations that were added in ISA 2.07 (i.e. power8).  Like the other patches,
this passes bootstrap and shows no regressions in make check.  Is it ok to commit
this patch after the previous 6 patches have been applied?
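
As an illustration (this example is not part of the patch), a quad-word atomic
operation such as the following should now expand to an inline lqarx/stqcx.
loop on power8 rather than a call to the atomic support library:

    /* Sketch only; assumes -mcpu=power8 and a compiler with these patches.  */
    __int128_t
    quad_fetch_add (__int128_t *ptr, __int128_t value)
    {
      return __atomic_fetch_add (ptr, value, __ATOMIC_SEQ_CST);
    }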

[gcc]
2013-05-22  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Pat Haugen <pthaugen@us.ibm.com>
	    Peter Bergner <bergner@vnet.ibm.com>

	* config/rs6000/rs6000.c (emit_load_locked): Add support for
	power8 byte, half-word, and quad-word atomic instructions.
	(emit_store_conditional): Likewise.
	(rs6000_expand_atomic_compare_and_swap): Likewise.
	(rs6000_expand_atomic_op): Likewise.

	* config/rs6000/sync.md (larx): Add new modes for power8.
	(stcx): Likewise.
	(AINT): New mode iterator to include TImode as well as normal
	integer modes on power8.
	(fetchop_pred): Use int_reg_operand instead of gpc_reg_operand so
	that VSX registers are not considered.  Use AINT mode iterator
	instead of INT1 to allow inclusion of quad word atomic operations
	on power8.
	(load_locked<mode>): Likewise.
	(store_conditional<mode>): Likewise.
	(atomic_compare_and_swap<mode>): Likewise.
	(atomic_exchange<mode>): Likewise.
	(atomic_nand<mode>): Likewise.
	(atomic_fetch_<fetchop_name><mode>): Likewise.
	(atomic_nand_fetch<mode>): Likewise.
	(mem_thread_fence): Use gen_loadsync_<mode> instead of enumerating
	each type.
	(ATOMIC): On power8, add QImode, HImode modes.
	(load_locked<QHI:mode>_si): Variants of load_locked for QI/HI
	modes that promote to SImode.
	(load_lockedti): Convert TImode arguments to PTImode, so that we
	get a guaranteed even/odd register pair.
	(load_lockedpti): Likewise.
	(store_conditionalti): Likewise.
	(store_conditionalpti): Likewise.

	* config/rs6000/rs6000.md (QHI): New mode iterator for power8
	atomic load/store instructions.
	(HSI): Likewise.

[gcc/testsuite]
2013-05-22  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Pat Haugen <pthaugen@us.ibm.com>
	    Peter Bergner <bergner@vnet.ibm.com>

	* gcc.target/powerpc/atomic-p7.c: New file, add tests for atomic
	load/store instructions on power7, power8.
	* gcc.target/powerpc/atomic-p8.c: Likewise.
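
As a usage sketch for the new sub-word paths (again, not part of the patch), a
byte compare-and-swap such as the following should inline to a lbarx/stbcx.
sequence on power8; before power8 it had to be synthesized from a word-sized
lwarx/stwcx. with shift and mask operations:

    /* Sketch only; mirrors char_val_compare_and_swap in atomic-p8.c below.  */
    void
    byte_cas (char *p, int expected, int desired, char *out)
    {
      *out = __sync_val_compare_and_swap (p, expected, desired);
    }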

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.official-07b --]
[-- Type: text/plain, Size: 30205 bytes --]

Index: gcc/testsuite/gcc.target/powerpc/atomic-p7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/atomic-p7.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/atomic-p7.c	(revision 0)
@@ -0,0 +1,207 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-mcpu=power7 -O2" } */
+/* { dg-final { scan-assembler-not "lbarx" } } */
+/* { dg-final { scan-assembler-not "lharx" } } */
+/* { dg-final { scan-assembler-times "lwarx" 18 } } */
+/* { dg-final { scan-assembler-times "ldarx" 6 } } */
+/* { dg-final { scan-assembler-not "lqarx" } } */
+/* { dg-final { scan-assembler-not "stbcx" } } */
+/* { dg-final { scan-assembler-not "sthcx" } } */
+/* { dg-final { scan-assembler-times "stwcx" 18 } } */
+/* { dg-final { scan-assembler-times "stdcx" 6 } } */
+/* { dg-final { scan-assembler-not "stqcx" } } */
+/* { dg-final { scan-assembler-times "bl __atomic" 6 } } */
+/* { dg-final { scan-assembler-times "isync" 12 } } */
+/* { dg-final { scan-assembler-times "lwsync" 8 } } */
+/* { dg-final { scan-assembler-not "mtvsrd" } } */
+/* { dg-final { scan-assembler-not "mtvsrwa" } } */
+/* { dg-final { scan-assembler-not "mtvsrwz" } } */
+/* { dg-final { scan-assembler-not "mfvsrd" } } */
+/* { dg-final { scan-assembler-not "mfvsrwz" } } */
+
+/* Test for the byte atomic operations, which on power7 are done via
+   lwarx/stwcx. with shift and mask operations.  */
+char
+char_fetch_add_relaxed (char *ptr, int value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+char
+char_fetch_sub_consume (char *ptr, int value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+char
+char_fetch_and_acquire (char *ptr, int value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+char
+char_fetch_ior_release (char *ptr, int value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+char
+char_fetch_xor_acq_rel (char *ptr, int value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+char
+char_fetch_nand_seq_cst (char *ptr, int value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+/* Test for the half word atomic operations, which on power7 are done via
+   lwarx/stwcx. with shift and mask operations.  */
+short
+short_fetch_add_relaxed (short *ptr, int value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+short
+short_fetch_sub_consume (short *ptr, int value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+short
+short_fetch_and_acquire (short *ptr, int value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+short
+short_fetch_ior_release (short *ptr, int value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+short
+short_fetch_xor_acq_rel (short *ptr, int value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+short
+short_fetch_nand_seq_cst (short *ptr, int value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+/* Test for the word atomic operations on power7 using lwarx/stwcx.  */
+int
+int_fetch_add_relaxed (int *ptr, int value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+int
+int_fetch_sub_consume (int *ptr, int value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+int
+int_fetch_and_acquire (int *ptr, int value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+int
+int_fetch_ior_release (int *ptr, int value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+int
+int_fetch_xor_acq_rel (int *ptr, int value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+int
+int_fetch_nand_seq_cst (int *ptr, int value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+/* Test for the double word atomic operations on power7 using ldarx/stdcx.  */
+long
+long_fetch_add_relaxed (long *ptr, long value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+long
+long_fetch_sub_consume (long *ptr, long value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+long
+long_fetch_and_acquire (long *ptr, long value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+long
+long_fetch_ior_release (long *ptr, long value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+long
+long_fetch_xor_acq_rel (long *ptr, long value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+long
+long_fetch_nand_seq_cst (long *ptr, long value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+/* Test for the quad word atomic operations, which on power7 are done via
+   calls to the __atomic library routines.  */
+__int128_t
+quad_fetch_add_relaxed (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+__int128_t
+quad_fetch_sub_consume (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+__int128_t
+quad_fetch_and_acquire (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+__int128_t
+quad_fetch_ior_release (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+__int128_t
+quad_fetch_xor_acq_rel (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+__int128_t
+quad_fetch_nand_seq_cst (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
Index: gcc/testsuite/gcc.target/powerpc/atomic-p8.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/atomic-p8.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/atomic-p8.c	(revision 0)
@@ -0,0 +1,237 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2" } */
+/* { dg-final { scan-assembler-times "lbarx" 7 } } */
+/* { dg-final { scan-assembler-times "lharx" 7 } } */
+/* { dg-final { scan-assembler-times "lwarx" 7 } } */
+/* { dg-final { scan-assembler-times "ldarx" 7 } } */
+/* { dg-final { scan-assembler-times "lqarx" 7 } } */
+/* { dg-final { scan-assembler-times "stbcx" 7 } } */
+/* { dg-final { scan-assembler-times "sthcx" 7 } } */
+/* { dg-final { scan-assembler-times "stwcx" 7 } } */
+/* { dg-final { scan-assembler-times "stdcx" 7 } } */
+/* { dg-final { scan-assembler-times "stqcx" 7 } } */
+/* { dg-final { scan-assembler-not "bl __atomic" } } */
+/* { dg-final { scan-assembler-times "isync" 20 } } */
+/* { dg-final { scan-assembler-times "lwsync" 10 } } */
+/* { dg-final { scan-assembler-not "mtvsrd" } } */
+/* { dg-final { scan-assembler-not "mtvsrwa" } } */
+/* { dg-final { scan-assembler-not "mtvsrwz" } } */
+/* { dg-final { scan-assembler-not "mfvsrd" } } */
+/* { dg-final { scan-assembler-not "mfvsrwz" } } */
+
+/* Test for the byte atomic operations on power8 using lbarx/stbcx.  */
+char
+char_fetch_add_relaxed (char *ptr, int value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+char
+char_fetch_sub_consume (char *ptr, int value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+char
+char_fetch_and_acquire (char *ptr, int value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+char
+char_fetch_ior_release (char *ptr, int value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+char
+char_fetch_xor_acq_rel (char *ptr, int value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+char
+char_fetch_nand_seq_cst (char *ptr, int value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+void
+char_val_compare_and_swap (char *p, int i, int j, char *q)
+{
+  *q = __sync_val_compare_and_swap (p, i, j);
+}
+
+/* Test for the half word atomic operations on power8 using lharx/sthcx.  */
+short
+short_fetch_add_relaxed (short *ptr, int value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+short
+short_fetch_sub_consume (short *ptr, int value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+short
+short_fetch_and_acquire (short *ptr, int value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+short
+short_fetch_ior_release (short *ptr, int value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+short
+short_fetch_xor_acq_rel (short *ptr, int value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+short
+short_fetch_nand_seq_cst (short *ptr, int value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+void
+short_val_compare_and_swap (short *p, int i, int j, short *q)
+{
+  *q = __sync_val_compare_and_swap (p, i, j);
+}
+
+/* Test for the word atomic operations on power8 using lwarx/stwcx.  */
+int
+int_fetch_add_relaxed (int *ptr, int value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+int
+int_fetch_sub_consume (int *ptr, int value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+int
+int_fetch_and_acquire (int *ptr, int value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+int
+int_fetch_ior_release (int *ptr, int value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+int
+int_fetch_xor_acq_rel (int *ptr, int value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+int
+int_fetch_nand_seq_cst (int *ptr, int value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+void
+int_val_compare_and_swap (int *p, int i, int j, int *q)
+{
+  *q = __sync_val_compare_and_swap (p, i, j);
+}
+
+/* Test for the double word atomic operations on power8 using ldarx/stdcx.  */
+long
+long_fetch_add_relaxed (long *ptr, long value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+long
+long_fetch_sub_consume (long *ptr, long value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+long
+long_fetch_and_acquire (long *ptr, long value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+long
+long_fetch_ior_release (long *ptr, long value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+long
+long_fetch_xor_acq_rel (long *ptr, long value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+long
+long_fetch_nand_seq_cst (long *ptr, long value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+void
+long_val_compare_and_swap (long *p, long i, long j, long *q)
+{
+  *q = __sync_val_compare_and_swap (p, i, j);
+}
+
+/* Test for the quad word atomic operations on power8 using lqarx/stqcx.  */
+__int128_t
+quad_fetch_add_relaxed (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+__int128_t
+quad_fetch_sub_consume (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+__int128_t
+quad_fetch_and_acquire (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+__int128_t
+quad_fetch_ior_release (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+__int128_t
+quad_fetch_xor_acq_rel (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+__int128_t
+quad_fetch_nand_seq_cst (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+void
+quad_val_compare_and_swap (__int128_t *p, __int128_t i, __int128_t j, __int128_t *q)
+{
+  *q = __sync_val_compare_and_swap (p, i, j);
+}
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 199200)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -17752,7 +17752,8 @@ emit_unlikely_jump (rtx cond, rtx label)
 }
 
 /* A subroutine of the atomic operation splitters.  Emit a load-locked
-   instruction in MODE.  */
+   instruction in MODE.  For QI/HImode, possibly use a pattern that includes
+   the zero_extend operation.  */
 
 static void
 emit_load_locked (enum machine_mode mode, rtx reg, rtx mem)
@@ -17761,12 +17762,26 @@ emit_load_locked (enum machine_mode mode
 
   switch (mode)
     {
+    case QImode:
+      fn = gen_load_lockedqi;
+      break;
+    case HImode:
+      fn = gen_load_lockedhi;
+      break;
     case SImode:
-      fn = gen_load_lockedsi;
+      if (GET_MODE (mem) == QImode)
+	fn = gen_load_lockedqi_si;
+      else if (GET_MODE (mem) == HImode)
+	fn = gen_load_lockedhi_si;
+      else
+	fn = gen_load_lockedsi;
       break;
     case DImode:
       fn = gen_load_lockeddi;
       break;
+    case TImode:
+      fn = gen_load_lockedti;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -17783,12 +17798,21 @@ emit_store_conditional (enum machine_mod
 
   switch (mode)
     {
+    case QImode:
+      fn = gen_store_conditionalqi;
+      break;
+    case HImode:
+      fn = gen_store_conditionalhi;
+      break;
     case SImode:
       fn = gen_store_conditionalsi;
       break;
     case DImode:
       fn = gen_store_conditionaldi;
       break;
+    case TImode:
+      fn = gen_store_conditionalti;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -17934,7 +17958,7 @@ rs6000_expand_atomic_compare_and_swap (r
 {
   rtx boolval, retval, mem, oldval, newval, cond;
   rtx label1, label2, x, mask, shift;
-  enum machine_mode mode;
+  enum machine_mode mode, orig_mode;
   enum memmodel mod_s, mod_f;
   bool is_weak;
 
@@ -17946,22 +17970,29 @@ rs6000_expand_atomic_compare_and_swap (r
   is_weak = (INTVAL (operands[5]) != 0);
   mod_s = (enum memmodel) INTVAL (operands[6]);
   mod_f = (enum memmodel) INTVAL (operands[7]);
-  mode = GET_MODE (mem);
+  orig_mode = mode = GET_MODE (mem);
 
   mask = shift = NULL_RTX;
   if (mode == QImode || mode == HImode)
     {
-      mem = rs6000_adjust_atomic_subword (mem, &shift, &mask);
-
-      /* Shift and mask OLDVAL into position with the word.  */
+      /* Before power8, we didn't have access to lbarx/lharx, so generate a
+	 lwarx plus shift/mask operations.  With power8, we need to do the
+	 comparison in SImode, but the store is still done in QI/HImode.  */
       oldval = convert_modes (SImode, mode, oldval, 1);
-      oldval = expand_simple_binop (SImode, ASHIFT, oldval, shift,
-				    NULL_RTX, 1, OPTAB_LIB_WIDEN);
 
-      /* Shift and mask NEWVAL into position within the word.  */
-      newval = convert_modes (SImode, mode, newval, 1);
-      newval = expand_simple_binop (SImode, ASHIFT, newval, shift,
-				    NULL_RTX, 1, OPTAB_LIB_WIDEN);
+      if (!TARGET_SYNC_HI_QI)
+	{
+	  mem = rs6000_adjust_atomic_subword (mem, &shift, &mask);
+
+	  /* Shift and mask OLDVAL into position within the word.  */
+	  oldval = expand_simple_binop (SImode, ASHIFT, oldval, shift,
+					NULL_RTX, 1, OPTAB_LIB_WIDEN);
+
+	  /* Shift and mask NEWVAL into position within the word.  */
+	  newval = convert_modes (SImode, mode, newval, 1);
+	  newval = expand_simple_binop (SImode, ASHIFT, newval, shift,
+					NULL_RTX, 1, OPTAB_LIB_WIDEN);
+	}
 
       /* Prepare to adjust the return value.  */
       retval = gen_reg_rtx (SImode);
@@ -17990,7 +18021,25 @@ rs6000_expand_atomic_compare_and_swap (r
     }
 
   cond = gen_reg_rtx (CCmode);
-  x = gen_rtx_COMPARE (CCmode, x, oldval);
+  /* If we have TImode, synthesize a comparison.  */
+  if (mode != TImode)
+    x = gen_rtx_COMPARE (CCmode, x, oldval);
+  else
+    {
+      rtx xor1_result = gen_reg_rtx (DImode);
+      rtx xor2_result = gen_reg_rtx (DImode);
+      rtx or_result = gen_reg_rtx (DImode);
+      rtx new_word0 = simplify_gen_subreg (DImode, x, TImode, 0);
+      rtx new_word1 = simplify_gen_subreg (DImode, x, TImode, 8);
+      rtx old_word0 = simplify_gen_subreg (DImode, oldval, TImode, 0);
+      rtx old_word1 = simplify_gen_subreg (DImode, oldval, TImode, 8);
+
+      emit_insn (gen_xordi3 (xor1_result, new_word0, old_word0));
+      emit_insn (gen_xordi3 (xor2_result, new_word1, old_word1));
+      emit_insn (gen_iordi3 (or_result, xor1_result, xor2_result));
+      x = gen_rtx_COMPARE (CCmode, or_result, const0_rtx);
+    }
+
   emit_insn (gen_rtx_SET (VOIDmode, cond, x));
 
   x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
@@ -18000,7 +18049,7 @@ rs6000_expand_atomic_compare_and_swap (r
   if (mask)
     x = rs6000_mask_atomic_subword (retval, newval, mask);
 
-  emit_store_conditional (mode, cond, mem, x);
+  emit_store_conditional (orig_mode, cond, mem, x);
 
   if (!is_weak)
     {
@@ -18018,6 +18067,8 @@ rs6000_expand_atomic_compare_and_swap (r
 
   if (shift)
     rs6000_finish_atomic_subword (operands[1], retval, shift);
+  else if (mode != GET_MODE (operands[1]))
+    convert_move (operands[1], retval, 1);
 
   /* In all cases, CR0 contains EQ on success, and NE on failure.  */
   x = gen_rtx_EQ (SImode, cond, const0_rtx);
@@ -18041,7 +18092,7 @@ rs6000_expand_atomic_exchange (rtx opera
   mode = GET_MODE (mem);
 
   mask = shift = NULL_RTX;
-  if (mode == QImode || mode == HImode)
+  if (!TARGET_SYNC_HI_QI && (mode == QImode || mode == HImode))
     {
       mem = rs6000_adjust_atomic_subword (mem, &shift, &mask);
 
@@ -18090,11 +18141,25 @@ rs6000_expand_atomic_op (enum rtx_code c
 {
   enum memmodel model = (enum memmodel) INTVAL (model_rtx);
   enum machine_mode mode = GET_MODE (mem);
+  enum machine_mode store_mode = mode;
   rtx label, x, cond, mask, shift;
   rtx before = orig_before, after = orig_after;
 
   mask = shift = NULL_RTX;
-  if (mode == QImode || mode == HImode)
+  /* On power8, we want to use SImode for the operation.  On previous systems,
+     do the operation on a word and shift/mask to get the proper byte or
+     halfword.  */
+  if (TARGET_SYNC_HI_QI && (mode == QImode || mode == HImode))
+    {
+      val = convert_modes (SImode, mode, val, 1);
+
+      /* Prepare to adjust the return value.  */
+      before = gen_reg_rtx (SImode);
+      if (after)
+	after = gen_reg_rtx (SImode);
+      mode = SImode;
+    }
+  else if (mode == QImode || mode == HImode)
     {
       mem = rs6000_adjust_atomic_subword (mem, &shift, &mask);
 
@@ -18136,7 +18201,7 @@ rs6000_expand_atomic_op (enum rtx_code c
       before = gen_reg_rtx (SImode);
       if (after)
 	after = gen_reg_rtx (SImode);
-      mode = SImode;
+      store_mode = mode = SImode;
     }
 
   mem = rs6000_pre_atomic_barrier (mem, model);
@@ -18169,9 +18234,11 @@ rs6000_expand_atomic_op (enum rtx_code c
 			       NULL_RTX, 1, OPTAB_LIB_WIDEN);
       x = rs6000_mask_atomic_subword (before, x, mask);
     }
+  else if (store_mode != mode)
+    x = convert_modes (store_mode, mode, x, 1);
 
   cond = gen_reg_rtx (CCmode);
-  emit_store_conditional (mode, cond, mem, x);
+  emit_store_conditional (store_mode, cond, mem, x);
 
   x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
   emit_unlikely_jump (x, label);
@@ -18180,11 +18247,22 @@ rs6000_expand_atomic_op (enum rtx_code c
 
   if (shift)
     {
+      /* QImode/HImode on machines without lbarx/lharx where we do a lwarx and
+	 then do the calculations in a SImode register.  */
       if (orig_before)
 	rs6000_finish_atomic_subword (orig_before, before, shift);
       if (orig_after)
 	rs6000_finish_atomic_subword (orig_after, after, shift);
     }
+  else if (store_mode != mode)
+    {
+      /* QImode/HImode on machines with lbarx/lharx where we do the native
+	 operation and then do the calculations in a SImode register.  */
+      if (orig_before)
+	convert_move (orig_before, before, 1);
+      if (orig_after)
+	convert_move (orig_after, after, 1);
+    }
   else if (orig_after && after != orig_after)
     emit_move_insn (orig_after, after);
 }
Index: gcc/config/rs6000/sync.md
===================================================================
--- gcc/config/rs6000/sync.md	(revision 199037)
+++ gcc/config/rs6000/sync.md	(working copy)
@@ -18,14 +18,23 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
-(define_mode_attr larx [(SI "lwarx") (DI "ldarx")])
-(define_mode_attr stcx [(SI "stwcx.") (DI "stdcx.")])
+(define_mode_attr larx [(QI "lbarx")
+			(HI "lharx")
+			(SI "lwarx")
+			(DI "ldarx")
+			(TI "lqarx")])
+
+(define_mode_attr stcx [(QI "stbcx.")
+			(HI "sthcx.")
+			(SI "stwcx.")
+			(DI "stdcx.")
+			(TI "stqcx.")])
 
 (define_code_iterator FETCHOP [plus minus ior xor and])
 (define_code_attr fetchop_name
   [(plus "add") (minus "sub") (ior "or") (xor "xor") (and "and")])
 (define_code_attr fetchop_pred
-  [(plus "add_operand") (minus "gpc_reg_operand")
+  [(plus "add_operand") (minus "int_reg_operand")
    (ior "logical_operand") (xor "logical_operand") (and "and_operand")])
 
 (define_expand "mem_thread_fence"
@@ -129,16 +138,7 @@ (define_expand "atomic_load<mode>"
     case MEMMODEL_CONSUME:
     case MEMMODEL_ACQUIRE:
     case MEMMODEL_SEQ_CST:
-      if (GET_MODE (operands[0]) == QImode)
-	emit_insn (gen_loadsync_qi (operands[0]));
-      else if (GET_MODE (operands[0]) == HImode)
-	emit_insn (gen_loadsync_hi (operands[0]));
-      else if (GET_MODE (operands[0]) == SImode)
-	emit_insn (gen_loadsync_si (operands[0]));
-      else if (GET_MODE (operands[0]) == DImode)
-	emit_insn (gen_loadsync_di (operands[0]));
-      else
-	gcc_unreachable ();
+      emit_insn (gen_loadsync_<mode> (operands[0]));
       break;
     default:
       gcc_unreachable ();
@@ -170,35 +170,97 @@ (define_expand "atomic_store<mode>"
   DONE;
 })
 
-;; ??? Power ISA 2.06B says that there *is* a load-{byte,half}-and-reserve
-;; opcode that is "phased-in".  Not implemented as of Power7, so not yet used,
-;; but let's prepare the macros anyway.
-
-(define_mode_iterator ATOMIC    [SI (DI "TARGET_POWERPC64")])
+;; Any supported integer mode that has atomic l<x>arx/st<x>cx. instructions
+;; other than the quad memory operations, which have special restrictions.
+;; Byte/halfword atomic instructions were added in ISA 2.06B, but were phased
+;; in and did not show up until power8.  TImode atomic lqarx/stqcx. require
+;; special handling due to even/odd register requirements.
+(define_mode_iterator ATOMIC [(QI "TARGET_SYNC_HI_QI")
+			      (HI "TARGET_SYNC_HI_QI")
+			      SI
+			      (DI "TARGET_POWERPC64")])
+
+;; Types that we should provide atomic instructions for.
+
+(define_mode_iterator AINT [QI
+			    HI
+			    SI
+			    (DI "TARGET_POWERPC64")
+			    (TI "TARGET_SYNC_TI")])
 
 (define_insn "load_locked<mode>"
-  [(set (match_operand:ATOMIC 0 "gpc_reg_operand" "=r")
+  [(set (match_operand:ATOMIC 0 "int_reg_operand" "=r")
 	(unspec_volatile:ATOMIC
          [(match_operand:ATOMIC 1 "memory_operand" "Z")] UNSPECV_LL))]
   ""
   "<larx> %0,%y1"
   [(set_attr "type" "load_l")])
 
+(define_insn "load_locked<QHI:mode>_si"
+  [(set (match_operand:SI 0 "int_reg_operand" "=r")
+	(unspec_volatile:SI
+	  [(match_operand:QHI 1 "memory_operand" "Z")] UNSPECV_LL))]
+  "TARGET_SYNC_HI_QI"
+  "<QHI:larx> %0,%y1"
+  [(set_attr "type" "load_l")])
+
+;; Use PTImode to get even/odd register pairs
+(define_expand "load_lockedti"
+  [(use (match_operand:TI 0 "quad_int_reg_operand" ""))
+   (use (match_operand:TI 1 "memory_operand" ""))]
+  "TARGET_SYNC_TI"
+{
+  emit_insn (gen_load_lockedpti (gen_lowpart (PTImode, operands[0]),
+				 operands[1]));
+  DONE;
+})
+
+(define_insn "load_lockedpti"
+  [(set (match_operand:PTI 0 "quad_int_reg_operand" "=&r")
+	(unspec_volatile:PTI
+         [(match_operand:TI 1 "memory_operand" "Z")] UNSPECV_LL))]
+  "TARGET_SYNC_TI
+   && !reg_mentioned_p (operands[0], operands[1])
+   && quad_int_reg_operand (operands[0], PTImode)"
+  "lqarx %0,%y1"
+  [(set_attr "type" "load_l")])
+
 (define_insn "store_conditional<mode>"
   [(set (match_operand:CC 0 "cc_reg_operand" "=x")
 	(unspec_volatile:CC [(const_int 0)] UNSPECV_SC))
    (set (match_operand:ATOMIC 1 "memory_operand" "=Z")
-	(match_operand:ATOMIC 2 "gpc_reg_operand" "r"))]
+	(match_operand:ATOMIC 2 "int_reg_operand" "r"))]
   ""
   "<stcx> %2,%y1"
   [(set_attr "type" "store_c")])
 
+(define_expand "store_conditionalti"
+  [(use (match_operand:CC 0 "cc_reg_operand" ""))
+   (use (match_operand:TI 1 "memory_operand" ""))
+   (use (match_operand:TI 2 "quad_int_reg_operand" ""))]
+  "TARGET_SYNC_TI"
+{
+  emit_insn (gen_store_conditionalpti (operands[0],
+				       gen_lowpart (PTImode, operands[1]),
+				       gen_lowpart (PTImode, operands[2])));
+  DONE;
+})
+
+(define_insn "store_conditionalpti"
+  [(set (match_operand:CC 0 "cc_reg_operand" "=x")
+	(unspec_volatile:CC [(const_int 0)] UNSPECV_SC))
+   (set (match_operand:PTI 1 "memory_operand" "=Z")
+	(match_operand:PTI 2 "quad_int_reg_operand" "r"))]
+  "TARGET_SYNC_TI && quad_int_reg_operand (operands[2], PTImode)"
+  "stqcx. %2,%y1"
+  [(set_attr "type" "store_c")])
+
 (define_expand "atomic_compare_and_swap<mode>"
-  [(match_operand:SI 0 "gpc_reg_operand" "")		;; bool out
-   (match_operand:INT1 1 "gpc_reg_operand" "")		;; val out
-   (match_operand:INT1 2 "memory_operand" "")		;; memory
-   (match_operand:INT1 3 "reg_or_short_operand" "")	;; expected
-   (match_operand:INT1 4 "gpc_reg_operand" "")		;; desired
+  [(match_operand:SI 0 "int_reg_operand" "")		;; bool out
+   (match_operand:AINT 1 "int_reg_operand" "")		;; val out
+   (match_operand:AINT 2 "memory_operand" "")		;; memory
+   (match_operand:AINT 3 "reg_or_short_operand" "")	;; expected
+   (match_operand:AINT 4 "int_reg_operand" "")		;; desired
    (match_operand:SI 5 "const_int_operand" "")		;; is_weak
    (match_operand:SI 6 "const_int_operand" "")		;; model succ
    (match_operand:SI 7 "const_int_operand" "")]		;; model fail
@@ -209,9 +271,9 @@ (define_expand "atomic_compare_and_swap<
 })
 
 (define_expand "atomic_exchange<mode>"
-  [(match_operand:INT1 0 "gpc_reg_operand" "")		;; output
-   (match_operand:INT1 1 "memory_operand" "")		;; memory
-   (match_operand:INT1 2 "gpc_reg_operand" "")		;; input
+  [(match_operand:AINT 0 "int_reg_operand" "")		;; output
+   (match_operand:AINT 1 "memory_operand" "")		;; memory
+   (match_operand:AINT 2 "int_reg_operand" "")		;; input
    (match_operand:SI 3 "const_int_operand" "")]		;; model
   ""
 {
@@ -220,9 +282,9 @@ (define_expand "atomic_exchange<mode>"
 })
 
 (define_expand "atomic_<fetchop_name><mode>"
-  [(match_operand:INT1 0 "memory_operand" "")		;; memory
-   (FETCHOP:INT1 (match_dup 0)
-     (match_operand:INT1 1 "<fetchop_pred>" ""))	;; operand
+  [(match_operand:AINT 0 "memory_operand" "")		;; memory
+   (FETCHOP:AINT (match_dup 0)
+     (match_operand:AINT 1 "<fetchop_pred>" ""))	;; operand
    (match_operand:SI 2 "const_int_operand" "")]		;; model
   ""
 {
@@ -232,8 +294,8 @@ (define_expand "atomic_<fetchop_name><mo
 })
 
 (define_expand "atomic_nand<mode>"
-  [(match_operand:INT1 0 "memory_operand" "")		;; memory
-   (match_operand:INT1 1 "gpc_reg_operand" "")		;; operand
+  [(match_operand:AINT 0 "memory_operand" "")		;; memory
+   (match_operand:AINT 1 "int_reg_operand" "")		;; operand
    (match_operand:SI 2 "const_int_operand" "")]		;; model
   ""
 {
@@ -243,10 +305,10 @@ (define_expand "atomic_nand<mode>"
 })
 
 (define_expand "atomic_fetch_<fetchop_name><mode>"
-  [(match_operand:INT1 0 "gpc_reg_operand" "")		;; output
-   (match_operand:INT1 1 "memory_operand" "")		;; memory
-   (FETCHOP:INT1 (match_dup 1)
-     (match_operand:INT1 2 "<fetchop_pred>" ""))	;; operand
+  [(match_operand:AINT 0 "int_reg_operand" "")		;; output
+   (match_operand:AINT 1 "memory_operand" "")		;; memory
+   (FETCHOP:AINT (match_dup 1)
+     (match_operand:AINT 2 "<fetchop_pred>" ""))	;; operand
    (match_operand:SI 3 "const_int_operand" "")]		;; model
   ""
 { 
@@ -256,9 +318,9 @@ (define_expand "atomic_fetch_<fetchop_na
 })
 
 (define_expand "atomic_fetch_nand<mode>"
-  [(match_operand:INT1 0 "gpc_reg_operand" "")		;; output
-   (match_operand:INT1 1 "memory_operand" "")		;; memory
-   (match_operand:INT1 2 "gpc_reg_operand" "")		;; operand
+  [(match_operand:AINT 0 "int_reg_operand" "")		;; output
+   (match_operand:AINT 1 "memory_operand" "")		;; memory
+   (match_operand:AINT 2 "int_reg_operand" "")		;; operand
    (match_operand:SI 3 "const_int_operand" "")]		;; model
   ""
 {
@@ -268,10 +330,10 @@ (define_expand "atomic_fetch_nand<mode>"
 })
 
 (define_expand "atomic_<fetchop_name>_fetch<mode>"
-  [(match_operand:INT1 0 "gpc_reg_operand" "")		;; output
-   (match_operand:INT1 1 "memory_operand" "")		;; memory
-   (FETCHOP:INT1 (match_dup 1)
-     (match_operand:INT1 2 "<fetchop_pred>" ""))	;; operand
+  [(match_operand:AINT 0 "int_reg_operand" "")		;; output
+   (match_operand:AINT 1 "memory_operand" "")		;; memory
+   (FETCHOP:AINT (match_dup 1)
+     (match_operand:AINT 2 "<fetchop_pred>" ""))	;; operand
    (match_operand:SI 3 "const_int_operand" "")]		;; model
   ""
 {
@@ -281,9 +343,9 @@ (define_expand "atomic_<fetchop_name>_fe
 })
 
 (define_expand "atomic_nand_fetch<mode>"
-  [(match_operand:INT1 0 "gpc_reg_operand" "")		;; output
-   (match_operand:INT1 1 "memory_operand" "")		;; memory
-   (match_operand:INT1 2 "gpc_reg_operand" "")		;; operand
+  [(match_operand:AINT 0 "int_reg_operand" "")		;; output
+   (match_operand:AINT 1 "memory_operand" "")		;; memory
+   (match_operand:AINT 2 "int_reg_operand" "")		;; operand
    (match_operand:SI 3 "const_int_operand" "")]		;; model
   ""
 {
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 199200)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -234,6 +234,12 @@ (define_mode_iterator INT1 [QI HI SI (DI
 ; extend modes for DImode
 (define_mode_iterator QHSI [QI HI SI])
 
+; QImode or HImode for small atomic ops
+(define_mode_iterator QHI [QI HI])
+
+; HImode or SImode for sign extended fusion ops
+(define_mode_iterator HSI [HI SI])
+
 ; SImode or DImode, even if DImode doesn't fit in GPRs.
 (define_mode_iterator SDI [SI DI])
 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #8, power8 load fusion + misc.
  2013-05-20 20:41 [PATCH, rs6000] power8 patches Michael Meissner
                   ` (7 preceding siblings ...)
  2013-05-22 16:51 ` [PATCH, rs6000] power8 patches, patch #7, quad/byte/half-word atomic instructions Michael Meissner
@ 2013-05-22 20:53 ` Michael Meissner
  2013-06-18 18:30   ` David Edelsohn
  2013-07-29 18:46   ` [PATCH, rs6000] power8 patches, revised patch #8, power8 load fusion Michael Meissner
  2013-06-07 19:22 ` [PATCH, rs6000] power8 patches, patch #9, power8 scheduling Pat Haugen
  9 siblings, 2 replies; 52+ messages in thread
From: Michael Meissner @ 2013-05-22 20:53 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, pthaugen, bergner

[-- Attachment #1: Type: text/plain, Size: 3375 bytes --]

This is the final set of patches that I have available right now.  We will be
doing additional patches over the summer.

The primary thing in this patch is to add support for load fusion in the
power8.  Power8 has two types of fusion:

	addi <a>,<b>,<const>
	lxvd2x <va>,<b>,<c>

and:

	addis <a>,<b>,<const-hi>
	ld <a>,<const-lo>(<a>)

These instructions must be adjacent to each other, and in the case of GPR load
fusion, the register being loaded must also serve as the base register of the
load.  In this patch, I added peepholes to cover this case.  In the future, I
plan to rework the problem by being more liberal about which addresses are
allowed before reload/LRA, and then generating these forms in LRA.  However,
these peepholes do help find fusion cases.
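
To make that concrete, here is a minimal C sketch of the kind of code the
peepholes target.  The annotated assembly is an assumption about typical
-mcpu=power8 -O2 output on 64-bit ELF, not guaranteed output, and the names
are illustrative only:

	/* A TOC-relative access that can become the fused pair
		addis 3,2,counter@toc@ha
		ld 3,counter@toc@l(3)
	   where the register being loaded (r3) is also the base register
	   of the load, as GPR load fusion requires.  */
	extern long counter;

	long
	get_counter (void)
	{
	  return counter;
	}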

I also added two switches (-mlra and -mconstrain-regs) that were used in
converting the powerpc port to use the LRA register allocator.  Note, at the
present time, Vlad and I are going back and forth on additional things needed
for LRA.

This patch bootstraps and has no regressions in the test suite.  Is it ok to
check in after the previous 7 patches have been applied?

FWIW, patches 1-2 that were approved have now been checked in.

2013-05-22  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/predicates.md (fusion_gpr_addis): New predicates
	to support power8 load fusion.
	(fusion_gpr_mem_load): Likewise.

	* config/rs6000/rs6000-modes.def (PTImode): Update a comment.

	* config/rs6000/rs6000-protos.h (fusion_gpr_load_p): New
	declarations for power8 load fusion.
	(emit_fusion_gpr_load): Likewise.

	* config/rs6000/rs6000.opt (-mlra): New undocumented switch to
	turn on using the LRA register allocator.
	(-mconstrain-regs): New undocumented switch to constrain
	non-integer values from being loaded into the LR or CTR registers.

	* config/rs6000/rs6000.c (TARGET_LRA_P): If -mlra, turn on using
	the LRA register allocator.
	(rs6000_lra_p): Likewise.
	(rs6000_hard_regno_mode_ok): Allow DI/DD/SF/SD modes in altivec
	registers if power8.  If -mconstrain-regs, only allow int modes
	into LR, CTR, and special purpose registers.
	(rs6000_debug_reg_global): Print -mlra, -mconstrain-regs status if
	debugging.
	(rs6000_init_hard_regno_mode_ok): Mark that SFmode can use Altivec
	registers in the future.
	(rs6000_option_override_internal): If tuning for power8, turn on
	fusion mode by default.  Turn on sign extending fusion mode if
	normal fusion mode is on, and we are at -O2 or -O3.
	(rs6000_opt_masks): Add -mlra, -mconstrain-regs.
	(fusion_gpr_load_p): New function, return true if we can fuse an
	addis instruction with a dependent load to a GPR.
	(emit_fusion_gpr_load): Emit the instructions for power8 load
	fusion to GPRs.

	* config/rs6000/vsx.md (VSX load fusion peepholes): Add peepholes
	to fuse together an addi instruction with a VSX load instruction.

	* config/rs6000/rs6000.md (GPR load fusion peepholes): Add
	peepholes to fuse an addis instruction with a load to a GPR base
	register, if the addis instruction is dead after the load, by
	using the register to be loaded for the addis.  If we are
	supporting sign extending fusions, convert sign extending loads to
	zero extending loads and an explicit sign extension.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.official-08b --]
[-- Type: text/plain, Size: 21787 bytes --]

Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 199168)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -1654,3 +1654,99 @@ (define_predicate "small_toc_ref"
 
   return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL;
 })
+
+;; Match the first insn (addis) in fusing the combination of addis and loads to
+;; GPR registers on power8.  Power8 currently will only do the fusion if the
+;; top 11 bits of the addis value are all 1's or 0's.
+(define_predicate "fusion_gpr_addis"
+  (match_code "const_int,high,plus")
+{
+  HOST_WIDE_INT value;
+  rtx int_const;
+
+  /* 32-bit is not done yet.  */
+  if (TARGET_ELF && !TARGET_POWERPC64)
+    return 0;
+
+  if (GET_CODE (op) == HIGH)
+    return 1;
+
+  if (CONST_INT_P (op))
+    int_const = op;
+
+  else if (GET_CODE (op) == PLUS
+	   && base_reg_operand (XEXP (op, 0), Pmode)
+	   && CONST_INT_P (XEXP (op, 1)))
+    int_const = XEXP (op, 1);
+
+  else
+    return 0;
+
+  value = INTVAL (int_const);
+  if ((value & (HOST_WIDE_INT)0xffff) != 0)
+    return 0;
+
+  if ((value & (HOST_WIDE_INT)0xffff0000) == 0)
+    return 0;
+
+  return (IN_RANGE (value >> 16, -32, 31));
+})
+
+;; Match the second insn (lbz, lhz, lwz, ld) in fusing the combination of addis
+;; and loads to GPR registers on power8.
+(define_predicate "fusion_gpr_mem_load"
+  (match_code "mem")
+{
+  rtx addr;
+
+  if (!MEM_P (op))
+    return 0;
+
+  switch (mode)
+    {
+    case QImode:
+    case HImode:
+    case SImode:
+      break;
+
+    case DImode:
+      if (!TARGET_POWERPC64)
+	return 0;
+      break;
+
+    default:
+      return 0;
+    }
+
+  addr = XEXP (op, 0);
+  if (GET_CODE (addr) == PLUS)
+    {
+      rtx base = XEXP (addr, 0);
+      rtx offset = XEXP (addr, 1);
+
+      return (base_reg_operand (base, GET_MODE (base))
+	      && satisfies_constraint_I (offset));
+    }
+
+  else if (GET_CODE (addr) == LO_SUM)
+    {
+      rtx base = XEXP (addr, 0);
+      rtx offset = XEXP (addr, 1);
+
+      /* 32-bit is not done yet.  */
+      if (TARGET_ELF && !TARGET_POWERPC64)
+	return 0;
+
+      if (!base_reg_operand (base, GET_MODE (base)))
+	return 0;
+
+      else if (TARGET_XCOFF || (TARGET_ELF && TARGET_POWERPC64))
+	return small_toc_ref (offset, GET_MODE (offset));
+/*
+      else if (TARGET_ELF && !TARGET_POWERPC64)
+	return CONSTANT_P (offset);
+*/
+    }
+
+  return 0;
+})
Index: gcc/config/rs6000/rs6000-modes.def
===================================================================
--- gcc/config/rs6000/rs6000-modes.def	(revision 199037)
+++ gcc/config/rs6000/rs6000-modes.def	(working copy)
@@ -42,5 +42,7 @@ VECTOR_MODES (FLOAT, 8);      /*        
 VECTOR_MODES (FLOAT, 16);     /*       V8HF  V4SF V2DF */
 VECTOR_MODES (FLOAT, 32);     /*       V16HF V8SF V4DF */
 
-/* Replacement for TImode that only is allowed in GPRs.  */
+/* Replacement for TImode that only is allowed in GPRs.  We also use PTImode
+   for quad memory atomic operations to force getting an even/odd register
+   combination.  */
 PARTIAL_INT_MODE (TI);
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 199200)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -73,6 +73,8 @@ extern int mems_ok_for_quad_peep (rtx, r
 extern bool gpr_or_gpr_p (rtx, rtx);
 extern bool direct_move_p (rtx, rtx);
 extern bool quad_load_store_p (rtx, rtx);
+extern bool fusion_gpr_load_p (rtx, rtx, rtx, rtx, rtx);
+extern const char *emit_fusion_gpr_load (rtx, rtx, rtx, rtx);
 extern enum reg_class (*rs6000_preferred_reload_class_ptr) (rtx,
 							    enum reg_class);
 extern enum reg_class (*rs6000_secondary_reload_class_ptr) (enum reg_class,
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 199122)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -542,3 +542,11 @@ Use ISA 2.07 direct move between GPR & V
 mquad-memory
 Target Report Mask(QUAD_MEMORY) Var(rs6000_isa_flags)
 Generate the quad word memory instructions (lq/stq/lqarx/stqcx).
+
+mlra
+Target Report Mask(LRA) Var(rs6000_isa_flags)
+Enable the use of the LRA (local register allocator).
+
+mconstrain-regs
+Target Undocumented Mask(CONSTRAIN_REGS) Var(rs6000_isa_flags)
+; Only allow ints of certain modes to go in SPRs
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 199210)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -1044,6 +1044,7 @@ static bool rs6000_debug_cannot_change_m
 						   enum machine_mode,
 						   enum reg_class);
 static bool rs6000_save_toc_in_prologue_p (void);
+static bool rs6000_lra_p (void);
 
 rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode, int, int,
 					     int, int *)
@@ -1519,6 +1520,9 @@ static const struct attribute_spec rs600
 
 #undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
 #define TARGET_VECTORIZE_VEC_PERM_CONST_OK rs6000_vectorize_vec_perm_const_ok
+
+#undef TARGET_LRA_P
+#define TARGET_LRA_P rs6000_lra_p
 \f
 
 /* Processor table.  */
@@ -1631,6 +1635,18 @@ rs6000_hard_regno_mode_ok (int regno, en
   if (mode == TImode && TARGET_VSX_TIMODE && VSX_REGNO_P (regno))
     return 1;
 
+  /* Allow the 64-bit types and the 32-bit floating point types in Altivec
+     registers under power8.  In theory, we could allow 32-bit integers as
+     well.  We allow SDmode, even though no decimal operation works in the
+     Altivec registers, because it is ok for moves.  */
+  if (TARGET_VSX && VSX_REGNO_P (regno) && TARGET_P8_VECTOR
+      && VECTOR_MEM_VSX_P (DFmode)
+      && (mode == DImode
+	  || mode == DDmode
+	  || mode == SFmode
+	  || mode == SDmode))
+    return 1;
+
   /* The GPRs can hold any mode, but values bigger than one register
      cannot go past R31.  */
   if (INT_REGNO_P (regno))
@@ -1671,6 +1687,18 @@ rs6000_hard_regno_mode_ok (int regno, en
   if (SPE_SIMD_REGNO_P (regno) && TARGET_SPE && SPE_VECTOR_MODE (mode))
     return 1;
 
+  /* See if we need to be stricter about what goes into the special
+     registers (LR, CTR, VRSAVE, VSCR).  */
+  if (TARGET_CONSTRAIN_REGS)
+    {
+      if (regno == LR_REGNO || regno == CTR_REGNO)
+	return (GET_MODE_CLASS (mode) == MODE_INT
+		&& rs6000_hard_regno_nregs[mode][regno] == 1);
+
+      if (regno == VRSAVE_REGNO || regno == VSCR_REGNO)
+	return (mode == SImode);
+    }
+
   /* We cannot put non-VSX TImode or PTImode anywhere except general register
      and it must be able to fit within the register set.  */
 
@@ -2138,6 +2166,9 @@ rs6000_debug_reg_global (void)
     fprintf (stderr, DEBUG_FMT_S, "p8 fusion",
 	     (TARGET_P8_FUSION_SIGN) ? "zero+sign" : "zero");
 
+  if (TARGET_CONSTRAIN_REGS)
+    fprintf (stderr, DEBUG_FMT_S, "constrain-regs", "true");
+
   fprintf (stderr, DEBUG_FMT_S, "plt-format",
 	   TARGET_SECURE_PLT ? "secure" : "bss");
   fprintf (stderr, DEBUG_FMT_S, "struct-return",
@@ -2321,6 +2352,15 @@ rs6000_init_hard_regno_mode_ok (bool glo
       rs6000_vector_align[TImode] = align64;
     }
 
+  /* SFmode, see if we want to use the VSX unit.  */
+  if (TARGET_P8_VECTOR)
+    {
+      rs6000_vector_unit[SFmode] = VECTOR_P8_VECTOR;
+      rs6000_vector_mem[SFmode]
+	= (TARGET_VSX_SCALAR_MEMORY ? VECTOR_P8_VECTOR : VECTOR_NONE);
+      rs6000_vector_align[SFmode] = align32;
+    }
+
   /* TODO add SPE and paired floating point vector support.  */
 
   /* Register class constraints for the constraints that depend on compile
@@ -3067,6 +3107,21 @@ rs6000_option_override_internal (bool gl
       rs6000_isa_flags &= ~OPTION_MASK_VSX_TIMODE;
     }
 
+  /* Enable power8 fusion if we are tuning for power8, even if we aren't
+     generating power8 instructions.  */
+  if (!(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION))
+    rs6000_isa_flags |= (processor_target_table[tune_index].target_enable
+			 & OPTION_MASK_P8_FUSION);
+
+  /* Power8 does not fuse sign extended loads with the addis.  If we are
+     optimizing at high levels for speed, convert a sign extended load into a
+     zero extending load, and an explicit sign extension.  */
+  if (TARGET_P8_FUSION
+      && !(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN)
+      && optimize_function_for_speed_p (cfun)
+      && optimize >= 3)
+    rs6000_isa_flags |= OPTION_MASK_P8_FUSION_SIGN;
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after defaults", rs6000_isa_flags);
 
@@ -28674,12 +28729,14 @@ static struct rs6000_opt_mask const rs60
 {
   { "altivec",			OPTION_MASK_ALTIVEC,		false, true  },
   { "cmpb",			OPTION_MASK_CMPB,		false, true  },
+  { "constrain-regs",		OPTION_MASK_CONSTRAIN_REGS,	false, false },
   { "crypto",			OPTION_MASK_CRYPTO,		false, true  },
   { "direct-move",		OPTION_MASK_DIRECT_MOVE,	false, true  },
   { "dlmzb",			OPTION_MASK_DLMZB,		false, true  },
   { "fprnd",			OPTION_MASK_FPRND,		false, true  },
   { "hard-dfp",			OPTION_MASK_DFP,		false, true  },
   { "isel",			OPTION_MASK_ISEL,		false, true  },
+  { "lra",			OPTION_MASK_LRA,		false, false },
   { "mfcrf",			OPTION_MASK_MFCRF,		false, true  },
   { "mfpgpr",			OPTION_MASK_MFPGPR,		false, true  },
   { "mulhw",			OPTION_MASK_MULHW,		false, true  },
@@ -29683,6 +29740,254 @@ rs6000_set_up_by_prologue (struct hard_r
     add_to_hard_reg_set (&set->set, Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);
 }
 
+\f
+/* Enable/disable the LRA (local register allocator).  */
+
+static bool
+rs6000_lra_p (void)
+{
+  return TARGET_LRA;
+}
+
+\f
+/* Return true if the peephole2 can combine a load involving a combination of
+   an addis instruction and a load with an offset that can be fused together on
+   a power8.  */
+
+bool
+fusion_gpr_load_p (rtx addis_reg,	/* reg. to hold high value.  */
+		   rtx addis_value,	/* high value loaded.  */
+		   rtx target,		/* reg. that is loaded.  */
+		   rtx mem,		/* memory to load.  */
+		   rtx insn)		/* insn for looking up reg notes or
+					   NULL_RTX if this is a peephole2.  */
+{
+  rtx addr;
+  rtx base_reg;
+
+  /* Validate arguments.  */
+  if (!base_reg_operand (addis_reg, GET_MODE (addis_reg)))
+    return false;
+
+  if (!base_reg_operand (target, GET_MODE (target)))
+    return false;
+
+  if (!fusion_gpr_addis (addis_value, GET_MODE (addis_value)))
+    return false;
+
+  if (!fusion_gpr_mem_load (mem, GET_MODE (mem)))
+    return false;
+
+  /* Validate that the register used to load the high value is either the
+     register being loaded, or we can safely replace its use in a peephole.
+
+     If this is a peephole2, we assume that there are 2 instructions in the
+     peephole (addis and load), so we want to check if the target register was
+     not used and the register to hold the addis result is dead after the
+     peephole.  */
+  if (REGNO (addis_reg) != REGNO (target))
+    {
+      if (reg_mentioned_p (target, mem))
+	return false;
+
+      if (insn)
+	{
+	  if (!find_reg_note (insn, REG_DEAD, addis_reg))
+	    return false;
+	}
+      else
+	{
+	  if (!peep2_reg_dead_p (2, addis_reg))
+	    return false;
+	}
+    }
+
+  /* Validate that the value being loaded in the addis is used in the load.  */
+  addr = XEXP (mem, 0);			/* either PLUS or LO_SUM.  */
+  if (GET_CODE (addr) != PLUS && GET_CODE (addr) != LO_SUM)
+    return false;
+
+  base_reg = XEXP (addr, 0);
+  return REGNO (addis_reg) == REGNO (base_reg);
+}
+
+/* Return a string to fuse an addis instruction with a GPR load into the same
+   register that the addis instruction set.  The code is complicated, so we
+   call output_asm_insn directly, and just return "".  */
+
+const char *
+emit_fusion_gpr_load (rtx addis_reg, rtx addis_value, rtx target, rtx mem)
+{
+  rtx fuse_ops[10];
+  rtx addr;
+  rtx load_offset;
+  const char *addis_str = NULL;
+  const char *load_str = NULL;
+  const char *mode_name = NULL;
+  char insn_template[80];
+  enum machine_mode mode = GET_MODE (mem);
+  const char *comment_str = ASM_COMMENT_START;
+
+  if (*comment_str == ' ')
+    comment_str++;
+
+  if (!MEM_P (mem))
+    gcc_unreachable ();
+
+  addr = XEXP (mem, 0);
+  if (GET_CODE (addr) != PLUS && GET_CODE (addr) != LO_SUM)
+    gcc_unreachable ();
+
+  load_offset = XEXP (addr, 1);
+
+  /* Now emit the load instruction to the same register.  */
+  switch (mode)
+    {
+    case QImode:
+      mode_name = "char";
+      load_str = "lbz";
+      break;
+
+    case HImode:
+      mode_name = "short";
+      load_str = "lhz";
+      break;
+
+    case SImode:
+      mode_name = "int";
+      load_str = "lwz";
+      break;
+
+    case DImode:
+      if (TARGET_POWERPC64)
+	{
+	  mode_name = "long";
+	  load_str = "ld";
+	}
+      break;
+
+    default:
+      break;
+    }
+
+  if (!load_str)
+    gcc_unreachable ();
+
+  /* Emit the addis instruction.  */
+  fuse_ops[0] = target;
+  fuse_ops[1] = addis_reg;
+  if (satisfies_constraint_L (addis_value))
+    {
+      fuse_ops[2] = addis_value;
+      addis_str = "lis %0,%v2";
+    }
+
+  else if (GET_CODE (addis_value) == PLUS)
+    {
+      rtx op0 = XEXP (addis_value, 0);
+      rtx op1 = XEXP (addis_value, 1);
+
+      if (REG_P (op0) && CONST_INT_P (op1)
+	  && satisfies_constraint_L (op1))
+	{
+	  fuse_ops[2] = op0;
+	  fuse_ops[3] = op1;
+	  addis_str = "addis %0,%2,%v3";
+	}
+    }
+
+  else if (GET_CODE (addis_value) == HIGH)
+    {
+      rtx value = XEXP (addis_value, 0);
+      if (GET_CODE (value) == UNSPEC && XINT (value, 1) == UNSPEC_TOCREL)
+	{
+	  fuse_ops[2] = XVECEXP (value, 0, 0);		/* symbol ref.  */
+	  fuse_ops[3] = XVECEXP (value, 0, 1);		/* TOC register.  */
+	  if (TARGET_ELF)
+	    addis_str = "addis %0,%3,%2@toc@ha";
+
+	  else if (TARGET_XCOFF)
+	    addis_str = "addis %0,%2@u(%3)";
+	}
+
+      else if (GET_CODE (value) == PLUS)
+	{
+	  rtx op0 = XEXP (value, 0);
+	  rtx op1 = XEXP (value, 1);
+
+	  if (GET_CODE (op0) == UNSPEC
+	      && XINT (op0, 1) == UNSPEC_TOCREL
+	      && CONST_INT_P (op1))
+	    {
+	      fuse_ops[2] = XVECEXP (op0, 0, 0);	/* symbol ref.  */
+	      fuse_ops[3] = XVECEXP (op0, 0, 1);	/* TOC register.  */
+	      fuse_ops[4] = op1;
+	      if (TARGET_ELF)
+		addis_str = "addis %0,%3,%2+%4@toc@ha";
+
+	      else if (TARGET_XCOFF)
+		addis_str = "addis %0,%2+%4@u(%3)";
+	    }
+	}
+    }
+
+  if (!addis_str)
+    gcc_unreachable ();
+
+  sprintf (insn_template, "%s\t\t%s gpr load fusion, type %s, addis reg %%1",
+	   addis_str, comment_str, mode_name);
+  output_asm_insn (insn_template, fuse_ops);
+
+  if (CONST_INT_P (load_offset) && satisfies_constraint_I (load_offset))
+    {
+      sprintf (insn_template, "%s %%0,%%1(%%0)", load_str);
+      fuse_ops[1] = load_offset;
+      output_asm_insn (insn_template, fuse_ops);
+    }
+
+  else if (GET_CODE (load_offset) == UNSPEC
+	   && XINT (load_offset, 1) == UNSPEC_TOCREL)
+    {
+      if (TARGET_ELF)
+	sprintf (insn_template, "%s %%0,%%1@toc@l(%%0)", load_str);
+
+      else if (TARGET_XCOFF)
+	sprintf (insn_template, "%s %%0,%%1@l(%%0)", load_str);
+
+      else
+	gcc_unreachable ();
+
+      fuse_ops[1] = XVECEXP (load_offset, 0, 0);
+      output_asm_insn (insn_template, fuse_ops);
+    }
+
+  else if (GET_CODE (load_offset) == PLUS
+	   && GET_CODE (XEXP (load_offset, 0)) == UNSPEC
+	   && XINT (XEXP (load_offset, 0), 1) == UNSPEC_TOCREL
+	   && CONST_INT_P (XEXP (load_offset, 1)))
+    {
+      rtx tocrel_unspec = XEXP (load_offset, 0);
+      if (TARGET_ELF)
+	sprintf (insn_template, "%s %%0,%%1+%%2@toc@l(%%0)", load_str);
+
+      else if (TARGET_XCOFF)
+	sprintf (insn_template, "%s %%0,%%1+%%2@l(%%0)", load_str);
+
+      else
+	gcc_unreachable ();
+
+      fuse_ops[1] = XVECEXP (tocrel_unspec, 0, 0);
+      fuse_ops[2] = XEXP (load_offset, 1);
+      output_asm_insn (insn_template, fuse_ops);
+    }
+
+  else
+    gcc_unreachable ();
+
+  return "";
+}
+
+\f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 199200)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -1847,3 +1847,28 @@ (define_insn_and_split "*vsx_reduc_<VEC_
 }"
   [(set_attr "length" "20")
    (set_attr "type" "veccomplex")])
+
+\f
+;; Power8 Vector fusion
+;; Note, the fused ops must be adjacent, so don't split these ops
+(define_peephole
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "short_cint_operand" ""))
+   (set (match_operand:VSX_M2 2 "vsx_register_operand" "")
+	(mem:VSX_M2 (plus:P (match_dup 0)
+			    (match_operand:P 3 "int_reg_operand" ""))))]
+  "TARGET_P8_FUSION"
+  "li %0,%1\t\t# vector load fusion\;lx<VSX_M2:VSm>x %x2,%0,%3"  
+  [(set_attr "length" "8")
+   (set_attr "type" "vecload")])
+
+(define_peephole
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "short_cint_operand" ""))
+   (set (match_operand:VSX_M2 2 "vsx_register_operand" "")
+	(mem:VSX_M2 (plus:P (match_operand:P 3 "int_reg_operand" "")
+			    (match_dup 0))))]
+  "TARGET_P8_FUSION"
+  "li %0,%1\t\t# vector load fusion\;lx<VSX_M2:VSm>x %x2,%0,%3"  
+  [(set_attr "length" "8")
+   (set_attr "type" "vecload")])
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 199210)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -15237,6 +15237,117 @@ (define_insn "rs6000_mftb_<mode>"
 })
 
 \f
+;; Power8 fusion support for fusing an addis instruction with a D-form load of
+;; a GPR.  The addis instruction must be adjacent to the load, and use the same
+;; register that is being loaded.
+
+;; Note, the fused ops must be adjacent, so don't split these ops.  Originally
+;; these were written as define_peephole2's, but were moved to define_peephole's
+;; so that they don't conflict with cse after reload in some cases.
+
+;; GPR fusion for single word integer types
+
+(define_peephole
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:INT1 2 "base_reg_operand" "")
+	(match_operand:INT1 3 "fusion_gpr_mem_load" ""))]
+  "TARGET_P8_FUSION
+   && fusion_gpr_load_p (operands[0], operands[1], operands[2], operands[3],
+			 insn)"
+{
+  return emit_fusion_gpr_load (operands[0], operands[1], operands[2],
+			       operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+(define_peephole
+  [(set (match_operand:DI 0 "base_reg_operand" "")
+	(match_operand:DI 1 "fusion_gpr_addis" ""))
+   (set (match_operand:DI 2 "base_reg_operand" "")
+	(zero_extend:DI (match_operand:QHSI 3 "fusion_gpr_mem_load" "")))]
+  "TARGET_P8_FUSION && TARGET_POWERPC64
+   && fusion_gpr_load_p (operands[0], operands[1], operands[2], operands[3],
+			 insn)"
+{
+  return emit_fusion_gpr_load (operands[0], operands[1], operands[2],
+			       operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+;; Power8 does not fuse a sign extending load, so convert the sign extending
+;; load into a zero extending load, and do an explicit sign extension.  Don't
+;; do this if we are trying to optimize for space.  Do this as a peephole2 to
+;; allow final rtl optimizations and scheduling to move the sign extend.
+(define_peephole2
+  [(set (match_operand:DI 0 "base_reg_operand" "")
+	(match_operand:DI 1 "fusion_gpr_addis" ""))
+   (set (match_operand:DI 2 "base_reg_operand" "")
+	(sign_extend:DI (match_operand:HSI 3 "fusion_gpr_mem_load" "")))]
+  "TARGET_P8_FUSION && TARGET_P8_FUSION_SIGN && TARGET_POWERPC64
+   && fusion_gpr_load_p (operands[0], operands[1], operands[2], operands[3],
+			 NULL_RTX)"
+  [(set (match_dup 0) (match_dup 1))
+   (set (match_dup 4) (match_dup 3))
+   (set (match_dup 2) (sign_extend:DI (match_dup 4)))]
+{
+  unsigned int offset
+    = (BYTES_BIG_ENDIAN ? 8 - GET_MODE_SIZE (<MODE>mode) : 0);
+
+  operands[4] = simplify_subreg (<MODE>mode, operands[2], DImode,
+				 offset);
+})
+
+(define_peephole
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:SI 2 "base_reg_operand" "")
+	(zero_extend:SI (match_operand:QHI 3 "fusion_gpr_mem_load" "")))]
+  "TARGET_P8_FUSION
+   && fusion_gpr_load_p (operands[0], operands[1], operands[2], operands[3],
+			 insn)"
+{
+  return emit_fusion_gpr_load (operands[0], operands[1], operands[2],
+			       operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+(define_peephole2
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:SI 2 "base_reg_operand" "")
+	(sign_extend:SI (match_operand:HI 3 "fusion_gpr_mem_load" "")))]
+  "TARGET_P8_FUSION && TARGET_P8_FUSION_SIGN
+   && fusion_gpr_load_p (operands[0], operands[1], operands[2], operands[3],
+			 NULL_RTX)"
+  [(set (match_dup 0) (match_dup 1))
+   (set (match_dup 4) (match_dup 3))
+   (set (match_dup 2) (sign_extend:SI (match_dup 4)))]
+{
+  unsigned int offset = (BYTES_BIG_ENDIAN ? 2 : 0);
+
+  operands[4] = simplify_subreg (HImode, operands[2], SImode, offset);
+})
+
+(define_peephole
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:HI 2 "base_reg_operand" "")
+	(zero_extend:HI (match_operand:QI 3 "fusion_gpr_mem_load" "")))]
+  "TARGET_P8_FUSION
+   && fusion_gpr_load_p (operands[0], operands[1], operands[2], operands[3],
+			 insn)"
+{
+  return emit_fusion_gpr_load (operands[0], operands[1], operands[2],
+			       operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+\f
 
 (include "sync.md")
 (include "vector.md")

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #2, add crypto builtins
  2013-05-22  3:30   ` David Edelsohn
@ 2013-05-23  3:41     ` David Edelsohn
  2013-05-23  3:59       ` Michael Meissner
  0 siblings, 1 reply; 52+ messages in thread
From: David Edelsohn @ 2013-05-23  3:41 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

Mike,

When you committed the patch, you did not add the new rs6000/crypto.md
file to the repository.

- David


On Tue, May 21, 2013 at 11:30 PM, David Edelsohn <dje.gcc@gmail.com> wrote:
> On Mon, May 20, 2013 at 7:13 PM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
>> This patch adds the builtins for the new ISA 2.07 crypto instructions.  It
>> bootstraps and causes no regressions; is it ok to install after patch #1 has
>> been applied?
>>
>> [gcc]
>> 2013-05-20  Michael Meissner  <meissner@linux.vnet.ibm.com>
>>
>>         * doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions): Add
>>         documentation for the power8 crypto builtins.
>>
>>         * config/rs6000/t-rs6000 (MD_INCLUDES): Add crypto.md.
>>
>>         * config/rs6000/rs6000-builtin.def (BU_P8V_AV_1): Add support
>>         macros for defining power8 builtin functions.
>>         (BU_P8V_AV_2): Likewise.
>>         (BU_P8V_AV_P): Likewise.
>>         (BU_P8V_VSX_1): Likewise.
>>         (BU_P8V_OVERLOAD_1): Likewise.
>>         (BU_P8V_OVERLOAD_2): Likewise.
>>         (BU_CRYPTO_1): Likewise.
>>         (BU_CRYPTO_2): Likewise.
>>         (BU_CRYPTO_3): Likewise.
>>         (BU_CRYPTO_OVERLOAD_1): Likewise.
>>         (BU_CRYPTO_OVERLOAD_2): Likewise.
>>         (XSCVSPDP): Fix typo, point to the correct instruction.
>>         (VCIPHER): Add power8 crypto builtins.
>>         (VCIPHERLAST): Likewise.
>>         (VNCIPHER): Likewise.
>>         (VNCIPHERLAST): Likewise.
>>         (VPMSUMB): Likewise.
>>         (VPMSUMH): Likewise.
>>         (VPMSUMW): Likewise.
>>         (VPERMXOR_V2DI): Likewise.
>>         (VPERMXOR_V4SI): Likewise.
>>         (VPERMXOR_V8HI): Likewise.
>>         (VPERMXOR_V16QI): Likewise.
>>         (VSHASIGMAW): Likewise.
>>         (VSHASIGMAD): Likewise.
>>         (VPMSUM): Likewise.
>>         (VPERMXOR): Likewise.
>>         (VSHASIGMA): Likewise.
>>
>>         * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
>>         __CRYPTO__ if the crypto instructions are available.
>>         (altivec_overloaded_builtins): Add support for overloaded power8
>>         builtins.
>>
>>         * config/rs6000/rs6000.c (rs6000_expand_ternop_builtin): Add
>>         support for power8 crypto builtins.
>>         (builtin_function_type): Likewise.
>>         (altivec_init_builtins): Add support for builtins that take vector
>>         long long (V2DI) arguments.
>>
>>         * config/rs6000/crypto.md: New file, define power8 crypto
>>         instructions.
>>
>> [gcc/testsuite]
>> 2013-05-20  Michael Meissner  <meissner@linux.vnet.ibm.com>
>>
>>         * gcc.target/powerpc/crypto-builtin-1.c: New file, test for power8
>>         crypto builtins.
>
> Patch #2 is okay.
>
> Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #2, add crypto builtins
  2013-05-23  3:41     ` David Edelsohn
@ 2013-05-23  3:59       ` Michael Meissner
  2013-05-25  4:07         ` David Edelsohn
  0 siblings, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-05-23  3:59 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Wed, May 22, 2013 at 11:41:44PM -0400, David Edelsohn wrote:
> Mike,
> 
> When you committed the patch, you did not add the new rs6000/crypto.md
> file to the repository.

Right.  I remembered to add the new test, but not crypto.md.  I just committed
it.  I'm sorry about that.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #3, add V2DI vector support
  2013-05-21 15:51 ` [PATCH, rs6000] power8 patches, patch #3, add V2DI vector support Michael Meissner
@ 2013-05-23 16:31   ` David Edelsohn
  0 siblings, 0 replies; 52+ messages in thread
From: David Edelsohn @ 2013-05-23 16:31 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Tue, May 21, 2013 at 11:42 AM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This is patch #3 of our power8 changes.  It adds support for vectorizing 64-bit
> integer types (V2DI) for plus, subtract, absolute value, minimum, maximum,
> shift, rotate, and comparison.  Like the other patches, I have bootstrapped
> these patches, and had no regressions.  The test gcc.dg/vect/vect-96.c now
> passes (it had failed on trunk, for compilers built with --with-cpu=power7).
> Are the patches ok to commit to the tree?
>
> Due to size issues, I will submit the tests for the testsuite either as part of
> patch #4 or #5.
>
> 2013-05-20  Michael Meissner  <meissner@linux.vnet.ibm.com>
>             Pat Haugen <pthaugen@us.ibm.com>
>             Peter Bergner <bergner@vnet.ibm.com>
>
>         * config/rs6000/vector.md (VEC_I): Add support for new power8 V2DI
>         instructions.
>         (VEC_A): Likewise.
>         (VEC_C): Likewise.
>         (vrotl<mode>3): Likewise.
>         (vashl<mode>3): Likewise.
>         (vlshr<mode>3): Likewise.
>         (vashr<mode>3): Likewise.
>
>         * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
>         support for power8 V2DI builtins.
>
>         * config/rs6000/rs6000-builtin.def (abs_v2di): Add support for
>         power8 V2DI builtins.
>         (vupkhsw): Likewise.
>         (vupklsw): Likewise.
>         (vaddudm): Likewise.
>         (vminsd): Likewise.
>         (vmaxsd): Likewise.
>         (vminud): Likewise.
>         (vmaxud): Likewise.
>         (vpkudum): Likewise.
>         (vpksdss): Likewise.
>         (vpkudus): Likewise.
>         (vpksdus): Likewise.
>         (vrld): Likewise.
>         (vsld): Likewise.
>         (vsrd): Likewise.
>         (vsrad): Likewise.
>         (vsubudm): Likewise.
>         (vcmpequd): Likewise.
>         (vcmpgtsd): Likewise.
>         (vcmpgtud): Likewise.
>         (vcmpequd_p): Likewise.
>         (vcmpgtsd_p): Likewise.
>         (vcmpgtud_p): Likewise.
>         (vupkhsw): Likewise.
>         (vupklsw): Likewise.
>         (vaddudm): Likewise.
>         (vmaxsd): Likewise.
>         (vmaxud): Likewise.
>         (vminsd): Likewise.
>         (vminud): Likewise.
>         (vpksdss): Likewise.
>         (vpksdus): Likewise.
>         (vpkudum): Likewise.
>         (vpkudus): Likewise.
>         (vrld): Likewise.
>         (vsld): Likewise.
>         (vsrad): Likewise.
>         (vsrd): Likewise.
>         (vsubudm): Likewise.
>
>         * config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Add
>         support for power8 V2DI instructions.
>
>         * config/rs6000/altivec.md (UNSPEC_VPKUHUM): Add support for
>         power8 V2DI instructions.  Combine pack and unpack insns to use an
>         iterator for each mode.  Check whether a particular mode supports
>         Altivec instructions instead of just checking TARGET_ALTIVEC.
>         (UNSPEC_VPKUWUM): Likewise.
>         (UNSPEC_VPKSHSS): Likewise.
>         (UNSPEC_VPKSWSS): Likewise.
>         (UNSPEC_VPKUHUS): Likewise.
>         (UNSPEC_VPKSHUS): Likewise.
>         (UNSPEC_VPKUWUS): Likewise.
>         (UNSPEC_VPKSWUS): Likewise.
>         (UNSPEC_VPACK_SIGN_SIGN_SAT): Likewise.
>         (UNSPEC_VPACK_SIGN_UNS_SAT): Likewise.
>         (UNSPEC_VPACK_UNS_UNS_SAT): Likewise.
>         (UNSPEC_VPACK_UNS_UNS_MOD): Likewise.
>         (UNSPEC_VUPKHSB): Likewise.
>         (UNSPEC_VUNPACK_HI_SIGN): Likewise.
>         (UNSPEC_VUNPACK_LO_SIGN): Likewise.
>         (UNSPEC_VUPKHSH): Likewise.
>         (UNSPEC_VUPKLSB): Likewise.
>         (UNSPEC_VUPKLSH): Likewise.
>         (VI2): Likewise.
>         (VI_char): Likewise.
>         (VI_scalar): Likewise.
>         (VI_unit): Likewise.
>         (VP): Likewise.
>         (VP_small): Likewise.
>         (VP_small_lc): Likewise.
>         (VU_char): Likewise.
>         (add<mode>3): Likewise.
>         (altivec_vaddcuw): Likewise.
>         (altivec_vaddu<VI_char>s): Likewise.
>         (altivec_vadds<VI_char>s): Likewise.
>         (sub<mode>3): Likewise.
>         (altivec_vsubcuw): Likewise.
>         (altivec_vsubu<VI_char>s): Likewise.
>         (altivec_vsubs<VI_char>s): Likewise.
>         (altivec_vavgs<VI_char>): Likewise.
>         (altivec_vcmpbfp): Likewise.
>         (altivec_eq<mode>): Likewise.
>         (altivec_gt<mode>): Likewise.
>         (altivec_gtu<mode>): Likewise.
>         (umax<mode>3): Likewise.
>         (smax<mode>3): Likewise.
>         (umin<mode>3): Likewise.
>         (smin<mode>3): Likewise.
>         (altivec_vpkuhum): Likewise.
>         (altivec_vpkuwum): Likewise.
>         (altivec_vpkshss): Likewise.
>         (altivec_vpkswss): Likewise.
>         (altivec_vpkuhus): Likewise.
>         (altivec_vpkshus): Likewise.
>         (altivec_vpkuwus): Likewise.
>         (altivec_vpkswus): Likewise.
>         (altivec_vpks<VI_char>ss): Likewise.
>         (altivec_vpks<VI_char>us): Likewise.
>         (altivec_vpku<VI_char>us): Likewise.
>         (altivec_vpku<VI_char>um): Likewise.
>         (altivec_vrl<VI_char>): Likewise.
>         (altivec_vsl<VI_char>): Likewise.
>         (altivec_vsr<VI_char>): Likewise.
>         (altivec_vsra<VI_char>): Likewise.
>         (altivec_vsldoi_<mode>): Likewise.
>         (altivec_vupkhsb): Likewise.
>         (altivec_vupkhs<VU_char>): Likewise.
>         (altivec_vupkls<VU_char>): Likewise.
>         (altivec_vupkhsh): Likewise.
>         (altivec_vupklsb): Likewise.
>         (altivec_vupklsh): Likewise.
>         (altivec_vcmpequ<VI_char>_p): Likewise.
>         (altivec_vcmpgts<VI_char>_p): Likewise.
>         (altivec_vcmpgtu<VI_char>_p): Likewise.
>         (abs<mode>2): Likewise.
>         (vec_unpacks_hi_v16qi): Likewise.
>         (vec_unpacks_hi_v8hi): Likewise.
>         (vec_unpacks_lo_v16qi): Likewise.
>         (vec_unpacks_hi_<VP_small_lc>): Likewise.
>         (vec_unpacks_lo_v8hi): Likewise.
>         (vec_unpacks_lo_<VP_small_lc>): Likewise.
>         (vec_pack_trunc_v8h): Likewise.
>         (vec_pack_trunc_v4si): Likewise.
>         (vec_pack_trunc_<mode>): Likewise.
>
>         * config/rs6000/altivec.h (vec_vaddudm): Add defines for power8
>         V2DI builtins.
>         (vec_vmaxsd): Likewise.
>         (vec_vmaxud): Likewise.
>         (vec_vminsd): Likewise.
>         (vec_vminud): Likewise.
>         (vec_vpksdss): Likewise.
>         (vec_vpksdus): Likewise.
>         (vec_vpkudum): Likewise.
>         (vec_vpkudus): Likewise.
>         (vec_vrld): Likewise.
>         (vec_vsld): Likewise.
>         (vec_vsrad): Likewise.
>         (vec_vsrd): Likewise.
>         (vec_vsubudm): Likewise.
>         (vec_vupkhsw): Likewise.
>         (vec_vupklsw): Likewise.

Mike,

I don't mind using "P8" in macros because IBM doesn't have a specific
name for the additional vector instructions, but the comments in the
code sometimes refer to ISA 2.07, sometimes to power8 and sometimes to
both. Please make the references consistent, probably referring to the
ISA, because it's not a Power8-only feature.

For example:

+/* Vector comparison instructions added in ISA 2.07.  */
+BU_P8V_AV_2 (VCMPEQUD,        "vcmpequd",    CONST,    vector_eqv2di)
+BU_P8V_AV_2 (VCMPGTSD,        "vcmpgtsd",    CONST,    vector_gtv2di)
+BU_P8V_AV_2 (VCMPGTUD,        "vcmpgtud",    CONST,    vector_gtuv2di)
+
+/* Vector comparison predicate instructions added in ISA 2.07.  */
+BU_P8V_AV_P (VCMPEQUD_P,    "vcmpequd_p",    CONST,    vector_eq_v2di_p)
+BU_P8V_AV_P (VCMPGTSD_P,    "vcmpgtsd_p",    CONST,    vector_gt_v2di_p)
+BU_P8V_AV_P (VCMPGTUD_P,    "vcmpgtud_p",    CONST,    vector_gtu_v2di_p)
+
+/* Power8 vector overloaded 1 argument functions.  */
+BU_P8V_OVERLOAD_1 (VUPKHSW,    "vupkhsw")
+BU_P8V_OVERLOAD_1 (VUPKLSW,    "vupklsw")
+
+/* Power8 vector overloaded 2 argument functions.  */
+BU_P8V_OVERLOAD_2 (VADDUDM,    "vaddudm")


+  /* V2DImode, full mode depends on power8 vector mode.  Allow under VSX to do
+     insert/splat/extract.  Altivec doesn't have 64-bit integer support.  */


Patch #3 is okay with that change.

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4, new power8 builtins
  2013-05-21 23:47 ` [PATCH, rs6000] power8 patches, patch #4, new power8 builtins Michael Meissner
@ 2013-05-25  4:03   ` David Edelsohn
  2013-05-30 23:26     ` Michael Meissner
  2013-06-04 18:49   ` [PATCH, rs6000] power8 patches, patch #4 (revised), " Michael Meissner
  1 sibling, 1 reply; 52+ messages in thread
From: David Edelsohn @ 2013-05-25  4:03 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Tue, May 21, 2013 at 7:47 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:


>         * config/rs6000/rs6000.c (rs6000_option_override_internal): Only
>         allow power8 quad mode in 64-bit.  Turn off splitting wide types
>         if we have quad mode.

Completely turning off splitting wide types seems like an
unnecessarily large hammer to prevent splitting a value across
registers within logical atomic operations.  I think we need to
examine other alternatives.

- David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #2, add crypto builtins
  2013-05-23  3:59       ` Michael Meissner
@ 2013-05-25  4:07         ` David Edelsohn
  2013-05-30 21:04           ` Michael Meissner
  0 siblings, 1 reply; 52+ messages in thread
From: David Edelsohn @ 2013-05-25  4:07 UTC (permalink / raw)
  To: Michael Meissner, David Edelsohn, GCC Patches, Pat Haugen, Peter Bergner

[gcc/testsuite]
2013-05-20  Michael Meissner  <meissner@linux.vnet.ibm.com>

        * gcc.target/powerpc/crypto-builtin-1.c: New file, test for power8
        crypto builtins.

The testcase needs to check something more than

/* { dg-require-effective-target powerpc_vsx_ok } */

I don't know if we need to separate the new VSX operations from the
crypto operations, but the new testcases need to ensure that the
assembler and/or processor support the new instructions.
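
For instance, something along the lines of a power8-specific
effective-target check:

/* { dg-require-effective-target powerpc_p8vector_ok } */

(assuming such a check is added to target-supports.exp) would skip the
tests when the assembler cannot handle the new instructions.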

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #6, direct move & basic quad load/store
  2013-05-22 14:26 ` [PATCH, rs6000] power8 patches, patch #6, direct move & basic quad load/store Michael Meissner
@ 2013-05-29 19:53   ` David Edelsohn
  2013-05-29 20:32     ` Michael Meissner
  0 siblings, 1 reply; 52+ messages in thread
From: David Edelsohn @ 2013-05-29 19:53 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, David Edelsohn, Pat Haugen, Peter Bergner

+      if (TARGET_DIRECT_MOVE)
+        {
+          if (TARGET_POWERPC64)
+        {
+          reload_gpr_vsx[TImode]    = CODE_FOR_reload_gpr_from_vsxti;
+          reload_gpr_vsx[V2DFmode]  = CODE_FOR_reload_gpr_from_vsxv2df;
+          reload_gpr_vsx[V2DImode]  = CODE_FOR_reload_gpr_from_vsxv2di;
+          reload_gpr_vsx[V4SFmode]  = CODE_FOR_reload_gpr_from_vsxv4sf;
+          reload_gpr_vsx[V4SImode]  = CODE_FOR_reload_gpr_from_vsxv4si;
+          reload_gpr_vsx[V8HImode]  = CODE_FOR_reload_gpr_from_vsxv8hi;
+          reload_gpr_vsx[V16QImode] = CODE_FOR_reload_gpr_from_vsxv16qi;
+          reload_gpr_vsx[SFmode]    = CODE_FOR_reload_gpr_from_vsxsf;
+
+          reload_vsx_gpr[TImode]    = CODE_FOR_reload_vsx_from_gprti;
+          reload_vsx_gpr[V2DFmode]  = CODE_FOR_reload_vsx_from_gprv2df;
+          reload_vsx_gpr[V2DImode]  = CODE_FOR_reload_vsx_from_gprv2di;
+          reload_vsx_gpr[V4SFmode]  = CODE_FOR_reload_vsx_from_gprv4sf;
+          reload_vsx_gpr[V4SImode]  = CODE_FOR_reload_vsx_from_gprv4si;
+          reload_vsx_gpr[V8HImode]  = CODE_FOR_reload_vsx_from_gprv8hi;
+          reload_vsx_gpr[V16QImode] = CODE_FOR_reload_vsx_from_gprv16qi;
+          reload_vsx_gpr[SFmode]    = CODE_FOR_reload_vsx_from_gprsf;
+        }
+          else
+        {
+          reload_fpr_gpr[DImode] = CODE_FOR_reload_fpr_from_gprdi;
+          reload_fpr_gpr[DDmode] = CODE_FOR_reload_fpr_from_gprdd;
+          reload_fpr_gpr[DFmode] = CODE_FOR_reload_fpr_from_gprdf;
+        }
+        }

Why do the VSX reload functions depend on TARGET_POWERPC64? That seems
like the wrong test.

+/* Return true if this is a move direct operation between GPR registers and
+   floating point/VSX registers.  */
+
+bool
+direct_move_p (rtx op0, rtx op1)

Why isn't this function symmetric?  It at least needs an explanation
in the comment about assumptions for the operands.

+/* Return true if this is a load or store quad operation.  */
+
+bool
+quad_load_store_p (rtx op0, rtx op1)

Same for this function.

+/* Helper function for rs6000_secondary_reload to return true if a move to a
+   different register class is really a simple move.  */
+
+static bool
+rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type,
+                     enum rs6000_reg_type from_type,
+                     enum machine_mode mode)
+{
+  int size;
+
+  /* Add support for various direct moves available.  In this function, we only
+     look at cases where we don't need any extra registers, and one or more
+     simple move insns are issued.  At present, 32-bit integers are not allowed
+     in FPR/VSX registers.  Single precision binary floating point is not a simple
+     move because we need to convert to the single precision memory layout.
+     The 4-byte SDmode can be moved.  */

The second comment should be merged into the first -- it explains the
purpose and implementation of the function.

+/* Return whether a move between two register classes can be done either
+   directly (simple move) or via a pattern that uses a single extra temporary
+   (using power8's direct move in this case.  */
+
+static bool
+rs6000_secondary_reload_move (enum rs6000_reg_type to_type,
+                  enum rs6000_reg_type from_type,
+                  enum machine_mode mode,
+                  secondary_reload_info *sri,
+                  bool altivec_p)

Missing a close parenthesis in the comment.

(define_insn "*vsx_mov<mode>"
-  [(set (match_operand:VSX_M 0 "nonimmediate_operand"
"=Z,<VSr>,<VSr>,?Z,?wa,?wa,*Y,*r,*r,<VSr>,?wa,*r,v,wZ,v")
-    (match_operand:VSX_M 1 "input_operand"
"<VSr>,Z,<VSr>,wa,Z,wa,r,Y,r,j,j,j,W,v,wZ"))]
+  [(set (match_operand:VSX_M 0 "nonimmediate_operand"
"=Z,<VSr>,<VSr>,?Z,?wa,?wa,wQ,?&r,??Y,??r,??r,<VSr>,?wa,*r,v,wZ, v")
+    (match_operand:VSX_M 1 "input_operand"
"<VSr>,Z,<VSr>,wa,Z,wa,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]

Why do you need to change the modifiers? Why should vector operands in
GPRs matter for register preferences (removing `*' from "r"
constraints)?

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #7, quad/byte/half-word atomic instructions
  2013-05-22 16:51 ` [PATCH, rs6000] power8 patches, patch #7, quad/byte/half-word atomic instructions Michael Meissner
@ 2013-05-29 20:29   ` David Edelsohn
  2013-05-29 20:36     ` Michael Meissner
  2013-06-11 23:56     ` Michael Meissner
  0 siblings, 2 replies; 52+ messages in thread
From: David Edelsohn @ 2013-05-29 20:29 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

-  if (mode == QImode || mode == HImode)
+  /* On power8, we want to use SImode for the operation.  On previoius systems,
+     use the operation in a subword and shift/mask to get the proper byte or
+     halfword.  */
+  if (TARGET_SYNC_HI_QI && (mode == QImode || mode == HImode))
+    {
+      val = convert_modes (SImode, mode, val, 1);
+
+      /* Prepare to adjust the return value.  */
+      before = gen_reg_rtx (SImode);
+      if (after)
+    after = gen_reg_rtx (SImode);
+      mode = SImode;
+    }
+  else if (mode == QImode || mode == HImode)

Spelling: previoius.

This logic is redundant. Why not

if (mode == QImode || mode == HImode)
  {
    if (TARGET_SYNC_HI_QI)
      {
         new code
      }
    else
      {
         original code
      }

The rest of this patch is okay.

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #6, direct move & basic quad load/store
  2013-05-29 19:53   ` David Edelsohn
@ 2013-05-29 20:32     ` Michael Meissner
  2013-06-10 15:41       ` David Edelsohn
  0 siblings, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-05-29 20:32 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Wed, May 29, 2013 at 03:53:43PM -0400, David Edelsohn wrote:
> +      if (TARGET_DIRECT_MOVE)
> +        {
> +          if (TARGET_POWERPC64)
> +        {
> +          reload_gpr_vsx[TImode]    = CODE_FOR_reload_gpr_from_vsxti;
> +          reload_gpr_vsx[V2DFmode]  = CODE_FOR_reload_gpr_from_vsxv2df;
> +          reload_gpr_vsx[V2DImode]  = CODE_FOR_reload_gpr_from_vsxv2di;
> +          reload_gpr_vsx[V4SFmode]  = CODE_FOR_reload_gpr_from_vsxv4sf;
> +          reload_gpr_vsx[V4SImode]  = CODE_FOR_reload_gpr_from_vsxv4si;
> +          reload_gpr_vsx[V8HImode]  = CODE_FOR_reload_gpr_from_vsxv8hi;
> +          reload_gpr_vsx[V16QImode] = CODE_FOR_reload_gpr_from_vsxv16qi;
> +          reload_gpr_vsx[SFmode]    = CODE_FOR_reload_gpr_from_vsxsf;
> +
> +          reload_vsx_gpr[TImode]    = CODE_FOR_reload_vsx_from_gprti;
> +          reload_vsx_gpr[V2DFmode]  = CODE_FOR_reload_vsx_from_gprv2df;
> +          reload_vsx_gpr[V2DImode]  = CODE_FOR_reload_vsx_from_gprv2di;
> +          reload_vsx_gpr[V4SFmode]  = CODE_FOR_reload_vsx_from_gprv4sf;
> +          reload_vsx_gpr[V4SImode]  = CODE_FOR_reload_vsx_from_gprv4si;
> +          reload_vsx_gpr[V8HImode]  = CODE_FOR_reload_vsx_from_gprv8hi;
> +          reload_vsx_gpr[V16QImode] = CODE_FOR_reload_vsx_from_gprv16qi;
> +          reload_vsx_gpr[SFmode]    = CODE_FOR_reload_vsx_from_gprsf;
> +        }
> +          else
> +        {
> +          reload_fpr_gpr[DImode] = CODE_FOR_reload_fpr_from_gprdi;
> +          reload_fpr_gpr[DDmode] = CODE_FOR_reload_fpr_from_gprdd;
> +          reload_fpr_gpr[DFmode] = CODE_FOR_reload_fpr_from_gprdf;
> +        }
> +        }
> 
> Why do the VSX reload functions depend on TARGET_POWERPC64? That seems
> like the wrong test.

Because at present we do not do direct moves between VSX and GPR registers for
128-bit values in 32-bit mode.  In 32-bit mode, we only transfer 64-bit types.

Due to issues with secondary reload, where you only get one temporary register
and that temporary register may or may not overlap with the output register,
we might need a type that takes 3 traditional FPR registers, and we don't have
one at present.  (To move from GPR to VSX we would need 2 direct moves and a
FMRGOW for the first 64 bits, 2 more direct moves and another FMRGOW for the
second 64 bits, and then an XXPERMDI to glue the two 64-bit halves together.)
I've seen places where reload does not honor the & constraint for these
secondary reload cases, so the output can't be one of the temporary registers,
hence the need for a type that spans 3 registers.

In theory it can be done; it just hasn't been done.
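
A rough sketch of the missing GPR-to-VSX sequence for 32-bit mode, with
hypothetical register choices (note that it needs three FPR-range VSX
temporaries, vs0..vs2):

	mtvsrwz  vs0,r3		# direct move, word 0
	mtvsrwz  vs1,r4		# direct move, word 1
	fmrgow   f0,f0,f1	# merge words 0/1 into the first 64 bits
	mtvsrwz  vs1,r5		# direct move, word 2
	mtvsrwz  vs2,r6		# direct move, word 3
	fmrgow   f1,f1,f2	# merge words 2/3 into the second 64 bits
	xxpermdi vs0,vs0,vs1,0	# glue the two 64-bit halves together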

> +/* Return true if this is a move direct operation between GPR registers and
> +   floating point/VSX registers.  */
> +
> +bool
> +direct_move_p (rtx op0, rtx op1)
> 
> Why isn't this function symmetric?  It at least needs an explanation
> in the comment about assumptions for the operands.

I probably should have named the operands "to" and "from", since that is how
the callers use it.  I'm not sure I understand the comment about it being
symmetric, since you need different strategies to move in different directions,
and you might or might not have implemented all of the cases.

> +/* Return true if this is a load or store quad operation.  */
> +
> +bool
> +quad_load_store_p (rtx op0, rtx op1)
> 
> Same for this function.

Again it should have been to/from.

> +/* Helper function for rs6000_secondary_reload to return true if a move to a
> +   different register class is really a simple move.  */
> +
> +static bool
> +rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type,
> +                     enum rs6000_reg_type from_type,
> +                     enum machine_mode mode)
> +{
> +  int size;
> +
> +  /* Add support for various direct moves available.  In this function, we only
> +     look at cases where we don't need any extra registers, and one or more
> +     simple move insns are issued.  At present, 32-bit integers are not allowed
> +     in FPR/VSX registers.  Single precision binary floating point is not a simple
> +     move because we need to convert to the single precision memory layout.
> +     The 4-byte SDmode can be moved.  */
> 
> The second comment should be merged into the first -- it explains the
> purpose and implementation of the function.

Ok.

> +/* Return whether a move between two register classes can be done either
> +   directly (simple move) or via a pattern that uses a single extra temporary
> +   (using power8's direct move in this case.  */
> +
> +static bool
> +rs6000_secondary_reload_move (enum rs6000_reg_type to_type,
> +                  enum rs6000_reg_type from_type,
> +                  enum machine_mode mode,
> +                  secondary_reload_info *sri,
> +                  bool altivec_p)
> 
> Missing a close parenthesis in the comment.

Yep, thanks.

> (define_insn "*vsx_mov<mode>"
> -  [(set (match_operand:VSX_M 0 "nonimmediate_operand"
> "=Z,<VSr>,<VSr>,?Z,?wa,?wa,*Y,*r,*r,<VSr>,?wa,*r,v,wZ,v")
> -    (match_operand:VSX_M 1 "input_operand"
> "<VSr>,Z,<VSr>,wa,Z,wa,r,Y,r,j,j,j,W,v,wZ"))]
> +  [(set (match_operand:VSX_M 0 "nonimmediate_operand"
> "=Z,<VSr>,<VSr>,?Z,?wa,?wa,wQ,?&r,??Y,??r,??r,<VSr>,?wa,*r,v,wZ, v")
> +    (match_operand:VSX_M 1 "input_operand"
> "<VSr>,Z,<VSr>,wa,Z,wa,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]
> 
> Why do you need to change the modifiers? Why should vector operands in
> GPRs matter for register preferences (removing `*' from "r"
> constraints)?

The problem is that if you use '*', the quad word atomics will either get
aborts or will load values into vector registers and then do direct moves,
since the register allocator will never allocate GPRs for them.  We will also
have this problem when support for 128-bit add/subtract in vector registers is
written.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #7, quad/byte/half-word atomic instructions
  2013-05-29 20:29   ` David Edelsohn
@ 2013-05-29 20:36     ` Michael Meissner
  2013-06-11 23:56     ` Michael Meissner
  1 sibling, 0 replies; 52+ messages in thread
From: Michael Meissner @ 2013-05-29 20:36 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Wed, May 29, 2013 at 04:29:07PM -0400, David Edelsohn wrote:
> -  if (mode == QImode || mode == HImode)
> +  /* On power8, we want to use SImode for the operation.  On previoius systems,
> +     use the operation in a subword and shift/mask to get the proper byte or
> +     halfword.  */
> +  if (TARGET_SYNC_HI_QI && (mode == QImode || mode == HImode))
> +    {
> +      val = convert_modes (SImode, mode, val, 1);
> +
> +      /* Prepare to adjust the return value.  */
> +      before = gen_reg_rtx (SImode);
> +      if (after)
> +    after = gen_reg_rtx (SImode);
> +      mode = SImode;
> +    }
> +  else if (mode == QImode || mode == HImode)
> 
> Spelling: previoius.

Thanks.

> This logic is redundant. Why not
> 
> if (mode == QImode || mode == HImode)
>   {
>     if (TARGET_SYNC_HI_QI)
>       {
>          new code
>       }
>     else
>       {
>          original code
>       }
> 
> The rest of this patch is okay.

I can do that.  I think earlier versions did it that way.
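
I.e., something like this, with the original shift/mask code elided:

  if (mode == QImode || mode == HImode)
    {
      if (TARGET_SYNC_HI_QI)
	{
	  /* On power8, do the operation directly in SImode.  */
	  val = convert_modes (SImode, mode, val, 1);

	  /* Prepare to adjust the return value.  */
	  before = gen_reg_rtx (SImode);
	  if (after)
	    after = gen_reg_rtx (SImode);
	  mode = SImode;
	}
      else
	{
	  /* ... original subword shift/mask code ...  */
	}
    }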

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #2, add crypto builtins
  2013-05-25  4:07         ` David Edelsohn
@ 2013-05-30 21:04           ` Michael Meissner
  0 siblings, 0 replies; 52+ messages in thread
From: Michael Meissner @ 2013-05-30 21:04 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Sat, May 25, 2013 at 12:07:32AM -0400, David Edelsohn wrote:
> [gcc/testsuite]
> 2013-05-20  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 
>         * gcc.target/powerpc/crypto-builtin-1.c: New file, test for power8
>         crypto builtins.
> 
> The testcase needs to check something more than
> 
> /* { dg-require-effective-target powerpc_vsx_ok } */
> 
> I don't know if we need to separate the new VSX operations from the
> crypto opterations, but the new testcases need to ensure that the
> assembler and/or processor support the new instructions.

I believe I wrote this test before adding powerpc_p8vector_ok in patch #5.
Thanks for noticing it.  I think I'll commit the target-supports.exp change and
then change this test to use powerpc_p8vector_ok unless you have objections to
the target-supports.exp part of the patch.

I don't think it is important enough to have a separate category for crypto
vs. the other p8 vector stuff, since the assembler change added both at the
same time.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4, new power8 builtins
  2013-05-25  4:03   ` David Edelsohn
@ 2013-05-30 23:26     ` Michael Meissner
  2013-05-31  9:14       ` Segher Boessenkool
  0 siblings, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-05-30 23:26 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Sat, May 25, 2013 at 12:03:51AM -0400, David Edelsohn wrote:
> On Tue, May 21, 2013 at 7:47 PM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
> 
> 
> >         * config/rs6000/rs6000.c (rs6000_option_override_internal): Only
> >         allow power8 quad mode in 64-bit.  Turn off splitting wide types
> >         if we have quad mode.
> 
> Completely turning off splitting wide types seems like an
> unnecessarily large hammer to prevent splitting a value across
> registers within logical atomic operations.  I think we need to
> examine other alternatives.

Ok, I tracked down what the problem is.  We never implemented the EQV, ORC, or
NAND insns in the GPRs.  When I added the power8 vector versions and the split
wide types pass tried to do its thing in the GPRs, it created a bad insn.  I
originally saw it in the atomic ops, because I was testing all of the
combinations provided, but I can reproduce it just by using __int128_t.

In looking at the code, we don't seem to implement NOR of two values either.
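
A minimal reproducer is along these lines (any one of these functions,
compiled for power8, is enough):

typedef __int128_t TYPE;

TYPE do_eqv  (TYPE a, TYPE b) { return ~(a ^ b); }	/* eqv */
TYPE do_nand (TYPE a, TYPE b) { return ~(a & b); }	/* nand */
TYPE do_nor  (TYPE a, TYPE b) { return ~(a | b); }	/* nor */
TYPE do_orc  (TYPE a, TYPE b) { return ~a | b; }	/* orc */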

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4, new power8 builtins
  2013-05-30 23:26     ` Michael Meissner
@ 2013-05-31  9:14       ` Segher Boessenkool
  2013-05-31 15:11         ` Michael Meissner
  0 siblings, 1 reply; 52+ messages in thread
From: Segher Boessenkool @ 2013-05-31  9:14 UTC (permalink / raw)
  To: Michael Meissner; +Cc: David Edelsohn, GCC Patches, Pat Haugen, Peter Bergner

> Ok, I tracked down what the problem is.  We never implemented the  
> EQV, ORC, or
> NAND insns in the GPRs.  When I added the power8 vector versions,  
> the split
> wide types pass tried to do its thing in the GPRs, it creates a bad  
> insn. I
> originally saw it in the atomic ops, because I was testing all of the
> combinations provided, but I can reproduce it just by using  
> __int128_t.

The boolc<mode>3_internal1 pattern uses non-canonical RTL for
eqv: (xor (not x) y) instead of (not (xor x y)).  You'll need
to add a correct pattern, or wait for my patch series (which
I'll start sending later today) to get in.
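
For the GPRs, a correct pattern would look roughly like this (a sketch,
untested):

(define_insn "eqv<mode>3"
  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
	(not:GPR
	 (xor:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
		  (match_operand:GPR 2 "gpc_reg_operand" "r"))))]
  ""
  "eqv %0,%1,%2"
  [(set_attr "type" "integer")])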

(There are problems with the dot forms of xor, nand, nor, and
eqv as well, but I don't think you will hit that?)


Segher

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4, new power8 builtins
  2013-05-31  9:14       ` Segher Boessenkool
@ 2013-05-31 15:11         ` Michael Meissner
  0 siblings, 0 replies; 52+ messages in thread
From: Michael Meissner @ 2013-05-31 15:11 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, David Edelsohn, GCC Patches, Pat Haugen, Peter Bergner

On Fri, May 31, 2013 at 11:10:54AM +0200, Segher Boessenkool wrote:
> >Ok, I tracked down what the problem is.  We never implemented the
> >EQV, ORC, or
> >NAND insns in the GPRs.  When I added the power8 vector versions,
> >the split
> >wide types pass tried to do its thing in the GPRs, it creates a
> >bad insn. I
> >originally saw it in the atomic ops, because I was testing all of the
> >combinations provided, but I can reproduce it just by using
> >__int128_t.
> 
> The boolc<mode>3_internal1 pattern uses non-canonical RTL for
> eqv: (xor (not x) y) instead of (not (xor x y)).  You'll need
> to add a correct pattern, or wait for my patch series (which
> I'll start sending later today) to get in.
> 
> (There are problems with the dot forms of xor, nand, nor, and
> eqv as well, but I don't think you will hit that?)

I can probably wait for a bit if you are doing it shortly.  For splitting
128-bit types, we won't need the dot form.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
  2013-05-21 23:47 ` [PATCH, rs6000] power8 patches, patch #4, new power8 builtins Michael Meissner
  2013-05-25  4:03   ` David Edelsohn
@ 2013-06-04 18:49   ` Michael Meissner
  2013-06-05 14:28     ` David Edelsohn
  1 sibling, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-06-04 18:49 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, pthaugen, bergner

[-- Attachment #1: Type: text/plain, Size: 6356 bytes --]

I revised this patch, which adds the new miscellaneous power8 vector
instructions, so that it no longer turns off splitting wide moves.  While
doing the patch, I discovered that we never supported the 'eqv' instruction,
and I have added support for eqv in the GPR registers.

I also fixed the issue David raised against patch #2, namely that I did not
protect the crypto tests against the case where an assembler that does not
understand ISA 2.07 instructions was used to build the compiler.  I brought in
the changes to target-supports.exp from patch #5 to fix this.

This patch bootstraps and causes no regressions; is it ok to check in?

[gcc]
2013-06-04  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Pat Haugen <pthaugen@us.ibm.com>
	    Peter Bergner <bergner@vnet.ibm.com>

	* doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
	Document new power8 builtins.

	* config/rs6000/vector.md (and<mode>3): Add a clobber/scratch of a
	condition code register, so TImode logical operations can be done
	either in VSX registers or GPRs.
	(nor<mode>3): Use the canonical form for nor.
	(eqv<mode>3): Add expanders for power8 xxleqv, xxlnand, xxlorc,
	vclz*, and vpopcnt* vector instructions.
	(nand<mode>3): Likewise.
	(orc<mode>3): Likewise.
	(clz<mode>2): Likewise.
	(popcount<mode>2): Likewise.

	* config/rs6000/predicates.md (int_reg_operand): Rework tests so
	that only the GPRs are recognized.

	* config/rs6000/rs6000-builtin.def (xscvspdpn): Add new power8
	builtin functions.
	(xscvdpspn): Likewise.
	(vclzb): Likewise.
	(vclzh): Likewise.
	(vclzw): Likewise.
	(vclzd): Likewise.
	(vpopcntb): Likewise.
	(vpopcnth): Likewise.
	(vpopcntw): Likewise.
	(vpopcntd): Likewise.
	(vgbbd): Likewise.
	(vmrgew): Likewise.
	(vmrgow): Likewise.
	(eqv_v16qi3): Likewise.
	(eqv_v8hi3): Likewise.
	(eqv_v4si3): Likewise.
	(eqv_v2di3): Likewise.
	(eqv_v4sf3): Likewise.
	(eqv_v2df3): Likewise.
	(nand_v16qi3): Likewise.
	(nand_v8hi3): Likewise.
	(nand_v4si3): Likewise.
	(nand_v2di3): Likewise.
	(nand_v4sf3): Likewise.
	(nand_v2df3): Likewise.
	(orc_v16qi3): Likewise.
	(orc_v8hi3): Likewise.
	(orc_v4si3): Likewise.
	(orc_v2di3): Likewise.
	(orc_v4sf3): Likewise.
	(orc_v2df3): Likewise.
	(vclz): Likewise.
	(vclzb): Likewise.
	(vclzh): Likewise.
	(vclzw): Likewise.
	(vclzd): Likewise.
	(vpopcnt): Likewise.
	(vpopcntb): Likewise.
	(vpopcnth): Likewise.
	(vpopcntw): Likewise.
	(vpopcntd): Likewise.
	(vgbbd): Likewise.
	(eqv): Likewise.
	(nand): Likewise.
	(orc): Likewise.
	(vmrgew): Likewise.
	(vmrgow): Likewise.

	* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
	support for new power8 builtins.

	* config/rs6000/rs6000.c (rs6000_option_override_internal): Only
	allow power8 quad mode in 64-bit.
	(rs6000_builtin_vectorized_function): Vectorize count leading
	zeros, population count builtins.
	(rs6000_expand_vector_init): On power8 use xscvdpspn to form V4SF
	vectors instead of xscvdpsp to avoid IEEE related traps.
	(builtin_function_type): Add vgbbd builtin function which takes an
	unsigned argument.
	(altivec_expand_vec_perm_const): Add support for new power8 merge
	instructions.

	* config/rs6000/vsx.md (VSX_L2): New iterator for 128-bit types,
	that does not include TImode for use with 32-bit.
	(UNSPEC_VSX_CVSPDPN): Support for power8 xscvdpspn and xscvspdpn
	instructions.
	(UNSPEC_VSX_CVDPSPN): Likewise.
	(vsx_xscvdpspn): Likewise.
	(vsx_xscvspdpn): Likewise.
	(vsx_xscvdpspn_scalar): Likewise.
	(vsx_xscvspdpn_directmove): Likewise.
	(vsx_and<mode>3): Split logical operations into 32-bit and
	64-bit. Add support to do logical operations on TImode as well as
	VSX vector types.  Allow logical operations to be done in either
	VSX registers or in general purpose registers in 64-bit mode.  Add
	splitters if GPRs were used. For and, add clobber of CCmode to
	allow use of ANDI on GPRs.
	(vsx_and<mode>3_32bit): Likewise.
	(vsx_and<mode>3_64bit): Likewise.
	(vsx_ior<mode>3): Likewise.
	(vsx_ior<mode>3_32bit): Likewise.
	(vsx_ior<mode>3_64bit): Likewise.
	(vsx_xor<mode>3): Likewise.
	(vsx_xor<mode>3_32bit): Likewise.
	(vsx_xor<mode>3_64bit): Likewise.
	(vsx_one_cmpl<mode>2): Likewise.
	(vsx_one_cmpl<mode>2_32bit): Likewise.
	(vsx_one_cmpl<mode>2_64bit): Likewise.
	(vsx_nor<mode>3): Likewise.
	(vsx_nor<mode>3_32bit): Likewise.
	(vsx_nor<mode>3_64bit): Likewise.
	(vsx_andc<mode>3): Likewise.
	(vsx_andc<mode>3_32bit): Likewise.
	(vsx_andc<mode>3_64bit): Likewise.
	(vsx_eqv<mode>3_32bit): Add support for power8 xxleqv, xxlnand,
	and xxlorc instructions.
	(vsx_eqv<mode>3_64bit): Likewise.
	(vsx_nand<mode>3_32bit): Likewise.
	(vsx_nand<mode>3_64bit): Likewise.
	(vsx_orc<mode>3_32bit): Likewise.
	(vsx_orc<mode>3_64bit): Likewise.

	* config/rs6000/rs6000.h (VLOGICAL_REGNO_P): Update comment.

	* config/rs6000/altivec.md (UNSPEC_VGBBD): Add power8 vgbbd
	instruction.
	(p8_vmrgew): Add power8 vmrgew and vmrgow instructions.
	(p8_vmrgow): Likewise.
	(altivec_and<mode>3): Add clobber of CCmode to allow AND using
	GPRs to be split under VSX.
	(p8v_clz<mode>2): Add power8 count leading zero support.
	(p8v_popcount<mode>2): Add power8 population count support.
	(p8v_vgbbd): Add power8 gather bits by bytes by doubleword
	support.

	* config/rs6000/rs6000.md (eqv<mode>3): Add support for power8 eqv
	instruction.

	* config/rs6000/altivec.h (vec_eqv): Add defines to export power8
	builtin functions.
	(vec_nand): Likewise.
	(vec_vclz): Likewise.
	(vec_vclzb): Likewise.
	(vec_vclzd): Likewise.
	(vec_vclzh): Likewise.
	(vec_vclzw): Likewise.
	(vec_vgbbd): Likewise.
	(vec_vmrgew): Likewise.
	(vec_vmrgow): Likewise.
	(vec_vpopcnt): Likewise.
	(vec_vpopcntb): Likewise.
	(vec_vpopcntd): Likewise.
	(vec_vpopcnth): Likewise.
	(vec_vpopcntw): Likewise.

[gcc/testsuite]
2013-06-04  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Pat Haugen <pthaugen@us.ibm.com>
	    Peter Bergner <bergner@vnet.ibm.com>

	* gcc.target/powerpc/bool.c: New file, add eqv, nand, nor tests.

	* gcc.target/powerpc/crypto-builtin-1.c: Use effective target
	powerpc_p8vector_ok instead of powerpc_vsx_ok.

	* lib/target-supports.exp (check_p8vector_hw_available) Add power8
	support.
	(check_effective_target_powerpc_p8vector_ok): Likewise.
	(is-effective-target): Likewise.
	(check_vect_support_and_set_flags): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.official-04d --]
[-- Type: text/plain, Size: 74819 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 199650)
+++ gcc/doc/extend.texi	(working copy)
@@ -13991,6 +13991,38 @@ int vec_any_le (vector long long, vector
 int vec_any_lt (vector long long, vector long long);
 int vec_any_ne (vector long long, vector long long);
 
+vector long long vec_eqv (vector long long, vector long long);
+vector long long vec_eqv (vector bool long long, vector long long);
+vector long long vec_eqv (vector long long, vector bool long long);
+vector unsigned long long vec_eqv (vector unsigned long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_eqv (vector bool long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_eqv (vector unsigned long long,
+                                   vector bool long long);
+vector int vec_eqv (vector int, vector int);
+vector int vec_eqv (vector bool int, vector int);
+vector int vec_eqv (vector int, vector bool int);
+vector unsigned int vec_eqv (vector unsigned int, vector unsigned int);
+vector unsigned int vec_eqv (vector bool unsigned int,
+                             vector unsigned int);
+vector unsigned int vec_eqv (vector unsigned int,
+                             vector bool unsigned int);
+vector short vec_eqv (vector short, vector short);
+vector short vec_eqv (vector bool short, vector short);
+vector short vec_eqv (vector short, vector bool short);
+vector unsigned short vec_eqv (vector unsigned short, vector unsigned short);
+vector unsigned short vec_eqv (vector bool unsigned short,
+                               vector unsigned short);
+vector unsigned short vec_eqv (vector unsigned short,
+                               vector bool unsigned short);
+vector signed char vec_eqv (vector signed char, vector signed char);
+vector signed char vec_eqv (vector bool signed char, vector signed char);
+vector signed char vec_eqv (vector signed char, vector bool signed char);
+vector unsigned char vec_eqv (vector unsigned char, vector unsigned char);
+vector unsigned char vec_eqv (vector bool unsigned char, vector unsigned char);
+vector unsigned char vec_eqv (vector unsigned char, vector bool unsigned char);
+
 vector long long vec_max (vector long long, vector long long);
 vector unsigned long long vec_max (vector unsigned long long,
                                    vector unsigned long long);
@@ -13999,6 +14031,70 @@ vector long long vec_min (vector long lo
 vector unsigned long long vec_min (vector unsigned long long,
                                    vector unsigned long long);
 
+vector long long vec_nand (vector long long, vector long long);
+vector long long vec_nand (vector bool long long, vector long long);
+vector long long vec_nand (vector long long, vector bool long long);
+vector unsigned long long vec_nand (vector unsigned long long,
+                                    vector unsigned long long);
+vector unsigned long long vec_nand (vector bool long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_nand (vector unsigned long long,
+                                    vector bool long long);
+vector int vec_nand (vector int, vector int);
+vector int vec_nand (vector bool int, vector int);
+vector int vec_nand (vector int, vector bool int);
+vector unsigned int vec_nand (vector unsigned int, vector unsigned int);
+vector unsigned int vec_nand (vector bool unsigned int,
+                              vector unsigned int);
+vector unsigned int vec_nand (vector unsigned int,
+                              vector bool unsigned int);
+vector short vec_nand (vector short, vector short);
+vector short vec_nand (vector bool short, vector short);
+vector short vec_nand (vector short, vector bool short);
+vector unsigned short vec_nand (vector unsigned short, vector unsigned short);
+vector unsigned short vec_nand (vector bool unsigned short,
+                                vector unsigned short);
+vector unsigned short vec_nand (vector unsigned short,
+                                vector bool unsigned short);
+vector signed char vec_nand (vector signed char, vector signed char);
+vector signed char vec_nand (vector bool signed char, vector signed char);
+vector signed char vec_nand (vector signed char, vector bool signed char);
+vector unsigned char vec_nand (vector unsigned char, vector unsigned char);
+vector unsigned char vec_nand (vector bool unsigned char, vector unsigned char);
+vector unsigned char vec_nand (vector unsigned char, vector bool unsigned char);
+
+vector long long vec_orc (vector long long, vector long long);
+vector long long vec_orc (vector bool long long, vector long long);
+vector long long vec_orc (vector long long, vector bool long long);
+vector unsigned long long vec_orc (vector unsigned long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_orc (vector bool long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_orc (vector unsigned long long,
+                                   vector bool long long);
+vector int vec_orc (vector int, vector int);
+vector int vec_orc (vector bool int, vector int);
+vector int vec_orc (vector int, vector bool int);
+vector unsigned int vec_orc (vector unsigned int, vector unsigned int);
+vector unsigned int vec_orc (vector bool unsigned int,
+                             vector unsigned int);
+vector unsigned int vec_orc (vector unsigned int,
+                             vector bool unsigned int);
+vector short vec_orc (vector short, vector short);
+vector short vec_orc (vector bool short, vector short);
+vector short vec_orc (vector short, vector bool short);
+vector unsigned short vec_orc (vector unsigned short, vector unsigned short);
+vector unsigned short vec_orc (vector bool unsigned short,
+                               vector unsigned short);
+vector unsigned short vec_orc (vector unsigned short,
+                               vector bool unsigned short);
+vector signed char vec_orc (vector signed char, vector signed char);
+vector signed char vec_orc (vector bool signed char, vector signed char);
+vector signed char vec_orc (vector signed char, vector bool signed char);
+vector unsigned char vec_orc (vector unsigned char, vector unsigned char);
+vector unsigned char vec_orc (vector bool unsigned char, vector unsigned char);
+vector unsigned char vec_orc (vector unsigned char, vector bool unsigned char);
+
 vector int vec_pack (vector long long, vector long long);
 vector unsigned int vec_pack (vector unsigned long long,
                               vector unsigned long long);
@@ -14047,6 +14143,27 @@ vector unsigned long long vec_vaddudm (v
 vector unsigned long long vec_vaddudm (vector unsigned long long,
                                        vector bool unsigned long long);
 
+vector long long vec_vclz (vector long long);
+vector unsigned long long vec_vclz (vector unsigned long long);
+vector int vec_vclz (vector int);
+vector unsigned int vec_vclz (vector unsigned int);
+vector short vec_vclz (vector short);
+vector unsigned short vec_vclz (vector unsigned short);
+vector signed char vec_vclz (vector signed char);
+vector unsigned char vec_vclz (vector unsigned char);
+
+vector signed char vec_vclzb (vector signed char);
+vector unsigned char vec_vclzb (vector unsigned char);
+
+vector long long vec_vclzd (vector long long);
+vector unsigned long long vec_vclzd (vector unsigned long long);
+
+vector short vec_vclzh (vector short);
+vector unsigned short vec_vclzh (vector unsigned short);
+
+vector int vec_vclzw (vector int);
+vector unsigned int vec_vclzw (vector unsigned int);
+
 vector long long vec_vmaxsd (vector long long, vector long long);
 
 vector unsigned long long vec_vmaxud (vector unsigned long long,
@@ -14068,6 +14185,27 @@ vector unsigned int vec_vpkudum (vector 
                                  vector unsigned long long);
 vector bool int vec_vpkudum (vector bool long long, vector bool long long);
 
+vector long long vec_vpopcnt (vector long long);
+vector unsigned long long vec_vpopcnt (vector unsigned long long);
+vector int vec_vpopcnt (vector int);
+vector unsigned int vec_vpopcnt (vector unsigned int);
+vector short vec_vpopcnt (vector short);
+vector unsigned short vec_vpopcnt (vector unsigned short);
+vector signed char vec_vpopcnt (vector signed char);
+vector unsigned char vec_vpopcnt (vector unsigned char);
+
+vector signed char vec_vpopcntb (vector signed char);
+vector unsigned char vec_vpopcntb (vector unsigned char);
+
+vector long long vec_vpopcntd (vector long long);
+vector unsigned long long vec_vpopcntd (vector unsigned long long);
+
+vector short vec_vpopcnth (vector short);
+vector unsigned short vec_vpopcnth (vector unsigned short);
+
+vector int vec_vpopcntw (vector int);
+vector unsigned int vec_vpopcntw (vector unsigned int);
+
 vector long long vec_vrld (vector long long, vector unsigned long long);
 vector unsigned long long vec_vrld (vector unsigned long long,
                                     vector unsigned long long);
Index: gcc/testsuite/gcc.target/powerpc/crypto-builtin-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/crypto-builtin-1.c	(revision 199650)
+++ gcc/testsuite/gcc.target/powerpc/crypto-builtin-1.c	(working copy)
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mcpu=power8 -O2 -ftree-vectorize -fvect-cost-model -fno-unroll-loops -fno-unroll-all-loops" } */
 
 typedef vector unsigned long long	crypto_t;
Index: gcc/testsuite/gcc.target/powerpc/bool.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bool.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bool.c	(revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "eqv" } } */
+/* { dg-final { scan-assembler "nand" } } */
+/* { dg-final { scan-assembler "nor" } } */
+
+#ifndef TYPE
+#define TYPE unsigned long
+#endif
+
+TYPE op1 (TYPE a, TYPE b) { return ~(a ^ b); }	/* eqv */
+TYPE op2 (TYPE a, TYPE b) { return ~(a & b); }	/* nand */
+TYPE op3 (TYPE a, TYPE b) { return ~(a | b); }	/* nor */
+
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	(revision 199650)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -1311,6 +1311,32 @@ proc check_effective_target_avx_runtime 
     return 0
 }
 
+# Return 1 if the target supports executing power8 vector instructions, 0
+# otherwise.  Cache the result.
+
+proc check_p8vector_hw_available { } {
+    return [check_cached_effective_target p8vector_hw_available {
+	# Some simulators are known to not support VSX/power8 instructions.
+	# For now, disable on Darwin
+	if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || [istarget *-*-darwin*]} {
+	    expr 0
+	} else {
+	    set options "-mpower8-vector"
+	    check_runtime_nocache p8vector_hw_available {
+		int main()
+		{
+		#ifdef __MACH__
+		  asm volatile ("xxlorc vs0,vs0,vs0");
+		#else
+		  asm volatile ("xxlorc 0,0,0");
+	        #endif
+		  return 0;
+		}
+	    } $options
+	}
+    }]
+}
+
 # Return 1 if the target supports executing VSX instructions, 0
 # otherwise.  Cache the result.
 
@@ -2749,6 +2775,33 @@ proc check_effective_target_powerpc_alti
     }
 }
 
+# Return 1 if this is a PowerPC target supporting -mpower8-vector
+
+proc check_effective_target_powerpc_p8vector_ok { } {
+    if { ([istarget powerpc*-*-*]
+         && ![istarget powerpc-*-linux*paired*])
+	 || [istarget rs6000-*-*] } {
+	# AltiVec is not supported on AIX before 5.3.
+	if { [istarget powerpc*-*-aix4*]
+	     || [istarget powerpc*-*-aix5.1*] 
+	     || [istarget powerpc*-*-aix5.2*] } {
+	    return 0
+	}
+	return [check_no_compiler_messages powerpc_p8vector_ok object {
+	    int main (void) {
+#ifdef __MACH__
+		asm volatile ("xxlorc vs0,vs0,vs0");
+#else
+		asm volatile ("xxlorc 0,0,0");
+#endif
+		return 0;
+	    }
+	} "-mpower8-vector"]
+    } else {
+	return 0
+    }
+}
+
 # Return 1 if this is a PowerPC target supporting -mvsx
 
 proc check_effective_target_powerpc_vsx_ok { } {
@@ -4576,6 +4629,7 @@ proc is-effective-target { arg } {
 	switch $arg {
 	  "vmx_hw"         { set selected [check_vmx_hw_available] }
 	  "vsx_hw"         { set selected [check_vsx_hw_available] }
+	  "p8vector_hw"    { set selected [check_p8vector_hw_available] }
 	  "ppc_recip_hw"   { set selected [check_ppc_recip_hw_available] }
 	  "named_sections" { set selected [check_named_sections_available] }
 	  "gc_sections"    { set selected [check_gc_sections_available] }
@@ -4597,6 +4651,7 @@ proc is-effective-target-keyword { arg }
 	switch $arg {
 	  "vmx_hw"         { return 1 }
 	  "vsx_hw"         { return 1 }
+	  "p8vector_hw"    { return 1 }
 	  "ppc_recip_hw"   { return 1 }
 	  "named_sections" { return 1 }
 	  "gc_sections"    { return 1 }
@@ -5181,7 +5236,9 @@ proc check_vect_support_and_set_flags { 
         }
 
         lappend DEFAULT_VECTCFLAGS "-maltivec"
-        if [check_vsx_hw_available] {
+        if [check_p8vector_hw_available] {
+            lappend DEFAULT_VECTCFLAGS "-mpower8-vector" "-mno-allow-movmisalign"
+        } elseif [check_vsx_hw_available] {
             lappend DEFAULT_VECTCFLAGS "-mvsx" "-mno-allow-movmisalign"
         }
 
Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 199650)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -730,9 +730,10 @@ (define_expand "ior<mode>3"
   "")
 
 (define_expand "and<mode>3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (and:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
-		   (match_operand:VEC_L 2 "vlogical_operand" "")))]
+  [(parallel [(set (match_operand:VEC_L 0 "vlogical_operand" "")
+		   (and:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
+			      (match_operand:VEC_L 2 "vlogical_operand" "")))
+	      (clobber (match_scratch:CC 3 ""))])]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
    && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "")
@@ -746,8 +747,8 @@ (define_expand "one_cmpl<mode>2"
 
 (define_expand "nor<mode>3"
   [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (not:VEC_L (ior:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
-			      (match_operand:VEC_L 2 "vlogical_operand" ""))))]
+	(and:VEC_L (not:VEC_L (match_operand:VEC_L 1 "vlogical_operand" ""))
+		   (not:VEC_L (match_operand:VEC_L 2 "vlogical_operand" ""))))]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
    && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "")
@@ -760,6 +761,47 @@ (define_expand "andc<mode>3"
    && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "")
 
+;; Power8 vector logical instructions.
+(define_expand "eqv<mode>3"
+  [(set (match_operand:VEC_L 0 "register_operand" "")
+	(not:VEC_L
+	 (xor:VEC_L (match_operand:VEC_L 1 "register_operand" "")
+		    (match_operand:VEC_L 2 "register_operand" ""))))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)")
+
+;; Rewrite nand into canonical form
+(define_expand "nand<mode>3"
+  [(set (match_operand:VEC_L 0 "register_operand" "")
+	(ior:VEC_L
+	 (not:VEC_L (match_operand:VEC_L 1 "register_operand" ""))
+	 (not:VEC_L (match_operand:VEC_L 2 "register_operand" ""))))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)")
+
+;; The canonical form is to have the negated element first, so we need to
+;; reverse the arguments.
+(define_expand "orc<mode>3"
+  [(set (match_operand:VEC_L 0 "register_operand" "")
+	(ior:VEC_L
+	 (not:VEC_L (match_operand:VEC_L 1 "register_operand" ""))
+	 (match_operand:VEC_L 2 "register_operand" "")))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)")
+
+;; Vector count leading zeros
+(define_expand "clz<mode>2"
+  [(set (match_operand:VEC_I 0 "register_operand" "")
+	(clz:VEC_I (match_operand:VEC_I 1 "register_operand" "")))]
+  "TARGET_P8_VECTOR")
+
+;; Vector population count
+(define_expand "popcount<mode>2"
+  [(set (match_operand:VEC_I 0 "register_operand" "")
+        (popcount:VEC_I (match_operand:VEC_I 1 "register_operand" "")))]
+  "TARGET_P8_VECTOR")
+
+\f
 ;; Same size conversions
 (define_expand "float<VEC_int><mode>2"
   [(set (match_operand:VEC_F 0 "vfloat_operand" "")
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 199650)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -207,7 +207,7 @@ (define_predicate "int_reg_operand"
   if (!REG_P (op))
     return 0;
 
-  if (REGNO (op) >= ARG_POINTER_REGNUM && !CA_REGNO_P (REGNO (op)))
+  if (REGNO (op) >= FIRST_PSEUDO_REGISTER)
     return 1;
 
   return INT_REGNO_P (REGNO (op));
Index: gcc/config/rs6000/rs6000-builtin.def
===================================================================
--- gcc/config/rs6000/rs6000-builtin.def	(revision 199650)
+++ gcc/config/rs6000/rs6000-builtin.def	(working copy)
@@ -1234,10 +1234,24 @@ BU_VSX_OVERLOAD_2 (XXSPLTW,  "xxspltw")
 BU_VSX_OVERLOAD_X (LD,	     "ld")
 BU_VSX_OVERLOAD_X (ST,	     "st")
 \f
+/* 1 argument instruction added in ISA 2.07 that is classified as a VSX
+   instruction.  */
+BU_P8V_VSX_1 (XSCVSPDPN,      "xscvspdpn",	CONST,	vsx_xscvspdpn)
+BU_P8V_VSX_1 (XSCVDPSPN,      "xscvdpspn",	CONST,	vsx_xscvdpspn)
+
 /* 1 argument altivec instructions added in ISA 2.07.  */
 BU_P8V_AV_1 (ABS_V2DI,	      "abs_v2di",	CONST,	absv2di2)
 BU_P8V_AV_1 (VUPKHSW,	      "vupkhsw",	CONST,	altivec_vupkhsw)
 BU_P8V_AV_1 (VUPKLSW,	      "vupklsw",	CONST,	altivec_vupklsw)
+BU_P8V_AV_1 (VCLZB,	      "vclzb",		CONST,  clzv16qi2)
+BU_P8V_AV_1 (VCLZH,	      "vclzh",		CONST,  clzv8hi2)
+BU_P8V_AV_1 (VCLZW,	      "vclzw",		CONST,  clzv4si2)
+BU_P8V_AV_1 (VCLZD,	      "vclzd",		CONST,  clzv2di2)
+BU_P8V_AV_1 (VPOPCNTB,	      "vpopcntb",	CONST,  popcountv16qi2)
+BU_P8V_AV_1 (VPOPCNTH,	      "vpopcnth",	CONST,  popcountv8hi2)
+BU_P8V_AV_1 (VPOPCNTW,	      "vpopcntw",	CONST,  popcountv4si2)
+BU_P8V_AV_1 (VPOPCNTD,	      "vpopcntd",	CONST,  popcountv2di2)
+BU_P8V_AV_1 (VGBBD,	      "vgbbd",		CONST,  p8v_vgbbd)
 
 /* 2 argument altivec instructions added in ISA 2.07.  */
 BU_P8V_AV_2 (VADDUDM,		"vaddudm",	CONST,	addv2di3)
@@ -1245,6 +1259,8 @@ BU_P8V_AV_2 (VMINSD,		"vminsd",	CONST,	s
 BU_P8V_AV_2 (VMAXSD,		"vmaxsd",	CONST,	smaxv2di3)
 BU_P8V_AV_2 (VMINUD,		"vminud",	CONST,	uminv2di3)
 BU_P8V_AV_2 (VMAXUD,		"vmaxud",	CONST,	umaxv2di3)
+BU_P8V_AV_2 (VMRGEW,		"vmrgew",	CONST,	p8_vmrgew)
+BU_P8V_AV_2 (VMRGOW,		"vmrgow",	CONST,	p8_vmrgow)
 BU_P8V_AV_2 (VPKUDUM,		"vpkudum",	CONST,	altivec_vpkudum)
 BU_P8V_AV_2 (VPKSDSS,		"vpksdss",	CONST,	altivec_vpksdss)
 BU_P8V_AV_2 (VPKUDUS,		"vpkudus",	CONST,	altivec_vpkudus)
@@ -1255,6 +1271,29 @@ BU_P8V_AV_2 (VSRD,		"vsrd",		CONST,	vlsh
 BU_P8V_AV_2 (VSRAD,		"vsrad",	CONST,	vashrv2di3)
 BU_P8V_AV_2 (VSUBUDM,		"vsubudm",	CONST,	subv2di3)
 
+/* 2 argument VSX instructions added in ISA 2.07.  For the logical
+   instructions, we define a builtin for each vector type.  */
+BU_P8V_AV_2 (EQV_V16QI,		"eqv_v16qi",	CONST,	eqvv16qi3)
+BU_P8V_AV_2 (EQV_V8HI,		"eqv_v8hi",	CONST,	eqvv8hi3)
+BU_P8V_AV_2 (EQV_V4SI,		"eqv_v4si",	CONST,	eqvv4si3)
+BU_P8V_AV_2 (EQV_V2DI,		"eqv_v2di",	CONST,	eqvv2di3)
+BU_P8V_AV_2 (EQV_V4SF,		"eqv_v4sf",	CONST,	eqvv4sf3)
+BU_P8V_AV_2 (EQV_V2DF,		"eqv_v2df",	CONST,	eqvv2df3)
+
+BU_P8V_AV_2 (NAND_V16QI,	"nand_v16qi",	CONST,	nandv16qi3)
+BU_P8V_AV_2 (NAND_V8HI,		"nand_v8hi",	CONST,	nandv8hi3)
+BU_P8V_AV_2 (NAND_V4SI,		"nand_v4si",	CONST,	nandv4si3)
+BU_P8V_AV_2 (NAND_V2DI,		"nand_v2di",	CONST,	nandv2di3)
+BU_P8V_AV_2 (NAND_V4SF,		"nand_v4sf",	CONST,	nandv4sf3)
+BU_P8V_AV_2 (NAND_V2DF,		"nand_v2df",	CONST,	nandv2df3)
+
+BU_P8V_AV_2 (ORC_V16QI,		"orc_v16qi",	CONST,	orcv16qi3)
+BU_P8V_AV_2 (ORC_V8HI,		"orc_v8hi",	CONST,	orcv8hi3)
+BU_P8V_AV_2 (ORC_V4SI,		"orc_v4si",	CONST,	orcv4si3)
+BU_P8V_AV_2 (ORC_V2DI,		"orc_v2di",	CONST,	orcv2di3)
+BU_P8V_AV_2 (ORC_V4SF,		"orc_v4sf",	CONST,	orcv4sf3)
+BU_P8V_AV_2 (ORC_V2DF,		"orc_v2df",	CONST,	orcv2df3)
+
 /* Vector comparison instructions added in ISA 2.07.  */
 BU_P8V_AV_2 (VCMPEQUD,		"vcmpequd",	CONST,	vector_eqv2di)
 BU_P8V_AV_2 (VCMPGTSD,		"vcmpgtsd",	CONST,	vector_gtv2di)
@@ -1268,13 +1307,29 @@ BU_P8V_AV_P (VCMPGTUD_P,	"vcmpgtud_p",	C
 /* ISA 2.07 vector overloaded 1 argument functions.  */
 BU_P8V_OVERLOAD_1 (VUPKHSW,	"vupkhsw")
 BU_P8V_OVERLOAD_1 (VUPKLSW,	"vupklsw")
+BU_P8V_OVERLOAD_1 (VCLZ,	"vclz")
+BU_P8V_OVERLOAD_1 (VCLZB,	"vclzb")
+BU_P8V_OVERLOAD_1 (VCLZH,	"vclzh")
+BU_P8V_OVERLOAD_1 (VCLZW,	"vclzw")
+BU_P8V_OVERLOAD_1 (VCLZD,	"vclzd")
+BU_P8V_OVERLOAD_1 (VPOPCNT,	"vpopcnt")
+BU_P8V_OVERLOAD_1 (VPOPCNTB,	"vpopcntb")
+BU_P8V_OVERLOAD_1 (VPOPCNTH,	"vpopcnth")
+BU_P8V_OVERLOAD_1 (VPOPCNTW,	"vpopcntw")
+BU_P8V_OVERLOAD_1 (VPOPCNTD,	"vpopcntd")
+BU_P8V_OVERLOAD_1 (VGBBD,	"vgbbd")
 
 /* ISA 2.07 vector overloaded 2 argument functions.  */
+BU_P8V_OVERLOAD_2 (EQV,		"eqv")
+BU_P8V_OVERLOAD_2 (NAND,	"nand")
+BU_P8V_OVERLOAD_2 (ORC,		"orc")
 BU_P8V_OVERLOAD_2 (VADDUDM,	"vaddudm")
 BU_P8V_OVERLOAD_2 (VMAXSD,	"vmaxsd")
 BU_P8V_OVERLOAD_2 (VMAXUD,	"vmaxud")
 BU_P8V_OVERLOAD_2 (VMINSD,	"vminsd")
 BU_P8V_OVERLOAD_2 (VMINUD,	"vminud")
+BU_P8V_OVERLOAD_2 (VMRGEW,	"vmrgew")
+BU_P8V_OVERLOAD_2 (VMRGOW,	"vmrgow")
 BU_P8V_OVERLOAD_2 (VPKSDSS,	"vpksdss")
 BU_P8V_OVERLOAD_2 (VPKSDUS,	"vpksdus")
 BU_P8V_OVERLOAD_2 (VPKUDUM,	"vpkudum")
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 199650)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -3515,6 +3515,404 @@ const struct altivec_builtin_types altiv
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, VSX_BUILTIN_XVCMPGEDP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DF, RS6000_BTI_V2DF },
 
+  /* Power8 vector overloaded functions.  */
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_bool_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_bool_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
+
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_bool_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_bool_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
+
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_bool_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_bool_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
+
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZB,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZH,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZB, P8V_BUILTIN_VCLZB,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZB, P8V_BUILTIN_VCLZB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZH, P8V_BUILTIN_VCLZH,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZH, P8V_BUILTIN_VCLZH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZW, P8V_BUILTIN_VCLZW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZW, P8V_BUILTIN_VCLZW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZD, P8V_BUILTIN_VCLZD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZD, P8V_BUILTIN_VCLZD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VMINSD, P8V_BUILTIN_VMINSD,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMINSD, P8V_BUILTIN_VMINSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMINSD, P8V_BUILTIN_VMINSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VMAXSD, P8V_BUILTIN_VMAXSD,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMAXSD, P8V_BUILTIN_VMAXSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMAXSD, P8V_BUILTIN_VMAXSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VMINUD, P8V_BUILTIN_VMINUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMINUD, P8V_BUILTIN_VMINUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMINUD, P8V_BUILTIN_VMINUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VMAXUD, P8V_BUILTIN_VMAXUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMAXUD, P8V_BUILTIN_VMAXUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMAXUD, P8V_BUILTIN_VMAXUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+
+  { P8V_BUILTIN_VEC_VMRGOW, P8V_BUILTIN_VMRGOW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_VMRGOW, P8V_BUILTIN_VMRGOW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTB,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTH,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNTB, P8V_BUILTIN_VPOPCNTB,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNTB, P8V_BUILTIN_VPOPCNTB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNTH, P8V_BUILTIN_VPOPCNTH,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNTH, P8V_BUILTIN_VPOPCNTH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNTW, P8V_BUILTIN_VPOPCNTW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNTW, P8V_BUILTIN_VPOPCNTW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNTD, P8V_BUILTIN_VPOPCNTD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNTD, P8V_BUILTIN_VPOPCNTD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPKUDUM, P8V_BUILTIN_VPKUDUM,
+    RS6000_BTI_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VPKUDUM, P8V_BUILTIN_VPKUDUM,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VPKUDUM, P8V_BUILTIN_VPKUDUM,
+    RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VPKSDSS, P8V_BUILTIN_VPKSDSS,
+    RS6000_BTI_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VPKUDUS, P8V_BUILTIN_VPKUDUS,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VPKSDUS, P8V_BUILTIN_VPKSDUS,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VRLD, P8V_BUILTIN_VRLD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VRLD, P8V_BUILTIN_VRLD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VSLD, P8V_BUILTIN_VSLD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSLD, P8V_BUILTIN_VSLD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VSRD, P8V_BUILTIN_VSRD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSRD, P8V_BUILTIN_VSRD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VSRAD, P8V_BUILTIN_VSRAD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSRAD, P8V_BUILTIN_VSRAD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VUPKHSW, P8V_BUILTIN_VUPKHSW,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VUPKHSW, P8V_BUILTIN_VUPKHSW,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VUPKLSW, P8V_BUILTIN_VUPKLSW,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VUPKLSW, P8V_BUILTIN_VUPKLSW,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
+    RS6000_BTI_V16QI, 0, 0, 0 },
+  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
+    RS6000_BTI_unsigned_V16QI, 0, 0, 0 },
+
   /* Crypto builtins.  */
   { CRYPTO_BUILTIN_VPERMXOR, CRYPTO_BUILTIN_VPERMXOR_V16QI,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 199650)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -2859,6 +2859,16 @@ rs6000_option_override_internal (bool gl
 	}
     }
 
+  /* Quad memory only works in 64-bit mode; if the user used -mcpu=power8 -m32,
+     silently turn off quad memory mode.  */
+  if (TARGET_QUAD_MEMORY && !TARGET_POWERPC64)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_QUAD_MEMORY) != 0)
+	warning (0, N_("-mquad-memory requires 64-bit mode"));
+
+      rs6000_isa_flags &= ~OPTION_MASK_QUAD_MEMORY;
+    }
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "before defaults", rs6000_isa_flags);
 
@@ -4082,6 +4092,22 @@ rs6000_builtin_vectorized_function (tree
       enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
       switch (fn)
 	{
+	case BUILT_IN_CLZIMAX:
+	case BUILT_IN_CLZLL:
+	case BUILT_IN_CLZL:
+	case BUILT_IN_CLZ:
+	  if (TARGET_P8_VECTOR && in_mode == out_mode && out_n == in_n)
+	    {
+	      if (out_mode == QImode && out_n == 16)
+		return rs6000_builtin_decls[P8V_BUILTIN_VCLZB];
+	      else if (out_mode == HImode && out_n == 8)
+		return rs6000_builtin_decls[P8V_BUILTIN_VCLZH];
+	      else if (out_mode == SImode && out_n == 4)
+		return rs6000_builtin_decls[P8V_BUILTIN_VCLZW];
+	      else if (out_mode == DImode && out_n == 2)
+		return rs6000_builtin_decls[P8V_BUILTIN_VCLZD];
+	    }
+	  break;
 	case BUILT_IN_COPYSIGN:
 	  if (VECTOR_UNIT_VSX_P (V2DFmode)
 	      && out_mode == DFmode && out_n == 2
@@ -4097,6 +4123,22 @@ rs6000_builtin_vectorized_function (tree
 	  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode))
 	    return rs6000_builtin_decls[ALTIVEC_BUILTIN_COPYSIGN_V4SF];
 	  break;
+	case BUILT_IN_POPCOUNTIMAX:
+	case BUILT_IN_POPCOUNTLL:
+	case BUILT_IN_POPCOUNTL:
+	case BUILT_IN_POPCOUNT:
+	  if (TARGET_P8_VECTOR && in_mode == out_mode && out_n == in_n)
+	    {
+	      if (out_mode == QImode && out_n == 16)
+		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTB];
+	      else if (out_mode == HImode && out_n == 8)
+		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTH];
+	      else if (out_mode == SImode && out_n == 4)
+		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTW];
+	      else if (out_mode == DImode && out_n == 2)
+		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTD];
+	    }
+	  break;
 	case BUILT_IN_SQRT:
 	  if (VECTOR_UNIT_VSX_P (V2DFmode)
 	      && out_mode == DFmode && out_n == 2
@@ -4955,8 +4997,11 @@ rs6000_expand_vector_init (rtx target, r
 	{
 	  rtx freg = gen_reg_rtx (V4SFmode);
 	  rtx sreg = force_reg (SFmode, XVECEXP (vals, 0, 0));
+	  rtx cvt  = ((TARGET_XSCVDPSPN)
+		      ? gen_vsx_xscvdpspn_scalar (freg, sreg)
+		      : gen_vsx_xscvdpsp_scalar (freg, sreg));
 
-	  emit_insn (gen_vsx_xscvdpsp_scalar (freg, sreg));
+	  emit_insn (cvt);
 	  emit_insn (gen_vsx_xxspltw_v4sf (target, freg, const0_rtx));
 	}
       else
@@ -12857,6 +12902,7 @@ builtin_function_type (enum machine_mode
     {
       /* unsigned 1 argument functions.  */
     case CRYPTO_BUILTIN_VSBOX:
+    case P8V_BUILTIN_VGBBD:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       break;
@@ -27214,26 +27260,31 @@ bool
 altivec_expand_vec_perm_const (rtx operands[4])
 {
   struct altivec_perm_insn {
+    HOST_WIDE_INT mask;
     enum insn_code impl;
     unsigned char perm[16];
   };
   static const struct altivec_perm_insn patterns[] = {
-    { CODE_FOR_altivec_vpkuhum,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuhum,
       {  1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } },
-    { CODE_FOR_altivec_vpkuwum,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum,
       {  2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } },
-    { CODE_FOR_altivec_vmrghb,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb,
       {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },
-    { CODE_FOR_altivec_vmrghh,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh,
       {  0,  1, 16, 17,  2,  3, 18, 19,  4,  5, 20, 21,  6,  7, 22, 23 } },
-    { CODE_FOR_altivec_vmrghw,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw,
       {  0,  1,  2,  3, 16, 17, 18, 19,  4,  5,  6,  7, 20, 21, 22, 23 } },
-    { CODE_FOR_altivec_vmrglb,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb,
       {  8, 24,  9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } },
-    { CODE_FOR_altivec_vmrglh,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh,
       {  8,  9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } },
-    { CODE_FOR_altivec_vmrglw,
-      {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } }
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw,
+      {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
+    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew,
+      {  0,  1,  2,  3, 16, 17, 18, 19,  8,  9, 10, 11, 24, 25, 26, 27 } },
+    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgow,
+      {  4,  5,  6,  7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31 } }
   };
 
   unsigned int i, j, elt, which;
@@ -27333,6 +27384,9 @@ altivec_expand_vec_perm_const (rtx opera
     {
       bool swapped;
 
+      if ((patterns[j].mask & rs6000_isa_flags) == 0)
+	continue;
+
       elt = patterns[j].perm[0];
       if (perm[0] == elt)
 	swapped = false;
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 199650)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -36,10 +36,22 @@ (define_mode_iterator VSX_F [V4SF V2DF])
 ;; Iterator for logical types supported by VSX
 (define_mode_iterator VSX_L [V16QI V8HI V4SI V2DI V4SF V2DF TI])
 
+;; Like VSX_L, but don't support TImode for doing logical instructions in
+;; 32-bit
+(define_mode_iterator VSX_L2 [V16QI V8HI V4SI V2DI V4SF V2DF])
+
 ;; Iterator for memory move.  Handle TImode specially to allow
 ;; it to use gprs as well as vsx registers.
 (define_mode_iterator VSX_M [V16QI V8HI V4SI V2DI V4SF V2DF])
 
+(define_mode_iterator VSX_M2 [V16QI
+			      V8HI
+			      V4SI
+			      V2DI
+			      V4SF
+			      V2DF
+			      (TI	"TARGET_VSX_TIMODE")])
+
 ;; Map into the appropriate load/store name based on the type
 (define_mode_attr VSm  [(V16QI "vw4")
 			(V8HI  "vw4")
@@ -191,6 +203,8 @@ (define_c_enum "unspec"
    UNSPEC_VSX_CVDPSXWS
    UNSPEC_VSX_CVDPUXWS
    UNSPEC_VSX_CVSPDP
+   UNSPEC_VSX_CVSPDPN
+   UNSPEC_VSX_CVDPSPN
    UNSPEC_VSX_CVSXWDP
    UNSPEC_VSX_CVUXWDP
    UNSPEC_VSX_CVSXDSP
@@ -1003,6 +1017,40 @@ (define_insn "vsx_xscvspdp_scalar2"
   "xscvspdp %x0,%x1"
   [(set_attr "type" "fp")])
 
+;; Power8 versions using xscvdpspn/xscvspdpn
+(define_insn "vsx_xscvdpspn"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=ws,?wa")
+	(unspec:V4SF [(match_operand:DF 1 "vsx_register_operand" "wd,wa")]
+		     UNSPEC_VSX_CVDPSPN))]
+  "TARGET_XSCVDPSPN"
+  "xscvdpspn %x0,%x1"
+  [(set_attr "type" "fp")])
+
+(define_insn "vsx_xscvspdpn"
+  [(set (match_operand:DF 0 "vsx_register_operand" "=ws,?wa")
+	(unspec:DF [(match_operand:V4SF 1 "vsx_register_operand" "wa,wa")]
+		   UNSPEC_VSX_CVSPDPN))]
+  "TARGET_XSCVSPDPN"
+  "xscvspdpn %x0,%x1"
+  [(set_attr "type" "fp")])
+
+(define_insn "vsx_xscvdpspn_scalar"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
+	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "f")]
+		     UNSPEC_VSX_CVDPSPN))]
+  "TARGET_XSCVDPSPN"
+  "xscvdpspn %x0,%x1"
+  [(set_attr "type" "fp")])
+
+;; Used by direct move of SFmode from gpr to VSX register
+(define_insn "vsx_xscvspdpn_directmove"
+  [(set (match_operand:SF 0 "vsx_register_operand" "=wa")
+	(unspec:SF [(match_operand:SF 1 "vsx_register_operand" "wa")]
+		   UNSPEC_VSX_CVSPDPN))]
+  "TARGET_XSCVSPDPN"
+  "xscvspdpn %x0,%x1"
+  [(set_attr "type" "fp")])
+
 ;; Convert from 64-bit to 32-bit types
 ;; Note, favor the Altivec registers since the usual use of these instructions
 ;; is in vector converts and we need to use the Altivec vperm instruction.
@@ -1088,70 +1136,370 @@ (define_insn "*vsx_float_fix_<mode>2"
    (set_attr "fp_type" "<VSfptype_simple>")])
 
 \f
-;; Logical operations
-;; Do not support TImode logical instructions on 32-bit at present, because the
-;; compiler will see that we have a TImode and when it wanted DImode, and
-;; convert the DImode to TImode, store it on the stack, and load it in a VSX
-;; register.
-(define_insn "*vsx_and<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (and:VSX_L
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")
-	 (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+;; Logical operations.  Do not support TImode logical instructions on 32-bit at
+;; present, because the compiler will see that we have a TImode when it wanted
+;; DImode, convert the DImode to TImode, store it on the stack, and load it
+;; back into a VSX register, or generate extra logical instructions in GPR
+;; registers.
+
+;; When we are splitting the operations to GPRs, we use three alternatives, two
+;; where the first/second inputs and output are in the same register, and the
+;; third where the output specifies an early clobber so that we don't have to
+;; worry about overlapping registers.
+
+(define_insn "*vsx_and<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+        (and:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa")
+		    (match_operand:VSX_L2 2 "vlogical_operand" "wa")))
+   (clobber (match_scratch:CC 3 "X"))]
+  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
   "xxland %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
 
-(define_insn "*vsx_ior<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (ior:VSX_L (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")
-		   (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+(define_insn_and_split "*vsx_and<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r")
+        (and:VSX_L
+	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r")
+	 (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r")))
+   (clobber (match_scratch:CC 3 "X,X,X,X"))]
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxland %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(parallel [(set (match_dup 4) (and:DI (match_dup 5) (match_dup 6)))
+	      (clobber (match_dup 3))])
+   (parallel [(set (match_dup 7) (and:DI (match_dup 8) (match_dup 9)))
+	      (clobber (match_dup 3))])]
+{
+  operands[4] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[7] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[9] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
+
+(define_insn "*vsx_ior<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+        (ior:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa")
+		    (match_operand:VSX_L2 2 "vlogical_operand" "wa")))]
+  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
   "xxlor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
 
-(define_insn "*vsx_xor<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (xor:VSX_L
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")
-	 (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+(define_insn_and_split "*vsx_ior<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r,?r,&?r")
+        (ior:VSX_L
+	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r,0,r")
+	 (match_operand:VSX_L 2 "vsx_reg_or_cint_operand" "wa,r,0,r,n,n")))]
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxlor %x0,%x1,%x2
+   #
+   #
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+
+  if (operands[5] == constm1_rtx)
+    emit_move_insn (operands[3], constm1_rtx);
+
+  else if (operands[5] == const0_rtx)
+    {
+      if (!rtx_equal_p (operands[3], operands[4]))
+	emit_move_insn (operands[3], operands[4]);
+    }
+  else
+    emit_insn (gen_iordi3 (operands[3], operands[4], operands[5]));
+
+  if (operands[8] == constm1_rtx)
+    emit_move_insn (operands[6], constm1_rtx);
+
+  else if (operands[8] == const0_rtx)
+    {
+      if (!rtx_equal_p (operands[6], operands[7]))
+	emit_move_insn (operands[6], operands[7]);
+    }
+  else
+    emit_insn (gen_iordi3 (operands[6], operands[7], operands[8]));
+  DONE;
+}
+  [(set_attr "type" "vecsimple,two,two,two,three,three")
+   (set_attr "length" "4,8,8,8,16,16")])
+
+(define_insn "*vsx_xor<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+        (xor:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa")
+		    (match_operand:VSX_L2 2 "vlogical_operand" "wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && !TARGET_POWERPC64"
   "xxlxor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
 
-(define_insn "*vsx_one_cmpl<mode>2"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (not:VSX_L
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+(define_insn_and_split "*vsx_xor<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r,?r,&?r")
+        (xor:VSX_L
+	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r,0,r")
+	 (match_operand:VSX_L 2 "vsx_reg_or_cint_operand" "wa,r,0,r,n,n")))]
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxlxor %x0,%x1,%x2
+   #
+   #
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (xor:DI (match_dup 4) (match_dup 5)))
+   (set (match_dup 6) (xor:DI (match_dup 7) (match_dup 8)))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two,three,three")
+   (set_attr "length" "4,8,8,8,16,16")])
+
+(define_insn "*vsx_one_cmpl<mode>2_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+        (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa")))]
+  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
   "xxlnor %x0,%x1,%x1"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_one_cmpl<mode>2_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,&?r")
+        (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r")))]
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxlnor %x0,%x1,%x1
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 2) (not:DI (match_dup 3)))
+   (set (match_dup 4) (not:DI (match_dup 5)))]
+{
+  operands[2] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[3] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[5] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two")
+   (set_attr "length" "4,8,8")])
   
-(define_insn "*vsx_nor<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (not:VSX_L
-	 (ior:VSX_L
-	  (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,?wa")
-	  (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,?wa"))))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+(define_insn "*vsx_nor<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+	(and:VSX_L2
+	 (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa"))
+	 (not:VSX_L2 (match_operand:VSX_L2 2 "vlogical_operand" "wa"))))]
+  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
   "xxlnor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
 
-(define_insn "*vsx_andc<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
+(define_insn_and_split "*vsx_nor<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r")
+	(and:VSX_L
+	 (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r"))
+	 (not:VSX_L (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r"))))]
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxlnor %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (and:DI (not:DI (match_dup 4)) (not:DI (match_dup 5))))
+   (set (match_dup 6) (and:DI (not:DI (match_dup 7)) (not:DI (match_dup 8))))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
+
+(define_insn "*vsx_andc<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+        (and:VSX_L2
+	 (not:VSX_L2
+	  (match_operand:VSX_L2 2 "vlogical_operand" "wa"))
+	 (match_operand:VSX_L2 1 "vlogical_operand" "wa")))]
+  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "xxlandc %x0,%x1,%x2"
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_andc<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r")
         (and:VSX_L
 	 (not:VSX_L
-	  (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,?wa"))
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,?wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
-  "xxlandc %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+	  (match_operand:VSX_L 2 "vlogical_operand" "wa,0,r,r"))
+	 (match_operand:VSX_L 1 "vlogical_operand" "wa,r,0,r")))]
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxlandc %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (and:DI (not:DI (match_dup 4)) (match_dup 5)))
+   (set (match_dup 6) (and:DI (not:DI (match_dup 7)) (match_dup 8)))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
+
+;; Power8 vector logical instructions.  We only generate the VSX form of the
+;; instruction (xxl<xxx> vs. v<xxx>).
+(define_insn "*vsx_eqv<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+	(not:VSX_L2
+	 (xor:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa")
+		     (match_operand:VSX_L2 2 "vlogical_operand" "wa"))))]
+  "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "xxleqv %x0,%x1,%x2"
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_eqv<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r")
+	(not:VSX_L
+	 (xor:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r,r")
+		    (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r"))))]
+  "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxleqv %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR
+   && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (not:DI (xor:DI (match_dup 4) (match_dup 5))))
+   (set (match_dup 6) (not:DI (xor:DI (match_dup 7) (match_dup 8))))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
+
+;; Rewrite nand into canonical form
+(define_insn "*vsx_nand<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+	(ior:VSX_L2
+	 (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa"))
+	 (not:VSX_L2 (match_operand:VSX_L2 2 "vlogical_operand" "wa"))))]
+  "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "xxlnand %x0,%x1,%x2"
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_nand<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "register_operand" "=wa,?r,?r,?r")
+	(ior:VSX_L
+	 (not:VSX_L (match_operand:VSX_L 1 "register_operand" "wa,0,r,r"))
+	 (not:VSX_L (match_operand:VSX_L 2 "register_operand" "wa,r,0,r"))))]
+  "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxlnand %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR
+   && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (ior:DI (not:DI (match_dup 4)) (not:DI (match_dup 5))))
+   (set (match_dup 6) (ior:DI (not:DI (match_dup 7)) (not:DI (match_dup 8))))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
+
+;; The canonical form is to have the negated elment first, so we need to
+;; reverse arguments.
+(define_insn "*vsx_orc<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+	(ior:VSX_L2
+	 (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa"))
+	 (match_operand:VSX_L2 2 "vlogical_operand" "wa")))]
+  "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "xxlorc %x0,%x2,%x1"
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_orc<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r")
+	(ior:VSX_L
+	 (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r,r"))
+	 (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r")))]
+  "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxlorc %x0,%x2,%x1
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR
+   && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (ior:DI (not:DI (match_dup 4)) (match_dup 5)))
+   (set (match_dup 6) (ior:DI (not:DI (match_dup 7)) (match_dup 8)))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
 
 \f
 ;; Permute operations
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 199650)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -1114,10 +1114,10 @@ extern unsigned rs6000_pointer_size;
 #define VINT_REGNO_P(N) ALTIVEC_REGNO_P (N)
 
 /* Alternate name for any vector register supporting logical operations, no
-   matter which instruction set(s) are available.  Under VSX, we allow GPRs as
-   well as vector registers on 64-bit systems.  We don't allow 32-bit systems,
-   due to the number of registers involved, and the number of instructions to
-   load/store the values..  */
+   matter which instruction set(s) are available.  If we are in 64-bit mode,
+   we also allow logical operations in the GPRs.  This is to allow the atomic
+   quad word builtins not to need the VSX registers for lqarx/stqcx.  It also
+   helps with __int128_t arguments that are passed in GPRs.  */
 #define VLOGICAL_REGNO_P(N)						\
   (ALTIVEC_REGNO_P (N)							\
    || (TARGET_VSX && FP_REGNO_P (N))					\
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 199650)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -128,6 +128,7 @@ (define_c_enum "unspec"
    UNSPEC_VUPKLS_V4SF
    UNSPEC_VUPKHU_V4SF
    UNSPEC_VUPKLU_V4SF
+   UNSPEC_VGBBD
 ])
 
 (define_c_enum "unspecv"
@@ -941,6 +942,31 @@ (define_insn "*altivec_vmrglsf"
   "vmrglw %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
+;; Power8 vector merge even/odd
+(define_insn "p8_vmrgew"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+	(vec_select:V4SI
+	  (vec_concat:V8SI
+	    (match_operand:V4SI 1 "register_operand" "v")
+	    (match_operand:V4SI 2 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 4)
+		     (const_int 2) (const_int 6)])))]
+  "TARGET_P8_VECTOR"
+  "vmrgew %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "p8_vmrgow"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+	(vec_select:V4SI
+	  (vec_concat:V8SI
+	    (match_operand:V4SI 1 "register_operand" "v")
+	    (match_operand:V4SI 2 "register_operand" "v"))
+	  (parallel [(const_int 1) (const_int 5)
+		     (const_int 3) (const_int 7)])))]
+  "TARGET_P8_VECTOR"
+  "vmrgow %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "vec_widen_umult_even_v16qi"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
@@ -1017,10 +1043,13 @@ (define_insn "vec_widen_smult_odd_v8hi"
 ;; logical ops.  Have the logical ops follow the memory ops in
 ;; terms of whether to prefer VSX or Altivec
 
+;; For and, add the clobber to be consistent with VSX, which adds splitters for
+;; using the GPR registers.
 (define_insn "*altivec_and<mode>3"
   [(set (match_operand:VM 0 "register_operand" "=v")
         (and:VM (match_operand:VM 1 "register_operand" "v")
-		(match_operand:VM 2 "register_operand" "v")))]
+		(match_operand:VM 2 "register_operand" "v")))
+   (clobber (match_scratch:CC 3 "=X"))]
   "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
   "vand %0,%1,%2"
   [(set_attr "type" "vecsimple")])
@@ -1050,8 +1079,8 @@ (define_insn "*altivec_one_cmpl<mode>2"
   
 (define_insn "*altivec_nor<mode>3"
   [(set (match_operand:VM 0 "register_operand" "=v")
-        (not:VM (ior:VM (match_operand:VM 1 "register_operand" "v")
-			(match_operand:VM 2 "register_operand" "v"))))]
+	(and:VM (not:VM (match_operand:VM 1 "register_operand" "v"))
+		(not:VM (match_operand:VM 2 "register_operand" "v"))))]
   "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
   "vnor %0,%1,%2"
   [(set_attr "type" "vecsimple")])
@@ -2370,3 +2399,34 @@ (define_expand "vec_unpacku_float_lo_v8h
   emit_insn (gen_altivec_vcfux (operands[0], tmp, const0_rtx));
   DONE;
 }")
+
+\f
+;; Power8 vector instructions encoded as Altivec instructions
+
+;; Vector count leading zeros
+(define_insn "*p8v_clz<mode>2"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+	(clz:VI2 (match_operand:VI2 1 "register_operand" "v")))]
+  "TARGET_P8_VECTOR"
+  "vclz<wd> %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
+
+;; Vector population count
+(define_insn "*p8v_popcount<mode>2"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (popcount:VI2 (match_operand:VI2 1 "register_operand" "v")))]
+  "TARGET_P8_VECTOR"
+  "vpopcnt<wd> %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
+
+;; Vector Gather Bits by Bytes by Doubleword
+(define_insn "p8v_vgbbd"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")]
+		      UNSPEC_VGBBD))]
+  "TARGET_P8_VECTOR"
+  "vgbbd %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 199650)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -8290,6 +8290,20 @@ (define_split
 	(compare:CC (match_dup 0)
 		    (const_int 0)))]
   "")
+
+;; Eqv operation.
+;; It probably is not worth it to add combiner insns to recognize eqv compared
+;; to 0 operations.
+(define_insn "*eqv<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+	(not:GPR
+	 (xor:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
+		  (match_operand:GPR 2 "gpc_reg_operand" "r"))))]
+  ""
+  "eqv %0,%1,%2"
+  [(set_attr "type" "integer")
+   (set_attr "length" "4")])
+
 \f
 ;; Now define ways of moving data around.
 
Index: gcc/config/rs6000/altivec.h
===================================================================
--- gcc/config/rs6000/altivec.h	(revision 199650)
+++ gcc/config/rs6000/altivec.h	(working copy)
@@ -323,15 +323,31 @@
 
 #ifdef _ARCH_PWR8
 /* Vector additions added in ISA 2.07.  */
+#define vec_eqv __builtin_vec_eqv
+#define vec_nand __builtin_vec_nand
+#define vec_orc __builtin_vec_orc
 #define vec_vaddudm __builtin_vec_vaddudm
+#define vec_vclz __builtin_vec_vclz
+#define vec_vclzb __builtin_vec_vclzb
+#define vec_vclzd __builtin_vec_vclzd
+#define vec_vclzh __builtin_vec_vclzh
+#define vec_vclzw __builtin_vec_vclzw
+#define vec_vgbbd __builtin_vec_vgbbd
 #define vec_vmaxsd __builtin_vec_vmaxsd
 #define vec_vmaxud __builtin_vec_vmaxud
 #define vec_vminsd __builtin_vec_vminsd
 #define vec_vminud __builtin_vec_vminud
+#define vec_vmrgew __builtin_vec_vmrgew
+#define vec_vmrgow __builtin_vec_vmrgow
 #define vec_vpksdss __builtin_vec_vpksdss
 #define vec_vpksdus __builtin_vec_vpksdus
 #define vec_vpkudum __builtin_vec_vpkudum
 #define vec_vpkudus __builtin_vec_vpkudus
+#define vec_vpopcnt __builtin_vec_vpopcnt
+#define vec_vpopcntb __builtin_vec_vpopcntb
+#define vec_vpopcntd __builtin_vec_vpopcntd
+#define vec_vpopcnth __builtin_vec_vpopcnth
+#define vec_vpopcntw __builtin_vec_vpopcntw
 #define vec_vrld __builtin_vec_vrld
 #define vec_vsld __builtin_vec_vsld
 #define vec_vsrad __builtin_vec_vsrad
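
As a minimal usage sketch (not part of the patch; the function name is made
up), the new altivec.h names are intended to be used like this, given a
compiler built with these patches and -mcpu=power8:

	#include <altivec.h>

	vector unsigned int
	logic_popcount (vector unsigned int a, vector unsigned int b)
	{
	  vector unsigned int e = vec_eqv (a, b);	/* ~(a ^ b), xxleqv  */
	  vector unsigned int n = vec_nand (a, b);	/* ~(a & b), xxlnand */
	  return vec_vpopcntw (e) + vec_vpopcntw (n);	/* vpopcntw */
	}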

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
  2013-06-04 18:49   ` [PATCH, rs6000] power8 patches, patch #4 (revised), " Michael Meissner
@ 2013-06-05 14:28     ` David Edelsohn
  2013-06-05 15:50       ` Segher Boessenkool
  2013-06-05 16:13       ` Michael Meissner
  0 siblings, 2 replies; 52+ messages in thread
From: David Edelsohn @ 2013-06-05 14:28 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, David Edelsohn, Pat Haugen, Peter Bergner

On Tue, Jun 4, 2013 at 2:48 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> I revised this power8 patch, which adds new miscellaneous vector instructions,
> so that it does not turn off splitting wide moves.  In doing the patch, I discovered that we
> never supported the 'eqv' instruction, and I have added support for eqv in the
> GPR registers.
>
> I also fixed the issue David raised in patch #2, that I did not protect the
> crypto tests in case an assembler that does not understand ISA 2.07 instructions
> was used to build the compiler.  I brought in the changes to
> target-supports.exp from patch #5 to fix this.
>
> This patch bootstraps and causes no regressions, is it ok to check in?
>
> [gcc]
> 2013-06-04  Michael Meissner  <meissner@linux.vnet.ibm.com>
>             Pat Haugen <pthaugen@us.ibm.com>
>             Peter Bergner <bergner@vnet.ibm.com>
>
>         * doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
>         Document new power8 builtins.
>
>         * config/rs6000/vector.md (and<mode>3): Add a clobber/scratch of a
>         condition code register, so TImode logical operations can be done
>         either in VSX registers or GPRs.
>         (nor<mode>3): Use the canonical form for nor.
>         (eqv<mode>3): Add expanders for power8 xxleqv, xxlnand, xxlorc,
>         vclz*, and vpopcnt* vector instructions.
>         (nand<mode>3): Likewise.
>         (orc<mode>3): Likewise.
>         (clz<mode>2): Likewise.
>         (popcount<mode>2): Likewise.
>
>         * config/rs6000/predicates.md (int_reg_operand): Rework tests so
>         that only the GPRs are recognized.
>
>         * config/rs6000/rs6000-builtin.def (xscvspdpn): Add new power8
>         builtin functions.
>         (xscvdpspn): Likewise.
>         (vclzb): Likewise.
>         (vclzh): Likewise.
>         (vclzw): Likewise.
>         (vclzd): Likewise.
>         (vpopcntb): Likewise.
>         (vpopcnth): Likewise.
>         (vpopcntw): Likewise.
>         (vpopcntd): Likewise.
>         (vgbbd): Likewise.
>         (vmrgew): Likewise.
>         (vmrgow): Likewise.
>         (eqv_v16qi3): Likewise.
>         (eqv_v8hi3): Likewise.
>         (eqv_v4si3): Likewise.
>         (eqv_v2di3): Likewise.
>         (eqv_v4sf3): Likewise.
>         (eqv_v2df3): Likewise.
>         (nand_v16qi3): Likewise.
>         (nand_v8hi3): Likewise.
>         (nand_v4si3): Likewise.
>         (nand_v2di3): Likewise.
>         (nand_v4sf3): Likewise.
>         (nand_v2df3): Likewise.
>         (orc_v16qi3): Likewise.
>         (orc_v8hi3): Likewise.
>         (orc_v4si3): Likewise.
>         (orc_v2di3): Likewise.
>         (orc_v4sf3): Likewise.
>         (orc_v2df3): Likewise.
>         (vclz): Likewise.
>         (vclzb): Likewise.
>         (vclzh): Likewise.
>         (vclzw): Likewise.
>         (vclzd): Likewise.
>         (vpopcnt): Likewise.
>         (vpopcntb): Likewise.
>         (vpopcnth): Likewise.
>         (vpopcntw): Likewise.
>         (vpopcntd): Likewise.
>         (vgbbd): Likewise.
>         (eqv): Likewise.
>         (nand): Likewise.
>         (orc): Likewise.
>         (vmrgew): Likewise.
>         (vmrgow): Likewise.
>
>         * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
>         support for new power8 builtins.
>
>         * config/rs6000/rs6000.c (rs6000_option_override_internal): Only
>         allow power8 quad mode in 64-bit.
>         (rs6000_builtin_vectorized_function): Vectorize count leading
>         zeros, population count builtins.
>         (rs6000_expand_vector_init): On power8 use xscvdpspn to form V4SF
>         vectors instead of xscvdpsp to avoid IEEE related traps.
>         (builtin_function_type): Add vgbbd builtin function which takes an
>         unsigned argument.
>         (altivec_expand_vec_perm_const): Add support for new power8 merge
>         instructions.
>
>         * config/rs6000/vsx.md (VSX_L2): New iterator for 128-bit types,
>         that does not include TImode for use with 32-bit.
>         (UNSPEC_VSX_CVSPDPN): Support for power8 xscvdpspn and xscvspdpn
>         instructions.
>         (UNSPEC_VSX_CVDPSPN): Likewise.
>         (vsx_xscvdpspn): Likewise.
>         (vsx_xscvspdpn): Likewise.
>         (vsx_xscvdpspn_scalar): Likewise.
>         (vsx_xscvspdpn_directmove): Likewise.
>         (vsx_and<mode>3): Split logical operations into 32-bit and
>         64-bit. Add support to do logical operations on TImode as well as
>         VSX vector types.  Allow logical operations to be done in either
>         VSX registers or in general purpose registers in 64-bit mode.  Add
>         splitters if GPRs were used. For and, add clobber of CCmode to
>         allow use of ANDI on GPRs.
>         (vsx_and<mode>3_32bit): Likewise.
>         (vsx_and<mode>3_64bit): Likewise.
>         (vsx_ior<mode>3): Likewise.
>         (vsx_ior<mode>3_32bit): Likewise.
>         (vsx_ior<mode>3_64bit): Likewise.
>         (vsx_xor<mode>3): Likewise.
>         (vsx_xor<mode>3_32bit): Likewise.
>         (vsx_xor<mode>3_64bit): Likewise.
>         (vsx_one_cmpl<mode>2): Likewise.
>         (vsx_one_cmpl<mode>2_32bit): Likewise.
>         (vsx_one_cmpl<mode>2_64bit): Likewise.
>         (vsx_nor<mode>3): Likewise.
>         (vsx_nor<mode>3_32bit): Likewise.
>         (vsx_nor<mode>3_64bit): Likewise.
>         (vsx_andc<mode>3): Likewise.
>         (vsx_andc<mode>3_32bit): Likewise.
>         (vsx_andc<mode>3_64bit): Likewise.
>         (vsx_eqv<mode>3_32bit): Add support for power8 xxleqv, xxlnand,
>         and xxlorc instructions.
>         (vsx_eqv<mode>3_64bit): Likewise.
>         (vsx_nand<mode>3_32bit): Likewise.
>         (vsx_nand<mode>3_64bit): Likewise.
>         (vsx_orc<mode>3_32bit): Likewise.
>         (vsx_orc<mode>3_64bit): Likewise.
>
>         * config/rs6000/rs6000.h (VLOGICAL_REGNO_P): Update comment.
>
>         * config/rs6000/altivec.md (UNSPEC_VGBBD): Add power8 vgbbd
>         instruction.
>         (p8_vmrgew): Add power8 vmrgew and vmrgow instructions.
>         (p8_vmrgow): Likewise.
>         (altivec_and<mode>3): Add clobber of CCmode to allow AND using
>         GPRs to be split under VSX.
>         (p8v_clz<mode>2): Add power8 count leading zero support.
>         (p8v_popcount<mode>2): Add power8 population count support.
>         (p8v_vgbbd): Add power8 gather bits by bytes by doubleword
>         support.
>
>         * config/rs6000/rs6000.md (eqv<mode>3): Add support for powerp eqv
>         instruction.
>
>         * config/rs6000/altivec.h (vec_eqv): Add defines to export power8
>         builtin functions.
>         (vec_nand): Likewise.
>         (vec_vclz): Likewise.
>         (vec_vclzb): Likewise.
>         (vec_vclzd): Likewise.
>         (vec_vclzh): Likewise.
>         (vec_vclzw): Likewise.
>         (vec_vgbbd): Likewise.
>         (vec_vmrgew): Likewise.
>         (vec_vmrgow): Likewise.
>         (vec_vpopcnt): Likewise.
>         (vec_vpopcntb): Likewise.
>         (vec_vpopcntd): Likewise.
>         (vec_vpopcnth): Likewise.
>         (vec_vpopcntw): Likewise.
>
> [gcc/testsuite]
> 2013-06-04  Michael Meissner  <meissner@linux.vnet.ibm.com>
>             Pat Haugen <pthaugen@us.ibm.com>
>             Peter Bergner <bergner@vnet.ibm.com>
>
>         * gcc.target/powerpc/bool.c: New file, add eqv, nand, nor tests.
>
>         * gcc.target/powerpc/crypto-builtin-1.c: Use effective target
>         powerpc_p8vector_ok instead of powerpc_vsx_ok.
>
>         * lib/target-supports.exp (check_p8vector_hw_available): Add power8
>         support.
>         (check_effective_target_powerpc_p8vector_ok): Likewise.
>         (is-effective-target): Likewise.
>         (check_vect_support_and_set_flags): Likewise.

Thanks for the changes and fixes. It's looking better.

+;; The canonical form is to have the negated elment first, so we need to
+;; reverse arguments.

Please fix the typo in the comment: "element".

+;; Like VSX_L, but don't support TImode for doing logical instructions in
+;; 32-bit
+(define_mode_iterator VSX_L2 [V16QI V8HI V4SI V2DI V4SF V2DF])
+
 ;; Iterator for memory move.  Handle TImode specially to allow
 ;; it to use gprs as well as vsx registers.
 (define_mode_iterator VSX_M [V16QI V8HI V4SI V2DI V4SF V2DF])

+(define_mode_iterator VSX_M2 [V16QI
+			      V8HI
+			      V4SI
+			      V2DI
+			      V4SF
+			      V2DF
+			      (TI	"TARGET_VSX_TIMODE")])

The patch adds new iterators VSX_L2 and VSX_M2.  The original
ChangeLog only mentioned M2 and the new ChangeLog only mentions L2.
What's going on?

* config/rs6000/rs6000.md (eqv<mode>3): Add support for powerp eqv instruction.

Why isn't this covered by boolean_operator and %q output operand?  And
why can't that predicate and output operand handle vsx as well, e.g.,
*vsx_eqv?  Why don't we simply have vsx_bool<mode>3, etc.

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
  2013-06-05 14:28     ` David Edelsohn
@ 2013-06-05 15:50       ` Segher Boessenkool
  2013-06-05 16:05         ` Michael Meissner
  2013-06-05 16:13       ` Michael Meissner
  1 sibling, 1 reply; 52+ messages in thread
From: Segher Boessenkool @ 2013-06-05 15:50 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

> * config/rs6000/rs6000.md (eqv<mode>3): Add support for powerp eqv  
> instruction.

[Typo, "powerp".  There are many more typos and non-grammatical
sentences.]

> Why isn't this covered by boolean_operator and %q output operand?

The existing patterns (boolc<mode>3_...) do for eqv:

	(set reg (xor (not reg) reg))

which is not canonical RTL.
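
[For reference, the canonical form, which the new *eqv<mode>3 pattern in this
patch does match, is:

	(set reg (not (xor reg reg)))

so combine only ever looks for that form when it builds an eqv.]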


Segher

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
  2013-06-05 15:50       ` Segher Boessenkool
@ 2013-06-05 16:05         ` Michael Meissner
  2013-06-05 20:06           ` Segher Boessenkool
  0 siblings, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-06-05 16:05 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: David Edelsohn, Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Wed, Jun 05, 2013 at 05:50:21PM +0200, Segher Boessenkool wrote:
> >* config/rs6000/rs6000.md (eqv<mode>3): Add support for powerp eqv
> >instruction.
> 
> [Typo, "powerp".  There are many more typos and non-grammatical
> sentences.]
> 
> >Why isn't this covered by boolean_operator and %q output operand?
> 
> The existing patterns (boolc<mode>3_...) do for eqv:
> 
> 	(set reg (xor (not reg) reg))
> 
> which is not canonical RTL.

I wasn't really aware of boolc and friends when I wrote the patches.  However,
I need named insns to enable vector builtin functions, so except for the
splitting part, I wouldn't be able to save much RTL.

I also wonder whether it would be useful to have 32-bit do the vector logical
ops in GPRs as well.  At the moment, the patches don't allow it (vector types
must be done in the altivec/vsx registers, and TImode is done by splitting the
operation into 4 separate operations).  On the 64-bit side, having __int128_t
passed in GPRs means you want to avoid ping-ponging between the GPRs and VSX
registers.  In addition, the atomic quad word support (patch #7) has to run in
GPRs, so we need add/subtract/logical to have versions that run in GPRs.

I can rewrite the pattern for the vector eqv so that, when split, it uses the
format used by boolc, but since that form is not canonical, we would never
generate it in the rare cases where it might be useful.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
  2013-06-05 14:28     ` David Edelsohn
  2013-06-05 15:50       ` Segher Boessenkool
@ 2013-06-05 16:13       ` Michael Meissner
  2013-06-05 17:28         ` David Edelsohn
  2013-06-06 15:57         ` David Edelsohn
  1 sibling, 2 replies; 52+ messages in thread
From: Michael Meissner @ 2013-06-05 16:13 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Wed, Jun 05, 2013 at 10:28:02AM -0400, David Edelsohn wrote:
> +;; The canonical form is to have the negated elment first, so we need to
> +;; reverse arguments.
> 
> Please fix the typo in the comment: "element".

Ok.  I need to proof-read the patches before sending them out.

> +;; Like VSX_L, but don't support TImode for doing logical instructions in
> +;; 32-bit
> +(define_mode_iterator VSX_L2 [V16QI V8HI V4SI V2DI V4SF V2DF])
> +
>  ;; Iterator for memory move.  Handle TImode specially to allow
>  ;; it to use gprs as well as vsx registers.
>  (define_mode_iterator VSX_M [V16QI V8HI V4SI V2DI V4SF V2DF])
> 
> +(define_mode_iterator VSX_M2 [V16QI
> +                  V8HI
> +                  V4SI
> +                  V2DI
> +                  V4SF
> +                  V2DF
> +                  (TI    "TARGET_VSX_TIMODE")])
> 
> The patch adds new iterators VSX_L2 and VSX_M2.  The original
> ChangeLog only mentioned M2 and the new ChangeLog only mentions L2.
> What's going on?

I thought I had deleted VSX_M2 from this patch.  It will be needed in patch #8
for the fusion peephole.  The difference is VSX_L2 avoids TImode altogether,
and was used by the logical ops to prevent TImode operations in VSX registers
in 32-bit.

The problem is that unless we have expanders/splitters for logical DImode,
when the compiler wants to do a logical DImode operation it says, aha, I have
a TImode operation, and then converts the DImode value to TImode and does the
operation there (which in turn may mean transfers between the GPR and VSX
registers).
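
For instance, a plain 64-bit AND compiled for 32-bit shows the failure mode
(an illustration only, not a test from the patch):

#ifndef TYPE
#define TYPE unsigned long long
#endif

/* Per the above, without DImode expanders/splitters this can be widened to
   a TImode vector operation instead of two GPR ands.  */
TYPE d_and (TYPE a, TYPE b) { return a & b; }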

I can add splitters and such for 32-bit DImode to prevent this, but I don't
know if you want me to do it in the context of this patch, or do it as a later
patch.

> * config/rs6000/rs6000.md (eqv<mode>3): Add support for powerp eqv instruction.
> 
> Why isn't this covered by boolean_operator and %q output operand?  And
> why can't that predicate and output operand handle vsx as well, e.g.,
> *vsx_eqv?  Why don't we simply have vsx_bool<mode>3, etc.

See my reply to Segher: boolc doesn't use the canonical form, so eqv is not
currently generated.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
  2013-06-05 16:13       ` Michael Meissner
@ 2013-06-05 17:28         ` David Edelsohn
  2013-06-06 15:57         ` David Edelsohn
  1 sibling, 0 replies; 52+ messages in thread
From: David Edelsohn @ 2013-06-05 17:28 UTC (permalink / raw)
  To: Michael Meissner, David Edelsohn, GCC Patches, Pat Haugen, Peter Bergner

On Wed, Jun 5, 2013 at 12:13 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:

> I thought I had deleted VSX_M2 from this patch.  It will be needed in patch #8
> for the fusion peephole.  The difference is VSX_L2 avoids TImode altogether,
> and was used by the logical ops to prevent TImode operations in VSX registers
> in 32-bit.
>
> The problem is that unless we have expanders/splitters for logical DImode,
> when the compiler wants to do a logical DImode operation it says, aha, I
> have a TImode operation, and then converts the DImode value to TImode and
> does the operation there (which in turn may mean transfers between the GPR
> and VSX registers).
>
> I can add splitters and such for 32-bit DImode to prevent this, but I don't
> know if you want me to do it in the context of this patch, or do it as a later
> patch.

I don't have a preference whether the iterator is added now or later,
but I want the ChangeLog to match the patch.

>
>> * config/rs6000/rs6000.md (eqv<mode>3): Add support for powerp eqv instruction.
>>
>> Why isn't this covered by boolean_operator and %q output operand?  And
>> why can't that predicate and output operand handle vsx as well, e.g.,
>> *vsx_eqv?  Why don't we simply have vsx_bool<mode>3, etc.
>
> See my reply to Segher that boolc doesn't use the canonical form, so eqv is not
> currently generated.

Okay, I understand that the bool patterns no longer match canonical
RTL after the canonical form was changed.  Unfortunately, there were
no powerpc-specific tests to ensure that those instructions are
generated.

I also understand that you need to match builtins.

But I would like vector.md and vsx.md to use the boolean operator
iterators.  I don't think the patterns need to be explicitly
duplicated; that creates more clutter and more opportunity for
mistakes and differences.
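
As a sketch of that direction (the iterator and attribute names here are
hypothetical, not from the posted patch), one pattern with a code iterator
could cover the basic ops:

(define_code_iterator VLOG [and ior xor])
(define_code_attr vlog_insn [(and "xxland") (ior "xxlor") (xor "xxlxor")])

(define_insn "*vsx_<code><mode>3"
  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=wa")
	(VLOG:VSX_L (match_operand:VSX_L 1 "vsx_register_operand" "wa")
		    (match_operand:VSX_L 2 "vsx_register_operand" "wa")))]
  "VECTOR_MEM_VSX_P (<MODE>mode)"
  "<vlog_insn> %x0,%x1,%x2")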

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
  2013-06-05 16:05         ` Michael Meissner
@ 2013-06-05 20:06           ` Segher Boessenkool
  2013-06-05 20:24             ` Michael Meissner
  0 siblings, 1 reply; 52+ messages in thread
From: Segher Boessenkool @ 2013-06-05 20:06 UTC (permalink / raw)
  To: Michael Meissner; +Cc: David Edelsohn, GCC Patches, Pat Haugen, Peter Bergner

> I also wonder whether it would be useful to have 32-bit do the vector
> logical ops in GPRs as well.  At the moment, the patches don't allow it
> (vector types must be done in the altivec/vsx registers, and TImode is
> done by splitting the operation into 4 separate operations).  On the
> 64-bit side, having __int128_t passed in GPRs means you want to avoid
> ping-ponging between the GPRs and VSX registers.  In addition, the atomic
> quad word support (patch #7) has to run in GPRs, so we need
> add/subtract/logical to have versions that run in GPRs.

It might work better if you added a mode V1TI for TI in vector
regs, and then used plain TI only for GPRs.  It certainly will
make things a lot more regular; whether it actually works better,
I have no idea.
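
Presumably that is just a new entry in the target's modes.def (a sketch,
assuming the generic VECTOR_MODE mechanism; not from any posted patch):

/* V1TI: a one-element vector of TImode, so 128-bit integer operations in
   vector registers get a mode distinct from plain TImode in the GPRs.  */
VECTOR_MODE (INT, TI, 1);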

The way you have things now, the vector patterns are split to GPR patterns
only after reload; much too late to do most optimisations on them.  On the
other hand, deciding early what register set some op should go to isn't too
pleasant either; is it always the best choice to use the vector regs when
possible?


Segher

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
  2013-06-05 20:06           ` Segher Boessenkool
@ 2013-06-05 20:24             ` Michael Meissner
  0 siblings, 0 replies; 52+ messages in thread
From: Michael Meissner @ 2013-06-05 20:24 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, David Edelsohn, GCC Patches, Pat Haugen, Peter Bergner

On Wed, Jun 05, 2013 at 10:06:08PM +0200, Segher Boessenkool wrote:
> >I also wonder whether it would be useful to have 32-bit do the vector
> >logical ops in GPRs as well.  At the moment, the patches don't allow it
> >(vector types must be done in the altivec/vsx registers, and TImode is
> >done by splitting the operation into 4 separate operations).  On the
> >64-bit side, having __int128_t passed in GPRs means you want to avoid
> >ping-ponging between the GPRs and VSX registers.  In addition, the atomic
> >quad word support (patch #7) has to run in GPRs, so we need
> >add/subtract/logical to have versions that run in GPRs.
> 
> It might work better if you added a mode V1TI for TI in vector
> regs, and then used plain TI only for GPRs.  It certainly will
> make things a lot more regular; whether it actually works better,
> I have no idea.
> 
> The way you have things now, only after reload the vector patterns
> are split to GPR patterns; much too late to do most optimisations
> on it.  On the other hand, deciding early what register set some
> op should go to isn't too pleasant either; is it always the best
> choice to use the vector regs when possible?

It depends.  For example, consider:

#ifndef TYPE
#define TYPE __int128_t
#endif

TYPE a_and (TYPE p, TYPE q) { return p & q; }
void p_and (TYPE *p, TYPE *q, TYPE *r) { *p = *q & *r; }

In a_and, p and q are passed in GPRs, so you want to use the GPR-based
instructions.  In p_and, it is simpler to do the operation in the VSX
registers.

This is what my code from patch 4 generates:

.L.a_and:
        and 3,3,5
        and 4,4,6
        blr

.L.p_and:
        lxvd2x 12,0,4
        lxvd2x 0,0,5
        xxland 0,12,0
        stxvd2x 0,0,3
        blr

Unfortunately, when I added TImode in VSX registers, I didn't notice this,
and the current code generates:

.L.a_and:
        addi 9,1,-16
        std 3,0(9)
        std 4,8(9)
        ori 2,2,0
        lxvd2x 12,0,9
        std 5,0(9)
        std 6,8(9)
        ori 2,2,0
        lxvd2x 0,0,9
        xxland 0,12,0
        stxvd2x 0,0,9
        ori 2,2,0
        ld 3,0(9)
        ld 4,8(9)
        blr

.L.p_and:
        lxvd2x 12,0,4
        lxvd2x 0,0,5
        xxland 0,12,0
        stxvd2x 0,0,3
        blr

Previous versions (and -mno-vsx-timode) generate:

.L.a_and:
        and 3,3,5
        and 4,4,6
        blr

.L.p_and:
        ld 10,0(4)
        ld 9,0(5)
        and 9,10,9
        std 9,0(3)
        ld 10,8(4)
        ld 9,8(5)
        and 9,10,9
        std 9,8(3)
        blr

Note that the scheduler does not interleave the loads and the ands; instead
it does ld/ld/and/std.

This bouncing back and forth will get somewhat worse when the support for
doing __int128_t add/subtract in the vector registers is added.  We don't want
to hard-wire doing all of TImode in vector registers, because this breaks the
8-byte atomic fetch_and_add functions (short of using an UNSPEC to hide the
add).
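
For reference, this is the kind of function that has to keep a GPR path (a
sketch assuming the quad memory atomics from patch #7, not a test from this
patch):

/* The lqarx/stqcx. loop runs on a GPR pair, so the add feeding it needs a
   GPR version too; forcing it into vector registers would just wrap the
   loop in direct moves.  */
__int128_t fetch_add (__int128_t *p, __int128_t v)
{
  return __atomic_fetch_add (p, v, __ATOMIC_SEQ_CST);
}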

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
  2013-06-05 16:13       ` Michael Meissner
  2013-06-05 17:28         ` David Edelsohn
@ 2013-06-06 15:57         ` David Edelsohn
  2013-06-06 21:42           ` Michael Meissner
  2013-07-15 21:48           ` Michael Meissner
  1 sibling, 2 replies; 52+ messages in thread
From: David Edelsohn @ 2013-06-06 15:57 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Wed, Jun 5, 2013 at 12:13 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> On Wed, Jun 05, 2013 at 10:28:02AM -0400, David Edelsohn wrote:
>> +;; The canonical form is to have the negated elment first, so we need to
>> +;; reverse arguments.
>>
>> Please fix the typo in the comment: "element".
>
> Ok.  I need to proof-read the patches before sending them out.
>
>> +;; Like VSX_L, but don't support TImode for doing logical instructions in
>> +;; 32-bit
>> +(define_mode_iterator VSX_L2 [V16QI V8HI V4SI V2DI V4SF V2DF])
>> +
>>  ;; Iterator for memory move.  Handle TImode specially to allow
>>  ;; it to use gprs as well as vsx registers.
>>  (define_mode_iterator VSX_M [V16QI V8HI V4SI V2DI V4SF V2DF])
>>
>> +(define_mode_iterator VSX_M2 [V16QI
>> +                  V8HI
>> +                  V4SI
>> +                  V2DI
>> +                  V4SF
>> +                  V2DF
>> +                  (TI    "TARGET_VSX_TIMODE")])
>>
>> The patch adds new iterators VSX_L2 and VSX_M2.  The original
>> ChangeLog only mentioned M2 and the new ChangeLog only mentions L2.
>> What's going on?
>
> I thought I had deleted VSX_M2 from this patch.  It will be needed in patch #8
> for the fusion peephole.  The difference is VSX_L2 avoids TImode altogether,
> and was used by the logical ops to prevent TImode operations in VSX registers
> in 32-bit.
>
> The problem is that unless we have expanders/splitters for logical DImode,
> when the compiler wants to do a logical DImode operation it says, aha, I
> have a TImode operation, and then converts the DImode value to TImode and
> does the operation there (which in turn may mean transfers between the GPR
> and VSX registers).
>
> I can add splitters and such for 32-bit DImode to prevent this, but I don't
> know if you want me to do it in the context of this patch, or do it as a later
> patch.

Okay, the revised patch #4 is okay with the typos fixed and either the
ChangeLog or the patch adjusted for iterators VSX_L2 and VSX_M2 -- the
ChangeLog and patch need to match.

But I view this as a preliminary step.  The logical instructions need
an iterator and TImode needs to be cleaned up on 32 bit.

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
  2013-06-06 15:57         ` David Edelsohn
@ 2013-06-06 21:42           ` Michael Meissner
  2013-07-15 21:48           ` Michael Meissner
  1 sibling, 0 replies; 52+ messages in thread
From: Michael Meissner @ 2013-06-06 21:42 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

[-- Attachment #1: Type: text/plain, Size: 6147 bytes --]

On Thu, Jun 06, 2013 at 11:57:01AM -0400, David Edelsohn wrote:
> Okay, the revised patch #4 is okay with the typos fixed and either the
> ChangeLog or the patch adjusted for iterators VSX_L2 and VSX_M2 -- the
> ChangeLog and patch need to match.
> 
> But I view this as a preliminary step.  The logical instructions need
> an iterator and TImode needs to be cleaned up on 32 bit.
> 
> Thanks, David
> 

I checked in this revised patch #4 as subversion id 199767.  I will wait for
Segher's patches before redoing the logical operations.

[gcc]
2013-06-06  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Pat Haugen <pthaugen@us.ibm.com>
	    Peter Bergner <bergner@vnet.ibm.com>

	* doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
	Document new power8 builtins.

	* config/rs6000/vector.md (and<mode>3): Add a clobber/scratch of a
	condition code register, to allow 128-bit logical operations to be
	done in the VSX or GPR registers.
	(nor<mode>3): Use the canonical form for nor.
	(eqv<mode>3): Add expanders for power8 xxleqv, xxlnand, xxlorc,
	vclz*, and vpopcnt* vector instructions.
	(nand<mode>3): Likewise.
	(orc<mode>3): Likewise.
	(clz<mode>2): Likewise.
	(popcount<mode>2): Likewise.

	* config/rs6000/predicates.md (int_reg_operand): Rework tests so
	that only the GPRs are recognized.

	* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
	support for new power8 builtins.

	* config/rs6000/rs6000-builtin.def (xscvspdpn): Add new power8
	builtin functions.
	(xscvdpspn): Likewise.
	(vclz): Likewise.
	(vclzb): Likewise.
	(vclzh): Likewise.
	(vclzw): Likewise.
	(vclzd): Likewise.
	(vpopcnt): Likewise.
	(vpopcntb): Likewise.
	(vpopcnth): Likewise.
	(vpopcntw): Likewise.
	(vpopcntd): Likewise.
	(vgbbd): Likewise.
	(vmrgew): Likewise.
	(vmrgow): Likewise.
	(eqv): Likewise.
	(eqv_v16qi3): Likewise.
	(eqv_v8hi3): Likewise.
	(eqv_v4si3): Likewise.
	(eqv_v2di3): Likewise.
	(eqv_v4sf3): Likewise.
	(eqv_v2df3): Likewise.
	(nand): Likewise.
	(nand_v16qi3): Likewise.
	(nand_v8hi3): Likewise.
	(nand_v4si3): Likewise.
	(nand_v2di3): Likewise.
	(nand_v4sf3): Likewise.
	(nand_v2df3): Likewise.
	(orc): Likewise.
	(orc_v16qi3): Likewise.
	(orc_v8hi3): Likewise.
	(orc_v4si3): Likewise.
	(orc_v2di3): Likewise.
	(orc_v4sf3): Likewise.
	(orc_v2df3): Likewise.

	* config/rs6000/rs6000.c (rs6000_option_override_internal): Only
	allow power8 quad mode in 64-bit.
	(rs6000_builtin_vectorized_function): Add support to vectorize
	ISA 2.07 count leading zeros, population count builtins.
	(rs6000_expand_vector_init): On ISA 2.07 use xscvdpspn to form
	V4SF vectors instead of xscvdpsp to avoid IEEE related traps.
	(builtin_function_type): Add vgbbd builtin function which takes an
	unsigned argument.
	(altivec_expand_vec_perm_const): Add support for new power8 merge
	instructions.

	* config/rs6000/vsx.md (VSX_L2): New iterator for 128-bit types,
	that does not include TImode for use with 32-bit.
	(UNSPEC_VSX_CVSPDPN): Support for power8 xscvdpspn and xscvspdpn
	instructions.
	(UNSPEC_VSX_CVDPSPN): Likewise.
	(vsx_xscvdpspn): Likewise.
	(vsx_xscvspdpn): Likewise.
	(vsx_xscvdpspn_scalar): Likewise.
	(vsx_xscvspdpn_directmove): Likewise.
	(vsx_and<mode>3): Split logical operations into 32-bit and
	64-bit. Add support to do logical operations on TImode as well as
	VSX vector types.  Allow logical operations to be done in either
	VSX registers or in general purpose registers in 64-bit mode.  Add
	splitters if GPRs were used. For AND, add clobber of CCmode to
	allow use of ANDI on GPRs.  Rewrite nor to use the canonical RTL
	encoding.
	(vsx_and<mode>3_32bit): Likewise.
	(vsx_and<mode>3_64bit): Likewise.
	(vsx_ior<mode>3): Likewise.
	(vsx_ior<mode>3_32bit): Likewise.
	(vsx_ior<mode>3_64bit): Likewise.
	(vsx_xor<mode>3): Likewise.
	(vsx_xor<mode>3_32bit): Likewise.
	(vsx_xor<mode>3_64bit): Likewise.
	(vsx_one_cmpl<mode>2): Likewise.
	(vsx_one_cmpl<mode>2_32bit): Likewise.
	(vsx_one_cmpl<mode>2_64bit): Likewise.
	(vsx_nor<mode>3): Likewise.
	(vsx_nor<mode>3_32bit): Likewise.
	(vsx_nor<mode>3_64bit): Likewise.
	(vsx_andc<mode>3): Likewise.
	(vsx_andc<mode>3_32bit): Likewise.
	(vsx_andc<mode>3_64bit): Likewise.
	(vsx_eqv<mode>3_32bit): Add support for power8 xxleqv, xxlnand,
	and xxlorc instructions.
	(vsx_eqv<mode>3_64bit): Likewise.
	(vsx_nand<mode>3_32bit): Likewise.
	(vsx_nand<mode>3_64bit): Likewise.
	(vsx_orc<mode>3_32bit): Likewise.
	(vsx_orc<mode>3_64bit): Likewise.

	* config/rs6000/rs6000.h (VLOGICAL_REGNO_P): Update comment.

	* config/rs6000/altivec.md (UNSPEC_VGBBD): Add power8 vgbbd
	instruction.
	(p8_vmrgew): Add power8 vmrgew and vmrgow instructions.
	(p8_vmrgow): Likewise.
	(altivec_and<mode>3): Add clobber of CCmode to allow AND using
	GPRs to be split under VSX.
	(p8v_clz<mode>2): Add power8 count leading zero support.
	(p8v_popcount<mode>2): Add power8 population count support.
	(p8v_vgbbd): Add power8 gather bits by bytes by doubleword
	support.

	* config/rs6000/rs6000.md (eqv<mode>3): Add support for power8 eqv
	instruction.

	* config/rs6000/altivec.h (vec_eqv): Add defines to export power8
	builtin functions.
	(vec_nand): Likewise.
	(vec_vclz): Likewise.
	(vec_vclzb): Likewise.
	(vec_vclzd): Likewise.
	(vec_vclzh): Likewise.
	(vec_vclzw): Likewise.
	(vec_vgbbd): Likewise.
	(vec_vmrgew): Likewise.
	(vec_vmrgow): Likewise.
	(vec_vpopcnt): Likewise.
	(vec_vpopcntb): Likewise.
	(vec_vpopcntd): Likewise.
	(vec_vpopcnth): Likewise.
	(vec_vpopcntw): Likewise.

[gcc/testsuite]
2013-06-06  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Pat Haugen <pthaugen@us.ibm.com>
	    Peter Bergner <bergner@vnet.ibm.com>

	* gcc.target/powerpc/crypto-builtin-1.c: Use effective target
	powerpc_p8vector_ok instead of powerpc_vsx_ok.

	* gcc.target/powerpc/bool.c: New file, add eqv, nand, nor tests.

	* lib/target-supports.exp (check_p8vector_hw_available): Add power8
	support.
	(check_effective_target_powerpc_p8vector_ok): Likewise.
	(is-effective-target): Likewise.
	(check_vect_support_and_set_flags): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.official-04f --]
[-- Type: text/plain, Size: 74207 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 199752)
+++ gcc/doc/extend.texi	(working copy)
@@ -13991,6 +13991,38 @@ int vec_any_le (vector long long, vector
 int vec_any_lt (vector long long, vector long long);
 int vec_any_ne (vector long long, vector long long);
 
+vector long long vec_eqv (vector long long, vector long long);
+vector long long vec_eqv (vector bool long long, vector long long);
+vector long long vec_eqv (vector long long, vector bool long long);
+vector unsigned long long vec_eqv (vector unsigned long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_eqv (vector bool long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_eqv (vector unsigned long long,
+                                   vector bool long long);
+vector int vec_eqv (vector int, vector int);
+vector int vec_eqv (vector bool int, vector int);
+vector int vec_eqv (vector int, vector bool int);
+vector unsigned int vec_eqv (vector unsigned int, vector unsigned int);
+vector unsigned int vec_eqv (vector bool unsigned int,
+                             vector unsigned int);
+vector unsigned int vec_eqv (vector unsigned int,
+                             vector bool unsigned int);
+vector short vec_eqv (vector short, vector short);
+vector short vec_eqv (vector bool short, vector short);
+vector short vec_eqv (vector short, vector bool short);
+vector unsigned short vec_eqv (vector unsigned short, vector unsigned short);
+vector unsigned short vec_eqv (vector bool unsigned short,
+                               vector unsigned short);
+vector unsigned short vec_eqv (vector unsigned short,
+                               vector bool unsigned short);
+vector signed char vec_eqv (vector signed char, vector signed char);
+vector signed char vec_eqv (vector bool signed char, vector signed char);
+vector signed char vec_eqv (vector signed char, vector bool signed char);
+vector unsigned char vec_eqv (vector unsigned char, vector unsigned char);
+vector unsigned char vec_eqv (vector bool unsigned char, vector unsigned char);
+vector unsigned char vec_eqv (vector unsigned char, vector bool unsigned char);
+
 vector long long vec_max (vector long long, vector long long);
 vector unsigned long long vec_max (vector unsigned long long,
                                    vector unsigned long long);
@@ -13999,6 +14031,70 @@ vector long long vec_min (vector long lo
 vector unsigned long long vec_min (vector unsigned long long,
                                    vector unsigned long long);
 
+vector long long vec_nand (vector long long, vector long long);
+vector long long vec_nand (vector bool long long, vector long long);
+vector long long vec_nand (vector long long, vector bool long long);
+vector unsigned long long vec_nand (vector unsigned long long,
+                                    vector unsigned long long);
+vector unsigned long long vec_nand (vector bool long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_nand (vector unsigned long long,
+                                    vector bool long long);
+vector int vec_nand (vector int, vector int);
+vector int vec_nand (vector bool int, vector int);
+vector int vec_nand (vector int, vector bool int);
+vector unsigned int vec_nand (vector unsigned int, vector unsigned int);
+vector unsigned int vec_nand (vector bool unsigned int,
+                              vector unsigned int);
+vector unsigned int vec_nand (vector unsigned int,
+                              vector bool unsigned int);
+vector short vec_nand (vector short, vector short);
+vector short vec_nand (vector bool short, vector short);
+vector short vec_nand (vector short, vector bool short);
+vector unsigned short vec_nand (vector unsigned short, vector unsigned short);
+vector unsigned short vec_nand (vector bool unsigned short,
+                                vector unsigned short);
+vector unsigned short vec_nand (vector unsigned short,
+                                vector bool unsigned short);
+vector signed char vec_nand (vector signed char, vector signed char);
+vector signed char vec_nand (vector bool signed char, vector signed char);
+vector signed char vec_nand (vector signed char, vector bool signed char);
+vector unsigned char vec_nand (vector unsigned char, vector unsigned char);
+vector unsigned char vec_nand (vector bool unsigned char, vector unsigned char);
+vector unsigned char vec_nand (vector unsigned char, vector bool unsigned char);
+
+vector long long vec_orc (vector long long, vector long long);
+vector long long vec_orc (vector bool long long, vector long long);
+vector long long vec_orc (vector long long, vector bool long long);
+vector unsigned long long vec_orc (vector unsigned long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_orc (vector bool long long,
+                                   vector unsigned long long);
+vector unsigned long long vec_orc (vector unsigned long long,
+                                   vector bool long long);
+vector int vec_orc (vector int, vector int);
+vector int vec_orc (vector bool int, vector int);
+vector int vec_orc (vector int, vector bool int);
+vector unsigned int vec_orc (vector unsigned int, vector unsigned int);
+vector unsigned int vec_orc (vector bool unsigned int,
+                             vector unsigned int);
+vector unsigned int vec_orc (vector unsigned int,
+                             vector bool unsigned int);
+vector short vec_orc (vector short, vector short);
+vector short vec_orc (vector bool short, vector short);
+vector short vec_orc (vector short, vector bool short);
+vector unsigned short vec_orc (vector unsigned short, vector unsigned short);
+vector unsigned short vec_orc (vector bool unsigned short,
+                               vector unsigned short);
+vector unsigned short vec_orc (vector unsigned short,
+                               vector bool unsigned short);
+vector signed char vec_orc (vector signed char, vector signed char);
+vector signed char vec_orc (vector bool signed char, vector signed char);
+vector signed char vec_orc (vector signed char, vector bool signed char);
+vector unsigned char vec_orc (vector unsigned char, vector unsigned char);
+vector unsigned char vec_orc (vector bool unsigned char, vector unsigned char);
+vector unsigned char vec_orc (vector unsigned char, vector bool unsigned char);
+
 vector int vec_pack (vector long long, vector long long);
 vector unsigned int vec_pack (vector unsigned long long,
                               vector unsigned long long);
@@ -14047,6 +14143,27 @@ vector unsigned long long vec_vaddudm (v
 vector unsigned long long vec_vaddudm (vector unsigned long long,
                                        vector bool unsigned long long);
 
+vector long long vec_vclz (vector long long);
+vector unsigned long long vec_vclz (vector unsigned long long);
+vector int vec_vclz (vector int);
+vector unsigned int vec_vclz (vector int);
+vector short vec_vclz (vector short);
+vector unsigned short vec_vclz (vector unsigned short);
+vector signed char vec_vclz (vector signed char);
+vector unsigned char vec_vclz (vector unsigned char);
+
+vector signed char vec_vclzb (vector signed char);
+vector unsigned char vec_vclzb (vector unsigned char);
+
+vector long long vec_vclzd (vector long long);
+vector unsigned long long vec_vclzd (vector unsigned long long);
+
+vector short vec_vclzh (vector short);
+vector unsigned short vec_vclzh (vector unsigned short);
+
+vector int vec_vclzw (vector int);
+vector unsigned int vec_vclzw (vector int);
+
 vector long long vec_vmaxsd (vector long long, vector long long);
 
 vector unsigned long long vec_vmaxud (vector unsigned long long,
@@ -14068,6 +14185,27 @@ vector unsigned int vec_vpkudum (vector 
                                  vector unsigned long long);
 vector bool int vec_vpkudum (vector bool long long, vector bool long long);
 
+vector long long vec_vpopcnt (vector long long);
+vector unsigned long long vec_vpopcnt (vector unsigned long long);
+vector int vec_vpopcnt (vector int);
+vector unsigned int vec_vpopcnt (vector int);
+vector short vec_vpopcnt (vector short);
+vector unsigned short vec_vpopcnt (vector unsigned short);
+vector signed char vec_vpopcnt (vector signed char);
+vector unsigned char vec_vpopcnt (vector unsigned char);
+
+vector signed char vec_vpopcntb (vector signed char);
+vector unsigned char vec_vpopcntb (vector unsigned char);
+
+vector long long vec_vpopcntd (vector long long);
+vector unsigned long long vec_vpopcntd (vector unsigned long long);
+
+vector short vec_vpopcnth (vector short);
+vector unsigned short vec_vpopcnth (vector unsigned short);
+
+vector int vec_vpopcntw (vector int);
+vector unsigned int vec_vpopcntw (vector int);
+
 vector long long vec_vrld (vector long long, vector unsigned long long);
 vector unsigned long long vec_vrld (vector unsigned long long,
                                     vector unsigned long long);
Index: gcc/testsuite/gcc.target/powerpc/crypto-builtin-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/crypto-builtin-1.c	(revision 199752)
+++ gcc/testsuite/gcc.target/powerpc/crypto-builtin-1.c	(working copy)
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mcpu=power8 -O2 -ftree-vectorize -fvect-cost-model -fno-unroll-loops -fno-unroll-all-loops" } */
 
 typedef vector unsigned long long	crypto_t;
Index: gcc/testsuite/gcc.target/powerpc/bool.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bool.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bool.c	(revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "eqv" } } */
+/* { dg-final { scan-assembler "nand" } } */
+/* { dg-final { scan-assembler "nor" } } */
+
+#ifndef TYPE
+#define TYPE unsigned long
+#endif
+
+TYPE op1 (TYPE a, TYPE b) { return ~(a ^ b); }	/* eqv */
+TYPE op2 (TYPE a, TYPE b) { return ~(a & b); }	/* nand */
+TYPE op3 (TYPE a, TYPE b) { return ~(a | b); }	/* nor */
+
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	(revision 199752)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -1311,6 +1311,32 @@ proc check_effective_target_avx_runtime 
     return 0
 }
 
+# Return 1 if the target supports executing power8 vector instructions, 0
+# otherwise.  Cache the result.
+
+proc check_p8vector_hw_available { } {
+    return [check_cached_effective_target p8vector_hw_available {
+	# Some simulators are known to not support VSX/power8 instructions.
+	# For now, disable on Darwin
+	if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || [istarget *-*-darwin*]} {
+	    expr 0
+	} else {
+	    set options "-mpower8-vector"
+	    check_runtime_nocache p8vector_hw_available {
+		int main()
+		{
+		#ifdef __MACH__
+		  asm volatile ("xxlorc vs0,vs0,vs0");
+		#else
+		  asm volatile ("xxlorc 0,0,0");
+	        #endif
+		  return 0;
+		}
+	    } $options
+	}
+    }]
+}
+
 # Return 1 if the target supports executing VSX instructions, 0
 # otherwise.  Cache the result.
 
@@ -2749,6 +2775,33 @@ proc check_effective_target_powerpc_alti
     }
 }
 
+# Return 1 if this is a PowerPC target supporting -mpower8-vector
+
+proc check_effective_target_powerpc_p8vector_ok { } {
+    if { ([istarget powerpc*-*-*]
+         && ![istarget powerpc-*-linux*paired*])
+	 || [istarget rs6000-*-*] } {
+	# AltiVec is not supported on AIX before 5.3.
+	if { [istarget powerpc*-*-aix4*]
+	     || [istarget powerpc*-*-aix5.1*] 
+	     || [istarget powerpc*-*-aix5.2*] } {
+	    return 0
+	}
+	return [check_no_compiler_messages powerpc_p8vector_ok object {
+	    int main (void) {
+#ifdef __MACH__
+		asm volatile ("xxlorc vs0,vs0,vs0");
+#else
+		asm volatile ("xxlorc 0,0,0");
+#endif
+		return 0;
+	    }
+	} "-mpower8-vector"]
+    } else {
+	return 0
+    }
+}
+
 # Return 1 if this is a PowerPC target supporting -mvsx
 
 proc check_effective_target_powerpc_vsx_ok { } {
@@ -4576,6 +4629,7 @@ proc is-effective-target { arg } {
 	switch $arg {
 	  "vmx_hw"         { set selected [check_vmx_hw_available] }
 	  "vsx_hw"         { set selected [check_vsx_hw_available] }
+	  "p8vector_hw"    { set selected [check_p8vector_hw_available] }
 	  "ppc_recip_hw"   { set selected [check_ppc_recip_hw_available] }
 	  "named_sections" { set selected [check_named_sections_available] }
 	  "gc_sections"    { set selected [check_gc_sections_available] }
@@ -4597,6 +4651,7 @@ proc is-effective-target-keyword { arg }
 	switch $arg {
 	  "vmx_hw"         { return 1 }
 	  "vsx_hw"         { return 1 }
+	  "p8vector_hw"    { return 1 }
 	  "ppc_recip_hw"   { return 1 }
 	  "named_sections" { return 1 }
 	  "gc_sections"    { return 1 }
@@ -5181,7 +5236,9 @@ proc check_vect_support_and_set_flags { 
         }
 
         lappend DEFAULT_VECTCFLAGS "-maltivec"
-        if [check_vsx_hw_available] {
+        if [check_p8vector_hw_available] {
+            lappend DEFAULT_VECTCFLAGS "-mpower8-vector" "-mno-allow-movmisalign"
+        } elseif [check_vsx_hw_available] {
             lappend DEFAULT_VECTCFLAGS "-mvsx" "-mno-allow-movmisalign"
         }
 
Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 199752)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -730,9 +730,10 @@ (define_expand "ior<mode>3"
   "")
 
 (define_expand "and<mode>3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (and:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
-		   (match_operand:VEC_L 2 "vlogical_operand" "")))]
+  [(parallel [(set (match_operand:VEC_L 0 "vlogical_operand" "")
+		   (and:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
+			      (match_operand:VEC_L 2 "vlogical_operand" "")))
+	      (clobber (match_scratch:CC 3 ""))])]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
    && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "")
@@ -746,8 +747,8 @@ (define_expand "one_cmpl<mode>2"
 
 (define_expand "nor<mode>3"
   [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (not:VEC_L (ior:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
-			      (match_operand:VEC_L 2 "vlogical_operand" ""))))]
+	(and:VEC_L (not:VEC_L (match_operand:VEC_L 1 "vlogical_operand" ""))
+		   (not:VEC_L (match_operand:VEC_L 2 "vlogical_operand" ""))))]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
    && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "")
@@ -760,6 +761,47 @@ (define_expand "andc<mode>3"
    && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "")
 
+;; Power8 vector logical instructions.
+(define_expand "eqv<mode>3"
+  [(set (match_operand:VEC_L 0 "register_operand" "")
+	(not:VEC_L
+	 (xor:VEC_L (match_operand:VEC_L 1 "register_operand" "")
+		    (match_operand:VEC_L 2 "register_operand" ""))))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)")
+
+;; Rewrite nand into canonical form
+(define_expand "nand<mode>3"
+  [(set (match_operand:VEC_L 0 "register_operand" "")
+	(ior:VEC_L
+	 (not:VEC_L (match_operand:VEC_L 1 "register_operand" ""))
+	 (not:VEC_L (match_operand:VEC_L 2 "register_operand" ""))))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)")
+
+;; The canonical form is to have the negated elment first, so we need to
+;; reverse arguments.
+(define_expand "orc<mode>3"
+  [(set (match_operand:VEC_L 0 "register_operand" "")
+	(ior:VEC_L
+	 (not:VEC_L (match_operand:VEC_L 1 "register_operand" ""))
+	 (match_operand:VEC_L 2 "register_operand" "")))]
+  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)")
+
+;; Vector count leading zeros
+(define_expand "clz<mode>2"
+  [(set (match_operand:VEC_I 0 "register_operand" "")
+	(clz:VEC_I (match_operand:VEC_I 1 "register_operand" "")))]
+  "TARGET_P8_VECTOR")
+
+;; Vector population count
+(define_expand "popcount<mode>2"
+  [(set (match_operand:VEC_I 0 "register_operand" "")
+        (popcount:VEC_I (match_operand:VEC_I 1 "register_operand" "")))]
+  "TARGET_P8_VECTOR")
+
+\f
 ;; Same size conversions
 (define_expand "float<VEC_int><mode>2"
   [(set (match_operand:VEC_F 0 "vfloat_operand" "")
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 199752)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -207,7 +207,7 @@ (define_predicate "int_reg_operand"
   if (!REG_P (op))
     return 0;
 
-  if (REGNO (op) >= ARG_POINTER_REGNUM && !CA_REGNO_P (REGNO (op)))
+  if (REGNO (op) >= FIRST_PSEUDO_REGISTER)
     return 1;
 
   return INT_REGNO_P (REGNO (op));
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 199752)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -3515,6 +3515,404 @@ const struct altivec_builtin_types altiv
   { ALTIVEC_BUILTIN_VEC_VCMPGE_P, VSX_BUILTIN_XVCMPGEDP_P,
     RS6000_BTI_INTSI, RS6000_BTI_INTSI, RS6000_BTI_V2DF, RS6000_BTI_V2DF },
 
+  /* Power8 vector overloaded functions.  */
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_bool_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_bool_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
+  { P8V_BUILTIN_VEC_EQV, P8V_BUILTIN_EQV_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
+
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_bool_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_bool_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
+  { P8V_BUILTIN_VEC_NAND, P8V_BUILTIN_NAND_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
+
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_bool_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+    RS6000_BTI_unsigned_V16QI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_bool_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_bool_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
+    RS6000_BTI_unsigned_V8HI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_bool_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_bool_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
+  { P8V_BUILTIN_VEC_ORC, P8V_BUILTIN_ORC_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
+
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VADDUDM, P8V_BUILTIN_VADDUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZB,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZH,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZ, P8V_BUILTIN_VCLZD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZB, P8V_BUILTIN_VCLZB,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZB, P8V_BUILTIN_VCLZB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZH, P8V_BUILTIN_VCLZH,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZH, P8V_BUILTIN_VCLZH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZW, P8V_BUILTIN_VCLZW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZW, P8V_BUILTIN_VCLZW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VCLZD, P8V_BUILTIN_VCLZD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P8V_BUILTIN_VEC_VCLZD, P8V_BUILTIN_VCLZD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VMINSD, P8V_BUILTIN_VMINSD,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMINSD, P8V_BUILTIN_VMINSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMINSD, P8V_BUILTIN_VMINSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VMAXSD, P8V_BUILTIN_VMAXSD,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMAXSD, P8V_BUILTIN_VMAXSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMAXSD, P8V_BUILTIN_VMAXSD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VMINUD, P8V_BUILTIN_VMINUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMINUD, P8V_BUILTIN_VMINUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMINUD, P8V_BUILTIN_VMINUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VMAXUD, P8V_BUILTIN_VMAXUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMAXUD, P8V_BUILTIN_VMAXUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VMAXUD, P8V_BUILTIN_VMAXUD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+
+  { P8V_BUILTIN_VEC_VMRGOW, P8V_BUILTIN_VMRGOW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { P8V_BUILTIN_VEC_VMRGOW, P8V_BUILTIN_VMRGOW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTB,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTH,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNT, P8V_BUILTIN_VPOPCNTD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNTB, P8V_BUILTIN_VPOPCNTB,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNTB, P8V_BUILTIN_VPOPCNTB,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNTH, P8V_BUILTIN_VPOPCNTH,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNTH, P8V_BUILTIN_VPOPCNTH,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNTW, P8V_BUILTIN_VPOPCNTW,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNTW, P8V_BUILTIN_VPOPCNTW,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPOPCNTD, P8V_BUILTIN_VPOPCNTD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P8V_BUILTIN_VEC_VPOPCNTD, P8V_BUILTIN_VPOPCNTD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VPKUDUM, P8V_BUILTIN_VPKUDUM,
+    RS6000_BTI_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VPKUDUM, P8V_BUILTIN_VPKUDUM,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VPKUDUM, P8V_BUILTIN_VPKUDUM,
+    RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VPKSDSS, P8V_BUILTIN_VPKSDSS,
+    RS6000_BTI_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VPKUDUS, P8V_BUILTIN_VPKUDUS,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VPKSDUS, P8V_BUILTIN_VPKSDUS,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VRLD, P8V_BUILTIN_VRLD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VRLD, P8V_BUILTIN_VRLD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VSLD, P8V_BUILTIN_VSLD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSLD, P8V_BUILTIN_VSLD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VSRD, P8V_BUILTIN_VSRD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSRD, P8V_BUILTIN_VSRD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VSRAD, P8V_BUILTIN_VSRAD,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSRAD, P8V_BUILTIN_VSRAD,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_bool_V2DI, 0 },
+  { P8V_BUILTIN_VEC_VSUBUDM, P8V_BUILTIN_VSUBUDM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
+
+  { P8V_BUILTIN_VEC_VUPKHSW, P8V_BUILTIN_VUPKHSW,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VUPKHSW, P8V_BUILTIN_VUPKHSW,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VUPKLSW, P8V_BUILTIN_VUPKLSW,
+    RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+  { P8V_BUILTIN_VEC_VUPKLSW, P8V_BUILTIN_VUPKLSW,
+    RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
+
+  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
+    RS6000_BTI_V16QI, 0, 0, 0 },
+  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
+    RS6000_BTI_unsigned_V16QI, 0, 0, 0 },
+
   /* Crypto builtins.  */
   { CRYPTO_BUILTIN_VPERMXOR, CRYPTO_BUILTIN_VPERMXOR_V16QI,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
Index: gcc/config/rs6000/rs6000-builtin.def
===================================================================
--- gcc/config/rs6000/rs6000-builtin.def	(revision 199752)
+++ gcc/config/rs6000/rs6000-builtin.def	(working copy)
@@ -1234,10 +1234,23 @@ BU_VSX_OVERLOAD_2 (XXSPLTW,  "xxspltw")
 BU_VSX_OVERLOAD_X (LD,	     "ld")
 BU_VSX_OVERLOAD_X (ST,	     "st")
 \f
+/* 1 argument VSX instructions added in ISA 2.07.  */
+BU_P8V_VSX_1 (XSCVSPDPN,      "xscvspdpn",	CONST,	vsx_xscvspdpn)
+BU_P8V_VSX_1 (XSCVDPSPN,      "xscvdpspn",	CONST,	vsx_xscvdpspn)
+
 /* 1 argument altivec instructions added in ISA 2.07.  */
 BU_P8V_AV_1 (ABS_V2DI,	      "abs_v2di",	CONST,	absv2di2)
 BU_P8V_AV_1 (VUPKHSW,	      "vupkhsw",	CONST,	altivec_vupkhsw)
 BU_P8V_AV_1 (VUPKLSW,	      "vupklsw",	CONST,	altivec_vupklsw)
+BU_P8V_AV_1 (VCLZB,	      "vclzb",		CONST,  clzv16qi2)
+BU_P8V_AV_1 (VCLZH,	      "vclzh",		CONST,  clzv8hi2)
+BU_P8V_AV_1 (VCLZW,	      "vclzw",		CONST,  clzv4si2)
+BU_P8V_AV_1 (VCLZD,	      "vclzd",		CONST,  clzv2di2)
+BU_P8V_AV_1 (VPOPCNTB,	      "vpopcntb",	CONST,  popcountv16qi2)
+BU_P8V_AV_1 (VPOPCNTH,	      "vpopcnth",	CONST,  popcountv8hi2)
+BU_P8V_AV_1 (VPOPCNTW,	      "vpopcntw",	CONST,  popcountv4si2)
+BU_P8V_AV_1 (VPOPCNTD,	      "vpopcntd",	CONST,  popcountv2di2)
+BU_P8V_AV_1 (VGBBD,	      "vgbbd",		CONST,  p8v_vgbbd)
 
 /* 2 argument altivec instructions added in ISA 2.07.  */
 BU_P8V_AV_2 (VADDUDM,		"vaddudm",	CONST,	addv2di3)
@@ -1245,6 +1258,8 @@ BU_P8V_AV_2 (VMINSD,		"vminsd",	CONST,	s
 BU_P8V_AV_2 (VMAXSD,		"vmaxsd",	CONST,	smaxv2di3)
 BU_P8V_AV_2 (VMINUD,		"vminud",	CONST,	uminv2di3)
 BU_P8V_AV_2 (VMAXUD,		"vmaxud",	CONST,	umaxv2di3)
+BU_P8V_AV_2 (VMRGEW,		"vmrgew",	CONST,	p8_vmrgew)
+BU_P8V_AV_2 (VMRGOW,		"vmrgow",	CONST,	p8_vmrgow)
 BU_P8V_AV_2 (VPKUDUM,		"vpkudum",	CONST,	altivec_vpkudum)
 BU_P8V_AV_2 (VPKSDSS,		"vpksdss",	CONST,	altivec_vpksdss)
 BU_P8V_AV_2 (VPKUDUS,		"vpkudus",	CONST,	altivec_vpkudus)
@@ -1255,6 +1270,27 @@ BU_P8V_AV_2 (VSRD,		"vsrd",		CONST,	vlsh
 BU_P8V_AV_2 (VSRAD,		"vsrad",	CONST,	vashrv2di3)
 BU_P8V_AV_2 (VSUBUDM,		"vsubudm",	CONST,	subv2di3)
 
+BU_P8V_AV_2 (EQV_V16QI,		"eqv_v16qi",	CONST,	eqvv16qi3)
+BU_P8V_AV_2 (EQV_V8HI,		"eqv_v8hi",	CONST,	eqvv8hi3)
+BU_P8V_AV_2 (EQV_V4SI,		"eqv_v4si",	CONST,	eqvv4si3)
+BU_P8V_AV_2 (EQV_V2DI,		"eqv_v2di",	CONST,	eqvv2di3)
+BU_P8V_AV_2 (EQV_V4SF,		"eqv_v4sf",	CONST,	eqvv4sf3)
+BU_P8V_AV_2 (EQV_V2DF,		"eqv_v2df",	CONST,	eqvv2df3)
+
+BU_P8V_AV_2 (NAND_V16QI,	"nand_v16qi",	CONST,	nandv16qi3)
+BU_P8V_AV_2 (NAND_V8HI,		"nand_v8hi",	CONST,	nandv8hi3)
+BU_P8V_AV_2 (NAND_V4SI,		"nand_v4si",	CONST,	nandv4si3)
+BU_P8V_AV_2 (NAND_V2DI,		"nand_v2di",	CONST,	nandv2di3)
+BU_P8V_AV_2 (NAND_V4SF,		"nand_v4sf",	CONST,	nandv4sf3)
+BU_P8V_AV_2 (NAND_V2DF,		"nand_v2df",	CONST,	nandv2df3)
+
+BU_P8V_AV_2 (ORC_V16QI,		"orc_v16qi",	CONST,	orcv16qi3)
+BU_P8V_AV_2 (ORC_V8HI,		"orc_v8hi",	CONST,	orcv8hi3)
+BU_P8V_AV_2 (ORC_V4SI,		"orc_v4si",	CONST,	orcv4si3)
+BU_P8V_AV_2 (ORC_V2DI,		"orc_v2di",	CONST,	orcv2di3)
+BU_P8V_AV_2 (ORC_V4SF,		"orc_v4sf",	CONST,	orcv4sf3)
+BU_P8V_AV_2 (ORC_V2DF,		"orc_v2df",	CONST,	orcv2df3)
+
 /* Vector comparison instructions added in ISA 2.07.  */
 BU_P8V_AV_2 (VCMPEQUD,		"vcmpequd",	CONST,	vector_eqv2di)
 BU_P8V_AV_2 (VCMPGTSD,		"vcmpgtsd",	CONST,	vector_gtv2di)
@@ -1268,13 +1304,29 @@ BU_P8V_AV_P (VCMPGTUD_P,	"vcmpgtud_p",	C
 /* ISA 2.07 vector overloaded 1 argument functions.  */
 BU_P8V_OVERLOAD_1 (VUPKHSW,	"vupkhsw")
 BU_P8V_OVERLOAD_1 (VUPKLSW,	"vupklsw")
+BU_P8V_OVERLOAD_1 (VCLZ,	"vclz")
+BU_P8V_OVERLOAD_1 (VCLZB,	"vclzb")
+BU_P8V_OVERLOAD_1 (VCLZH,	"vclzh")
+BU_P8V_OVERLOAD_1 (VCLZW,	"vclzw")
+BU_P8V_OVERLOAD_1 (VCLZD,	"vclzd")
+BU_P8V_OVERLOAD_1 (VPOPCNT,	"vpopcnt")
+BU_P8V_OVERLOAD_1 (VPOPCNTB,	"vpopcntb")
+BU_P8V_OVERLOAD_1 (VPOPCNTH,	"vpopcnth")
+BU_P8V_OVERLOAD_1 (VPOPCNTW,	"vpopcntw")
+BU_P8V_OVERLOAD_1 (VPOPCNTD,	"vpopcntd")
+BU_P8V_OVERLOAD_1 (VGBBD,	"vgbbd")
 
 /* ISA 2.07 vector overloaded 2 argument functions.  */
+BU_P8V_OVERLOAD_2 (EQV,		"eqv")
+BU_P8V_OVERLOAD_2 (NAND,	"nand")
+BU_P8V_OVERLOAD_2 (ORC,		"orc")
 BU_P8V_OVERLOAD_2 (VADDUDM,	"vaddudm")
 BU_P8V_OVERLOAD_2 (VMAXSD,	"vmaxsd")
 BU_P8V_OVERLOAD_2 (VMAXUD,	"vmaxud")
 BU_P8V_OVERLOAD_2 (VMINSD,	"vminsd")
 BU_P8V_OVERLOAD_2 (VMINUD,	"vminud")
+BU_P8V_OVERLOAD_2 (VMRGEW,	"vmrgew")
+BU_P8V_OVERLOAD_2 (VMRGOW,	"vmrgow")
 BU_P8V_OVERLOAD_2 (VPKSDSS,	"vpksdss")
 BU_P8V_OVERLOAD_2 (VPKSDUS,	"vpksdus")
 BU_P8V_OVERLOAD_2 (VPKUDUM,	"vpkudum")
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 199752)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -2859,6 +2859,16 @@ rs6000_option_override_internal (bool gl
 	}
     }
 
+  /* The quad memory instructions only work in 64-bit mode.  In 32-bit mode,
+     silently turn off quad memory mode.  */
+  if (TARGET_QUAD_MEMORY && !TARGET_POWERPC64)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_QUAD_MEMORY) != 0)
+	warning (0, N_("-mquad-memory requires 64-bit mode"));
+
+      rs6000_isa_flags &= ~OPTION_MASK_QUAD_MEMORY;
+    }
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "before defaults", rs6000_isa_flags);
 
@@ -4082,6 +4092,22 @@ rs6000_builtin_vectorized_function (tree
       enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
       switch (fn)
 	{
+	case BUILT_IN_CLZIMAX:
+	case BUILT_IN_CLZLL:
+	case BUILT_IN_CLZL:
+	case BUILT_IN_CLZ:
+	  if (TARGET_P8_VECTOR && in_mode == out_mode && out_n == in_n)
+	    {
+	      if (out_mode == QImode && out_n == 16)
+		return rs6000_builtin_decls[P8V_BUILTIN_VCLZB];
+	      else if (out_mode == HImode && out_n == 8)
+		return rs6000_builtin_decls[P8V_BUILTIN_VCLZH];
+	      else if (out_mode == SImode && out_n == 4)
+		return rs6000_builtin_decls[P8V_BUILTIN_VCLZW];
+	      else if (out_mode == DImode && out_n == 2)
+		return rs6000_builtin_decls[P8V_BUILTIN_VCLZD];
+	    }
+	  break;
 	case BUILT_IN_COPYSIGN:
 	  if (VECTOR_UNIT_VSX_P (V2DFmode)
 	      && out_mode == DFmode && out_n == 2
@@ -4097,6 +4123,22 @@ rs6000_builtin_vectorized_function (tree
 	  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode))
 	    return rs6000_builtin_decls[ALTIVEC_BUILTIN_COPYSIGN_V4SF];
 	  break;
+	case BUILT_IN_POPCOUNTIMAX:
+	case BUILT_IN_POPCOUNTLL:
+	case BUILT_IN_POPCOUNTL:
+	case BUILT_IN_POPCOUNT:
+	  if (TARGET_P8_VECTOR && in_mode == out_mode && out_n == in_n)
+	    {
+	      if (out_mode == QImode && out_n == 16)
+		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTB];
+	      else if (out_mode == HImode && out_n == 8)
+		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTH];
+	      else if (out_mode == SImode && out_n == 4)
+		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTW];
+	      else if (out_mode == DImode && out_n == 2)
+		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTD];
+	    }
+	  break;
 	case BUILT_IN_SQRT:
 	  if (VECTOR_UNIT_VSX_P (V2DFmode)
 	      && out_mode == DFmode && out_n == 2
@@ -4955,8 +4997,11 @@ rs6000_expand_vector_init (rtx target, r
 	{
 	  rtx freg = gen_reg_rtx (V4SFmode);
 	  rtx sreg = force_reg (SFmode, XVECEXP (vals, 0, 0));
+	  rtx cvt  = ((TARGET_XSCVDPSPN)
+		      ? gen_vsx_xscvdpspn_scalar (freg, sreg)
+		      : gen_vsx_xscvdpsp_scalar (freg, sreg));
 
-	  emit_insn (gen_vsx_xscvdpsp_scalar (freg, sreg));
+	  emit_insn (cvt);
 	  emit_insn (gen_vsx_xxspltw_v4sf (target, freg, const0_rtx));
 	}
       else
@@ -12857,6 +12902,7 @@ builtin_function_type (enum machine_mode
     {
       /* unsigned 1 argument functions.  */
     case CRYPTO_BUILTIN_VSBOX:
+    case P8V_BUILTIN_VGBBD:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       break;
@@ -27214,26 +27260,31 @@ bool
 altivec_expand_vec_perm_const (rtx operands[4])
 {
   struct altivec_perm_insn {
+    HOST_WIDE_INT mask;
     enum insn_code impl;
     unsigned char perm[16];
   };
   static const struct altivec_perm_insn patterns[] = {
-    { CODE_FOR_altivec_vpkuhum,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuhum,
       {  1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } },
-    { CODE_FOR_altivec_vpkuwum,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum,
       {  2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } },
-    { CODE_FOR_altivec_vmrghb,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb,
       {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },
-    { CODE_FOR_altivec_vmrghh,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh,
       {  0,  1, 16, 17,  2,  3, 18, 19,  4,  5, 20, 21,  6,  7, 22, 23 } },
-    { CODE_FOR_altivec_vmrghw,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw,
       {  0,  1,  2,  3, 16, 17, 18, 19,  4,  5,  6,  7, 20, 21, 22, 23 } },
-    { CODE_FOR_altivec_vmrglb,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb,
       {  8, 24,  9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } },
-    { CODE_FOR_altivec_vmrglh,
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh,
       {  8,  9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } },
-    { CODE_FOR_altivec_vmrglw,
-      {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } }
+    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw,
+      {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
+    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew,
+      {  0,  1,  2,  3, 16, 17, 18, 19,  8,  9, 10, 11, 24, 25, 26, 27 } },
+    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgow,
+      {  4,  5,  6,  7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31 } }
   };
 
   unsigned int i, j, elt, which;
@@ -27333,6 +27384,9 @@ altivec_expand_vec_perm_const (rtx opera
     {
       bool swapped;
 
+      if ((patterns[j].mask & rs6000_isa_flags) == 0)
+	continue;
+
       elt = patterns[j].perm[0];
       if (perm[0] == elt)
 	swapped = false;
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 199752)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -36,6 +36,10 @@ (define_mode_iterator VSX_F [V4SF V2DF])
 ;; Iterator for logical types supported by VSX
 (define_mode_iterator VSX_L [V16QI V8HI V4SI V2DI V4SF V2DF TI])
 
+;; Like VSX_L, but don't support TImode for doing logical instructions in
+;; 32-bit mode.
+(define_mode_iterator VSX_L2 [V16QI V8HI V4SI V2DI V4SF V2DF])
+
 ;; Iterator for memory move.  Handle TImode specially to allow
 ;; it to use gprs as well as vsx registers.
 (define_mode_iterator VSX_M [V16QI V8HI V4SI V2DI V4SF V2DF])
@@ -191,6 +195,8 @@ (define_c_enum "unspec"
    UNSPEC_VSX_CVDPSXWS
    UNSPEC_VSX_CVDPUXWS
    UNSPEC_VSX_CVSPDP
+   UNSPEC_VSX_CVSPDPN
+   UNSPEC_VSX_CVDPSPN
    UNSPEC_VSX_CVSXWDP
    UNSPEC_VSX_CVUXWDP
    UNSPEC_VSX_CVSXDSP
@@ -1003,6 +1009,40 @@ (define_insn "vsx_xscvspdp_scalar2"
   "xscvspdp %x0,%x1"
   [(set_attr "type" "fp")])
 
+;; ISA 2.07 xscvdpspn/xscvspdpn, which do not raise an error on signalling NaNs
+(define_insn "vsx_xscvdpspn"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=ws,?wa")
+	(unspec:V4SF [(match_operand:DF 1 "vsx_register_operand" "wd,wa")]
+		     UNSPEC_VSX_CVDPSPN))]
+  "TARGET_XSCVDPSPN"
+  "xscvdpspn %x0,%x1"
+  [(set_attr "type" "fp")])
+
+(define_insn "vsx_xscvspdpn"
+  [(set (match_operand:DF 0 "vsx_register_operand" "=ws,?wa")
+	(unspec:DF [(match_operand:V4SF 1 "vsx_register_operand" "wa,wa")]
+		   UNSPEC_VSX_CVSPDPN))]
+  "TARGET_XSCVSPDPN"
+  "xscvspdpn %x0,%x1"
+  [(set_attr "type" "fp")])
+
+(define_insn "vsx_xscvdpspn_scalar"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
+	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "f")]
+		     UNSPEC_VSX_CVDPSPN))]
+  "TARGET_XSCVDPSPN"
+  "xscvdpspn %x0,%x1"
+  [(set_attr "type" "fp")])
+
+;; Used by direct move to move a SFmode value from GPR to VSX register
+(define_insn "vsx_xscvspdpn_directmove"
+  [(set (match_operand:SF 0 "vsx_register_operand" "=wa")
+	(unspec:SF [(match_operand:SF 1 "vsx_register_operand" "wa")]
+		   UNSPEC_VSX_CVSPDPN))]
+  "TARGET_XSCVSPDPN"
+  "xscvspdpn %x0,%x1"
+  [(set_attr "type" "fp")])
+
 ;; Convert from 64-bit to 32-bit types
 ;; Note, favor the Altivec registers since the usual use of these instructions
 ;; is in vector converts and we need to use the Altivec vperm instruction.
@@ -1088,70 +1128,368 @@ (define_insn "*vsx_float_fix_<mode>2"
    (set_attr "fp_type" "<VSfptype_simple>")])
 
 \f
-;; Logical operations
-;; Do not support TImode logical instructions on 32-bit at present, because the
-;; compiler will see that we have a TImode and when it wanted DImode, and
-;; convert the DImode to TImode, store it on the stack, and load it in a VSX
-;; register.
-(define_insn "*vsx_and<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (and:VSX_L
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")
-	 (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+;; Logical operations.  Do not support TImode logical instructions on 32-bit at
+;; present, because the compiler will see that we have a TImode when it wanted
+;; DImode, convert the DImode to TImode, store it on the stack, and load it
+;; back into a VSX register, or generate extra logical instructions in GPR
+;; registers.
+
+;; When we are splitting the operations to GPRs, we use three alternatives, two
+;; where the first/second inputs and output are in the same register, and the
+;; third where the output specifies an early clobber so that we don't have to
+;; worry about overlapping registers.
+
+(define_insn "*vsx_and<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+        (and:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa")
+		    (match_operand:VSX_L2 2 "vlogical_operand" "wa")))
+   (clobber (match_scratch:CC 3 "X"))]
+  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
   "xxland %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
 
-(define_insn "*vsx_ior<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (ior:VSX_L (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")
-		   (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+(define_insn_and_split "*vsx_and<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r")
+        (and:VSX_L
+	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r")
+	 (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r")))
+   (clobber (match_scratch:CC 3 "X,X,X,X"))]
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxland %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(parallel [(set (match_dup 4) (and:DI (match_dup 5) (match_dup 6)))
+	      (clobber (match_dup 3))])
+   (parallel [(set (match_dup 7) (and:DI (match_dup 8) (match_dup 9)))
+	      (clobber (match_dup 3))])]
+{
+  operands[4] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[7] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[9] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
+
+(define_insn "*vsx_ior<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+        (ior:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa")
+		    (match_operand:VSX_L2 2 "vlogical_operand" "wa")))]
+  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
   "xxlor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
 
-(define_insn "*vsx_xor<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (xor:VSX_L
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")
-	 (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+(define_insn_and_split "*vsx_ior<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r,?r,&?r")
+        (ior:VSX_L
+	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r,0,r")
+	 (match_operand:VSX_L 2 "vsx_reg_or_cint_operand" "wa,r,0,r,n,n")))]
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxlor %x0,%x1,%x2
+   #
+   #
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+
+  if (operands[5] == constm1_rtx)
+    emit_move_insn (operands[3], constm1_rtx);
+
+  else if (operands[5] == const0_rtx)
+    {
+      if (!rtx_equal_p (operands[3], operands[4]))
+	emit_move_insn (operands[3], operands[4]);
+    }
+  else
+    emit_insn (gen_iordi3 (operands[3], operands[4], operands[5]));
+
+  if (operands[8] == constm1_rtx)
+    emit_move_insn (operands[6], constm1_rtx);
+
+  else if (operands[8] == const0_rtx)
+    {
+      if (!rtx_equal_p (operands[6], operands[7]))
+	emit_move_insn (operands[6], operands[7]);
+    }
+  else
+    emit_insn (gen_iordi3 (operands[6], operands[7], operands[8]));
+  DONE;
+}
+  [(set_attr "type" "vecsimple,two,two,two,three,three")
+   (set_attr "length" "4,8,8,8,16,16")])
+
+(define_insn "*vsx_xor<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+        (xor:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa")
+		    (match_operand:VSX_L2 2 "vlogical_operand" "wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && !TARGET_POWERPC64"
   "xxlxor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
 
-(define_insn "*vsx_one_cmpl<mode>2"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (not:VSX_L
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+(define_insn_and_split "*vsx_xor<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r,?r,&?r")
+        (xor:VSX_L
+	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r,0,r")
+	 (match_operand:VSX_L 2 "vsx_reg_or_cint_operand" "wa,r,0,r,n,n")))]
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxlxor %x0,%x1,%x2
+   #
+   #
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (xor:DI (match_dup 4) (match_dup 5)))
+   (set (match_dup 6) (xor:DI (match_dup 7) (match_dup 8)))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two,three,three")
+   (set_attr "length" "4,8,8,8,16,16")])
+
+(define_insn "*vsx_one_cmpl<mode>2_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+        (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa")))]
+  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
   "xxlnor %x0,%x1,%x1"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_one_cmpl<mode>2_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,&?r")
+        (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r")))]
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxlnor %x0,%x1,%x1
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 2) (not:DI (match_dup 3)))
+   (set (match_dup 4) (not:DI (match_dup 5)))]
+{
+  operands[2] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[3] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[5] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two")
+   (set_attr "length" "4,8,8")])
   
-(define_insn "*vsx_nor<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (not:VSX_L
-	 (ior:VSX_L
-	  (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,?wa")
-	  (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,?wa"))))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
+(define_insn "*vsx_nor<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+	(and:VSX_L2
+	 (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa"))
+	 (not:VSX_L2 (match_operand:VSX_L2 2 "vlogical_operand" "wa"))))]
+  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
   "xxlnor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
 
-(define_insn "*vsx_andc<mode>3"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
+(define_insn_and_split "*vsx_nor<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r")
+	(and:VSX_L
+	 (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r"))
+	 (not:VSX_L (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r"))))]
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxlnor %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (and:DI (not:DI (match_dup 4)) (not:DI (match_dup 5))))
+   (set (match_dup 6) (and:DI (not:DI (match_dup 7)) (not:DI (match_dup 8))))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
+
+(define_insn "*vsx_andc<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+        (and:VSX_L2
+	 (not:VSX_L2
+	  (match_operand:VSX_L2 2 "vlogical_operand" "wa"))
+	 (match_operand:VSX_L2 1 "vlogical_operand" "wa")))]
+  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "xxlandc %x0,%x1,%x2"
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_andc<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r")
         (and:VSX_L
 	 (not:VSX_L
-	  (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,?wa"))
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,?wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
-  "xxlandc %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
+	  (match_operand:VSX_L 2 "vlogical_operand" "wa,0,r,r"))
+	 (match_operand:VSX_L 1 "vlogical_operand" "wa,r,0,r")))]
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxlandc %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (and:DI (not:DI (match_dup 4)) (match_dup 5)))
+   (set (match_dup 6) (and:DI (not:DI (match_dup 7)) (match_dup 8)))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
+
+;; Power8 vector logical instructions.
+(define_insn "*vsx_eqv<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+	(not:VSX_L2
+	 (xor:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa")
+		     (match_operand:VSX_L2 2 "vlogical_operand" "wa"))))]
+  "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "xxleqv %x0,%x1,%x2"
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_eqv<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r")
+	(not:VSX_L
+	 (xor:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r,r")
+		    (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r"))))]
+  "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxleqv %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR
+   && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (not:DI (xor:DI (match_dup 4) (match_dup 5))))
+   (set (match_dup 6) (not:DI (xor:DI (match_dup 7) (match_dup 8))))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
+
+;; Rewrite nand into canonical form
+(define_insn "*vsx_nand<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+	(ior:VSX_L2
+	 (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa"))
+	 (not:VSX_L2 (match_operand:VSX_L2 2 "vlogical_operand" "wa"))))]
+  "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "xxlnand %x0,%x1,%x2"
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_nand<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "register_operand" "=wa,?r,?r,?r")
+	(ior:VSX_L
+	 (not:VSX_L (match_operand:VSX_L 1 "register_operand" "wa,0,r,r"))
+	 (not:VSX_L (match_operand:VSX_L 2 "register_operand" "wa,r,0,r"))))]
+  "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxlnand %x0,%x1,%x2
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR
+   && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (ior:DI (not:DI (match_dup 4)) (not:DI (match_dup 5))))
+   (set (match_dup 6) (ior:DI (not:DI (match_dup 7)) (not:DI (match_dup 8))))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
+
+;; Rewrite or complement into canonical form, by reversing the arguments
+(define_insn "*vsx_orc<mode>3_32bit"
+  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
+	(ior:VSX_L2
+	 (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa"))
+	 (match_operand:VSX_L2 2 "vlogical_operand" "wa")))]
+  "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "xxlorc %x0,%x2,%x1"
+  [(set_attr "type" "vecsimple")
+   (set_attr "length" "4")])
+
+(define_insn_and_split "*vsx_orc<mode>3_64bit"
+  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r")
+	(ior:VSX_L
+	 (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r,r"))
+	 (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r")))]
+  "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "@
+   xxlorc %x0,%x2,%x1
+   #
+   #
+   #"
+  "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR
+   && VECTOR_MEM_VSX_P (<MODE>mode)
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(set (match_dup 3) (ior:DI (not:DI (match_dup 4)) (match_dup 5)))
+   (set (match_dup 6) (ior:DI (not:DI (match_dup 7)) (match_dup 8)))]
+{
+  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
+  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
+  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
+  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
+  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
+  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
+}
+  [(set_attr "type" "vecsimple,two,two,two")
+   (set_attr "length" "4,8,8,8")])
 
 \f
 ;; Permute operations
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 199752)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -1114,10 +1114,10 @@ extern unsigned rs6000_pointer_size;
 #define VINT_REGNO_P(N) ALTIVEC_REGNO_P (N)
 
 /* Alternate name for any vector register supporting logical operations, no
-   matter which instruction set(s) are available.  Under VSX, we allow GPRs as
-   well as vector registers on 64-bit systems.  We don't allow 32-bit systems,
-   due to the number of registers involved, and the number of instructions to
-   load/store the values..  */
+   matter which instruction set(s) are available.  For 64-bit mode, we also
+   allow logical operations in the GPRs.  This lets the atomic quad word
+   builtins avoid needing the VSX registers for lqarx/stqcx.  It also helps with
+   __int128_t arguments that are passed in GPRs.  */
 #define VLOGICAL_REGNO_P(N)						\
   (ALTIVEC_REGNO_P (N)							\
    || (TARGET_VSX && FP_REGNO_P (N))					\
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 199752)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -128,6 +128,7 @@ (define_c_enum "unspec"
    UNSPEC_VUPKLS_V4SF
    UNSPEC_VUPKHU_V4SF
    UNSPEC_VUPKLU_V4SF
+   UNSPEC_VGBBD
 ])
 
 (define_c_enum "unspecv"
@@ -941,6 +942,31 @@ (define_insn "*altivec_vmrglsf"
   "vmrglw %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
+;; Power8 vector merge even/odd
+(define_insn "p8_vmrgew"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+	(vec_select:V4SI
+	  (vec_concat:V8SI
+	    (match_operand:V4SI 1 "register_operand" "v")
+	    (match_operand:V4SI 2 "register_operand" "v"))
+	  (parallel [(const_int 0) (const_int 4)
+		     (const_int 2) (const_int 6)])))]
+  "TARGET_P8_VECTOR"
+  "vmrgew %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "p8_vmrgow"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+	(vec_select:V4SI
+	  (vec_concat:V8SI
+	    (match_operand:V4SI 1 "register_operand" "v")
+	    (match_operand:V4SI 2 "register_operand" "v"))
+	  (parallel [(const_int 1) (const_int 5)
+		     (const_int 3) (const_int 7)])))]
+  "TARGET_P8_VECTOR"
+  "vmrgow %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "vec_widen_umult_even_v16qi"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
@@ -1017,10 +1043,13 @@ (define_insn "vec_widen_smult_odd_v8hi"
 ;; logical ops.  Have the logical ops follow the memory ops in
 ;; terms of whether to prefer VSX or Altivec
 
+;; AND has a clobber to be consistent with VSX, which adds splitters for using
+;; the GPR registers.
 (define_insn "*altivec_and<mode>3"
   [(set (match_operand:VM 0 "register_operand" "=v")
         (and:VM (match_operand:VM 1 "register_operand" "v")
-		(match_operand:VM 2 "register_operand" "v")))]
+		(match_operand:VM 2 "register_operand" "v")))
+   (clobber (match_scratch:CC 3 "=X"))]
   "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
   "vand %0,%1,%2"
   [(set_attr "type" "vecsimple")])
@@ -1050,8 +1079,8 @@ (define_insn "*altivec_one_cmpl<mode>2"
   
 (define_insn "*altivec_nor<mode>3"
   [(set (match_operand:VM 0 "register_operand" "=v")
-        (not:VM (ior:VM (match_operand:VM 1 "register_operand" "v")
-			(match_operand:VM 2 "register_operand" "v"))))]
+	(and:VM (not:VM (match_operand:VM 1 "register_operand" "v"))
+		(not:VM (match_operand:VM 2 "register_operand" "v"))))]
   "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
   "vnor %0,%1,%2"
   [(set_attr "type" "vecsimple")])
@@ -2370,3 +2399,34 @@ (define_expand "vec_unpacku_float_lo_v8h
   emit_insn (gen_altivec_vcfux (operands[0], tmp, const0_rtx));
   DONE;
 }")
+
+\f
+;; Power8 vector instructions encoded as Altivec instructions
+
+;; Vector count leading zeros
+(define_insn "*p8v_clz<mode>2"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+	(clz:VI2 (match_operand:VI2 1 "register_operand" "v")))]
+  "TARGET_P8_VECTOR"
+  "vclz<wd> %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
+
+;; Vector population count
+(define_insn "*p8v_popcount<mode>2"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (popcount:VI2 (match_operand:VI2 1 "register_operand" "v")))]
+  "TARGET_P8_VECTOR"
+  "vpopcnt<wd> %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
+
+;; Vector Gather Bits by Bytes by Doubleword
+(define_insn "p8v_vgbbd"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")]
+		      UNSPEC_VGBBD))]
+  "TARGET_P8_VECTOR"
+  "vgbbd %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 199752)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -8290,6 +8290,18 @@ (define_split
 	(compare:CC (match_dup 0)
 		    (const_int 0)))]
   "")
+
+;; Eqv operation.
+(define_insn "*eqv<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+	(not:GPR
+	 (xor:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
+		  (match_operand:GPR 2 "gpc_reg_operand" "r"))))]
+  ""
+  "eqv %0,%1,%2"
+  [(set_attr "type" "integer")
+   (set_attr "length" "4")])
+
 \f
 ;; Now define ways of moving data around.
 
Index: gcc/config/rs6000/altivec.h
===================================================================
--- gcc/config/rs6000/altivec.h	(revision 199752)
+++ gcc/config/rs6000/altivec.h	(working copy)
@@ -323,15 +323,31 @@
 
 #ifdef _ARCH_PWR8
 /* Vector additions added in ISA 2.07.  */
+#define vec_eqv __builtin_vec_eqv
+#define vec_nand __builtin_vec_nand
+#define vec_orc __builtin_vec_orc
 #define vec_vaddudm __builtin_vec_vaddudm
+#define vec_vclz __builtin_vec_vclz
+#define vec_vclzb __builtin_vec_vclzb
+#define vec_vclzd __builtin_vec_vclzd
+#define vec_vclzh __builtin_vec_vclzh
+#define vec_vclzw __builtin_vec_vclzw
+#define vec_vgbbd __builtin_vec_vgbbd
 #define vec_vmaxsd __builtin_vec_vmaxsd
 #define vec_vmaxud __builtin_vec_vmaxud
 #define vec_vminsd __builtin_vec_vminsd
 #define vec_vminud __builtin_vec_vminud
+#define vec_vmrgew __builtin_vec_vmrgew
+#define vec_vmrgow __builtin_vec_vmrgow
 #define vec_vpksdss __builtin_vec_vpksdss
 #define vec_vpksdus __builtin_vec_vpksdus
 #define vec_vpkudum __builtin_vec_vpkudum
 #define vec_vpkudus __builtin_vec_vpkudus
+#define vec_vpopcnt __builtin_vec_vpopcnt
+#define vec_vpopcntb __builtin_vec_vpopcntb
+#define vec_vpopcntd __builtin_vec_vpopcntd
+#define vec_vpopcnth __builtin_vec_vpopcnth
+#define vec_vpopcntw __builtin_vec_vpopcntw
 #define vec_vrld __builtin_vec_vrld
 #define vec_vsld __builtin_vec_vsld
 #define vec_vsrad __builtin_vec_vsrad
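
For reference, here is a minimal usage sketch of a few of the new overloaded
intrinsics (my illustration, not part of the patch).  It assumes a compiler
with this patch applied, a power8-capable assembler, and -mcpu=power8; the
instruction names in the comments are the expected expansions per the
overload tables above:

    #include <altivec.h>

    /* Sketch only: exercise some of the ISA 2.07 vector overloads.  */
    vector unsigned int
    demo (vector unsigned int a, vector unsigned int b)
    {
      vector unsigned int pop = vec_vpopcnt (a);      /* vpopcntw */
      vector unsigned int clz = vec_vclz (b);         /* vclzw */
      vector unsigned int eqv = vec_eqv (a, b);       /* xxleqv */
      return vec_vmrgew (pop, vec_nand (clz, eqv));   /* vmrgew, xxlnand */
    }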

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #5, new vector tests
  2013-05-21 23:49 ` [PATCH, rs6000] power8 patches, patch #5, new vector tests Michael Meissner
@ 2013-06-06 21:51   ` Michael Meissner
  0 siblings, 0 replies; 52+ messages in thread
From: Michael Meissner @ 2013-06-06 21:51 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, pthaugen, bergner

I checked in the tests that went with power8 patches #3 and #4 (which have been
committed) as subversion id 199768.
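
For readers who want the flavor of these tests, the following is a sketch of
the general shape of a p8vector-vectorize test; the exact dg- directives and
scan patterns here are illustrative assumptions, not the committed sources:

    /* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
    /* { dg-options "-mcpu=power8 -O3" } */

    /* A loop the power8 patches can now auto-vectorize into vpopcntw,
       via rs6000_builtin_vectorized_function.  */
    #define N 1024
    unsigned int a[N], b[N];

    void
    vector_popcount (void)
    {
      unsigned int i;
      for (i = 0; i < N; i++)
        b[i] = __builtin_popcount (a[i]);
    }

    /* { dg-final { scan-assembler "vpopcntw" } } */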

2013-06-06  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Pat Haugen <pthaugen@us.ibm.com>
	    Peter Bergner <bergner@vnet.ibm.com>

	* gcc.target/powerpc/p8vector-builtin-1.c: New test for power8
	builtin functions.
	* gcc.target/powerpc/p8vector-builtin-2.c: Likewise.
	* gcc.target/powerpc/p8vector-builtin-3.c: Likewise.
	* gcc.target/powerpc/p8vector-builtin-4.c: Likewise.
	* gcc.target/powerpc/p8vector-builtin-5.c: Likewise.
	* gcc.target/powerpc/p8vector-builtin-6.c: Likewise.
	* gcc.target/powerpc/p8vector-builtin-7.c: Likewise.
	* gcc.target/powerpc/p8vector-vectorize-1.c: New tests for power8
	auto-vectorization.
	* gcc.target/powerpc/p8vector-vectorize-2.c: Likewise.
	* gcc.target/powerpc/p8vector-vectorize-3.c: Likewise.
	* gcc.target/powerpc/p8vector-vectorize-4.c: Likewise.
	* gcc.target/powerpc/p8vector-vectorize-5.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #9, power8 scheduling
  2013-05-20 20:41 [PATCH, rs6000] power8 patches Michael Meissner
                   ` (8 preceding siblings ...)
  2013-05-22 20:53 ` [PATCH, rs6000] power8 patches, patch #8, power8 load fusion + misc Michael Meissner
@ 2013-06-07 19:22 ` Pat Haugen
  2013-06-19 13:00   ` David Edelsohn
  9 siblings, 1 reply; 52+ messages in thread
From: Pat Haugen @ 2013-06-07 19:22 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, bergner

[-- Attachment #1: Type: text/plain, Size: 782 bytes --]

This patch adds instruction scheduling support for the Power8 processor.
Bootstrapped and regression tested with no new failures. Ok for trunk?


2013-06-07  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Pat Haugen <pthaugen@us.ibm.com>
	    Peter Bergner <bergner@vnet.ibm.com>

	* config/rs6000/power8.md: New.
	* config/rs6000/rs6000-cpus.def (RS6000_CPU table): Adjust processor
	setting for power8 entry.
	* config/rs6000/t-rs6000 (MD_INCLUDES): Add power8.md.
	* config/rs6000/rs6000.c (is_microcoded_insn, is_cracked_insn): Adjust
	test for Power4/Power5 only.
	(insn_must_be_first_in_group, insn_must_be_last_in_group): Add Power8
	support.
	(force_new_group): Adjust comment.
	* config/rs6000/rs6000.md: Include power8.md.




[-- Attachment #2: power8-sched.diff --]
[-- Type: text/plain, Size: 17821 bytes --]

Index: gcc/config/rs6000/power8.md
===================================================================
--- gcc/config/rs6000/power8.md	(revision 0)
+++ gcc/config/rs6000/power8.md	(revision 0)
@@ -0,0 +1,373 @@
+;; Scheduling description for IBM POWER8 processor.
+;; Copyright (C) 2013 Free Software Foundation, Inc.
+;;
+;; Contributed by Pat Haugen (pthaugen@us.ibm.com).
+
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "power8fxu,power8lsu,power8vsu,power8misc")
+
+(define_cpu_unit "fxu0_power8,fxu1_power8" "power8fxu")
+(define_cpu_unit "lu0_power8,lu1_power8" "power8lsu")
+(define_cpu_unit "lsu0_power8,lsu1_power8" "power8lsu")
+(define_cpu_unit "vsu0_power8,vsu1_power8" "power8vsu")
+(define_cpu_unit "bpu_power8,cru_power8" "power8misc")
+(define_cpu_unit "du0_power8,du1_power8,du2_power8,du3_power8,du4_power8,\
+		  du5_power8,du6_power8"  "power8misc")
+
+
+; Dispatch group reservations
+(define_reservation "DU_any_power8"
+		    "du0_power8|du1_power8|du2_power8|du3_power8|du4_power8|\
+		     du5_power8")
+
+; 2-way Cracked instructions go in slots 0-1
+;   (can also have a second in slots 3-4 if insns are adjacent)
+(define_reservation "DU_cracked_power8"
+		    "du0_power8+du1_power8")
+
+; Insns that are first in group
+(define_reservation "DU_first_power8"
+		    "du0_power8")
+
+; Insns that are first and last in group
+(define_reservation "DU_both_power8"
+		    "du0_power8+du1_power8+du2_power8+du3_power8+du4_power8+\
+		     du5_power8+du6_power8")
+
+; Dispatch slots are allocated in order conforming to program order.
+(absence_set "du0_power8" "du1_power8,du2_power8,du3_power8,du4_power8,\
+	      du5_power8,du6_power8")
+(absence_set "du1_power8" "du2_power8,du3_power8,du4_power8,du5_power8,\
+	      du6_power8")
+(absence_set "du2_power8" "du3_power8,du4_power8,du5_power8,du6_power8")
+(absence_set "du3_power8" "du4_power8,du5_power8,du6_power8")
+(absence_set "du4_power8" "du5_power8,du6_power8")
+(absence_set "du5_power8" "du6_power8")
+
+
+; Execution unit reservations
+(define_reservation "FXU_power8"
+                    "fxu0_power8|fxu1_power8")
+
+(define_reservation "LU_power8"
+                    "lu0_power8|lu1_power8")
+
+(define_reservation "LSU_power8"
+                    "lsu0_power8|lsu1_power8")
+
+(define_reservation "LU_or_LSU_power8"
+                    "lu0_power8|lu1_power8|lsu0_power8|lsu1_power8")
+
+(define_reservation "VSU_power8"
+                    "vsu0_power8|vsu1_power8")
+
+
+; LS Unit
+(define_insn_reservation "power8-load" 3
+  (and (eq_attr "type" "load")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,LU_or_LSU_power8")
+
+(define_insn_reservation "power8-load-update" 3
+  (and (eq_attr "type" "load_u,load_ux")
+       (eq_attr "cpu" "power8"))
+  "DU_cracked_power8,LU_or_LSU_power8+FXU_power8")
+
+(define_insn_reservation "power8-load-ext" 3
+  (and (eq_attr "type" "load_ext")
+       (eq_attr "cpu" "power8"))
+  "DU_cracked_power8,LU_or_LSU_power8,FXU_power8")
+
+(define_insn_reservation "power8-load-ext-update" 3
+  (and (eq_attr "type" "load_ext_u,load_ext_ux")
+       (eq_attr "cpu" "power8"))
+  "DU_both_power8,LU_or_LSU_power8+FXU_power8,FXU_power8")
+
+(define_insn_reservation "power8-fpload" 5
+  (and (eq_attr "type" "fpload,vecload")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,LU_power8")
+
+(define_insn_reservation "power8-fpload-update" 5
+  (and (eq_attr "type" "fpload_u,fpload_ux")
+       (eq_attr "cpu" "power8"))
+  "DU_cracked_power8,LU_power8+FXU_power8")
+
+(define_insn_reservation "power8-store" 5 ; store-forwarding latency
+  (and (eq_attr "type" "store,store_u")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,LSU_power8+LU_power8")
+
+(define_insn_reservation "power8-store-update-indexed" 5
+  (and (eq_attr "type" "store_ux")
+       (eq_attr "cpu" "power8"))
+  "DU_cracked_power8,LSU_power8+LU_power8")
+
+(define_insn_reservation "power8-fpstore" 5
+  (and (eq_attr "type" "fpstore")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,LSU_power8+VSU_power8")
+
+(define_insn_reservation "power8-fpstore-update" 5
+  (and (eq_attr "type" "fpstore_u,fpstore_ux")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,LSU_power8+VSU_power8")
+
+(define_insn_reservation "power8-vecstore" 5
+  (and (eq_attr "type" "vecstore")
+       (eq_attr "cpu" "power8"))
+  "DU_cracked_power8,LSU_power8+VSU_power8")
+
+(define_insn_reservation "power8-larx" 3
+  (and (eq_attr "type" "load_l")
+       (eq_attr "cpu" "power8"))
+  "DU_both_power8,LU_or_LSU_power8")
+
+(define_insn_reservation "power8-stcx" 10
+  (and (eq_attr "type" "store_c")
+       (eq_attr "cpu" "power8"))
+  "DU_both_power8,LSU_power8+LU_power8")
+
+(define_insn_reservation "power8-sync" 1
+  (and (eq_attr "type" "sync,isync")
+       (eq_attr "cpu" "power8"))
+  "DU_both_power8,LSU_power8")
+
+
+; FX Unit
+(define_insn_reservation "power8-1cyc" 1
+  (and (eq_attr "type" "integer,insert_word,insert_dword,shift,trap,\
+                        var_shift_rotate,exts,isel")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,FXU_power8")
+
+; Extra cycle to LU/LSU
+(define_bypass 2 "power8-1cyc"
+		 "power8-load*,power8-fpload*,power8-store*,power8-fpstore*,\
+		  power8-vecstore,power8-larx,power8-stcx")
+;		 "power8-load,power8-load-update,power8-load-ext,\
+;		  power8-load-ext-update,power8-fpload,power8-fpload-update,\
+;		  power8-store,power8-store-update,power8-store-update-indexed,\
+;		  power8-fpstore,power8-fpstore-update,power8-vecstore,\
+;		  power8-larx,power8-stcx")
+
+(define_insn_reservation "power8-2cyc" 2
+  (and (eq_attr "type" "cntlz,popcnt")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,FXU_power8")
+
+(define_insn_reservation "power8-two" 2
+  (and (eq_attr "type" "two")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8+DU_any_power8,FXU_power8,FXU_power8")
+
+(define_insn_reservation "power8-three" 3
+  (and (eq_attr "type" "three")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8+DU_any_power8+DU_any_power8,FXU_power8,FXU_power8,FXU_power8")
+
+; cmp - Normal compare insns
+(define_insn_reservation "power8-cmp" 2
+  (and (eq_attr "type" "cmp")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,FXU_power8")
+
+; fast_compare : add./and./nor./etc
+(define_insn_reservation "power8-fast-compare" 2
+  (and (eq_attr "type" "fast_compare")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,FXU_power8")
+
+; compare : rldicl./exts./etc
+; delayed_compare : rlwinm./slwi./etc
+; var_delayed_compare : rlwnm./slw./etc
+(define_insn_reservation "power8-compare" 2
+  (and (eq_attr "type" "compare,delayed_compare,var_delayed_compare")
+       (eq_attr "cpu" "power8"))
+  "DU_cracked_power8,FXU_power8,FXU_power8")
+
+; Extra cycle to LU/LSU
+(define_bypass 3 "power8-fast-compare,power8-compare"
+		 "power8-load*,power8-fpload*,power8-store*,power8-fpstore*,\
+		  power8-vecstore,power8-larx,power8-stcx")
+
+; 5 cycle CR latency 
+(define_bypass 5 "power8-fast-compare,power8-compare"
+		 "power8-crlogical,power8-mfcr,power8-mfcrf,power8-branch")
+
+(define_insn_reservation "power8-mul" 4
+  (and (eq_attr "type" "imul,imul2,imul3,lmul")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,FXU_power8")
+
+(define_insn_reservation "power8-mul-compare" 4
+  (and (eq_attr "type" "imul_compare,lmul_compare")
+       (eq_attr "cpu" "power8"))
+  "DU_cracked_power8,FXU_power8")
+
+; Extra cycle to LU/LSU
+(define_bypass 5 "power8-mul,power8-mul-compare"
+		 "power8-load*,power8-fpload*,power8-store*,power8-fpstore*,\
+		  power8-vecstore,power8-larx,power8-stcx")
+
+; 7 cycle CR latency 
+(define_bypass 7 "power8-mul,power8-mul-compare"
+		 "power8-crlogical,power8-mfcr,power8-mfcrf,power8-branch")
+
+; FXU divides are not pipelined
+(define_insn_reservation "power8-idiv" 37
+  (and (eq_attr "type" "idiv")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,fxu0_power8*37|fxu1_power8*37")
+
+(define_insn_reservation "power8-ldiv" 68
+  (and (eq_attr "type" "ldiv")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,fxu0_power8*68|fxu1_power8*68")
+
+(define_insn_reservation "power8-mtjmpr" 5
+  (and (eq_attr "type" "mtjmpr")
+       (eq_attr "cpu" "power8"))
+  "DU_first_power8,FXU_power8")
+
+; Should differentiate between 1 CR field and > 1, since mtocrf is not microcoded
+(define_insn_reservation "power8-mtcr" 3
+  (and (eq_attr "type" "mtcr")
+       (eq_attr "cpu" "power8"))
+  "DU_both_power8,FXU_power8")
+
+
+; CR Unit
+(define_insn_reservation "power8-mfjmpr" 5
+  (and (eq_attr "type" "mfjmpr")
+       (eq_attr "cpu" "power8"))
+  "DU_first_power8,cru_power8+FXU_power8")
+
+(define_insn_reservation "power8-crlogical" 3
+  (and (eq_attr "type" "cr_logical,delayed_cr")
+       (eq_attr "cpu" "power8"))
+  "DU_first_power8,cru_power8")
+
+(define_insn_reservation "power8-mfcr" 5
+  (and (eq_attr "type" "mfcr")
+       (eq_attr "cpu" "power8"))
+  "DU_both_power8,cru_power8")
+
+(define_insn_reservation "power8-mfcrf" 3
+  (and (eq_attr "type" "mfcrf")
+       (eq_attr "cpu" "power8"))
+  "DU_first_power8,cru_power8")
+
+
+; BR Unit
+; Branches take dispatch slot 7, but reserve any remaining prior slots to
+; prevent other insns from grabbing them once this is assigned.
+(define_insn_reservation "power8-branch" 3
+  (and (eq_attr "type" "jmpreg,branch")
+       (eq_attr "cpu" "power8"))
+  "(du6_power8\
+   |du5_power8+du6_power8\
+   |du4_power8+du5_power8+du6_power8\
+   |du3_power8+du4_power8+du5_power8+du6_power8\
+   |du2_power8+du3_power8+du4_power8+du5_power8+du6_power8\
+   |du1_power8+du2_power8+du3_power8+du4_power8+du5_power8+du6_power8\
+   |du0_power8+du1_power8+du2_power8+du3_power8+du4_power8+du5_power8+\
+    du6_power8),bpu_power8")
+
+; Branch updating LR/CTR feeding mf[lr|ctr]
+(define_bypass 4 "power8-branch" "power8-mfjmpr")
+
+
+; VS Unit (includes FP/VSX/VMX/DFP/Crypto)
+(define_insn_reservation "power8-fp" 6
+  (and (eq_attr "type" "fp,dmul")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,VSU_power8")
+
+; Additional 3 cycles for any CR result
+(define_bypass 9 "power8-fp" "power8-crlogical,power8-mfcr*,power8-branch")
+
+(define_insn_reservation "power8-fpcompare" 8
+  (and (eq_attr "type" "fpcompare")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,VSU_power8")
+
+(define_insn_reservation "power8-sdiv" 27
+  (and (eq_attr "type" "sdiv")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,VSU_power8")
+
+(define_insn_reservation "power8-ddiv" 33
+  (and (eq_attr "type" "ddiv")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,VSU_power8")
+
+(define_insn_reservation "power8-sqrt" 32
+  (and (eq_attr "type" "ssqrt")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,VSU_power8")
+
+(define_insn_reservation "power8-dsqrt" 44
+  (and (eq_attr "type" "dsqrt")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,VSU_power8")
+
+(define_insn_reservation "power8-vecsimple" 2
+  (and (eq_attr "type" "vecperm,vecsimple,veccmp")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,VSU_power8")
+
+(define_insn_reservation "power8-vecnormal" 6
+  (and (eq_attr "type" "vecfloat,vecdouble")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,VSU_power8")
+
+(define_bypass 7 "power8-vecnormal"
+		 "power8-vecsimple,power8-veccomplex,power8-fpstore*,\
+		  power8-vecstore")
+
+(define_insn_reservation "power8-veccomplex" 7
+  (and (eq_attr "type" "veccomplex")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,VSU_power8")
+
+(define_insn_reservation "power8-vecfdiv" 25
+  (and (eq_attr "type" "vecfdiv")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,VSU_power8")
+
+(define_insn_reservation "power8-vecdiv" 31
+  (and (eq_attr "type" "vecdiv")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,VSU_power8")
+
+(define_insn_reservation "power8-mffgpr" 5
+  (and (eq_attr "type" "mffgpr")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,VSU_power8")
+
+(define_insn_reservation "power8-mftgpr" 6
+  (and (eq_attr "type" "mftgpr")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,VSU_power8")
+
+(define_insn_reservation "power8-crypto" 7
+  (and (eq_attr "type" "crypto")
+       (eq_attr "cpu" "power8"))
+  "DU_any_power8,VSU_power8")
+
Index: gcc/config/rs6000/rs6000-cpus.def
===================================================================
--- gcc/config/rs6000/rs6000-cpus.def	(revision 199674)
+++ gcc/config/rs6000/rs6000-cpus.def	(working copy)
@@ -181,7 +181,7 @@ RS6000_CPU ("power7", PROCESSOR_POWER7, 
 	    POWERPC_7400_MASK | MASK_POWERPC64 | MASK_PPC_GPOPT | MASK_MFCRF
 	    | MASK_POPCNTB | MASK_FPRND | MASK_CMPB | MASK_DFP | MASK_POPCNTD
 	    | MASK_VSX | MASK_RECIP_PRECISION | MASK_VSX_TIMODE)
-RS6000_CPU ("power8", PROCESSOR_POWER7, MASK_POWERPC64 | ISA_2_7_MASKS_SERVER)
+RS6000_CPU ("power8", PROCESSOR_POWER8, MASK_POWERPC64 | ISA_2_7_MASKS_SERVER)
 RS6000_CPU ("powerpc", PROCESSOR_POWERPC, 0)
 RS6000_CPU ("powerpc64", PROCESSOR_POWERPC64, MASK_PPC_GFXOPT | MASK_POWERPC64)
 RS6000_CPU ("rs64", PROCESSOR_RS64A, MASK_PPC_GFXOPT | MASK_POWERPC64)
Index: gcc/config/rs6000/t-rs6000
===================================================================
--- gcc/config/rs6000/t-rs6000	(revision 199674)
+++ gcc/config/rs6000/t-rs6000	(working copy)
@@ -60,6 +60,7 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs
 	$(srcdir)/config/rs6000/power5.md \
 	$(srcdir)/config/rs6000/power6.md \
 	$(srcdir)/config/rs6000/power7.md \
+	$(srcdir)/config/rs6000/power8.md \
 	$(srcdir)/config/rs6000/cell.md \
 	$(srcdir)/config/rs6000/xfpu.md \
 	$(srcdir)/config/rs6000/a2.md \
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 199674)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -23451,7 +23451,8 @@ is_microcoded_insn (rtx insn)
   if (rs6000_cpu_attr == CPU_CELL)
     return get_attr_cell_micro (insn) == CELL_MICRO_ALWAYS;
 
-  if (rs6000_sched_groups)
+  if (rs6000_sched_groups
+      && (rs6000_cpu == PROCESSOR_POWER4 || rs6000_cpu == PROCESSOR_POWER5))
     {
       enum attr_type type = get_attr_type (insn);
       if (type == TYPE_LOAD_EXT_U
@@ -23476,7 +23477,8 @@ is_cracked_insn (rtx insn)
       || GET_CODE (PATTERN (insn)) == CLOBBER)
     return false;
 
-  if (rs6000_sched_groups)
+  if (rs6000_sched_groups
+      && (rs6000_cpu == PROCESSOR_POWER4 || rs6000_cpu == PROCESSOR_POWER5))
     {
       enum attr_type type = get_attr_type (insn);
       if (type == TYPE_LOAD_U || type == TYPE_STORE_U
@@ -24350,7 +24352,6 @@ insn_must_be_first_in_group (rtx insn)
         }
       break;
     case PROCESSOR_POWER7:
-    case PROCESSOR_POWER8:	/* FIXME */
       type = get_attr_type (insn);
 
       switch (type)
@@ -24385,6 +24386,39 @@ insn_must_be_first_in_group (rtx insn)
           break;
         }
       break;
+    case PROCESSOR_POWER8:
+      type = get_attr_type (insn);
+
+      switch (type)
+        {
+        case TYPE_CR_LOGICAL:
+        case TYPE_DELAYED_CR:
+        case TYPE_MFCR:
+        case TYPE_MFCRF:
+        case TYPE_MTCR:
+        case TYPE_COMPARE:
+        case TYPE_DELAYED_COMPARE:
+        case TYPE_VAR_DELAYED_COMPARE:
+        case TYPE_IMUL_COMPARE:
+        case TYPE_LMUL_COMPARE:
+        case TYPE_SYNC:
+        case TYPE_ISYNC:
+        case TYPE_LOAD_L:
+        case TYPE_STORE_C:
+        case TYPE_LOAD_U:
+        case TYPE_LOAD_UX:
+        case TYPE_LOAD_EXT:
+        case TYPE_LOAD_EXT_U:
+        case TYPE_LOAD_EXT_UX:
+        case TYPE_STORE_UX:
+        case TYPE_VECSTORE:
+        case TYPE_MFJMPR:
+        case TYPE_MTJMPR:
+          return true;
+        default:
+          break;
+        }
+      break;
     default:
       break;
     }
@@ -24447,7 +24481,6 @@ insn_must_be_last_in_group (rtx insn)
     }
     break;
   case PROCESSOR_POWER7:
-  case PROCESSOR_POWER8:	/* FIXME */
     type = get_attr_type (insn);
 
     switch (type)
@@ -24464,6 +24497,25 @@ insn_must_be_last_in_group (rtx insn)
         break;
     }
     break;
+  case PROCESSOR_POWER8:
+    type = get_attr_type (insn);
+
+    switch (type)
+      {
+      case TYPE_MFCR:
+      case TYPE_MTCR:
+      case TYPE_ISYNC:
+      case TYPE_SYNC:
+      case TYPE_LOAD_L:
+      case TYPE_STORE_C:
+      case TYPE_LOAD_EXT_U:
+      case TYPE_LOAD_EXT_UX:
+      case TYPE_STORE_UX:
+        return true;
+      default:
+        break;
+    }
+    break;
   default:
     break;
   }
@@ -24553,7 +24605,7 @@ force_new_group (int sched_verbose, FILE
       if (can_issue_more && !is_branch_slot_insn (next_insn))
 	can_issue_more--;
 
-      /* Power6 and Power7 have special group ending nop. */
+      /* Do we have a special group ending nop?  */
       if (rs6000_cpu_attr == CPU_POWER6 || rs6000_cpu_attr == CPU_POWER7
 	  || rs6000_cpu_attr == CPU_POWER8)
 	{
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 199674)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -206,6 +206,7 @@ (define_attr "cell_micro" "not,condition
 (include "power5.md")
 (include "power6.md")
 (include "power7.md")
+(include "power8.md")
 (include "cell.md")
 (include "xfpu.md")
 (include "a2.md")

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #6, direct move & basic quad load/store
  2013-05-29 20:32     ` Michael Meissner
@ 2013-06-10 15:41       ` David Edelsohn
  2013-06-10 20:26         ` Michael Meissner
  0 siblings, 1 reply; 52+ messages in thread
From: David Edelsohn @ 2013-06-10 15:41 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

Mike,

This patch is okay, but something seems really broken with respect to
TImode.  I don't know if we have to separate TImode from V1TImode or
some distinction for atomics from other uses of TImode.  This isn't
like float modes where they mostly live in FPRs and only occassionally
need to live in GPRs.  TImode between VSX and GPRs really is bimodal.
Something is wrong with this preferencing design.
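
To make the two poles concrete (hypothetical snippets, not from the patch):

  /* GPR pole: feeds a lqarx/stqcx. loop, so it wants an even/odd GPR pair.  */
  __int128_t old = __atomic_fetch_add (p, 1, __ATOMIC_RELAXED);

  /* No inherent pole: plain TImode arithmetic; today the movti
     preferencing decides where it lives.  */
  __int128_t z = x ^ y;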

Maybe we need a separate set of logical TImode instructions for the
atomic ops with a neutral set of preferences on the constraints for
movti.  Then the registers chosen for the computation will correctly
drive the register allocation decisions.

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #6, direct move & basic quad load/store
  2013-06-10 15:41       ` David Edelsohn
@ 2013-06-10 20:26         ` Michael Meissner
  0 siblings, 0 replies; 52+ messages in thread
From: Michael Meissner @ 2013-06-10 20:26 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Mon, Jun 10, 2013 at 11:41:20AM -0400, David Edelsohn wrote:
> Mike,
> 
> This patch is okay, but something seems really broken with respect to
> TImode.  I don't know if we have to separate TImode from V1TImode or
> some distinction for atomics from other uses of TImode.  This isn't
> like float modes where they mostly live in FPRs and only occasionally
> need to live in GPRs.  TImode between VSX and GPRs really is bimodal.
> Something is wrong with this preferencing design.

Yes, though at present, I'm not sure how to solve it.  I worry that when the
128-bit add/subtract support is done, it will make the problem worse.

> Maybe we need a separate set of logical TImode instructions for the
> atomic ops with a neutral set of preferences on the constraints for
> movti.  Then the registers chosen for the computation will correctly
> drive the register allocation decisions.

I can redo the atomics to have all of the logical operations done in PTImode,
which restricts them to GPRs.  I need to think about that as part of
revisiting the logical operations.
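
An untested sketch of the shape I have in mind, reusing the PTImode and
quad_int_reg_operand machinery from patch #7 (the pattern name and details
are placeholders, not final code):

(define_insn "*xorpti3_gpr"
  [(set (match_operand:PTI 0 "quad_int_reg_operand" "=&r")
	(xor:PTI (match_operand:PTI 1 "quad_int_reg_operand" "r")
		 (match_operand:PTI 2 "quad_int_reg_operand" "r")))]
  "TARGET_SYNC_TI"
  "#")

with a post-reload splitter (not shown) doing the work as two xordi3 halves,
so the atomic loop's logical operations can only allocate to GPRs.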

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #7, quad/byte/half-word atomic instructions
  2013-05-29 20:29   ` David Edelsohn
  2013-05-29 20:36     ` Michael Meissner
@ 2013-06-11 23:56     ` Michael Meissner
  2013-06-12 21:55       ` David Edelsohn
  1 sibling, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-06-11 23:56 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

[-- Attachment #1: Type: text/plain, Size: 2521 bytes --]

I needed to rework sync.md so that it would work correctly with no
optimization: using SUBREGs at -O0 did not give us the even registers needed
for holding PTImode values, so I created a PTImode temporary in load_lockedti
and store_conditionalti, which is normally optimized out.
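
For reference, a minimal example that exercises this path (a hypothetical
test, not part of the patch):

__int128_t
exchange_ti (__int128_t *p, __int128_t v)
{
  /* With -mcpu=power8, this expands through load_lockedti and
     store_conditionalti; the PTImode temporary guarantees the lqarx/stqcx.
     pair gets an even/odd GPR pair even at -O0.  */
  return __atomic_exchange_n (p, v, __ATOMIC_SEQ_CST);
}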

[gcc]
2013-06-11  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Pat Haugen <pthaugen@us.ibm.com>
	    Peter Bergner <bergner@vnet.ibm.com>

	* config/rs6000/rs6000.c (emit_load_locked): Add support for
	power8 byte, half-word, and quad-word atomic instructions.
	(emit_store_conditional): Likewise.
	(rs6000_expand_atomic_compare_and_swap): Likewise.
	(rs6000_expand_atomic_op): Likewise.

	* config/rs6000/sync.md (larx): Add new modes for power8.
	(stcx): Likewise.
	(AINT): New mode iterator to include TImode as well as normal
	integer modes on power8.
	(fetchop_pred): Use int_reg_operand instead of gpc_reg_operand so
	that VSX registers are not considered.  Use AINT mode iterator
	instead of INT1 to allow inclusion of quad word atomic operations
	on power8.
	(load_locked<mode>): Likewise.
	(store_conditional<mode>): Likewise.
	(atomic_compare_and_swap<mode>): Likewise.
	(atomic_exchange<mode>): Likewise.
	(atomic_nand<mode>): Likewise.
	(atomic_fetch_<fetchop_name><mode>): Likewise.
	(atomic_nand_fetch<mode>): Likewise.
	(mem_thread_fence): Use gen_loadsync_<mode> instead of enumerating
	each type.
	(ATOMIC): On power8, add QImode, HImode modes.
	(load_locked<QHI:mode>_si): Variants of load_locked for QI/HI
	modes that promote to SImode.
	(load_lockedti): Convert TImode arguments to PTImode, so that we
	get a guaranteed even/odd register pair.
	(load_lockedpti): Likewise.
	(store_conditionalti): Likewise.
	(store_conditionalpti): Likewise.

	* config/rs6000/rs6000.md (QHI): New mode iterator for power8
	atomic load/store instructions.
	(HSI): Likewise.

[gcc/testsuite]
2013-06-11  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Pat Haugen <pthaugen@us.ibm.com>
	    Peter Bergner <bergner@vnet.ibm.com>

	* gcc.target/powerpc/atomic-p7.c: New file, add tests for atomic
	load/store instructions on power7, power8.
	* gcc.target/powerpc/atomic-p8.c: Likewise.

Given that these changes went beyond the original request to fix a spelling
error and improve the logic, I figured I would send these patches out again.
David, do you have any problem with the new patches?

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.official-07d --]
[-- Type: text/plain, Size: 32738 bytes --]

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 199965)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -17748,7 +17748,8 @@ emit_unlikely_jump (rtx cond, rtx label)
 }
 
 /* A subroutine of the atomic operation splitters.  Emit a load-locked
-   instruction in MODE.  */
+   instruction in MODE.  For QI/HImode, possibly use a pattern that includes
+   the zero_extend operation.  */
 
 static void
 emit_load_locked (enum machine_mode mode, rtx reg, rtx mem)
@@ -17757,12 +17758,26 @@ emit_load_locked (enum machine_mode mode
 
   switch (mode)
     {
+    case QImode:
+      fn = gen_load_lockedqi;
+      break;
+    case HImode:
+      fn = gen_load_lockedhi;
+      break;
     case SImode:
-      fn = gen_load_lockedsi;
+      if (GET_MODE (mem) == QImode)
+	fn = gen_load_lockedqi_si;
+      else if (GET_MODE (mem) == HImode)
+	fn = gen_load_lockedhi_si;
+      else
+	fn = gen_load_lockedsi;
       break;
     case DImode:
       fn = gen_load_lockeddi;
       break;
+    case TImode:
+      fn = gen_load_lockedti;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -17779,12 +17794,21 @@ emit_store_conditional (enum machine_mod
 
   switch (mode)
     {
+    case QImode:
+      fn = gen_store_conditionalqi;
+      break;
+    case HImode:
+      fn = gen_store_conditionalhi;
+      break;
     case SImode:
       fn = gen_store_conditionalsi;
       break;
     case DImode:
       fn = gen_store_conditionaldi;
       break;
+    case TImode:
+      fn = gen_store_conditionalti;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -17931,7 +17955,7 @@ rs6000_expand_atomic_compare_and_swap (r
 {
   rtx boolval, retval, mem, oldval, newval, cond;
   rtx label1, label2, x, mask, shift;
-  enum machine_mode mode;
+  enum machine_mode mode, orig_mode;
   enum memmodel mod_s, mod_f;
   bool is_weak;
 
@@ -17943,22 +17967,29 @@ rs6000_expand_atomic_compare_and_swap (r
   is_weak = (INTVAL (operands[5]) != 0);
   mod_s = (enum memmodel) INTVAL (operands[6]);
   mod_f = (enum memmodel) INTVAL (operands[7]);
-  mode = GET_MODE (mem);
+  orig_mode = mode = GET_MODE (mem);
 
   mask = shift = NULL_RTX;
   if (mode == QImode || mode == HImode)
     {
-      mem = rs6000_adjust_atomic_subword (mem, &shift, &mask);
-
-      /* Shift and mask OLDVAL into position with the word.  */
+      /* Before power8, we didn't have access to lbarx/lharx, so generate
+	 lwarx plus shift/mask operations.  With power8, we need to do the
+	 comparison in SImode, but the store is still done in QI/HImode.  */
       oldval = convert_modes (SImode, mode, oldval, 1);
-      oldval = expand_simple_binop (SImode, ASHIFT, oldval, shift,
-				    NULL_RTX, 1, OPTAB_LIB_WIDEN);
 
-      /* Shift and mask NEWVAL into position within the word.  */
-      newval = convert_modes (SImode, mode, newval, 1);
-      newval = expand_simple_binop (SImode, ASHIFT, newval, shift,
-				    NULL_RTX, 1, OPTAB_LIB_WIDEN);
+      if (!TARGET_SYNC_HI_QI)
+	{
+	  mem = rs6000_adjust_atomic_subword (mem, &shift, &mask);
+
+	  /* Shift and mask OLDVAL into position within the word.  */
+	  oldval = expand_simple_binop (SImode, ASHIFT, oldval, shift,
+					NULL_RTX, 1, OPTAB_LIB_WIDEN);
+
+	  /* Shift and mask NEWVAL into position within the word.  */
+	  newval = convert_modes (SImode, mode, newval, 1);
+	  newval = expand_simple_binop (SImode, ASHIFT, newval, shift,
+					NULL_RTX, 1, OPTAB_LIB_WIDEN);
+	}
 
       /* Prepare to adjust the return value.  */
       retval = gen_reg_rtx (SImode);
@@ -17987,7 +18018,25 @@ rs6000_expand_atomic_compare_and_swap (r
     }
 
   cond = gen_reg_rtx (CCmode);
-  x = gen_rtx_COMPARE (CCmode, x, oldval);
+  /* If we have TImode, synthesize a comparison.  */
+  if (mode != TImode)
+    x = gen_rtx_COMPARE (CCmode, x, oldval);
+  else
+    {
+      rtx xor1_result = gen_reg_rtx (DImode);
+      rtx xor2_result = gen_reg_rtx (DImode);
+      rtx or_result = gen_reg_rtx (DImode);
+      rtx new_word0 = simplify_gen_subreg (DImode, x, TImode, 0);
+      rtx new_word1 = simplify_gen_subreg (DImode, x, TImode, 8);
+      rtx old_word0 = simplify_gen_subreg (DImode, oldval, TImode, 0);
+      rtx old_word1 = simplify_gen_subreg (DImode, oldval, TImode, 8);
+
+      emit_insn (gen_xordi3 (xor1_result, new_word0, old_word0));
+      emit_insn (gen_xordi3 (xor2_result, new_word1, old_word1));
+      emit_insn (gen_iordi3 (or_result, xor1_result, xor2_result));
+      x = gen_rtx_COMPARE (CCmode, or_result, const0_rtx);
+    }
+
   emit_insn (gen_rtx_SET (VOIDmode, cond, x));
 
   x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
@@ -17997,7 +18046,7 @@ rs6000_expand_atomic_compare_and_swap (r
   if (mask)
     x = rs6000_mask_atomic_subword (retval, newval, mask);
 
-  emit_store_conditional (mode, cond, mem, x);
+  emit_store_conditional (orig_mode, cond, mem, x);
 
   if (!is_weak)
     {
@@ -18015,6 +18064,8 @@ rs6000_expand_atomic_compare_and_swap (r
 
   if (shift)
     rs6000_finish_atomic_subword (operands[1], retval, shift);
+  else if (mode != GET_MODE (operands[1]))
+    convert_move (operands[1], retval, 1);
 
   /* In all cases, CR0 contains EQ on success, and NE on failure.  */
   x = gen_rtx_EQ (SImode, cond, const0_rtx);
@@ -18038,7 +18089,7 @@ rs6000_expand_atomic_exchange (rtx opera
   mode = GET_MODE (mem);
 
   mask = shift = NULL_RTX;
-  if (mode == QImode || mode == HImode)
+  if (!TARGET_SYNC_HI_QI && (mode == QImode || mode == HImode))
     {
       mem = rs6000_adjust_atomic_subword (mem, &shift, &mask);
 
@@ -18087,53 +18138,70 @@ rs6000_expand_atomic_op (enum rtx_code c
 {
   enum memmodel model = (enum memmodel) INTVAL (model_rtx);
   enum machine_mode mode = GET_MODE (mem);
+  enum machine_mode store_mode = mode;
   rtx label, x, cond, mask, shift;
   rtx before = orig_before, after = orig_after;
 
   mask = shift = NULL_RTX;
+  /* On power8, we want to use SImode for the operation.  On previous systems,
+     do the operation on a full word and use shift/mask to extract the proper
+     byte or halfword.  */
   if (mode == QImode || mode == HImode)
     {
-      mem = rs6000_adjust_atomic_subword (mem, &shift, &mask);
-
-      /* Shift and mask VAL into position with the word.  */
-      val = convert_modes (SImode, mode, val, 1);
-      val = expand_simple_binop (SImode, ASHIFT, val, shift,
-				 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+      if (TARGET_SYNC_HI_QI)
+	{
+	  val = convert_modes (SImode, mode, val, 1);
 
-      switch (code)
+	  /* Prepare to adjust the return value.  */
+	  before = gen_reg_rtx (SImode);
+	  if (after)
+	    after = gen_reg_rtx (SImode);
+	  mode = SImode;
+	}
+      else
 	{
-	case IOR:
-	case XOR:
-	  /* We've already zero-extended VAL.  That is sufficient to
-	     make certain that it does not affect other bits.  */
-	  mask = NULL;
-	  break;
+	  mem = rs6000_adjust_atomic_subword (mem, &shift, &mask);
 
-	case AND:
-	  /* If we make certain that all of the other bits in VAL are
-	     set, that will be sufficient to not affect other bits.  */
-	  x = gen_rtx_NOT (SImode, mask);
-	  x = gen_rtx_IOR (SImode, x, val);
-	  emit_insn (gen_rtx_SET (VOIDmode, val, x));
-	  mask = NULL;
-	  break;
+	  /* Shift and mask VAL into position with the word.  */
+	  val = convert_modes (SImode, mode, val, 1);
+	  val = expand_simple_binop (SImode, ASHIFT, val, shift,
+				     NULL_RTX, 1, OPTAB_LIB_WIDEN);
 
-	case NOT:
-	case PLUS:
-	case MINUS:
-	  /* These will all affect bits outside the field and need
-	     adjustment via MASK within the loop.  */
-	  break;
+	  switch (code)
+	    {
+	    case IOR:
+	    case XOR:
+	      /* We've already zero-extended VAL.  That is sufficient to
+		 make certain that it does not affect other bits.  */
+	      mask = NULL;
+	      break;
 
-	default:
-	  gcc_unreachable ();
-	}
+	    case AND:
+	      /* If we make certain that all of the other bits in VAL are
+		 set, that will be sufficient to not affect other bits.  */
+	      x = gen_rtx_NOT (SImode, mask);
+	      x = gen_rtx_IOR (SImode, x, val);
+	      emit_insn (gen_rtx_SET (VOIDmode, val, x));
+	      mask = NULL;
+	      break;
 
-      /* Prepare to adjust the return value.  */
-      before = gen_reg_rtx (SImode);
-      if (after)
-	after = gen_reg_rtx (SImode);
-      mode = SImode;
+	    case NOT:
+	    case PLUS:
+	    case MINUS:
+	      /* These will all affect bits outside the field and need
+		 adjustment via MASK within the loop.  */
+	      break;
+
+	    default:
+	      gcc_unreachable ();
+	    }
+
+	  /* Prepare to adjust the return value.  */
+	  before = gen_reg_rtx (SImode);
+	  if (after)
+	    after = gen_reg_rtx (SImode);
+	  store_mode = mode = SImode;
+	}
     }
 
   mem = rs6000_pre_atomic_barrier (mem, model);
@@ -18166,9 +18234,11 @@ rs6000_expand_atomic_op (enum rtx_code c
 			       NULL_RTX, 1, OPTAB_LIB_WIDEN);
       x = rs6000_mask_atomic_subword (before, x, mask);
     }
+  else if (store_mode != mode)
+    x = convert_modes (store_mode, mode, x, 1);
 
   cond = gen_reg_rtx (CCmode);
-  emit_store_conditional (mode, cond, mem, x);
+  emit_store_conditional (store_mode, cond, mem, x);
 
   x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
   emit_unlikely_jump (x, label);
@@ -18177,11 +18247,22 @@ rs6000_expand_atomic_op (enum rtx_code c
 
   if (shift)
     {
+      /* QImode/HImode on machines without lbarx/lharx where we do a lwarx and
+	 then do the calculations in a SImode register.  */
       if (orig_before)
 	rs6000_finish_atomic_subword (orig_before, before, shift);
       if (orig_after)
 	rs6000_finish_atomic_subword (orig_after, after, shift);
     }
+  else if (store_mode != mode)
+    {
+      /* QImode/HImode on machines with lbarx/lharx where we do the native
+	 operation and then do the calculations in a SImode register.  */
+      if (orig_before)
+	convert_move (orig_before, before, 1);
+      if (orig_after)
+	convert_move (orig_after, after, 1);
+    }
   else if (orig_after && after != orig_after)
     emit_move_insn (orig_after, after);
 }
Index: gcc/config/rs6000/sync.md
===================================================================
--- gcc/config/rs6000/sync.md	(revision 199965)
+++ gcc/config/rs6000/sync.md	(working copy)
@@ -18,14 +18,23 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
-(define_mode_attr larx [(SI "lwarx") (DI "ldarx")])
-(define_mode_attr stcx [(SI "stwcx.") (DI "stdcx.")])
+(define_mode_attr larx [(QI "lbarx")
+			(HI "lharx")
+			(SI "lwarx")
+			(DI "ldarx")
+			(TI "lqarx")])
+
+(define_mode_attr stcx [(QI "stbcx.")
+			(HI "sthcx.")
+			(SI "stwcx.")
+			(DI "stdcx.")
+			(TI "stqcx.")])
 
 (define_code_iterator FETCHOP [plus minus ior xor and])
 (define_code_attr fetchop_name
   [(plus "add") (minus "sub") (ior "or") (xor "xor") (and "and")])
 (define_code_attr fetchop_pred
-  [(plus "add_operand") (minus "gpc_reg_operand")
+  [(plus "add_operand") (minus "int_reg_operand")
    (ior "logical_operand") (xor "logical_operand") (and "and_operand")])
 
 (define_expand "mem_thread_fence"
@@ -129,16 +138,7 @@ (define_expand "atomic_load<mode>"
     case MEMMODEL_CONSUME:
     case MEMMODEL_ACQUIRE:
     case MEMMODEL_SEQ_CST:
-      if (GET_MODE (operands[0]) == QImode)
-	emit_insn (gen_loadsync_qi (operands[0]));
-      else if (GET_MODE (operands[0]) == HImode)
-	emit_insn (gen_loadsync_hi (operands[0]));
-      else if (GET_MODE (operands[0]) == SImode)
-	emit_insn (gen_loadsync_si (operands[0]));
-      else if (GET_MODE (operands[0]) == DImode)
-	emit_insn (gen_loadsync_di (operands[0]));
-      else
-	gcc_unreachable ();
+      emit_insn (gen_loadsync_<mode> (operands[0]));
       break;
     default:
       gcc_unreachable ();
@@ -170,35 +170,109 @@ (define_expand "atomic_store<mode>"
   DONE;
 })
 
-;; ??? Power ISA 2.06B says that there *is* a load-{byte,half}-and-reserve
-;; opcode that is "phased-in".  Not implemented as of Power7, so not yet used,
-;; but let's prepare the macros anyway.
-
-(define_mode_iterator ATOMIC    [SI (DI "TARGET_POWERPC64")])
+;; Any supported integer mode that has atomic l<x>arx/st<x>cx. instructions
+;; other than the quad memory operations, which have special restrictions.
+;; Byte/halfword atomic instructions were added in ISA 2.06B, but were phased
+;; in and did not show up until power8.  TImode atomic lqarx/stqcx. require
+;; special handling due to even/odd register requirements.
+(define_mode_iterator ATOMIC [(QI "TARGET_SYNC_HI_QI")
+			      (HI "TARGET_SYNC_HI_QI")
+			      SI
+			      (DI "TARGET_POWERPC64")])
+
+;; Types that we should provide atomic instructions for.
+
+(define_mode_iterator AINT [QI
+			    HI
+			    SI
+			    (DI "TARGET_POWERPC64")
+			    (TI "TARGET_SYNC_TI")])
 
 (define_insn "load_locked<mode>"
-  [(set (match_operand:ATOMIC 0 "gpc_reg_operand" "=r")
+  [(set (match_operand:ATOMIC 0 "int_reg_operand" "=r")
 	(unspec_volatile:ATOMIC
          [(match_operand:ATOMIC 1 "memory_operand" "Z")] UNSPECV_LL))]
   ""
   "<larx> %0,%y1"
   [(set_attr "type" "load_l")])
 
+(define_insn "load_locked<QHI:mode>_si"
+  [(set (match_operand:SI 0 "int_reg_operand" "=r")
+	(unspec_volatile:SI
+	  [(match_operand:QHI 1 "memory_operand" "Z")] UNSPECV_LL))]
+  "TARGET_SYNC_HI_QI"
+  "<QHI:larx> %0,%y1"
+  [(set_attr "type" "load_l")])
+
+;; Use PTImode to get even/odd register pairs
+(define_expand "load_lockedti"
+  [(use (match_operand:TI 0 "quad_int_reg_operand" ""))
+   (use (match_operand:TI 1 "memory_operand" ""))]
+  "TARGET_SYNC_TI"
+{
+  /* Use a temporary register to force getting an even register for the
+     lqarx/stqcrx. instructions.  Normal optimizations will eliminate this
+     extra copy.  */
+  rtx pti = gen_reg_rtx (PTImode);
+  emit_insn (gen_load_lockedpti (pti, operands[1]));
+  emit_move_insn (operands[0], gen_lowpart (TImode, pti));
+  DONE;
+})
+
+(define_insn "load_lockedpti"
+  [(set (match_operand:PTI 0 "quad_int_reg_operand" "=&r")
+	(unspec_volatile:PTI
+         [(match_operand:TI 1 "memory_operand" "Z")] UNSPECV_LL))]
+  "TARGET_SYNC_TI
+   && !reg_mentioned_p (operands[0], operands[1])
+   && quad_int_reg_operand (operands[0], PTImode)"
+  "lqarx %0,%y1"
+  [(set_attr "type" "load_l")])
+
 (define_insn "store_conditional<mode>"
   [(set (match_operand:CC 0 "cc_reg_operand" "=x")
 	(unspec_volatile:CC [(const_int 0)] UNSPECV_SC))
    (set (match_operand:ATOMIC 1 "memory_operand" "=Z")
-	(match_operand:ATOMIC 2 "gpc_reg_operand" "r"))]
+	(match_operand:ATOMIC 2 "int_reg_operand" "r"))]
   ""
   "<stcx> %2,%y1"
   [(set_attr "type" "store_c")])
 
+(define_expand "store_conditionalti"
+  [(use (match_operand:CC 0 "cc_reg_operand" ""))
+   (use (match_operand:TI 1 "memory_operand" ""))
+   (use (match_operand:TI 2 "quad_int_reg_operand" ""))]
+  "TARGET_SYNC_TI"
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  rtx pti_op1 = change_address (op1, PTImode, XEXP (op1, 0));
+  rtx pti_op2 = gen_reg_rtx (PTImode);
+
+  /* Use a temporary register to force getting an even register for the
+     lqarx/stqcx. instructions.  Normal optimizations will eliminate this
+     extra copy.  */
+  emit_move_insn (pti_op2, gen_lowpart (PTImode, op2));
+  emit_insn (gen_store_conditionalpti (op0, pti_op1, pti_op2));
+  DONE;
+})
+
+(define_insn "store_conditionalpti"
+  [(set (match_operand:CC 0 "cc_reg_operand" "=x")
+	(unspec_volatile:CC [(const_int 0)] UNSPECV_SC))
+   (set (match_operand:PTI 1 "memory_operand" "=Z")
+	(match_operand:PTI 2 "quad_int_reg_operand" "r"))]
+  "TARGET_SYNC_TI && quad_int_reg_operand (operands[2], PTImode)"
+  "stqcx. %2,%y1"
+  [(set_attr "type" "store_c")])
+
 (define_expand "atomic_compare_and_swap<mode>"
-  [(match_operand:SI 0 "gpc_reg_operand" "")		;; bool out
-   (match_operand:INT1 1 "gpc_reg_operand" "")		;; val out
-   (match_operand:INT1 2 "memory_operand" "")		;; memory
-   (match_operand:INT1 3 "reg_or_short_operand" "")	;; expected
-   (match_operand:INT1 4 "gpc_reg_operand" "")		;; desired
+  [(match_operand:SI 0 "int_reg_operand" "")		;; bool out
+   (match_operand:AINT 1 "int_reg_operand" "")		;; val out
+   (match_operand:AINT 2 "memory_operand" "")		;; memory
+   (match_operand:AINT 3 "reg_or_short_operand" "")	;; expected
+   (match_operand:AINT 4 "int_reg_operand" "")		;; desired
    (match_operand:SI 5 "const_int_operand" "")		;; is_weak
    (match_operand:SI 6 "const_int_operand" "")		;; model succ
    (match_operand:SI 7 "const_int_operand" "")]		;; model fail
@@ -209,9 +283,9 @@ (define_expand "atomic_compare_and_swap<
 })
 
 (define_expand "atomic_exchange<mode>"
-  [(match_operand:INT1 0 "gpc_reg_operand" "")		;; output
-   (match_operand:INT1 1 "memory_operand" "")		;; memory
-   (match_operand:INT1 2 "gpc_reg_operand" "")		;; input
+  [(match_operand:AINT 0 "int_reg_operand" "")		;; output
+   (match_operand:AINT 1 "memory_operand" "")		;; memory
+   (match_operand:AINT 2 "int_reg_operand" "")		;; input
    (match_operand:SI 3 "const_int_operand" "")]		;; model
   ""
 {
@@ -220,9 +294,9 @@ (define_expand "atomic_exchange<mode>"
 })
 
 (define_expand "atomic_<fetchop_name><mode>"
-  [(match_operand:INT1 0 "memory_operand" "")		;; memory
-   (FETCHOP:INT1 (match_dup 0)
-     (match_operand:INT1 1 "<fetchop_pred>" ""))	;; operand
+  [(match_operand:AINT 0 "memory_operand" "")		;; memory
+   (FETCHOP:AINT (match_dup 0)
+     (match_operand:AINT 1 "<fetchop_pred>" ""))	;; operand
    (match_operand:SI 2 "const_int_operand" "")]		;; model
   ""
 {
@@ -232,8 +306,8 @@ (define_expand "atomic_<fetchop_name><mo
 })
 
 (define_expand "atomic_nand<mode>"
-  [(match_operand:INT1 0 "memory_operand" "")		;; memory
-   (match_operand:INT1 1 "gpc_reg_operand" "")		;; operand
+  [(match_operand:AINT 0 "memory_operand" "")		;; memory
+   (match_operand:AINT 1 "int_reg_operand" "")		;; operand
    (match_operand:SI 2 "const_int_operand" "")]		;; model
   ""
 {
@@ -243,10 +317,10 @@ (define_expand "atomic_nand<mode>"
 })
 
 (define_expand "atomic_fetch_<fetchop_name><mode>"
-  [(match_operand:INT1 0 "gpc_reg_operand" "")		;; output
-   (match_operand:INT1 1 "memory_operand" "")		;; memory
-   (FETCHOP:INT1 (match_dup 1)
-     (match_operand:INT1 2 "<fetchop_pred>" ""))	;; operand
+  [(match_operand:AINT 0 "int_reg_operand" "")		;; output
+   (match_operand:AINT 1 "memory_operand" "")		;; memory
+   (FETCHOP:AINT (match_dup 1)
+     (match_operand:AINT 2 "<fetchop_pred>" ""))	;; operand
    (match_operand:SI 3 "const_int_operand" "")]		;; model
   ""
 { 
@@ -256,9 +330,9 @@ (define_expand "atomic_fetch_<fetchop_na
 })
 
 (define_expand "atomic_fetch_nand<mode>"
-  [(match_operand:INT1 0 "gpc_reg_operand" "")		;; output
-   (match_operand:INT1 1 "memory_operand" "")		;; memory
-   (match_operand:INT1 2 "gpc_reg_operand" "")		;; operand
+  [(match_operand:AINT 0 "int_reg_operand" "")		;; output
+   (match_operand:AINT 1 "memory_operand" "")		;; memory
+   (match_operand:AINT 2 "int_reg_operand" "")		;; operand
    (match_operand:SI 3 "const_int_operand" "")]		;; model
   ""
 {
@@ -268,10 +342,10 @@ (define_expand "atomic_fetch_nand<mode>"
 })
 
 (define_expand "atomic_<fetchop_name>_fetch<mode>"
-  [(match_operand:INT1 0 "gpc_reg_operand" "")		;; output
-   (match_operand:INT1 1 "memory_operand" "")		;; memory
-   (FETCHOP:INT1 (match_dup 1)
-     (match_operand:INT1 2 "<fetchop_pred>" ""))	;; operand
+  [(match_operand:AINT 0 "int_reg_operand" "")		;; output
+   (match_operand:AINT 1 "memory_operand" "")		;; memory
+   (FETCHOP:AINT (match_dup 1)
+     (match_operand:AINT 2 "<fetchop_pred>" ""))	;; operand
    (match_operand:SI 3 "const_int_operand" "")]		;; model
   ""
 {
@@ -281,9 +355,9 @@ (define_expand "atomic_<fetchop_name>_fe
 })
 
 (define_expand "atomic_nand_fetch<mode>"
-  [(match_operand:INT1 0 "gpc_reg_operand" "")		;; output
-   (match_operand:INT1 1 "memory_operand" "")		;; memory
-   (match_operand:INT1 2 "gpc_reg_operand" "")		;; operand
+  [(match_operand:AINT 0 "int_reg_operand" "")		;; output
+   (match_operand:AINT 1 "memory_operand" "")		;; memory
+   (match_operand:AINT 2 "int_reg_operand" "")		;; operand
    (match_operand:SI 3 "const_int_operand" "")]		;; model
   ""
 {
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 199965)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -239,6 +239,12 @@ (define_mode_iterator INT1 [QI HI SI (DI
 ; extend modes for DImode
 (define_mode_iterator QHSI [QI HI SI])
 
+; QImode or HImode for small atomic ops
+(define_mode_iterator QHI [QI HI])
+
+; HImode or SImode for sign extended fusion ops
+(define_mode_iterator HSI [HI SI])
+
 ; SImode or DImode, even if DImode doesn't fit in GPRs.
 (define_mode_iterator SDI [SI DI])
 
Index: gcc/testsuite/gcc.target/powerpc/atomic-p7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/atomic-p7.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/atomic-p7.c	(revision 0)
@@ -0,0 +1,207 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-mcpu=power7 -O2" } */
+/* { dg-final { scan-assembler-not "lbarx" } } */
+/* { dg-final { scan-assembler-not "lharx" } } */
+/* { dg-final { scan-assembler-times "lwarx" 18 } } */
+/* { dg-final { scan-assembler-times "ldarx" 6 } } */
+/* { dg-final { scan-assembler-not "lqarx" } } */
+/* { dg-final { scan-assembler-not "stbcx" } } */
+/* { dg-final { scan-assembler-not "sthcx" } } */
+/* { dg-final { scan-assembler-times "stwcx" 18 } } */
+/* { dg-final { scan-assembler-times "stdcx" 6 } } */
+/* { dg-final { scan-assembler-not "stqcx" } } */
+/* { dg-final { scan-assembler-times "bl __atomic" 6 } } */
+/* { dg-final { scan-assembler-times "isync" 12 } } */
+/* { dg-final { scan-assembler-times "lwsync" 8 } } */
+/* { dg-final { scan-assembler-not "mtvsrd" } } */
+/* { dg-final { scan-assembler-not "mtvsrwa" } } */
+/* { dg-final { scan-assembler-not "mtvsrwz" } } */
+/* { dg-final { scan-assembler-not "mfvsrd" } } */
+/* { dg-final { scan-assembler-not "mfvsrwz" } } */
+
+/* Test for the byte atomic operations on power7, which must use
+   lwarx/stwcx. with shift and mask since lbarx is not available.  */
+char
+char_fetch_add_relaxed (char *ptr, int value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+char
+char_fetch_sub_consume (char *ptr, int value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+char
+char_fetch_and_acquire (char *ptr, int value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+char
+char_fetch_ior_release (char *ptr, int value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+char
+char_fetch_xor_acq_rel (char *ptr, int value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+char
+char_fetch_nand_seq_cst (char *ptr, int value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+/* Test for the half word atomic operations on power7, which must use
+   lwarx/stwcx. with shift and mask since lharx is not available.  */
+short
+short_fetch_add_relaxed (short *ptr, int value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+short
+short_fetch_sub_consume (short *ptr, int value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+short
+short_fetch_and_acquire (short *ptr, int value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+short
+short_fetch_ior_release (short *ptr, int value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+short
+short_fetch_xor_acq_rel (short *ptr, int value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+short
+short_fetch_nand_seq_cst (short *ptr, int value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+/* Test for the word atomic operations on power7 using lwarx/stwcx.  */
+int
+int_fetch_add_relaxed (int *ptr, int value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+int
+int_fetch_sub_consume (int *ptr, int value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+int
+int_fetch_and_acquire (int *ptr, int value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+int
+int_fetch_ior_release (int *ptr, int value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+int
+int_fetch_xor_acq_rel (int *ptr, int value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+int
+int_fetch_nand_seq_cst (int *ptr, int value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+/* Test for the double word atomic operations on power7 using ldarx/stdcx.  */
+long
+long_fetch_add_relaxed (long *ptr, long value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+long
+long_fetch_sub_consume (long *ptr, long value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+long
+long_fetch_and_acquire (long *ptr, long value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+long
+long_fetch_ior_release (long *ptr, long value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+long
+long_fetch_xor_acq_rel (long *ptr, long value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+long
+long_fetch_nand_seq_cst (long *ptr, long value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+/* Test for the quad word atomic operations on power7, which fall back to
+   calling the __atomic library routines since lqarx/stqcx. are not
+   available.  */
+__int128_t
+quad_fetch_add_relaxed (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+__int128_t
+quad_fetch_sub_consume (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+__int128_t
+quad_fetch_and_acquire (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+__int128_t
+quad_fetch_ior_release (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+__int128_t
+quad_fetch_xor_acq_rel (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+__int128_t
+quad_fetch_nand_seq_cst (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
Index: gcc/testsuite/gcc.target/powerpc/atomic-p8.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/atomic-p8.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/atomic-p8.c	(revision 0)
@@ -0,0 +1,237 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2" } */
+/* { dg-final { scan-assembler-times "lbarx" 7 } } */
+/* { dg-final { scan-assembler-times "lharx" 7 } } */
+/* { dg-final { scan-assembler-times "lwarx" 7 } } */
+/* { dg-final { scan-assembler-times "ldarx" 7 } } */
+/* { dg-final { scan-assembler-times "lqarx" 7 } } */
+/* { dg-final { scan-assembler-times "stbcx" 7 } } */
+/* { dg-final { scan-assembler-times "sthcx" 7 } } */
+/* { dg-final { scan-assembler-times "stwcx" 7 } } */
+/* { dg-final { scan-assembler-times "stdcx" 7 } } */
+/* { dg-final { scan-assembler-times "stqcx" 7 } } */
+/* { dg-final { scan-assembler-not "bl __atomic" } } */
+/* { dg-final { scan-assembler-times "isync" 20 } } */
+/* { dg-final { scan-assembler-times "lwsync" 10 } } */
+/* { dg-final { scan-assembler-not "mtvsrd" } } */
+/* { dg-final { scan-assembler-not "mtvsrwa" } } */
+/* { dg-final { scan-assembler-not "mtvsrwz" } } */
+/* { dg-final { scan-assembler-not "mfvsrd" } } */
+/* { dg-final { scan-assembler-not "mfvsrwz" } } */
+
+/* Test for the byte atomic operations on power8 using lbarx/stbcx.  */
+char
+char_fetch_add_relaxed (char *ptr, int value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+char
+char_fetch_sub_consume (char *ptr, int value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+char
+char_fetch_and_acquire (char *ptr, int value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+char
+char_fetch_ior_release (char *ptr, int value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+char
+char_fetch_xor_acq_rel (char *ptr, int value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+char
+char_fetch_nand_seq_cst (char *ptr, int value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+void
+char_val_compare_and_swap (char *p, int i, int j, char *q)
+{
+  *q = __sync_val_compare_and_swap (p, i, j);
+}
+
+/* Test for the half word atomic operations on power8 using lharx/sthcx.  */
+short
+short_fetch_add_relaxed (short *ptr, int value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+short
+short_fetch_sub_consume (short *ptr, int value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+short
+short_fetch_and_acquire (short *ptr, int value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+short
+short_fetch_ior_release (short *ptr, int value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+short
+short_fetch_xor_acq_rel (short *ptr, int value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+short
+short_fetch_nand_seq_cst (short *ptr, int value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+void
+short_val_compare_and_swap (short *p, int i, int j, short *q)
+{
+  *q = __sync_val_compare_and_swap (p, i, j);
+}
+
+/* Test for the word atomic operations on power8 using lwarx/stwcx.  */
+int
+int_fetch_add_relaxed (int *ptr, int value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+int
+int_fetch_sub_consume (int *ptr, int value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+int
+int_fetch_and_acquire (int *ptr, int value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+int
+int_fetch_ior_release (int *ptr, int value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+int
+int_fetch_xor_acq_rel (int *ptr, int value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+int
+int_fetch_nand_seq_cst (int *ptr, int value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+void
+int_val_compare_and_swap (int *p, int i, int j, int *q)
+{
+  *q = __sync_val_compare_and_swap (p, i, j);
+}
+
+/* Test for the double word atomic operations on power8 using ldarx/stdcx.  */
+long
+long_fetch_add_relaxed (long *ptr, long value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+long
+long_fetch_sub_consume (long *ptr, long value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+long
+long_fetch_and_acquire (long *ptr, long value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+long
+long_fetch_ior_release (long *ptr, long value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+long
+long_fetch_xor_acq_rel (long *ptr, long value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+long
+long_fetch_nand_seq_cst (long *ptr, long value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+void
+long_val_compare_and_swap (long *p, long i, long j, long *q)
+{
+  *q = __sync_val_compare_and_swap (p, i, j);
+}
+
+/* Test for the quad word atomic operations on power8 using lqarx/stqcx.  */
+__int128_t
+quad_fetch_add_relaxed (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_add (ptr, value, __ATOMIC_RELAXED);
+}
+
+__int128_t
+quad_fetch_sub_consume (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_sub (ptr, value, __ATOMIC_CONSUME);
+}
+
+__int128_t
+quad_fetch_and_acquire (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_and (ptr, value, __ATOMIC_ACQUIRE);
+}
+
+__int128_t
+quad_fetch_ior_release (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_or (ptr, value, __ATOMIC_RELEASE);
+}
+
+__int128_t
+quad_fetch_xor_acq_rel (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_xor (ptr, value, __ATOMIC_ACQ_REL);
+}
+
+__int128_t
+quad_fetch_nand_seq_cst (__int128_t *ptr, __int128_t value)
+{
+  return __atomic_fetch_nand (ptr, value, __ATOMIC_SEQ_CST);
+}
+
+void
+quad_val_compare_and_swap (__int128_t *p, __int128_t i, __int128_t j, __int128_t *q)
+{
+  *q = __sync_val_compare_and_swap (p, i, j);
+}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #7, quad/byte/half-word atomic instructions
  2013-06-11 23:56     ` Michael Meissner
@ 2013-06-12 21:55       ` David Edelsohn
  0 siblings, 0 replies; 52+ messages in thread
From: David Edelsohn @ 2013-06-12 21:55 UTC (permalink / raw)
  To: Michael Meissner, David Edelsohn, GCC Patches, Pat Haugen, Peter Bergner

On Tue, Jun 11, 2013 at 7:53 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> I needed to rework sync.md so that it would work correctly with no
> optimization: using SUBREGs at -O0 did not give us the even registers needed
> for holding PTImode values, so I created a PTImode temporary in load_lockedti
> and store_conditionalti, which is normally optimized out.
>
> [gcc]
> 2013-06-11  Michael Meissner  <meissner@linux.vnet.ibm.com>
>             Pat Haugen <pthaugen@us.ibm.com>
>             Peter Bergner <bergner@vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (emit_load_locked): Add support for
>         power8 byte, half-word, and quad-word atomic instructions.
>         (emit_store_conditional): Likewise.
>         (rs6000_expand_atomic_compare_and_swap): Likewise.
>         (rs6000_expand_atomic_op): Likewise.
>
>         * config/rs6000/sync.md (larx): Add new modes for power8.
>         (stcx): Likewise.
>         (AINT): New mode iterator to include TImode as well as normal
>         integer modes on power8.
>         (fetchop_pred): Use int_reg_operand instead of gpc_reg_operand so
>         that VSX registers are not considered.  Use AINT mode iterator
>         instead of INT1 to allow inclusion of quad word atomic operations
>         on power8.
>         (load_locked<mode>): Likewise.
>         (store_conditional<mode>): Likewise.
>         (atomic_compare_and_swap<mode>): Likewise.
>         (atomic_exchange<mode>): Likewise.
>         (atomic_nand<mode>): Likewise.
>         (atomic_fetch_<fetchop_name><mode>): Likewise.
>         (atomic_nand_fetch<mode>): Likewise.
>         (mem_thread_fence): Use gen_loadsync_<mode> instead of enumerating
>         each type.
>         (ATOMIC): On power8, add QImode, HImode modes.
>         (load_locked<QHI:mode>_si): Variants of load_locked for QI/HI
>         modes that promote to SImode.
>         (load_lockedti): Convert TImode arguments to PTImode, so that we
>         get a guaranteed even/odd register pair.
>         (load_lockedpti): Likewise.
>         (store_conditionalti): Likewise.
>         (store_conditionalpti): Likewise.
>
>         * config/rs6000/rs6000.md (QHI): New mode iterator for power8
>         atomic load/store instructions.
>         (HSI): Likewise.
>
> [gcc/testsuite]
> 2013-06-11  Michael Meissner  <meissner@linux.vnet.ibm.com>
>             Pat Haugen <pthaugen@us.ibm.com>
>             Peter Bergner <bergner@vnet.ibm.com>
>
>         * gcc.target/powerpc/atomic-p7.c: New file, add tests for atomic
>         load/store instructions on power7, power8.
>         * gcc.target/powerpc/atomic-p8.c: Likewise.
>
> Given that these changes went beyond the original request to fix a spelling
> error and improve the logic, I figured I would send these patches out again.
> David, do you have any problem with the new patches?

The new patches are okay.  Thanks for re-checking.

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #8, power8 load fusion + misc.
  2013-05-22 20:53 ` [PATCH, rs6000] power8 patches, patch #8, power8 load fusion + misc Michael Meissner
@ 2013-06-18 18:30   ` David Edelsohn
  2013-06-24 16:32     ` Michael Meissner
  2013-07-29 18:46   ` [PATCH, rs6000] power8 patches, revised patch #8, power8 load fusion Michael Meissner
  1 sibling, 1 reply; 52+ messages in thread
From: David Edelsohn @ 2013-06-18 18:30 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, David Edelsohn, Pat Haugen, Peter Bergner

On Wed, May 22, 2013 at 4:52 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:

> 2013-05-22  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * config/rs6000/predicates.md (fusion_gpr_addis): New predicates
>         to support power8 load fusion.
>         (fusion_gpr_mem_load): Likewise.
>
>         * config/rs6000/rs6000-modes.def (PTImode): Update a comment.
>
>         * config/rs6000/rs6000-protos.h (fusion_gpr_load_p): New
>         declarations for power8 load fusion.
>         (emit_fusion_gpr_load): Likewise.
>
>         * config/rs6000/rs6000.opt (-mlra): New undocumented switch to
>         turn on using the LRA register allocator.
>         (-mconstrain-regs): New undocumented switch to constrain
>         non-integer values from being loaded into the LR or CTR registers.

This really should have been a separate patch.

>         * config/rs6000/rs6000.c (TARGET_LRA_P): If -mlra, turn on using
>         the LRA register allocator.
>         (rs6000_lra_p): Likewise.
>         (rs6000_hard_regno_mode_ok): Allow DI/DD/SF/SD modes in altivec
>         registers if power8.  If -mconstrain-regs, only allow int modes
>         into LR, CTR, and special purpose registers.
>         (rs6000_debug_reg_global): Print -mlra, -mconstrain-regs status if
>         debugging.
>         (rs6000_init_hard_regno_mode_ok): Mark that SFmode can use Altivec
>         registers in the future.
>         (rs6000_option_override_internal): If tuning for power8, turn on
>         fusion mode by default.  Turn on sign extending fusion mode if
>         normal fusion mode is on, and we are at -O2 or -O3.
>         (rs6000_opt_masks): Add -mlra, -mconstrain-regs.
>         (fusion_gpr_load_p): New function, return true if we can fuse an
>         addis instruction with a dependent load to a GPR.
>         (emit_fusion_gpr_load): Emit the instructions for power8 load
>         fusion to GPRs.
>
>         * config/rs6000/vsx.md (VSX load fusion peepholes): Add peepholes
>         to fuse together an addi instruction with a VSX load instruction.
>
>         * config/rs6000/rs6000.md (GPR load fusion peepholes): Add
>         peepholes to fuse an addis instruction with a load to a GPR base
>         register, if the addis instruction is dead after the load, by
>         using the register to be loaded for the addis.  If we are
>         supporting sign extending fusions, convert sign extending loads to
>         zero extending loads and an explicit sign extension.

+  /* 32-bit is not done yet.  */
+  if (TARGET_ELF && !TARGET_POWERPC64)
+    return 0;

What does "32-bit is not done yet." mean? This means PPC32 Linux is
not supported but PPC32 AIX is supported?

+  if (TARGET_ELF && !TARGET_POWERPC64)
+    return 0;

Please return "true" and "false" from new predicates, not "1" and "0".

+
+    case DImode:
+      if (TARGET_POWERPC64)
+    {
+      mode_name = "long";
+      load_str = "ld";
+    }
+      break;

What happens for DImode when not TARGET_POWERPC64?  This should be
gcc_unreachable()?

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #9, power8 scheduling
  2013-06-07 19:22 ` [PATCH, rs6000] power8 patches, patch #9, power8 scheduling Pat Haugen
@ 2013-06-19 13:00   ` David Edelsohn
  0 siblings, 0 replies; 52+ messages in thread
From: David Edelsohn @ 2013-06-19 13:00 UTC (permalink / raw)
  To: Pat Haugen; +Cc: Michael Meissner, GCC Patches, Peter Bergner

On Fri, Jun 7, 2013 at 3:22 PM, Pat Haugen <pthaugen@linux.vnet.ibm.com> wrote:
> This patch adds instruction scheduling support for the Power8 processor.
> Bootstrap/regression test with no new failures. Ok for trunk?
>
>
> 2013-06-07  Michael Meissner  <meissner@linux.vnet.ibm.com>
>         Pat Haugen <pthaugen@us.ibm.com>
>         Peter Bergner <bergner@vnet.ibm.com>
>
>     * config/rs6000/power8.md: New.
>     * config/rs6000/rs6000-cpus.def (RS6000_CPU table): Adjust processor
>     setting for power8 entry.
>     * config/rs6000/t-rs6000 (MD_INCLUDES): Add power8.md.
>     * config/rs6000/rs6000.c (is_microcoded_insn, is_cracked_insn): Adjust
>     test for Power4/Power5 only.
>     (insn_must_be_first_in_group, insn_must_be_last_in_group): Add Power8
>     support.
>     (force_new_group): Adjust comment.
>     * config/rs6000/rs6000.md: Include power8.md.

This patch is okay.

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #8, power8 load fusion + misc.
  2013-06-18 18:30   ` David Edelsohn
@ 2013-06-24 16:32     ` Michael Meissner
  2013-06-24 19:43       ` David Edelsohn
  0 siblings, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-06-24 16:32 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Tue, Jun 18, 2013 at 02:30:49PM -0400, David Edelsohn wrote:
> On Wed, May 22, 2013 at 4:52 PM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
> 
> > 2013-05-22  Michael Meissner  <meissner@linux.vnet.ibm.com>
> >
> >         * config/rs6000/predicates.md (fusion_gpr_addis): New predicates
> >         to support power8 load fusion.
> >         (fusion_gpr_mem_load): Likewise.
> >
> >         * config/rs6000/rs6000-modes.def (PTImode): Update a comment.
> >
> >         * config/rs6000/rs6000-protos.h (fusion_gpr_load_p): New
> >         declarations for power8 load fusion.
> >         (emit_fusion_gpr_load): Likewise.
> >
> >         * config/rs6000/rs6000.opt (-mlra): New undocumented switch to
> >         turn on using the LRA register allocator.
> >         (-mconstrain-regs): New undocumented switch to constrain
> >         non-integer values from being loaded into the LR or CTR registers.
> 
> This really should have been a separate patch.

Yes, you are right.  I can separate it out into its own patch if desired.  The
last I checked, there were still problems in moving to use LRA.  It would be
nice if we could get the switch in for better testing, rather than continuing
to use a branch.  Right now my focus has been getting the initial power8
changes in, so it was added mostly because it was in the sandbox I was working
from.

> +  /* 32-bit is not done yet.  */
> +  if (TARGET_ELF && !TARGET_POWERPC64)
> +    return 0;
> 
> What does "32-bit is not done yet." mean? This means PPC32 Linux is
> not supported but PPC32 AIX is supported?

I don't believe the AIX and Linux 64-bit small code model will work with
fusion loading the GPRs, except in the case where you have more than 64K in
the static area that the section anchors point to.  It would work with the
VSX fusion that loads a small constant and then does the load.  I tend to
feel that restructuring the code to allow more general addresses before
reload, and having secondary reload generate the appropriate instructions,
will work better, but that may take a longer period to get correct (I'm
starting work on it now).

I hadn't gotten around to looking at 32-bit ELF/Linux.  In theory, 32-bit
Linux should work well with fusion for non-pic code.
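
For reference, the GPR case the peepholes target is a back-to-back pair along
these lines (illustrative only; the actual relocations depend on the ABI and
code model):

	addis 9,2,sym@toc@ha	# high part; r9 is dead after the load
	ld 9,sym@toc@l(9)	# dependent load the hardware can fuse

where the addis destination is reused as the load destination, as described
in the ChangeLog.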

> +  if (TARGET_ELF && !TARGET_POWERPC64)
> +    return 0;
> 
> Please return "true" and "false" from new predicates, not "1" and "0".

Ok, I was just being consistent with the existing code.

> +
> +    case DImode:
> +      if (TARGET_POWERPC64)
> +    {
> +      mode_name = "long";
> +      load_str = "ld";
> +    }
> +      break;
> 
> What happens for DImode when not TARGET_POWERPC64?  This should be
> gcc_unreachable()?

There is a gcc_unreachable () at the end of the switch that is reached by
either an unknown mode (default case), or by DImode on 32-bit.  But I can put
in two separate gcc_unreachable ()'s.
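
That is, a sketch of the variant with the explicit check would be:

    case DImode:
      if (!TARGET_POWERPC64)
	gcc_unreachable ();
      mode_name = "long";
      load_str = "ld";
      break;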

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #8, power8 load fusion + misc.
  2013-06-24 16:32     ` Michael Meissner
@ 2013-06-24 19:43       ` David Edelsohn
  0 siblings, 0 replies; 52+ messages in thread
From: David Edelsohn @ 2013-06-24 19:43 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Mon, Jun 24, 2013 at 12:31 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:

>> This really should have been a separate patch.
>
> Yes, you are right.  I can separate it out into its own patch if desired.  The
> last I checked, there were still problems in moving to use LRA.  It would be
> nice if we could get the switch in for better testing, rather than continuing
> to use a branch.  Right now my focus has been getting the initial power8
> changes in, so it was added mostly because it was in the sandbox I was working
> from.

We'll continue as is, but this set of patches should have been split
into more pieces with more descriptive ChangeLog entries to ease the
review process.

>
>> +  /* 32-bit is not done yet.  */
>> +  if (TARGET_ELF && !TARGET_POWERPC64)
>> +    return 0;
>>
>> What does "32-bit is not done yet." mean? This means PPC32 Linux is
>> not supported but PPC32 AIX is supported?
>
> I don't believe the AIX and Linux 64-bit small code model will work with
> fusion loading the GPRs, except in the case where you have more than 64K in
> the static area that the section anchors point to.  It would work with the
> VSX fusion that loads a small constant and then does the load.  I tend to
> feel that restructuring the code to allow more general addresses before
> reload, and having secondary reload generate the appropriate instructions,
> will work better, but that may take a longer period to get correct (I'm
> starting work on it now).
>
> I hadn't gotten around to looking at 32-bit ELF/Linux.  In theory, 32-bit
> Linux should work well with fusion for non-pic code.

My question is what will break, not what will remain unoptimized. The
comment is not clear, and the code only avoids one very specific target.


>> +  if (TARGET_ELF && !TARGET_POWERPC64)
>> +    return 0;
>>
>> Please return "true" and "false" from new predicates, not "1" and "0".
>
> Ok, I was just being constant with the existing code.

Some code uses 0/1 and some uses true/false. Newer code uses true/false.

>
>> +
>> +    case DImode:
>> +      if (TARGET_POWERPC64)
>> +    {
>> +      mode_name = "long";
>> +      load_str = "ld";
>> +    }
>> +      break;
>>
>> What happens for DImode when not TARGET_POWERPC64?  This should be
>> gcc_unreachable()?
>
> There is a gcc_unreachable () at the end of the switch that is reached by
> either an unknown mode (default case), or by DImode on 32-bit.  But I can put
> in two separate gcc_unreachable ()'s.

The current implementation seems like a rather obtuse error path. If
PPC32 DImode is not supported, it would be clearer to fail there, as
opposed to having the function called with a garbage or illegal mode.

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
  2013-06-06 15:57         ` David Edelsohn
  2013-06-06 21:42           ` Michael Meissner
@ 2013-07-15 21:48           ` Michael Meissner
  2013-07-20 19:12             ` David Edelsohn
  1 sibling, 1 reply; 52+ messages in thread
From: Michael Meissner @ 2013-07-15 21:48 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

[-- Attachment #1: Type: text/plain, Size: 4315 bytes --]

On Thu, Jun 06, 2013 at 11:57:01AM -0400, David Edelsohn wrote:
> But I view this as a preliminary step.  The logical instructions need
> an iterator and TImode needs to be cleaned up on 32 bit.
> 
> Thanks, David

Here is my proposed cleanup of the logical support.  It adds DI expanders
that split the insn immediately on 32-bit, matching the current 32-bit
behavior.  It defines 128-bit logical operations for both 32-bit and 64-bit.
If VSX is available, it uses the VSX register set, but allows fallback to the
GPRs.  Similarly for Altivec only (which was not handled in the last patch).
TImode prefers GPRs, while the vector types prefer VSX/Altivec.
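
As a concrete illustration (a sketch of the intended expansion, not an
actual RTL dump; the register numbers are illustrative), a DImode IOR on
32-bit now splits at expand time into two SImode operations on the high
and low halves:

    /* long long f (long long a, long long b) { return a | b; }
       on 32-bit now expands, conceptually, to:  */
    (set (reg:SI 3) (ior:SI (reg:SI 3) (reg:SI 5)))	; high halves
    (set (reg:SI 4) (ior:SI (reg:SI 4) (reg:SI 6)))	; low halves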

I've bootstrapped it and run make check with no regressions.

I ran the 10 SPEC tests (gcc, hmmer, povray, milc, omnetpp, h264ref,
cactusADM, libquantum, perlbench, and gromacs) that use long long in some
fashion, and there were no significant differences in 32-bit mode when built
with the same compiler version (I'm using subversion id 200823 as the base for
the moment).

Are these patches ok to install?

2013-07-15  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vector.md (xor<mode>3): Move 128-bit boolean
	expanders to rs6000.md.
	(ior<mode>3): Likewise.
	(and<mode>3): Likewise.
	(one_cmpl<mode>2): Likewise.
	(nor<mode>3): Likewise.
	(andc<mode>3): Likewise.
	(eqv<mode>3): Likewise.
	(nand<mode>3): Likewise.
	(orc<mode>3): Likewise.

	* config/rs6000/vsx.md (VSX_L2): Delete, no longer used.
	(vsx_and<mode>3_32bit): Move 128-bit logical insns to rs6000.md,
	and allow TImode operations in 32-bit.
	(vsx_and<mode>3_64bit): Likewise.
	(vsx_ior<mode>3_32bit): Likewise.
	(vsx_ior<mode>3_64bit): Likewise.
	(vsx_xor<mode>3_32bit): Likewise.
	(vsx_xor<mode>3_64bit): Likewise.
	(vsx_one_cmpl<mode>2_32bit): Likewise.
	(vsx_one_cmpl<mode>2_64bit): Likewise.
	(vsx_nor<mode>3_32bit): Likewise.
	(vsx_nor<mode>3_64bit): Likewise.
	(vsx_andc<mode>3_32bit): Likewise.
	(vsx_andc<mode>3_64bit): Likewise.
	(vsx_eqv<mode>3_32bit): Likewise.
	(vsx_eqv<mode>3_64bit): Likewise.
	(vsx_nand<mode>3_32bit): Likewise.
	(vsx_nand<mode>3_64bit): Likewise.
	(vsx_orc<mode>3_32bit): Likewise.
	(vsx_orc<mode>3_64bit): Likewise.

	* config/rs6000/altivec.md (altivec_and): Move 128-bit logical
	insns to rs6000.md, and allow TImode operations in 32-bit.
	(altivec_ior<mode>3): Likewise.
	(altivec_xor<mode>3): Likewise.
	(altivec_one_cmpl<mode>2): Likewise.
	(altivec_nor<mode>3): Likewise.
	(altivec_andc<mode>3): Likewise.

	* config/rs6000/rs6000.md (BOOL_128): New mode iterators and mode
	attributes for moving the 128-bit logical operations into
	rs6000.md.
	(BOOL_REGS_OUTPUT): Likewise.
	(BOOL_REGS_OP1): Likewise.
	(BOOL_REGS_OP2): Likewise.
	(BOOL_REGS_UNARY): Likewise.
	(BOOL_REGS_AND_CR0): Likewise.
	(one_cmpl<mode>2): Add support for DI logical operations on
	32-bit, splitting the operations to 32-bit.
	(anddi3): Likewise.
	(iordi3): Likewise.
	(xordi3): Likewise.
	(and<mode>3, 128-bit types): Rewrite 2013-06-06 logical operator
	changes to combine the 32/64-bit code, allow logical operations on
	TI mode in 32-bit, and to use similar match_operator patterns like
	scalar mode uses.  Combine the Altivec and VSX code for logical
	operations, and move it here.
	(ior<mode>3, 128-bit types): Likewise.
	(xor<mode>3, 128-bit types): Likewise.
	(one_cmpl<mode>3, 128-bit types): Likewise.
	(nor<mode>3, 128-bit types): Likewise.
	(andc<mode>3, 128-bit types): Likewise.
	(eqv<mode>3, 128-bit types): Likewise.
	(nand<mode>3, 128-bit types): Likewise.
	(orc<mode>3, 128-bit types): Likewise.
	(and<mode>3_internal): Likewise.
	(bool<mode>3_internal): Likewise.
	(boolc<mode>3_internal1): Likewise.
	(boolc<mode>3_internal2): Likewise.
	(boolcc<mode>3_internal1): Likewise.
	(boolcc<mode>3_internal2): Likewise.
	(eqv<mode>3_internal1): Likewise.
	(eqv<mode>3_internal2): Likewise.
	(one_cmpl1<mode>3_internal): Likewise.

	* config/rs6000/rs6000-protos.h (rs6000_split_logical): New
	declaration.

	* config/rs6000/rs6000.c (rs6000_split_logical_inner): Add support
	to split multi-word logical operations.
	(rs6000_split_logical_di): Likewise.
	(rs6000_split_logical): Likewise.



-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.patch041b --]
[-- Type: text/plain, Size: 50317 bytes --]

Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 200823)
+++ gcc/config/rs6000/vector.md	(.../gcc/config/rs6000)	(working copy)
@@ -710,87 +710,6 @@ (define_expand "cr6_test_for_lt_reverse"
   "")
 
 \f
-;; Vector logical instructions
-;; Do not support TImode logical instructions on 32-bit at present, because the
-;; compiler will see that we have a TImode and when it wanted DImode, and
-;; convert the DImode to TImode, store it on the stack, and load it in a VSX
-;; register.
-(define_expand "xor<mode>3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (xor:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
-		   (match_operand:VEC_L 2 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "ior<mode>3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (ior:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
-		   (match_operand:VEC_L 2 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "and<mode>3"
-  [(parallel [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-		   (and:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
-			      (match_operand:VEC_L 2 "vlogical_operand" "")))
-	      (clobber (match_scratch:CC 3 ""))])]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "one_cmpl<mode>2"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (not:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "nor<mode>3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-	(and:VEC_L (not:VEC_L (match_operand:VEC_L 1 "vlogical_operand" ""))
-		   (not:VEC_L (match_operand:VEC_L 2 "vlogical_operand" ""))))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "andc<mode>3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (and:VEC_L (not:VEC_L (match_operand:VEC_L 2 "vlogical_operand" ""))
-		   (match_operand:VEC_L 1 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
-  "")
-
-;; Power8 vector logical instructions.
-(define_expand "eqv<mode>3"
-  [(set (match_operand:VEC_L 0 "register_operand" "")
-	(not:VEC_L
-	 (xor:VEC_L (match_operand:VEC_L 1 "register_operand" "")
-		    (match_operand:VEC_L 2 "register_operand" ""))))]
-  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)")
-
-;; Rewrite nand into canonical form
-(define_expand "nand<mode>3"
-  [(set (match_operand:VEC_L 0 "register_operand" "")
-	(ior:VEC_L
-	 (not:VEC_L (match_operand:VEC_L 1 "register_operand" ""))
-	 (not:VEC_L (match_operand:VEC_L 2 "register_operand" ""))))]
-  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)")
-
-;; The canonical form is to have the negated elment first, so we need to
-;; reverse arguments.
-(define_expand "orc<mode>3"
-  [(set (match_operand:VEC_L 0 "register_operand" "")
-	(ior:VEC_L
-	 (not:VEC_L (match_operand:VEC_L 1 "register_operand" ""))
-	 (match_operand:VEC_L 2 "register_operand" "")))]
-  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)")
-
 ;; Vector count leading zeros
 (define_expand "clz<mode>2"
   [(set (match_operand:VEC_I 0 "register_operand" "")
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 200823)
+++ gcc/config/rs6000/rs6000-protos.h	(.../gcc/config/rs6000)	(working copy)
@@ -138,6 +138,7 @@ extern rtx rs6000_address_for_fpconvert 
 extern rtx rs6000_address_for_altivec (rtx);
 extern rtx rs6000_allocate_stack_temp (enum machine_mode, bool, bool);
 extern int rs6000_loop_align (rtx);
+extern void rs6000_split_logical (rtx [], enum rtx_code, bool, bool, bool, rtx);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 200823)
+++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
@@ -29780,6 +29780,280 @@ rs6000_set_up_by_prologue (struct hard_r
     add_to_hard_reg_set (&set->set, Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);
 }
 
+\f
+/* Helper function for rs6000_split_logical to emit a logical instruction after
+   splitting the operation to single GPR registers.
+
+   DEST is the destination register.
+   OP1 and OP2 are the input source registers.
+   CODE is the base operation (AND, IOR, XOR, NOT).
+   MODE is the machine mode.
+   If COMPLEMENT_FINAL_P is true, wrap the whole operation with NOT.
+   If COMPLEMENT_OP1_P is true, wrap operand1 with NOT.
+   If COMPLEMENT_OP2_P is true, wrap operand2 with NOT.
+   CLOBBER_REG is either NULL or a scratch register of type CC to allow
+   formation of the AND instructions.  */
+
+static void
+rs6000_split_logical_inner (rtx dest,
+			    rtx op1,
+			    rtx op2,
+			    enum rtx_code code,
+			    enum machine_mode mode,
+			    bool complement_final_p,
+			    bool complement_op1_p,
+			    bool complement_op2_p,
+			    rtx clobber_reg)
+{
+  rtx bool_rtx;
+  rtx set_rtx;
+
+  /* Optimize AND of 0/0xffffffff and IOR/XOR of 0.  */
+  if (op2 && GET_CODE (op2) == CONST_INT
+      && (mode == SImode || (mode == DImode && TARGET_POWERPC64))
+      && !complement_final_p && !complement_op1_p && !complement_op2_p)
+    {
+      HOST_WIDE_INT mask = GET_MODE_MASK (mode);
+      HOST_WIDE_INT value = INTVAL (op2) & mask;
+
+      /* Optimize AND of 0 to just set 0.  Optimize AND of -1 to be a move.  */
+      if (code == AND)
+	{
+	  if (value == 0)
+	    {
+	      emit_insn (gen_rtx_SET (VOIDmode, dest, const0_rtx));
+	      return;
+	    }
+
+	  else if (value == mask)
+	    {
+	      if (!rtx_equal_p (dest, op1))
+		emit_insn (gen_rtx_SET (VOIDmode, dest, op1));
+	      return;
+	    }
+	}
+
+      /* Optimize IOR/XOR of 0 to be a simple move.  Split large operations
+	 into separate ORI/ORIS or XORI/XORIS instructions.  */
+      else if (code == IOR || code == XOR)
+	{
+	  if (value == 0)
+	    {
+	      if (!rtx_equal_p (dest, op1))
+		emit_insn (gen_rtx_SET (VOIDmode, dest, op1));
+	      return;
+	    }
+	}
+    }
+
+  if (complement_op1_p)
+    op1 = gen_rtx_NOT (mode, op1);
+
+  if (complement_op2_p)
+    op2 = gen_rtx_NOT (mode, op2);
+
+  bool_rtx = ((code == NOT)
+	      ? gen_rtx_NOT (mode, op1)
+	      : gen_rtx_fmt_ee (code, mode, op1, op2));
+
+  if (complement_final_p)
+    bool_rtx = gen_rtx_NOT (mode, bool_rtx);
+
+  set_rtx = gen_rtx_SET (VOIDmode, dest, bool_rtx);
+
+  /* Is this AND with an explicit clobber?  */
+  if (clobber_reg)
+    {
+      rtx clobber = gen_rtx_CLOBBER (VOIDmode, clobber_reg);
+      set_rtx = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_rtx, clobber));
+    }
+
+  emit_insn (set_rtx);
+  return;
+}
+
+/* Split a DImode AND/IOR/XOR with a constant on a 32-bit system.  These
+   operations are split immediately during RTL generation to allow for more
+   optimizations of the AND/IOR/XOR.
+
+   OPERANDS is an array containing the destination and two input operands.
+   CODE is the base operation (AND, IOR, XOR, NOT).
+   MODE is the machine mode.
+   If COMPLEMENT_FINAL_P is true, wrap the whole operation with NOT.
+   If COMPLEMENT_OP1_P is true, wrap operand1 with NOT.
+   If COMPLEMENT_OP2_P is true, wrap operand2 with NOT.
+   CLOBBER_REG is either NULL or a scratch register of type CC to allow
+   formation of the AND instructions.  */
+
+static void
+rs6000_split_logical_di (rtx operands[3],
+			 enum rtx_code code,
+			 bool complement_final_p,
+			 bool complement_op1_p,
+			 bool complement_op2_p,
+			 rtx clobber_reg)
+{
+  const HOST_WIDE_INT lower_32bits = HOST_WIDE_INT_C(0xffffffff);
+  const HOST_WIDE_INT upper_32bits = ~ lower_32bits;
+  const HOST_WIDE_INT sign_bit = HOST_WIDE_INT_C(0x80000000);
+  enum hi_lo { hi = 0, lo = 1 };
+  rtx op0_hi_lo[2], op1_hi_lo[2], op2_hi_lo[2];
+  size_t i;
+
+  op0_hi_lo[hi] = gen_highpart (SImode, operands[0]);
+  op1_hi_lo[hi] = gen_highpart (SImode, operands[1]);
+  op0_hi_lo[lo] = gen_lowpart (SImode, operands[0]);
+  op1_hi_lo[lo] = gen_lowpart (SImode, operands[1]);
+
+  if (code == NOT)
+    op2_hi_lo[hi] = op2_hi_lo[lo] = NULL_RTX;
+  else
+    {
+      if (GET_CODE (operands[2]) != CONST_INT)
+	{
+	  op2_hi_lo[hi] = gen_highpart_mode (SImode, DImode, operands[2]);
+	  op2_hi_lo[lo] = gen_lowpart (SImode, operands[2]);
+	}
+      else
+	{
+	  HOST_WIDE_INT value = INTVAL (operands[2]);
+	  HOST_WIDE_INT value_hi_lo[2];
+
+	  gcc_assert (!complement_final_p);
+	  gcc_assert (!complement_op1_p);
+	  gcc_assert (!complement_op2_p);
+
+	  value_hi_lo[hi] = value >> 32;
+	  value_hi_lo[lo] = value & lower_32bits;
+
+	  for (i = 0; i < 2; i++)
+	    {
+	      HOST_WIDE_INT sub_value = value_hi_lo[i];
+
+	      if (sub_value & sign_bit)
+		sub_value |= upper_32bits;
+
+	      op2_hi_lo[i] = GEN_INT (sub_value);
+
+	      /* If this is an AND instruction, check to see if we need to load
+		 the value in a register.  */
+	      if (code == AND && sub_value != -1 && sub_value != 0
+		  && !and_operand (op2_hi_lo[i], SImode))
+		op2_hi_lo[i] = force_reg (SImode, op2_hi_lo[i]);
+	    }
+	}
+    }
+
+  for (i = 0; i < 2; i++)
+    {
+      /* Split large IOR/XOR operations.  */
+      if ((code == IOR || code == XOR)
+	  && GET_CODE (op2_hi_lo[i]) == CONST_INT
+	  && !complement_final_p
+	  && !complement_op1_p
+	  && !complement_op2_p
+	  && clobber_reg == NULL_RTX
+	  && !logical_const_operand (op2_hi_lo[i], SImode))
+	{
+	  HOST_WIDE_INT value = INTVAL (op2_hi_lo[i]);
+	  HOST_WIDE_INT hi_16bits = value & HOST_WIDE_INT_C(0xffff0000);
+	  HOST_WIDE_INT lo_16bits = value & HOST_WIDE_INT_C(0x0000ffff);
+	  rtx tmp = gen_reg_rtx (SImode);
+
+	  /* Make sure the constant is sign extended.  */
+	  if ((hi_16bits & sign_bit) != 0)
+	    hi_16bits |= upper_32bits;
+
+	  rs6000_split_logical_inner (tmp, op1_hi_lo[i], GEN_INT (hi_16bits),
+				      code, SImode, false, false, false,
+				      NULL_RTX);
+
+	  rs6000_split_logical_inner (op0_hi_lo[i], tmp, GEN_INT (lo_16bits),
+				      code, SImode, false, false, false,
+				      NULL_RTX);
+	}
+      else
+	rs6000_split_logical_inner (op0_hi_lo[i], op1_hi_lo[i], op2_hi_lo[i],
+				    code, SImode, complement_final_p,
+				    complement_op1_p, complement_op2_p,
+				    clobber_reg);
+    }
+
+  return;
+}
+
+/* Split the insns that make up boolean operations operating on multiple GPR
+   registers.  The boolean MD patterns ensure that the inputs either are
+   exactly the same as the output registers, or there is no overlap.
+
+   OPERANDS is an array containing the destination and two input operands.
+   CODE is the base operation (AND, IOR, XOR, NOT).
+   MODE is the machine mode.
+   If COMPLEMENT_FINAL_P is true, wrap the whole operation with NOT.
+   If COMPLEMENT_OP1_P is true, wrap operand1 with NOT.
+   If COMPLEMENT_OP2_P is true, wrap operand2 with NOT.
+   CLOBBER_REG is either NULL or a scratch register of type CC to allow
+   formation of the AND instructions.  */
+
+void
+rs6000_split_logical (rtx operands[3],
+		      enum rtx_code code,
+		      bool complement_final_p,
+		      bool complement_op1_p,
+		      bool complement_op2_p,
+		      rtx clobber_reg)
+{
+  enum machine_mode mode = GET_MODE (operands[0]);
+  enum machine_mode sub_mode;
+  rtx op0, op1, op2;
+  int sub_size, regno0, regno1, nregs, i;
+
+  /* If this is DImode, use the specialized version that can run before
+     register allocation.  */
+  if (mode == DImode && !TARGET_POWERPC64)
+    {
+      rs6000_split_logical_di (operands, code, complement_final_p,
+			       complement_op1_p, complement_op2_p,
+			       clobber_reg);
+      return;
+    }
+
+  op0 = operands[0];
+  op1 = operands[1];
+  op2 = (code == NOT) ? NULL_RTX : operands[2];
+  sub_mode = (TARGET_POWERPC64) ? DImode : SImode;
+  sub_size = GET_MODE_SIZE (sub_mode);
+  regno0 = REGNO (op0);
+  regno1 = REGNO (op1);
+
+  gcc_assert (reload_completed);
+  gcc_assert (IN_RANGE (regno0, FIRST_GPR_REGNO, LAST_GPR_REGNO));
+  gcc_assert (IN_RANGE (regno1, FIRST_GPR_REGNO, LAST_GPR_REGNO));
+
+  nregs = rs6000_hard_regno_nregs[(int)mode][regno0];
+  gcc_assert (nregs > 1);
+
+  if (op2 && REG_P (op2))
+    gcc_assert (IN_RANGE (REGNO (op2), FIRST_GPR_REGNO, LAST_GPR_REGNO));
+
+  for (i = 0; i < nregs; i++)
+    {
+      int offset = i * sub_size;
+      rtx sub_op0 = simplify_subreg (sub_mode, op0, mode, offset);
+      rtx sub_op1 = simplify_subreg (sub_mode, op1, mode, offset);
+      rtx sub_op2 = ((code == NOT)
+		     ? NULL_RTX
+		     : simplify_subreg (sub_mode, op2, mode, offset));
+
+      rs6000_split_logical_inner (sub_op0, sub_op1, sub_op2, code, sub_mode,
+				  complement_final_p, complement_op1_p,
+				  complement_op2_p, clobber_reg);
+    }
+
+  return;
+}
+
+\f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 200823)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000)	(working copy)
@@ -36,10 +36,6 @@ (define_mode_iterator VSX_F [V4SF V2DF])
 ;; Iterator for logical types supported by VSX
 (define_mode_iterator VSX_L [V16QI V8HI V4SI V2DI V4SF V2DF TI])
 
-;; Like VSX_L, but don't support TImode for doing logical instructions in
-;; 32-bit
-(define_mode_iterator VSX_L2 [V16QI V8HI V4SI V2DI V4SF V2DF])
-
 ;; Iterator for memory move.  Handle TImode specially to allow
 ;; it to use gprs as well as vsx registers.
 (define_mode_iterator VSX_M [V16QI V8HI V4SI V2DI V4SF V2DF])
@@ -1047,370 +1043,6 @@ (define_insn "*vsx_float_fix_<mode>2"
    (set_attr "fp_type" "<VSfptype_simple>")])
 
 \f
-;; Logical operations.  Do not support TImode logical instructions on 32-bit at
-;; present, because the compiler will see that we have a TImode and when it
-;; wanted DImode, and convert the DImode to TImode, store it on the stack, and
-;; load it in a VSX register or generate extra logical instructions in GPR
-;; registers.
-
-;; When we are splitting the operations to GPRs, we use three alternatives, two
-;; where the first/second inputs and output are in the same register, and the
-;; third where the output specifies an early clobber so that we don't have to
-;; worry about overlapping registers.
-
-(define_insn "*vsx_and<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-        (and:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa")
-		    (match_operand:VSX_L2 2 "vlogical_operand" "wa")))
-   (clobber (match_scratch:CC 3 "X"))]
-  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxland %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_and<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r")
-        (and:VSX_L
-	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r")
-	 (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r")))
-   (clobber (match_scratch:CC 3 "X,X,X,X"))]
-  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxland %x0,%x1,%x2
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(parallel [(set (match_dup 4) (and:DI (match_dup 5) (match_dup 6)))
-	      (clobber (match_dup 3))])
-   (parallel [(set (match_dup 7) (and:DI (match_dup 8) (match_dup 9)))
-	      (clobber (match_dup 3))])]
-{
-  operands[4] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[7] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[9] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two,two")
-   (set_attr "length" "4,8,8,8")])
-
-(define_insn "*vsx_ior<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-        (ior:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa")
-		    (match_operand:VSX_L2 2 "vlogical_operand" "wa")))]
-  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxlor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_ior<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r,?r,&?r")
-        (ior:VSX_L
-	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r,0,r")
-	 (match_operand:VSX_L 2 "vsx_reg_or_cint_operand" "wa,r,0,r,n,n")))]
-  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxlor %x0,%x1,%x2
-   #
-   #
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(const_int 0)]
-{
-  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-
-  if (operands[5] == constm1_rtx)
-    emit_move_insn (operands[3], constm1_rtx);
-
-  else if (operands[5] == const0_rtx)
-    {
-      if (!rtx_equal_p (operands[3], operands[4]))
-	emit_move_insn (operands[3], operands[4]);
-    }
-  else
-    emit_insn (gen_iordi3 (operands[3], operands[4], operands[5]));
-
-  if (operands[8] == constm1_rtx)
-    emit_move_insn (operands[8], constm1_rtx);
-
-  else if (operands[8] == const0_rtx)
-    {
-      if (!rtx_equal_p (operands[6], operands[7]))
-	emit_move_insn (operands[6], operands[7]);
-    }
-  else
-    emit_insn (gen_iordi3 (operands[6], operands[7], operands[8]));
-  DONE;
-}
-  [(set_attr "type" "vecsimple,two,two,two,three,three")
-   (set_attr "length" "4,8,8,8,16,16")])
-
-(define_insn "*vsx_xor<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-        (xor:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa")
-		    (match_operand:VSX_L2 2 "vlogical_operand" "wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode) && !TARGET_POWERPC64"
-  "xxlxor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_xor<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r,?r,&?r")
-        (xor:VSX_L
-	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r,0,r")
-	 (match_operand:VSX_L 2 "vsx_reg_or_cint_operand" "wa,r,0,r,n,n")))]
-  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxlxor %x0,%x1,%x2
-   #
-   #
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(set (match_dup 3) (xor:DI (match_dup 4) (match_dup 5)))
-   (set (match_dup 6) (xor:DI (match_dup 7) (match_dup 8)))]
-{
-  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two,two,three,three")
-   (set_attr "length" "4,8,8,8,16,16")])
-
-(define_insn "*vsx_one_cmpl<mode>2_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-        (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa")))]
-  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxlnor %x0,%x1,%x1"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_one_cmpl<mode>2_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,&?r")
-        (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r")))]
-  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxlnor %x0,%x1,%x1
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(set (match_dup 2) (not:DI (match_dup 3)))
-   (set (match_dup 4) (not:DI (match_dup 5)))]
-{
-  operands[2] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[3] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[5] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two")
-   (set_attr "length" "4,8,8")])
-  
-(define_insn "*vsx_nor<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-	(and:VSX_L2
-	 (not:VSX_L2 (match_operand:VSX_L 1 "vlogical_operand" "%wa"))
-	 (not:VSX_L2 (match_operand:VSX_L 2 "vlogical_operand" "wa"))))]
-  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxlnor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_nor<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r")
-	(and:VSX_L
-	 (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r"))
-	 (not:VSX_L (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r"))))]
-  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxlnor %x0,%x1,%x2
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(set (match_dup 3) (and:DI (not:DI (match_dup 4)) (not:DI (match_dup 5))))
-   (set (match_dup 6) (and:DI (not:DI (match_dup 7)) (not:DI (match_dup 8))))]
-{
-  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two,two")
-   (set_attr "length" "4,8,8,8")])
-
-(define_insn "*vsx_andc<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-        (and:VSX_L2
-	 (not:VSX_L2
-	  (match_operand:VSX_L2 2 "vlogical_operand" "wa"))
-	 (match_operand:VSX_L2 1 "vlogical_operand" "wa")))]
-  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxlandc %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_andc<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r")
-        (and:VSX_L
-	 (not:VSX_L
-	  (match_operand:VSX_L 2 "vlogical_operand" "wa,0,r,r"))
-	 (match_operand:VSX_L 1 "vlogical_operand" "wa,r,0,r")))]
-  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxlandc %x0,%x1,%x2
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(set (match_dup 3) (and:DI (not:DI (match_dup 4)) (match_dup 5)))
-   (set (match_dup 6) (and:DI (not:DI (match_dup 7)) (match_dup 8)))]
-{
-  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two,two")
-   (set_attr "length" "4,8,8,8")])
-
-;; Power8 vector logical instructions.
-(define_insn "*vsx_eqv<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-	(not:VSX_L2
-	 (xor:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa")
-		     (match_operand:VSX_L2 2 "vlogical_operand" "wa"))))]
-  "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxleqv %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_eqv<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r")
-	(not:VSX_L
-	 (xor:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r,r")
-		    (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r"))))]
-  "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxleqv %x0,%x1,%x2
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR
-   && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(set (match_dup 3) (not:DI (xor:DI (match_dup 4) (match_dup 5))))
-   (set (match_dup 6) (not:DI (xor:DI (match_dup 7) (match_dup 8))))]
-{
-  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two,two")
-   (set_attr "length" "4,8,8,8")])
-
-;; Rewrite nand into canonical form
-(define_insn "*vsx_nand<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-	(ior:VSX_L2
-	 (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa"))
-	 (not:VSX_L2 (match_operand:VSX_L2 2 "vlogical_operand" "wa"))))]
-  "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxlnand %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_nand<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "register_operand" "=wa,?r,?r,?r")
-	(ior:VSX_L
-	 (not:VSX_L (match_operand:VSX_L 1 "register_operand" "wa,0,r,r"))
-	 (not:VSX_L (match_operand:VSX_L 2 "register_operand" "wa,r,0,r"))))]
-  "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxlnand %x0,%x1,%x2
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR
-   && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(set (match_dup 3) (ior:DI (not:DI (match_dup 4)) (not:DI (match_dup 5))))
-   (set (match_dup 6) (ior:DI (not:DI (match_dup 7)) (not:DI (match_dup 8))))]
-{
-  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two,two")
-   (set_attr "length" "4,8,8,8")])
-
-;; Rewrite or complement into canonical form, by reversing the arguments
-(define_insn "*vsx_orc<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-	(ior:VSX_L2
-	 (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa"))
-	 (match_operand:VSX_L2 2 "vlogical_operand" "wa")))]
-  "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxlorc %x0,%x2,%x1"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_orc<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r")
-	(ior:VSX_L
-	 (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r,r"))
-	 (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r")))]
-  "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxlorc %x0,%x2,%x1
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR
-   && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(set (match_dup 3) (ior:DI (not:DI (match_dup 4)) (match_dup 5)))
-   (set (match_dup 6) (ior:DI (not:DI (match_dup 7)) (match_dup 8)))]
-{
-  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two,two")
-   (set_attr "length" "4,8,8,8")])
-
-\f
 ;; Permute operations
 
 ;; Build a V2DF/V2DI vector from two scalars
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 200823)
+++ gcc/config/rs6000/altivec.md	(.../gcc/config/rs6000)	(working copy)
@@ -1040,59 +1040,7 @@ (define_insn "vec_widen_smult_odd_v8hi"
   [(set_attr "type" "veccomplex")])
 
 
-;; logical ops.  Have the logical ops follow the memory ops in
-;; terms of whether to prefer VSX or Altivec
-
-;; AND has a clobber to be consistant with VSX, which adds splitters for using
-;; the GPR registers.
-(define_insn "*altivec_and<mode>3"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (and:VM (match_operand:VM 1 "register_operand" "v")
-		(match_operand:VM 2 "register_operand" "v")))
-   (clobber (match_scratch:CC 3 "=X"))]
-  "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
-  "vand %0,%1,%2"
-  [(set_attr "type" "vecsimple")])
-
-(define_insn "*altivec_ior<mode>3"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (ior:VM (match_operand:VM 1 "register_operand" "v")
-		(match_operand:VM 2 "register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
-  "vor %0,%1,%2"
-  [(set_attr "type" "vecsimple")])
-
-(define_insn "*altivec_xor<mode>3"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (xor:VM (match_operand:VM 1 "register_operand" "v")
-		(match_operand:VM 2 "register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
-  "vxor %0,%1,%2"
-  [(set_attr "type" "vecsimple")])
-
-(define_insn "*altivec_one_cmpl<mode>2"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (not:VM (match_operand:VM 1 "register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
-  "vnor %0,%1,%1"
-  [(set_attr "type" "vecsimple")])
-  
-(define_insn "*altivec_nor<mode>3"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-	(and:VM (not:VM (match_operand:VM 1 "register_operand" "v"))
-		(not:VM (match_operand:VM 2 "register_operand" "v"))))]
-  "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
-  "vnor %0,%1,%2"
-  [(set_attr "type" "vecsimple")])
-
-(define_insn "*altivec_andc<mode>3"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (and:VM (not:VM (match_operand:VM 2 "register_operand" "v"))
-		(match_operand:VM 1 "register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
-  "vandc %0,%1,%2"
-  [(set_attr "type" "vecsimple")])
-
+;; Vector pack/unpack
 (define_insn "altivec_vpkpx"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V4SI 1 "register_operand" "v")
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 200823)
+++ gcc/config/rs6000/rs6000.md	(.../gcc/config/rs6000)	(working copy)
@@ -388,6 +388,77 @@ (define_mode_attr E500_CONVERT [(SF "!TA
 
 (define_mode_attr TARGET_FLOAT [(SF "TARGET_SINGLE_FLOAT")
 				(DF "TARGET_DOUBLE_FLOAT")])
+
+;; Mode iterator for logical operations on 128-bit types
+(define_mode_iterator BOOL_128		[TI
+					 PTI
+					 (V16QI	"TARGET_ALTIVEC")
+					 (V8HI	"TARGET_ALTIVEC")
+					 (V4SI	"TARGET_ALTIVEC")
+					 (V4SF	"TARGET_ALTIVEC")
+					 (V2DI	"TARGET_ALTIVEC")
+					 (V2DF	"TARGET_ALTIVEC")])
+
+;; For the GPRs we use 3 constraints for register outputs, two that are the
+;; same as the output register, and a third where the output register is an
+;; early clobber, so we don't have to deal with register overlaps.  For the
+;; vector types, we prefer to use the vector registers.  For TI mode, allow
+;; either.
+
+;; Mode attribute for boolean operation register constraints for output
+(define_mode_attr BOOL_REGS_OUTPUT	[(TI	"&r,r,r,wa,v")
+					 (PTI	"&r,r,r")
+					 (V16QI	"wa,v,&?r,?r,?r")
+					 (V8HI	"wa,v,&?r,?r,?r")
+					 (V4SI	"wa,v,&?r,?r,?r")
+					 (V4SF	"wa,v,&?r,?r,?r")
+					 (V2DI	"wa,v,&?r,?r,?r")
+					 (V2DF	"wa,v,&?r,?r,?r")])
+
+;; Mode attribute for boolean operation register constraints for operand1
+(define_mode_attr BOOL_REGS_OP1		[(TI	"r,0,r,wa,v")
+					 (PTI	"r,0,r")
+					 (V16QI	"wa,v,r,0,r")
+					 (V8HI	"wa,v,r,0,r")
+					 (V4SI	"wa,v,r,0,r")
+					 (V4SF	"wa,v,r,0,r")
+					 (V2DI	"wa,v,r,0,r")
+					 (V2DF	"wa,v,r,0,r")])
+
+;; Mode attribute for boolean operation register constraints for operand2
+(define_mode_attr BOOL_REGS_OP2		[(TI	"r,r,0,wa,v")
+					 (PTI	"r,r,0")
+					 (V16QI	"wa,v,r,r,0")
+					 (V8HI	"wa,v,r,r,0")
+					 (V4SI	"wa,v,r,r,0")
+					 (V4SF	"wa,v,r,r,0")
+					 (V2DI	"wa,v,r,r,0")
+					 (V2DF	"wa,v,r,r,0")])
+
+;; Mode attribute for boolean operation register constraints for operand1
+;; for one_cmpl.  To simplify things, we repeat the constraint where 0
+;; is used for operand1 or operand2
+(define_mode_attr BOOL_REGS_UNARY	[(TI	"r,0,0,wa,v")
+					 (PTI	"r,0,0")
+					 (V16QI	"wa,v,r,0,0")
+					 (V8HI	"wa,v,r,0,0")
+					 (V4SI	"wa,v,r,0,0")
+					 (V4SF	"wa,v,r,0,0")
+					 (V2DI	"wa,v,r,0,0")
+					 (V2DF	"wa,v,r,0,0")])
+
+;; Mode attribute for the clobber of CC0 for AND expansion.
+;; For the 128-bit types, we never do AND immediate, but we need to
+;; get the correct number of X's for the number of operands.
+(define_mode_attr BOOL_REGS_AND_CR0	[(TI	"X,X,X,X,X")
+					 (PTI	"X,X,X")
+					 (V16QI	"X,X,X,X,X")
+					 (V8HI	"X,X,X,X,X")
+					 (V4SI	"X,X,X,X,X")
+					 (V4SF	"X,X,X,X,X")
+					 (V2DI	"X,X,X,X,X")
+					 (V2DF	"X,X,X,X,X")])
+
 \f
 ;; Start with fixed-point load and store insns.  Here we put only the more
 ;; complex forms.  Basic data transfer is done later.
@@ -1837,7 +1908,19 @@ (define_split
     FAIL;
 })
 
-(define_insn "one_cmpl<mode>2"
+(define_expand "one_cmpl<mode>2"
+  [(set (match_operand:SDI 0 "gpc_reg_operand" "")
+	(not:SDI (match_operand:SDI 1 "gpc_reg_operand" "")))]
+  ""
+{
+  if (<MODE>mode == DImode && !TARGET_POWERPC64)
+    {
+      rs6000_split_logical (operands, NOT, false, false, false, NULL_RTX);
+      DONE;
+    }
+})
+
+(define_insn "*one_cmpl<mode>2"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
 	(not:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))]
   ""
@@ -7959,10 +8042,19 @@ (define_expand "anddi3"
   [(parallel
     [(set (match_operand:DI 0 "gpc_reg_operand" "")
 	  (and:DI (match_operand:DI 1 "gpc_reg_operand" "")
-		  (match_operand:DI 2 "and64_2_operand" "")))
+		  (match_operand:DI 2 "reg_or_cint_operand" "")))
      (clobber (match_scratch:CC 3 ""))])]
-  "TARGET_POWERPC64"
-  "")
+  ""
+{
+  if (!TARGET_POWERPC64)
+    {
+      rtx cc = gen_rtx_SCRATCH (CCmode);
+      rs6000_split_logical (operands, AND, false, false, false, cc);
+      DONE;
+    }
+  else if (!and64_2_operand (operands[2], DImode))
+    operands[2] = force_reg (DImode, operands[2]);
+})
 
 (define_insn "anddi3_mc"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r,r,r,r,r")
@@ -8143,11 +8235,17 @@ (define_split
 (define_expand "iordi3"
   [(set (match_operand:DI 0 "gpc_reg_operand" "")
 	(ior:DI (match_operand:DI 1 "gpc_reg_operand" "")
-		(match_operand:DI 2 "reg_or_logical_cint_operand" "")))]
-  "TARGET_POWERPC64"
-  "
+		(match_operand:DI 2 "reg_or_cint_operand" "")))]
+  ""
 {
-  if (non_logical_cint_operand (operands[2], DImode))
+  if (!TARGET_POWERPC64)
+    {
+      rs6000_split_logical (operands, IOR, false, false, false, NULL_RTX);
+      DONE;
+    }
+  else if (!reg_or_logical_cint_operand (operands[2], DImode))
+    operands[2] = force_reg (DImode, operands[2]);
+  else if (non_logical_cint_operand (operands[2], DImode))
     {
       HOST_WIDE_INT value;
       rtx tmp = ((!can_create_pseudo_p ()
@@ -8161,15 +8259,21 @@ (define_expand "iordi3"
       emit_insn (gen_iordi3 (operands[0], tmp, GEN_INT (value & 0xffff)));
       DONE;
     }
-}")
+})
 
 (define_expand "xordi3"
   [(set (match_operand:DI 0 "gpc_reg_operand" "")
 	(xor:DI (match_operand:DI 1 "gpc_reg_operand" "")
-		(match_operand:DI 2 "reg_or_logical_cint_operand" "")))]
-  "TARGET_POWERPC64"
-  "
+		(match_operand:DI 2 "reg_or_cint_operand" "")))]
+  ""
 {
+  if (!TARGET_POWERPC64)
+    {
+      rs6000_split_logical (operands, XOR, false, false, false, NULL_RTX);
+      DONE;
+    }
+  else if (!reg_or_logical_cint_operand (operands[2], DImode))
+    operands[2] = force_reg (DImode, operands[2]);
   if (non_logical_cint_operand (operands[2], DImode))
     {
       HOST_WIDE_INT value;
@@ -8184,7 +8288,7 @@ (define_expand "xordi3"
       emit_insn (gen_xordi3 (operands[0], tmp, GEN_INT (value & 0xffff)));
       DONE;
     }
-}")
+})
 
 (define_insn "*booldi3_internal1"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r,r")
@@ -8422,6 +8526,372 @@ (define_insn "*eqv<mode>3"
    (set_attr "length" "4")])
 
 \f
+;; 128-bit logical operations expanders
+
+(define_expand "and<mode>3"
+  [(parallel [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+		   (and:BOOL_128
+		    (match_operand:BOOL_128 1 "vlogical_operand" "")
+		    (match_operand:BOOL_128 2 "vlogical_operand" "")))
+	      (clobber (match_scratch:CC 3 ""))])]
+  ""
+  "")
+
+(define_expand "ior<mode>3"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+        (ior:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" "")
+		      (match_operand:BOOL_128 2 "vlogical_operand" "")))]
+  ""
+  "")
+
+(define_expand "xor<mode>3"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+        (xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" "")
+		      (match_operand:BOOL_128 2 "vlogical_operand" "")))]
+  ""
+  "")
+
+(define_expand "one_cmpl<mode>2"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+        (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" "")))]
+  ""
+  "")
+
+(define_expand "nor<mode>3"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+	(and:BOOL_128
+	 (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" ""))
+	 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand" ""))))]
+  ""
+  "")
+
+(define_expand "andc<mode>3"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+        (and:BOOL_128
+	 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand" ""))
+	 (match_operand:BOOL_128 1 "vlogical_operand" "")))]
+  ""
+  "")
+
+;; Power8 vector logical instructions.
+(define_expand "eqv<mode>3"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+	(not:BOOL_128
+	 (xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" "")
+		       (match_operand:BOOL_128 2 "vlogical_operand" ""))))]
+  "<MODE>mode == TImode || <MODE>mode == PTImode || TARGET_P8_VECTOR"
+  "")
+
+;; Rewrite nand into canonical form
+(define_expand "nand<mode>3"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+	(ior:BOOL_128
+	 (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" ""))
+	 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand" ""))))]
+  "<MODE>mode == TImode || <MODE>mode == PTImode || TARGET_P8_VECTOR"
+  "")
+
+;; The canonical form is to have the negated elment first, so we need to
+;; reverse arguments.
+(define_expand "orc<mode>3"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+	(ior:BOOL_128
+	 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand" ""))
+	 (match_operand:BOOL_128 1 "vlogical_operand" "")))]
+  "<MODE>mode == TImode || <MODE>mode == PTImode || TARGET_P8_VECTOR"
+  "")
+
+;; 128-bit logical operations insns and split operations
+(define_insn_and_split "*and<mode>3_internal"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
+        (and:BOOL_128
+	 (match_operand:BOOL_128 1 "vlogical_operand" "%<BOOL_REGS_OP1>")
+	 (match_operand:BOOL_128 2 "vlogical_operand" "<BOOL_REGS_OP2>")))
+   (clobber (match_scratch:CC 3 "<BOOL_REGS_AND_CR0>"))]
+  ""
+{
+  if (TARGET_VSX && vsx_register_operand (operands[0], <MODE>mode))
+    return "xxland %x0,%x1,%x2";
+
+  if (TARGET_ALTIVEC && altivec_register_operand (operands[0], <MODE>mode))
+    return "vand %0,%1,%2";
+
+  return "#";
+}
+  "reload_completed && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, AND, false, false, false, operands[3]);
+  DONE;
+}
+  [(set (attr "type")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "vecsimple")
+	(const_string "integer")))
+   (set (attr "length")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "4")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16"))))])
+
+;; 128-bit IOR/XOR
+(define_insn_and_split "*bool<mode>3_internal"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
+	(match_operator:BOOL_128 3 "boolean_or_operator"
+	 [(match_operand:BOOL_128 1 "vlogical_operand" "%<BOOL_REGS_OP1>")
+	  (match_operand:BOOL_128 2 "vlogical_operand" "<BOOL_REGS_OP2>")]))]
+  ""
+{
+  if (TARGET_VSX && vsx_register_operand (operands[0], <MODE>mode))
+    return "xxl%q3 %x0,%x1,%x2";
+
+  if (TARGET_ALTIVEC && altivec_register_operand (operands[0], <MODE>mode))
+    return "v%q3 %0,%1,%2";
+
+  return "#";
+}
+  "reload_completed && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, GET_CODE (operands[3]), false, false, false,
+			NULL_RTX);
+  DONE;
+}
+  [(set (attr "type")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "vecsimple")
+	(const_string "integer")))
+   (set (attr "length")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "4")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16"))))])
+
+;; 128-bit ANDC/ORC
+(define_insn_and_split "*boolc<mode>3_internal1"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
+	(match_operator:BOOL_128 3 "boolean_operator"
+	 [(not:BOOL_128
+	   (match_operand:BOOL_128 2 "vlogical_operand" "<BOOL_REGS_OP1>"))
+	  (match_operand:BOOL_128 1 "vlogical_operand" "<BOOL_REGS_OP2>")]))]
+  "TARGET_P8_VECTOR || (GET_CODE (operands[3]) == AND)"
+{
+  if (TARGET_VSX && vsx_register_operand (operands[0], <MODE>mode))
+    return "xxl%q3 %x0,%x1,%x2";
+
+  if (TARGET_ALTIVEC && altivec_register_operand (operands[0], <MODE>mode))
+    return "v%q3 %0,%1,%2";
+
+  return "#";
+}
+  "(TARGET_P8_VECTOR || (GET_CODE (operands[3]) == AND))
+   && reload_completed && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, GET_CODE (operands[3]), false, true, false,
+			NULL_RTX);
+  DONE;
+}
+  [(set (attr "type")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "vecsimple")
+	(const_string "integer")))
+   (set (attr "length")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "4")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16"))))])
+
+(define_insn_and_split "*boolc<mode>3_internal2"
+  [(set (match_operand:TI2 0 "int_reg_operand" "=&r,r,r")
+	(match_operator:TI2 3 "boolean_operator"
+	 [(not:TI2
+	   (match_operand:TI2 1 "int_reg_operand" "r,0,r"))
+	  (match_operand:TI2 2 "int_reg_operand" "r,r,0")]))]
+  "!TARGET_P8_VECTOR && (GET_CODE (operands[3]) != AND)"
+  "#"
+  "reload_completed && !TARGET_P8_VECTOR && (GET_CODE (operands[3]) != AND)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, GET_CODE (operands[3]), false, true, false,
+			NULL_RTX);
+  DONE;
+}
+  [(set_attr "type" "integer")
+   (set (attr "length")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16")))])
+
+;; 128-bit NAND/NOR
+(define_insn_and_split "*boolcc<mode>3_internal1"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
+	(match_operator:BOOL_128 3 "boolean_operator"
+	 [(not:BOOL_128
+	   (match_operand:BOOL_128 1 "vlogical_operand" "<BOOL_REGS_OP1>"))
+	  (not:BOOL_128
+	   (match_operand:BOOL_128 2 "vlogical_operand" "<BOOL_REGS_OP2>"))]))]
+  "TARGET_P8_VECTOR || (GET_CODE (operands[3]) == AND)"
+{
+  if (TARGET_VSX && vsx_register_operand (operands[0], <MODE>mode))
+    return "xxl%q3 %x0,%x1,%x2";
+
+  if (TARGET_ALTIVEC && altivec_register_operand (operands[0], <MODE>mode))
+    return "v%q3 %0,%1,%2";
+
+  return "#";
+}
+  "(TARGET_P8_VECTOR || (GET_CODE (operands[3]) == AND))
+   && reload_completed && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, GET_CODE (operands[3]), false, true, true,
+			NULL_RTX);
+  DONE;
+}
+  [(set (attr "type")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "vecsimple")
+	(const_string "integer")))
+   (set (attr "length")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "4")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16"))))])
+
+(define_insn_and_split "*boolcc<mode>3_internal2"
+  [(set (match_operand:TI2 0 "int_reg_operand" "=&r,r,r")
+	(match_operator:TI2 3 "boolean_operator"
+	 [(not:TI2
+	   (match_operand:TI2 1 "int_reg_operand" "r,0,r"))
+	  (not:TI2
+	   (match_operand:TI2 2 "int_reg_operand" "r,r,0"))]))]
+  "!TARGET_P8_VECTOR && (GET_CODE (operands[3]) != AND)"
+  "#"
+  "reload_completed && !TARGET_P8_VECTOR && (GET_CODE (operands[3]) != AND)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, GET_CODE (operands[3]), false, true, true,
+			NULL_RTX);
+  DONE;
+}
+  [(set_attr "type" "integer")
+   (set (attr "length")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16")))])
+
+
+;; 128-bit EQV
+(define_insn_and_split "*eqv<mode>3_internal1"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
+	(not:BOOL_128
+	 (xor:BOOL_128
+	  (match_operand:BOOL_128 1 "vlogical_operand" "<BOOL_REGS_OP1>")
+	  (match_operand:BOOL_128 2 "vlogical_operand" "<BOOL_REGS_OP2>"))))]
+  "TARGET_P8_VECTOR"
+{
+  if (vsx_register_operand (operands[0], <MODE>mode))
+    return "xxleqv %x0,%x1,%x2";
+
+  return "#";
+}
+  "TARGET_P8_VECTOR && reload_completed
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, XOR, true, false, false, NULL_RTX);
+  DONE;
+}
+  [(set (attr "type")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "vecsimple")
+	(const_string "integer")))
+   (set (attr "length")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "4")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16"))))])
+
+(define_insn_and_split "*eqv<mode>3_internal2"
+  [(set (match_operand:TI2 0 "int_reg_operand" "=&r,r,r")
+	(not:TI2
+	 (xor:TI2
+	  (match_operand:TI2 1 "int_reg_operand" "r,0,r")
+	  (match_operand:TI2 2 "int_reg_operand" "r,r,0"))))]
+  "!TARGET_P8_VECTOR"
+  "#"
+  "reload_completed && !TARGET_P8_VECTOR"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, XOR, true, false, false, NULL_RTX);
+  DONE;
+}
+  [(set_attr "type" "integer")
+   (set (attr "length")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16")))])
+
+;; 128-bit one's complement
+(define_insn_and_split "*one_cmpl<mode>3_internal"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
+	(not:BOOL_128
+	  (match_operand:BOOL_128 1 "vlogical_operand" "<BOOL_REGS_UNARY>")))]
+  ""
+{
+  if (TARGET_VSX && vsx_register_operand (operands[0], <MODE>mode))
+    return "xxlnor %x0,%x1,%x1";
+
+  if (TARGET_ALTIVEC && altivec_register_operand (operands[0], <MODE>mode))
+    return "vnor %0,%1,%1";
+
+  return "#";
+}
+  "reload_completed && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, NOT, false, false, false, NULL_RTX);
+  DONE;
+}
+  [(set (attr "type")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "vecsimple")
+	(const_string "integer")))
+   (set (attr "length")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "4")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16"))))])
+
+\f
 ;; Now define ways of moving data around.
 
 ;; Set up a register with a value from the GOT table

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
  2013-07-15 21:48           ` Michael Meissner
@ 2013-07-20 19:12             ` David Edelsohn
  2013-07-23 21:24               ` Michael Meissner
  0 siblings, 1 reply; 52+ messages in thread
From: David Edelsohn @ 2013-07-20 19:12 UTC (permalink / raw)
  To: Michael Meissner, David Edelsohn, GCC Patches, Pat Haugen, Peter Bergner

On Mon, Jul 15, 2013 at 5:43 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:

> Are these patches ok to install?
>
> 2013-07-15  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * config/rs6000/vector.md (xor<mode>3): Move 128-bit boolean
>         expanders to rs6000.md.
>         (ior<mode>3): Likewise.
>         (and<mode>3): Likewise.
>         (one_cmpl<mode>2): Likewise.
>         (nor<mode>3): Likewise.
>         (andc<mode>3): Likewise.
>         (eqv<mode>3): Likewise.
>         (nand<mode>3): Likewise.
>         (orc<mode>3): Likewise.
>
>         * config/rs6000/vsx.md (VSX_L2): Delete, no longer used.
>         (vsx_and<mode>3_32bit): Move 128-bit logical insns to rs6000.md,
>         and allow TImode operations in 32-bit.
>         (vsx_and<mode>3_64bit): Likewise.
>         (vsx_ior<mode>3_32bit): Likewise.
>         (vsx_ior<mode>3_64bit): Likewise.
>         (vsx_xor<mode>3_32bit): Likewise.
>         (vsx_xor<mode>3_64bit): Likewise.
>         (vsx_one_cmpl<mode>2_32bit): Likewise.
>         (vsx_one_cmpl<mode>2_64bit): Likewise.
>         (vsx_nor<mode>3_32bit): Likewise.
>         (vsx_nor<mode>3_64bit): Likewise.
>         (vsx_andc<mode>3_32bit): Likewise.
>         (vsx_andc<mode>3_64bit): Likewise.
>         (vsx_eqv<mode>3_32bit): Likewise.
>         (vsx_eqv<mode>3_64bit): Likewise.
>         (vsx_nand<mode>3_32bit): Likewise.
>         (vsx_nand<mode>3_64bit): Likewise.
>         (vsx_orc<mode>3_32bit): Likewise.
>         (vsx_orc<mode>3_64bit): Likewise.
>
>         * config/rs6000/altivec.md (altivec_and): Move 128-bit logical
>         insns to rs6000.md, and allow TImode operations in 32-bit.
>         (altivec_ior<mode>3): Likewise.
>         (altivec_xor<mode>3): Likewise.
>         (altivec_one_cmpl<mode>2): Likewise.
>         (altivec_nor<mode>3): Likewise.
>         (altivec_andc<mode>3): Likewise.
>
>         * config/rs6000/rs6000.md (BOOL_128): New mode iterators and mode
>         attributes for moving the 128-bit logical operations into
>         rs6000.md.
>         (BOOL_REGS_OUTPUT): Likewise.
>         (BOOL_REGS_OP1): Likewise.
>         (BOOL_REGS_OP2): Likewise.
>         (BOOL_REGS_UNARY): Likewise.
>         (BOOL_REGS_AND_CR0): Likewise.
>         (one_cmpl<mode>2): Add support for DI logical operations on
>         32-bit, splitting the operations to 32-bit.
>         (anddi3): Likewise.
>         (iordi3): Likewise.
>         (xordi3): Likewise.
>         (and<mode>3, 128-bit types): Rewrite 2013-06-06 logical operator
>         changes to combine the 32/64-bit code, allow logical operations on
>         TI mode in 32-bit, and to use similar match_operator patterns like
>         scalar mode uses.  Combine the Altivec and VSX code for logical
>         operations, and move it here.
>         (ior<mode>3, 128-bit types): Likewise.
>         (xor<mode>3, 128-bit types): Likewise.
>         (one_cmpl<mode>3, 128-bit types): Likewise.
>         (nor<mode>3, 128-bit types): Likewise.
>         (andc<mode>3, 128-bit types): Likewise.
>         (eqv<mode>3, 128-bit types): Likewise.
>         (nand<mode>3, 128-bit types): Likewise.
>         (orc<mode>3, 128-bit types): Likewise.
>         (and<mode>3_internal): Likewise.
>         (bool<mode>3_internal): Likewise.
>         (boolc<mode>3_internal1): Likewise.
>         (boolc<mode>3_internal2): Likewise.
>         (boolcc<mode>3_internal1): Likewise.
>         (boolcc<mode>3_internal2): Likewise.
>         (eqv<mode>3_internal1): Likewise.
>         (eqv<mode>3_internal2): Likewise.
>         (one_cmpl1<mode>3_internal): Likewise.
>
>         * config/rs6000/rs6000-protos.h (rs6000_split_logical): New
>         declaration.
>
>         * config/rs6000/rs6000.c (rs6000_split_logical_inner): Add support
>         to split multi-word logical operations.
>         (rs6000_split_logical_di): Likewise.
>         (rs6000_split_logical): Likewise.

This patch is okay.  But two things:

Spelling mistake: "elment" should be "element":

+;; The canonical form is to have the negated elment first, so we need to
+;; reverse arguments.
+(define_expand "orc<mode>3"

And this patch needs a number of new testcases covering the logical
operations on GPRs, VRs, and VSRs; see the sketch below.
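
Something along these lines would do -- a minimal sketch using the GNU
vector extension (the TYPE macro and function names here are
illustrative, not the tests that were eventually committed):

/* Sketch of a logical-operation testcase.  TYPE would be redefined per
   target variant: e.g. vector long long to exercise the VSX/Altivec
   insns, or __int128_t to force the GPR splitters on 64-bit.  */
#ifndef TYPE
#define TYPE vector long long
#endif

TYPE op_and  (TYPE a, TYPE b) { return a & b; }     /* and, xxland, vand */
TYPE op_ior  (TYPE a, TYPE b) { return a | b; }     /* or, xxlor, vor    */
TYPE op_xor  (TYPE a, TYPE b) { return a ^ b; }     /* xor, xxlxor, vxor */
TYPE op_not  (TYPE a)         { return ~a; }        /* nor, xxlnor, vnor */
TYPE op_nor  (TYPE a, TYPE b) { return ~(a | b); }  /* nor, xxlnor, vnor */
TYPE op_andc (TYPE a, TYPE b) { return a & ~b; }    /* andc, xxlandc     */
TYPE op_eqv  (TYPE a, TYPE b) { return ~(a ^ b); }  /* xxleqv on power8  */
TYPE op_nand (TYPE a, TYPE b) { return ~(a & b); }  /* xxlnand on power8 */
TYPE op_orc  (TYPE a, TYPE b) { return a | ~b; }    /* xxlorc on power8  */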

Thanks, David


* Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
  2013-07-20 19:12             ` David Edelsohn
@ 2013-07-23 21:24               ` Michael Meissner
  0 siblings, 0 replies; 52+ messages in thread
From: Michael Meissner @ 2013-07-23 21:24 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

[-- Attachment #1: Type: text/plain, Size: 4214 bytes --]

This is the patch I committed.  It turns out I forgot to change
VLOGICAL_REGNO_P in the last patch to allow 128-bit types to use GPRs on
32-bit.  So I incorporated that fix into this patch, along with the
spelling correction, and added more test cases for the power5, Altivec,
power7, and power8 architectures.
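
As a concrete illustration of what the new rs6000_split_logical code does
(my sketch, not part of the patch itself): on a 32-bit target, a DImode
logical operation is now split at expand time, so the two word-sized
halves can be simplified independently:

/* Sketch only.  Compiled with -m32, the AND below is split by
   rs6000_split_logical_di into two SImode operations; the high word is
   ANDed with 0, which rs6000_split_logical_inner folds into loading 0,
   and the low word is ANDed with -1, which folds into a register move.  */
unsigned long long
keep_low_word (unsigned long long x)
{
  return x & 0xffffffffULL;
}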

[gcc]
2013-07-23  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vector.md (xor<mode>3): Move 128-bit boolean
	expanders to rs6000.md.
	(ior<mode>3): Likewise.
	(and<mode>3): Likewise.
	(one_cmpl<mode>2): Likewise.
	(nor<mode>3): Likewise.
	(andc<mode>3): Likewise.
	(eqv<mode>3): Likewise.
	(nand<mode>3): Likewise.
	(orc<mode>3): Likewise.

	* config/rs6000/rs6000-protos.h (rs6000_split_logical): New
	declaration.

	* config/rs6000/rs6000.c (rs6000_split_logical_inner): Add support
	to split multi-word logical operations.
	(rs6000_split_logical_di): Likewise.
	(rs6000_split_logical): Likewise.

	* config/rs6000/vsx.md (VSX_L2): Delete, no longer used.
	(vsx_and<mode>3_32bit): Move 128-bit logical insns to rs6000.md,
	and allow TImode operations in 32-bit.
	(vsx_and<mode>3_64bit): Likewise.
	(vsx_ior<mode>3_32bit): Likewise.
	(vsx_ior<mode>3_64bit): Likewise.
	(vsx_xor<mode>3_32bit): Likewise.
	(vsx_xor<mode>3_64bit): Likewise.
	(vsx_one_cmpl<mode>2_32bit): Likewise.
	(vsx_one_cmpl<mode>2_64bit): Likewise.
	(vsx_nor<mode>3_32bit): Likewise.
	(vsx_nor<mode>3_64bit): Likewise.
	(vsx_andc<mode>3_32bit): Likewise.
	(vsx_andc<mode>3_64bit): Likewise.
	(vsx_eqv<mode>3_32bit): Likewise.
	(vsx_eqv<mode>3_64bit): Likewise.
	(vsx_nand<mode>3_32bit): Likewise.
	(vsx_nand<mode>3_64bit): Likewise.
	(vsx_orc<mode>3_32bit): Likewise.
	(vsx_orc<mode>3_64bit): Likewise.

	* config/rs6000/rs6000.h (VLOGICAL_REGNO_P): Always allow vector
	logical types in GPRs.

	* config/rs6000/altivec.md (altivec_and<mode>3): Move 128-bit
	logical insns to rs6000.md, and allow TImode operations in
	32-bit.
	(altivec_ior<mode>3): Likewise.
	(altivec_xor<mode>3): Likewise.
	(altivec_one_cmpl<mode>2): Likewise.
	(altivec_nor<mode>3): Likewise.
	(altivec_andc<mode>3): Likewise.

	* config/rs6000/rs6000.md (BOOL_128): New mode iterators and mode
	attributes for moving the 128-bit logical operations into
	rs6000.md.
	(BOOL_REGS_OUTPUT): Likewise.
	(BOOL_REGS_OP1): Likewise.
	(BOOL_REGS_OP2): Likewise.
	(BOOL_REGS_UNARY): Likewise.
	(BOOL_REGS_AND_CR0): Likewise.
	(one_cmpl<mode>2): Add support for DI logical operations on
	32-bit, splitting the operations to 32-bit.
	(anddi3): Likewise.
	(iordi3): Likewise.
	(xordi3): Likewise.
	(and<mode>3, 128-bit types): Rewrite the 2013-06-06 logical
	operator changes to combine the 32/64-bit code, allow logical
	operations on TImode in 32-bit, and use match_operator patterns
	similar to those used by the scalar modes.  Combine the Altivec
	and VSX code for logical operations, and move it here.
	(ior<mode>3, 128-bit types): Likewise.
	(xor<mode>3, 128-bit types): Likewise.
	(one_cmpl<mode>3, 128-bit types): Likewise.
	(nor<mode>3, 128-bit types): Likewise.
	(andc<mode>3, 128-bit types): Likewise.
	(eqv<mode>3, 128-bit types): Likewise.
	(nand<mode>3, 128-bit types): Likewise.
	(orc<mode>3, 128-bit types): Likewise.
	(and<mode>3_internal): Likewise.
	(bool<mode>3_internal): Likewise.
	(boolc<mode>3_internal1): Likewise.
	(boolc<mode>3_internal2): Likewise.
	(boolcc<mode>3_internal1): Likewise.
	(boolcc<mode>3_internal2): Likewise.
	(eqv<mode>3_internal1): Likewise.
	(eqv<mode>3_internal2): Likewise.
	(one_cmpl<mode>3_internal): Likewise.

[gcc/testsuite]
2013-07-23  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/bool2.h: New file, test the code generation
	of logical operations for power5, altivec, power7, and power8
	systems.
	* gcc.target/powerpc/bool2-p5.c: Likewise.
	* gcc.target/powerpc/bool2-av.c: Likewise.
	* gcc.target/powerpc/bool2-p7.c: Likewise.
	* gcc.target/powerpc/bool2-p8.c: Likewise.
	* gcc.target/powerpc/bool3.h: Likewise.
	* gcc.target/powerpc/bool3-av.c: Likewise.
	* gcc.target/powerpc/bool3-p7.c: Likewise.
	* gcc.target/powerpc/bool3-p8.c: Likewise.



-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.official-10b --]
[-- Type: text/plain, Size: 69181 bytes --]

Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 201180)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -710,87 +710,6 @@ (define_expand "cr6_test_for_lt_reverse"
   "")
 
 \f
-;; Vector logical instructions
-;; Do not support TImode logical instructions on 32-bit at present, because the
-;; compiler will see that we have a TImode and when it wanted DImode, and
-;; convert the DImode to TImode, store it on the stack, and load it in a VSX
-;; register.
-(define_expand "xor<mode>3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (xor:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
-		   (match_operand:VEC_L 2 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "ior<mode>3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (ior:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
-		   (match_operand:VEC_L 2 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "and<mode>3"
-  [(parallel [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-		   (and:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
-			      (match_operand:VEC_L 2 "vlogical_operand" "")))
-	      (clobber (match_scratch:CC 3 ""))])]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "one_cmpl<mode>2"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (not:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "nor<mode>3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-	(and:VEC_L (not:VEC_L (match_operand:VEC_L 1 "vlogical_operand" ""))
-		   (not:VEC_L (match_operand:VEC_L 2 "vlogical_operand" ""))))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
-  "")
-
-(define_expand "andc<mode>3"
-  [(set (match_operand:VEC_L 0 "vlogical_operand" "")
-        (and:VEC_L (not:VEC_L (match_operand:VEC_L 2 "vlogical_operand" ""))
-		   (match_operand:VEC_L 1 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)"
-  "")
-
-;; Power8 vector logical instructions.
-(define_expand "eqv<mode>3"
-  [(set (match_operand:VEC_L 0 "register_operand" "")
-	(not:VEC_L
-	 (xor:VEC_L (match_operand:VEC_L 1 "register_operand" "")
-		    (match_operand:VEC_L 2 "register_operand" ""))))]
-  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)")
-
-;; Rewrite nand into canonical form
-(define_expand "nand<mode>3"
-  [(set (match_operand:VEC_L 0 "register_operand" "")
-	(ior:VEC_L
-	 (not:VEC_L (match_operand:VEC_L 1 "register_operand" ""))
-	 (not:VEC_L (match_operand:VEC_L 2 "register_operand" ""))))]
-  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)")
-
-;; The canonical form is to have the negated elment first, so we need to
-;; reverse arguments.
-(define_expand "orc<mode>3"
-  [(set (match_operand:VEC_L 0 "register_operand" "")
-	(ior:VEC_L
-	 (not:VEC_L (match_operand:VEC_L 1 "register_operand" ""))
-	 (match_operand:VEC_L 2 "register_operand" "")))]
-  "TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)
-   && (<MODE>mode != TImode || TARGET_POWERPC64)")
-
 ;; Vector count leading zeros
 (define_expand "clz<mode>2"
   [(set (match_operand:VEC_I 0 "register_operand" "")
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 201180)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -138,6 +138,7 @@ extern rtx rs6000_address_for_fpconvert 
 extern rtx rs6000_address_for_altivec (rtx);
 extern rtx rs6000_allocate_stack_temp (enum machine_mode, bool, bool);
 extern int rs6000_loop_align (rtx);
+extern void rs6000_split_logical (rtx [], enum rtx_code, bool, bool, bool, rtx);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 201180)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -30139,6 +30139,280 @@ rs6000_set_up_by_prologue (struct hard_r
     add_to_hard_reg_set (&set->set, Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);
 }
 
+\f
+/* Helper function for rs6000_split_logical to emit a logical instruction after
+   splitting the operation into single GPR registers.
+
+   DEST is the destination register.
+   OP1 and OP2 are the input source registers.
+   CODE is the base operation (AND, IOR, XOR, NOT).
+   MODE is the machine mode.
+   If COMPLEMENT_FINAL_P is true, wrap the whole operation with NOT.
+   If COMPLEMENT_OP1_P is true, wrap operand1 with NOT.
+   If COMPLEMENT_OP2_P is true, wrap operand2 with NOT.
+   CLOBBER_REG is either NULL or a scratch register of type CC to allow
+   formation of the AND instructions.  */
+
+static void
+rs6000_split_logical_inner (rtx dest,
+			    rtx op1,
+			    rtx op2,
+			    enum rtx_code code,
+			    enum machine_mode mode,
+			    bool complement_final_p,
+			    bool complement_op1_p,
+			    bool complement_op2_p,
+			    rtx clobber_reg)
+{
+  rtx bool_rtx;
+  rtx set_rtx;
+
+  /* Optimize AND of 0/0xffffffff and IOR/XOR of 0.  */
+  if (op2 && GET_CODE (op2) == CONST_INT
+      && (mode == SImode || (mode == DImode && TARGET_POWERPC64))
+      && !complement_final_p && !complement_op1_p && !complement_op2_p)
+    {
+      HOST_WIDE_INT mask = GET_MODE_MASK (mode);
+      HOST_WIDE_INT value = INTVAL (op2) & mask;
+
+      /* Optimize AND of 0 to just set 0.  Optimize AND of -1 to be a move.  */
+      if (code == AND)
+	{
+	  if (value == 0)
+	    {
+	      emit_insn (gen_rtx_SET (VOIDmode, dest, const0_rtx));
+	      return;
+	    }
+
+	  else if (value == mask)
+	    {
+	      if (!rtx_equal_p (dest, op1))
+		emit_insn (gen_rtx_SET (VOIDmode, dest, op1));
+	      return;
+	    }
+	}
+
+      /* Optimize IOR/XOR of 0 to be a simple move.  Split large operations
+	 into separate ORI/ORIS or XORI/XORIS instructions.  */
+      else if (code == IOR || code == XOR)
+	{
+	  if (value == 0)
+	    {
+	      if (!rtx_equal_p (dest, op1))
+		emit_insn (gen_rtx_SET (VOIDmode, dest, op1));
+	      return;
+	    }
+	}
+    }
+
+  if (complement_op1_p)
+    op1 = gen_rtx_NOT (mode, op1);
+
+  if (complement_op2_p)
+    op2 = gen_rtx_NOT (mode, op2);
+
+  bool_rtx = ((code == NOT)
+	      ? gen_rtx_NOT (mode, op1)
+	      : gen_rtx_fmt_ee (code, mode, op1, op2));
+
+  if (complement_final_p)
+    bool_rtx = gen_rtx_NOT (mode, bool_rtx);
+
+  set_rtx = gen_rtx_SET (VOIDmode, dest, bool_rtx);
+
+  /* Is this AND with an explicit clobber?  */
+  if (clobber_reg)
+    {
+      rtx clobber = gen_rtx_CLOBBER (VOIDmode, clobber_reg);
+      set_rtx = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_rtx, clobber));
+    }
+
+  emit_insn (set_rtx);
+  return;
+}
+
+/* Split a DImode AND/IOR/XOR with a constant on a 32-bit system.  These
+   operations are split immediately during RTL generation to allow for more
+   optimizations of the AND/IOR/XOR.
+
+   OPERANDS is an array containing the destination and two input operands.
+   CODE is the base operation (AND, IOR, XOR, NOT).
+   MODE is the machine mode.
+   If COMPLEMENT_FINAL_P is true, wrap the whole operation with NOT.
+   If COMPLEMENT_OP1_P is true, wrap operand1 with NOT.
+   If COMPLEMENT_OP2_P is true, wrap operand2 with NOT.
+   CLOBBER_REG is either NULL or a scratch register of type CC to allow
+   formation of the AND instructions.  */
+
+static void
+rs6000_split_logical_di (rtx operands[3],
+			 enum rtx_code code,
+			 bool complement_final_p,
+			 bool complement_op1_p,
+			 bool complement_op2_p,
+			 rtx clobber_reg)
+{
+  const HOST_WIDE_INT lower_32bits = HOST_WIDE_INT_C(0xffffffff);
+  const HOST_WIDE_INT upper_32bits = ~ lower_32bits;
+  const HOST_WIDE_INT sign_bit = HOST_WIDE_INT_C(0x80000000);
+  enum hi_lo { hi = 0, lo = 1 };
+  rtx op0_hi_lo[2], op1_hi_lo[2], op2_hi_lo[2];
+  size_t i;
+
+  op0_hi_lo[hi] = gen_highpart (SImode, operands[0]);
+  op1_hi_lo[hi] = gen_highpart (SImode, operands[1]);
+  op0_hi_lo[lo] = gen_lowpart (SImode, operands[0]);
+  op1_hi_lo[lo] = gen_lowpart (SImode, operands[1]);
+
+  if (code == NOT)
+    op2_hi_lo[hi] = op2_hi_lo[lo] = NULL_RTX;
+  else
+    {
+      if (GET_CODE (operands[2]) != CONST_INT)
+	{
+	  op2_hi_lo[hi] = gen_highpart_mode (SImode, DImode, operands[2]);
+	  op2_hi_lo[lo] = gen_lowpart (SImode, operands[2]);
+	}
+      else
+	{
+	  HOST_WIDE_INT value = INTVAL (operands[2]);
+	  HOST_WIDE_INT value_hi_lo[2];
+
+	  gcc_assert (!complement_final_p);
+	  gcc_assert (!complement_op1_p);
+	  gcc_assert (!complement_op2_p);
+
+	  value_hi_lo[hi] = value >> 32;
+	  value_hi_lo[lo] = value & lower_32bits;
+
+	  for (i = 0; i < 2; i++)
+	    {
+	      HOST_WIDE_INT sub_value = value_hi_lo[i];
+
+	      if (sub_value & sign_bit)
+		sub_value |= upper_32bits;
+
+	      op2_hi_lo[i] = GEN_INT (sub_value);
+
+	      /* If this is an AND instruction, check to see if we need to load
+		 the value in a register.  */
+	      if (code == AND && sub_value != -1 && sub_value != 0
+		  && !and_operand (op2_hi_lo[i], SImode))
+		op2_hi_lo[i] = force_reg (SImode, op2_hi_lo[i]);
+	    }
+	}
+    }
+
+  for (i = 0; i < 2; i++)
+    {
+      /* Split large IOR/XOR operations.  */
+      if ((code == IOR || code == XOR)
+	  && GET_CODE (op2_hi_lo[i]) == CONST_INT
+	  && !complement_final_p
+	  && !complement_op1_p
+	  && !complement_op2_p
+	  && clobber_reg == NULL_RTX
+	  && !logical_const_operand (op2_hi_lo[i], SImode))
+	{
+	  HOST_WIDE_INT value = INTVAL (op2_hi_lo[i]);
+	  HOST_WIDE_INT hi_16bits = value & HOST_WIDE_INT_C(0xffff0000);
+	  HOST_WIDE_INT lo_16bits = value & HOST_WIDE_INT_C(0x0000ffff);
+	  rtx tmp = gen_reg_rtx (SImode);
+
+	  /* Make sure the constant is sign extended.  */
+	  if ((hi_16bits & sign_bit) != 0)
+	    hi_16bits |= upper_32bits;
+
+	  rs6000_split_logical_inner (tmp, op1_hi_lo[i], GEN_INT (hi_16bits),
+				      code, SImode, false, false, false,
+				      NULL_RTX);
+
+	  rs6000_split_logical_inner (op0_hi_lo[i], tmp, GEN_INT (lo_16bits),
+				      code, SImode, false, false, false,
+				      NULL_RTX);
+	}
+      else
+	rs6000_split_logical_inner (op0_hi_lo[i], op1_hi_lo[i], op2_hi_lo[i],
+				    code, SImode, complement_final_p,
+				    complement_op1_p, complement_op2_p,
+				    clobber_reg);
+    }
+
+  return;
+}
+
+/* Split the insns that make up boolean operations operating on multiple GPR
+   registers.  The boolean MD patterns ensure that the inputs either are
+   exactly the same as the output registers, or there is no overlap.
+
+   OPERANDS is an array containing the destination and two input operands.
+   CODE is the base operation (AND, IOR, XOR, NOT).
+   MODE is the machine mode.
+   If COMPLEMENT_FINAL_P is true, wrap the whole operation with NOT.
+   If COMPLEMENT_OP1_P is true, wrap operand1 with NOT.
+   If COMPLEMENT_OP2_P is true, wrap operand2 with NOT.
+   CLOBBER_REG is either NULL or a scratch register of type CC to allow
+   formation of the AND instructions.  */
+
+void
+rs6000_split_logical (rtx operands[3],
+		      enum rtx_code code,
+		      bool complement_final_p,
+		      bool complement_op1_p,
+		      bool complement_op2_p,
+		      rtx clobber_reg)
+{
+  enum machine_mode mode = GET_MODE (operands[0]);
+  enum machine_mode sub_mode;
+  rtx op0, op1, op2;
+  int sub_size, regno0, regno1, nregs, i;
+
+  /* If this is DImode, use the specialized version that can run before
+     register allocation.  */
+  if (mode == DImode && !TARGET_POWERPC64)
+    {
+      rs6000_split_logical_di (operands, code, complement_final_p,
+			       complement_op1_p, complement_op2_p,
+			       clobber_reg);
+      return;
+    }
+
+  op0 = operands[0];
+  op1 = operands[1];
+  op2 = (code == NOT) ? NULL_RTX : operands[2];
+  sub_mode = (TARGET_POWERPC64) ? DImode : SImode;
+  sub_size = GET_MODE_SIZE (sub_mode);
+  regno0 = REGNO (op0);
+  regno1 = REGNO (op1);
+
+  gcc_assert (reload_completed);
+  gcc_assert (IN_RANGE (regno0, FIRST_GPR_REGNO, LAST_GPR_REGNO));
+  gcc_assert (IN_RANGE (regno1, FIRST_GPR_REGNO, LAST_GPR_REGNO));
+
+  nregs = rs6000_hard_regno_nregs[(int)mode][regno0];
+  gcc_assert (nregs > 1);
+
+  if (op2 && REG_P (op2))
+    gcc_assert (IN_RANGE (REGNO (op2), FIRST_GPR_REGNO, LAST_GPR_REGNO));
+
+  for (i = 0; i < nregs; i++)
+    {
+      int offset = i * sub_size;
+      rtx sub_op0 = simplify_subreg (sub_mode, op0, mode, offset);
+      rtx sub_op1 = simplify_subreg (sub_mode, op1, mode, offset);
+      rtx sub_op2 = ((code == NOT)
+		     ? NULL_RTX
+		     : simplify_subreg (sub_mode, op2, mode, offset));
+
+      rs6000_split_logical_inner (sub_op0, sub_op1, sub_op2, code, sub_mode,
+				  complement_final_p, complement_op1_p,
+				  complement_op2_p, clobber_reg);
+    }
+
+  return;
+}
+
+\f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 201180)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -36,10 +36,6 @@ (define_mode_iterator VSX_F [V4SF V2DF])
 ;; Iterator for logical types supported by VSX
 (define_mode_iterator VSX_L [V16QI V8HI V4SI V2DI V4SF V2DF TI])
 
-;; Like VSX_L, but don't support TImode for doing logical instructions in
-;; 32-bit
-(define_mode_iterator VSX_L2 [V16QI V8HI V4SI V2DI V4SF V2DF])
-
 ;; Iterator for memory move.  Handle TImode specially to allow
 ;; it to use gprs as well as vsx registers.
 (define_mode_iterator VSX_M [V16QI V8HI V4SI V2DI V4SF V2DF])
@@ -1047,370 +1043,6 @@ (define_insn "*vsx_float_fix_<mode>2"
    (set_attr "fp_type" "<VSfptype_simple>")])
 
 \f
-;; Logical operations.  Do not support TImode logical instructions on 32-bit at
-;; present, because the compiler will see that we have a TImode and when it
-;; wanted DImode, and convert the DImode to TImode, store it on the stack, and
-;; load it in a VSX register or generate extra logical instructions in GPR
-;; registers.
-
-;; When we are splitting the operations to GPRs, we use three alternatives, two
-;; where the first/second inputs and output are in the same register, and the
-;; third where the output specifies an early clobber so that we don't have to
-;; worry about overlapping registers.
-
-(define_insn "*vsx_and<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-        (and:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa")
-		    (match_operand:VSX_L2 2 "vlogical_operand" "wa")))
-   (clobber (match_scratch:CC 3 "X"))]
-  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxland %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_and<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r")
-        (and:VSX_L
-	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r")
-	 (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r")))
-   (clobber (match_scratch:CC 3 "X,X,X,X"))]
-  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxland %x0,%x1,%x2
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(parallel [(set (match_dup 4) (and:DI (match_dup 5) (match_dup 6)))
-	      (clobber (match_dup 3))])
-   (parallel [(set (match_dup 7) (and:DI (match_dup 8) (match_dup 9)))
-	      (clobber (match_dup 3))])]
-{
-  operands[4] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[7] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[9] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two,two")
-   (set_attr "length" "4,8,8,8")])
-
-(define_insn "*vsx_ior<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-        (ior:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa")
-		    (match_operand:VSX_L2 2 "vlogical_operand" "wa")))]
-  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxlor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_ior<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r,?r,&?r")
-        (ior:VSX_L
-	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r,0,r")
-	 (match_operand:VSX_L 2 "vsx_reg_or_cint_operand" "wa,r,0,r,n,n")))]
-  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxlor %x0,%x1,%x2
-   #
-   #
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(const_int 0)]
-{
-  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-
-  if (operands[5] == constm1_rtx)
-    emit_move_insn (operands[3], constm1_rtx);
-
-  else if (operands[5] == const0_rtx)
-    {
-      if (!rtx_equal_p (operands[3], operands[4]))
-	emit_move_insn (operands[3], operands[4]);
-    }
-  else
-    emit_insn (gen_iordi3 (operands[3], operands[4], operands[5]));
-
-  if (operands[8] == constm1_rtx)
-    emit_move_insn (operands[8], constm1_rtx);
-
-  else if (operands[8] == const0_rtx)
-    {
-      if (!rtx_equal_p (operands[6], operands[7]))
-	emit_move_insn (operands[6], operands[7]);
-    }
-  else
-    emit_insn (gen_iordi3 (operands[6], operands[7], operands[8]));
-  DONE;
-}
-  [(set_attr "type" "vecsimple,two,two,two,three,three")
-   (set_attr "length" "4,8,8,8,16,16")])
-
-(define_insn "*vsx_xor<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-        (xor:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "%wa")
-		    (match_operand:VSX_L2 2 "vlogical_operand" "wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode) && !TARGET_POWERPC64"
-  "xxlxor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_xor<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r,?r,&?r")
-        (xor:VSX_L
-	 (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r,0,r")
-	 (match_operand:VSX_L 2 "vsx_reg_or_cint_operand" "wa,r,0,r,n,n")))]
-  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxlxor %x0,%x1,%x2
-   #
-   #
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(set (match_dup 3) (xor:DI (match_dup 4) (match_dup 5)))
-   (set (match_dup 6) (xor:DI (match_dup 7) (match_dup 8)))]
-{
-  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two,two,three,three")
-   (set_attr "length" "4,8,8,8,16,16")])
-
-(define_insn "*vsx_one_cmpl<mode>2_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-        (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa")))]
-  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxlnor %x0,%x1,%x1"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_one_cmpl<mode>2_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,&?r")
-        (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r")))]
-  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxlnor %x0,%x1,%x1
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(set (match_dup 2) (not:DI (match_dup 3)))
-   (set (match_dup 4) (not:DI (match_dup 5)))]
-{
-  operands[2] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[3] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[5] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two")
-   (set_attr "length" "4,8,8")])
-  
-(define_insn "*vsx_nor<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-	(and:VSX_L2
-	 (not:VSX_L2 (match_operand:VSX_L 1 "vlogical_operand" "%wa"))
-	 (not:VSX_L2 (match_operand:VSX_L 2 "vlogical_operand" "wa"))))]
-  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxlnor %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_nor<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,&?r")
-	(and:VSX_L
-	 (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "%wa,0,r,r"))
-	 (not:VSX_L (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r"))))]
-  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxlnor %x0,%x1,%x2
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(set (match_dup 3) (and:DI (not:DI (match_dup 4)) (not:DI (match_dup 5))))
-   (set (match_dup 6) (and:DI (not:DI (match_dup 7)) (not:DI (match_dup 8))))]
-{
-  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two,two")
-   (set_attr "length" "4,8,8,8")])
-
-(define_insn "*vsx_andc<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-        (and:VSX_L2
-	 (not:VSX_L2
-	  (match_operand:VSX_L2 2 "vlogical_operand" "wa"))
-	 (match_operand:VSX_L2 1 "vlogical_operand" "wa")))]
-  "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxlandc %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_andc<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r")
-        (and:VSX_L
-	 (not:VSX_L
-	  (match_operand:VSX_L 2 "vlogical_operand" "wa,0,r,r"))
-	 (match_operand:VSX_L 1 "vlogical_operand" "wa,r,0,r")))]
-  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxlandc %x0,%x1,%x2
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(set (match_dup 3) (and:DI (not:DI (match_dup 4)) (match_dup 5)))
-   (set (match_dup 6) (and:DI (not:DI (match_dup 7)) (match_dup 8)))]
-{
-  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two,two")
-   (set_attr "length" "4,8,8,8")])
-
-;; Power8 vector logical instructions.
-(define_insn "*vsx_eqv<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-	(not:VSX_L2
-	 (xor:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa")
-		     (match_operand:VSX_L2 2 "vlogical_operand" "wa"))))]
-  "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxleqv %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_eqv<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r")
-	(not:VSX_L
-	 (xor:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r,r")
-		    (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r"))))]
-  "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxleqv %x0,%x1,%x2
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR
-   && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(set (match_dup 3) (not:DI (xor:DI (match_dup 4) (match_dup 5))))
-   (set (match_dup 6) (not:DI (xor:DI (match_dup 7) (match_dup 8))))]
-{
-  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two,two")
-   (set_attr "length" "4,8,8,8")])
-
-;; Rewrite nand into canonical form
-(define_insn "*vsx_nand<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-	(ior:VSX_L2
-	 (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa"))
-	 (not:VSX_L2 (match_operand:VSX_L2 2 "vlogical_operand" "wa"))))]
-  "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxlnand %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_nand<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "register_operand" "=wa,?r,?r,?r")
-	(ior:VSX_L
-	 (not:VSX_L (match_operand:VSX_L 1 "register_operand" "wa,0,r,r"))
-	 (not:VSX_L (match_operand:VSX_L 2 "register_operand" "wa,r,0,r"))))]
-  "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxlnand %x0,%x1,%x2
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR
-   && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(set (match_dup 3) (ior:DI (not:DI (match_dup 4)) (not:DI (match_dup 5))))
-   (set (match_dup 6) (ior:DI (not:DI (match_dup 7)) (not:DI (match_dup 8))))]
-{
-  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two,two")
-   (set_attr "length" "4,8,8,8")])
-
-;; Rewrite or complement into canonical form, by reversing the arguments
-(define_insn "*vsx_orc<mode>3_32bit"
-  [(set (match_operand:VSX_L2 0 "vlogical_operand" "=wa")
-	(ior:VSX_L2
-	 (not:VSX_L2 (match_operand:VSX_L2 1 "vlogical_operand" "wa"))
-	 (match_operand:VSX_L2 2 "vlogical_operand" "wa")))]
-  "!TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxlorc %x0,%x2,%x1"
-  [(set_attr "type" "vecsimple")
-   (set_attr "length" "4")])
-
-(define_insn_and_split "*vsx_orc<mode>3_64bit"
-  [(set (match_operand:VSX_L 0 "vlogical_operand" "=wa,?r,?r,?r")
-	(ior:VSX_L
-	 (not:VSX_L (match_operand:VSX_L 1 "vlogical_operand" "wa,0,r,r"))
-	 (match_operand:VSX_L 2 "vlogical_operand" "wa,r,0,r")))]
-  "TARGET_POWERPC64 && TARGET_P8_VECTOR && VECTOR_MEM_VSX_P (<MODE>mode)"
-  "@
-   xxlorc %x0,%x2,%x1
-   #
-   #
-   #"
-  "reload_completed && TARGET_POWERPC64 && TARGET_P8_VECTOR
-   && VECTOR_MEM_VSX_P (<MODE>mode)
-   && int_reg_operand (operands[0], <MODE>mode)"
-  [(set (match_dup 3) (ior:DI (not:DI (match_dup 4)) (match_dup 5)))
-   (set (match_dup 6) (ior:DI (not:DI (match_dup 7)) (match_dup 8)))]
-{
-  operands[3] = simplify_subreg (DImode, operands[0], <MODE>mode, 0);
-  operands[4] = simplify_subreg (DImode, operands[1], <MODE>mode, 0);
-  operands[5] = simplify_subreg (DImode, operands[2], <MODE>mode, 0);
-  operands[6] = simplify_subreg (DImode, operands[0], <MODE>mode, 8);
-  operands[7] = simplify_subreg (DImode, operands[1], <MODE>mode, 8);
-  operands[8] = simplify_subreg (DImode, operands[2], <MODE>mode, 8);
-}
-  [(set_attr "type" "vecsimple,two,two,two")
-   (set_attr "length" "4,8,8,8")])
-
-\f
 ;; Permute operations
 
 ;; Build a V2DF/V2DI vector from two scalars
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 201180)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -1121,14 +1121,11 @@ enum data_align { align_abi, align_opt, 
 #define VINT_REGNO_P(N) ALTIVEC_REGNO_P (N)
 
 /* Alternate name for any vector register supporting logical operations, no
-   matter which instruction set(s) are available.  For 64-bit mode, we also
-   allow logical operations in the GPRS.  This is to allow atomic quad word
-   builtins not to need the VSX registers for lqarx/stqcx.  It also helps with
-   __int128_t arguments that are passed in GPRs.  */
+   matter which instruction set(s) are available.  Allow GPRs as well as the
+   vector registers.  */
 #define VLOGICAL_REGNO_P(N)						\
-  (ALTIVEC_REGNO_P (N)							\
-   || (TARGET_VSX && FP_REGNO_P (N))					\
-   || (TARGET_VSX && TARGET_POWERPC64 && INT_REGNO_P (N)))
+  (INT_REGNO_P (N) || ALTIVEC_REGNO_P (N)				\
+   || (TARGET_VSX && FP_REGNO_P (N)))					\
 
 /* Return number of consecutive hard regs needed starting at reg REGNO
    to hold something of mode MODE.  */
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 201180)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -1040,59 +1040,7 @@ (define_insn "vec_widen_smult_odd_v8hi"
   [(set_attr "type" "veccomplex")])
 
 
-;; logical ops.  Have the logical ops follow the memory ops in
-;; terms of whether to prefer VSX or Altivec
-
-;; AND has a clobber to be consistant with VSX, which adds splitters for using
-;; the GPR registers.
-(define_insn "*altivec_and<mode>3"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (and:VM (match_operand:VM 1 "register_operand" "v")
-		(match_operand:VM 2 "register_operand" "v")))
-   (clobber (match_scratch:CC 3 "=X"))]
-  "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
-  "vand %0,%1,%2"
-  [(set_attr "type" "vecsimple")])
-
-(define_insn "*altivec_ior<mode>3"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (ior:VM (match_operand:VM 1 "register_operand" "v")
-		(match_operand:VM 2 "register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
-  "vor %0,%1,%2"
-  [(set_attr "type" "vecsimple")])
-
-(define_insn "*altivec_xor<mode>3"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (xor:VM (match_operand:VM 1 "register_operand" "v")
-		(match_operand:VM 2 "register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
-  "vxor %0,%1,%2"
-  [(set_attr "type" "vecsimple")])
-
-(define_insn "*altivec_one_cmpl<mode>2"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (not:VM (match_operand:VM 1 "register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
-  "vnor %0,%1,%1"
-  [(set_attr "type" "vecsimple")])
-  
-(define_insn "*altivec_nor<mode>3"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-	(and:VM (not:VM (match_operand:VM 1 "register_operand" "v"))
-		(not:VM (match_operand:VM 2 "register_operand" "v"))))]
-  "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
-  "vnor %0,%1,%2"
-  [(set_attr "type" "vecsimple")])
-
-(define_insn "*altivec_andc<mode>3"
-  [(set (match_operand:VM 0 "register_operand" "=v")
-        (and:VM (not:VM (match_operand:VM 2 "register_operand" "v"))
-		(match_operand:VM 1 "register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (<MODE>mode)"
-  "vandc %0,%1,%2"
-  [(set_attr "type" "vecsimple")])
-
+;; Vector pack/unpack
 (define_insn "altivec_vpkpx"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V4SI 1 "register_operand" "v")
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 201180)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -391,6 +391,77 @@ (define_mode_attr E500_CONVERT [(SF "!TA
 
 (define_mode_attr TARGET_FLOAT [(SF "TARGET_SINGLE_FLOAT")
 				(DF "TARGET_DOUBLE_FLOAT")])
+
+;; Mode iterator for logical operations on 128-bit types
+(define_mode_iterator BOOL_128		[TI
+					 PTI
+					 (V16QI	"TARGET_ALTIVEC")
+					 (V8HI	"TARGET_ALTIVEC")
+					 (V4SI	"TARGET_ALTIVEC")
+					 (V4SF	"TARGET_ALTIVEC")
+					 (V2DI	"TARGET_ALTIVEC")
+					 (V2DF	"TARGET_ALTIVEC")])
+
+;; For the GPRs we use 3 constraints for register outputs, two that are the
+;; same as the output register, and a third where the output register is an
+;; early clobber, so we don't have to deal with register overlaps.  For the
+;; vector types, we prefer to use the vector registers.  For TI mode, allow
+;; either.
+
+;; Mode attribute for boolean operation register constraints for output
+(define_mode_attr BOOL_REGS_OUTPUT	[(TI	"&r,r,r,wa,v")
+					 (PTI	"&r,r,r")
+					 (V16QI	"wa,v,&?r,?r,?r")
+					 (V8HI	"wa,v,&?r,?r,?r")
+					 (V4SI	"wa,v,&?r,?r,?r")
+					 (V4SF	"wa,v,&?r,?r,?r")
+					 (V2DI	"wa,v,&?r,?r,?r")
+					 (V2DF	"wa,v,&?r,?r,?r")])
+
+;; Mode attribute for boolean operation register constraints for operand1
+(define_mode_attr BOOL_REGS_OP1		[(TI	"r,0,r,wa,v")
+					 (PTI	"r,0,r")
+					 (V16QI	"wa,v,r,0,r")
+					 (V8HI	"wa,v,r,0,r")
+					 (V4SI	"wa,v,r,0,r")
+					 (V4SF	"wa,v,r,0,r")
+					 (V2DI	"wa,v,r,0,r")
+					 (V2DF	"wa,v,r,0,r")])
+
+;; Mode attribute for boolean operation register constraints for operand2
+(define_mode_attr BOOL_REGS_OP2		[(TI	"r,r,0,wa,v")
+					 (PTI	"r,r,0")
+					 (V16QI	"wa,v,r,r,0")
+					 (V8HI	"wa,v,r,r,0")
+					 (V4SI	"wa,v,r,r,0")
+					 (V4SF	"wa,v,r,r,0")
+					 (V2DI	"wa,v,r,r,0")
+					 (V2DF	"wa,v,r,r,0")])
+
+;; Mode attribute for boolean operation register constraints for operand1
+;; for one_cmpl.  To simplify things, we repeat the constraint where 0
+;; is used for operand1 or operand2
+(define_mode_attr BOOL_REGS_UNARY	[(TI	"r,0,0,wa,v")
+					 (PTI	"r,0,0")
+					 (V16QI	"wa,v,r,0,0")
+					 (V8HI	"wa,v,r,0,0")
+					 (V4SI	"wa,v,r,0,0")
+					 (V4SF	"wa,v,r,0,0")
+					 (V2DI	"wa,v,r,0,0")
+					 (V2DF	"wa,v,r,0,0")])
+
+;; Mode attribute for the clobber of CR0 for AND expansion.
+;; For the 128-bit types, we never do AND immediate, but we need to
+;; get the correct number of X's for the number of operands.
+(define_mode_attr BOOL_REGS_AND_CR0	[(TI	"X,X,X,X,X")
+					 (PTI	"X,X,X")
+					 (V16QI	"X,X,X,X,X")
+					 (V8HI	"X,X,X,X,X")
+					 (V4SI	"X,X,X,X,X")
+					 (V4SF	"X,X,X,X,X")
+					 (V2DI	"X,X,X,X,X")
+					 (V2DF	"X,X,X,X,X")])
+
 \f
 ;; Start with fixed-point load and store insns.  Here we put only the more
 ;; complex forms.  Basic data transfer is done later.
@@ -1840,7 +1911,19 @@ (define_split
     FAIL;
 })
 
-(define_insn "one_cmpl<mode>2"
+(define_expand "one_cmpl<mode>2"
+  [(set (match_operand:SDI 0 "gpc_reg_operand" "")
+	(not:SDI (match_operand:SDI 1 "gpc_reg_operand" "")))]
+  ""
+{
+  if (<MODE>mode == DImode && !TARGET_POWERPC64)
+    {
+      rs6000_split_logical (operands, NOT, false, false, false, NULL_RTX);
+      DONE;
+    }
+})
+
+(define_insn "*one_cmpl<mode>2"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
 	(not:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))]
   ""
@@ -7962,10 +8045,19 @@ (define_expand "anddi3"
   [(parallel
     [(set (match_operand:DI 0 "gpc_reg_operand" "")
 	  (and:DI (match_operand:DI 1 "gpc_reg_operand" "")
-		  (match_operand:DI 2 "and64_2_operand" "")))
+		  (match_operand:DI 2 "reg_or_cint_operand" "")))
      (clobber (match_scratch:CC 3 ""))])]
-  "TARGET_POWERPC64"
-  "")
+  ""
+{
+  if (!TARGET_POWERPC64)
+    {
+      rtx cc = gen_rtx_SCRATCH (CCmode);
+      rs6000_split_logical (operands, AND, false, false, false, cc);
+      DONE;
+    }
+  else if (!and64_2_operand (operands[2], DImode))
+    operands[2] = force_reg (DImode, operands[2]);
+})
 
 (define_insn "anddi3_mc"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r,r,r,r,r")
@@ -8146,11 +8238,17 @@ (define_split
 (define_expand "iordi3"
   [(set (match_operand:DI 0 "gpc_reg_operand" "")
 	(ior:DI (match_operand:DI 1 "gpc_reg_operand" "")
-		(match_operand:DI 2 "reg_or_logical_cint_operand" "")))]
-  "TARGET_POWERPC64"
-  "
+		(match_operand:DI 2 "reg_or_cint_operand" "")))]
+  ""
 {
-  if (non_logical_cint_operand (operands[2], DImode))
+  if (!TARGET_POWERPC64)
+    {
+      rs6000_split_logical (operands, IOR, false, false, false, NULL_RTX);
+      DONE;
+    }
+  else if (!reg_or_logical_cint_operand (operands[2], DImode))
+    operands[2] = force_reg (DImode, operands[2]);
+  else if (non_logical_cint_operand (operands[2], DImode))
     {
       HOST_WIDE_INT value;
       rtx tmp = ((!can_create_pseudo_p ()
@@ -8164,15 +8262,21 @@ (define_expand "iordi3"
       emit_insn (gen_iordi3 (operands[0], tmp, GEN_INT (value & 0xffff)));
       DONE;
     }
-}")
+})
 
 (define_expand "xordi3"
   [(set (match_operand:DI 0 "gpc_reg_operand" "")
 	(xor:DI (match_operand:DI 1 "gpc_reg_operand" "")
-		(match_operand:DI 2 "reg_or_logical_cint_operand" "")))]
-  "TARGET_POWERPC64"
-  "
+		(match_operand:DI 2 "reg_or_cint_operand" "")))]
+  ""
 {
+  if (!TARGET_POWERPC64)
+    {
+      rs6000_split_logical (operands, XOR, false, false, false, NULL_RTX);
+      DONE;
+    }
+  else if (!reg_or_logical_cint_operand (operands[2], DImode))
+    operands[2] = force_reg (DImode, operands[2]);
   if (non_logical_cint_operand (operands[2], DImode))
     {
       HOST_WIDE_INT value;
@@ -8187,7 +8291,7 @@ (define_expand "xordi3"
       emit_insn (gen_xordi3 (operands[0], tmp, GEN_INT (value & 0xffff)));
       DONE;
     }
-}")
+})
 
 (define_insn "*booldi3_internal1"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r,r")
@@ -8425,6 +8529,372 @@ (define_insn "*eqv<mode>3"
    (set_attr "length" "4")])
 
 \f
+;; 128-bit logical operations expanders
+
+(define_expand "and<mode>3"
+  [(parallel [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+		   (and:BOOL_128
+		    (match_operand:BOOL_128 1 "vlogical_operand" "")
+		    (match_operand:BOOL_128 2 "vlogical_operand" "")))
+	      (clobber (match_scratch:CC 3 ""))])]
+  ""
+  "")
+
+(define_expand "ior<mode>3"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+        (ior:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" "")
+		      (match_operand:BOOL_128 2 "vlogical_operand" "")))]
+  ""
+  "")
+
+(define_expand "xor<mode>3"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+        (xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" "")
+		      (match_operand:BOOL_128 2 "vlogical_operand" "")))]
+  ""
+  "")
+
+(define_expand "one_cmpl<mode>2"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+        (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" "")))]
+  ""
+  "")
+
+(define_expand "nor<mode>3"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+	(and:BOOL_128
+	 (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" ""))
+	 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand" ""))))]
+  ""
+  "")
+
+(define_expand "andc<mode>3"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+        (and:BOOL_128
+	 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand" ""))
+	 (match_operand:BOOL_128 1 "vlogical_operand" "")))]
+  ""
+  "")
+
+;; Power8 vector logical instructions.
+(define_expand "eqv<mode>3"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+	(not:BOOL_128
+	 (xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" "")
+		       (match_operand:BOOL_128 2 "vlogical_operand" ""))))]
+  "<MODE>mode == TImode || <MODE>mode == PTImode || TARGET_P8_VECTOR"
+  "")
+
+;; Rewrite nand into canonical form
+(define_expand "nand<mode>3"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+	(ior:BOOL_128
+	 (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand" ""))
+	 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand" ""))))]
+  "<MODE>mode == TImode || <MODE>mode == PTImode || TARGET_P8_VECTOR"
+  "")
+
+;; The canonical form is to have the negated element first, so we need to
+;; reverse arguments.
+(define_expand "orc<mode>3"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "")
+	(ior:BOOL_128
+	 (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand" ""))
+	 (match_operand:BOOL_128 1 "vlogical_operand" "")))]
+  "<MODE>mode == TImode || <MODE>mode == PTImode || TARGET_P8_VECTOR"
+  "")
+
+;; 128-bit logical operations insns and split operations
+(define_insn_and_split "*and<mode>3_internal"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
+        (and:BOOL_128
+	 (match_operand:BOOL_128 1 "vlogical_operand" "%<BOOL_REGS_OP1>")
+	 (match_operand:BOOL_128 2 "vlogical_operand" "<BOOL_REGS_OP2>")))
+   (clobber (match_scratch:CC 3 "<BOOL_REGS_AND_CR0>"))]
+  ""
+{
+  if (TARGET_VSX && vsx_register_operand (operands[0], <MODE>mode))
+    return "xxland %x0,%x1,%x2";
+
+  if (TARGET_ALTIVEC && altivec_register_operand (operands[0], <MODE>mode))
+    return "vand %0,%1,%2";
+
+  return "#";
+}
+  "reload_completed && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, AND, false, false, false, operands[3]);
+  DONE;
+}
+  [(set (attr "type")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "vecsimple")
+	(const_string "integer")))
+   (set (attr "length")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "4")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16"))))])
+
+;; 128-bit IOR/XOR
+(define_insn_and_split "*bool<mode>3_internal"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
+	(match_operator:BOOL_128 3 "boolean_or_operator"
+	 [(match_operand:BOOL_128 1 "vlogical_operand" "%<BOOL_REGS_OP1>")
+	  (match_operand:BOOL_128 2 "vlogical_operand" "<BOOL_REGS_OP2>")]))]
+  ""
+{
+  if (TARGET_VSX && vsx_register_operand (operands[0], <MODE>mode))
+    return "xxl%q3 %x0,%x1,%x2";
+
+  if (TARGET_ALTIVEC && altivec_register_operand (operands[0], <MODE>mode))
+    return "v%q3 %0,%1,%2";
+
+  return "#";
+}
+  "reload_completed && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, GET_CODE (operands[3]), false, false, false,
+			NULL_RTX);
+  DONE;
+}
+  [(set (attr "type")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "vecsimple")
+	(const_string "integer")))
+   (set (attr "length")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "4")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16"))))])
+
+;; 128-bit ANDC/ORC
+(define_insn_and_split "*boolc<mode>3_internal1"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
+	(match_operator:BOOL_128 3 "boolean_operator"
+	 [(not:BOOL_128
+	   (match_operand:BOOL_128 2 "vlogical_operand" "<BOOL_REGS_OP1>"))
+	  (match_operand:BOOL_128 1 "vlogical_operand" "<BOOL_REGS_OP2>")]))]
+  "TARGET_P8_VECTOR || (GET_CODE (operands[3]) == AND)"
+{
+  if (TARGET_VSX && vsx_register_operand (operands[0], <MODE>mode))
+    return "xxl%q3 %x0,%x1,%x2";
+
+  if (TARGET_ALTIVEC && altivec_register_operand (operands[0], <MODE>mode))
+    return "v%q3 %0,%1,%2";
+
+  return "#";
+}
+  "(TARGET_P8_VECTOR || (GET_CODE (operands[3]) == AND))
+   && reload_completed && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, GET_CODE (operands[3]), false, true, false,
+			NULL_RTX);
+  DONE;
+}
+  [(set (attr "type")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "vecsimple")
+	(const_string "integer")))
+   (set (attr "length")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "4")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16"))))])
+
+(define_insn_and_split "*boolc<mode>3_internal2"
+  [(set (match_operand:TI2 0 "int_reg_operand" "=&r,r,r")
+	(match_operator:TI2 3 "boolean_operator"
+	 [(not:TI2
+	   (match_operand:TI2 1 "int_reg_operand" "r,0,r"))
+	  (match_operand:TI2 2 "int_reg_operand" "r,r,0")]))]
+  "!TARGET_P8_VECTOR && (GET_CODE (operands[3]) != AND)"
+  "#"
+  "reload_completed && !TARGET_P8_VECTOR && (GET_CODE (operands[3]) != AND)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, GET_CODE (operands[3]), false, true, false,
+			NULL_RTX);
+  DONE;
+}
+  [(set_attr "type" "integer")
+   (set (attr "length")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16")))])
+
+;; 128-bit NAND/NOR
+(define_insn_and_split "*boolcc<mode>3_internal1"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
+	(match_operator:BOOL_128 3 "boolean_operator"
+	 [(not:BOOL_128
+	   (match_operand:BOOL_128 1 "vlogical_operand" "<BOOL_REGS_OP1>"))
+	  (not:BOOL_128
+	   (match_operand:BOOL_128 2 "vlogical_operand" "<BOOL_REGS_OP2>"))]))]
+  "TARGET_P8_VECTOR || (GET_CODE (operands[3]) == AND)"
+{
+  if (TARGET_VSX && vsx_register_operand (operands[0], <MODE>mode))
+    return "xxl%q3 %x0,%x1,%x2";
+
+  if (TARGET_ALTIVEC && altivec_register_operand (operands[0], <MODE>mode))
+    return "v%q3 %0,%1,%2";
+
+  return "#";
+}
+  "(TARGET_P8_VECTOR || (GET_CODE (operands[3]) == AND))
+   && reload_completed && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, GET_CODE (operands[3]), false, true, true,
+			NULL_RTX);
+  DONE;
+}
+  [(set (attr "type")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "vecsimple")
+	(const_string "integer")))
+   (set (attr "length")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "4")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16"))))])
+
+(define_insn_and_split "*boolcc<mode>3_internal2"
+  [(set (match_operand:TI2 0 "int_reg_operand" "=&r,r,r")
+	(match_operator:TI2 3 "boolean_operator"
+	 [(not:TI2
+	   (match_operand:TI2 1 "int_reg_operand" "r,0,r"))
+	  (not:TI2
+	   (match_operand:TI2 2 "int_reg_operand" "r,r,0"))]))]
+  "!TARGET_P8_VECTOR && (GET_CODE (operands[3]) != AND)"
+  "#"
+  "reload_completed && !TARGET_P8_VECTOR && (GET_CODE (operands[3]) != AND)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, GET_CODE (operands[3]), false, true, true,
+			NULL_RTX);
+  DONE;
+}
+  [(set_attr "type" "integer")
+   (set (attr "length")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16")))])
+
+
+;; 128-bit EQV
+(define_insn_and_split "*eqv<mode>3_internal1"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
+	(not:BOOL_128
+	 (xor:BOOL_128
+	  (match_operand:BOOL_128 1 "vlogical_operand" "<BOOL_REGS_OP1>")
+	  (match_operand:BOOL_128 2 "vlogical_operand" "<BOOL_REGS_OP2>"))))]
+  "TARGET_P8_VECTOR"
+{
+  if (vsx_register_operand (operands[0], <MODE>mode))
+    return "xxleqv %x0,%x1,%x2";
+
+  return "#";
+}
+  "TARGET_P8_VECTOR && reload_completed
+   && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, XOR, true, false, false, NULL_RTX);
+  DONE;
+}
+  [(set (attr "type")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "vecsimple")
+	(const_string "integer")))
+   (set (attr "length")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "4")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16"))))])
+
+(define_insn_and_split "*eqv<mode>3_internal2"
+  [(set (match_operand:TI2 0 "int_reg_operand" "=&r,r,r")
+	(not:TI2
+	 (xor:TI2
+	  (match_operand:TI2 1 "int_reg_operand" "r,0,r")
+	  (match_operand:TI2 2 "int_reg_operand" "r,r,0"))))]
+  "!TARGET_P8_VECTOR"
+  "#"
+  "reload_completed && !TARGET_P8_VECTOR"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, XOR, true, false, false, NULL_RTX);
+  DONE;
+}
+  [(set_attr "type" "integer")
+   (set (attr "length")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16")))])
+
+;; 128-bit one's complement
+(define_insn_and_split "*one_cmpl<mode>3_internal"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
+	(not:BOOL_128
+	  (match_operand:BOOL_128 1 "vlogical_operand" "<BOOL_REGS_UNARY>")))]
+  ""
+{
+  if (TARGET_VSX && vsx_register_operand (operands[0], <MODE>mode))
+    return "xxlnor %x0,%x1,%x1";
+
+  if (TARGET_ALTIVEC && altivec_register_operand (operands[0], <MODE>mode))
+    return "vnor %0,%1,%1";
+
+  return "#";
+}
+  "reload_completed && int_reg_operand (operands[0], <MODE>mode)"
+  [(const_int 0)]
+{
+  rs6000_split_logical (operands, NOT, false, false, false, NULL_RTX);
+  DONE;
+}
+  [(set (attr "type")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "vecsimple")
+	(const_string "integer")))
+   (set (attr "length")
+      (if_then_else
+	(match_test "vsx_register_operand (operands[0], <MODE>mode)")
+	(const_string "4")
+	(if_then_else
+	 (match_test "TARGET_POWERPC64")
+	 (const_string "8")
+	 (const_string "16"))))])
+
+\f
 ;; Now define ways of moving data around.
 
 ;; Set up a register with a value from the GOT table
Index: gcc/testsuite/gcc.target/powerpc/bool2-p8.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bool2-p8.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bool2-p8.c	(revision 0)
@@ -0,0 +1,32 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O2 -mcpu=power8" } */
+/* { dg-final { scan-assembler-not "\[ \t\]and "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]or "      } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xor "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]nor "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]eqv "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]andc "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]orc "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]nand "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vand "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vandc "   } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vor "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vxor "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vnor "    } } */
+/* { dg-final { scan-assembler     "\[ \t\]xxland "  } } */
+/* { dg-final { scan-assembler     "\[ \t\]xxlor "   } } */
+/* { dg-final { scan-assembler     "\[ \t\]xxlxor "  } } */
+/* { dg-final { scan-assembler     "\[ \t\]xxlnor "  } } */
+/* { dg-final { scan-assembler     "\[ \t\]xxlandc " } } */
+/* { dg-final { scan-assembler     "\[ \t\]xxleqv "  } } */
+/* { dg-final { scan-assembler     "\[ \t\]xxlorc "  } } */
+/* { dg-final { scan-assembler     "\[ \t\]xxlnand " } } */
+
+#ifndef TYPE
+typedef int v4si __attribute__ ((vector_size (16)));
+#define TYPE v4si
+#endif
+
+#include "bool2.h"
Index: gcc/testsuite/gcc.target/powerpc/bool2-av.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bool2-av.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bool2-av.c	(revision 0)
@@ -0,0 +1,32 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-O2 -mcpu=power6 -maltivec" } */
+/* { dg-final { scan-assembler-not "\[ \t\]and "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]or "      } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xor "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]nor "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]andc "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]eqv "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]orc "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]nand "    } } */
+/* { dg-final { scan-assembler     "\[ \t\]vand "    } } */
+/* { dg-final { scan-assembler     "\[ \t\]vandc "   } } */
+/* { dg-final { scan-assembler     "\[ \t\]vor "     } } */
+/* { dg-final { scan-assembler     "\[ \t\]vxor "    } } */
+/* { dg-final { scan-assembler     "\[ \t\]vnor "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxland "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlor "   } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlxor "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlnor "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlandc " } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxleqv "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlorc "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlnand " } } */
+
+#ifndef TYPE
+typedef int v4si __attribute__ ((vector_size (16)));
+#define TYPE v4si
+#endif
+
+#include "bool2.h"
Index: gcc/testsuite/gcc.target/powerpc/bool3-p7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bool3-p7.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bool3-p7.c	(revision 0)
@@ -0,0 +1,37 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mcpu=power7" } */
+/* { dg-final { scan-assembler	   "\[ \t\]and "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]or "      } } */
+/* { dg-final { scan-assembler	   "\[ \t\]xor "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]nor "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]andc "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vand "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vandc "   } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vor "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vxor "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vnor "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxland "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlor "   } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlxor "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlnor "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlandc " } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxleqv "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlorc "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlnand " } } */
+
+/* On power7, for 128-bit types, ORC/ANDC/EQV might not show up, since the
+   vector unit doesn't support these, so the appropriate combine patterns may
+   not be generated.  */
+
+#ifndef TYPE
+#ifdef _ARCH_PPC64
+#define TYPE __int128_t
+#else
+typedef int v4si __attribute__ ((vector_size (16)));
+#define TYPE v4si
+#endif
+#endif
+
+#include "bool3.h"
Index: gcc/testsuite/gcc.target/powerpc/bool3-p8.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bool3-p8.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bool3-p8.c	(revision 0)
@@ -0,0 +1,36 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O2 -mcpu=power8" } */
+/* { dg-final { scan-assembler	   "\[ \t\]and "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]or "      } } */
+/* { dg-final { scan-assembler	   "\[ \t\]xor "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]nor "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]andc "    } } */
+/* { dg-final { scan-assembler	   "\[ \t\]eqv "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]orc "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]nand "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vand "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vandc "   } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vor "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vxor "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vnor "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxland "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlor "   } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlxor "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlnor "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlandc " } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxleqv "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlorc "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlnand " } } */
+
+#ifndef TYPE
+#ifdef _ARCH_PPC64
+#define TYPE __int128_t
+#else
+typedef int v4si __attribute__ ((vector_size (16)));
+#define TYPE v4si
+#endif
+#endif
+
+#include "bool3.h"
Index: gcc/testsuite/gcc.target/powerpc/bool3-av.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bool3-av.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bool3-av.c	(revision 0)
@@ -0,0 +1,37 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-O2 -mcpu=power6 -mabi=altivec -maltivec -mno-vsx" } */
+/* { dg-final { scan-assembler	   "\[ \t\]and "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]or "      } } */
+/* { dg-final { scan-assembler	   "\[ \t\]xor "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]nor "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]andc "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vand "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vandc "   } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vor "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vxor "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vnor "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxland "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlor "   } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlxor "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlnor "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlandc " } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxleqv "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlorc "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlnand " } } */
+
+/* On altivec, for 128-bit types, ORC/ANDC/EQV might not show up, since the
+   vector unit doesn't support these, so the appropriate combine patterns may
+   not be generated.  */
+
+#ifndef TYPE
+#ifdef _ARCH_PPC64
+#define TYPE __int128_t
+#else
+typedef int v4si __attribute__ ((vector_size (16)));
+#define TYPE v4si
+#endif
+#endif
+
+#include "bool3.h"
Index: gcc/testsuite/gcc.target/powerpc/bool2-p5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bool2-p5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bool2-p5.c	(revision 0)
@@ -0,0 +1,32 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-O2 -mcpu=power5 -mabi=altivec -mno-altivec -mno-vsx" } */
+/* { dg-final { scan-assembler	   "\[ \t\]and "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]or "      } } */
+/* { dg-final { scan-assembler	   "\[ \t\]xor "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]nor "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]andc "    } } */
+/* { dg-final { scan-assembler	   "\[ \t\]eqv "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]orc "     } } */
+/* { dg-final { scan-assembler	   "\[ \t\]nand "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vand "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vandc "   } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vor "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vxor "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vnor "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxland "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlor "   } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlxor "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlnor "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlandc " } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxleqv "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlorc "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlnand " } } */
+
+#ifndef TYPE
+typedef int v4si __attribute__ ((vector_size (16)));
+#define TYPE v4si
+#endif
+
+#include "bool2.h"
Index: gcc/testsuite/gcc.target/powerpc/bool2.h
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bool2.h	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bool2.h	(revision 0)
@@ -0,0 +1,29 @@
+/* Test various logical operations.  */
+
+TYPE arg1 (TYPE p, TYPE q) { return p & q; }		/* AND  */
+TYPE arg2 (TYPE p, TYPE q) { return p | q; }		/* OR   */
+TYPE arg3 (TYPE p, TYPE q) { return p ^ q; }		/* XOR  */
+TYPE arg4 (TYPE p)	   { return ~ p; }		/* NOR  */
+TYPE arg5 (TYPE p, TYPE q) { return ~(p & q); }		/* NAND */
+TYPE arg6 (TYPE p, TYPE q) { return ~(p | q); }		/* NOR  */
+TYPE arg7 (TYPE p, TYPE q) { return ~(p ^ q); }		/* EQV  */
+TYPE arg8 (TYPE p, TYPE q) { return (~p) & q; }		/* ANDC */
+TYPE arg9 (TYPE p, TYPE q) { return (~p) | q; }		/* ORC  */
+TYPE arg10(TYPE p, TYPE q) { return (~p) ^ q; }		/* EQV  */
+TYPE arg11(TYPE p, TYPE q) { return p & (~q); }		/* ANDC */
+TYPE arg12(TYPE p, TYPE q) { return p | (~q); }		/* ORC  */
+TYPE arg13(TYPE p, TYPE q) { return p ^ (~q); }		/* EQV  */
+
+void ptr1 (TYPE *p) { p[0] = p[1] & p[2]; }		/* AND  */
+void ptr2 (TYPE *p) { p[0] = p[1] | p[2]; }		/* OR   */
+void ptr3 (TYPE *p) { p[0] = p[1] ^ p[2]; }		/* XOR  */
+void ptr4 (TYPE *p) { p[0] = ~p[1]; }			/* NOR  */
+void ptr5 (TYPE *p) { p[0] = ~(p[1] & p[2]); }		/* NAND */
+void ptr6 (TYPE *p) { p[0] = ~(p[1] | p[2]); }		/* NOR  */
+void ptr7 (TYPE *p) { p[0] = ~(p[1] ^ p[2]); }		/* EQV  */
+void ptr8 (TYPE *p) { p[0] = ~(p[1]) & p[2]; }		/* ANDC */
+void ptr9 (TYPE *p) { p[0] = (~p[1]) | p[2]; }		/* ORC  */
+void ptr10(TYPE *p) { p[0] = (~p[1]) ^ p[2]; }		/* EQV  */
+void ptr11(TYPE *p) { p[0] = p[1] & (~p[2]); }		/* ANDC */
+void ptr12(TYPE *p) { p[0] = p[1] | (~p[2]); }		/* ORC  */
+void ptr13(TYPE *p) { p[0] = p[1] ^ (~p[2]); }		/* EQV  */
Index: gcc/testsuite/gcc.target/powerpc/bool3.h
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bool3.h	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bool3.h	(revision 0)
@@ -0,0 +1,186 @@
+/* Test forcing 128-bit logical types into GPR registers.  */
+
+#if defined(NO_ASM)
+#define FORCE_REG1(X)
+#define FORCE_REG2(X,Y)
+
+#else
+#if defined(USE_ALTIVEC)
+#define REG_CLASS "+v"
+#define PRINT_REG1 "# altivec reg %0"
+#define PRINT_REG2 "# altivec reg %0, %1"
+
+#elif defined(USE_FPR)
+#define REG_CLASS "+d"
+#define PRINT_REG1 "# fpr reg %0"
+#define PRINT_REG2 "# fpr reg %0, %1"
+
+#elif defined(USE_VSX)
+#define REG_CLASS "+wa"
+#define PRINT_REG1 "# vsx reg %x0"
+#define PRINT_REG2 "# vsx reg %x0, %x1"
+
+#else
+#define REG_CLASS "+r"
+#define PRINT_REG1 "# gpr reg %0"
+#define PRINT_REG2 "# gpr reg %0, %1"
+#endif
+
+#define FORCE_REG1(X) __asm__ (PRINT_REG1 : REG_CLASS (X))
+#define FORCE_REG2(X,Y) __asm__ (PRINT_REG2 : REG_CLASS (X), REG_CLASS (Y))
+#endif
+
+void ptr1 (TYPE *p)
+{
+  TYPE a = p[1];
+  TYPE b = p[2];
+  TYPE c;
+
+  FORCE_REG2 (a, b);
+  c = a & b;					/* AND */
+  FORCE_REG1 (c);
+  p[0] = c;
+}
+
+void ptr2 (TYPE *p)
+{
+  TYPE a = p[1];
+  TYPE b = p[2];
+  TYPE c;
+
+  FORCE_REG2 (a, b);
+  c = a | b;					/* OR */
+  FORCE_REG1 (c);
+  p[0] = c;
+}
+
+void ptr3 (TYPE *p)
+{
+  TYPE a = p[1];
+  TYPE b = p[2];
+  TYPE c;
+
+  FORCE_REG2 (a, b);
+  c = a ^ b;					/* XOR */
+  FORCE_REG1 (c);
+  p[0] = c;
+}
+
+void ptr4 (TYPE *p)
+{
+  TYPE a = p[1];
+  TYPE b;
+
+  FORCE_REG1 (a);
+  b = ~a;					/* NOR */
+  FORCE_REG1 (b);
+  p[0] = b;
+}
+
+void ptr5 (TYPE *p)
+{
+  TYPE a = p[1];
+  TYPE b = p[2];
+  TYPE c;
+
+  FORCE_REG2 (a, b);
+  c = ~(a & b);					   /* NAND */
+  FORCE_REG1 (c);
+  p[0] = c;
+}
+
+void ptr6 (TYPE *p)
+{
+  TYPE a = p[1];
+  TYPE b = p[2];
+  TYPE c;
+
+  FORCE_REG2 (a, b);
+  c = ~(a | b);					   /* NOR */
+  FORCE_REG1 (c);
+  p[0] = c;
+}
+
+void ptr7 (TYPE *p)
+{
+  TYPE a = p[1];
+  TYPE b = p[2];
+  TYPE c;
+
+  FORCE_REG2 (a, b);
+  c = ~(a ^ b);					   /* EQV */
+  FORCE_REG1 (c);
+  p[0] = c;
+}
+
+void ptr8 (TYPE *p)
+{
+  TYPE a = p[1];
+  TYPE b = p[2];
+  TYPE c;
+
+  FORCE_REG2 (a, b);
+  c = (~a) & b;					   /* ANDC */
+  FORCE_REG1 (c);
+  p[0] = c;
+}
+
+void ptr9 (TYPE *p)
+{
+  TYPE a = p[1];
+  TYPE b = p[2];
+  TYPE c;
+
+  FORCE_REG2 (a, b);
+  c = (~a) | b;					   /* ORC */
+  FORCE_REG1 (c);
+  p[0] = c;
+}
+
+void ptr10 (TYPE *p)
+{
+  TYPE a = p[1];
+  TYPE b = p[2];
+  TYPE c;
+
+  FORCE_REG2 (a, b);
+  c = (~a) ^ b;					   /* EQV */
+  FORCE_REG1 (c);
+  p[0] = c;
+}
+
+void ptr11 (TYPE *p)
+{
+  TYPE a = p[1];
+  TYPE b = p[2];
+  TYPE c;
+
+  FORCE_REG2 (a, b);
+  c = a & (~b);					   /* ANDC */
+  FORCE_REG1 (c);
+  p[0] = c;
+}
+
+void ptr12 (TYPE *p)
+{
+  TYPE a = p[1];
+  TYPE b = p[2];
+  TYPE c;
+
+  FORCE_REG2 (a, b);
+  c = a | (~b);					   /* ORC */
+  FORCE_REG1 (c);
+  p[0] = c;
+}
+
+void ptr13 (TYPE *p)
+{
+  TYPE a = p[1];
+  TYPE b = p[2];
+  TYPE c;
+
+  FORCE_REG2 (a, b);
+  c = a ^ (~b);					   /* EQV */
+  FORCE_REG1 (c);
+  p[0] = c;
+}
Index: gcc/testsuite/gcc.target/powerpc/bool2-p7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bool2-p7.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bool2-p7.c	(revision 0)
@@ -0,0 +1,31 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mcpu=power7" } */
+/* { dg-final { scan-assembler-not "\[ \t\]and "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]or "      } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xor "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]nor "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]eqv "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]andc "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]orc "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]nand "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vand "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vor "     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vxor "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]vnor "    } } */
+/* { dg-final { scan-assembler     "\[ \t\]xxland "  } } */
+/* { dg-final { scan-assembler     "\[ \t\]xxlor "   } } */
+/* { dg-final { scan-assembler     "\[ \t\]xxlxor "  } } */
+/* { dg-final { scan-assembler     "\[ \t\]xxlnor "  } } */
+/* { dg-final { scan-assembler     "\[ \t\]xxlandc " } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxleqv "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlorc "  } } */
+/* { dg-final { scan-assembler-not "\[ \t\]xxlnand " } } */
+
+#ifndef TYPE
+typedef int v4si __attribute__ ((vector_size (16)));
+#define TYPE v4si
+#endif
+
+#include "bool2.h"

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, revised patch #8, power8 load fusion
  2013-05-22 20:53 ` [PATCH, rs6000] power8 patches, patch #8, power8 load fusion + misc Michael Meissner
  2013-06-18 18:30   ` David Edelsohn
@ 2013-07-29 18:46   ` Michael Meissner
  2013-07-31 16:00     ` David Edelsohn
  2013-11-23 16:48     ` Alan Modra
  1 sibling, 2 replies; 52+ messages in thread
From: Michael Meissner @ 2013-07-29 18:46 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, pthaugen, bergner

[-- Attachment #1: Type: text/plain, Size: 1691 bytes --]

This is the revised version of my patch #8 for power8 support.  I have removed
all of the incidental changes, and only added the support for load fusion.  I
have added support for fusion on 32-bit Linux.  I have added a test to make
sure the fusion ops are being generated.

I have built a compiler on power7 with the bootstrap options to use
-mcpu=power7 -mtune=power8 (which turns on power8 fusion), and it bootstraps
with the change.  There are no make check regressions with this patch.

I have also done a full 64-bit SPEC 2006 run comparing my normal power7
configuration to a configuration that enables power8 fusion.  There were no
significant differences in any of the 29 benchmarks.

As I mentioned before, a better solution is to rework the secondary reload
interface, so that we recognize more general addresses before reload and keep
fusion addresses after reload.  I have started work on this, but I expect it
will be some time before the code is stable, and longer still before it is
acceptable for release and the corner case bugs are fixed.  So, I would like
to install this temporary patch to enable load fusion while I'm working on the
final solution.  Is this patch ok to install?

I should note that while fusion will be enabled for AIX and for the 64-bit
small code model with this patch, I don't expect you would see as many fusion
opportunities as under 32-bit Linux, or under 64-bit Linux with the medium or
large code models, because in the other cases the TOC address is not a fusion
opportunity.
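
For reference, a sketch of the fused pair that emit_fusion_gpr_load produces
for a medium code model TOC reference (the register numbers and symbol name
are illustrative, not taken from actual compiler output):

    addis 9,2,var@toc@ha	# gpr load fusion, type int, addis reg 9
    lwz 9,var@toc@l(9)

The addis and the dependent load target the same register and stay physically
adjacent, which is what allows the hardware to fuse them.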

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power8.official-08d --]
[-- Type: text/plain, Size: 19211 bytes --]

Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 201273)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -1702,3 +1702,91 @@ (define_predicate "small_toc_ref"
 
   return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL;
 })
+
+;; Match the first insn (addis) in fusing the combination of addis and loads to
+;; GPR registers on power8.
+(define_predicate "fusion_gpr_addis"
+  (match_code "const_int,high,plus")
+{
+  HOST_WIDE_INT value;
+  rtx int_const;
+
+  if (GET_CODE (op) == HIGH)
+    return 1;
+
+  if (CONST_INT_P (op))
+    int_const = op;
+
+  else if (GET_CODE (op) == PLUS
+	   && base_reg_operand (XEXP (op, 0), Pmode)
+	   && CONST_INT_P (XEXP (op, 1)))
+    int_const = XEXP (op, 1);
+
+  else
+    return 0;
+
+  /* Power8 currently will only do the fusion if the top 11 bits of the addis
+     value are all 1's or 0's.  */
+  value = INTVAL (int_const);
+  if ((value & (HOST_WIDE_INT)0xffff) != 0)
+    return 0;
+
+  if ((value & (HOST_WIDE_INT)0xffff0000) == 0)
+    return 0;
+
+  return (IN_RANGE (value >> 16, -32, 31));
+})
+
+;; Match the second insn (lbz, lhz, lwz, ld) in fusing the combination of addis
+;; and loads to GPR registers on power8.
+(define_predicate "fusion_gpr_mem_load"
+  (match_code "mem")
+{
+  rtx addr;
+
+  if (!MEM_P (op))
+    return 0;
+
+  switch (mode)
+    {
+    case QImode:
+    case HImode:
+    case SImode:
+      break;
+
+    case DImode:
+      if (!TARGET_POWERPC64)
+	return 0;
+      break;
+
+    default:
+      return 0;
+    }
+
+  addr = XEXP (op, 0);
+  if (GET_CODE (addr) == PLUS)
+    {
+      rtx base = XEXP (addr, 0);
+      rtx offset = XEXP (addr, 1);
+
+      return (base_reg_operand (base, GET_MODE (base))
+	      && satisfies_constraint_I (offset));
+    }
+
+  else if (GET_CODE (addr) == LO_SUM)
+    {
+      rtx base = XEXP (addr, 0);
+      rtx offset = XEXP (addr, 1);
+
+      if (!base_reg_operand (base, GET_MODE (base)))
+	return 0;
+
+      else if (TARGET_XCOFF || (TARGET_ELF && TARGET_POWERPC64))
+	return small_toc_ref (offset, GET_MODE (offset));
+
+      else if (TARGET_ELF && !TARGET_POWERPC64)
+	return CONSTANT_P (offset);
+    }
+
+  return 0;
+})
Index: gcc/config/rs6000/rs6000-modes.def
===================================================================
--- gcc/config/rs6000/rs6000-modes.def	(revision 201273)
+++ gcc/config/rs6000/rs6000-modes.def	(working copy)
@@ -42,5 +42,7 @@ VECTOR_MODES (FLOAT, 8);      /*        
 VECTOR_MODES (FLOAT, 16);     /*       V8HF  V4SF V2DF */
 VECTOR_MODES (FLOAT, 32);     /*       V16HF V8SF V4DF */
 
-/* Replacement for TImode that only is allowed in GPRs.  */
+/* Replacement for TImode that only is allowed in GPRs.  We also use PTImode
+   for quad memory atomic operations to force getting an even/odd register
+   combination.  */
 PARTIAL_INT_MODE (TI);
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 201273)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -73,6 +73,8 @@ extern int mems_ok_for_quad_peep (rtx, r
 extern bool gpr_or_gpr_p (rtx, rtx);
 extern bool direct_move_p (rtx, rtx);
 extern bool quad_load_store_p (rtx, rtx);
+extern bool fusion_gpr_load_p (rtx, rtx, rtx, rtx, rtx);
+extern const char *emit_fusion_gpr_load (rtx, rtx, rtx, rtx);
 extern enum reg_class (*rs6000_preferred_reload_class_ptr) (rtx,
 							    enum reg_class);
 extern enum reg_class (*rs6000_secondary_reload_class_ptr) (enum reg_class,
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 201273)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -3074,6 +3074,21 @@ rs6000_option_override_internal (bool gl
       rs6000_isa_flags &= ~OPTION_MASK_QUAD_MEMORY;
     }
 
+  /* Enable power8 fusion if we are tuning for power8, even if we aren't
+     generating power8 instructions.  */
+  if (!(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION))
+    rs6000_isa_flags |= (processor_target_table[tune_index].target_enable
+			 & OPTION_MASK_P8_FUSION);
+
+  /* Power8 does not fuse sign extended loads with the addis.  If we are
+     optimizing at high levels for speed, convert a sign extended load into a
+     zero extending load, and an explicit sign extension.  */
+  if (TARGET_P8_FUSION
+      && !(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN)
+      && optimize_function_for_speed_p (cfun)
+      && optimize >= 3)
+    rs6000_isa_flags |= OPTION_MASK_P8_FUSION_SIGN;
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after defaults", rs6000_isa_flags);
 
@@ -30419,6 +30434,264 @@ rs6000_split_logical (rtx operands[3],
 }
 
 \f
+/* Return true if the peephole can combine an addis instruction and a load
+   with an offset into a pair that can be fused together on a power8.  */
+
+bool
+fusion_gpr_load_p (rtx addis_reg,	/* reg. to hold high value.  */
+		   rtx addis_value,	/* high value loaded.  */
+		   rtx target,		/* reg. that is loaded.  */
+		   rtx mem,		/* memory to load.  */
+		   rtx insn)		/* insn for looking up reg notes or
+					   NULL_RTX if this is a peephole2.  */
+{
+  rtx addr;
+  rtx base_reg;
+
+  /* Validate arguments.  */
+  if (!base_reg_operand (addis_reg, GET_MODE (addis_reg)))
+    return false;
+
+  if (!base_reg_operand (target, GET_MODE (target)))
+    return false;
+
+  if (!fusion_gpr_addis (addis_value, GET_MODE (addis_value)))
+    return false;
+
+  if (!fusion_gpr_mem_load (mem, GET_MODE (mem)))
+    return false;
+
+  /* Validate that the register used to load the high value is either the
+     register being loaded, or we can safely replace its use in a peephole.
+
+     If this is a peephole2, we assume that there are 2 instructions in the
+     peephole (addis and load), so we want to check if the target register was
+     not used and the register to hold the addis result is dead after the
+     peephole.  */
+  if (REGNO (addis_reg) != REGNO (target))
+    {
+      if (reg_mentioned_p (target, mem))
+	return false;
+
+      if (insn)
+	{
+	  if (!find_reg_note (insn, REG_DEAD, addis_reg))
+	    return false;
+	}
+      else
+	{
+	  if (!peep2_reg_dead_p (2, addis_reg))
+	    return false;
+	}
+    }
+
+  /* Validate that the value being loaded in the addis is used in the load.  */
+  addr = XEXP (mem, 0);			/* either PLUS or LO_SUM.  */
+  if (GET_CODE (addr) != PLUS && GET_CODE (addr) != LO_SUM)
+    return false;
+
+  base_reg = XEXP (addr, 0);
+  return REGNO (addis_reg) == REGNO (base_reg);
+}
+
+/* Return a string to fuse an addis instruction with a GPR load into the same
+   register that the addis instruction set.  The code is complicated, so we
+   call output_asm_insn directly, and just return "".  */
+
+const char *
+emit_fusion_gpr_load (rtx addis_reg, rtx addis_value, rtx target, rtx mem)
+{
+  rtx fuse_ops[10];
+  rtx addr;
+  rtx load_offset;
+  const char *addis_str = NULL;
+  const char *load_str = NULL;
+  const char *mode_name = NULL;
+  char insn_template[80];
+  enum machine_mode mode = GET_MODE (mem);
+  const char *comment_str = ASM_COMMENT_START;
+
+  if (*comment_str == ' ')
+    comment_str++;
+
+  if (!MEM_P (mem))
+    gcc_unreachable ();
+
+  addr = XEXP (mem, 0);
+  if (GET_CODE (addr) != PLUS && GET_CODE (addr) != LO_SUM)
+    gcc_unreachable ();
+
+  load_offset = XEXP (addr, 1);
+
+  /* Now emit the load instruction to the same register.  */
+  switch (mode)
+    {
+    case QImode:
+      mode_name = "char";
+      load_str = "lbz";
+      break;
+
+    case HImode:
+      mode_name = "short";
+      load_str = "lhz";
+      break;
+
+    case SImode:
+      mode_name = "int";
+      load_str = "lwz";
+      break;
+
+    case DImode:
+      if (TARGET_POWERPC64)
+	{
+	  mode_name = "long";
+	  load_str = "ld";
+	}
+      break;
+
+    default:
+      break;
+    }
+
+  if (!load_str)
+    gcc_unreachable ();
+
+  /* Emit the addis instruction.  */
+  fuse_ops[0] = target;
+  fuse_ops[1] = addis_reg;
+  if (satisfies_constraint_L (addis_value))
+    {
+      fuse_ops[2] = addis_value;
+      addis_str = "lis %0,%v2";
+    }
+
+  else if (GET_CODE (addis_value) == PLUS)
+    {
+      rtx op0 = XEXP (addis_value, 0);
+      rtx op1 = XEXP (addis_value, 1);
+
+      if (REG_P (op0) && CONST_INT_P (op1)
+	  && satisfies_constraint_L (op1))
+	{
+	  fuse_ops[2] = op0;
+	  fuse_ops[3] = op1;
+	  addis_str = "addis %0,%2,%v3";
+	}
+    }
+
+  else if (GET_CODE (addis_value) == HIGH)
+    {
+      rtx value = XEXP (addis_value, 0);
+      if (GET_CODE (value) == UNSPEC && XINT (value, 1) == UNSPEC_TOCREL)
+	{
+	  fuse_ops[2] = XVECEXP (value, 0, 0);		/* symbol ref.  */
+	  fuse_ops[3] = XVECEXP (value, 0, 1);		/* TOC register.  */
+	  if (TARGET_ELF)
+	    addis_str = "addis %0,%3,%2@toc@ha";
+
+	  else if (TARGET_XCOFF)
+	    addis_str = "addis %0,%2@u(%3)";
+	}
+
+      else if (GET_CODE (value) == PLUS)
+	{
+	  rtx op0 = XEXP (value, 0);
+	  rtx op1 = XEXP (value, 1);
+
+	  if (GET_CODE (op0) == UNSPEC
+	      && XINT (op0, 1) == UNSPEC_TOCREL
+	      && CONST_INT_P (op1))
+	    {
+	      fuse_ops[2] = XVECEXP (op0, 0, 0);	/* symbol ref.  */
+	      fuse_ops[3] = XVECEXP (op0, 0, 1);	/* TOC register.  */
+	      fuse_ops[4] = op1;
+	      if (TARGET_ELF)
+		addis_str = "addis %0,%3,%2+%4@toc@ha";
+
+	      else if (TARGET_XCOFF)
+		addis_str = "addis %0,%2+%4@u(%3)";
+	    }
+	}
+
+      else if (satisfies_constraint_L (value))
+	{
+	  fuse_ops[2] = value;
+	  addis_str = "lis %0,%v2";
+	}
+
+      else if (TARGET_ELF && !TARGET_POWERPC64 && CONSTANT_P (value))
+	{
+	  fuse_ops[2] = value;
+	  addis_str = "lis %0,%2@ha";
+	}
+    }
+
+  if (!addis_str)
+    fatal_insn ("Could not generate addis value for fusion", addis_value);
+
+  sprintf (insn_template, "%s\t\t%s gpr load fusion, type %s, addis reg %%1",
+	   addis_str, comment_str, mode_name);
+  output_asm_insn (insn_template, fuse_ops);
+
+  if (CONST_INT_P (load_offset) && satisfies_constraint_I (load_offset))
+    {
+      sprintf (insn_template, "%s %%0,%%1(%%0)", load_str);
+      fuse_ops[1] = load_offset;
+      output_asm_insn (insn_template, fuse_ops);
+    }
+
+  else if (GET_CODE (load_offset) == UNSPEC
+	   && XINT (load_offset, 1) == UNSPEC_TOCREL)
+    {
+      if (TARGET_ELF)
+	sprintf (insn_template, "%s %%0,%%1@toc@l(%%0)", load_str);
+
+      else if (TARGET_XCOFF)
+	sprintf (insn_template, "%s %%0,%%1@l(%%0)", load_str);
+
+      else
+	gcc_unreachable ();
+
+      fuse_ops[1] = XVECEXP (load_offset, 0, 0);
+      output_asm_insn (insn_template, fuse_ops);
+    }
+
+  else if (GET_CODE (load_offset) == PLUS
+	   && GET_CODE (XEXP (load_offset, 0)) == UNSPEC
+	   && XINT (XEXP (load_offset, 0), 1) == UNSPEC_TOCREL
+	   && CONST_INT_P (XEXP (load_offset, 1)))
+    {
+      rtx tocrel_unspec = XEXP (load_offset, 0);
+      if (TARGET_ELF)
+	sprintf (insn_template, "%s %%0,%%1+%%2@toc@l(%%0)", load_str);
+
+      else if (TARGET_XCOFF)
+	sprintf (insn_template, "%s %%0,%%1+%%2@l(%%0)", load_str);
+
+      else
+	gcc_unreachable ();
+
+      fuse_ops[1] = XVECEXP (tocrel_unspec, 0, 0);
+      fuse_ops[2] = XEXP (load_offset, 1);
+      output_asm_insn (insn_template, fuse_ops);
+    }
+
+  else if (TARGET_ELF && !TARGET_POWERPC64 && CONSTANT_P (load_offset))
+    {
+      sprintf (insn_template, "%s %%0,%%1@l(%%0)", load_str);
+
+      fuse_ops[1] = load_offset;
+      output_asm_insn (insn_template, fuse_ops);
+    }
+
+  else
+    fatal_insn ("Unable to generate load offset for fusion", load_offset);
+
+  return "";
+}
+
+\f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 201273)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -40,6 +40,14 @@ (define_mode_iterator VSX_L [V16QI V8HI 
 ;; it to use gprs as well as vsx registers.
 (define_mode_iterator VSX_M [V16QI V8HI V4SI V2DI V4SF V2DF])
 
+(define_mode_iterator VSX_M2 [V16QI
+			      V8HI
+			      V4SI
+			      V2DI
+			      V4SF
+			      V2DF
+			      (TI	"TARGET_VSX_TIMODE")])
+
 ;; Map into the appropriate load/store name based on the type
 (define_mode_attr VSm  [(V16QI "vw4")
 			(V8HI  "vw4")
@@ -1446,3 +1454,27 @@ (define_insn_and_split "*vsx_reduc_<VEC_
 }"
   [(set_attr "length" "20")
    (set_attr "type" "veccomplex")])
+
+\f
+;; Power8 Vector fusion.  The fused ops must be physically adjacent.
+(define_peephole
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "short_cint_operand" ""))
+   (set (match_operand:VSX_M2 2 "vsx_register_operand" "")
+	(mem:VSX_M2 (plus:P (match_dup 0)
+			    (match_operand:P 3 "int_reg_operand" ""))))]
+  "TARGET_P8_FUSION"
+  "li %0,%1\t\t\t# vector load fusion\;lx<VSX_M2:VSm>x %x2,%0,%3"  
+  [(set_attr "length" "8")
+   (set_attr "type" "vecload")])
+
+(define_peephole
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "short_cint_operand" ""))
+   (set (match_operand:VSX_M2 2 "vsx_register_operand" "")
+	(mem:VSX_M2 (plus:P (match_operand:P 3 "int_reg_operand" "")
+			    (match_dup 0))))]
+  "TARGET_P8_FUSION"
+  "li %0,%1\t\t\t# vector load fusion\;lx<VSX_M2:VSm>x %x2,%0,%3"  
+  [(set_attr "length" "8")
+   (set_attr "type" "vecload")])
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 201273)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -15771,6 +15771,113 @@ (define_insn "rs6000_mftb_<mode>"
 })
 
 \f
+;; Power8 fusion support for fusing an addis instruction with a D-form load of
+;; a GPR.  The addis instruction must be adjacent to the load, and use the same
+;; register that is being loaded.  The fused ops must be physically adjacent.
+
+;; GPR fusion for single word integer types
+
+(define_peephole
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:INT1 2 "base_reg_operand" "")
+	(match_operand:INT1 3 "fusion_gpr_mem_load" ""))]
+  "TARGET_P8_FUSION
+   && fusion_gpr_load_p (operands[0], operands[1], operands[2], operands[3],
+			 insn)"
+{
+  return emit_fusion_gpr_load (operands[0], operands[1], operands[2],
+			       operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+(define_peephole
+  [(set (match_operand:DI 0 "base_reg_operand" "")
+	(match_operand:DI 1 "fusion_gpr_addis" ""))
+   (set (match_operand:DI 2 "base_reg_operand" "")
+	(zero_extend:DI (match_operand:QHSI 3 "fusion_gpr_mem_load" "")))]
+  "TARGET_P8_FUSION && TARGET_POWERPC64
+   && fusion_gpr_load_p (operands[0], operands[1], operands[2], operands[3],
+			 insn)"
+{
+  return emit_fusion_gpr_load (operands[0], operands[1], operands[2],
+			       operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+;; Power8 does not fuse a sign extending load, so convert the sign extending
+;; load into a zero extending load, and do an explicit sign extension.  Don't
+;; do this if we are trying to optimize for space.  Do this as a peephole2 to
+;; allow final rtl optimizations and scheduling to move the sign extend.
+(define_peephole2
+  [(set (match_operand:DI 0 "base_reg_operand" "")
+	(match_operand:DI 1 "fusion_gpr_addis" ""))
+   (set (match_operand:DI 2 "base_reg_operand" "")
+	(sign_extend:DI (match_operand:HSI 3 "fusion_gpr_mem_load" "")))]
+  "TARGET_P8_FUSION && TARGET_P8_FUSION_SIGN && TARGET_POWERPC64
+   && fusion_gpr_load_p (operands[0], operands[1], operands[2], operands[3],
+			 NULL_RTX)"
+  [(set (match_dup 0) (match_dup 1))
+   (set (match_dup 4) (match_dup 3))
+   (set (match_dup 2) (sign_extend:DI (match_dup 4)))]
+{
+  unsigned int offset
+    = (BYTES_BIG_ENDIAN ? 8 - GET_MODE_SIZE (<MODE>mode) : 0);
+
+  operands[4] = simplify_subreg (<MODE>mode, operands[2], DImode,
+				 offset);
+})
+
+(define_peephole
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:SI 2 "base_reg_operand" "")
+	(zero_extend:SI (match_operand:QHI 3 "fusion_gpr_mem_load" "")))]
+  "TARGET_P8_FUSION
+   && fusion_gpr_load_p (operands[0], operands[1], operands[2], operands[3],
+			 insn)"
+{
+  return emit_fusion_gpr_load (operands[0], operands[1], operands[2],
+			       operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+(define_peephole2
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:SI 2 "base_reg_operand" "")
+	(sign_extend:SI (match_operand:HI 3 "fusion_gpr_mem_load" "")))]
+  "TARGET_P8_FUSION && TARGET_P8_FUSION_SIGN
+   && fusion_gpr_load_p (operands[0], operands[1], operands[2], operands[3],
+			 NULL_RTX)"
+  [(set (match_dup 0) (match_dup 1))
+   (set (match_dup 4) (match_dup 3))
+   (set (match_dup 2) (sign_extend:SI (match_dup 4)))]
+{
+  unsigned int offset = (BYTES_BIG_ENDIAN ? 2 : 0);
+
+  operands[4] = simplify_subreg (HImode, operands[2], SImode, offset);
+})
+
+(define_peephole
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:HI 2 "base_reg_operand" "")
+	(zero_extend:HI (match_operand:QI 3 "fusion_gpr_mem_load" "")))]
+  "TARGET_P8_FUSION
+   && fusion_gpr_load_p (operands[0], operands[1], operands[2], operands[3],
+			 insn)"
+{
+  return emit_fusion_gpr_load (operands[0], operands[1], operands[2],
+			       operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+\f
 
 (include "sync.md")
 (include "vector.md")
Index: gcc/testsuite/gcc.target/powerpc/fusion.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fusion.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
+
+#define LARGE 0x12345
+
+int fusion_uchar (unsigned char *p){ return p[LARGE]; }
+int fusion_schar (signed char *p){ return p[LARGE]; }
+int fusion_ushort (unsigned short *p){ return p[LARGE]; }
+int fusion_short (short *p){ return p[LARGE]; }
+int fusion_int (int *p){ return p[LARGE]; }
+unsigned fusion_uns (unsigned *p){ return p[LARGE]; }
+
+vector double fusion_vector (vector double *p) { return p[2]; }
+
+/* { dg-final { scan-assembler-times "gpr load fusion"    6 } } */
+/* { dg-final { scan-assembler-times "vector load fusion" 1 } } */
+/* { dg-final { scan-assembler-times "lbz"                2 } } */
+/* { dg-final { scan-assembler-times "extsb"              1 } } */
+/* { dg-final { scan-assembler-times "lhz"                2 } } */
+/* { dg-final { scan-assembler-times "extsh"              1 } } */
+/* { dg-final { scan-assembler-times "lwz"                2 } } */

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, revised patch #8, power8 load fusion
  2013-07-29 18:46   ` [PATCH, rs6000] power8 patches, revised patch #8, power8 load fusion Michael Meissner
@ 2013-07-31 16:00     ` David Edelsohn
  2013-11-23 16:48     ` Alan Modra
  1 sibling, 0 replies; 52+ messages in thread
From: David Edelsohn @ 2013-07-31 16:00 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Pat Haugen, Peter Bergner

On Mon, Jul 29, 2013 at 2:39 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This is the revised version of my patch #8 for power8 support.  I have removed
> all of the incidental changes, and only added the support for load fusion.  I
> have added support for fusion on 32-bit Linux.  I have added a test to make
> sure the fusion ops are being generated.

In emit_fusion_gpr_load(), please add

else
  gcc_unreachable ();

to the

if (TARGET_ELF)
  ...
else if (TARGET_XCOFF)
 ...

paths.  Those really should be unreachable and not fall into the
"Could not generate addis value for fusion" fatal error.

Okay with that change.

Thanks, David

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH, rs6000] power8 patches, revised patch #8, power8 load fusion
  2013-07-29 18:46   ` [PATCH, rs6000] power8 patches, revised patch #8, power8 load fusion Michael Meissner
  2013-07-31 16:00     ` David Edelsohn
@ 2013-11-23 16:48     ` Alan Modra
  1 sibling, 0 replies; 52+ messages in thread
From: Alan Modra @ 2013-11-23 16:48 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc, pthaugen, bergner, Bill Schmidt

Hi Mike,
As discussed on irc, I'm applying the following as obvious to fix a
bug in the vsx fusion peepholes.  The bug is simply that the peepholes
are enabled when -mno-vsx, which leads to replacing RTL that would
emit lvx insns with RTL that emits lxvw4x or lxvd2x.  This is clearly
wrong, and worse, on LE it causes permutation of register words.  I was
originally going to disable the peepholes entirely for little-endian,
but on further thought decided this wasn't necessary: if TARGET_VSX is
enabled, the original RTL insns these patterns match would emit vsx
loads anyway.  If that changes in the future, i.e. someone decides that
vmx loads are better on little-endian than vsx loads, then we'll need
to disable these peepholes for little-endian.
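
As an illustration, here is a minimal sketch (a hypothetical test, not
part of the patch) of the class of code affected: altivec loads are
X-form only, so a constant-offset vector load compiles to li plus an
indexed lvx, which is exactly the li + load pair the unguarded
peepholes would rewrite into lxvw4x:

    typedef int v4si __attribute__ ((vector_size (16)));

    /* With -maltivec -mno-vsx this should emit li 9,48; lvx 2,3,9,
       not li plus lxvw4x.  */
    v4si load_const (v4si *p) { return p[3]; }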

Bootstrapped and regression tested powerpc64-linux and
powerpc64le-linux.  Fixes the following on powerpc64le-linux (where
power8 is now the default).
-FAIL: gcc.dg/vmx/3c-01.c  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
-FAIL: gcc.dg/vmx/varargs-4.c  -O1  execution test
-FAIL: gcc.target/powerpc/ppc64-abi-2.c execution test
-FAIL: gfortran.fortran-torture/execute/constructor.f90 execution, -O2 -ftree-vectorize -maltivec 
-FAIL: gfortran.fortran-torture/execute/elemental.f90 execution, -O2 -ftree-vectorize -maltivec 
-FAIL: gfortran.fortran-torture/execute/forall_4.f90 execution, -O2 -ftree-vectorize -maltivec 
-FAIL: gfortran.fortran-torture/execute/in-pack.f90 execution, -O2 -ftree-vectorize -maltivec 

	* config/rs6000/vsx.md (fusion peepholes): Disable when !TARGET_VSX.

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 205244)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -1895,7 +1895,7 @@
    (set (match_operand:VSX_M2 2 "vsx_register_operand" "")
 	(mem:VSX_M2 (plus:P (match_dup 0)
 			    (match_operand:P 3 "int_reg_operand" ""))))]
-  "TARGET_P8_FUSION"
+  "TARGET_VSX && TARGET_P8_FUSION"
   "li %0,%1\t\t\t# vector load fusion\;lx<VSX_M2:VSm>x %x2,%0,%3"  
   [(set_attr "length" "8")
    (set_attr "type" "vecload")])
@@ -1906,7 +1906,7 @@
    (set (match_operand:VSX_M2 2 "vsx_register_operand" "")
 	(mem:VSX_M2 (plus:P (match_operand:P 3 "int_reg_operand" "")
 			    (match_dup 0))))]
-  "TARGET_P8_FUSION"
+  "TARGET_VSX && TARGET_P8_FUSION"
   "li %0,%1\t\t\t# vector load fusion\;lx<VSX_M2:VSm>x %x2,%0,%3"  
   [(set_attr "length" "8")
    (set_attr "type" "vecload")])

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 52+ messages in thread

Thread overview: 52+ messages
2013-05-20 20:41 [PATCH, rs6000] power8 patches Michael Meissner
2013-05-20 20:49 ` [PATCH, rs6000] power8 patch #1, infrastructure changes Michael Meissner
2013-05-20 21:34   ` [PATCH, rs6000] power8 patch #1, infrastructure changes (revised patch) Michael Meissner
2013-05-22  3:29     ` David Edelsohn
2013-05-20 23:13 ` [PATCH, rs6000] power8 patches, patch #2, add crypto builtins Michael Meissner
2013-05-22  3:30   ` David Edelsohn
2013-05-23  3:41     ` David Edelsohn
2013-05-23  3:59       ` Michael Meissner
2013-05-25  4:07         ` David Edelsohn
2013-05-30 21:04           ` Michael Meissner
2013-05-21  2:11 ` [PATCH, rs6000] power8 patches Peter Bergner
2013-05-21 15:51 ` [PATCH, rs6000] power8 patches, patch #3, add V2DI vector support Michael Meissner
2013-05-23 16:31   ` David Edelsohn
2013-05-21 23:47 ` [PATCH, rs6000] power8 patches, patch #4, new power8 builtins Michael Meissner
2013-05-25  4:03   ` David Edelsohn
2013-05-30 23:26     ` Michael Meissner
2013-05-31  9:14       ` Segher Boessenkool
2013-05-31 15:11         ` Michael Meissner
2013-06-04 18:49   ` [PATCH, rs6000] power8 patches, patch #4 (revised), " Michael Meissner
2013-06-05 14:28     ` David Edelsohn
2013-06-05 15:50       ` Segher Boessenkool
2013-06-05 16:05         ` Michael Meissner
2013-06-05 20:06           ` Segher Boessenkool
2013-06-05 20:24             ` Michael Meissner
2013-06-05 16:13       ` Michael Meissner
2013-06-05 17:28         ` David Edelsohn
2013-06-06 15:57         ` David Edelsohn
2013-06-06 21:42           ` Michael Meissner
2013-07-15 21:48           ` Michael Meissner
2013-07-20 19:12             ` David Edelsohn
2013-07-23 21:24               ` Michael Meissner
2013-05-21 23:49 ` [PATCH, rs6000] power8 patches, patch #5, new vector tests Michael Meissner
2013-06-06 21:51   ` Michael Meissner
2013-05-22 14:26 ` [PATCH, rs6000] power8 patches, patch #6, direct move & basic quad load/store Michael Meissner
2013-05-29 19:53   ` David Edelsohn
2013-05-29 20:32     ` Michael Meissner
2013-06-10 15:41       ` David Edelsohn
2013-06-10 20:26         ` Michael Meissner
2013-05-22 16:51 ` [PATCH, rs6000] power8 patches, patch #7, quad/byte/half-word atomic instructions Michael Meissner
2013-05-29 20:29   ` David Edelsohn
2013-05-29 20:36     ` Michael Meissner
2013-06-11 23:56     ` Michael Meissner
2013-06-12 21:55       ` David Edelsohn
2013-05-22 20:53 ` [PATCH, rs6000] power8 patches, patch #8, power8 load fusion + misc Michael Meissner
2013-06-18 18:30   ` David Edelsohn
2013-06-24 16:32     ` Michael Meissner
2013-06-24 19:43       ` David Edelsohn
2013-07-29 18:46   ` [PATCH, rs6000] power8 patches, revised patch #8, power8 load fusion Michael Meissner
2013-07-31 16:00     ` David Edelsohn
2013-11-23 16:48     ` Alan Modra
2013-06-07 19:22 ` [PATCH, rs6000] power8 patches, patch #9, power8 scheduling Pat Haugen
2013-06-19 13:00   ` David Edelsohn
