public inbox for gcc-patches@gcc.gnu.org
* [PATCH i386 2/8] [AVX512] Add mask registers.
@ 2013-07-31 15:31 Kirill Yukhin
  2013-07-31 16:34 ` Richard Henderson
  0 siblings, 1 reply; 31+ messages in thread
From: Kirill Yukhin @ 2013-07-31 15:31 UTC
  To: ubizjak, rth, jakub, vmakarov; +Cc: gcc-patches

Hello,
In this patch we add support for the new mask registers k0-k7. The changes are
mostly straightforward, but there are two problems. First, we can't use k0 as
a mask in vector instructions, so we have to introduce two register classes:
one for use in vector instructions with the "k" constraint, corresponding to
k1-k7, and one for instructions like kxor, which can use all of the new mask
registers. The other problem is that we only have the 16-bit kmovw, but masks
are used at both 8 and 16 bits wide. So we don't provide memory alternatives
in movqi_internal, and hope that the register allocator will figure out that
the move has to go through a GPR (a sketch follows below).
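
A minimal sketch (assembly, AT&T syntax; the register is chosen for
illustration and "mem" is a placeholder) of the reload sequence the allocator
has to synthesize for a QImode mask-to-memory move, since AVX512F has no
byte-wide kmov:

  	kmovw	%k1, %eax	# mask -> GPR; kmovw is 16 bits wide
  	movb	%al, mem	# GPR -> memory; byte-wide store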

We have doubts about the SECONDARY_MEMORY changes.

There's an ICE ("max. number of generated reload insns per insn is achieved
(90)") when LRA tries to save a mask register before a call. It is caused by
the fact that the split_reg function in lra-constraints.c allocates memory for
the saved reg in SECONDARY_MEMORY_NEEDED_MODE, which means SImode on x86. But
there is no SImode move to/from the new registers, only a HImode move, because
they are 16 bits wide. Changing the secondary reload doesn't help, because
split_reg just calls emit_spill_move without going through the secondary
reload machinery. There are a few approaches to fixing that:
  1) [In the current patch] Change SECONDARY_MEMORY_NEEDED_MODE so that it
     stops widening memory.
  2) It is strange that SECONDARY_MEMORY_NEEDED_MODE is used to determine the
     mode of the save slot for a call-saved register at all, so we could add
     a new macro, e.g. CALL_SAVE_MODE, and use it in split_reg to determine
     that mode (see the sketch after this list).
  3) If we do want to use SECONDARY_MEMORY_NEEDED_MODE for the call-save case
     too, we should change emit_spill_move to use the secondary reload.
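
A hypothetical sketch of approach 2; CALL_SAVE_MODE is an invented name, not
an existing target macro, and the split_reg change is only outlined here:

  /* i386.h -- hypothetical macro: save call-saved registers in their
     natural mode instead of a widened one.  */
  #define CALL_SAVE_MODE(MODE) (MODE)

  /* lra-constraints.c:split_reg -- use it instead of
     SECONDARY_MEMORY_NEEDED_MODE (GET_MODE (original_reg)).  */
  #ifdef CALL_SAVE_MODE
    sec_mode = CALL_SAVE_MODE (GET_MODE (original_reg));
  #else
    sec_mode = GET_MODE (original_reg);
  #endif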

Small reproducer (we first hit this bug in real code in SPEC 2006):
  volatile short t,y,a,b;
  void bar();

  void foo()
  {
	short c;
	__asm__ ( "kxorw %0, %1,%2;" : "=Yk"(c) : "Yk"(t),"Yk"(y));
	bar();
	__asm__ ( "kxorw %0, %1,%2;" : "=Yk"(a) : "Yk"(c),"Yk"(b));
  }

Build command:
      ./gcc/xgcc -B./gcc -Ofast -mavx512f -march=core-avx2 foo.c  -S -o-  -ffixed-rsi  -ffixed-rdi  -ffixed-rbx -ffixed-rbp -m32

So what is the best approach?

Current testing status (2/8 applied on top of 1/8):
  1. Bootstrap passes.
  2. No new regressions.
  3. SPEC 2000 & 2006 build and run (test dataset) pass without the -mavx512f
     option: no compile or run failures.
  4. SPEC 2000 & 2006 build passes with the -mavx512f option: no compile
     failures.

ChangeLog entry:
2013-07-31  Alexander Ivchenko  <alexander.ivchenko@intel.com>
	    Maxim Kuznetsov  <maxim.kuznetsov@intel.com>
	    Sergey Lega  <sergey.s.lega@intel.com>
	    Anna Tikhonova  <anna.tikhonova@intel.com>
	    Ilya Tocar  <ilya.tocar@intel.com>
	    Andrey Turetskiy  <andrey.turetskiy@intel.com>
	    Ilya Verbin  <ilya.verbin@intel.com>
	    Kirill Yukhin  <kirill.yukhin@intel.com>
	    Michael Zolotukhin  <michael.v.zolotukhin@intel.com>

	* config/i386/constraints.md (k): New.
	(Yk): Ditto.
	* config/i386/i386.c (const regclass_map): Add new mask registers.
	(dbx_register_map): Ditto.
	(dbx64_register_map): Ditto.
	(svr4_dbx_register_map): Ditto.
	(ix86_conditional_register_usage): Squash mask registers if AVX512F is
	disabled.
	(ix86_preferred_reload_class): Disable constants for mask registers.
	(ix86_hard_regno_mode_ok): Support new mask registers.
	(x86_order_regs_for_local_alloc): Ditto.
	* config/i386/i386.h (FIRST_PSEUDO_REGISTER): Update.
	(FIXED_REGISTERS): Add new mask registers.
	(CALL_USED_REGISTERS): Ditto.
	(REG_ALLOC_ORDER): Ditto.
	(VALID_MASK_REG_MODE): New.
	(FIRST_MASK_REG): Ditto.
	(LAST_MASK_REG): Ditto.
	(reg_class): Add MASK_EVEX_REGS, MASK_REGS.
	(MAYBE_MASK_CLASS_P): New.
	(REG_CLASS_NAMES): Add MASK_EVEX_REGS, MASK_REGS.
	(REG_CLASS_CONTENTS): Ditto.
	(MASK_REGNO_P): New.
	(ANY_MASK_REG_P): Ditto.
	(SECONDARY_MEMORY_NEEDED_MODE): Change cutoff point from 32 to 8.
	(HI_REGISTER_NAMES): Add new mask registers.
	* config/i386/i386.md (unspec): Add UNSPEC_KIOR, UNSPEC_KXOR, UNSPEC_KAND,
	UNSPEC_KANDN.
	(MASK0_REG, MASK1_REG, MASK2_REG, MASK3_REG, MASK4_REG, MASK5_REG)
	(MASK6_REG, MASK7_REG): Constants for new mask registers.
	(attribute "type"): Add mskmov, msklog.
	(attribute "length_immediate"): Support them.
	(attribute "memory"): Ditto.
	(attribute "prefix_0f"): Ditto.
	(*movhi_internal): Support new mask registers.
	(movqi_internal): Ditto.
	(kandn<mode>): New.
	(kand<mode>): Ditto.
	(kior<mode>): Ditto.
	(kxor<mode>): Ditto.
	(kxnor<mode>): Ditto.
	(kortestzhi): Ditto.
	(kortestchi): Ditto.
	(kunpckhi): Ditto.
	(*one_cmpl<mode>2_1): Remove HImode and handle it...
	(*one_cmplhi2_1): ...Here, now with mask register support.
	(*one_cmplqi2_1): Support new mask registers.
	(HI/QImode arithmetic splitter): Don't split if mask registers are used.
	(HI/QImode not splitter): Ditto.


Thanks, K

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c         |  23 ++++-
 gcc/config/i386/i386.h         |  38 ++++++--
 gcc/config/i386/i386.md        | 204 +++++++++++++++++++++++++++++++++++++----
 4 files changed, 240 insertions(+), 33 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;     B     H           T
-;;;           h jk
+;;;           h j
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1b5294d..8b72f96 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2193,6 +2193,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
+  /* Mask registers.  */
+  MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
+  MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -2208,6 +2211,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* new SSE registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* new SSE registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* The "default" register map used in 64bit mode.  */
@@ -2223,6 +2227,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
   25, 26, 27, 28, 29, 30, 31, 32,	/* extended SSE registers */
   67, 68, 69, 70, 71, 72, 73, 74,       /* new SSE registers 16-23 */
   75, 76, 77, 78, 79, 80, 81, 82,       /* new SSE registers 24-31 */
+  118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -2290,6 +2295,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* new SSE registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* new SSE registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* Define parameter passing and return registers.  */
@@ -4137,7 +4143,8 @@ ix86_conditional_register_usage (void)
   /* If AVX512F is disabled, squash the registers.  */
   if (! TARGET_AVX512F)
     for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-      if (TEST_HARD_REG_BIT (reg_class_contents[(int)EVEX_SSE_REGS], i))
+      if (TEST_HARD_REG_BIT (reg_class_contents[(int)MASK_REGS], i)
+	  || TEST_HARD_REG_BIT (reg_class_contents[(int)EVEX_SSE_REGS], i))
 	fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
 }
 
@@ -33790,10 +33797,12 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass)
     return regclass;
 
   /* Force constants into memory if we are loading a (nonzero) constant into
-     an MMX or SSE register.  This is because there are no MMX/SSE instructions
-     to load from a constant.  */
+     an MMX, SSE or MASK register.  This is because there are no MMX/SSE/MASK
+     instructions to load from a constant.  */
   if (CONSTANT_P (x)
-      && (MAYBE_MMX_CLASS_P (regclass) || MAYBE_SSE_CLASS_P (regclass)))
+      && (MAYBE_MMX_CLASS_P (regclass)
+	  || MAYBE_SSE_CLASS_P (regclass)
+	  || MAYBE_MASK_CLASS_P (regclass)))
     return NO_REGS;
 
   /* Prefer SSE regs only, if we can use them for math.  */
@@ -34326,6 +34335,8 @@ ix86_hard_regno_mode_ok (int regno, enum machine_mode mode)
     return false;
   if (STACK_REGNO_P (regno))
     return VALID_FP_MODE_P (mode);
+  if (MASK_REGNO_P (regno))
+    return VALID_MASK_REG_MODE (mode);
   if (SSE_REGNO_P (regno))
     {
       /* We implement the move patterns for all vector modes into and
@@ -35126,6 +35137,10 @@ x86_order_regs_for_local_alloc (void)
    for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
      reg_alloc_order [pos++] = i;
 
+   /* Mask register.  */
+   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
+     reg_alloc_order [pos++] = i;
+
    /* x87 registers.  */
    if (TARGET_SSE_MATH)
      for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e30d61c..49999a6 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -963,7 +963,7 @@ enum target_cpu_default
    eliminated during reloading in favor of either the stack or frame
    pointer.  */
 
-#define FIRST_PSEUDO_REGISTER 69
+#define FIRST_PSEUDO_REGISTER 77
 
 /* Number of hardware registers that go into the DWARF-2 unwind info.
    If not defined, equals FIRST_PSEUDO_REGISTER.  */
@@ -993,7 +993,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      0,   0,    0,    0,    0,    0,    0,    0,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     0,   0,    0,    0,    0,    0,    0,    0 }
+     0,   0,    0,    0,    0,    0,    0,    0,		\
+/*  k0,  k1, k2, k3, k4, k5, k6, k7*/				\
+     0,  0,   0,  0,  0,  0,  0,  0 }
 
 /* 1 for registers not available across function calls.
    These must include the FIXED_REGISTERS and also any
@@ -1025,7 +1027,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      6,    6,     6,    6,    6,    6,    6,    6,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     6,    6,     6,    6,    6,    6,    6,    6 }
+     6,    6,     6,    6,    6,    6,    6,    6,		\
+ /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
+     1,   1,   1,   1,   1,   1,   1,   1 }
 
 /* Order in which to allocate registers.  Each register must be
    listed once, even those in FIXED_REGISTERS.  List frame pointer
@@ -1041,7 +1045,7 @@ enum target_cpu_default
    18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,	\
    33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,  \
    48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62,	\
-   63, 64, 65, 66, 67, 68 }
+   63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76 }
 
 /* ADJUST_REG_ALLOC_ORDER is a macro which permits reg_alloc_order
    to be rearranged based on a particular function.  When using sse math,
@@ -1138,6 +1142,8 @@ enum target_cpu_default
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode	\
    || (MODE) == V16SFmode)
 
+#define VALID_MASK_REG_MODE(MODE) ((MODE) == HImode || (MODE) == QImode)
+
 /* Value is 1 if hard register REGNO can hold a value of machine-mode MODE.  */
 
 #define HARD_REGNO_MODE_OK(REGNO, MODE)	\
@@ -1211,6 +1217,9 @@ enum target_cpu_default
 #define FIRST_EXT_REX_SSE_REG  (LAST_REX_SSE_REG + 1) /*53*/
 #define LAST_EXT_REX_SSE_REG   (FIRST_EXT_REX_SSE_REG + 15) /*68*/
 
+#define FIRST_MASK_REG  (LAST_EXT_REX_SSE_REG + 1) /*69*/
+#define LAST_MASK_REG   (FIRST_MASK_REG + 7) /*76*/
+
 /* Override this in other tm.h files to cope with various OS lossage
    requiring a frame pointer.  */
 #ifndef SUBTARGET_FRAME_POINTER_REQUIRED
@@ -1299,6 +1308,8 @@ enum reg_class
   FLOAT_INT_REGS,
   INT_SSE_REGS,
   FLOAT_INT_SSE_REGS,
+  MASK_EVEX_REGS,
+  MASK_REGS,
   ALL_REGS, LIM_REG_CLASSES
 };
 
@@ -1320,6 +1331,8 @@ enum reg_class
   reg_classes_intersect_p (ALL_SSE_REGS, (CLASS))
 #define MAYBE_MMX_CLASS_P(CLASS) \
   reg_classes_intersect_p (MMX_REGS, (CLASS))
+#define MAYBE_MASK_CLASS_P(CLASS) \
+  reg_classes_intersect_p (MASK_REGS, (CLASS))
 
 #define Q_CLASS_P(CLASS) \
   reg_class_subset_p ((CLASS), Q_REGS)
@@ -1349,6 +1362,8 @@ enum reg_class
    "FLOAT_INT_REGS",			\
    "INT_SSE_REGS",			\
    "FLOAT_INT_SSE_REGS",		\
+   "MASK_EVEX_REGS",			\
+   "MASK_REGS",				\
    "ALL_REGS" }
 
 /* Define which registers fit in which classes.  This is an initializer
@@ -1386,7 +1401,9 @@ enum reg_class
 {   0x11ffff,    0x1fe0,   0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,  0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,  0x1f },       /* FLOAT_INT_SSE_REGS */        \
-{ 0xffffffff,0xffffffff,  0x1f }                                        \
+       { 0x0,       0x0,0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0,0x1fe0 },       /* MASK_REGS */                 \
+{ 0xffffffff,0xffffffff,0x1fff }                                        \
 }
 
 /* The same information, inverted:
@@ -1444,6 +1461,8 @@ enum reg_class
          : (N) <= LAST_REX_SSE_REG ? (FIRST_REX_SSE_REG + (N) - 8) \
                                    : (FIRST_EXT_REX_SSE_REG + (N) - 16))
 
+#define MASK_REGNO_P(N) IN_RANGE ((N), FIRST_MASK_REG, LAST_MASK_REG)
+#define ANY_MASK_REG_P(X) (REG_P (X) && MASK_REGNO_P (REGNO (X)))
 
 #define SSE_FLOAT_MODE_P(MODE) \
   ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
@@ -1496,10 +1515,10 @@ enum reg_class
 
 /* Get_secondary_mem widens integral modes to BITS_PER_WORD.
    There is no need to emit full 64 bit move on 64 bit targets
-   for integral modes that can be moved using 32 bit move.  */
+   for integral modes that can be moved using 8 bit move.  */
 #define SECONDARY_MEMORY_NEEDED_MODE(MODE)			\
-  (GET_MODE_BITSIZE (MODE) < 32 && INTEGRAL_MODE_P (MODE)	\
-   ? mode_for_size (32, GET_MODE_CLASS (MODE), 0)		\
+  (GET_MODE_BITSIZE (MODE) < 8 && INTEGRAL_MODE_P (MODE)	\
+   ? mode_for_size (8, GET_MODE_CLASS (MODE), 0)		\
    : MODE)
 
 /* Return a class of registers that cannot change FROM mode to TO mode.  */
@@ -2000,7 +2019,8 @@ do {							\
  "xmm16", "xmm17", "xmm18", "xmm19",					\
  "xmm20", "xmm21", "xmm22", "xmm23",					\
  "xmm24", "xmm25", "xmm26", "xmm27",					\
- "xmm28", "xmm29", "xmm30", "xmm31" }
+ "xmm28", "xmm29", "xmm30", "xmm31",					\
+ "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7" }
 
 #define REGISTER_NAMES HI_REGISTER_NAMES
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9d0c6a9..eb0b0fe 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -178,6 +178,12 @@
   ;; For BMI2 support
   UNSPEC_PDEP
   UNSPEC_PEXT
+
+  ;; For AVX512F mask support
+  UNSPEC_KIOR
+  UNSPEC_KXOR
+  UNSPEC_KAND
+  UNSPEC_KANDN
 ])
 
 (define_c_enum "unspecv" [
@@ -328,6 +334,14 @@
    (XMM29_REG			66)
    (XMM30_REG			67)
    (XMM31_REG			68)
+   (MASK0_REG			69)
+   (MASK1_REG			70)
+   (MASK2_REG			71)
+   (MASK3_REG			72)
+   (MASK4_REG			73)
+   (MASK5_REG			74)
+   (MASK6_REG			75)
+   (MASK7_REG			76)
   ])
 
 ;; Insns whose names begin with "x86_" are emitted by gen_FOO calls
@@ -360,7 +374,7 @@
    sseishft,sseishft1,ssecmp,ssecomi,
    ssecvt,ssecvt1,sseicvt,sseins,
    sseshuf,sseshuf1,ssemuladd,sse4arg,
-   lwp,
+   lwp,mskmov,msklog,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -379,7 +393,7 @@
 			  ssemul,sseimul,ssediv,sselog,sselog1,
 			  sseishft,sseishft1,ssecmp,ssecomi,
 			  ssecvt,ssecvt1,sseicvt,sseins,
-			  sseshuf,sseshuf1,ssemuladd,sse4arg")
+			  sseshuf,sseshuf1,ssemuladd,sse4arg,mskmov")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "mmx")
@@ -390,7 +404,7 @@
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
-			  bitmanip,imulx")
+			  bitmanip,imulx,msklog,mskmov")
 	   (const_int 0)
 	 (eq_attr "unit" "i387,sse,mmx")
 	   (const_int 0)
@@ -451,7 +465,7 @@
 ;; Set when 0f opcode prefix is used.
 (define_attr "prefix_0f" ""
   (if_then_else
-    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip")
+    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip,msklog,mskmov")
 	 (eq_attr "unit" "sse,mmx"))
     (const_int 1)
     (const_int 0)))
@@ -651,7 +665,7 @@
 		   fmov,fcmp,fsgn,
 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
 		   sselog1,sseshuf1,sseadd1,sseiadd1,sseishft1,
-		   mmx,mmxmov,mmxcmp,mmxcvt")
+		   mmx,mmxmov,mmxcmp,mmxcvt,mskmov,msklog")
 	      (match_operand 2 "memory_operand"))
 	   (const_string "load")
 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
@@ -2211,8 +2225,8 @@
 	   (const_string "SI")))])
 
 (define_insn "*movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m")
-	(match_operand:HI 1 "general_operand"	   "r ,rn,rm,rn"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,Yk,Yk,rm")
+	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,rm,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2221,6 +2235,16 @@
       /* movzwl is faster than movw on p2 due to partial word stalls,
 	 though not as fast as an aligned movl.  */
       return "movz{wl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 4: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 5: return "kmovw\t{%1, %0|%0, %1}";
+	case 6: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2238,11 +2262,17 @@
 	    (and (eq_attr "alternative" "1,2")
 		 (match_operand:HI 1 "aligned_operand"))
 	      (const_string "imov")
+	    (eq_attr "alternative" "4,5,6")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "0,2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+    (set (attr "prefix")
+      (if_then_else (eq_attr "alternative" "4,5,6")
+	(const_string "vex")
+	(const_string "orig")))
     (set (attr "mode")
       (cond [(eq_attr "type" "imovx")
 	       (const_string "SI")
@@ -2267,8 +2297,8 @@
 ;; register stall machines with, where we use QImode instructions, since
 ;; partial register stall can be caused there.  Then we use movzx.
 (define_insn "*movqi_internal"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
-	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m ,Yk,Yk,r")
+	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn,rm,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2276,6 +2306,16 @@
     case TYPE_IMOVX:
       gcc_assert (ANY_QI_REG_P (operands[1]) || MEM_P (operands[1]));
       return "movz{bl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 7: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 8: return "kmovw\t{%1, %0|%0, %1}";
+	case 9: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2295,11 +2335,17 @@
 	      (const_string "imov")
 	    (eq_attr "alternative" "3,5")
 	      (const_string "imovx")
+	    (eq_attr "alternative" "7,8,9")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+   (set (attr "prefix")
+     (if_then_else (eq_attr "alternative" "7,8,9")
+       (const_string "vex")
+       (const_string "orig")))
    (set (attr "mode")
       (cond [(eq_attr "alternative" "3,4,5")
 	       (const_string "SI")
@@ -7614,6 +7660,18 @@
        (const_string "*")))
    (set_attr "mode" "HI,HI,SI")])
 
+(define_insn "kandn<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(unspec:SWI12
+	  [(match_operand:SWI12 1 "register_operand" "Yk")
+	   (match_operand:SWI12 2 "register_operand" "Yk")]
+	 UNSPEC_KANDN))]
+  "TARGET_AVX512F"
+  "kandnw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "<MODE>")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*andqi_1"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
@@ -7639,6 +7697,18 @@
   [(set_attr "type" "alu1")
    (set_attr "mode" "QI")])
 
+(define_insn "kand<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(unspec:SWI12
+	  [(match_operand:SWI12 1 "register_operand" "Yk")
+	   (match_operand:SWI12 2 "register_operand" "Yk")]
+	  UNSPEC_KAND))]
+  "TARGET_AVX512F"
+  "kandw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "msklog")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "<MODE>")])
+
 ;; Turn *anddi_1 into *andsi_1_zext if possible.
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -8042,6 +8112,81 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "kior<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(unspec:SWI12
+	  [(match_operand:SWI12 1 "register_operand" "Yk")
+	   (match_operand:SWI12 2 "register_operand" "Yk")]
+	  UNSPEC_KIOR))]
+  "TARGET_AVX512F"
+  "korw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "msklog")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "kxor<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(unspec:SWI12
+	  [(match_operand:SWI12 1 "register_operand" "Yk")
+	   (match_operand:SWI12 2 "register_operand" "Yk")]
+	  UNSPEC_KXOR))]
+  "TARGET_AVX512F"
+  "kxorw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "msklog")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "kxnor<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_operand:SWI12 1 "register_operand" "Yk")
+	    (match_operand:SWI12 2 "register_operand" "Yk"))))]
+  "TARGET_AVX512F"
+  "kxnorw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "<MODE>")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kortestzhi"
+  [(set (reg:CCZ FLAGS_REG)
+	(compare:CCZ
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "%Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int 0)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kortestchi"
+  [(set (reg:CCC FLAGS_REG)
+	(compare:CCC
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "%Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int -1)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCCmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kunpckhi"
+  [(set (match_operand:HI 0 "register_operand" "=Yk")
+	(ior:HI
+	  (ashift:HI
+	    (match_operand:HI 1 "register_operand" "Yk")
+	    (const_int 8))
+	  (zero_extend:HI (subreg:QI (match_operand:HI 2 "register_operand" "Yk") 0))))]
+  "TARGET_AVX512F"
+  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 ;; ??? Special case for immediate operand is missing - it is tricky.
 (define_insn "*<code>si_2_zext"
@@ -8611,23 +8756,38 @@
   "ix86_expand_unary_operator (NOT, <MODE>mode, operands); DONE;")
 
 (define_insn "*one_cmpl<mode>2_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
-	(not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm")
+	(not:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0")))]
   "ix86_unary_operator_ok (NOT, <MODE>mode, operands)"
   "not{<imodesuffix>}\t%0"
   [(set_attr "type" "negnot")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*one_cmplhi2_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yk")
+	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
+  "ix86_unary_operator_ok (NOT, HImode, operands)"
+  "@
+   not{w}\t%0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,avx512f")
+   (set_attr "type" "negnot,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 1.  What to do?
 (define_insn "*one_cmplqi2_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
-	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yk")
+	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,Yk")))]
   "ix86_unary_operator_ok (NOT, QImode, operands)"
   "@
    not{b}\t%0
-   not{l}\t%k0"
-  [(set_attr "type" "negnot")
-   (set_attr "mode" "QI,SI")])
+   not{l}\t%k0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,*,avx512f")
+   (set_attr "type" "negnot,negnot,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "mode" "QI,SI,QI")])
 
 ;; ??? Currently never generated - xor is used instead.
 (define_insn "*one_cmplsi2_1_zext"
@@ -16351,7 +16511,7 @@
 })
 
 ;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.
-
+;; Do not split instructions with mask registers.
 (define_split
   [(set (match_operand 0 "register_operand")
 	(match_operator 3 "promotable_binary_operator"
@@ -16365,7 +16525,10 @@
 	    || !CONST_INT_P (operands[2])
 	    || satisfies_constraint_K (operands[2])))
        || (GET_MODE (operands[0]) == QImode
-	   && (TARGET_PROMOTE_QImode || optimize_function_for_size_p (cfun))))"
+	   && (TARGET_PROMOTE_QImode || optimize_function_for_size_p (cfun))))
+   && (! ANY_MASK_REG_P (operands[0])
+	|| ! ANY_MASK_REG_P (operands[1])
+	|| ! ANY_MASK_REG_P (operands[2]))"
   [(parallel [(set (match_dup 0)
 		   (match_op_dup 3 [(match_dup 1) (match_dup 2)]))
 	      (clobber (reg:CC FLAGS_REG))])]
@@ -16450,6 +16613,7 @@
   operands[1] = gen_lowpart (SImode, operands[1]);
 })
 
+;; Do not split instructions with mask regs.
 (define_split
   [(set (match_operand 0 "register_operand")
 	(not (match_operand 1 "register_operand")))]
@@ -16457,7 +16621,9 @@
    && (GET_MODE (operands[0]) == HImode
        || (GET_MODE (operands[0]) == QImode
 	   && (TARGET_PROMOTE_QImode
-	       || optimize_insn_for_size_p ())))"
+	       || optimize_insn_for_size_p ())))
+   && (! ANY_MASK_REG_P (operands[0])
+	 || ! ANY_MASK_REG_P (operands[1]))"
   [(set (match_dup 0)
 	(not:SI (match_dup 1)))]
 {
-- 
1.7.11.7


* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-07-31 15:31 [PATCH i386 2/8] [AVX512] Add mask registers Kirill Yukhin
@ 2013-07-31 16:34 ` Richard Henderson
       [not found]   ` <20130805180659.GA60466@msticlxl57.ims.intel.com>
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2013-07-31 16:34 UTC
  To: Kirill Yukhin; +Cc: ubizjak, jakub, vmakarov, gcc-patches

On 07/31/2013 05:02 AM, Kirill Yukhin wrote:
> 
> There's ICE (max. number of generated reload insns per insn is achieved (90)),
> when LRA tried to save mask register before call. This was caused by fact that  split_reg function
> in lra-constraints.c allocates memory for saved reg in SECONDARY_MEMORY_NEEDED_MODE which 

I've told you before that it's not SECONDARY_MEMORY that you want, but
a secondary reload.  You should be patching ix86_secondary_reload, right
below where we handle QImode spills from non-Q registers for x86-32.

> +(define_insn "kand<mode>"
> +  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
> +	(unspec:SWI12
> +	  [(match_operand:SWI12 1 "register_operand" "Yk")
> +	   (match_operand:SWI12 2 "register_operand" "Yk")]
> +	  UNSPEC_KAND))]

Trying to avoid auto usage of the registers?  Maybe.  I can see this hurting
too, depending on how complicated the expressions get.
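
For comparison, a canonical-RTL form of the kand pattern -- a sketch, not
part of the posted patch -- which is the form that generic expansion and
combine could match directly:

  (define_insn "kand<mode>"
    [(set (match_operand:SWI12 0 "register_operand" "=Yk")
	  (and:SWI12
	    (match_operand:SWI12 1 "register_operand" "%Yk")
	    (match_operand:SWI12 2 "register_operand" "Yk")))]
    "TARGET_AVX512F"
    "kandw\t{%2, %1, %0|%0, %1, %2}"
    [(set_attr "type" "msklog")
     (set_attr "prefix" "vex")
     (set_attr "mode" "<MODE>")])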

> +(define_insn "kxnor<mode>"
> +  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
> +	(not:SWI12
> +	  (xor:SWI12

E.g. your partial conversion to unspecs here.  Will never match.

> +(define_insn "kortestzhi"
> +  [(set (reg:CCZ FLAGS_REG)
> +	(compare:CCZ
> +	  (ior:HI

And here.

> +(define_insn "kortestchi"
> +  [(set (reg:CCC FLAGS_REG)
> +	(compare:CCC
> +	  (ior:HI

And here.

> +(define_insn "kunpckhi"
> +  [(set (match_operand:HI 0 "register_operand" "=Yk")
> +	(ior:HI
> +	  (ashift:HI

And here.

> +(define_insn "*one_cmplhi2_1"
> +  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yk")
> +	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
> +  "ix86_unary_operator_ok (NOT, HImode, operands)"
> +  "@
> +   not{w}\t%0
> +   knotw\t{%1, %0|%0, %1}"

But not here?  And if this is the route to go, surely "*Yk".



r~


* avx512 mask register spilling
       [not found]       ` <20130806135722.GA16463@msticlxl57.ims.intel.com>
@ 2013-08-06 19:11         ` Richard Henderson
  2013-08-07 15:55           ` Vladimir Makarov
  2013-08-06 19:43         ` [PATCH i386 2/8] [AVX512] Add mask registers Richard Henderson
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2013-08-06 19:11 UTC
  To: Kirill Yukhin; +Cc: vmakarov, GCC Patches


On 08/06/2013 03:57 AM, Kirill Yukhin wrote:
> On 05 Aug 09:55, Richard Henderson wrote:
>> On 08/05/2013 08:07 AM, Kirill Yukhin wrote:
>>> Hello Richard, Vlad,
>>>
>>> On 31 Jul 06:26, Richard Henderson wrote:
>>>> On 07/31/2013 05:02 AM, Kirill Yukhin wrote:
>>>>>
>>>>> There's ICE (max. number of generated reload insns per insn is achieved (90)),
>>>>> when LRA tried to save mask register before call. This was caused by fact that  split_reg function
>>>>> in lra-constraints.c allocates memory for saved reg in SECONDARY_MEMORY_NEEDED_MODE which 
>>>>
>>>> I've told you before that it's not SECONDARY_MEMORY that you want, but
>>>> a secondary reload.  You should be patching ix86_secondary_reload, right
>>>> below where we handle QImode spills from non-Q registers for x86-32.
>>>
>>> Trying to do that with no success so far.
>>> Could you pls correct me if I am wrong.
>>> What I am trying to do is to introduce 2 new `define_expand' for load and store.
>>
>> Huh?  You shouldn't need this.
>>
>> Give me a test case and I can have a look at it.
> 
> Hello,
> 
> I've squashed part 1 and 2 + rebased on recent trunk.
> Testcase is attached.
> To reproduce: build-x86_64-linux/gcc/xgcc -Bbuild-x86_64-linux/gcc -Ofast -mavx512f -march=core-avx2 repro.c  -S -o-  -ffixed-rsi  -ffixed-rdi  -ffixed-rbx -ffixed-rbp -m32
> 
> Thanks a lot for help!

You've found what I believe to be a bug in LRA.

Specifically, lra-constraints.c split_reg uses SECONDARY_MEMORY_NEEDED_MODE to
choose what mode to spill a caller-save register.  Given the existing
definition in i386.h, this tries to spill a MASK_CLASS register in SImode.  But
MASK_CLASS does not support SImode, only QI/HImode.  Which leads to substantial
confusion in the allocator trying to satisfy the move.

I believe the use of SECONDARY_MEMORY_NEEDED_MODE in split_reg is wrong.
What's the history behind that, Vlad?  Surely we can spill the value in
its current mode?

Certainly this patch fixes the crash from Kirill's reproducer...


r~


[-- Attachment #2: z --]
[-- Type: text/plain, Size: 620 bytes --]

	* lra-constraints.c (split_reg): Always spill in the current mode.


diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 728c058..924d588 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -4388,11 +4388,7 @@ split_reg (bool before_p, int original_regno, rtx insn, rtx next_usage_insns)
     {
       enum machine_mode sec_mode;
 
-#ifdef SECONDARY_MEMORY_NEEDED_MODE
-      sec_mode = SECONDARY_MEMORY_NEEDED_MODE (GET_MODE (original_reg));
-#else
       sec_mode = GET_MODE (original_reg);
-#endif
       new_reg = lra_create_new_reg (sec_mode, NULL_RTX,
 				    NO_REGS, "save");
     }


* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
       [not found]       ` <20130806135722.GA16463@msticlxl57.ims.intel.com>
  2013-08-06 19:11         ` avx512 mask register spilling Richard Henderson
@ 2013-08-06 19:43         ` Richard Henderson
  2013-08-07 20:03           ` Kirill Yukhin
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2013-08-06 19:43 UTC
  To: Kirill Yukhin; +Cc: GCC Patches, Vladimir Makarov

BTW, you've a bug in your movqi pattern:

> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index b4c9ac5..d8401b5 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -2298,7 +2298,7 @@
>  ;; partial register stall can be caused there.  Then we use movzx.
>  (define_insn "*movqi_internal"
>    [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m ,Yk,Yk,r")
> -       (match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn,rm,Yk,Yk"))]
> +       (match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn,r ,Yk,Yk"))]
>    "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
>  {
>    switch (get_attr_type (insn))

You can't read from memory into a mask register in QImode: kmovw always
loads 16 bits, so it would read one byte past the QImode operand. This
will fail if the memory operand is the last byte of a page and the
following page is not mapped.

I expected you to need the following patch, to help spill
and fill QImode values, but I haven't found a test case that
actually needs it.  Perhaps LRA is better than old reload
at guessing things like this?

> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index fa79441..0e74ec6 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -33926,10 +33926,11 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t r
>  
>    /* QImode spills from non-QI registers require
>       intermediate register on 32bit targets.  */
> -  if (!TARGET_64BIT
> -      && !in_p && mode == QImode
> -      && INTEGER_CLASS_P (rclass)
> -      && MAYBE_NON_Q_CLASS_P (rclass))
> +  if (mode == QImode
> +      && (MAYBE_MASK_CLASS_P (rclass)
> +          || (!TARGET_64BIT && !in_p
> +              && INTEGER_CLASS_P (rclass)
> +              && MAYBE_NON_Q_CLASS_P (rclass))))
>      {
>        int regno;
>  


r~


* Re: avx512 mask register spilling
  2013-08-06 19:11         ` avx512 mask register spilling Richard Henderson
@ 2013-08-07 15:55           ` Vladimir Makarov
  0 siblings, 0 replies; 31+ messages in thread
From: Vladimir Makarov @ 2013-08-07 15:55 UTC
  To: Richard Henderson; +Cc: Kirill Yukhin, GCC Patches

On 13-08-06 3:11 PM, Richard Henderson wrote:
> On 08/06/2013 03:57 AM, Kirill Yukhin wrote:
>> On 05 Aug 09:55, Richard Henderson wrote:
>>> On 08/05/2013 08:07 AM, Kirill Yukhin wrote:
>>>> Hello Richard, Vlad,
>>>>
>>>> On 31 Jul 06:26, Richard Henderson wrote:
>>>>> On 07/31/2013 05:02 AM, Kirill Yukhin wrote:
>>>>>> There's ICE (max. number of generated reload insns per insn is achieved (90)),
>>>>>> when LRA tried to save mask register before call. This was caused by fact that  split_reg function
>>>>>> in lra-constraints.c allocates memory for saved reg in SECONDARY_MEMORY_NEEDED_MODE which
>>>>> I've told you before that it's not SECONDARY_MEMORY that you want, but
>>>>> a secondary reload.  You should be patching ix86_secondary_reload, right
>>>>> below where we handle QImode spills from non-Q registers for x86-32.
>>>> Trying to do that with no success so far.
>>>> Could you pls correct me if I am wrong.
>>>> What I am trying to do is to introduce 2 new `define_expand' for load and store.
>>> Huh?  You shouldn't need this.
>>>
>>> Give me a test case and I can have a look at it.
>> Hello,
>>
>> I've squashed part 1 and 2 + rebased on recent trunk.
>> Testcase is attached.
>> To reproduce: build-x86_64-linux/gcc/xgcc -Bbuild-x86_64-linux/gcc -Ofast -mavx512f -march=core-avx2 repro.c  -S -o-  -ffixed-rsi  -ffixed-rdi  -ffixed-rbx -ffixed-rbp -m32
>>
>> Thanks a lot for help!
> You've found what I believe to be a bug in LRA.
>
> Specifically, lra-constraints.c split_reg uses SECONDARY_MEMORY_NEEDED_MODE to
> choose what mode to spill a caller-save register.  Given the existing
> definition in i386.h, this tries to spill a MASK_CLASS register in SImode.  But
> MASK_CLASS does not support SImode, only QI/HImode.  Which leads to substantial
> confusion in the allocator trying to satisfy the move.
>
> I believe the use of SECONDARY_MEMORY_NEEDED_MODE in split_reg is wrong.
> What's the history behind that, Vlad?  Surely we can spill the value in
> its current mode?

As I remember, I tried to decrease the number of macros used for LRA.

Just using the mode of the reg might not work in the general case.

Reload (caller-save.c) uses HARD_REGNO_CALLER_SAVE_MODE.  I guess we
should use it (a sketch follows below).
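
A rough sketch of that, assuming the call-save path in split_reg (the exact
placement and the nregs computation are only outlined here):

  /* lra-constraints.c:split_reg -- ask the target how a hard register
     should be saved around a call, as caller-save.c does.  */
  if (call_save_p)
    {
      enum machine_mode mode = GET_MODE (original_reg);
      enum machine_mode sec_mode
	= HARD_REGNO_CALLER_SAVE_MODE (original_regno,
				       hard_regno_nregs[original_regno][mode],
				       mode);
      new_reg = lra_create_new_reg (sec_mode, NULL_RTX, NO_REGS, "save");
    }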

I'll try to implement this, and after some testing and checking on a few
platforms I'll commit it.

I guess we will have a solution by the end of this week.
> Certainly this patch fixes the crash from Kirill's reproducer...
>
Thanks for working on this, Richard.


* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-06 19:43         ` [PATCH i386 2/8] [AVX512] Add mask registers Richard Henderson
@ 2013-08-07 20:03           ` Kirill Yukhin
  2013-08-14  7:24             ` Kirill Yukhin
  0 siblings, 1 reply; 31+ messages in thread
From: Kirill Yukhin @ 2013-08-07 20:03 UTC
  To: Richard Henderson, ubizjak, jakub; +Cc: GCC Patches, Vladimir Makarov

Hello!
 
> You can't read from memory into a mask register in QImode.
> This will fail if the memory was the last byte of a page,
> and the following page is not mapped.
> 
> I expected you to need the following patch, to help spill
> and fill QImode values, but I havn't found a test case that
> actually needs it.  Perhaps LRA is better than old reload
> about guessing things like this?

Thanks a lot for the inputs and the patch. We fixed the issues and applied
your changes.

The updated patch is attached. The testcase I provided works once your
patch to LRA is applied.

It is applicable on top of [1/8] (which was rebased onto current trunk today).

Testing:
 - It passes bootstrap.
 - It introduces no regressions.
 - The SPEC 2006 build passes both with and without -mavx512f.

Is it ok?

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c         |  32 +++++--
 gcc/config/i386/i386.h         |  38 ++++++--
 gcc/config/i386/i386.md        | 204 +++++++++++++++++++++++++++++++++++++----
 4 files changed, 245 insertions(+), 37 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;     B     H           T
-;;;           h jk
+;;;           h j
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cb86e3b..97ec224 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2195,6 +2195,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
+  /* Mask registers.  */
+  MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
+  MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -2210,6 +2213,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* new SSE registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* new SSE registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* The "default" register map used in 64bit mode.  */
@@ -2225,6 +2229,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
   25, 26, 27, 28, 29, 30, 31, 32,	/* extended SSE registers */
   67, 68, 69, 70, 71, 72, 73, 74,       /* new SSE registers 16-23 */
   75, 76, 77, 78, 79, 80, 81, 82,       /* new SSE registers 24-31 */
+  118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -2292,6 +2297,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* new SSE registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* new SSE registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* Define parameter passing and return registers.  */
@@ -4157,7 +4163,8 @@ ix86_conditional_register_usage (void)
   /* If AVX512F is disabled, squash the registers.  */
   if (! TARGET_AVX512F)
     for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-      if (TEST_HARD_REG_BIT (reg_class_contents[(int)EVEX_SSE_REGS], i))
+      if (TEST_HARD_REG_BIT (reg_class_contents[(int)MASK_REGS], i)
+	  || TEST_HARD_REG_BIT (reg_class_contents[(int)EVEX_SSE_REGS], i))
 	fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
 }
 
@@ -33810,10 +33817,12 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass)
     return regclass;
 
   /* Force constants into memory if we are loading a (nonzero) constant into
-     an MMX or SSE register.  This is because there are no MMX/SSE instructions
-     to load from a constant.  */
+     an MMX, SSE or MASK register.  This is because there are no MMX/SSE/MASK
+     instructions to load from a constant.  */
   if (CONSTANT_P (x)
-      && (MAYBE_MMX_CLASS_P (regclass) || MAYBE_SSE_CLASS_P (regclass)))
+      && (MAYBE_MMX_CLASS_P (regclass)
+	  || MAYBE_SSE_CLASS_P (regclass)
+	  || MAYBE_MASK_CLASS_P (regclass)))
     return NO_REGS;
 
   /* Prefer SSE regs only, if we can use them for math.  */
@@ -33917,10 +33926,11 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
 
   /* QImode spills from non-QI registers require
      intermediate register on 32bit targets.  */
-  if (!TARGET_64BIT
-      && !in_p && mode == QImode
-      && INTEGER_CLASS_P (rclass)
-      && MAYBE_NON_Q_CLASS_P (rclass))
+  if (mode == QImode
+      && (MAYBE_MASK_CLASS_P (rclass)
+	  || (!TARGET_64BIT && !in_p
+	      && INTEGER_CLASS_P (rclass)
+	      && MAYBE_NON_Q_CLASS_P (rclass))))
     {
       int regno;
 
@@ -34342,6 +34352,8 @@ ix86_hard_regno_mode_ok (int regno, enum machine_mode mode)
     return false;
   if (STACK_REGNO_P (regno))
     return VALID_FP_MODE_P (mode);
+  if (MASK_REGNO_P (regno))
+    return VALID_MASK_REG_MODE (mode);
   if (SSE_REGNO_P (regno))
     {
       /* We implement the move patterns for all vector modes into and
@@ -35142,6 +35154,10 @@ x86_order_regs_for_local_alloc (void)
    for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
      reg_alloc_order [pos++] = i;
 
+   /* Mask register.  */
+   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
+     reg_alloc_order [pos++] = i;
+
    /* x87 registers.  */
    if (TARGET_SSE_MATH)
      for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 78e33bd..d4adeb5 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -963,7 +963,7 @@ enum target_cpu_default
    eliminated during reloading in favor of either the stack or frame
    pointer.  */
 
-#define FIRST_PSEUDO_REGISTER 69
+#define FIRST_PSEUDO_REGISTER 77
 
 /* Number of hardware registers that go into the DWARF-2 unwind info.
    If not defined, equals FIRST_PSEUDO_REGISTER.  */
@@ -993,7 +993,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      0,   0,    0,    0,    0,    0,    0,    0,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     0,   0,    0,    0,    0,    0,    0,    0 }
+     0,   0,    0,    0,    0,    0,    0,    0,		\
+/*  k0,  k1, k2, k3, k4, k5, k6, k7*/				\
+     0,  0,   0,  0,  0,  0,  0,  0 }
 
 /* 1 for registers not available across function calls.
    These must include the FIXED_REGISTERS and also any
@@ -1025,7 +1027,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      6,    6,     6,    6,    6,    6,    6,    6,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     6,    6,     6,    6,    6,    6,    6,    6 }
+     6,    6,     6,    6,    6,    6,    6,    6,		\
+ /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
+     1,   1,   1,   1,   1,   1,   1,   1 }
 
 /* Order in which to allocate registers.  Each register must be
    listed once, even those in FIXED_REGISTERS.  List frame pointer
@@ -1041,7 +1045,7 @@ enum target_cpu_default
    18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,	\
    33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,  \
    48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62,	\
-   63, 64, 65, 66, 67, 68 }
+   63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76 }
 
 /* ADJUST_REG_ALLOC_ORDER is a macro which permits reg_alloc_order
    to be rearranged based on a particular function.  When using sse math,
@@ -1138,6 +1142,8 @@ enum target_cpu_default
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode	\
    || (MODE) == V16SFmode)
 
+#define VALID_MASK_REG_MODE(MODE) ((MODE) == HImode || (MODE) == QImode)
+
 /* Value is 1 if hard register REGNO can hold a value of machine-mode MODE.  */
 
 #define HARD_REGNO_MODE_OK(REGNO, MODE)	\
@@ -1211,6 +1217,9 @@ enum target_cpu_default
 #define FIRST_EXT_REX_SSE_REG  (LAST_REX_SSE_REG + 1) /*53*/
 #define LAST_EXT_REX_SSE_REG   (FIRST_EXT_REX_SSE_REG + 15) /*68*/
 
+#define FIRST_MASK_REG  (LAST_EXT_REX_SSE_REG + 1) /*69*/
+#define LAST_MASK_REG   (FIRST_MASK_REG + 7) /*76*/
+
 /* Override this in other tm.h files to cope with various OS lossage
    requiring a frame pointer.  */
 #ifndef SUBTARGET_FRAME_POINTER_REQUIRED
@@ -1299,6 +1308,8 @@ enum reg_class
   FLOAT_INT_REGS,
   INT_SSE_REGS,
   FLOAT_INT_SSE_REGS,
+  MASK_EVEX_REGS,
+  MASK_REGS,
   ALL_REGS, LIM_REG_CLASSES
 };
 
@@ -1320,6 +1331,8 @@ enum reg_class
   reg_classes_intersect_p ((CLASS), ALL_SSE_REGS)
 #define MAYBE_MMX_CLASS_P(CLASS) \
   reg_classes_intersect_p ((CLASS), MMX_REGS)
+#define MAYBE_MASK_CLASS_P(CLASS) \
+  reg_classes_intersect_p ((CLASS), MASK_REGS)
 
 #define Q_CLASS_P(CLASS) \
   reg_class_subset_p ((CLASS), Q_REGS)
@@ -1352,6 +1365,8 @@ enum reg_class
    "FLOAT_INT_REGS",			\
    "INT_SSE_REGS",			\
    "FLOAT_INT_SSE_REGS",		\
+   "MASK_EVEX_REGS",			\
+   "MASK_REGS",				\
    "ALL_REGS" }
 
 /* Define which registers fit in which classes.  This is an initializer
@@ -1389,7 +1404,9 @@ enum reg_class
 {   0x11ffff,    0x1fe0,   0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,  0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,  0x1f },       /* FLOAT_INT_SSE_REGS */        \
-{ 0xffffffff,0xffffffff,  0x1f }                                        \
+       { 0x0,       0x0,0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0,0x1fe0 },       /* MASK_REGS */                 \
+{ 0xffffffff,0xffffffff,0x1fff }                                        \
 }
 
 /* The same information, inverted:
@@ -1447,6 +1464,8 @@ enum reg_class
          : (N) <= LAST_REX_SSE_REG ? (FIRST_REX_SSE_REG + (N) - 8) \
                                    : (FIRST_EXT_REX_SSE_REG + (N) - 16))
 
+#define MASK_REGNO_P(N) IN_RANGE ((N), FIRST_MASK_REG, LAST_MASK_REG)
+#define ANY_MASK_REG_P(X) (REG_P (X) && MASK_REGNO_P (REGNO (X)))
 
 #define SSE_FLOAT_MODE_P(MODE) \
   ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
@@ -1499,10 +1518,10 @@ enum reg_class
 
 /* Get_secondary_mem widens integral modes to BITS_PER_WORD.
    There is no need to emit full 64 bit move on 64 bit targets
-   for integral modes that can be moved using 32 bit move.  */
+   for integral modes that can be moved using 8 bit move.  */
 #define SECONDARY_MEMORY_NEEDED_MODE(MODE)			\
-  (GET_MODE_BITSIZE (MODE) < 32 && INTEGRAL_MODE_P (MODE)	\
-   ? mode_for_size (32, GET_MODE_CLASS (MODE), 0)		\
+  (GET_MODE_BITSIZE (MODE) < 8 && INTEGRAL_MODE_P (MODE)	\
+   ? mode_for_size (8, GET_MODE_CLASS (MODE), 0)		\
    : MODE)
 
 /* Return a class of registers that cannot change FROM mode to TO mode.  */
@@ -2003,7 +2022,8 @@ do {							\
  "xmm16", "xmm17", "xmm18", "xmm19",					\
  "xmm20", "xmm21", "xmm22", "xmm23",					\
  "xmm24", "xmm25", "xmm26", "xmm27",					\
- "xmm28", "xmm29", "xmm30", "xmm31" }
+ "xmm28", "xmm29", "xmm30", "xmm31",					\
+ "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7" }
 
 #define REGISTER_NAMES HI_REGISTER_NAMES
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4e58982..f901bd3 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -178,6 +178,12 @@
   ;; For BMI2 support
   UNSPEC_PDEP
   UNSPEC_PEXT
+
+  ;; For AVX512F mask support
+  UNSPEC_KIOR
+  UNSPEC_KXOR
+  UNSPEC_KAND
+  UNSPEC_KANDN
 ])
 
 (define_c_enum "unspecv" [
@@ -328,6 +334,14 @@
    (XMM29_REG			66)
    (XMM30_REG			67)
    (XMM31_REG			68)
+   (MASK0_REG			69)
+   (MASK1_REG			70)
+   (MASK2_REG			71)
+   (MASK3_REG			72)
+   (MASK4_REG			73)
+   (MASK5_REG			74)
+   (MASK6_REG			75)
+   (MASK7_REG			76)
   ])
 
 ;; Insns whose names begin with "x86_" are emitted by gen_FOO calls
@@ -360,7 +374,7 @@
    sseishft,sseishft1,ssecmp,ssecomi,
    ssecvt,ssecvt1,sseicvt,sseins,
    sseshuf,sseshuf1,ssemuladd,sse4arg,
-   lwp,
+   lwp,mskmov,msklog,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -379,7 +393,7 @@
 			  ssemul,sseimul,ssediv,sselog,sselog1,
 			  sseishft,sseishft1,ssecmp,ssecomi,
 			  ssecvt,ssecvt1,sseicvt,sseins,
-			  sseshuf,sseshuf1,ssemuladd,sse4arg")
+			  sseshuf,sseshuf1,ssemuladd,sse4arg,mskmov")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "mmx")
@@ -390,7 +404,7 @@
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
-			  bitmanip,imulx")
+			  bitmanip,imulx,msklog,mskmov")
 	   (const_int 0)
 	 (eq_attr "unit" "i387,sse,mmx")
 	   (const_int 0)
@@ -451,7 +465,7 @@
 ;; Set when 0f opcode prefix is used.
 (define_attr "prefix_0f" ""
   (if_then_else
-    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip")
+    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip,msklog,mskmov")
 	 (eq_attr "unit" "sse,mmx"))
     (const_int 1)
     (const_int 0)))
@@ -651,7 +665,7 @@
 		   fmov,fcmp,fsgn,
 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
 		   sselog1,sseshuf1,sseadd1,sseiadd1,sseishft1,
-		   mmx,mmxmov,mmxcmp,mmxcvt")
+		   mmx,mmxmov,mmxcmp,mmxcvt,mskmov,msklog")
 	      (match_operand 2 "memory_operand"))
 	   (const_string "load")
 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
@@ -2211,8 +2225,8 @@
 	   (const_string "SI")))])
 
 (define_insn "*movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m")
-	(match_operand:HI 1 "general_operand"	   "r ,rn,rm,rn"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,Yk,Yk,rm")
+	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,rm,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2221,6 +2235,16 @@
       /* movzwl is faster than movw on p2 due to partial word stalls,
 	 though not as fast as an aligned movl.  */
       return "movz{wl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 4: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 5: return "kmovw\t{%1, %0|%0, %1}";
+	case 6: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2238,11 +2262,17 @@
 	    (and (eq_attr "alternative" "1,2")
 		 (match_operand:HI 1 "aligned_operand"))
 	      (const_string "imov")
+	    (eq_attr "alternative" "4,5,6")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "0,2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+    (set (attr "prefix")
+      (if_then_else (eq_attr "alternative" "4,5,6")
+	(const_string "vex")
+	(const_string "orig")))
     (set (attr "mode")
       (cond [(eq_attr "type" "imovx")
 	       (const_string "SI")
@@ -2267,8 +2297,8 @@
 ;; register stall machines with, where we use QImode instructions, since
 ;; partial register stall can be caused there.  Then we use movzx.
 (define_insn "*movqi_internal"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
-	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m ,Yk,Yk,r")
+	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn,r ,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2276,6 +2306,16 @@
     case TYPE_IMOVX:
       gcc_assert (ANY_QI_REG_P (operands[1]) || MEM_P (operands[1]));
       return "movz{bl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 7: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 8: return "kmovw\t{%1, %0|%0, %1}";
+	case 9: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2295,11 +2335,17 @@
 	      (const_string "imov")
 	    (eq_attr "alternative" "3,5")
 	      (const_string "imovx")
+	    (eq_attr "alternative" "7,8,9")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+   (set (attr "prefix")
+     (if_then_else (eq_attr "alternative" "7,8,9")
+       (const_string "vex")
+       (const_string "orig")))
    (set (attr "mode")
       (cond [(eq_attr "alternative" "3,4,5")
 	       (const_string "SI")
@@ -7627,6 +7673,18 @@
        (const_string "*")))
    (set_attr "mode" "HI,HI,SI")])
 
+(define_insn "kandn<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(unspec:SWI12
+	  [(match_operand:SWI12 1 "register_operand" "Yk")
+	   (match_operand:SWI12 2 "register_operand" "Yk")]
+	 UNSPEC_KANDN))]
+  "TARGET_AVX512F"
+  "kandnw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "<MODE>")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*andqi_1"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
@@ -7652,6 +7710,18 @@
   [(set_attr "type" "alu1")
    (set_attr "mode" "QI")])
 
+(define_insn "kand<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(unspec:SWI12
+	  [(match_operand:SWI12 1 "register_operand" "Yk")
+	   (match_operand:SWI12 2 "register_operand" "Yk")]
+	  UNSPEC_KAND))]
+  "TARGET_AVX512F"
+  "kandw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "msklog")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "<MODE>")])
+
 ;; Turn *anddi_1 into *andsi_1_zext if possible.
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -8055,6 +8125,81 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "kior<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(unspec:SWI12
+	  [(match_operand:SWI12 1 "register_operand" "Yk")
+	   (match_operand:SWI12 2 "register_operand" "Yk")]
+	  UNSPEC_KIOR))]
+  "TARGET_AVX512F"
+  "korw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "msklog")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "kxor<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(unspec:SWI12
+	  [(match_operand:SWI12 1 "register_operand" "Yk")
+	   (match_operand:SWI12 2 "register_operand" "Yk")]
+	  UNSPEC_KXOR))]
+  "TARGET_AVX512F"
+  "kxorw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "msklog")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "kxnor<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_operand:SWI12 1 "register_operand" "Yk")
+	    (match_operand:SWI12 2 "register_operand" "Yk"))))]
+  "TARGET_AVX512F"
+  "kxnorw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "<MODE>")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kortestzhi"
+  [(set (reg:CCZ FLAGS_REG)
+	(compare:CCZ
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "%Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int 0)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kortestchi"
+  [(set (reg:CCC FLAGS_REG)
+	(compare:CCC
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "%Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int -1)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCCmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kunpckhi"
+  [(set (match_operand:HI 0 "register_operand" "=Yk")
+	(ior:HI
+	  (ashift:HI
+	    (match_operand:HI 1 "register_operand" "Yk")
+	    (const_int 8))
+	  (zero_extend:HI (subreg:QI (match_operand:HI 2 "register_operand" "Yk") 0))))]
+  "TARGET_AVX512F"
+  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 ;; ??? Special case for immediate operand is missing - it is tricky.
 (define_insn "*<code>si_2_zext"
@@ -8624,23 +8769,38 @@
   "ix86_expand_unary_operator (NOT, <MODE>mode, operands); DONE;")
 
 (define_insn "*one_cmpl<mode>2_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
-	(not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm")
+	(not:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0")))]
   "ix86_unary_operator_ok (NOT, <MODE>mode, operands)"
   "not{<imodesuffix>}\t%0"
   [(set_attr "type" "negnot")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*one_cmplhi2_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yk")
+	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
+  "ix86_unary_operator_ok (NOT, HImode, operands)"
+  "@
+   not{w}\t%0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,avx512f")
+   (set_attr "type" "negnot,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 1.  What to do?
 (define_insn "*one_cmplqi2_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
-	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,*Yk")
+	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,*Yk")))]
   "ix86_unary_operator_ok (NOT, QImode, operands)"
   "@
    not{b}\t%0
-   not{l}\t%k0"
-  [(set_attr "type" "negnot")
-   (set_attr "mode" "QI,SI")])
+   not{l}\t%k0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,*,avx512f")
+   (set_attr "type" "negnot,negnot,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "mode" "QI,SI,QI")])
 
 ;; ??? Currently never generated - xor is used instead.
 (define_insn "*one_cmplsi2_1_zext"
@@ -16364,7 +16524,7 @@
 })
 
 ;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.
-
+;; Do not split instructions with mask registers.
 (define_split
   [(set (match_operand 0 "register_operand")
 	(match_operator 3 "promotable_binary_operator"
@@ -16378,7 +16538,10 @@
 	    || !CONST_INT_P (operands[2])
 	    || satisfies_constraint_K (operands[2])))
        || (GET_MODE (operands[0]) == QImode
-	   && (TARGET_PROMOTE_QImode || optimize_function_for_size_p (cfun))))"
+	   && (TARGET_PROMOTE_QImode || optimize_function_for_size_p (cfun))))
+   && (! ANY_MASK_REG_P (operands[0])
+	|| ! ANY_MASK_REG_P (operands[1])
+	|| ! ANY_MASK_REG_P (operands[2]))"
   [(parallel [(set (match_dup 0)
 		   (match_op_dup 3 [(match_dup 1) (match_dup 2)]))
 	      (clobber (reg:CC FLAGS_REG))])]
@@ -16463,6 +16626,7 @@
   operands[1] = gen_lowpart (SImode, operands[1]);
 })
 
+;; Do not split instructions with mask regs.
 (define_split
   [(set (match_operand 0 "register_operand")
 	(not (match_operand 1 "register_operand")))]
@@ -16470,7 +16634,9 @@
    && (GET_MODE (operands[0]) == HImode
        || (GET_MODE (operands[0]) == QImode
 	   && (TARGET_PROMOTE_QImode
-	       || optimize_insn_for_size_p ())))"
+	       || optimize_insn_for_size_p ())))
+   && (! ANY_MASK_REG_P (operands[0])
+	 || ! ANY_MASK_REG_P (operands[1]))"
   [(set (match_dup 0)
 	(not:SI (match_dup 1)))]
 {
-- 
1.7.11.7


* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-07 20:03           ` Kirill Yukhin
@ 2013-08-14  7:24             ` Kirill Yukhin
  2013-08-19 21:21               ` Richard Henderson
  0 siblings, 1 reply; 31+ messages in thread
From: Kirill Yukhin @ 2013-08-14  7:24 UTC (permalink / raw)
  To: Richard Henderson, ubizjak, jakub; +Cc: GCC Patches, Vladimir Makarov

Hello,
Patch was rebased on top of trunk.

It is applicable on top of [1/8] (which was rebased on new trunk today).
	
Testing:
  1. Bootstrap pass.
  2. make check shows no regressions.
  3. Spec 2000 & 2006 build shows no regressions both with and without -mavx512f option.
  4. Spec 2000 & 2006 run shows no regressions without -mavx512f option.
		
Thanks, K

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c         |  32 +++++--
 gcc/config/i386/i386.h         |  40 ++++++--
 gcc/config/i386/i386.md        | 204 +++++++++++++++++++++++++++++++++++++----
 4 files changed, 247 insertions(+), 37 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;     B     H           T
-;;;           h jk
+;;;           h j
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index dd97e6b..7412745 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2303,6 +2303,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
+  /* Mask registers.  */
+  MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
+  MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -2318,6 +2321,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* new SSE registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* new SSE registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* The "default" register map used in 64bit mode.  */
@@ -2333,6 +2337,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
   25, 26, 27, 28, 29, 30, 31, 32,	/* extended SSE registers */
   67, 68, 69, 70, 71, 72, 73, 74,       /* new SSE registers 16-23 */
   75, 76, 77, 78, 79, 80, 81, 82,       /* new SSE registers 24-31 */
+  118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -2400,6 +2405,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* new SSE registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* new SSE registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* Define parameter passing and return registers.  */
@@ -4457,7 +4463,8 @@ ix86_conditional_register_usage (void)
   /* If AVX512F is disabled, squash the registers.  */
   if (! TARGET_AVX512F)
     for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-      if (TEST_HARD_REG_BIT (reg_class_contents[(int)EVEX_SSE_REGS], i))
+      if (TEST_HARD_REG_BIT (reg_class_contents[(int)MASK_REGS], i)
+	  || TEST_HARD_REG_BIT (reg_class_contents[(int)EVEX_SSE_REGS], i))
 	fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
 }
 
@@ -34114,10 +34121,12 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass)
     return regclass;
 
   /* Force constants into memory if we are loading a (nonzero) constant into
-     an MMX or SSE register.  This is because there are no MMX/SSE instructions
-     to load from a constant.  */
+     an MMX, SSE or MASK register.  This is because there are no MMX/SSE/MASK
+     instructions to load from a constant.  */
   if (CONSTANT_P (x)
-      && (MAYBE_MMX_CLASS_P (regclass) || MAYBE_SSE_CLASS_P (regclass)))
+      && (MAYBE_MMX_CLASS_P (regclass)
+	  || MAYBE_SSE_CLASS_P (regclass)
+	  || MAYBE_MASK_CLASS_P (regclass)))
     return NO_REGS;
 
   /* Prefer SSE regs only, if we can use them for math.  */
@@ -34221,10 +34230,11 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
 
   /* QImode spills from non-QI registers require
      intermediate register on 32bit targets.  */
-  if (!TARGET_64BIT
-      && !in_p && mode == QImode
-      && INTEGER_CLASS_P (rclass)
-      && MAYBE_NON_Q_CLASS_P (rclass))
+  if (mode == QImode
+      && (MAYBE_MASK_CLASS_P (rclass)
+	  || (!TARGET_64BIT && !in_p
+	      && INTEGER_CLASS_P (rclass)
+	      && MAYBE_NON_Q_CLASS_P (rclass))))
     {
       int regno;
 
@@ -34646,6 +34656,8 @@ ix86_hard_regno_mode_ok (int regno, enum machine_mode mode)
     return false;
   if (STACK_REGNO_P (regno))
     return VALID_FP_MODE_P (mode);
+  if (MASK_REGNO_P (regno))
+    return VALID_MASK_REG_MODE (mode);
   if (SSE_REGNO_P (regno))
     {
       /* We implement the move patterns for all vector modes into and
@@ -35446,6 +35458,10 @@ x86_order_regs_for_local_alloc (void)
    for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
      reg_alloc_order [pos++] = i;
 
+   /* Mask register.  */
+   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
+     reg_alloc_order [pos++] = i;
+
    /* x87 registers.  */
    if (TARGET_SSE_MATH)
      for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 7700d0a..7dc6856 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -893,7 +893,7 @@ enum target_cpu_default
    eliminated during reloading in favor of either the stack or frame
    pointer.  */
 
-#define FIRST_PSEUDO_REGISTER 69
+#define FIRST_PSEUDO_REGISTER 77
 
 /* Number of hardware registers that go into the DWARF-2 unwind info.
    If not defined, equals FIRST_PSEUDO_REGISTER.  */
@@ -923,7 +923,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      0,   0,    0,    0,    0,    0,    0,    0,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     0,   0,    0,    0,    0,    0,    0,    0 }
+     0,   0,    0,    0,    0,    0,    0,    0,		\
+/*  k0,  k1, k2, k3, k4, k5, k6, k7*/				\
+     0,  0,   0,  0,  0,  0,  0,  0 }
 
 /* 1 for registers not available across function calls.
    These must include the FIXED_REGISTERS and also any
@@ -955,7 +957,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      6,    6,     6,    6,    6,    6,    6,    6,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     6,    6,     6,    6,    6,    6,    6,    6 }
+     6,    6,     6,    6,    6,    6,    6,    6,		\
+ /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
+     1,   1,   1,   1,   1,   1,   1,   1 }
 
 /* Order in which to allocate registers.  Each register must be
    listed once, even those in FIXED_REGISTERS.  List frame pointer
@@ -971,7 +975,7 @@ enum target_cpu_default
    18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,	\
    33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,  \
    48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62,	\
-   63, 64, 65, 66, 67, 68 }
+   63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76 }
 
 /* ADJUST_REG_ALLOC_ORDER is a macro which permits reg_alloc_order
    to be rearranged based on a particular function.  When using sse math,
@@ -1068,6 +1072,8 @@ enum target_cpu_default
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode	\
    || (MODE) == V16SFmode)
 
+#define VALID_MASK_REG_MODE(MODE) ((MODE) == HImode || (MODE) == QImode)
+
 /* Value is 1 if hard register REGNO can hold a value of machine-mode MODE.  */
 
 #define HARD_REGNO_MODE_OK(REGNO, MODE)	\
@@ -1093,8 +1099,10 @@ enum target_cpu_default
   (CC_REGNO_P (REGNO) ? VOIDmode					\
    : (MODE) == VOIDmode && (NREGS) != 1 ? VOIDmode			\
    : (MODE) == VOIDmode ? choose_hard_reg_mode ((REGNO), (NREGS), false) \
-   : (MODE) == HImode && !TARGET_PARTIAL_REG_STALL ? SImode		\
-   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)) ? SImode	\
+   : (MODE) == HImode && !(TARGET_PARTIAL_REG_STALL			\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
+   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)		\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
    : (MODE))
 
 /* The only ABI that saves SSE registers across calls is Win64 (thus no
@@ -1141,6 +1149,9 @@ enum target_cpu_default
 #define FIRST_EXT_REX_SSE_REG  (LAST_REX_SSE_REG + 1) /*53*/
 #define LAST_EXT_REX_SSE_REG   (FIRST_EXT_REX_SSE_REG + 15) /*68*/
 
+#define FIRST_MASK_REG  (LAST_EXT_REX_SSE_REG + 1) /*69*/
+#define LAST_MASK_REG   (FIRST_MASK_REG + 7) /*76*/
+
 /* Override this in other tm.h files to cope with various OS lossage
    requiring a frame pointer.  */
 #ifndef SUBTARGET_FRAME_POINTER_REQUIRED
@@ -1229,6 +1240,8 @@ enum reg_class
   FLOAT_INT_REGS,
   INT_SSE_REGS,
   FLOAT_INT_SSE_REGS,
+  MASK_EVEX_REGS,
+  MASK_REGS,
   ALL_REGS, LIM_REG_CLASSES
 };
 
@@ -1250,6 +1263,8 @@ enum reg_class
   reg_classes_intersect_p ((CLASS), ALL_SSE_REGS)
 #define MAYBE_MMX_CLASS_P(CLASS) \
   reg_classes_intersect_p ((CLASS), MMX_REGS)
+#define MAYBE_MASK_CLASS_P(CLASS) \
+  reg_classes_intersect_p ((CLASS), MASK_REGS)
 
 #define Q_CLASS_P(CLASS) \
   reg_class_subset_p ((CLASS), Q_REGS)
@@ -1282,6 +1297,8 @@ enum reg_class
    "FLOAT_INT_REGS",			\
    "INT_SSE_REGS",			\
    "FLOAT_INT_SSE_REGS",		\
+   "MASK_EVEX_REGS",			\
+   "MASK_REGS",				\
    "ALL_REGS" }
 
 /* Define which registers fit in which classes.  This is an initializer
@@ -1319,7 +1336,9 @@ enum reg_class
 {   0x11ffff,    0x1fe0,   0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,  0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,  0x1f },       /* FLOAT_INT_SSE_REGS */        \
-{ 0xffffffff,0xffffffff,  0x1f }                                        \
+       { 0x0,       0x0,0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0,0x1fe0 },       /* MASK_REGS */                 \
+{ 0xffffffff,0xffffffff,0x1fff }                                        \
 }
 
 /* The same information, inverted:
@@ -1377,6 +1396,8 @@ enum reg_class
          : (N) <= LAST_REX_SSE_REG ? (FIRST_REX_SSE_REG + (N) - 8) \
                                    : (FIRST_EXT_REX_SSE_REG + (N) - 16))
 
+#define MASK_REGNO_P(N) IN_RANGE ((N), FIRST_MASK_REG, LAST_MASK_REG)
+#define ANY_MASK_REG_P(X) (REG_P (X) && MASK_REGNO_P (REGNO (X)))
 
 #define SSE_FLOAT_MODE_P(MODE) \
   ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
@@ -1429,7 +1450,7 @@ enum reg_class
 
 /* Get_secondary_mem widens integral modes to BITS_PER_WORD.
    There is no need to emit full 64 bit move on 64 bit targets
-   for integral modes that can be moved using 32 bit move.  */
+   for integral modes that can be moved using 8 bit move.  */
 #define SECONDARY_MEMORY_NEEDED_MODE(MODE)			\
   (GET_MODE_BITSIZE (MODE) < 32 && INTEGRAL_MODE_P (MODE)	\
    ? mode_for_size (32, GET_MODE_CLASS (MODE), 0)		\
@@ -1933,7 +1954,8 @@ do {							\
  "xmm16", "xmm17", "xmm18", "xmm19",					\
  "xmm20", "xmm21", "xmm22", "xmm23",					\
  "xmm24", "xmm25", "xmm26", "xmm27",					\
- "xmm28", "xmm29", "xmm30", "xmm31" }
+ "xmm28", "xmm29", "xmm30", "xmm31",					\
+ "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7" }
 
 #define REGISTER_NAMES HI_REGISTER_NAMES
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a666794..630b87e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -178,6 +178,12 @@
   ;; For BMI2 support
   UNSPEC_PDEP
   UNSPEC_PEXT
+
+  ;; For AVX512F mask support
+  UNSPEC_KIOR
+  UNSPEC_KXOR
+  UNSPEC_KAND
+  UNSPEC_KANDN
 ])
 
 (define_c_enum "unspecv" [
@@ -328,6 +334,14 @@
    (XMM29_REG			66)
    (XMM30_REG			67)
    (XMM31_REG			68)
+   (MASK0_REG			69)
+   (MASK1_REG			70)
+   (MASK2_REG			71)
+   (MASK3_REG			72)
+   (MASK4_REG			73)
+   (MASK5_REG			74)
+   (MASK6_REG			75)
+   (MASK7_REG			76)
   ])
 
 ;; Insns whose names begin with "x86_" are emitted by gen_FOO calls
@@ -360,7 +374,7 @@
    sseishft,sseishft1,ssecmp,ssecomi,
    ssecvt,ssecvt1,sseicvt,sseins,
    sseshuf,sseshuf1,ssemuladd,sse4arg,
-   lwp,
+   lwp,mskmov,msklog,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -379,7 +393,7 @@
 			  ssemul,sseimul,ssediv,sselog,sselog1,
 			  sseishft,sseishft1,ssecmp,ssecomi,
 			  ssecvt,ssecvt1,sseicvt,sseins,
-			  sseshuf,sseshuf1,ssemuladd,sse4arg")
+			  sseshuf,sseshuf1,ssemuladd,sse4arg,mskmov")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "mmx")
@@ -390,7 +404,7 @@
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
-			  bitmanip,imulx")
+			  bitmanip,imulx,msklog,mskmov")
 	   (const_int 0)
 	 (eq_attr "unit" "i387,sse,mmx")
 	   (const_int 0)
@@ -451,7 +465,7 @@
 ;; Set when 0f opcode prefix is used.
 (define_attr "prefix_0f" ""
   (if_then_else
-    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip")
+    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip,msklog,mskmov")
 	 (eq_attr "unit" "sse,mmx"))
     (const_int 1)
     (const_int 0)))
@@ -651,7 +665,7 @@
 		   fmov,fcmp,fsgn,
 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
 		   sselog1,sseshuf1,sseadd1,sseiadd1,sseishft1,
-		   mmx,mmxmov,mmxcmp,mmxcvt")
+		   mmx,mmxmov,mmxcmp,mmxcvt,mskmov,msklog")
 	      (match_operand 2 "memory_operand"))
 	   (const_string "load")
 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
@@ -2211,8 +2225,8 @@
 	   (const_string "SI")))])
 
 (define_insn "*movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m")
-	(match_operand:HI 1 "general_operand"	   "r ,rn,rm,rn"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,Yk,Yk,rm")
+	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,rm,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2221,6 +2235,16 @@
       /* movzwl is faster than movw on p2 due to partial word stalls,
 	 though not as fast as an aligned movl.  */
       return "movz{wl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 4: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 5: return "kmovw\t{%1, %0|%0, %1}";
+	case 6: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2238,11 +2262,17 @@
 	    (and (eq_attr "alternative" "1,2")
 		 (match_operand:HI 1 "aligned_operand"))
 	      (const_string "imov")
+	    (eq_attr "alternative" "4,5,6")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "0,2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+    (set (attr "prefix")
+      (if_then_else (eq_attr "alternative" "4,5,6")
+	(const_string "vex")
+	(const_string "orig")))
     (set (attr "mode")
       (cond [(eq_attr "type" "imovx")
 	       (const_string "SI")
@@ -2267,8 +2297,8 @@
 ;; register stall machines with, where we use QImode instructions, since
 ;; partial register stall can be caused there.  Then we use movzx.
 (define_insn "*movqi_internal"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
-	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m ,Yk,Yk,r")
+	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn,r ,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2276,6 +2306,16 @@
     case TYPE_IMOVX:
       gcc_assert (ANY_QI_REG_P (operands[1]) || MEM_P (operands[1]));
       return "movz{bl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 7: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 8: return "kmovw\t{%1, %0|%0, %1}";
+	case 9: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2295,11 +2335,17 @@
 	      (const_string "imov")
 	    (eq_attr "alternative" "3,5")
 	      (const_string "imovx")
+	    (eq_attr "alternative" "7,8,9")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+   (set (attr "prefix")
+     (if_then_else (eq_attr "alternative" "7,8,9")
+       (const_string "vex")
+       (const_string "orig")))
    (set (attr "mode")
       (cond [(eq_attr "alternative" "3,4,5")
 	       (const_string "SI")
@@ -7639,6 +7685,18 @@
        (const_string "*")))
    (set_attr "mode" "HI,HI,SI")])
 
+(define_insn "kandn<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(unspec:SWI12
+	  [(match_operand:SWI12 1 "register_operand" "Yk")
+	   (match_operand:SWI12 2 "register_operand" "Yk")]
+	 UNSPEC_KANDN))]
+  "TARGET_AVX512F"
+  "kandnw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "<MODE>")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*andqi_1"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
@@ -7664,6 +7722,18 @@
   [(set_attr "type" "alu1")
    (set_attr "mode" "QI")])
 
+(define_insn "kand<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(unspec:SWI12
+	  [(match_operand:SWI12 1 "register_operand" "Yk")
+	   (match_operand:SWI12 2 "register_operand" "Yk")]
+	  UNSPEC_KAND))]
+  "TARGET_AVX512F"
+  "kandw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "msklog")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "<MODE>")])
+
 ;; Turn *anddi_1 into *andsi_1_zext if possible.
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -8067,6 +8137,81 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "kior<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(unspec:SWI12
+	  [(match_operand:SWI12 1 "register_operand" "Yk")
+	   (match_operand:SWI12 2 "register_operand" "Yk")]
+	  UNSPEC_KIOR))]
+  "TARGET_AVX512F"
+  "korw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "msklog")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "kxor<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(unspec:SWI12
+	  [(match_operand:SWI12 1 "register_operand" "Yk")
+	   (match_operand:SWI12 2 "register_operand" "Yk")]
+	  UNSPEC_KXOR))]
+  "TARGET_AVX512F"
+  "kxorw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "msklog")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "kxnor<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=Yk")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_operand:SWI12 1 "register_operand" "Yk")
+	    (match_operand:SWI12 2 "register_operand" "Yk"))))]
+  "TARGET_AVX512F"
+  "kxnorw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "<MODE>")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kortestzhi"
+  [(set (reg:CCZ FLAGS_REG)
+	(compare:CCZ
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "%Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int 0)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kortestchi"
+  [(set (reg:CCC FLAGS_REG)
+	(compare:CCC
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "%Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int -1)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCCmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kunpckhi"
+  [(set (match_operand:HI 0 "register_operand" "=Yk")
+	(ior:HI
+	  (ashift:HI
+	    (match_operand:HI 1 "register_operand" "Yk")
+	    (const_int 8))
+	  (zero_extend:HI (subreg:QI (match_operand:HI 2 "register_operand" "Yk") 0))))]
+  "TARGET_AVX512F"
+  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 ;; ??? Special case for immediate operand is missing - it is tricky.
 (define_insn "*<code>si_2_zext"
@@ -8636,23 +8781,38 @@
   "ix86_expand_unary_operator (NOT, <MODE>mode, operands); DONE;")
 
 (define_insn "*one_cmpl<mode>2_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
-	(not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm")
+	(not:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0")))]
   "ix86_unary_operator_ok (NOT, <MODE>mode, operands)"
   "not{<imodesuffix>}\t%0"
   [(set_attr "type" "negnot")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*one_cmplhi2_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yk")
+	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
+  "ix86_unary_operator_ok (NOT, HImode, operands)"
+  "@
+   not{w}\t%0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,avx512f")
+   (set_attr "type" "negnot,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 1.  What to do?
 (define_insn "*one_cmplqi2_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
-	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,*Yk")
+	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,*Yk")))]
   "ix86_unary_operator_ok (NOT, QImode, operands)"
   "@
    not{b}\t%0
-   not{l}\t%k0"
-  [(set_attr "type" "negnot")
-   (set_attr "mode" "QI,SI")])
+   not{l}\t%k0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,*,avx512f")
+   (set_attr "type" "negnot,negnot,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "mode" "QI,SI,QI")])
 
 ;; ??? Currently never generated - xor is used instead.
 (define_insn "*one_cmplsi2_1_zext"
@@ -16376,7 +16536,7 @@
 })
 
 ;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.
-
+;; Do not split instructions with mask registers.
 (define_split
   [(set (match_operand 0 "register_operand")
 	(match_operator 3 "promotable_binary_operator"
@@ -16390,7 +16550,10 @@
 	    || !CONST_INT_P (operands[2])
 	    || satisfies_constraint_K (operands[2])))
        || (GET_MODE (operands[0]) == QImode
-	   && (TARGET_PROMOTE_QImode || optimize_function_for_size_p (cfun))))"
+	   && (TARGET_PROMOTE_QImode || optimize_function_for_size_p (cfun))))
+   && (! ANY_MASK_REG_P (operands[0])
+	|| ! ANY_MASK_REG_P (operands[1])
+	|| ! ANY_MASK_REG_P (operands[2]))"
   [(parallel [(set (match_dup 0)
 		   (match_op_dup 3 [(match_dup 1) (match_dup 2)]))
 	      (clobber (reg:CC FLAGS_REG))])]
@@ -16475,6 +16638,7 @@
   operands[1] = gen_lowpart (SImode, operands[1]);
 })
 
+;; Do not split instructions with mask regs.
 (define_split
   [(set (match_operand 0 "register_operand")
 	(not (match_operand 1 "register_operand")))]
@@ -16482,7 +16646,9 @@
    && (GET_MODE (operands[0]) == HImode
        || (GET_MODE (operands[0]) == QImode
 	   && (TARGET_PROMOTE_QImode
-	       || optimize_insn_for_size_p ())))"
+	       || optimize_insn_for_size_p ())))
+   && (! ANY_MASK_REG_P (operands[0])
+	 || ! ANY_MASK_REG_P (operands[1]))"
   [(set (match_dup 0)
 	(not:SI (match_dup 1)))]
 {
-- 
1.7.11.7


* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-14  7:24             ` Kirill Yukhin
@ 2013-08-19 21:21               ` Richard Henderson
  2013-08-22  9:43                 ` Kirill Yukhin
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2013-08-19 21:21 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: ubizjak, jakub, GCC Patches, Vladimir Makarov

On 08/14/2013 12:23 AM, Kirill Yukhin wrote:
> +  ;; For AVX512F mask support
> +  UNSPEC_KIOR
> +  UNSPEC_KXOR
> +  UNSPEC_KAND
> +  UNSPEC_KANDN

I thought we determined that you didn't need these,
that "*Yk" as a constraint was sufficient.

> +(define_insn "kandn<mode>"
> +(define_insn "kand<mode>"
> +(define_insn "kior<mode>"
> +(define_insn "kxor<mode>"

Because otherwise there's nothing different between these...

> +(define_insn "kxnor<mode>"
> +(define_insn "kortestzhi"
> +(define_insn "kortestchi"
> +(define_insn "kunpckhi"
>  (define_insn "*one_cmpl<mode>2_1"
> +(define_insn "*one_cmplhi2_1"
>  (define_insn "*one_cmplqi2_1"

... and these.
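
That is, the operations can be written as plain RTL behind the mask
constraint, e.g. (an untested sketch):

  (define_insn "kand<mode>"
    [(set (match_operand:SWI12 0 "register_operand" "=Yk")
	  (and:SWI12 (match_operand:SWI12 1 "register_operand" "Yk")
		     (match_operand:SWI12 2 "register_operand" "Yk")))]
    "TARGET_AVX512F"
    "kandw\t{%2, %1, %0|%0, %1, %2}")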



r~


* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-19 21:21               ` Richard Henderson
@ 2013-08-22  9:43                 ` Kirill Yukhin
  2013-08-22 15:52                   ` Richard Henderson
  0 siblings, 1 reply; 31+ messages in thread
From: Kirill Yukhin @ 2013-08-22  9:43 UTC (permalink / raw)
  To: Richard Henderson; +Cc: ubizjak, jakub, GCC Patches, Vladimir Makarov

Hello Richard,

On 19 Aug 14:17, Richard Henderson wrote:
> On 08/14/2013 12:23 AM, Kirill Yukhin wrote:
> > +  ;; For AVX512F mask support
> > +  UNSPEC_KIOR
> > +  UNSPEC_KXOR
> > +  UNSPEC_KAND
> > +  UNSPEC_KANDN
> 
> I thought we determined that you didn't need these,
> that "*Yk" as a constraint was sufficient.

As far as I understood, we're talking about incorporating the mask
logic instructions into the existing patterns and making the mask
constraint alternatives disparaged.

E.g. for OR we have:
  (define_insn "*<code><mode>_1"
    [(set (match_operand:SWI248 0 "nonimmediate_operand" "=r,rm")
          (any_or:SWI248
           (match_operand:SWI248 1 "nonimmediate_operand" "%0,0")
           (match_operand:SWI248 2 "<general_operand>" "<g>,r<i>")))
     (clobber (reg:CC FLAGS_REG))]
    "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
    "<logic>{<imodesuffix>}\t{%2, %0|%0, %2}"

Unlike the generic OR, the mask version of OR does not clobber
FLAGS_REG.  Of course, we could conservatively pretend that it does,
but I believe that is not a good idea.
Making a single constraint in a new pattern disparaged makes no sense
as far as I understand, since disparagement is a relative notion.
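
For reference, the mask-only form without the clobber would look
roughly like this (just a sketch):

  (define_insn "*k<logic>hi"
    [(set (match_operand:HI 0 "register_operand" "=Yk")
	  (any_logic:HI (match_operand:HI 1 "register_operand" "Yk")
			(match_operand:HI 2 "register_operand" "Yk")))]
    "TARGET_AVX512F"
    "k<logic>w\t{%2, %1, %0|%0, %1, %2}")
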
So, what should I do?

--
Thanks, K



* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-22  9:43                 ` Kirill Yukhin
@ 2013-08-22 15:52                   ` Richard Henderson
  2013-08-26 16:18                     ` Kirill Yukhin
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2013-08-22 15:52 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: ubizjak, jakub, GCC Patches, Vladimir Makarov

On 08/22/2013 02:35 AM, Kirill Yukhin wrote:
> Unlike the generic OR, the mask version of OR does not clobber
> FLAGS_REG.  Of course, we could conservatively pretend that it does,
> but I believe that is not a good idea.

I believe that having two different patterns is a worse idea.

You can always split away the clobber after reload, as we do when
add gets implemented with lea.
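
I.e., roughly (an untested sketch, assuming a mask_reg_operand
predicate that accepts only the mask registers):

  (define_split
    [(set (match_operand:SWI12 0 "mask_reg_operand")
	  (any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
			   (match_operand:SWI12 2 "mask_reg_operand")))
     (clobber (reg:CC FLAGS_REG))]
    "TARGET_AVX512F && reload_completed"
    [(set (match_dup 0)
	  (any_logic:SWI12 (match_dup 1) (match_dup 2)))])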


r~


* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-22 15:52                   ` Richard Henderson
@ 2013-08-26 16:18                     ` Kirill Yukhin
  2013-08-26 16:40                       ` Richard Henderson
  0 siblings, 1 reply; 31+ messages in thread
From: Kirill Yukhin @ 2013-08-26 16:18 UTC (permalink / raw)
  To: Richard Henderson, Vladimir Makarov
  Cc: Uros Bizjak, Jakub Jelinek, GCC Patches

Hello,

On 22 Aug 08:49, Richard Henderson wrote:
> You can always split away the clobber after reload, as we do when
> add gets implemented with lea.

I've refactored the patch, making the mask logic insn patterns use
generic RTL instead of unspecs.

Unfortunately I was unable to use '*' on the mask alternatives, since
it does not work; '?' does not work either.  So I've put '!' there and
everything goes fine.

Vlad, could you please clarify whether it is OK to use '!'?

ChangeLog:
2013-07-24  Alexander Ivchenko  <alexander.ivchenko@intel.com>
	    Maxim Kuznetsov  <maxim.kuznetsov@intel.com>
	    Sergey Lega  <sergey.s.lega@intel.com>
	    Anna Tikhonova  <anna.tikhonova@intel.com>
	    Ilya Tocar  <ilya.tocar@intel.com>
	    Andrey Turetskiy  <andrey.turetskiy@intel.com>
	    Ilya Verbin  <ilya.verbin@intel.com>
	    Kirill Yukhin  <kirill.yukhin@intel.com>
	    Michael Zolotukhin  <michael.v.zolotukhin@intel.com>

	* config/i386/constraints.md (k): New.
	(Yk): Ditto.
	* config/i386/i386.c (const regclass_map): Add new mask registers.
	(dbx_register_map): Ditto.
	(dbx64_register_map): Ditto.
	(svr4_dbx_register_map): Ditto.
	(ix86_conditional_register_usage): Squash mask registers if AVX512F is
	disabled.
	(ix86_preferred_reload_class): Disable constants for mask registers.
	(ix86_secondary_reload): Do spill of mask register using 32-bit insn.
	(ix86_hard_regno_mode_ok): Support new mask registers.
	(x86_order_regs_for_local_alloc): Ditto.
	* config/i386/i386.h (FIRST_PSEUDO_REGISTER): Update.
	(FIXED_REGISTERS): Add new mask registers.
	(CALL_USED_REGISTERS): Ditto.
	(REG_ALLOC_ORDER): Ditto.
	(VALID_MASK_REG_MODE): New.
	(FIRST_MASK_REG): Ditto.
	(LAST_MASK_REG): Ditto.
	(reg_class): Add MASK_EVEX_REGS, MASK_REGS.
	(MAYBE_MASK_CLASS_P): New.
	(REG_CLASS_NAMES): Add MASK_EVEX_REGS, MASK_REGS.
	(REG_CLASS_CONTENTS): Ditto.
	(MASK_REGNO_P): New.
	(ANY_MASK_REG_P): Ditto.
	(HI_REGISTER_NAMES): Add new mask registers.
	* config/i386/i386.md (MASK0_REG, MASK1_REG, MASK2_REG,
	MASK3_REG, MASK4_REG, MASK5_REG, MASK6_REG,
	MASK7_REG): Constants for new mask registers.
	(attribute "type"): Add mskmov, msklog.
	(attribute "length_immediate"): Support them.
	(attribute "memory"): Ditto.
	(attribute "prefix_0f"): Ditto.
	(*movhi_internal): Support new mask registers.
	(*movqi_internal): Ditto.
	(define_split): Split the flags clobber away from a logic
	insn on mask registers.
	(*k<logic><mode>): New.
	(andhi_1): Renamed from *andhi_1 to make the insn code visible;
	extend to support mask regs.
	(*andqi_1): Extend to support mask regs.
	(kandn<mode>): New.
	(define_split): Split and-not to and and not if operands
	are not mask regs.
	(*<code><mode>_1): Separate HI mode to new pattern...
	(<code>hi_1): This.
	(*<code>qi_1): Extend to support mask regs.
	(kxnor<mode>): New.
	(define_split): New split for not-xor.
	(kortestzhi): New.
	(kortestchi): Ditto.
	(kunpckhi): Ditto.
	(*one_cmpl<mode>2_1): Remove HImode and handle it...
	(*one_cmplhi2_1): ...Here, now with mask registers support.
	(*one_cmplqi2_1): Support new mask registers.
	(HI/QImode arithmetic splitter): Don't split if mask registers are used.
	(HI/QImode not splitter): Ditto.

Testing:
  1. Bootstrap pass.
  2. make check shows no regressions.
  3. Spec 2000 & 2006 build shows no regressions both with and without -mavx512f option.
  4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option.

Is it ok?

Thanks, K

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c         |  34 +++--
 gcc/config/i386/i386.h         |  40 ++++--
 gcc/config/i386/i386.md        | 279 ++++++++++++++++++++++++++++++++++-------
 gcc/config/i386/predicates.md  |   4 +
 5 files changed, 306 insertions(+), 59 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;     B     H           T
-;;;           h jk
+;;;           h j
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d05dbf0..8325919 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2032,6 +2032,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
+  /* Mask registers.  */
+  MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
+  MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -2047,6 +2050,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* The "default" register map used in 64bit mode.  */
@@ -2062,6 +2066,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
   25, 26, 27, 28, 29, 30, 31, 32,	/* extended SSE registers */
   67, 68, 69, 70, 71, 72, 73, 74,       /* AVX-512 registers 16-23 */
   75, 76, 77, 78, 79, 80, 81, 82,       /* AVX-512 registers 24-31 */
+  118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -2129,6 +2134,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* Define parameter passing and return registers.  */
@@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void)
 
   /* If AVX512F is disabled, squash the registers.  */
   if (! TARGET_AVX512F)
+  {
     for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
       fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+
+    for (i = FIRST_MASK_REG; i < LAST_MASK_REG; i++)
+      fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+  }
 }
 
 \f
@@ -33889,10 +33900,12 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass)
     return regclass;
 
   /* Force constants into memory if we are loading a (nonzero) constant into
-     an MMX or SSE register.  This is because there are no MMX/SSE instructions
-     to load from a constant.  */
+     an MMX, SSE or MASK register.  This is because there are no MMX/SSE/MASK
+     instructions to load from a constant.  */
   if (CONSTANT_P (x)
-      && (MAYBE_MMX_CLASS_P (regclass) || MAYBE_SSE_CLASS_P (regclass)))
+      && (MAYBE_MMX_CLASS_P (regclass)
+	  || MAYBE_SSE_CLASS_P (regclass)
+	  || MAYBE_MASK_CLASS_P (regclass)))
     return NO_REGS;
 
   /* Prefer SSE regs only, if we can use them for math.  */
@@ -33996,10 +34009,11 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
 
   /* QImode spills from non-QI registers require
      intermediate register on 32bit targets.  */
-  if (!TARGET_64BIT
-      && !in_p && mode == QImode
-      && INTEGER_CLASS_P (rclass)
-      && MAYBE_NON_Q_CLASS_P (rclass))
+  if (mode == QImode
+      && (MAYBE_MASK_CLASS_P (rclass)
+	  || (!TARGET_64BIT && !in_p
+	      && INTEGER_CLASS_P (rclass)
+	      && MAYBE_NON_Q_CLASS_P (rclass))))
     {
       int regno;
 
@@ -34421,6 +34435,8 @@ ix86_hard_regno_mode_ok (int regno, enum machine_mode mode)
     return false;
   if (STACK_REGNO_P (regno))
     return VALID_FP_MODE_P (mode);
+  if (MASK_REGNO_P (regno))
+    return VALID_MASK_REG_MODE (mode);
   if (SSE_REGNO_P (regno))
     {
       /* We implement the move patterns for all vector modes into and
@@ -35230,6 +35246,10 @@ x86_order_regs_for_local_alloc (void)
    for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
      reg_alloc_order [pos++] = i;
 
+   /* Mask register.  */
+   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
+     reg_alloc_order [pos++] = i;
+
    /* x87 registers.  */
    if (TARGET_SSE_MATH)
      for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e820aa6..13572bf2 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -893,7 +893,7 @@ enum target_cpu_default
    eliminated during reloading in favor of either the stack or frame
    pointer.  */
 
-#define FIRST_PSEUDO_REGISTER 69
+#define FIRST_PSEUDO_REGISTER 77
 
 /* Number of hardware registers that go into the DWARF-2 unwind info.
    If not defined, equals FIRST_PSEUDO_REGISTER.  */
@@ -923,7 +923,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      0,   0,    0,    0,    0,    0,    0,    0,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     0,   0,    0,    0,    0,    0,    0,    0 }
+     0,   0,    0,    0,    0,    0,    0,    0,		\
+/*  k0,  k1, k2, k3, k4, k5, k6, k7*/				\
+     0,  0,   0,  0,  0,  0,  0,  0 }
 
 /* 1 for registers not available across function calls.
    These must include the FIXED_REGISTERS and also any
@@ -955,7 +957,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      6,    6,     6,    6,    6,    6,    6,    6,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     6,    6,     6,    6,    6,    6,    6,    6 }
+     6,    6,     6,    6,    6,    6,    6,    6,		\
+ /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
+     1,   1,   1,   1,   1,   1,   1,   1 }
 
 /* Order in which to allocate registers.  Each register must be
    listed once, even those in FIXED_REGISTERS.  List frame pointer
@@ -971,7 +975,7 @@ enum target_cpu_default
    18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,	\
    33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,  \
    48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62,	\
-   63, 64, 65, 66, 67, 68 }
+   63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76 }
 
 /* ADJUST_REG_ALLOC_ORDER is a macro which permits reg_alloc_order
    to be rearranged based on a particular function.  When using sse math,
@@ -1068,6 +1072,8 @@ enum target_cpu_default
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode	\
    || (MODE) == V16SFmode)
 
+#define VALID_MASK_REG_MODE(MODE) ((MODE) == HImode || (MODE) == QImode)
+
 /* Value is 1 if hard register REGNO can hold a value of machine-mode MODE.  */
 
 #define HARD_REGNO_MODE_OK(REGNO, MODE)	\
@@ -1093,8 +1099,10 @@ enum target_cpu_default
   (CC_REGNO_P (REGNO) ? VOIDmode					\
    : (MODE) == VOIDmode && (NREGS) != 1 ? VOIDmode			\
    : (MODE) == VOIDmode ? choose_hard_reg_mode ((REGNO), (NREGS), false) \
-   : (MODE) == HImode && !TARGET_PARTIAL_REG_STALL ? SImode		\
-   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)) ? SImode	\
+   : (MODE) == HImode && !(TARGET_PARTIAL_REG_STALL			\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
+   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)		\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
    : (MODE))
 
 /* The only ABI that saves SSE registers across calls is Win64 (thus no
@@ -1141,6 +1149,9 @@ enum target_cpu_default
 #define FIRST_EXT_REX_SSE_REG  (LAST_REX_SSE_REG + 1) /*53*/
 #define LAST_EXT_REX_SSE_REG   (FIRST_EXT_REX_SSE_REG + 15) /*68*/
 
+#define FIRST_MASK_REG  (LAST_EXT_REX_SSE_REG + 1) /*69*/
+#define LAST_MASK_REG   (FIRST_MASK_REG + 7) /*76*/
+
 /* Override this in other tm.h files to cope with various OS lossage
    requiring a frame pointer.  */
 #ifndef SUBTARGET_FRAME_POINTER_REQUIRED
@@ -1229,6 +1240,8 @@ enum reg_class
   FLOAT_INT_REGS,
   INT_SSE_REGS,
   FLOAT_INT_SSE_REGS,
+  MASK_EVEX_REGS,
+  MASK_REGS,
   ALL_REGS, LIM_REG_CLASSES
 };
 
@@ -1250,6 +1263,8 @@ enum reg_class
   reg_classes_intersect_p ((CLASS), ALL_SSE_REGS)
 #define MAYBE_MMX_CLASS_P(CLASS) \
   reg_classes_intersect_p ((CLASS), MMX_REGS)
+#define MAYBE_MASK_CLASS_P(CLASS) \
+  reg_classes_intersect_p ((CLASS), MASK_REGS)
 
 #define Q_CLASS_P(CLASS) \
   reg_class_subset_p ((CLASS), Q_REGS)
@@ -1282,6 +1297,8 @@ enum reg_class
    "FLOAT_INT_REGS",			\
    "INT_SSE_REGS",			\
    "FLOAT_INT_SSE_REGS",		\
+   "MASK_EVEX_REGS",			\
+   "MASK_REGS",				\
    "ALL_REGS" }
 
 /* Define which registers fit in which classes.  This is an initializer
@@ -1319,7 +1336,9 @@ enum reg_class
 {   0x11ffff,    0x1fe0,   0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,  0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,  0x1f },       /* FLOAT_INT_SSE_REGS */        \
-{ 0xffffffff,0xffffffff,  0x1f }                                        \
+       { 0x0,       0x0,0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0,0x1fe0 },       /* MASK_REGS */                 \
+{ 0xffffffff,0xffffffff,0x1fff }                                        \
 }
 
 /* The same information, inverted:
@@ -1377,6 +1396,8 @@ enum reg_class
          : (N) <= LAST_REX_SSE_REG ? (FIRST_REX_SSE_REG + (N) - 8) \
                                    : (FIRST_EXT_REX_SSE_REG + (N) - 16))
 
+#define MASK_REGNO_P(N) IN_RANGE ((N), FIRST_MASK_REG, LAST_MASK_REG)
+#define ANY_MASK_REG_P(X) (REG_P (X) && MASK_REGNO_P (REGNO (X)))
 
 #define SSE_FLOAT_MODE_P(MODE) \
   ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
@@ -1429,7 +1450,7 @@ enum reg_class
 
 /* Get_secondary_mem widens integral modes to BITS_PER_WORD.
    There is no need to emit full 64 bit move on 64 bit targets
-   for integral modes that can be moved using 32 bit move.  */
+   for integral modes that can be moved using 8 bit move.  */
 #define SECONDARY_MEMORY_NEEDED_MODE(MODE)			\
   (GET_MODE_BITSIZE (MODE) < 32 && INTEGRAL_MODE_P (MODE)	\
    ? mode_for_size (32, GET_MODE_CLASS (MODE), 0)		\
@@ -1933,7 +1954,8 @@ do {							\
  "xmm16", "xmm17", "xmm18", "xmm19",					\
  "xmm20", "xmm21", "xmm22", "xmm23",					\
  "xmm24", "xmm25", "xmm26", "xmm27",					\
- "xmm28", "xmm29", "xmm30", "xmm31" }
+ "xmm28", "xmm29", "xmm30", "xmm31",					\
+ "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7" }
 
 #define REGISTER_NAMES HI_REGISTER_NAMES
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 3d7533a..62a2482 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -328,6 +328,14 @@
    (XMM29_REG			66)
    (XMM30_REG			67)
    (XMM31_REG			68)
+   (MASK0_REG			69)
+   (MASK1_REG			70)
+   (MASK2_REG			71)
+   (MASK3_REG			72)
+   (MASK4_REG			73)
+   (MASK5_REG			74)
+   (MASK6_REG			75)
+   (MASK7_REG			76)
   ])
 
 ;; Insns whose names begin with "x86_" are emitted by gen_FOO calls
@@ -360,7 +368,7 @@
    sseishft,sseishft1,ssecmp,ssecomi,
    ssecvt,ssecvt1,sseicvt,sseins,
    sseshuf,sseshuf1,ssemuladd,sse4arg,
-   lwp,
+   lwp,mskmov,msklog,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -379,7 +387,7 @@
 			  ssemul,sseimul,ssediv,sselog,sselog1,
 			  sseishft,sseishft1,ssecmp,ssecomi,
 			  ssecvt,ssecvt1,sseicvt,sseins,
-			  sseshuf,sseshuf1,ssemuladd,sse4arg")
+			  sseshuf,sseshuf1,ssemuladd,sse4arg,mskmov")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "mmx")
@@ -390,7 +398,7 @@
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
-			  bitmanip,imulx")
+			  bitmanip,imulx,msklog,mskmov")
 	   (const_int 0)
 	 (eq_attr "unit" "i387,sse,mmx")
 	   (const_int 0)
@@ -451,7 +459,7 @@
 ;; Set when 0f opcode prefix is used.
 (define_attr "prefix_0f" ""
   (if_then_else
-    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip")
+    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip,msklog,mskmov")
 	 (eq_attr "unit" "sse,mmx"))
     (const_int 1)
     (const_int 0)))
@@ -651,7 +659,7 @@
 		   fmov,fcmp,fsgn,
 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
 		   sselog1,sseshuf1,sseadd1,sseiadd1,sseishft1,
-		   mmx,mmxmov,mmxcmp,mmxcvt")
+		   mmx,mmxmov,mmxcmp,mmxcvt,mskmov,msklog")
 	      (match_operand 2 "memory_operand"))
 	   (const_string "load")
 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
@@ -2213,8 +2221,8 @@
 	   (const_string "SI")))])
 
 (define_insn "*movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m")
-	(match_operand:HI 1 "general_operand"	   "r ,rn,rm,rn"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,Yk,Yk,rm")
+	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,rm,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2223,6 +2231,16 @@
       /* movzwl is faster than movw on p2 due to partial word stalls,
 	 though not as fast as an aligned movl.  */
       return "movz{wl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 4: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 5: return "kmovw\t{%1, %0|%0, %1}";
+	case 6: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2240,11 +2258,17 @@
 	    (and (eq_attr "alternative" "1,2")
 		 (match_operand:HI 1 "aligned_operand"))
 	      (const_string "imov")
+	    (eq_attr "alternative" "4,5,6")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "0,2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+    (set (attr "prefix")
+      (if_then_else (eq_attr "alternative" "4,5,6")
+	(const_string "vex")
+	(const_string "orig")))
     (set (attr "mode")
       (cond [(eq_attr "type" "imovx")
 	       (const_string "SI")
@@ -2269,8 +2293,8 @@
 ;; register stall machines with, where we use QImode instructions, since
 ;; partial register stall can be caused there.  Then we use movzx.
 (define_insn "*movqi_internal"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
-	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m ,Yk,Yk,r")
+	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn,r ,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2278,6 +2302,16 @@
     case TYPE_IMOVX:
       gcc_assert (ANY_QI_REG_P (operands[1]) || MEM_P (operands[1]));
       return "movz{bl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 7: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 8: return "kmovw\t{%1, %0|%0, %1}";
+	case 9: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2297,11 +2331,17 @@
 	      (const_string "imov")
 	    (eq_attr "alternative" "3,5")
 	      (const_string "imovx")
+	    (eq_attr "alternative" "7,8,9")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+   (set (attr "prefix")
+     (if_then_else (eq_attr "alternative" "7,8,9")
+       (const_string "vex")
+       (const_string "orig")))
    (set (attr "mode")
       (cond [(eq_attr "alternative" "3,4,5")
 	       (const_string "SI")
@@ -7494,6 +7534,26 @@
   operands[3] = gen_lowpart (QImode, operands[3]);
 })
 
+(define_split
+  [(set (match_operand:SWI12 0 "mask_reg_operand")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
+			 (match_operand:SWI12 2 "mask_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F && reload_completed"
+  [(set (match_operand:SWI12 0 "mask_reg_operand")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
+			 (match_operand:SWI12 2 "mask_reg_operand")))])
+
+(define_insn "*k<logic><mode>"
+  [(set (match_operand:SWI12 0 "mask_reg_operand" "=Yk")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand" "Yk")
+			 (match_operand:SWI12 2 "mask_reg_operand" "Yk")))]
+  "TARGET_AVX512F && reload_completed"
+  "k<logic>w\t{%2, %1, %0|%0, %1, %2}";
+  [(set_attr "mode" "<MODE>")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; %%% This used to optimize known byte-wide and operations to memory,
 ;; and sometimes to QImode registers.  If this is considered useful,
 ;; it should be done with splitters.
@@ -7616,16 +7676,17 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "SI")])
 
-(define_insn "*andhi_1"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya")
-	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm")
-		(match_operand:HI 2 "general_operand" "rn,rm,L")))
+(define_insn "andhi_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya,!Yk")
+	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm,Yk")
+		(match_operand:HI 2 "general_operand" "rn,rm,L,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (AND, HImode, operands)"
 {
   switch (get_attr_type (insn))
     {
     case TYPE_IMOVX:
+    case TYPE_MSKLOG:
       return "#";
 
     default:
@@ -7633,29 +7694,30 @@
       return "and{w}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "type" "alu,alu,imovx")
-   (set_attr "length_immediate" "*,*,0")
+  [(set_attr "type" "alu,alu,imovx,msklog")
+   (set_attr "length_immediate" "*,*,0,*")
    (set (attr "prefix_rex")
      (if_then_else
        (and (eq_attr "type" "imovx")
 	    (match_operand 1 "ext_QIreg_operand"))
        (const_string "1")
        (const_string "*")))
-   (set_attr "mode" "HI,HI,SI")])
+   (set_attr "mode" "HI,HI,SI,HI")])
 
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*andqi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
-	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		(match_operand:QI 2 "general_operand" "qn,qmn,rn")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,!Yk")
+	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
+		(match_operand:QI 2 "general_operand" "qn,qmn,rn,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (AND, QImode, operands)"
   "@
    and{b}\t{%2, %0|%0, %2}
    and{b}\t{%2, %0|%0, %2}
-   and{l}\t{%k2, %k0|%k0, %k2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI,QI,SI")])
+   and{l}\t{%k2, %k0|%k0, %k2}
+   #"
+  [(set_attr "type" "alu,alu,alu,msklog")
+   (set_attr "mode" "QI,QI,SI,HI")])
 
 (define_insn "*andqi_1_slp"
   [(set (strict_low_part (match_operand:QI 0 "nonimmediate_operand" "+qm,q"))
@@ -7668,6 +7730,35 @@
   [(set_attr "type" "alu1")
    (set_attr "mode" "QI")])
 
+(define_insn "kandn<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,!Yk")
+	(and:SWI12
+	  (not:SWI12
+	    (match_operand:SWI12 1 "register_operand" "0,Yk"))
+	  (match_operand:SWI12 2 "register_operand" "r,Yk")))]
+  "TARGET_AVX512F"
+  "@
+   #
+   kandnw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "*,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_split
+  [(set (match_operand:SWI12 0 "register_operand")
+	(and:SWI12
+	  (not:SWI12
+	    (match_dup 0))
+	  (match_operand:SWI12 1 "register_operand")))]
+  "TARGET_AVX512F && !ANY_MASK_REG_P (operands [0])"
+  [(set (match_dup 0)
+	(not:HI (match_dup 0)))
+   (parallel [(set (match_dup 0)
+		   (and:HI (match_dup 0)
+			   (match_dup 1)))
+	      (clobber (reg:CC FLAGS_REG))])]
+  "")
+
 ;; Turn *anddi_1 into *andsi_1_zext if possible.
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -7999,29 +8090,44 @@
   "ix86_expand_binary_operator (<CODE>, <MODE>mode, operands); DONE;")
 
 (define_insn "*<code><mode>_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=r,rm")
-	(any_or:SWI248
-	 (match_operand:SWI248 1 "nonimmediate_operand" "%0,0")
-	 (match_operand:SWI248 2 "<general_operand>" "<g>,r<i>")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,rm")
+	(any_or:SWI48
+	 (match_operand:SWI48 1 "nonimmediate_operand" "%0,0")
+	 (match_operand:SWI48 2 "<general_operand>" "<g>,r<i>")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "<logic>{<imodesuffix>}\t{%2, %0|%0, %2}"
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "<code>hi_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,rm,!Yk")
+	(any_or:HI
+	 (match_operand:HI 1 "nonimmediate_operand" "%0,0,Yk")
+	 (match_operand:HI 2 "general_operand" "<g>,r<i>,Yk")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (<CODE>, HImode, operands)"
+  "@
+  <logic>{w}\t{%2, %0|%0, %2}
+  <logic>{w}\t{%2, %0|%0, %2}
+  #"
+  [(set_attr "type" "alu,alu,msklog")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*<code>qi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m,r")
-	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		   (match_operand:QI 2 "general_operand" "qmn,qn,rn")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m,r,!Yk")
+	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
+		   (match_operand:QI 2 "general_operand" "qmn,qn,rn,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, QImode, operands)"
   "@
    <logic>{b}\t{%2, %0|%0, %2}
    <logic>{b}\t{%2, %0|%0, %2}
-   <logic>{l}\t{%k2, %k0|%k0, %k2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI,QI,SI")])
+   <logic>{l}\t{%k2, %k0|%k0, %k2}
+   #"
+  [(set_attr "type" "alu,alu,alu,msklog")
+   (set_attr "mode" "QI,QI,SI,HI")])
 
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 (define_insn "*<code>si_1_zext"
@@ -8071,6 +8177,74 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "kxnor<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,!k")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_operand:SWI12 1 "register_operand" "0,k"))
+	    (match_operand:SWI12 2 "register_operand" "r,k")))]
+  "TARGET_AVX512F"
+  "@
+   #
+   kxnorw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "*,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_split
+  [(set (match_operand:SWI12 0 "register_operand")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_dup 0))
+	    (match_operand:SWI12 1 "register_operand")))]
+  "TARGET_AVX512F && !ANY_MASK_REG_P (operands [0])"
+   [(parallel [(set (match_dup 0)
+		    (xor:HI (match_dup 0)
+			    (match_dup 1)))
+	       (clobber (reg:CC FLAGS_REG))])
+    (set (match_dup 0)
+	 (not:HI (match_dup 0)))]
+  "")
+
+(define_insn "kortestzhi"
+  [(set (reg:CCZ FLAGS_REG)
+	(compare:CCZ
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "%Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int 0)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kortestchi"
+  [(set (reg:CCC FLAGS_REG)
+	(compare:CCC
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "%Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int -1)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCCmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kunpckhi"
+  [(set (match_operand:HI 0 "register_operand" "=Yk")
+	(ior:HI
+	  (ashift:HI
+	    (match_operand:HI 1 "register_operand" "Yk")
+	    (const_int 8))
+	  (zero_extend:HI (subreg:QI (match_operand:HI 2 "register_operand" "Yk") 0))))]
+  "TARGET_AVX512F"
+  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 ;; ??? Special case for immediate operand is missing - it is tricky.
 (define_insn "*<code>si_2_zext"
@@ -8640,23 +8814,38 @@
   "ix86_expand_unary_operator (NOT, <MODE>mode, operands); DONE;")
 
 (define_insn "*one_cmpl<mode>2_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
-	(not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm")
+	(not:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0")))]
   "ix86_unary_operator_ok (NOT, <MODE>mode, operands)"
   "not{<imodesuffix>}\t%0"
   [(set_attr "type" "negnot")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*one_cmplhi2_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yk")
+	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
+  "ix86_unary_operator_ok (NOT, HImode, operands)"
+  "@
+   not{w}\t%0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,avx512f")
+   (set_attr "type" "negnot,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 1.  What to do?
 (define_insn "*one_cmplqi2_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
-	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,*Yk")
+	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,*Yk")))]
   "ix86_unary_operator_ok (NOT, QImode, operands)"
   "@
    not{b}\t%0
-   not{l}\t%k0"
-  [(set_attr "type" "negnot")
-   (set_attr "mode" "QI,SI")])
+   not{l}\t%k0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,*,avx512f")
+   (set_attr "type" "negnot,negnot,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "mode" "QI,SI,QI")])
 
 ;; ??? Currently never generated - xor is used instead.
 (define_insn "*one_cmplsi2_1_zext"
@@ -16380,7 +16569,7 @@
 })
 
 ;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.
-
+;; Do not split instructions with mask registers.
 (define_split
   [(set (match_operand 0 "register_operand")
 	(match_operator 3 "promotable_binary_operator"
@@ -16394,7 +16583,10 @@
 	    || !CONST_INT_P (operands[2])
 	    || satisfies_constraint_K (operands[2])))
        || (GET_MODE (operands[0]) == QImode
-	   && (TARGET_PROMOTE_QImode || optimize_function_for_size_p (cfun))))"
+	   && (TARGET_PROMOTE_QImode || optimize_function_for_size_p (cfun))))
+   && (! ANY_MASK_REG_P (operands[0])
+	|| ! ANY_MASK_REG_P (operands[1])
+	|| ! ANY_MASK_REG_P (operands[2]))"
   [(parallel [(set (match_dup 0)
 		   (match_op_dup 3 [(match_dup 1) (match_dup 2)]))
 	      (clobber (reg:CC FLAGS_REG))])]
@@ -16479,6 +16671,7 @@
   operands[1] = gen_lowpart (SImode, operands[1]);
 })
 
+;; Do not split instructions with mask regs.
 (define_split
   [(set (match_operand 0 "register_operand")
 	(not (match_operand 1 "register_operand")))]
@@ -16486,7 +16679,9 @@
    && (GET_MODE (operands[0]) == HImode
        || (GET_MODE (operands[0]) == QImode
 	   && (TARGET_PROMOTE_QImode
-	       || optimize_insn_for_size_p ())))"
+	       || optimize_insn_for_size_p ())))
+   && (! ANY_MASK_REG_P (operands[0])
+	 || ! ANY_MASK_REG_P (operands[1]))"
   [(set (match_dup 0)
 	(not:SI (match_dup 1)))]
 {
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 3959c38..4f95bf1 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -52,6 +52,10 @@
   (and (match_code "reg")
        (match_test "EXT_REX_SSE_REGNO_P (REGNO (op))")))
 
+;; True if the operand is an AVX-512 mask register.
+(define_predicate "mask_reg_operand"
+  (and (match_code "reg")
+       (match_test "MASK_REGNO_P (REGNO (op))")))
 
 ;; True if the operand is a Q_REGS class register.
 (define_predicate "q_regs_operand"
-- 
1.7.11.7

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-26 16:18                     ` Kirill Yukhin
@ 2013-08-26 16:40                       ` Richard Henderson
  2013-08-27 18:22                         ` Kirill Yukhin
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2013-08-26 16:40 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

On 08/26/2013 09:13 AM, Kirill Yukhin wrote:
> +(define_split
> +  [(set (match_operand:SWI12 0 "mask_reg_operand")
> +	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
> +			 (match_operand:SWI12 2 "mask_reg_operand")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_AVX512F && reload_completed"
> +  [(set (match_operand:SWI12 0 "mask_reg_operand")
> +	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
> +			 (match_operand:SWI12 2 "mask_reg_operand")))])

You must use match_dup on the new pattern half of define_split.
This pattern must never have triggered during your tests, since
it should have resulted in garbage rtl, and an ICE.
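
I.e. the replacement half must refer back to the operands already
matched, roughly (untested sketch):

(define_split
  [(set (match_operand:SWI12 0 "mask_reg_operand")
	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
			 (match_operand:SWI12 2 "mask_reg_operand")))
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_AVX512F && reload_completed"
  [(set (match_dup 0)
	(any_logic:SWI12 (match_dup 1)
			 (match_dup 2)))])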

> +(define_insn "kandn<mode>"
> +  [(set (match_operand:SWI12 0 "register_operand" "=r,!Yk")
> +	(and:SWI12
> +	  (not:SWI12
> +	    (match_operand:SWI12 1 "register_operand" "0,Yk"))
> +	  (match_operand:SWI12 2 "register_operand" "r,Yk")))]
> +  "TARGET_AVX512F"
> +  "@
> +   #
> +   kandnw\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "type" "*,msklog")
> +   (set_attr "prefix" "*,vex")
> +   (set_attr "mode" "<MODE>")])

What happened to the bmi andn alternative we discussed?

>  (define_insn "*andqi_1"
> -  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
> -	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
> -		(match_operand:QI 2 "general_operand" "qn,qmn,rn")))
> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,!Yk")
> +	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
> +		(match_operand:QI 2 "general_operand" "qn,qmn,rn,Yk")))
>     (clobber (reg:CC FLAGS_REG))]
>    "ix86_binary_operator_ok (AND, QImode, operands)"
>    "@
>     and{b}\t{%2, %0|%0, %2}
>     and{b}\t{%2, %0|%0, %2}
> -   and{l}\t{%k2, %k0|%k0, %k2}"
> -  [(set_attr "type" "alu")
> -   (set_attr "mode" "QI,QI,SI")])
> +   and{l}\t{%k2, %k0|%k0, %k2}
> +   #"
> +  [(set_attr "type" "alu,alu,alu,msklog")
> +   (set_attr "mode" "QI,QI,SI,HI")])

Why force the split?  You can write the kand here...

> +(define_insn "<code>hi_1"
> +  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,rm,!Yk")
> +	(any_or:HI
> +	 (match_operand:HI 1 "nonimmediate_operand" "%0,0,Yk")
> +	 (match_operand:HI 2 "general_operand" "<g>,r<i>,Yk")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "ix86_binary_operator_ok (<CODE>, HImode, operands)"
> +  "@
> +  <logic>{w}\t{%2, %0|%0, %2}
> +  <logic>{w}\t{%2, %0|%0, %2}
> +  #"
> +  [(set_attr "type" "alu,alu,msklog")
> +   (set_attr "mode" "HI")])

Likewise.

The point being that with optimization enabled, we will have run the split and
gotten all of the performance benefit of eliding the clobber.  But with
optimization disabled, we don't need the split for correctness.
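
Concretely, the !Yk alternative can emit the mask insn directly
instead of "#", e.g. for *andqi_1 (untested sketch):

   kandw\t{%2, %1, %0|%0, %1, %2}

and k<logic>w likewise for the ior/xor patterns.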

> +(define_insn "kunpckhi"
> +  [(set (match_operand:HI 0 "register_operand" "=Yk")
> +	(ior:HI
> +	  (ashift:HI
> +	    (match_operand:HI 1 "register_operand" "Yk")
> +	    (const_int 8))
> +	  (zero_extend:HI (subreg:QI (match_operand:HI 2 "register_operand" "Yk") 0))))]
> +  "TARGET_AVX512F"
> +  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "mode" "HI")
> +   (set_attr "type" "msklog")
> +   (set_attr "prefix" "vex")])

Don't write the subreg explicitly.  Instead, use a match_operand:QI, which will
match the whole (subreg (reg)) expression, and also something that the combiner
could simplify out of that.
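
I.e. simply (sketch):

	  (zero_extend:HI (match_operand:QI 2 "register_operand" "Yk"))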

> +(define_insn "*one_cmplhi2_1"
> +  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yk")
> +	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
> +  "ix86_unary_operator_ok (NOT, HImode, operands)"
...
>  (define_insn "*one_cmplqi2_1"
> -  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
> -	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,*Yk")
> +	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,*Yk")))]

Forgotten ! for Yk alternatives.


> +  "TARGET_AVX512F && !ANY_MASK_REG_P (operands [0])"
...
> +;; Do not split instructions with mask registers.
>  (define_split
...
> +   && (! ANY_MASK_REG_P (operands[0])
> +	|| ! ANY_MASK_REG_P (operands[1])
> +	|| ! ANY_MASK_REG_P (operands[2]))"

This ugliness is why I suggested adding a general_reg_operand in our last
conversation.
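
Something like this would do (sketch):

(define_predicate "general_reg_operand"
  (and (match_code "reg")
       (match_test "GENERAL_REG_P (op)")))

so the splits can just match general_reg_operand instead of piling
up ANY_MASK_REG_P checks on every operand.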


r~

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-26 16:40                       ` Richard Henderson
@ 2013-08-27 18:22                         ` Kirill Yukhin
  2013-08-27 18:41                           ` Kirill Yukhin
  2013-08-27 20:17                           ` Richard Henderson
  0 siblings, 2 replies; 31+ messages in thread
From: Kirill Yukhin @ 2013-08-27 18:22 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

Hello Richard,

On 26 Aug 09:37, Richard Henderson wrote:
> On 08/26/2013 09:13 AM, Kirill Yukhin wrote:
> > +(define_split
> > +  [(set (match_operand:SWI12 0 "mask_reg_operand")
> > +	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
> > +			 (match_operand:SWI12 2 "mask_reg_operand")))
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "TARGET_AVX512F && reload_completed"
> > +  [(set (match_operand:SWI12 0 "mask_reg_operand")
> > +	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
> > +			 (match_operand:SWI12 2 "mask_reg_operand")))])
> 
> You must use match_dup on the new pattern half of define_split.
> This pattern must never have triggered during your tests, since
> it should have resulted in garbage rtl, and an ICE.
Thanks, fixed.

> > +	  (match_operand:SWI12 2 "register_operand" "r,Yk")))]
> > +  "TARGET_AVX512F"
> > +  "@
> > +   #
> > +   kandnw\t{%2, %1, %0|%0, %1, %2}"
> > +  [(set_attr "type" "*,msklog")
> > +   (set_attr "prefix" "*,vex")
> > +   (set_attr "mode" "<MODE>")])
> 
> What happened to the bmi andn alternative we discussed?
BMI andn is only supported for 4- and 8-byte integers, while
kandw works for HI/QI.

> > +   and{l}\t{%k2, %k0|%k0, %k2}
> > +   #"
> > +  [(set_attr "type" "alu,alu,alu,msklog")
> > +   (set_attr "mode" "QI,QI,SI,HI")])
> 
> Why force the split?  You can write the kand here...
Done.

> > +  <logic>{w}\t{%2, %0|%0, %2}
> > +  #"
> > +  [(set_attr "type" "alu,alu,msklog")
> > +   (set_attr "mode" "HI")])
> 
> Likewise.
Done.

> The point being that with optimization enabled, we will have run the split and
> gotten all of the performance benefit of eliding the clobber.  But with
> optimization disabled, we don't need the split for correctness.
> 
> > +(define_insn "kunpckhi"
> > +  [(set (match_operand:HI 0 "register_operand" "=Yk")
> > +	(ior:HI
> > +	  (ashift:HI
> > +	    (match_operand:HI 1 "register_operand" "Yk")
> > +	    (const_int 8))
> > +	  (zero_extend:HI (subreg:QI (match_operand:HI 2 "register_operand" "Yk") 0))))]
> > +  "TARGET_AVX512F"
> > +  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
> > +  [(set_attr "mode" "HI")
> > +   (set_attr "type" "msklog")
> > +   (set_attr "prefix" "vex")])
> 
> Don't write the subreg explicitly.  Instead, use a match_operand:QI, which will
> match the whole (subreg (reg)) expression, and also something that the combiner
> could simplify out of that.
Thanks, fixed.

> > +(define_insn "*one_cmplhi2_1"
> > +  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yk")
> > +	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
> > +  "ix86_unary_operator_ok (NOT, HImode, operands)"
> ...
> >  (define_insn "*one_cmplqi2_1"
> > -  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
> > -	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
> > +  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,*Yk")
> > +	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,*Yk")))]
> 
> Forgotten ! for Yk alternatives.
Thanks. Fixed.

> > +  "TARGET_AVX512F && !ANY_MASK_REG_P (operands [0])"
> ...
> > +;; Do not split instructions with mask registers.
> >  (define_split
> ...
> > +   && (! ANY_MASK_REG_P (operands[0])
> > +	|| ! ANY_MASK_REG_P (operands[1])
> > +	|| ! ANY_MASK_REG_P (operands[2]))"
> 
> This ugliness is why I suggested adding a general_reg_operand in our last
> conversation.
I've introduced a general_reg_operand predicate.

Testing:
  1. Bootstrap pass.
  2. make check shows no regressions.
  3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option.
  4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option.


Is it ok?

--
Thanks, K

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c         |  34 +++--
 gcc/config/i386/i386.h         |  40 ++++--
 gcc/config/i386/i386.md        | 280 ++++++++++++++++++++++++++++++++++-------
 gcc/config/i386/predicates.md  |   9 ++
 5 files changed, 311 insertions(+), 60 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;     B     H           T
-;;;           h jk
+;;;           h j
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d05dbf0..8325919 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2032,6 +2032,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
+  /* Mask registers.  */
+  MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
+  MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -2047,6 +2050,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* The "default" register map used in 64bit mode.  */
@@ -2062,6 +2066,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
   25, 26, 27, 28, 29, 30, 31, 32,	/* extended SSE registers */
   67, 68, 69, 70, 71, 72, 73, 74,       /* AVX-512 registers 16-23 */
   75, 76, 77, 78, 79, 80, 81, 82,       /* AVX-512 registers 24-31 */
+  118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -2129,6 +2134,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* Define parameter passing and return registers.  */
@@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void)
 
   /* If AVX512F is disabled, squash the registers.  */
   if (! TARGET_AVX512F)
+  {
     for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
       fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+
+    for (i = FIRST_MASK_REG; i < LAST_MASK_REG; i++)
+      fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+  }
 }
 
 \f
@@ -33889,10 +33900,12 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass)
     return regclass;
 
   /* Force constants into memory if we are loading a (nonzero) constant into
-     an MMX or SSE register.  This is because there are no MMX/SSE instructions
-     to load from a constant.  */
+     an MMX, SSE or MASK register.  This is because there are no MMX/SSE/MASK
+     instructions to load from a constant.  */
   if (CONSTANT_P (x)
-      && (MAYBE_MMX_CLASS_P (regclass) || MAYBE_SSE_CLASS_P (regclass)))
+      && (MAYBE_MMX_CLASS_P (regclass)
+	  || MAYBE_SSE_CLASS_P (regclass)
+	  || MAYBE_MASK_CLASS_P (regclass)))
     return NO_REGS;
 
   /* Prefer SSE regs only, if we can use them for math.  */
@@ -33996,10 +34009,11 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
 
   /* QImode spills from non-QI registers require
      intermediate register on 32bit targets.  */
-  if (!TARGET_64BIT
-      && !in_p && mode == QImode
-      && INTEGER_CLASS_P (rclass)
-      && MAYBE_NON_Q_CLASS_P (rclass))
+  if (mode == QImode
+      && (MAYBE_MASK_CLASS_P (rclass)
+	  || (!TARGET_64BIT && !in_p
+	      && INTEGER_CLASS_P (rclass)
+	      && MAYBE_NON_Q_CLASS_P (rclass))))
     {
       int regno;
 
@@ -34421,6 +34435,8 @@ ix86_hard_regno_mode_ok (int regno, enum machine_mode mode)
     return false;
   if (STACK_REGNO_P (regno))
     return VALID_FP_MODE_P (mode);
+  if (MASK_REGNO_P (regno))
+    return VALID_MASK_REG_MODE (mode);
   if (SSE_REGNO_P (regno))
     {
       /* We implement the move patterns for all vector modes into and
@@ -35230,6 +35246,10 @@ x86_order_regs_for_local_alloc (void)
    for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
      reg_alloc_order [pos++] = i;
 
+   /* Mask register.  */
+   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
+     reg_alloc_order [pos++] = i;
+
    /* x87 registers.  */
    if (TARGET_SSE_MATH)
      for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e820aa6..13572bf2 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -893,7 +893,7 @@ enum target_cpu_default
    eliminated during reloading in favor of either the stack or frame
    pointer.  */
 
-#define FIRST_PSEUDO_REGISTER 69
+#define FIRST_PSEUDO_REGISTER 77
 
 /* Number of hardware registers that go into the DWARF-2 unwind info.
    If not defined, equals FIRST_PSEUDO_REGISTER.  */
@@ -923,7 +923,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      0,   0,    0,    0,    0,    0,    0,    0,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     0,   0,    0,    0,    0,    0,    0,    0 }
+     0,   0,    0,    0,    0,    0,    0,    0,		\
+/*  k0,  k1, k2, k3, k4, k5, k6, k7*/				\
+     0,  0,   0,  0,  0,  0,  0,  0 }
 
 /* 1 for registers not available across function calls.
    These must include the FIXED_REGISTERS and also any
@@ -955,7 +957,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      6,    6,     6,    6,    6,    6,    6,    6,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     6,    6,     6,    6,    6,    6,    6,    6 }
+     6,    6,     6,    6,    6,    6,    6,    6,		\
+ /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
+     1,   1,   1,   1,   1,   1,   1,   1 }
 
 /* Order in which to allocate registers.  Each register must be
    listed once, even those in FIXED_REGISTERS.  List frame pointer
@@ -971,7 +975,7 @@ enum target_cpu_default
    18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,	\
    33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,  \
    48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62,	\
-   63, 64, 65, 66, 67, 68 }
+   63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76 }
 
 /* ADJUST_REG_ALLOC_ORDER is a macro which permits reg_alloc_order
    to be rearranged based on a particular function.  When using sse math,
@@ -1068,6 +1072,8 @@ enum target_cpu_default
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode	\
    || (MODE) == V16SFmode)
 
+#define VALID_MASK_REG_MODE(MODE) ((MODE) == HImode || (MODE) == QImode)
+
 /* Value is 1 if hard register REGNO can hold a value of machine-mode MODE.  */
 
 #define HARD_REGNO_MODE_OK(REGNO, MODE)	\
@@ -1093,8 +1099,10 @@ enum target_cpu_default
   (CC_REGNO_P (REGNO) ? VOIDmode					\
    : (MODE) == VOIDmode && (NREGS) != 1 ? VOIDmode			\
    : (MODE) == VOIDmode ? choose_hard_reg_mode ((REGNO), (NREGS), false) \
-   : (MODE) == HImode && !TARGET_PARTIAL_REG_STALL ? SImode		\
-   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)) ? SImode	\
+   : (MODE) == HImode && !(TARGET_PARTIAL_REG_STALL			\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
+   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)		\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
    : (MODE))
 
 /* The only ABI that saves SSE registers across calls is Win64 (thus no
@@ -1141,6 +1149,9 @@ enum target_cpu_default
 #define FIRST_EXT_REX_SSE_REG  (LAST_REX_SSE_REG + 1) /*53*/
 #define LAST_EXT_REX_SSE_REG   (FIRST_EXT_REX_SSE_REG + 15) /*68*/
 
+#define FIRST_MASK_REG  (LAST_EXT_REX_SSE_REG + 1) /*69*/
+#define LAST_MASK_REG   (FIRST_MASK_REG + 7) /*76*/
+
 /* Override this in other tm.h files to cope with various OS lossage
    requiring a frame pointer.  */
 #ifndef SUBTARGET_FRAME_POINTER_REQUIRED
@@ -1229,6 +1240,8 @@ enum reg_class
   FLOAT_INT_REGS,
   INT_SSE_REGS,
   FLOAT_INT_SSE_REGS,
+  MASK_EVEX_REGS,
+  MASK_REGS,
   ALL_REGS, LIM_REG_CLASSES
 };
 
@@ -1250,6 +1263,8 @@ enum reg_class
   reg_classes_intersect_p ((CLASS), ALL_SSE_REGS)
 #define MAYBE_MMX_CLASS_P(CLASS) \
   reg_classes_intersect_p ((CLASS), MMX_REGS)
+#define MAYBE_MASK_CLASS_P(CLASS) \
+  reg_classes_intersect_p ((CLASS), MASK_REGS)
 
 #define Q_CLASS_P(CLASS) \
   reg_class_subset_p ((CLASS), Q_REGS)
@@ -1282,6 +1297,8 @@ enum reg_class
    "FLOAT_INT_REGS",			\
    "INT_SSE_REGS",			\
    "FLOAT_INT_SSE_REGS",		\
+   "MASK_EVEX_REGS",			\
+   "MASK_REGS",				\
    "ALL_REGS" }
 
 /* Define which registers fit in which classes.  This is an initializer
@@ -1319,7 +1336,9 @@ enum reg_class
 {   0x11ffff,    0x1fe0,   0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,  0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,  0x1f },       /* FLOAT_INT_SSE_REGS */        \
-{ 0xffffffff,0xffffffff,  0x1f }                                        \
+       { 0x0,       0x0,0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0,0x1fe0 },       /* MASK_REGS */                 \
+{ 0xffffffff,0xffffffff,0x1fff }                                        \
 }
 
 /* The same information, inverted:
@@ -1377,6 +1396,8 @@ enum reg_class
          : (N) <= LAST_REX_SSE_REG ? (FIRST_REX_SSE_REG + (N) - 8) \
                                    : (FIRST_EXT_REX_SSE_REG + (N) - 16))
 
+#define MASK_REGNO_P(N) IN_RANGE ((N), FIRST_MASK_REG, LAST_MASK_REG)
+#define ANY_MASK_REG_P(X) (REG_P (X) && MASK_REGNO_P (REGNO (X)))
 
 #define SSE_FLOAT_MODE_P(MODE) \
   ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
@@ -1429,7 +1450,7 @@ enum reg_class
 
 /* Get_secondary_mem widens integral modes to BITS_PER_WORD.
    There is no need to emit full 64 bit move on 64 bit targets
-   for integral modes that can be moved using 32 bit move.  */
+   for integral modes that can be moved using 8 bit move.  */
 #define SECONDARY_MEMORY_NEEDED_MODE(MODE)			\
   (GET_MODE_BITSIZE (MODE) < 32 && INTEGRAL_MODE_P (MODE)	\
    ? mode_for_size (32, GET_MODE_CLASS (MODE), 0)		\
@@ -1933,7 +1954,8 @@ do {							\
  "xmm16", "xmm17", "xmm18", "xmm19",					\
  "xmm20", "xmm21", "xmm22", "xmm23",					\
  "xmm24", "xmm25", "xmm26", "xmm27",					\
- "xmm28", "xmm29", "xmm30", "xmm31" }
+ "xmm28", "xmm29", "xmm30", "xmm31",					\
+ "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7" }
 
 #define REGISTER_NAMES HI_REGISTER_NAMES
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 3d7533a..860ca5c 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -328,6 +328,14 @@
    (XMM29_REG			66)
    (XMM30_REG			67)
    (XMM31_REG			68)
+   (MASK0_REG			69)
+   (MASK1_REG			70)
+   (MASK2_REG			71)
+   (MASK3_REG			72)
+   (MASK4_REG			73)
+   (MASK5_REG			74)
+   (MASK6_REG			75)
+   (MASK7_REG			76)
   ])
 
 ;; Insns whose names begin with "x86_" are emitted by gen_FOO calls
@@ -360,7 +368,7 @@
    sseishft,sseishft1,ssecmp,ssecomi,
    ssecvt,ssecvt1,sseicvt,sseins,
    sseshuf,sseshuf1,ssemuladd,sse4arg,
-   lwp,
+   lwp,mskmov,msklog,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -379,7 +387,7 @@
 			  ssemul,sseimul,ssediv,sselog,sselog1,
 			  sseishft,sseishft1,ssecmp,ssecomi,
 			  ssecvt,ssecvt1,sseicvt,sseins,
-			  sseshuf,sseshuf1,ssemuladd,sse4arg")
+			  sseshuf,sseshuf1,ssemuladd,sse4arg,mskmov")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "mmx")
@@ -390,7 +398,7 @@
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
-			  bitmanip,imulx")
+			  bitmanip,imulx,msklog,mskmov")
 	   (const_int 0)
 	 (eq_attr "unit" "i387,sse,mmx")
 	   (const_int 0)
@@ -451,7 +459,7 @@
 ;; Set when 0f opcode prefix is used.
 (define_attr "prefix_0f" ""
   (if_then_else
-    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip")
+    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip,msklog,mskmov")
 	 (eq_attr "unit" "sse,mmx"))
     (const_int 1)
     (const_int 0)))
@@ -651,7 +659,7 @@
 		   fmov,fcmp,fsgn,
 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
 		   sselog1,sseshuf1,sseadd1,sseiadd1,sseishft1,
-		   mmx,mmxmov,mmxcmp,mmxcvt")
+		   mmx,mmxmov,mmxcmp,mmxcvt,mskmov,msklog")
 	      (match_operand 2 "memory_operand"))
 	   (const_string "load")
 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
@@ -2213,8 +2221,8 @@
 	   (const_string "SI")))])
 
 (define_insn "*movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m")
-	(match_operand:HI 1 "general_operand"	   "r ,rn,rm,rn"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,Yk,Yk,rm")
+	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,rm,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2223,6 +2231,16 @@
       /* movzwl is faster than movw on p2 due to partial word stalls,
 	 though not as fast as an aligned movl.  */
       return "movz{wl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 4: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 5: return "kmovw\t{%1, %0|%0, %1}";
+	case 6: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2240,11 +2258,17 @@
 	    (and (eq_attr "alternative" "1,2")
 		 (match_operand:HI 1 "aligned_operand"))
 	      (const_string "imov")
+	    (eq_attr "alternative" "4,5,6")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "0,2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+    (set (attr "prefix")
+      (if_then_else (eq_attr "alternative" "4,5,6")
+	(const_string "vex")
+	(const_string "orig")))
     (set (attr "mode")
       (cond [(eq_attr "type" "imovx")
 	       (const_string "SI")
@@ -2269,8 +2293,8 @@
 ;; register stall machines with, where we use QImode instructions, since
 ;; partial register stall can be caused there.  Then we use movzx.
 (define_insn "*movqi_internal"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
-	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m ,Yk,Yk,r")
+	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn,r ,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2278,6 +2302,16 @@
     case TYPE_IMOVX:
       gcc_assert (ANY_QI_REG_P (operands[1]) || MEM_P (operands[1]));
       return "movz{bl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 7: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 8: return "kmovw\t{%1, %0|%0, %1}";
+	case 9: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2297,11 +2331,17 @@
 	      (const_string "imov")
 	    (eq_attr "alternative" "3,5")
 	      (const_string "imovx")
+	    (eq_attr "alternative" "7,8,9")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+   (set (attr "prefix")
+     (if_then_else (eq_attr "alternative" "7,8,9")
+       (const_string "vex")
+       (const_string "orig")))
    (set (attr "mode")
       (cond [(eq_attr "alternative" "3,4,5")
 	       (const_string "SI")
@@ -7494,6 +7534,26 @@
   operands[3] = gen_lowpart (QImode, operands[3]);
 })
 
+(define_split
+  [(set (match_operand:SWI12 0 "mask_reg_operand")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
+			 (match_operand:SWI12 2 "mask_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F && reload_completed"
+  [(set (match_dup 0)
+	(any_logic:SWI12 (match_dup 1)
+			 (match_dup 2)))])
+
+(define_insn "*k<logic><mode>"
+  [(set (match_operand:SWI12 0 "mask_reg_operand" "=Yk")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand" "Yk")
+			 (match_operand:SWI12 2 "mask_reg_operand" "Yk")))]
+  "TARGET_AVX512F && reload_completed"
+  "k<logic>w\t{%2, %1, %0|%0, %1, %2}";
+  [(set_attr "mode" "<MODE>")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; %%% This used to optimize known byte-wide and operations to memory,
 ;; and sometimes to QImode registers.  If this is considered useful,
 ;; it should be done with splitters.
@@ -7616,10 +7676,10 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "SI")])
 
-(define_insn "*andhi_1"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya")
-	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm")
-		(match_operand:HI 2 "general_operand" "rn,rm,L")))
+(define_insn "andhi_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya,!Yk")
+	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm,Yk")
+		(match_operand:HI 2 "general_operand" "rn,rm,L,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (AND, HImode, operands)"
 {
@@ -7628,34 +7688,38 @@
     case TYPE_IMOVX:
       return "#";
 
+    case TYPE_MSKLOG:
+      return "kandw\t{%2, %1, %0|%0, %1, %2}";
+
     default:
       gcc_assert (rtx_equal_p (operands[0], operands[1]));
       return "and{w}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "type" "alu,alu,imovx")
-   (set_attr "length_immediate" "*,*,0")
+  [(set_attr "type" "alu,alu,imovx,msklog")
+   (set_attr "length_immediate" "*,*,0,*")
    (set (attr "prefix_rex")
      (if_then_else
        (and (eq_attr "type" "imovx")
 	    (match_operand 1 "ext_QIreg_operand"))
        (const_string "1")
        (const_string "*")))
-   (set_attr "mode" "HI,HI,SI")])
+   (set_attr "mode" "HI,HI,SI,HI")])
 
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*andqi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
-	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		(match_operand:QI 2 "general_operand" "qn,qmn,rn")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,!Yk")
+	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
+		(match_operand:QI 2 "general_operand" "qn,qmn,rn,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (AND, QImode, operands)"
   "@
    and{b}\t{%2, %0|%0, %2}
    and{b}\t{%2, %0|%0, %2}
-   and{l}\t{%k2, %k0|%k0, %k2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI,QI,SI")])
+   and{l}\t{%k2, %k0|%k0, %k2}
+   kandw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,alu,msklog")
+   (set_attr "mode" "QI,QI,SI,HI")])
 
 (define_insn "*andqi_1_slp"
   [(set (strict_low_part (match_operand:QI 0 "nonimmediate_operand" "+qm,q"))
@@ -7668,6 +7732,35 @@
   [(set_attr "type" "alu1")
    (set_attr "mode" "QI")])
 
+(define_insn "kandn<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,!Yk")
+	(and:SWI12
+	  (not:SWI12
+	    (match_operand:SWI12 1 "register_operand" "0,Yk"))
+	  (match_operand:SWI12 2 "register_operand" "r,Yk")))]
+  "TARGET_AVX512F"
+  "@
+   #
+   kandnw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "*,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_split
+  [(set (match_operand:SWI12 0 "general_reg_operand")
+	(and:SWI12
+	  (not:SWI12
+	    (match_dup 0))
+	  (match_operand:SWI12 1 "general_reg_operand")))]
+  "TARGET_AVX512F"
+  [(set (match_dup 0)
+	(not:HI (match_dup 0)))
+   (parallel [(set (match_dup 0)
+		   (and:HI (match_dup 0)
+			   (match_dup 1)))
+	      (clobber (reg:CC FLAGS_REG))])]
+  "")
+
 ;; Turn *anddi_1 into *andsi_1_zext if possible.
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -7999,29 +8092,44 @@
   "ix86_expand_binary_operator (<CODE>, <MODE>mode, operands); DONE;")
 
 (define_insn "*<code><mode>_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=r,rm")
-	(any_or:SWI248
-	 (match_operand:SWI248 1 "nonimmediate_operand" "%0,0")
-	 (match_operand:SWI248 2 "<general_operand>" "<g>,r<i>")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,rm")
+	(any_or:SWI48
+	 (match_operand:SWI48 1 "nonimmediate_operand" "%0,0")
+	 (match_operand:SWI48 2 "<general_operand>" "<g>,r<i>")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "<logic>{<imodesuffix>}\t{%2, %0|%0, %2}"
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "<code>hi_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,rm,!Yk")
+	(any_or:HI
+	 (match_operand:HI 1 "nonimmediate_operand" "%0,0,Yk")
+	 (match_operand:HI 2 "general_operand" "<g>,r<i>,Yk")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (<CODE>, HImode, operands)"
+  "@
+  <logic>{w}\t{%2, %0|%0, %2}
+  <logic>{w}\t{%2, %0|%0, %2}
+  k<logic>w\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,msklog")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*<code>qi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m,r")
-	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		   (match_operand:QI 2 "general_operand" "qmn,qn,rn")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m,r,!Yk")
+	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
+		   (match_operand:QI 2 "general_operand" "qmn,qn,rn,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, QImode, operands)"
   "@
    <logic>{b}\t{%2, %0|%0, %2}
    <logic>{b}\t{%2, %0|%0, %2}
-   <logic>{l}\t{%k2, %k0|%k0, %k2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI,QI,SI")])
+   <logic>{l}\t{%k2, %k0|%k0, %k2}
+   k<logic>w\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,alu,msklog")
+   (set_attr "mode" "QI,QI,SI,HI")])
 
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 (define_insn "*<code>si_1_zext"
@@ -8071,6 +8179,74 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "kxnor<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,!k")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_operand:SWI12 1 "register_operand" "0,k"))
+	    (match_operand:SWI12 2 "register_operand" "r,k")))]
+  "TARGET_AVX512F"
+  "@
+   #
+   kxnorw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "*,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_split
+  [(set (match_operand:SWI12 0 "register_operand")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_dup 0))
+	    (match_operand:SWI12 1 "register_operand")))]
+  "TARGET_AVX512F && !ANY_MASK_REG_P (operands [0])"
+   [(parallel [(set (match_dup 0)
+		    (xor:HI (match_dup 0)
+			    (match_dup 1)))
+	       (clobber (reg:CC FLAGS_REG))])
+    (set (match_dup 0)
+	 (not:HI (match_dup 0)))]
+  "")
+
+(define_insn "kortestzhi"
+  [(set (reg:CCZ FLAGS_REG)
+	(compare:CCZ
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "%Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int 0)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kortestchi"
+  [(set (reg:CCC FLAGS_REG)
+	(compare:CCC
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "%Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int -1)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCCmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kunpckhi"
+  [(set (match_operand:HI 0 "register_operand" "=Yk")
+	(ior:HI
+	  (ashift:HI
+	    (match_operand:HI 1 "register_operand" "Yk")
+	    (const_int 8))
+	  (zero_extend:HI (match_operand:QI 2 "register_operand" "Yk"))))]
+  "TARGET_AVX512F"
+  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 ;; ??? Special case for immediate operand is missing - it is tricky.
 (define_insn "*<code>si_2_zext"
@@ -8640,23 +8816,38 @@
   "ix86_expand_unary_operator (NOT, <MODE>mode, operands); DONE;")
 
 (define_insn "*one_cmpl<mode>2_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
-	(not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm")
+	(not:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0")))]
   "ix86_unary_operator_ok (NOT, <MODE>mode, operands)"
   "not{<imodesuffix>}\t%0"
   [(set_attr "type" "negnot")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*one_cmplhi2_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,!Yk")
+	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
+  "ix86_unary_operator_ok (NOT, HImode, operands)"
+  "@
+   not{w}\t%0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,avx512f")
+   (set_attr "type" "negnot,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 1.  What to do?
 (define_insn "*one_cmplqi2_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
-	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,!Yk")
+	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,Yk")))]
   "ix86_unary_operator_ok (NOT, QImode, operands)"
   "@
    not{b}\t%0
-   not{l}\t%k0"
-  [(set_attr "type" "negnot")
-   (set_attr "mode" "QI,SI")])
+   not{l}\t%k0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,*,avx512f")
+   (set_attr "type" "negnot,negnot,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "mode" "QI,SI,QI")])
 
 ;; ??? Currently never generated - xor is used instead.
 (define_insn "*one_cmplsi2_1_zext"
@@ -16380,11 +16571,11 @@
 })
 
 ;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.
-
+;; Do not split instructions with mask registers.
 (define_split
-  [(set (match_operand 0 "register_operand")
+  [(set (match_operand 0 "general_reg_operand")
 	(match_operator 3 "promotable_binary_operator"
-	   [(match_operand 1 "register_operand")
+	   [(match_operand 1 "general_reg_operand")
 	    (match_operand 2 "aligned_operand")]))
    (clobber (reg:CC FLAGS_REG))]
   "! TARGET_PARTIAL_REG_STALL && reload_completed
@@ -16479,6 +16670,7 @@
   operands[1] = gen_lowpart (SImode, operands[1]);
 })
 
+;; Do not split instructions with mask regs.
 (define_split
   [(set (match_operand 0 "register_operand")
 	(not (match_operand 1 "register_operand")))]
@@ -16486,7 +16678,9 @@
    && (GET_MODE (operands[0]) == HImode
        || (GET_MODE (operands[0]) == QImode
 	   && (TARGET_PROMOTE_QImode
-	       || optimize_insn_for_size_p ())))"
+	       || optimize_insn_for_size_p ())))
+   && (! ANY_MASK_REG_P (operands[0])
+	 || ! ANY_MASK_REG_P (operands[1]))"
   [(set (match_dup 0)
 	(not:SI (match_dup 1)))]
 {
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 3959c38..18f425c 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -32,6 +32,11 @@
   (and (match_code "reg")
        (not (match_test "ANY_FP_REGNO_P (REGNO (op))"))))
 
+;; True if the operand is a GENERAL class register.
+(define_predicate "general_reg_operand"
+  (and (match_code "reg")
+       (match_test "GENERAL_REG_P (op)")))
+
 ;; Return true if OP is a register operand other than an i387 fp register.
 (define_predicate "register_and_not_fp_reg_operand"
   (and (match_code "reg")
@@ -52,6 +57,10 @@
   (and (match_code "reg")
        (match_test "EXT_REX_SSE_REGNO_P (REGNO (op))")))
 
+;; True if the operand is an AVX-512 mask register.
+(define_predicate "mask_reg_operand"
+  (and (match_code "reg")
+       (match_test "MASK_REGNO_P (REGNO (op))")))
 
 ;; True if the operand is a Q_REGS class register.
 (define_predicate "q_regs_operand"
-- 
1.7.11.7


* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-27 18:22                         ` Kirill Yukhin
@ 2013-08-27 18:41                           ` Kirill Yukhin
  2013-08-27 20:17                           ` Richard Henderson
  1 sibling, 0 replies; 31+ messages in thread
From: Kirill Yukhin @ 2013-08-27 18:41 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

On 27 Aug 22:11, Kirill Yukhin wrote:
Hello, while pasting the patch I accidentally put in an
extra brace.
Please ignore it.
> +(define_insn "kxnor<mode>"
> +  [(set (match_operand:SWI12 0 "register_operand" "=r,!k")
> +	(not:SWI12
> +	  (xor:SWI12
> +	    (match_operand:SWI12 1 "register_operand" "0,k")) ;; <--- Extra ')'
> +	    (match_operand:SWI12 2 "register_operand" "r,k")))]
> +  "TARGET_AVX512F"

--
Thanks, K

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-27 18:22                         ` Kirill Yukhin
  2013-08-27 18:41                           ` Kirill Yukhin
@ 2013-08-27 20:17                           ` Richard Henderson
  2013-08-28 17:55                             ` Kirill Yukhin
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2013-08-27 20:17 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

On 08/27/2013 11:11 AM, Kirill Yukhin wrote:
>> > What happened to the bmi andn alternative we discussed?
> BMI andn is only supported for 4- and 8-byte integers, while
> kandw works for HI/QI.
> 

We're talking about values in registers.  Ignoring the high bits of the andn
result still produces the correct results.
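
E.g. for the HI/QI patterns the insn can be emitted on the 32-bit
register names (sketch; %k prints the SImode register name):

   andn\t{%k2, %k1, %k0|%k0, %k1, %k2}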


r~

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-27 20:17                           ` Richard Henderson
@ 2013-08-28 17:55                             ` Kirill Yukhin
  2013-08-28 18:07                               ` Richard Henderson
  0 siblings, 1 reply; 31+ messages in thread
From: Kirill Yukhin @ 2013-08-28 17:55 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

Hello Richard,

On 27 Aug 13:07, Richard Henderson wrote:
> On 08/27/2013 11:11 AM, Kirill Yukhin wrote:
> >> > What happened to the bmi andn alternative we discussed?
> > BMI andn is only supported for 4- and 8-byte integers, while
> > kandw works for HI/QI.
> >
>
> We're talking about values in registers.  Ignoring the high bits of the andn
> result still produces the correct results.

I've updated the patch, adding a BMI alternative and a clobber of the flags:
+(define_insn "kandn<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
+       (and:SWI12
+         (not:SWI12
+           (match_operand:SWI12 1 "register_operand" "r,0,k"))
+         (match_operand:SWI12 2 "register_operand" "r,r,k")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F"
+  "@
+   andn\t{%k2, %k1, %k0|%k0, %k1, %k2}
+   #
+   kandnw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "bitmanip,*,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "btver2_decode" "direct,*,*")
+   (set_attr "mode" "<MODE>")])

However, I don't fully understand why we need this.
`kandn' differs from BMI `andn' in its clobbering of the flags reg.
So with such a pattern we make the compiler think that `kandn'
clobbers the flags, which seems to me like an opportunity for missed
optimization, since `kandn' doesn't actually clobber them.

Anyway, it seems to work.

Testing:
  1. Bootstrap pass
  2. make check shows no regressions
  3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option
  4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option

Is it ok?

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c         |  34 ++++-
 gcc/config/i386/i386.h         |  40 ++++--
 gcc/config/i386/i386.md        | 283 ++++++++++++++++++++++++++++++++++-------
 gcc/config/i386/predicates.md  |   9 ++
 5 files changed, 314 insertions(+), 60 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;     B     H           T
-;;;           h jk
+;;;           h j
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d05dbf0..8325919 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2032,6 +2032,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
+  /* Mask registers.  */
+  MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
+  MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -2047,6 +2050,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1, /* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* The "default" register map used in 64bit mode.  */
@@ -2062,6 +2066,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
   25, 26, 27, 28, 29, 30, 31, 32, /* extended SSE registers */
   67, 68, 69, 70, 71, 72, 73, 74,       /* AVX-512 registers 16-23 */
   75, 76, 77, 78, 79, 80, 81, 82,       /* AVX-512 registers 24-31 */
+  118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -2129,6 +2134,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1, /* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* Define parameter passing and return registers.  */
@@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void)
 
   /* If AVX512F is disabled, squash the registers.  */
   if (! TARGET_AVX512F)
+  {
     for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
       fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+
+    for (i = FIRST_MASK_REG; i < LAST_MASK_REG; i++)
+      fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+  }
 }
 
 
@@ -33889,10 +33900,12 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass)
     return regclass;
 
   /* Force constants into memory if we are loading a (nonzero) constant into
-     an MMX or SSE register.  This is because there are no MMX/SSE instructions
-     to load from a constant.  */
+     an MMX, SSE or MASK register.  This is because there are no MMX/SSE/MASK
+     instructions to load from a constant.  */
   if (CONSTANT_P (x)
-      && (MAYBE_MMX_CLASS_P (regclass) || MAYBE_SSE_CLASS_P (regclass)))
+      && (MAYBE_MMX_CLASS_P (regclass)
+	  || MAYBE_SSE_CLASS_P (regclass)
+	  || MAYBE_MASK_CLASS_P (regclass)))
     return NO_REGS;
 
   /* Prefer SSE regs only, if we can use them for math.  */
@@ -33996,10 +34009,11 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
 
   /* QImode spills from non-QI registers require
      intermediate register on 32bit targets.  */
-  if (!TARGET_64BIT
-      && !in_p && mode == QImode
-      && INTEGER_CLASS_P (rclass)
-      && MAYBE_NON_Q_CLASS_P (rclass))
+  if (mode == QImode
+      && (MAYBE_MASK_CLASS_P (rclass)
+	  || (!TARGET_64BIT && !in_p
+	      && INTEGER_CLASS_P (rclass)
+	      && MAYBE_NON_Q_CLASS_P (rclass))))
     {
       int regno;
 
@@ -34421,6 +34435,8 @@ ix86_hard_regno_mode_ok (int regno, enum machine_mode mode)
     return false;
   if (STACK_REGNO_P (regno))
     return VALID_FP_MODE_P (mode);
+  if (MASK_REGNO_P (regno))
+    return VALID_MASK_REG_MODE (mode);
   if (SSE_REGNO_P (regno))
     {
       /* We implement the move patterns for all vector modes into and
@@ -35230,6 +35246,10 @@ x86_order_regs_for_local_alloc (void)
    for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
      reg_alloc_order [pos++] = i;
 
+   /* Mask register.  */
+   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
+     reg_alloc_order [pos++] = i;
+
    /* x87 registers.  */
    if (TARGET_SSE_MATH)
      for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e820aa6..13572bf2 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -893,7 +893,7 @@ enum target_cpu_default
    eliminated during reloading in favor of either the stack or frame
    pointer.  */
 
-#define FIRST_PSEUDO_REGISTER 69
+#define FIRST_PSEUDO_REGISTER 77
 
 /* Number of hardware registers that go into the DWARF-2 unwind info.
    If not defined, equals FIRST_PSEUDO_REGISTER.  */
@@ -923,7 +923,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/ \
      0,   0,    0,    0,    0,    0,    0,    0, \
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/ \
-     0,   0,    0,    0,    0,    0,    0,    0 }
+     0,   0,    0,    0,    0,    0,    0,    0, \
+/*  k0,  k1, k2, k3, k4, k5, k6, k7*/ \
+     0,  0,   0,  0,  0,  0,  0,  0 }
 
 /* 1 for registers not available across function calls.
    These must include the FIXED_REGISTERS and also any
@@ -955,7 +957,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/ \
      6,    6,     6,    6,    6,    6,    6,    6, \
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/ \
-     6,    6,     6,    6,    6,    6,    6,    6 }
+     6,    6,     6,    6,    6,    6,    6,    6, \
+ /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/ \
+     1,   1,   1,   1,   1,   1,   1,   1 }
 
 /* Order in which to allocate registers.  Each register must be
    listed once, even those in FIXED_REGISTERS.  List frame pointer
@@ -971,7 +975,7 @@ enum target_cpu_default
    18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, \
    33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,  \
    48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, \
-   63, 64, 65, 66, 67, 68 }
+   63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76 }
 
 /* ADJUST_REG_ALLOC_ORDER is a macro which permits reg_alloc_order
    to be rearranged based on a particular function.  When using sse math,
@@ -1068,6 +1072,8 @@ enum target_cpu_default
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode \
    || (MODE) == V16SFmode)
 
+#define VALID_MASK_REG_MODE(MODE) ((MODE) == HImode || (MODE) == QImode)
+
 /* Value is 1 if hard register REGNO can hold a value of machine-mode MODE.  */
 
 #define HARD_REGNO_MODE_OK(REGNO, MODE) \
@@ -1093,8 +1099,10 @@ enum target_cpu_default
   (CC_REGNO_P (REGNO) ? VOIDmode \
    : (MODE) == VOIDmode && (NREGS) != 1 ? VOIDmode \
    : (MODE) == VOIDmode ? choose_hard_reg_mode ((REGNO), (NREGS), false) \
-   : (MODE) == HImode && !TARGET_PARTIAL_REG_STALL ? SImode \
-   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)) ? SImode \
+   : (MODE) == HImode && !(TARGET_PARTIAL_REG_STALL			\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
+   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)		\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
    : (MODE))
 
 /* The only ABI that saves SSE registers across calls is Win64 (thus no
@@ -1141,6 +1149,9 @@ enum target_cpu_default
 #define FIRST_EXT_REX_SSE_REG  (LAST_REX_SSE_REG + 1) /*53*/
 #define LAST_EXT_REX_SSE_REG   (FIRST_EXT_REX_SSE_REG + 15) /*68*/
 
+#define FIRST_MASK_REG  (LAST_EXT_REX_SSE_REG + 1) /*69*/
+#define LAST_MASK_REG   (FIRST_MASK_REG + 7) /*76*/
+
 /* Override this in other tm.h files to cope with various OS lossage
    requiring a frame pointer.  */
 #ifndef SUBTARGET_FRAME_POINTER_REQUIRED
@@ -1229,6 +1240,8 @@ enum reg_class
   FLOAT_INT_REGS,
   INT_SSE_REGS,
   FLOAT_INT_SSE_REGS,
+  MASK_EVEX_REGS,
+  MASK_REGS,
   ALL_REGS, LIM_REG_CLASSES
 };
 
@@ -1250,6 +1263,8 @@ enum reg_class
   reg_classes_intersect_p ((CLASS), ALL_SSE_REGS)
 #define MAYBE_MMX_CLASS_P(CLASS) \
   reg_classes_intersect_p ((CLASS), MMX_REGS)
+#define MAYBE_MASK_CLASS_P(CLASS) \
+  reg_classes_intersect_p ((CLASS), MASK_REGS)
 
 #define Q_CLASS_P(CLASS) \
   reg_class_subset_p ((CLASS), Q_REGS)
@@ -1282,6 +1297,8 @@ enum reg_class
    "FLOAT_INT_REGS", \
    "INT_SSE_REGS", \
    "FLOAT_INT_SSE_REGS", \
+   "MASK_EVEX_REGS", \
+   "MASK_REGS", \
    "ALL_REGS" }
 
 /* Define which registers fit in which classes.  This is an initializer
@@ -1319,7 +1336,9 @@ enum reg_class
 {   0x11ffff,    0x1fe0,   0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,  0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,  0x1f },       /* FLOAT_INT_SSE_REGS */        \
-{ 0xffffffff,0xffffffff,  0x1f }                                        \
+       { 0x0,       0x0,0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0,0x1fe0 },       /* MASK_REGS */                 \
+{ 0xffffffff,0xffffffff,0x1fff }                                        \
 }
 
 /* The same information, inverted:
@@ -1377,6 +1396,8 @@ enum reg_class
          : (N) <= LAST_REX_SSE_REG ? (FIRST_REX_SSE_REG + (N) - 8) \
                                    : (FIRST_EXT_REX_SSE_REG + (N) - 16))
 
+#define MASK_REGNO_P(N) IN_RANGE ((N), FIRST_MASK_REG, LAST_MASK_REG)
+#define ANY_MASK_REG_P(X) (REG_P (X) && MASK_REGNO_P (REGNO (X)))
 
 #define SSE_FLOAT_MODE_P(MODE) \
   ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
@@ -1429,7 +1450,7 @@ enum reg_class
 
 /* Get_secondary_mem widens integral modes to BITS_PER_WORD.
    There is no need to emit full 64 bit move on 64 bit targets
-   for integral modes that can be moved using 32 bit move.  */
+   for integral modes that can be moved using 8 bit move.  */
 #define SECONDARY_MEMORY_NEEDED_MODE(MODE) \
   (GET_MODE_BITSIZE (MODE) < 32 && INTEGRAL_MODE_P (MODE) \
    ? mode_for_size (32, GET_MODE_CLASS (MODE), 0) \
@@ -1933,7 +1954,8 @@ do { \
  "xmm16", "xmm17", "xmm18", "xmm19", \
  "xmm20", "xmm21", "xmm22", "xmm23", \
  "xmm24", "xmm25", "xmm26", "xmm27", \
- "xmm28", "xmm29", "xmm30", "xmm31" }
+ "xmm28", "xmm29", "xmm30", "xmm31", \
+ "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7" }
 
 #define REGISTER_NAMES HI_REGISTER_NAMES
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 3d7533a..4ecfec9 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -328,6 +328,14 @@
    (XMM29_REG 66)
    (XMM30_REG 67)
    (XMM31_REG 68)
+   (MASK0_REG 69)
+   (MASK1_REG 70)
+   (MASK2_REG 71)
+   (MASK3_REG 72)
+   (MASK4_REG 73)
+   (MASK5_REG 74)
+   (MASK6_REG 75)
+   (MASK7_REG 76)
   ])
 
 ;; Insns whose names begin with "x86_" are emitted by gen_FOO calls
@@ -360,7 +368,7 @@
    sseishft,sseishft1,ssecmp,ssecomi,
    ssecvt,ssecvt1,sseicvt,sseins,
    sseshuf,sseshuf1,ssemuladd,sse4arg,
-   lwp,
+   lwp,mskmov,msklog,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -379,7 +387,7 @@
 			  ssemul,sseimul,ssediv,sselog,sselog1,
 			  sseishft,sseishft1,ssecmp,ssecomi,
 			  ssecvt,ssecvt1,sseicvt,sseins,
-			  sseshuf,sseshuf1,ssemuladd,sse4arg")
+			  sseshuf,sseshuf1,ssemuladd,sse4arg,mskmov")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "mmx")
@@ -390,7 +398,7 @@
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
-  bitmanip,imulx")
+  bitmanip,imulx,msklog,mskmov")
    (const_int 0)
  (eq_attr "unit" "i387,sse,mmx")
    (const_int 0)
@@ -451,7 +459,7 @@
 ;; Set when 0f opcode prefix is used.
 (define_attr "prefix_0f" ""
   (if_then_else
-    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip")
+    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip,msklog,mskmov")
  (eq_attr "unit" "sse,mmx"))
     (const_int 1)
     (const_int 0)))
@@ -651,7 +659,7 @@
 		   fmov,fcmp,fsgn,
 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
 		   sselog1,sseshuf1,sseadd1,sseiadd1,sseishft1,
-		   mmx,mmxmov,mmxcmp,mmxcvt")
+		   mmx,mmxmov,mmxcmp,mmxcvt,mskmov,msklog")
 	      (match_operand 2 "memory_operand"))
 	   (const_string "load")
 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
@@ -2213,8 +2221,8 @@
    (const_string "SI")))])
 
 (define_insn "*movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m")
-	(match_operand:HI 1 "general_operand"	   "r ,rn,rm,rn"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,Yk,Yk,rm")
+	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,rm,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2223,6 +2231,16 @@
       /* movzwl is faster than movw on p2 due to partial word stalls,
 	 though not as fast as an aligned movl.  */
       return "movz{wl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+ case 4: return "kmovw\t{%k1, %0|%0, %k1}";
+ case 5: return "kmovw\t{%1, %0|%0, %1}";
+ case 6: return "kmovw\t{%1, %k0|%k0, %1}";
+ default: gcc_unreachable ();
+ }
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2240,11 +2258,17 @@
     (and (eq_attr "alternative" "1,2")
  (match_operand:HI 1 "aligned_operand"))
       (const_string "imov")
+    (eq_attr "alternative" "4,5,6")
+      (const_string "mskmov")
     (and (match_test "TARGET_MOVX")
  (eq_attr "alternative" "0,2"))
       (const_string "imovx")
    ]
    (const_string "imov")))
+    (set (attr "prefix")
+      (if_then_else (eq_attr "alternative" "4,5,6")
+ (const_string "vex")
+ (const_string "orig")))
     (set (attr "mode")
       (cond [(eq_attr "type" "imovx")
        (const_string "SI")
@@ -2269,8 +2293,8 @@
 ;; register stall machines with, where we use QImode instructions, since
 ;; partial register stall can be caused there.  Then we use movzx.
 (define_insn "*movqi_internal"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
-	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m ,Yk,Yk,r")
+	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn,r ,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2278,6 +2302,16 @@
     case TYPE_IMOVX:
       gcc_assert (ANY_QI_REG_P (operands[1]) || MEM_P (operands[1]));
       return "movz{bl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+ case 7: return "kmovw\t{%k1, %0|%0, %k1}";
+ case 8: return "kmovw\t{%1, %0|%0, %1}";
+ case 9: return "kmovw\t{%1, %k0|%k0, %1}";
+ default: gcc_unreachable ();
+ }
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2297,11 +2331,17 @@
       (const_string "imov")
     (eq_attr "alternative" "3,5")
       (const_string "imovx")
+    (eq_attr "alternative" "7,8,9")
+      (const_string "mskmov")
     (and (match_test "TARGET_MOVX")
  (eq_attr "alternative" "2"))
       (const_string "imovx")
    ]
    (const_string "imov")))
+   (set (attr "prefix")
+     (if_then_else (eq_attr "alternative" "7,8,9")
+       (const_string "vex")
+       (const_string "orig")))
    (set (attr "mode")
       (cond [(eq_attr "alternative" "3,4,5")
        (const_string "SI")
@@ -7494,6 +7534,26 @@
   operands[3] = gen_lowpart (QImode, operands[3]);
 })
 
+(define_split
+  [(set (match_operand:SWI12 0 "mask_reg_operand")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
+			 (match_operand:SWI12 2 "mask_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F && reload_completed"
+  [(set (match_dup 0)
+	(any_logic:SWI12 (match_dup 1)
+			 (match_dup 2)))])
+
+(define_insn "*k<logic><mode>"
+  [(set (match_operand:SWI12 0 "mask_reg_operand" "=Yk")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand" "Yk")
+			 (match_operand:SWI12 2 "mask_reg_operand" "Yk")))]
+  "TARGET_AVX512F && reload_completed"
+  "k<logic>w\t{%2, %1, %0|%0, %1, %2}";
+  [(set_attr "mode" "<MODE>")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; %%% This used to optimize known byte-wide and operations to memory,
 ;; and sometimes to QImode registers.  If this is considered useful,
 ;; it should be done with splitters.
@@ -7616,10 +7676,10 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "SI")])
 
-(define_insn "*andhi_1"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya")
-	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm")
-		(match_operand:HI 2 "general_operand" "rn,rm,L")))
+(define_insn "andhi_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya,!Yk")
+	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm,Yk")
+		(match_operand:HI 2 "general_operand" "rn,rm,L,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (AND, HImode, operands)"
 {
@@ -7628,34 +7688,38 @@
     case TYPE_IMOVX:
       return "#";
 
+    case TYPE_MSKLOG:
+      return "kandw\t{%2, %1, %0|%0, %1, %2}";
+
     default:
       gcc_assert (rtx_equal_p (operands[0], operands[1]));
       return "and{w}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "type" "alu,alu,imovx")
-   (set_attr "length_immediate" "*,*,0")
+  [(set_attr "type" "alu,alu,imovx,msklog")
+   (set_attr "length_immediate" "*,*,0,*")
    (set (attr "prefix_rex")
      (if_then_else
        (and (eq_attr "type" "imovx")
     (match_operand 1 "ext_QIreg_operand"))
        (const_string "1")
        (const_string "*")))
-   (set_attr "mode" "HI,HI,SI")])
+   (set_attr "mode" "HI,HI,SI,HI")])
 
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*andqi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
-	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		(match_operand:QI 2 "general_operand" "qn,qmn,rn")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,!Yk")
+	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
+		(match_operand:QI 2 "general_operand" "qn,qmn,rn,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (AND, QImode, operands)"
   "@
    and{b}\t{%2, %0|%0, %2}
    and{b}\t{%2, %0|%0, %2}
-   and{l}\t{%k2, %k0|%k0, %k2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI,QI,SI")])
+   and{l}\t{%k2, %k0|%k0, %k2}
+   kandw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,alu,msklog")
+   (set_attr "mode" "QI,QI,SI,HI")])
 
 (define_insn "*andqi_1_slp"
   [(set (strict_low_part (match_operand:QI 0 "nonimmediate_operand" "+qm,q"))
@@ -7668,6 +7732,38 @@
   [(set_attr "type" "alu1")
    (set_attr "mode" "QI")])
 
+(define_insn "kandn<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
+	(and:SWI12
+	  (not:SWI12
+	    (match_operand:SWI12 1 "register_operand" "r,0,k"))
+	  (match_operand:SWI12 2 "register_operand" "r,r,k")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F"
+  "@
+   andn\t{%k2, %k1, %k0|%k0, %k1, %k2}
+   #
+   kandnw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "bitmanip,*,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "btver2_decode" "direct,*,*")
+   (set_attr "mode" "<MODE>")])
+
+(define_split
+  [(set (match_operand:SWI12 0 "general_reg_operand")
+	(and:SWI12
+	  (not:SWI12
+	    (match_dup 0))
+	  (match_operand:SWI12 1 "general_reg_operand")))]
+  "TARGET_AVX512F && !TARGET_BMI"
+  [(set (match_dup 0)
+	(not:HI (match_dup 0)))
+   (parallel [(set (match_dup 0)
+		   (and:HI (match_dup 0)
+			   (match_dup 1)))
+	      (clobber (reg:CC FLAGS_REG))])]
+  "")
+
 ;; Turn *anddi_1 into *andsi_1_zext if possible.
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -7999,29 +8095,44 @@
   "ix86_expand_binary_operator (<CODE>, <MODE>mode, operands); DONE;")
 
 (define_insn "*<code><mode>_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=r,rm")
-	(any_or:SWI248
-	 (match_operand:SWI248 1 "nonimmediate_operand" "%0,0")
-	 (match_operand:SWI248 2 "<general_operand>" "<g>,r<i>")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,rm")
+	(any_or:SWI48
+	 (match_operand:SWI48 1 "nonimmediate_operand" "%0,0")
+	 (match_operand:SWI48 2 "<general_operand>" "<g>,r<i>")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "<logic>{<imodesuffix>}\t{%2, %0|%0, %2}"
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "<code>hi_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,rm,!Yk")
+	(any_or:HI
+	 (match_operand:HI 1 "nonimmediate_operand" "%0,0,Yk")
+	 (match_operand:HI 2 "general_operand" "<g>,r<i>,Yk")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (<CODE>, HImode, operands)"
+  "@
+  <logic>{w}\t{%2, %0|%0, %2}
+  <logic>{w}\t{%2, %0|%0, %2}
+  k<logic>w\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,msklog")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*<code>qi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m,r")
-	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		   (match_operand:QI 2 "general_operand" "qmn,qn,rn")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m,r,!Yk")
+	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
+		   (match_operand:QI 2 "general_operand" "qmn,qn,rn,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, QImode, operands)"
   "@
    <logic>{b}\t{%2, %0|%0, %2}
    <logic>{b}\t{%2, %0|%0, %2}
-   <logic>{l}\t{%k2, %k0|%k0, %k2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI,QI,SI")])
+   <logic>{l}\t{%k2, %k0|%k0, %k2}
+   k<logic>w\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,alu,msklog")
+   (set_attr "mode" "QI,QI,SI,HI")])
 
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 (define_insn "*<code>si_1_zext"
@@ -8071,6 +8182,74 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "kxnor<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,!k")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_operand:SWI12 1 "register_operand" "0,k")
+	    (match_operand:SWI12 2 "register_operand" "r,k"))))]
+  "TARGET_AVX512F"
+  "@
+   #
+   kxnorw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "*,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_split
+  [(set (match_operand:SWI12 0 "register_operand")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_dup 0)
+	    (match_operand:SWI12 1 "register_operand"))))]
+  "TARGET_AVX512F && !ANY_MASK_REG_P (operands [0])"
+   [(parallel [(set (match_dup 0)
+		    (xor:HI (match_dup 0)
+			    (match_dup 1)))
+	       (clobber (reg:CC FLAGS_REG))])
+    (set (match_dup 0)
+	 (not:HI (match_dup 0)))]
+  "")
+
+(define_insn "kortestzhi"
+  [(set (reg:CCZ FLAGS_REG)
+	(compare:CCZ
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "%Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int 0)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kortestchi"
+  [(set (reg:CCC FLAGS_REG)
+	(compare:CCC
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "%Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int -1)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCCmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kunpckhi"
+  [(set (match_operand:HI 0 "register_operand" "=Yk")
+	(ior:HI
+	  (ashift:HI
+	    (match_operand:HI 1 "register_operand" "Yk")
+	    (const_int 8))
+	  (zero_extend:HI (match_operand:QI 2 "register_operand" "Yk"))))]
+  "TARGET_AVX512F"
+  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 ;; ??? Special case for immediate operand is missing - it is tricky.
 (define_insn "*<code>si_2_zext"
@@ -8640,23 +8819,38 @@
   "ix86_expand_unary_operator (NOT, <MODE>mode, operands); DONE;")
 
 (define_insn "*one_cmpl<mode>2_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
-	(not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm")
+	(not:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0")))]
   "ix86_unary_operator_ok (NOT, <MODE>mode, operands)"
   "not{<imodesuffix>}\t%0"
   [(set_attr "type" "negnot")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*one_cmplhi2_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,!Yk")
+	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
+  "ix86_unary_operator_ok (NOT, HImode, operands)"
+  "@
+   not{w}\t%0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,avx512f")
+   (set_attr "type" "negnot,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 1.  What to do?
 (define_insn "*one_cmplqi2_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
-	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,!Yk")
+	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,Yk")))]
   "ix86_unary_operator_ok (NOT, QImode, operands)"
   "@
    not{b}\t%0
-   not{l}\t%k0"
-  [(set_attr "type" "negnot")
-   (set_attr "mode" "QI,SI")])
+   not{l}\t%k0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,*,avx512f")
+   (set_attr "type" "negnot,negnot,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "mode" "QI,SI,QI")])
 
 ;; ??? Currently never generated - xor is used instead.
 (define_insn "*one_cmplsi2_1_zext"
@@ -16380,11 +16574,11 @@
 })
 
 ;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.
-
+;; Do not split instructions with mask registers.
 (define_split
-  [(set (match_operand 0 "register_operand")
+  [(set (match_operand 0 "general_reg_operand")
 	(match_operator 3 "promotable_binary_operator"
-	   [(match_operand 1 "register_operand")
+	   [(match_operand 1 "general_reg_operand")
 	    (match_operand 2 "aligned_operand")]))
    (clobber (reg:CC FLAGS_REG))]
   "! TARGET_PARTIAL_REG_STALL && reload_completed
@@ -16479,6 +16673,7 @@
   operands[1] = gen_lowpart (SImode, operands[1]);
 })
 
+;; Do not split instructions with mask regs.
 (define_split
   [(set (match_operand 0 "register_operand")
  (not (match_operand 1 "register_operand")))]
@@ -16486,7 +16681,9 @@
    && (GET_MODE (operands[0]) == HImode
        || (GET_MODE (operands[0]) == QImode
 	   && (TARGET_PROMOTE_QImode
-	       || optimize_insn_for_size_p ())))"
+	       || optimize_insn_for_size_p ())))
+   && (! ANY_MASK_REG_P (operands[0])
+	|| ! ANY_MASK_REG_P (operands[1]))"
   [(set (match_dup 0)
 	(not:SI (match_dup 1)))]
 {
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 3959c38..18f425c 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -32,6 +32,11 @@
   (and (match_code "reg")
        (not (match_test "ANY_FP_REGNO_P (REGNO (op))"))))
 
+;; True if the operand is a GENERAL class register.
+(define_predicate "general_reg_operand"
+  (and (match_code "reg")
+       (match_test "GENERAL_REG_P (op)")))
+
 ;; Return true if OP is a register operand other than an i387 fp register.
 (define_predicate "register_and_not_fp_reg_operand"
   (and (match_code "reg")
@@ -52,6 +57,10 @@
   (and (match_code "reg")
        (match_test "EXT_REX_SSE_REGNO_P (REGNO (op))")))
 
+;; True if the operand is an AVX-512 mask register.
+(define_predicate "mask_reg_operand"
+  (and (match_code "reg")
+       (match_test "MASK_REGNO_P (REGNO (op))")))
 
 ;; True if the operand is a Q_REGS class register.
 (define_predicate "q_regs_operand"
-- 
1.7.11.7

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-28 17:55                             ` Kirill Yukhin
@ 2013-08-28 18:07                               ` Richard Henderson
  2013-08-28 20:06                                 ` Kirill Yukhin
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2013-08-28 18:07 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

On 08/28/2013 10:45 AM, Kirill Yukhin wrote:
> Hello Richard,
> 
> On 27 Aug 13:07, Richard Henderson wrote:
>> On 08/27/2013 11:11 AM, Kirill Yukhin wrote:
>>>>> What happened to the bmi andn alternative we discussed?
>>> BMI is only supported for 4- and 8-byte integers, while
>>> kandw is for HI/QI
>>>
>>
>> We're talking about values in registers.  Ignoring the high bits of the andn
>> result still produces the correct results.
> 
> I've updated the patch, adding a BMI alternative and a clobber of flags:
> +(define_insn "kandn<mode>"
> +  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
> +       (and:SWI12
> +         (not:SWI12
> +           (match_operand:SWI12 1 "register_operand" "r,0,k"))
> +         (match_operand:SWI12 2 "register_operand" "r,r,k")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_AVX512F"
> +  "@
> +   andn\t{%k2, %k1, %k0|%k0, %k1, %k2}
> +   #
> +   kandnw\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "type" "bitmanip,*,msklog")
> +   (set_attr "prefix" "*,*,vex")
> +   (set_attr "btver2_decode" "direct,*,*")
> +   (set_attr "mode" "<MODE>")])
> 
> However, I do not fully understand why we need this.
> `kandn' differs from BMI `andn' in the clobbering of the flags reg.
> So, having such a pattern, we'll make the compiler think that `kandn'
> clobbers, which seems to me like an opportunity for misoptimization,
> since `kandn' doesn't clobber.

This is no different than ANY OTHER use of the mask logical ops.

When combine puts the AND and the NOT together, we don't know what registers we
want the data in.  If we do not supply the general register alternative, with
the clobber, then we will be FORCED to implement the operation in the mask
registers, even if this operation had nothing to do with actual vector masks.
And it ought to come as no surprise that X & ~Y is a fairly common operation.
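
To make that concrete (a hypothetical example, not from the thread):
combine will happily fold the NOT into the AND for ordinary scalar
code such as

  unsigned short
  andn_u16 (unsigned short x, unsigned short y)
  {
    return x & ~y;	/* combine forms (and (not y) x) */
  }

and without the general register alternative the mask registers would
be the only place to do it.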

I suppose a real question here is how this is written: Does TARGET_AVX512F
imply TARGET_BMI?  If so, then we can eliminate the second alternative.  If
not, then you are missing a set_attr isa to restrict the first alternative.



r~

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-28 18:07                               ` Richard Henderson
@ 2013-08-28 20:06                                 ` Kirill Yukhin
  2013-08-28 20:41                                   ` Richard Henderson
  0 siblings, 1 reply; 31+ messages in thread
From: Kirill Yukhin @ 2013-08-28 20:06 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

Hello Richard,

On 28 Aug 10:55, Richard Henderson wrote:
> On 08/28/2013 10:45 AM, Kirill Yukhin wrote:
> > Hello Richard,
> > 
> > On 27 Aug 13:07, Richard Henderson wrote:
> >> On 08/27/2013 11:11 AM, Kirill Yukhin wrote:
> >>>>> What happened to the bmi andn alternative we discussed?
> >>> BMI is only supported for 4- and 8-byte integers, while
> >>> kandw is for HI/QI
> >>>
> >>
> >> We're talking about values in registers.  Ignoring the high bits of the andn
> >> result still produces the correct results.
> > 
> > However, I do not fully understand why we need this.
> > `kandn' differs from BMI `andn' in the clobbering of the flags reg.
> > So, having such a pattern, we'll make the compiler think that `kandn'
> > clobbers, which seems to me like an opportunity for misoptimization,
> > since `kandn' doesn't clobber.
> 
> This is no different than ANY OTHER use of the mask logical ops.
> 
> When combine puts the AND and the NOT together, we don't know what registers we
> want the data in.  If we do not supply the general register alternative, with
> the clobber, then we will be FORCED to implement the operation in the mask
> registers, even if this operation had nothing to do with actual vector masks.
> And it ought to come as no surprise that X & ~Y is a fairly common operation.
I agree with all of that.  But why put in the BMI alternative as well?  Without
it we may have this pattern without a clobber, and add the clobber when doing
the split for the GPR constraint.  I am just thinking that the presence of a
flags clobber in the `kandn' pattern is not good from an optimization point of
view.  Anyway, I don't think this is a big deal...

> I suppose a real question here in how this is written: Does TARGET_AVX512F
> imply TARGET_BMI?  If so, then we can eliminate the second alternative.  If
> not, then you are missing an set_attr isa to restrict the first alternative.

I think that it should be possible to use AVX-512F w/o BMI, so I've added a
new isa attribute, "bmi".  I am testing the previous patch with that change:
@@ -703,7 +703,7 @@
 ;; Used to control the "enabled" attribute on a per-instruction basis.
 (define_attr "isa" "base,x64,x64_sse4,x64_sse4_noavx,x64_avx,nox64,
 		    sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx,
-		    avx2,noavx2,bmi2,fma4,fma,avx512f,noavx512f,fma_avx512f"
+		    avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,fma_avx512f"
   (const_string "base"))
 
 (define_attr "enabled" ""
@@ -726,6 +726,7 @@
 	 (eq_attr "isa" "noavx") (symbol_ref "!TARGET_AVX")
 	 (eq_attr "isa" "avx2") (symbol_ref "TARGET_AVX2")
 	 (eq_attr "isa" "noavx2") (symbol_ref "!TARGET_AVX2")
+	 (eq_attr "isa" "bmi") (symbol_ref "TARGET_BMI")
 	 (eq_attr "isa" "bmi2") (symbol_ref "TARGET_BMI2")
 	 (eq_attr "isa" "fma4") (symbol_ref "TARGET_FMA4")
 	 (eq_attr "isa" "fma") (symbol_ref "TARGET_FMA")
@@ -7744,7 +7745,8 @@
    andn\t{%k2, %k1, %k0|%k0, %k1, %k2}
    #
    kandnw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "bitmanip,*,msklog")
+  [(set_attr "isa" "bmi,*,avx512f")
+   (set_attr "type" "bitmanip,*,msklog")
    (set_attr "prefix" "*,*,vex")
    (set_attr "btver2_decode" "direct,*,*")
    (set_attr "mode" "<MODE>")])

Full patch below.

OK if testing passes?

Thanks, K

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c         |  34 ++++-
 gcc/config/i386/i386.h         |  40 ++++--
 gcc/config/i386/i386.md        | 287 ++++++++++++++++++++++++++++++++++-------
 gcc/config/i386/predicates.md  |   9 ++
 5 files changed, 317 insertions(+), 61 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;     B     H           T
-;;;           h jk
+;;;           h j
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d05dbf0..8325919 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2032,6 +2032,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
+  /* Mask registers.  */
+  MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
+  MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -2047,6 +2050,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* The "default" register map used in 64bit mode.  */
@@ -2062,6 +2066,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
   25, 26, 27, 28, 29, 30, 31, 32,	/* extended SSE registers */
   67, 68, 69, 70, 71, 72, 73, 74,       /* AVX-512 registers 16-23 */
   75, 76, 77, 78, 79, 80, 81, 82,       /* AVX-512 registers 24-31 */
+  118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -2129,6 +2134,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* Define parameter passing and return registers.  */
@@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void)
 
   /* If AVX512F is disabled, squash the registers.  */
   if (! TARGET_AVX512F)
+  {
     for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
       fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+
+    for (i = FIRST_MASK_REG; i < LAST_MASK_REG; i++)
+      fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+  }
 }
 
 \f
@@ -33889,10 +33900,12 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass)
     return regclass;
 
   /* Force constants into memory if we are loading a (nonzero) constant into
-     an MMX or SSE register.  This is because there are no MMX/SSE instructions
-     to load from a constant.  */
+     an MMX, SSE or MASK register.  This is because there are no MMX/SSE/MASK
+     instructions to load from a constant.  */
   if (CONSTANT_P (x)
-      && (MAYBE_MMX_CLASS_P (regclass) || MAYBE_SSE_CLASS_P (regclass)))
+      && (MAYBE_MMX_CLASS_P (regclass)
+	  || MAYBE_SSE_CLASS_P (regclass)
+	  || MAYBE_MASK_CLASS_P (regclass)))
     return NO_REGS;
 
   /* Prefer SSE regs only, if we can use them for math.  */
@@ -33996,10 +34009,11 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
 
   /* QImode spills from non-QI registers require
      intermediate register on 32bit targets.  */
-  if (!TARGET_64BIT
-      && !in_p && mode == QImode
-      && INTEGER_CLASS_P (rclass)
-      && MAYBE_NON_Q_CLASS_P (rclass))
+  if (mode == QImode
+      && (MAYBE_MASK_CLASS_P (rclass)
+	  || (!TARGET_64BIT && !in_p
+	      && INTEGER_CLASS_P (rclass)
+	      && MAYBE_NON_Q_CLASS_P (rclass))))
     {
       int regno;
 
@@ -34421,6 +34435,8 @@ ix86_hard_regno_mode_ok (int regno, enum machine_mode mode)
     return false;
   if (STACK_REGNO_P (regno))
     return VALID_FP_MODE_P (mode);
+  if (MASK_REGNO_P (regno))
+    return VALID_MASK_REG_MODE (mode);
   if (SSE_REGNO_P (regno))
     {
       /* We implement the move patterns for all vector modes into and
@@ -35230,6 +35246,10 @@ x86_order_regs_for_local_alloc (void)
    for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
      reg_alloc_order [pos++] = i;
 
+   /* Mask register.  */
+   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
+     reg_alloc_order [pos++] = i;
+
    /* x87 registers.  */
    if (TARGET_SSE_MATH)
      for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e820aa6..13572bf2 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -893,7 +893,7 @@ enum target_cpu_default
    eliminated during reloading in favor of either the stack or frame
    pointer.  */
 
-#define FIRST_PSEUDO_REGISTER 69
+#define FIRST_PSEUDO_REGISTER 77
 
 /* Number of hardware registers that go into the DWARF-2 unwind info.
    If not defined, equals FIRST_PSEUDO_REGISTER.  */
@@ -923,7 +923,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      0,   0,    0,    0,    0,    0,    0,    0,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     0,   0,    0,    0,    0,    0,    0,    0 }
+     0,   0,    0,    0,    0,    0,    0,    0,		\
+/*  k0,  k1, k2, k3, k4, k5, k6, k7*/				\
+     0,  0,   0,  0,  0,  0,  0,  0 }
 
 /* 1 for registers not available across function calls.
    These must include the FIXED_REGISTERS and also any
@@ -955,7 +957,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      6,    6,     6,    6,    6,    6,    6,    6,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     6,    6,     6,    6,    6,    6,    6,    6 }
+     6,    6,     6,    6,    6,    6,    6,    6,		\
+ /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
+     1,   1,   1,   1,   1,   1,   1,   1 }
 
 /* Order in which to allocate registers.  Each register must be
    listed once, even those in FIXED_REGISTERS.  List frame pointer
@@ -971,7 +975,7 @@ enum target_cpu_default
    18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,	\
    33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,  \
    48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62,	\
-   63, 64, 65, 66, 67, 68 }
+   63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76 }
 
 /* ADJUST_REG_ALLOC_ORDER is a macro which permits reg_alloc_order
    to be rearranged based on a particular function.  When using sse math,
@@ -1068,6 +1072,8 @@ enum target_cpu_default
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode	\
    || (MODE) == V16SFmode)
 
+#define VALID_MASK_REG_MODE(MODE) ((MODE) == HImode || (MODE) == QImode)
+
 /* Value is 1 if hard register REGNO can hold a value of machine-mode MODE.  */
 
 #define HARD_REGNO_MODE_OK(REGNO, MODE)	\
@@ -1093,8 +1099,10 @@ enum target_cpu_default
   (CC_REGNO_P (REGNO) ? VOIDmode					\
    : (MODE) == VOIDmode && (NREGS) != 1 ? VOIDmode			\
    : (MODE) == VOIDmode ? choose_hard_reg_mode ((REGNO), (NREGS), false) \
-   : (MODE) == HImode && !TARGET_PARTIAL_REG_STALL ? SImode		\
-   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)) ? SImode	\
+   : (MODE) == HImode && !(TARGET_PARTIAL_REG_STALL			\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
+   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)		\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
    : (MODE))
 
 /* The only ABI that saves SSE registers across calls is Win64 (thus no
@@ -1141,6 +1149,9 @@ enum target_cpu_default
 #define FIRST_EXT_REX_SSE_REG  (LAST_REX_SSE_REG + 1) /*53*/
 #define LAST_EXT_REX_SSE_REG   (FIRST_EXT_REX_SSE_REG + 15) /*68*/
 
+#define FIRST_MASK_REG  (LAST_EXT_REX_SSE_REG + 1) /*69*/
+#define LAST_MASK_REG   (FIRST_MASK_REG + 7) /*76*/
+
 /* Override this in other tm.h files to cope with various OS lossage
    requiring a frame pointer.  */
 #ifndef SUBTARGET_FRAME_POINTER_REQUIRED
@@ -1229,6 +1240,8 @@ enum reg_class
   FLOAT_INT_REGS,
   INT_SSE_REGS,
   FLOAT_INT_SSE_REGS,
+  MASK_EVEX_REGS,
+  MASK_REGS,
   ALL_REGS, LIM_REG_CLASSES
 };
 
@@ -1250,6 +1263,8 @@ enum reg_class
   reg_classes_intersect_p ((CLASS), ALL_SSE_REGS)
 #define MAYBE_MMX_CLASS_P(CLASS) \
   reg_classes_intersect_p ((CLASS), MMX_REGS)
+#define MAYBE_MASK_CLASS_P(CLASS) \
+  reg_classes_intersect_p ((CLASS), MASK_REGS)
 
 #define Q_CLASS_P(CLASS) \
   reg_class_subset_p ((CLASS), Q_REGS)
@@ -1282,6 +1297,8 @@ enum reg_class
    "FLOAT_INT_REGS",			\
    "INT_SSE_REGS",			\
    "FLOAT_INT_SSE_REGS",		\
+   "MASK_EVEX_REGS",			\
+   "MASK_REGS",				\
    "ALL_REGS" }
 
 /* Define which registers fit in which classes.  This is an initializer
@@ -1319,7 +1336,9 @@ enum reg_class
 {   0x11ffff,    0x1fe0,   0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,  0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,  0x1f },       /* FLOAT_INT_SSE_REGS */        \
-{ 0xffffffff,0xffffffff,  0x1f }                                        \
+       { 0x0,       0x0,0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0,0x1fe0 },       /* MASK_REGS */                 \
+{ 0xffffffff,0xffffffff,0x1fff }                                        \
 }
 
 /* The same information, inverted:
@@ -1377,6 +1396,8 @@ enum reg_class
          : (N) <= LAST_REX_SSE_REG ? (FIRST_REX_SSE_REG + (N) - 8) \
                                    : (FIRST_EXT_REX_SSE_REG + (N) - 16))
 
+#define MASK_REGNO_P(N) IN_RANGE ((N), FIRST_MASK_REG, LAST_MASK_REG)
+#define ANY_MASK_REG_P(X) (REG_P (X) && MASK_REGNO_P (REGNO (X)))
 
 #define SSE_FLOAT_MODE_P(MODE) \
   ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
@@ -1429,7 +1450,7 @@ enum reg_class
 
 /* Get_secondary_mem widens integral modes to BITS_PER_WORD.
    There is no need to emit full 64 bit move on 64 bit targets
-   for integral modes that can be moved using 32 bit move.  */
+   for integral modes that can be moved using 8 bit move.  */
 #define SECONDARY_MEMORY_NEEDED_MODE(MODE)			\
   (GET_MODE_BITSIZE (MODE) < 32 && INTEGRAL_MODE_P (MODE)	\
    ? mode_for_size (32, GET_MODE_CLASS (MODE), 0)		\
@@ -1933,7 +1954,8 @@ do {							\
  "xmm16", "xmm17", "xmm18", "xmm19",					\
  "xmm20", "xmm21", "xmm22", "xmm23",					\
  "xmm24", "xmm25", "xmm26", "xmm27",					\
- "xmm28", "xmm29", "xmm30", "xmm31" }
+ "xmm28", "xmm29", "xmm30", "xmm31",					\
+ "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7" }
 
 #define REGISTER_NAMES HI_REGISTER_NAMES
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 3d7533a..f41b6b8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -328,6 +328,14 @@
    (XMM29_REG			66)
    (XMM30_REG			67)
    (XMM31_REG			68)
+   (MASK0_REG			69)
+   (MASK1_REG			70)
+   (MASK2_REG			71)
+   (MASK3_REG			72)
+   (MASK4_REG			73)
+   (MASK5_REG			74)
+   (MASK6_REG			75)
+   (MASK7_REG			76)
   ])
 
 ;; Insns whose names begin with "x86_" are emitted by gen_FOO calls
@@ -360,7 +368,7 @@
    sseishft,sseishft1,ssecmp,ssecomi,
    ssecvt,ssecvt1,sseicvt,sseins,
    sseshuf,sseshuf1,ssemuladd,sse4arg,
-   lwp,
+   lwp,mskmov,msklog,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -379,7 +387,7 @@
 			  ssemul,sseimul,ssediv,sselog,sselog1,
 			  sseishft,sseishft1,ssecmp,ssecomi,
 			  ssecvt,ssecvt1,sseicvt,sseins,
-			  sseshuf,sseshuf1,ssemuladd,sse4arg")
+			  sseshuf,sseshuf1,ssemuladd,sse4arg,mskmov")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "mmx")
@@ -390,7 +398,7 @@
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
-			  bitmanip,imulx")
+			  bitmanip,imulx,msklog,mskmov")
 	   (const_int 0)
 	 (eq_attr "unit" "i387,sse,mmx")
 	   (const_int 0)
@@ -451,7 +459,7 @@
 ;; Set when 0f opcode prefix is used.
 (define_attr "prefix_0f" ""
   (if_then_else
-    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip")
+    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip,msklog,mskmov")
 	 (eq_attr "unit" "sse,mmx"))
     (const_int 1)
     (const_int 0)))
@@ -651,7 +659,7 @@
 		   fmov,fcmp,fsgn,
 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
 		   sselog1,sseshuf1,sseadd1,sseiadd1,sseishft1,
-		   mmx,mmxmov,mmxcmp,mmxcvt")
+		   mmx,mmxmov,mmxcmp,mmxcvt,mskmov,msklog")
 	      (match_operand 2 "memory_operand"))
 	   (const_string "load")
 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
@@ -695,7 +703,7 @@
 ;; Used to control the "enabled" attribute on a per-instruction basis.
 (define_attr "isa" "base,x64,x64_sse4,x64_sse4_noavx,x64_avx,nox64,
 		    sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx,
-		    avx2,noavx2,bmi2,fma4,fma,avx512f,noavx512f,fma_avx512f"
+		    avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,fma_avx512f"
   (const_string "base"))
 
 (define_attr "enabled" ""
@@ -718,6 +726,7 @@
 	 (eq_attr "isa" "noavx") (symbol_ref "!TARGET_AVX")
 	 (eq_attr "isa" "avx2") (symbol_ref "TARGET_AVX2")
 	 (eq_attr "isa" "noavx2") (symbol_ref "!TARGET_AVX2")
+	 (eq_attr "isa" "bmi") (symbol_ref "TARGET_BMI")
 	 (eq_attr "isa" "bmi2") (symbol_ref "TARGET_BMI2")
 	 (eq_attr "isa" "fma4") (symbol_ref "TARGET_FMA4")
 	 (eq_attr "isa" "fma") (symbol_ref "TARGET_FMA")
@@ -2213,8 +2222,8 @@
 	   (const_string "SI")))])
 
 (define_insn "*movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m")
-	(match_operand:HI 1 "general_operand"	   "r ,rn,rm,rn"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,Yk,Yk,rm")
+	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,rm,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2223,6 +2232,16 @@
       /* movzwl is faster than movw on p2 due to partial word stalls,
 	 though not as fast as an aligned movl.  */
       return "movz{wl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 4: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 5: return "kmovw\t{%1, %0|%0, %1}";
+	case 6: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2240,11 +2259,17 @@
 	    (and (eq_attr "alternative" "1,2")
 		 (match_operand:HI 1 "aligned_operand"))
 	      (const_string "imov")
+	    (eq_attr "alternative" "4,5,6")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "0,2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+    (set (attr "prefix")
+      (if_then_else (eq_attr "alternative" "4,5,6")
+	(const_string "vex")
+	(const_string "orig")))
     (set (attr "mode")
       (cond [(eq_attr "type" "imovx")
 	       (const_string "SI")
@@ -2269,8 +2294,8 @@
 ;; register stall machines with, where we use QImode instructions, since
 ;; partial register stall can be caused there.  Then we use movzx.
 (define_insn "*movqi_internal"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
-	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m ,Yk,Yk,r")
+	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn,r ,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2278,6 +2303,16 @@
     case TYPE_IMOVX:
       gcc_assert (ANY_QI_REG_P (operands[1]) || MEM_P (operands[1]));
       return "movz{bl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 7: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 8: return "kmovw\t{%1, %0|%0, %1}";
+	case 9: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2297,11 +2332,17 @@
 	      (const_string "imov")
 	    (eq_attr "alternative" "3,5")
 	      (const_string "imovx")
+	    (eq_attr "alternative" "7,8,9")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+   (set (attr "prefix")
+     (if_then_else (eq_attr "alternative" "7,8,9")
+       (const_string "vex")
+       (const_string "orig")))
    (set (attr "mode")
       (cond [(eq_attr "alternative" "3,4,5")
 	       (const_string "SI")
@@ -7494,6 +7535,26 @@
   operands[3] = gen_lowpart (QImode, operands[3]);
 })
 
+(define_split
+  [(set (match_operand:SWI12 0 "mask_reg_operand")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
+			 (match_operand:SWI12 2 "mask_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F && reload_completed"
+  [(set (match_dup 0)
+	(any_logic:SWI12 (match_dup 1)
+			 (match_dup 2)))])
+
+(define_insn "*k<logic><mode>"
+  [(set (match_operand:SWI12 0 "mask_reg_operand" "=Yk")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand" "Yk")
+			 (match_operand:SWI12 2 "mask_reg_operand" "Yk")))]
+  "TARGET_AVX512F && reload_completed"
+  "k<logic>w\t{%2, %1, %0|%0, %1, %2}";
+  [(set_attr "mode" "<MODE>")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; %%% This used to optimize known byte-wide and operations to memory,
 ;; and sometimes to QImode registers.  If this is considered useful,
 ;; it should be done with splitters.
@@ -7616,10 +7677,10 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "SI")])
 
-(define_insn "*andhi_1"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya")
-	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm")
-		(match_operand:HI 2 "general_operand" "rn,rm,L")))
+(define_insn "andhi_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya,!Yk")
+	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm,Yk")
+		(match_operand:HI 2 "general_operand" "rn,rm,L,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (AND, HImode, operands)"
 {
@@ -7628,34 +7689,38 @@
     case TYPE_IMOVX:
       return "#";
 
+    case TYPE_MSKLOG:
+      return "kandw\t{%2, %1, %0|%0, %1, %2}";
+
     default:
       gcc_assert (rtx_equal_p (operands[0], operands[1]));
       return "and{w}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "type" "alu,alu,imovx")
-   (set_attr "length_immediate" "*,*,0")
+  [(set_attr "type" "alu,alu,imovx,msklog")
+   (set_attr "length_immediate" "*,*,0,*")
    (set (attr "prefix_rex")
      (if_then_else
        (and (eq_attr "type" "imovx")
 	    (match_operand 1 "ext_QIreg_operand"))
        (const_string "1")
        (const_string "*")))
-   (set_attr "mode" "HI,HI,SI")])
+   (set_attr "mode" "HI,HI,SI,HI")])
 
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*andqi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
-	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		(match_operand:QI 2 "general_operand" "qn,qmn,rn")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,!Yk")
+	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
+		(match_operand:QI 2 "general_operand" "qn,qmn,rn,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (AND, QImode, operands)"
   "@
    and{b}\t{%2, %0|%0, %2}
    and{b}\t{%2, %0|%0, %2}
-   and{l}\t{%k2, %k0|%k0, %k2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI,QI,SI")])
+   and{l}\t{%k2, %k0|%k0, %k2}
+   kandw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,alu,msklog")
+   (set_attr "mode" "QI,QI,SI,HI")])
 
 (define_insn "*andqi_1_slp"
   [(set (strict_low_part (match_operand:QI 0 "nonimmediate_operand" "+qm,q"))
@@ -7668,6 +7733,39 @@
   [(set_attr "type" "alu1")
    (set_attr "mode" "QI")])
 
+(define_insn "kandn<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
+	(and:SWI12
+	  (not:SWI12
+	    (match_operand:SWI12 1 "register_operand" "r,0,k"))
+	  (match_operand:SWI12 2 "register_operand" "r,r,k")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F"
+  "@
+   andn\t{%k2, %k1, %k0|%k0, %k1, %k2}
+   #
+   kandnw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "bmi,*,avx512f")
+   (set_attr "type" "bitmanip,*,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "btver2_decode" "direct,*,*")
+   (set_attr "mode" "<MODE>")])
+
+(define_split
+  [(set (match_operand:SWI12 0 "general_reg_operand")
+	(and:SWI12
+	  (not:SWI12
+	    (match_dup 0))
+	  (match_operand:SWI12 1 "general_reg_operand")))]
+  "TARGET_AVX512F && !TARGET_BMI"
+  [(set (match_dup 0)
+	(not:HI (match_dup 0)))
+   (parallel [(set (match_dup 0)
+		   (and:HI (match_dup 0)
+			   (match_dup 1)))
+	      (clobber (reg:CC FLAGS_REG))])]
+  "")
+
 ;; Turn *anddi_1 into *andsi_1_zext if possible.
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -7999,29 +8097,44 @@
   "ix86_expand_binary_operator (<CODE>, <MODE>mode, operands); DONE;")
 
 (define_insn "*<code><mode>_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=r,rm")
-	(any_or:SWI248
-	 (match_operand:SWI248 1 "nonimmediate_operand" "%0,0")
-	 (match_operand:SWI248 2 "<general_operand>" "<g>,r<i>")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,rm")
+	(any_or:SWI48
+	 (match_operand:SWI48 1 "nonimmediate_operand" "%0,0")
+	 (match_operand:SWI48 2 "<general_operand>" "<g>,r<i>")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "<logic>{<imodesuffix>}\t{%2, %0|%0, %2}"
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "<code>hi_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,rm,!Yk")
+	(any_or:HI
+	 (match_operand:HI 1 "nonimmediate_operand" "%0,0,Yk")
+	 (match_operand:HI 2 "general_operand" "<g>,r<i>,Yk")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (<CODE>, HImode, operands)"
+  "@
+  <logic>{w}\t{%2, %0|%0, %2}
+  <logic>{w}\t{%2, %0|%0, %2}
+  k<logic>w\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,msklog")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*<code>qi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m,r")
-	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		   (match_operand:QI 2 "general_operand" "qmn,qn,rn")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m,r,!Yk")
+	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
+		   (match_operand:QI 2 "general_operand" "qmn,qn,rn,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, QImode, operands)"
   "@
    <logic>{b}\t{%2, %0|%0, %2}
    <logic>{b}\t{%2, %0|%0, %2}
-   <logic>{l}\t{%k2, %k0|%k0, %k2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI,QI,SI")])
+   <logic>{l}\t{%k2, %k0|%k0, %k2}
+   k<logic>w\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,alu,msklog")
+   (set_attr "mode" "QI,QI,SI,HI")])
 
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 (define_insn "*<code>si_1_zext"
@@ -8071,6 +8184,74 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "kxnor<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,!k")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_operand:SWI12 1 "register_operand" "0,k")
+	    (match_operand:SWI12 2 "register_operand" "r,k"))))]
+  "TARGET_AVX512F"
+  "@
+   #
+   kxnorw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "*,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_split
+  [(set (match_operand:SWI12 0 "register_operand")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_dup 0)
+	    (match_operand:SWI12 1 "register_operand"))))]
+  "TARGET_AVX512F && !ANY_MASK_REG_P (operands [0])"
+   [(parallel [(set (match_dup 0)
+		    (xor:HI (match_dup 0)
+			    (match_dup 1)))
+	       (clobber (reg:CC FLAGS_REG))])
+    (set (match_dup 0)
+	 (not:HI (match_dup 0)))]
+  "")
+
+(define_insn "kortestzhi"
+  [(set (reg:CCZ FLAGS_REG)
+	(compare:CCZ
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "%Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int 0)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kortestchi"
+  [(set (reg:CCC FLAGS_REG)
+	(compare:CCC
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "%Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int -1)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCCmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kunpckhi"
+  [(set (match_operand:HI 0 "register_operand" "=Yk")
+	(ior:HI
+	  (ashift:HI
+	    (match_operand:HI 1 "register_operand" "Yk")
+	    (const_int 8))
+	  (zero_extend:HI (match_operand:QI 2 "register_operand" "Yk"))))]
+  "TARGET_AVX512F"
+  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 ;; ??? Special case for immediate operand is missing - it is tricky.
 (define_insn "*<code>si_2_zext"
@@ -8640,23 +8821,38 @@
   "ix86_expand_unary_operator (NOT, <MODE>mode, operands); DONE;")
 
 (define_insn "*one_cmpl<mode>2_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
-	(not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm")
+	(not:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0")))]
   "ix86_unary_operator_ok (NOT, <MODE>mode, operands)"
   "not{<imodesuffix>}\t%0"
   [(set_attr "type" "negnot")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*one_cmplhi2_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,!Yk")
+	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
+  "ix86_unary_operator_ok (NOT, HImode, operands)"
+  "@
+   not{w}\t%0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,avx512f")
+   (set_attr "type" "negnot,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 1.  What to do?
 (define_insn "*one_cmplqi2_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
-	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,!Yk")
+	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,Yk")))]
   "ix86_unary_operator_ok (NOT, QImode, operands)"
   "@
    not{b}\t%0
-   not{l}\t%k0"
-  [(set_attr "type" "negnot")
-   (set_attr "mode" "QI,SI")])
+   not{l}\t%k0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,*,avx512f")
+   (set_attr "type" "negnot,negnot,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "mode" "QI,SI,QI")])
 
 ;; ??? Currently never generated - xor is used instead.
 (define_insn "*one_cmplsi2_1_zext"
@@ -16380,11 +16576,11 @@
 })
 
 ;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.
-
+;; Do not split instructions with mask registers.
 (define_split
-  [(set (match_operand 0 "register_operand")
+  [(set (match_operand 0 "general_reg_operand")
 	(match_operator 3 "promotable_binary_operator"
-	   [(match_operand 1 "register_operand")
+	   [(match_operand 1 "general_reg_operand")
 	    (match_operand 2 "aligned_operand")]))
    (clobber (reg:CC FLAGS_REG))]
   "! TARGET_PARTIAL_REG_STALL && reload_completed
@@ -16479,6 +16675,7 @@
   operands[1] = gen_lowpart (SImode, operands[1]);
 })
 
+;; Do not split instructions with mask regs.
 (define_split
   [(set (match_operand 0 "register_operand")
 	(not (match_operand 1 "register_operand")))]
@@ -16486,7 +16683,9 @@
    && (GET_MODE (operands[0]) == HImode
        || (GET_MODE (operands[0]) == QImode
 	   && (TARGET_PROMOTE_QImode
-	       || optimize_insn_for_size_p ())))"
+	       || optimize_insn_for_size_p ())))
+   && (! ANY_MASK_REG_P (operands[0])
+	 || ! ANY_MASK_REG_P (operands[1]))"
   [(set (match_dup 0)
 	(not:SI (match_dup 1)))]
 {
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 3959c38..18f425c 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -32,6 +32,11 @@
   (and (match_code "reg")
        (not (match_test "ANY_FP_REGNO_P (REGNO (op))"))))
 
+;; True if the operand is a GENERAL class register.
+(define_predicate "general_reg_operand"
+  (and (match_code "reg")
+       (match_test "GENERAL_REG_P (op)")))
+
 ;; Return true if OP is a register operand other than an i387 fp register.
 (define_predicate "register_and_not_fp_reg_operand"
   (and (match_code "reg")
@@ -52,6 +57,10 @@
   (and (match_code "reg")
        (match_test "EXT_REX_SSE_REGNO_P (REGNO (op))")))
 
+;; True if the operand is an AVX-512 mask register.
+(define_predicate "mask_reg_operand"
+  (and (match_code "reg")
+       (match_test "MASK_REGNO_P (REGNO (op))")))
 
 ;; True if the operand is a Q_REGS class register.
 (define_predicate "q_regs_operand"
-- 
1.7.11.7


* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-28 20:06                                 ` Kirill Yukhin
@ 2013-08-28 20:41                                   ` Richard Henderson
  2013-08-28 22:07                                     ` Kirill Yukhin
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2013-08-28 20:41 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

On 08/28/2013 11:38 AM, Kirill Yukhin wrote:
>> When combine puts the AND and the NOT together, we don't know what registers we
>> want the data in.  If we do not supply the general register alternative, with
>> the clobber, then we will be FORCED to implement the operation in the mask
>> registers, even if this operation had nothing to do with actual vector masks.
>> And it ought to come as no surprise that X & ~Y is a fairly common operation.
> I agree with all of that. But why put in the BMI alternative as well? Without
> it we may have this pattern w/o a clobber and add it when doing the split for
> the GPR constraint.

Uh, no, you can't just add it when doing the split.  You could be adding it in
a place where the flags register is live.  You must ALWAYS have the clobber on
the whole pattern when GPRs are possible.
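
As a minimal sketch of that scenario (my example, not from the patch):
at -O2 combine merges the NOT and AND below into one andn-shaped
pattern well before register allocation has chosen between GPRs and
mask registers, and on the GPR side both the and of a not+and sequence
and BMI andn write EFLAGS:

  /* Hypothetical test case; compile with -O2.  Combine turns the
     return expression into (and (not y) x), so the pattern must carry
     the flags clobber whenever a GPR alternative is possible.  */
  unsigned short
  clear_masked (unsigned short x, unsigned short y)
  {
    return (unsigned short) (~y & x);
  }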

> @@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void)
>  
>    /* If AVX512F is disabled, squash the registers.  */
>    if (! TARGET_AVX512F)
> +  {
>      for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
>        fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
> +
> +    for (i = FIRST_MASK_REG; i < LAST_MASK_REG; i++)
> +      fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
> +  }

Fix the indentation.

> @@ -1429,7 +1450,7 @@ enum reg_class
>  
>  /* Get_secondary_mem widens integral modes to BITS_PER_WORD.
>     There is no need to emit full 64 bit move on 64 bit targets
> -   for integral modes that can be moved using 32 bit move.  */
> +   for integral modes that can be moved using 8 bit move.  */
>  #define SECONDARY_MEMORY_NEEDED_MODE(MODE)			\
>    (GET_MODE_BITSIZE (MODE) < 32 && INTEGRAL_MODE_P (MODE)	\
>     ? mode_for_size (32, GET_MODE_CLASS (MODE), 0)		\

Spurious comment change.

> +(define_insn "kandn<mode>"
> +  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
> +	(and:SWI12
> +	  (not:SWI12
> +	    (match_operand:SWI12 1 "register_operand" "r,0,k"))
> +	  (match_operand:SWI12 2 "register_operand" "r,r,k")))
> +   (clobber (reg:CC FLAGS_REG))]

Yk not k?

> +(define_insn "kxnor<mode>"
> +  [(set (match_operand:SWI12 0 "register_operand" "=r,!k")
> +	(not:SWI12
> +	  (xor:SWI12
> +	    (match_operand:SWI12 1 "register_operand" "0,k")
> +	    (match_operand:SWI12 2 "register_operand" "r,k"))))]
> +  "TARGET_AVX512F"
> +  "@
> +   #
> +   kxnorw\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "type" "*,msklog")
> +   (set_attr "prefix" "*,vex")
> +   (set_attr "mode" "<MODE>")])

Likewise.

> +(define_split
> +  [(set (match_operand:SWI12 0 "register_operand")
> +	(not:SWI12
> +	  (xor:SWI12
> +	    (match_dup 0)
> +	    (match_operand:SWI12 1 "register_operand"))))]
> +  "TARGET_AVX512F && !ANY_MASK_REG_P (operands [0])"
> +   [(parallel [(set (match_dup 0)
> +		    (xor:HI (match_dup 0)
> +			    (match_dup 1)))
> +	       (clobber (reg:CC FLAGS_REG))])
> +    (set (match_dup 0)
> +	 (not:HI (match_dup 0)))]
> +  "")

general_reg_operand.

> +(define_insn "kortestzhi"
> +  [(set (reg:CCZ FLAGS_REG)
> +	(compare:CCZ
> +	  (ior:HI
> +	    (match_operand:HI 0 "register_operand" "%Yk")
> +	    (match_operand:HI 1 "register_operand" "Yk"))

Omit the %; the two operands are identical.

> +(define_insn "kortestchi"
> +  [(set (reg:CCC FLAGS_REG)
> +	(compare:CCC
> +	  (ior:HI
> +	    (match_operand:HI 0 "register_operand" "%Yk")
> +	    (match_operand:HI 1 "register_operand" "Yk"))

Likewise.

> +;; Do not split instructions with mask regs.
>  (define_split
>    [(set (match_operand 0 "register_operand")
>  	(not (match_operand 1 "register_operand")))]
> @@ -16486,7 +16683,9 @@
>     && (GET_MODE (operands[0]) == HImode
>         || (GET_MODE (operands[0]) == QImode
>  	   && (TARGET_PROMOTE_QImode
> -	       || optimize_insn_for_size_p ())))"
> +	       || optimize_insn_for_size_p ())))
> +   && (! ANY_MASK_REG_P (operands[0])
> +	 || ! ANY_MASK_REG_P (operands[1]))"
>    [(set (match_dup 0)
>  	(not:SI (match_dup 1)))]

general_reg_operand.


r~


* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-28 20:41                                   ` Richard Henderson
@ 2013-08-28 22:07                                     ` Kirill Yukhin
  2013-08-29 12:19                                       ` Kirill Yukhin
  0 siblings, 1 reply; 31+ messages in thread
From: Kirill Yukhin @ 2013-08-28 22:07 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

Hello,

On 28 Aug 13:17, Richard Henderson wrote:
> Uh, no, you can't just add it when doing the split.  You could be adding it in
> a place that the flags register is live.  You must ALWAYS have the clobber on
> the whole pattern when gprs are possible.
I see now, thanks a lot for the explanation!

> Fix the indentation.
Done.

> Spurious comment change.
Done.

> > +(define_insn "kandn<mode>"
> > +  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
> Yk not k?
Yes, sure. Fixed.

> > +(define_insn "kxnor<mode>"
> > +  [(set (match_operand:SWI12 0 "register_operand" "=r,!k")
> > +	(not:SWI12
> Likewise.
Fixed.

> > +(define_split
> > +  [(set (match_operand:SWI12 0 "register_operand")
> > +	(not:SWI12
> > +	  (xor:SWI12
> general_reg_operand.
Done.

> > +(define_insn "kortestzhi"
> > +  [(set (reg:CCZ FLAGS_REG)
> > +	(compare:CCZ
> > +	  (ior:HI
> > +	    (match_operand:HI 0 "register_operand" "%Yk")
> > +	    (match_operand:HI 1 "register_operand" "Yk"))
> 
> Omit the %; the two operands are identical.
Fixed.

> > +(define_insn "kortestchi"
> > +  [(set (reg:CCC FLAGS_REG)
> 
> Likewise.
Fixed.

> > +;; Do not split instructions with mask regs.
> >  (define_split
> >    [(set (match_operand 0 "register_operand")
> >  	(not (match_operand 1 "register_operand")))]
> general_reg_operand.
Fixed.

Testing is in progress; is it ok for trunk if it passes?

--
Thanks, K

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c         |  38 ++++--
 gcc/config/i386/i386.h         |  38 ++++--
 gcc/config/i386/i386.md        | 287 ++++++++++++++++++++++++++++++++++-------
 gcc/config/i386/predicates.md  |   9 ++
 5 files changed, 317 insertions(+), 63 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;     B     H           T
-;;;           h jk
+;;;           h j
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d05dbf0..4e9bac0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2032,6 +2032,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
+  /* Mask registers.  */
+  MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
+  MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -2047,6 +2050,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* The "default" register map used in 64bit mode.  */
@@ -2062,6 +2066,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
   25, 26, 27, 28, 29, 30, 31, 32,	/* extended SSE registers */
   67, 68, 69, 70, 71, 72, 73, 74,       /* AVX-512 registers 16-23 */
   75, 76, 77, 78, 79, 80, 81, 82,       /* AVX-512 registers 24-31 */
+  118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -2129,6 +2134,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* Define parameter passing and return registers.  */
@@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void)
 
   /* If AVX512F is disabled, squash the registers.  */
   if (! TARGET_AVX512F)
-    for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
-      fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+    {
+      for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
+	fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+
+      for (i = FIRST_MASK_REG; i < LAST_MASK_REG; i++)
+	fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+    }
 }
 
 \f
@@ -33889,10 +33900,12 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass)
     return regclass;
 
   /* Force constants into memory if we are loading a (nonzero) constant into
-     an MMX or SSE register.  This is because there are no MMX/SSE instructions
-     to load from a constant.  */
+     an MMX, SSE or MASK register.  This is because there are no MMX/SSE/MASK
+     instructions to load from a constant.  */
   if (CONSTANT_P (x)
-      && (MAYBE_MMX_CLASS_P (regclass) || MAYBE_SSE_CLASS_P (regclass)))
+      && (MAYBE_MMX_CLASS_P (regclass)
+	  || MAYBE_SSE_CLASS_P (regclass)
+	  || MAYBE_MASK_CLASS_P (regclass)))
     return NO_REGS;
 
   /* Prefer SSE regs only, if we can use them for math.  */
@@ -33996,10 +34009,11 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
 
   /* QImode spills from non-QI registers require
      intermediate register on 32bit targets.  */
-  if (!TARGET_64BIT
-      && !in_p && mode == QImode
-      && INTEGER_CLASS_P (rclass)
-      && MAYBE_NON_Q_CLASS_P (rclass))
+  if (mode == QImode
+      && (MAYBE_MASK_CLASS_P (rclass)
+	  || (!TARGET_64BIT && !in_p
+	      && INTEGER_CLASS_P (rclass)
+	      && MAYBE_NON_Q_CLASS_P (rclass))))
     {
       int regno;
 
@@ -34421,6 +34435,8 @@ ix86_hard_regno_mode_ok (int regno, enum machine_mode mode)
     return false;
   if (STACK_REGNO_P (regno))
     return VALID_FP_MODE_P (mode);
+  if (MASK_REGNO_P (regno))
+    return VALID_MASK_REG_MODE (mode);
   if (SSE_REGNO_P (regno))
     {
       /* We implement the move patterns for all vector modes into and
@@ -35230,6 +35246,10 @@ x86_order_regs_for_local_alloc (void)
    for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
      reg_alloc_order [pos++] = i;
 
+   /* Mask register.  */
+   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
+     reg_alloc_order [pos++] = i;
+
    /* x87 registers.  */
    if (TARGET_SSE_MATH)
      for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e820aa6..709d3ed 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -893,7 +893,7 @@ enum target_cpu_default
    eliminated during reloading in favor of either the stack or frame
    pointer.  */
 
-#define FIRST_PSEUDO_REGISTER 69
+#define FIRST_PSEUDO_REGISTER 77
 
 /* Number of hardware registers that go into the DWARF-2 unwind info.
    If not defined, equals FIRST_PSEUDO_REGISTER.  */
@@ -923,7 +923,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      0,   0,    0,    0,    0,    0,    0,    0,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     0,   0,    0,    0,    0,    0,    0,    0 }
+     0,   0,    0,    0,    0,    0,    0,    0,		\
+/*  k0,  k1, k2, k3, k4, k5, k6, k7*/				\
+     0,  0,   0,  0,  0,  0,  0,  0 }
 
 /* 1 for registers not available across function calls.
    These must include the FIXED_REGISTERS and also any
@@ -955,7 +957,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      6,    6,     6,    6,    6,    6,    6,    6,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     6,    6,     6,    6,    6,    6,    6,    6 }
+     6,    6,     6,    6,    6,    6,    6,    6,		\
+ /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
+     1,   1,   1,   1,   1,   1,   1,   1 }
 
 /* Order in which to allocate registers.  Each register must be
    listed once, even those in FIXED_REGISTERS.  List frame pointer
@@ -971,7 +975,7 @@ enum target_cpu_default
    18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,	\
    33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,  \
    48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62,	\
-   63, 64, 65, 66, 67, 68 }
+   63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76 }
 
 /* ADJUST_REG_ALLOC_ORDER is a macro which permits reg_alloc_order
    to be rearranged based on a particular function.  When using sse math,
@@ -1068,6 +1072,8 @@ enum target_cpu_default
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode	\
    || (MODE) == V16SFmode)
 
+#define VALID_MASK_REG_MODE(MODE) ((MODE) == HImode || (MODE) == QImode)
+
 /* Value is 1 if hard register REGNO can hold a value of machine-mode MODE.  */
 
 #define HARD_REGNO_MODE_OK(REGNO, MODE)	\
@@ -1093,8 +1099,10 @@ enum target_cpu_default
   (CC_REGNO_P (REGNO) ? VOIDmode					\
    : (MODE) == VOIDmode && (NREGS) != 1 ? VOIDmode			\
    : (MODE) == VOIDmode ? choose_hard_reg_mode ((REGNO), (NREGS), false) \
-   : (MODE) == HImode && !TARGET_PARTIAL_REG_STALL ? SImode		\
-   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)) ? SImode	\
+   : (MODE) == HImode && !(TARGET_PARTIAL_REG_STALL			\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
+   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)		\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
    : (MODE))
 
 /* The only ABI that saves SSE registers across calls is Win64 (thus no
@@ -1141,6 +1149,9 @@ enum target_cpu_default
 #define FIRST_EXT_REX_SSE_REG  (LAST_REX_SSE_REG + 1) /*53*/
 #define LAST_EXT_REX_SSE_REG   (FIRST_EXT_REX_SSE_REG + 15) /*68*/
 
+#define FIRST_MASK_REG  (LAST_EXT_REX_SSE_REG + 1) /*69*/
+#define LAST_MASK_REG   (FIRST_MASK_REG + 7) /*76*/
+
 /* Override this in other tm.h files to cope with various OS lossage
    requiring a frame pointer.  */
 #ifndef SUBTARGET_FRAME_POINTER_REQUIRED
@@ -1229,6 +1240,8 @@ enum reg_class
   FLOAT_INT_REGS,
   INT_SSE_REGS,
   FLOAT_INT_SSE_REGS,
+  MASK_EVEX_REGS,
+  MASK_REGS,
   ALL_REGS, LIM_REG_CLASSES
 };
 
@@ -1250,6 +1263,8 @@ enum reg_class
   reg_classes_intersect_p ((CLASS), ALL_SSE_REGS)
 #define MAYBE_MMX_CLASS_P(CLASS) \
   reg_classes_intersect_p ((CLASS), MMX_REGS)
+#define MAYBE_MASK_CLASS_P(CLASS) \
+  reg_classes_intersect_p ((CLASS), MASK_REGS)
 
 #define Q_CLASS_P(CLASS) \
   reg_class_subset_p ((CLASS), Q_REGS)
@@ -1282,6 +1297,8 @@ enum reg_class
    "FLOAT_INT_REGS",			\
    "INT_SSE_REGS",			\
    "FLOAT_INT_SSE_REGS",		\
+   "MASK_EVEX_REGS",			\
+   "MASK_REGS",				\
    "ALL_REGS" }
 
 /* Define which registers fit in which classes.  This is an initializer
@@ -1319,7 +1336,9 @@ enum reg_class
 {   0x11ffff,    0x1fe0,   0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,  0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,  0x1f },       /* FLOAT_INT_SSE_REGS */        \
-{ 0xffffffff,0xffffffff,  0x1f }                                        \
+       { 0x0,       0x0,0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0,0x1fe0 },       /* MASK_REGS */                 \
+{ 0xffffffff,0xffffffff,0x1fff }                                        \
 }
 
 /* The same information, inverted:
@@ -1377,6 +1396,8 @@ enum reg_class
          : (N) <= LAST_REX_SSE_REG ? (FIRST_REX_SSE_REG + (N) - 8) \
                                    : (FIRST_EXT_REX_SSE_REG + (N) - 16))
 
+#define MASK_REGNO_P(N) IN_RANGE ((N), FIRST_MASK_REG, LAST_MASK_REG)
+#define ANY_MASK_REG_P(X) (REG_P (X) && MASK_REGNO_P (REGNO (X)))
 
 #define SSE_FLOAT_MODE_P(MODE) \
   ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
@@ -1933,7 +1954,8 @@ do {							\
  "xmm16", "xmm17", "xmm18", "xmm19",					\
  "xmm20", "xmm21", "xmm22", "xmm23",					\
  "xmm24", "xmm25", "xmm26", "xmm27",					\
- "xmm28", "xmm29", "xmm30", "xmm31" }
+ "xmm28", "xmm29", "xmm30", "xmm31",					\
+ "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7" }
 
 #define REGISTER_NAMES HI_REGISTER_NAMES
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 3d7533a..97a489d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -328,6 +328,14 @@
    (XMM29_REG			66)
    (XMM30_REG			67)
    (XMM31_REG			68)
+   (MASK0_REG			69)
+   (MASK1_REG			70)
+   (MASK2_REG			71)
+   (MASK3_REG			72)
+   (MASK4_REG			73)
+   (MASK5_REG			74)
+   (MASK6_REG			75)
+   (MASK7_REG			76)
   ])
 
 ;; Insns whose names begin with "x86_" are emitted by gen_FOO calls
@@ -360,7 +368,7 @@
    sseishft,sseishft1,ssecmp,ssecomi,
    ssecvt,ssecvt1,sseicvt,sseins,
    sseshuf,sseshuf1,ssemuladd,sse4arg,
-   lwp,
+   lwp,mskmov,msklog,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -379,7 +387,7 @@
 			  ssemul,sseimul,ssediv,sselog,sselog1,
 			  sseishft,sseishft1,ssecmp,ssecomi,
 			  ssecvt,ssecvt1,sseicvt,sseins,
-			  sseshuf,sseshuf1,ssemuladd,sse4arg")
+			  sseshuf,sseshuf1,ssemuladd,sse4arg,mskmov")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "mmx")
@@ -390,7 +398,7 @@
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
-			  bitmanip,imulx")
+			  bitmanip,imulx,msklog,mskmov")
 	   (const_int 0)
 	 (eq_attr "unit" "i387,sse,mmx")
 	   (const_int 0)
@@ -451,7 +459,7 @@
 ;; Set when 0f opcode prefix is used.
 (define_attr "prefix_0f" ""
   (if_then_else
-    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip")
+    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip,msklog,mskmov")
 	 (eq_attr "unit" "sse,mmx"))
     (const_int 1)
     (const_int 0)))
@@ -651,7 +659,7 @@
 		   fmov,fcmp,fsgn,
 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
 		   sselog1,sseshuf1,sseadd1,sseiadd1,sseishft1,
-		   mmx,mmxmov,mmxcmp,mmxcvt")
+		   mmx,mmxmov,mmxcmp,mmxcvt,mskmov,msklog")
 	      (match_operand 2 "memory_operand"))
 	   (const_string "load")
 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
@@ -695,7 +703,7 @@
 ;; Used to control the "enabled" attribute on a per-instruction basis.
 (define_attr "isa" "base,x64,x64_sse4,x64_sse4_noavx,x64_avx,nox64,
 		    sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx,
-		    avx2,noavx2,bmi2,fma4,fma,avx512f,noavx512f,fma_avx512f"
+		    avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,fma_avx512f"
   (const_string "base"))
 
 (define_attr "enabled" ""
@@ -718,6 +726,7 @@
 	 (eq_attr "isa" "noavx") (symbol_ref "!TARGET_AVX")
 	 (eq_attr "isa" "avx2") (symbol_ref "TARGET_AVX2")
 	 (eq_attr "isa" "noavx2") (symbol_ref "!TARGET_AVX2")
+	 (eq_attr "isa" "bmi") (symbol_ref "TARGET_BMI")
 	 (eq_attr "isa" "bmi2") (symbol_ref "TARGET_BMI2")
 	 (eq_attr "isa" "fma4") (symbol_ref "TARGET_FMA4")
 	 (eq_attr "isa" "fma") (symbol_ref "TARGET_FMA")
@@ -2213,8 +2222,8 @@
 	   (const_string "SI")))])
 
 (define_insn "*movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m")
-	(match_operand:HI 1 "general_operand"	   "r ,rn,rm,rn"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,Yk,Yk,rm")
+	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,rm,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2223,6 +2232,16 @@
       /* movzwl is faster than movw on p2 due to partial word stalls,
 	 though not as fast as an aligned movl.  */
       return "movz{wl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 4: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 5: return "kmovw\t{%1, %0|%0, %1}";
+	case 6: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2240,11 +2259,17 @@
 	    (and (eq_attr "alternative" "1,2")
 		 (match_operand:HI 1 "aligned_operand"))
 	      (const_string "imov")
+	    (eq_attr "alternative" "4,5,6")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "0,2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+    (set (attr "prefix")
+      (if_then_else (eq_attr "alternative" "4,5,6")
+	(const_string "vex")
+	(const_string "orig")))
     (set (attr "mode")
       (cond [(eq_attr "type" "imovx")
 	       (const_string "SI")
@@ -2269,8 +2294,8 @@
 ;; register stall machines with, where we use QImode instructions, since
 ;; partial register stall can be caused there.  Then we use movzx.
 (define_insn "*movqi_internal"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
-	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m ,Yk,Yk,r")
+	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn,r ,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2278,6 +2303,16 @@
     case TYPE_IMOVX:
       gcc_assert (ANY_QI_REG_P (operands[1]) || MEM_P (operands[1]));
       return "movz{bl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 7: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 8: return "kmovw\t{%1, %0|%0, %1}";
+	case 9: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2297,11 +2332,17 @@
 	      (const_string "imov")
 	    (eq_attr "alternative" "3,5")
 	      (const_string "imovx")
+	    (eq_attr "alternative" "7,8,9")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+   (set (attr "prefix")
+     (if_then_else (eq_attr "alternative" "7,8,9")
+       (const_string "vex")
+       (const_string "orig")))
    (set (attr "mode")
       (cond [(eq_attr "alternative" "3,4,5")
 	       (const_string "SI")
@@ -7494,6 +7535,26 @@
   operands[3] = gen_lowpart (QImode, operands[3]);
 })
 
+(define_split
+  [(set (match_operand:SWI12 0 "mask_reg_operand")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
+			 (match_operand:SWI12 2 "mask_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F && reload_completed"
+  [(set (match_dup 0)
+	(any_logic:SWI12 (match_dup 1)
+			 (match_dup 2)))])
+
+(define_insn "*k<logic><mode>"
+  [(set (match_operand:SWI12 0 "mask_reg_operand" "=Yk")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand" "Yk")
+			 (match_operand:SWI12 2 "mask_reg_operand" "Yk")))]
+  "TARGET_AVX512F && reload_completed"
+  "k<logic>w\t{%2, %1, %0|%0, %1, %2}";
+  [(set_attr "mode" "<MODE>")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; %%% This used to optimize known byte-wide and operations to memory,
 ;; and sometimes to QImode registers.  If this is considered useful,
 ;; it should be done with splitters.
@@ -7616,10 +7677,10 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "SI")])
 
-(define_insn "*andhi_1"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya")
-	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm")
-		(match_operand:HI 2 "general_operand" "rn,rm,L")))
+(define_insn "andhi_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya,!Yk")
+	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm,Yk")
+		(match_operand:HI 2 "general_operand" "rn,rm,L,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (AND, HImode, operands)"
 {
@@ -7628,34 +7689,38 @@
     case TYPE_IMOVX:
       return "#";
 
+    case TYPE_MSKLOG:
+      return "kandw\t{%2, %1, %0|%0, %1, %2}";
+
     default:
       gcc_assert (rtx_equal_p (operands[0], operands[1]));
       return "and{w}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "type" "alu,alu,imovx")
-   (set_attr "length_immediate" "*,*,0")
+  [(set_attr "type" "alu,alu,imovx,msklog")
+   (set_attr "length_immediate" "*,*,0,*")
    (set (attr "prefix_rex")
      (if_then_else
        (and (eq_attr "type" "imovx")
 	    (match_operand 1 "ext_QIreg_operand"))
        (const_string "1")
        (const_string "*")))
-   (set_attr "mode" "HI,HI,SI")])
+   (set_attr "mode" "HI,HI,SI,HI")])
 
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*andqi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
-	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		(match_operand:QI 2 "general_operand" "qn,qmn,rn")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,!Yk")
+	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
+		(match_operand:QI 2 "general_operand" "qn,qmn,rn,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (AND, QImode, operands)"
   "@
    and{b}\t{%2, %0|%0, %2}
    and{b}\t{%2, %0|%0, %2}
-   and{l}\t{%k2, %k0|%k0, %k2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI,QI,SI")])
+   and{l}\t{%k2, %k0|%k0, %k2}
+   kandw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,alu,msklog")
+   (set_attr "mode" "QI,QI,SI,HI")])
 
 (define_insn "*andqi_1_slp"
   [(set (strict_low_part (match_operand:QI 0 "nonimmediate_operand" "+qm,q"))
@@ -7668,6 +7733,39 @@
   [(set_attr "type" "alu1")
    (set_attr "mode" "QI")])
 
+(define_insn "kandn<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!Yk")
+	(and:SWI12
+	  (not:SWI12
+	    (match_operand:SWI12 1 "register_operand" "r,0,Yk"))
+	  (match_operand:SWI12 2 "register_operand" "r,r,Yk")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F"
+  "@
+   andn\t{%k2, %k1, %k0|%k0, %k1, %k2}
+   #
+   kandnw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "bmi,*,avx512f")
+   (set_attr "type" "bitmanip,*,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "btver2_decode" "direct,*,*")
+   (set_attr "mode" "<MODE>")])
+
+(define_split
+  [(set (match_operand:SWI12 0 "general_reg_operand")
+	(and:SWI12
+	  (not:SWI12
+	    (match_dup 0))
+	  (match_operand:SWI12 1 "general_reg_operand")))]
+  "TARGET_AVX512F && !TARGET_BMI"
+  [(set (match_dup 0)
+	(not:HI (match_dup 0)))
+   (parallel [(set (match_dup 0)
+		   (and:HI (match_dup 0)
+			   (match_dup 1)))
+	      (clobber (reg:CC FLAGS_REG))])]
+  "")
+
 ;; Turn *anddi_1 into *andsi_1_zext if possible.
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -7999,29 +8097,44 @@
   "ix86_expand_binary_operator (<CODE>, <MODE>mode, operands); DONE;")
 
 (define_insn "*<code><mode>_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=r,rm")
-	(any_or:SWI248
-	 (match_operand:SWI248 1 "nonimmediate_operand" "%0,0")
-	 (match_operand:SWI248 2 "<general_operand>" "<g>,r<i>")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,rm")
+	(any_or:SWI48
+	 (match_operand:SWI48 1 "nonimmediate_operand" "%0,0")
+	 (match_operand:SWI48 2 "<general_operand>" "<g>,r<i>")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "<logic>{<imodesuffix>}\t{%2, %0|%0, %2}"
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "<code>hi_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,rm,!Yk")
+	(any_or:HI
+	 (match_operand:HI 1 "nonimmediate_operand" "%0,0,Yk")
+	 (match_operand:HI 2 "general_operand" "<g>,r<i>,Yk")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (<CODE>, HImode, operands)"
+  "@
+  <logic>{w}\t{%2, %0|%0, %2}
+  <logic>{w}\t{%2, %0|%0, %2}
+  k<logic>w\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,msklog")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*<code>qi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m,r")
-	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		   (match_operand:QI 2 "general_operand" "qmn,qn,rn")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m,r,!Yk")
+	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
+		   (match_operand:QI 2 "general_operand" "qmn,qn,rn,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, QImode, operands)"
   "@
    <logic>{b}\t{%2, %0|%0, %2}
    <logic>{b}\t{%2, %0|%0, %2}
-   <logic>{l}\t{%k2, %k0|%k0, %k2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI,QI,SI")])
+   <logic>{l}\t{%k2, %k0|%k0, %k2}
+   k<logic>w\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,alu,msklog")
+   (set_attr "mode" "QI,QI,SI,HI")])
 
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 (define_insn "*<code>si_1_zext"
@@ -8071,6 +8184,74 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "kxnor<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,!Yk")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_operand:SWI12 1 "register_operand" "0,Yk")
+	    (match_operand:SWI12 2 "register_operand" "r,Yk"))))]
+  "TARGET_AVX512F"
+  "@
+   #
+   kxnorw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "*,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_split
+  [(set (match_operand:SWI12 0 "general_reg_operand")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_dup 0)
+	    (match_operand:SWI12 1 "general_reg_operand"))))]
+  "TARGET_AVX512F"
+   [(parallel [(set (match_dup 0)
+		    (xor:HI (match_dup 0)
+			    (match_dup 1)))
+	       (clobber (reg:CC FLAGS_REG))])
+    (set (match_dup 0)
+	 (not:HI (match_dup 0)))]
+  "")
+
+(define_insn "kortestzhi"
+  [(set (reg:CCZ FLAGS_REG)
+	(compare:CCZ
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int 0)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kortestchi"
+  [(set (reg:CCC FLAGS_REG)
+	(compare:CCC
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int -1)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCCmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kunpckhi"
+  [(set (match_operand:HI 0 "register_operand" "=Yk")
+	(ior:HI
+	  (ashift:HI
+	    (match_operand:HI 1 "register_operand" "Yk")
+	    (const_int 8))
+	  (zero_extend:HI (match_operand:QI 2 "register_operand" "Yk"))))]
+  "TARGET_AVX512F"
+  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 ;; ??? Special case for immediate operand is missing - it is tricky.
 (define_insn "*<code>si_2_zext"
@@ -8640,23 +8821,38 @@
   "ix86_expand_unary_operator (NOT, <MODE>mode, operands); DONE;")
 
 (define_insn "*one_cmpl<mode>2_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
-	(not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm")
+	(not:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0")))]
   "ix86_unary_operator_ok (NOT, <MODE>mode, operands)"
   "not{<imodesuffix>}\t%0"
   [(set_attr "type" "negnot")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*one_cmplhi2_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,!Yk")
+	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
+  "ix86_unary_operator_ok (NOT, HImode, operands)"
+  "@
+   not{w}\t%0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,avx512f")
+   (set_attr "type" "negnot,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 1.  What to do?
 (define_insn "*one_cmplqi2_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
-	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,!Yk")
+	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,Yk")))]
   "ix86_unary_operator_ok (NOT, QImode, operands)"
   "@
    not{b}\t%0
-   not{l}\t%k0"
-  [(set_attr "type" "negnot")
-   (set_attr "mode" "QI,SI")])
+   not{l}\t%k0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,*,avx512f")
+   (set_attr "type" "negnot,negnot,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "mode" "QI,SI,QI")])
 
 ;; ??? Currently never generated - xor is used instead.
 (define_insn "*one_cmplsi2_1_zext"
@@ -16380,11 +16576,11 @@
 })
 
 ;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.
-
+;; Do not split instructions with mask registers.
 (define_split
-  [(set (match_operand 0 "register_operand")
+  [(set (match_operand 0 "general_reg_operand")
 	(match_operator 3 "promotable_binary_operator"
-	   [(match_operand 1 "register_operand")
+	   [(match_operand 1 "general_reg_operand")
 	    (match_operand 2 "aligned_operand")]))
    (clobber (reg:CC FLAGS_REG))]
   "! TARGET_PARTIAL_REG_STALL && reload_completed
@@ -16479,9 +16675,10 @@
   operands[1] = gen_lowpart (SImode, operands[1]);
 })
 
+;; Do not split instructions with mask regs.
 (define_split
-  [(set (match_operand 0 "register_operand")
-	(not (match_operand 1 "register_operand")))]
+  [(set (match_operand 0 "general_reg_operand")
+	(not (match_operand 1 "general_reg_operand")))]
   "! TARGET_PARTIAL_REG_STALL && reload_completed
    && (GET_MODE (operands[0]) == HImode
        || (GET_MODE (operands[0]) == QImode
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 3959c38..18f425c 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -32,6 +32,11 @@
   (and (match_code "reg")
        (not (match_test "ANY_FP_REGNO_P (REGNO (op))"))))
 
+;; True if the operand is a GENERAL class register.
+(define_predicate "general_reg_operand"
+  (and (match_code "reg")
+       (match_test "GENERAL_REG_P (op)")))
+
 ;; Return true if OP is a register operand other than an i387 fp register.
 (define_predicate "register_and_not_fp_reg_operand"
   (and (match_code "reg")
@@ -52,6 +57,10 @@
   (and (match_code "reg")
        (match_test "EXT_REX_SSE_REGNO_P (REGNO (op))")))
 
+;; True if the operand is an AVX-512 mask register.
+(define_predicate "mask_reg_operand"
+  (and (match_code "reg")
+       (match_test "MASK_REGNO_P (REGNO (op))")))
 
 ;; True if the operand is a Q_REGS class register.
 (define_predicate "q_regs_operand"
-- 
1.7.11.7


* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-28 22:07                                     ` Kirill Yukhin
@ 2013-08-29 12:19                                       ` Kirill Yukhin
  2013-09-04 18:46                                         ` Kirill Yukhin
                                                           ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Kirill Yukhin @ 2013-08-29 12:19 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

Hello,

> Testing is in progress; is it ok for trunk if it passes?
I forgot to add a clobber to the split of andn,
so testing failed. It is fixed now.
The updated patch is at the bottom.
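
For reference, a hypothetical test case (mine, not part of the patch)
that exercises that split: with -mavx512f but without BMI, the "#" GPR
alternative of kandn<mode> has to be split into not+and after reload,
and since the kandn pattern itself carries a flags clobber, the split
must match and re-emit that clobber:

  /* Compile with -O2 -mavx512f -mno-bmi; ~a & b is combined into
     kandn<mode> and then split back into not+and for GPRs.  */
  unsigned short
  andn16 (unsigned short a, unsigned short b)
  {
    return (unsigned short) (~a & b);
  }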

Testing:
  1. Bootstrap passes.
  2. make check shows no regressions.
  3. Spec 2000 & 2006 builds show no regressions, both with and without the -mavx512f option.
  4. Spec 2000 & 2006 runs show no stability regressions without the -mavx512f option.

Is it ok for trunk?

Thanks, K

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c         |  38 ++++--
 gcc/config/i386/i386.h         |  38 ++++--
 gcc/config/i386/i386.md        | 288 ++++++++++++++++++++++++++++++++++-------
 gcc/config/i386/predicates.md  |   9 ++
 5 files changed, 318 insertions(+), 63 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;     B     H           T
-;;;           h jk
+;;;           h j
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d05dbf0..4e9bac0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2032,6 +2032,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
+  /* Mask registers.  */
+  MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
+  MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -2047,6 +2050,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* The "default" register map used in 64bit mode.  */
@@ -2062,6 +2066,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
   25, 26, 27, 28, 29, 30, 31, 32,	/* extended SSE registers */
   67, 68, 69, 70, 71, 72, 73, 74,       /* AVX-512 registers 16-23 */
   75, 76, 77, 78, 79, 80, 81, 82,       /* AVX-512 registers 24-31 */
+  118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -2129,6 +2134,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* Define parameter passing and return registers.  */
@@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void)
 
   /* If AVX512F is disabled, squash the registers.  */
   if (! TARGET_AVX512F)
-    for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
-      fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+    {
+      for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
+	fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+
+      for (i = FIRST_MASK_REG; i < LAST_MASK_REG; i++)
+	fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+    }
 }
 
 \f
@@ -33889,10 +33900,12 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass)
     return regclass;
 
   /* Force constants into memory if we are loading a (nonzero) constant into
-     an MMX or SSE register.  This is because there are no MMX/SSE instructions
-     to load from a constant.  */
+     an MMX, SSE or MASK register.  This is because there are no MMX/SSE/MASK
+     instructions to load from a constant.  */
   if (CONSTANT_P (x)
-      && (MAYBE_MMX_CLASS_P (regclass) || MAYBE_SSE_CLASS_P (regclass)))
+      && (MAYBE_MMX_CLASS_P (regclass)
+	  || MAYBE_SSE_CLASS_P (regclass)
+	  || MAYBE_MASK_CLASS_P (regclass)))
     return NO_REGS;
 
   /* Prefer SSE regs only, if we can use them for math.  */
@@ -33996,10 +34009,11 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
 
   /* QImode spills from non-QI registers require
      intermediate register on 32bit targets.  */
-  if (!TARGET_64BIT
-      && !in_p && mode == QImode
-      && INTEGER_CLASS_P (rclass)
-      && MAYBE_NON_Q_CLASS_P (rclass))
+  if (mode == QImode
+      && (MAYBE_MASK_CLASS_P (rclass)
+	  || (!TARGET_64BIT && !in_p
+	      && INTEGER_CLASS_P (rclass)
+	      && MAYBE_NON_Q_CLASS_P (rclass))))
     {
       int regno;
 
@@ -34421,6 +34435,8 @@ ix86_hard_regno_mode_ok (int regno, enum machine_mode mode)
     return false;
   if (STACK_REGNO_P (regno))
     return VALID_FP_MODE_P (mode);
+  if (MASK_REGNO_P (regno))
+    return VALID_MASK_REG_MODE (mode);
   if (SSE_REGNO_P (regno))
     {
       /* We implement the move patterns for all vector modes into and
@@ -35230,6 +35246,10 @@ x86_order_regs_for_local_alloc (void)
    for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
      reg_alloc_order [pos++] = i;
 
+   /* Mask register.  */
+   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
+     reg_alloc_order [pos++] = i;
+
    /* x87 registers.  */
    if (TARGET_SSE_MATH)
      for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e820aa6..709d3ed 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -893,7 +893,7 @@ enum target_cpu_default
    eliminated during reloading in favor of either the stack or frame
    pointer.  */
 
-#define FIRST_PSEUDO_REGISTER 69
+#define FIRST_PSEUDO_REGISTER 77
 
 /* Number of hardware registers that go into the DWARF-2 unwind info.
    If not defined, equals FIRST_PSEUDO_REGISTER.  */
@@ -923,7 +923,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      0,   0,    0,    0,    0,    0,    0,    0,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     0,   0,    0,    0,    0,    0,    0,    0 }
+     0,   0,    0,    0,    0,    0,    0,    0,		\
+/*  k0,  k1, k2, k3, k4, k5, k6, k7*/				\
+     0,  0,   0,  0,  0,  0,  0,  0 }
 
 /* 1 for registers not available across function calls.
    These must include the FIXED_REGISTERS and also any
@@ -955,7 +957,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      6,    6,     6,    6,    6,    6,    6,    6,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     6,    6,     6,    6,    6,    6,    6,    6 }
+     6,    6,     6,    6,    6,    6,    6,    6,		\
+ /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
+     1,   1,   1,   1,   1,   1,   1,   1 }
 
 /* Order in which to allocate registers.  Each register must be
    listed once, even those in FIXED_REGISTERS.  List frame pointer
@@ -971,7 +975,7 @@ enum target_cpu_default
    18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,	\
    33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,  \
    48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62,	\
-   63, 64, 65, 66, 67, 68 }
+   63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76 }
 
 /* ADJUST_REG_ALLOC_ORDER is a macro which permits reg_alloc_order
    to be rearranged based on a particular function.  When using sse math,
@@ -1068,6 +1072,8 @@ enum target_cpu_default
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode	\
    || (MODE) == V16SFmode)
 
+#define VALID_MASK_REG_MODE(MODE) ((MODE) == HImode || (MODE) == QImode)
+
 /* Value is 1 if hard register REGNO can hold a value of machine-mode MODE.  */
 
 #define HARD_REGNO_MODE_OK(REGNO, MODE)	\
@@ -1093,8 +1099,10 @@ enum target_cpu_default
   (CC_REGNO_P (REGNO) ? VOIDmode					\
    : (MODE) == VOIDmode && (NREGS) != 1 ? VOIDmode			\
    : (MODE) == VOIDmode ? choose_hard_reg_mode ((REGNO), (NREGS), false) \
-   : (MODE) == HImode && !TARGET_PARTIAL_REG_STALL ? SImode		\
-   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)) ? SImode	\
+   : (MODE) == HImode && !(TARGET_PARTIAL_REG_STALL			\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
+   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)		\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
    : (MODE))
 
 /* The only ABI that saves SSE registers across calls is Win64 (thus no
@@ -1141,6 +1149,9 @@ enum target_cpu_default
 #define FIRST_EXT_REX_SSE_REG  (LAST_REX_SSE_REG + 1) /*53*/
 #define LAST_EXT_REX_SSE_REG   (FIRST_EXT_REX_SSE_REG + 15) /*68*/
 
+#define FIRST_MASK_REG  (LAST_EXT_REX_SSE_REG + 1) /*69*/
+#define LAST_MASK_REG   (FIRST_MASK_REG + 7) /*76*/
+
 /* Override this in other tm.h files to cope with various OS lossage
    requiring a frame pointer.  */
 #ifndef SUBTARGET_FRAME_POINTER_REQUIRED
@@ -1229,6 +1240,8 @@ enum reg_class
   FLOAT_INT_REGS,
   INT_SSE_REGS,
   FLOAT_INT_SSE_REGS,
+  MASK_EVEX_REGS,
+  MASK_REGS,
   ALL_REGS, LIM_REG_CLASSES
 };
 
@@ -1250,6 +1263,8 @@ enum reg_class
   reg_classes_intersect_p ((CLASS), ALL_SSE_REGS)
 #define MAYBE_MMX_CLASS_P(CLASS) \
   reg_classes_intersect_p ((CLASS), MMX_REGS)
+#define MAYBE_MASK_CLASS_P(CLASS) \
+  reg_classes_intersect_p ((CLASS), MASK_REGS)
 
 #define Q_CLASS_P(CLASS) \
   reg_class_subset_p ((CLASS), Q_REGS)
@@ -1282,6 +1297,8 @@ enum reg_class
    "FLOAT_INT_REGS",			\
    "INT_SSE_REGS",			\
    "FLOAT_INT_SSE_REGS",		\
+   "MASK_EVEX_REGS",			\
+   "MASK_REGS",				\
    "ALL_REGS" }
 
 /* Define which registers fit in which classes.  This is an initializer
@@ -1319,7 +1336,9 @@ enum reg_class
 {   0x11ffff,    0x1fe0,   0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,  0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,  0x1f },       /* FLOAT_INT_SSE_REGS */        \
-{ 0xffffffff,0xffffffff,  0x1f }                                        \
+       { 0x0,       0x0,0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0,0x1fe0 },       /* MASK_REGS */                 \
+{ 0xffffffff,0xffffffff,0x1fff }                                        \
 }
 
 /* The same information, inverted:
@@ -1377,6 +1396,8 @@ enum reg_class
          : (N) <= LAST_REX_SSE_REG ? (FIRST_REX_SSE_REG + (N) - 8) \
                                    : (FIRST_EXT_REX_SSE_REG + (N) - 16))
 
+#define MASK_REGNO_P(N) IN_RANGE ((N), FIRST_MASK_REG, LAST_MASK_REG)
+#define ANY_MASK_REG_P(X) (REG_P (X) && MASK_REGNO_P (REGNO (X)))
 
 #define SSE_FLOAT_MODE_P(MODE) \
   ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
@@ -1933,7 +1954,8 @@ do {							\
  "xmm16", "xmm17", "xmm18", "xmm19",					\
  "xmm20", "xmm21", "xmm22", "xmm23",					\
  "xmm24", "xmm25", "xmm26", "xmm27",					\
- "xmm28", "xmm29", "xmm30", "xmm31" }
+ "xmm28", "xmm29", "xmm30", "xmm31",					\
+ "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7" }
 
 #define REGISTER_NAMES HI_REGISTER_NAMES
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 3d7533a..2113fb9 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -328,6 +328,14 @@
    (XMM29_REG			66)
    (XMM30_REG			67)
    (XMM31_REG			68)
+   (MASK0_REG			69)
+   (MASK1_REG			70)
+   (MASK2_REG			71)
+   (MASK3_REG			72)
+   (MASK4_REG			73)
+   (MASK5_REG			74)
+   (MASK6_REG			75)
+   (MASK7_REG			76)
   ])
 
 ;; Insns whose names begin with "x86_" are emitted by gen_FOO calls
@@ -360,7 +368,7 @@
    sseishft,sseishft1,ssecmp,ssecomi,
    ssecvt,ssecvt1,sseicvt,sseins,
    sseshuf,sseshuf1,ssemuladd,sse4arg,
-   lwp,
+   lwp,mskmov,msklog,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -379,7 +387,7 @@
 			  ssemul,sseimul,ssediv,sselog,sselog1,
 			  sseishft,sseishft1,ssecmp,ssecomi,
 			  ssecvt,ssecvt1,sseicvt,sseins,
-			  sseshuf,sseshuf1,ssemuladd,sse4arg")
+			  sseshuf,sseshuf1,ssemuladd,sse4arg,mskmov")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "mmx")
@@ -390,7 +398,7 @@
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
-			  bitmanip,imulx")
+			  bitmanip,imulx,msklog,mskmov")
 	   (const_int 0)
 	 (eq_attr "unit" "i387,sse,mmx")
 	   (const_int 0)
@@ -451,7 +459,7 @@
 ;; Set when 0f opcode prefix is used.
 (define_attr "prefix_0f" ""
   (if_then_else
-    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip")
+    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip,msklog,mskmov")
 	 (eq_attr "unit" "sse,mmx"))
     (const_int 1)
     (const_int 0)))
@@ -651,7 +659,7 @@
 		   fmov,fcmp,fsgn,
 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
 		   sselog1,sseshuf1,sseadd1,sseiadd1,sseishft1,
-		   mmx,mmxmov,mmxcmp,mmxcvt")
+		   mmx,mmxmov,mmxcmp,mmxcvt,mskmov,msklog")
 	      (match_operand 2 "memory_operand"))
 	   (const_string "load")
 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
@@ -695,7 +703,7 @@
 ;; Used to control the "enabled" attribute on a per-instruction basis.
 (define_attr "isa" "base,x64,x64_sse4,x64_sse4_noavx,x64_avx,nox64,
 		    sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx,
-		    avx2,noavx2,bmi2,fma4,fma,avx512f,noavx512f,fma_avx512f"
+		    avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,fma_avx512f"
   (const_string "base"))
 
 (define_attr "enabled" ""
@@ -718,6 +726,7 @@
 	 (eq_attr "isa" "noavx") (symbol_ref "!TARGET_AVX")
 	 (eq_attr "isa" "avx2") (symbol_ref "TARGET_AVX2")
 	 (eq_attr "isa" "noavx2") (symbol_ref "!TARGET_AVX2")
+	 (eq_attr "isa" "bmi") (symbol_ref "TARGET_BMI")
 	 (eq_attr "isa" "bmi2") (symbol_ref "TARGET_BMI2")
 	 (eq_attr "isa" "fma4") (symbol_ref "TARGET_FMA4")
 	 (eq_attr "isa" "fma") (symbol_ref "TARGET_FMA")
@@ -2213,8 +2222,8 @@
 	   (const_string "SI")))])
 
 (define_insn "*movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m")
-	(match_operand:HI 1 "general_operand"	   "r ,rn,rm,rn"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,Yk,Yk,rm")
+	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,rm,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2223,6 +2232,16 @@
       /* movzwl is faster than movw on p2 due to partial word stalls,
 	 though not as fast as an aligned movl.  */
       return "movz{wl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 4: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 5: return "kmovw\t{%1, %0|%0, %1}";
+	case 6: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2240,11 +2259,17 @@
 	    (and (eq_attr "alternative" "1,2")
 		 (match_operand:HI 1 "aligned_operand"))
 	      (const_string "imov")
+	    (eq_attr "alternative" "4,5,6")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "0,2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+    (set (attr "prefix")
+      (if_then_else (eq_attr "alternative" "4,5,6")
+	(const_string "vex")
+	(const_string "orig")))
     (set (attr "mode")
       (cond [(eq_attr "type" "imovx")
 	       (const_string "SI")
@@ -2269,8 +2294,8 @@
 ;; register stall machines with, where we use QImode instructions, since
 ;; partial register stall can be caused there.  Then we use movzx.
 (define_insn "*movqi_internal"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
-	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m ,Yk,Yk,r")
+	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn,r ,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2278,6 +2303,16 @@
     case TYPE_IMOVX:
       gcc_assert (ANY_QI_REG_P (operands[1]) || MEM_P (operands[1]));
       return "movz{bl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 7: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 8: return "kmovw\t{%1, %0|%0, %1}";
+	case 9: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2297,11 +2332,17 @@
 	      (const_string "imov")
 	    (eq_attr "alternative" "3,5")
 	      (const_string "imovx")
+	    (eq_attr "alternative" "7,8,9")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+   (set (attr "prefix")
+     (if_then_else (eq_attr "alternative" "7,8,9")
+       (const_string "vex")
+       (const_string "orig")))
    (set (attr "mode")
       (cond [(eq_attr "alternative" "3,4,5")
 	       (const_string "SI")
@@ -7494,6 +7535,26 @@
   operands[3] = gen_lowpart (QImode, operands[3]);
 })
 
+(define_split
+  [(set (match_operand:SWI12 0 "mask_reg_operand")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
+			 (match_operand:SWI12 2 "mask_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F && reload_completed"
+  [(set (match_dup 0)
+	(any_logic:SWI12 (match_dup 1)
+			 (match_dup 2)))])
+
+(define_insn "*k<logic><mode>"
+  [(set (match_operand:SWI12 0 "mask_reg_operand" "=Yk")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand" "Yk")
+			 (match_operand:SWI12 2 "mask_reg_operand" "Yk")))]
+  "TARGET_AVX512F"
+  "k<logic>w\t{%2, %1, %0|%0, %1, %2}";
+  [(set_attr "mode" "<MODE>")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; %%% This used to optimize known byte-wide and operations to memory,
 ;; and sometimes to QImode registers.  If this is considered useful,
 ;; it should be done with splitters.
@@ -7616,10 +7677,10 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "SI")])
 
-(define_insn "*andhi_1"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya")
-	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm")
-		(match_operand:HI 2 "general_operand" "rn,rm,L")))
+(define_insn "andhi_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya,!Yk")
+	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm,Yk")
+		(match_operand:HI 2 "general_operand" "rn,rm,L,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (AND, HImode, operands)"
 {
@@ -7628,34 +7689,38 @@
     case TYPE_IMOVX:
       return "#";
 
+    case TYPE_MSKLOG:
+      return "kandw\t{%2, %1, %0|%0, %1, %2}";
+
     default:
       gcc_assert (rtx_equal_p (operands[0], operands[1]));
       return "and{w}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "type" "alu,alu,imovx")
-   (set_attr "length_immediate" "*,*,0")
+  [(set_attr "type" "alu,alu,imovx,msklog")
+   (set_attr "length_immediate" "*,*,0,*")
    (set (attr "prefix_rex")
      (if_then_else
        (and (eq_attr "type" "imovx")
 	    (match_operand 1 "ext_QIreg_operand"))
        (const_string "1")
        (const_string "*")))
-   (set_attr "mode" "HI,HI,SI")])
+   (set_attr "mode" "HI,HI,SI,HI")])
 
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*andqi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
-	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		(match_operand:QI 2 "general_operand" "qn,qmn,rn")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,!Yk")
+	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
+		(match_operand:QI 2 "general_operand" "qn,qmn,rn,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (AND, QImode, operands)"
   "@
    and{b}\t{%2, %0|%0, %2}
    and{b}\t{%2, %0|%0, %2}
-   and{l}\t{%k2, %k0|%k0, %k2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI,QI,SI")])
+   and{l}\t{%k2, %k0|%k0, %k2}
+   kandw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,alu,msklog")
+   (set_attr "mode" "QI,QI,SI,HI")])
 
 (define_insn "*andqi_1_slp"
   [(set (strict_low_part (match_operand:QI 0 "nonimmediate_operand" "+qm,q"))
@@ -7668,6 +7733,40 @@
   [(set_attr "type" "alu1")
    (set_attr "mode" "QI")])
 
+(define_insn "kandn<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!Yk")
+	(and:SWI12
+	  (not:SWI12
+	    (match_operand:SWI12 1 "register_operand" "r,0,Yk"))
+	  (match_operand:SWI12 2 "register_operand" "r,r,Yk")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F"
+  "@
+   andn\t{%k2, %k1, %k0|%k0, %k1, %k2}
+   #
+   kandnw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "bmi,*,avx512f")
+   (set_attr "type" "bitmanip,*,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "btver2_decode" "direct,*,*")
+   (set_attr "mode" "<MODE>")])
+
+(define_split
+  [(set (match_operand:SWI12 0 "general_reg_operand")
+	(and:SWI12
+	  (not:SWI12
+	    (match_dup 0))
+	  (match_operand:SWI12 1 "general_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F && !TARGET_BMI && reload_completed"
+  [(set (match_dup 0)
+	(not:HI (match_dup 0)))
+   (parallel [(set (match_dup 0)
+		   (and:HI (match_dup 0)
+			   (match_dup 1)))
+	      (clobber (reg:CC FLAGS_REG))])]
+  "")
+
 ;; Turn *anddi_1 into *andsi_1_zext if possible.
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -7999,29 +8098,44 @@
   "ix86_expand_binary_operator (<CODE>, <MODE>mode, operands); DONE;")
 
 (define_insn "*<code><mode>_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=r,rm")
-	(any_or:SWI248
-	 (match_operand:SWI248 1 "nonimmediate_operand" "%0,0")
-	 (match_operand:SWI248 2 "<general_operand>" "<g>,r<i>")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,rm")
+	(any_or:SWI48
+	 (match_operand:SWI48 1 "nonimmediate_operand" "%0,0")
+	 (match_operand:SWI48 2 "<general_operand>" "<g>,r<i>")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "<logic>{<imodesuffix>}\t{%2, %0|%0, %2}"
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "<code>hi_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,rm,!Yk")
+	(any_or:HI
+	 (match_operand:HI 1 "nonimmediate_operand" "%0,0,Yk")
+	 (match_operand:HI 2 "general_operand" "<g>,r<i>,Yk")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (<CODE>, HImode, operands)"
+  "@
+  <logic>{w}\t{%2, %0|%0, %2}
+  <logic>{w}\t{%2, %0|%0, %2}
+  k<logic>w\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,msklog")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*<code>qi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m,r")
-	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		   (match_operand:QI 2 "general_operand" "qmn,qn,rn")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m,r,!Yk")
+	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
+		   (match_operand:QI 2 "general_operand" "qmn,qn,rn,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, QImode, operands)"
   "@
    <logic>{b}\t{%2, %0|%0, %2}
    <logic>{b}\t{%2, %0|%0, %2}
-   <logic>{l}\t{%k2, %k0|%k0, %k2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI,QI,SI")])
+   <logic>{l}\t{%k2, %k0|%k0, %k2}
+   k<logic>w\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,alu,msklog")
+   (set_attr "mode" "QI,QI,SI,HI")])
 
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 (define_insn "*<code>si_1_zext"
@@ -8071,6 +8185,74 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "kxnor<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,!Yk")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_operand:SWI12 1 "register_operand" "0,Yk")
+	    (match_operand:SWI12 2 "register_operand" "r,Yk"))))]
+  "TARGET_AVX512F"
+  "@
+   #
+   kxnorw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "*,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_split
+  [(set (match_operand:SWI12 0 "general_reg_operand")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_dup 0)
+	    (match_operand:SWI12 1 "general_reg_operand"))))]
+  "TARGET_AVX512F && reload_completed"
+   [(parallel [(set (match_dup 0)
+		    (xor:HI (match_dup 0)
+			    (match_dup 1)))
+	       (clobber (reg:CC FLAGS_REG))])
+    (set (match_dup 0)
+	 (not:HI (match_dup 0)))]
+  "")
+
+(define_insn "kortestzhi"
+  [(set (reg:CCZ FLAGS_REG)
+	(compare:CCZ
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int 0)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kortestchi"
+  [(set (reg:CCC FLAGS_REG)
+	(compare:CCC
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int -1)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCCmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kunpckhi"
+  [(set (match_operand:HI 0 "register_operand" "=Yk")
+	(ior:HI
+	  (ashift:HI
+	    (match_operand:HI 1 "register_operand" "Yk")
+	    (const_int 8))
+	  (zero_extend:HI (match_operand:QI 2 "register_operand" "Yk"))))]
+  "TARGET_AVX512F"
+  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 ;; ??? Special case for immediate operand is missing - it is tricky.
 (define_insn "*<code>si_2_zext"
@@ -8640,23 +8822,38 @@
   "ix86_expand_unary_operator (NOT, <MODE>mode, operands); DONE;")
 
 (define_insn "*one_cmpl<mode>2_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
-	(not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm")
+	(not:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0")))]
   "ix86_unary_operator_ok (NOT, <MODE>mode, operands)"
   "not{<imodesuffix>}\t%0"
   [(set_attr "type" "negnot")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*one_cmplhi2_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,!Yk")
+	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
+  "ix86_unary_operator_ok (NOT, HImode, operands)"
+  "@
+   not{w}\t%0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,avx512f")
+   (set_attr "type" "negnot,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 1.  What to do?
 (define_insn "*one_cmplqi2_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
-	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,!Yk")
+	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,Yk")))]
   "ix86_unary_operator_ok (NOT, QImode, operands)"
   "@
    not{b}\t%0
-   not{l}\t%k0"
-  [(set_attr "type" "negnot")
-   (set_attr "mode" "QI,SI")])
+   not{l}\t%k0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,*,avx512f")
+   (set_attr "type" "negnot,negnot,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "mode" "QI,SI,QI")])
 
 ;; ??? Currently never generated - xor is used instead.
 (define_insn "*one_cmplsi2_1_zext"
@@ -16380,11 +16577,11 @@
 })
 
 ;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.
-
+;; Do not split instructions with mask registers.
 (define_split
-  [(set (match_operand 0 "register_operand")
+  [(set (match_operand 0 "general_reg_operand")
 	(match_operator 3 "promotable_binary_operator"
-	   [(match_operand 1 "register_operand")
+	   [(match_operand 1 "general_reg_operand")
 	    (match_operand 2 "aligned_operand")]))
    (clobber (reg:CC FLAGS_REG))]
   "! TARGET_PARTIAL_REG_STALL && reload_completed
@@ -16479,9 +16676,10 @@
   operands[1] = gen_lowpart (SImode, operands[1]);
 })
 
+;; Do not split instructions with mask regs.
 (define_split
-  [(set (match_operand 0 "register_operand")
-	(not (match_operand 1 "register_operand")))]
+  [(set (match_operand 0 "general_reg_operand")
+	(not (match_operand 1 "general_reg_operand")))]
   "! TARGET_PARTIAL_REG_STALL && reload_completed
    && (GET_MODE (operands[0]) == HImode
        || (GET_MODE (operands[0]) == QImode
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 3959c38..18f425c 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -32,6 +32,11 @@
   (and (match_code "reg")
        (not (match_test "ANY_FP_REGNO_P (REGNO (op))"))))
 
+;; True if the operand is a GENERAL class register.
+(define_predicate "general_reg_operand"
+  (and (match_code "reg")
+       (match_test "GENERAL_REG_P (op)")))
+
 ;; Return true if OP is a register operand other than an i387 fp register.
 (define_predicate "register_and_not_fp_reg_operand"
   (and (match_code "reg")
@@ -52,6 +57,10 @@
   (and (match_code "reg")
        (match_test "EXT_REX_SSE_REGNO_P (REGNO (op))")))
 
+;; True if the operand is an AVX-512 mask register.
+(define_predicate "mask_reg_operand"
+  (and (match_code "reg")
+       (match_test "MASK_REGNO_P (REGNO (op))")))
 
 ;; True if the operand is a Q_REGS class register.
 (define_predicate "q_regs_operand"
-- 
1.7.11.7



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-29 12:19                                       ` Kirill Yukhin
@ 2013-09-04 18:46                                         ` Kirill Yukhin
  2013-09-09 11:29                                           ` Kirill Yukhin
  2013-09-06 13:49                                         ` Kirill Yukhin
  2013-09-09 17:54                                         ` Richard Henderson
  2 siblings, 1 reply; 31+ messages in thread
From: Kirill Yukhin @ 2013-09-04 18:46 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

Hello,

PING.

--
Thanks, K

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-29 12:19                                       ` Kirill Yukhin
  2013-09-04 18:46                                         ` Kirill Yukhin
@ 2013-09-06 13:49                                         ` Kirill Yukhin
  2013-09-09 17:54                                         ` Richard Henderson
  2 siblings, 0 replies; 31+ messages in thread
From: Kirill Yukhin @ 2013-09-06 13:49 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

Hello,
On 29 Aug 15:59, Kirill Yukhin wrote:
>  /* Define parameter passing and return registers.  */
> @@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void)
>  
>    /* If AVX512F is disabled, squash the registers.  */
>    if (! TARGET_AVX512F)
> -    for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
> -      fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
> +    {
> +      for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
> +	fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
> +
> +      for (i = FIRST_MASK_REG; i < LAST_MASK_REG; i++)
> +	fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
> +    }
>  }
This place should be updated as done here: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00181.html
I am not reposting, as the change is obvious; the loop upper bounds simply become inclusive, as sketched below.
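
For reference, this is how the loops look with the fix applied (it is also
what the repost further down this thread carries):

      for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
	fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";

      for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
	fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";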

--
Thanks, K

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-09-04 18:46                                         ` Kirill Yukhin
@ 2013-09-09 11:29                                           ` Kirill Yukhin
  0 siblings, 0 replies; 31+ messages in thread
From: Kirill Yukhin @ 2013-09-09 11:29 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

Hello,
On 04 Sep 22:45, Kirill Yukhin wrote:
> Hello,
> 
> PING.

PING.

--
Thanks, K 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-08-29 12:19                                       ` Kirill Yukhin
  2013-09-04 18:46                                         ` Kirill Yukhin
  2013-09-06 13:49                                         ` Kirill Yukhin
@ 2013-09-09 17:54                                         ` Richard Henderson
  2013-09-10 13:11                                           ` Kirill Yukhin
  2 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2013-09-09 17:54 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

On 08/29/2013 04:59 AM, Kirill Yukhin wrote:
> @@ -7616,10 +7677,10 @@
>    [(set_attr "type" "alu")
>     (set_attr "mode" "SI")])
>  
> -(define_insn "*andhi_1"
> -  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya")
> -	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm")
> -		(match_operand:HI 2 "general_operand" "rn,rm,L")))
> +(define_insn "andhi_1"
> +  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya,!Yk")
> +	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm,Yk")
> +		(match_operand:HI 2 "general_operand" "rn,rm,L,Yk")))
>     (clobber (reg:CC FLAGS_REG))]

gen_andhi_1 is not used, nor is it likely to be in the future; therefore this
should still have "*".

> +(define_insn "<code>hi_1"
> +  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,rm,!Yk")
> +	(any_or:HI
> +	 (match_operand:HI 1 "nonimmediate_operand" "%0,0,Yk")
> +	 (match_operand:HI 2 "general_operand" "<g>,r<i>,Yk")))
> +   (clobber (reg:CC FLAGS_REG))]

Likewise.

With this and the register setup change mentioned down-thread,
the patch is ok.


r~

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-09-09 17:54                                         ` Richard Henderson
@ 2013-09-10 13:11                                           ` Kirill Yukhin
  2013-09-10 16:18                                             ` Richard Henderson
  2013-09-10 16:49                                             ` Richard Henderson
  0 siblings, 2 replies; 31+ messages in thread
From: Kirill Yukhin @ 2013-09-10 13:11 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

Hello Richard,
Thanks for inputs.
On 09 Sep 10:39, Richard Henderson wrote:

> gen_andhi_1 is not used, nor is it likely to be in the future; therefore this
> should still have "*".
We're using it in patch 6/8 when introducing the built-ins:

+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi_1, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) HI_FTYPE_HI_HI },

And covered by tests in patch 8/8:

new file mode 100644
index 0000000..3d777c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-kandnw-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "kandnw\[ \\t\]+\[^\n\]*%k\[1-7\]" 1 } } */
+
+#include <immintrin.h>
+
+void
+avx512f_test ()
+{
+  __mmask16 k1, k2, k3;
+  volatile __m512 x;
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _mm512_kandn (k1, k2);
+  x = _mm512_mask_add_ps (x, k3, x, x);
+}
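
For completeness, the builtin is reached through an intrinsic wrapper of
roughly this shape (a sketch following the usual avx512fintrin.h
conventions, not the verbatim header):

  extern __inline __mmask16
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  _mm512_kand (__mmask16 __A, __mmask16 __B)
  {
    /* Maps directly onto the kandw pattern via the builtin table.  */
    return (__mmask16) __builtin_ia32_kandhi ((__mmask16) __A,
					      (__mmask16) __B);
  }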

> 
> > +(define_insn "<code>hi_1"
> > +  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,rm,!Yk")
> > +	(any_or:HI
> > +	 (match_operand:HI 1 "nonimmediate_operand" "%0,0,Yk")
> > +	 (match_operand:HI 2 "general_operand" "<g>,r<i>,Yk")))
> > +   (clobber (reg:CC FLAGS_REG))]
> 
> Likewise.
Same as above.

Do you still think we need "*"?

--
Thanks, K

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-09-10 13:11                                           ` Kirill Yukhin
@ 2013-09-10 16:18                                             ` Richard Henderson
  2013-09-10 16:49                                             ` Richard Henderson
  1 sibling, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2013-09-10 16:18 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

On 09/10/2013 05:57 AM, Kirill Yukhin wrote:
> Do you still think we need "*"?

No, I suppose that's fine.


r~

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-09-10 13:11                                           ` Kirill Yukhin
  2013-09-10 16:18                                             ` Richard Henderson
@ 2013-09-10 16:49                                             ` Richard Henderson
  2013-09-10 18:38                                               ` Kirill Yukhin
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2013-09-10 16:49 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

On 09/10/2013 05:57 AM, Kirill Yukhin wrote:
> +  { OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi_1, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) HI_FTYPE_HI_HI },

Alternately, why not use the standard CODE_FOR_andhi3 expander?
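
The standard-name expander performs the usual operand fixup; roughly, as a
sketch of its shape rather than the exact i386.md text:

  (define_expand "andhi3"
    [(set (match_operand:HI 0 "nonimmediate_operand")
	  (and:HI (match_operand:HI 1 "nonimmediate_operand")
		  (match_operand:HI 2 "general_operand")))]
    ""
    "ix86_expand_binary_operator (AND, HImode, operands); DONE;")

so going through CODE_FOR_andhi3 would give the builtin the same operand
legitimization as ordinary and-expansion.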


r~

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-09-10 16:49                                             ` Richard Henderson
@ 2013-09-10 18:38                                               ` Kirill Yukhin
  2013-09-10 19:16                                                 ` Richard Henderson
  0 siblings, 1 reply; 31+ messages in thread
From: Kirill Yukhin @ 2013-09-10 18:38 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

Hello,
On 10 Sep 09:17, Richard Henderson wrote:
> On 09/10/2013 05:57 AM, Kirill Yukhin wrote:
> > +  { OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi_1, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) HI_FTYPE_HI_HI },
> 
> Alternately, why not use the standard CODE_FOR_andhi3 expander?

Great point! Thanks, fixed.
gcc.target/i386/avx512f-k* tests still pass. Bootstrap passes.
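
The table change itself is mechanical; a sketch of the updated entry (not
the verbatim hunk):

-  { OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi_1, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) HI_FTYPE_HI_HI },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi3, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) HI_FTYPE_HI_HI },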

Is it ok now?

Thanks, K

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c         |  38 ++++--
 gcc/config/i386/i386.h         |  38 ++++--
 gcc/config/i386/i386.md        | 286 ++++++++++++++++++++++++++++++++++-------
 gcc/config/i386/predicates.md  |   9 ++
 5 files changed, 317 insertions(+), 62 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;     B     H           T
-;;;           h jk
+;;;           h j
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index fe9a714..72549e9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2032,6 +2032,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
+  /* Mask registers.  */
+  MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
+  MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -2047,6 +2050,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* The "default" register map used in 64bit mode.  */
@@ -2062,6 +2066,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
   25, 26, 27, 28, 29, 30, 31, 32,	/* extended SSE registers */
   67, 68, 69, 70, 71, 72, 73, 74,       /* AVX-512 registers 16-23 */
   75, 76, 77, 78, 79, 80, 81, 82,       /* AVX-512 registers 24-31 */
+  118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -2129,6 +2134,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,       /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,      /* Mask registers */
 };
 
 /* Define parameter passing and return registers.  */
@@ -4224,8 +4230,13 @@ ix86_conditional_register_usage (void)
 
   /* If AVX512F is disabled, squash the registers.  */
   if (! TARGET_AVX512F)
-    for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
-      fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+    {
+      for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
+	fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+
+      for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
+	fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+    }
 }
 
 \f
@@ -33918,10 +33929,12 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass)
     return regclass;
 
   /* Force constants into memory if we are loading a (nonzero) constant into
-     an MMX or SSE register.  This is because there are no MMX/SSE instructions
-     to load from a constant.  */
+     an MMX, SSE or MASK register.  This is because there are no MMX/SSE/MASK
+     instructions to load from a constant.  */
   if (CONSTANT_P (x)
-      && (MAYBE_MMX_CLASS_P (regclass) || MAYBE_SSE_CLASS_P (regclass)))
+      && (MAYBE_MMX_CLASS_P (regclass)
+	  || MAYBE_SSE_CLASS_P (regclass)
+	  || MAYBE_MASK_CLASS_P (regclass)))
     return NO_REGS;
 
   /* Prefer SSE regs only, if we can use them for math.  */
@@ -34025,10 +34038,11 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
 
   /* QImode spills from non-QI registers require
      intermediate register on 32bit targets.  */
-  if (!TARGET_64BIT
-      && !in_p && mode == QImode
-      && INTEGER_CLASS_P (rclass)
-      && MAYBE_NON_Q_CLASS_P (rclass))
+  if (mode == QImode
+      && (MAYBE_MASK_CLASS_P (rclass)
+	  || (!TARGET_64BIT && !in_p
+	      && INTEGER_CLASS_P (rclass)
+	      && MAYBE_NON_Q_CLASS_P (rclass))))
     {
       int regno;
 
@@ -34450,6 +34464,8 @@ ix86_hard_regno_mode_ok (int regno, enum machine_mode mode)
     return false;
   if (STACK_REGNO_P (regno))
     return VALID_FP_MODE_P (mode);
+  if (MASK_REGNO_P (regno))
+    return VALID_MASK_REG_MODE (mode);
   if (SSE_REGNO_P (regno))
     {
       /* We implement the move patterns for all vector modes into and
@@ -35259,6 +35275,10 @@ x86_order_regs_for_local_alloc (void)
    for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
      reg_alloc_order [pos++] = i;
 
+   /* Mask register.  */
+   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
+     reg_alloc_order [pos++] = i;
+
    /* x87 registers.  */
    if (TARGET_SSE_MATH)
      for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e820aa6..709d3ed 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -893,7 +893,7 @@ enum target_cpu_default
    eliminated during reloading in favor of either the stack or frame
    pointer.  */
 
-#define FIRST_PSEUDO_REGISTER 69
+#define FIRST_PSEUDO_REGISTER 77
 
 /* Number of hardware registers that go into the DWARF-2 unwind info.
    If not defined, equals FIRST_PSEUDO_REGISTER.  */
@@ -923,7 +923,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      0,   0,    0,    0,    0,    0,    0,    0,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     0,   0,    0,    0,    0,    0,    0,    0 }
+     0,   0,    0,    0,    0,    0,    0,    0,		\
+/*  k0,  k1, k2, k3, k4, k5, k6, k7*/				\
+     0,  0,   0,  0,  0,  0,  0,  0 }
 
 /* 1 for registers not available across function calls.
    These must include the FIXED_REGISTERS and also any
@@ -955,7 +957,9 @@ enum target_cpu_default
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/		\
      6,    6,     6,    6,    6,    6,    6,    6,		\
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
-     6,    6,     6,    6,    6,    6,    6,    6 }
+     6,    6,     6,    6,    6,    6,    6,    6,		\
+ /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
+     1,   1,   1,   1,   1,   1,   1,   1 }
 
 /* Order in which to allocate registers.  Each register must be
    listed once, even those in FIXED_REGISTERS.  List frame pointer
@@ -971,7 +975,7 @@ enum target_cpu_default
    18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,	\
    33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,  \
    48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62,	\
-   63, 64, 65, 66, 67, 68 }
+   63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76 }
 
 /* ADJUST_REG_ALLOC_ORDER is a macro which permits reg_alloc_order
    to be rearranged based on a particular function.  When using sse math,
@@ -1068,6 +1072,8 @@ enum target_cpu_default
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode	\
    || (MODE) == V16SFmode)
 
+#define VALID_MASK_REG_MODE(MODE) ((MODE) == HImode || (MODE) == QImode)
+
 /* Value is 1 if hard register REGNO can hold a value of machine-mode MODE.  */
 
 #define HARD_REGNO_MODE_OK(REGNO, MODE)	\
@@ -1093,8 +1099,10 @@ enum target_cpu_default
   (CC_REGNO_P (REGNO) ? VOIDmode					\
    : (MODE) == VOIDmode && (NREGS) != 1 ? VOIDmode			\
    : (MODE) == VOIDmode ? choose_hard_reg_mode ((REGNO), (NREGS), false) \
-   : (MODE) == HImode && !TARGET_PARTIAL_REG_STALL ? SImode		\
-   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)) ? SImode	\
+   : (MODE) == HImode && !(TARGET_PARTIAL_REG_STALL			\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
+   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)		\
+			   || MASK_REGNO_P (REGNO)) ? SImode		\
    : (MODE))
 
 /* The only ABI that saves SSE registers across calls is Win64 (thus no
@@ -1141,6 +1149,9 @@ enum target_cpu_default
 #define FIRST_EXT_REX_SSE_REG  (LAST_REX_SSE_REG + 1) /*53*/
 #define LAST_EXT_REX_SSE_REG   (FIRST_EXT_REX_SSE_REG + 15) /*68*/
 
+#define FIRST_MASK_REG  (LAST_EXT_REX_SSE_REG + 1) /*69*/
+#define LAST_MASK_REG   (FIRST_MASK_REG + 7) /*76*/
+
 /* Override this in other tm.h files to cope with various OS lossage
    requiring a frame pointer.  */
 #ifndef SUBTARGET_FRAME_POINTER_REQUIRED
@@ -1229,6 +1240,8 @@ enum reg_class
   FLOAT_INT_REGS,
   INT_SSE_REGS,
   FLOAT_INT_SSE_REGS,
+  MASK_EVEX_REGS,
+  MASK_REGS,
   ALL_REGS, LIM_REG_CLASSES
 };
 
@@ -1250,6 +1263,8 @@ enum reg_class
   reg_classes_intersect_p ((CLASS), ALL_SSE_REGS)
 #define MAYBE_MMX_CLASS_P(CLASS) \
   reg_classes_intersect_p ((CLASS), MMX_REGS)
+#define MAYBE_MASK_CLASS_P(CLASS) \
+  reg_classes_intersect_p ((CLASS), MASK_REGS)
 
 #define Q_CLASS_P(CLASS) \
   reg_class_subset_p ((CLASS), Q_REGS)
@@ -1282,6 +1297,8 @@ enum reg_class
    "FLOAT_INT_REGS",			\
    "INT_SSE_REGS",			\
    "FLOAT_INT_SSE_REGS",		\
+   "MASK_EVEX_REGS",			\
+   "MASK_REGS",				\
    "ALL_REGS" }
 
 /* Define which registers fit in which classes.  This is an initializer
@@ -1319,7 +1336,9 @@ enum reg_class
 {   0x11ffff,    0x1fe0,   0x0 },       /* FLOAT_INT_REGS */            \
 { 0x1ff100ff,0xffffffe0,  0x1f },       /* INT_SSE_REGS */              \
 { 0x1ff1ffff,0xffffffe0,  0x1f },       /* FLOAT_INT_SSE_REGS */        \
-{ 0xffffffff,0xffffffff,  0x1f }                                        \
+       { 0x0,       0x0,0x1fc0 },       /* MASK_EVEX_REGS */           \
+       { 0x0,       0x0,0x1fe0 },       /* MASK_REGS */                 \
+{ 0xffffffff,0xffffffff,0x1fff }                                        \
 }
 
 /* The same information, inverted:
@@ -1377,6 +1396,8 @@ enum reg_class
          : (N) <= LAST_REX_SSE_REG ? (FIRST_REX_SSE_REG + (N) - 8) \
                                    : (FIRST_EXT_REX_SSE_REG + (N) - 16))
 
+#define MASK_REGNO_P(N) IN_RANGE ((N), FIRST_MASK_REG, LAST_MASK_REG)
+#define ANY_MASK_REG_P(X) (REG_P (X) && MASK_REGNO_P (REGNO (X)))
 
 #define SSE_FLOAT_MODE_P(MODE) \
   ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
@@ -1933,7 +1954,8 @@ do {							\
  "xmm16", "xmm17", "xmm18", "xmm19",					\
  "xmm20", "xmm21", "xmm22", "xmm23",					\
  "xmm24", "xmm25", "xmm26", "xmm27",					\
- "xmm28", "xmm29", "xmm30", "xmm31" }
+ "xmm28", "xmm29", "xmm30", "xmm31",					\
+ "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7" }
 
 #define REGISTER_NAMES HI_REGISTER_NAMES
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 3307b08..013673a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -328,6 +328,14 @@
    (XMM29_REG			66)
    (XMM30_REG			67)
    (XMM31_REG			68)
+   (MASK0_REG			69)
+   (MASK1_REG			70)
+   (MASK2_REG			71)
+   (MASK3_REG			72)
+   (MASK4_REG			73)
+   (MASK5_REG			74)
+   (MASK6_REG			75)
+   (MASK7_REG			76)
   ])
 
 ;; Insns whose names begin with "x86_" are emitted by gen_FOO calls
@@ -360,7 +368,7 @@
    sseishft,sseishft1,ssecmp,ssecomi,
    ssecvt,ssecvt1,sseicvt,sseins,
    sseshuf,sseshuf1,ssemuladd,sse4arg,
-   lwp,
+   lwp,mskmov,msklog,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -379,7 +387,7 @@
 			  ssemul,sseimul,ssediv,sselog,sselog1,
 			  sseishft,sseishft1,ssecmp,ssecomi,
 			  ssecvt,ssecvt1,sseicvt,sseins,
-			  sseshuf,sseshuf1,ssemuladd,sse4arg")
+			  sseshuf,sseshuf1,ssemuladd,sse4arg,mskmov")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "mmx")
@@ -390,7 +398,7 @@
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
-			  bitmanip,imulx")
+			  bitmanip,imulx,msklog,mskmov")
 	   (const_int 0)
 	 (eq_attr "unit" "i387,sse,mmx")
 	   (const_int 0)
@@ -451,7 +459,7 @@
 ;; Set when 0f opcode prefix is used.
 (define_attr "prefix_0f" ""
   (if_then_else
-    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip")
+    (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip,msklog,mskmov")
 	 (eq_attr "unit" "sse,mmx"))
     (const_int 1)
     (const_int 0)))
@@ -651,7 +659,7 @@
 		   fmov,fcmp,fsgn,
 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
 		   sselog1,sseshuf1,sseadd1,sseiadd1,sseishft1,
-		   mmx,mmxmov,mmxcmp,mmxcvt")
+		   mmx,mmxmov,mmxcmp,mmxcvt,mskmov,msklog")
 	      (match_operand 2 "memory_operand"))
 	   (const_string "load")
 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
@@ -695,7 +703,7 @@
 ;; Used to control the "enabled" attribute on a per-instruction basis.
 (define_attr "isa" "base,x64,x64_sse4,x64_sse4_noavx,x64_avx,nox64,
 		    sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx,
-		    avx2,noavx2,bmi2,fma4,fma,avx512f,noavx512f,fma_avx512f"
+		    avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,fma_avx512f"
   (const_string "base"))
 
 (define_attr "enabled" ""
@@ -718,6 +726,7 @@
 	 (eq_attr "isa" "noavx") (symbol_ref "!TARGET_AVX")
 	 (eq_attr "isa" "avx2") (symbol_ref "TARGET_AVX2")
 	 (eq_attr "isa" "noavx2") (symbol_ref "!TARGET_AVX2")
+	 (eq_attr "isa" "bmi") (symbol_ref "TARGET_BMI")
 	 (eq_attr "isa" "bmi2") (symbol_ref "TARGET_BMI2")
 	 (eq_attr "isa" "fma4") (symbol_ref "TARGET_FMA4")
 	 (eq_attr "isa" "fma") (symbol_ref "TARGET_FMA")
@@ -2213,8 +2222,8 @@
 	   (const_string "SI")))])
 
 (define_insn "*movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m")
-	(match_operand:HI 1 "general_operand"	   "r ,rn,rm,rn"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,Yk,Yk,rm")
+	(match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,rm,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2223,6 +2232,16 @@
       /* movzwl is faster than movw on p2 due to partial word stalls,
 	 though not as fast as an aligned movl.  */
       return "movz{wl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 4: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 5: return "kmovw\t{%1, %0|%0, %1}";
+	case 6: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2240,11 +2259,17 @@
 	    (and (eq_attr "alternative" "1,2")
 		 (match_operand:HI 1 "aligned_operand"))
 	      (const_string "imov")
+	    (eq_attr "alternative" "4,5,6")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "0,2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+    (set (attr "prefix")
+      (if_then_else (eq_attr "alternative" "4,5,6")
+	(const_string "vex")
+	(const_string "orig")))
     (set (attr "mode")
       (cond [(eq_attr "type" "imovx")
 	       (const_string "SI")
@@ -2269,8 +2294,8 @@
 ;; register stall machines with, where we use QImode instructions, since
 ;; partial register stall can be caused there.  Then we use movzx.
 (define_insn "*movqi_internal"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
-	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m ,Yk,Yk,r")
+	(match_operand:QI 1 "general_operand"      "q ,qn,qm,q,rn,qm,qn,r ,Yk,Yk"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -2278,6 +2303,16 @@
     case TYPE_IMOVX:
       gcc_assert (ANY_QI_REG_P (operands[1]) || MEM_P (operands[1]));
       return "movz{bl|x}\t{%1, %k0|%k0, %1}";
+
+    case TYPE_MSKMOV:
+      switch (which_alternative)
+        {
+	case 7: return "kmovw\t{%k1, %0|%0, %k1}";
+	case 8: return "kmovw\t{%1, %0|%0, %1}";
+	case 9: return "kmovw\t{%1, %k0|%k0, %1}";
+	default: gcc_unreachable ();
+	}
+
     default:
       if (get_attr_mode (insn) == MODE_SI)
         return "mov{l}\t{%k1, %k0|%k0, %k1}";
@@ -2297,11 +2332,17 @@
 	      (const_string "imov")
 	    (eq_attr "alternative" "3,5")
 	      (const_string "imovx")
+	    (eq_attr "alternative" "7,8,9")
+	      (const_string "mskmov")
 	    (and (match_test "TARGET_MOVX")
 		 (eq_attr "alternative" "2"))
 	      (const_string "imovx")
 	   ]
 	   (const_string "imov")))
+   (set (attr "prefix")
+     (if_then_else (eq_attr "alternative" "7,8,9")
+       (const_string "vex")
+       (const_string "orig")))
    (set (attr "mode")
       (cond [(eq_attr "alternative" "3,4,5")
 	       (const_string "SI")
@@ -7494,6 +7535,26 @@
   operands[3] = gen_lowpart (QImode, operands[3]);
 })
 
+(define_split
+  [(set (match_operand:SWI12 0 "mask_reg_operand")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand")
+			 (match_operand:SWI12 2 "mask_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F && reload_completed"
+  [(set (match_dup 0)
+	(any_logic:SWI12 (match_dup 1)
+			 (match_dup 2)))])
+
+(define_insn "*k<logic><mode>"
+  [(set (match_operand:SWI12 0 "mask_reg_operand" "=Yk")
+	(any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand" "Yk")
+			 (match_operand:SWI12 2 "mask_reg_operand" "Yk")))]
+  "TARGET_AVX512F"
+  "k<logic>w\t{%2, %1, %0|%0, %1, %2}";
+  [(set_attr "mode" "<MODE>")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; %%% This used to optimize known byte-wide and operations to memory,
 ;; and sometimes to QImode registers.  If this is considered useful,
 ;; it should be done with splitters.
@@ -7617,9 +7678,9 @@
    (set_attr "mode" "SI")])
 
 (define_insn "*andhi_1"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya")
-	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm")
-		(match_operand:HI 2 "general_operand" "rn,rm,L")))
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,Ya,!Yk")
+	(and:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,qm,Yk")
+		(match_operand:HI 2 "general_operand" "rn,rm,L,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (AND, HImode, operands)"
 {
@@ -7628,34 +7689,38 @@
     case TYPE_IMOVX:
       return "#";
 
+    case TYPE_MSKLOG:
+      return "kandw\t{%2, %1, %0|%0, %1, %2}";
+
     default:
       gcc_assert (rtx_equal_p (operands[0], operands[1]));
       return "and{w}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "type" "alu,alu,imovx")
-   (set_attr "length_immediate" "*,*,0")
+  [(set_attr "type" "alu,alu,imovx,msklog")
+   (set_attr "length_immediate" "*,*,0,*")
    (set (attr "prefix_rex")
      (if_then_else
        (and (eq_attr "type" "imovx")
 	    (match_operand 1 "ext_QIreg_operand"))
        (const_string "1")
        (const_string "*")))
-   (set_attr "mode" "HI,HI,SI")])
+   (set_attr "mode" "HI,HI,SI,HI")])
 
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*andqi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
-	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		(match_operand:QI 2 "general_operand" "qn,qmn,rn")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,!Yk")
+	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
+		(match_operand:QI 2 "general_operand" "qn,qmn,rn,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (AND, QImode, operands)"
   "@
    and{b}\t{%2, %0|%0, %2}
    and{b}\t{%2, %0|%0, %2}
-   and{l}\t{%k2, %k0|%k0, %k2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI,QI,SI")])
+   and{l}\t{%k2, %k0|%k0, %k2}
+   kandw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,alu,msklog")
+   (set_attr "mode" "QI,QI,SI,HI")])
 
 (define_insn "*andqi_1_slp"
   [(set (strict_low_part (match_operand:QI 0 "nonimmediate_operand" "+qm,q"))
@@ -7668,6 +7733,40 @@
   [(set_attr "type" "alu1")
    (set_attr "mode" "QI")])
 
+(define_insn "kandn<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!Yk")
+	(and:SWI12
+	  (not:SWI12
+	    (match_operand:SWI12 1 "register_operand" "r,0,Yk"))
+	  (match_operand:SWI12 2 "register_operand" "r,r,Yk")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F"
+  "@
+   andn\t{%k2, %k1, %k0|%k0, %k1, %k2}
+   #
+   kandnw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "bmi,*,avx512f")
+   (set_attr "type" "bitmanip,*,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "btver2_decode" "direct,*,*")
+   (set_attr "mode" "<MODE>")])
+
+(define_split
+  [(set (match_operand:SWI12 0 "general_reg_operand")
+	(and:SWI12
+	  (not:SWI12
+	    (match_dup 0))
+	  (match_operand:SWI12 1 "general_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F && !TARGET_BMI && reload_completed"
+  [(set (match_dup 0)
+	(not:HI (match_dup 0)))
+   (parallel [(set (match_dup 0)
+		   (and:HI (match_dup 0)
+			   (match_dup 1)))
+	      (clobber (reg:CC FLAGS_REG))])]
+  "")
+
 ;; Turn *anddi_1 into *andsi_1_zext if possible.
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -7999,29 +8098,44 @@
   "ix86_expand_binary_operator (<CODE>, <MODE>mode, operands); DONE;")
 
 (define_insn "*<code><mode>_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=r,rm")
-	(any_or:SWI248
-	 (match_operand:SWI248 1 "nonimmediate_operand" "%0,0")
-	 (match_operand:SWI248 2 "<general_operand>" "<g>,r<i>")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,rm")
+	(any_or:SWI48
+	 (match_operand:SWI48 1 "nonimmediate_operand" "%0,0")
+	 (match_operand:SWI48 2 "<general_operand>" "<g>,r<i>")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "<logic>{<imodesuffix>}\t{%2, %0|%0, %2}"
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*<code>hi_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,rm,!Yk")
+	(any_or:HI
+	 (match_operand:HI 1 "nonimmediate_operand" "%0,0,Yk")
+	 (match_operand:HI 2 "general_operand" "<g>,r<i>,Yk")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (<CODE>, HImode, operands)"
+  "@
+  <logic>{w}\t{%2, %0|%0, %2}
+  <logic>{w}\t{%2, %0|%0, %2}
+  k<logic>w\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,msklog")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 2.  What to do?
 (define_insn "*<code>qi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m,r")
-	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		   (match_operand:QI 2 "general_operand" "qmn,qn,rn")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=q,m,r,!Yk")
+	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,Yk")
+		   (match_operand:QI 2 "general_operand" "qmn,qn,rn,Yk")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, QImode, operands)"
   "@
    <logic>{b}\t{%2, %0|%0, %2}
    <logic>{b}\t{%2, %0|%0, %2}
-   <logic>{l}\t{%k2, %k0|%k0, %k2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI,QI,SI")])
+   <logic>{l}\t{%k2, %k0|%k0, %k2}
+   k<logic>w\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "alu,alu,alu,msklog")
+   (set_attr "mode" "QI,QI,SI,HI")])
 
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 (define_insn "*<code>si_1_zext"
@@ -8071,6 +8185,74 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "kxnor<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,!Yk")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_operand:SWI12 1 "register_operand" "0,Yk")
+	    (match_operand:SWI12 2 "register_operand" "r,Yk"))))]
+  "TARGET_AVX512F"
+  "@
+   #
+   kxnorw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "*,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "<MODE>")])
+
+(define_split
+  [(set (match_operand:SWI12 0 "general_reg_operand")
+	(not:SWI12
+	  (xor:SWI12
+	    (match_dup 0)
+	    (match_operand:SWI12 1 "general_reg_operand"))))]
+  "TARGET_AVX512F && reload_completed"
+   [(parallel [(set (match_dup 0)
+		    (xor:HI (match_dup 0)
+			    (match_dup 1)))
+	       (clobber (reg:CC FLAGS_REG))])
+    (set (match_dup 0)
+	 (not:HI (match_dup 0)))]
+  "")
+
+(define_insn "kortestzhi"
+  [(set (reg:CCZ FLAGS_REG)
+	(compare:CCZ
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int 0)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kortestchi"
+  [(set (reg:CCC FLAGS_REG)
+	(compare:CCC
+	  (ior:HI
+	    (match_operand:HI 0 "register_operand" "Yk")
+	    (match_operand:HI 1 "register_operand" "Yk"))
+	  (const_int -1)))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCCmode)"
+  "kortestw\t{%1, %0|%0, %1}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
+(define_insn "kunpckhi"
+  [(set (match_operand:HI 0 "register_operand" "=Yk")
+	(ior:HI
+	  (ashift:HI
+	    (match_operand:HI 1 "register_operand" "Yk")
+	    (const_int 8))
+	  (zero_extend:HI (match_operand:QI 2 "register_operand" "Yk"))))]
+  "TARGET_AVX512F"
+  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mode" "HI")
+   (set_attr "type" "msklog")
+   (set_attr "prefix" "vex")])
+
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 ;; ??? Special case for immediate operand is missing - it is tricky.
 (define_insn "*<code>si_2_zext"
@@ -8640,23 +8822,38 @@
   "ix86_expand_unary_operator (NOT, <MODE>mode, operands); DONE;")
 
 (define_insn "*one_cmpl<mode>2_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
-	(not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm")
+	(not:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0")))]
   "ix86_unary_operator_ok (NOT, <MODE>mode, operands)"
   "not{<imodesuffix>}\t%0"
   [(set_attr "type" "negnot")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*one_cmplhi2_1"
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,!Yk")
+	(not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))]
+  "ix86_unary_operator_ok (NOT, HImode, operands)"
+  "@
+   not{w}\t%0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,avx512f")
+   (set_attr "type" "negnot,msklog")
+   (set_attr "prefix" "*,vex")
+   (set_attr "mode" "HI")])
+
 ;; %%% Potential partial reg stall on alternative 1.  What to do?
 (define_insn "*one_cmplqi2_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
-	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,!Yk")
+	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,Yk")))]
   "ix86_unary_operator_ok (NOT, QImode, operands)"
   "@
    not{b}\t%0
-   not{l}\t%k0"
-  [(set_attr "type" "negnot")
-   (set_attr "mode" "QI,SI")])
+   not{l}\t%k0
+   knotw\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,*,avx512f")
+   (set_attr "type" "negnot,negnot,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "mode" "QI,SI,QI")])
 
 ;; ??? Currently never generated - xor is used instead.
 (define_insn "*one_cmplsi2_1_zext"
@@ -16423,11 +16620,11 @@
 })
 
 ;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.
-
+;; Do not split instructions with mask registers.
 (define_split
-  [(set (match_operand 0 "register_operand")
+  [(set (match_operand 0 "general_reg_operand")
 	(match_operator 3 "promotable_binary_operator"
-	   [(match_operand 1 "register_operand")
+	   [(match_operand 1 "general_reg_operand")
 	    (match_operand 2 "aligned_operand")]))
    (clobber (reg:CC FLAGS_REG))]
   "! TARGET_PARTIAL_REG_STALL && reload_completed
@@ -16522,9 +16719,10 @@
   operands[1] = gen_lowpart (SImode, operands[1]);
 })
 
+;; Do not split instructions with mask regs.
 (define_split
-  [(set (match_operand 0 "register_operand")
-	(not (match_operand 1 "register_operand")))]
+  [(set (match_operand 0 "general_reg_operand")
+	(not (match_operand 1 "general_reg_operand")))]
   "! TARGET_PARTIAL_REG_STALL && reload_completed
    && (GET_MODE (operands[0]) == HImode
        || (GET_MODE (operands[0]) == QImode
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 3959c38..18f425c 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -32,6 +32,11 @@
   (and (match_code "reg")
        (not (match_test "ANY_FP_REGNO_P (REGNO (op))"))))
 
+;; True if the operand is a GENERAL class register.
+(define_predicate "general_reg_operand"
+  (and (match_code "reg")
+       (match_test "GENERAL_REG_P (op)")))
+
 ;; Return true if OP is a register operand other than an i387 fp register.
 (define_predicate "register_and_not_fp_reg_operand"
   (and (match_code "reg")
@@ -52,6 +57,10 @@
   (and (match_code "reg")
        (match_test "EXT_REX_SSE_REGNO_P (REGNO (op))")))
 
+;; True if the operand is an AVX-512 mask register.
+(define_predicate "mask_reg_operand"
+  (and (match_code "reg")
+       (match_test "MASK_REGNO_P (REGNO (op))")))
 
 ;; True if the operand is a Q_REGS class register.
 (define_predicate "q_regs_operand"
-- 
1.7.11.7

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-09-10 18:38                                               ` Kirill Yukhin
@ 2013-09-10 19:16                                                 ` Richard Henderson
  2013-09-11  7:38                                                   ` Kirill Yukhin
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2013-09-10 19:16 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

On 09/10/2013 11:25 AM, Kirill Yukhin wrote:
> Hello,
> On 10 Sep 09:17, Richard Henderson wrote:
>> On 09/10/2013 05:57 AM, Kirill Yukhin wrote:
>>> +  { OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi_1, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) HI_FTYPE_HI_HI },
>>
>> Alternately, why not use the standard CODE_FOR_andhi3 expander?
> 
> Great point! Thanks, fixed.
> gcc.target/i386/avx512f-k* tests still pass. Bootstrap passes.
> 
> Is it ok now?


Yes.


r~

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH i386 2/8] [AVX512] Add mask registers.
  2013-09-10 19:16                                                 ` Richard Henderson
@ 2013-09-11  7:38                                                   ` Kirill Yukhin
  0 siblings, 0 replies; 31+ messages in thread
From: Kirill Yukhin @ 2013-09-11  7:38 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vladimir Makarov, Uros Bizjak, Jakub Jelinek, GCC Patches

Hello,
On 10 Sep 11:51, Richard Henderson wrote:
> On 09/10/2013 11:25 AM, Kirill Yukhin wrote:
> > Is it ok now?
> 
> 
> Yes.
Thanks a lot!
Checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00354.html

--
Thanks, K

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2013-09-11  7:34 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-31 15:31 [PATCH i386 2/8] [AVX512] Add mask registers Kirill Yukhin
2013-07-31 16:34 ` Richard Henderson
     [not found]   ` <20130805180659.GA60466@msticlxl57.ims.intel.com>
     [not found]     ` <52000316.5000508@redhat.com>
     [not found]       ` <20130806135722.GA16463@msticlxl57.ims.intel.com>
2013-08-06 19:11         ` avx512 mask register spilling Richard Henderson
2013-08-07 15:55           ` Vladimir Makarov
2013-08-06 19:43         ` [PATCH i386 2/8] [AVX512] Add mask registers Richard Henderson
2013-08-07 20:03           ` Kirill Yukhin
2013-08-14  7:24             ` Kirill Yukhin
2013-08-19 21:21               ` Richard Henderson
2013-08-22  9:43                 ` Kirill Yukhin
2013-08-22 15:52                   ` Richard Henderson
2013-08-26 16:18                     ` Kirill Yukhin
2013-08-26 16:40                       ` Richard Henderson
2013-08-27 18:22                         ` Kirill Yukhin
2013-08-27 18:41                           ` Kirill Yukhin
2013-08-27 20:17                           ` Richard Henderson
2013-08-28 17:55                             ` Kirill Yukhin
2013-08-28 18:07                               ` Richard Henderson
2013-08-28 20:06                                 ` Kirill Yukhin
2013-08-28 20:41                                   ` Richard Henderson
2013-08-28 22:07                                     ` Kirill Yukhin
2013-08-29 12:19                                       ` Kirill Yukhin
2013-09-04 18:46                                         ` Kirill Yukhin
2013-09-09 11:29                                           ` Kirill Yukhin
2013-09-06 13:49                                         ` Kirill Yukhin
2013-09-09 17:54                                         ` Richard Henderson
2013-09-10 13:11                                           ` Kirill Yukhin
2013-09-10 16:18                                             ` Richard Henderson
2013-09-10 16:49                                             ` Richard Henderson
2013-09-10 18:38                                               ` Kirill Yukhin
2013-09-10 19:16                                                 ` Richard Henderson
2013-09-11  7:38                                                   ` Kirill Yukhin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).