public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 00/17] Support Intel APX NDD
@ 2023-12-05  2:29 Hongyu Wang
  2023-12-05  2:29 ` [PATCH 01/17] [APX NDD] Support Intel APX NDD for legacy add insn Hongyu Wang
                   ` (17 more replies)
  0 siblings, 18 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

Hi,

APX NDD patches have been posted at
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636604.html

Thanks to Hongtao's review, the V2 patches add support for zext semantics with
memory input, since NDD by default clears the upper bits of dest for any
operand size.

We also support TImode shift with new split helper functions, which allow the
NDD form split but still restrict memory src usage, because in the post-reload
splitter the number of registers is restricted and no new register can be used
for shld/shrd.
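
For readers unfamiliar with double-word shifts, the following is a minimal
sketch of what a TImode left shift computes once it is split into two 64-bit
halves. It is an illustration only and is not taken from the patches; the
names u128_parts and shl128 are hypothetical, and only shift counts strictly
between 0 and 64 are handled.

```c
#include <stdint.h>

/* Illustration only: a 128-bit value held as two 64-bit halves.  The names
   u128_parts and shl128 are hypothetical, not GCC identifiers.  */
struct u128_parts { uint64_t lo, hi; };

/* Left shift by n, valid only for 0 < n < 64.  The high half receives the
   bits shifted out of the low half, which is the job shld does.  */
struct u128_parts
shl128 (struct u128_parts x, unsigned n)
{
  struct u128_parts r;
  r.hi = (x.hi << n) | (x.lo >> (64 - n));  /* shld-like step */
  r.lo = x.lo << n;                         /* plain shl on the low half */
  return r;
}
```

The high-half step reads both halves, which is why the split needs both source
registers available at that point.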

Also fixed several typos, formatting issues, and redundant code.

Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.

OK for trunk?

Hongyu Wang (8):
  [APX NDD] Restrict TImode register usage when NDD enabled
  [APX NDD] Disable seg_prefixed memory usage for NDD add
  [APX NDD] Support APX NDD for left shift insns
  [APX NDD] Support APX NDD for right shift insns
  [APX NDD] Support APX NDD for rotate insns
  [APX NDD] Support APX NDD for shld/shrd insns
  [APX NDD] Support APX NDD for cmove insns
  [APX NDD] Support TImode shift for NDD

Kong Lingling (9):
  [APX NDD] Support Intel APX NDD for legacy add insn
  [APX NDD] Support APX NDD for optimization patterns of add
  [APX NDD] Support APX NDD for adc insns
  [APX NDD] Support APX NDD for sub insns
  [APX NDD] Support APX NDD for sbb insn
  [APX NDD] Support APX NDD for neg insn
  [APX NDD] Support APX NDD for not insn
  [APX NDD] Support APX NDD for and insn
  [APX NDD] Support APX NDD for or/xor insn

 gcc/config/i386/constraints.md                |    5 +
 gcc/config/i386/i386-expand.cc                |  164 +-
 gcc/config/i386/i386-options.cc               |    2 +
 gcc/config/i386/i386-protos.h                 |   16 +-
 gcc/config/i386/i386.cc                       |   40 +-
 gcc/config/i386/i386.md                       | 2323 +++++++++++------
 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c   |   15 +
 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c  |   16 +
 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c   |    6 +
 .../gcc.target/i386/apx-ndd-shld-shrd.c       |   24 +
 .../gcc.target/i386/apx-ndd-ti-shift.c        |   91 +
 gcc/testsuite/gcc.target/i386/apx-ndd.c       |  202 ++
 12 files changed, 2149 insertions(+), 755 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd.c

-- 
2.31.1



* [PATCH 01/17] [APX NDD] Support Intel APX NDD for legacy add insn
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05  2:29 ` [PATCH 02/17] [APX NDD] Restrict TImode register usage when NDD enabled Hongyu Wang
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

APX NDD provides an extra destination register operand for several
GPR-related legacy insns, so a new alternative with an "r" constraint can
be added for operand 1.

This first patch supports NDD for the add instruction, and keeps using lea
when all operands are registers since lea has a shorter encoding. For
add operations with a memory operand, NDD is used to save an extra move.
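
As a rough illustration of the two cases described above, consider the C
functions below. They are a sketch, not part of the patch. The assembly
mentioned in the comments is an assumed shape derived from the
"{%2, %1, %0}" output template, not the output of a real compiler run, and
the function names are hypothetical.

```c
/* Illustration only; the assembly in the comments is an assumed shape of
   the output, not taken from a real compiler run.  */

/* Memory source: the legacy form needs a load (movl) followed by addl.
   An NDD add can read memory and write a separate destination register,
   e.g. something like "addl %esi, (%rdi), %eax".  */
int add_mem_reg (int *a, int b)
{
  return *a + b;
}

/* All-register case: per the description above, lea is kept here.  */
int add_reg_reg (int a, int b)
{
  return a + b;
}
```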

The legacy x86 binary operation expander forces operands[0] and
operands[1] to be the same, so add a use_ndd flag to the expand helpers to
allow NDD form patterns in which operands[0] and operands[1] can differ.
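
The operand rule can be modelled in isolation as below. This is a hypothetical
stand-alone model, not GCC code: src1_needs_reg_copy and its boolean
parameters are invented for illustration, and stand in for the MEM_P and
rtx_equal_p checks that the patch guards with use_ndd.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC code: decide whether src1 must first be
   copied into a register before a binary operation is emitted.  */
bool src1_needs_reg_copy (bool src1_is_mem, bool dst_equals_src1,
                          bool use_ndd)
{
  /* Legacy two-operand form: a memory src1 is only usable when it is
     also the destination.  The NDD form lifts that restriction.  */
  return !use_ndd && src1_is_mem && !dst_equals_src1;
}
```

With use_ndd set, a non-matching memory src1 is left in place, which is where
the saved move comes from.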

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_fixup_binary_operands): Add
	new use_ndd flag to check whether ndd can be used for this binop
	and adjust operand emit.
	(ix86_binary_operator_ok): Likewise.
	(ix86_expand_binary_operator): Likewise, and avoid generating lea
	pattern in postreload expand when use_ndd is explicitly passed.
	* config/i386/i386-options.cc (ix86_option_override_internal):
	Prohibit apx subfeatures when not in 64bit mode.
	* config/i386/i386-protos.h (ix86_binary_operator_ok):
	Add use_ndd flag.
	(ix86_fixup_binary_operands): Likewise.
	(ix86_expand_binary_operator): Likewise.
	* config/i386/i386.md (*add<mode>_1): Extend with new alternatives
	to support NDD, and adjust output template.
	(*addhi_1): Likewise.
	(*addqi_1): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: New test.
---
 gcc/config/i386/i386-expand.cc          |  19 ++---
 gcc/config/i386/i386-options.cc         |   2 +
 gcc/config/i386/i386-protos.h           |   6 +-
 gcc/config/i386/i386.md                 | 102 ++++++++++++++----------
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  21 +++++
 5 files changed, 96 insertions(+), 54 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 4bd7d4f39c8..3ecda989cf8 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1260,14 +1260,14 @@ ix86_swap_binary_operands_p (enum rtx_code code, machine_mode mode,
   return false;
 }
 
-
 /* Fix up OPERANDS to satisfy ix86_binary_operator_ok.  Return the
    destination to use for the operation.  If different from the true
-   destination in operands[0], a copy operation will be required.  */
+   destination in operands[0], a copy operation will be required except
+   under TARGET_APX_NDD.  */
 
 rtx
 ix86_fixup_binary_operands (enum rtx_code code, machine_mode mode,
-			    rtx operands[])
+			    rtx operands[], bool use_ndd)
 {
   rtx dst = operands[0];
   rtx src1 = operands[1];
@@ -1307,7 +1307,7 @@ ix86_fixup_binary_operands (enum rtx_code code, machine_mode mode,
     src1 = force_reg (mode, src1);
 
   /* Source 1 cannot be a non-matching memory.  */
-  if (MEM_P (src1) && !rtx_equal_p (dst, src1))
+  if (!use_ndd && MEM_P (src1) && !rtx_equal_p (dst, src1))
     src1 = force_reg (mode, src1);
 
   /* Improve address combine.  */
@@ -1338,11 +1338,11 @@ ix86_fixup_binary_operands_no_copy (enum rtx_code code,
 
 void
 ix86_expand_binary_operator (enum rtx_code code, machine_mode mode,
-			     rtx operands[])
+			     rtx operands[], bool use_ndd)
 {
   rtx src1, src2, dst, op, clob;
 
-  dst = ix86_fixup_binary_operands (code, mode, operands);
+  dst = ix86_fixup_binary_operands (code, mode, operands, use_ndd);
   src1 = operands[1];
   src2 = operands[2];
 
@@ -1352,7 +1352,8 @@ ix86_expand_binary_operator (enum rtx_code code, machine_mode mode,
 
   if (reload_completed
       && code == PLUS
-      && !rtx_equal_p (dst, src1))
+      && !rtx_equal_p (dst, src1)
+      && !use_ndd)
     {
       /* This is going to be an LEA; avoid splitting it later.  */
       emit_insn (op);
@@ -1451,7 +1452,7 @@ ix86_expand_vector_logical_operator (enum rtx_code code, machine_mode mode,
 
 bool
 ix86_binary_operator_ok (enum rtx_code code, machine_mode mode,
-			 rtx operands[3])
+			 rtx operands[3], bool use_ndd)
 {
   rtx dst = operands[0];
   rtx src1 = operands[1];
@@ -1475,7 +1476,7 @@ ix86_binary_operator_ok (enum rtx_code code, machine_mode mode,
     return false;
 
   /* Source 1 cannot be a non-matching memory.  */
-  if (MEM_P (src1) && !rtx_equal_p (dst, src1))
+  if (!use_ndd && MEM_P (src1) && !rtx_equal_p (dst, src1))
     /* Support "andhi/andsi/anddi" as a zero-extending move.  */
     return (code == AND
 	    && (mode == HImode
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 877659229d2..27f078790e7 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -2129,6 +2129,8 @@ ix86_option_override_internal (bool main_args_p,
 
   if (TARGET_APX_F && !TARGET_64BIT)
     error ("%<-mapxf%> is not supported for 32-bit code");
+  else if (opts->x_ix86_apx_features != apx_none && !TARGET_64BIT)
+    error ("%<-mapx-features=%> option is not supported for 32-bit code");
 
   if (TARGET_UINTR && !TARGET_64BIT)
     error ("%<-muintr%> not supported for 32-bit code");
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 28d0eab11d5..a9d0c568bba 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -108,14 +108,14 @@ extern void ix86_expand_move (machine_mode, rtx[]);
 extern void ix86_expand_vector_move (machine_mode, rtx[]);
 extern void ix86_expand_vector_move_misalign (machine_mode, rtx[]);
 extern rtx ix86_fixup_binary_operands (enum rtx_code,
-				       machine_mode, rtx[]);
+				       machine_mode, rtx[], bool = false);
 extern void ix86_fixup_binary_operands_no_copy (enum rtx_code,
 						machine_mode, rtx[]);
 extern void ix86_expand_binary_operator (enum rtx_code,
-					 machine_mode, rtx[]);
+					 machine_mode, rtx[], bool = false);
 extern void ix86_expand_vector_logical_operator (enum rtx_code,
 						 machine_mode, rtx[]);
-extern bool ix86_binary_operator_ok (enum rtx_code, machine_mode, rtx[3]);
+extern bool ix86_binary_operator_ok (enum rtx_code, machine_mode, rtx[3], bool = false);
 extern bool ix86_avoid_lea_for_add (rtx_insn *, rtx[]);
 extern bool ix86_use_lea_for_mov (rtx_insn *, rtx[]);
 extern bool ix86_avoid_lea_for_addr (rtx_insn *, rtx[]);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7641b479670..cb227d19f40 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -562,7 +562,7 @@ (define_attr "unit" "integer,i387,sse,mmx,unknown"
 
 ;; Used to control the "enabled" attribute on a per-instruction basis.
 (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
-		    x64_avx,x64_avx512bw,x64_avx512dq,aes,
+		    x64_avx,x64_avx512bw,x64_avx512dq,aes,apx_ndd,
 		    sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
 		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512,
 		    noavx512f,avx512bw,avx512bw_512,noavx512bw,avx512dq,
@@ -960,6 +960,8 @@ (define_attr "enabled" ""
 	   (symbol_ref "TARGET_AVX512BF16 && TARGET_AVX512VL")
 	 (eq_attr "isa" "vpclmulqdqvl")
 	   (symbol_ref "TARGET_VPCLMULQDQ && TARGET_AVX512VL")
+	 (eq_attr "isa" "apx_ndd")
+	   (symbol_ref "TARGET_APX_NDD")
 
 	 (eq_attr "mmx_isa" "native")
 	   (symbol_ref "!TARGET_MMX_WITH_SSE")
@@ -6285,7 +6287,8 @@ (define_expand "add<mode>3"
 	(plus:SDWIM (match_operand:SDWIM 1 "nonimmediate_operand")
 		    (match_operand:SDWIM 2 "<general_hilo_operand>")))]
   ""
-  "ix86_expand_binary_operator (PLUS, <MODE>mode, operands); DONE;")
+  "ix86_expand_binary_operator (PLUS, <MODE>mode, operands,
+				TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*add<dwi>3_doubleword"
   [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
@@ -6412,26 +6415,29 @@ (define_insn_and_split "*add<dwi>3_doubleword_concat_zext"
  "split_double_mode (<DWI>mode, &operands[0], 1, &operands[0], &operands[5]);")
 
 (define_insn "*add<mode>_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r")
 	(plus:SWI48
-	  (match_operand:SWI48 1 "nonimmediate_operand" "%0,0,r,r")
-	  (match_operand:SWI48 2 "x86_64_general_operand" "re,BM,0,le")))
+	  (match_operand:SWI48 1 "nonimmediate_operand" "%0,0,r,r,rm,r")
+	  (match_operand:SWI48 2 "x86_64_general_operand" "re,BM,0,le,re,BM")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)"
+  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 4 || which_alternative == 5);
   switch (get_attr_type (insn))
     {
     case TYPE_LEA:
       return "#";
 
     case TYPE_INCDEC:
-      gcc_assert (rtx_equal_p (operands[0], operands[1]));
       if (operands[2] == const1_rtx)
-        return "inc{<imodesuffix>}\t%0";
+        return use_ndd ? "inc{<imodesuffix>}\t{%1, %0|%0, %1}"
+		      : "inc{<imodesuffix>}\t%0";
       else
         {
 	  gcc_assert (operands[2] == constm1_rtx);
-          return "dec{<imodesuffix>}\t%0";
+	  return use_ndd ? "dec{<imodesuffix>}\t{%1, %0|%0, %1}"
+			: "dec{<imodesuffix>}\t%0";
 	}
 
     default:
@@ -6440,14 +6446,16 @@ (define_insn "*add<mode>_1"
       if (which_alternative == 2)
         std::swap (operands[1], operands[2]);
         
-      gcc_assert (rtx_equal_p (operands[0], operands[1]));
       if (x86_maybe_negate_const_int (&operands[2], <MODE>mode))
-        return "sub{<imodesuffix>}\t{%2, %0|%0, %2}";
+        return use_ndd ? "sub{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		      : "sub{<imodesuffix>}\t{%2, %0|%0, %2}";
 
-      return "add{<imodesuffix>}\t{%2, %0|%0, %2}";
+      return use_ndd ? "add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		    : "add{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,*,*,*,apx_ndd,apx_ndd")
+   (set (attr "type")
      (cond [(eq_attr "alternative" "3")
               (const_string "lea")
 	    (match_operand:SWI48 2 "incdec_operand")
@@ -6516,25 +6524,26 @@ (define_insn "addsi_1_zext"
    (set_attr "mode" "SI")])
 
 (define_insn "*addhi_1"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,r,Yp")
-	(plus:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,r,Yp")
-		 (match_operand:HI 2 "general_operand" "rn,m,0,ln")))
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,r,r,Yp,r,r")
+	(plus:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,r,Yp,rm,r")
+		 (match_operand:HI 2 "general_operand" "rn,m,0,ln,rn,m")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (PLUS, HImode, operands)"
+  "ix86_binary_operator_ok (PLUS, HImode, operands,
+			    TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 4 || which_alternative == 5);
   switch (get_attr_type (insn))
     {
     case TYPE_LEA:
       return "#";
 
     case TYPE_INCDEC:
-      gcc_assert (rtx_equal_p (operands[0], operands[1]));
       if (operands[2] == const1_rtx)
-	return "inc{w}\t%0";
+	return use_ndd ? "inc{w}\t{%1, %0|%0, %1}" : "inc{w}\t%0";
       else
 	{
 	  gcc_assert (operands[2] == constm1_rtx);
-	  return "dec{w}\t%0";
+	  return use_ndd ? "dec{w}\t{%1, %0|%0, %1}" : "dec{w}\t%0";
 	}
 
     default:
@@ -6543,14 +6552,16 @@ (define_insn "*addhi_1"
       if (which_alternative == 2)
         std::swap (operands[1], operands[2]);
 
-      gcc_assert (rtx_equal_p (operands[0], operands[1]));
       if (x86_maybe_negate_const_int (&operands[2], HImode))
-	return "sub{w}\t{%2, %0|%0, %2}";
+	return use_ndd ? "sub{w}\t{%2, %1, %0|%0, %1, %2}"
+		       : "sub{w}\t{%2, %0|%0, %2}";
 
-      return "add{w}\t{%2, %0|%0, %2}";
+      return use_ndd ? "add{w}\t{%2, %1, %0|%0, %1, %2}"
+		     : "add{w}\t{%2, %0|%0, %2}";
     }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,*,*,*,apx_ndd,apx_ndd")
+   (set (attr "type")
      (cond [(eq_attr "alternative" "3")
               (const_string "lea")
 	    (match_operand:HI 2 "incdec_operand")
@@ -6562,30 +6573,35 @@ (define_insn "*addhi_1"
 	(and (eq_attr "type" "alu") (match_operand 2 "const128_operand"))
 	(const_string "1")
 	(const_string "*")))
-   (set_attr "mode" "HI,HI,HI,SI")])
+   (set_attr "mode" "HI,HI,HI,SI,HI,HI")])
 
 (define_insn "*addqi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,q,r,r,Yp")
-	(plus:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,q,0,r,Yp")
-		 (match_operand:QI 2 "general_operand" "qn,m,0,rn,0,ln")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,q,r,r,Yp,r,r")
+	(plus:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,q,0,r,Yp,rm,r")
+		 (match_operand:QI 2 "general_operand" "qn,m,0,rn,0,ln,rn,m")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (PLUS, QImode, operands)"
+  "ix86_binary_operator_ok (PLUS, QImode, operands, TARGET_APX_NDD)"
 {
   bool widen = (get_attr_mode (insn) != MODE_QI);
-
+  bool use_ndd = (which_alternative == 6 || which_alternative == 7);
   switch (get_attr_type (insn))
     {
     case TYPE_LEA:
       return "#";
 
     case TYPE_INCDEC:
-      gcc_assert (rtx_equal_p (operands[0], operands[1]));
       if (operands[2] == const1_rtx)
-	return widen ? "inc{l}\t%k0" : "inc{b}\t%0";
+	if (use_ndd)
+	  return "inc{b}\t{%1, %0|%0, %1}";
+	else
+	  return widen ? "inc{l}\t%k0" : "inc{b}\t%0";
       else
 	{
 	  gcc_assert (operands[2] == constm1_rtx);
-	  return widen ? "dec{l}\t%k0" : "dec{b}\t%0";
+	  if (use_ndd)
+	    return "dec{b}\t{%1, %0|%0, %1}";
+	  else
+	    return widen ? "dec{l}\t%k0" : "dec{b}\t%0";
 	}
 
     default:
@@ -6594,21 +6610,23 @@ (define_insn "*addqi_1"
       if (which_alternative == 2 || which_alternative == 4)
         std::swap (operands[1], operands[2]);
 
-      gcc_assert (rtx_equal_p (operands[0], operands[1]));
       if (x86_maybe_negate_const_int (&operands[2], QImode))
 	{
-	  if (widen)
-	    return "sub{l}\t{%2, %k0|%k0, %2}";
+	  if (use_ndd)
+	    return "sub{b}\t{%2, %1, %0|%0, %1, %2}";
 	  else
-	    return "sub{b}\t{%2, %0|%0, %2}";
+	    return widen ? "sub{l}\t{%2, %k0|%k0, %2}"
+			 : "sub{b}\t{%2, %0|%0, %2}";
 	}
-      if (widen)
-        return "add{l}\t{%k2, %k0|%k0, %k2}";
+      if (use_ndd)
+	return "add{b}\t{%2, %1, %0|%0, %1, %2}";
       else
-        return "add{b}\t{%2, %0|%0, %2}";
+	return widen ? "add{l}\t{%k2, %k0|%k0, %k2}"
+		     : "add{b}\t{%2, %0|%0, %2}";
     }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,*,*,*,*,*,apx_ndd,apx_ndd")
+   (set (attr "type")
      (cond [(eq_attr "alternative" "5")
               (const_string "lea")
 	    (match_operand:QI 2 "incdec_operand")
@@ -6620,7 +6638,7 @@ (define_insn "*addqi_1"
 	(and (eq_attr "type" "alu") (match_operand 2 "const128_operand"))
 	(const_string "1")
 	(const_string "*")))
-   (set_attr "mode" "QI,QI,QI,SI,SI,SI")
+   (set_attr "mode" "QI,QI,QI,SI,SI,SI,QI,QI")
    ;; Potential partial reg stall on alternatives 3 and 4.
    (set (attr "preferred_for_speed")
      (cond [(eq_attr "alternative" "3,4")
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd.c b/gcc/testsuite/gcc.target/i386/apx-ndd.c
new file mode 100644
index 00000000000..056a323a647
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-mapxf -march=x86-64 -O2" } */
+/* { dg-final { scan-assembler-not "movl"} } */
+
+int foo (int *a)
+{
+  int b = *a - 1;
+  return b;
+}
+
+int foo2 (int a, int b)
+{
+  int c = a + b;
+  return c;
+}
+
+int foo3 (int *a, int b)
+{
+  int c = *a + b;
+  return c;
+}
-- 
2.31.1



* [PATCH 02/17] [APX NDD] Restrict TImode register usage when NDD enabled
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
  2023-12-05  2:29 ` [PATCH 01/17] [APX NDD] Support Intel APX NDD for legacy add insn Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05 10:46   ` Uros Bizjak
  2023-12-05  2:29 ` [PATCH 03/17] [APX NDD] Support APX NDD for optimization patterns of add Hongyu Wang
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

Under APX NDD, the previous TImode allocation is problematic because it
allocates registers as consecutive pairs that may overlap, like rax:rdi,
rdi:rdx.

This causes problems for all TImode NDD patterns. With NDD we no longer
assume that arithmetic operations like add have a dependency between dest
and src1, so the write to the 1st highpart rdi can be overridden by the write
to the 2nd lowpart rdi if the 2nd lowpart has a different src as input. The
write to the 1st highpart rdi is then lost, which causes a miscompilation.

To resolve this, under TARGET_APX_NDD we only allow registers with an even
regno to be allocated for TImode, so that TImode registers are allocated
as non-overlapping pairs.

Inline assembly may produce errors if it forcibly allocates an __int128 to an
odd-numbered general register.
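
The effect of the even-regno rule can be sketched with a simplified model.
This is an illustration only, not GCC code: timode_regno_ok and
timode_pairs_overlap are hypothetical names, the model assumes a TImode value
starting at REGNO occupies the pair (REGNO, REGNO + 1), and the non-NDD case
is reduced to "always allowed", which ignores the other checks in
ix86_hard_regno_mode_ok.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC code: a TImode value starting at REGNO
   occupies the pair (REGNO, REGNO + 1).  Under NDD only even starting
   registers are accepted.  */
bool timode_regno_ok (unsigned regno, bool apx_ndd)
{
  return apx_ndd ? regno % 2 == 0 : true;
}

/* Two distinct TImode pairs overlap when they share a register.  */
bool timode_pairs_overlap (unsigned regno_a, unsigned regno_b)
{
  return regno_a != regno_b
         && (regno_a + 1 == regno_b || regno_b + 1 == regno_a);
}
```

Two distinct even starting registers differ by at least 2, so the pairs they
start can never share a register.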

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_hard_regno_mode_ok): Only allow even
	regno for TImode if APX NDD is enabled.
---
 gcc/config/i386/i386.cc | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 93a9cb556a5..3efeed396c4 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20873,6 +20873,16 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
        return true;
       return !can_create_pseudo_p ();
     }
+  /* For TImode we previously assumed that src1/dest use the same
+     register, so the allocation of highpart/lowpart could be consecutive,
+     and 2 TImode insns could hold their low/highparts in a continuous
+     sequence like rax:rdx, rdx:rcx.  This does not work for APX_NDD, since
+     NDD allows different registers as dest/src1: a write to the 2nd lowpart
+     will impact the write to the 1st highpart, and the insn will then be
+     optimized out.  So for TImode patterns that support the NDD form, the
+     allowed register number must be even to avoid such mixed overrides.  */
+  else if (TARGET_APX_NDD && mode == TImode)
+    return regno % 2 == 0;
   /* We handle both integer and floats in the general purpose registers.  */
   else if (VALID_INT_MODE_P (mode)
 	   || VALID_FP_MODE_P (mode))
-- 
2.31.1



* [PATCH 03/17] [APX NDD] Support APX NDD for optimization patterns of add
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
  2023-12-05  2:29 ` [PATCH 01/17] [APX NDD] Support Intel APX NDD for legacy add insn Hongyu Wang
  2023-12-05  2:29 ` [PATCH 02/17] [APX NDD] Restrict TImode register usage when NDD enabled Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05 11:20   ` Uros Bizjak
  2023-12-05  2:29 ` [PATCH 04/17] [APX NDD] Disable seg_prefixed memory usage for NDD add Hongyu Wang
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

gcc/ChangeLog:

	* config/i386/i386.md (addsi_1_zext): Add new alternatives for
	NDD and adjust output templates.
	(*add<mode>_2): Likewise.
	(*addsi_2_zext): Likewise.
	(*add<mode>_3): Likewise.
	(*addsi_3_zext): Likewise.
	(*adddi_4): Likewise.
	(*add<mode>_4): Likewise.
	(*add<mode>_5): Likewise.
	(*addv<mode>4): Likewise.
	(*addv<mode>4_1): Likewise.
	(*add<mode>3_cconly_overflow_1): Likewise.
	(*add<mode>3_cc_overflow_1): Likewise.
	(*addsi3_zext_cc_overflow_1): Likewise.
	(*add<mode>3_cconly_overflow_2): Likewise.
	(*add<mode>3_cc_overflow_2): Likewise.
	(*addsi3_zext_cc_overflow_2): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add more tests.
---
 gcc/config/i386/i386.md                 | 310 +++++++++++++++---------
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  53 ++--
 2 files changed, 232 insertions(+), 131 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index cb227d19f40..2a73f6dcaec 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6476,13 +6476,15 @@ (define_insn "*add<mode>_1"
 ;; patterns constructed from addsi_1 to match.
 
 (define_insn "addsi_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r,r,r")
 	(zero_extend:DI
-	  (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r,r")
-		   (match_operand:SI 2 "x86_64_general_operand" "rBMe,0,le"))))
+	  (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r,r,r,rm")
+		   (match_operand:SI 2 "x86_64_general_operand" "rBMe,0,le,rBMe,re"))))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands)"
+  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands,
+					    TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 3 || which_alternative == 4);
   switch (get_attr_type (insn))
     {
     case TYPE_LEA:
@@ -6490,11 +6492,13 @@ (define_insn "addsi_1_zext"
 
     case TYPE_INCDEC:
       if (operands[2] == const1_rtx)
-        return "inc{l}\t%k0";
+        return use_ndd ? "inc{l}\t{%1, %k0|%k0, %1}"
+		       : "inc{l}\t%k0";
       else
         {
 	  gcc_assert (operands[2] == constm1_rtx);
-          return "dec{l}\t%k0";
+	  return use_ndd ? "dec{l}\t{%1, %k0|%k0, %1}"
+			 : "dec{l}\t%k0";
 	}
 
     default:
@@ -6504,12 +6508,15 @@ (define_insn "addsi_1_zext"
         std::swap (operands[1], operands[2]);
 
       if (x86_maybe_negate_const_int (&operands[2], SImode))
-        return "sub{l}\t{%2, %k0|%k0, %2}";
+        return use_ndd ? "sub{l}\t{%2, %1, %k0|%k0, %1, %2}"
+		       : "sub{l}\t{%2, %k0|%k0, %2}";
 
-      return "add{l}\t{%2, %k0|%k0, %2}";
+      return use_ndd ? "add{l}\t{%2, %1, %k0|%k0, %1, %2}"
+		     : "add{l}\t{%2, %k0|%k0, %2}";
     }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,*,*,apx_ndd,apx_ndd")
+   (set (attr "type")
      (cond [(eq_attr "alternative" "2")
 	      (const_string "lea")
 	    (match_operand:SI 2 "incdec_operand")
@@ -6811,37 +6818,42 @@ (define_insn "*add<mode>_2"
   [(set (reg FLAGS_REG)
 	(compare
 	  (plus:SWI
-	    (match_operand:SWI 1 "nonimmediate_operand" "%0,0,<r>")
-	    (match_operand:SWI 2 "<general_operand>" "<r><i>,<m>,0"))
+	    (match_operand:SWI 1 "nonimmediate_operand" "%0,0,<r>,rm,r")
+	    (match_operand:SWI 2 "<general_operand>" "<r><i>,<m>,0,r<i>,<m>"))
 	  (const_int 0)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>,<r>")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>,<r>,r,r")
 	(plus:SWI (match_dup 1) (match_dup 2)))]
   "ix86_match_ccmode (insn, CCGOCmode)
-   && ix86_binary_operator_ok (PLUS, <MODE>mode, operands)"
+   && ix86_binary_operator_ok (PLUS, <MODE>mode, operands, TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 3 || which_alternative == 4);
   switch (get_attr_type (insn))
     {
     case TYPE_INCDEC:
       if (operands[2] == const1_rtx)
-        return "inc{<imodesuffix>}\t%0";
+        return use_ndd ? "inc{<imodesuffix>}\t{%1, %0|%0, %1}"
+		       : "inc{<imodesuffix>}\t%0";
       else
         {
 	  gcc_assert (operands[2] == constm1_rtx);
-          return "dec{<imodesuffix>}\t%0";
+	  return use_ndd ? "dec{<imodesuffix>}\t{%1, %0|%0, %1}"
+			 : "dec{<imodesuffix>}\t%0";
 	}
 
     default:
       if (which_alternative == 2)
         std::swap (operands[1], operands[2]);
         
-      gcc_assert (rtx_equal_p (operands[0], operands[1]));
       if (x86_maybe_negate_const_int (&operands[2], <MODE>mode))
-        return "sub{<imodesuffix>}\t{%2, %0|%0, %2}";
+        return use_ndd ? "sub{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		       : "sub{<imodesuffix>}\t{%2, %0|%0, %2}";
 
-      return "add{<imodesuffix>}\t{%2, %0|%0, %2}";
+      return use_ndd ? "add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		     : "add{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,*,*,apx_ndd,apx_ndd")
+   (set (attr "type")
      (if_then_else (match_operand:SWI 2 "incdec_operand")
 	(const_string "incdec")
 	(const_string "alu")))
@@ -6856,23 +6868,26 @@ (define_insn "*add<mode>_2"
 (define_insn "*addsi_2_zext"
   [(set (reg FLAGS_REG)
 	(compare
-	  (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r")
-		   (match_operand:SI 2 "x86_64_general_operand" "rBMe,0"))
+	  (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r,r,rm")
+		   (match_operand:SI 2 "x86_64_general_operand" "rBMe,0,rBMe,re"))
 	  (const_int 0)))
-   (set (match_operand:DI 0 "register_operand" "=r,r")
+   (set (match_operand:DI 0 "register_operand" "=r,r,r,r")
 	(zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))]
   "TARGET_64BIT && ix86_match_ccmode (insn, CCGOCmode)
-   && ix86_binary_operator_ok (PLUS, SImode, operands)"
+   && ix86_binary_operator_ok (PLUS, SImode, operands, TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 2 || which_alternative == 3);
   switch (get_attr_type (insn))
     {
     case TYPE_INCDEC:
       if (operands[2] == const1_rtx)
-        return "inc{l}\t%k0";
+        return use_ndd ? "inc{l}\t{%1, %k0|%k0, %1}"
+		       : "inc{l}\t%k0";
       else
 	{
 	  gcc_assert (operands[2] == constm1_rtx);
-          return "dec{l}\t%k0";
+	  return use_ndd ? "dec{l}\t{%1, %k0|%k0, %1}"
+			 : "dec{l}\t%k0";
 	}
 
     default:
@@ -6880,12 +6895,15 @@ (define_insn "*addsi_2_zext"
         std::swap (operands[1], operands[2]);
 
       if (x86_maybe_negate_const_int (&operands[2], SImode))
-        return "sub{l}\t{%2, %k0|%k0, %2}";
+	return use_ndd ? "sub{l}\t{%2, %1, %k0|%k0, %1, %2}"
+		       : "sub{l}\t{%2, %k0|%k0, %2}";
 
-      return "add{l}\t{%2, %k0|%k0, %2}";
+      return use_ndd ? "add{l}\t{%2, %1, %k0|%k0, %1, %2}"
+		     : "add{l}\t{%2, %k0|%k0, %2}";
     }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set (attr "type")
      (if_then_else (match_operand:SI 2 "incdec_operand")
 	(const_string "incdec")
 	(const_string "alu")))
@@ -6899,35 +6917,40 @@ (define_insn "*addsi_2_zext"
 (define_insn "*add<mode>_3"
   [(set (reg FLAGS_REG)
 	(compare
-	  (neg:SWI (match_operand:SWI 2 "<general_operand>" "<g>,0"))
-	  (match_operand:SWI 1 "nonimmediate_operand" "%0,<r>")))
-   (clobber (match_scratch:SWI 0 "=<r>,<r>"))]
+	  (neg:SWI (match_operand:SWI 2 "<general_operand>" "<g>,0,<g>,re"))
+	  (match_operand:SWI 1 "nonimmediate_operand" "%0,<r>,r,rm")))
+   (clobber (match_scratch:SWI 0 "=<r>,<r>,r,r"))]
   "ix86_match_ccmode (insn, CCZmode)
    && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
 {
+  bool use_ndd = (which_alternative == 2 || which_alternative == 3);
   switch (get_attr_type (insn))
     {
     case TYPE_INCDEC:
       if (operands[2] == const1_rtx)
-        return "inc{<imodesuffix>}\t%0";
+        return use_ndd ? "inc{<imodesuffix>}\t{%1, %0|%0, %1}"
+	               : "inc{<imodesuffix>}\t%0";
       else
         {
 	  gcc_assert (operands[2] == constm1_rtx);
-          return "dec{<imodesuffix>}\t%0";
+          return use_ndd ? "dec{<imodesuffix>}\t{%1, %0|%0, %1}"
+	                 : "dec{<imodesuffix>}\t%0";
 	}
 
     default:
       if (which_alternative == 1)
         std::swap (operands[1], operands[2]);
 
-      gcc_assert (rtx_equal_p (operands[0], operands[1]));
       if (x86_maybe_negate_const_int (&operands[2], <MODE>mode))
-        return "sub{<imodesuffix>}\t{%2, %0|%0, %2}";
+        return use_ndd ? "sub{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+                       : "sub{<imodesuffix>}\t{%2, %0|%0, %2}";
 
-      return "add{<imodesuffix>}\t{%2, %0|%0, %2}";
+      return use_ndd ? "add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+                     : "add{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set (attr "type")
      (if_then_else (match_operand:SWI 2 "incdec_operand")
 	(const_string "incdec")
 	(const_string "alu")))
@@ -6942,22 +6965,23 @@ (define_insn "*add<mode>_3"
 (define_insn "*addsi_3_zext"
   [(set (reg FLAGS_REG)
 	(compare
-	  (neg:SI (match_operand:SI 2 "x86_64_general_operand" "rBMe,0"))
-	  (match_operand:SI 1 "nonimmediate_operand" "%0,r")))
-   (set (match_operand:DI 0 "register_operand" "=r,r")
+	  (neg:SI (match_operand:SI 2 "x86_64_general_operand" "rBMe,0,rBMe,re"))
+	  (match_operand:SI 1 "nonimmediate_operand" "%0,r,r,rm")))
+   (set (match_operand:DI 0 "register_operand" "=r,r,r,r")
 	(zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))]
   "TARGET_64BIT && ix86_match_ccmode (insn, CCZmode)
-   && ix86_binary_operator_ok (PLUS, SImode, operands)"
+   && ix86_binary_operator_ok (PLUS, SImode, operands, TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 2 || which_alternative == 3);
   switch (get_attr_type (insn))
     {
     case TYPE_INCDEC:
       if (operands[2] == const1_rtx)
-        return "inc{l}\t%k0";
+        return use_ndd ? "inc{l}\t{%1, %k0|%k0, %1}" : "inc{l}\t%k0";
       else
         {
 	  gcc_assert (operands[2] == constm1_rtx);
-          return "dec{l}\t%k0";
+	  return use_ndd ? "dec{l}\t{%1, %k0|%k0, %1}" : "dec{l}\t%k0";
 	}
 
     default:
@@ -6965,12 +6989,15 @@ (define_insn "*addsi_3_zext"
         std::swap (operands[1], operands[2]);
 
       if (x86_maybe_negate_const_int (&operands[2], SImode))
-        return "sub{l}\t{%2, %k0|%k0, %2}";
+        return use_ndd ? "sub{l}\t{%2, %1, %k0|%k0, %1, %2}"
+		       : "sub{l}\t{%2, %k0|%k0, %2}";
 
-      return "add{l}\t{%2, %k0|%k0, %2}";
+      return use_ndd ? "add{l}\t{%2, %1, %k0|%k0, %1, %2}"
+		     : "add{l}\t{%2, %k0|%k0, %2}";
     }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set (attr "type")
      (if_then_else (match_operand:SI 2 "incdec_operand")
 	(const_string "incdec")
 	(const_string "alu")))
@@ -6991,31 +7018,35 @@ (define_insn "*addsi_3_zext"
 (define_insn "*adddi_4"
   [(set (reg FLAGS_REG)
 	(compare
-	  (match_operand:DI 1 "nonimmediate_operand" "0")
-	  (match_operand:DI 2 "x86_64_immediate_operand" "e")))
-   (clobber (match_scratch:DI 0 "=r"))]
+	  (match_operand:DI 1 "nonimmediate_operand" "0,rm")
+	  (match_operand:DI 2 "x86_64_immediate_operand" "e,e")))
+   (clobber (match_scratch:DI 0 "=r,r"))]
   "TARGET_64BIT
    && ix86_match_ccmode (insn, CCGCmode)"
 {
+  bool use_ndd = (which_alternative == 1);
   switch (get_attr_type (insn))
     {
     case TYPE_INCDEC:
       if (operands[2] == constm1_rtx)
-        return "inc{q}\t%0";
+        return use_ndd ? "inc{q}\t{%1, %0|%0, %1}" : "inc{q}\t%0";
       else
         {
 	  gcc_assert (operands[2] == const1_rtx);
-          return "dec{q}\t%0";
+	  return use_ndd ? "dec{q}\t{%1, %0|%0, %1}" : "dec{q}\t%0";
 	}
 
     default:
       if (x86_maybe_negate_const_int (&operands[2], DImode))
-	return "add{q}\t{%2, %0|%0, %2}";
+	return use_ndd ? "add{q}\t{%2, %1, %0|%0, %1, %2}"
+		       : "add{q}\t{%2, %0|%0, %2}";
 
-      return "sub{q}\t{%2, %0|%0, %2}";
+      return use_ndd ? "sub{q}\t{%2, %1, %0|%0, %1, %2}"
+		     : "sub{q}\t{%2, %0|%0, %2}";
     }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,apx_ndd")
+   (set (attr "type")
      (if_then_else (match_operand:DI 2 "incdec_operand")
 	(const_string "incdec")
 	(const_string "alu")))
@@ -7036,30 +7067,36 @@ (define_insn "*adddi_4"
 (define_insn "*add<mode>_4"
   [(set (reg FLAGS_REG)
 	(compare
-	  (match_operand:SWI124 1 "nonimmediate_operand" "0")
+	  (match_operand:SWI124 1 "nonimmediate_operand" "0,rm")
 	  (match_operand:SWI124 2 "const_int_operand")))
-   (clobber (match_scratch:SWI124 0 "=<r>"))]
+   (clobber (match_scratch:SWI124 0 "=<r>,r"))]
   "ix86_match_ccmode (insn, CCGCmode)"
 {
+  bool use_ndd = (which_alternative == 1);
   switch (get_attr_type (insn))
     {
     case TYPE_INCDEC:
       if (operands[2] == constm1_rtx)
-        return "inc{<imodesuffix>}\t%0";
+        return use_ndd ? "inc{<imodesuffix>}\t{%1, %0|%0, %1}"
+		       : "inc{<imodesuffix>}\t%0";
       else
         {
 	  gcc_assert (operands[2] == const1_rtx);
-          return "dec{<imodesuffix>}\t%0";
+	  return use_ndd ? "dec{<imodesuffix>}\t{%1, %0|%0, %1}"
+			 : "dec{<imodesuffix>}\t%0";
 	}
 
     default:
       if (x86_maybe_negate_const_int (&operands[2], <MODE>mode))
-	return "add{<imodesuffix>}\t{%2, %0|%0, %2}";
+	return use_ndd ? "add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		       : "add{<imodesuffix>}\t{%2, %0|%0, %2}";
 
-      return "sub{<imodesuffix>}\t{%2, %0|%0, %2}";
+      return use_ndd ? "sub{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		     : "sub{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,apx_ndd")
+   (set (attr "type")
      (if_then_else (match_operand:<MODE> 2 "incdec_operand")
 	(const_string "incdec")
 	(const_string "alu")))
@@ -7074,36 +7111,41 @@ (define_insn "*add<mode>_5"
   [(set (reg FLAGS_REG)
 	(compare
 	  (plus:SWI
-	    (match_operand:SWI 1 "nonimmediate_operand" "%0,<r>")
-	    (match_operand:SWI 2 "<general_operand>" "<g>,0"))
+	    (match_operand:SWI 1 "nonimmediate_operand" "%0,<r>,r,rm")
+	    (match_operand:SWI 2 "<general_operand>" "<g>,0,<g>,re"))
 	  (const_int 0)))
-   (clobber (match_scratch:SWI 0 "=<r>,<r>"))]
+   (clobber (match_scratch:SWI 0 "=<r>,<r>,r,r"))]
   "ix86_match_ccmode (insn, CCGOCmode)
    && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
 {
+  bool use_ndd = (which_alternative == 2 || which_alternative == 3);
   switch (get_attr_type (insn))
     {
     case TYPE_INCDEC:
       if (operands[2] == const1_rtx)
-        return "inc{<imodesuffix>}\t%0";
+        return use_ndd ? "inc{<imodesuffix>}\t{%1, %0|%0, %1}"
+		       : "inc{<imodesuffix>}\t%0";
       else
         {
           gcc_assert (operands[2] == constm1_rtx);
-          return "dec{<imodesuffix>}\t%0";
+	  return use_ndd ? "dec{<imodesuffix>}\t{%1, %0|%0, %1}"
+			 : "dec{<imodesuffix>}\t%0";
 	}
 
     default:
       if (which_alternative == 1)
         std::swap (operands[1], operands[2]);
 
-      gcc_assert (rtx_equal_p (operands[0], operands[1]));
       if (x86_maybe_negate_const_int (&operands[2], <MODE>mode))
-        return "sub{<imodesuffix>}\t{%2, %0|%0, %2}";
+	return use_ndd ? "sub{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		       : "sub{<imodesuffix>}\t{%2, %0|%0, %2}";
 
-      return "add{<imodesuffix>}\t{%2, %0|%0, %2}";
+      return use_ndd ? "add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		     : "add{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set (attr "type")
      (if_then_else (match_operand:SWI 2 "incdec_operand")
 	(const_string "incdec")
 	(const_string "alu")))
@@ -7316,35 +7358,43 @@ (define_insn "*addv<mode>4"
   [(set (reg:CCO FLAGS_REG)
 	(eq:CCO (plus:<DWI>
 		   (sign_extend:<DWI>
-		      (match_operand:SWI 1 "nonimmediate_operand" "%0,0"))
+		      (match_operand:SWI 1 "nonimmediate_operand" "%0,0,rm,r"))
 		   (sign_extend:<DWI>
-		      (match_operand:SWI 2 "<general_sext_operand>" "<r>We,m")))
+		      (match_operand:SWI 2 "<general_sext_operand>" "<r>We,m,rWe,m")))
 		(sign_extend:<DWI>
 		   (plus:SWI (match_dup 1) (match_dup 2)))))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>,r,r")
 	(plus:SWI (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)"
-  "add{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands, TARGET_APX_NDD)"
+  "@
+   add{<imodesuffix>}\t{%2, %0|%0, %2}
+   add{<imodesuffix>}\t{%2, %0|%0, %2}
+   add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+   add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "addv<mode>4_1"
   [(set (reg:CCO FLAGS_REG)
 	(eq:CCO (plus:<DWI>
 		   (sign_extend:<DWI>
-		      (match_operand:SWI 1 "nonimmediate_operand" "0"))
+		      (match_operand:SWI 1 "nonimmediate_operand" "0,rm"))
 		   (match_operand:<DWI> 3 "const_int_operand"))
 		(sign_extend:<DWI>
 		   (plus:SWI
 		     (match_dup 1)
-		     (match_operand:SWI 2 "x86_64_immediate_operand" "<i>")))))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
+		     (match_operand:SWI 2 "x86_64_immediate_operand" "<i>,<i>")))))
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,r")
 	(plus:SWI (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)
+  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands, TARGET_APX_NDD)
    && CONST_INT_P (operands[2])
    && INTVAL (operands[2]) == INTVAL (operands[3])"
-  "add{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "@
+  add{<imodesuffix>}\t{%2, %0|%0, %2}
+  add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")
    (set (attr "length_immediate")
 	(cond [(match_test "IN_RANGE (INTVAL (operands[2]), -128, 127)")
@@ -9187,27 +9237,36 @@ (define_insn "*add<mode>3_cconly_overflow_1"
   [(set (reg:CCC FLAGS_REG)
 	(compare:CCC
 	  (plus:SWI
-	    (match_operand:SWI 1 "nonimmediate_operand" "%0")
-	    (match_operand:SWI 2 "<general_operand>" "<g>"))
+	    (match_operand:SWI 1 "nonimmediate_operand" "%0,r,rm")
+	    (match_operand:SWI 2 "<general_operand>" "<g>,<g>,re"))
 	  (match_dup 1)))
-   (clobber (match_scratch:SWI 0 "=<r>"))]
+   (clobber (match_scratch:SWI 0 "=<r>,r,r"))]
   "!(MEM_P (operands[1]) && MEM_P (operands[2]))"
-  "add{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "@
+  add{<imodesuffix>}\t{%2, %0|%0, %2}
+  add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+  add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "@add<mode>3_cc_overflow_1"
   [(set (reg:CCC FLAGS_REG)
 	(compare:CCC
 	    (plus:SWI
-		(match_operand:SWI 1 "nonimmediate_operand" "%0,0")
-		(match_operand:SWI 2 "<general_operand>" "<r><i>,<m>"))
+		(match_operand:SWI 1 "nonimmediate_operand" "%0,0,rm,r")
+		(match_operand:SWI 2 "<general_operand>" "<r><i>,<m>,r<i>,<m>"))
 	    (match_dup 1)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>,r,r")
 	(plus:SWI (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)"
-  "add{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands, TARGET_APX_NDD)"
+  "@
+   add{<imodesuffix>}\t{%2, %0|%0, %2}
+   add{<imodesuffix>}\t{%2, %0|%0, %2}
+   add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+   add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
 (define_peephole2
@@ -9252,55 +9311,74 @@ (define_insn "*addsi3_zext_cc_overflow_1"
   [(set (reg:CCC FLAGS_REG)
 	(compare:CCC
 	  (plus:SI
-	    (match_operand:SI 1 "nonimmediate_operand" "%0")
-	    (match_operand:SI 2 "x86_64_general_operand" "rBMe"))
+	    (match_operand:SI 1 "nonimmediate_operand" "%0,r,rm")
+	    (match_operand:SI 2 "x86_64_general_operand" "rBMe,rBMe,re"))
 	  (match_dup 1)))
-   (set (match_operand:DI 0 "register_operand" "=r")
+   (set (match_operand:DI 0 "register_operand" "=r,r,r")
 	(zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))]
-  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands)"
-  "add{l}\t{%2, %k0|%k0, %2}"
-  [(set_attr "type" "alu")
+  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands,
+					    TARGET_APX_NDD)"
+  "@
+  add{l}\t{%2, %k0|%k0, %2}
+  add{l}\t{%2, %1, %k0|%k0, %1, %2}
+  add{l}\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "SI")])
 
 (define_insn "*add<mode>3_cconly_overflow_2"
   [(set (reg:CCC FLAGS_REG)
 	(compare:CCC
 	  (plus:SWI
-	    (match_operand:SWI 1 "nonimmediate_operand" "%0")
-	    (match_operand:SWI 2 "<general_operand>" "<g>"))
+	    (match_operand:SWI 1 "nonimmediate_operand" "%0,r,rm")
+	    (match_operand:SWI 2 "<general_operand>" "<g>,<g>,re"))
 	  (match_dup 2)))
-   (clobber (match_scratch:SWI 0 "=<r>"))]
+   (clobber (match_scratch:SWI 0 "=<r>,r,r"))]
   "!(MEM_P (operands[1]) && MEM_P (operands[2]))"
-  "add{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "@
+  add{<imodesuffix>}\t{%2, %0|%0, %2}
+  add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+  add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*add<mode>3_cc_overflow_2"
   [(set (reg:CCC FLAGS_REG)
 	(compare:CCC
 	    (plus:SWI
-		(match_operand:SWI 1 "nonimmediate_operand" "%0,0")
-		(match_operand:SWI 2 "<general_operand>" "<r><i>,<m>"))
+		(match_operand:SWI 1 "nonimmediate_operand" "%0,0,rm,r")
+		(match_operand:SWI 2 "<general_operand>" "<r><i>,<m>,r<i>,<m>"))
 	    (match_dup 2)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>,r,r")
 	(plus:SWI (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)"
-  "add{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands, TARGET_APX_NDD)"
+  "@
+  add{<imodesuffix>}\t{%2, %0|%0, %2}
+  add{<imodesuffix>}\t{%2, %0|%0, %2}
+  add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+  add{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*addsi3_zext_cc_overflow_2"
   [(set (reg:CCC FLAGS_REG)
 	(compare:CCC
 	  (plus:SI
-	    (match_operand:SI 1 "nonimmediate_operand" "%0")
-	    (match_operand:SI 2 "x86_64_general_operand" "rBMe"))
+	    (match_operand:SI 1 "nonimmediate_operand" "%0,r,rm")
+	    (match_operand:SI 2 "x86_64_general_operand" "rBMe,rBMe,re"))
 	  (match_dup 2)))
-   (set (match_operand:DI 0 "register_operand" "=r")
+   (set (match_operand:DI 0 "register_operand" "=r,r,r")
 	(zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))]
-  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands)"
-  "add{l}\t{%2, %k0|%k0, %2}"
-  [(set_attr "type" "alu")
+  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands,
+					    TARGET_APX_NDD)"
+  "@
+  add{l}\t{%2, %k0|%k0, %2}
+  add{l}\t{%2, %1, %k0|%k0, %1, %2}
+  add{l}\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "SI")])
 
 (define_insn_and_split "*add<dwi>3_doubleword_cc_overflow_1"
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd.c b/gcc/testsuite/gcc.target/i386/apx-ndd.c
index 056a323a647..c1049022f2a 100644
--- a/gcc/testsuite/gcc.target/i386/apx-ndd.c
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd.c
@@ -2,20 +2,43 @@
 /* { dg-options "-mapxf -march=x86-64 -O2" } */
 /* { dg-final { scan-assembler-not "movl"} } */
 
-int foo (int *a)
-{
-  int b = *a - 1;
-  return b;
-}
+#define FOO(TYPE, OP_NAME, OP)   \
+TYPE				 \
+__attribute__ ((noipa)) 	 \
+foo_##OP_NAME##_##TYPE (TYPE *a) \
+{				 \
+  TYPE b = *a OP 1;		 \
+  return b;			 \
+}			
 
-int foo2 (int a, int b)
-{
-  int c = a + b;
-  return c;
-}
+#define FOO1(TYPE, OP_NAME, OP)		 \
+TYPE				  	 \
+__attribute__ ((noipa)) 	  	 \
+foo1_##OP_NAME##_##TYPE (TYPE a, TYPE b) \
+{				 	 \
+  TYPE c = a OP b;		 	 \
+  return c;			 	 \
+}			
+
+#define FOO2(TYPE, OP_NAME, OP)		  \
+TYPE				  	  \
+__attribute__ ((noipa)) 	  	  \
+foo2_##OP_NAME##_##TYPE (TYPE *a, TYPE b) \
+{				 	  \
+  TYPE c = *a OP b;		 	  \
+  return c;			 	  \
+}			
+
+FOO (char, add, +)
+FOO1 (char, add, +)
+FOO2 (char, add, +)
+FOO (short, add, +)
+FOO1 (short, add, +)
+FOO2 (short, add, +)
+FOO (int, add, +)
+FOO1 (int, add, +)
+FOO2 (int, add, +)
+FOO (long, add, +)
+FOO1 (long, add, +)
+FOO2 (long, add, +)
 
-int foo3 (int *a, int b)
-{
-  int c = *a + b;
-  return c;
-}
-- 
2.31.1



* [PATCH 04/17] [APX NDD] Disable seg_prefixed memory usage for NDD add
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (2 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 03/17] [APX NDD] Support APX NDD for optimization patterns of add Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05  2:29 ` [PATCH 05/17] [APX NDD] Support APX NDD for adc insns Hongyu Wang
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

NDD uses the EVEX prefix, so when a segment prefix is also applied the
instruction could exceed the 15-byte length limit, especially when adding
immediates. This can happen when the "e" constraint accepts an
UNSPEC_TPOFF/UNSPEC_NTPOFF constant: the offset is added to the segment
register, which is encoded using a segment prefix. Disable those *POFF
constants in the NDD add alternatives with a new constraint.
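
The length argument above can be checked with a small byte tally. The sketch
below is illustrative only: the helper names and the per-field byte counts are
assumptions made for this example (worst case for "add $imm32, mem, reg" with a
SIB byte and a 32-bit displacement), not code from the patch, and it ignores
other optional prefixes such as the address-size override.

```c
#include <assert.h>
#include <stdbool.h>

/* An x86 instruction may be at most 15 bytes long.  */
enum { X86_MAX_INSN_LEN = 15 };

/* Hypothetical helper (not part of GCC): worst-case encoded length of
   "add $imm32, mem, reg" with SIB and disp32.  The byte counts are this
   example's own tally.  */
static int
add_imm32_mem_len (bool ndd, bool seg_prefix)
{
  int len = 0;
  len += seg_prefix ? 1 : 0;	/* %fs: or %gs: segment override.  */
  len += ndd ? 4 : 1;		/* EVEX prefix (NDD) vs. REX prefix.  */
  len += 1;			/* Opcode.  */
  len += 1;			/* ModRM.  */
  len += 1;			/* SIB.  */
  len += 4;			/* disp32.  */
  len += 4;			/* imm32.  */
  return len;
}

/* Return true if an instruction of LEN bytes is encodable.  */
static bool
fits_insn_limit (int len)
{
  assert (len > 0);
  return len <= X86_MAX_INSN_LEN;
}
```

Under these assumptions the NDD form sits exactly at the limit without a
segment override and goes one byte over once the override is added, which is
the case the new constraint is meant to exclude.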

gcc/ChangeLog:

	* config/i386/constraints.md (je): New constraint.
	* config/i386/i386-protos.h (x86_poff_operand_p): New prototype.
	* config/i386/i386.cc (x86_poff_operand_p): New function to
	check any *POFF constant in operand.
	* config/i386/i386.md (*add<mode>_1): Split out je alternative for add.
---
 gcc/config/i386/constraints.md |  5 +++++
 gcc/config/i386/i386-protos.h  |  1 +
 gcc/config/i386/i386.cc        | 25 +++++++++++++++++++++++++
 gcc/config/i386/i386.md        | 10 +++++-----
 4 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index cbee31fa40a..f4c3c3dd952 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -433,3 +433,8 @@ (define_address_constraint "jb"
 
 (define_register_constraint  "jc"
  "TARGET_APX_EGPR && !TARGET_AVX ? GENERAL_GPR16 : GENERAL_REGS")
+
+(define_constraint  "je"
+  "@internal constant that do not allow any unspec global offsets"
+  (and (match_operand 0 "x86_64_immediate_operand")
+       (match_test "!x86_poff_operand_p (op)")))
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index a9d0c568bba..7dfeb6af225 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -66,6 +66,7 @@ extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
 extern bool x86_extended_reg_mentioned_p (rtx);
 extern bool x86_extended_rex2reg_mentioned_p (rtx);
 extern bool x86_evex_reg_mentioned_p (rtx [], int);
+extern bool x86_poff_operand_p (rtx);
 extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
 extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3efeed396c4..3e670330ef6 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23341,6 +23341,31 @@ x86_evex_reg_mentioned_p (rtx operands[], int nops)
   return false;
 }
 
+/* Return true when rtx OPERAND contains an UNSPEC_*POFF related constant,
+   which must be rejected so APX_NDD insns do not exceed the length limit.  */
+bool
+x86_poff_operand_p (rtx operand)
+{
+  if (GET_CODE (operand) == CONST)
+    {
+      rtx op = XEXP (operand, 0);
+      if (GET_CODE (op) == PLUS)
+	op = XEXP (op, 0);
+	
+      if (GET_CODE (op) == UNSPEC)
+	{
+	  int unspec = XINT (op, 1);
+	  return (unspec == UNSPEC_NTPOFF
+		  || unspec == UNSPEC_TPOFF
+		  || unspec == UNSPEC_DTPOFF
+		  || unspec == UNSPEC_GOTTPOFF
+		  || unspec == UNSPEC_GOTNTPOFF
+		  || unspec == UNSPEC_INDNTPOFF);
+	}
+    }
+  return false;
+}
+
 /* If profitable, negate (without causing overflow) integer constant
    of mode MODE at location LOC.  Return true in this case.  */
 bool
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2a73f6dcaec..6b316e698bb 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6415,15 +6415,15 @@ (define_insn_and_split "*add<dwi>3_doubleword_concat_zext"
  "split_double_mode (<DWI>mode, &operands[0], 1, &operands[0], &operands[5]);")
 
 (define_insn "*add<mode>_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r,r,r")
 	(plus:SWI48
-	  (match_operand:SWI48 1 "nonimmediate_operand" "%0,0,r,r,rm,r")
-	  (match_operand:SWI48 2 "x86_64_general_operand" "re,BM,0,le,re,BM")))
+	  (match_operand:SWI48 1 "nonimmediate_operand" "%0,0,r,r,rm,r,m,r")
+	  (match_operand:SWI48 2 "x86_64_general_operand" "re,BM,0,le,r,e,je,BM")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (PLUS, <MODE>mode, operands,
 			    TARGET_APX_NDD)"
 {
-  bool use_ndd = (which_alternative == 4 || which_alternative == 5);
+  bool use_ndd = (which_alternative >= 4);
   switch (get_attr_type (insn))
     {
     case TYPE_LEA:
@@ -6454,7 +6454,7 @@ (define_insn "*add<mode>_1"
 		    : "add{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "isa" "*,*,*,*,apx_ndd,apx_ndd")
+  [(set_attr "isa" "*,*,*,*,apx_ndd,apx_ndd,apx_ndd,apx_ndd")
    (set (attr "type")
      (cond [(eq_attr "alternative" "3")
               (const_string "lea")
-- 
2.31.1



* [PATCH 05/17] [APX NDD] Support APX NDD for adc insns
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (3 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 04/17] [APX NDD] Disable seg_prefixed memory usage for NDD add Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05 11:25   ` Uros Bizjak
  2023-12-05  2:29 ` [PATCH 06/17] [APX NDD] Support APX NDD for sub insns Hongyu Wang
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

Legacy adc patterns are commonly used for TImode add. When extending TImode
add to the NDD form, operands[0] and operands[1] can be different, so an extra
move must be emitted in patterns that optimize away the addition of const0_rtx.

NDD instructions automatically zero-extend the dest register to 64 bits, so
zext patterns can adopt all NDD forms that have a memory src input.
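
The extra-move requirement can be pictured with a plain C model of the
double-word split. This is only a sketch: dword_t and add_doubleword are
hypothetical names invented for the illustration, and the code models the
splitter's logic on two 64-bit halves rather than reproducing the RTL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a double-word value split into two halves.  */
typedef struct { uint64_t lo, hi; } dword_t;

/* Model of the add/adc split.  DST may differ from SRC1, as in the NDD
   form, so the "adding zero" shortcut must still copy SRC1 into DST.  */
static void
add_doubleword (dword_t *dst, const dword_t *src1, const dword_t *src2)
{
  assert (dst && src1 && src2);
  if (src2->lo == 0)
    {
      /* The low add is a no-op, but the value still has to reach DST.  */
      if (dst != src1)
	dst->lo = src1->lo;
      if (src2->hi != 0)
	dst->hi = src1->hi + src2->hi;	/* Plain add, no carry-in.  */
      else if (dst != src1)
	dst->hi = src1->hi;
      return;
    }
  uint64_t lo = src1->lo + src2->lo;	/* add  */
  uint64_t carry = lo < src1->lo;	/* Carry out of the low half.  */
  dst->hi = src1->hi + src2->hi + carry;	/* adc  */
  dst->lo = lo;
}
```

When dst and src1 are the same object the shortcut path does nothing for a
zero addend, which corresponds to the legacy two-operand case; only the
three-operand case needs the copies.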

gcc/ChangeLog:

	* config/i386/i386.md (*add<dwi>3_doubleword): Add ndd constraints, and
	move operands[1] to operands[0] when they are not equal.
	(*add<dwi>3_doubleword_cc_overflow_1): Likewise.
	(*add<dwi>3_doubleword_zext): Add ndd constraints.
	(*addv<dwi>4_doubleword): Likewise.
	(*addv<dwi>4_doubleword_1): Likewise.
	(addv<mode>4_overflow_1): Likewise.
	(*addv<mode>4_overflow_2): Likewise.
	(@add<mode>3_carry): Likewise.
	(*add<mode>3_carry_0): Likewise.
	(*addsi3_carry_zext): Likewise.
	(addcarry<mode>): Likewise.
	(addcarry<mode>_0): Likewise.
	(*addcarry<mode>_1): Likewise.
	(*add<mode>3_eq): Likewise.
	(*add<mode>3_ne): Likewise.
	(*addsi3_carry_zext_0): Likewise, and use nonimmediate_operand for
	operands[1] to accept memory input for NDD alternative.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd-adc.c: New test.
---
 gcc/config/i386/i386.md                     | 191 ++++++++++++--------
 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c |  15 ++
 2 files changed, 134 insertions(+), 72 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6b316e698bb..358a3857f89 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6291,12 +6291,12 @@ (define_expand "add<mode>3"
 				TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*add<dwi>3_doubleword"
-  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r,r,r")
 	(plus:<DWI>
-	  (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0")
-	  (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
+	  (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0,ro,r")
+	  (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o,r<di>,r")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (PLUS, <DWI>mode, operands)"
+  "ix86_binary_operator_ok (PLUS, <DWI>mode, operands, TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
@@ -6316,24 +6316,34 @@ (define_insn_and_split "*add<dwi>3_doubleword"
   split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);
   if (operands[2] == const0_rtx)
     {
+      /* Under NDD op0 and op1 may differ; do not delete the insn then.  */
+      bool emit_insn_deleted_note_p = true;
+      if (!rtx_equal_p (operands[0], operands[1]))
+	{
+	  emit_move_insn (operands[0], operands[1]);
+	  emit_insn_deleted_note_p = false;
+	}
       if (operands[5] != const0_rtx)
-	ix86_expand_binary_operator (PLUS, <MODE>mode, &operands[3]);
+	ix86_expand_binary_operator (PLUS, <MODE>mode, &operands[3],
+				     TARGET_APX_NDD);
       else if (!rtx_equal_p (operands[3], operands[4]))
 	emit_move_insn (operands[3], operands[4]);
-      else
+      else if (emit_insn_deleted_note_p)
 	emit_note (NOTE_INSN_DELETED);
       DONE;
     }
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*add<dwi>3_doubleword_zext"
-  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=r,o")
+  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=r,o,r,r")
 	(plus:<DWI>
 	  (zero_extend:<DWI>
-	    (match_operand:DWIH 2 "nonimmediate_operand" "rm,r")) 
-	  (match_operand:<DWI> 1 "nonimmediate_operand" "0,0")))
+	    (match_operand:DWIH 2 "nonimmediate_operand" "rm,r,rm,r"))
+	  (match_operand:<DWI> 1 "nonimmediate_operand" "0,0,r,m")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (UNKNOWN, <DWI>mode, operands)"
+  "ix86_binary_operator_ok (UNKNOWN, <DWI>mode, operands,
+			    TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
@@ -6349,7 +6359,8 @@ (define_insn_and_split "*add<dwi>3_doubleword_zext"
 		       (match_dup 4))
 		     (const_int 0)))
 	      (clobber (reg:CC FLAGS_REG))])]
- "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[3]);")
+ "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[3]);"
+ [(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*add<dwi>3_doubleword_concat"
   [(set (match_operand:<DWI> 0 "register_operand" "=&r")
@@ -7411,14 +7422,14 @@ (define_insn_and_split "*addv<dwi>4_doubleword"
 	(eq:CCO
 	  (plus:<QPWI>
 	    (sign_extend:<QPWI>
-	      (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0"))
+	      (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0,ro,r"))
 	    (sign_extend:<QPWI>
-	      (match_operand:<DWI> 2 "nonimmediate_operand" "r,o")))
+	      (match_operand:<DWI> 2 "nonimmediate_operand" "r,o,r,o")))
 	  (sign_extend:<QPWI>
 	    (plus:<DWI> (match_dup 1) (match_dup 2)))))
-   (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
+   (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r,r,r")
 	(plus:<DWI> (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (PLUS, <DWI>mode, operands)"
+  "ix86_binary_operator_ok (PLUS, <DWI>mode, operands, TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
@@ -7448,22 +7459,23 @@ (define_insn_and_split "*addv<dwi>4_doubleword"
 		     (match_dup 5)))])]
 {
   split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*addv<dwi>4_doubleword_1"
   [(set (reg:CCO FLAGS_REG)
 	(eq:CCO
 	  (plus:<QPWI>
 	    (sign_extend:<QPWI>
-	      (match_operand:<DWI> 1 "nonimmediate_operand" "%0"))
-	    (match_operand:<QPWI> 3 "const_scalar_int_operand" "n"))
+	      (match_operand:<DWI> 1 "nonimmediate_operand" "%0,rm"))
+	    (match_operand:<QPWI> 3 "const_scalar_int_operand" "n,n"))
 	  (sign_extend:<QPWI>
 	    (plus:<DWI>
 	      (match_dup 1)
-	      (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "<di>")))))
-   (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro")
+	      (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "<di>,<di>")))))
+   (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
 	(plus:<DWI> (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (PLUS, <DWI>mode, operands)
+  "ix86_binary_operator_ok (PLUS, <DWI>mode, operands, TARGET_APX_NDD)
    && CONST_SCALAR_INT_P (operands[2])
    && rtx_equal_p (operands[2], operands[3])"
   "#"
@@ -7501,7 +7513,8 @@ (define_insn_and_split "*addv<dwi>4_doubleword_1"
 				    operands[5]));
       DONE;
     }
-})
+}
+[(set_attr "isa" "*,apx_ndd")])
 
 (define_insn "*addv<mode>4_overflow_1"
   [(set (reg:CCO FLAGS_REG)
@@ -7511,9 +7524,9 @@ (define_insn "*addv<mode>4_overflow_1"
 	      (match_operator:<DWI> 4 "ix86_carry_flag_operator"
 		[(match_operand 3 "flags_reg_operand") (const_int 0)])
 	      (sign_extend:<DWI>
-		(match_operand:SWI 1 "nonimmediate_operand" "%0,0")))
+		(match_operand:SWI 1 "nonimmediate_operand" "%0,0,rm,r")))
 	    (sign_extend:<DWI>
-	      (match_operand:SWI 2 "<general_sext_operand>" "rWe,m")))
+	      (match_operand:SWI 2 "<general_sext_operand>" "rWe,m,rWe,m")))
 	  (sign_extend:<DWI>
 	    (plus:SWI
 	      (plus:SWI
@@ -7521,15 +7534,20 @@ (define_insn "*addv<mode>4_overflow_1"
 		  [(match_dup 3) (const_int 0)])
 		(match_dup 1))
 	      (match_dup 2)))))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=rm,r")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=rm,r,r,r")
 	(plus:SWI
 	  (plus:SWI
 	    (match_op_dup 5 [(match_dup 3) (const_int 0)])
 	    (match_dup 1))
 	  (match_dup 2)))]
-  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)"
-  "adc{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands, TARGET_APX_NDD)"
+  "@
+   adc{<imodesuffix>}\t{%2, %0|%0, %2}
+   adc{<imodesuffix>}\t{%2, %0|%0, %2}
+   adc{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+   adc{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*addv<mode>4_overflow_2"
@@ -7540,26 +7558,29 @@ (define_insn "*addv<mode>4_overflow_2"
 	      (match_operator:<DWI> 4 "ix86_carry_flag_operator"
 		[(match_operand 3 "flags_reg_operand") (const_int 0)])
 	      (sign_extend:<DWI>
-		(match_operand:SWI 1 "nonimmediate_operand" "%0")))
-	    (match_operand:<DWI> 6 "const_int_operand" "n"))
+		(match_operand:SWI 1 "nonimmediate_operand" "%0,rm")))
+	    (match_operand:<DWI> 6 "const_int_operand" "n,n"))
 	  (sign_extend:<DWI>
 	    (plus:SWI
 	      (plus:SWI
 		(match_operator:SWI 5 "ix86_carry_flag_operator"
 		  [(match_dup 3) (const_int 0)])
 		(match_dup 1))
-	      (match_operand:SWI 2 "x86_64_immediate_operand" "e")))))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=rm")
+	      (match_operand:SWI 2 "x86_64_immediate_operand" "e,e")))))
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=rm,r")
 	(plus:SWI
 	  (plus:SWI
 	    (match_op_dup 5 [(match_dup 3) (const_int 0)])
 	    (match_dup 1))
 	  (match_dup 2)))]
-  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)
+  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands, TARGET_APX_NDD)
    && CONST_INT_P (operands[2])
    && INTVAL (operands[2]) == INTVAL (operands[6])"
-  "adc{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "@
+  adc{<imodesuffix>}\t{%2, %0|%0, %2}
+  adc{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")
    (set (attr "length_immediate")
      (if_then_else (match_test "IN_RANGE (INTVAL (operands[2]), -128, 127)")
@@ -8381,17 +8402,22 @@ (define_insn "*subsi_3_zext"
 ;; Add with carry and subtract with borrow
 
 (define_insn "@add<mode>3_carry"
-  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>,r,r")
 	(plus:SWI
 	  (plus:SWI
 	    (match_operator:SWI 4 "ix86_carry_flag_operator"
 	     [(match_operand 3 "flags_reg_operand") (const_int 0)])
-	    (match_operand:SWI 1 "nonimmediate_operand" "%0,0"))
-	  (match_operand:SWI 2 "<general_operand>" "<r><i>,<m>")))
+	    (match_operand:SWI 1 "nonimmediate_operand" "%0,0,rm,r"))
+	  (match_operand:SWI 2 "<general_operand>" "<r><i>,<m>,r<i>,<m>")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)"
-  "adc{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands, TARGET_APX_NDD)"
+  "@
+   adc{<imodesuffix>}\t{%2, %0|%0, %2}
+   adc{<imodesuffix>}\t{%2, %0|%0, %2}
+   adc{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+   adc{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "use_carry" "1")
    (set_attr "pent_pair" "pu")
    (set_attr "mode" "<MODE>")])
@@ -8478,31 +8504,39 @@ (define_insn "*add<mode>3_carry_0r"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*addsi3_carry_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
 	(zero_extend:DI
 	  (plus:SI
 	    (plus:SI (match_operator:SI 3 "ix86_carry_flag_operator"
 		      [(reg FLAGS_REG) (const_int 0)])
-		     (match_operand:SI 1 "register_operand" "%0"))
-	    (match_operand:SI 2 "x86_64_general_operand" "rBMe"))))
+		     (match_operand:SI 1 "nonimmediate_operand" "%0,r,rm"))
+	    (match_operand:SI 2 "x86_64_general_operand" "rBMe,rBMe,re"))))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands)"
-  "adc{l}\t{%2, %k0|%k0, %2}"
-  [(set_attr "type" "alu")
+  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands,
+					    TARGET_APX_NDD)"
+  "@
+  adc{l}\t{%2, %k0|%k0, %2}
+  adc{l}\t{%2, %1, %k0|%k0, %1, %2}
+  adc{l}\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "use_carry" "1")
    (set_attr "pent_pair" "pu")
    (set_attr "mode" "SI")])
 
 (define_insn "*addsi3_carry_zext_0"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(zero_extend:DI
 	  (plus:SI (match_operator:SI 2 "ix86_carry_flag_operator"
 		    [(reg FLAGS_REG) (const_int 0)])
-		   (match_operand:SI 1 "register_operand" "0"))))
+		   (match_operand:SI 1 "nonimmediate_operand" "0,rm"))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT"
-  "adc{l}\t{$0, %k0|%k0, 0}"
-  [(set_attr "type" "alu")
+  "@
+  adc{l}\t{$0, %k0|%k0, 0}
+  adc{l}\t{$0, %1, %k0|%k0, %1, 0}"
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "use_carry" "1")
    (set_attr "pent_pair" "pu")
    (set_attr "mode" "SI")])
@@ -8531,20 +8565,25 @@ (define_insn "addcarry<mode>"
 	      (plus:SWI48
 		(match_operator:SWI48 5 "ix86_carry_flag_operator"
 		  [(match_operand 3 "flags_reg_operand") (const_int 0)])
-		(match_operand:SWI48 1 "nonimmediate_operand" "%0,0"))
-	      (match_operand:SWI48 2 "nonimmediate_operand" "r,rm")))
+		(match_operand:SWI48 1 "nonimmediate_operand" "%0,0,rm,r"))
+	      (match_operand:SWI48 2 "nonimmediate_operand" "r,rm,r,m")))
 	  (plus:<DWI>
 	    (zero_extend:<DWI> (match_dup 2))
 	    (match_operator:<DWI> 4 "ix86_carry_flag_operator"
 	      [(match_dup 3) (const_int 0)]))))
-   (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
+   (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r")
 	(plus:SWI48 (plus:SWI48 (match_op_dup 5
 				 [(match_dup 3) (const_int 0)])
 				(match_dup 1))
 		    (match_dup 2)))]
-  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)"
-  "adc{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands, TARGET_APX_NDD)"
+  "@
+  adc{<imodesuffix>}\t{%2, %0|%0, %2}
+  adc{<imodesuffix>}\t{%2, %0|%0, %2}
+  adc{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+  adc{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "use_carry" "1")
    (set_attr "pent_pair" "pu")
    (set_attr "mode" "<MODE>")])
@@ -8702,7 +8741,8 @@ (define_expand "addcarry<mode>_0"
 	     (match_dup 1)))
       (set (match_operand:SWI48 0 "nonimmediate_operand")
 	   (plus:SWI48 (match_dup 1) (match_dup 2)))])]
-  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)")
+  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands,
+			    TARGET_APX_NDD)")
 
 (define_insn "*addcarry<mode>_1"
   [(set (reg:CCC FLAGS_REG)
@@ -8712,18 +8752,18 @@ (define_insn "*addcarry<mode>_1"
 	      (plus:SWI48
 		(match_operator:SWI48 5 "ix86_carry_flag_operator"
 		  [(match_operand 3 "flags_reg_operand") (const_int 0)])
-		(match_operand:SWI48 1 "nonimmediate_operand" "%0"))
-	      (match_operand:SWI48 2 "x86_64_immediate_operand" "e")))
+		(match_operand:SWI48 1 "nonimmediate_operand" "%0,rm"))
+	      (match_operand:SWI48 2 "x86_64_immediate_operand" "e,e")))
 	  (plus:<DWI>
 	    (match_operand:<DWI> 6 "const_scalar_int_operand")
 	    (match_operator:<DWI> 4 "ix86_carry_flag_operator"
 	      [(match_dup 3) (const_int 0)]))))
-   (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm")
+   (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
 	(plus:SWI48 (plus:SWI48 (match_op_dup 5
 				 [(match_dup 3) (const_int 0)])
 				(match_dup 1))
 		    (match_dup 2)))]
-  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)
+  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands, TARGET_APX_NDD)
    && CONST_INT_P (operands[2])
    /* Check that operands[6] is operands[2] zero extended from
       <MODE>mode to <DWI>mode.  */
@@ -8736,8 +8776,11 @@ (define_insn "*addcarry<mode>_1"
 	  && ((unsigned HOST_WIDE_INT) CONST_WIDE_INT_ELT (operands[6], 0)
 	      == UINTVAL (operands[2]))
 	  && CONST_WIDE_INT_ELT (operands[6], 1) == 0))"
-  "adc{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "@
+  adc{<imodesuffix>}\t{%2, %0|%0, %2}
+  adc{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "use_carry" "1")
    (set_attr "pent_pair" "pu")
    (set_attr "mode" "<MODE>")
@@ -9385,12 +9428,12 @@ (define_insn_and_split "*add<dwi>3_doubleword_cc_overflow_1"
   [(set (reg:CCC FLAGS_REG)
 	(compare:CCC
 	  (plus:<DWI>
-	    (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0")
-	    (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o"))
+	    (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0,ro,r")
+	    (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o,r<di>,o"))
 	  (match_dup 1)))
-   (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
+   (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r,r,r")
 	(plus:<DWI> (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (PLUS, <DWI>mode, operands)"
+  "ix86_binary_operator_ok (PLUS, <DWI>mode, operands, TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
@@ -9419,6 +9462,8 @@ (define_insn_and_split "*add<dwi>3_doubleword_cc_overflow_1"
   split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);
   if (operands[2] == const0_rtx)
     {
+      if (!rtx_equal_p (operands[0], operands[1]))
+	emit_move_insn (operands[0], operands[1]);
       emit_insn (gen_addcarry<mode>_0 (operands[3], operands[4], operands[5]));
       DONE;
     }
@@ -9427,7 +9472,8 @@ (define_insn_and_split "*add<dwi>3_doubleword_cc_overflow_1"
 					    operands[5], <MODE>mode);
   else
     operands[6] = gen_rtx_ZERO_EXTEND (<DWI>mode, operands[5]);
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 ;; x == 0 with zero flag test can be done also as x < 1U with carry flag
 ;; test, where the latter is preferrable if we have some carry consuming
@@ -9442,7 +9488,7 @@ (define_insn_and_split "*add<mode>3_eq"
 	    (match_operand:SWI 1 "nonimmediate_operand"))
 	  (match_operand:SWI 2 "<general_operand>")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)
+  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands, TARGET_APX_NDD)
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -9466,7 +9512,8 @@ (define_insn_and_split "*add<mode>3_ne"
   "CONST_INT_P (operands[2])
    && (<MODE>mode != DImode
        || INTVAL (operands[2]) != HOST_WIDE_INT_C (-0x80000000))
-   && ix86_binary_operator_ok (PLUS, <MODE>mode, operands)
+   && ix86_binary_operator_ok (PLUS, <MODE>mode, operands,
+			       TARGET_APX_NDD)
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd-adc.c b/gcc/testsuite/gcc.target/i386/apx-ndd-adc.c
new file mode 100644
index 00000000000..9d5991457da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd-adc.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { int128 && { ! ia32 } } } } */
+/* { dg-options "-mapxf -O2" } */
+
+#include "pr91681-1.c"
+// *addti3_doubleword
+// *addti3_doubleword_zext
+// *adddi3_cc_overflow_1
+// *adddi3_carry
+
+int foo3 (int *a, int b) 
+{				 	  
+  int c = *a + b + (a > b); /* { dg-warning "comparison between pointer and integer" } */
+  return c;			 	  
+}			
+/* { dg-final { scan-assembler-not "xor" } } */
-- 
2.31.1



* [PATCH 06/17] [APX NDD] Support APX NDD for sub insns
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (4 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 05/17] [APX NDD] Support APX NDD for adc insns Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05  2:29 ` [PATCH 07/17] [APX NDD] Support APX NDD for sbb insn Hongyu Wang
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_fixup_binary_operands_no_copy):
	Add use_ndd parameter and pass it to ix86_fixup_binary_operands.
	* config/i386/i386-protos.h (ix86_fixup_binary_operands_no_copy):
	Adjust declaration.
	* config/i386/i386.md (sub<mode>3): Add new alternatives for NDD
	and adjust output templates.
	(*sub<mode>_1): Likewise.
	(*sub<mode>_2): Likewise.
	(subv<mode>4): Likewise.
	(*subv<mode>4): Likewise.
	(subv<mode>4_1): Likewise.
	(usubv<mode>4): Likewise.
	(*sub<mode>_3): Likewise.
	(*subsi_1_zext): Likewise, and use nonimmediate_operand for operands[1]
	to accept memory input for NDD alternatives.
	(*subsi_2_zext): Likewise.
	(*subsi_3_zext): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add tests for NDD sub.
---
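Note (not part of the commit message): the difference between the legacy
and NDD alternatives added below can be modelled in plain C.  This is only
an illustrative sketch; the function names sub_legacy, sub_ndd and sub_zext
are hypothetical and do not appear in the patch or in the testsuite.

```c
#include <stdint.h>

/* Legacy two-operand form: the destination is also the first source,
   so the original value in *dst is overwritten (constraint "0").  */
void
sub_legacy (uint32_t *dst, uint32_t src)
{
  *dst -= src;
}

/* NDD-style three-operand form: the result goes to a separate
   destination and neither source is modified.  */
uint32_t
sub_ndd (uint32_t src1, uint32_t src2)
{
  return src1 - src2;
}

/* Shape of the *subsi_*_zext patterns: a 32-bit subtract whose result
   is zero-extended to 64 bits.  With the NDD alternatives the first
   source may be read directly from memory.  */
uint64_t
sub_zext (const uint32_t *src1, uint32_t src2)
{
  return (uint64_t) (uint32_t) (*src1 - src2);
}
```

Whether the compiler actually selects an NDD alternative for code of this
shape depends on register allocation and on -mapxf being enabled.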
 gcc/config/i386/i386-expand.cc          |   5 +-
 gcc/config/i386/i386-protos.h           |   2 +-
 gcc/config/i386/i386.md                 | 155 ++++++++++++++++--------
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  13 ++
 4 files changed, 120 insertions(+), 55 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 3ecda989cf8..93ecde4b4a8 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1326,9 +1326,10 @@ ix86_fixup_binary_operands (enum rtx_code code, machine_mode mode,
 
 void
 ix86_fixup_binary_operands_no_copy (enum rtx_code code,
-				    machine_mode mode, rtx operands[])
+				    machine_mode mode, rtx operands[],
+				    bool use_ndd)
 {
-  rtx dst = ix86_fixup_binary_operands (code, mode, operands);
+  rtx dst = ix86_fixup_binary_operands (code, mode, operands, use_ndd);
   gcc_assert (dst == operands[0]);
 }
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 7dfeb6af225..481527872e8 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -111,7 +111,7 @@ extern void ix86_expand_vector_move_misalign (machine_mode, rtx[]);
 extern rtx ix86_fixup_binary_operands (enum rtx_code,
 				       machine_mode, rtx[], bool = false);
 extern void ix86_fixup_binary_operands_no_copy (enum rtx_code,
-						machine_mode, rtx[]);
+						machine_mode, rtx[], bool = false);
 extern void ix86_expand_binary_operator (enum rtx_code,
 					 machine_mode, rtx[], bool = false);
 extern void ix86_expand_vector_logical_operator (enum rtx_code,
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 358a3857f89..ea5377a0b38 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -7772,7 +7772,8 @@ (define_expand "sub<mode>3"
 	(minus:SDWIM (match_operand:SDWIM 1 "nonimmediate_operand")
 		     (match_operand:SDWIM 2 "<general_hilo_operand>")))]
   ""
-  "ix86_expand_binary_operator (MINUS, <MODE>mode, operands); DONE;")
+  "ix86_expand_binary_operator (MINUS, <MODE>mode, operands,
+				TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*sub<dwi>3_doubleword"
   [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
@@ -7798,7 +7799,10 @@ (define_insn_and_split "*sub<dwi>3_doubleword"
   split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);
   if (operands[2] == const0_rtx)
     {
-      ix86_expand_binary_operator (MINUS, <MODE>mode, &operands[3]);
+      if (!rtx_equal_p (operands[0], operands[1]))
+	emit_move_insn (operands[0], operands[1]);
+      ix86_expand_binary_operator (MINUS, <MODE>mode, &operands[3],
+				   TARGET_APX_NDD);
       DONE;
     }
 })
@@ -7827,25 +7831,36 @@ (define_insn_and_split "*sub<dwi>3_doubleword_zext"
   "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[3]);")
 
 (define_insn "*sub<mode>_1"
-  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>,r,r")
 	(minus:SWI
-	  (match_operand:SWI 1 "nonimmediate_operand" "0,0")
-	  (match_operand:SWI 2 "<general_operand>" "<r><i>,<m>")))
+	  (match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,r")
+	  (match_operand:SWI 2 "<general_operand>" "<r><i>,<m>,r<i>,<m>")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
-  "sub{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
+  "@
+  sub{<imodesuffix>}\t{%2, %0|%0, %2}
+  sub{<imodesuffix>}\t{%2, %0|%0, %2}
+  sub{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+  sub{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*subsi_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
 	(zero_extend:DI
-	  (minus:SI (match_operand:SI 1 "register_operand" "0")
-		    (match_operand:SI 2 "x86_64_general_operand" "rBMe"))))
+	  (minus:SI (match_operand:SI 1 "nonimmediate_operand" "0,r,rm")
+		    (match_operand:SI 2 "x86_64_general_operand" "rBMe,rBMe,re"))))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_binary_operator_ok (MINUS, SImode, operands)"
-  "sub{l}\t{%2, %k0|%k0, %2}"
-  [(set_attr "type" "alu")
+  "TARGET_64BIT && ix86_binary_operator_ok (MINUS, SImode, operands,
+					    TARGET_APX_NDD)"
+  "@
+  sub{l}\t{%2, %k0|%k0, %2}
+  sub{l}\t{%2, %1, %k0|%k0, %1, %2}
+  sub{l}\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "SI")])
 
 ;; Alternative 1 is needed to work around LRA limitation, see PR82524.
@@ -7936,31 +7951,42 @@ (define_insn "*sub<mode>_2"
   [(set (reg FLAGS_REG)
 	(compare
 	  (minus:SWI
-	    (match_operand:SWI 1 "nonimmediate_operand" "0,0")
-	    (match_operand:SWI 2 "<general_operand>" "<r><i>,<m>"))
+	    (match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,r")
+	    (match_operand:SWI 2 "<general_operand>" "<r><i>,<m>,r<i>,<m>"))
 	  (const_int 0)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>,r,r")
 	(minus:SWI (match_dup 1) (match_dup 2)))]
   "ix86_match_ccmode (insn, CCGOCmode)
-   && ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
-  "sub{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+   && ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			       TARGET_APX_NDD)"
+  "@
+  sub{<imodesuffix>}\t{%2, %0|%0, %2}
+  sub{<imodesuffix>}\t{%2, %0|%0, %2}
+  sub{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+  sub{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*subsi_2_zext"
   [(set (reg FLAGS_REG)
 	(compare
-	  (minus:SI (match_operand:SI 1 "register_operand" "0")
-		    (match_operand:SI 2 "x86_64_general_operand" "rBMe"))
+	  (minus:SI (match_operand:SI 1 "nonimmediate_operand" "0,r,rm")
+		    (match_operand:SI 2 "x86_64_general_operand" "rBMe,rBMe,re"))
 	  (const_int 0)))
-   (set (match_operand:DI 0 "register_operand" "=r")
+   (set (match_operand:DI 0 "register_operand" "=r,r,r")
 	(zero_extend:DI
 	  (minus:SI (match_dup 1)
 		    (match_dup 2))))]
   "TARGET_64BIT && ix86_match_ccmode (insn, CCGOCmode)
-   && ix86_binary_operator_ok (MINUS, SImode, operands)"
-  "sub{l}\t{%2, %k0|%k0, %2}"
-  [(set_attr "type" "alu")
+   && ix86_binary_operator_ok (MINUS, SImode, operands,
+			       TARGET_APX_NDD)"
+  "@
+  sub{l}\t{%2, %k0|%k0, %2}
+  sub{l}\t{%2, %1, %k0|%k0, %1, %2}
+  sub{l}\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "SI")])
 
 (define_insn "*subqi_ext<mode>_0"
@@ -8072,7 +8098,8 @@ (define_expand "subv<mode>4"
 	       (pc)))]
   ""
 {
-  ix86_fixup_binary_operands_no_copy (MINUS, <MODE>mode, operands);
+  ix86_fixup_binary_operands_no_copy (MINUS, <MODE>mode, operands,
+				      TARGET_APX_NDD);
   if (CONST_SCALAR_INT_P (operands[2]))
     operands[4] = operands[2];
   else
@@ -8083,35 +8110,45 @@ (define_insn "*subv<mode>4"
   [(set (reg:CCO FLAGS_REG)
 	(eq:CCO (minus:<DWI>
 		   (sign_extend:<DWI>
-		      (match_operand:SWI 1 "nonimmediate_operand" "0,0"))
+		      (match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,r"))
 		   (sign_extend:<DWI>
-		      (match_operand:SWI 2 "<general_sext_operand>" "<r>We,m")))
+		      (match_operand:SWI 2 "<general_sext_operand>" "<r>We,m,rWe,m")))
 		(sign_extend:<DWI>
 		   (minus:SWI (match_dup 1) (match_dup 2)))))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>,r,r")
 	(minus:SWI (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
-  "sub{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
+  "@
+  sub{<imodesuffix>}\t{%2, %0|%0, %2}
+  sub{<imodesuffix>}\t{%2, %0|%0, %2}
+  sub{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+  sub{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "subv<mode>4_1"
   [(set (reg:CCO FLAGS_REG)
 	(eq:CCO (minus:<DWI>
 		   (sign_extend:<DWI>
-		      (match_operand:SWI 1 "nonimmediate_operand" "0"))
+		      (match_operand:SWI 1 "nonimmediate_operand" "0,rm"))
 		   (match_operand:<DWI> 3 "const_int_operand"))
 		(sign_extend:<DWI>
 		   (minus:SWI
 		     (match_dup 1)
-		     (match_operand:SWI 2 "x86_64_immediate_operand" "<i>")))))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
+		     (match_operand:SWI 2 "x86_64_immediate_operand" "<i>,<i>")))))
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,r")
 	(minus:SWI (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)
+  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			    TARGET_APX_NDD)
    && CONST_INT_P (operands[2])
    && INTVAL (operands[2]) == INTVAL (operands[3])"
-  "sub{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "@
+  sub{<imodesuffix>}\t{%2, %0|%0, %2}
+  sub{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")
    (set (attr "length_immediate")
 	(cond [(match_test "IN_RANGE (INTVAL (operands[2]), -128, 127)")
@@ -8207,6 +8244,8 @@ (define_insn_and_split "*subv<dwi>4_doubleword_1"
   split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);
   if (operands[2] == const0_rtx)
     {
+      if (!rtx_equal_p (operands[0], operands[1]))
+	emit_move_insn (operands[0], operands[1]);
       emit_insn (gen_subv<mode>4_1 (operands[3], operands[4], operands[5],
 				    operands[5]));
       DONE;
@@ -8288,18 +8327,25 @@ (define_expand "usubv<mode>4"
 	       (label_ref (match_operand 3))
 	       (pc)))]
   ""
-  "ix86_fixup_binary_operands_no_copy (MINUS, <MODE>mode, operands);")
+  "ix86_fixup_binary_operands_no_copy (MINUS, <MODE>mode, operands,
+				       TARGET_APX_NDD);")
 
 (define_insn "*sub<mode>_3"
   [(set (reg FLAGS_REG)
-	(compare (match_operand:SWI 1 "nonimmediate_operand" "0,0")
-		 (match_operand:SWI 2 "<general_operand>" "<r><i>,<m>")))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")
+	(compare (match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,r")
+		 (match_operand:SWI 2 "<general_operand>" "<r><i>,<m>,r<i>,<m>")))
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>,r,r")
 	(minus:SWI (match_dup 1) (match_dup 2)))]
   "ix86_match_ccmode (insn, CCmode)
-   && ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
-  "sub{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+   && ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			       TARGET_APX_NDD)"
+  "@
+  sub{<imodesuffix>}\t{%2, %0|%0, %2}
+  sub{<imodesuffix>}\t{%2, %0|%0, %2}
+  sub{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+  sub{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
 (define_peephole2
@@ -8387,16 +8433,21 @@ (define_insn_and_split "*dec_cmov<mode>"
 
 (define_insn "*subsi_3_zext"
   [(set (reg FLAGS_REG)
-	(compare (match_operand:SI 1 "register_operand" "0")
-		 (match_operand:SI 2 "x86_64_general_operand" "rBMe")))
-   (set (match_operand:DI 0 "register_operand" "=r")
+	(compare (match_operand:SI 1 "nonimmediate_operand" "0,r,rm")
+		 (match_operand:SI 2 "x86_64_general_operand" "rBMe,rBMe,re")))
+   (set (match_operand:DI 0 "register_operand" "=r,r,r")
 	(zero_extend:DI
 	  (minus:SI (match_dup 1)
 		    (match_dup 2))))]
   "TARGET_64BIT && ix86_match_ccmode (insn, CCmode)
-   && ix86_binary_operator_ok (MINUS, SImode, operands)"
-  "sub{l}\t{%2, %1|%1, %2}"
-  [(set_attr "type" "alu")
+   && ix86_binary_operator_ok (MINUS, SImode, operands,
+			       TARGET_APX_NDD)"
+  "@
+  sub{l}\t{%2, %1|%1, %2}
+  sub{l}\t{%2, %1, %k0|%k0, %1, %2}
+  sub{l}\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "SI")])
 \f
 ;; Add with carry and subtract with borrow
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd.c b/gcc/testsuite/gcc.target/i386/apx-ndd.c
index c1049022f2a..0c7952ef018 100644
--- a/gcc/testsuite/gcc.target/i386/apx-ndd.c
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd.c
@@ -42,3 +42,16 @@ FOO (long, add, +)
 FOO1 (long, add, +)
 FOO2 (long, add, +)
 
+FOO (char, sub, -)
+FOO1 (char, sub, -)
+FOO (short, sub, -)
+FOO1 (short, sub, -)
+FOO (int, sub, -)
+FOO1 (int, sub, -)
+FOO (long, sub, -)
+FOO1 (long, sub, -)
+/* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
+/* { dg-final { scan-assembler-times "lea(?:l|q)\[^\n\r]\\(%r(?:d|s)i,%r(?:d|s)i\\), %(?:|r|e)ax" 4 } } */
+/* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]%(?:|r|e)si(?:|l), \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
+/* { dg-final { scan-assembler-times "sub(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
+/* { dg-final { scan-assembler-times "sub(?:b|l|w|q)\[^\n\r]%(?:|r|e)si(?:|l), %(?:|r|e)di, %(?:|r|e)a(?:x|l)" 4 } } */
-- 
2.31.1



* [PATCH 07/17] [APX NDD] Support APX NDD for sbb insn
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (5 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 06/17] [APX NDD] Support APX NDD for sub insns Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05  2:29 ` [PATCH 08/17] [APX NDD] Support APX NDD for neg insn Hongyu Wang
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

Similar to *add<dwi>3_doubleword, operands[1] may not be equal to operands[0],
so an extra move is required.

gcc/ChangeLog:

	* config/i386/i386.md (*sub<dwi>3_doubleword): Add new alternatives for
	NDD, and emit a move when operands[0] is not equal to operands[1].
	(*sub<dwi>3_doubleword_zext): Likewise.
	(*subv<dwi>4_doubleword): Likewise.
	(*subv<dwi>4_doubleword_1): Likewise.
	(*subv<mode>4_overflow_1): Add NDD alternatives and adjust output
	templates.
	(*subv<mode>4_overflow_2): Likewise.
	(@sub<mode>3_carry): Likewise.
	(*addsi3_carry_zext_0r): Likewise, and use nonimmediate_operand for
	operands[1] to accept memory input for NDD alternative.
	(*subsi3_carry_zext): Likewise.
	(subborrow<mode>): Pass TARGET_APX_NDD to ix86_binary_operator_ok.
	(subborrow<mode>_0): Likewise.
	(*sub<mode>3_eq): Likewise.
	(*sub<mode>3_ne): Likewise.
	(*sub<mode>3_eq_1): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd-sbb.c: New test.
---
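Note (not part of the commit message): the extra move mentioned above can be
seen in a small C model of a doubleword subtract split into two halves.
This is a sketch under assumptions; the type u128_parts and both function
names are hypothetical and only mirror what the splitters do conceptually.

```c
#include <stdint.h>

/* A doubleword value split into low and high halves, as
   split_double_mode does for the RTL operands.  */
typedef struct
{
  uint64_t lo;
  uint64_t hi;
} u128_parts;

/* General case: sub on the low halves, then sbb on the high halves
   consuming the borrow produced by the low subtract.  */
u128_parts
sub_doubleword (u128_parts a, u128_parts b)
{
  u128_parts r;
  uint64_t borrow = a.lo < b.lo;
  r.lo = a.lo - b.lo;
  r.hi = a.hi - b.hi - borrow;
  return r;
}

/* Special case: the low half of the subtrahend is zero.  No borrow can
   occur, so only the high halves are subtracted.  With the legacy
   alternatives the destination is the same as the first source and the
   low half is already in place; with a separate NDD destination it has
   to be copied explicitly.  */
u128_parts
sub_doubleword_lo_zero (u128_parts a, uint64_t b_hi)
{
  u128_parts r;
  r.lo = a.lo;		/* the extra move */
  r.hi = a.hi - b_hi;
  return r;
}
```

Both functions agree whenever the low half of the subtrahend is zero, which
is the condition the splitter checks with operands[2] == const0_rtx.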
 gcc/config/i386/i386.md                     | 160 ++++++++++++--------
 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c |   6 +
 2 files changed, 107 insertions(+), 59 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ea5377a0b38..e2705ada31a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -7776,12 +7776,13 @@ (define_expand "sub<mode>3"
 				TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*sub<dwi>3_doubleword"
-  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r,r,r")
 	(minus:<DWI>
-	  (match_operand:<DWI> 1 "nonimmediate_operand" "0,0")
-	  (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
+	  (match_operand:<DWI> 1 "nonimmediate_operand" "0,0,ro,r")
+	  (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o,r<di>,o")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
+  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
@@ -7805,16 +7806,18 @@ (define_insn_and_split "*sub<dwi>3_doubleword"
 				   TARGET_APX_NDD);
       DONE;
     }
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*sub<dwi>3_doubleword_zext"
-  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=r,o")
+  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=r,o,r,r")
 	(minus:<DWI>
-	  (match_operand:<DWI> 1 "nonimmediate_operand" "0,0")
+	  (match_operand:<DWI> 1 "nonimmediate_operand" "0,0,r,o")
 	  (zero_extend:<DWI>
-	    (match_operand:DWIH 2 "nonimmediate_operand" "rm,r"))))
+	    (match_operand:DWIH 2 "nonimmediate_operand" "rm,r,rm,r"))))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (UNKNOWN, <DWI>mode, operands)"
+  "ix86_binary_operator_ok (UNKNOWN, <DWI>mode, operands,
+			    TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
@@ -7828,7 +7831,8 @@ (define_insn_and_split "*sub<dwi>3_doubleword_zext"
 		       (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0)))
 		     (const_int 0)))
 	      (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[3]);")
+  "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[3]);"
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn "*sub<mode>_1"
   [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>,r,r")
@@ -8162,14 +8166,15 @@ (define_insn_and_split "*subv<dwi>4_doubleword"
 	(eq:CCO
 	  (minus:<QPWI>
 	    (sign_extend:<QPWI>
-	      (match_operand:<DWI> 1 "nonimmediate_operand" "0,0"))
+	      (match_operand:<DWI> 1 "nonimmediate_operand" "0,0,ro,r"))
 	    (sign_extend:<QPWI>
-	      (match_operand:<DWI> 2 "nonimmediate_operand" "r,o")))
+	      (match_operand:<DWI> 2 "nonimmediate_operand" "r,o,r,o")))
 	  (sign_extend:<QPWI>
 	    (minus:<DWI> (match_dup 1) (match_dup 2)))))
-   (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
+   (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r,r,r")
 	(minus:<DWI> (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
+  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
@@ -8197,22 +8202,24 @@ (define_insn_and_split "*subv<dwi>4_doubleword"
 		     (match_dup 5)))])]
 {
   split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*subv<dwi>4_doubleword_1"
   [(set (reg:CCO FLAGS_REG)
 	(eq:CCO
 	  (minus:<QPWI>
 	    (sign_extend:<QPWI>
-	      (match_operand:<DWI> 1 "nonimmediate_operand" "0"))
+	      (match_operand:<DWI> 1 "nonimmediate_operand" "0,ro"))
 	    (match_operand:<QPWI> 3 "const_scalar_int_operand"))
 	  (sign_extend:<QPWI>
 	    (minus:<DWI>
 	      (match_dup 1)
-	      (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "<di>")))))
-   (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro")
+	      (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "<di>,<di>")))))
+   (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
 	(minus:<DWI> (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)
+  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			    TARGET_APX_NDD)
    && CONST_SCALAR_INT_P (operands[2])
    && rtx_equal_p (operands[2], operands[3])"
   "#"
@@ -8250,7 +8257,8 @@ (define_insn_and_split "*subv<dwi>4_doubleword_1"
 				    operands[5]));
       DONE;
     }
-})
+}
+[(set_attr "isa" "*,apx_ndd")])
 
 (define_insn "*subv<mode>4_overflow_1"
   [(set (reg:CCO FLAGS_REG)
@@ -8258,11 +8266,11 @@ (define_insn "*subv<mode>4_overflow_1"
 	  (minus:<DWI>
 	    (minus:<DWI>
 	      (sign_extend:<DWI>
-		(match_operand:SWI 1 "nonimmediate_operand" "%0,0"))
+		(match_operand:SWI 1 "nonimmediate_operand" "%0,0,rm,r"))
 	      (match_operator:<DWI> 4 "ix86_carry_flag_operator"
 		[(match_operand 3 "flags_reg_operand") (const_int 0)]))
 	    (sign_extend:<DWI>
-	      (match_operand:SWI 2 "<general_sext_operand>" "rWe,m")))
+	      (match_operand:SWI 2 "<general_sext_operand>" "rWe,m,rWe,m")))
 	  (sign_extend:<DWI>
 	    (minus:SWI
 	      (minus:SWI
@@ -8270,15 +8278,21 @@ (define_insn "*subv<mode>4_overflow_1"
 		(match_operator:SWI 5 "ix86_carry_flag_operator"
 		  [(match_dup 3) (const_int 0)]))
 	      (match_dup 2)))))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=rm,r")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=rm,r,r,r")
 	(minus:SWI
 	  (minus:SWI
 	    (match_dup 1)
 	    (match_op_dup 5 [(match_dup 3) (const_int 0)]))
 	  (match_dup 2)))]
-  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
-  "sbb{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
+  "@
+  sbb{<imodesuffix>}\t{%2, %0|%0, %2}
+  sbb{<imodesuffix>}\t{%2, %0|%0, %2}
+  sbb{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+  sbb{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*subv<mode>4_overflow_2"
@@ -8287,28 +8301,32 @@ (define_insn "*subv<mode>4_overflow_2"
 	  (minus:<DWI>
 	    (minus:<DWI>
 	      (sign_extend:<DWI>
-		(match_operand:SWI 1 "nonimmediate_operand" "%0"))
+		(match_operand:SWI 1 "nonimmediate_operand" "%0,rm"))
 	      (match_operator:<DWI> 4 "ix86_carry_flag_operator"
 		[(match_operand 3 "flags_reg_operand") (const_int 0)]))
-	    (match_operand:<DWI> 6 "const_int_operand" "n"))
+	    (match_operand:<DWI> 6 "const_int_operand" "n,n"))
 	  (sign_extend:<DWI>
 	    (minus:SWI
 	      (minus:SWI
 		(match_dup 1)
 		(match_operator:SWI 5 "ix86_carry_flag_operator"
 		  [(match_dup 3) (const_int 0)]))
-	      (match_operand:SWI 2 "x86_64_immediate_operand" "e")))))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=rm")
+	      (match_operand:SWI 2 "x86_64_immediate_operand" "e,e")))))
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=rm,r")
 	(minus:SWI
 	  (minus:SWI
 	    (match_dup 1)
 	    (match_op_dup 5 [(match_dup 3) (const_int 0)]))
 	  (match_dup 2)))]
-  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)
+  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			    TARGET_APX_NDD)
    && CONST_INT_P (operands[2])
    && INTVAL (operands[2]) == INTVAL (operands[6])"
-  "sbb{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "@
+  sbb{<imodesuffix>}\t{%2, %0|%0, %2}
+  sbb{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "mode" "<MODE>")
    (set (attr "length_immediate")
      (if_then_else (match_test "IN_RANGE (INTVAL (operands[2]), -128, 127)")
@@ -8593,15 +8611,18 @@ (define_insn "*addsi3_carry_zext_0"
    (set_attr "mode" "SI")])
 
 (define_insn "*addsi3_carry_zext_0r"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(zero_extend:DI
 	  (plus:SI (match_operator:SI 2 "ix86_carry_flag_unset_operator"
 		    [(reg FLAGS_REG) (const_int 0)])
-		   (match_operand:SI 1 "register_operand" "0"))))
+		   (match_operand:SI 1 "nonimmediate_operand" "0,rm"))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT"
-  "sbb{l}\t{$-1, %k0|%k0, -1}"
-  [(set_attr "type" "alu")
+  "@
+  sbb{l}\t{$-1, %k0|%k0, -1}
+  sbb{l}\t{$-1, %1, %k0|%k0, %1, -1}"
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "use_carry" "1")
    (set_attr "pent_pair" "pu")
    (set_attr "mode" "SI")])
@@ -8841,17 +8862,23 @@ (define_insn "*addcarry<mode>_1"
        (const_string "4")))])
 
 (define_insn "@sub<mode>3_carry"
-  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>,r,r")
 	(minus:SWI
 	  (minus:SWI
-	    (match_operand:SWI 1 "nonimmediate_operand" "0,0")
+	    (match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,r")
 	    (match_operator:SWI 4 "ix86_carry_flag_operator"
 	     [(match_operand 3 "flags_reg_operand") (const_int 0)]))
-	  (match_operand:SWI 2 "<general_operand>" "<r><i>,<m>")))
+	  (match_operand:SWI 2 "<general_operand>" "<r><i>,<m>,r<i>,<m>")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
-  "sbb{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
+  "@
+  sbb{<imodesuffix>}\t{%2, %0|%0, %2}
+  sbb{<imodesuffix>}\t{%2, %0|%0, %2}
+  sbb{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+  sbb{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "use_carry" "1")
    (set_attr "pent_pair" "pu")
    (set_attr "mode" "<MODE>")])
@@ -8938,18 +8965,23 @@ (define_insn "*sub<mode>3_carry_0r"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*subsi3_carry_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
 	(zero_extend:DI
 	  (minus:SI
 	    (minus:SI
-	      (match_operand:SI 1 "register_operand" "0")
+	      (match_operand:SI 1 "nonimmediate_operand" "0,r,rm")
 	      (match_operator:SI 3 "ix86_carry_flag_operator"
 	       [(reg FLAGS_REG) (const_int 0)]))
-	    (match_operand:SI 2 "x86_64_general_operand" "rBMe"))))
+	    (match_operand:SI 2 "x86_64_general_operand" "rBMe,rBMe,re"))))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_binary_operator_ok (MINUS, SImode, operands)"
-  "sbb{l}\t{%2, %k0|%k0, %2}"
-  [(set_attr "type" "alu")
+  "TARGET_64BIT && ix86_binary_operator_ok (MINUS, SImode, operands,
+					    TARGET_APX_NDD)"
+  "@
+  sbb{l}\t{%2, %k0|%k0, %2}
+  sbb{l}\t{%2, %1, %k0|%k0, %1, %2}
+  sbb{l}\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "use_carry" "1")
    (set_attr "pent_pair" "pu")
    (set_attr "mode" "SI")])
@@ -9034,21 +9066,27 @@ (define_insn "subborrow<mode>"
   [(set (reg:CCC FLAGS_REG)
 	(compare:CCC
 	  (zero_extend:<DWI>
-	    (match_operand:SWI48 1 "nonimmediate_operand" "0,0"))
+	    (match_operand:SWI48 1 "nonimmediate_operand" "0,0,r,rm"))
 	  (plus:<DWI>
 	    (match_operator:<DWI> 4 "ix86_carry_flag_operator"
 	      [(match_operand 3 "flags_reg_operand") (const_int 0)])
 	    (zero_extend:<DWI>
-	      (match_operand:SWI48 2 "nonimmediate_operand" "r,rm")))))
-   (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
+	      (match_operand:SWI48 2 "nonimmediate_operand" "r,rm,rm,r")))))
+   (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r")
 	(minus:SWI48 (minus:SWI48
 		       (match_dup 1)
 		       (match_operator:SWI48 5 "ix86_carry_flag_operator"
 			 [(match_dup 3) (const_int 0)]))
 		     (match_dup 2)))]
-  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
-  "sbb{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
+  "@
+  sbb{<imodesuffix>}\t{%2, %0|%0, %2}
+  sbb{<imodesuffix>}\t{%2, %0|%0, %2}
+  sbb{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+  sbb{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
    (set_attr "use_carry" "1")
    (set_attr "pent_pair" "pu")
    (set_attr "mode" "<MODE>")])
@@ -9209,7 +9247,8 @@ (define_expand "subborrow<mode>_0"
 	     (match_operand:SWI48 2 "<general_operand>")))
       (set (match_operand:SWI48 0 "register_operand")
 	   (minus:SWI48 (match_dup 1) (match_dup 2)))])]
-  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)")
+  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			    TARGET_APX_NDD)")
 
 (define_expand "uaddc<mode>5"
   [(match_operand:SWI48 0 "register_operand")
@@ -9634,7 +9673,8 @@ (define_insn_and_split "*sub<mode>3_eq"
 		    (const_int 0)))
 	  (match_operand:SWI 2 "<general_operand>")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)
+  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			    TARGET_APX_NDD)
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -9659,7 +9699,8 @@ (define_insn_and_split "*sub<mode>3_ne"
   "CONST_INT_P (operands[2])
    && (<MODE>mode != DImode
        || INTVAL (operands[2]) != HOST_WIDE_INT_C (-0x80000000))
-   && ix86_binary_operator_ok (MINUS, <MODE>mode, operands)
+   && ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			       TARGET_APX_NDD)
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
@@ -9688,7 +9729,8 @@ (define_insn_and_split "*sub<mode>3_eq_1"
   "CONST_INT_P (operands[2])
    && (<MODE>mode != DImode
        || INTVAL (operands[2]) != HOST_WIDE_INT_C (-0x80000000))
-   && ix86_binary_operator_ok (MINUS, <MODE>mode, operands)
+   && ix86_binary_operator_ok (MINUS, <MODE>mode, operands,
+			       TARGET_APX_NDD)
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c b/gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c
new file mode 100644
index 00000000000..662e3c607d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { int128 && { ! ia32 } } } } */
+/* { dg-options "-mapxf -O2" } */
+
+#include "pr91681-2.c"
+
+/* { dg-final { scan-assembler-times "sbbq\[^\n\r]*0, %rdi, %rdx" 1 } } */
-- 
2.31.1



* [PATCH 08/17] [APX NDD] Support APX NDD for neg insn
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (6 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 07/17] [APX NDD] Support APX NDD for sbb insn Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05  2:29 ` [PATCH 09/17] [APX NDD] Support APX NDD for not insn Hongyu Wang
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_expand_unary_operator): Add use_ndd
	parameter and adjust for NDD.
	* config/i386/i386-protos.h: Add use_ndd parameter for
	ix86_unary_operator_ok and ix86_expand_unary_operator.
	* config/i386/i386.cc (ix86_unary_operator_ok): Add use_ndd parameter
	and adjust for NDD.
	* config/i386/i386.md (neg<mode>2): Add new constraint for NDD and
	adjust output template.
	(*neg<mode>_1): Likewise.
	(*neg<dwi>2_doubleword): Likewise.
	(*neg<mode>_2): Likewise.
	(*neg<mode>_ccc_1): Likewise.
	(*neg<mode>_ccc_2): Likewise.
	(*negsi_1_zext): Likewise, and use nonimmediate_operand for operands[1]
	to accept memory input for NDD alternatives.
	(*negsi_2_zext): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add neg test.
---
 gcc/config/i386/i386-expand.cc          |  4 +-
 gcc/config/i386/i386-protos.h           |  5 +-
 gcc/config/i386/i386.cc                 |  5 +-
 gcc/config/i386/i386.md                 | 77 ++++++++++++++++---------
 gcc/testsuite/gcc.target/i386/apx-ndd.c | 29 ++++++++++
 5 files changed, 87 insertions(+), 33 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 93ecde4b4a8..d4bbd33ce07 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1494,7 +1494,7 @@ ix86_binary_operator_ok (enum rtx_code code, machine_mode mode,
 
 void
 ix86_expand_unary_operator (enum rtx_code code, machine_mode mode,
-			    rtx operands[])
+			    rtx operands[], bool use_ndd)
 {
   bool matching_memory = false;
   rtx src, dst, op, clob;
@@ -1513,7 +1513,7 @@ ix86_expand_unary_operator (enum rtx_code code, machine_mode mode,
     }
 
   /* When source operand is memory, destination must match.  */
-  if (MEM_P (src) && !matching_memory)
+  if (!use_ndd && MEM_P (src) && !matching_memory)
     src = force_reg (mode, src);
 
   /* Emit the instruction.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 481527872e8..fa952409729 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -127,7 +127,7 @@ extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
 extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
 extern bool ix86_agi_dependent (rtx_insn *set_insn, rtx_insn *use_insn);
 extern void ix86_expand_unary_operator (enum rtx_code, machine_mode,
-					rtx[]);
+					rtx[], bool = false);
 extern rtx ix86_build_const_vector (machine_mode, bool, rtx);
 extern rtx ix86_build_signbit_mask (machine_mode, bool, bool);
 extern HOST_WIDE_INT ix86_convert_const_vector_to_integer (rtx,
@@ -147,7 +147,8 @@ extern void ix86_split_fp_absneg_operator (enum rtx_code, machine_mode,
 					   rtx[]);
 extern void ix86_expand_copysign (rtx []);
 extern void ix86_expand_xorsign (rtx []);
-extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode, rtx[2]);
+extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode, rtx[2],
+				    bool = false);
 extern bool ix86_match_ccmode (rtx, machine_mode);
 extern bool ix86_match_ptest_ccmode (rtx);
 extern void ix86_expand_branch (enum rtx_code, rtx, rtx, rtx);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3e670330ef6..a3b628d2f6d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -16209,11 +16209,12 @@ ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn)
 bool
 ix86_unary_operator_ok (enum rtx_code,
 			machine_mode,
-			rtx operands[2])
+			rtx operands[2],
+			bool use_ndd)
 {
   /* If one of operands is memory, source and destination must match.  */
   if ((MEM_P (operands[0])
-       || MEM_P (operands[1]))
+       || (!use_ndd && MEM_P (operands[1])))
       && ! rtx_equal_p (operands[0], operands[1]))
     return false;
   return true;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e2705ada31a..1a2fb116f01 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -13282,13 +13282,14 @@ (define_expand "neg<mode>2"
   [(set (match_operand:SDWIM 0 "nonimmediate_operand")
 	(neg:SDWIM (match_operand:SDWIM 1 "nonimmediate_operand")))]
   ""
-  "ix86_expand_unary_operator (NEG, <MODE>mode, operands); DONE;")
+  "ix86_expand_unary_operator (NEG, <MODE>mode, operands,
+			       TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*neg<dwi>2_doubleword"
-  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro")
-	(neg:<DWI> (match_operand:<DWI> 1 "nonimmediate_operand" "0")))
+  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
+	(neg:<DWI> (match_operand:<DWI> 1 "nonimmediate_operand" "0,ro")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_unary_operator_ok (NEG, <DWI>mode, operands)"
+  "ix86_unary_operator_ok (NEG, <DWI>mode, operands, TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel
@@ -13305,7 +13306,8 @@ (define_insn_and_split "*neg<dwi>2_doubleword"
     [(set (match_dup 2)
 	  (neg:DWIH (match_dup 2)))
      (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[2]);")
+  "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[2]);"
+  [(set_attr "isa" "*,apx_ndd")])
 
 ;; Convert:
 ;;   mov %esi, %edx
@@ -13394,22 +13396,29 @@ (define_peephole2
      (clobber (reg:CC FLAGS_REG))])])
 
 (define_insn "*neg<mode>_1"
-  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
-	(neg:SWI (match_operand:SWI 1 "nonimmediate_operand" "0")))
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,r")
+	(neg:SWI (match_operand:SWI 1 "nonimmediate_operand" "0,rm")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_unary_operator_ok (NEG, <MODE>mode, operands)"
-  "neg{<imodesuffix>}\t%0"
+  "ix86_unary_operator_ok (NEG, <MODE>mode, operands, TARGET_APX_NDD)"
+  "@
+  neg{<imodesuffix>}\t%0
+  neg{<imodesuffix>}\t{%1, %0|%0, %1}"
   [(set_attr "type" "negnot")
+   (set_attr "isa" "*,apx_ndd")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*negsi_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(zero_extend:DI
-	  (neg:SI (match_operand:SI 1 "register_operand" "0"))))
+	  (neg:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm"))))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_unary_operator_ok (NEG, SImode, operands)"
-  "neg{l}\t%k0"
+  "TARGET_64BIT && ix86_unary_operator_ok (NEG, SImode, operands,
+					   TARGET_APX_NDD)"
+  "@
+  neg{l}\t%k0
+  neg{l}\t{%k1, %k0|%k0, %k1}"
   [(set_attr "type" "negnot")
+   (set_attr "isa" "*,apx_ndd")
    (set_attr "mode" "SI")])
 
 ;; Alternative 1 is needed to work around LRA limitation, see PR82524.
@@ -13435,51 +13444,65 @@ (define_insn_and_split "*neg<mode>_1_slp"
 (define_insn "*neg<mode>_2"
   [(set (reg FLAGS_REG)
 	(compare
-	  (neg:SWI (match_operand:SWI 1 "nonimmediate_operand" "0"))
+	  (neg:SWI (match_operand:SWI 1 "nonimmediate_operand" "0,rm"))
 	  (const_int 0)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,r")
 	(neg:SWI (match_dup 1)))]
   "ix86_match_ccmode (insn, CCGOCmode)
-   && ix86_unary_operator_ok (NEG, <MODE>mode, operands)"
-  "neg{<imodesuffix>}\t%0"
+   && ix86_unary_operator_ok (NEG, <MODE>mode, operands,
+			      TARGET_APX_NDD)"
+  "@
+   neg{<imodesuffix>}\t%0
+   neg{<imodesuffix>}\t{%1, %0|%0, %1}"
   [(set_attr "type" "negnot")
+   (set_attr "isa" "*,apx_ndd")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*negsi_2_zext"
   [(set (reg FLAGS_REG)
 	(compare
-	  (neg:SI (match_operand:SI 1 "register_operand" "0"))
+	  (neg:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm"))
 	  (const_int 0)))
-   (set (match_operand:DI 0 "register_operand" "=r")
+   (set (match_operand:DI 0 "register_operand" "=r,r")
 	(zero_extend:DI
 	  (neg:SI (match_dup 1))))]
   "TARGET_64BIT && ix86_match_ccmode (insn, CCGOCmode)
-   && ix86_unary_operator_ok (NEG, SImode, operands)"
-  "neg{l}\t%k0"
+   && ix86_unary_operator_ok (NEG, SImode, operands,
+			      TARGET_APX_NDD)"
+  "@
+   neg{l}\t%k0
+   neg{l}\t{%1, %k0|%k0, %1}"
   [(set_attr "type" "negnot")
+   (set_attr "isa" "*,apx_ndd")
    (set_attr "mode" "SI")])
 
 (define_insn "*neg<mode>_ccc_1"
   [(set (reg:CCC FLAGS_REG)
 	(unspec:CCC
-	  [(match_operand:SWI 1 "nonimmediate_operand" "0")
+	  [(match_operand:SWI 1 "nonimmediate_operand" "0,rm")
 	   (const_int 0)] UNSPEC_CC_NE))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,r")
 	(neg:SWI (match_dup 1)))]
   ""
-  "neg{<imodesuffix>}\t%0"
+  "@
+  neg{<imodesuffix>}\t%0
+  neg{<imodesuffix>}\t{%1, %0|%0, %1}"
   [(set_attr "type" "negnot")
+   (set_attr "isa" "*,apx_ndd")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*neg<mode>_ccc_2"
   [(set (reg:CCC FLAGS_REG)
 	(unspec:CCC
-	  [(match_operand:SWI 1 "nonimmediate_operand" "0")
+	  [(match_operand:SWI 1 "nonimmediate_operand" "0,rm")
 	   (const_int 0)] UNSPEC_CC_NE))
-   (clobber (match_scratch:SWI 0 "=<r>"))]
+   (clobber (match_scratch:SWI 0 "=<r>,r"))]
   ""
-  "neg{<imodesuffix>}\t%0"
+  "@
+  neg{<imodesuffix>}\t%0
+  neg{<imodesuffix>}\t{%1, %0|%0, %1}"
   [(set_attr "type" "negnot")
+   (set_attr "isa" "*,apx_ndd")
    (set_attr "mode" "<MODE>")])
 
 (define_expand "x86_neg<mode>_ccc"
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd.c b/gcc/testsuite/gcc.target/i386/apx-ndd.c
index 0c7952ef018..c351f71265e 100644
--- a/gcc/testsuite/gcc.target/i386/apx-ndd.c
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd.c
@@ -27,8 +27,25 @@ foo2_##OP_NAME##_##TYPE (TYPE *a, TYPE b) \
 {				 	  \
   TYPE c = *a OP b;		 	  \
   return c;			 	  \
+}
+
+#define F(TYPE, OP_NAME, OP)   \
+TYPE				 \
+__attribute__ ((noipa)) 	 \
+f_##OP_NAME##_##TYPE (TYPE *a) \
+{				 \
+  TYPE b = OP*a;		 \
+  return b;			 \
 }			
 
+#define F1(TYPE, OP_NAME, OP)		 \
+TYPE				  	 \
+__attribute__ ((noipa)) 	  	 \
+f1_##OP_NAME##_##TYPE (TYPE a) \
+{				 	 \
+  TYPE b = OP a;		 	 \
+  return b;			 	 \
+}			
 FOO (char, add, +)
 FOO1 (char, add, +)
 FOO2 (char, add, +)
@@ -50,8 +67,20 @@ FOO (int, sub, -)
 FOO1 (int, sub, -)
 FOO (long, sub, -)
 FOO1 (long, sub, -)
+
+F (char, neg, -)
+F1 (char, neg, -)
+F (short, neg, -)
+F1 (short, neg, -)
+F (int, neg, -)
+F1 (int, neg, -)
+F (long, neg, -)
+F1 (long, neg, -)
 /* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "lea(?:l|q)\[^\n\r]\\(%r(?:d|s)i,%r(?:d|s)i\\), %(?:|r|e)ax" 4 } } */
 /* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]%(?:|r|e)si(?:|l), \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "sub(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "sub(?:b|l|w|q)\[^\n\r]%(?:|r|e)si(?:|l), %(?:|r|e)di, %(?:|r|e)a(?:x|l)" 4 } } */
+/* { dg-final { scan-assembler-times "negb\[^\n\r]\\(%rdi\\), %(?:|r|e)al" 1 } } */
+/* { dg-final { scan-assembler-times "neg(?:l|w|q)\[^\n\r]\\(%rdi\\), %(?:|r|e)ax" 3 } } */
+/* { dg-final { scan-assembler-times "neg(?:l|w|q)\[^\n\r]%(?:|r|e)di, %(?:|r|e)ax" 4 } } */
-- 
2.31.1



* [PATCH 09/17] [APX NDD] Support APX NDD for not insn
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (7 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 08/17] [APX NDD] Support APX NDD for neg insn Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05  2:29 ` [PATCH 10/17] [APX NDD] Support APX NDD for and insn Hongyu Wang
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

*one_cmplsi2_2_zext is split into xor, so its NDD form will be added together
with the xor NDD support.
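
The split mentioned above rests on the identity ~x == x ^ -1. As far as I
understand, the flag-setting not pattern is rewritten to xor because not
itself does not set flags. A minimal C sketch of that identity follows; the
helper names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one's complement written as xor with all-ones,
   mirroring the not -> xor rewrite described above.  */
static uint32_t
not_via_xor (uint32_t x)
{
  return x ^ UINT32_C (0xffffffff);
}

/* Self-check: the xor form agrees with plain ~ on a few sample values.  */
static void
check_not_via_xor (void)
{
  assert (not_via_xor (0) == (uint32_t) ~UINT32_C (0));
  assert (not_via_xor (0x12345678) == (uint32_t) ~UINT32_C (0x12345678));
  assert (not_via_xor (0xffffffff) == 0);
}
```

Calling check_not_via_xor () from a test driver exercises the identity; the
sketch says nothing about how the real splitter chooses operands.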

gcc/ChangeLog:

	* config/i386/i386.md (one_cmpl<mode>2): Add new constraints for NDD
	and adjust output template.
	(*one_cmpl<mode>2_1): Likewise.
	(*one_cmplqi2_1): Likewise.
	(*one_cmpl<dwi>2_doubleword): Likewise.
	(*one_cmpl<mode>2_2): Likewise.
	(*one_cmplsi2_1_zext): Likewise, and use nonimmediate_operand for
	operands[1] to accept memory input for NDD alternative.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add not test.
---
 gcc/config/i386/i386.md                 | 58 ++++++++++++++-----------
 gcc/testsuite/gcc.target/i386/apx-ndd.c | 11 +++++
 2 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 1a2fb116f01..050779273a7 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14001,57 +14001,63 @@ (define_expand "one_cmpl<mode>2"
   [(set (match_operand:SDWIM 0 "nonimmediate_operand")
 	(not:SDWIM (match_operand:SDWIM 1 "nonimmediate_operand")))]
   ""
-  "ix86_expand_unary_operator (NOT, <MODE>mode, operands); DONE;")
+  "ix86_expand_unary_operator (NOT, <MODE>mode, operands,
+			       TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*one_cmpl<dwi>2_doubleword"
-  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro")
-	(not:<DWI> (match_operand:<DWI> 1 "nonimmediate_operand" "0")))]
-  "ix86_unary_operator_ok (NOT, <DWI>mode, operands)"
+  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
+	(not:<DWI> (match_operand:<DWI> 1 "nonimmediate_operand" "0,ro")))]
+  "ix86_unary_operator_ok (NOT, <DWI>mode, operands, TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(set (match_dup 0)
 	(not:DWIH (match_dup 1)))
    (set (match_dup 2)
 	(not:DWIH (match_dup 3)))]
-  "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[2]);")
+  "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[2]);"
+  [(set_attr "isa" "*,apx_ndd")])
 
 (define_insn "*one_cmpl<mode>2_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,?k")
-	(not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0,k")))]
-  "ix86_unary_operator_ok (NOT, <MODE>mode, operands)"
+  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
+	(not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0,rm,k")))]
+  "ix86_unary_operator_ok (NOT, <MODE>mode, operands, TARGET_APX_NDD)"
   "@
    not{<imodesuffix>}\t%0
+   not{<imodesuffix>}\t{%1, %0|%0, %1}
    #"
-  [(set_attr "isa" "*,<kmov_isa>")
-   (set_attr "type" "negnot,msklog")
+  [(set_attr "isa" "*,apx_ndd,<kmov_isa>")
+   (set_attr "type" "negnot,negnot,msklog")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*one_cmplsi2_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,?k")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,?k")
 	(zero_extend:DI
-	  (not:SI (match_operand:SI 1 "register_operand" "0,k"))))]
-  "TARGET_64BIT && ix86_unary_operator_ok (NOT, SImode, operands)"
+	  (not:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,k"))))]
+  "TARGET_64BIT && ix86_unary_operator_ok (NOT, SImode, operands,
+					   TARGET_APX_NDD)"
   "@
    not{l}\t%k0
+   not{l}\t{%1, %k0|%k0, %1}
    #"
-  [(set_attr "isa" "x64,avx512bw_512")
-   (set_attr "type" "negnot,msklog")
-   (set_attr "mode" "SI,SI")])
+  [(set_attr "isa" "x64,apx_ndd,avx512bw_512")
+   (set_attr "type" "negnot,negnot,msklog")
+   (set_attr "mode" "SI,SI,SI")])
 
 (define_insn "*one_cmplqi2_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,?k")
-	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,k")))]
-  "ix86_unary_operator_ok (NOT, QImode, operands)"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,r,?k")
+	(not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,rm,k")))]
+  "ix86_unary_operator_ok (NOT, QImode, operands, TARGET_APX_NDD)"
   "@
    not{b}\t%0
    not{l}\t%k0
+   not{b}\t{%1, %0|%0, %1}
    #"
-  [(set_attr "isa" "*,*,avx512f")
-   (set_attr "type" "negnot,negnot,msklog")
+  [(set_attr "isa" "*,*,apx_ndd,avx512f")
+   (set_attr "type" "negnot,negnot,negnot,msklog")
    (set (attr "mode")
 	(cond [(eq_attr "alternative" "1")
 		 (const_string "SI")
-		(and (eq_attr "alternative" "2")
+		(and (eq_attr "alternative" "3")
 		     (match_test "!TARGET_AVX512DQ"))
 		 (const_string "HI")
 	       ]
@@ -14081,14 +14087,16 @@ (define_insn_and_split "*one_cmpl<mode>_1_slp"
 
 (define_insn "*one_cmpl<mode>2_2"
   [(set (reg FLAGS_REG)
-	(compare (not:SWI (match_operand:SWI 1 "nonimmediate_operand" "0"))
+	(compare (not:SWI (match_operand:SWI 1 "nonimmediate_operand" "0,rm"))
 		 (const_int 0)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,r")
 	(not:SWI (match_dup 1)))]
   "ix86_match_ccmode (insn, CCNOmode)
-   && ix86_unary_operator_ok (NOT, <MODE>mode, operands)"
+   && ix86_unary_operator_ok (NOT, <MODE>mode, operands,
+			      TARGET_APX_NDD)"
   "#"
   [(set_attr "type" "alu1")
+   (set_attr "isa" "*,apx_ndd")
    (set_attr "mode" "<MODE>")])
 
 (define_split
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd.c b/gcc/testsuite/gcc.target/i386/apx-ndd.c
index c351f71265e..2bd551614c4 100644
--- a/gcc/testsuite/gcc.target/i386/apx-ndd.c
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd.c
@@ -76,6 +76,15 @@ F (int, neg, -)
 F1 (int, neg, -)
 F (long, neg, -)
 F1 (long, neg, -)
+
+F (char, not, ~)
+F1 (char, not, ~)
+F (short, not, ~)
+F1 (short, not, ~)
+F (int, not, ~)
+F1 (int, not, ~)
+F (long, not, ~)
+F1 (long, not, ~)
 /* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "lea(?:l|q)\[^\n\r]\\(%r(?:d|s)i,%r(?:d|s)i\\), %(?:|r|e)ax" 4 } } */
 /* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]%(?:|r|e)si(?:|l), \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
@@ -84,3 +93,5 @@ F1 (long, neg, -)
 /* { dg-final { scan-assembler-times "negb\[^\n\r]\\(%rdi\\), %(?:|r|e)al" 1 } } */
 /* { dg-final { scan-assembler-times "neg(?:l|w|q)\[^\n\r]\\(%rdi\\), %(?:|r|e)ax" 3 } } */
 /* { dg-final { scan-assembler-times "neg(?:l|w|q)\[^\n\r]%(?:|r|e)di, %(?:|r|e)ax" 4 } } */
+/* { dg-final { scan-assembler-times "not(?:b|l|w|q)\[^\n\r]\\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
+/* { dg-final { scan-assembler-times "not(?:l|w|q)\[^\n\r]%(?:|r|e)di, %(?:|r|e)ax" 4 } } */
-- 
2.31.1



* [PATCH 10/17] [APX NDD] Support APX NDD for and insn
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (8 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 09/17] [APX NDD] Support APX NDD for not insn Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05  2:29 ` [PATCH 11/17] [APX NDD] Support APX NDD for or/xor insn Hongyu Wang
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

For the NDD form of the AND insn, three splitter fixes are needed after
extending the legacy patterns.

1. APX NDD does not support high QImode registers such as ah, bh, ch and dh, so
optimization splitters that generate a highpart zero_extract for QImode must be
prohibited for NDD patterns.

2. The legacy AND insn uses the r/qm/L constraint, and a post-reload splitter
transforms it into a zero_extend move. For the NDD form of AND, the splitter is
not strict enough: it assumes that such an AND has a const_int operand matching
constraint "L", but the NDD form of AND allows a const_int with any QI value.
Restrict the splitter condition to match the "L" constraint, which strictly
matches zero-extend semantics.

3. The legacy AND insn adopts the r/0/Z constraint, and a splitter tries to
optimize such a form into a strict_lowpart QImode AND when the 7th bit is not
set. But the splitter wrongly converts the non-zext NDD form of AND with a
memory source; the strict_lowpart transform then matches alternative 1 of
*<code><mode>_slp_1 and generates *movstrict<mode>_1, so the zext semantics are
lost. This can leave the highpart of the destination uncleared and generate
wrong code. Disable the splitter when NDD is in use and operands[0] and
operands[1] are not equal.
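
To illustrate point 2: an AND behaves like a zero-extending move only when the
mask keeps exactly the low 8, 16 or 32 bits. The C sketch below assumes that
the "L" constraint accepts 0xff, 0xffff and 0xffffffff (my reading of
constraints.md, not something this patch states); all helper names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the "L" constraint check (assumed mask set).  */
static bool
mask_is_zext (uint64_t mask)
{
  return mask == 0xff || mask == 0xffff || mask == 0xffffffff;
}

/* Plain AND, as the insn computes it.  */
static uint64_t
and_mask (uint64_t x, uint64_t mask)
{
  return x & mask;
}

/* Zero-extension of the low byte, as a QImode zero_extend move would do.  */
static uint64_t
zext_qi (uint64_t x)
{
  return (uint8_t) x;
}

/* Self-check: 0xff is a zero-extend mask; 0x7f is a valid QI value but
   gives a different result from zero-extension when bit 7 is set.  */
static void
check_and_vs_zext (void)
{
  assert (mask_is_zext (0xff) && !mask_is_zext (0x7f));
  assert (and_mask (0x1234, 0xff) == zext_qi (0x1234));
  assert (and_mask (0x80, 0x7f) != zext_qi (0x80));
}
```

A mask such as 0x7f is why the splitter condition has to check the constant
itself rather than rely on the alternative that matched.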

gcc/ChangeLog:

	* config/i386/i386.md (and<mode>3): Add NDD alternatives and adjust
	output template.
	(*anddi_1): Likewise.
	(*and<mode>_1): Likewise.
	(*andqi_1): Likewise.
	(*andsi_1_zext): Likewise.
	(*anddi_2): Likewise.
	(*andsi_2_zext): Likewise.
	(*andqi_2_maybe_si): Likewise.
	(*and<mode>_2): Likewise.
	(*and<dwi>3_doubleword): Add NDD alternative, emit move for optimized
	case if operands[0] not equal to operands[1].
	(define_split for QI highpart AND): Prohibit splitter to split NDD
	form AND insn to <any_logic:code>qi_ext<mode>_3.
	(define_split for QI strict_lowpart optimization): Prohibit splitter to
	split NDD form AND insn to *<code><mode>3_1_slp.
	(define_split for zero_extend and optimization): Prohibit splitter to
	split NDD form AND insn to zero_extend insn.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add and test.
---
 gcc/config/i386/i386.md                 | 175 +++++++++++++++---------
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  13 ++
 2 files changed, 127 insertions(+), 61 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 050779273a7..64944a1163d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11705,18 +11705,19 @@ (define_expand "and<mode>3"
 	       (operands[0], gen_lowpart (mode, operands[1]),
 		<MODE>mode, mode, 1));
   else
-    ix86_expand_binary_operator (AND, <MODE>mode, operands);
+    ix86_expand_binary_operator (AND, <MODE>mode, operands,
+				 TARGET_APX_NDD);
 
   DONE;
 })
 
 (define_insn_and_split "*and<dwi>3_doubleword"
-  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r,r,r")
 	(and:<DWI>
-	 (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0")
-	 (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
+	 (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0,ro,r")
+	 (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o,r<di>,o")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (AND, <DWI>mode, operands)"
+  "ix86_binary_operator_ok (AND, <DWI>mode, operands, TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(const_int:DWIH 0)]
@@ -11728,39 +11729,53 @@ (define_insn_and_split "*and<dwi>3_doubleword"
   if (operands[2] == const0_rtx)
     emit_move_insn (operands[0], const0_rtx);
   else if (operands[2] == constm1_rtx)
-    emit_insn_deleted_note_p = true;
+    {
+      if (!rtx_equal_p (operands[0], operands[1]))
+	emit_move_insn (operands[0], operands[1]);
+      else
+	emit_insn_deleted_note_p = true;
+    }
   else
-    ix86_expand_binary_operator (AND, <MODE>mode, &operands[0]);
+    ix86_expand_binary_operator (AND, <MODE>mode, &operands[0],
+				 TARGET_APX_NDD);
 
   if (operands[5] == const0_rtx)
     emit_move_insn (operands[3], const0_rtx);
   else if (operands[5] == constm1_rtx)
     {
-      if (emit_insn_deleted_note_p)
+      if (!rtx_equal_p (operands[3], operands[4]))
+	emit_move_insn (operands[3], operands[4]);
+      else if (emit_insn_deleted_note_p)
 	emit_note (NOTE_INSN_DELETED);
     }
   else
-    ix86_expand_binary_operator (AND, <MODE>mode, &operands[3]);
+    ix86_expand_binary_operator (AND, <MODE>mode, &operands[3],
+				 TARGET_APX_NDD);
 
   DONE;
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn "*anddi_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r,?k")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,rm,r,r,r,r,?k")
 	(and:DI
-	 (match_operand:DI 1 "nonimmediate_operand" "%0,0,0,qm,k")
-	 (match_operand:DI 2 "x86_64_szext_general_operand" "Z,re,m,L,k")))
+	 (match_operand:DI 1 "nonimmediate_operand" "%0,r,0,0,rm,r,qm,k")
+	 (match_operand:DI 2 "x86_64_szext_general_operand" "Z,Z,re,m,re,m,L,k")))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_binary_operator_ok (AND, DImode, operands)"
+  "TARGET_64BIT && ix86_binary_operator_ok (AND, DImode, operands,
+					    TARGET_APX_NDD)"
   "@
    and{l}\t{%k2, %k0|%k0, %k2}
+   and{l}\t{%k2, %k1, %k0|%k0, %k1, %k2}
    and{q}\t{%2, %0|%0, %2}
    and{q}\t{%2, %0|%0, %2}
+   and{q}\t{%2, %1, %0|%0, %1, %2}
+   and{q}\t{%2, %1, %0|%0, %1, %2}
    #
    #"
-  [(set_attr "isa" "x64,x64,x64,x64,avx512bw_512")
-   (set_attr "type" "alu,alu,alu,imovx,msklog")
-   (set_attr "length_immediate" "*,*,*,0,*")
+  [(set_attr "isa" "x64,apx_ndd,x64,x64,apx_ndd,apx_ndd,x64,avx512bw_512")
+   (set_attr "type" "alu,alu,alu,alu,alu,alu,imovx,msklog")
+   (set_attr "length_immediate" "*,*,*,*,*,*,0,*")
    (set (attr "prefix_rex")
      (if_then_else
        (and (eq_attr "type" "imovx")
@@ -11768,7 +11783,7 @@ (define_insn "*anddi_1"
 		 (match_operand 1 "ext_QIreg_operand")))
        (const_string "1")
        (const_string "*")))
-   (set_attr "mode" "SI,DI,DI,SI,DI")])
+   (set_attr "mode" "SI,SI,DI,DI,DI,DI,SI,DI")])
 
 (define_insn_and_split "*anddi_1_btr"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=rm")
@@ -11823,36 +11838,45 @@ (define_split
 
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 (define_insn "*andsi_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
 	(zero_extend:DI
-	  (and:SI (match_operand:SI 1 "nonimmediate_operand" "%0")
-		  (match_operand:SI 2 "x86_64_general_operand" "rBMe"))))
+	  (and:SI (match_operand:SI 1 "nonimmediate_operand" "%0,rm,r")
+		  (match_operand:SI 2 "x86_64_general_operand" "rBMe,re,BM"))))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_binary_operator_ok (AND, SImode, operands)"
-  "and{l}\t{%2, %k0|%k0, %2}"
+  "TARGET_64BIT && ix86_binary_operator_ok (AND, SImode, operands,
+					    TARGET_APX_NDD)"
+  "@
+  and{l}\t{%2, %k0|%k0, %2}
+  and{l}\t{%2, %1, %k0|%k0, %1, %2}
+  and{l}\t{%2, %1, %k0|%k0, %1, %2}"
   [(set_attr "type" "alu")
+   (set_attr "isa" "*,apx_ndd,apx_ndd")
    (set_attr "mode" "SI")])
 
 (define_insn "*and<mode>_1"
-  [(set (match_operand:SWI24 0 "nonimmediate_operand" "=rm,r,Ya,?k")
-	(and:SWI24 (match_operand:SWI24 1 "nonimmediate_operand" "%0,0,qm,k")
-		   (match_operand:SWI24 2 "<general_operand>" "r<i>,<m>,L,k")))
+  [(set (match_operand:SWI24 0 "nonimmediate_operand" "=rm,r,r,r,Ya,?k")
+	(and:SWI24 (match_operand:SWI24 1 "nonimmediate_operand" "%0,0,rm,r,qm,k")
+		   (match_operand:SWI24 2 "<general_operand>" "r<i>,<m>,r<i>,<m>,L,k")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (AND, <MODE>mode, operands)"
+  "ix86_binary_operator_ok (AND, <MODE>mode, operands, TARGET_APX_NDD)"
   "@
    and{<imodesuffix>}\t{%2, %0|%0, %2}
    and{<imodesuffix>}\t{%2, %0|%0, %2}
+   and{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+   and{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
    #
    #"
   [(set (attr "isa")
-	(cond [(eq_attr "alternative" "3")
+	(cond [(eq_attr "alternative" "2,3")
+		 (const_string "apx_ndd")
+	       (eq_attr "alternative" "5")
 		 (if_then_else (eq_attr "mode" "SI")
 		   (const_string "avx512bw")
 		   (const_string "avx512f"))
 	      ]
 	      (const_string "*")))
-   (set_attr "type" "alu,alu,imovx,msklog")
-   (set_attr "length_immediate" "*,*,0,*")
+   (set_attr "type" "alu,alu,alu,alu,imovx,msklog")
+   (set_attr "length_immediate" "*,*,*,*,0,*")
    (set (attr "prefix_rex")
      (if_then_else
        (and (eq_attr "type" "imovx")
@@ -11860,24 +11884,27 @@ (define_insn "*and<mode>_1"
 		 (match_operand 1 "ext_QIreg_operand")))
        (const_string "1")
        (const_string "*")))
-   (set_attr "mode" "<MODE>,<MODE>,SI,<MODE>")])
+   (set_attr "mode" "<MODE>,<MODE>,<MODE>,<MODE>,SI,<MODE>")])
 
 (define_insn "*andqi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,?k")
-	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,k")
-		(match_operand:QI 2 "general_operand" "qn,m,rn,k")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,r,r,?k")
+	(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,rm,r,k")
+		(match_operand:QI 2 "general_operand" "qn,m,rn,rn,m,k")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (AND, QImode, operands)"
+  "ix86_binary_operator_ok (AND, QImode, operands, TARGET_APX_NDD)"
   "@
    and{b}\t{%2, %0|%0, %2}
    and{b}\t{%2, %0|%0, %2}
    and{l}\t{%k2, %k0|%k0, %k2}
+   and{b}\t{%2, %1, %0|%0, %1, %2}
+   and{b}\t{%2, %1, %0|%0, %1, %2}
    #"
-  [(set_attr "type" "alu,alu,alu,msklog")
+  [(set_attr "type" "alu,alu,alu,alu,alu,msklog")
+   (set_attr "isa" "*,*,*,apx_ndd,apx_ndd,*")
    (set (attr "mode")
 	(cond [(eq_attr "alternative" "2")
 		 (const_string "SI")
-		(and (eq_attr "alternative" "3")
+		(and (eq_attr "alternative" "5")
 		     (match_test "!TARGET_AVX512DQ"))
 		 (const_string "HI")
 	       ]
@@ -11980,7 +12007,10 @@ (define_split
    (clobber (reg:CC FLAGS_REG))]
   "reload_completed
    && (!REG_P (operands[1])
-       || REGNO (operands[0]) != REGNO (operands[1]))"
+       || REGNO (operands[0]) != REGNO (operands[1]))
+   && (UINTVAL (operands[2]) == GET_MODE_MASK (SImode)
+       || UINTVAL (operands[2]) == GET_MODE_MASK (HImode)
+       || UINTVAL (operands[2]) == GET_MODE_MASK (QImode))"
   [(const_int 0)]
 {
   unsigned HOST_WIDE_INT ival = UINTVAL (operands[2]);
@@ -12053,10 +12083,10 @@ (define_insn "*anddi_2"
   [(set (reg FLAGS_REG)
 	(compare
 	 (and:DI
-	  (match_operand:DI 1 "nonimmediate_operand" "%0,0,0")
-	  (match_operand:DI 2 "x86_64_szext_general_operand" "Z,re,m"))
+	  (match_operand:DI 1 "nonimmediate_operand" "%0,0,0,r,rm,r")
+	  (match_operand:DI 2 "x86_64_szext_general_operand" "Z,re,m,Z,re,m"))
 	 (const_int 0)))
-   (set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r")
+   (set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r,r,r")
 	(and:DI (match_dup 1) (match_dup 2)))]
   "TARGET_64BIT
    && ix86_match_ccmode
@@ -12070,38 +12100,46 @@ (define_insn "*anddi_2"
 	  && (!CONST_INT_P (operands[2])
 	      || val_signbit_known_set_p (SImode, INTVAL (operands[2]))))
 	 ? CCZmode : CCNOmode)
-   && ix86_binary_operator_ok (AND, DImode, operands)"
+   && ix86_binary_operator_ok (AND, DImode, operands, TARGET_APX_NDD)"
   "@
    and{l}\t{%k2, %k0|%k0, %k2}
    and{q}\t{%2, %0|%0, %2}
-   and{q}\t{%2, %0|%0, %2}"
+   and{q}\t{%2, %0|%0, %2}
+   and{l}\t{%k2, %k1, %k0|%k0, %k1, %k2}
+   and{q}\t{%2, %1, %0|%0, %1, %2}
+   and{q}\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "alu")
-   (set_attr "mode" "SI,DI,DI")])
+   (set_attr "isa" "*,*,*,apx_ndd,apx_ndd,apx_ndd")
+   (set_attr "mode" "SI,DI,DI,SI,DI,DI")])
 
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 (define_insn "*andsi_2_zext"
   [(set (reg FLAGS_REG)
 	(compare (and:SI
-		  (match_operand:SI 1 "nonimmediate_operand" "%0")
-		  (match_operand:SI 2 "x86_64_general_operand" "rBMe"))
+		  (match_operand:SI 1 "nonimmediate_operand" "%0,rm,r")
+		  (match_operand:SI 2 "x86_64_general_operand" "rBMe,re,BM"))
 		 (const_int 0)))
-   (set (match_operand:DI 0 "register_operand" "=r")
+   (set (match_operand:DI 0 "register_operand" "=r,r,r")
 	(zero_extend:DI (and:SI (match_dup 1) (match_dup 2))))]
   "TARGET_64BIT && ix86_match_ccmode (insn, CCNOmode)
-   && ix86_binary_operator_ok (AND, SImode, operands)"
-  "and{l}\t{%2, %k0|%k0, %2}"
+   && ix86_binary_operator_ok (AND, SImode, operands, TARGET_APX_NDD)"
+  "@
+  and{l}\t{%2, %k0|%k0, %2}
+  and{l}\t{%2, %1, %k0|%k0, %1, %2}
+  and{l}\t{%2, %1, %k0|%k0, %1, %2}"
   [(set_attr "type" "alu")
+   (set_attr "isa" "*,apx_ndd,apx_ndd")
    (set_attr "mode" "SI")])
 
 (define_insn "*andqi_2_maybe_si"
   [(set (reg FLAGS_REG)
 	(compare (and:QI
-		  (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
-		  (match_operand:QI 2 "general_operand" "qn,m,n"))
+		  (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,rm,r")
+		  (match_operand:QI 2 "general_operand" "qn,m,n,rn,m"))
 		 (const_int 0)))
-   (set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
+   (set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,r,r")
 	(and:QI (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (AND, QImode, operands)
+  "ix86_binary_operator_ok (AND, QImode, operands, TARGET_APX_NDD)
    && ix86_match_ccmode (insn,
 			 CONST_INT_P (operands[2])
 			 && INTVAL (operands[2]) >= 0 ? CCNOmode : CCZmode)"
@@ -12112,11 +12150,16 @@ (define_insn "*andqi_2_maybe_si"
         operands[2] = GEN_INT (INTVAL (operands[2]) & 0xff);
       return "and{l}\t{%2, %k0|%k0, %2}";
     }
+  if (which_alternative > 2)
+    return "and{b}\t{%2, %1, %0|%0, %1, %2}";
   return "and{b}\t{%2, %0|%0, %2}";
 }
   [(set_attr "type" "alu")
+   (set_attr "isa" "*,*,*,apx_ndd,apx_ndd")
    (set (attr "mode")
-     (cond [(eq_attr "alternative" "2")
+     (cond [(eq_attr "alternative" "3,4")
+	      (const_string "QI")
+	    (eq_attr "alternative" "2")
 	      (const_string "SI")
 	    (and (match_test "optimize_insn_for_size_p ()")
 		 (and (match_operand 0 "ext_QIreg_operand")
@@ -12133,15 +12176,21 @@ (define_insn "*andqi_2_maybe_si"
 (define_insn "*and<mode>_2"
   [(set (reg FLAGS_REG)
 	(compare (and:SWI124
-		  (match_operand:SWI124 1 "nonimmediate_operand" "%0,0")
-		  (match_operand:SWI124 2 "<general_operand>" "<r><i>,<m>"))
+		  (match_operand:SWI124 1 "nonimmediate_operand" "%0,0,rm,r")
+		  (match_operand:SWI124 2 "<general_operand>" "<r><i>,<m>,r<i>,<m>"))
 		 (const_int 0)))
-   (set (match_operand:SWI124 0 "nonimmediate_operand" "=<r>m,<r>")
+   (set (match_operand:SWI124 0 "nonimmediate_operand" "=<r>m,<r>,r,r")
 	(and:SWI124 (match_dup 1) (match_dup 2)))]
   "ix86_match_ccmode (insn, CCNOmode)
-   && ix86_binary_operator_ok (AND, <MODE>mode, operands)"
-  "and{<imodesuffix>}\t{%2, %0|%0, %2}"
+   && ix86_binary_operator_ok (AND, <MODE>mode, operands,
+			       TARGET_APX_NDD)"
+  "@
+  and{<imodesuffix>}\t{%2, %0|%0, %2}
+  and{<imodesuffix>}\t{%2, %0|%0, %2}
+  and{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+  and{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "alu")
+   (set_attr "isa" "*,*,apx_ndd,apx_ndd")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*<code>qi_ext<mode>_0"
@@ -12387,6 +12436,7 @@ (define_insn_and_split "*<code>qi_ext<mode>_3"
 ;; Don't do the splitting with memory operands, since it introduces risk
 ;; of memory mismatch stalls.  We may want to do the splitting for optimizing
 ;; for size, but that can (should?) be handled by generic code instead.
+;; Don't do the splitting for APX NDD as NDD does not support *h registers.
 (define_split
   [(set (match_operand:SWI248 0 "QIreg_operand")
 	(and:SWI248 (match_operand:SWI248 1 "register_operand")
@@ -12394,7 +12444,8 @@ (define_split
    (clobber (reg:CC FLAGS_REG))]
    "reload_completed
     && (!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
-    && !(~INTVAL (operands[2]) & ~(255 << 8))"
+    && !(~INTVAL (operands[2]) & ~(255 << 8))
+    && !(TARGET_APX_NDD && REGNO (operands[0]) != REGNO (operands[1]))"
   [(parallel
      [(set (zero_extract:HI (match_dup 0)
 			    (const_int 8)
@@ -12423,7 +12474,9 @@ (define_split
    "reload_completed
     && (!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
     && !(~INTVAL (operands[2]) & ~255)
-    && !(INTVAL (operands[2]) & 128)"
+    && !(INTVAL (operands[2]) & 128)
+    && !(TARGET_APX_NDD
+	 && !rtx_equal_p (operands[0], operands[1]))"
   [(parallel [(set (strict_low_part (match_dup 0))
 		   (and:QI (match_dup 1)
 			   (match_dup 2)))
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd.c b/gcc/testsuite/gcc.target/i386/apx-ndd.c
index 2bd551614c4..be436d57bdf 100644
--- a/gcc/testsuite/gcc.target/i386/apx-ndd.c
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd.c
@@ -85,6 +85,15 @@ F (int, not, ~)
 F1 (int, not, ~)
 F (long, not, ~)
 F1 (long, not, ~)
+
+FOO (char, and, &)
+FOO1 (char, and, &)
+FOO (short, and, &)
+FOO1 (short, and, &)
+FOO (int, and, &)
+FOO1 (int, and, &)
+FOO (long, and, &)
+FOO1 (long, and, &)
 /* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "lea(?:l|q)\[^\n\r]\\(%r(?:d|s)i,%r(?:d|s)i\\), %(?:|r|e)ax" 4 } } */
 /* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]%(?:|r|e)si(?:|l), \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
@@ -95,3 +104,7 @@ F1 (long, not, ~)
 /* { dg-final { scan-assembler-times "neg(?:l|w|q)\[^\n\r]%(?:|r|e)di, %(?:|r|e)ax" 4 } } */
 /* { dg-final { scan-assembler-times "not(?:b|l|w|q)\[^\n\r]\\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "not(?:l|w|q)\[^\n\r]%(?:|r|e)di, %(?:|r|e)ax" 4 } } */
+/* { dg-final { scan-assembler-times "andb\[^\n\r]*1, \\(%rdi\\), %al" 1 } } */
+/* { dg-final { scan-assembler-times "and(?:l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)ax" 3 } } */
+/* { dg-final { scan-assembler-times "and(?:l|w|q)\[^\n\r]%(?:|r|e)di, %(?:|r|e)si, %(?:|r|e)ax" 2 } } */
+/* { dg-final { scan-assembler-times "and(?:l|w|q)\[^\n\r]%(?:|r|e)si, %(?:|r|e)di, %(?:|r|e)ax" 2 } } */
-- 
2.31.1



* [PATCH 11/17] [APX NDD] Support APX NDD for or/xor insn
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (9 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 10/17] [APX NDD] Support APX NDD for and insn Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05  2:29 ` [PATCH 12/17] [APX NDD] Support APX NDD for left shift insns Hongyu Wang
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

Similar to the AND insn, two splitters need to be adjusted to prevent
misoptimization for NDD OR/XOR.

Also adjust *one_cmplsi2_2_zext and its corresponding splitter that will
generate xor insn.
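
To illustrate why the strict_lowpart splitter must be skipped when the
destination differs from the first source, here is a minimal C model.
It is only a sketch: the helper names full_or and lowpart_or are
hypothetical and appear nowhere in the patch. It models the full-width
NDD result against the byte-wide form, which writes only the low byte
of the destination and keeps the destination's old upper bits.

```c
#include <stdint.h>

/* Hypothetical model of the full-width NDD insn: dest = src | imm.  */
static uint32_t
full_or (uint32_t src, uint32_t imm)
{
  return src | imm;
}

/* Hypothetical model of the strict_low_part form: only the low byte
   of the destination is written, its upper 24 bits are preserved.  */
static uint32_t
lowpart_or (uint32_t dest_old, uint32_t src, uint32_t imm)
{
  return (dest_old & ~0xffu) | ((src | imm) & 0xffu);
}
```

With src = 0x12345600 and imm = 0x80, the two models agree when the
destination register already holds src, but differ when it holds an
unrelated value such as 0xdeadbeef, which is the case the added
rtx_equal_p check is meant to exclude.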

gcc/ChangeLog:

	* config/i386/i386.md (<code><mode>3): Add new alternative for NDD
	and adjust output templates.
	(*<code><mode>_1): Likewise.
	(*<code>qi_1): Likewise.
	(*notxor<mode>_1): Likewise.
	(*<code>si_1_zext): Likewise.
	(*notxorqi_1): Likewise.
	(*<code><mode>_2): Likewise.
	(*<code>si_2_zext): Likewise.
	(*<code>si_2_zext_imm): Likewise.
	(*<code>si_1_zext_imm): Likewise, and use nonimmediate_operand for
	operands[1] to accept memory input for NDD alternative.
	(*one_cmplsi2_2_zext): Likewise.
	(define_split for *one_cmplsi2_2_zext): Use nonimmediate_operand for
	operands[3].
	(*<code><dwi>3_doubleword): Add NDD constraints, emit move for
	optimized case if operands[0] != operands[1] or operands[3]
	!= operands[4].
	(define_split for QI highpart OR/XOR): Prohibit splitter to split NDD
	form OR/XOR insn to <any_logic:code>qi_ext<mode>_3.
	(define_split for QI strict_lowpart optimization): Prohibit splitter to
	split NDD form OR/XOR insn to *<code><mode>3_1_slp.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add or and xor test.
---
 gcc/config/i386/i386.md                 | 186 +++++++++++++++---------
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  26 ++++
 2 files changed, 143 insertions(+), 69 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 64944a1163d..62cd21ee3d4 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12698,17 +12698,19 @@ (define_expand "<code><mode>3"
       && !x86_64_hilo_general_operand (operands[2], <MODE>mode))
     operands[2] = force_reg (<MODE>mode, operands[2]);
 
-  ix86_expand_binary_operator (<CODE>, <MODE>mode, operands);
+  ix86_expand_binary_operator (<CODE>, <MODE>mode, operands,
+			       TARGET_APX_NDD);
   DONE;
 })
 
 (define_insn_and_split "*<code><dwi>3_doubleword"
-  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r,r,r")
 	(any_or:<DWI>
-	 (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0")
-	 (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
+	 (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0,ro,r")
+	 (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o,r<di>,o")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (<CODE>, <DWI>mode, operands)"
+  "ix86_binary_operator_ok (<CODE>, <DWI>mode, operands,
+			    TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(const_int:DWIH 0)]
@@ -12720,20 +12722,29 @@ (define_insn_and_split "*<code><dwi>3_doubleword"
   split_double_mode (<DWI>mode, &operands[0], 3, &operands[0], &operands[3]);
 
   if (operands[2] == const0_rtx)
-    emit_insn_deleted_note_p = true;
+    {
+      if (!rtx_equal_p (operands[0], operands[1]))
+	emit_move_insn (operands[0], operands[1]);
+      else
+	emit_insn_deleted_note_p = true;
+    }
   else if (operands[2] == constm1_rtx)
     {
       if (<CODE> == IOR)
 	emit_move_insn (operands[0], constm1_rtx);
       else
-	ix86_expand_unary_operator (NOT, <MODE>mode, &operands[0]);
+	ix86_expand_unary_operator (NOT, <MODE>mode, &operands[0],
+				    TARGET_APX_NDD);
     }
   else
-    ix86_expand_binary_operator (<CODE>, <MODE>mode, &operands[0]);
+    ix86_expand_binary_operator (<CODE>, <MODE>mode, &operands[0],
+				 TARGET_APX_NDD);
 
   if (operands[5] == const0_rtx)
     {
-      if (emit_insn_deleted_note_p)
+      if (!rtx_equal_p (operands[3], operands[4]))
+	emit_move_insn (operands[3], operands[4]);
+      else if (emit_insn_deleted_note_p)
 	emit_note (NOTE_INSN_DELETED);
     }
   else if (operands[5] == constm1_rtx)
@@ -12741,37 +12752,43 @@ (define_insn_and_split "*<code><dwi>3_doubleword"
       if (<CODE> == IOR)
 	emit_move_insn (operands[3], constm1_rtx);
       else
-	ix86_expand_unary_operator (NOT, <MODE>mode, &operands[3]);
+	ix86_expand_unary_operator (NOT, <MODE>mode, &operands[3],
+				    TARGET_APX_NDD);
     }
   else
-    ix86_expand_binary_operator (<CODE>, <MODE>mode, &operands[3]);
+    ix86_expand_binary_operator (<CODE>, <MODE>mode, &operands[3],
+				 TARGET_APX_NDD);
 
   DONE;
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn "*<code><mode>_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
+  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,r,r,?k")
 	(any_or:SWI248
-	 (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k")
-	 (match_operand:SWI248 2 "<general_operand>" "r<i>,<m>,k")))
+	 (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,rm,r,k")
+	 (match_operand:SWI248 2 "<general_operand>" "r<i>,<m>,r<i>,<m>,k")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+  "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
   "@
    <logic>{<imodesuffix>}\t{%2, %0|%0, %2}
    <logic>{<imodesuffix>}\t{%2, %0|%0, %2}
+   <logic>{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+   <logic>{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
    #"
-  [(set_attr "isa" "*,*,<kmov_isa>")
-   (set_attr "type" "alu, alu, msklog")
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd,<kmov_isa>")
+   (set_attr "type" "alu, alu, alu, alu, msklog")
    (set_attr "mode" "<MODE>")])
 
 (define_insn_and_split "*notxor<mode>_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
+  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,r,r,?k")
 	(not:SWI248
 	  (xor:SWI248
-	    (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k")
-	    (match_operand:SWI248 2 "<general_operand>" "r<i>,<m>,k"))))
+	    (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,rm,r,k")
+	    (match_operand:SWI248 2 "<general_operand>" "r<i>,<m>,r<i>,<m>,k"))))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (XOR, <MODE>mode, operands)"
+  "ix86_binary_operator_ok (XOR, <MODE>mode, operands, TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel
@@ -12787,8 +12804,8 @@ (define_insn_and_split "*notxor<mode>_1"
       DONE;
     }
 }
-  [(set_attr "isa" "*,*,<kmov_isa>")
-   (set_attr "type" "alu, alu, msklog")
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd,<kmov_isa>")
+   (set_attr "type" "alu, alu, alu, alu, msklog")
    (set_attr "mode" "<MODE>")])
 
 (define_insn_and_split "*iordi_1_bts"
@@ -12876,44 +12893,55 @@ (define_insn_and_split "*xor2andn"
 
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 (define_insn "*<code>si_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
 	(zero_extend:DI
-	 (any_or:SI (match_operand:SI 1 "nonimmediate_operand" "%0")
-		    (match_operand:SI 2 "x86_64_general_operand" "rBMe"))))
+	 (any_or:SI (match_operand:SI 1 "nonimmediate_operand" "%0,rm,r")
+		    (match_operand:SI 2 "x86_64_general_operand" "rBMe,re,BM"))))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_binary_operator_ok (<CODE>, SImode, operands)"
-  "<logic>{l}\t{%2, %k0|%k0, %2}"
+  "TARGET_64BIT && ix86_binary_operator_ok (<CODE>, SImode, operands,
+					    TARGET_APX_NDD)"
+  "@
+  <logic>{l}\t{%2, %k0|%k0, %2}
+  <logic>{l}\t{%2, %1, %k0|%k0, %1, %2}
+  <logic>{l}\t{%2, %1, %k0|%k0, %1, %2}"
   [(set_attr "type" "alu")
+   (set_attr "isa" "*,apx_ndd,apx_ndd")
    (set_attr "mode" "SI")])
 
 (define_insn "*<code>si_1_zext_imm"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(any_or:DI
-	 (zero_extend:DI (match_operand:SI 1 "register_operand" "%0"))
-	 (match_operand:DI 2 "x86_64_zext_immediate_operand" "Z")))
+	 (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" "%0,rm"))
+	 (match_operand:DI 2 "x86_64_zext_immediate_operand" "Z,Z")))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_binary_operator_ok (<CODE>, SImode, operands)"
-  "<logic>{l}\t{%2, %k0|%k0, %2}"
+  "TARGET_64BIT && ix86_binary_operator_ok (<CODE>, SImode, operands,
+					    TARGET_APX_NDD)"
+  "@
+  <logic>{l}\t{%2, %k0|%k0, %2}
+  <logic>{l}\t{%2, %1, %k0|%k0, %1, %2}"
   [(set_attr "type" "alu")
+   (set_attr "isa" "*,apx_ndd")
    (set_attr "mode" "SI")])
 
 (define_insn "*<code>qi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,?k")
-	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,k")
-		   (match_operand:QI 2 "general_operand" "qn,m,rn,k")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,r,r,?k")
+	(any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,rm,r,k")
+		   (match_operand:QI 2 "general_operand" "qn,m,rn,rn,m,k")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (<CODE>, QImode, operands)"
+  "ix86_binary_operator_ok (<CODE>, QImode, operands, TARGET_APX_NDD)"
   "@
    <logic>{b}\t{%2, %0|%0, %2}
    <logic>{b}\t{%2, %0|%0, %2}
    <logic>{l}\t{%k2, %k0|%k0, %k2}
+   <logic>{b}\t{%2, %1, %0|%0, %1, %2}
+   <logic>{b}\t{%2, %1, %0|%0, %1, %2}
    #"
-  [(set_attr "isa" "*,*,*,avx512f")
-   (set_attr "type" "alu,alu,alu,msklog")
+  [(set_attr "isa" "*,*,*,apx_ndd,apx_ndd,avx512f")
+   (set_attr "type" "alu,alu,alu,alu,alu,msklog")
    (set (attr "mode")
 	(cond [(eq_attr "alternative" "2")
 		 (const_string "SI")
-		(and (eq_attr "alternative" "3")
+		(and (eq_attr "alternative" "5")
 		     (match_test "!TARGET_AVX512DQ"))
 		 (const_string "HI")
 	       ]
@@ -12925,12 +12953,12 @@ (define_insn "*<code>qi_1"
 	   (symbol_ref "true")))])
 
 (define_insn_and_split "*notxorqi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,?k")
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,r,r,?k")
 	(not:QI
-	  (xor:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,k")
-		  (match_operand:QI 2 "general_operand" "qn,m,rn,k"))))
+	  (xor:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,rm,r,k")
+		  (match_operand:QI 2 "general_operand" "qn,m,rn,rn,m,k"))))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (XOR, QImode, operands)"
+  "ix86_binary_operator_ok (XOR, QImode, operands, TARGET_APX_NDD)"
   "#"
   "&& reload_completed"
   [(parallel
@@ -12946,12 +12974,12 @@ (define_insn_and_split "*notxorqi_1"
       DONE;
     }
 }
-  [(set_attr "isa" "*,*,*,avx512f")
-   (set_attr "type" "alu,alu,alu,msklog")
+  [(set_attr "isa" "*,*,*,apx_ndd,apx_ndd,avx512f")
+   (set_attr "type" "alu,alu,alu,alu,alu,msklog")
    (set (attr "mode")
 	(cond [(eq_attr "alternative" "2")
 		 (const_string "SI")
-		(and (eq_attr "alternative" "3")
+		(and (eq_attr "alternative" "5")
 		     (match_test "!TARGET_AVX512DQ"))
 		 (const_string "HI")
 	       ]
@@ -12999,44 +13027,59 @@ (define_split
 (define_insn "*<code><mode>_2"
   [(set (reg FLAGS_REG)
 	(compare (any_or:SWI
-		  (match_operand:SWI 1 "nonimmediate_operand" "%0,0")
-		  (match_operand:SWI 2 "<general_operand>" "<r><i>,<m>"))
+		  (match_operand:SWI 1 "nonimmediate_operand" "%0,0,rm,r")
+		  (match_operand:SWI 2 "<general_operand>" "<r><i>,<m>,r<i>,<m>"))
 		 (const_int 0)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>,r,r")
 	(any_or:SWI (match_dup 1) (match_dup 2)))]
   "ix86_match_ccmode (insn, CCNOmode)
-   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "<logic>{<imodesuffix>}\t{%2, %0|%0, %2}"
+   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands,
+			       TARGET_APX_NDD)"
+  "@
+  <logic>{<imodesuffix>}\t{%2, %0|%0, %2}
+  <logic>{<imodesuffix>}\t{%2, %0|%0, %2}
+  <logic>{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}
+  <logic>{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "alu")
+   (set_attr "isa" "*,*,apx_ndd,apx_ndd")
    (set_attr "mode" "<MODE>")])
 
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 ;; ??? Special case for immediate operand is missing - it is tricky.
 (define_insn "*<code>si_2_zext"
   [(set (reg FLAGS_REG)
-	(compare (any_or:SI (match_operand:SI 1 "nonimmediate_operand" "%0")
-			    (match_operand:SI 2 "x86_64_general_operand" "rBMe"))
+	(compare (any_or:SI (match_operand:SI 1 "nonimmediate_operand" "%0,rm,r")
+			    (match_operand:SI 2 "x86_64_general_operand" "rBMe,re,BM"))
 		 (const_int 0)))
-   (set (match_operand:DI 0 "register_operand" "=r")
+   (set (match_operand:DI 0 "register_operand" "=r,r,r")
 	(zero_extend:DI (any_or:SI (match_dup 1) (match_dup 2))))]
   "TARGET_64BIT && ix86_match_ccmode (insn, CCNOmode)
-   && ix86_binary_operator_ok (<CODE>, SImode, operands)"
-  "<logic>{l}\t{%2, %k0|%k0, %2}"
+   && ix86_binary_operator_ok (<CODE>, SImode, operands,
+			       TARGET_APX_NDD)"
+  "@
+  <logic>{l}\t{%2, %k0|%k0, %2}
+  <logic>{l}\t{%2, %1, %k0|%k0, %1, %2}
+  <logic>{l}\t{%2, %1, %k0|%k0, %1, %2}"
   [(set_attr "type" "alu")
+   (set_attr "isa" "*,apx_ndd,apx_ndd")
    (set_attr "mode" "SI")])
 
 (define_insn "*<code>si_2_zext_imm"
   [(set (reg FLAGS_REG)
 	(compare (any_or:SI
-		  (match_operand:SI 1 "nonimmediate_operand" "%0")
-		  (match_operand:SI 2 "x86_64_zext_immediate_operand" "Z"))
+		  (match_operand:SI 1 "nonimmediate_operand" "%0,rm")
+		  (match_operand:SI 2 "x86_64_zext_immediate_operand" "Z,Z"))
 		 (const_int 0)))
-   (set (match_operand:DI 0 "register_operand" "=r")
+   (set (match_operand:DI 0 "register_operand" "=r,r")
 	(any_or:DI (zero_extend:DI (match_dup 1)) (match_dup 2)))]
   "TARGET_64BIT && ix86_match_ccmode (insn, CCNOmode)
-   && ix86_binary_operator_ok (<CODE>, SImode, operands)"
-  "<logic>{l}\t{%2, %k0|%k0, %2}"
+   && ix86_binary_operator_ok (<CODE>, SImode, operands,
+			       TARGET_APX_NDD)"
+  "@
+  <logic>{l}\t{%2, %k0|%k0, %2}
+  <logic>{l}\t{%2, %1, %k0|%k0, %1, %2}"
   [(set_attr "type" "alu")
+   (set_attr "isa" "*,apx_ndd")
    (set_attr "mode" "SI")])
 
 (define_insn "*<code><mode>_3"
@@ -13057,6 +13100,7 @@ (define_insn "*<code><mode>_3"
 ;; Don't do the splitting with memory operands, since it introduces risk
 ;; of memory mismatch stalls.  We may want to do the splitting for optimizing
 ;; for size, but that can (should?) be handled by generic code instead.
+;; Don't do the splitting for APX NDD as NDD does not support *h registers.
 (define_split
   [(set (match_operand:SWI248 0 "QIreg_operand")
 	(any_or:SWI248 (match_operand:SWI248 1 "register_operand")
@@ -13064,7 +13108,8 @@ (define_split
    (clobber (reg:CC FLAGS_REG))]
    "reload_completed
     && (!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
-    && !(INTVAL (operands[2]) & ~(255 << 8))"
+    && !(INTVAL (operands[2]) & ~(255 << 8))
+    && !(TARGET_APX_NDD && REGNO (operands[0]) != REGNO (operands[1]))"
   [(parallel
      [(set (zero_extract:HI (match_dup 0)
 			    (const_int 8)
@@ -13102,7 +13147,9 @@ (define_split
    "reload_completed
     && (!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
     && !(INTVAL (operands[2]) & ~255)
-    && (INTVAL (operands[2]) & 128)"
+    && (INTVAL (operands[2]) & 128)
+    && !(TARGET_APX_NDD
+	 && !rtx_equal_p (operands[0], operands[1]))"
   [(parallel [(set (strict_low_part (match_dup 0))
 		   (any_or:QI (match_dup 1)
 			      (match_dup 2)))
@@ -14168,20 +14215,21 @@ (define_split
 
 (define_insn "*one_cmplsi2_2_zext"
   [(set (reg FLAGS_REG)
-	(compare (not:SI (match_operand:SI 1 "register_operand" "0"))
+	(compare (not:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm"))
 		 (const_int 0)))
-   (set (match_operand:DI 0 "register_operand" "=r")
+   (set (match_operand:DI 0 "register_operand" "=r,r")
 	(zero_extend:DI (not:SI (match_dup 1))))]
   "TARGET_64BIT && ix86_match_ccmode (insn, CCNOmode)
-   && ix86_unary_operator_ok (NOT, SImode, operands)"
+   && ix86_unary_operator_ok (NOT, SImode, operands, TARGET_APX_NDD)"
   "#"
   [(set_attr "type" "alu1")
+   (set_attr "isa" "*,apx_ndd")
    (set_attr "mode" "SI")])
 
 (define_split
   [(set (match_operand 0 "flags_reg_operand")
 	(match_operator 2 "compare_operator"
-	  [(not:SI (match_operand:SI 3 "register_operand"))
+	  [(not:SI (match_operand:SI 3 "nonimmediate_operand"))
 	   (const_int 0)]))
    (set (match_operand:DI 1 "register_operand")
 	(zero_extend:DI (not:SI (match_dup 3))))]
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd.c b/gcc/testsuite/gcc.target/i386/apx-ndd.c
index be436d57bdf..d97648c876d 100644
--- a/gcc/testsuite/gcc.target/i386/apx-ndd.c
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd.c
@@ -94,6 +94,24 @@ FOO (int, and, &)
 FOO1 (int, and, &)
 FOO (long, and, &)
 FOO1 (long, and, &)
+
+FOO (char, or, |)
+FOO1 (char, or, |)
+FOO (short, or, |)
+FOO1 (short, or, |)
+FOO (int, or, |)
+FOO1 (int, or, |)
+FOO (long, or, |)
+FOO1 (long, or, |)
+
+FOO (char, xor, ^)
+FOO1 (char, xor, ^)
+FOO (short, xor, ^)
+FOO1 (short, xor, ^)
+FOO (int, xor, ^)
+FOO1 (int, xor, ^)
+FOO (long, xor, ^)
+FOO1 (long, xor, ^)
 /* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "lea(?:l|q)\[^\n\r]\\(%r(?:d|s)i,%r(?:d|s)i\\), %(?:|r|e)ax" 4 } } */
 /* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]%(?:|r|e)si(?:|l), \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
@@ -108,3 +126,11 @@ FOO1 (long, and, &)
 /* { dg-final { scan-assembler-times "and(?:l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)ax" 3 } } */
 /* { dg-final { scan-assembler-times "and(?:l|w|q)\[^\n\r]%(?:|r|e)di, %(?:|r|e)si, %(?:|r|e)ax" 2 } } */
 /* { dg-final { scan-assembler-times "and(?:l|w|q)\[^\n\r]%(?:|r|e)si, %(?:|r|e)di, %(?:|r|e)ax" 2 } } */
+/* { dg-final { scan-assembler-times "orb\[^\n\r]*1, \\(%rdi\\), %al" 2} } */
+/* { dg-final { scan-assembler-times "or(?:l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)ax" 6 } } */
+/* { dg-final { scan-assembler-times "or(?:l|w|q)\[^\n\r]%(?:|r|e)di, %(?:|r|e)si, %(?:|r|e)ax" 4 } } */
+/* { dg-final { scan-assembler-times "or(?:l|w|q)\[^\n\r]%(?:|r|e)si, %(?:|r|e)di, %(?:|r|e)ax" 4 } } */
+/* { dg-final { scan-assembler-times "xorb\[^\n\r]*1, \\(%rdi\\), %al" 1 } } */
+/* { dg-final { scan-assembler-times "xor(?:l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)ax" 3 } } */
+/* { dg-final { scan-assembler-times "xor(?:l|w|q)\[^\n\r]%(?:|r|e)di, %(?:|r|e)si, %(?:|r|e)ax" 2 } } */
+/* { dg-final { scan-assembler-times "xor(?:l|w|q)\[^\n\r]%(?:|r|e)si, %(?:|r|e)di, %(?:|r|e)ax" 2 } } */
-- 
2.31.1


* [PATCH 12/17] [APX NDD] Support APX NDD for left shift insns
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (10 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 11/17] [APX NDD] Support APX NDD for or/xor insn Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05  2:29 ` [PATCH 13/17] [APX NDD] Support APX NDD for right " Hongyu Wang
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

For left shift, the TARGET_DOUBLE_WITH_ADD optimization turns a shift
left by 1 into an add. The NDD form of add would need the source operand
in a register, since NDD cannot take two memory sources, so for now we
keep using the NDD form of the shift instead of add.
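
As a rough illustration, the C below has the same shape as the functions
the new tests exercise: a shift by 1 whose source is in memory, and a
shift by an immediate whose source is in a register. The function names
are hypothetical and do not appear in the testsuite. With NDD enabled, the
scan-assembler patterns added by this patch look for three-operand sal
forms for code of this shape. The exact output depends on compiler version
and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Memory source, shift count 1.  The patch's test patterns look for
   something like "sal $1, (%rdi), %eax" rather than a load followed
   by an add.  */
int
shl1_from_mem (const int *a)
{
  assert (a != NULL);
  return *a << 1;
}

/* Register source, immediate shift count.  The patch's test patterns
   look for something like "sal $7, %edi, %eax", i.e. no separate mov
   into the destination first.  */
int
shl7_from_reg (int a)
{
  return a << 7;
}
```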

The TARGET_SHIFT1 optimization omits the constant 1 to get the shorter
opcode, but for NDD the assembler picks the shorter encoding automatically
whether or not $1 is present, so the NDD alternatives do not take part in it.
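
The selection logic this describes can be sketched as a small standalone
C model. This is not GCC code: the enum and function names are made up for
illustration, and prefer_short stands in for
"TARGET_SHIFT1 || optimize_function_for_size_p (cfun)". The alternative
numbers follow *ashl<mode>3_1, where alternative 4 is the NDD one.

```c
#include <assert.h>
#include <stdbool.h>

/* The three textual forms the default case of the output code can
   print (AT&T operand order shown in the comments).  */
enum sal_form
{
  SAL_IMPLICIT_ONE,   /* sal %0 */
  SAL_TWO_OPERAND,    /* sal %2, %0 */
  SAL_NDD             /* sal %2, %1, %0 */
};

/* Model of the default case: ALT is the matched alternative, where
   0 is the legacy two-operand form and 4 is the NDD form.  */
enum sal_form
choose_sal_form (int alt, bool count_is_one, bool prefer_short)
{
  assert (alt == 0 || alt == 4);
  bool use_ndd = (alt == 4);

  /* The $1 is dropped only for the legacy form; the NDD form always
     prints all three operands.  */
  if (count_is_one && prefer_short && !use_ndd)
    return SAL_IMPLICIT_ONE;
  return use_ndd ? SAL_NDD : SAL_TWO_OPERAND;
}
```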

The doubleword insns for left shift call ix86_expand_ashl, which assumes
that every shift-related pattern has identical operands[0] and operands[1].
These patterns will be supported in a standalone patch.
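
One doubleword splitter is still adjusted here,
*ashl<dwi>3_doubleword_highpart. The model below is only a sketch of what
that splitter computes, assuming 64-bit halves and a shift count of at
least 64. The struct and function names are hypothetical. same_reg plays
the role of op_equal_p in the splitter, and emitted_move records whether
a separate move into the high half would be generated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct dw_result
{
  uint64_t lo;
  uint64_t hi;
  bool emitted_move;
};

/* SRC is the low-word input; COUNT is the doubleword shift count,
   assumed to be in [64, 127].  */
struct dw_result
ashl_highpart_model (uint64_t src, unsigned count, bool same_reg, bool ndd)
{
  assert (count >= 64 && count < 128);
  unsigned bits = count - 64;
  struct dw_result r;

  if (bits == 0)
    /* Pure move of the source into the high half.  */
    r.emitted_move = !same_reg;
  else
    /* With NDD the shift reads SRC directly, so no move is needed.  */
    r.emitted_move = !same_reg && !ndd;

  r.hi = src << bits;   /* bits == 0 leaves SRC unchanged */
  r.lo = 0;             /* the low half is always cleared */
  return r;
}
```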

gcc/ChangeLog:

	* config/i386/i386.md (*ashl<mode>3_1): Extend with new
	alternatives to support NDD, limit the new alternative to
	generate sal only, and adjust output template for NDD.
	(*ashlsi3_1_zext): Likewise.
	(*ashlhi3_1): Likewise.
	(*ashlqi3_1): Likewise.
	(*ashl<mode>3_cmp): Likewise.
	(*ashlsi3_cmp_zext): Likewise, and use nonimmediate_operand for
	operands[1] to accept memory input for NDD alternative.
	(*ashl<mode>3_cconly): Likewise.
	(*ashl<dwi>3_doubleword_highpart): Adjust codegen for NDD.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add tests for sal.
---
 gcc/config/i386/i386.md                 | 172 ++++++++++++++++--------
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  22 +++
 2 files changed, 136 insertions(+), 58 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 62cd21ee3d4..43be1364bff 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14467,10 +14467,19 @@ (define_insn_and_split "*ashl<dwi>3_doubleword_highpart"
 {
   split_double_mode (<DWI>mode, &operands[0], 1, &operands[0], &operands[3]);
   int bits = INTVAL (operands[2]) - (<MODE_SIZE> * BITS_PER_UNIT);
-  if (!rtx_equal_p (operands[3], operands[1]))
-    emit_move_insn (operands[3], operands[1]);
-  if (bits > 0)
-    emit_insn (gen_ashl<mode>3 (operands[3], operands[3], GEN_INT (bits)));
+  bool op_equal_p = rtx_equal_p (operands[3], operands[1]);
+  if (bits == 0)
+    {
+      if (!op_equal_p)
+	emit_move_insn (operands[3], operands[1]);
+    }
+  else
+    {
+      if (!op_equal_p && !TARGET_APX_NDD)
+	emit_move_insn (operands[3], operands[1]);
+      rtx op_tmp = TARGET_APX_NDD ? operands[1] : operands[3];
+      emit_insn (gen_ashl<mode>3 (operands[3], op_tmp, GEN_INT (bits)));
+    }
   ix86_expand_clear (operands[0]);
   DONE;
 })
@@ -14777,12 +14786,14 @@ (define_insn "*bmi2_ashl<mode>3_1"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*ashl<mode>3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k")
-	(ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k")
-		      (match_operand:QI 2 "nonmemory_operand" "c<S>,M,r,<KS>")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k,r")
+	(ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k,rm")
+		      (match_operand:QI 2 "nonmemory_operand" "c<S>,M,r,<KS>,c<S>")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)"
+  "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 4);
   switch (get_attr_type (insn))
     {
     case TYPE_LEA:
@@ -14797,18 +14808,25 @@ (define_insn "*ashl<mode>3_1"
 
     default:
       if (operands[2] == const1_rtx
-	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+	  /* For NDD form instructions related to TARGET_SHIFT1, the $1
+	     immediate do not need to be omitted as assembler will map it
+	     to use shorter encoding. */
+	  && !use_ndd)
 	return "sal{<imodesuffix>}\t%0";
       else
-	return "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
+	return use_ndd ? "sal{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		       : "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "isa" "*,*,bmi2,<kmov_isa>")
+  [(set_attr "isa" "*,*,bmi2,<kmov_isa>,apx_ndd")
    (set (attr "type")
      (cond [(eq_attr "alternative" "1")
 	      (const_string "lea")
 	    (eq_attr "alternative" "2")
 	      (const_string "ishiftx")
+	    (eq_attr "alternative" "4")
+	      (const_string "ishift")
             (and (and (match_test "TARGET_DOUBLE_WITH_ADD")
 		      (match_operand 0 "register_operand"))
 		 (match_operand 2 "const1_operand"))
@@ -14850,13 +14868,15 @@ (define_insn "*bmi2_ashlsi3_1_zext"
    (set_attr "mode" "SI")])
 
 (define_insn "*ashlsi3_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
 	(zero_extend:DI
-	  (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "0,l,rm")
-		     (match_operand:QI 2 "nonmemory_operand" "cI,M,r"))))
+	  (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "0,l,rm,rm")
+		     (match_operand:QI 2 "nonmemory_operand" "cI,M,r,cI"))))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_binary_operator_ok (ASHIFT, SImode, operands)"
+  "TARGET_64BIT && ix86_binary_operator_ok (ASHIFT, SImode, operands,
+					    TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 3);
   switch (get_attr_type (insn))
     {
     case TYPE_LEA:
@@ -14869,18 +14889,22 @@ (define_insn "*ashlsi3_1_zext"
 
     default:
       if (operands[2] == const1_rtx
-	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+	  && !use_ndd)
 	return "sal{l}\t%k0";
       else
-	return "sal{l}\t{%2, %k0|%k0, %2}";
+	return use_ndd ? "sal{l}\t{%2, %1, %k0|%k0, %1, %2}"
+		       : "sal{l}\t{%2, %k0|%k0, %2}";
     }
 }
-  [(set_attr "isa" "*,*,bmi2")
+  [(set_attr "isa" "*,*,bmi2,apx_ndd")
    (set (attr "type")
      (cond [(eq_attr "alternative" "1")
 	      (const_string "lea")
 	    (eq_attr "alternative" "2")
 	      (const_string "ishiftx")
+	    (eq_attr "alternative" "3")
+	      (const_string "ishift")
             (and (match_test "TARGET_DOUBLE_WITH_ADD")
 		 (match_operand 2 "const1_operand"))
 	      (const_string "alu")
@@ -14910,12 +14934,14 @@ (define_split
   "operands[2] = gen_lowpart (SImode, operands[2]);")
 
 (define_insn "*ashlhi3_1"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp,?k")
-	(ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l,k")
-		   (match_operand:QI 2 "nonmemory_operand" "cI,M,Ww")))
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp,?k,r")
+	(ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l,k,rm")
+		   (match_operand:QI 2 "nonmemory_operand" "cI,M,Ww,cI")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, HImode, operands)"
+  "ix86_binary_operator_ok (ASHIFT, HImode, operands,
+			    TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 3);
   switch (get_attr_type (insn))
     {
     case TYPE_LEA:
@@ -14928,18 +14954,22 @@ (define_insn "*ashlhi3_1"
 
     default:
       if (operands[2] == const1_rtx
-	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+	  && !use_ndd)
 	return "sal{w}\t%0";
       else
-	return "sal{w}\t{%2, %0|%0, %2}";
+	return use_ndd ? "sal{w}\t{%2, %1, %0|%0, %1, %2}"
+		       : "sal{w}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "isa" "*,*,avx512f")
+  [(set_attr "isa" "*,*,avx512f,apx_ndd")
    (set (attr "type")
      (cond [(eq_attr "alternative" "1")
 	      (const_string "lea")
 	    (eq_attr "alternative" "2")
 	      (const_string "msklog")
+	    (eq_attr "alternative" "3")
+	      (const_string "ishift")
             (and (and (match_test "TARGET_DOUBLE_WITH_ADD")
 		      (match_operand 0 "register_operand"))
 		 (match_operand 2 "const1_operand"))
@@ -14955,15 +14985,17 @@ (define_insn "*ashlhi3_1"
 			   (match_test "optimize_function_for_size_p (cfun)")))))
        (const_string "0")
        (const_string "*")))
-   (set_attr "mode" "HI,SI,HI")])
+   (set_attr "mode" "HI,SI,HI,HI")])
 
 (define_insn "*ashlqi3_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yp,?k")
-	(ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,l,k")
-		   (match_operand:QI 2 "nonmemory_operand" "cI,cI,M,Wb")))
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yp,?k,r")
+	(ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,l,k,rm")
+		   (match_operand:QI 2 "nonmemory_operand" "cI,cI,M,Wb,cI")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, QImode, operands)"
+  "ix86_binary_operator_ok (ASHIFT, QImode, operands,
+			    TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 4);
   switch (get_attr_type (insn))
     {
     case TYPE_LEA:
@@ -14979,7 +15011,8 @@ (define_insn "*ashlqi3_1"
 
     default:
       if (operands[2] == const1_rtx
-	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+	  && !use_ndd)
 	{
 	  if (get_attr_mode (insn) == MODE_SI)
 	    return "sal{l}\t%k0";
@@ -14991,16 +15024,19 @@ (define_insn "*ashlqi3_1"
 	  if (get_attr_mode (insn) == MODE_SI)
 	    return "sal{l}\t{%2, %k0|%k0, %2}";
 	  else
-	    return "sal{b}\t{%2, %0|%0, %2}";
+	    return use_ndd ? "sal{b}\t{%2, %1, %0|%0, %1, %2}"
+			   : "sal{b}\t{%2, %0|%0, %2}";
 	}
     }
 }
-  [(set_attr "isa" "*,*,*,avx512dq")
+  [(set_attr "isa" "*,*,*,avx512dq,apx_ndd")
    (set (attr "type")
      (cond [(eq_attr "alternative" "2")
 	      (const_string "lea")
 	    (eq_attr "alternative" "3")
 	      (const_string "msklog")
+	    (eq_attr "alternative" "4")
+	      (const_string "ishift")
             (and (and (match_test "TARGET_DOUBLE_WITH_ADD")
 		      (match_operand 0 "register_operand"))
 		 (match_operand 2 "const1_operand"))
@@ -15016,10 +15052,10 @@ (define_insn "*ashlqi3_1"
 			   (match_test "optimize_function_for_size_p (cfun)")))))
        (const_string "0")
        (const_string "*")))
-   (set_attr "mode" "QI,SI,SI,QI")
+   (set_attr "mode" "QI,SI,SI,QI,QI")
    ;; Potential partial reg stall on alternative 1.
    (set (attr "preferred_for_speed")
-     (cond [(eq_attr "alternative" "1")
+     (cond [(eq_attr "alternative" "1,4")
 	      (symbol_ref "!TARGET_PARTIAL_REG_STALL")]
 	   (symbol_ref "true")))])
 
@@ -15114,10 +15150,10 @@ (define_split
 (define_insn "*ashl<mode>3_cmp"
   [(set (reg FLAGS_REG)
 	(compare
-	  (ashift:SWI (match_operand:SWI 1 "nonimmediate_operand" "0")
-		      (match_operand:QI 2 "<shift_immediate_operand>" "<S>"))
+	  (ashift:SWI (match_operand:SWI 1 "nonimmediate_operand" "0,rm")
+		      (match_operand:QI 2 "<shift_immediate_operand>" "<S>,<S>"))
 	  (const_int 0)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,r")
 	(ashift:SWI (match_dup 1) (match_dup 2)))]
   "(optimize_function_for_size_p (cfun)
     || !TARGET_PARTIAL_FLAG_REG_STALL
@@ -15125,8 +15161,10 @@ (define_insn "*ashl<mode>3_cmp"
 	&& (TARGET_SHIFT1
 	    || (TARGET_DOUBLE_WITH_ADD && REG_P (operands[0])))))
    && ix86_match_ccmode (insn, CCGOCmode)
-   && ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)"
+   && ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands,
+			       TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 1);
   switch (get_attr_type (insn))
     {
     case TYPE_ALU:
@@ -15135,14 +15173,19 @@ (define_insn "*ashl<mode>3_cmp"
 
     default:
       if (operands[2] == const1_rtx
-	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+	  && !use_ndd)
 	return "sal{<imodesuffix>}\t%0";
       else
-	return "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
+	return use_ndd ? "sal{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		       : "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set (attr "type")
-     (cond [(and (and (match_test "TARGET_DOUBLE_WITH_ADD")
+  [(set_attr "isa" "*,apx_ndd")
+   (set (attr "type")
+     (cond [(eq_attr "alternative" "1")
+	      (const_string "ishift")
+	    (and (and (match_test "TARGET_DOUBLE_WITH_ADD")
 		      (match_operand 0 "register_operand"))
 		 (match_operand 2 "const1_operand"))
 	      (const_string "alu")
@@ -15162,10 +15205,10 @@ (define_insn "*ashl<mode>3_cmp"
 (define_insn "*ashlsi3_cmp_zext"
   [(set (reg FLAGS_REG)
 	(compare
-	  (ashift:SI (match_operand:SI 1 "register_operand" "0")
+	  (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm")
 		     (match_operand:QI 2 "const_1_to_31_operand"))
 	  (const_int 0)))
-   (set (match_operand:DI 0 "register_operand" "=r")
+   (set (match_operand:DI 0 "register_operand" "=r,r")
 	(zero_extend:DI (ashift:SI (match_dup 1) (match_dup 2))))]
   "TARGET_64BIT
    && (optimize_function_for_size_p (cfun)
@@ -15174,8 +15217,10 @@ (define_insn "*ashlsi3_cmp_zext"
 	   && (TARGET_SHIFT1
 	       || TARGET_DOUBLE_WITH_ADD)))
    && ix86_match_ccmode (insn, CCGOCmode)
-   && ix86_binary_operator_ok (ASHIFT, SImode, operands)"
+   && ix86_binary_operator_ok (ASHIFT, SImode, operands,
+			       TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 1);
   switch (get_attr_type (insn))
     {
     case TYPE_ALU:
@@ -15184,14 +15229,19 @@ (define_insn "*ashlsi3_cmp_zext"
 
     default:
       if (operands[2] == const1_rtx
-	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+	  && !use_ndd)
 	return "sal{l}\t%k0";
       else
-	return "sal{l}\t{%2, %k0|%k0, %2}";
+	return use_ndd ? "sal{l}\t{%2, %1, %k0|%k0, %1, %2}"
+		       : "sal{l}\t{%2, %k0|%k0, %2}";
     }
 }
-  [(set (attr "type")
-     (cond [(and (match_test "TARGET_DOUBLE_WITH_ADD")
+  [(set_attr "isa" "*,apx_ndd")
+   (set (attr "type")
+     (cond [(eq_attr "alternative" "1")
+	      (const_string "ishift")
+	    (and (match_test "TARGET_DOUBLE_WITH_ADD")
 		 (match_operand 2 "const1_operand"))
 	      (const_string "alu")
 	   ]
@@ -15210,10 +15260,10 @@ (define_insn "*ashlsi3_cmp_zext"
 (define_insn "*ashl<mode>3_cconly"
   [(set (reg FLAGS_REG)
 	(compare
-	  (ashift:SWI (match_operand:SWI 1 "register_operand" "0")
-		      (match_operand:QI 2 "<shift_immediate_operand>" "<S>"))
+	  (ashift:SWI (match_operand:SWI 1 "nonimmediate_operand" "0,rm")
+		      (match_operand:QI 2 "<shift_immediate_operand>" "<S>,<S>"))
 	  (const_int 0)))
-   (clobber (match_scratch:SWI 0 "=<r>"))]
+   (clobber (match_scratch:SWI 0 "=<r>,r"))]
   "(optimize_function_for_size_p (cfun)
     || !TARGET_PARTIAL_FLAG_REG_STALL
     || (operands[2] == const1_rtx
@@ -15221,22 +15271,28 @@ (define_insn "*ashl<mode>3_cconly"
 	    || TARGET_DOUBLE_WITH_ADD)))
    && ix86_match_ccmode (insn, CCGOCmode)"
 {
+  bool use_ndd = (which_alternative == 1);
   switch (get_attr_type (insn))
     {
     case TYPE_ALU:
       gcc_assert (operands[2] == const1_rtx);
       return "add{<imodesuffix>}\t%0, %0";
 
-    default:
+  default:
       if (operands[2] == const1_rtx
-	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+	  && !use_ndd)
 	return "sal{<imodesuffix>}\t%0";
       else
-	return "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
+	return use_ndd ? "sal{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		       : "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set (attr "type")
-     (cond [(and (and (match_test "TARGET_DOUBLE_WITH_ADD")
+  [(set_attr "isa" "*,apx_ndd")
+   (set (attr "type")
+     (cond [(eq_attr "alternative" "1")
+	      (const_string "ishift")
+	    (and (and (match_test "TARGET_DOUBLE_WITH_ADD")
 		      (match_operand 0 "register_operand"))
 		 (match_operand 2 "const1_operand"))
 	      (const_string "alu")
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd.c b/gcc/testsuite/gcc.target/i386/apx-ndd.c
index d97648c876d..9951fb00a4c 100644
--- a/gcc/testsuite/gcc.target/i386/apx-ndd.c
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd.c
@@ -29,6 +29,16 @@ foo2_##OP_NAME##_##TYPE (TYPE *a, TYPE b) \
   return c;			 	  \
 }
 
+#define FOO3(TYPE, OP_NAME, OP, IMM)  \
+TYPE				      \
+__attribute__ ((noipa))		      \
+foo3_##OP_NAME##_##TYPE (TYPE a)      \
+{				      \
+  TYPE b = a OP IMM;		      \
+  return b;			      \
+}			
+
+
 #define F(TYPE, OP_NAME, OP)   \
 TYPE				 \
 __attribute__ ((noipa)) 	 \
@@ -112,6 +122,16 @@ FOO (int, xor, ^)
 FOO1 (int, xor, ^)
 FOO (long, xor, ^)
 FOO1 (long, xor, ^)
+
+FOO (char, shl, <<)
+FOO3 (char, shl, <<, 7)
+FOO (short, shl, <<)
+FOO3 (short, shl, <<, 7)
+FOO (int, shl, <<)
+FOO3 (int, shl, <<, 7)
+FOO (long, shl, <<)
+FOO3 (long, shl, <<, 7)
+
 /* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "lea(?:l|q)\[^\n\r]\\(%r(?:d|s)i,%r(?:d|s)i\\), %(?:|r|e)ax" 4 } } */
 /* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]%(?:|r|e)si(?:|l), \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
@@ -134,3 +154,5 @@ FOO1 (long, xor, ^)
 /* { dg-final { scan-assembler-times "xor(?:l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)ax" 3 } } */
 /* { dg-final { scan-assembler-times "xor(?:l|w|q)\[^\n\r]%(?:|r|e)di, %(?:|r|e)si, %(?:|r|e)ax" 2 } } */
 /* { dg-final { scan-assembler-times "xor(?:l|w|q)\[^\n\r]%(?:|r|e)si, %(?:|r|e)di, %(?:|r|e)ax" 2 } } */
+/* { dg-final { scan-assembler-times "sal(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
+/* { dg-final { scan-assembler-times "sal(?:l|w|q)\[^\n\r]*7, %(?:|r|e)di, %(?:|r|e)ax" 4 } } */
-- 
2.31.1


* [PATCH 13/17] [APX NDD] Support APX NDD for right shift insns
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (11 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 12/17] [APX NDD] Support APX NDD for left shift insns Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05  2:29 ` [PATCH 14/17] [APX NDD] Support APX NDD for rotate insns Hongyu Wang
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

As with left shift, right shift does not need to omit $1 in the NDD form.
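
For illustration only, here is C of the shape the right-shift alternatives
are meant to cover. The function names are hypothetical. With NDD enabled
the intent is a single three-operand sar or shr that reads the source
(register or memory) and writes a different destination register. The
actual output depends on compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic right shift by 1 with a memory source; the NDD form keeps
   the explicit $1 and names the destination register separately.  */
int
sar1_from_mem (const int *a)
{
  assert (a != NULL);
  return *a >> 1;
}

/* Arithmetic right shift of a register source by an immediate.  */
int
sar7_from_reg (int a)
{
  return a >> 7;
}

/* Logical right shift: unsigned operands select shr rather than sar.  */
unsigned int
shr7_from_reg (unsigned int a)
{
  return a >> 7;
}
```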

gcc/ChangeLog:

	* config/i386/i386.md (ashr<mode>3_cvt): Extend with new
	alternatives to support NDD, and adjust output templates.
	(*ashr<mode>3_1): Likewise for SI/DI mode.
	(*lshr<mode>3_1): Likewise.
	(*<insn>si3_1_zext): Likewise.
	(*ashr<mode>3_1): Likewise for QI/HI mode.
	(*lshrqi3_1): Likewise.
	(*lshrhi3_1): Likewise.
	(<insn><mode>3_cmp): Likewise.
	(*<insn><mode>3_cconly): Likewise.
	(*ashrsi3_cvt_zext): Likewise, and use nonimmediate_operand for
	operands[1] to accept memory input for NDD alternative.
	(*highpartdisi2): Likewise.
	(*<insn>si3_cmp_zext): Likewise.
	(<insn><mode>3_carry): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add l/ashiftrt tests.
---
 gcc/config/i386/i386.md                 | 232 +++++++++++++++---------
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  24 +++
 2 files changed, 166 insertions(+), 90 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 43be1364bff..8bec8a63ba9 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15803,39 +15803,45 @@ (define_mode_attr cvt_mnemonic
   [(SI "{cltd|cdq}") (DI "{cqto|cqo}")])
 
 (define_insn "ashr<mode>3_cvt"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=*d,rm")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=*d,rm,r")
 	(ashiftrt:SWI48
-	  (match_operand:SWI48 1 "nonimmediate_operand" "*a,0")
+	  (match_operand:SWI48 1 "nonimmediate_operand" "*a,0,rm")
 	  (match_operand:QI 2 "const_int_operand")))
    (clobber (reg:CC FLAGS_REG))]
   "INTVAL (operands[2]) == GET_MODE_BITSIZE (<MODE>mode)-1
    && (TARGET_USE_CLTD || optimize_function_for_size_p (cfun))
-   && ix86_binary_operator_ok (ASHIFTRT, <MODE>mode, operands)"
+   && ix86_binary_operator_ok (ASHIFTRT, <MODE>mode, operands,
+			       TARGET_APX_NDD)"
   "@
    <cvt_mnemonic>
-   sar{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "imovx,ishift")
-   (set_attr "prefix_0f" "0,*")
-   (set_attr "length_immediate" "0,*")
-   (set_attr "modrm" "0,1")
+   sar{<imodesuffix>}\t{%2, %0|%0, %2}
+   sar{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd")
+   (set_attr "type" "imovx,ishift,ishift")
+   (set_attr "prefix_0f" "0,*,*")
+   (set_attr "length_immediate" "0,*,*")
+   (set_attr "modrm" "0,1,1")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*ashrsi3_cvt_zext"
-  [(set (match_operand:DI 0 "register_operand" "=*d,r")
+  [(set (match_operand:DI 0 "register_operand" "=*d,r,r")
 	(zero_extend:DI
-	  (ashiftrt:SI (match_operand:SI 1 "register_operand" "*a,0")
+	  (ashiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "*a,0,rm")
 		       (match_operand:QI 2 "const_int_operand"))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && INTVAL (operands[2]) == 31
    && (TARGET_USE_CLTD || optimize_function_for_size_p (cfun))
-   && ix86_binary_operator_ok (ASHIFTRT, SImode, operands)"
+   && ix86_binary_operator_ok (ASHIFTRT, SImode, operands,
+			       TARGET_APX_NDD)"
   "@
    {cltd|cdq}
-   sar{l}\t{%2, %k0|%k0, %2}"
-  [(set_attr "type" "imovx,ishift")
-   (set_attr "prefix_0f" "0,*")
-   (set_attr "length_immediate" "0,*")
-   (set_attr "modrm" "0,1")
+   sar{l}\t{%2, %k0|%k0, %2}
+   sar{l}\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd")
+   (set_attr "type" "imovx,ishift,ishift")
+   (set_attr "prefix_0f" "0,*,*")
+   (set_attr "length_immediate" "0,*,*")
+   (set_attr "modrm" "0,1,1")
    (set_attr "mode" "SI")])
 
 (define_expand "@x86_shift<mode>_adj_3"
@@ -15877,13 +15883,15 @@ (define_insn "*bmi2_<insn><mode>3_1"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*ashr<mode>3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
 	(ashiftrt:SWI48
-	  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
-	  (match_operand:QI 2 "nonmemory_operand" "c<S>,r")))
+	  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,rm")
+	  (match_operand:QI 2 "nonmemory_operand" "c<S>,r,c<S>")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFTRT, <MODE>mode, operands)"
+  "ix86_binary_operator_ok (ASHIFTRT, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 2);
   switch (get_attr_type (insn))
     {
     case TYPE_ISHIFTX:
@@ -15891,14 +15899,16 @@ (define_insn "*ashr<mode>3_1"
 
     default:
       if (operands[2] == const1_rtx
-	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+	  && !use_ndd)
 	return "sar{<imodesuffix>}\t%0";
       else
-	return "sar{<imodesuffix>}\t{%2, %0|%0, %2}";
+	return use_ndd ? "sar{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		       : "sar{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "isa" "*,bmi2")
-   (set_attr "type" "ishift,ishiftx")
+  [(set_attr "isa" "*,bmi2,apx_ndd")
+   (set_attr "type" "ishift,ishiftx,ishift")
    (set (attr "length_immediate")
      (if_then_else
        (and (match_operand 2 "const1_operand")
@@ -15911,8 +15921,8 @@ (define_insn "*ashr<mode>3_1"
 ;; Specialization of *lshr<mode>3_1 below, extracting the SImode
 ;; highpart of a DI to be extracted, but allowing it to be clobbered.
 (define_insn_and_split "*highpartdisi2"
-  [(set (subreg:DI (match_operand:SI 0 "register_operand" "=r,x,?k") 0)
-        (lshiftrt:DI (match_operand:DI 1 "register_operand" "0,0,k")
+  [(set (subreg:DI (match_operand:SI 0 "register_operand" "=r,x,?k,r") 0)
+        (lshiftrt:DI (match_operand:DI 1 "nonimmediate_operand" "0,0,k,rm")
 		     (const_int 32)))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT"
@@ -15931,16 +15941,20 @@ (define_insn_and_split "*highpartdisi2"
       DONE;
     }
   operands[0] = gen_rtx_REG (DImode, REGNO (operands[0]));
-})
+}
+[(set_attr "isa" "*,*,*,apx_ndd")])
+
 
 (define_insn "*lshr<mode>3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,?k")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,?k,r")
 	(lshiftrt:SWI48
-	  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,k")
-	  (match_operand:QI 2 "nonmemory_operand" "c<S>,r,<KS>")))
+	  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,k,rm")
+	  (match_operand:QI 2 "nonmemory_operand" "c<S>,r,<KS>,c<S>")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (LSHIFTRT, <MODE>mode, operands)"
+  "ix86_binary_operator_ok (LSHIFTRT, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 3);
   switch (get_attr_type (insn))
     {
     case TYPE_ISHIFTX:
@@ -15949,14 +15963,16 @@ (define_insn "*lshr<mode>3_1"
 
     default:
       if (operands[2] == const1_rtx
-	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+	  && !use_ndd)
 	return "shr{<imodesuffix>}\t%0";
       else
-	return "shr{<imodesuffix>}\t{%2, %0|%0, %2}";
+	return use_ndd ? "shr{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		       : "shr{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "isa" "*,bmi2,<kmov_isa>")
-   (set_attr "type" "ishift,ishiftx,msklog")
+  [(set_attr "isa" "*,bmi2,<kmov_isa>,apx_ndd")
+   (set_attr "type" "ishift,ishiftx,msklog,ishift")
    (set (attr "length_immediate")
      (if_then_else
        (and (and (match_operand 2 "const1_operand")
@@ -15989,13 +16005,15 @@ (define_insn "*bmi2_<insn>si3_1_zext"
    (set_attr "mode" "SI")])
 
 (define_insn "*<insn>si3_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
 	(zero_extend:DI
-	  (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm")
-			  (match_operand:QI 2 "nonmemory_operand" "cI,r"))))
+	  (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,rm")
+			  (match_operand:QI 2 "nonmemory_operand" "cI,r,cI"))))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_binary_operator_ok (<CODE>, SImode, operands)"
+  "TARGET_64BIT && ix86_binary_operator_ok (<CODE>, SImode, operands,
+					    TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 2);
   switch (get_attr_type (insn))
     {
     case TYPE_ISHIFTX:
@@ -16003,14 +16021,16 @@ (define_insn "*<insn>si3_1_zext"
 
     default:
       if (operands[2] == const1_rtx
-	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+	  && !use_ndd)
 	return "<shift>{l}\t%k0";
       else
-	return "<shift>{l}\t{%2, %k0|%k0, %2}";
+	return use_ndd ? "<shift>{l}\t{%2, %1, %k0|%k0, %1, %2}"
+		       : "<shift>{l}\t{%2, %k0|%k0, %2}";
     }
 }
-  [(set_attr "isa" "*,bmi2")
-   (set_attr "type" "ishift,ishiftx")
+  [(set_attr "isa" "*,bmi2,apx_ndd")
+   (set_attr "type" "ishift,ishiftx,ishift")
    (set (attr "length_immediate")
      (if_then_else
        (and (match_operand 2 "const1_operand")
@@ -16033,20 +16053,25 @@ (define_split
   "operands[2] = gen_lowpart (SImode, operands[2]);")
 
 (define_insn "*ashr<mode>3_1"
-  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m")
+  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m, r")
 	(ashiftrt:SWI12
-	  (match_operand:SWI12 1 "nonimmediate_operand" "0")
-	  (match_operand:QI 2 "nonmemory_operand" "c<S>")))
+	  (match_operand:SWI12 1 "nonimmediate_operand" "0, rm")
+	  (match_operand:QI 2 "nonmemory_operand" "c<S>, c<S>")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFTRT, <MODE>mode, operands)"
+  "ix86_binary_operator_ok (ASHIFTRT, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
 {
+  bool use_ndd = which_alternative == 1;
   if (operands[2] == const1_rtx
-      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+      && !use_ndd)
     return "sar{<imodesuffix>}\t%0";
   else
-    return "sar{<imodesuffix>}\t{%2, %0|%0, %2}";
+    return use_ndd ? "sar{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		   : "sar{<imodesuffix>}\t{%2, %0|%0, %2}";
 }
-  [(set_attr "type" "ishift")
+  [(set_attr "isa" "*, apx_ndd")
+   (set_attr "type" "ishift")
    (set (attr "length_immediate")
      (if_then_else
        (and (match_operand 2 "const1_operand")
@@ -16057,29 +16082,33 @@ (define_insn "*ashr<mode>3_1"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*lshrqi3_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand"  "=qm,?k")
+  [(set (match_operand:QI 0 "nonimmediate_operand"  "=qm,?k,r")
 	(lshiftrt:QI
-	  (match_operand:QI 1 "nonimmediate_operand" "0, k")
-	  (match_operand:QI 2 "nonmemory_operand"    "cI,Wb")))
+	  (match_operand:QI 1 "nonimmediate_operand" "0, k, rm")
+	  (match_operand:QI 2 "nonmemory_operand"    "cI,Wb,cI")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (LSHIFTRT, QImode, operands)"
+  "ix86_binary_operator_ok (LSHIFTRT, QImode, operands,
+			    TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 2);
   switch (get_attr_type (insn))
     {
     case TYPE_ISHIFT:
       if (operands[2] == const1_rtx
-	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+	  && !use_ndd)
 	return "shr{b}\t%0";
       else
-	return "shr{b}\t{%2, %0|%0, %2}";
+	return use_ndd ? "shr{b}\t{%2, %1, %0|%0, %1, %2}"
+		       : "shr{b}\t{%2, %0|%0, %2}";
     case TYPE_MSKLOG:
       return "#";
     default:
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "*,avx512dq")
-   (set_attr "type" "ishift,msklog")
+  [(set_attr "isa" "*,avx512dq,apx_ndd")
+   (set_attr "type" "ishift,msklog,ishift")
    (set (attr "length_immediate")
      (if_then_else
        (and (and (match_operand 2 "const1_operand")
@@ -16091,29 +16120,33 @@ (define_insn "*lshrqi3_1"
    (set_attr "mode" "QI")])
 
 (define_insn "*lshrhi3_1"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm, ?k")
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=rm, ?k, r")
 	(lshiftrt:HI
-	  (match_operand:HI 1 "nonimmediate_operand" "0, k")
-	  (match_operand:QI 2 "nonmemory_operand" "cI, Ww")))
+	  (match_operand:HI 1 "nonimmediate_operand" "0, k, rm")
+	  (match_operand:QI 2 "nonmemory_operand" "cI, Ww, cI")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (LSHIFTRT, HImode, operands)"
+  "ix86_binary_operator_ok (LSHIFTRT, HImode, operands,
+			    TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 2);
   switch (get_attr_type (insn))
     {
     case TYPE_ISHIFT:
       if (operands[2] == const1_rtx
-	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+	  && !use_ndd)
 	return "shr{w}\t%0";
       else
-	return "shr{w}\t{%2, %0|%0, %2}";
+	return use_ndd ? "shr{w}\t{%2, %1, %0|%0, %1, %2}"
+		       : "shr{w}\t{%2, %0|%0, %2}";
     case TYPE_MSKLOG:
       return "#";
     default:
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "*, avx512f")
-   (set_attr "type" "ishift,msklog")
+  [(set_attr "isa" "*, avx512f, apx_ndd")
+   (set_attr "type" "ishift,msklog,ishift")
    (set (attr "length_immediate")
      (if_then_else
        (and (and (match_operand 2 "const1_operand")
@@ -16166,25 +16199,30 @@ (define_insn "*<insn><mode>3_cmp"
   [(set (reg FLAGS_REG)
 	(compare
 	  (any_shiftrt:SWI
-	    (match_operand:SWI 1 "nonimmediate_operand" "0")
-	    (match_operand:QI 2 "<shift_immediate_operand>" "<S>"))
+	    (match_operand:SWI 1 "nonimmediate_operand" "0,rm")
+	    (match_operand:QI 2 "<shift_immediate_operand>" "<S>,<S>"))
 	  (const_int 0)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,r")
 	(any_shiftrt:SWI (match_dup 1) (match_dup 2)))]
   "(optimize_function_for_size_p (cfun)
     || !TARGET_PARTIAL_FLAG_REG_STALL
     || (operands[2] == const1_rtx
 	&& TARGET_SHIFT1))
    && ix86_match_ccmode (insn, CCGOCmode)
-   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands,
+			       TARGET_APX_NDD)"
 {
+  bool use_ndd = which_alternative == 1;
   if (operands[2] == const1_rtx
-      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+      && !use_ndd)
     return "<shift>{<imodesuffix>}\t%0";
   else
-    return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
+    return use_ndd ? "<shift>{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		   : "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
 }
-  [(set_attr "type" "ishift")
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "ishift")
    (set (attr "length_immediate")
      (if_then_else
        (and (match_operand 2 "const1_operand")
@@ -16197,10 +16235,10 @@ (define_insn "*<insn><mode>3_cmp"
 (define_insn "*<insn>si3_cmp_zext"
   [(set (reg FLAGS_REG)
 	(compare
-	  (any_shiftrt:SI (match_operand:SI 1 "register_operand" "0")
+	  (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm")
 			  (match_operand:QI 2 "const_1_to_31_operand"))
 	  (const_int 0)))
-   (set (match_operand:DI 0 "register_operand" "=r")
+   (set (match_operand:DI 0 "register_operand" "=r,r")
 	(zero_extend:DI (any_shiftrt:SI (match_dup 1) (match_dup 2))))]
   "TARGET_64BIT
    && (optimize_function_for_size_p (cfun)
@@ -16208,15 +16246,20 @@ (define_insn "*<insn>si3_cmp_zext"
        || (operands[2] == const1_rtx
 	   && TARGET_SHIFT1))
    && ix86_match_ccmode (insn, CCGOCmode)
-   && ix86_binary_operator_ok (<CODE>, SImode, operands)"
+   && ix86_binary_operator_ok (<CODE>, SImode, operands,
+			       TARGET_APX_NDD)"
 {
+  bool use_ndd = which_alternative == 1;
   if (operands[2] == const1_rtx
-      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+      && !use_ndd)
     return "<shift>{l}\t%k0";
   else
-    return "<shift>{l}\t{%2, %k0|%k0, %2}";
+    return use_ndd ? "<shift>{l}\t{%2, %1, %k0|%k0, %1, %2}"
+		   : "<shift>{l}\t{%2, %k0|%k0, %2}";
 }
-  [(set_attr "type" "ishift")
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "ishift")
    (set (attr "length_immediate")
      (if_then_else
        (and (match_operand 2 "const1_operand")
@@ -16230,23 +16273,28 @@ (define_insn "*<insn><mode>3_cconly"
   [(set (reg FLAGS_REG)
 	(compare
 	  (any_shiftrt:SWI
-	    (match_operand:SWI 1 "register_operand" "0")
-	    (match_operand:QI 2 "<shift_immediate_operand>" "<S>"))
+	    (match_operand:SWI 1 "nonimmediate_operand" "0,rm")
+	    (match_operand:QI 2 "<shift_immediate_operand>" "<S>,<S>"))
 	  (const_int 0)))
-   (clobber (match_scratch:SWI 0 "=<r>"))]
+   (clobber (match_scratch:SWI 0 "=<r>,r"))]
   "(optimize_function_for_size_p (cfun)
     || !TARGET_PARTIAL_FLAG_REG_STALL
     || (operands[2] == const1_rtx
 	&& TARGET_SHIFT1))
    && ix86_match_ccmode (insn, CCGOCmode)"
 {
+  bool use_ndd = which_alternative == 1;
   if (operands[2] == const1_rtx
-      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+      && !use_ndd)
     return "<shift>{<imodesuffix>}\t%0";
   else
-    return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
+    return use_ndd
+	   ? "<shift>{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+	   : "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
 }
-  [(set_attr "type" "ishift")
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "ishift")
    (set (attr "length_immediate")
      (if_then_else
        (and (match_operand 2 "const1_operand")
@@ -16850,18 +16898,22 @@ (define_insn "rcrdi2"
 ;; Versions of sar and shr that set the carry flag.
 (define_insn "<insn><mode>3_carry"
   [(set (reg:CCC FLAGS_REG)
-	(unspec:CCC [(and:SWI48 (match_operand:SWI48 1 "register_operand" "0")
+	(unspec:CCC [(and:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
 				(const_int 1))
 		     (const_int 0)] UNSPEC_CC_NE))
-   (set (match_operand:SWI48 0 "register_operand" "=r")
+   (set (match_operand:SWI48 0 "register_operand" "=r,r")
 	(any_shiftrt:SWI48 (match_dup 1) (const_int 1)))]
   ""
 {
-  if (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+  bool use_ndd = which_alternative == 1;
+  if ((TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+      && !use_ndd)
     return "<shift>{<imodesuffix>}\t%0";
-  return "<shift>{<imodesuffix>}\t{1, %0|%0, 1}";
+  return use_ndd ? "<shift>{<imodesuffix>}\t{$1, %1, %0|%0, %1, 1}"
+		 : "<shift>{<imodesuffix>}\t{$1, %0|%0, 1}";
 }
-  [(set_attr "type" "ishift1")
+  [(set_attr "isa" "*, apx_ndd")
+   (set_attr "type" "ishift1")
    (set (attr "length_immediate")
      (if_then_else
        (ior (match_test "TARGET_SHIFT1")
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd.c b/gcc/testsuite/gcc.target/i386/apx-ndd.c
index 9951fb00a4c..239c427514a 100644
--- a/gcc/testsuite/gcc.target/i386/apx-ndd.c
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd.c
@@ -2,6 +2,8 @@
 /* { dg-options "-mapxf -march=x86-64 -O2" } */
 /* { dg-final { scan-assembler-not "movl"} } */
 
+#include <stdint.h>
+
 #define FOO(TYPE, OP_NAME, OP)   \
 TYPE				 \
 __attribute__ ((noipa)) 	 \
@@ -132,6 +134,24 @@ FOO3 (int, shl, <<, 7)
 FOO (long, shl, <<)
 FOO3 (long, shl, <<, 7)
 
+FOO (char, sar, >>)
+FOO3 (char, sar, >>, 7)
+FOO (short, sar, >>)
+FOO3 (short, sar, >>, 7)
+FOO (int, sar, >>)
+FOO3 (int, sar, >>, 7)
+FOO (long, sar, >>)
+FOO3 (long, sar, >>, 7)
+
+FOO (uint8_t, shr, >>)
+FOO3 (uint8_t, shr, >>, 7)
+FOO (uint16_t, shr, >>)
+FOO3 (uint16_t, shr, >>, 7)
+FOO (uint32_t, shr, >>)
+FOO3 (uint32_t, shr, >>, 7)
+FOO (uint64_t, shr, >>)
+FOO3 (uint64_t, shr, >>, 7)
+
 /* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "lea(?:l|q)\[^\n\r]\\(%r(?:d|s)i,%r(?:d|s)i\\), %(?:|r|e)ax" 4 } } */
 /* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]%(?:|r|e)si(?:|l), \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
@@ -156,3 +176,7 @@ FOO3 (long, shl, <<, 7)
 /* { dg-final { scan-assembler-times "xor(?:l|w|q)\[^\n\r]%(?:|r|e)si, %(?:|r|e)di, %(?:|r|e)ax" 2 } } */
 /* { dg-final { scan-assembler-times "sal(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "sal(?:l|w|q)\[^\n\r]*7, %(?:|r|e)di, %(?:|r|e)ax" 4 } } */
+/* { dg-final { scan-assembler-times "sar(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
+/* { dg-final { scan-assembler-times "sar(?:b|l|w|q)\[^\n\r]*7, %(?:|r|e)di(?:|l), %(?:|r|e)a(?:x|l)" 4 } } */
+/* { dg-final { scan-assembler-times "shr(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
+/* { dg-final { scan-assembler-times "shr(?:b|l|w|q)\[^\n\r]*7, %(?:|r|e)di(?:|l), %(?:|r|e)a(?:x|l)" 4 } } */
-- 
2.31.1



* [PATCH 14/17] [APX NDD] Support APX NDD for rotate insns
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (12 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 13/17] [APX NDD] Support APX NDD for right " Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05  2:29 ` [PATCH 15/17] [APX NDD] Support APX NDD for shld/shrd insns Hongyu Wang
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

gcc/ChangeLog:

	* config/i386/i386.md (*<insn><mode>3_1): Extend with a new
	alternative to support NDD for SI/DI rotate, and adjust output
	template.
	(*<insn>si3_1_zext): Likewise.
	(*<insn><mode>3_1): Likewise for QI/HI modes.
	(rcrsi2): Likewise, and use nonimmediate_operand for operands[1]
	to accept memory input for NDD alternative.
	(rcrdi2): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add test for left/right rotate.
---
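Note: the new FOO4 tests rely on the usual shift-or rotate idiom.  As a
rough sketch, FOO4 (uint32_t, ror, >>, <<, 1) and FOO4 (uint8_t, rol, <<, >>, 1)
expand to approximately the functions below.  The noipa attribute is left out
here so the snippet builds with any C compiler, and the casts are added only
to make the narrowing explicit.

```c
#include <stdint.h>

/* Rotate right by 1: the low bit wraps around to bit 31.  */
uint32_t
foo4_ror_uint32_t (uint32_t a)
{
  uint32_t b = (a >> 1 | a << (8 * sizeof (uint32_t) - 1));
  return b;
}

/* Rotate left by 1: the operands are promoted to int, so the
   result is truncated back to 8 bits on assignment.  */
uint8_t
foo4_rol_uint8_t (uint8_t a)
{
  uint8_t b = (uint8_t) (a << 1 | a >> (8 * sizeof (uint8_t) - 1));
  return b;
}
```

With -mapxf -O2 the scan-assembler patterns added below expect each such
function to become a single three-operand ror/rol that reads the argument
register and writes the return register directly; the exact output may
differ with other options or compiler versions.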
 gcc/config/i386/i386.md                 | 79 +++++++++++++++----------
 gcc/testsuite/gcc.target/i386/apx-ndd.c | 20 +++++++
 2 files changed, 69 insertions(+), 30 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8bec8a63ba9..6398f544a17 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16662,13 +16662,15 @@ (define_insn "*bmi2_rorx<mode>3_1"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*<insn><mode>3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
 	(any_rotate:SWI48
-	  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
-	  (match_operand:QI 2 "nonmemory_operand" "c<S>,<S>")))
+	  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,rm")
+	  (match_operand:QI 2 "nonmemory_operand" "c<S>,<S>,c<S>")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+  "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
 {
+  bool use_ndd = (which_alternative == 2);
   switch (get_attr_type (insn))
     {
     case TYPE_ROTATEX:
@@ -16676,14 +16678,16 @@ (define_insn "*<insn><mode>3_1"
 
     default:
       if (operands[2] == const1_rtx
-	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+	  && !use_ndd)
 	return "<rotate>{<imodesuffix>}\t%0";
       else
-	return "<rotate>{<imodesuffix>}\t{%2, %0|%0, %2}";
+	return use_ndd ? "<rotate>{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+		       : "<rotate>{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "isa" "*,bmi2")
-   (set_attr "type" "rotate,rotatex")
+  [(set_attr "isa" "*,bmi2,apx_ndd")
+   (set_attr "type" "rotate,rotatex,rotate")
    (set (attr "preferred_for_size")
      (cond [(eq_attr "alternative" "0")
 	      (symbol_ref "true")]
@@ -16733,13 +16737,14 @@ (define_insn "*bmi2_rorxsi3_1_zext"
    (set_attr "mode" "SI")])
 
 (define_insn "*<insn>si3_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
 	(zero_extend:DI
-	  (any_rotate:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm")
-			 (match_operand:QI 2 "nonmemory_operand" "cI,I"))))
+	  (any_rotate:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,rm")
+			 (match_operand:QI 2 "nonmemory_operand" "cI,I,cI"))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && ix86_binary_operator_ok (<CODE>, SImode, operands)"
 {
+  bool use_ndd = (which_alternative == 2);
   switch (get_attr_type (insn))
     {
     case TYPE_ROTATEX:
@@ -16747,14 +16752,16 @@ (define_insn "*<insn>si3_1_zext"
 
     default:
       if (operands[2] == const1_rtx
-	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+	  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+	  && !use_ndd)
 	return "<rotate>{l}\t%k0";
       else
-	return "<rotate>{l}\t{%2, %k0|%k0, %2}";
+	return use_ndd ? "<rotate>{l}\t{%2, %1, %k0|%k0, %1, %2}"
+		       : "<rotate>{l}\t{%2, %k0|%k0, %2}";
     }
 }
-  [(set_attr "isa" "*,bmi2")
-   (set_attr "type" "rotate,rotatex")
+  [(set_attr "isa" "*,bmi2,apx_ndd")
+   (set_attr "type" "rotate,rotatex,rotate")
    (set (attr "preferred_for_size")
      (cond [(eq_attr "alternative" "0")
 	      (symbol_ref "true")]
@@ -16798,19 +16805,25 @@ (define_split
 	(zero_extend:DI (rotatert:SI (match_dup 1) (match_dup 2))))])
 
 (define_insn "*<insn><mode>3_1"
-  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m")
-	(any_rotate:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "0")
-			  (match_operand:QI 2 "nonmemory_operand" "c<S>")))
+  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m,r")
+	(any_rotate:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "0,rm")
+			  (match_operand:QI 2 "nonmemory_operand" "c<S>,c<S>")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+  "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands,
+			    TARGET_APX_NDD)"
 {
+  bool use_ndd = which_alternative == 1;
   if (operands[2] == const1_rtx
-      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+      && !use_ndd)
     return "<rotate>{<imodesuffix>}\t%0";
   else
-    return "<rotate>{<imodesuffix>}\t{%2, %0|%0, %2}";
+    return use_ndd
+	   ? "<rotate>{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+	   : "<rotate>{<imodesuffix>}\t{%2, %0|%0, %2}";
 }
-  [(set_attr "type" "rotate")
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "rotate")
    (set (attr "length_immediate")
      (if_then_else
        (and (match_operand 2 "const1_operand")
@@ -16867,31 +16880,37 @@ (define_split
 
 ;; Rotations through carry flag
 (define_insn "rcrsi2"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
 	(plus:SI
-	  (lshiftrt:SI (match_operand:SI 1 "register_operand" "0")
+	  (lshiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm")
 		       (const_int 1))
 	  (ashift:SI (ltu:SI (reg:CCC FLAGS_REG) (const_int 0))
 		     (const_int 31))))
    (clobber (reg:CC FLAGS_REG))]
   ""
-  "rcr{l}\t%0"
-  [(set_attr "type" "ishift1")
+  "@
+   rcr{l}\t%0
+   rcr{l}\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "ishift1")
    (set_attr "memory" "none")
    (set_attr "length_immediate" "0")
    (set_attr "mode" "SI")])
 
 (define_insn "rcrdi2"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(plus:DI
-	  (lshiftrt:DI (match_operand:DI 1 "register_operand" "0")
+	  (lshiftrt:DI (match_operand:DI 1 "nonimmediate_operand" "0,rm")
 		       (const_int 1))
 	  (ashift:DI (ltu:DI (reg:CCC FLAGS_REG) (const_int 0))
 		     (const_int 63))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT"
-  "rcr{q}\t%0"
-  [(set_attr "type" "ishift1")
+  "@
+   rcr{q}\t%0
+   rcr{q}\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "ishift1")
    (set_attr "length_immediate" "0")
    (set_attr "mode" "DI")])
 
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd.c b/gcc/testsuite/gcc.target/i386/apx-ndd.c
index 239c427514a..b215f66d3e2 100644
--- a/gcc/testsuite/gcc.target/i386/apx-ndd.c
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd.c
@@ -40,6 +40,14 @@ foo3_##OP_NAME##_##TYPE (TYPE a)      \
   return b;			      \
 }			
 
+#define FOO4(TYPE, OP_NAME, OP1, OP2, IMM1)		    \
+TYPE							    \
+__attribute__ ((noipa))					    \
+foo4_##OP_NAME##_##TYPE (TYPE a)			    \
+{							    \
+  TYPE b = (a OP1 IMM1 | a OP2 (8 * sizeof(TYPE) - IMM1));  \
+  return b;						    \
+}
 
 #define F(TYPE, OP_NAME, OP)   \
 TYPE				 \
@@ -152,6 +160,16 @@ FOO3 (uint32_t, shr, >>, 7)
 FOO (uint64_t, shr, >>)
 FOO3 (uint64_t, shr, >>, 7)
 
+FOO4 (uint8_t, ror, >>, <<, 1)
+FOO4 (uint16_t, ror, >>, <<, 1)
+FOO4 (uint32_t, ror, >>, <<, 1)
+FOO4 (uint64_t, ror, >>, <<, 1)
+
+FOO4 (uint8_t, rol, <<, >>, 1)
+FOO4 (uint16_t, rol, <<, >>, 1)
+FOO4 (uint32_t, rol, <<, >>, 1)
+FOO4 (uint64_t, rol, <<, >>, 1)
+
 /* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "lea(?:l|q)\[^\n\r]\\(%r(?:d|s)i,%r(?:d|s)i\\), %(?:|r|e)ax" 4 } } */
 /* { dg-final { scan-assembler-times "add(?:b|l|w|q)\[^\n\r]%(?:|r|e)si(?:|l), \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
@@ -180,3 +198,5 @@ FOO3 (uint64_t, shr, >>, 7)
 /* { dg-final { scan-assembler-times "sar(?:b|l|w|q)\[^\n\r]*7, %(?:|r|e)di(?:|l), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "shr(?:b|l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "shr(?:b|l|w|q)\[^\n\r]*7, %(?:|r|e)di(?:|l), %(?:|r|e)a(?:x|l)" 4 } } */
+/* { dg-final { scan-assembler-times "ror(?:b|l|w|q)\[^\n\r]*1, %(?:|r|e)di(?:|l), %(?:|r|e)a(?:x|l)" 4 } } */
+/* { dg-final { scan-assembler-times "rol(?:b|l|w|q)\[^\n\r]*1, %(?:|r|e)di(?:|l), %(?:|r|e)a(?:x|l)" 4 } } */
-- 
2.31.1



* [PATCH 15/17] [APX NDD] Support APX NDD for shld/shrd insns
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (13 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 14/17] [APX NDD] Support APX NDD for rotate insns Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05  2:29 ` [PATCH 16/17] [APX NDD] Support APX NDD for cmove insns Hongyu Wang
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

For shld/shrd insns, the old patterns use match_dup 0 as the shift source and
+r*m as its constraint. To support NDD, add new define_insns for the NDD form.
These take an extra input operand (register or memory) and require the
destination operand to be a register.

gcc/ChangeLog:

	* config/i386/i386.md (x86_64_shld_ndd): New define_insn.
	(x86_64_shld_ndd_1): Likewise.
	(*x86_64_shld_ndd_2): Likewise.
	(x86_shld_ndd): Likewise.
	(x86_shld_ndd_1): Likewise.
	(*x86_shld_ndd_2): Likewise.
	(x86_64_shrd_ndd): Likewise.
	(x86_64_shrd_ndd_1): Likewise.
	(*x86_64_shrd_ndd_2): Likewise.
	(x86_shrd_ndd): Likewise.
	(x86_shrd_ndd_1): Likewise.
	(*x86_shrd_ndd_2): Likewise.
	(*x86_64_shld_shrd_1_nozext): Adjust codegen under TARGET_APX_NDD.
	(*x86_shld_shrd_1_nozext): Likewise.
	(*x86_64_shrd_shld_1_nozext): Likewise.
	(*x86_shrd_shld_1_nozext): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd-shld-shrd.c: New test.
---
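Note: for orientation, the source-level shape these patterns describe is the
double-precision (funnel) shift.  The sketch below is only an illustration;
the function names are hypothetical and are not taken from the new test file.
The count is assumed to be in 1..63, since shifting a 64-bit value by 64 is
undefined in C.

```c
#include <stdint.h>

/* shld-like: shift HI left by N and fill the vacated low bits
   with the top N bits of LO.  N must be in 1..63.  */
uint64_t
funnel_shl64 (uint64_t hi, uint64_t lo, unsigned n)
{
  return (hi << n) | (lo >> (64 - n));
}

/* shrd-like: shift LO right by N and fill the vacated high bits
   with the low N bits of HI.  N must be in 1..63.  */
uint64_t
funnel_shr64 (uint64_t lo, uint64_t hi, unsigned n)
{
  return (lo >> n) | (hi << (64 - n));
}
```

In the legacy two-operand form the first shift input is also the destination,
so it is overwritten.  The NDD patterns instead write the result to a separate
register, and the first input may come from a register or from memory.
Whether a given function is actually compiled to an NDD shld/shrd depends on
the options in use and on register allocation.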
 gcc/config/i386/i386.md                       | 322 +++++++++++++++++-
 .../gcc.target/i386/apx-ndd-shld-shrd.c       |  24 ++
 2 files changed, 344 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6398f544a17..0af7e82deee 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14505,6 +14505,23 @@ (define_insn "x86_64_shld"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
+(define_insn "x86_64_shld_ndd"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand" "rm")
+		  (and:QI (match_operand:QI 3 "nonmemory_operand" "Jc")
+			  (const_int 63)))
+		(subreg:DI
+		  (lshiftrt:TI
+		    (zero_extend:TI
+		      (match_operand:DI 2 "register_operand" "r"))
+		    (minus:QI (const_int 64)
+			      (and:QI (match_dup 3) (const_int 63)))) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD"
+  "shld{q}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ishift")
+   (set_attr "mode" "DI")])
+
 (define_insn "x86_64_shld_1"
   [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
         (ior:DI (ashift:DI (match_dup 0)
@@ -14526,6 +14543,24 @@ (define_insn "x86_64_shld_1"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
+(define_insn "x86_64_shld_ndd_1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand" "rm")
+			   (match_operand:QI 3 "const_0_to_63_operand"))
+		(subreg:DI
+		  (lshiftrt:TI
+		    (zero_extend:TI
+		      (match_operand:DI 2 "register_operand" "r"))
+		    (match_operand:QI 4 "const_0_to_255_operand")) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD
+   && INTVAL (operands[4]) == 64 - INTVAL (operands[3])"
+  "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ishift")
+   (set_attr "mode" "DI")
+   (set_attr "length_immediate" "1")])
+
+
 (define_insn_and_split "*x86_64_shld_shrd_1_nozext"
   [(set (match_operand:DI 0 "nonimmediate_operand")
 	(ior:DI (ashift:DI (match_operand:DI 4 "nonimmediate_operand")
@@ -14551,6 +14586,23 @@ (define_insn_and_split "*x86_64_shld_shrd_1_nozext"
       operands[4] = force_reg (DImode, operands[4]);
       emit_insn (gen_x86_64_shrd_1 (operands[0], operands[4], operands[3], operands[2]));
     }
+  else if (TARGET_APX_NDD)
+    {
+     rtx tmp = gen_reg_rtx (DImode);
+     if (MEM_P (operands[4]))
+       {
+	 operands[1] = force_reg (DImode, operands[1]);
+	 emit_insn (gen_x86_64_shld_ndd_1 (tmp, operands[4], operands[1],
+					   operands[2], operands[3]));
+       }
+     else if (MEM_P (operands[1]))
+       emit_insn (gen_x86_64_shrd_ndd_1 (tmp, operands[1], operands[4],
+					 operands[3], operands[2]));
+     else
+       emit_insn (gen_x86_64_shld_ndd_1 (tmp, operands[4], operands[1],
+					 operands[2], operands[3]));
+     emit_move_insn (operands[0], tmp);
+    }
   else
    {
      operands[1] = force_reg (DImode, operands[1]);
@@ -14583,6 +14635,33 @@ (define_insn_and_split "*x86_64_shld_2"
 						   (const_int 63)))) 0)))
 	      (clobber (reg:CC FLAGS_REG))])])
 
+(define_insn_and_split "*x86_64_shld_ndd_2"
+  [(set (match_operand:DI 0 "nonimmediate_operand")
+	(ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand")
+			   (match_operand:QI 3 "nonmemory_operand"))
+		(lshiftrt:DI (match_operand:DI 2 "register_operand")
+			     (minus:QI (const_int 64) (match_dup 3)))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 4)
+		   (ior:DI (ashift:DI (match_dup 1)
+				      (and:QI (match_dup 3) (const_int 63)))
+			   (subreg:DI
+			     (lshiftrt:TI
+			       (zero_extend:TI (match_dup 2))
+				 (minus:QI (const_int 64)
+					   (and:QI (match_dup 3)
+						   (const_int 63)))) 0)))
+	      (clobber (reg:CC FLAGS_REG))
+	      (set (match_dup 0) (match_dup 4))])]
+{
+  operands[4] = gen_reg_rtx (DImode);
+  emit_move_insn (operands[4], operands[0]);
+})
+
 (define_insn "x86_shld"
   [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m")
         (ior:SI (ashift:SI (match_dup 0)
@@ -14605,6 +14684,24 @@ (define_insn "x86_shld"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
+(define_insn "x86_shld_ndd"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (ior:SI (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "rm")
+		  (and:QI (match_operand:QI 3 "nonmemory_operand" "Ic")
+			  (const_int 31)))
+		(subreg:SI
+		  (lshiftrt:DI
+		    (zero_extend:DI
+		      (match_operand:SI 2 "register_operand" "r"))
+		    (minus:QI (const_int 32)
+			      (and:QI (match_dup 3) (const_int 31)))) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD"
+  "shld{l}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ishift")
+   (set_attr "mode" "SI")])
+
+
 (define_insn "x86_shld_1"
   [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m")
         (ior:SI (ashift:SI (match_dup 0)
@@ -14626,6 +14723,24 @@ (define_insn "x86_shld_1"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
+(define_insn "x86_shld_ndd_1"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (ior:SI (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "rm")
+			   (match_operand:QI 3 "const_0_to_31_operand"))
+		(subreg:SI
+		  (lshiftrt:DI
+		    (zero_extend:DI
+		      (match_operand:SI 2 "register_operand" "r"))
+		    (match_operand:QI 4 "const_0_to_63_operand")) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD
+   && INTVAL (operands[4]) == 32 - INTVAL (operands[3])"
+  "shld{l}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ishift")
+   (set_attr "length_immediate" "1")
+   (set_attr "mode" "SI")])
+
+
 (define_insn_and_split "*x86_shld_shrd_1_nozext"
   [(set (match_operand:SI 0 "nonimmediate_operand")
 	(ior:SI (ashift:SI (match_operand:SI 4 "nonimmediate_operand")
@@ -14650,7 +14765,24 @@ (define_insn_and_split "*x86_shld_shrd_1_nozext"
       operands[4] = force_reg (SImode, operands[4]);
       emit_insn (gen_x86_shrd_1 (operands[0], operands[4], operands[3], operands[2]));
     }
-  else
+  else if (TARGET_APX_NDD)
+    {
+     rtx tmp = gen_reg_rtx (SImode);
+     if (MEM_P (operands[4]))
+       {
+	 operands[1] = force_reg (SImode, operands[1]);
+	 emit_insn (gen_x86_shld_ndd_1 (tmp, operands[4], operands[1],
+					operands[2], operands[3]));
+       }
+     else if (MEM_P (operands[1]))
+       emit_insn (gen_x86_shrd_ndd_1 (tmp, operands[1], operands[4],
+				      operands[3], operands[2]));
+     else
+       emit_insn (gen_x86_shld_ndd_1 (tmp, operands[4], operands[1],
+				      operands[2], operands[3]));
+     emit_move_insn (operands[0], tmp);
+    }
+  else
    {
      operands[1] = force_reg (SImode, operands[1]);
      rtx tmp = gen_reg_rtx (SImode);
@@ -14682,6 +14814,33 @@ (define_insn_and_split "*x86_shld_2"
 						   (const_int 31)))) 0)))
 	      (clobber (reg:CC FLAGS_REG))])])
 
+(define_insn_and_split "*x86_shld_ndd_2"
+  [(set (match_operand:SI 0 "nonimmediate_operand")
+	(ior:SI (ashift:SI (match_operand:SI 1 "nonimmediate_operand")
+			   (match_operand:QI 3 "nonmemory_operand"))
+		(lshiftrt:SI (match_operand:SI 2 "register_operand")
+			     (minus:QI (const_int 32) (match_dup 3)))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 4)
+		   (ior:SI (ashift:SI (match_dup 1)
+				      (and:QI (match_dup 3) (const_int 31)))
+			   (subreg:SI
+			     (lshiftrt:DI
+			       (zero_extend:DI (match_dup 2))
+				 (minus:QI (const_int 32)
+					   (and:QI (match_dup 3)
+						   (const_int 31)))) 0)))
+	      (clobber (reg:CC FLAGS_REG))
+	      (set (match_dup 0) (match_dup 4))])]
+{
+  operands[4] = gen_reg_rtx (SImode);
+  emit_move_insn (operands[4], operands[0]);
+})
+
 (define_expand "@x86_shift<mode>_adj_1"
   [(set (reg:CCZ FLAGS_REG)
 	(compare:CCZ (and:QI (match_operand:QI 2 "register_operand")
@@ -15621,6 +15780,24 @@ (define_insn "x86_64_shrd"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
+(define_insn "x86_64_shrd_ndd"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (ior:DI (lshiftrt:DI (match_operand:DI 1 "nonimmediate_operand" "rm")
+		  (and:QI (match_operand:QI 3 "nonmemory_operand" "Jc")
+			  (const_int 63)))
+		(subreg:DI
+		  (ashift:TI
+		    (zero_extend:TI
+		      (match_operand:DI 2 "register_operand" "r"))
+		    (minus:QI (const_int 64)
+			      (and:QI (match_dup 3) (const_int 63)))) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD"
+  "shrd{q}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ishift")
+   (set_attr "mode" "DI")])
+
+
 (define_insn "x86_64_shrd_1"
   [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
         (ior:DI (lshiftrt:DI (match_dup 0)
@@ -15642,6 +15819,24 @@ (define_insn "x86_64_shrd_1"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
+(define_insn "x86_64_shrd_ndd_1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (ior:DI (lshiftrt:DI (match_operand:DI 1 "nonimmediate_operand" "rm")
+			     (match_operand:QI 3 "const_0_to_63_operand"))
+		(subreg:DI
+		  (ashift:TI
+		    (zero_extend:TI
+		      (match_operand:DI 2 "register_operand" "r"))
+		    (match_operand:QI 4 "const_0_to_255_operand")) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD
+   && INTVAL (operands[4]) == 64 - INTVAL (operands[3])"
+  "shrd{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ishift")
+   (set_attr "length_immediate" "1")
+   (set_attr "mode" "DI")])
+
+
 (define_insn_and_split "*x86_64_shrd_shld_1_nozext"
   [(set (match_operand:DI 0 "nonimmediate_operand")
 	(ior:DI (lshiftrt:DI (match_operand:DI 4 "nonimmediate_operand")
@@ -15667,6 +15862,23 @@ (define_insn_and_split "*x86_64_shrd_shld_1_nozext"
       operands[4] = force_reg (DImode, operands[4]);
       emit_insn (gen_x86_64_shld_1 (operands[0], operands[4], operands[3], operands[2]));
     }
+  else if (TARGET_APX_NDD)
+    {
+      rtx tmp = gen_reg_rtx (DImode);
+      if (MEM_P (operands[4]))
+        {
+	  operands[1] = force_reg (DImode, operands[1]);
+	  emit_insn (gen_x86_64_shrd_ndd_1 (tmp, operands[4], operands[1],
+					    operands[2], operands[3]));
+        }
+       else if (MEM_P (operands[1]))
+         emit_insn (gen_x86_64_shld_ndd_1 (tmp, operands[1], operands[4],
+					   operands[3], operands[2]));
+       else
+         emit_insn (gen_x86_64_shrd_ndd_1 (tmp, operands[4], operands[1],
+					   operands[2], operands[3]));
+       emit_move_insn (operands[0], tmp);
+    }
   else
    {
      operands[1] = force_reg (DImode, operands[1]);
@@ -15699,6 +15911,33 @@ (define_insn_and_split "*x86_64_shrd_2"
 						   (const_int 63)))) 0)))
 	      (clobber (reg:CC FLAGS_REG))])])
 
+(define_insn_and_split "*x86_64_shrd_ndd_2"
+  [(set (match_operand:DI 0 "nonimmediate_operand")
+	(ior:DI (lshiftrt:DI (match_operand:DI 1 "nonimmediate_operand")
+			     (match_operand:QI 3 "nonmemory_operand"))
+		(ashift:DI (match_operand:DI 2 "register_operand")
+			   (minus:QI (const_int 64) (match_dup 3)))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 4)
+		   (ior:DI (lshiftrt:DI (match_dup 1)
+					(and:QI (match_dup 3) (const_int 63)))
+			   (subreg:DI
+			     (ashift:TI
+			       (zero_extend:TI (match_dup 2))
+				 (minus:QI (const_int 64)
+					   (and:QI (match_dup 3)
+						   (const_int 63)))) 0)))
+	      (clobber (reg:CC FLAGS_REG))
+	      (set (match_dup 0) (match_dup 4))])]
+{
+  operands[4] = gen_reg_rtx (DImode);
+  emit_move_insn (operands[4], operands[0]);
+})
+
 (define_insn "x86_shrd"
   [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m")
         (ior:SI (lshiftrt:SI (match_dup 0)
@@ -15721,6 +15960,23 @@ (define_insn "x86_shrd"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
+(define_insn "x86_shrd_ndd"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (ior:SI (lshiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "rm")
+		  (and:QI (match_operand:QI 3 "nonmemory_operand" "Ic")
+			  (const_int 31)))
+		(subreg:SI
+		  (ashift:DI
+		    (zero_extend:DI
+		      (match_operand:SI 2 "register_operand" "r"))
+		    (minus:QI (const_int 32)
+			      (and:QI (match_dup 3) (const_int 31)))) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD"
+  "shrd{l}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ishift")
+   (set_attr "mode" "SI")])
+
 (define_insn "x86_shrd_1"
   [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m")
         (ior:SI (lshiftrt:SI (match_dup 0)
@@ -15742,6 +15998,24 @@ (define_insn "x86_shrd_1"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
+(define_insn "x86_shrd_ndd_1"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (ior:SI (lshiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "rm")
+			     (match_operand:QI 3 "const_0_to_31_operand"))
+		(subreg:SI
+		  (ashift:DI
+		    (zero_extend:DI
+		      (match_operand:SI 2 "register_operand" "r"))
+		    (match_operand:QI 4 "const_0_to_63_operand")) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD
+   && (INTVAL (operands[4]) == 32 - INTVAL (operands[3]))"
+  "shrd{l}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ishift")
+   (set_attr "length_immediate" "1")
+   (set_attr "mode" "SI")])
+
+
 (define_insn_and_split "*x86_shrd_shld_1_nozext"
   [(set (match_operand:SI 0 "nonimmediate_operand")
 	(ior:SI (lshiftrt:SI (match_operand:SI 4 "nonimmediate_operand")
@@ -15766,7 +16040,24 @@ (define_insn_and_split "*x86_shrd_shld_1_nozext"
       operands[4] = force_reg (SImode, operands[4]);
       emit_insn (gen_x86_shld_1 (operands[0], operands[4], operands[3], operands[2]));
     }
-  else
+  else if (TARGET_APX_NDD)
+    {
+      rtx tmp = gen_reg_rtx (SImode);
+      if (MEM_P (operands[4]))
+        {
+	  operands[1] = force_reg (SImode, operands[1]);
+	  emit_insn (gen_x86_shrd_ndd_1 (tmp, operands[4], operands[1],
+					 operands[2], operands[3]));
+        }
+      else if (MEM_P (operands[1]))
+        emit_insn (gen_x86_shld_ndd_1 (tmp, operands[1], operands[4],
+				       operands[3], operands[2]));
+      else
+        emit_insn (gen_x86_shrd_ndd_1 (tmp, operands[4], operands[1],
+				       operands[2], operands[3]));
+      emit_move_insn (operands[0], tmp);
+    }
+  else
    {
      operands[1] = force_reg (SImode, operands[1]);
      rtx tmp = gen_reg_rtx (SImode);
@@ -15798,6 +16089,33 @@ (define_insn_and_split "*x86_shrd_2"
 						   (const_int 31)))) 0)))
 	      (clobber (reg:CC FLAGS_REG))])])
 
+(define_insn_and_split "*x86_shrd_ndd_2"
+  [(set (match_operand:SI 0 "nonimmediate_operand")
+	(ior:SI (lshiftrt:SI (match_operand:SI 1 "nonimmediate_operand")
+			   (match_operand:QI 3 "nonmemory_operand"))
+		(ashift:SI (match_operand:SI 2 "register_operand")
+			   (minus:QI (const_int 32) (match_dup 3)))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_NDD
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 4)
+		   (ior:SI (lshiftrt:SI (match_dup 1)
+				        (and:QI (match_dup 3) (const_int 31)))
+			   (subreg:SI
+			     (ashift:DI
+			       (zero_extend:DI (match_dup 2))
+				 (minus:QI (const_int 32)
+					   (and:QI (match_dup 3)
+						   (const_int 31)))) 0)))
+	      (clobber (reg:CC FLAGS_REG))
+	      (set (match_dup 0) (match_dup 4))])]
+{
+  operands[4] = gen_reg_rtx (SImode);
+  emit_move_insn (operands[4], operands[0]);
+})
+
 ;; Base name for insn mnemonic.
 (define_mode_attr cvt_mnemonic
   [(SI "{cltd|cdq}") (DI "{cqto|cqo}")])
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c b/gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c
new file mode 100644
index 00000000000..87068ea31aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -Wno-shift-count-overflow -m64 -mapxf" } */
+/* { dg-final { scan-assembler-times {(?n)shld[ql]?[\t ]*\$2} 4 } } */
+/* { dg-final { scan-assembler-times {(?n)shrd[ql]?[\t ]*\$2} 4 } } */
+
+typedef unsigned long  u64;
+typedef unsigned int   u32;
+
+long  a;
+int   c;
+const char n = 2;
+
+long test64r (long e) { long t = ((u64)a >> n) | (e << (64 - n)); return t;}
+long test64l (u64 e) { long t = (a << n) | (e >> (64 - n)); return t;}
+int test32r (int f) { int t = ((u32)c >> n) | (f << (32 - n)); return t; }
+int test32l (u32 f) { int t = (c << n) | (f >> (32 - n)); return t; }
+
+u64 ua;
+u32 uc;
+
+u64 testu64l (u64 ue) { u64 ut = (ua << n) | (ue >> (64 - n)); return ut; }
+u64 testu64r (u64 ue) { u64 ut = (ua >> n) | (ue << (64 - n)); return ut; }
+u32 testu32l (u32 uf) { u32 ut = (uc << n) | (uf >> (32 - n)); return ut; }
+u32 testu32r (u32 uf) { u32 ut = (uc >> n) | (uf << (32 - n)); return ut; }
-- 
2.31.1



* [PATCH 16/17] [APX NDD] Support APX NDD for cmove insns
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (14 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 15/17] [APX NDD] Support APX NDD for shld/shrd insns Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05  2:29 ` [PATCH 17/17] [APX NDD] Support TImode shift for NDD Hongyu Wang
  2023-12-05  3:48 ` [PATCH v2 00/17] Support Intel APX NDD Hongtao Liu
  17 siblings, 0 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

gcc/ChangeLog:

	* config/i386/i386.md (*mov<mode>cc_noc): Extend with new constraints
	to support NDD.
	(*movsicc_noc_zext): Likewise.
	(*movsicc_noc_zext_1): Likewise.
	(*movqicc_noc): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd-cmov.c: New test.
---
 gcc/config/i386/i386.md                      | 48 ++++++++++++--------
 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c | 16 +++++++
 2 files changed, 45 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 0af7e82deee..853f53c2bb9 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -24412,47 +24412,56 @@ (define_split
 	(neg:SWI (ltu:SWI (reg:CCC FLAGS_REG) (const_int 0))))])
 
 (define_insn "*mov<mode>cc_noc"
-  [(set (match_operand:SWI248 0 "register_operand" "=r,r")
+  [(set (match_operand:SWI248 0 "register_operand" "=r,r,r,r")
 	(if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
 			       [(reg FLAGS_REG) (const_int 0)])
-	  (match_operand:SWI248 2 "nonimmediate_operand" "rm,0")
-	  (match_operand:SWI248 3 "nonimmediate_operand" "0,rm")))]
+	  (match_operand:SWI248 2 "nonimmediate_operand" "rm,0,rm,r")
+	  (match_operand:SWI248 3 "nonimmediate_operand" "0,rm,r,rm")))]
   "TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "@
    cmov%O2%C1\t{%2, %0|%0, %2}
-   cmov%O2%c1\t{%3, %0|%0, %3}"
-  [(set_attr "type" "icmov")
+   cmov%O2%c1\t{%3, %0|%0, %3}
+   cmov%O2%C1\t{%2, %3, %0|%0, %3, %2}
+   cmov%O2%c1\t{%3, %2, %0|%0, %2, %3}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "icmov")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*movsicc_noc_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
 	(if_then_else:DI (match_operator 1 "ix86_comparison_operator"
 			   [(reg FLAGS_REG) (const_int 0)])
 	  (zero_extend:DI
-	    (match_operand:SI 2 "nonimmediate_operand" "rm,0"))
+	    (match_operand:SI 2 "nonimmediate_operand" "rm,0,rm,r"))
 	  (zero_extend:DI
-	    (match_operand:SI 3 "nonimmediate_operand" "0,rm"))))]
+	    (match_operand:SI 3 "nonimmediate_operand" "0,rm,r,rm"))))]
   "TARGET_64BIT
    && TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "@
    cmov%O2%C1\t{%2, %k0|%k0, %2}
-   cmov%O2%c1\t{%3, %k0|%k0, %3}"
-  [(set_attr "type" "icmov")
+   cmov%O2%c1\t{%3, %k0|%k0, %3}
+   cmov%O2%C1\t{%2, %3, %k0|%k0, %3, %2}
+   cmov%O2%c1\t{%3, %2, %k0|%k0, %2, %3}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "icmov")
    (set_attr "mode" "SI")])
 
 (define_insn "*movsicc_noc_zext_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,r")
 	(zero_extend:DI
 	  (if_then_else:SI (match_operator 1 "ix86_comparison_operator"
 			     [(reg FLAGS_REG) (const_int 0)])
-	     (match_operand:SI 2 "nonimmediate_operand" "rm,0")
-	     (match_operand:SI 3 "nonimmediate_operand" "0,rm"))))]
+	     (match_operand:SI 2 "nonimmediate_operand" "rm,0,rm,r")
+	     (match_operand:SI 3 "nonimmediate_operand" "0,rm,r,rm"))))]
   "TARGET_64BIT
    && TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "@
    cmov%O2%C1\t{%2, %k0|%k0, %2}
-   cmov%O2%c1\t{%3, %k0|%k0, %3}"
-  [(set_attr "type" "icmov")
+   cmov%O2%c1\t{%3, %k0|%k0, %3}
+   cmov%O2%C1\t{%2, %3, %k0|%k0, %3, %2}
+   cmov%O2%c1\t{%3, %2, %k0|%k0, %2, %3}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "icmov")
    (set_attr "mode" "SI")])
 
 
@@ -24477,14 +24486,15 @@ (define_split
 })
 
 (define_insn "*movqicc_noc"
-  [(set (match_operand:QI 0 "register_operand" "=r,r")
+  [(set (match_operand:QI 0 "register_operand" "=r,r,r")
 	(if_then_else:QI (match_operator 1 "ix86_comparison_operator"
 			   [(reg FLAGS_REG) (const_int 0)])
-		      (match_operand:QI 2 "register_operand" "r,0")
-		      (match_operand:QI 3 "register_operand" "0,r")))]
+		      (match_operand:QI 2 "register_operand" "r,0,r")
+		      (match_operand:QI 3 "register_operand" "0,r,r")))]
   "TARGET_CMOVE && !TARGET_PARTIAL_REG_STALL"
   "#"
-  [(set_attr "type" "icmov")
+  [(set_attr "isa" "*,*,apx_ndd")
+   (set_attr "type" "icmov")
    (set_attr "mode" "QI")])
 
 (define_split
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c b/gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
new file mode 100644
index 00000000000..459dc965342
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -m64 -mapxf" } */
+/* { dg-final { scan-assembler-times "cmove\[^\n\r]*, %eax" 1 } } */
+/* { dg-final { scan-assembler-times "cmovge\[^\n\r]*, %eax" 1 } } */
+
+unsigned int c[4];
+
+unsigned long long foo1 (int a, unsigned int b)
+{
+  return a ? b : c[1];
+}
+
+unsigned int foo3 (int a, int b, unsigned int c, unsigned int d)
+{
+  return a < b ? c : d;
+}
-- 
2.31.1



* [PATCH 17/17] [APX NDD] Support TImode shift for NDD
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (15 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 16/17] [APX NDD] Support APX NDD for cmove insns Hongyu Wang
@ 2023-12-05  2:29 ` Hongyu Wang
  2023-12-05  3:48 ` [PATCH v2 00/17] Support Intel APX NDD Hongtao Liu
  17 siblings, 0 replies; 24+ messages in thread
From: Hongyu Wang @ 2023-12-05  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

TImode shifts are split by splitter functions that assume operands[0] and
operands[1] are the same. That assumption may not hold for the NDD
alternative, so add split functions for NDD that emit the NDD form
instructions and omit the handling of the !64bit target split.

Although the NDD form allows a memory src, the post-reload splitter has no
extra register available to accept an NDD form shift, especially shld/shrd.
So only accept the register alternative for the shift src under NDD.
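
As a rough illustration of the constant-count case (not part of the patch),
the following C sketch models how a 128-bit left shift decomposes into 64-bit
operations when the destination may differ from the source. The ti_pair type
and the helpers shld_ndd, split_ashl_const and check_split_ashl are
hypothetical names used only here. The sketch assumes a compiler that provides
unsigned __int128 as a reference, folds the count == 1 add/adc case into the
general path, and treats count == 0 separately because a C shift by 64 is
undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a TImode value held in two 64-bit halves.  */
typedef struct { uint64_t lo, hi; } ti_pair;

/* Model of an NDD-style shld for 0 < n < 64: the result goes to a fresh
   destination instead of overwriting HI.  */
static uint64_t
shld_ndd (uint64_t hi, uint64_t lo, unsigned n)
{
  return (hi << n) | (lo >> (64 - n));
}

/* Constant-count left shift, loosely following the case structure of
   ix86_split_ashl_ndd.  SRC is never modified, so DST may live in
   different registers.  */
static ti_pair
split_ashl_const (ti_pair src, unsigned count)
{
  ti_pair dst;

  count &= 127;
  if (count >= 64)
    {
      /* Only the low half of the source survives; it moves to the
	 high half and the low half is cleared.  */
      dst.hi = src.lo << (count - 64);
      dst.lo = 0;
    }
  else if (count == 0)
    dst = src;
  else
    {
      dst.hi = shld_ndd (src.hi, src.lo, count);
      dst.lo = src.lo << count;
    }
  return dst;
}

/* Compare the split form against a native 128-bit shift for every
   count in [0, 127].  */
void
check_split_ashl (void)
{
  ti_pair v = { 0x0123456789abcdefULL, 0xfedcba9876543210ULL };
  unsigned __int128 wide = ((unsigned __int128) v.hi << 64) | v.lo;

  for (unsigned n = 0; n < 128; n++)
    {
      ti_pair r = split_ashl_const (v, n);
      unsigned __int128 expect = wide << n;

      assert (r.lo == (uint64_t) expect);
      assert (r.hi == (uint64_t) (expect >> 64));
    }
}
```

Calling check_split_ashl () from a small main is enough to exercise every
count. The right-shift helper in the patch has the same shape, with shrd in
place of shld and the roles of the halves swapped.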

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_split_ashl_ndd): New
	function to split NDD form lshift.
	(ix86_split_rshift_ndd): Likewise for l/ashiftrt.
	* config/i386/i386-protos.h (ix86_split_ashl_ndd): New
	prototype.
	(ix86_split_rshift_ndd): Likewise.
	* config/i386/i386.md (ashl<mode>3_doubleword): Add NDD
	alternative, call ndd split function when operands[0]
	not equal to operands[1].
	(define_split for doubleword lshift): Likewise.
	(define_peephole for doubleword lshift): Likewise.
	(<insn><mode>3_doubleword): Likewise for l/ashiftrt.
	(define_split for doubleword l/ashiftrt): Likewise.
	(define_peephole for doubleword l/ashiftrt): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd-ti-shift.c: New test.
---
 gcc/config/i386/i386-expand.cc                | 136 ++++++++++++++++++
 gcc/config/i386/i386-protos.h                 |   2 +
 gcc/config/i386/i386.md                       |  56 ++++++--
 .../gcc.target/i386/apx-ndd-ti-shift.c        |  91 ++++++++++++
 4 files changed, 273 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index d4bbd33ce07..a53d69d5400 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -6678,6 +6678,142 @@ ix86_split_lshr (rtx *operands, rtx scratch, machine_mode mode)
     }
 }
 
+/* Helper function to split TImode ashl under NDD.  */
+void
+ix86_split_ashl_ndd (rtx *operands, rtx scratch)
+{
+  gcc_assert (TARGET_APX_NDD);
+  int half_width = GET_MODE_BITSIZE (TImode) >> 1;
+
+  rtx low[2], high[2];
+  int count;
+
+  split_double_mode (TImode, operands, 2, low, high);
+  if (CONST_INT_P (operands[2]))
+    {
+      count = INTVAL (operands[2]) & (GET_MODE_BITSIZE (TImode) - 1);
+
+      if (count >= half_width)
+	{
+	  count = count - half_width;
+	  if (count == 0)
+	    {
+	      if (!rtx_equal_p (high[0], low[1]))
+		emit_move_insn (high[0], low[1]);
+	    }
+	  else if (count == 1)
+	    emit_insn (gen_adddi3 (high[0], low[1], low[1]));
+	  else
+	    emit_insn (gen_ashldi3 (high[0], low[1], GEN_INT (count)));
+
+	  ix86_expand_clear (low[0]);
+	}
+      else if (count == 1)
+	{
+	  rtx x3 = gen_rtx_REG (CCCmode, FLAGS_REG);
+	  rtx x4 = gen_rtx_LTU (TImode, x3, const0_rtx);
+	  emit_insn (gen_add3_cc_overflow_1 (DImode, low[0],
+					     low[1], low[1]));
+	  emit_insn (gen_add3_carry (DImode, high[0], high[1], high[1],
+				     x3, x4));
+	}
+      else
+	{
+	  emit_insn (gen_x86_64_shld_ndd (high[0], high[1], low[1],
+					  GEN_INT (count)));
+	  emit_insn (gen_ashldi3 (low[0], low[1], GEN_INT (count)));
+	}
+    }
+  else
+    {
+      emit_insn (gen_x86_64_shld_ndd (high[0], high[1], low[1],
+				      operands[2]));
+      emit_insn (gen_ashldi3 (low[0], low[1], operands[2]));
+      if (TARGET_CMOVE && scratch)
+	{
+	  ix86_expand_clear (scratch);
+	  emit_insn (gen_x86_shift_adj_1
+		     (DImode, high[0], low[0], operands[2], scratch));
+	}
+      else
+	emit_insn (gen_x86_shift_adj_2 (DImode, high[0], low[0], operands[2]));
+    }
+}
+
+/* Helper function to split TImode l/ashr under NDD.  */
+void
+ix86_split_rshift_ndd (enum rtx_code code, rtx *operands, rtx scratch)
+{
+  gcc_assert (TARGET_APX_NDD);
+  int half_width = GET_MODE_BITSIZE (TImode) >> 1;
+  bool ashr_p = code == ASHIFTRT;
+  rtx (*gen_shr)(rtx, rtx, rtx) = ashr_p ? gen_ashrdi3
+					 : gen_lshrdi3;
+
+  rtx low[2], high[2];
+  int count;
+
+  split_double_mode (TImode, operands, 2, low, high);
+  if (CONST_INT_P (operands[2]))
+    {
+      count = INTVAL (operands[2]) & (GET_MODE_BITSIZE (TImode) - 1);
+
+      if (ashr_p && (count == GET_MODE_BITSIZE (TImode) - 1))
+	{
+	  emit_insn (gen_shr (high[0], high[1],
+			      GEN_INT (half_width - 1)));
+	  emit_move_insn (low[0], high[0]);
+	}
+      else if (count >= half_width)
+	{
+	  if (ashr_p)
+	    emit_insn (gen_shr (high[0], high[1],
+				GEN_INT (half_width - 1)));
+	  else
+	    ix86_expand_clear (high[0]);
+
+	  if (count > half_width)
+	    emit_insn (gen_shr (low[0], high[1],
+				GEN_INT (count - half_width)));
+	  else
+	    emit_move_insn (low[0], high[1]);
+	}
+      else
+	{
+	  emit_insn (gen_x86_64_shrd_ndd (low[0], low[1], high[1],
+					  GEN_INT (count)));
+	  emit_insn (gen_shr (high[0], high[1], GEN_INT (count)));
+	}
+    }
+  else
+    {
+      emit_insn (gen_x86_64_shrd_ndd (low[0], low[1], high[1],
+				      operands[2]));
+      emit_insn (gen_shr (high[0], high[1], operands[2]));
+
+      if (TARGET_CMOVE && scratch)
+	{
+	  if (ashr_p)
+	    {
+	      emit_move_insn (scratch, high[0]);
+	      emit_insn (gen_shr (scratch, scratch,
+				  GEN_INT (half_width - 1)));
+	    }
+	  else
+	    ix86_expand_clear (scratch);
+
+	  emit_insn (gen_x86_shift_adj_1
+		     (DImode, low[0], high[0], operands[2], scratch));
+	}
+      else if (ashr_p)
+	emit_insn (gen_x86_shift_adj_3
+		   (DImode, low[0], high[0], operands[2]));
+      else
+	emit_insn (gen_x86_shift_adj_2
+		   (DImode, low[0], high[0], operands[2]));
+    }
+}
+
 /* Expand move of V1TI mode register X to a new TI mode register.  */
 static rtx
 ix86_expand_v1ti_to_ti (rtx x)
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index fa952409729..56349064a6c 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -174,8 +174,10 @@ extern void x86_initialize_trampoline (rtx, rtx, rtx);
 extern rtx ix86_zero_extend_to_Pmode (rtx);
 extern void ix86_split_long_move (rtx[]);
 extern void ix86_split_ashl (rtx *, rtx, machine_mode);
+extern void ix86_split_ashl_ndd (rtx *, rtx);
 extern void ix86_split_ashr (rtx *, rtx, machine_mode);
 extern void ix86_split_lshr (rtx *, rtx, machine_mode);
+extern void ix86_split_rshift_ndd (enum rtx_code, rtx *, rtx);
 extern void ix86_expand_v1ti_shift (enum rtx_code, rtx[]);
 extern void ix86_expand_v1ti_rotate (enum rtx_code, rtx[]);
 extern void ix86_expand_v1ti_ashiftrt (rtx[]);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 853f53c2bb9..331dda89b29 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14420,13 +14420,14 @@ (define_insn_and_split "*ashl<dwi>3_doubleword_mask_1"
 })
 
 (define_insn "ashl<mode>3_doubleword"
-  [(set (match_operand:DWI 0 "register_operand" "=&r")
-	(ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n")
-		    (match_operand:QI 2 "nonmemory_operand" "<S>c")))
+  [(set (match_operand:DWI 0 "register_operand" "=&r,r")
+	(ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n,r")
+		    (match_operand:QI 2 "nonmemory_operand" "<S>c,<S>c")))
    (clobber (reg:CC FLAGS_REG))]
   ""
   "#"
-  [(set_attr "type" "multi")])
+  [(set_attr "type" "multi")
+   (set_attr "isa" "*,apx_ndd")])
 
 (define_split
   [(set (match_operand:DWI 0 "register_operand")
@@ -14435,7 +14436,15 @@ (define_split
    (clobber (reg:CC FLAGS_REG))]
   "epilogue_completed"
   [(const_int 0)]
-  "ix86_split_ashl (operands, NULL_RTX, <MODE>mode); DONE;")
+{
+  if (TARGET_APX_NDD
+      && !rtx_equal_p (operands[0], operands[1])
+      && REG_P (operands[1]))
+    ix86_split_ashl_ndd (operands, NULL_RTX);
+  else
+    ix86_split_ashl (operands, NULL_RTX, <MODE>mode);
+  DONE;
+})
 
 ;; By default we don't ask for a scratch register, because when DWImode
 ;; values are manipulated, registers are already at a premium.  But if
@@ -14451,7 +14460,15 @@ (define_peephole2
    (match_dup 3)]
   "TARGET_CMOVE"
   [(const_int 0)]
-  "ix86_split_ashl (operands, operands[3], <DWI>mode); DONE;")
+{
+  if (TARGET_APX_NDD
+      && !rtx_equal_p (operands[0], operands[1])
+      && (REG_P (operands[1])))
+    ix86_split_ashl_ndd (operands, operands[3]);
+  else
+    ix86_split_ashl (operands, operands[3], <DWI>mode);
+  DONE;
+})
 
 (define_insn_and_split "*ashl<dwi>3_doubleword_highpart"
   [(set (match_operand:<DWI> 0 "register_operand" "=r")
@@ -15708,16 +15725,24 @@ (define_insn_and_split "*<insn><dwi>3_doubleword_mask_1"
 })
 
 (define_insn_and_split "<insn><mode>3_doubleword"
-  [(set (match_operand:DWI 0 "register_operand" "=&r")
-	(any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0")
-			 (match_operand:QI 2 "nonmemory_operand" "<S>c")))
+  [(set (match_operand:DWI 0 "register_operand" "=&r,r")
+	(any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0,r")
+			 (match_operand:QI 2 "nonmemory_operand" "<S>c,<S>c")))
    (clobber (reg:CC FLAGS_REG))]
   ""
   "#"
   "epilogue_completed"
   [(const_int 0)]
-  "ix86_split_<insn> (operands, NULL_RTX, <MODE>mode); DONE;"
-  [(set_attr "type" "multi")])
+{
+  if (TARGET_APX_NDD
+      && !rtx_equal_p (operands[0], operands[1]))
+    ix86_split_rshift_ndd (<CODE>, operands, NULL_RTX);
+  else
+    ix86_split_<insn> (operands, NULL_RTX, <MODE>mode);
+  DONE;
+}
+  [(set_attr "type" "multi")
+   (set_attr "isa" "*,apx_ndd")])
 
 ;; By default we don't ask for a scratch register, because when DWImode
 ;; values are manipulated, registers are already at a premium.  But if
@@ -15733,7 +15758,14 @@ (define_peephole2
    (match_dup 3)]
   "TARGET_CMOVE"
   [(const_int 0)]
-  "ix86_split_<insn> (operands, operands[3], <DWI>mode); DONE;")
+{
+  if (TARGET_APX_NDD
+      && !rtx_equal_p (operands[0], operands[1]))
+    ix86_split_rshift_ndd (<CODE>, operands, operands[3]);
+  else
+    ix86_split_<insn> (operands, operands[3], <DWI>mode);
+  DONE;
+})
 
 ;; Split truncations of double word right shifts into x86_shrd_1.
 (define_insn_and_split "<insn><dwi>3_doubleword_lowpart"
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c b/gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c
new file mode 100644
index 00000000000..0489712b7f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c
@@ -0,0 +1,91 @@
+/* { dg-do run { target { int128 && { ! ia32 } } } } */
+/* { dg-require-effective-target apxf } */
+/* { dg-options "-O2" } */
+
+#include <stdlib.h>
+
+#define APX_TARGET __attribute__((noinline, target("apxf")))
+#define NO_APX __attribute__((noinline, target("no-apxf")))
+typedef __uint128_t u128;
+typedef __int128 i128;
+
+#define TI_SHIFT_FUNC(TYPE, op, name) \
+APX_TARGET \
+TYPE apx_##name##TYPE (TYPE a, char b) \
+{ \
+  return a op b; \
+} \
+TYPE noapx_##name##TYPE (TYPE a, char b) \
+{ \
+  return a op b; \
+} \
+
+#define TI_SHIFT_FUNC_CONST(TYPE, i, op, name) \
+APX_TARGET \
+TYPE apx_##name##TYPE##_const (TYPE a) \
+{ \
+  return a op i; \
+} \
+NO_APX \
+TYPE noapx_##name##TYPE##_const (TYPE a) \
+{ \
+  return a op i; \
+}
+
+#define TI_SHIFT_TEST(TYPE, name, val) \
+{\
+  if (apx_##name##TYPE (val, b) != noapx_##name##TYPE (val, b)) \
+    abort (); \
+}
+
+#define TI_SHIFT_CONST_TEST(TYPE, name, val) \
+{\
+  if (apx_##name##1##TYPE##_const (val) \
+      != noapx_##name##1##TYPE##_const (val)) \
+    abort (); \
+  if (apx_##name##2##TYPE##_const (val) \
+      != noapx_##name##2##TYPE##_const (val)) \
+    abort (); \
+  if (apx_##name##3##TYPE##_const (val) \
+      != noapx_##name##3##TYPE##_const (val)) \
+    abort (); \
+  if (apx_##name##4##TYPE##_const (val) \
+      != noapx_##name##4##TYPE##_const (val)) \
+    abort (); \
+}
+
+TI_SHIFT_FUNC(i128, <<, ashl)
+TI_SHIFT_FUNC(i128, >>, ashr)
+TI_SHIFT_FUNC(u128, >>, lshr)
+
+TI_SHIFT_FUNC_CONST(i128, 1, <<, ashl1)
+TI_SHIFT_FUNC_CONST(i128, 65, <<, ashl2)
+TI_SHIFT_FUNC_CONST(i128, 64, <<, ashl3)
+TI_SHIFT_FUNC_CONST(i128, 87, <<, ashl4)
+TI_SHIFT_FUNC_CONST(i128, 127, >>, ashr1)
+TI_SHIFT_FUNC_CONST(i128, 87, >>, ashr2)
+TI_SHIFT_FUNC_CONST(i128, 27, >>, ashr3)
+TI_SHIFT_FUNC_CONST(i128, 64, >>, ashr4)
+TI_SHIFT_FUNC_CONST(u128, 127, >>, lshr1)
+TI_SHIFT_FUNC_CONST(u128, 87, >>, lshr2)
+TI_SHIFT_FUNC_CONST(u128, 27, >>, lshr3)
+TI_SHIFT_FUNC_CONST(u128, 64, >>, lshr4)
+
+int main (void)
+{
+  if (!__builtin_cpu_supports ("apxf"))
+    return 0;
+
+  u128 ival = 0x123456788765432FLL;
+  u128 uval = 0xF234567887654321ULL;
+  char b = 28;
+
+  TI_SHIFT_TEST(i128, ashl, ival)
+  TI_SHIFT_TEST(i128, ashr, ival)
+  TI_SHIFT_TEST(u128, lshr, uval)
+  TI_SHIFT_CONST_TEST(i128, ashl, ival)
+  TI_SHIFT_CONST_TEST(i128, ashr, ival)
+  TI_SHIFT_CONST_TEST(u128, lshr, uval)
+
+  return 0;
+}
-- 
2.31.1



* Re: [PATCH v2 00/17] Support Intel APX NDD
  2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
                   ` (16 preceding siblings ...)
  2023-12-05  2:29 ` [PATCH 17/17] [APX NDD] Support TImode shift for NDD Hongyu Wang
@ 2023-12-05  3:48 ` Hongtao Liu
  17 siblings, 0 replies; 24+ messages in thread
From: Hongtao Liu @ 2023-12-05  3:48 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: gcc-patches, ubizjak, hongtao.liu

On Tue, Dec 5, 2023 at 10:32 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> Hi,
>
> APX NDD patches have been posted at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636604.html
>
> Thanks to Hongtao's review, the V2 patch adds support for the zext semantic
> with memory input, since NDD by default clears the upper bits of dest for any
> operand size.
>
> Also we support TImode shift with new split helper functions, which allow the
> NDD form split but still restrict memory src usage: in the post-reload
> splitter the number of registers is restricted, and no new register can be
> used for shld/shrd.
>
> Also fixed several typo/formatting/redundant code issues.
Patches LGTM. Please wait a few more days before committing in case
other folks have comments.
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
>
> OK for trunk?
>
> Hongyu Wang (8):
>   [APX NDD] Restrict TImode register usage when NDD enabled
>   [APX NDD] Disable seg_prefixed memory usage for NDD add
>   [APX NDD] Support APX NDD for left shift insns
>   [APX NDD] Support APX NDD for right shift insns
>   [APX NDD] Support APX NDD for rotate insns
>   [APX NDD] Support APX NDD for shld/shrd insns
>   [APX NDD] Support APX NDD for cmove insns
>   [APX NDD] Support TImode shift for NDD
>
> Kong Lingling (9):
>   [APX NDD] Support Intel APX NDD for legacy add insn
>   [APX NDD] Support APX NDD for optimization patterns of add
>   [APX NDD] Support APX NDD for adc insns
>   [APX NDD] Support APX NDD for sub insns
>   [APX NDD] Support APX NDD for sbb insn
>   [APX NDD] Support APX NDD for neg insn
>   [APX NDD] Support APX NDD for not insn
>   [APX NDD] Support APX NDD for and insn
>   [APX NDD] Support APX NDD for or/xor insn
>
>  gcc/config/i386/constraints.md                |    5 +
>  gcc/config/i386/i386-expand.cc                |  164 +-
>  gcc/config/i386/i386-options.cc               |    2 +
>  gcc/config/i386/i386-protos.h                 |   16 +-
>  gcc/config/i386/i386.cc                       |   40 +-
>  gcc/config/i386/i386.md                       | 2323 +++++++++++------
>  gcc/testsuite/gcc.target/i386/apx-ndd-adc.c   |   15 +
>  gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c  |   16 +
>  gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c   |    6 +
>  .../gcc.target/i386/apx-ndd-shld-shrd.c       |   24 +
>  .../gcc.target/i386/apx-ndd-ti-shift.c        |   91 +
>  gcc/testsuite/gcc.target/i386/apx-ndd.c       |  202 ++
>  12 files changed, 2149 insertions(+), 755 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd.c
>
> --
> 2.31.1
>


-- 
BR,
Hongtao


* Re: [PATCH 02/17] [APX NDD] Restrict TImode register usage when NDD enabled
  2023-12-05  2:29 ` [PATCH 02/17] [APX NDD] Restrict TImode register usage when NDD enabled Hongyu Wang
@ 2023-12-05 10:46   ` Uros Bizjak
  2023-12-06  1:24     ` Hongyu Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Uros Bizjak @ 2023-12-05 10:46 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: gcc-patches, hongtao.liu

On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> Under APX NDD, the previous TImode allocation has an issue: it originally
> allocated TImode values to consecutive register pairs, like rax:rdi, rdi:rdx.
>
> This causes problems for all TImode NDD patterns. For NDD we do not assume
> that arithmetic operations like add have a dependency between dest and src1,
> so a write to the 1st highpart rdi will be overridden by the 2nd lowpart rdi
> if the 2nd lowpart rdi has a different src as input. The write to the 1st
> highpart rdi is then lost, which causes a miscompilation.
>
> To resolve this, under TARGET_APX_NDD we only allow registers with an even
> regno to be allocated for TImode, so that TImode registers are allocated
> as non-overlapping pairs.

Perhaps you could use earlyclobber with __doubleword instructions:

(define_insn_and_split "*add<dwi>3_doubleword"
  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
    (plus:<DWI>
      (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0")
      (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
   (clobber (reg:CC FLAGS_REG))]

For the above pattern, you can add an earlyclobbered &r output
alternative that guarantees that the output won't be allocated to any of
the input registers.
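
To make the hazard concrete, here is a small C model (an illustration only,
not GCC code). It assumes a toy register file in which a TImode value at
regno R occupies regs[R] (low) and regs[R + 1] (high), and a split NDD add
that writes the low half of dest before reading the high half of src1. All
names (split_add_ndd, reference_add, split_matches_reference) are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NREGS = 8 };

/* Split 128-bit NDD add: the low half of DST is written first, then the
   high halves of the sources are read.  If DST's low register aliases
   SRC1's high register, that input is clobbered before it is used.  */
static void
split_add_ndd (uint64_t *regs, int dst, int src1, int src2)
{
  uint64_t lo = regs[src1] + regs[src2];
  uint64_t carry = lo < regs[src1];

  regs[dst] = lo;						/* add */
  regs[dst + 1] = regs[src1 + 1] + regs[src2 + 1] + carry;	/* adc */
}

/* Reference: read every input before writing any output.  */
static void
reference_add (uint64_t *regs, int dst, int src1, int src2)
{
  uint64_t a_lo = regs[src1], a_hi = regs[src1 + 1];
  uint64_t b_lo = regs[src2], b_hi = regs[src2 + 1];
  uint64_t lo = a_lo + b_lo;

  regs[dst] = lo;
  regs[dst + 1] = a_hi + b_hi + (lo < a_lo);
}

/* Return 1 if the split sequence gives the same register file as the
   reference for the given pair assignment, 0 otherwise.  */
int
split_matches_reference (int dst, int src1, int src2)
{
  uint64_t a[NREGS] = { 10, 20, 30, 40, 50, 60, 70, 80 };
  uint64_t b[NREGS] = { 10, 20, 30, 40, 50, 60, 70, 80 };

  split_add_ndd (a, dst, src1, src2);
  reference_add (b, dst, src1, src2);
  return memcmp (a, b, sizeof a) == 0;
}
```

Under these assumptions split_matches_reference (1, 0, 4) returns 0, because
the low half of dest aliases the high half of src1. Even-aligned pairs such
as (2, 0, 4), or a dest identical to src1 such as (0, 0, 4), return 1. An
earlyclobber output corresponds to the fully disjoint case.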

Uros.

> There could be errors for inline assembly if it forcibly allocates __int128
> to an odd-numbered general register.
>
> gcc/ChangeLog:
>
>         * config/i386/i386.cc (ix86_hard_regno_mode_ok): Restrict even regno
>         for TImode if APX NDD enabled.
> ---
>  gcc/config/i386/i386.cc | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 93a9cb556a5..3efeed396c4 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -20873,6 +20873,16 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>         return true;
>        return !can_create_pseudo_p ();
>      }
> +  /* For TImode we previously assumed that src1/dest use the same register,
> +     so highpart/lowpart allocation could be consecutive and 2 TImode insns
> +     could hold their low/highparts in a continuous sequence like rax:rdx,
> +     rdx:rcx.  This does not work for APX_NDD, since NDD allows different
> +     registers as dest/src1: a write to the 2nd lowpart would affect the
> +     write to the 1st highpart, and the insn would be optimized out.  So for
> +     TImode patterns that support the NDD form, the allowed register number
> +     must be even to avoid such mixed high/low part overrides.  */
> +  else if (TARGET_APX_NDD && mode == TImode)
> +    return regno % 2 == 0;
>    /* We handle both integer and floats in the general purpose registers.  */
>    else if (VALID_INT_MODE_P (mode)
>            || VALID_FP_MODE_P (mode))
> --
> 2.31.1
>


* Re: [PATCH 03/17] [APX NDD] Support APX NDD for optimization patterns of add
  2023-12-05  2:29 ` [PATCH 03/17] [APX NDD] Support APX NDD for optimization patterns of add Hongyu Wang
@ 2023-12-05 11:20   ` Uros Bizjak
  0 siblings, 0 replies; 24+ messages in thread
From: Uros Bizjak @ 2023-12-05 11:20 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: gcc-patches, hongtao.liu, Kong Lingling

On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> From: Kong Lingling <lingling.kong@intel.com>
>
> gcc/ChangeLog:
>
>         * config/i386/i386.md (addsi_1_zext): Add new alternatives for
>         NDD and adjust output templates.
>         (*add<mode>_2): Likewise.
>         (*addsi_2_zext): Likewise.
>         (*add<mode>_3): Likewise.
>         (*addsi_3_zext): Likewise.
>         (*adddi_4): Likewise.
>         (*add<mode>_4): Likewise.
>         (*add<mode>_5): Likewise.
>         (*addv<mode>4): Likewise.
>         (*addv<mode>4_1): Likewise.
>         (*add<mode>3_cconly_overflow_1): Likewise.
>         (*add<mode>3_cc_overflow_1): Likewise.
>         (*addsi3_zext_cc_overflow_1): Likewise.
>         (*add<mode>3_cconly_overflow_2): Likewise.
>         (*add<mode>3_cc_overflow_2): Likewise.
>         (*addsi3_zext_cc_overflow_2): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/apx-ndd.c: Add more tests.
> ---
>  gcc/config/i386/i386.md                 | 310 +++++++++++++++---------
>  gcc/testsuite/gcc.target/i386/apx-ndd.c |  53 ++--
>  2 files changed, 232 insertions(+), 131 deletions(-)
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index cb227d19f40..2a73f6dcaec 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -6476,13 +6476,15 @@ (define_insn "*add<mode>_1"
>  ;; patterns constructed from addsi_1 to match.
>
>  (define_insn "addsi_1_zext"
> -  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
> +  [(set (match_operand:DI 0 "register_operand" "=r,r,r,r,r")
>         (zero_extend:DI
> -         (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r,r")
> -                  (match_operand:SI 2 "x86_64_general_operand" "rBMe,0,le"))))
> +         (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r,r,r,rm")
> +                  (match_operand:SI 2 "x86_64_general_operand" "rBMe,0,le,rBMe,re"))))
>     (clobber (reg:CC FLAGS_REG))]
> -  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands)"
> +  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands,
> +                                           TARGET_APX_NDD)"
>  {
> +  bool use_ndd = (which_alternative == 3 || which_alternative == 4);

Can get_attr_isa (insn) == ISA_APX_NDD be used instead?

Uros.

* Re: [PATCH 05/17] [APX NDD] Support APX NDD for adc insns
  2023-12-05  2:29 ` [PATCH 05/17] [APX NDD] Support APX NDD for adc insns Hongyu Wang
@ 2023-12-05 11:25   ` Uros Bizjak
  0 siblings, 0 replies; 24+ messages in thread
From: Uros Bizjak @ 2023-12-05 11:25 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: gcc-patches, hongtao.liu, Kong Lingling

On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> From: Kong Lingling <lingling.kong@intel.com>
>
> Legacy adc patterns are commonly used to implement TImode add. When extending
> TImode add to the NDD version, operands[0] and operands[1] can differ, so an
> extra move should be emitted where those patterns optimize the addition of
> const0_rtx.
>
> NDD instructions automatically zero-extend the dest register to 64 bits, so
> zext patterns can adopt all NDD forms that have a memory src input.
>
> gcc/ChangeLog:
>
>         * config/i386/i386.md (*add<dwi>3_doubleword): Add ndd constraints, and
>         move operands[1] to operands[0] when they are not equal.
>         (*add<dwi>3_doubleword_cc_overflow_1): Likewise.
>         (*add<dwi>3_doubleword_zext): Add ndd constraints.
>         (*addv<dwi>4_doubleword): Likewise.
>         (*addv<dwi>4_doubleword_1): Likewise.
>         (addv<mode>4_overflow_1): Likewise.
>         (*addv<mode>4_overflow_2): Likewise.
>         (@add<mode>3_carry): Likewise.
>         (*add<mode>3_carry_0): Likewise.
>         (*addsi3_carry_zext): Likewise.
>         (addcarry<mode>): Likewise.
>         (addcarry<mode>_0): Likewise.
>         (*addcarry<mode>_1): Likewise.
>         (*add<mode>3_eq): Likewise.
>         (*add<mode>3_ne): Likewise.
>         (*addsi3_carry_zext_0): Likewise, and use nonimmediate_operand for
>         operands[1] to accept memory input for NDD alternative.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/apx-ndd-adc.c: New test.
> ---
>  gcc/config/i386/i386.md                     | 191 ++++++++++++--------
>  gcc/testsuite/gcc.target/i386/apx-ndd-adc.c |  15 ++
>  2 files changed, 134 insertions(+), 72 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 6b316e698bb..358a3857f89 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -6291,12 +6291,12 @@ (define_expand "add<mode>3"
>                                 TARGET_APX_NDD); DONE;")
>
>  (define_insn_and_split "*add<dwi>3_doubleword"
> -  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
> +  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r,r,r")
>         (plus:<DWI>
> -         (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0")
> -         (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
> +         (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0,ro,r")
> +         (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o,r<di>,r")))
>     (clobber (reg:CC FLAGS_REG))]

If we relax the requirement for TImode register pair, then =&r output
should be used here (and in other TImode instructions) for apx_ndd
ISA.

Uros.

* Re: [PATCH 02/17] [APX NDD] Restrict TImode register usage when NDD enabled
  2023-12-05 10:46   ` Uros Bizjak
@ 2023-12-06  1:24     ` Hongyu Wang
  2023-12-06  6:55       ` Uros Bizjak
  0 siblings, 1 reply; 24+ messages in thread
From: Hongyu Wang @ 2023-12-06  1:24 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Hongyu Wang, gcc-patches, hongtao.liu

Uros Bizjak <ubizjak@gmail.com> wrote on Tue, Dec 5, 2023 at 18:46:

>
> On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
> >
> > Under APX NDD, the previous TImode allocation has an issue: it was
> > originally allocated using consecutive pairs, like rax:rdi, rdi:rdx.
> >
> > This causes problems for all TImode NDD patterns. For NDD we do not
> > assume that arithmetic operations like add have a dependency between dest
> > and src1, so the write to the 1st highpart rdi will be overridden by the
> > 2nd lowpart rdi if the 2nd lowpart rdi has a different src as input; the
> > write to the 1st highpart rdi is then lost, causing a miscompilation.
> >
> > To resolve this, under TARGET_APX_NDD we only allow registers with an even
> > regno to be allocated for TImode, so that TImode registers are allocated
> > as non-overlapping pairs.
>
> Perhaps you could use earlyclobber with __doubleword instructions:
>
> (define_insn_and_split "*add<dwi>3_doubleword"
>   [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
>     (plus:<DWI>
>       (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0")
>       (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
>    (clobber (reg:CC FLAGS_REG))]
>
> For the above pattern, you can add earlyclobbered &r output
> alternative that guarantees that output won't be allocated to any of
> the input registers.
>

Yes, it does resolve the dest/src overlapping issue we met, thanks!
I tried it and saw no failures in the gcc testsuite or SPEC. I suppose RA
can handle different src1/src2 correctly.

I will update this in the V3 patches, together with the change to use
get_attr_isa (insn) == ISA_APX_NDD.
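
For readers following the overlap problem discussed above, here is a
minimal sketch of it in plain C. It is only a toy model, not GCC code: the
register file, ndd_add128 and high_part_of_sum are hypothetical names
invented for illustration, and the two statements in ndd_add128 stand in
for the add/adc pair that a split doubleword NDD add would emit.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file; array indices stand in for hard register numbers. */
typedef struct { uint64_t r[8]; } regfile;

/* Model the two-insn sequence of a split 128-bit NDD add:
       add  src1_lo, src2_lo -> dst_lo
       adc  src1_hi, src2_hi -> dst_hi
   A 128-bit value in "register n" occupies the pair (n, n + 1).  */
static void ndd_add128(regfile *rf, int dst, int src1, int src2)
{
    uint64_t lo = rf->r[src1] + rf->r[src2];
    unsigned carry = lo < rf->r[src1];
    rf->r[dst] = lo;   /* low-part write lands before the adc reads its inputs */
    rf->r[dst + 1] = rf->r[src1 + 1] + rf->r[src2 + 1] + carry;
}

/* Add the value in pair (0,1) (low 1, high 2) to the value in pair (4,5)
   (low 10, high 20); return the high part of the sum stored in the pair
   (dst, dst + 1).  */
static uint64_t high_part_of_sum(int dst)
{
    regfile rf = { { 1, 2, 0, 0, 10, 20, 0, 0 } };
    ndd_add128(&rf, dst, 0, 4);
    return rf.r[dst + 1];
}
```

With dst = 2 the destination pair is disjoint from both sources and the
high part is 22, as expected. With dst = 1 the destination low part is the
same register as the src1 high part, so the adc reads the already
overwritten value and the high part comes out as 31. Either an even-regno
restriction or an earlyclobbered output rules out the second allocation.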

* Re: [PATCH 02/17] [APX NDD] Restrict TImode register usage when NDD enabled
  2023-12-06  1:24     ` Hongyu Wang
@ 2023-12-06  6:55       ` Uros Bizjak
  0 siblings, 0 replies; 24+ messages in thread
From: Uros Bizjak @ 2023-12-06  6:55 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: Hongyu Wang, gcc-patches, hongtao.liu

On Wed, Dec 6, 2023 at 2:31 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
>
> Uros Bizjak <ubizjak@gmail.com> wrote on Tue, Dec 5, 2023 at 18:46:
>
> >
> > On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
> > >
> > > Under APX NDD, the previous TImode allocation has an issue: it was
> > > originally allocated using consecutive pairs, like rax:rdi, rdi:rdx.
> > >
> > > This causes problems for all TImode NDD patterns. For NDD we do not
> > > assume that arithmetic operations like add have a dependency between dest
> > > and src1, so the write to the 1st highpart rdi will be overridden by the
> > > 2nd lowpart rdi if the 2nd lowpart rdi has a different src as input; the
> > > write to the 1st highpart rdi is then lost, causing a miscompilation.
> > >
> > > To resolve this, under TARGET_APX_NDD we only allow registers with an even
> > > regno to be allocated for TImode, so that TImode registers are allocated
> > > as non-overlapping pairs.
> >
> > Perhaps you could use earlyclobber with __doubleword instructions:
> >
> > (define_insn_and_split "*add<dwi>3_doubleword"
> >   [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
> >     (plus:<DWI>
> >       (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0")
> >       (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
> >    (clobber (reg:CC FLAGS_REG))]
> >
> > For the above pattern, you can add earlyclobbered &r output
> > alternative that guarantees that output won't be allocated to any of
> > the input registers.
> >
>
> Yes, it does resolve the dest/src overlapping issue we met, thanks!
> I tried it and saw no failures in the gcc testsuite or SPEC. I suppose RA
> can handle different src1/src2 correctly.

Yes, and when a memory input operand is used in doubleword patterns, you
need earlyclobber anyway; otherwise nothing prevents the compiler from
clobbering the address registers. When the address registers are dead, the
compiler can (and will) allocate the output register to the same regno as
an address register.
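
The address-register case can be sketched the same way. This is a toy
model in C under stated assumptions, not anything from GCC:
high_part_mem_add, the mem and regs arrays, and the word-index addressing
are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a split doubleword add with a memory src1:
       add  [base],     src2_lo -> dst_lo
       adc  [base + 8], src2_hi -> dst_hi
   regs[base] holds a word index into mem; the value in "register n"
   occupies the pair (n, n + 1).  base must not be 4 or 5 here.  */
static uint64_t high_part_mem_add(int dst, int base)
{
    uint64_t mem[8]  = { 0, 0, 3, 7, 0, 99, 0, 0 };  /* operand at words 2..3 */
    uint64_t regs[8] = { 0 };
    regs[base] = 2;          /* address register */
    regs[4] = 1;             /* src2 low part */
    regs[5] = 0;             /* src2 high part */

    uint64_t lo = mem[regs[base]] + regs[4];
    unsigned carry = lo < regs[4];
    regs[dst] = lo;          /* overwrites the address if dst == base */
    regs[dst + 1] = mem[regs[base] + 1] + regs[5] + carry;
    return regs[dst + 1];
}
```

With dst = 2 and base = 0 the second load still reads word 3 and the high
part is 7. With dst = 0 and base = 0 the low-part result (4) replaces the
address, the second load reads word 5 instead, and the high part becomes
99. An earlyclobbered output keeps the allocator from picking dst == base.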

Uros.

> I will update this in the V3 patches, together with the change to use
> get_attr_isa (insn) == ISA_APX_NDD.

Thread overview: 24+ messages
2023-12-05  2:29 [PATCH v2 00/17] Support Intel APX NDD Hongyu Wang
2023-12-05  2:29 ` [PATCH 01/17] [APX NDD] Support Intel APX NDD for legacy add insn Hongyu Wang
2023-12-05  2:29 ` [PATCH 02/17] [APX NDD] Restrict TImode register usage when NDD enabled Hongyu Wang
2023-12-05 10:46   ` Uros Bizjak
2023-12-06  1:24     ` Hongyu Wang
2023-12-06  6:55       ` Uros Bizjak
2023-12-05  2:29 ` [PATCH 03/17] [APX NDD] Support APX NDD for optimization patterns of add Hongyu Wang
2023-12-05 11:20   ` Uros Bizjak
2023-12-05  2:29 ` [PATCH 04/17] [APX NDD] Disable seg_prefixed memory usage for NDD add Hongyu Wang
2023-12-05  2:29 ` [PATCH 05/17] [APX NDD] Support APX NDD for adc insns Hongyu Wang
2023-12-05 11:25   ` Uros Bizjak
2023-12-05  2:29 ` [PATCH 06/17] [APX NDD] Support APX NDD for sub insns Hongyu Wang
2023-12-05  2:29 ` [PATCH 07/17] [APX NDD] Support APX NDD for sbb insn Hongyu Wang
2023-12-05  2:29 ` [PATCH 08/17] [APX NDD] Support APX NDD for neg insn Hongyu Wang
2023-12-05  2:29 ` [PATCH 09/17] [APX NDD] Support APX NDD for not insn Hongyu Wang
2023-12-05  2:29 ` [PATCH 10/17] [APX NDD] Support APX NDD for and insn Hongyu Wang
2023-12-05  2:29 ` [PATCH 11/17] [APX NDD] Support APX NDD for or/xor insn Hongyu Wang
2023-12-05  2:29 ` [PATCH 12/17] [APX NDD] Support APX NDD for left shift insns Hongyu Wang
2023-12-05  2:29 ` [PATCH 13/17] [APX NDD] Support APX NDD for right " Hongyu Wang
2023-12-05  2:29 ` [PATCH 14/17] [APX NDD] Support APX NDD for rotate insns Hongyu Wang
2023-12-05  2:29 ` [PATCH 15/17] [APX NDD] Support APX NDD for shld/shrd insns Hongyu Wang
2023-12-05  2:29 ` [PATCH 16/17] [APX NDD] Support APX NDD for cmove insns Hongyu Wang
2023-12-05  2:29 ` [PATCH 17/17] [APX NDD] Support TImode shift for NDD Hongyu Wang
2023-12-05  3:48 ` [PATCH v2 00/17] Support Intel APX NDD Hongtao Liu
