public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 00/13] Support Intel APX EGPR
@ 2023-09-22 10:56 Hongyu Wang
  2023-09-22 10:56 ` [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class Hongyu Wang
                   ` (13 more replies)
  0 siblings, 14 replies; 28+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub

Hi,

This is the v2 patch series for APX support, which follows up on the previous
discussion in
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628904.html

As discussed in the previous thread, the inverse approach of extending base/index
reg support with new memory constraints requires much more effort in both the
middle-end and the backend, so we keep the backend implementation logic.  Also,
for inline asm it is hard to provide a memory constraint that forces the use of
EGPR, due to the limitation of the base/index reg class hooks, so we just add one
register class that allows EGPR usage by force.

The major changes are

  1. Add new macros INSN_BASE_REG_CLASS/REGNO_OK_FOR_INSN_BASE_P instead of
  using the old MODE_CODE macros.
  2. Add a series of constraints that prohibit EGPR as counterparts of common
  constraints that may involve register/memory/address.  All of them are
  prefixed with "j" to avoid conflicts with existing constraints.
  3. Support constraint mapping for all GPR-related common constraints in
  inline asm (see the short example below).
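
For example (a minimal illustration distilled from the apx-egprs-names.c test in
this series): common "r"/"m" constraints keep working under -mapxf because they
are mapped to the non-EGPR classes, while an EGPR can still be requested
explicitly through a local register variable:

  void
  use_egpr (long x)
  {
    /* Explicit EGPR via a register variable; requires -mapxf.  */
    register long a __asm__ ("r31") = x;
    __asm__ __volatile__ ("mov %0, %%rax" : : "r" (a) : "rax");
  }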

Bootstrapped/regtested on x86_64-linux-gnu.
Ok for trunk?

Hongyu Wang (2):
  [APX EGPR] middle-end: Add index_reg_class with insn argument.
  [APX EGPR] Handle GPR16 only vector move insns

Kong Lingling (11):
  [APX EGPR] middle-end: Add insn argument to base_reg_class
  [APX_EGPR] Initial support for APX_F
  [APX EGPR] Add 16 new integer general purpose registers
  [APX EGPR] Add register and memory constraints that disallow EGPR
  [APX EGPR] Add backend hook for base_reg_class/index_reg_class.
  [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR
    constraint.
  [APX EGPR] Handle legacy insn that only support GPR16 (1/5)
  [APX EGPR] Handle legacy insns that only support GPR16 (2/5)
  [APX EGPR] Handle legacy insns that only support GPR16 (3/5)
  [APX_EGPR] Handle legacy insns that only support GPR16 (4/5)
  [APX EGPR] Handle vex insns that only support GPR16 (5/5)

 gcc/addresses.h                               |  29 +-
 gcc/common/config/i386/cpuinfo.h              |  12 +-
 gcc/common/config/i386/i386-common.cc         |  17 +
 gcc/common/config/i386/i386-cpuinfo.h         |   1 +
 gcc/common/config/i386/i386-isas.h            |   1 +
 gcc/config/i386/constraints.md                |  65 +-
 gcc/config/i386/cpuid.h                       |   1 +
 gcc/config/i386/i386-isa.def                  |   1 +
 gcc/config/i386/i386-options.cc               |  18 +
 gcc/config/i386/i386-opts.h                   |   8 +
 gcc/config/i386/i386-protos.h                 |   5 +
 gcc/config/i386/i386.cc                       | 303 ++++++-
 gcc/config/i386/i386.h                        |  69 +-
 gcc/config/i386/i386.md                       | 131 ++-
 gcc/config/i386/i386.opt                      |  30 +
 gcc/config/i386/mmx.md                        | 154 ++--
 gcc/config/i386/sse.md                        | 792 ++++++++++++------
 gcc/doc/invoke.texi                           |  11 +-
 gcc/doc/tm.texi                               |  21 +
 gcc/doc/tm.texi.in                            |  21 +
 gcc/lra-constraints.cc                        |  32 +-
 gcc/reload.cc                                 |  34 +-
 gcc/reload1.cc                                |   2 +-
 gcc/testsuite/gcc.target/i386/apx-1.c         |   8 +
 .../gcc.target/i386/apx-egprs-names.c         |  17 +
 .../gcc.target/i386/apx-inline-gpr-norex2.c   |  25 +
 .../gcc.target/i386/apx-interrupt-1.c         | 102 +++
 .../i386/apx-legacy-insn-check-norex2-asm.c   |   5 +
 .../i386/apx-legacy-insn-check-norex2.c       | 181 ++++
 .../gcc.target/i386/apx-spill_to_egprs-1.c    |  25 +
 gcc/testsuite/lib/target-supports.exp         |  10 +
 31 files changed, 1683 insertions(+), 448 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-egprs-names.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c

-- 
2.31.1



* [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
@ 2023-09-22 10:56 ` Hongyu Wang
  2023-09-22 16:02   ` Vladimir Makarov
  2023-09-22 10:56 ` [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument Hongyu Wang
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

From: Kong Lingling <lingling.kong@intel.com>

The current reload infrastructure does not support a selective base_reg_class
for a particular backend insn.  Add an insn argument to base_reg_class and new
insn-aware target macros for LRA/reload usage.
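
A minimal sketch of how a target could use the new macros (illustrative only;
the hook names on the right-hand side are hypothetical, the actual i386
definitions come later in this series):

  /* Hypothetical target header fragment: restrict base regs per insn.  */
  #define INSN_BASE_REG_CLASS(INSN) \
    target_insn_base_reg_class (INSN)
  #define REGNO_OK_FOR_INSN_BASE_P(NUM, INSN) \
    target_regno_ok_for_insn_base_p (NUM, INSN)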

gcc/ChangeLog:

	* addresses.h (base_reg_class): Add insn argument and new macro
	INSN_BASE_REG_CLASS.
	(ok_for_base_p_1): Add insn argument and new macro
	REGNO_OK_FOR_INSN_BASE_P.
	(regno_ok_for_base_p): Add insn argument and pass it to ok_for_base_p_1.
	* doc/tm.texi: Document INSN_BASE_REG_CLASS and
	REGNO_OK_FOR_INSN_BASE_P.
	* doc/tm.texi.in: Ditto.
	* lra-constraints.cc (process_address_1): Pass insn to
	base_reg_class.
	(curr_insn_transform): Ditto.
	* reload.cc (find_reloads): Ditto.
	(find_reloads_address): Ditto.
	(find_reloads_address_1): Ditto.
	(find_reloads_subreg_address): Ditto.
	* reload1.cc (maybe_fix_stack_asms): Ditto.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
 gcc/addresses.h        | 19 +++++++++++++++----
 gcc/doc/tm.texi        | 14 ++++++++++++++
 gcc/doc/tm.texi.in     | 14 ++++++++++++++
 gcc/lra-constraints.cc | 15 +++++++++------
 gcc/reload.cc          | 30 ++++++++++++++++++------------
 gcc/reload1.cc         |  2 +-
 6 files changed, 71 insertions(+), 23 deletions(-)

diff --git a/gcc/addresses.h b/gcc/addresses.h
index 3519c241c6d..2c92927bd51 100644
--- a/gcc/addresses.h
+++ b/gcc/addresses.h
@@ -28,8 +28,12 @@ inline enum reg_class
 base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
 		addr_space_t as ATTRIBUTE_UNUSED,
 		enum rtx_code outer_code ATTRIBUTE_UNUSED,
-		enum rtx_code index_code ATTRIBUTE_UNUSED)
+		enum rtx_code index_code ATTRIBUTE_UNUSED,
+		rtx_insn *insn ATTRIBUTE_UNUSED = NULL)
 {
+#ifdef INSN_BASE_REG_CLASS
+  return INSN_BASE_REG_CLASS (insn);
+#else
 #ifdef MODE_CODE_BASE_REG_CLASS
   return MODE_CODE_BASE_REG_CLASS (MACRO_MODE (mode), as, outer_code,
 				   index_code);
@@ -44,6 +48,7 @@ base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
   return BASE_REG_CLASS;
 #endif
 #endif
+#endif
 }
 
 /* Wrapper function to unify target macros REGNO_MODE_CODE_OK_FOR_BASE_P,
@@ -56,8 +61,12 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
 		 machine_mode mode ATTRIBUTE_UNUSED,
 		 addr_space_t as ATTRIBUTE_UNUSED,
 		 enum rtx_code outer_code ATTRIBUTE_UNUSED,
-		 enum rtx_code index_code ATTRIBUTE_UNUSED)
+		 enum rtx_code index_code ATTRIBUTE_UNUSED,
+		 rtx_insn* insn ATTRIBUTE_UNUSED = NULL)
 {
+#ifdef REGNO_OK_FOR_INSN_BASE_P
+  return REGNO_OK_FOR_INSN_BASE_P (regno, insn);
+#else
 #ifdef REGNO_MODE_CODE_OK_FOR_BASE_P
   return REGNO_MODE_CODE_OK_FOR_BASE_P (regno, MACRO_MODE (mode), as,
 					outer_code, index_code);
@@ -72,6 +81,7 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
   return REGNO_OK_FOR_BASE_P (regno);
 #endif
 #endif
+#endif
 }
 
 /* Wrapper around ok_for_base_p_1, for use after register allocation is
@@ -79,12 +89,13 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
 
 inline bool
 regno_ok_for_base_p (unsigned regno, machine_mode mode, addr_space_t as,
-		     enum rtx_code outer_code, enum rtx_code index_code)
+		     enum rtx_code outer_code, enum rtx_code index_code,
+		     rtx_insn *insn = NULL)
 {
   if (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0)
     regno = reg_renumber[regno];
 
-  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code);
+  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code, insn);
 }
 
 #endif /* GCC_ADDRESSES_H */
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b0779724d30..5b1e2a11f89 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2568,6 +2568,13 @@ of an address, @code{ADDRESS} for something that occurs in an
 index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
 @end defmac
 
+@defmac INSN_BASE_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+base register for a specified @var{insn} must belong.  This macro is
+used when some backend insns have a more limited set of usable base
+registers than other insns.
+@end defmac
+
 @defmac INDEX_REG_CLASS
 A macro whose definition is the name of the class to which a valid
 index register must belong.  An index register is one used in an
@@ -2618,6 +2625,13 @@ corresponding index expression if @var{outer_code} is @code{PLUS};
 that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
 @end defmac
 
+@defmac REGNO_OK_FOR_INSN_BASE_P (@var{num}, @var{insn})
+A C expression which is nonzero if register number @var{num} is
+suitable for use as a base register in operand addresses of a specified
+@var{insn}.  This macro is used when some backend insns have a more
+limited set of usable base registers than other insns.
+@end defmac
+
 @defmac REGNO_OK_FOR_INDEX_P (@var{num})
 A C expression which is nonzero if register number @var{num} is
 suitable for use as an index register in operand addresses.  It may be
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index d3e18955628..f6e63ad8871 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2150,6 +2150,13 @@ of an address, @code{ADDRESS} for something that occurs in an
 index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
 @end defmac
 
+@defmac INSN_BASE_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+base register for a specified @var{insn} must belong.  This macro is
+used when some backend insns have a more limited set of usable base
+registers than other insns.
+@end defmac
+
 @defmac INDEX_REG_CLASS
 A macro whose definition is the name of the class to which a valid
 index register must belong.  An index register is one used in an
@@ -2200,6 +2207,13 @@ corresponding index expression if @var{outer_code} is @code{PLUS};
 that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
 @end defmac
 
+@defmac REGNO_OK_FOR_INSN_BASE_P (@var{num}, @var{insn})
+A C expression which is nonzero if register number @var{num} is
+suitable for use as a base register in operand addresses of a specified
+@var{insn}.  This macro is used when some backend insns have a more
+limited set of usable base registers than other insns.
+@end defmac
+
 @defmac REGNO_OK_FOR_INDEX_P (@var{num})
 A C expression which is nonzero if register number @var{num} is
 suitable for use as an index register in operand addresses.  It may be
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 3aaa4906999..6dc77af86cd 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3681,7 +3681,7 @@ process_address_1 (int nop, bool check_only_p,
 				     REGNO (*ad.base_term)) != NULL_RTX)
 	    ? after : NULL),
 	   base_reg_class (ad.mode, ad.as, ad.base_outer_code,
-			   get_index_code (&ad)))))
+			   get_index_code (&ad), curr_insn))))
     {
       change_p = true;
       if (ad.base_term2 != NULL)
@@ -3731,7 +3731,8 @@ process_address_1 (int nop, bool check_only_p,
 	  rtx_insn *last = get_last_insn ();
 	  int code = -1;
 	  enum reg_class cl = base_reg_class (ad.mode, ad.as,
-					      SCRATCH, SCRATCH);
+					      SCRATCH, SCRATCH,
+					      curr_insn);
 	  rtx addr = *ad.inner;
 
 	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
@@ -3794,7 +3795,8 @@ process_address_1 (int nop, bool check_only_p,
 	  /* index * scale + disp => new base + index * scale,
 	     case (1) above.  */
 	  enum reg_class cl = base_reg_class (ad.mode, ad.as, PLUS,
-					      GET_CODE (*ad.index));
+					      GET_CODE (*ad.index),
+					      curr_insn);
 
 	  lra_assert (INDEX_REG_CLASS != NO_REGS);
 	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "disp");
@@ -3855,7 +3857,7 @@ process_address_1 (int nop, bool check_only_p,
 	      *ad.base_term = XEXP (SET_SRC (set), 0);
 	      *ad.disp_term = XEXP (SET_SRC (set), 1);
 	      cl = base_reg_class (ad.mode, ad.as, ad.base_outer_code,
-				   get_index_code (&ad));
+				   get_index_code (&ad), curr_insn);
 	      regno = REGNO (*ad.base_term);
 	      if (regno >= FIRST_PSEUDO_REGISTER
 		  && cl != lra_get_allocno_class (regno))
@@ -3899,7 +3901,8 @@ process_address_1 (int nop, bool check_only_p,
   else
     {
       enum reg_class cl = base_reg_class (ad.mode, ad.as,
-					  SCRATCH, SCRATCH);
+					  SCRATCH, SCRATCH,
+					  curr_insn);
       rtx addr = *ad.inner;
       
       new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
@@ -4649,7 +4652,7 @@ curr_insn_transform (bool check_only_p)
 
 	  push_to_sequence (before);
 	  rclass = base_reg_class (GET_MODE (op), MEM_ADDR_SPACE (op),
-				   MEM, SCRATCH);
+				   MEM, SCRATCH, curr_insn);
 	  if (GET_RTX_CLASS (code) == RTX_AUTOINC)
 	    new_reg = emit_inc (rclass, *loc, *loc,
 				/* This value does not matter for MODIFY.  */
diff --git a/gcc/reload.cc b/gcc/reload.cc
index 2126bdd117c..72f7e27af15 100644
--- a/gcc/reload.cc
+++ b/gcc/reload.cc
@@ -3321,7 +3321,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
 		       were handled in find_reloads_address.  */
 		    this_alternative[i]
 		      = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
-					ADDRESS, SCRATCH);
+					ADDRESS, SCRATCH, insn);
 		    win = 1;
 		    badop = 0;
 		    break;
@@ -3508,7 +3508,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
 			   the address into a base register.  */
 			this_alternative[i]
 			  = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
-					    ADDRESS, SCRATCH);
+					    ADDRESS, SCRATCH, insn);
 			badop = 0;
 			break;
 
@@ -4018,7 +4018,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
 	    operand_reloadnum[i]
 	      = push_reload (XEXP (recog_data.operand[i], 0), NULL_RTX,
 			     &XEXP (recog_data.operand[i], 0), (rtx*) 0,
-			     base_reg_class (VOIDmode, as, MEM, SCRATCH),
+			     base_reg_class (VOIDmode, as, MEM, SCRATCH, insn),
 			     address_mode,
 			     VOIDmode, 0, 0, i, RELOAD_OTHER);
 	    rld[operand_reloadnum[i]].inc
@@ -4897,7 +4897,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
       if (reg_equiv_constant (regno) != 0)
 	{
 	  find_reloads_address_part (reg_equiv_constant (regno), loc,
-				     base_reg_class (mode, as, MEM, SCRATCH),
+				     base_reg_class (mode, as, MEM,
+						     SCRATCH, insn),
 				     GET_MODE (ad), opnum, type, ind_levels);
 	  return 1;
 	}
@@ -4966,7 +4967,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 
       /* If we do not have one of the cases above, we must do the reload.  */
       push_reload (ad, NULL_RTX, loc, (rtx*) 0,
-		   base_reg_class (mode, as, MEM, SCRATCH),
+		   base_reg_class (mode, as, MEM, SCRATCH, insn),
 		   GET_MODE (ad), VOIDmode, 0, 0, opnum, type);
       return 1;
     }
@@ -5123,7 +5124,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 	     reload the sum into a base reg.
 	     That will at least work.  */
 	  find_reloads_address_part (ad, loc,
-				     base_reg_class (mode, as, MEM, SCRATCH),
+				     base_reg_class (mode, as, MEM,
+						     SCRATCH, insn),
 				     GET_MODE (ad), opnum, type, ind_levels);
 	}
       return ! removed_and;
@@ -5203,7 +5205,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 				 op_index == 0 ? addend : offset_reg);
 	  *loc = ad;
 
-	  cls = base_reg_class (mode, as, MEM, GET_CODE (addend));
+	  cls = base_reg_class (mode, as, MEM, GET_CODE (addend), insn);
 	  find_reloads_address_part (XEXP (ad, op_index),
 				     &XEXP (ad, op_index), cls,
 				     GET_MODE (ad), opnum, type, ind_levels);
@@ -5261,7 +5263,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 	}
 
       find_reloads_address_part (ad, loc,
-				 base_reg_class (mode, as, MEM, SCRATCH),
+				 base_reg_class (mode, as, MEM,
+						 SCRATCH, insn),
 				 address_mode, opnum, type, ind_levels);
       return ! removed_and;
     }
@@ -5513,7 +5516,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
   if (context == 1)
     context_reg_class = INDEX_REG_CLASS;
   else
-    context_reg_class = base_reg_class (mode, as, outer_code, index_code);
+    context_reg_class = base_reg_class (mode, as, outer_code, index_code,
+					insn);
 
   switch (code)
     {
@@ -5738,7 +5742,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
 		reloadnum = push_reload (tem, tem, &XEXP (x, 0),
 					 &XEXP (op1, 0),
 					 base_reg_class (mode, as,
-							 code, index_code),
+							 code, index_code,
+							 insn),
 					 GET_MODE (x), GET_MODE (x), 0,
 					 0, opnum, RELOAD_OTHER);
 
@@ -5756,7 +5761,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
 	    reloadnum = push_reload (XEXP (op1, 0), XEXP (x, 0),
 				     &XEXP (op1, 0), &XEXP (x, 0),
 				     base_reg_class (mode, as,
-						     code, index_code),
+						     code, index_code,
+						     insn),
 				     GET_MODE (x), GET_MODE (x), 0, 0,
 				     opnum, RELOAD_OTHER);
 
@@ -6216,7 +6222,7 @@ find_reloads_subreg_address (rtx x, int opnum, enum reload_type type,
     {
       push_reload (XEXP (tem, 0), NULL_RTX, &XEXP (tem, 0), (rtx*) 0,
 		   base_reg_class (GET_MODE (tem), MEM_ADDR_SPACE (tem),
-				   MEM, SCRATCH),
+				   MEM, SCRATCH, insn),
 		   GET_MODE (XEXP (tem, 0)), VOIDmode, 0, 0, opnum, type);
       reloaded = 1;
     }
diff --git a/gcc/reload1.cc b/gcc/reload1.cc
index 9ba822d1ff7..f41f4a4de22 100644
--- a/gcc/reload1.cc
+++ b/gcc/reload1.cc
@@ -1382,7 +1382,7 @@ maybe_fix_stack_asms (void)
 		  if (insn_extra_address_constraint (cn))
 		    cls = (int) reg_class_subunion[cls]
 		      [(int) base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
-					     ADDRESS, SCRATCH)];
+					     ADDRESS, SCRATCH, chain->insn)];
 		  else
 		    cls = (int) reg_class_subunion[cls]
 		      [reg_class_for_constraint (cn)];
-- 
2.31.1



* [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument.
  2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
  2023-09-22 10:56 ` [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class Hongyu Wang
@ 2023-09-22 10:56 ` Hongyu Wang
  2023-09-22 16:03   ` Vladimir Makarov
  2023-09-22 10:56 ` [PATCH 03/13] [APX_EGPR] Initial support for APX_F Hongyu Wang
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

Like base_reg_class, INDEX_REG_CLASS does not support selecting the register
class for a particular backend insn.  Add an index_reg_class wrapper with an
insn argument for LRA/reload usage.
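
A minimal sketch of the corresponding target macro (illustrative only; the hook
name on the right-hand side is hypothetical):

  /* Hypothetical target header fragment: restrict index regs per insn.  */
  #define INSN_INDEX_REG_CLASS(INSN) \
    target_insn_index_reg_class (INSN)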

gcc/ChangeLog:

	* addresses.h (index_reg_class): New wrapper function like
	base_reg_class.
	* doc/tm.texi: Document INSN_INDEX_REG_CLASS.
	* doc/tm.texi.in: Ditto.
	* lra-constraints.cc (index_part_to_reg): Add index_class argument.
	(process_address_1): Call index_reg_class with curr_insn and
	replace INDEX_REG_CLASS with its return value index_cl.
	* reload.cc (find_reloads_address): Likewise.
	(find_reloads_address_1): Likewise.

Co-authored-by: Kong Lingling <lingling.kong@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
 gcc/addresses.h        | 10 ++++++++++
 gcc/doc/tm.texi        |  7 +++++++
 gcc/doc/tm.texi.in     |  7 +++++++
 gcc/lra-constraints.cc | 17 +++++++++--------
 gcc/reload.cc          |  4 ++--
 5 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/gcc/addresses.h b/gcc/addresses.h
index 2c92927bd51..08bf39cd56c 100644
--- a/gcc/addresses.h
+++ b/gcc/addresses.h
@@ -51,6 +51,16 @@ base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
 #endif
 }
 
+inline enum reg_class
+index_reg_class (rtx_insn *insn ATTRIBUTE_UNUSED = NULL)
+{
+#ifdef INSN_INDEX_REG_CLASS
+  return INSN_INDEX_REG_CLASS (insn);
+#else
+  return INDEX_REG_CLASS;
+#endif
+}
+
 /* Wrapper function to unify target macros REGNO_MODE_CODE_OK_FOR_BASE_P,
    REGNO_MODE_OK_FOR_REG_BASE_P, REGNO_MODE_OK_FOR_BASE_P and
    REGNO_OK_FOR_BASE_P.
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 5b1e2a11f89..c566f7a1105 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2582,6 +2582,13 @@ address where its value is either multiplied by a scale factor or
 added to another register (as well as added to a displacement).
 @end defmac
 
+@defmac INSN_INDEX_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+index register for a specified @var{insn} must belong.  This macro is
+used when some backend insns have a more limited set of usable index
+registers than other insns.
+@end defmac
+
 @defmac REGNO_OK_FOR_BASE_P (@var{num})
 A C expression which is nonzero if register number @var{num} is
 suitable for use as a base register in operand addresses.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f6e63ad8871..3182d0d7c75 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2164,6 +2164,13 @@ address where its value is either multiplied by a scale factor or
 added to another register (as well as added to a displacement).
 @end defmac
 
+@defmac INSN_INDEX_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+index register for a specified @var{insn} must belong.  This macro is
+used when some backend insns have a more limited set of usable index
+registers than other insns.
+@end defmac
+
 @defmac REGNO_OK_FOR_BASE_P (@var{num})
 A C expression which is nonzero if register number @var{num} is
 suitable for use as a base register in operand addresses.
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 6dc77af86cd..0c8e28e0194 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3399,12 +3399,12 @@ base_plus_disp_to_reg (struct address_info *ad, rtx disp)
 /* Make reload of index part of address AD.  Return the new
    pseudo.  */
 static rtx
-index_part_to_reg (struct address_info *ad)
+index_part_to_reg (struct address_info *ad, enum reg_class index_class)
 {
   rtx new_reg;
 
   new_reg = lra_create_new_reg (GET_MODE (*ad->index), NULL_RTX,
-				INDEX_REG_CLASS, NULL, "index term");
+				index_class, NULL, "index term");
   expand_mult (GET_MODE (*ad->index), *ad->index_term,
 	       GEN_INT (get_index_scale (ad)), new_reg, 1);
   return new_reg;
@@ -3659,13 +3659,14 @@ process_address_1 (int nop, bool check_only_p,
   /* If INDEX_REG_CLASS is assigned to base_term already and isn't to
      index_term, swap them so to avoid assigning INDEX_REG_CLASS to both
      when INDEX_REG_CLASS is a single register class.  */
+  enum reg_class index_cl = index_reg_class (curr_insn);
   if (ad.base_term != NULL
       && ad.index_term != NULL
-      && ira_class_hard_regs_num[INDEX_REG_CLASS] == 1
+      && ira_class_hard_regs_num[index_cl] == 1
       && REG_P (*ad.base_term)
       && REG_P (*ad.index_term)
-      && in_class_p (*ad.base_term, INDEX_REG_CLASS, NULL)
-      && ! in_class_p (*ad.index_term, INDEX_REG_CLASS, NULL))
+      && in_class_p (*ad.base_term, index_cl, NULL)
+      && ! in_class_p (*ad.index_term, index_cl, NULL))
     {
       std::swap (ad.base, ad.index);
       std::swap (ad.base_term, ad.index_term);
@@ -3689,7 +3690,7 @@ process_address_1 (int nop, bool check_only_p,
     }
   if (ad.index_term != NULL
       && process_addr_reg (ad.index_term, check_only_p,
-			   before, NULL, INDEX_REG_CLASS))
+			   before, NULL, index_cl))
     change_p = true;
 
   /* Target hooks sometimes don't treat extra-constraint addresses as
@@ -3798,7 +3799,7 @@ process_address_1 (int nop, bool check_only_p,
 					      GET_CODE (*ad.index),
 					      curr_insn);
 
-	  lra_assert (INDEX_REG_CLASS != NO_REGS);
+	  lra_assert (index_cl != NO_REGS);
 	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "disp");
 	  lra_emit_move (new_reg, *ad.disp);
 	  *ad.inner = simplify_gen_binary (PLUS, GET_MODE (new_reg),
@@ -3894,7 +3895,7 @@ process_address_1 (int nop, bool check_only_p,
       changed pseudo on the equivalent memory and a subreg of the
       pseudo onto the memory of different mode for which the scale is
       prohibitted.  */
-      new_reg = index_part_to_reg (&ad);
+      new_reg = index_part_to_reg (&ad, index_cl);
       *ad.inner = simplify_gen_binary (PLUS, GET_MODE (new_reg),
 				       *ad.base_term, new_reg);
     }
diff --git a/gcc/reload.cc b/gcc/reload.cc
index 72f7e27af15..66b484b12fa 100644
--- a/gcc/reload.cc
+++ b/gcc/reload.cc
@@ -5114,7 +5114,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 	  /* Reload the displacement into an index reg.
 	     We assume the frame pointer or arg pointer is a base reg.  */
 	  find_reloads_address_part (XEXP (ad, 1), &XEXP (ad, 1),
-				     INDEX_REG_CLASS, GET_MODE (ad), opnum,
+				     index_reg_class (insn), GET_MODE (ad), opnum,
 				     type, ind_levels);
 	  return 0;
 	}
@@ -5514,7 +5514,7 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
   bool reloaded_inner_of_autoinc = false;
 
   if (context == 1)
-    context_reg_class = INDEX_REG_CLASS;
+    context_reg_class = index_reg_class (insn);
   else
     context_reg_class = base_reg_class (mode, as, outer_code, index_code,
 					insn);
-- 
2.31.1



* [PATCH 03/13] [APX_EGPR] Initial support for APX_F
  2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
  2023-09-22 10:56 ` [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class Hongyu Wang
  2023-09-22 10:56 ` [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument Hongyu Wang
@ 2023-09-22 10:56 ` Hongyu Wang
  2023-10-07  2:35   ` Hongtao Liu
  2023-09-22 10:56 ` [PATCH 04/13] [APX EGPR] Add 16 new integer general purpose registers Hongyu Wang
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

From: Kong Lingling <lingling.kong@intel.com>

Add a -mapx-features= enumeration to separate the subfeatures of APX_F.
-mapxf is treated like other ISA flags, and it sets -mapx-features=apx_all,
which enables all subfeatures.
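
Usage sketch (illustrative): a translation unit can be built with -mapxf, or a
single function can opt in via the new "apxf" target attribute:

  /* Enable all APX subfeatures for this function only (64-bit only).  */
  __attribute__ ((target ("apxf")))
  void
  apx_func (void)
  {
  }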

gcc/ChangeLog:

	* common/config/i386/cpuinfo.h (XSTATE_APX_F): New macro.
	(XCR_APX_F_ENABLED_MASK): Likewise.
	(get_available_features): Detect APX_F.
	* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_APX_F_SET): New.
	(OPTION_MASK_ISA2_APX_F_UNSET): Likewise.
	(ix86_handle_option): Handle -mapxf.
	* common/config/i386/i386-cpuinfo.h (FEATURE_APX_F): New.
	* common/config/i386/i386-isas.h: Add entry for APX_F.
	* config/i386/cpuid.h (bit_APX_F): New.
	* config/i386/i386.h (TARGET_APX_EGPR, TARGET_APX_PUSH2POP2,
	TARGET_APX_NDD): New define.
	* config/i386/i386-opts.h (enum apx_features): New enum.
	* config/i386/i386-isa.def (APX_F): New DEF_PTA.
	* config/i386/i386-options.cc (ix86_function_specific_save):
	Save ix86_apx_features.
	(ix86_function_specific_restore): Restore it.
	(ix86_valid_target_attribute_inner_p): Add mapxf.
	(ix86_option_override_internal): Set ix86_apx_features for PTA
	and TARGET_APX_F.  Also report an error when APX_F is set but
	TARGET_64BIT is not.
	* config/i386/i386.opt (-mapxf): New ISA flag option.
	(-mapx=): New enumeration option.
	(apx_features): New enum type.
	(apx_none): New enum value.
	(apx_egpr): Likewise.
	(apx_push2pop2): Likewise.
	(apx_ndd): Likewise.
	(apx_all): Likewise.
	* doc/invoke.texi: Document mapxf.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-1.c: New test.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
 gcc/common/config/i386/cpuinfo.h      | 12 +++++++++++-
 gcc/common/config/i386/i386-common.cc | 17 +++++++++++++++++
 gcc/common/config/i386/i386-cpuinfo.h |  1 +
 gcc/common/config/i386/i386-isas.h    |  1 +
 gcc/config/i386/cpuid.h               |  1 +
 gcc/config/i386/i386-isa.def          |  1 +
 gcc/config/i386/i386-options.cc       | 18 ++++++++++++++++++
 gcc/config/i386/i386-opts.h           |  8 ++++++++
 gcc/config/i386/i386.h                |  4 ++++
 gcc/config/i386/i386.opt              | 25 +++++++++++++++++++++++++
 gcc/doc/invoke.texi                   | 11 +++++++----
 gcc/testsuite/gcc.target/i386/apx-1.c |  8 ++++++++
 12 files changed, 102 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-1.c

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 24ae0dbf0ac..141d3743316 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -678,6 +678,7 @@ get_available_features (struct __processor_model *cpu_model,
 #define XSTATE_HI_ZMM			0x80
 #define XSTATE_TILECFG			0x20000
 #define XSTATE_TILEDATA		0x40000
+#define XSTATE_APX_F			0x80000
 
 #define XCR_AVX_ENABLED_MASK \
   (XSTATE_SSE | XSTATE_YMM)
@@ -685,11 +686,13 @@ get_available_features (struct __processor_model *cpu_model,
   (XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM)
 #define XCR_AMX_ENABLED_MASK \
   (XSTATE_TILECFG | XSTATE_TILEDATA)
+#define XCR_APX_F_ENABLED_MASK XSTATE_APX_F
 
-  /* Check if AVX and AVX512 are usable.  */
+  /* Check if AVX, AVX512 and APX are usable.  */
   int avx_usable = 0;
   int avx512_usable = 0;
   int amx_usable = 0;
+  int apx_usable = 0;
   /* Check if KL is usable.  */
   int has_kl = 0;
   if ((ecx & bit_OSXSAVE))
@@ -709,6 +712,8 @@ get_available_features (struct __processor_model *cpu_model,
 	}
       amx_usable = ((xcrlow & XCR_AMX_ENABLED_MASK)
 		    == XCR_AMX_ENABLED_MASK);
+      apx_usable = ((xcrlow & XCR_APX_F_ENABLED_MASK)
+		    == XCR_APX_F_ENABLED_MASK);
     }
 
 #define set_feature(f) \
@@ -922,6 +927,11 @@ get_available_features (struct __processor_model *cpu_model,
 	      if (edx & bit_AMX_COMPLEX)
 		set_feature (FEATURE_AMX_COMPLEX);
 	    }
+	  if (apx_usable)
+	    {
+	      if (edx & bit_APX_F)
+		set_feature (FEATURE_APX_F);
+	    }
 	}
     }
 
diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 95468b7c405..86596e96ad1 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -123,6 +123,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SM3_SET OPTION_MASK_ISA2_SM3
 #define OPTION_MASK_ISA2_SHA512_SET OPTION_MASK_ISA2_SHA512
 #define OPTION_MASK_ISA2_SM4_SET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_APX_F_SET OPTION_MASK_ISA2_APX_F
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2.  */
@@ -309,6 +310,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SM3_UNSET OPTION_MASK_ISA2_SM3
 #define OPTION_MASK_ISA2_SHA512_UNSET OPTION_MASK_ISA2_SHA512
 #define OPTION_MASK_ISA2_SM4_UNSET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_APX_F_UNSET OPTION_MASK_ISA2_APX_F
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -1341,6 +1343,21 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mapxf:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_APX_F_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_APX_F_SET;
+	  opts->x_ix86_apx_features = apx_all;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_APX_F_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_APX_F_UNSET;
+	  opts->x_ix86_apx_features = apx_none;
+	}
+      return true;
+
     case OPT_mfma:
       if (value)
 	{
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index 9153b4d0a54..8bf592191ab 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -261,6 +261,7 @@ enum processor_features
   FEATURE_SM3,
   FEATURE_SHA512,
   FEATURE_SM4,
+  FEATURE_APX_F,
   CPU_FEATURE_MAX
 };
 
diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
index 2297903a45e..47e0cbd6f5b 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -191,4 +191,5 @@ ISA_NAMES_TABLE_START
   ISA_NAMES_TABLE_ENTRY("sm3", FEATURE_SM3, P_NONE, "-msm3")
   ISA_NAMES_TABLE_ENTRY("sha512", FEATURE_SHA512, P_NONE, "-msha512")
   ISA_NAMES_TABLE_ENTRY("sm4", FEATURE_SM4, P_NONE, "-msm4")
+  ISA_NAMES_TABLE_ENTRY("apxf", FEATURE_APX_F, P_NONE, "-mapxf")
 ISA_NAMES_TABLE_END
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index 73c15480350..f3d3a2a1c22 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -149,6 +149,7 @@
 #define bit_AVXNECONVERT	(1 << 5)
 #define bit_AVXVNNIINT16	(1 << 10)
 #define bit_PREFETCHI	(1 << 14)
+#define bit_APX_F	(1 << 21)
 
 /* Extended State Enumeration Sub-leaf (%eax == 0xd, %ecx == 1) */
 #define bit_XSAVEOPT	(1 << 0)
diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def
index aeafcf870ac..c581f343339 100644
--- a/gcc/config/i386/i386-isa.def
+++ b/gcc/config/i386/i386-isa.def
@@ -121,3 +121,4 @@ DEF_PTA(AVXVNNIINT16)
 DEF_PTA(SM3)
 DEF_PTA(SHA512)
 DEF_PTA(SM4)
+DEF_PTA(APX_F)
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index e47f9ed5d5f..b9727d5f1f3 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -694,6 +694,7 @@ ix86_function_specific_save (struct cl_target_option *ptr,
   ptr->branch_cost = ix86_branch_cost;
   ptr->tune_defaulted = ix86_tune_defaulted;
   ptr->arch_specified = ix86_arch_specified;
+  ptr->x_ix86_apx_features = opts->x_ix86_apx_features;
   ptr->x_ix86_isa_flags_explicit = opts->x_ix86_isa_flags_explicit;
   ptr->x_ix86_isa_flags2_explicit = opts->x_ix86_isa_flags2_explicit;
   ptr->x_recip_mask_explicit = opts->x_recip_mask_explicit;
@@ -832,6 +833,7 @@ ix86_function_specific_restore (struct gcc_options *opts,
   ix86_prefetch_sse = ptr->prefetch_sse;
   ix86_tune_defaulted = ptr->tune_defaulted;
   ix86_arch_specified = ptr->arch_specified;
+  opts->x_ix86_apx_features = ptr->x_ix86_apx_features;
   opts->x_ix86_isa_flags_explicit = ptr->x_ix86_isa_flags_explicit;
   opts->x_ix86_isa_flags2_explicit = ptr->x_ix86_isa_flags2_explicit;
   opts->x_recip_mask_explicit = ptr->x_recip_mask_explicit;
@@ -1109,6 +1111,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("sm3", OPT_msm3),
     IX86_ATTR_ISA ("sha512", OPT_msha512),
     IX86_ATTR_ISA ("sm4", OPT_msm4),
+    IX86_ATTR_ISA ("apxf", OPT_mapxf),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=",	OPT_mfpmath_),
@@ -2080,6 +2083,9 @@ ix86_option_override_internal (bool main_args_p,
       opts->x_ix86_stringop_alg = no_stringop;
     }
 
+  if (TARGET_APX_F && !TARGET_64BIT)
+    error ("%<-mapxf%> is not supported for 32-bit code");
+
   if (TARGET_UINTR && !TARGET_64BIT)
     error ("%<-muintr%> not supported for 32-bit code");
 
@@ -2293,6 +2299,14 @@ ix86_option_override_internal (bool main_args_p,
 	      SET_TARGET_POPCNT (opts);
 	  }
 
+	/* Enable APX if neither apxf nor apx_features was
+	   explicitly set for -march.  */
+	if (TARGET_64BIT_P (opts->x_ix86_isa_flags)
+	     && ((processor_alias_table[i].flags & PTA_APX_F) != 0)
+	     && !TARGET_EXPLICIT_APX_F_P (opts)
+	     && !OPTION_SET_P (ix86_apx_features))
+	  opts->x_ix86_apx_features = apx_all;
+
 	if ((processor_alias_table[i].flags
 	   & (PTA_PREFETCH_SSE | PTA_SSE)) != 0)
 	  ix86_prefetch_sse = true;
@@ -2444,6 +2458,10 @@ ix86_option_override_internal (bool main_args_p,
   /* Arrange to set up i386_stack_locals for all functions.  */
   init_machine_status = ix86_init_machine_status;
 
+  /* Override APX flag here if ISA bit is set.  */
+  if (TARGET_APX_F && !OPTION_SET_P (ix86_apx_features))
+    opts->x_ix86_apx_features = apx_all;
+
   /* Validate -mregparm= value.  */
   if (opts_set->x_ix86_regparm)
     {
diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index be359f3e3d5..2ec76a16bce 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -134,4 +134,12 @@ enum lam_type {
   lam_u57
 };
 
+enum apx_features {
+  apx_none = 0,
+  apx_egpr = 1 << 0,
+  apx_push2pop2 = 1 << 1,
+  apx_ndd = 1 << 2,
+  apx_all = apx_egpr | apx_push2pop2 | apx_ndd,
+};
+
 #endif
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3e8488f2ae8..8c7ed541a8f 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -51,6 +51,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #define TARGET_MMX_WITH_SSE	(TARGET_64BIT && TARGET_SSE2)
 
+#define TARGET_APX_EGPR (ix86_apx_features & apx_egpr)
+#define TARGET_APX_PUSH2POP2 (ix86_apx_features & apx_push2pop2)
+#define TARGET_APX_NDD (ix86_apx_features & apx_ndd)
+
 #include "config/vxworks-dummy.h"
 
 #include "config/i386/i386-opts.h"
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 78b499304a4..d89b5bbc5e8 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1310,3 +1310,28 @@ Enable vectorization for gather instruction.
 mscatter
 Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
 Enable vectorization for scatter instruction.
+
+mapxf
+Target Mask(ISA2_APX_F) Var(ix86_isa_flags2) Save
+Support APX code generation.
+
+mapx-features=
+Target Undocumented Joined Enum(apx_features) EnumSet Var(ix86_apx_features) Init(apx_none) Save
+
+Enum
+Name(apx_features) Type(int)
+
+EnumValue
+Enum(apx_features) String(none) Value(apx_none) Set(1)
+
+EnumValue
+Enum(apx_features) String(egpr) Value(apx_egpr) Set(2)
+
+EnumValue
+Enum(apx_features) String(push2pop2) Value(apx_push2pop2) Set(3)
+
+EnumValue
+Enum(apx_features) String(ndd) Value(apx_ndd) Set(4)
+
+EnumValue
+Enum(apx_features) String(all) Value(apx_all) Set(1)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ba7984bcb7e..afe6b321b14 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1443,7 +1443,7 @@ See RS/6000 and PowerPC Options.
 -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk
 -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni
 -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd -mamx-fp16
--mprefetchi -mraoint -mamx-complex -mavxvnniint16 -msm3 -msha512 -msm4
+-mprefetchi -mraoint -mamx-complex -mavxvnniint16 -msm3 -msha512 -msm4 -mapxf
 -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops
 -minline-stringops-dynamically  -mstringop-strategy=@var{alg}
 -mkl -mwidekl
@@ -33802,6 +33802,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @need 200
 @opindex msm4
 @itemx -msm4
+@need 200
+@opindex mapxf
+@itemx -mapxf
 These switches enable the use of instructions in the MMX, SSE,
 AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA,
 AES, PCLMUL, CLFLUSHOPT, CLWB, FSGSBASE, PTWRITE, RDRND, F16C, FMA, PCONFIG,
@@ -33812,9 +33815,9 @@ GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16,
 ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE,
 UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512-FP16,
 AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, AMX-FP16, PREFETCHI, RAOINT,
-AMX-COMPLEX, AVXVNNIINT16, SM3, SHA512, SM4 or CLDEMOTE extended instruction
-sets. Each has a corresponding @option{-mno-} option to disable use of these
-instructions.
+AMX-COMPLEX, AVXVNNIINT16, SM3, SHA512, SM4, APX_F or CLDEMOTE extended
+instruction sets. Each has a corresponding @option{-mno-} option to disable
+use of these instructions.
 
 These extensions are also available as built-in functions: see
 @ref{x86 Built-in Functions}, for details of the functions enabled and
diff --git a/gcc/testsuite/gcc.target/i386/apx-1.c b/gcc/testsuite/gcc.target/i386/apx-1.c
new file mode 100644
index 00000000000..4e580ecdf37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mapxf" } */
+/* { dg-error "'-mapxf' is not supported for 32-bit code" "" { target ia32 } 0 } */
+
+void
+apx_handler ()
+{
+}
-- 
2.31.1



* [PATCH 04/13] [APX EGPR] Add 16 new integer general purpose registers
  2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
                   ` (2 preceding siblings ...)
  2023-09-22 10:56 ` [PATCH 03/13] [APX_EGPR] Initial support for APX_F Hongyu Wang
@ 2023-09-22 10:56 ` Hongyu Wang
  2023-09-22 10:56 ` [PATCH 05/13] [APX EGPR] Add register and memory constraints that disallow EGPR Hongyu Wang
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

From: Kong Lingling <lingling.kong@intel.com>

Extend GENERAL_REGS with 16 extra registers, r16-r31, which are handled
similarly to the REX registers and are named REX2 registers.  They are only
enabled under TARGET_APX_EGPR.
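
For illustration (a hedged sketch; actual register allocation is up to the RA),
once the registers are enabled the new names are accepted wherever a hard
register name is valid, e.g. in inline-asm clobber lists:

  void
  touch_egprs (void)
  {
    /* r16 and r31 become valid clobber names under -mapxf.  */
    __asm__ __volatile__ ("" : : : "r16", "r31");
  }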

gcc/ChangeLog:

	* config/i386/i386-protos.h (x86_extended_rex2reg_mentioned_p):
	New function prototype.
	* config/i386/i386.cc (regclass_map): Add mapping for 16 new
	general registers.
	(debugger64_register_map): Likewise.
	(ix86_conditional_register_usage): Clear REX2 registers when APX
	is disabled.
	(ix86_code_end): Add handling for REX2 reg.
	(print_reg): Likewise.
	(ix86_output_jmp_thunk_or_indirect): Likewise.
	(ix86_output_indirect_branch_via_reg): Likewise.
	(ix86_attr_length_vex_default): Likewise.
	(ix86_emit_save_regs): Adjust to allow saving r31.
	(ix86_register_priority): Set REX2 reg priority same as REX.
	(x86_extended_reg_mentioned_p): Add check for REX2 regs.
	(x86_extended_rex2reg_mentioned_p): New function.
	* config/i386/i386.h (FIXED_REGISTERS): Add new extended
	registers.
	(CALL_USED_REGISTERS): Likewise.
	(REG_ALLOC_ORDER): Likewise.
	(FIRST_REX2_INT_REG): Define.
	(LAST_REX2_INT_REG): Ditto.
	(GENERAL_REGS): Add 16 new registers.
	(INT_SSE_REGS): Likewise.
	(FLOAT_INT_REGS): Likewise.
	(FLOAT_INT_SSE_REGS): Likewise.
	(INT_MASK_REGS): Likewise.
	(ALL_REGS): Likewise.
	(REX2_INT_REG_P): Define.
	(REX2_INT_REGNO_P): Ditto.
	(GENERAL_REGNO_P): Add REX2_INT_REGNO_P.
	(REGNO_OK_FOR_INDEX_P): Ditto.
	(REG_OK_FOR_INDEX_NONSTRICT_P): Add new extended registers.
	* config/i386/i386.md: Add 16 new integer general
	registers.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-egprs-names.c: New test.
	* gcc.target/i386/apx-spill_to_egprs-1.c: Likewise.
	* gcc.target/i386/apx-interrupt-1.c: Likewise.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
 gcc/config/i386/i386-protos.h                 |   1 +
 gcc/config/i386/i386.cc                       |  67 ++++++++++--
 gcc/config/i386/i386.h                        |  46 +++++---
 gcc/config/i386/i386.md                       |  18 +++-
 .../gcc.target/i386/apx-egprs-names.c         |  17 +++
 .../gcc.target/i386/apx-interrupt-1.c         | 102 ++++++++++++++++++
 .../gcc.target/i386/apx-spill_to_egprs-1.c    |  25 +++++
 7 files changed, 252 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-egprs-names.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 9ffb125fc2b..bd4782800c4 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -64,6 +64,7 @@ extern bool symbolic_reference_mentioned_p (rtx);
 extern bool extended_reg_mentioned_p (rtx);
 extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
 extern bool x86_extended_reg_mentioned_p (rtx);
+extern bool x86_extended_rex2reg_mentioned_p (rtx);
 extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
 extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 477e6cecc38..fb1672f0b3d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -169,7 +169,12 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   ALL_SSE_REGS, ALL_SSE_REGS, ALL_SSE_REGS, ALL_SSE_REGS,
   /* Mask registers.  */
   ALL_MASK_REGS, MASK_REGS, MASK_REGS, MASK_REGS,
-  MASK_REGS, MASK_REGS, MASK_REGS, MASK_REGS
+  MASK_REGS, MASK_REGS, MASK_REGS, MASK_REGS,
+  /* REX2 registers */
+  GENERAL_REGS, GENERAL_REGS, GENERAL_REGS, GENERAL_REGS,
+  GENERAL_REGS, GENERAL_REGS, GENERAL_REGS, GENERAL_REGS,
+  GENERAL_REGS, GENERAL_REGS, GENERAL_REGS, GENERAL_REGS,
+  GENERAL_REGS, GENERAL_REGS, GENERAL_REGS, GENERAL_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -227,7 +232,10 @@ int const debugger64_register_map[FIRST_PSEUDO_REGISTER] =
   /* AVX-512 registers 24-31 */
   75, 76, 77, 78, 79, 80, 81, 82,
   /* Mask registers */
-  118, 119, 120, 121, 122, 123, 124, 125
+  118, 119, 120, 121, 122, 123, 124, 125,
+  /* REX2 extended integer registers */
+  130, 131, 132, 133, 134, 135, 136, 137,
+  138, 139, 140, 141, 142, 143, 144, 145
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -521,6 +529,13 @@ ix86_conditional_register_usage (void)
 
       accessible_reg_set &= ~reg_class_contents[ALL_MASK_REGS];
     }
+
+  /* If APX is disabled, disable the REX2 registers.  */
+  if (! (TARGET_APX_EGPR && TARGET_64BIT))
+    {
+      for (i = FIRST_REX2_INT_REG; i <= LAST_REX2_INT_REG; i++)
+	CLEAR_HARD_REG_BIT (accessible_reg_set, i);
+    }
 }
 
 /* Canonicalize a comparison from one we don't have to one we do have.  */
@@ -6188,6 +6203,13 @@ ix86_code_end (void)
 					regno, false);
     }
 
+  for (regno = FIRST_REX2_INT_REG; regno <= LAST_REX2_INT_REG; regno++)
+    {
+      if (TEST_HARD_REG_BIT (indirect_thunks_used, regno))
+	output_indirect_thunk_function (indirect_thunk_prefix_none,
+					regno, false);
+    }
+
   for (regno = FIRST_INT_REG; regno <= LAST_INT_REG; regno++)
     {
       char name[32];
@@ -7199,10 +7221,10 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align,
 static void
 ix86_emit_save_regs (void)
 {
-  unsigned int regno;
+  int regno;
   rtx_insn *insn;
 
-  for (regno = FIRST_PSEUDO_REGISTER - 1; regno-- > 0; )
+  for (regno = FIRST_PSEUDO_REGISTER - 1; regno >= 0; regno--)
     if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
       {
 	insn = emit_insn (gen_push (gen_rtx_REG (word_mode, regno)));
@@ -13046,7 +13068,7 @@ print_reg (rtx x, int code, FILE *file)
 
   /* Irritatingly, AMD extended registers use
      different naming convention: "r%d[bwd]"  */
-  if (REX_INT_REGNO_P (regno))
+  if (REX_INT_REGNO_P (regno) || REX2_INT_REGNO_P (regno))
     {
       gcc_assert (TARGET_64BIT);
       switch (msize)
@@ -16260,7 +16282,7 @@ ix86_output_jmp_thunk_or_indirect (const char *thunk_name, const int regno)
 {
   if (thunk_name != NULL)
     {
-      if (REX_INT_REGNO_P (regno)
+      if ((REX_INT_REGNO_P (regno) || REX2_INT_REGNO_P (regno))
 	  && ix86_indirect_branch_cs_prefix)
 	fprintf (asm_out_file, "\tcs\n");
       fprintf (asm_out_file, "\tjmp\t");
@@ -16312,7 +16334,7 @@ ix86_output_indirect_branch_via_reg (rtx call_op, bool sibcall_p)
     {
       if (thunk_name != NULL)
 	{
-	  if (REX_INT_REGNO_P (regno)
+	  if ((REX_INT_REGNO_P (regno) || REX2_INT_REGNO_P (regno))
 	      && ix86_indirect_branch_cs_prefix)
 	    fprintf (asm_out_file, "\tcs\n");
 	  fprintf (asm_out_file, "\tcall\t");
@@ -17069,19 +17091,26 @@ ix86_attr_length_vex_default (rtx_insn *insn, bool has_0f_opcode,
   for (i = recog_data.n_operands - 1; i >= 0; --i)
     if (REG_P (recog_data.operand[i]))
       {
-	/* REX.W bit uses 3 byte VEX prefix.  */
+	/* REX.W bit uses 3 byte VEX prefix.
+	   REX2 with VEX uses the extended EVEX prefix, which is 4 bytes.  */
 	if (GET_MODE (recog_data.operand[i]) == DImode
 	    && GENERAL_REG_P (recog_data.operand[i]))
 	  return 3 + 1;
 
 	/* REX.B bit requires 3-byte VEX. Right here we don't know which
-	   operand will be encoded using VEX.B, so be conservative.  */
+	   operand will be encoded using VEX.B, so be conservative.
+	   REX2 with VEX uses the extended EVEX prefix, which is 4 bytes.  */
 	if (REX_INT_REGNO_P (recog_data.operand[i])
+	    || REX2_INT_REGNO_P (recog_data.operand[i])
 	    || REX_SSE_REGNO_P (recog_data.operand[i]))
 	  reg_only = 3 + 1;
       }
     else if (MEM_P (recog_data.operand[i]))
       {
+	/* REX2.X or REX2.B bits require the 4-byte extended EVEX prefix.  */
+	if (x86_extended_rex2reg_mentioned_p (recog_data.operand[i]))
+	  return 4;
+
 	/* REX.X or REX.B bits use 3 byte VEX prefix.  */
 	if (x86_extended_reg_mentioned_p (recog_data.operand[i]))
 	  return 3 + 1;
@@ -19518,6 +19547,8 @@ ix86_register_priority (int hard_regno)
   /* New x86-64 int registers result in bigger code size.  Discourage them.  */
   if (REX_INT_REGNO_P (hard_regno))
     return 2;
+  if (REX2_INT_REGNO_P (hard_regno))
+    return 2;
   /* New x86-64 SSE registers result in bigger code size.  Discourage them.  */
   if (REX_SSE_REGNO_P (hard_regno))
     return 2;
@@ -22764,7 +22795,23 @@ x86_extended_reg_mentioned_p (rtx insn)
     {
       const_rtx x = *iter;
       if (REG_P (x)
-	  && (REX_INT_REGNO_P (REGNO (x)) || REX_SSE_REGNO_P (REGNO (x))))
+	  && (REX_INT_REGNO_P (REGNO (x)) || REX_SSE_REGNO_P (REGNO (x))
+	      || REX2_INT_REGNO_P (REGNO (x))))
+	return true;
+    }
+  return false;
+}
+
+/* Return true when INSN mentions register that must be encoded using REX2
+   prefix.  */
+bool
+x86_extended_rex2reg_mentioned_p (rtx insn)
+{
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, INSN_P (insn) ? PATTERN (insn) : insn, NONCONST)
+    {
+      const_rtx x = *iter;
+      if (REG_P (x) && REX2_INT_REGNO_P (REGNO (x)))
 	return true;
     }
   return false;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 8c7ed541a8f..215f6b8db55 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -948,7 +948,11 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
      0,   0,    0,    0,    0,    0,    0,    0,		\
 /*  k0,  k1, k2, k3, k4, k5, k6, k7*/				\
-     0,  0,   0,  0,  0,  0,  0,  0 }
+     0,  0,   0,  0,  0,  0,  0,  0,				\
+/*  r16,  r17, r18, r19, r20, r21, r22, r23*/			\
+     0,   0,   0,   0,   0,   0,   0,   0,			\
+/*  r24,  r25, r26, r27, r28, r29, r30, r31*/			\
+     0,   0,   0,   0,   0,   0,   0,   0 }
 
 /* 1 for registers not available across function calls.
    These must include the FIXED_REGISTERS and also any
@@ -985,7 +989,11 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
      1,    1,     1,    1,    1,    1,    1,    1,		\
  /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
-     1,   1,   1,   1,   1,   1,   1,   1 }
+     1,   1,   1,   1,   1,   1,   1,   1,			\
+/*  r16,  r17, r18, r19, r20, r21, r22, r23*/			\
+     1,   1,   1,   1,   1,   1,   1,   1,			\
+/*  r24,  r25, r26, r27, r28, r29, r30, r31*/			\
+     1,   1,   1,   1,   1,   1,   1,   1 }
 
 /* Order in which to allocate registers.  Each register must be
    listed once, even those in FIXED_REGISTERS.  List frame pointer
@@ -1001,7 +1009,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
   16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,	\
   32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,	\
   48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,	\
-  64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75 }
+  64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,	\
+  80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91}
 
 /* ADJUST_REG_ALLOC_ORDER is a macro which permits reg_alloc_order
    to be rearranged based on a particular function.  When using sse math,
@@ -1203,6 +1212,9 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define FIRST_MASK_REG  MASK0_REG
 #define LAST_MASK_REG   MASK7_REG
 
+#define FIRST_REX2_INT_REG  R16_REG
+#define LAST_REX2_INT_REG   R31_REG
+
 /* Override this in other tm.h files to cope with various OS lossage
    requiring a frame pointer.  */
 #ifndef SUBTARGET_FRAME_POINTER_REQUIRED
@@ -1280,7 +1292,9 @@ enum reg_class
   INDEX_REGS,			/* %eax %ebx %ecx %edx %esi %edi %ebp */
   LEGACY_REGS,			/* %eax %ebx %ecx %edx %esi %edi %ebp %esp */
   GENERAL_REGS,			/* %eax %ebx %ecx %edx %esi %edi %ebp %esp
-				   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15 */
+				   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15
+				   %r16 %r17 %r18 %r19 %r20 %r21 %r22 %r23
+				   %r24 %r25 %r26 %r27 %r28 %r29 %r30 %r31 */
   FP_TOP_REG, FP_SECOND_REG,	/* %st(0) %st(1) */
   FLOAT_REGS,
   SSE_FIRST_REG,
@@ -1380,7 +1394,7 @@ enum reg_class
       { 0x7e,      0xff0,   0x0 },	/* TLS_GOTBASE_REGS */		\
       { 0x7f,      0xff0,   0x0 },	/* INDEX_REGS */		\
    { 0x900ff,        0x0,   0x0 },	/* LEGACY_REGS */		\
-   { 0x900ff,      0xff0,   0x0 },	/* GENERAL_REGS */		\
+   { 0x900ff,      0xff0,   0xffff000 },	/* GENERAL_REGS */		\
      { 0x100,        0x0,   0x0 },	/* FP_TOP_REG */		\
      { 0x200,        0x0,   0x0 },	/* FP_SECOND_REG */		\
     { 0xff00,        0x0,   0x0 },	/* FLOAT_REGS */		\
@@ -1390,13 +1404,13 @@ enum reg_class
  { 0xff00000, 0xfffff000,   0xf },	/* ALL_SSE_REGS */		\
 { 0xf0000000,        0xf,   0x0 },	/* MMX_REGS */			\
  { 0xff0ff00, 0xfffff000,   0xf },	/* FLOAT_SSE_REGS */		\
- {   0x9ffff,      0xff0,   0x0 },	/* FLOAT_INT_REGS */		\
- { 0xff900ff, 0xfffffff0,   0xf },	/* INT_SSE_REGS */		\
- { 0xff9ffff, 0xfffffff0,   0xf },	/* FLOAT_INT_SSE_REGS */	\
+ {   0x9ffff,      0xff0,   0xffff000 },	/* FLOAT_INT_REGS */		\
+ { 0xff900ff, 0xfffffff0,   0xffff00f },	/* INT_SSE_REGS */		\
+ { 0xff9ffff, 0xfffffff0,   0xffff00f },	/* FLOAT_INT_SSE_REGS */	\
        { 0x0,        0x0, 0xfe0 },	/* MASK_REGS */			\
        { 0x0,        0x0, 0xff0 },	/* ALL_MASK_REGS */		\
-   { 0x900ff,      0xff0, 0xff0 },	/* INT_MASK_REGS */	\
-{ 0xffffffff, 0xffffffff, 0xfff }	/* ALL_REGS  */			\
+   { 0x900ff,      0xff0, 0xffffff0 },	/* INT_MASK_REGS */	\
+{ 0xffffffff, 0xffffffff, 0xfffffff }	/* ALL_REGS  */			\
 }
 
 /* The same information, inverted:
@@ -1426,13 +1440,17 @@ enum reg_class
 #define REX_INT_REGNO_P(N) \
   IN_RANGE ((N), FIRST_REX_INT_REG, LAST_REX_INT_REG)
 
+#define REX2_INT_REG_P(X) (REG_P (X) && REX2_INT_REGNO_P (REGNO (X)))
+#define REX2_INT_REGNO_P(N) \
+  IN_RANGE ((N), FIRST_REX2_INT_REG, LAST_REX2_INT_REG)
+
 #define GENERAL_REG_P(X) (REG_P (X) && GENERAL_REGNO_P (REGNO (X)))
 #define GENERAL_REGNO_P(N) \
-  (LEGACY_INT_REGNO_P (N) || REX_INT_REGNO_P (N))
+  (LEGACY_INT_REGNO_P (N) || REX_INT_REGNO_P (N) || REX2_INT_REGNO_P (N))
 
 #define INDEX_REG_P(X) (REG_P (X) && INDEX_REGNO_P (REGNO (X)))
 #define INDEX_REGNO_P(N) \
-  (LEGACY_INDEX_REGNO_P (N) || REX_INT_REGNO_P (N))
+  (LEGACY_INDEX_REGNO_P (N) || REX_INT_REGNO_P (N) || REX2_INT_REGNO_P (N))
 
 #define ANY_QI_REG_P(X) (REG_P (X) && ANY_QI_REGNO_P (REGNO (X)))
 #define ANY_QI_REGNO_P(N) \
@@ -1990,7 +2008,9 @@ do {							\
  "xmm20", "xmm21", "xmm22", "xmm23",					\
  "xmm24", "xmm25", "xmm26", "xmm27",					\
  "xmm28", "xmm29", "xmm30", "xmm31",					\
- "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7" }
+ "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7",			\
+ "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23",		\
+ "r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31" }
 
 #define REGISTER_NAMES HI_REGISTER_NAMES
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index eef8a0e01eb..e3270658cb7 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -464,7 +464,23 @@ (define_constants
    (MASK5_REG			73)
    (MASK6_REG			74)
    (MASK7_REG			75)
-   (FIRST_PSEUDO_REG		76)
+   (R16_REG			76)
+   (R17_REG			77)
+   (R18_REG			78)
+   (R19_REG			79)
+   (R20_REG			80)
+   (R21_REG			81)
+   (R22_REG			82)
+   (R23_REG			83)
+   (R24_REG			84)
+   (R25_REG			85)
+   (R26_REG			86)
+   (R27_REG			87)
+   (R28_REG			88)
+   (R29_REG			89)
+   (R30_REG			90)
+   (R31_REG			91)
+   (FIRST_PSEUDO_REG		92)
   ])
 
 ;; Insn callee abi index.
diff --git a/gcc/testsuite/gcc.target/i386/apx-egprs-names.c b/gcc/testsuite/gcc.target/i386/apx-egprs-names.c
new file mode 100644
index 00000000000..445bcf2c250
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-egprs-names.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mapxf -m64" } */
+/* { dg-final { scan-assembler "r31" } } */
+/* { dg-final { scan-assembler "r30" } } */
+/* { dg-final { scan-assembler "r29" } } */
+/* { dg-final { scan-assembler "r28" } } */
+void foo ()
+{
+  register long a __asm ("r31");
+  register int b __asm ("r30");
+  register short c __asm ("r29");
+  register char d __asm ("r28");
+  __asm__ __volatile__ ("mov %0, %%rax" : : "r" (a) : "rax");
+  __asm__ __volatile__ ("mov %0, %%eax" : : "r" (b) : "eax");
+  __asm__ __volatile__ ("mov %0, %%eax" : : "r" (c) : "eax");
+  __asm__ __volatile__ ("mov %0, %%eax" : : "r" (d) : "eax");
+}
diff --git a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
new file mode 100644
index 00000000000..441dbf04bf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
@@ -0,0 +1,102 @@
+/* { dg-do compile } */
+/* { dg-options "-mapxf -m64 -O2 -mgeneral-regs-only -mno-cld -mno-push-args -maccumulate-outgoing-args" } */
+
+extern void foo (void *) __attribute__ ((interrupt));
+extern int bar (int);
+
+void foo (void *frame)
+{
+  int a,b,c,d,e,f,i;
+  a = bar (5);
+  b = bar (a);
+  c = bar (b);
+  d = bar (c);
+  e = bar (d);
+  f = bar (e);
+  for (i = 1; i < 10; i++)
+  {
+    a += bar (a + i) + bar (b + i) +
+	 bar (c + i) + bar (d + i) +
+	 bar (e + i) + bar (f + i);
+  }
+}
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)ax" 1 } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)cx" 1 } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)dx" 1 } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)si" 1 } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%rdi" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r8" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r9" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r10" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r11" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r15" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r16" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r17" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r18" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r19" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r20" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r21" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r22" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r23" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r24" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r25" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r26" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r27" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r28" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r29" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r30" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r31" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 145, -16} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 144, -24} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 143, -32} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 142, -40} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 141, -48} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 140, -56} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 139, -64} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 138, -72} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 137, -80} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 136, -88} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 135, -96} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 134, -104} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 133, -112} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 132, -120} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 131, -128} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 130, -136} 1 } } */
+/* { dg-final { scan-assembler-times ".cfi_restore" 15} } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)ax" 1 } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)cx" 1 } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)dx" 1 } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)si" 1 } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%rdi" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r8" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r9" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r10" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r11" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r15" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r16" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r17" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r18" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r19" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r20" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r21" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r22" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r23" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r24" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r25" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r26" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r27" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r28" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r29" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r30" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r31" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "iret" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "iretq" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "\tcld" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c b/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
new file mode 100644
index 00000000000..290863d63a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile  { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mapxf -DDTYPE32" } */
+
+#include "spill_to_mask-1.c"
+
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r16d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r17d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r18d" } } */
+/* { dg-final { scan-assembler "movq\[ \t]+\[^\\n\\r\]*, %r19" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r20d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r21d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r22d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r23d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r24d" } } */
+/* { dg-final { scan-assembler "addl\[ \t]+\[^\\n\\r\]*, %r25d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r26d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r27d" } } */
+/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r28d" } } */
+/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r29d" } } */
+/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r30d" } } */
+/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r31d" } } */
+/* { dg-final { scan-assembler-not "knot" } } */
+/* { dg-final { scan-assembler-not "kxor" } } */
+/* { dg-final { scan-assembler-not "kor" } } */
+/* { dg-final { scan-assembler-not "kandn" } } */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 05/13] [APX EGPR] Add register and memory constraints that disallow EGPR
  2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
                   ` (3 preceding siblings ...)
  2023-09-22 10:56 ` [PATCH 04/13] [APX EGPR] Add 16 new integer general purpose registers Hongyu Wang
@ 2023-09-22 10:56 ` Hongyu Wang
  2023-09-22 10:56 ` [PATCH 06/13] [APX EGPR] Add backend hook for base_reg_class/index_reg_class Hongyu Wang
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

From: Kong Lingling <lingling.kong@intel.com>

For APX, as we extended the GENERAL_REG_CLASS, new constraints are
needed to restrict insns that cannot adopt EGPR in either their register
or memory operands. We add a series of constraints as counterparts of
the common/backend ones related to GPR usage. All of them are prefixed
with "j" to indicate that the constraint does not allow EGPR.
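
Not part of the patch, but as a rough sketch of what the new letters
mean: the snippet below assumes the backend constraint letters are
accepted in user inline asm (they are primarily meant for the backend
itself and for the mapping added in patch 07) and that -mapxf is in
effect.

  /* Hypothetical example: "jr"/"jm" keep both operands out of
     r16-r31 even though EGPRs are otherwise available.  */
  long
  legacy_only_add (long *p, long v)
  {
    __asm__ ("addq %1, %0" : "+jm" (*p) : "jr" (v));
    return *p;
  }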

gcc/ChangeLog:

	* config/i386/constraints.md (jr): New register constraint
	that prohibits EGPR.
	(jR): Constraint that forces usage of EGPR.
	(jm): New memory constraint that prohibits EGPR.
	(ja): Likewise for Bm constraint.
	(jb): Likewise for Tv constraint.
	(j<): New auto-dec memory constraint that prohibits EGPR.
	(j>): Likewise for ">" constraint.
	(jo): Likewise for "o" constraint.
	(jV): Likewise for "V" constraint.
	(jp): Likewise for "p" constraint.
	* config/i386/i386.h (enum reg_class): Add new reg class
	GENERAL_GPR16.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
 gcc/config/i386/constraints.md | 59 +++++++++++++++++++++++++++++++++-
 gcc/config/i386/i386.h         |  4 +++
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index fd490f39110..36c268d7f9b 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;           H
-;;;           h j               z
+;;;           h               z
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -371,3 +371,60 @@ (define_address_constraint "Tv"
 (define_address_constraint "Ts"
   "Address operand without segment register"
   (match_operand 0 "address_no_seg_operand"))
+
+;; Constraint that forces the use of EGPR; can only apply to register classes.
+(define_register_constraint  "jR" "GENERAL_REGS")
+
+(define_register_constraint  "jr"
+ "TARGET_APX_EGPR ? GENERAL_GPR16 : GENERAL_REGS")
+
+(define_memory_constraint "jm"
+  "@internal memory operand without GPR32."
+  (and (match_operand 0 "memory_operand")
+       (not (and (match_test "TARGET_APX_EGPR")
+		 (match_test "x86_extended_rex2reg_mentioned_p (op)")))))
+
+(define_constraint "j<"
+  "@internal auto-dec memory operand without GPR32."
+  (and (and (match_code "mem")
+	    (ior (match_test "GET_CODE (XEXP (op, 0)) == PRE_DEC")
+	         (match_test "GET_CODE (XEXP (op, 0)) == POST_DEC")))
+       (not (and (match_test "TARGET_APX_EGPR")
+		 (match_test "x86_extended_rex2reg_mentioned_p (op)")))))
+
+(define_constraint "j>"
+  "@internal auto-dec memory operand without GPR32."
+  (and (and (match_code "mem")
+	    (ior (match_test "GET_CODE (XEXP (op, 0)) == PRE_INC")
+	         (match_test "GET_CODE (XEXP (op, 0)) == POST_INC")))
+       (not (and (match_test "TARGET_APX_EGPR")
+		 (match_test "x86_extended_rex2reg_mentioned_p (op)")))))
+
+(define_memory_constraint "jo"
+  "@internal offsetable memory operand without GPR32."
+  (and (and (match_code "mem")
+	    (match_test "offsettable_nonstrict_memref_p (op)"))
+       (not (and (match_test "TARGET_APX_EGPR")
+		 (match_test "x86_extended_rex2reg_mentioned_p (op)")))))
+
+(define_constraint "jV"
+  "@internal non-offsetable memory operand without GPR32."
+  (and (and (match_code "mem")
+	    (match_test "memory_address_addr_space_p (GET_MODE (op),
+						      XEXP (op, 0),
+						      MEM_ADDR_SPACE (op))")
+	    (not (match_test "offsettable_nonstrict_memref_p (op)")))
+       (not (and (match_test "TARGET_APX_EGPR")
+		 (match_test "x86_extended_rex2reg_mentioned_p (op)")))))
+
+(define_address_constraint "jp"
+  "@internal general address operand without GPR32"
+  (and (match_test "address_operand (op, VOIDmode)")
+       (not (and (match_test "TARGET_APX_EGPR")
+		 (match_test "x86_extended_rex2reg_mentioned_p (op)")))))
+
+(define_special_memory_constraint "ja"
+  "@internal vector memory operand without GPR32."
+  (and (match_operand 0 "vector_memory_operand")
+       (not (and (match_test "TARGET_APX_EGPR")
+		 (match_test "x86_extended_rex2reg_mentioned_p (op)")))))
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 215f6b8db55..66b8764e82b 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1295,6 +1295,8 @@ enum reg_class
 				   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15
 				   %r16 %r17 %r18 %r19 %r20 %r21 %r22 %r23
 				   %r24 %r25 %r26 %r27 %r28 %r29 %r30 %r31 */
+  GENERAL_GPR16,		/* %eax %ebx %ecx %edx %esi %edi %ebp %esp
+				   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15 */
   FP_TOP_REG, FP_SECOND_REG,	/* %st(0) %st(1) */
   FLOAT_REGS,
   SSE_FIRST_REG,
@@ -1357,6 +1359,7 @@ enum reg_class
    "INDEX_REGS",			\
    "LEGACY_REGS",			\
    "GENERAL_REGS",			\
+   "GENERAL_GPR16",			\
    "FP_TOP_REG", "FP_SECOND_REG",	\
    "FLOAT_REGS",			\
    "SSE_FIRST_REG",			\
@@ -1395,6 +1398,7 @@ enum reg_class
       { 0x7f,      0xff0,   0x0 },	/* INDEX_REGS */		\
    { 0x900ff,        0x0,   0x0 },	/* LEGACY_REGS */		\
    { 0x900ff,      0xff0,   0xffff000 },	/* GENERAL_REGS */		\
+   { 0x900ff,      0xff0,   0x0 },	/* GENERAL_GPR16 */		\
      { 0x100,        0x0,   0x0 },	/* FP_TOP_REG */		\
      { 0x200,        0x0,   0x0 },	/* FP_SECOND_REG */		\
     { 0xff00,        0x0,   0x0 },	/* FLOAT_REGS */		\
-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 06/13] [APX EGPR] Add backend hook for base_reg_class/index_reg_class.
  2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
                   ` (4 preceding siblings ...)
  2023-09-22 10:56 ` [PATCH 05/13] [APX EGPR] Add register and memory constraints that disallow EGPR Hongyu Wang
@ 2023-09-22 10:56 ` Hongyu Wang
  2023-09-22 10:56 ` [PATCH 07/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint Hongyu Wang
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

From: Kong Lingling <lingling.kong@intel.com>

Add backend helper functions to verify whether an rtx_insn can adopt
EGPR as the base/index reg of its memory operand. The verification
rules go as follows (a small user-level sketch follows the list):
  1. For asm insns, enable/disable EGPR by ix86_apx_inline_asm_use_gpr32.
  2. Disable EGPR for unrecognized insns.
  3. If which_alternative is not decided, loop through the enabled
  alternatives and check their attr_gpr32.  Only enable EGPR when all
  enabled alternatives have attr_gpr32 = 1.
  4. If which_alternative is decided, enable/disable EGPR by its
  corresponding attr_gpr32.
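
As a user-level sketch only (it relies on the gpr32 markings added
later in this series, e.g. patch 09 for xsave64, and the exact
register allocation is not guaranteed):

  #include <immintrin.h>

  /* Compile with -mapxf -mxsave -O2.  xsave64 is marked gpr32 0, so
     the new hooks keep r16-r31 out of its address, and the pointer is
     expected to be copied back into a legacy register; ordinary
     integer memory accesses may still use r16 as a base.  */
  void
  save_state (void *buf)
  {
    register void *p __asm__ ("r16") = buf;
    _xsave64 (p, -1ULL);
  }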

gcc/ChangeLog:

	* config/i386/i386-protos.h (ix86_insn_base_reg_class): New
	prototype.
	(ix86_regno_ok_for_insn_base_p): Likewise.
	(ix86_insn_index_reg_class): Likewise.
	* config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p):
	New helper function to scan the insn.
	(ix86_insn_base_reg_class): New function to choose BASE_REG_CLASS.
	(ix86_regno_ok_for_insn_base_p): Likewise for base regno.
	(ix86_insn_index_reg_class): Likewise for INDEX_REG_CLASS.
	* config/i386/i386.h (INSN_BASE_REG_CLASS): Define.
	(REGNO_OK_FOR_INSN_BASE_P): Likewise.
	(INSN_INDEX_REG_CLASS): Likewise.
	(enum reg_class): Add INDEX_GPR16.
	(GENERAL_GPR16_REGNO_P): Define.
	* config/i386/i386.md (gpr32): New attribute.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
 gcc/config/i386/i386-protos.h |  3 ++
 gcc/config/i386/i386.cc       | 89 +++++++++++++++++++++++++++++++++++
 gcc/config/i386/i386.h        | 17 ++++++-
 gcc/config/i386/i386.md       |  3 ++
 4 files changed, 111 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index bd4782800c4..a54e3f6b1dc 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -79,6 +79,9 @@ extern bool ix86_expand_set_or_cpymem (rtx, rtx, rtx, rtx, rtx, rtx,
 				       rtx, rtx, rtx, rtx, bool);
 extern bool ix86_expand_cmpstrn_or_cmpmem (rtx, rtx, rtx, rtx, rtx, bool);
 
+extern enum reg_class ix86_insn_base_reg_class (rtx_insn *);
+extern bool ix86_regno_ok_for_insn_base_p (int, rtx_insn *);
+extern enum reg_class ix86_insn_index_reg_class (rtx_insn *);
 extern bool constant_address_p (rtx);
 extern bool legitimate_pic_operand_p (rtx);
 extern bool legitimate_pic_address_disp_p (rtx);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index fb1672f0b3d..5af0de4dae7 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -11062,6 +11062,95 @@ ix86_validate_address_register (rtx op)
   return NULL_RTX;
 }
 
+/* Return true if the insn's memory address can use any available reg
+   in BASE_REG_CLASS or INDEX_REG_CLASS, otherwise false.
+   For APX, some instructions can't be encoded with gpr32, which is
+   part of BASE_REG_CLASS or INDEX_REG_CLASS; return false for
+   those cases.  */
+static bool
+ix86_memory_address_use_extended_reg_class_p (rtx_insn* insn)
+{
+  /* LRA will do some initialization with insn == NULL,
+     return the maximum reg class for that.
+     For other cases, a real insn will be passed and checked.  */
+  bool ret = true;
+  if (TARGET_APX_EGPR && insn)
+    {
+      if (asm_noperands (PATTERN (insn)) >= 0
+	  || GET_CODE (PATTERN (insn)) == ASM_INPUT)
+	return ix86_apx_inline_asm_use_gpr32;
+
+      if (INSN_CODE (insn) < 0)
+	return false;
+
+      /* Try recog the insn before calling get_attr_gpr32. Save
+	 the current recog_data first.  */
+      /* Also save which_alternative for current recog.  */
+
+      struct recog_data_d recog_data_save = recog_data;
+      int which_alternative_saved = which_alternative;
+
+      /* Update the recog_data for alternative check. */
+      if (recog_data.insn != insn)
+	extract_insn_cached (insn);
+
+      /* If alternative is not set, loop through each alternative
+	 of insn and get gpr32 attr for all enabled alternatives.
+	 If any enabled alternative has 0 value for gpr32, disallow
+	 gpr32 for addressing.  */
+      if (which_alternative_saved == -1)
+	{
+	  alternative_mask enabled = get_enabled_alternatives (insn);
+	  bool curr_insn_gpr32 = false;
+	  for (int i = 0; i < recog_data.n_alternatives; i++)
+	    {
+	      if (!TEST_BIT (enabled, i))
+		continue;
+	      which_alternative = i;
+	      curr_insn_gpr32 = get_attr_gpr32 (insn);
+	      if (!curr_insn_gpr32)
+		ret = false;
+	    }
+	}
+      else
+	{
+	  which_alternative = which_alternative_saved;
+	  ret = get_attr_gpr32 (insn);
+	}
+
+      recog_data = recog_data_save;
+      which_alternative = which_alternative_saved;
+    }
+
+  return ret;
+}
+
+/* For APX, some instructions can't be encoded with gpr32.  */
+enum reg_class
+ix86_insn_base_reg_class (rtx_insn* insn)
+{
+  if (ix86_memory_address_use_extended_reg_class_p (insn))
+    return BASE_REG_CLASS;
+  return GENERAL_GPR16;
+}
+
+bool
+ix86_regno_ok_for_insn_base_p (int regno, rtx_insn* insn)
+{
+
+  if (ix86_memory_address_use_extended_reg_class_p (insn))
+    return GENERAL_REGNO_P (regno);
+  return GENERAL_GPR16_REGNO_P (regno);
+}
+
+enum reg_class
+ix86_insn_index_reg_class (rtx_insn* insn)
+{
+  if (ix86_memory_address_use_extended_reg_class_p (insn))
+    return INDEX_REG_CLASS;
+  return INDEX_GPR16;
+}
+
 /* Recognizes RTL expressions that are valid memory addresses for an
    instruction.  The MODE argument is the machine mode for the MEM
    expression that wants to use this address.
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 66b8764e82b..7fa7585e058 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1018,6 +1018,14 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define ADJUST_REG_ALLOC_ORDER x86_order_regs_for_local_alloc ()
 
+#define INSN_BASE_REG_CLASS(INSN) \
+  ix86_insn_base_reg_class (INSN)
+
+#define REGNO_OK_FOR_INSN_BASE_P(NUM, INSN) \
+  ix86_regno_ok_for_insn_base_p (NUM, INSN)
+
+#define INSN_INDEX_REG_CLASS(INSN) \
+  ix86_insn_index_reg_class (INSN)
 
 #define OVERRIDE_ABI_FORMAT(FNDECL) ix86_call_abi_override (FNDECL)
 
@@ -1297,6 +1305,8 @@ enum reg_class
 				   %r24 %r25 %r26 %r27 %r28 %r29 %r30 %r31 */
   GENERAL_GPR16,		/* %eax %ebx %ecx %edx %esi %edi %ebp %esp
 				   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15 */
+  INDEX_GPR16,			/* %eax %ebx %ecx %edx %esi %edi %ebp
+				   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15 */
   FP_TOP_REG, FP_SECOND_REG,	/* %st(0) %st(1) */
   FLOAT_REGS,
   SSE_FIRST_REG,
@@ -1360,6 +1370,7 @@ enum reg_class
    "LEGACY_REGS",			\
    "GENERAL_REGS",			\
    "GENERAL_GPR16",			\
+   "INDEX_GPR16",			\
    "FP_TOP_REG", "FP_SECOND_REG",	\
    "FLOAT_REGS",			\
    "SSE_FIRST_REG",			\
@@ -1395,10 +1406,11 @@ enum reg_class
       { 0x0f,        0x0,   0x0 },	/* Q_REGS */			\
    { 0x900f0,        0x0,   0x0 },	/* NON_Q_REGS */		\
       { 0x7e,      0xff0,   0x0 },	/* TLS_GOTBASE_REGS */		\
-      { 0x7f,      0xff0,   0x0 },	/* INDEX_REGS */		\
+      { 0x7f,      0xff0,   0xffff000 },	/* INDEX_REGS */		\
    { 0x900ff,        0x0,   0x0 },	/* LEGACY_REGS */		\
    { 0x900ff,      0xff0,   0xffff000 },	/* GENERAL_REGS */		\
    { 0x900ff,      0xff0,   0x0 },	/* GENERAL_GPR16 */		\
+   { 0x0007f,      0xff0,   0x0 },	/* INDEX_GPR16 */		\
      { 0x100,        0x0,   0x0 },	/* FP_TOP_REG */		\
      { 0x200,        0x0,   0x0 },	/* FP_SECOND_REG */		\
     { 0xff00,        0x0,   0x0 },	/* FLOAT_REGS */		\
@@ -1456,6 +1468,9 @@ enum reg_class
 #define INDEX_REGNO_P(N) \
   (LEGACY_INDEX_REGNO_P (N) || REX_INT_REGNO_P (N) || REX2_INT_REGNO_P (N))
 
+#define GENERAL_GPR16_REGNO_P(N) \
+  (LEGACY_INT_REGNO_P (N) || REX_INT_REGNO_P (N))
+
 #define ANY_QI_REG_P(X) (REG_P (X) && ANY_QI_REGNO_P (REGNO (X)))
 #define ANY_QI_REGNO_P(N) \
   (TARGET_64BIT ? GENERAL_REGNO_P (N) : QI_REGNO_P (N))
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e3270658cb7..b9eaea78f00 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -873,6 +873,9 @@ (define_attr "use_carry" "0,1" (const_string "0"))
 ;; Define attribute to indicate unaligned ssemov insns
 (define_attr "movu" "0,1" (const_string "0"))
 
+;; Define attribute to indicate whether the insn can use GPR32 in its memory address.
+(define_attr "gpr32" "0, 1" (const_string "1"))
+
 ;; Define instruction set of MMX instructions
 (define_attr "mmx_isa" "base,native,sse,sse_noavx,avx"
   (const_string "base"))
-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 07/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.
  2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
                   ` (5 preceding siblings ...)
  2023-09-22 10:56 ` [PATCH 06/13] [APX EGPR] Add backend hook for base_reg_class/index_reg_class Hongyu Wang
@ 2023-09-22 10:56 ` Hongyu Wang
  2023-09-22 10:56 ` [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns Hongyu Wang
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

From: Kong Lingling <lingling.kong@intel.com>

In inline asm, we do not know whether the insn can use EGPR, so disable
EGPR usage by default by mapping the common reg/mem constraints to
non-EGPR constraints.

The full list of mappings goes as follows:

  "g" -> "jrjmi"
  "r" -> "jr"
  "m" -> "jm"
  "<" -> "j<"
  ">" -> "j>"
  "o" -> "jo"
  "V" -> "jV"
  "p" -> "jp"
  "Bm" -> "ja

For memory constraints, we add an option -mapx-inline-asm-use-gpr32
to allow/disallow gpr32 usage in any memory related constraints, as
base_reg_class/index_reg_class cannot aware whether the asm insn
support gpr32 or not.
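
Illustrative sketch only (actual codegen not shown here): with the
mapping in place, the plain "r" and "m" below behave as if they were
written "jr"/"jm", unless -mapx-inline-asm-use-gpr32 is given.

  void
  store_value (long *p, long v)
  {
    /* Under -mapxf the operands cannot be assigned to, or addressed
       through, r16-r31 by default.  */
    __asm__ volatile ("movq %1, %0" : "=m" (*p) : "r" (v));
  }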

gcc/ChangeLog:

	* config/i386/i386.cc (map_egpr_constraints): New function to
	map common constraints to EGPR prohibited constraints.
	(ix86_md_asm_adjust): Calls map_egpr_constraints.
	* config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-inline-gpr-norex2.c: New test.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
 gcc/config/i386/i386.cc                       | 92 +++++++++++++++++++
 gcc/config/i386/i386.opt                      |  5 +
 .../gcc.target/i386/apx-inline-gpr-norex2.c   | 25 +++++
 3 files changed, 122 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 5af0de4dae7..ea94663eb68 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+#define INCLUDE_STRING
 #define IN_TARGET_CODE 1
 
 #include "config.h"
@@ -23161,6 +23162,93 @@ ix86_c_mode_for_suffix (char suffix)
   return VOIDmode;
 }
 
+/* Helper function to map common constraints to non-EGPR ones.
+   All related constraints have a j prefix; j plus an upper-case letter
+   means the constraint is strictly EGPR enabled, while j plus a
+   lower-case letter indicates the constraint is strictly gpr16 only.
+
+   Specially for the "g" constraint, split it to "jr", "jm" and "i" as
+   there is no corresponding general constraint defined for the backend.
+
+   Here is the full list of constraints that may involve
+   gpr, mapped to the j-prefixed ones.
+
+   "g" -> "jrjmi"
+   "r" -> "jr"
+   "m" -> "jm"
+   "<" -> "j<"
+   ">" -> "j>"
+   "o" -> "jo"
+   "V" -> "jV"
+   "p" -> "jp"
+   "Bm" -> "ja"
+*/
+
+static void map_egpr_constraints (vec<const char *> &constraints)
+{
+  for (size_t i = 0; i < constraints.length(); i++)
+    {
+      const char *cur = constraints[i];
+
+      if (startswith (cur, "=@cc"))
+	continue;
+
+      int len = strlen (cur);
+      auto_vec<char> buf;
+
+      for (int j = 0; j < len; j++)
+	{
+	  switch (cur[j])
+	    {
+	    case 'g':
+	      buf.safe_push ('j');
+	      buf.safe_push ('r');
+	      buf.safe_push ('j');
+	      buf.safe_push ('m');
+	      buf.safe_push ('i');
+	      break;
+	    case 'r':
+	    case 'm':
+	    case '<':
+	    case '>':
+	    case 'o':
+	    case 'V':
+	    case 'p':
+	      buf.safe_push ('j');
+	      buf.safe_push (cur[j]);
+	      break;
+	    case 'B':
+	      if (cur[j + 1] == 'm')
+		{
+		  buf.safe_push ('j');
+		  buf.safe_push ('a');
+		  j++;
+		}
+	      else
+		{
+		  buf.safe_push (cur[j]);
+		  buf.safe_push (cur[j + 1]);
+		  j++;
+		}
+	      break;
+	    case 'T':
+	    case 'Y':
+	    case 'W':
+	    case 'j':
+	      buf.safe_push (cur[j]);
+	      buf.safe_push (cur[j + 1]);
+	      j++;
+	      break;
+	    default:
+	      buf.safe_push (cur[j]);
+	      break;
+	    }
+	}
+      buf.safe_push ('\0');
+      constraints[i] = xstrdup (buf.address ());
+    }
+}
+
 /* Worker function for TARGET_MD_ASM_ADJUST.
 
    We implement asm flag outputs, and maintain source compatibility
@@ -23175,6 +23263,10 @@ ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
   bool saw_asm_flag = false;
 
   start_sequence ();
+
+  if (TARGET_APX_EGPR && !ix86_apx_inline_asm_use_gpr32)
+    map_egpr_constraints (constraints);
+
   for (unsigned i = 0, n = outputs.length (); i < n; ++i)
     {
       const char *con = constraints[i];
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index d89b5bbc5e8..d4a7b7ec839 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1335,3 +1335,8 @@ Enum(apx_features) String(ndd) Value(apx_ndd) Set(4)
 
 EnumValue
 Enum(apx_features) String(all) Value(apx_all) Set(1)
+
+mapx-inline-asm-use-gpr32
+Target Var(ix86_apx_inline_asm_use_gpr32) Init(0)
+Enable GPR32 in inline asm when APX_EGPR is enabled; do not
+map reg or mem constraints in inline asm to GPR16.
diff --git a/gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c b/gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
new file mode 100644
index 00000000000..ffd8f954500
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mapxf -m64" } */
+
+typedef unsigned int u32;
+typedef unsigned long long u64;
+
+void constraint_test ()
+{
+  register u64 *r16 __asm__("%r16");
+  register u64 r17 __asm__("%r17");
+  u64 *addr = r16;
+  
+  __asm__ __volatile__ ("test_mapping_g_m %0, %%rax" : : "g" (r16) : "rax");
+  __asm__ __volatile__ ("test_mapping_g_r %0, %%rax" : : "g" (r17) : "rax");
+  __asm__ __volatile__ ("test_mapping_m %0, %%rax" : : "m" (addr) : "rax");
+  __asm__ __volatile__ ("test_mapping_r %0, %%rax" : : "r" (r17) : "rax");
+  __asm__ __volatile__ ("test_mapping_rm %0, %%rax" : "=r,m" (r16) : : "rax");
+}
+
+/* { dg-final { scan-assembler-not "test_mapping_g_m %r16, %rax" } } */
+/* { dg-final { scan-assembler-not "test_mapping_g_r %r17, %rax" } } */
+/* { dg-final { scan-assembler-not "test_mapping_m %r16, %rax" } } */
+/* { dg-final { scan-assembler-not "test_mapping_r %r17, %rax" } } */
+/* { dg-final { scan-assembler-not "test_mapping_rm %r16, %rax" } } */
+
-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns
  2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
                   ` (6 preceding siblings ...)
  2023-09-22 10:56 ` [PATCH 07/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint Hongyu Wang
@ 2023-09-22 10:56 ` Hongyu Wang
  2023-09-22 10:56 ` [PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5) Hongyu Wang
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

For vector move insns like vmovdqa/vmovdqu, their evex counterparts
require an explicit suffix 64/32/16/8.  The usage of these instructions
is prohibited under AVX10_1 or AVX512F, so we select vmovaps/vmovups
for vector load/store insns that contain EGPR if there is no AVX512VL,
and keep the original move insn selection otherwise.
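
A rough illustration of the case this handles, assuming -mapxf and
AVX2 without AVX512VL; the forced register below only serves to make
the load address mention an EGPR, and the precise assembly output is
not guaranteed:

  #include <immintrin.h>

  /* ix86_get_ssemov now returns vmovaps/vmovups for this load instead
     of vmovdqa/vmovdqu; with AVX512VL it keeps the vmovdqa64/vmovdqa32
     forms.  */
  __m256i
  load_vec (__m256i *p)
  {
    register __m256i *q __asm__ ("r16") = p;
    return _mm256_load_si256 (q);
  }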

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_get_ssemov): Check if egpr is used,
	adjust mnemonic for vmovdqu/vmovdqa.
	* config/i386/sse.md (*<extract_type>_vinsert<shuffletype><extract_suf>_0):
	Check if egpr is used, adjust mnemonic for vmovdqu/vmovdqa.
	(avx_vec_concat<mode>): Likewise, and separate alternative 0 to
	avx_noavx512f.

Co-authored-by: Kong Lingling <lingling.kong@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
 gcc/config/i386/i386.cc | 42 +++++++++++++++++++++++++++++++++++------
 gcc/config/i386/sse.md  | 34 +++++++++++++++++++++++----------
 2 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index ea94663eb68..5d47c2af25e 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5478,6 +5478,12 @@ ix86_get_ssemov (rtx *operands, unsigned size,
   bool evex_reg_p = (size == 64
 		     || EXT_REX_SSE_REG_P (operands[0])
 		     || EXT_REX_SSE_REG_P (operands[1]));
+
+  bool egpr_p = (TARGET_APX_EGPR
+		 && (x86_extended_rex2reg_mentioned_p (operands[0])
+		     || x86_extended_rex2reg_mentioned_p (operands[1])));
+  bool egpr_vl = egpr_p && TARGET_AVX512VL;
+
   machine_mode scalar_mode;
 
   const char *opcode = NULL;
@@ -5550,12 +5556,18 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 	{
 	case E_HFmode:
 	case E_BFmode:
-	  if (evex_reg_p)
+	  if (evex_reg_p || egpr_vl)
 	    opcode = (misaligned_p
 		      ? (TARGET_AVX512BW
 			 ? "vmovdqu16"
 			 : "vmovdqu64")
 		      : "vmovdqa64");
+	  else if (egpr_p)
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW
+			 ? "vmovdqu16"
+			 : "%vmovups")
+		      : "%vmovaps");
 	  else
 	    opcode = (misaligned_p
 		      ? (TARGET_AVX512BW
@@ -5570,8 +5582,10 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 	  opcode = misaligned_p ? "%vmovupd" : "%vmovapd";
 	  break;
 	case E_TFmode:
-	  if (evex_reg_p)
+	  if (evex_reg_p || egpr_vl)
 	    opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	  else if (egpr_p)
+	    opcode = misaligned_p ? "%vmovups" : "%vmovaps";
 	  else
 	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
 	  break;
@@ -5584,12 +5598,18 @@ ix86_get_ssemov (rtx *operands, unsigned size,
       switch (scalar_mode)
 	{
 	case E_QImode:
-	  if (evex_reg_p)
+	  if (evex_reg_p || egpr_vl)
 	    opcode = (misaligned_p
 		      ? (TARGET_AVX512BW
 			 ? "vmovdqu8"
 			 : "vmovdqu64")
 		      : "vmovdqa64");
+	  else if (egpr_p)
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW
+			 ? "vmovdqu8"
+			 : "%vmovups")
+		      : "%vmovaps");
 	  else
 	    opcode = (misaligned_p
 		      ? (TARGET_AVX512BW
@@ -5598,12 +5618,18 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 		      : "%vmovdqa");
 	  break;
 	case E_HImode:
-	  if (evex_reg_p)
+	  if (evex_reg_p || egpr_vl)
 	    opcode = (misaligned_p
 		      ? (TARGET_AVX512BW
 			 ? "vmovdqu16"
 			 : "vmovdqu64")
 		      : "vmovdqa64");
+	  else if (egpr_p)
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW
+			 ? "vmovdqu16"
+			 : "%vmovups")
+		      : "%vmovaps");
 	  else
 	    opcode = (misaligned_p
 		      ? (TARGET_AVX512BW
@@ -5612,16 +5638,20 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 		      : "%vmovdqa");
 	  break;
 	case E_SImode:
-	  if (evex_reg_p)
+	  if (evex_reg_p || egpr_vl)
 	    opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
+	  else if (egpr_p)
+	    opcode = misaligned_p ? "%vmovups" : "%vmovaps";
 	  else
 	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
 	  break;
 	case E_DImode:
 	case E_TImode:
 	case E_OImode:
-	  if (evex_reg_p)
+	  if (evex_reg_p || egpr_vl)
 	    opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	  else if (egpr_p)
+	    opcode = misaligned_p ? "%vmovups" : "%vmovaps";
 	  else
 	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
 	  break;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 80b43fd7db7..256b0eedbbb 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18912,6 +18912,12 @@ (define_insn "*<extract_type>_vinsert<shuffletype><extract_suf>_0"
 {
   if (which_alternative == 0)
     return "vinsert<shuffletype><extract_suf>\t{$0, %2, %1, %0|%0, %1, %2, 0}";
+  bool egpr_used = (TARGET_APX_EGPR
+		    && x86_extended_rex2reg_mentioned_p (operands[2]));
+  const char *align_templ = egpr_used ? "vmovaps\t{%2, %x0|%x0, %2}"
+				      : "vmovdqa\t{%2, %x0|%x0, %2}";
+  const char *unalign_templ = egpr_used ? "vmovups\t{%2, %x0|%x0, %2}"
+					: "vmovdqu\t{%2, %x0|%x0, %2}";
   switch (<MODE>mode)
     {
     case E_V8DFmode:
@@ -18927,17 +18933,17 @@ (define_insn "*<extract_type>_vinsert<shuffletype><extract_suf>_0"
     case E_V8DImode:
       if (misaligned_operand (operands[2], <ssequartermode>mode))
 	return which_alternative == 2 ? "vmovdqu64\t{%2, %x0|%x0, %2}"
-				      : "vmovdqu\t{%2, %x0|%x0, %2}";
+				      : unalign_templ;
       else
 	return which_alternative == 2 ? "vmovdqa64\t{%2, %x0|%x0, %2}"
-				      : "vmovdqa\t{%2, %x0|%x0, %2}";
+				      : align_templ;
     case E_V16SImode:
       if (misaligned_operand (operands[2], <ssequartermode>mode))
 	return which_alternative == 2 ? "vmovdqu32\t{%2, %x0|%x0, %2}"
-				      : "vmovdqu\t{%2, %x0|%x0, %2}";
+				      : unalign_templ;
       else
 	return which_alternative == 2 ? "vmovdqa32\t{%2, %x0|%x0, %2}"
-				      : "vmovdqa\t{%2, %x0|%x0, %2}";
+				      : align_templ;
     default:
       gcc_unreachable ();
     }
@@ -27661,11 +27667,13 @@ (define_insn "avx_vec_concat<mode>"
   [(set (match_operand:V_256_512 0 "register_operand" "=x,v,x,Yv")
 	(vec_concat:V_256_512
 	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "x,v,xm,vm")
-	  (match_operand:<ssehalfvecmode> 2 "nonimm_or_0_operand" "xm,vm,C,C")))]
+	  (match_operand:<ssehalfvecmode> 2 "nonimm_or_0_operand" "xBt,vm,C,C")))]
   "TARGET_AVX
    && (operands[2] == CONST0_RTX (<ssehalfvecmode>mode)
        || !MEM_P (operands[1]))"
 {
+  bool egpr_used = (TARGET_APX_EGPR
+		    && x86_extended_rex2reg_mentioned_p (operands[1]));
   switch (which_alternative)
     {
     case 0:
@@ -27713,7 +27721,8 @@ (define_insn "avx_vec_concat<mode>"
 	  if (misaligned_operand (operands[1], <ssehalfvecmode>mode))
 	    {
 	      if (which_alternative == 2)
-		return "vmovdqu\t{%1, %t0|%t0, %1}";
+		return egpr_used ? "vmovups\t{%1, %t0|%t0, %1}"
+				 : "vmovdqu\t{%1, %t0|%t0, %1}";
 	      else if (GET_MODE_SIZE (<ssescalarmode>mode) == 8)
 		return "vmovdqu64\t{%1, %t0|%t0, %1}";
 	      else
@@ -27722,7 +27731,8 @@ (define_insn "avx_vec_concat<mode>"
 	  else
 	    {
 	      if (which_alternative == 2)
-		return "vmovdqa\t{%1, %t0|%t0, %1}";
+		return egpr_used ? "vmovaps\t{%1, %t0|%t0, %1}"
+				 : "vmovdqa\t{%1, %t0|%t0, %1}";
 	      else if (GET_MODE_SIZE (<ssescalarmode>mode) == 8)
 		return "vmovdqa64\t{%1, %t0|%t0, %1}";
 	      else
@@ -27732,7 +27742,8 @@ (define_insn "avx_vec_concat<mode>"
 	  if (misaligned_operand (operands[1], <ssehalfvecmode>mode))
 	    {
 	      if (which_alternative == 2)
-		return "vmovdqu\t{%1, %x0|%x0, %1}";
+		return egpr_used ? "vmovups\t{%1, %x0|%x0, %1}"
+				 : "vmovdqu\t{%1, %x0|%x0, %1}";
 	      else if (GET_MODE_SIZE (<ssescalarmode>mode) == 8)
 		return "vmovdqu64\t{%1, %x0|%x0, %1}";
 	      else
@@ -27741,7 +27752,8 @@ (define_insn "avx_vec_concat<mode>"
 	  else
 	    {
 	      if (which_alternative == 2)
-		return "vmovdqa\t{%1, %x0|%x0, %1}";
+		return egpr_used ? "vmovaps\t{%1, %x0|%x0, %1}"
+				 : "vmovdqa\t{%1, %x0|%x0, %1}";
 	      else if (GET_MODE_SIZE (<ssescalarmode>mode) == 8)
 		return "vmovdqa64\t{%1, %x0|%x0, %1}";
 	      else
@@ -27754,7 +27766,9 @@ (define_insn "avx_vec_concat<mode>"
       gcc_unreachable ();
     }
 }
-  [(set_attr "type" "sselog,sselog,ssemov,ssemov")
+  [(set_attr "isa" "noavx512f,avx512f,*,*")
+   (set_attr "gpr32" "0,1,1,1")
+   (set_attr "type" "sselog,sselog,ssemov,ssemov")
    (set_attr "prefix_extra" "1,1,*,*")
    (set_attr "length_immediate" "1,1,*,*")
    (set_attr "prefix" "maybe_evex")
-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5)
  2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
                   ` (7 preceding siblings ...)
  2023-09-22 10:56 ` [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns Hongyu Wang
@ 2023-09-22 10:56 ` Hongyu Wang
  2023-09-22 10:56 ` [PATCH 10/13] [APX EGPR] Handle legacy insns that only support GPR16 (2/5) Hongyu Wang
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

From: Kong Lingling <lingling.kong@intel.com>

These legacy insns in opcode map0/1 only support GPR16 and do not
have vex/evex counterparts, so directly adjust the constraints and
add the gpr32 attr to their patterns.

insn list:
1. xsave/xsave64, xrstor/xrstor64
2. xsaves/xsaves64, xrstors/xrstors64
3. xsavec/xsavec64
4. xsaveopt/xsaveopt64
5. fxsave64/fxrstor64

gcc/ChangeLog:

	* config/i386/i386.md (<xsave>): Set attr gpr32 0 and constraint
	jm.
	(<xsave>_rex64): Likewise.
	(<xrstor>_rex64): Likewise.
	(<xrstor>64): Likewise.
	(fxsave64): Likewise.
	(fxrstor64): Likewise.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp: Add apxf check.
	* gcc.target/i386/apx-legacy-insn-check-norex2.c: New test.
	* gcc.target/i386/apx-legacy-insn-check-norex2-asm.c: New assembler test.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
 gcc/config/i386/i386.md                       | 18 +++++++----
 .../i386/apx-legacy-insn-check-norex2-asm.c   |  5 ++++
 .../i386/apx-legacy-insn-check-norex2.c       | 30 +++++++++++++++++++
 gcc/testsuite/lib/target-supports.exp         | 10 +++++++
 4 files changed, 57 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b9eaea78f00..6cf86b798a8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -25626,11 +25626,12 @@ (define_insn "fxsave"
         (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn "fxsave64"
-  [(set (match_operand:BLK 0 "memory_operand" "=m")
+  [(set (match_operand:BLK 0 "memory_operand" "=jm")
 	(unspec_volatile:BLK [(const_int 0)] UNSPECV_FXSAVE64))]
   "TARGET_64BIT && TARGET_FXSR"
   "fxsave64\t%0"
   [(set_attr "type" "other")
+   (set_attr "gpr32" "0")
    (set_attr "memory" "store")
    (set (attr "length")
         (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
@@ -25646,11 +25647,12 @@ (define_insn "fxrstor"
         (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn "fxrstor64"
-  [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "m")]
+  [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "jm")]
 		    UNSPECV_FXRSTOR64)]
   "TARGET_64BIT && TARGET_FXSR"
   "fxrstor64\t%0"
   [(set_attr "type" "other")
+   (set_attr "gpr32" "0")
    (set_attr "memory" "load")
    (set (attr "length")
         (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
@@ -25704,7 +25706,7 @@ (define_insn "<xsave>"
         (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn "<xsave>_rex64"
-  [(set (match_operand:BLK 0 "memory_operand" "=m")
+  [(set (match_operand:BLK 0 "memory_operand" "=jm")
 	(unspec_volatile:BLK
 	 [(match_operand:SI 1 "register_operand" "a")
 	  (match_operand:SI 2 "register_operand" "d")]
@@ -25713,11 +25715,12 @@ (define_insn "<xsave>_rex64"
   "<xsave>\t%0"
   [(set_attr "type" "other")
    (set_attr "memory" "store")
+   (set_attr "gpr32" "0")
    (set (attr "length")
         (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn "<xsave>"
-  [(set (match_operand:BLK 0 "memory_operand" "=m")
+  [(set (match_operand:BLK 0 "memory_operand" "=jm")
 	(unspec_volatile:BLK
 	 [(match_operand:SI 1 "register_operand" "a")
 	  (match_operand:SI 2 "register_operand" "d")]
@@ -25726,6 +25729,7 @@ (define_insn "<xsave>"
   "<xsave>\t%0"
   [(set_attr "type" "other")
    (set_attr "memory" "store")
+   (set_attr "gpr32" "0")
    (set (attr "length")
         (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
 
@@ -25743,7 +25747,7 @@ (define_insn "<xrstor>"
 
 (define_insn "<xrstor>_rex64"
    [(unspec_volatile:BLK
-     [(match_operand:BLK 0 "memory_operand" "m")
+     [(match_operand:BLK 0 "memory_operand" "jm")
       (match_operand:SI 1 "register_operand" "a")
       (match_operand:SI 2 "register_operand" "d")]
      ANY_XRSTOR)]
@@ -25751,12 +25755,13 @@ (define_insn "<xrstor>_rex64"
   "<xrstor>\t%0"
   [(set_attr "type" "other")
    (set_attr "memory" "load")
+   (set_attr "gpr32" "0")
    (set (attr "length")
         (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn "<xrstor>64"
    [(unspec_volatile:BLK
-     [(match_operand:BLK 0 "memory_operand" "m")
+     [(match_operand:BLK 0 "memory_operand" "jm")
       (match_operand:SI 1 "register_operand" "a")
       (match_operand:SI 2 "register_operand" "d")]
      ANY_XRSTOR64)]
@@ -25764,6 +25769,7 @@ (define_insn "<xrstor>64"
   "<xrstor>64\t%0"
   [(set_attr "type" "other")
    (set_attr "memory" "load")
+   (set_attr "gpr32" "0")
    (set (attr "length")
         (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
 
diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
new file mode 100644
index 00000000000..7ecc861435f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
@@ -0,0 +1,5 @@
+/* { dg-do assemble { target apxf } } */
+/* { dg-options "-O1 -mapxf -m64 -DDTYPE32" } */
+
+#include "apx-legacy-insn-check-norex2.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
new file mode 100644
index 00000000000..1e5450dfb73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mapxf -m64 -DDTYPE32" } */
+
+#include <immintrin.h>
+
+typedef unsigned int u32;
+typedef unsigned long long u64;
+
+#ifndef DTYPE32
+#define DTYPE32
+#endif
+
+#ifdef DTYPE32
+typedef u32 DTYPE;
+#endif
+
+__attribute__((target("xsave,fxsr")))
+void legacy_test ()
+{
+  register DTYPE* val __asm__("r16");
+  _xsave64 (val, 1);
+  _xrstor64 (val, 1);
+  _fxsave64 (val);
+  _fxrstor64 (val);
+}
+
+/* { dg-final { scan-assembler-not "xsave64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "xrstor64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "fxsave64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "fxrstor64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 2de41cef2f6..2907de8bd7c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -10001,6 +10001,16 @@ proc check_effective_target_sm4 { } {
     } "-msm4" ]
 }
 
+proc check_effective_target_apxf { } {
+    return [check_no_compiler_messages apxf object {
+	void
+	foo ()
+	{
+	  __asm__ volatile ("add\t%%r16, %%r31" ::);
+	}
+    } "-mapxf" ]
+}
+
 # Return 1 if sse instructions can be compiled.
 proc check_effective_target_sse { } {
     return [check_no_compiler_messages sse object {
-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 10/13] [APX EGPR] Handle legacy insns that only support GPR16 (2/5)
  2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
                   ` (8 preceding siblings ...)
  2023-09-22 10:56 ` [PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5) Hongyu Wang
@ 2023-09-22 10:56 ` Hongyu Wang
  2023-09-22 10:56 ` [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5) Hongyu Wang
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

From: Kong Lingling <lingling.kong@intel.com>

These legacy insns in opcode map2/3 have vex but no evex
counterparts, so disable EGPR for them by adjusting the alternatives
and attr_gpr32 (a small example follows the insn list below).

insn list:
1. phaddw/vphaddw, phaddd/vphaddd, phaddsw/vphaddsw
2. phsubw/vphsubw, phsubd/vphsubd, phsubsw/vphsubsw
3. psignb/vpsignb, psignw/vpsignw, psignd/vpsignd
4. blendps/vblendps, blendpd/vblendpd
5. blendvps/vblendvps, blendvpd/vblendvpd
6. pblendvb/vpblendvb, pblendw/vpblendw
7. mpsadbw/vmpsadbw
8. dpps/vdpps, dppd/vdppd
9. pcmpeqq/vpcmpeqq, pcmpgtq/vpcmpgtq
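
A small example for one entry in the list (pcmpeqq); it is only a
sketch and assumes -mapxf together with SSE4.1/AVX:

  #include <immintrin.h>

  /* pcmpeqq/vpcmpeqq have no evex form, so the memory operand now uses
     the "ja"/"jm" style constraints with gpr32 0, keeping r16-r31 out
     of the address below.  */
  __m128i
  cmp_qwords (__m128i a, __m128i *p)
  {
    register __m128i *q __asm__ ("r16") = p;
    return _mm_cmpeq_epi64 (a, *q);
  }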

gcc/ChangeLog:

	* config/i386/sse.md (avx2_ph<plusminus_mnemonic>wv16hi3): Set
	attr gpr32 0 and constraint jm/ja to all mem alternatives.
	(ssse3_ph<plusminus_mnemonic>wv8hi3): Likewise.
	(ssse3_ph<plusminus_mnemonic>wv4hi3): Likewise.
	(avx2_ph<plusminus_mnemonic>dv8si3): Likewise.
	(ssse3_ph<plusminus_mnemonic>dv4si3): Likewise.
	(ssse3_ph<plusminus_mnemonic>dv2si3): Likewise.
	(<ssse3_avx2>_psign<mode>3): Likewise.
	(ssse3_psign<mode>3): Likewise.
	(<sse4_1>_blend<ssemodesuffix><avxsizesuffix>): Likewise.
	(<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>): Likewise.
	(*<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>_lt): Likewise.
	(*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_not_ltint): Likewise.
	(<sse4_1>_dp<ssemodesuffix><avxsizesuffix>): Likewise.
	(<sse4_1_avx2>_mpsadbw): Likewise.
	(<sse4_1_avx2>_pblendvb): Likewise.
	(*<sse4_1_avx2>_pblendvb_lt): Likewise.
	(sse4_1_pblend<ssemodesuffix>): Likewise.
	(*avx2_pblend<ssemodesuffix>): Likewise.
	(avx2_permv2ti): Likewise.
	(*avx_vperm2f128<mode>_nozero): Likewise.
	(*avx2_eq<mode>3): Likewise.
	(*sse4_1_eqv2di3): Likewise.
	(sse4_2_gtv2di3): Likewise.
	(avx2_gt<mode>3): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-legacy-insn-check-norex2.c: Add
	sse/vex intrinsic tests.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
 gcc/config/i386/sse.md                        |  73 ++++++++----
 .../i386/apx-legacy-insn-check-norex2.c       | 106 ++++++++++++++++++
 2 files changed, 155 insertions(+), 24 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 256b0eedbbb..a7858a7f8cf 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16831,7 +16831,7 @@ (define_insn "*avx2_eq<mode>3"
   [(set (match_operand:VI_256 0 "register_operand" "=x")
 	(eq:VI_256
 	  (match_operand:VI_256 1 "nonimmediate_operand" "%x")
-	  (match_operand:VI_256 2 "nonimmediate_operand" "xm")))]
+	  (match_operand:VI_256 2 "nonimmediate_operand" "xjm")))]
   "TARGET_AVX2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vpcmpeq<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "ssecmp")
@@ -16839,6 +16839,7 @@ (define_insn "*avx2_eq<mode>3"
      (if_then_else (eq (const_string "<MODE>mode") (const_string "V4DImode"))
 		   (const_string "1")
 		   (const_string "*")))
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
@@ -17021,7 +17022,7 @@ (define_insn "*sse4_1_eqv2di3"
   [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
 	(eq:V2DI
 	  (match_operand:V2DI 1 "vector_operand" "%0,0,x")
-	  (match_operand:V2DI 2 "vector_operand" "YrBm,*xBm,xm")))]
+	  (match_operand:V2DI 2 "vector_operand" "Yrja,*xja,xjm")))]
   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
    pcmpeqq\t{%2, %0|%0, %2}
@@ -17029,6 +17030,7 @@ (define_insn "*sse4_1_eqv2di3"
    vpcmpeqq\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecmp")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
@@ -17037,13 +17039,14 @@ (define_insn "*sse2_eq<mode>3"
   [(set (match_operand:VI124_128 0 "register_operand" "=x,x")
 	(eq:VI124_128
 	  (match_operand:VI124_128 1 "vector_operand" "%0,x")
-	  (match_operand:VI124_128 2 "vector_operand" "xBm,xm")))]
+	  (match_operand:VI124_128 2 "vector_operand" "xBm,xjm")))]
   "TARGET_SSE2
    && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
    pcmpeq<ssemodesuffix>\t{%2, %0|%0, %2}
    vpcmpeq<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "1,0")
    (set_attr "type" "ssecmp")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
@@ -17052,7 +17055,7 @@ (define_insn "sse4_2_gtv2di3"
   [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
 	(gt:V2DI
 	  (match_operand:V2DI 1 "register_operand" "0,0,x")
-	  (match_operand:V2DI 2 "vector_operand" "YrBm,*xBm,xm")))]
+	  (match_operand:V2DI 2 "vector_operand" "Yrja,*xja,xjm")))]
   "TARGET_SSE4_2"
   "@
    pcmpgtq\t{%2, %0|%0, %2}
@@ -17060,6 +17063,7 @@ (define_insn "sse4_2_gtv2di3"
    vpcmpgtq\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecmp")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
@@ -17068,7 +17072,7 @@ (define_insn "avx2_gt<mode>3"
   [(set (match_operand:VI_256 0 "register_operand" "=x")
 	(gt:VI_256
 	  (match_operand:VI_256 1 "register_operand" "x")
-	  (match_operand:VI_256 2 "nonimmediate_operand" "xm")))]
+	  (match_operand:VI_256 2 "nonimmediate_operand" "xjm")))]
   "TARGET_AVX2"
   "vpcmpgt<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "ssecmp")
@@ -17076,6 +17080,7 @@ (define_insn "avx2_gt<mode>3"
      (if_then_else (eq (const_string "<MODE>mode") (const_string "V4DImode"))
 		   (const_string "1")
 		   (const_string "*")))
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
@@ -17099,7 +17104,7 @@ (define_insn "*sse2_gt<mode>3"
   [(set (match_operand:VI124_128 0 "register_operand" "=x,x")
 	(gt:VI124_128
 	  (match_operand:VI124_128 1 "register_operand" "0,x")
-	  (match_operand:VI124_128 2 "vector_operand" "xBm,xm")))]
+	  (match_operand:VI124_128 2 "vector_operand" "xBm,xjm")))]
   "TARGET_SSE2"
   "@
    pcmpgt<ssemodesuffix>\t{%2, %0|%0, %2}
@@ -21222,7 +21227,7 @@ (define_insn "avx2_ph<plusminus_mnemonic>wv16hi3"
 	  (vec_select:V16HI
 	    (vec_concat:V32HI
 	      (match_operand:V16HI 1 "register_operand" "x")
-	      (match_operand:V16HI 2 "nonimmediate_operand" "xm"))
+	      (match_operand:V16HI 2 "nonimmediate_operand" "xjm"))
 	    (parallel
 	      [(const_int 0) (const_int 2) (const_int 4) (const_int 6)
 	       (const_int 16) (const_int 18) (const_int 20) (const_int 22)
@@ -21238,6 +21243,7 @@ (define_insn "avx2_ph<plusminus_mnemonic>wv16hi3"
   "TARGET_AVX2"
   "vph<plusminus_mnemonic>w\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
@@ -21248,7 +21254,7 @@ (define_insn "ssse3_ph<plusminus_mnemonic>wv8hi3"
 	  (vec_select:V8HI
 	    (vec_concat:V16HI
 	      (match_operand:V8HI 1 "register_operand" "0,x")
-	      (match_operand:V8HI 2 "vector_operand" "xBm,xm"))
+	      (match_operand:V8HI 2 "vector_operand" "xja,xjm"))
 	    (parallel
 	      [(const_int 0) (const_int 2) (const_int 4) (const_int 6)
 	       (const_int 8) (const_int 10) (const_int 12) (const_int 14)]))
@@ -21263,6 +21269,7 @@ (define_insn "ssse3_ph<plusminus_mnemonic>wv8hi3"
    vph<plusminus_mnemonic>w\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0")
    (set_attr "atom_unit" "complex")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,vex")
@@ -21314,7 +21321,7 @@ (define_insn "avx2_ph<plusminus_mnemonic>dv8si3"
 	  (vec_select:V8SI
 	    (vec_concat:V16SI
 	      (match_operand:V8SI 1 "register_operand" "x")
-	      (match_operand:V8SI 2 "nonimmediate_operand" "xm"))
+	      (match_operand:V8SI 2 "nonimmediate_operand" "xjm"))
 	    (parallel
 	      [(const_int 0) (const_int 2) (const_int 8) (const_int 10)
 	       (const_int 4) (const_int 6) (const_int 12) (const_int 14)]))
@@ -21326,6 +21333,7 @@ (define_insn "avx2_ph<plusminus_mnemonic>dv8si3"
   "TARGET_AVX2"
   "vph<plusminus_mnemonic>d\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
@@ -21336,7 +21344,7 @@ (define_insn "ssse3_ph<plusminus_mnemonic>dv4si3"
 	  (vec_select:V4SI
 	    (vec_concat:V8SI
 	      (match_operand:V4SI 1 "register_operand" "0,x")
-	      (match_operand:V4SI 2 "vector_operand" "xBm,xm"))
+	      (match_operand:V4SI 2 "vector_operand" "xja,xjm"))
 	    (parallel
 	      [(const_int 0) (const_int 2) (const_int 4) (const_int 6)]))
 	  (vec_select:V4SI
@@ -21349,6 +21357,7 @@ (define_insn "ssse3_ph<plusminus_mnemonic>dv4si3"
    vph<plusminus_mnemonic>d\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0")
    (set_attr "atom_unit" "complex")
    (set_attr "prefix_data16" "1,*")
    (set_attr "prefix_extra" "1")
@@ -21388,6 +21397,7 @@ (define_insn_and_split "ssse3_ph<plusminus_mnemonic>dv2si3"
 }
   [(set_attr "mmx_isa" "native,sse_noavx,avx")
    (set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0")
    (set_attr "atom_unit" "complex")
    (set_attr "prefix_extra" "1")
    (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
@@ -21842,7 +21852,7 @@ (define_insn "<ssse3_avx2>_psign<mode>3"
   [(set (match_operand:VI124_AVX2 0 "register_operand" "=x,x")
 	(unspec:VI124_AVX2
 	  [(match_operand:VI124_AVX2 1 "register_operand" "0,x")
-	   (match_operand:VI124_AVX2 2 "vector_operand" "xBm,xm")]
+	   (match_operand:VI124_AVX2 2 "vector_operand" "xja,xjm")]
 	  UNSPEC_PSIGN))]
   "TARGET_SSSE3"
   "@
@@ -21850,6 +21860,7 @@ (define_insn "<ssse3_avx2>_psign<mode>3"
    vpsign<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sselog1")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -22147,7 +22158,7 @@ (define_mode_attr blendbits
 (define_insn "<sse4_1>_blend<ssemodesuffix><avxsizesuffix>"
   [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(vec_merge:VF_128_256
-	  (match_operand:VF_128_256 2 "vector_operand" "YrBm,*xBm,xm")
+	  (match_operand:VF_128_256 2 "vector_operand" "Yrja,*xja,xjm")
 	  (match_operand:VF_128_256 1 "register_operand" "0,0,x")
 	  (match_operand:SI 3 "const_0_to_<blendbits>_operand")))]
   "TARGET_SSE4_1"
@@ -22157,6 +22168,7 @@ (define_insn "<sse4_1>_blend<ssemodesuffix><avxsizesuffix>"
    vblend<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
@@ -22167,7 +22179,7 @@ (define_insn "<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>"
   [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(unspec:VF_128_256
 	  [(match_operand:VF_128_256 1 "register_operand" "0,0,x")
-	   (match_operand:VF_128_256 2 "vector_operand" "YrBm,*xBm,xm")
+	   (match_operand:VF_128_256 2 "vector_operand" "Yrja,*xja,xjm")
 	   (match_operand:VF_128_256 3 "register_operand" "Yz,Yz,x")]
 	  UNSPEC_BLENDV))]
   "TARGET_SSE4_1"
@@ -22177,6 +22189,7 @@ (define_insn "<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>"
    vblendv<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
@@ -22228,7 +22241,7 @@ (define_insn_and_split "*<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>_lt"
   [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(unspec:VF_128_256
 	  [(match_operand:VF_128_256 1 "register_operand" "0,0,x")
-	   (match_operand:VF_128_256 2 "vector_operand" "YrBm,*xBm,xm")
+	   (match_operand:VF_128_256 2 "vector_operand" "Yrja,*xja,xjm")
 	   (lt:VF_128_256
 	     (match_operand:<sseintvecmode> 3 "register_operand" "Yz,Yz,x")
 	     (match_operand:<sseintvecmode> 4 "const0_operand"))]
@@ -22242,6 +22255,7 @@ (define_insn_and_split "*<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>_lt"
   "operands[3] = gen_lowpart (<MODE>mode, operands[3]);"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
@@ -22260,7 +22274,7 @@ (define_insn_and_split "*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_ltint"
   [(set (match_operand:<ssebytemode> 0 "register_operand" "=Yr,*x,x")
 	(unspec:<ssebytemode>
 	  [(match_operand:<ssebytemode> 1 "register_operand" "0,0,x")
-	   (match_operand:<ssebytemode> 2 "vector_operand" "YrBm,*xBm,xm")
+	   (match_operand:<ssebytemode> 2 "vector_operand" "Yrja,*xja,xjm")
 	   (subreg:<ssebytemode>
 	     (lt:VI48_AVX
 	       (match_operand:VI48_AVX 3 "register_operand" "Yz,Yz,x")
@@ -22280,6 +22294,7 @@ (define_insn_and_split "*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_ltint"
 }
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
@@ -22318,7 +22333,7 @@ (define_insn "<sse4_1>_dp<ssemodesuffix><avxsizesuffix>"
   [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(unspec:VF_128_256
 	  [(match_operand:VF_128_256 1 "vector_operand" "%0,0,x")
-	   (match_operand:VF_128_256 2 "vector_operand" "YrBm,*xBm,xm")
+	   (match_operand:VF_128_256 2 "vector_operand" "Yrja,*xja,xjm")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_DP))]
   "TARGET_SSE4_1"
@@ -22328,6 +22343,7 @@ (define_insn "<sse4_1>_dp<ssemodesuffix><avxsizesuffix>"
    vdp<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemul")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
@@ -22356,7 +22372,7 @@ (define_insn "<sse4_1_avx2>_mpsadbw"
   [(set (match_operand:VI1_AVX2 0 "register_operand" "=Yr,*x,x")
 	(unspec:VI1_AVX2
 	  [(match_operand:VI1_AVX2 1 "register_operand" "0,0,x")
-	   (match_operand:VI1_AVX2 2 "vector_operand" "YrBm,*xBm,xm")
+	   (match_operand:VI1_AVX2 2 "vector_operand" "Yrja,*xja,xjm")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_MPSADBW))]
   "TARGET_SSE4_1"
@@ -22366,6 +22382,7 @@ (define_insn "<sse4_1_avx2>_mpsadbw"
    vmpsadbw\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog1")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,vex")
@@ -22394,7 +22411,7 @@ (define_insn "<sse4_1_avx2>_pblendvb"
   [(set (match_operand:VI1_AVX2 0 "register_operand" "=Yr,*x,x")
 	(unspec:VI1_AVX2
 	  [(match_operand:VI1_AVX2 1 "register_operand"  "0,0,x")
-	   (match_operand:VI1_AVX2 2 "vector_operand" "YrBm,*xBm,xm")
+	   (match_operand:VI1_AVX2 2 "vector_operand" "Yrja,*xja,xjm")
 	   (match_operand:VI1_AVX2 3 "register_operand" "Yz,Yz,x")]
 	  UNSPEC_BLENDV))]
   "TARGET_SSE4_1"
@@ -22404,6 +22421,7 @@ (define_insn "<sse4_1_avx2>_pblendvb"
    vpblendvb\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "*,*,1")
    (set_attr "prefix" "orig,orig,vex")
@@ -22443,7 +22461,7 @@ (define_insn_and_split "*<sse4_1_avx2>_pblendvb_lt"
   [(set (match_operand:VI1_AVX2 0 "register_operand" "=Yr,*x,x")
 	(unspec:VI1_AVX2
 	  [(match_operand:VI1_AVX2 1 "register_operand"  "0,0,x")
-	   (match_operand:VI1_AVX2 2 "vector_operand" "YrBm,*xBm,xm")
+	   (match_operand:VI1_AVX2 2 "vector_operand" "Yrja,*xja,xjm")
 	   (lt:VI1_AVX2 (match_operand:VI1_AVX2 3 "register_operand" "Yz,Yz,x")
 			(match_operand:VI1_AVX2 4 "const0_operand"))]
 	  UNSPEC_BLENDV))]
@@ -22456,6 +22474,7 @@ (define_insn_and_split "*<sse4_1_avx2>_pblendvb_lt"
   ""
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "*,*,1")
    (set_attr "prefix" "orig,orig,vex")
@@ -22487,7 +22506,7 @@ (define_insn_and_split "*<sse4_1_avx2>_pblendvb_lt_subreg_not"
 (define_insn "sse4_1_pblend<ssemodesuffix>"
   [(set (match_operand:V8_128 0 "register_operand" "=Yr,*x,x")
 	(vec_merge:V8_128
-	  (match_operand:V8_128 2 "vector_operand" "YrBm,*xBm,xm")
+	  (match_operand:V8_128 2 "vector_operand" "Yrja,*xja,xjm")
 	  (match_operand:V8_128 1 "register_operand" "0,0,x")
 	  (match_operand:SI 3 "const_0_to_255_operand")))]
   "TARGET_SSE4_1"
@@ -22497,6 +22516,7 @@ (define_insn "sse4_1_pblend<ssemodesuffix>"
    vpblendw\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,orig,vex")
@@ -22559,7 +22579,7 @@ (define_expand "avx2_pblend<ssemodesuffix>_1"
 (define_insn "*avx2_pblend<ssemodesuffix>"
   [(set (match_operand:V16_256 0 "register_operand" "=x")
 	(vec_merge:V16_256
-	  (match_operand:V16_256 2 "nonimmediate_operand" "xm")
+	  (match_operand:V16_256 2 "nonimmediate_operand" "xjm")
 	  (match_operand:V16_256 1 "register_operand" "x")
 	  (match_operand:SI 3 "avx2_pblendw_operand")))]
   "TARGET_AVX2"
@@ -22568,6 +22588,7 @@ (define_insn "*avx2_pblend<ssemodesuffix>"
   return "vpblendw\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
   [(set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
@@ -22576,7 +22597,7 @@ (define_insn "*avx2_pblend<ssemodesuffix>"
 (define_insn "avx2_pblendd<mode>"
   [(set (match_operand:VI4_AVX2 0 "register_operand" "=x")
 	(vec_merge:VI4_AVX2
-	  (match_operand:VI4_AVX2 2 "nonimmediate_operand" "xm")
+	  (match_operand:VI4_AVX2 2 "nonimmediate_operand" "xjm")
 	  (match_operand:VI4_AVX2 1 "register_operand" "x")
 	  (match_operand:SI 3 "const_0_to_255_operand")))]
   "TARGET_AVX2"
@@ -26437,11 +26458,13 @@ (define_insn "avx512f_perm<mode>_1<mask_name>"
    (set_attr "prefix" "<mask_prefix2>")
    (set_attr "mode" "<sseinsnmode>")])
 
+;; TODO (APX): vmovaps supports EGPR but the other outputs do not; the
+;; pattern could be split to enable gpr32 for the vmovaps case.
 (define_insn "avx2_permv2ti"
   [(set (match_operand:V4DI 0 "register_operand" "=x")
 	(unspec:V4DI
 	  [(match_operand:V4DI 1 "register_operand" "x")
-	   (match_operand:V4DI 2 "nonimmediate_operand" "xm")
+	   (match_operand:V4DI 2 "nonimmediate_operand" "xjm")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_VPERMTI))]
   "TARGET_AVX2"
@@ -26468,6 +26491,7 @@ (define_insn "avx2_permv2ti"
     return "vperm2i128\t{%3, %2, %1, %0|%0, %1, %2, %3}";
   }
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
@@ -27098,7 +27122,7 @@ (define_insn "*avx_vperm2f128<mode>_nozero"
 	(vec_select:AVX256MODE2P
 	  (vec_concat:<ssedoublevecmode>
 	    (match_operand:AVX256MODE2P 1 "register_operand" "x")
-	    (match_operand:AVX256MODE2P 2 "nonimmediate_operand" "xm"))
+	    (match_operand:AVX256MODE2P 2 "nonimmediate_operand" "xjm"))
 	  (match_parallel 3 ""
 	    [(match_operand 4 "const_int_operand")])))]
   "TARGET_AVX
@@ -27115,6 +27139,7 @@ (define_insn "*avx_vperm2f128<mode>_nozero"
   return "vperm2<i128>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
index 1e5450dfb73..510213a6ca7 100644
--- a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
+++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
@@ -28,3 +28,109 @@ void legacy_test ()
 /* { dg-final { scan-assembler-not "xrstor64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
 /* { dg-final { scan-assembler-not "fxsave64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
 /* { dg-final { scan-assembler-not "fxrstor64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+
+#ifdef DTYPE
+#undef DTYPE
+#define DTYPE u64
+#endif
+
+typedef union
+{
+  __m128i xi[8];
+  __m128 xf[8];
+  __m128d xd[8];
+  __m256i yi[4];
+  __m256 yf[4];
+  __m256d yd[4];
+  DTYPE a[16];
+} tmp_u;
+
+__attribute__((target("sse4.2")))
+void sse_test ()
+{
+  register tmp_u *tdst __asm__("%r16");
+  register tmp_u *src1 __asm__("%r17");
+  register tmp_u *src2 __asm__("%r18");
+ 
+  src1->xi[0] = _mm_hadd_epi16 (tdst->xi[2], src2->xi[3]);
+  src1->xi[1] = _mm_hadd_epi32 (tdst->xi[0], src2->xi[1]);
+  tdst->xi[2] = _mm_hadds_epi16 (src1->xi[4], src2->xi[5]);
+  tdst->xi[3] = _mm_hsub_epi16 (src1->xi[6], src2->xi[7]);
+  tdst->xi[4] = _mm_hsub_epi32 (src1->xi[0], src2->xi[1]);
+  tdst->xi[5] = _mm_hsubs_epi16 (src1->xi[2], src2->xi[3]);
+
+  src1->xi[6] = _mm_cmpeq_epi64 (tdst->xi[4], src2->xi[5]);
+  src1->xi[7] = _mm_cmpgt_epi64 (tdst->xi[6], src2->xi[7]);
+
+  tdst->xf[0] = _mm_dp_ps (src1->xf[0], src2->xf[1], 0xbf);
+  tdst->xd[1] = _mm_dp_pd (src1->xd[2], src2->xd[3], 0xae);
+
+  tdst->xi[2] = _mm_mpsadbw_epu8 (src1->xi[4], src2->xi[5], 0xc1);
+
+  tdst->xi[3] = _mm_blend_epi16 (src1->xi[6], src2->xi[7], 0xc);
+  tdst->xi[4] = _mm_blendv_epi8 (src1->xi[0], src2->xi[1], tdst->xi[2]);
+  tdst->xf[5] = _mm_blend_ps (src1->xf[3], src2->xf[4], 0x4);
+  tdst->xf[6] = _mm_blendv_ps (src1->xf[5], src2->xf[6], tdst->xf[7]);
+  tdst->xd[7] = _mm_blend_pd (tdst->xd[0], src1->xd[1], 0x1);
+  tdst->xd[0] = _mm_blendv_pd (src1->xd[2], src2->xd[3], tdst->xd[4]);
+
+  tdst->xi[1] = _mm_sign_epi8 (src1->xi[5], src2->xi[6]);
+  tdst->xi[2] = _mm_sign_epi16 (src1->xi[7], src2->xi[0]);
+  tdst->xi[3] = _mm_sign_epi32 (src1->xi[1], src2->xi[2]);
+}
+
+__attribute__((target("avx2")))
+void vex_test ()
+{
+
+  register tmp_u *tdst __asm__("%r16");
+  register tmp_u *src1 __asm__("%r17");
+  register tmp_u *src2 __asm__("%r18");
+  
+  src1->yi[1] = _mm256_hadd_epi16 (tdst->yi[2], src2->yi[3]);
+  src1->yi[2] = _mm256_hadd_epi32 (tdst->yi[0], src2->yi[1]);
+  tdst->yi[3] = _mm256_hadds_epi16 (src1->yi[1], src2->yi[2]);
+  tdst->yi[0] = _mm256_hsub_epi16 (src1->yi[3], src2->yi[0]);
+  tdst->yi[1] = _mm256_hsub_epi32 (src1->yi[0], src2->yi[1]);
+  tdst->yi[2] = _mm256_hsubs_epi16 (src1->yi[2], src2->yi[3]);
+
+  src1->yi[2] = _mm256_cmpeq_epi64 (tdst->yi[1], src2->yi[2]);
+  src1->yi[1] = _mm256_cmpgt_epi64 (tdst->yi[3], src2->yi[0]);
+
+  tdst->yf[2] = _mm256_dp_ps (src1->yf[0], src2->yf[1], 0xbf);
+  tdst->xd[3] = _mm_dp_pd (src1->xd[0], src2->xd[1], 0xbf);
+
+  tdst->yi[3] = _mm256_mpsadbw_epu8 (src1->yi[1], src2->yi[1], 0xc1);
+
+  tdst->yi[0] = _mm256_blend_epi16 (src1->yi[1], src2->yi[2], 0xc);
+  tdst->yi[1] = _mm256_blendv_epi8 (src1->yi[1], src2->yi[2], tdst->yi[0]);
+  tdst->yf[2] = _mm256_blend_ps (src1->yf[0], src2->yf[1], 0x4);
+  tdst->yf[3] = _mm256_blendv_ps (src1->yf[2], src2->yf[3], tdst->yf[1]);
+  tdst->yd[3] = _mm256_blend_pd (tdst->yd[1], src1->yd[0], 0x1);
+  tdst->yd[1] = _mm256_blendv_pd (src1->yd[2], src2->yd[3], tdst->yd[2]);
+
+  tdst->yi[2] = _mm256_sign_epi8 (src1->yi[0], src2->yi[1]);
+  tdst->yi[3] = _mm256_sign_epi16 (src1->yi[2], src2->yi[3]);
+  tdst->yi[0] = _mm256_sign_epi32 (src1->yi[0], src2->yi[1]);
+}
+
+/* { dg-final { scan-assembler-not "v?pcmpeqq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pcmpgtq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phaddw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phaddd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phaddsw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phsubw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phsubd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phsubsw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?dpps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?dppd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?psadbw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pblendw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pblendvb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?blendps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?blendvps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?blendpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?blendvpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?psignb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?psignw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?psignd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)
  2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
                   ` (9 preceding siblings ...)
  2023-09-22 10:56 ` [PATCH 10/13] [APX EGPR] Handle legacy insns that only support GPR16 (2/5) Hongyu Wang
@ 2023-09-22 10:56 ` Hongyu Wang
  2023-09-22 10:56 ` [PATCH 12/13] [APX_EGPR] Handle legacy insns that only support GPR16 (4/5) Hongyu Wang
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

From: Kong Lingling <lingling.kong@intel.com>

Disable EGPR usage for the legacy insns below, which sit in opcode map2/3 and
have a vex encoding but no evex counterpart (a short illustrative sketch
follows the insn list).

insn list:
1. phminposuw/vphminposuw
2. ptest/vptest
3. roundps/vroundps, roundpd/vroundpd,
   roundss/vroundss, roundsd/vroundsd
4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
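
As an aside, here is a minimal sketch of how the restriction can be observed
(not part of the patch; the -mapxf/-msse4.2 flags, the function name and the
choice of %r16 are illustrative assumptions).  It mirrors what the testsuite
change below does: force the base pointer of a memory operand into an APX
extended GPR and check that the listed insns never address memory through
r16-r31.

/* Sketch only: assumes an APX-capable GCC and -mapxf -msse4.2.  Since
   ptest/vptest have no evex form, their memory operand cannot use r16-r31,
   so the compiler is expected to move the address from %r16 into a legacy
   GPR instead of emitting "ptest (%r16), ...".  */
#include <immintrin.h>

int
uses_ptest (__m128i *p)
{
  register __m128i *q __asm__ ("%r16") = p;   /* force an extended GPR */
  return _mm_testc_si128 (q[0], q[1]);
}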

gcc/ChangeLog:

	* config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
	prototype.
	* config/i386/i386.cc (x86_evex_reg_mentioned_p): New
	function.
	* config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0
	and constraint jm to all non-evex alternatives, adjust
	alternative outputs if evex reg is mentioned.
	* config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0
	and constraint jm/ja to all non-evex alternatives.
	(ptesttf2): Likewise.
	(<sse4_1>_round<ssemodesuffix><avxsizesuffix>): Likewise.
	(sse4_1_round<ssescalarmodesuffix>): Likewise.
	(sse4_2_pcmpestri): Likewise.
	(sse4_2_pcmpestrm): Likewise.
	(sse4_2_pcmpestr_cconly): Likewise.
	(sse4_2_pcmpistr): Likewise.
	(sse4_2_pcmpistri): Likewise.
	(sse4_2_pcmpistrm): Likewise.
	(sse4_2_pcmpistr_cconly): Likewise.
	(aesimc): Likewise.
	(aeskeygenassist): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
	tests.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
 gcc/config/i386/i386-protos.h                 |  1 +
 gcc/config/i386/i386.cc                       | 13 +++
 gcc/config/i386/i386.md                       |  3 +-
 gcc/config/i386/sse.md                        | 93 +++++++++++++------
 .../i386/apx-legacy-insn-check-norex2.c       | 55 ++++++++++-
 5 files changed, 132 insertions(+), 33 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index a54e3f6b1dc..28d0eab11d5 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
 extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
 extern bool x86_extended_reg_mentioned_p (rtx);
 extern bool x86_extended_rex2reg_mentioned_p (rtx);
+extern bool x86_evex_reg_mentioned_p (rtx [], int);
 extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
 extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 5d47c2af25e..58fa054635a 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -22937,6 +22937,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
   return false;
 }
 
+/* Return true if any of the NOPS elements of OPERANDS mentions a register
+   that must be encoded using the evex prefix.  */
+bool
+x86_evex_reg_mentioned_p (rtx operands[], int nops)
+{
+  int i;
+  for (i = 0; i < nops; i++)
+    if (EXT_REX_SSE_REG_P (operands[i])
+	|| x86_extended_rex2reg_mentioned_p (operands[i]))
+      return true;
+  return false;
+}
+
 /* If profitable, negate (without causing overflow) integer constant
    of mode MODE at location LOC.  Return true in this case.  */
 bool
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6cf86b798a8..271d417146c 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -21603,7 +21603,7 @@ (define_expand "significand<mode>2"
 (define_insn "sse4_1_round<mode>2"
   [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
 	(unspec:MODEFH
-	  [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
+	  [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,jm,v,m")
 	   (match_operand:SI 2 "const_0_to_15_operand")]
 	  UNSPEC_ROUND))]
   "TARGET_SSE4_1"
@@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round<mode>2"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix_extra" "1,1,1,*,*")
    (set_attr "length_immediate" "1")
+   (set_attr "gpr32" "1,1,0,1,1")
    (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
    (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
    (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a7858a7f8cf..4db3940e422 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -22610,11 +22610,12 @@ (define_insn "avx2_pblendd<mode>"
 
 (define_insn "sse4_1_phminposuw"
   [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
-	(unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,xm")]
+	(unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "Yrja,*xja,xjm")]
 		     UNSPEC_PHMINPOSUW))]
   "TARGET_SSE4_1"
   "%vphminposuw\t{%1, %0|%0, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,vex")
@@ -23803,12 +23804,13 @@ (define_insn "avx_vtest<ssemodesuffix><avxsizesuffix>"
 (define_insn "*<sse4_1>_ptest<mode>"
   [(set (reg FLAGS_REG)
 	(unspec [(match_operand:V_AVX 0 "register_operand" "Yr, *x, x")
-		 (match_operand:V_AVX 1 "vector_operand" "YrBm, *xBm, xm")]
+		 (match_operand:V_AVX 1 "vector_operand" "Yrja, *xja, xjm")]
 		UNSPEC_PTEST))]
   "TARGET_SSE4_1 && ix86_match_ptest_ccmode (insn)"
   "%vptest\t{%1, %0|%0, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecomi")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,vex")
    (set (attr "btver2_decode")
@@ -23845,12 +23847,13 @@ (define_expand "<sse4_1>_ptest<mode>"
 (define_insn "ptesttf2"
   [(set (reg:CC FLAGS_REG)
 	(unspec:CC [(match_operand:TF 0 "register_operand" "Yr, *x, x")
-		    (match_operand:TF 1 "vector_operand" "YrBm, *xBm, xm")]
+		    (match_operand:TF 1 "vector_operand" "Yrja, *xja, xjm")]
 		   UNSPEC_PTEST))]
   "TARGET_SSE4_1"
   "%vptest\t{%1, %0|%0, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecomi")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
@@ -23961,13 +23964,14 @@ (define_expand "lrint<mode><sseintvecmodelower>2"
 (define_insn "<sse4_1>_round<ssemodesuffix><avxsizesuffix>"
   [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(unspec:VF_128_256
-	  [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm,xm")
+	  [(match_operand:VF_128_256 1 "vector_operand" "Yrja,*xja,xjm")
 	   (match_operand:SI 2 "const_0_to_15_operand")]
 	  UNSPEC_ROUND))]
   "TARGET_SSE4_1"
   "%vround<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -24054,19 +24058,32 @@ (define_insn "sse4_1_round<ssescalarmodesuffix>"
   [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
-	    [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
+	    [(match_operand:VF_128 2 "nonimmediate_operand" "Yrjm,*xjm,xjm,vm")
 	     (match_operand:SI 3 "const_0_to_15_operand")]
 	    UNSPEC_ROUND)
 	  (match_operand:VF_128 1 "register_operand" "0,0,x,v")
 	  (const_int 1)))]
   "TARGET_SSE4_1"
-  "@
-   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
-   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
-   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
-   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
-  [(set_attr "isa" "noavx,noavx,avx,avx512f")
+{
+  switch (which_alternative)
+    {
+      case 0:
+      case 1:
+	return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}";
+      case 2:
+	return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
+      case 3:
+	if (x86_evex_reg_mentioned_p (operands, 3))
+	  return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
+	else
+	  return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
+      default:
+	gcc_unreachable ();
+    }
+}
+  [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
    (set_attr "type" "ssecvt")
+   (set_attr "gpr32" "0,0,0,1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_data16" "1,1,*,*")
    (set_attr "prefix_extra" "1")
@@ -24078,19 +24095,32 @@ (define_insn "*sse4_1_round<ssescalarmodesuffix>"
 	(vec_merge:VFH_128
 	  (vec_duplicate:VFH_128
 	    (unspec:<ssescalarmode>
-	      [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
+	      [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrjm,*xjm,xjm,vm")
 	       (match_operand:SI 3 "const_0_to_15_operand")]
 	      UNSPEC_ROUND))
 	  (match_operand:VFH_128 1 "register_operand" "0,0,x,v")
 	  (const_int 1)))]
   "TARGET_SSE4_1"
-  "@
-   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,noavx,avx,avx512f")
+{
+  switch (which_alternative)
+    {
+      case 0:
+      case 1:
+	return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}";
+      case 2:
+	return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+      case 3:
+	if (x86_evex_reg_mentioned_p (operands, 3) || <MODE>mode == V8HFmode)
+	  return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+	else
+	  return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+      default:
+	gcc_unreachable ();
+    }
+}
+  [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
    (set_attr "type" "ssecvt")
+   (set_attr "gpr32" "0,0,0,1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_data16" "1,1,*,*")
    (set_attr "prefix_extra" "1")
@@ -24311,7 +24341,7 @@ (define_insn "sse4_2_pcmpestri"
 	(unspec:SI
 	  [(match_operand:V16QI 1 "register_operand" "x,x")
 	   (match_operand:SI 2 "register_operand" "a,a")
-	   (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
+	   (match_operand:V16QI 3 "nonimmediate_operand" "x,jm")
 	   (match_operand:SI 4 "register_operand" "d,d")
 	   (match_operand:SI 5 "const_0_to_255_operand")]
 	  UNSPEC_PCMPESTR))
@@ -24326,6 +24356,7 @@ (define_insn "sse4_2_pcmpestri"
   "TARGET_SSE4_2"
   "%vpcmpestri\t{%5, %3, %1|%1, %3, %5}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_vex")
    (set_attr "length_immediate" "1")
@@ -24338,7 +24369,7 @@ (define_insn "sse4_2_pcmpestrm"
 	(unspec:V16QI
 	  [(match_operand:V16QI 1 "register_operand" "x,x")
 	   (match_operand:SI 2 "register_operand" "a,a")
-	   (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
+	   (match_operand:V16QI 3 "nonimmediate_operand" "x,jm")
 	   (match_operand:SI 4 "register_operand" "d,d")
 	   (match_operand:SI 5 "const_0_to_255_operand")]
 	  UNSPEC_PCMPESTR))
@@ -24353,6 +24384,7 @@ (define_insn "sse4_2_pcmpestrm"
   "TARGET_SSE4_2"
   "%vpcmpestrm\t{%5, %3, %1|%1, %3, %5}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -24365,7 +24397,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
 	(unspec:CC
 	  [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
 	   (match_operand:SI 3 "register_operand" "a,a,a,a")
-	   (match_operand:V16QI 4 "nonimmediate_operand" "x,m,x,m")
+	   (match_operand:V16QI 4 "nonimmediate_operand" "x,jm,x,jm")
 	   (match_operand:SI 5 "register_operand" "d,d,d,d")
 	   (match_operand:SI 6 "const_0_to_255_operand")]
 	  UNSPEC_PCMPESTR))
@@ -24378,6 +24410,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
    %vpcmpestri\t{%6, %4, %2|%2, %4, %6}
    %vpcmpestri\t{%6, %4, %2|%2, %4, %6}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "memory" "none,load,none,load")
@@ -24389,7 +24422,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
   [(set (match_operand:SI 0 "register_operand" "=c,c")
 	(unspec:SI
 	  [(match_operand:V16QI 2 "register_operand" "x,x")
-	   (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
+	   (match_operand:V16QI 3 "nonimmediate_operand" "x,jm")
 	   (match_operand:SI 4 "const_0_to_255_operand")]
 	  UNSPEC_PCMPISTR))
    (set (match_operand:V16QI 1 "register_operand" "=Yz,Yz")
@@ -24432,6 +24465,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
   DONE;
 }
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "memory" "none,load")
@@ -24441,7 +24475,7 @@ (define_insn "sse4_2_pcmpistri"
   [(set (match_operand:SI 0 "register_operand" "=c,c")
 	(unspec:SI
 	  [(match_operand:V16QI 1 "register_operand" "x,x")
-	   (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
+	   (match_operand:V16QI 2 "nonimmediate_operand" "x,jm")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_PCMPISTR))
    (set (reg:CC FLAGS_REG)
@@ -24453,6 +24487,7 @@ (define_insn "sse4_2_pcmpistri"
   "TARGET_SSE4_2"
   "%vpcmpistri\t{%3, %2, %1|%1, %2, %3}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -24464,7 +24499,7 @@ (define_insn "sse4_2_pcmpistrm"
   [(set (match_operand:V16QI 0 "register_operand" "=Yz,Yz")
 	(unspec:V16QI
 	  [(match_operand:V16QI 1 "register_operand" "x,x")
-	   (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
+	   (match_operand:V16QI 2 "nonimmediate_operand" "x,jm")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_PCMPISTR))
    (set (reg:CC FLAGS_REG)
@@ -24476,6 +24511,7 @@ (define_insn "sse4_2_pcmpistrm"
   "TARGET_SSE4_2"
   "%vpcmpistrm\t{%3, %2, %1|%1, %2, %3}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -24487,7 +24523,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
   [(set (reg:CC FLAGS_REG)
 	(unspec:CC
 	  [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
-	   (match_operand:V16QI 3 "nonimmediate_operand" "x,m,x,m")
+	   (match_operand:V16QI 3 "nonimmediate_operand" "x,jm,x,jm")
 	   (match_operand:SI 4 "const_0_to_255_operand")]
 	  UNSPEC_PCMPISTR))
    (clobber (match_scratch:V16QI 0 "=Yz,Yz,X,X"))
@@ -24499,6 +24535,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
    %vpcmpistri\t{%4, %3, %2|%2, %3, %4}
    %vpcmpistri\t{%4, %3, %2|%2, %3, %4}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "memory" "none,load,none,load")
@@ -25983,23 +26020,25 @@ (define_insn "aesdeclast"
 
 (define_insn "aesimc"
   [(set (match_operand:V2DI 0 "register_operand" "=x")
-	(unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")]
+	(unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xja")]
 		      UNSPEC_AESIMC))]
   "TARGET_AES"
   "%vaesimc\t{%1, %0|%0, %1}"
   [(set_attr "type" "sselog1")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
 (define_insn "aeskeygenassist"
   [(set (match_operand:V2DI 0 "register_operand" "=x")
-	(unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")
+	(unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xja")
 		      (match_operand:SI 2 "const_0_to_255_operand")]
 		     UNSPEC_AESKEYGENASSIST))]
   "TARGET_AES"
   "%vaeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sselog1")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
index 510213a6ca7..771bcb078e1 100644
--- a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
+++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
@@ -45,13 +45,22 @@ typedef union
   DTYPE a[16];
 } tmp_u;
 
-__attribute__((target("sse4.2")))
+__attribute__((target("sse4.2,aes")))
 void sse_test ()
 {
   register tmp_u *tdst __asm__("%r16");
   register tmp_u *src1 __asm__("%r17");
   register tmp_u *src2 __asm__("%r18");
- 
+
+  src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
+  src1->a[2] = _mm_testc_si128 (src1->xi[3], src2->xi[4]);
+  src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
+			      _MM_FROUND_CUR_DIRECTION);
+  src1->xf[4] = _mm_round_ps (src1->xf[7], _MM_FROUND_CUR_DIRECTION);
+  src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
+			      _MM_FROUND_CUR_DIRECTION);
+  src1->xd[1] = _mm_round_pd (src1->xd[4], _MM_FROUND_CUR_DIRECTION);
+
   src1->xi[0] = _mm_hadd_epi16 (tdst->xi[2], src2->xi[3]);
   src1->xi[1] = _mm_hadd_epi32 (tdst->xi[0], src2->xi[1]);
   tdst->xi[2] = _mm_hadds_epi16 (src1->xi[4], src2->xi[5]);
@@ -77,16 +86,33 @@ void sse_test ()
   tdst->xi[1] = _mm_sign_epi8 (src1->xi[5], src2->xi[6]);
   tdst->xi[2] = _mm_sign_epi16 (src1->xi[7], src2->xi[0]);
   tdst->xi[3] = _mm_sign_epi32 (src1->xi[1], src2->xi[2]);
+
+  tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
+  tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
+  tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
+  tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
+
+  tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
+  tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
 }
 
-__attribute__((target("avx2")))
+__attribute__((target("avx2,aes")))
 void vex_test ()
 {
 
   register tmp_u *tdst __asm__("%r16");
   register tmp_u *src1 __asm__("%r17");
   register tmp_u *src2 __asm__("%r18");
-  
+ 
+  src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
+  src1->a[2] = _mm256_testc_si256 (src1->yi[2], src2->yi[3]);
+  src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
+			      _MM_FROUND_CUR_DIRECTION);
+  src1->yf[4] = _mm256_round_ps (src1->yf[2], _MM_FROUND_CUR_DIRECTION);
+  src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
+			      _MM_FROUND_CUR_DIRECTION);
+  src1->yd[1] = _mm256_round_pd (src1->yd[3], _MM_FROUND_CUR_DIRECTION);
+ 
   src1->yi[1] = _mm256_hadd_epi16 (tdst->yi[2], src2->yi[3]);
   src1->yi[2] = _mm256_hadd_epi32 (tdst->yi[0], src2->yi[1]);
   tdst->yi[3] = _mm256_hadds_epi16 (src1->yi[1], src2->yi[2]);
@@ -98,7 +124,6 @@ void vex_test ()
   src1->yi[1] = _mm256_cmpgt_epi64 (tdst->yi[3], src2->yi[0]);
 
   tdst->yf[2] = _mm256_dp_ps (src1->yf[0], src2->yf[1], 0xbf);
-  tdst->xd[3] = _mm_dp_pd (src1->xd[0], src2->xd[1], 0xbf);
 
   tdst->yi[3] = _mm256_mpsadbw_epu8 (src1->yi[1], src2->yi[1], 0xc1);
 
@@ -112,6 +137,14 @@ void vex_test ()
   tdst->yi[2] = _mm256_sign_epi8 (src1->yi[0], src2->yi[1]);
   tdst->yi[3] = _mm256_sign_epi16 (src1->yi[2], src2->yi[3]);
   tdst->yi[0] = _mm256_sign_epi32 (src1->yi[0], src2->yi[1]);
+
+  tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
+  tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
+  tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
+  tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
+
+  tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
+  tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
 }
 
 /* { dg-final { scan-assembler-not "v?pcmpeqq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
@@ -134,3 +167,15 @@ void vex_test ()
 /* { dg-final { scan-assembler-not "v?psignb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
 /* { dg-final { scan-assembler-not "v?psignw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
 /* { dg-final { scan-assembler-not "v?psignd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phminposuw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?ptest\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?roundss\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?roundsd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?roundps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?roundpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pcmpestri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pcmpistri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pcmpestrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pcmpistrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?aesimc\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?aeskeygenassist\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 12/13] [APX_EGPR] Handle legacy insns that only support GPR16 (4/5)
  2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
                   ` (10 preceding siblings ...)
  2023-09-22 10:56 ` [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5) Hongyu Wang
@ 2023-09-22 10:56 ` Hongyu Wang
  2023-09-22 10:56 ` [PATCH 13/13] [APX EGPR] Handle vex insns that only support GPR16 (5/5) Hongyu Wang
  2023-09-25  2:02 ` [PATCH v2 00/13] Support Intel APX EGPR Hongtao Liu
  13 siblings, 0 replies; 28+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

From: Kong Lingling <lingling.kong@intel.com>

APX-enabled hardware is also expected to be AVX10-enabled, so for map2/3 insns
that do have an evex counterpart we assume the encoding is automatically
promoted to evex (and can therefore reach EGPR) under APX_F whenever the insn
uses GPR32.  For the insns below we disable EGPR usage for their sse mnemonics
while still allowing EGPR generation for their v-prefixed mnemonics (a short
illustrative sketch follows the insn list).

insn list:
1. pabsb/pabsw/pabsd
2. pextrb/pextrw/pextrd/pextrq
3. pinsrb/pinsrd/pinsrq
4. pshufb
5. extractps/insertps
6. pmaddubsw
7. pmulhrsw
8. packusdw
9. palignr
10. movntdqa
11. mpsadbw
12. pmuldq/pmulld
13. pmaxsb/pmaxsd, pminsb/pminsd
    pmaxud/pmaxuw, pminud/pminuw
14. (pmovsxbw/pmovsxbd/pmovsxbq,
     pmovsxwd/pmovsxwq, pmovsxdq
     pmovzxbw/pmovzxbd/pmovzxbq,
     pmovzxwd/pmovzxwq, pmovzxdq)
15. aesdec/aesdeclast, aesenc/aesenclast
16. pclmulqdq
17. gf2p8affineqb/gf2p8affineinvqb/gf2p8mulb
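
For illustration, a hedged sketch of the intended asymmetry (not part of the
patch; the flags, function name and %r17 register choice are assumptions):
pshufb is in the list above, so without AVX its legacy encoding must avoid
EGPR, while the v-prefixed vpshufb may still reach EGPR through evex
promotion under APX_F.

/* Sketch only: assumes an APX-capable GCC.  Build once with -mapxf -mssse3
   (legacy pshufb: the address in %r17 is expected to be reloaded into a
   legacy GPR) and once with -mapxf -mavx2 (vpshufb: %r17 may be used
   directly via the evex-promoted form).  */
#include <immintrin.h>

__m128i
shuffle_bytes (__m128i v, const __m128i *mask)
{
  register const __m128i *m __asm__ ("%r17") = mask;   /* force an EGPR */
  return _mm_shuffle_epi8 (v, m[0]);
}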

gcc/ChangeLog:

	* config/i386/i386.md (*movhi_internal): Split the pextrw
	alternative that stores an SSE register to memory (no GPR32
	support) into avx/noavx alternatives; use constraint jm and set
	attr gpr32 to 0 for the noavx alternative.
	(*mov<mode>_internal): Likewise.
	* config/i386/mmx.md (mmx_pshufbv8qi3): Change "r/m/Bm" to
	"jr/jm/ja" and set_attr gpr32 0 for noavx alternative.
	(mmx_pshufbv4qi3): Likewise.
	(*mmx_pinsrd): Likewise.
	(*mmx_pinsrb): Likewise.
	(*pinsrb): Likewise.
	(@sse4_1_insertps_<mode>): Likewise.
	(*mmx_pextrw): Split alternatives and map non-EGPR
	constraints, attr_gpr32 and attr_isa to noavx mnemonics.
	(*movv2qi_internal): Likewise.
	(*pextrw): Likewise.
	(*mmx_pextrb): Likewise.
	(*mmx_pextrb_zext): Likewise.
	(*pextrb): Likewise.
	(*pextrb_zext): Likewise.
	(vec_extractv2si_1): Likewise.
	(vec_extractv2si_1_zext): Likewise.
	* config/i386/sse.md (vi128_h_r): New mode attr for
	pinsr{bw}/pextr{bw} with reg operand.
	(*abs<mode>2): Split alternatives and %v in mnemonics, map
	non-EGPR constraints, gpr32 and isa attrs to noavx mnemonics.
	(*vec_extract<mode>): Likewise.
	(*vec_extract<mode>): Likewise for HFBF pattern.
	(*vec_extract<PEXTR_MODE12:mode>_zext): Likewise.
	(*vec_extractv4si_1): Likewise.
	(*vec_extractv4si_zext): Likewise.
	(*vec_extractv2di_1): Likewise.
	(*vec_concatv2si_sse4_1): Likewise.
	(<sse2p4_1>_pinsr<ssemodesuffix>): Likewise.
	(vec_concatv2di): Likewise.
	(*sse4_1_<code>v2qiv2di2<mask_name>_1): Likewise.
	(<ssse3_avx2>_pshufb<mode>3<mask_name>): Change "r/m/Bm" to
	"jr/jm/ja" and set_attr gpr32 0 for noavx alternative, split
	%v for avx/noavx alternatives if necessary.
	(*vec_concatv2sf_sse4_1): Likewise.
	(*sse4_1_extractps): Likewise.
	(vec_set<mode>_0): Likewise for VI4F_128.
	(*vec_setv4sf_sse4_1): Likewise.
	(@sse4_1_insertps<mode>): Likewise.
	(ssse3_pmaddubsw128): Likewise.
	(*<ssse3_avx2>_pmulhrsw<mode>3<mask_name>): Likewise.
	(<sse4_1_avx2>_packusdw<mask_name>): Likewise.
	(<ssse3_avx2>_palignr<mode>): Likewise.
	(<vi8_sse4_1_avx2_avx512>_movntdqa): Likewise.
	(<sse4_1_avx2>_mpsadbw): Likewise.
	(*sse4_1_mulv2siv2di3<mask_name>): Likewise.
	(*<sse4_1_avx2>_mul<mode>3<mask_name>): Likewise.
	(*sse4_1_<code><mode>3<mask_name>): Likewise.
	(*<code>v8hi3): Likewise.
	(*<code>v16qi3): Likewise.
	(*sse4_1_<code>v8qiv8hi2<mask_name>_1): Likewise.
	(*sse4_1_zero_extendv8qiv8hi2_3): Likewise.
	(*sse4_1_zero_extendv8qiv8hi2_4): Likewise.
	(*sse4_1_<code>v4qiv4si2<mask_name>_1): Likewise.
	(*sse4_1_<code>v4hiv4si2<mask_name>_1): Likewise.
	(*sse4_1_zero_extendv4hiv4si2_3): Likewise.
	(*sse4_1_zero_extendv4hiv4si2_4): Likewise.
	(*sse4_1_<code>v2hiv2di2<mask_name>_1): Likewise.
	(*sse4_1_<code>v2siv2di2<mask_name>_1): Likewise.
	(*sse4_1_zero_extendv2siv2di2_3): Likewise.
	(*sse4_1_zero_extendv2siv2di2_4): Likewise.
	(aesdec): Likewise.
	(aesdeclast): Likewise.
	(aesenc): Likewise.
	(aesenclast): Likewise.
	(pclmulqdq): Likewise.
	(vgf2p8affineinvqb_<mode><mask_name>): Likewise.
	(vgf2p8affineqb_<mode><mask_name>): Likewise.
	(vgf2p8mulb_<mode><mask_name>): Likewise.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
 gcc/config/i386/i386.md |  42 +++---
 gcc/config/i386/mmx.md  | 143 ++++++++++++---------
 gcc/config/i386/sse.md  | 274 ++++++++++++++++++++++++++--------------
 3 files changed, 289 insertions(+), 170 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 271d417146c..c09ee3989cb 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2868,9 +2868,9 @@ (define_peephole2
 
 (define_insn "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand"
-    "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*v,*v,*v,m")
+    "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*v,*v,*v,jm,m")
 	(match_operand:HI 1 "general_operand"
-    "r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r  ,C ,*v,m ,*v"))]
+    "r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r  ,C ,*v,m ,*x,*v"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))
    && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
@@ -2925,15 +2925,21 @@ (define_insn "*movhi_internal"
 	(cond [(eq_attr "alternative" "9,10,11,12,13")
 		  (const_string "sse2")
 	       (eq_attr "alternative" "14")
-		  (const_string "sse4")
+		  (const_string "sse4_noavx")
+	       (eq_attr "alternative" "15")
+		  (const_string "avx")
 	       ]
 	       (const_string "*")))
+   (set (attr "gpr32")
+	(if_then_else (eq_attr "alternative" "14")
+		      (const_string "0")
+		      (const_string "1")))
    (set (attr "type")
      (cond [(eq_attr "alternative" "4,5,6,7")
 	      (const_string "mskmov")
 	    (eq_attr "alternative" "8")
 	      (const_string "msklog")
-	    (eq_attr "alternative" "13,14")
+	    (eq_attr "alternative" "13,14,15")
 	      (if_then_else (match_test "TARGET_AVX512FP16")
 		(const_string "ssemov")
 		(const_string "sselog1"))
@@ -2958,7 +2964,7 @@ (define_insn "*movhi_internal"
    (set (attr "prefix")
 	(cond [(eq_attr "alternative" "4,5,6,7,8")
 		 (const_string "vex")
-	       (eq_attr "alternative" "9,10,11,12,13,14")
+	       (eq_attr "alternative" "9,10,11,12,13,14,15")
 		 (const_string "maybe_evex")
 	      ]
 	      (const_string "orig")))
@@ -2967,7 +2973,7 @@ (define_insn "*movhi_internal"
 	      (if_then_else (match_test "TARGET_AVX512FP16")
 		(const_string "HI")
 		(const_string "SI"))
-	    (eq_attr "alternative" "13,14")
+	    (eq_attr "alternative" "13,14,15")
 	      (if_then_else (match_test "TARGET_AVX512FP16")
 		(const_string "HI")
 		(const_string "TI"))
@@ -4320,9 +4326,9 @@ (define_mode_attr hfbfconstf
 
 (define_insn "*mov<mode>_internal"
  [(set (match_operand:HFBF 0 "nonimmediate_operand"
-	 "=?r,?r,?r,?m,v,v,?r,m,?v,v")
+	 "=?r,?r,?r,?m,v,v,?r,jm,m,?v,v")
        (match_operand:HFBF 1 "general_operand"
-	 "r  ,F ,m ,r<hfbfconstf>,C,v, v,v,r ,m"))]
+	 "r  ,F ,m ,r<hfbfconstf>,C,v, v,v,v,r ,m"))]
  "!(MEM_P (operands[0]) && MEM_P (operands[1]))
   && (lra_in_progress
       || reload_completed
@@ -4358,18 +4364,24 @@ (define_insn "*mov<mode>_internal"
     }
 }
   [(set (attr "isa")
-	(cond [(eq_attr "alternative" "4,5,6,8,9")
+	(cond [(eq_attr "alternative" "4,5,6,9,10")
 		 (const_string "sse2")
 	       (eq_attr "alternative" "7")
-		 (const_string "sse4")
+		 (const_string "sse4_noavx")
+	       (eq_attr "alternative" "8")
+		 (const_string "avx")
 	      ]
 	      (const_string "*")))
+   (set (attr "gpr32")
+	(if_then_else (eq_attr "alternative" "8")
+		      (const_string "0")
+		      (const_string "1")))
    (set (attr "type")
 	(cond [(eq_attr "alternative" "4")
 		 (const_string "sselog1")
-	       (eq_attr "alternative" "5,6,8")
+	       (eq_attr "alternative" "5,6,9")
 		 (const_string "ssemov")
-	       (eq_attr "alternative" "7,9")
+	       (eq_attr "alternative" "7,8,10")
 		 (if_then_else
 		   (match_test ("TARGET_AVX512FP16"))
 		   (const_string "ssemov")
@@ -4389,19 +4401,19 @@ (define_insn "*mov<mode>_internal"
 		 ]
 	      (const_string "imov")))
    (set (attr "prefix")
-	(cond [(eq_attr "alternative" "4,5,6,7,8,9")
+	(cond [(eq_attr "alternative" "4,5,6,7,8,9,10")
 		 (const_string "maybe_vex")
 	      ]
 	      (const_string "orig")))
    (set (attr "mode")
 	(cond [(eq_attr "alternative" "4")
 		 (const_string "V4SF")
-	       (eq_attr "alternative" "6,8")
+	       (eq_attr "alternative" "6,9")
 		 (if_then_else
 		   (match_test "TARGET_AVX512FP16")
 		   (const_string "HI")
 		   (const_string "SI"))
-	       (eq_attr "alternative" "7,9")
+	       (eq_attr "alternative" "7,8,10")
 		 (if_then_else
 		   (match_test "TARGET_AVX512FP16")
 		   (const_string "HI")
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index ef578222945..73809585a5d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -418,9 +418,9 @@ (define_expand "movv2qi"
 
 (define_insn "*movv2qi_internal"
   [(set (match_operand:V2QI 0 "nonimmediate_operand"
-    "=r,r,r,m ,v,v,v,m,r,v")
+    "=r,r,r,m ,v,v,v,jm,m,r,v")
 	(match_operand:V2QI 1 "general_operand"
-    "r ,C,m,rC,C,v,m,v,v,r"))]
+    "r ,C,m,rC,C,v,m,x,v,v,r"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -453,20 +453,26 @@ (define_insn "*movv2qi_internal"
     }
 }
   [(set (attr "isa")
-	(cond [(eq_attr "alternative" "6,8,9")
+	(cond [(eq_attr "alternative" "6,9,10")
 		  (const_string "sse2")
 	       (eq_attr "alternative" "7")
-		  (const_string "sse4")
+		  (const_string "sse4_noavx")
+	       (eq_attr "alternative" "8")
+		  (const_string "avx")
 	       ]
 	       (const_string "*")))
+   (set (attr "gpr32")
+	(if_then_else (eq_attr "alternative" "7")
+		      (const_string "0")
+		      (const_string "1")))
    (set (attr "type")
-     (cond [(eq_attr "alternative" "6,7")
+     (cond [(eq_attr "alternative" "6,7,8")
 	      (if_then_else (match_test "TARGET_AVX512FP16")
 		(const_string "ssemov")
 		(const_string "sselog1"))
 	    (eq_attr "alternative" "4")
 	      (const_string "sselog1")
-	    (eq_attr "alternative" "5,8,9")
+	    (eq_attr "alternative" "5,9,10")
 	      (const_string "ssemov")
 	    (match_test "optimize_function_for_size_p (cfun)")
 	      (const_string "imov")
@@ -483,16 +489,16 @@ (define_insn "*movv2qi_internal"
 	   ]
 	   (const_string "imov")))
    (set (attr "prefix")
-	(cond [(eq_attr "alternative" "4,5,6,7,8,9")
+	(cond [(eq_attr "alternative" "4,5,6,7,8,9,10")
 		 (const_string "maybe_evex")
 	      ]
 	      (const_string "orig")))
    (set (attr "mode")
-     (cond [(eq_attr "alternative" "6,7")
+     (cond [(eq_attr "alternative" "6,7,8")
 	      (if_then_else (match_test "TARGET_AVX512FP16")
 		(const_string "HI")
 		(const_string "TI"))
-	    (eq_attr "alternative" "8,9")
+	    (eq_attr "alternative" "9,10")
 	      (if_then_else (match_test "TARGET_AVX512FP16")
 		(const_string "HI")
 		(const_string "SI"))
@@ -526,9 +532,9 @@ (define_insn "*movv2qi_internal"
 	    ]
 	    (const_string "HI")))
    (set (attr "preferred_for_speed")
-     (cond [(eq_attr "alternative" "8")
+     (cond [(eq_attr "alternative" "9")
 	      (symbol_ref "TARGET_INTER_UNIT_MOVES_FROM_VEC")
-	    (eq_attr "alternative" "9")
+	    (eq_attr "alternative" "10")
 	      (symbol_ref "TARGET_INTER_UNIT_MOVES_TO_VEC")
 	   ]
 	   (symbol_ref "true")))])
@@ -1167,7 +1173,7 @@ (define_expand "vcond<mode>v2sf"
 (define_insn "@sse4_1_insertps_<mode>"
   [(set (match_operand:V2FI 0 "register_operand" "=Yr,*x,v")
 	(unspec:V2FI
-	  [(match_operand:V2FI 2 "nonimmediate_operand" "Yrm,*xm,vm")
+	  [(match_operand:V2FI 2 "nonimmediate_operand" "Yrjm,*xjm,vm")
 	   (match_operand:V2FI 1 "register_operand" "0,0,v")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_INSERTPS))]
@@ -1193,6 +1199,7 @@ (define_insn "@sse4_1_insertps_<mode>"
     }
 }
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "sselog")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
@@ -3952,7 +3959,7 @@ (define_insn "*mmx_pinsrd"
   [(set (match_operand:V2SI 0 "register_operand" "=x,Yv")
         (vec_merge:V2SI
           (vec_duplicate:V2SI
-            (match_operand:SI 2 "nonimmediate_operand" "rm,rm"))
+            (match_operand:SI 2 "nonimmediate_operand" "jrjm,rm"))
 	  (match_operand:V2SI 1 "register_operand" "0,Yv")
           (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE
@@ -3971,6 +3978,7 @@ (define_insn "*mmx_pinsrd"
     }
 }
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix_extra" "1")
    (set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
@@ -4031,7 +4039,7 @@ (define_insn "*mmx_pinsrb"
   [(set (match_operand:V8QI 0 "register_operand" "=x,YW")
         (vec_merge:V8QI
           (vec_duplicate:V8QI
-            (match_operand:QI 2 "nonimmediate_operand" "rm,rm"))
+            (match_operand:QI 2 "nonimmediate_operand" "jrjm,rm"))
 	  (match_operand:V8QI 1 "register_operand" "0,YW")
           (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE
@@ -4057,28 +4065,31 @@ (define_insn "*mmx_pinsrb"
 }
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sselog")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*mmx_pextrw"
-  [(set (match_operand:HI 0 "register_sse4nonimm_operand" "=r,r,m")
+  [(set (match_operand:HI 0 "register_sse4nonimm_operand" "=r,r,jm,m")
 	(vec_select:HI
-	  (match_operand:V4HI 1 "register_operand" "y,YW,YW")
+	  (match_operand:V4HI 1 "register_operand" "y,YW,YW,YW")
 	  (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
    && (TARGET_SSE || TARGET_3DNOW_A)"
   "@
    pextrw\t{%2, %1, %k0|%k0, %1, %2}
    %vpextrw\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextrw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "*,sse2,sse4")
-   (set_attr "mmx_isa" "native,*,*")
-   (set_attr "type" "mmxcvt,sselog1,sselog1")
+   pextrw\t{%2, %1, %0|%0, %1, %2}
+   vpextrw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,sse2,sse4_noavx,avx")
+   (set_attr "gpr32" "1,1,0,1")
+   (set_attr "mmx_isa" "native,*,*,*")
+   (set_attr "type" "mmxcvt,sselog1,sselog1,sselog1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,maybe_vex,maybe_vex")
-   (set_attr "mode" "DI,TI,TI")])
+   (set_attr "prefix" "orig,maybe_vex,maybe_vex,maybe_evex")
+   (set_attr "mode" "DI,TI,TI,TI")])
 
 (define_insn "*mmx_pextrw_zext"
   [(set (match_operand:SWI48 0 "register_operand" "=r,r")
@@ -4099,29 +4110,34 @@ (define_insn "*mmx_pextrw_zext"
    (set_attr "mode" "DI,TI")])
 
 (define_insn "*mmx_pextrb"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,m")
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=jr,jm,r,m")
 	(vec_select:QI
-	  (match_operand:V8QI 1 "register_operand" "YW,YW")
+	  (match_operand:V8QI 1 "register_operand" "YW,YW,YW,YW")
 	  (parallel [(match_operand:SI 2 "const_0_to_7_operand")])))]
   "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
   "@
-   %vpextrb\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextrb\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "sselog1")
+   pextrb\t{%2, %1, %k0|%k0, %1, %2}
+   pextrb\t{%2, %1, %0|%0, %1, %2}
+   vpextrb\t{%2, %1, %k0|%k0, %1, %2}
+   vpextrb\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,noavx,avx,avx")
+   (set_attr "gpr32" "1,0,1,1")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*mmx_pextrb_zext"
-  [(set (match_operand:SWI248 0 "register_operand" "=r")
+  [(set (match_operand:SWI248 0 "register_operand" "=jr,r")
 	(zero_extend:SWI248
 	  (vec_select:QI
-	    (match_operand:V8QI 1 "register_operand" "YW")
+	    (match_operand:V8QI 1 "register_operand" "YW,YW")
 	    (parallel [(match_operand:SI 2 "const_0_to_7_operand")]))))]
   "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
   "%vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -4131,13 +4147,14 @@ (define_insn "mmx_pshufbv8qi3"
   [(set (match_operand:V8QI 0 "register_operand" "=x,Yw")
 	(unspec:V8QI
 	  [(match_operand:V8QI 1 "register_operand" "0,Yw")
-	   (match_operand:V16QI 2 "vector_operand" "xBm,Ywm")]
+	   (match_operand:V16QI 2 "vector_operand" "xja,Ywm")]
 	  UNSPEC_PSHUFB))]
   "TARGET_SSSE3 && TARGET_MMX_WITH_SSE"
   "@
    pshufb\t{%2, %0|%0, %2}
    vpshufb\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,maybe_evex")
@@ -4148,13 +4165,14 @@ (define_insn "mmx_pshufbv4qi3"
   [(set (match_operand:V4QI 0 "register_operand" "=x,Yw")
 	(unspec:V4QI
 	  [(match_operand:V4QI 1 "register_operand" "0,Yw")
-	   (match_operand:V16QI 2 "vector_operand" "xBm,Ywm")]
+	   (match_operand:V16QI 2 "vector_operand" "xja,Ywm")]
 	  UNSPEC_PSHUFB))]
   "TARGET_SSSE3"
   "@
    pshufb\t{%2, %0|%0, %2}
    vpshufb\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,maybe_evex")
@@ -4414,29 +4432,31 @@ (define_split
 ;; Avoid combining registers from different units in a single alternative,
 ;; see comment above inline_secondary_memory_needed function in i386.cc
 (define_insn "*vec_extractv2si_1"
-  [(set (match_operand:SI 0 "nonimmediate_operand"     "=y,rm,x,x,y,x,r")
+  [(set (match_operand:SI 0 "nonimmediate_operand"     "=y,jrjm,rm,x,x,y,x,r")
 	(vec_select:SI
-	  (match_operand:V2SI 1 "nonimmediate_operand" " 0,x ,x,0,o,o,o")
+	  (match_operand:V2SI 1 "nonimmediate_operand" " 0,x,   x ,x,0,o,o,o")
 	  (parallel [(const_int 1)])))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
    && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "@
    punpckhdq\t%0, %0
-   %vpextrd\t{$1, %1, %0|%0, %1, 1}
+   pextrd\t{$1, %1, %0|%0, %1, 1}
+   vpextrd\t{$1, %1, %0|%0, %1, 1}
    %vpshufd\t{$0xe5, %1, %0|%0, %1, 0xe5}
    shufps\t{$0xe5, %0, %0|%0, %0, 0xe5}
    #
    #
    #"
-  [(set_attr "isa" "*,sse4,sse2,noavx,*,*,*")
-   (set_attr "mmx_isa" "native,*,*,*,native,*,*")
-   (set_attr "type" "mmxcvt,ssemov,sseshuf1,sseshuf1,mmxmov,ssemov,imov")
+  [(set_attr "isa" "*,sse4_noavx,avx,sse2,noavx,*,*,*")
+   (set_attr "gpr32" "1,0,1,1,1,1,1,1")
+   (set_attr "mmx_isa" "native,*,*,*,*,native,*,*")
+   (set_attr "type" "mmxcvt,ssemov,ssemov,sseshuf1,sseshuf1,mmxmov,ssemov,imov")
    (set (attr "length_immediate")
-     (if_then_else (eq_attr "alternative" "1,2,3")
+     (if_then_else (eq_attr "alternative" "1,2,3,4")
 		   (const_string "1")
 		   (const_string "*")))
-   (set_attr "prefix" "orig,maybe_vex,maybe_vex,orig,orig,orig,orig")
-   (set_attr "mode" "DI,TI,TI,V4SF,SI,SI,SI")])
+   (set_attr "prefix" "orig,orig,maybe_evex,maybe_vex,orig,orig,orig,orig")
+   (set_attr "mode" "DI,TI,TI,TI,V4SF,SI,SI,SI")])
 
 (define_split
   [(set (match_operand:SI 0 "register_operand")
@@ -4448,15 +4468,16 @@ (define_split
   "operands[1] = adjust_address (operands[1], SImode, 4);")
 
 (define_insn "*vec_extractv2si_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=jr,r")
 	(zero_extend:DI
 	  (vec_select:SI
-	    (match_operand:V2SI 1 "register_operand" "x")
+	    (match_operand:V2SI 1 "register_operand" "x,x")
 	    (parallel [(const_int 1)]))))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
    && TARGET_64BIT && TARGET_SSE4_1"
   "%vpextrd\t{$1, %1, %k0|%k0, %1, 1}"
-  [(set_attr "type" "sselog1")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -4606,7 +4627,7 @@ (define_insn "*pinsrb"
   [(set (match_operand:V4QI 0 "register_operand" "=x,YW")
         (vec_merge:V4QI
           (vec_duplicate:V4QI
-            (match_operand:QI 2 "nonimmediate_operand" "rm,rm"))
+            (match_operand:QI 2 "nonimmediate_operand" "jrjm,rm"))
 	  (match_operand:V4QI 1 "register_operand" "0,YW")
           (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE4_1
@@ -4631,6 +4652,7 @@ (define_insn "*pinsrb"
     }
 }
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -4638,15 +4660,17 @@ (define_insn "*pinsrb"
    (set_attr "mode" "TI")])
 
 (define_insn "*pextrw"
-  [(set (match_operand:HI 0 "register_sse4nonimm_operand" "=r,m")
+  [(set (match_operand:HI 0 "register_sse4nonimm_operand" "=r,jm,m")
 	(vec_select:HI
-	  (match_operand:V2HI 1 "register_operand" "YW,YW")
+	  (match_operand:V2HI 1 "register_operand" "YW,YW,YW")
 	  (parallel [(match_operand:SI 2 "const_0_to_1_operand")])))]
   "TARGET_SSE2"
   "@
    %vpextrw\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextrw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "*,sse4")
+   pextrw\t{%2, %1, %0|%0, %1, %2}
+   vpextrw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,sse4_noavx,avx")
+   (set_attr "gpr32" "1,0,1")
    (set_attr "type" "sselog1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -4666,29 +4690,34 @@ (define_insn "*pextrw_zext"
    (set_attr "mode" "TI")])
 
 (define_insn "*pextrb"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,m")
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=jr,jm,r,m")
 	(vec_select:QI
-	  (match_operand:V4QI 1 "register_operand" "YW,YW")
+	  (match_operand:V4QI 1 "register_operand" "YW,YW,YW,YW")
 	  (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
   "TARGET_SSE4_1"
   "@
-   %vpextrb\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextrb\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "sselog1")
+   pextrb\t{%2, %1, %k0|%k0, %1, %2}
+   pextrb\t{%2, %1, %0|%0, %1, %2}
+   vpextrb\t{%2, %1, %k0|%k0, %1, %2}
+   vpextrb\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,noavx,avx,avx")
+   (set_attr "gpr32" "1,0,1,1")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*pextrb_zext"
-  [(set (match_operand:SWI248 0 "register_operand" "=r")
+  [(set (match_operand:SWI248 0 "register_operand" "=jr,r")
 	(zero_extend:SWI248
 	  (vec_select:QI
-	    (match_operand:V4QI 1 "register_operand" "YW")
+	    (match_operand:V4QI 1 "register_operand" "YW,YW")
 	    (parallel [(match_operand:SI 2 "const_0_to_3_operand")]))))]
   "TARGET_SSE4_1"
   "%vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4db3940e422..d3b59c4866b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10836,7 +10836,7 @@ (define_insn "*vec_concatv2sf_sse4_1"
 	  (match_operand:SF 1 "nonimmediate_operand"
 	  "  0, 0,Yv, 0,0, v,m, 0 , m")
 	  (match_operand:SF 2 "nonimm_or_0_operand"
-	  " Yr,*x,Yv, m,m, m,C,*ym, C")))]
+	  " Yr,*x,Yv, jm,jm, m,C,*ym, C")))]
   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
    unpcklps\t{%2, %0|%0, %2}
@@ -10868,6 +10868,10 @@ (define_insn "*vec_concatv2sf_sse4_1"
      (if_then_else (eq_attr "alternative" "7,8")
 		   (const_string "native")
 		   (const_string "*")))
+   (set (attr "gpr32")
+     (if_then_else (eq_attr "alternative" "3,4")
+		   (const_string "0")
+		   (const_string "1")))
    (set (attr "prefix_data16")
      (if_then_else (eq_attr "alternative" "3,4")
 		   (const_string "1")
@@ -10959,7 +10963,7 @@ (define_insn "vec_set<mode>_0"
 	(vec_merge:VI4F_128
 	  (vec_duplicate:VI4F_128
 	    (match_operand:<ssescalarmode> 2 "general_operand"
-	  " Yr,*x,v,m,r ,m,x,v,?rm,?rm,?rm,!x,?re,!*fF"))
+	  " Yr,*x,v,m,r ,m,x,v,?jrjm,?jrjm,?rm,!x,?re,!*fF"))
 	  (match_operand:VI4F_128 1 "nonimm_or_0_operand"
 	  " C , C,C,C,C ,C,0,v,0  ,0  ,x  ,0 ,0   ,0")
 	  (const_int 1)))]
@@ -10999,6 +11003,10 @@ (define_insn "vec_set<mode>_0"
 	      (const_string "fmov")
 	   ]
 	   (const_string "ssemov")))
+   (set (attr "gpr32")
+     (if_then_else (eq_attr "alternative" "8,9")
+		   (const_string "0")
+		   (const_string "1")))
    (set (attr "prefix_extra")
      (if_then_else (eq_attr "alternative" "8,9,10")
 		   (const_string "1")
@@ -11169,7 +11177,7 @@ (define_insn "*vec_setv4sf_sse4_1"
   [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (match_operand:SF 2 "nonimmediate_operand" "Yrm,*xm,vm"))
+	    (match_operand:SF 2 "nonimmediate_operand" "Yrjm,*xjm,vm"))
 	  (match_operand:V4SF 1 "register_operand" "0,0,v")
 	  (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE4_1
@@ -11190,6 +11198,7 @@ (define_insn "*vec_setv4sf_sse4_1"
 }
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -11264,7 +11273,7 @@ (define_insn_and_split "*vec_setv2di_0_zero_extendsi_1"
 (define_insn "@sse4_1_insertps_<mode>"
   [(set (match_operand:VI4F_128 0 "register_operand" "=Yr,*x,v")
 	(unspec:VI4F_128
-	  [(match_operand:VI4F_128 2 "nonimmediate_operand" "Yrm,*xm,vm")
+	  [(match_operand:VI4F_128 2 "nonimmediate_operand" "Yrjm,*xjm,vm")
 	   (match_operand:VI4F_128 1 "register_operand" "0,0,v")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_INSERTPS))]
@@ -11290,6 +11299,7 @@ (define_insn "@sse4_1_insertps_<mode>"
     }
 }
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "sselog")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
@@ -11367,7 +11377,7 @@ (define_insn_and_split "*vec_extractv4sf_0"
   "operands[1] = gen_lowpart (SFmode, operands[1]);")
 
 (define_insn_and_split "*sse4_1_extractps"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,Yv,Yv")
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=jrjm,jrjm,rm,Yv,Yv")
 	(vec_select:SF
 	  (match_operand:V4SF 1 "register_operand" "Yr,*x,v,0,v")
 	  (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
@@ -11401,6 +11411,7 @@ (define_insn_and_split "*sse4_1_extractps"
   DONE;
 }
   [(set_attr "isa" "noavx,noavx,avx,noavx,avx")
+   (set_attr "gpr32" "0,0,1,1,1")
    (set_attr "type" "sselog,sselog,sselog,*,*")
    (set_attr "prefix_data16" "1,1,1,*,*")
    (set_attr "prefix_extra" "1,1,1,*,*")
@@ -12265,9 +12276,9 @@ (define_insn_and_split "*vec_extract<mode>_0"
   "operands[1] = gen_lowpart (<ssescalarmode>mode, operands[1]);")
 
 (define_insn "*vec_extract<mode>"
-  [(set (match_operand:HFBF 0 "register_sse4nonimm_operand" "=?r,m,x,v")
+  [(set (match_operand:HFBF 0 "register_sse4nonimm_operand" "=?r,jm,m,x,v")
 	(vec_select:HFBF
-	  (match_operand:<ssevecmode> 1 "register_operand" "v,v,0,v")
+	  (match_operand:<ssevecmode> 1 "register_operand" "v,x,v,0,v")
 	  (parallel
 	    [(match_operand:SI 2 "const_0_to_7_operand")])))]
   "TARGET_SSE2"
@@ -12277,12 +12288,14 @@ (define_insn "*vec_extract<mode>"
     case 0:
       return "%vpextrw\t{%2, %1, %k0|%k0, %1, %2}";
     case 1:
-      return "%vpextrw\t{%2, %1, %0|%0, %1, %2}";
-
+      return "pextrw\t{%2, %1, %0|%0, %1, %2}";
     case 2:
+      return "vpextrw\t{%2, %1, %0|%0, %1, %2}";
+
+    case 3:
       operands[2] = GEN_INT (INTVAL (operands[2]) * 2);
       return "psrldq\t{%2, %0|%0, %2}";
-    case 3:
+    case 4:
       operands[2] = GEN_INT (INTVAL (operands[2]) * 2);
       return "vpsrldq\t{%2, %1, %0|%0, %1, %2}";
 
@@ -12290,8 +12303,9 @@ (define_insn "*vec_extract<mode>"
       gcc_unreachable ();
    }
 }
-  [(set_attr "isa" "*,sse4,noavx,avx")
-   (set_attr "type" "sselog1,sselog1,sseishft1,sseishft1")
+  [(set_attr "isa" "*,sse4_noavx,avx,noavx,avx")
+   (set_attr "gpr32" "1,0,1,1,1")
+   (set_attr "type" "sselog1,sselog1,sselog1,sseishft1,sseishft1")
    (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "TI")])
 
@@ -15653,7 +15667,7 @@ (define_insn "*sse4_1_mulv2siv2di3<mask_name>"
 	      (parallel [(const_int 0) (const_int 2)])))
 	  (sign_extend:V2DI
 	    (vec_select:V2SI
-	      (match_operand:V4SI 2 "vector_operand" "YrBm,*xBm,vm")
+	      (match_operand:V4SI 2 "vector_operand" "Yrja,*xja,vm")
 	      (parallel [(const_int 0) (const_int 2)])))))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>
    && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
@@ -15662,6 +15676,7 @@ (define_insn "*sse4_1_mulv2siv2di3<mask_name>"
    pmuldq\t{%2, %0|%0, %2}
    vpmuldq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,vex")
@@ -15899,7 +15914,7 @@ (define_insn "*<sse4_1_avx2>_mul<mode>3<mask_name>"
   [(set (match_operand:VI4_AVX512F 0 "register_operand" "=Yr,*x,v")
 	(mult:VI4_AVX512F
 	  (match_operand:VI4_AVX512F 1 "bcst_vector_operand" "%0,0,v")
-	  (match_operand:VI4_AVX512F 2 "bcst_vector_operand" "YrBm,*xBm,vmBr")))]
+	  (match_operand:VI4_AVX512F 2 "bcst_vector_operand" "Yrja,*xja,vmBr")))]
   "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, <MODE>mode, operands)
   && <mask_mode512bit_condition>"
   "@
@@ -15907,6 +15922,7 @@ (define_insn "*<sse4_1_avx2>_mul<mode>3<mask_name>"
    pmulld\t{%2, %0|%0, %2}
    vpmulld\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "<bcst_mask_prefix4>")
@@ -16711,7 +16727,7 @@ (define_insn "*sse4_1_<code><mode>3<mask_name>"
   [(set (match_operand:VI14_128 0 "register_operand" "=Yr,*x,<v_Yw>")
 	(smaxmin:VI14_128
 	  (match_operand:VI14_128 1 "vector_operand" "%0,0,<v_Yw>")
-	  (match_operand:VI14_128 2 "vector_operand" "YrBm,*xBm,<v_Yw>m")))]
+	  (match_operand:VI14_128 2 "vector_operand" "Yrja,*xja,<v_Yw>m")))]
   "TARGET_SSE4_1
    && <mask_mode512bit_condition>
    && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
@@ -16722,6 +16738,7 @@ (define_insn "*sse4_1_<code><mode>3<mask_name>"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseiadd")
    (set_attr "prefix_extra" "1")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
@@ -16729,13 +16746,14 @@ (define_insn "*<code>v8hi3"
   [(set (match_operand:V8HI 0 "register_operand" "=x,Yw")
 	(smaxmin:V8HI
 	  (match_operand:V8HI 1 "vector_operand" "%0,Yw")
-	  (match_operand:V8HI 2 "vector_operand" "xBm,Ywm")))]
+	  (match_operand:V8HI 2 "vector_operand" "xja,Ywm")))]
   "TARGET_SSE2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
    p<maxmin_int>w\t{%2, %0|%0, %2}
    vp<maxmin_int>w\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
 
@@ -16803,6 +16821,7 @@ (define_insn "*sse4_1_<code><mode>3<mask_name>"
    vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "prefix_extra" "1,1,*")
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
@@ -16811,12 +16830,13 @@ (define_insn "*<code>v16qi3"
   [(set (match_operand:V16QI 0 "register_operand" "=x,Yw")
 	(umaxmin:V16QI
 	  (match_operand:V16QI 1 "vector_operand" "%0,Yw")
-	  (match_operand:V16QI 2 "vector_operand" "xBm,Ywm")))]
+	  (match_operand:V16QI 2 "vector_operand" "xja,Ywm")))]
   "TARGET_SSE2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
    p<maxmin_int>b\t{%2, %0|%0, %2}
    vp<maxmin_int>b\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sseiadd")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
@@ -18808,7 +18828,7 @@ (define_insn "<sse2p4_1>_pinsr<ssemodesuffix>"
   [(set (match_operand:PINSR_MODE 0 "register_operand" "=x,x,x,x,v,v,&x")
 	(vec_merge:PINSR_MODE
 	  (vec_duplicate:PINSR_MODE
-	    (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "r,m,r,m,r,m,x"))
+	    (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "jr,jm,r,m,r,m,x"))
 	  (match_operand:PINSR_MODE 1 "register_operand" "0,0,x,x,v,v,x")
 	  (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE2
@@ -18845,6 +18865,7 @@ (define_insn "<sse2p4_1>_pinsr<ssemodesuffix>"
 }
   [(set_attr "isa" "noavx,noavx,avx,avx,<pinsr_evex_isa>,<pinsr_evex_isa>,avx2")
    (set_attr "type" "sselog")
+   (set_attr "gpr32" "0,0,1,1,1,1,1")
    (set (attr "prefix_rex")
      (if_then_else
        (and (not (match_test "TARGET_AVX"))
@@ -20005,17 +20026,23 @@ (define_insn_and_split "*vec_extract<mode>_0_mem"
   operands[4] = gen_lowpart (<ssescalarmode>mode, operands[2]);
 })
 
+(define_mode_attr vi128_jr_r
+ [(V16QI "jr") (V8HI "r")])
+
 (define_insn "*vec_extract<mode>"
-  [(set (match_operand:<ssescalarmode> 0 "register_sse4nonimm_operand" "=r,m")
+  [(set (match_operand:<ssescalarmode> 0 "register_sse4nonimm_operand" "=<vi128_jr_r>,r,jm,m")
 	(vec_select:<ssescalarmode>
-	  (match_operand:PEXTR_MODE12 1 "register_operand" "YW,YW")
+	  (match_operand:PEXTR_MODE12 1 "register_operand" "YW,YW,YW,YW")
 	  (parallel
 	    [(match_operand:SI 2 "const_0_to_<ssescalarnummask>_operand")])))]
   "TARGET_SSE2"
   "@
-   %vpextr<ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextr<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "*,sse4")
+   pextr<ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}
+   vpextr<ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}
+   pextr<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
+   vpextr<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "sse2_noavx,avx,sse4_noavx,avx")
+   (set_attr "gpr32" "1,1,0,1")
    (set_attr "type" "sselog1")
    (set (attr "prefix_extra")
      (if_then_else
@@ -20023,20 +20050,21 @@ (define_insn "*vec_extract<mode>"
        (const_string "*")
        (const_string "1")))
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex,maybe_vex")
+   (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*vec_extract<PEXTR_MODE12:mode>_zext"
-  [(set (match_operand:SWI48 0 "register_operand" "=r")
+  [(set (match_operand:SWI48 0 "register_operand" "=<vi128_jr_r>,r")
 	(zero_extend:SWI48
 	  (vec_select:<PEXTR_MODE12:ssescalarmode>
-	    (match_operand:PEXTR_MODE12 1 "register_operand" "YW")
+	    (match_operand:PEXTR_MODE12 1 "register_operand" "YW,YW")
 	    (parallel
 	      [(match_operand:SI 2
 		"const_0_to_<PEXTR_MODE12:ssescalarnummask>_operand")]))))]
   "TARGET_SSE2"
   "%vpextr<PEXTR_MODE12:ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog1")
    (set (attr "prefix_extra")
      (if_then_else
        (eq (const_string "<PEXTR_MODE12:MODE>mode") (const_string "V8HImode"))
@@ -20047,15 +20075,16 @@ (define_insn "*vec_extract<PEXTR_MODE12:mode>_zext"
    (set_attr "mode" "TI")])
 
 (define_insn "*vec_extractv16qi_zext"
-  [(set (match_operand:HI 0 "register_operand" "=r")
+  [(set (match_operand:HI 0 "register_operand" "=jr,r")
 	(zero_extend:HI
 	  (vec_select:QI
-	    (match_operand:V16QI 1 "register_operand" "YW")
+	    (match_operand:V16QI 1 "register_operand" "YW,YW")
 	    (parallel
 	      [(match_operand:SI 2 "const_0_to_15_operand")]))))]
   "TARGET_SSE4_1"
   "%vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -20161,24 +20190,26 @@ (define_split
   "operands[1] = gen_lowpart (SImode, operands[1]);")
 
 (define_insn "*vec_extractv4si"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,rm,Yr,*x,Yw")
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=jrjm,rm,rm,Yr,*x,Yw")
 	(vec_select:SI
-	  (match_operand:V4SI 1 "register_operand" "  x, v, 0, 0,Yw")
+	  (match_operand:V4SI 1 "register_operand" "x,    x, v, 0, 0, Yw")
 	  (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
   "TARGET_SSE4_1"
 {
   switch (which_alternative)
     {
     case 0:
+      return "pextrd\t{%2, %1, %0|%0, %1, %2}";
     case 1:
-      return "%vpextrd\t{%2, %1, %0|%0, %1, %2}";
-
     case 2:
+      return "vpextrd\t{%2, %1, %0|%0, %1, %2}";
+
     case 3:
+    case 4:
       operands[2] = GEN_INT (INTVAL (operands[2]) * 4);
       return "psrldq\t{%2, %0|%0, %2}";
 
-    case 4:
+    case 5:
       operands[2] = GEN_INT (INTVAL (operands[2]) * 4);
       return "vpsrldq\t{%2, %1, %0|%0, %1, %2}";
 
@@ -20186,25 +20217,26 @@ (define_insn "*vec_extractv4si"
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "*,avx512dq,noavx,noavx,avx")
-   (set_attr "type" "sselog1,sselog1,sseishft1,sseishft1,sseishft1")
+  [(set_attr "isa" "noavx,avx,avx512dq,noavx,noavx,avx")
+   (set_attr "type" "sselog1,sselog1,sselog1,sseishft1,sseishft1,sseishft1")
+   (set_attr "gpr32" "0,1,1,1,1,1")
    (set (attr "prefix_extra")
      (if_then_else (eq_attr "alternative" "0,1")
 		   (const_string "1")
 		   (const_string "*")))
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex,evex,orig,orig,maybe_vex")
+   (set_attr "prefix" "orig,vex,evex,orig,orig,maybe_vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*vec_extractv4si_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI 0 "register_operand" "=jr,r,r")
 	(zero_extend:DI
 	  (vec_select:SI
-	    (match_operand:V4SI 1 "register_operand" "x,v")
+	    (match_operand:V4SI 1 "register_operand" "x,x,v")
 	    (parallel [(match_operand:SI 2 "const_0_to_3_operand")]))))]
   "TARGET_64BIT && TARGET_SSE4_1"
   "%vpextrd\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "isa" "*,avx512dq")
+  [(set_attr "isa" "noavx,avx,avx512dq")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -20234,13 +20266,14 @@ (define_insn_and_split "*vec_extractv4si_zext_mem"
 })
 
 (define_insn "*vec_extractv2di_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand"     "=rm,rm,m,x,x,Yv,x,v,r")
+  [(set (match_operand:DI 0 "nonimmediate_operand"     "=jrjm,rm,rm,m,x,x,Yv,x,v,r")
 	(vec_select:DI
-	  (match_operand:V2DI 1 "nonimmediate_operand"  "x ,v ,v,0,x, v,x,o,o")
+	  (match_operand:V2DI 1 "nonimmediate_operand"  "x,   x ,v ,v,0,x, v,x,o,o")
 	  (parallel [(const_int 1)])))]
   "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "@
-   %vpextrq\t{$1, %1, %0|%0, %1, 1}
+   pextrq\t{$1, %1, %0|%0, %1, 1}
+   vpextrq\t{$1, %1, %0|%0, %1, 1}
    vpextrq\t{$1, %1, %0|%0, %1, 1}
    %vmovhps\t{%1, %0|%0, %1}
    psrldq\t{$8, %0|%0, 8}
@@ -20251,44 +20284,47 @@ (define_insn "*vec_extractv2di_1"
    #"
   [(set (attr "isa")
      (cond [(eq_attr "alternative" "0")
-	      (const_string "x64_sse4")
+	      (const_string "x64_sse4_noavx")
 	    (eq_attr "alternative" "1")
+	      (const_string "x64_avx")
+	    (eq_attr "alternative" "2")
 	      (const_string "x64_avx512dq")
-	    (eq_attr "alternative" "3")
-	      (const_string "sse2_noavx")
 	    (eq_attr "alternative" "4")
-	      (const_string "avx")
+	      (const_string "sse2_noavx")
 	    (eq_attr "alternative" "5")
-	      (const_string "avx512bw")
+	      (const_string "avx")
 	    (eq_attr "alternative" "6")
-	      (const_string "noavx")
+	      (const_string "avx512bw")
 	    (eq_attr "alternative" "8")
+	      (const_string "noavx")
+	    (eq_attr "alternative" "9")
 	      (const_string "x64")
 	   ]
 	   (const_string "*")))
    (set (attr "type")
-     (cond [(eq_attr "alternative" "2,6,7")
+     (cond [(eq_attr "alternative" "3,7,8")
 	      (const_string "ssemov")
-	    (eq_attr "alternative" "3,4,5")
+	    (eq_attr "alternative" "4,5,6")
 	      (const_string "sseishft1")
-	    (eq_attr "alternative" "8")
+	    (eq_attr "alternative" "9")
 	      (const_string "imov")
 	   ]
 	   (const_string "sselog1")))
+   (set_attr "gpr32" "0,1,1,1,1,1,1,1,1,1")
    (set (attr "length_immediate")
-     (if_then_else (eq_attr "alternative" "0,1,3,4,5")
+     (if_then_else (eq_attr "alternative" "0,1,2,4,5,6")
 		   (const_string "1")
 		   (const_string "*")))
    (set (attr "prefix_rex")
-     (if_then_else (eq_attr "alternative" "0,1")
+     (if_then_else (eq_attr "alternative" "0")
 		   (const_string "1")
 		   (const_string "*")))
    (set (attr "prefix_extra")
-     (if_then_else (eq_attr "alternative" "0,1")
+     (if_then_else (eq_attr "alternative" "0")
 		   (const_string "1")
 		   (const_string "*")))
-   (set_attr "prefix" "maybe_vex,evex,maybe_vex,orig,vex,evex,orig,*,*")
-   (set_attr "mode" "TI,TI,V2SF,TI,TI,TI,V4SF,DI,DI")])
+   (set_attr "prefix" "orig,maybe_evex,evex,maybe_vex,orig,vex,evex,orig,*,*")
+   (set_attr "mode" "TI,TI,TI,V2SF,TI,TI,TI,V4SF,DI,DI")])
 
 (define_split
   [(set (match_operand:<ssescalarmode> 0 "register_operand")
@@ -20406,7 +20442,7 @@ (define_insn "*vec_concatv2si_sse4_1"
 	  (match_operand:SI 1 "nonimmediate_operand"
 	  "  0, 0, x,Yv, 0, 0,Yv,rm,  0,rm")
 	  (match_operand:SI 2 "nonimm_or_0_operand"
-	  " rm,rm,rm,rm,Yr,*x,Yv, C,*ym, C")))]
+	  "jrjm,jrjm,rm,rm,Yr,*x,Yv, C,*ym, C")))]
   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
    pinsrd\t{$1, %2, %0|%0, %2, 1}
@@ -20433,6 +20469,10 @@ (define_insn "*vec_concatv2si_sse4_1"
 	      (const_string "mmxmov")
 	   ]
 	   (const_string "sselog")))
+   (set (attr "gpr32")
+     (if_then_else (eq_attr "alternative" "0,1")
+		   (const_string "0")
+		   (const_string "1")))
    (set (attr "prefix_extra")
      (if_then_else (eq_attr "alternative" "0,1,2,3")
 		   (const_string "1")
@@ -20557,7 +20597,7 @@ (define_insn "vec_concatv2di"
 	  (match_operand:DI 1 "register_operand"
 	  "  0, 0,x ,Yv,0,Yv,0,0,v")
 	  (match_operand:DI 2 "nonimmediate_operand"
-	  " rm,rm,rm,rm,x,Yv,x,m,m")))]
+	  " jrm,jrm,rm,rm,x,Yv,x,m,m")))]
   "TARGET_SSE"
   "@
    pinsrq\t{$1, %2, %0|%0, %2, 1}
@@ -20587,6 +20627,10 @@ (define_insn "vec_concatv2di"
        (eq_attr "alternative" "0,1,2,3,4,5")
        (const_string "sselog")
        (const_string "ssemov")))
+   (set (attr "gpr32")
+     (if_then_else (eq_attr "alternative" "0,1")
+		   (const_string "0")
+		   (const_string "1")))
    (set (attr "prefix_rex")
      (if_then_else (eq_attr "alternative" "0,1,2,3")
 		   (const_string "1")
@@ -21519,7 +21563,7 @@ (define_insn "ssse3_pmaddubsw128"
 			   (const_int 12) (const_int 14)])))
 	    (sign_extend:V8HI
 	      (vec_select:V8QI
-		(match_operand:V16QI 2 "vector_operand" "xBm,Ywm")
+		(match_operand:V16QI 2 "vector_operand" "xja,Ywm")
 		(parallel [(const_int 0) (const_int 2)
 			   (const_int 4) (const_int 6)
 			   (const_int 8) (const_int 10)
@@ -21542,6 +21586,7 @@ (define_insn "ssse3_pmaddubsw128"
    pmaddubsw\t{%2, %0|%0, %2}
    vpmaddubsw\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sseiadd")
    (set_attr "atom_unit" "simul")
    (set_attr "prefix_extra" "1")
@@ -21660,7 +21705,7 @@ (define_insn "*<ssse3_avx2>_pmulhrsw<mode>3<mask_name>"
 		  (sign_extend:<ssedoublemode>
 		    (match_operand:VI2_AVX2_AVX512BW 1 "vector_operand" "%0,<v_Yw>"))
 		  (sign_extend:<ssedoublemode>
-		    (match_operand:VI2_AVX2_AVX512BW 2 "vector_operand" "xBm,<v_Yw>m")))
+		    (match_operand:VI2_AVX2_AVX512BW 2 "vector_operand" "xja,<v_Yw>m")))
 		(const_int 14))
 	      (match_operand:VI2_AVX2_AVX512BW 3 "const1_operand"))
 	    (const_int 1))))]
@@ -21670,6 +21715,7 @@ (define_insn "*<ssse3_avx2>_pmulhrsw<mode>3<mask_name>"
    pmulhrsw\t{%2, %0|%0, %2}
    vpmulhrsw\t{%2, %1, %0<mask_operand4>|%0<mask_operand4>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,maybe_evex")
@@ -21786,13 +21832,14 @@ (define_insn "<ssse3_avx2>_pshufb<mode>3<mask_name>"
   [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,<v_Yw>")
 	(unspec:VI1_AVX512
 	  [(match_operand:VI1_AVX512 1 "register_operand" "0,<v_Yw>")
-	   (match_operand:VI1_AVX512 2 "vector_operand" "xBm,<v_Yw>m")]
+	   (match_operand:VI1_AVX512 2 "vector_operand" "xja,<v_Yw>m")]
 	  UNSPEC_PSHUFB))]
   "TARGET_SSSE3 && <mask_mode512bit_condition> && <mask_avx512bw_condition>"
   "@
    pshufb\t{%2, %0|%0, %2}
    vpshufb\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,maybe_evex")
@@ -21908,7 +21955,7 @@ (define_insn "<ssse3_avx2>_palignr<mode>"
   [(set (match_operand:VIMAX_AVX2_AVX512BW 0 "register_operand" "=x,<v_Yw>")
 	(unspec:VIMAX_AVX2_AVX512BW
 	  [(match_operand:VIMAX_AVX2_AVX512BW 1 "register_operand" "0,<v_Yw>")
-	   (match_operand:VIMAX_AVX2_AVX512BW 2 "vector_operand" "xBm,<v_Yw>m")
+	   (match_operand:VIMAX_AVX2_AVX512BW 2 "vector_operand" "xja,<v_Yw>m")
 	   (match_operand:SI 3 "const_0_to_255_mul_8_operand")]
 	  UNSPEC_PALIGNR))]
   "TARGET_SSSE3"
@@ -21926,6 +21973,7 @@ (define_insn "<ssse3_avx2>_palignr<mode>"
     }
 }
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sseishft")
    (set_attr "atom_unit" "sishuf")
    (set_attr "prefix_extra" "1")
@@ -22000,6 +22048,7 @@ (define_insn_and_split "ssse3_palignrdi"
 }
   [(set_attr "mmx_isa" "native,sse_noavx,avx")
    (set_attr "type" "sseishft")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "atom_unit" "sishuf")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -22015,12 +22064,14 @@ (define_mode_iterator VI1248_AVX512VL_AVX512BW
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
 (define_insn "*abs<mode>2"
-  [(set (match_operand:VI1248_AVX512VL_AVX512BW 0 "register_operand" "=<v_Yw>")
+  [(set (match_operand:VI1248_AVX512VL_AVX512BW 0 "register_operand" "=x,<v_Yw>")
 	(abs:VI1248_AVX512VL_AVX512BW
-	  (match_operand:VI1248_AVX512VL_AVX512BW 1 "vector_operand" "<v_Yw>Bm")))]
+	  (match_operand:VI1248_AVX512VL_AVX512BW 1 "vector_operand" "xja,<v_Yw>Bm")))]
   "TARGET_SSSE3"
   "%vpabs<ssemodesuffix>\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sselog1")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -22358,11 +22409,12 @@ (define_mode_attr vi8_sse4_1_avx2_avx512
 
 (define_insn "<vi8_sse4_1_avx2_avx512>_movntdqa"
   [(set (match_operand:VI8_AVX2_AVX512F 0 "register_operand" "=Yr,*x,v")
-	(unspec:VI8_AVX2_AVX512F [(match_operand:VI8_AVX2_AVX512F 1 "memory_operand" "m,m,m")]
+	(unspec:VI8_AVX2_AVX512F [(match_operand:VI8_AVX2_AVX512F 1 "memory_operand" "jm,jm,m")]
 		     UNSPEC_MOVNTDQA))]
   "TARGET_SSE4_1"
   "%vmovntdqa\t{%1, %0|%0, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,maybe_evex")
@@ -22381,6 +22433,7 @@ (define_insn "<sse4_1_avx2>_mpsadbw"
    mpsadbw\t{%3, %2, %0|%0, %2, %3}
    vmpsadbw\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "sselog1")
    (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
@@ -22394,7 +22447,7 @@ (define_insn "<sse4_1_avx2>_packusdw<mask_name>"
   [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand" "=Yr,*x,<v_Yw>")
 	(unspec:VI2_AVX2_AVX512BW
 	  [(match_operand:<sseunpackmode> 1 "register_operand" "0,0,<v_Yw>")
-	   (match_operand:<sseunpackmode> 2 "vector_operand" "YrBm,*xBm,<v_Yw>m")]
+	   (match_operand:<sseunpackmode> 2 "vector_operand" "Yrja,*xja,<v_Yw>m")]
 	   UNSPEC_US_TRUNCATE))]
   "TARGET_SSE4_1 && <mask_mode512bit_condition> && <mask_avx512bw_condition>"
   "@
@@ -22402,6 +22455,7 @@ (define_insn "<sse4_1_avx2>_packusdw<mask_name>"
    packusdw\t{%2, %0|%0, %2}
    vpackusdw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,<mask_prefix>")
@@ -22748,10 +22802,14 @@ (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
 (define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>_1"
   [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,Yw")
 	(any_extend:V8HI
-	  (match_operand:V8QI 1 "memory_operand" "m,m,m")))]
+	  (match_operand:V8QI 1 "memory_operand" "jm,jm,m")))]
   "TARGET_SSE4_1 && <mask_avx512bw_condition> && <mask_avx512vl_condition>"
-  "%vpmov<extsuffix>bw\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "@
+   pmov<extsuffix>bw\t{%1, %0|%0, %1}
+   pmov<extsuffix>bw\t{%1, %0|%0, %1}
+   vpmov<extsuffix>bw\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,maybe_evex")
@@ -22781,7 +22839,7 @@ (define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_3"
   [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,Yw")
 	(vec_select:V16QI
 	  (vec_concat:V32QI
-	    (match_operand:V16QI 1 "vector_operand" "YrBm,*xBm,Ywm")
+	    (match_operand:V16QI 1 "vector_operand" "Yrja,*xja,Ywm")
 	    (match_operand:V16QI 2 "const0_operand"))
 	  (match_parallel 3 "pmovzx_parallel"
 	    [(match_operand 4 "const_int_operand")])))]
@@ -22806,7 +22864,8 @@ (define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_3"
       DONE;
     }
 }
-  [(set_attr "isa" "noavx,noavx,avx")])
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")])
 
 (define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_4"
   [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,Yw")
@@ -22814,7 +22873,7 @@ (define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_4"
 	  (vec_concat:V32QI
 	    (subreg:V16QI
 	      (vec_concat:VI248_128
-		(match_operand:<ssehalfvecmode> 1 "vector_operand" "YrBm,*xBm,Ywm")
+		(match_operand:<ssehalfvecmode> 1 "vector_operand" "Yrja,*xja,Ywm")
 		(match_operand:<ssehalfvecmode> 2 "const0_operand")) 0)
 	    (match_operand:V16QI 3 "const0_operand"))
 	  (match_parallel 4 "pmovzx_parallel"
@@ -22841,7 +22900,8 @@ (define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_4"
     }
   operands[1] = lowpart_subreg (V16QImode, operands[1], <ssehalfvecmode>mode);
 }
-  [(set_attr "isa" "noavx,noavx,avx")])
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")])
 
 (define_expand "<insn>v8qiv8hi2"
   [(set (match_operand:V8HI 0 "register_operand")
@@ -22960,10 +23020,11 @@ (define_insn "sse4_1_<code>v4qiv4si2<mask_name>"
 (define_insn "*sse4_1_<code>v4qiv4si2<mask_name>_1"
   [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
 	(any_extend:V4SI
-	  (match_operand:V4QI 1 "memory_operand" "m,m,m")))]
+	  (match_operand:V4QI 1 "memory_operand" "jm,jm,m")))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
   "%vpmov<extsuffix>bd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,maybe_evex")
@@ -23132,10 +23193,11 @@ (define_insn "sse4_1_<code>v4hiv4si2<mask_name>"
 (define_insn "*sse4_1_<code>v4hiv4si2<mask_name>_1"
   [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
 	(any_extend:V4SI
-	  (match_operand:V4HI 1 "memory_operand" "m,m,m")))]
+	  (match_operand:V4HI 1 "memory_operand" "jm,jm,m")))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
   "%vpmov<extsuffix>wd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,maybe_evex")
@@ -23184,7 +23246,7 @@ (define_insn_and_split "*sse4_1_zero_extendv4hiv4si2_3"
   [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
 	(vec_select:V8HI
 	  (vec_concat:V16HI
-	    (match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,vm")
+	    (match_operand:V8HI 1 "vector_operand" "Yrja,*xja,vm")
 	    (match_operand:V8HI 2 "const0_operand"))
 	  (match_parallel 3 "pmovzx_parallel"
 	    [(match_operand 4 "const_int_operand")])))]
@@ -23207,7 +23269,8 @@ (define_insn_and_split "*sse4_1_zero_extendv4hiv4si2_3"
       DONE;
     }
 }
-  [(set_attr "isa" "noavx,noavx,avx")])
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")])
 
 (define_insn_and_split "*sse4_1_zero_extendv4hiv4si2_4"
   [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
@@ -23215,7 +23278,7 @@ (define_insn_and_split "*sse4_1_zero_extendv4hiv4si2_4"
 	  (vec_concat:V16HI
 	    (subreg:V8HI
 	      (vec_concat:VI148_128
-		(match_operand:<ssehalfvecmode> 1 "vector_operand" "YrBm,*xBm,vm")
+		(match_operand:<ssehalfvecmode> 1 "vector_operand" "Yrja,*xja,vm")
 		(match_operand:<ssehalfvecmode> 2 "const0_operand")) 0)
 	    (match_operand:V8HI 3 "const0_operand"))
 	  (match_parallel 4 "pmovzx_parallel"
@@ -23240,7 +23303,8 @@ (define_insn_and_split "*sse4_1_zero_extendv4hiv4si2_4"
     }
   operands[1] = lowpart_subreg (V8HImode, operands[1], <ssehalfvecmode>mode);
 }
-  [(set_attr "isa" "noavx,noavx,avx")])
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")])
 
 (define_insn "avx512f_<code>v8qiv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
@@ -23378,12 +23442,14 @@ (define_insn "sse4_1_<code>v2qiv2di2<mask_name>"
    (set_attr "mode" "TI")])
 
 (define_insn "*sse4_1_<code>v2qiv2di2<mask_name>_1"
-  [(set (match_operand:V2DI 0 "register_operand" "=v")
+  [(set (match_operand:V2DI 0 "register_operand" "=x,v")
 	(any_extend:V2DI
-	 (match_operand:V2QI 1 "memory_operand" "m")))]
+	 (match_operand:V2QI 1 "memory_operand" "jm,m")))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
   "%vpmov<extsuffix>bq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "TI")])
@@ -23517,10 +23583,11 @@ (define_insn "sse4_1_<code>v2hiv2di2<mask_name>"
 (define_insn "*sse4_1_<code>v2hiv2di2<mask_name>_1"
   [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
 	(any_extend:V2DI
-	  (match_operand:V2HI 1 "memory_operand" "m,m,m")))]
+	  (match_operand:V2HI 1 "memory_operand" "jm,jm,m")))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
   "%vpmov<extsuffix>wq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,maybe_evex")
@@ -23682,10 +23749,11 @@ (define_insn "sse4_1_<code>v2siv2di2<mask_name>"
 (define_insn "*sse4_1_<code>v2siv2di2<mask_name>_1"
   [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
 	(any_extend:V2DI
-	  (match_operand:V2SI 1 "memory_operand" "m,m,m")))]
+	  (match_operand:V2SI 1 "memory_operand" "jm,jm,m")))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
   "%vpmov<extsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,maybe_evex")
@@ -23712,7 +23780,7 @@ (define_insn_and_split "*sse4_1_zero_extendv2siv2di2_3"
   [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
 	(vec_select:V4SI
 	  (vec_concat:V8SI
-	    (match_operand:V4SI 1 "vector_operand" "YrBm,*xBm,vm")
+	    (match_operand:V4SI 1 "vector_operand" "Yrja,*xja,vm")
 	    (match_operand:V4SI 2 "const0_operand"))
 	  (match_parallel 3 "pmovzx_parallel"
 	    [(match_operand 4 "const_int_operand")])))]
@@ -23733,14 +23801,15 @@ (define_insn_and_split "*sse4_1_zero_extendv2siv2di2_3"
       DONE;
     }
 }
-  [(set_attr "isa" "noavx,noavx,avx")])
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")])
 
 (define_insn_and_split "*sse4_1_zero_extendv2siv2di2_4"
   [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
 	(vec_select:V4SI
 	  (vec_concat:V8SI
 	    (vec_concat:V4SI
-	      (match_operand:V2SI 1 "vector_operand" "YrBm, *xBm, vm")
+	      (match_operand:V2SI 1 "vector_operand" "Yrja, *xja, vm")
 	      (match_operand:V2SI 2 "const0_operand"))
 	    (match_operand:V4SI 3 "const0_operand"))
 	  (match_parallel 4 "pmovzx_parallel"
@@ -23762,7 +23831,8 @@ (define_insn_and_split "*sse4_1_zero_extendv2siv2di2_4"
     }
   operands[1] = lowpart_subreg (V4SImode, operands[1], V2SImode);
 }
-  [(set_attr "isa" "noavx,noavx,avx")])
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")])
 
 (define_expand "<insn>v2siv2di2"
   [(set (match_operand:V2DI 0 "register_operand")
@@ -25953,7 +26023,7 @@ (define_insn "xop_vpermil2<mode>3"
 (define_insn "aesenc"
   [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
+		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
 		      UNSPEC_AESENC))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
@@ -25962,6 +26032,7 @@ (define_insn "aesenc"
    vaesenc\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,aes,avx512vl")
    (set_attr "type" "sselog1")
+   (set_attr "gpr32" "0,1,1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,vex,evex")
    (set_attr "btver2_decode" "double,double,double")
@@ -25970,7 +26041,7 @@ (define_insn "aesenc"
 (define_insn "aesenclast"
   [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
+		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
 		      UNSPEC_AESENCLAST))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
@@ -25979,6 +26050,7 @@ (define_insn "aesenclast"
    vaesenclast\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,aes,avx512vl")
    (set_attr "type" "sselog1")
+   (set_attr "gpr32" "0,1,1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,vex,evex")
    (set_attr "btver2_decode" "double,double,double") 
@@ -25987,7 +26059,7 @@ (define_insn "aesenclast"
 (define_insn "aesdec"
   [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
+		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
 		      UNSPEC_AESDEC))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
@@ -25996,6 +26068,7 @@ (define_insn "aesdec"
    vaesdec\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,aes,avx512vl")
    (set_attr "type" "sselog1")
+   (set_attr "gpr32" "0,1,1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,vex,evex")
    (set_attr "btver2_decode" "double,double,double") 
@@ -26004,7 +26077,7 @@ (define_insn "aesdec"
 (define_insn "aesdeclast"
   [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
+		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
 		      UNSPEC_AESDECLAST))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
@@ -26012,6 +26085,7 @@ (define_insn "aesdeclast"
    vaesdeclast\t{%2, %1, %0|%0, %1, %2}
    vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,aes,avx512vl")
+   (set_attr "gpr32" "0,1,1")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,vex,evex")
@@ -26047,7 +26121,7 @@ (define_insn "aeskeygenassist"
 (define_insn "pclmulqdq"
   [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		      (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")
+		      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")
 		      (match_operand:SI 3 "const_0_to_255_operand")]
 		     UNSPEC_PCLMUL))]
   "TARGET_PCLMUL"
@@ -26057,6 +26131,7 @@ (define_insn "pclmulqdq"
    vpclmulqdq\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,avx,vpclmulqdqvl")
    (set_attr "type" "sselog1")
+   (set_attr "gpr32" "0,1,1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex,evex")
@@ -29403,7 +29478,7 @@ (define_insn "vgf2p8affineinvqb_<mode><mask_name>"
   [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512F
 	  [(match_operand:VI1_AVX512F 1 "register_operand" "0,v")
-	   (match_operand:VI1_AVX512F 2 "vector_operand" "xBm,vm")
+	   (match_operand:VI1_AVX512F 2 "vector_operand" "xja,vm")
 	   (match_operand 3 "const_0_to_255_operand")]
 	  UNSPEC_GF2P8AFFINEINV))]
   "TARGET_GFNI"
@@ -29411,6 +29486,7 @@ (define_insn "vgf2p8affineinvqb_<mode><mask_name>"
    gf2p8affineinvqb\t{%3, %2, %0| %0, %2, %3}
    vgf2p8affineinvqb\t{%3, %2, %1, %0<mask_operand4>| %0<mask_operand4>, %1, %2, %3}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,maybe_evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -29419,7 +29495,7 @@ (define_insn "vgf2p8affineqb_<mode><mask_name>"
   [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512F
 	  [(match_operand:VI1_AVX512F 1 "register_operand" "0,v")
-	   (match_operand:VI1_AVX512F 2 "vector_operand" "xBm,vm")
+	   (match_operand:VI1_AVX512F 2 "vector_operand" "xja,vm")
 	   (match_operand 3 "const_0_to_255_operand")]
 	  UNSPEC_GF2P8AFFINE))]
   "TARGET_GFNI"
@@ -29427,6 +29503,7 @@ (define_insn "vgf2p8affineqb_<mode><mask_name>"
    gf2p8affineqb\t{%3, %2, %0| %0, %2, %3}
    vgf2p8affineqb\t{%3, %2, %1, %0<mask_operand4>| %0<mask_operand4>, %1, %2, %3}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,maybe_evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -29435,13 +29512,14 @@ (define_insn "vgf2p8mulb_<mode><mask_name>"
   [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512F
 	  [(match_operand:VI1_AVX512F 1 "register_operand" "%0,v")
-	   (match_operand:VI1_AVX512F 2 "vector_operand" "xBm,vm")]
+	   (match_operand:VI1_AVX512F 2 "vector_operand" "xja,vm")]
 	  UNSPEC_GF2P8MUL))]
   "TARGET_GFNI"
   "@
    gf2p8mulb\t{%2, %0| %0, %2}
    vgf2p8mulb\t{%2, %1, %0<mask_operand3>| %0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,maybe_evex")
    (set_attr "mode" "<sseinsnmode>")])
-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 13/13] [APX EGPR] Handle vex insns that only support GPR16 (5/5)
  2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
                   ` (11 preceding siblings ...)
  2023-09-22 10:56 ` [PATCH 12/13] [APX_EGPR] Handle legacy insns that only support GPR16 (4/5) Hongyu Wang
@ 2023-09-22 10:56 ` Hongyu Wang
  2023-09-25  2:02 ` [PATCH v2 00/13] Support Intel APX EGPR Hongtao Liu
  13 siblings, 0 replies; 28+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

From: Kong Lingling <lingling.kong@intel.com>

These vex insns may have legacy counterparts that could support EGPR,
but they do not have evex counterparts. Split out the vex part of such
patterns and mark it as not supporting EGPR by adjusting constraints
and attr_gpr32.
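
As a rough sketch of the split (using a made-up UNSPEC_FOO / fooss
pattern for illustration only; the real changes are in the diff below),
the single %v/maybe_vex alternative becomes a noavx alternative that
keeps the plain constraints plus an avx alternative that takes the
"j"-prefixed constraints and gpr32 "0", so the allocator never gives
the vex encoding an operand that mentions r16-r31:

  ;; Before: one alternative, assembled with %v / maybe_vex.
  (define_insn "*fooss"
    [(set (match_operand:SF 0 "register_operand" "=x")
          (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm")]
                     UNSPEC_FOO))]
    "TARGET_SSE"
    "%vfooss\t{%1, %0|%0, %1}"
    [(set_attr "prefix" "maybe_vex")
     (set_attr "mode" "SF")])

  ;; After: the legacy alternative keeps "m" (its encoding can still
  ;; reach the new GPRs), while the vex alternative is limited to "jm"
  ;; and gets gpr32 "0".
  (define_insn "*fooss"
    [(set (match_operand:SF 0 "register_operand" "=x,x")
          (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xjm")]
                     UNSPEC_FOO))]
    "TARGET_SSE"
    "@
     fooss\t{%1, %0|%0, %1}
     vfooss\t{%1, %0|%0, %1}"
    [(set_attr "isa" "noavx,avx")
     (set_attr "gpr32" "1,0")
     (set_attr "prefix" "orig,vex")
     (set_attr "mode" "SF")])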

insn list:
1. vmovmskpd/vmovmskps
2. vpmovmskb
3. vrsqrtss/vrsqrtps
4. vrcpss/vrcpps
5. vhaddpd/vhaddps, vhsubpd/vhsubps
6. vldmxcsr/vstmxcsr
7. vaddsubpd/vaddsubps
8. vlddqu
9. vtestps/vtestpd
10. vmaskmovps/vmaskmovpd, vpmaskmovd/vpmaskmovq
11. vperm2f128/vperm2i128
12. vinserti128/vinsertf128
13. vbroadcasti128/vbroadcastf128
14. vcmppd/vcmpps, vcmpss/vcmpsd
15. vgatherdps/vgatherqps, vgatherdpd/vgatherqpd

gcc/ChangeLog:

	* config/i386/constraints.md (jb): New address constraint for VSIB
	addresses that does not allow gpr32.
	* config/i386/i386.md (setcc_<mode>_sse): Replace m with jm for avx
	alternative and set attr_gpr32 to 0.
	(movmsk_df): Split avx/noavx alternatives and replace "r" with "jr" for
	avx alternative.
	(*rcpsf2_sse): Split avx/noavx alternatives and replace
	"m/Bm" with "jm/ja" for avx alternative, set its gpr32 attr to 0.
	(*rsqrtsf2_sse): Likewise.
	* config/i386/mmx.md (mmx_pmovmskb): Split alternative 1 to
	avx/noavx and assign jr/r constraint to dest.
	* config/i386/sse.md (<sse>_movmsk<ssemodesuffix><avxsizesuffix>):
	Split avx/noavx alternatives and replace "r" with "jr" for avx alternative.
	(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext): Likewise.
	(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_lt): Likewise.
	(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_lt): Likewise.
	(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_shift): Likewise.
	(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_shift): Likewise.
	(<sse2_avx2>_pmovmskb): Likewise.
	(*<sse2_avx2>_pmovmskb_zext): Likewise.
	(*sse2_pmovmskb_ext): Likewise.
	(*<sse2_avx2>_pmovmskb_lt): Likewise.
	(*<sse2_avx2>_pmovmskb_zext_lt): Likewise.
	(*sse2_pmovmskb_ext_lt): Likewise.
	(<sse>_rcp<mode>2): Split avx/noavx alternatives and replace
	"m/Bm" with "jm/ja" for avx alternative, set its attr_gpr32 to 0.
	(sse_vmrcpv4sf2): Likewise.
	(*sse_vmrcpv4sf2): Likewise.
	(rsqrt<mode>2): Likewise.
	(sse_vmrsqrtv4sf2): Likewise.
	(*sse_vmrsqrtv4sf2): Likewise.
	(avx_h<insn>v4df3): Likewise.
	(sse3_hsubv2df3): Likewise.
	(avx_h<insn>v8sf3): Likewise.
	(sse3_h<insn>v4sf3): Likewise.
	(<sse3>_lddqu<avxsizesuffix>): Likewise.
	(avx_cmp<mode>3): Likewise.
	(avx_vmcmp<mode>3): Likewise.
	(*sse2_gt<mode>3): Likewise.
	(sse_ldmxcsr): Likewise.
	(sse_stmxcsr): Likewise.
	(avx_vtest<ssemodesuffix><avxsizesuffix>): Replace m with jm for
	avx alternative and set attr_gpr32 to 0.
	(avx2_permv2ti): Likewise.
	(*avx_vperm2f128<mode>_full): Likewise.
	(*avx_vperm2f128<mode>_nozero): Likewise.
	(vec_set_lo_v32qi): Likewise.
	(<avx_avx2>_maskload<ssemodesuffix><avxsizesuffix>): Likewise.
	(<avx_avx2>_maskstore<ssemodesuffix><avxsizesuffix>): Likewise.
	(avx_cmp<mode>3): Likewise.
	(avx_vmcmp<mode>3): Likewise.
	(*<sse>_maskcmp<mode>3_comm): Likewise.
	(*avx2_gathersi<VEC_GATHER_MODE:mode>): Replace Tv to jb and set
	attr_gpr32 to 0.
	(*avx2_gathersi<VEC_GATHER_MODE:mode>_2): Likewise.
	(*avx2_gatherdi<VEC_GATHER_MODE:mode>): Likewise.
	(*avx2_gatherdi<VEC_GATHER_MODE:mode>_2): Likewise.
	(*avx2_gatherdi<VI4F_256:mode>_3): Likewise.
	(*avx2_gatherdi<VI4F_256:mode>_4): Likewise.
	(avx_vbroadcastf128_<mode>): Restrict non-EGPR alternative to
	noavx512vl, set its constraint to jm and set attr_gpr32 to 0.
	(vec_set_lo_<mode><mask_name>): Likewise.
	(vec_set_lo_<mode><mask_name>): Likewise for SF/SI modes.
	(vec_set_hi_<mode><mask_name>): Likewise.
	(vec_set_hi_<mode><mask_name>): Likewise for SF/SI modes.
	(vec_set_hi_<mode>): Likewise.
	(vec_set_lo_<mode>): Likewise.
	(avx2_set_hi_v32qi): Likewise.

Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
 gcc/config/i386/constraints.md |   6 +
 gcc/config/i386/i386.md        |  47 +++--
 gcc/config/i386/mmx.md         |  11 +-
 gcc/config/i386/sse.md         | 320 +++++++++++++++++++++------------
 4 files changed, 242 insertions(+), 142 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 36c268d7f9b..dc91bd94b27 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -428,3 +428,9 @@ (define_special_memory_constraint "ja"
   (and (match_operand 0 "vector_memory_operand")
        (not (and (match_test "TARGET_APX_EGPR")
 		 (match_test "x86_extended_rex2reg_mentioned_p (op)")))))
+
+(define_address_constraint "jb"
+  "VSIB address operand without EGPR"
+  (and (match_operand 0 "vsib_address_operand")
+       (not (and (match_test "TARGET_APX_EGPR")
+		 (match_test "x86_extended_rex2reg_mentioned_p (op)")))))
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c09ee3989cb..a0ba1752a54 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -554,7 +554,8 @@ (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
 		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
 		    avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl,
 		    avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16,avxifma,
-		    avx512ifmavl,avxneconvert,avx512bf16vl,vpclmulqdqvl"
+		    avx512ifmavl,avxneconvert,avx512bf16vl,vpclmulqdqvl,
+		    avx_noavx512f,avx_noavx512vl"
   (const_string "base"))
 
 ;; The (bounding maximum) length of an instruction immediate.
@@ -908,6 +909,8 @@ (define_attr "enabled" ""
 	 (eq_attr "isa" "sse4_noavx")
 	   (symbol_ref "TARGET_SSE4_1 && !TARGET_AVX")
 	 (eq_attr "isa" "avx") (symbol_ref "TARGET_AVX")
+	 (eq_attr "isa" "avx_noavx512f")
+	   (symbol_ref "TARGET_AVX && !TARGET_AVX512F")
 	 (eq_attr "isa" "noavx") (symbol_ref "!TARGET_AVX")
 	 (eq_attr "isa" "avx2") (symbol_ref "TARGET_AVX2")
 	 (eq_attr "isa" "noavx2") (symbol_ref "!TARGET_AVX2")
@@ -16661,12 +16664,13 @@ (define_insn "setcc_<mode>_sse"
   [(set (match_operand:MODEF 0 "register_operand" "=x,x")
 	(match_operator:MODEF 3 "sse_comparison_operator"
 	  [(match_operand:MODEF 1 "register_operand" "0,x")
-	   (match_operand:MODEF 2 "nonimmediate_operand" "xm,xm")]))]
+	   (match_operand:MODEF 2 "nonimmediate_operand" "xm,xjm")]))]
   "SSE_FLOAT_MODE_P (<MODE>mode)"
   "@
    cmp%D3<ssemodesuffix>\t{%2, %0|%0, %2}
    vcmp%D3<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "1,0")
    (set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
@@ -20122,24 +20126,27 @@ (define_insn "*<insn>hf"
    (set_attr "mode" "HF")])
 
 (define_insn "*rcpsf2_sse"
-  [(set (match_operand:SF 0 "register_operand" "=x,x,x")
-	(unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m")]
+  [(set (match_operand:SF 0 "register_operand" "=x,x,x,x")
+	(unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m,ja")]
 		   UNSPEC_RCP))]
   "TARGET_SSE && TARGET_SSE_MATH"
   "@
    %vrcpss\t{%d1, %0|%0, %d1}
    %vrcpss\t{%d1, %0|%0, %d1}
-   %vrcpss\t{%1, %d0|%d0, %1}"
-  [(set_attr "type" "sse")
+   rcpss\t{%1, %d0|%d0, %1}
+   vrcpss\t{%1, %d0|%d0, %1}"
+  [(set_attr "isa" "*,*,noavx,avx")
+   (set_attr "gpr32" "1,1,1,0")
+   (set_attr "type" "sse")
    (set_attr "atom_sse_attr" "rcp")
    (set_attr "btver2_sse_attr" "rcp")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SF")
-   (set_attr "avx_partial_xmm_update" "false,false,true")
+   (set_attr "avx_partial_xmm_update" "false,false,true,true")
    (set (attr "preferred_for_speed")
       (cond [(match_test "TARGET_AVX")
 	       (symbol_ref "true")
-	     (eq_attr "alternative" "1,2")
+	     (eq_attr "alternative" "1,2,3")
 	       (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
 	    ]
 	    (symbol_ref "true")))])
@@ -20382,24 +20389,27 @@ (define_insn "sqrtxf2"
    (set_attr "bdver1_decode" "direct")])
 
 (define_insn "*rsqrtsf2_sse"
-  [(set (match_operand:SF 0 "register_operand" "=x,x,x")
-	(unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m")]
+  [(set (match_operand:SF 0 "register_operand" "=x,x,x,x")
+	(unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m,ja")]
 		   UNSPEC_RSQRT))]
   "TARGET_SSE && TARGET_SSE_MATH"
   "@
    %vrsqrtss\t{%d1, %0|%0, %d1}
    %vrsqrtss\t{%d1, %0|%0, %d1}
-   %vrsqrtss\t{%1, %d0|%d0, %1}"
-  [(set_attr "type" "sse")
+   rsqrtss\t{%1, %d0|%d0, %1}
+   vrsqrtss\t{%1, %d0|%d0, %1}"
+  [(set_attr "isa" "*,*,noavx,avx")
+   (set_attr "gpr32" "1,1,1,0")
+   (set_attr "type" "sse")
    (set_attr "atom_sse_attr" "rcp")
    (set_attr "btver2_sse_attr" "rcp")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SF")
-   (set_attr "avx_partial_xmm_update" "false,false,true")
+   (set_attr "avx_partial_xmm_update" "false,false,true,true")
    (set (attr "preferred_for_speed")
       (cond [(match_test "TARGET_AVX")
 	       (symbol_ref "true")
-	     (eq_attr "alternative" "1,2")
+	     (eq_attr "alternative" "1,2,3")
 	       (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
 	    ]
 	    (symbol_ref "true")))])
@@ -22103,14 +22113,15 @@ (define_expand "signbitxf2"
 })
 
 (define_insn "movmsk_df"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,jr")
 	(unspec:SI
-	  [(match_operand:DF 1 "register_operand" "x")]
+	  [(match_operand:DF 1 "register_operand" "x,x")]
 	  UNSPEC_MOVMSK))]
   "SSE_FLOAT_MODE_P (DFmode) && TARGET_SSE_MATH"
   "%vmovmskpd\t{%1, %0|%0, %1}"
-  [(set_attr "type" "ssemov")
-   (set_attr "prefix" "maybe_vex")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "DF")])
 
 ;; Use movmskpd in SSE mode to avoid store forwarding stall
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 73809585a5d..3530615c706 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -5174,13 +5174,14 @@ (define_expand "usadv8qi"
 })
 
 (define_insn_and_split "mmx_pmovmskb"
-  [(set (match_operand:SI 0 "register_operand" "=r,r")
-	(unspec:SI [(match_operand:V8QI 1 "register_operand" "y,x")]
+  [(set (match_operand:SI 0 "register_operand" "=r,r,jr")
+	(unspec:SI [(match_operand:V8QI 1 "register_operand" "y,x,x")]
 		   UNSPEC_MOVMSK))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
    && (TARGET_SSE || TARGET_3DNOW_A)"
   "@
    pmovmskb\t{%1, %0|%0, %1}
+   #
    #"
   "TARGET_SSE2 && reload_completed
    && SSE_REGNO_P (REGNO (operands[1]))"
@@ -5195,9 +5196,9 @@ (define_insn_and_split "mmx_pmovmskb"
   operands[2] = lowpart_subreg (QImode, operands[0],
 				GET_MODE (operands[0]));
 }
-  [(set_attr "mmx_isa" "native,sse")
-   (set_attr "type" "mmxcvt,ssemov")
-   (set_attr "mode" "DI,TI")])
+  [(set_attr "mmx_isa" "native,sse_noavx,avx")
+   (set_attr "type" "mmxcvt,ssemov,ssemov")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_maskmovq"
   [(set (match_operand:V8QI 0 "memory_operand")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d3b59c4866b..6bffd749c6d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1833,12 +1833,14 @@ (define_peephole2
   "operands[4] = adjust_address (operands[0], V2DFmode, 0);")
 
 (define_insn "<sse3>_lddqu<avxsizesuffix>"
-  [(set (match_operand:VI1 0 "register_operand" "=x")
-	(unspec:VI1 [(match_operand:VI1 1 "memory_operand" "m")]
+  [(set (match_operand:VI1 0 "register_operand" "=x,x")
+	(unspec:VI1 [(match_operand:VI1 1 "memory_operand" "m,jm")]
 		    UNSPEC_LDDQU))]
   "TARGET_SSE3"
   "%vlddqu\t{%1, %0|%0, %1}"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "gpr32" "1,0")
    (set_attr "movu" "1")
    (set (attr "prefix_data16")
      (if_then_else
@@ -2507,12 +2509,14 @@ (define_insn "<sse>_div<mode>3<mask_name><round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse>_rcp<mode>2"
-  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
+  [(set (match_operand:VF1_128_256 0 "register_operand" "=x,x")
 	(unspec:VF1_128_256
-	  [(match_operand:VF1_128_256 1 "vector_operand" "xBm")] UNSPEC_RCP))]
+	  [(match_operand:VF1_128_256 1 "vector_operand" "xBm,xja")] UNSPEC_RCP))]
   "TARGET_SSE"
   "%vrcpps\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sse")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "atom_sse_attr" "rcp")
    (set_attr "btver2_sse_attr" "rcp")
    (set_attr "prefix" "maybe_vex")
@@ -2521,7 +2525,7 @@ (define_insn "<sse>_rcp<mode>2"
 (define_insn "sse_vmrcpv4sf2"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x")
 	(vec_merge:V4SF
-	  (unspec:V4SF [(match_operand:V4SF 1 "nonimmediate_operand" "xm,xm")]
+	  (unspec:V4SF [(match_operand:V4SF 1 "nonimmediate_operand" "xm,xjm")]
 		       UNSPEC_RCP)
 	  (match_operand:V4SF 2 "register_operand" "0,x")
 	  (const_int 1)))]
@@ -2531,6 +2535,7 @@ (define_insn "sse_vmrcpv4sf2"
    vrcpss\t{%1, %2, %0|%0, %2, %k1}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "atom_sse_attr" "rcp")
    (set_attr "btver2_sse_attr" "rcp")
    (set_attr "prefix" "orig,vex")
@@ -2540,7 +2545,7 @@ (define_insn "*sse_vmrcpv4sf2"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
+	    (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xjm")]
 		         UNSPEC_RCP))
 	  (match_operand:V4SF 2 "register_operand" "0,x")
 	  (const_int 1)))]
@@ -2550,6 +2555,7 @@ (define_insn "*sse_vmrcpv4sf2"
    vrcpss\t{%1, %2, %0|%0, %2, %1}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "atom_sse_attr" "rcp")
    (set_attr "btver2_sse_attr" "rcp")
    (set_attr "prefix" "orig,vex")
@@ -2726,12 +2732,14 @@ (define_expand "rsqrt<mode>2"
   "TARGET_AVX512FP16")
 
 (define_insn "<sse>_rsqrt<mode>2"
-  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
+  [(set (match_operand:VF1_128_256 0 "register_operand" "=x,x")
 	(unspec:VF1_128_256
-	  [(match_operand:VF1_128_256 1 "vector_operand" "xBm")] UNSPEC_RSQRT))]
+	  [(match_operand:VF1_128_256 1 "vector_operand" "xBm,xja")] UNSPEC_RSQRT))]
   "TARGET_SSE"
   "%vrsqrtps\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sse")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
@@ -2790,7 +2798,7 @@ (define_insn "rsqrt14_<mode>_mask"
 (define_insn "sse_vmrsqrtv4sf2"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x")
 	(vec_merge:V4SF
-	  (unspec:V4SF [(match_operand:V4SF 1 "nonimmediate_operand" "xm,xm")]
+	  (unspec:V4SF [(match_operand:V4SF 1 "nonimmediate_operand" "xm,xjm")]
 		       UNSPEC_RSQRT)
 	  (match_operand:V4SF 2 "register_operand" "0,x")
 	  (const_int 1)))]
@@ -2800,6 +2808,7 @@ (define_insn "sse_vmrsqrtv4sf2"
    vrsqrtss\t{%1, %2, %0|%0, %2, %k1}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "SF")])
 
@@ -2807,7 +2816,7 @@ (define_insn "*sse_vmrsqrtv4sf2"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
+	    (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xjm")]
 		         UNSPEC_RSQRT))
 	  (match_operand:V4SF 2 "register_operand" "0,x")
 	  (const_int 1)))]
@@ -2817,6 +2826,7 @@ (define_insn "*sse_vmrsqrtv4sf2"
    vrsqrtss\t{%1, %2, %0|%0, %2, %1}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "SF")])
 
@@ -2992,7 +3002,7 @@ (define_insn "vec_addsub<mode>3"
         (vec_merge:VF_128_256
 	  (minus:VF_128_256
 	    (match_operand:VF_128_256 1 "register_operand" "0,x")
-	    (match_operand:VF_128_256 2 "vector_operand" "xBm, xm"))
+	    (match_operand:VF_128_256 2 "vector_operand" "xBm, xjm"))
 	  (plus:VF_128_256 (match_dup 1) (match_dup 2))
 	  (const_int <addsub_cst>)))]
   "TARGET_SSE3"
@@ -3001,6 +3011,7 @@ (define_insn "vec_addsub<mode>3"
    vaddsub<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
+   (set_attr "gpr32" "1,0")
    (set (attr "atom_unit")
      (if_then_else
        (match_test "<MODE>mode == V2DFmode")
@@ -3144,7 +3155,7 @@ (define_insn "avx_h<insn>v4df3"
 	      (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))
 	    (plusminus:DF
 	      (vec_select:DF
-		(match_operand:V4DF 2 "nonimmediate_operand" "xm")
+		(match_operand:V4DF 2 "nonimmediate_operand" "xjm")
 		(parallel [(const_int 0)]))
 	      (vec_select:DF (match_dup 2) (parallel [(const_int 1)]))))
 	  (vec_concat:V2DF
@@ -3157,6 +3168,7 @@ (define_insn "avx_h<insn>v4df3"
   "TARGET_AVX"
   "vh<plusminus_mnemonic>pd\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseadd")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V4DF")])
 
@@ -3187,7 +3199,7 @@ (define_insn "*sse3_haddv2df3"
 	      (parallel [(match_operand:SI 4 "const_0_to_1_operand")])))
 	  (plus:DF
 	    (vec_select:DF
-	      (match_operand:V2DF 2 "vector_operand" "xBm,xm")
+	      (match_operand:V2DF 2 "vector_operand" "xBm,xjm")
 	      (parallel [(match_operand:SI 5 "const_0_to_1_operand")]))
 	    (vec_select:DF
 	      (match_dup 2)
@@ -3199,6 +3211,7 @@ (define_insn "*sse3_haddv2df3"
    haddpd\t{%2, %0|%0, %2}
    vhaddpd\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "1,0")
    (set_attr "type" "sseadd")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V2DF")])
@@ -3213,7 +3226,7 @@ (define_insn "sse3_hsubv2df3"
 	    (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))
 	  (minus:DF
 	    (vec_select:DF
-	      (match_operand:V2DF 2 "vector_operand" "xBm,xm")
+	      (match_operand:V2DF 2 "vector_operand" "xBm,xjm")
 	      (parallel [(const_int 0)]))
 	    (vec_select:DF (match_dup 2) (parallel [(const_int 1)])))))]
   "TARGET_SSE3"
@@ -3222,6 +3235,7 @@ (define_insn "sse3_hsubv2df3"
    vhsubpd\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
+   (set_attr "gpr32" "1,0")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V2DF")])
 
@@ -3278,7 +3292,7 @@ (define_insn "avx_h<insn>v8sf3"
 	    (vec_concat:V2SF
 	      (plusminus:SF
 		(vec_select:SF
-		  (match_operand:V8SF 2 "nonimmediate_operand" "xm")
+		  (match_operand:V8SF 2 "nonimmediate_operand" "xjm")
 		  (parallel [(const_int 0)]))
 		(vec_select:SF (match_dup 2) (parallel [(const_int 1)])))
 	      (plusminus:SF
@@ -3302,6 +3316,7 @@ (define_insn "avx_h<insn>v8sf3"
   "TARGET_AVX"
   "vh<plusminus_mnemonic>ps\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseadd")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
 
@@ -3320,7 +3335,7 @@ (define_insn "sse3_h<insn>v4sf3"
 	  (vec_concat:V2SF
 	    (plusminus:SF
 	      (vec_select:SF
-		(match_operand:V4SF 2 "vector_operand" "xBm,xm")
+		(match_operand:V4SF 2 "vector_operand" "xBm,xjm")
 		(parallel [(const_int 0)]))
 	      (vec_select:SF (match_dup 2) (parallel [(const_int 1)])))
 	    (plusminus:SF
@@ -3332,6 +3347,7 @@ (define_insn "sse3_h<insn>v4sf3"
    vh<plusminus_mnemonic>ps\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
+   (set_attr "gpr32" "1,0")
    (set_attr "atom_unit" "complex")
    (set_attr "prefix" "orig,vex")
    (set_attr "prefix_rep" "1,*")
@@ -3525,12 +3541,13 @@ (define_insn "avx_cmp<mode>3"
   [(set (match_operand:VF_128_256 0 "register_operand" "=x")
 	(unspec:VF_128_256
 	  [(match_operand:VF_128_256 1 "register_operand" "x")
-	   (match_operand:VF_128_256 2 "nonimmediate_operand" "xm")
+	   (match_operand:VF_128_256 2 "nonimmediate_operand" "xjm")
 	   (match_operand:SI 3 "const_0_to_31_operand")]
 	  UNSPEC_PCMP))]
   "TARGET_AVX"
   "vcmp<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssecmp")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<MODE>")])
@@ -3736,7 +3753,7 @@ (define_insn "avx_vmcmp<mode>3"
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "x")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "xm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "xjm")
 	     (match_operand:SI 3 "const_0_to_31_operand")]
 	    UNSPEC_PCMP)
 	 (match_dup 1)
@@ -3744,6 +3761,7 @@ (define_insn "avx_vmcmp<mode>3"
   "TARGET_AVX"
   "vcmp<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
   [(set_attr "type" "ssecmp")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<ssescalarmode>")])
@@ -3752,13 +3770,14 @@ (define_insn "*<sse>_maskcmp<mode>3_comm"
   [(set (match_operand:VF_128_256 0 "register_operand" "=x,x")
 	(match_operator:VF_128_256 3 "sse_comparison_operator"
 	  [(match_operand:VF_128_256 1 "register_operand" "%0,x")
-	   (match_operand:VF_128_256 2 "vector_operand" "xBm,xm")]))]
+	   (match_operand:VF_128_256 2 "vector_operand" "xBm,xjm")]))]
   "TARGET_SSE
    && GET_RTX_CLASS (GET_CODE (operands[3])) == RTX_COMM_COMPARE"
   "@
    cmp%D3<ssemodesuffix>\t{%2, %0|%0, %2}
    vcmp%D3<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "1,0")
    (set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
@@ -3768,12 +3787,13 @@ (define_insn "<sse>_maskcmp<mode>3"
   [(set (match_operand:VF_128_256 0 "register_operand" "=x,x")
 	(match_operator:VF_128_256 3 "sse_comparison_operator"
 	  [(match_operand:VF_128_256 1 "register_operand" "0,x")
-	   (match_operand:VF_128_256 2 "vector_operand" "xBm,xm")]))]
+	   (match_operand:VF_128_256 2 "vector_operand" "xBm,xjm")]))]
   "TARGET_SSE"
   "@
    cmp%D3<ssemodesuffix>\t{%2, %0|%0, %2}
    vcmp%D3<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "1,0")
    (set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
@@ -3784,7 +3804,7 @@ (define_insn "<sse>_vmmaskcmp<mode>3"
 	(vec_merge:VF_128
 	 (match_operator:VF_128 3 "sse_comparison_operator"
 	   [(match_operand:VF_128 1 "register_operand" "0,x")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm")])
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,xjm")])
 	 (match_dup 1)
 	 (const_int 1)))]
   "TARGET_SSE"
@@ -3792,6 +3812,7 @@ (define_insn "<sse>_vmmaskcmp<mode>3"
    cmp%D3<ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
    vcmp%D3<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "1,0")
    (set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1,*")
    (set_attr "prefix" "orig,vex")
@@ -4709,7 +4730,7 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
 	(and:VFB_128_256
 	  (not:VFB_128_256
 	    (match_operand:VFB_128_256 1 "register_operand" "0,x,v,v"))
-	  (match_operand:VFB_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
+	  (match_operand:VFB_128_256 2 "vector_operand" "xBm,xjm,vm,vm")))]
   "TARGET_SSE && <mask_avx512vl_condition>
    && (!<mask_applied> || <ssescalarmode>mode != HFmode)"
 {
@@ -4753,7 +4774,8 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx512dq,avx512f")
+  [(set_attr "isa" "noavx,avx_noavx512f,avx512dq,avx512f")
+   (set_attr "gpr32" "1,0,1,1")
    (set_attr "type" "sselog")
    (set_attr "prefix" "orig,maybe_vex,evex,evex")
    (set (attr "mode")
@@ -4761,6 +4783,10 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
 		    (and (eq_attr "alternative" "1")
 			 (match_test "!TARGET_AVX512DQ")))
 		 (const_string "<sseintvecmode2>")
+	       (and (not (match_test "<mask_applied>"))
+		    (eq_attr "alternative" "3")
+		    (match_test "!x86_evex_reg_mentioned_p (operands, 3)"))
+		 (const_string "<MODE>")
 	       (eq_attr "alternative" "3")
 		 (const_string "<sseintvecmode2>")
 	       (match_test "TARGET_AVX")
@@ -5063,7 +5089,7 @@ (define_insn "*andnot<mode>3"
   [(set (match_operand:ANDNOT_MODE 0 "register_operand" "=x,x,v,v")
 	(and:ANDNOT_MODE
 	  (not:ANDNOT_MODE (match_operand:ANDNOT_MODE 1 "register_operand" "0,x,v,v"))
-	  (match_operand:ANDNOT_MODE 2 "vector_operand" "xBm,xm,vm,v")))]
+	  (match_operand:ANDNOT_MODE 2 "vector_operand" "xBm,xjm,vm,v")))]
   "TARGET_SSE"
 {
   char buf[128];
@@ -5092,7 +5118,8 @@ (define_insn "*andnot<mode>3"
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx512vl,avx512f")
+  [(set_attr "isa" "noavx,avx_noavx512f,avx512vl,avx512f")
+   (set_attr "gpr32" "1,0,1,1")
    (set_attr "type" "sselog")
    (set (attr "prefix_data16")
      (if_then_else
@@ -5102,7 +5129,10 @@ (define_insn "*andnot<mode>3"
        (const_string "*")))
    (set_attr "prefix" "orig,vex,evex,evex")
    (set (attr "mode")
-	(cond [(eq_attr "alternative" "2")
+	(cond [(and (eq_attr "alternative" "3")
+		    (match_test "!x86_evex_reg_mentioned_p (operands, 3)"))
+		 (const_string "TI")
+	       (eq_attr "alternative" "2")
 		 (const_string "TI")
 	       (eq_attr "alternative" "3")
 		 (const_string "XI")
@@ -12240,7 +12270,7 @@ (define_insn_and_split "vec_extract_lo_v32qi"
   "operands[1] = gen_lowpart (V16QImode, operands[1]);")
 
 (define_insn "vec_extract_hi_v32qi"
-  [(set (match_operand:V16QI 0 "nonimmediate_operand" "=xm,vm")
+  [(set (match_operand:V16QI 0 "nonimmediate_operand" "=xjm,vm")
 	(vec_select:V16QI
 	  (match_operand:V32QI 1 "register_operand" "x,v")
 	  (parallel [(const_int 16) (const_int 17)
@@ -12258,7 +12288,8 @@ (define_insn "vec_extract_hi_v32qi"
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "isa" "*,avx512vl")
+   (set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix" "vex,evex")
    (set_attr "mode" "OI")])
 
@@ -17130,6 +17161,7 @@ (define_insn "*sse2_gt<mode>3"
    pcmpgt<ssemodesuffix>\t{%2, %0|%0, %2}
    vpcmpgt<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "1,0")
    (set_attr "type" "ssecmp")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
@@ -17446,7 +17478,7 @@ (define_insn "*andnot<mode>3"
   [(set (match_operand:VI 0 "register_operand" "=x,x,v,v,v")
 	(and:VI
 	  (not:VI (match_operand:VI 1 "bcst_vector_operand" "0,x,v,m,Br"))
-	  (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr,0,0")))]
+	  (match_operand:VI 2 "bcst_vector_operand" "xBm,xjm,vmBr,0,0")))]
   "TARGET_SSE
    && (register_operand (operands[1], <MODE>mode)
        || register_operand (operands[2], <MODE>mode))"
@@ -17533,7 +17565,8 @@ (define_insn "*andnot<mode>3"
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx,*,*")
+  [(set_attr "isa" "noavx,avx_noavx512f,avx512f,*,*")
+   (set_attr "gpr32" "1,0,1,1,1")
    (set_attr "type" "sselog")
    (set (attr "prefix_data16")
      (if_then_else
@@ -17688,7 +17721,7 @@ (define_insn "*<code><mode>3<mask_name>"
   [(set (match_operand:VI48_AVX_AVX512F 0 "register_operand" "=x,x,v")
 	(any_logic:VI48_AVX_AVX512F
 	  (match_operand:VI48_AVX_AVX512F 1 "bcst_vector_operand" "%0,x,v")
-	  (match_operand:VI48_AVX_AVX512F 2 "bcst_vector_operand" "xBm,xm,vmBr")))]
+	  (match_operand:VI48_AVX_AVX512F 2 "bcst_vector_operand" "xBm,xjm,vmBr")))]
   "TARGET_SSE && <mask_mode512bit_condition>
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
 {
@@ -17718,9 +17751,11 @@ (define_insn "*<code><mode>3<mask_name>"
 	case E_V4DImode:
 	case E_V4SImode:
 	case E_V2DImode:
-	  ssesuffix = (TARGET_AVX512VL
-		       && (<mask_applied> || which_alternative == 2)
-		       ? "<ssemodesuffix>" : "");
+	  ssesuffix = ((TARGET_AVX512VL
+		        && (<mask_applied> || which_alternative == 2))
+		       || (MEM_P (operands[2]) && which_alternative == 2
+			   && x86_extended_rex2reg_mentioned_p (operands[2])))
+		       ? "<ssemodesuffix>" : "";
 	  break;
 	default:
 	  gcc_unreachable ();
@@ -17760,7 +17795,8 @@ (define_insn "*<code><mode>3<mask_name>"
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx")
+  [(set_attr "isa" "noavx,avx_noavx512f,avx512f")
+   (set_attr "gpr32" "1,0,1")
    (set_attr "type" "sselog")
    (set (attr "prefix_data16")
      (if_then_else
@@ -17787,7 +17823,7 @@ (define_insn "*<code><mode>3"
   [(set (match_operand:VI12_AVX_AVX512F 0 "register_operand" "=x,x,v")
 	(any_logic:VI12_AVX_AVX512F
 	  (match_operand:VI12_AVX_AVX512F 1 "vector_operand" "%0,x,v")
-	  (match_operand:VI12_AVX_AVX512F 2 "vector_operand" "xBm,xm,vm")))]
+	  (match_operand:VI12_AVX_AVX512F 2 "vector_operand" "xBm,xjm,vm")))]
   "TARGET_SSE && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
 {
   char buf[64];
@@ -17816,7 +17852,10 @@ (define_insn "*<code><mode>3"
 	case E_V16HImode:
 	case E_V16QImode:
 	case E_V8HImode:
-	  ssesuffix = TARGET_AVX512VL && which_alternative == 2 ? "q" : "";
+	  ssesuffix = (((TARGET_AVX512VL && which_alternative == 2)
+		       || (MEM_P (operands[2]) && which_alternative == 2
+			   && x86_extended_rex2reg_mentioned_p (operands[2]))))
+		       ? "q" : "";
 	  break;
 	default:
 	  gcc_unreachable ();
@@ -17853,7 +17892,8 @@ (define_insn "*<code><mode>3"
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx")
+  [(set_attr "isa" "noavx,avx_noavx512f,avx512f")
+   (set_attr "gpr32" "1,0,1")
    (set_attr "type" "sselog")
    (set (attr "prefix_data16")
      (if_then_else
@@ -17880,13 +17920,14 @@ (define_insn "<code>v1ti3"
   [(set (match_operand:V1TI 0 "register_operand" "=x,x,v")
 	(any_logic:V1TI
 	  (match_operand:V1TI 1 "register_operand" "%0,x,v")
-	  (match_operand:V1TI 2 "vector_operand" "xBm,xm,vm")))]
+	  (match_operand:V1TI 2 "vector_operand" "xBm,xjm,vm")))]
   "TARGET_SSE2"
   "@
    p<logic>\t{%2, %0|%0, %2}
    vp<logic>\t{%2, %1, %0|%0, %1, %2}
    vp<logic>d\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx,avx512vl")
+  [(set_attr "isa" "noavx,avx_noavx512vl,avx512vl")
+   (set_attr "gpr32" "1,0,1")
    (set_attr "prefix" "orig,vex,evex")
    (set_attr "prefix_data16" "1,*,*")
    (set_attr "type" "sselog")
@@ -20866,33 +20907,35 @@ (define_insn "*<sse2_avx2>_psadbw"
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<sse>_movmsk<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,jr")
 	(unspec:SI
-	  [(match_operand:VF_128_256 1 "register_operand" "x")]
+	  [(match_operand:VF_128_256 1 "register_operand" "x,x")]
 	  UNSPEC_MOVMSK))]
   "TARGET_SSE"
   "%vmovmsk<ssemodesuffix>\t{%1, %0|%0, %1}"
-  [(set_attr "type" "ssemov")
-   (set_attr "prefix" "maybe_vex")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,jr")
 	(any_extend:DI
 	  (unspec:SI
-	    [(match_operand:VF_128_256 1 "register_operand" "x")]
+	    [(match_operand:VF_128_256 1 "register_operand" "x,x")]
 	    UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE"
-  "%vmovmsk<ssemodesuffix>\t{%1, %k0|%k0, %1}"
-  [(set_attr "type" "ssemov")
-   (set_attr "prefix" "maybe_vex")
+  "%vmovmsk<ssemodesuffix>\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_lt"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,jr")
 	(unspec:SI
 	  [(lt:VF_128_256
-	     (match_operand:<sseintvecmode> 1 "register_operand" "x")
+	     (match_operand:<sseintvecmode> 1 "register_operand" "x,x")
 	     (match_operand:<sseintvecmode> 2 "const0_operand"))]
 	  UNSPEC_MOVMSK))]
   "TARGET_SSE"
@@ -20901,16 +20944,17 @@ (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_lt"
   [(set (match_dup 0)
 	(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))]
   "operands[1] = gen_lowpart (<MODE>mode, operands[1]);"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_lt"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,jr")
 	(any_extend:DI
 	  (unspec:SI
 	    [(lt:VF_128_256
-	       (match_operand:<sseintvecmode> 1 "register_operand" "x")
+	       (match_operand:<sseintvecmode> 1 "register_operand" "x,x")
 	       (match_operand:<sseintvecmode> 2 "const0_operand"))]
 	    UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE"
@@ -20919,16 +20963,17 @@ (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_lt"
   [(set (match_dup 0)
 	(any_extend:DI (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK)))]
   "operands[1] = gen_lowpart (<MODE>mode, operands[1]);"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_shift"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,jr")
 	(unspec:SI
 	  [(subreg:VF_128_256
 	     (ashiftrt:<sseintvecmode>
-	       (match_operand:<sseintvecmode> 1 "register_operand" "x")
+	       (match_operand:<sseintvecmode> 1 "register_operand" "x,x")
 	       (match_operand:QI 2 "const_int_operand")) 0)]
 	  UNSPEC_MOVMSK))]
   "TARGET_SSE"
@@ -20937,17 +20982,18 @@ (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_shift"
   [(set (match_dup 0)
 	(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))]
   "operands[1] = gen_lowpart (<MODE>mode, operands[1]);"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_shift"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,jr")
 	(any_extend:DI
 	  (unspec:SI
 	    [(subreg:VF_128_256
 	       (ashiftrt:<sseintvecmode>
-		 (match_operand:<sseintvecmode> 1 "register_operand" "x")
+		 (match_operand:<sseintvecmode> 1 "register_operand" "x,x")
 	       (match_operand:QI 2 "const_int_operand")) 0)]
 	    UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE"
@@ -20956,18 +21002,20 @@ (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_shift
   [(set (match_dup 0)
 	(any_extend:DI (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK)))]
   "operands[1] = gen_lowpart (<MODE>mode, operands[1]);"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse2_avx2>_pmovmskb"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,jr")
 	(unspec:SI
-	  [(match_operand:VI1_AVX2 1 "register_operand" "x")]
+	  [(match_operand:VI1_AVX2 1 "register_operand" "x,x")]
 	  UNSPEC_MOVMSK))]
   "TARGET_SSE2"
   "%vpmovmskb\t{%1, %0|%0, %1}"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set (attr "prefix_data16")
      (if_then_else
        (match_test "TARGET_AVX")
@@ -20977,14 +21025,15 @@ (define_insn "<sse2_avx2>_pmovmskb"
    (set_attr "mode" "SI")])
 
 (define_insn "*<sse2_avx2>_pmovmskb_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,jr")
 	(zero_extend:DI
 	  (unspec:SI
-	    [(match_operand:VI1_AVX2 1 "register_operand" "x")]
+	    [(match_operand:VI1_AVX2 1 "register_operand" "x,x")]
 	    UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE2"
   "%vpmovmskb\t{%1, %k0|%k0, %1}"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set (attr "prefix_data16")
      (if_then_else
        (match_test "TARGET_AVX")
@@ -20994,14 +21043,15 @@ (define_insn "*<sse2_avx2>_pmovmskb_zext"
    (set_attr "mode" "SI")])
 
 (define_insn "*sse2_pmovmskb_ext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,jr")
 	(sign_extend:DI
 	  (unspec:SI
-	    [(match_operand:V16QI 1 "register_operand" "x")]
+	    [(match_operand:V16QI 1 "register_operand" "x,x")]
 	    UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE2"
   "%vpmovmskb\t{%1, %k0|%k0, %1}"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set (attr "prefix_data16")
      (if_then_else
        (match_test "TARGET_AVX")
@@ -21086,9 +21136,9 @@ (define_split
 })
 
 (define_insn_and_split "*<sse2_avx2>_pmovmskb_lt"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,jr")
 	(unspec:SI
-	  [(lt:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand" "x")
+	  [(lt:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand" "x,x")
 			(match_operand:VI1_AVX2 2 "const0_operand"))]
 	  UNSPEC_MOVMSK))]
   "TARGET_SSE2"
@@ -21097,7 +21147,8 @@ (define_insn_and_split "*<sse2_avx2>_pmovmskb_lt"
   [(set (match_dup 0)
 	(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))]
   ""
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set (attr "prefix_data16")
      (if_then_else
        (match_test "TARGET_AVX")
@@ -21107,10 +21158,10 @@ (define_insn_and_split "*<sse2_avx2>_pmovmskb_lt"
    (set_attr "mode" "SI")])
 
 (define_insn_and_split "*<sse2_avx2>_pmovmskb_zext_lt"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,jr")
 	(zero_extend:DI
 	  (unspec:SI
-	    [(lt:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand" "x")
+	    [(lt:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand" "x,x")
 			  (match_operand:VI1_AVX2 2 "const0_operand"))]
 	    UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE2"
@@ -21119,7 +21170,8 @@ (define_insn_and_split "*<sse2_avx2>_pmovmskb_zext_lt"
   [(set (match_dup 0)
 	(zero_extend:DI (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK)))]
   ""
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set (attr "prefix_data16")
      (if_then_else
        (match_test "TARGET_AVX")
@@ -21129,10 +21181,10 @@ (define_insn_and_split "*<sse2_avx2>_pmovmskb_zext_lt"
    (set_attr "mode" "SI")])
 
 (define_insn_and_split "*sse2_pmovmskb_ext_lt"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,jr")
 	(sign_extend:DI
 	  (unspec:SI
-	    [(lt:V16QI (match_operand:V16QI 1 "register_operand" "x")
+	    [(lt:V16QI (match_operand:V16QI 1 "register_operand" "x,x")
 		       (match_operand:V16QI 2 "const0_operand"))]
 	    UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE2"
@@ -21141,7 +21193,8 @@ (define_insn_and_split "*sse2_pmovmskb_ext_lt"
   [(set (match_dup 0)
 	(sign_extend:DI (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK)))]
   ""
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set (attr "prefix_data16")
      (if_then_else
        (match_test "TARGET_AVX")
@@ -21202,21 +21255,25 @@ (define_insn "*sse2_maskmovdqu"
    (set_attr "mode" "TI")])
 
 (define_insn "sse_ldmxcsr"
-  [(unspec_volatile [(match_operand:SI 0 "memory_operand" "m")]
+  [(unspec_volatile [(match_operand:SI 0 "memory_operand" "m,jm")]
 		    UNSPECV_LDMXCSR)]
   "TARGET_SSE"
   "%vldmxcsr\t%0"
-  [(set_attr "type" "sse")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "atom_sse_attr" "mxcsr")
    (set_attr "prefix" "maybe_vex")
    (set_attr "memory" "load")])
 
 (define_insn "sse_stmxcsr"
-  [(set (match_operand:SI 0 "memory_operand" "=m")
+  [(set (match_operand:SI 0 "memory_operand" "=m,jm")
 	(unspec_volatile:SI [(const_int 0)] UNSPECV_STMXCSR))]
   "TARGET_SSE"
   "%vstmxcsr\t%0"
-  [(set_attr "type" "sse")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sse")
+   (set_attr "gpr32" "0")
    (set_attr "atom_sse_attr" "mxcsr")
    (set_attr "prefix" "maybe_vex")
    (set_attr "memory" "store")])
@@ -23860,11 +23917,12 @@ (define_expand "<insn>v2siv2di2"
 (define_insn "avx_vtest<ssemodesuffix><avxsizesuffix>"
   [(set (reg:CC FLAGS_REG)
 	(unspec:CC [(match_operand:VF_128_256 0 "register_operand" "x")
-		    (match_operand:VF_128_256 1 "nonimmediate_operand" "xm")]
+		    (match_operand:VF_128_256 1 "nonimmediate_operand" "xjm")]
 		   UNSPEC_VTESTP))]
   "TARGET_AVX"
   "vtest<ssemodesuffix>\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssecomi")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<MODE>")])
@@ -26925,7 +26983,7 @@ (define_split
 (define_insn "avx_vbroadcastf128_<mode>"
   [(set (match_operand:V_256 0 "register_operand" "=x,x,x,v,v,v,v")
 	(vec_concat:V_256
-	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "m,0,?x,m,0,m,0")
+	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "jm,0,?x,m,0,m,0")
 	  (match_dup 1)))]
   "TARGET_AVX"
   "@
@@ -26936,8 +26994,9 @@ (define_insn "avx_vbroadcastf128_<mode>"
    vinsert<i128vldq>\t{$1, %1, %0, %0|%0, %0, %1, 1}
    vbroadcast<shuffletype>32x4\t{%1, %0|%0, %1}
    vinsert<shuffletype>32x4\t{$1, %1, %0, %0|%0, %0, %1, 1}"
-  [(set_attr "isa" "*,*,*,avx512dq,avx512dq,avx512vl,avx512vl")
+  [(set_attr "isa" "noavx512vl,*,*,avx512dq,avx512dq,avx512vl,avx512vl")
    (set_attr "type" "ssemov,sselog1,sselog1,ssemov,sselog1,ssemov,sselog1")
+   (set_attr "gpr32" "0,1,1,1,1,1,1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "0,1,1,0,1,0,1")
    (set_attr "prefix" "vex,vex,vex,evex,evex,evex,evex")
@@ -27220,12 +27279,13 @@ (define_insn "*avx_vperm2f128<mode>_full"
   [(set (match_operand:AVX256MODE2P 0 "register_operand" "=x")
 	(unspec:AVX256MODE2P
 	  [(match_operand:AVX256MODE2P 1 "register_operand" "x")
-	   (match_operand:AVX256MODE2P 2 "nonimmediate_operand" "xm")
+	   (match_operand:AVX256MODE2P 2 "nonimmediate_operand" "xjm")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_VPERMIL2F128))]
   "TARGET_AVX"
   "vperm2<i128>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
@@ -27342,11 +27402,11 @@ (define_expand "avx_vinsertf128<mode>"
 })
 
 (define_insn "vec_set_lo_<mode><mask_name>"
-  [(set (match_operand:VI8F_256 0 "register_operand" "=v")
+  [(set (match_operand:VI8F_256 0 "register_operand" "=x,v")
 	(vec_concat:VI8F_256
-	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "vm")
+	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xjm,vm")
 	  (vec_select:<ssehalfvecmode>
-	    (match_operand:VI8F_256 1 "register_operand" "v")
+	    (match_operand:VI8F_256 1 "register_operand" "x,v")
 	    (parallel [(const_int 2) (const_int 3)]))))]
   "TARGET_AVX && <mask_avx512dq_condition>"
 {
@@ -27357,7 +27417,9 @@ (define_insn "vec_set_lo_<mode><mask_name>"
   else
     return "vinsert<i128>\t{$0x0, %2, %1, %0|%0, %1, %2, 0x0}";
 }
-  [(set_attr "type" "sselog")
+  [(set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
@@ -27386,11 +27448,11 @@ (define_insn "vec_set_hi_<mode><mask_name>"
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "vec_set_lo_<mode><mask_name>"
-  [(set (match_operand:VI4F_256 0 "register_operand" "=v")
+  [(set (match_operand:VI4F_256 0 "register_operand" "=x,v")
 	(vec_concat:VI4F_256
-	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "vm")
+	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xjm,vm")
 	  (vec_select:<ssehalfvecmode>
-	    (match_operand:VI4F_256 1 "register_operand" "v")
+	    (match_operand:VI4F_256 1 "register_operand" "x,v")
 	    (parallel [(const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))))]
   "TARGET_AVX"
@@ -27400,20 +27462,22 @@ (define_insn "vec_set_lo_<mode><mask_name>"
   else
     return "vinsert<i128>\t{$0x0, %2, %1, %0|%0, %1, %2, 0x0}";
 }
-  [(set_attr "type" "sselog")
+  [(set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "vec_set_hi_<mode><mask_name>"
-  [(set (match_operand:VI4F_256 0 "register_operand" "=v")
+  [(set (match_operand:VI4F_256 0 "register_operand" "=x,v")
 	(vec_concat:VI4F_256
 	  (vec_select:<ssehalfvecmode>
-	    (match_operand:VI4F_256 1 "register_operand" "v")
+	    (match_operand:VI4F_256 1 "register_operand" "x,v")
 	    (parallel [(const_int 0) (const_int 1)
 		       (const_int 2) (const_int 3)]))
-	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "vm")))]
+	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xjm,vm")))]
   "TARGET_AVX"
 {
   if (TARGET_AVX512VL)
@@ -27421,7 +27485,9 @@ (define_insn "vec_set_hi_<mode><mask_name>"
   else
     return "vinsert<i128>\t{$0x1, %2, %1, %0|%0, %1, %2, 0x1}";
 }
-  [(set_attr "type" "sselog")
+  [(set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
@@ -27430,7 +27496,7 @@ (define_insn "vec_set_hi_<mode><mask_name>"
 (define_insn "vec_set_lo_<mode>"
   [(set (match_operand:V16_256 0 "register_operand" "=x,v")
 	(vec_concat:V16_256
-	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xm,vm")
+	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xjm,vm")
 	  (vec_select:<ssehalfvecmode>
 	    (match_operand:V16_256 1 "register_operand" "x,v")
 	    (parallel [(const_int 8) (const_int 9)
@@ -27441,7 +27507,9 @@ (define_insn "vec_set_lo_<mode>"
   "@
    vinsert%~128\t{$0x0, %2, %1, %0|%0, %1, %2, 0x0}
    vinserti32x4\t{$0x0, %2, %1, %0|%0, %1, %2, 0x0}"
-  [(set_attr "type" "sselog")
+  [(set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex,evex")
@@ -27456,12 +27524,14 @@ (define_insn "vec_set_hi_<mode>"
 		       (const_int 2) (const_int 3)
 		       (const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))
-	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xjm,vm")))]
   "TARGET_AVX"
   "@
    vinsert%~128\t{$0x1, %2, %1, %0|%0, %1, %2, 0x1}
    vinserti32x4\t{$0x1, %2, %1, %0|%0, %1, %2, 0x1}"
-  [(set_attr "type" "sselog")
+  [(set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex,evex")
@@ -27470,7 +27540,7 @@ (define_insn "vec_set_hi_<mode>"
 (define_insn "vec_set_lo_v32qi"
   [(set (match_operand:V32QI 0 "register_operand" "=x,v")
 	(vec_concat:V32QI
-	  (match_operand:V16QI 2 "nonimmediate_operand" "xm,v")
+	  (match_operand:V16QI 2 "nonimmediate_operand" "xjm,v")
 	  (vec_select:V16QI
 	    (match_operand:V32QI 1 "register_operand" "x,v")
 	    (parallel [(const_int 16) (const_int 17)
@@ -27485,7 +27555,9 @@ (define_insn "vec_set_lo_v32qi"
   "@
    vinsert%~128\t{$0x0, %2, %1, %0|%0, %1, %2, 0x0}
    vinserti32x4\t{$0x0, %2, %1, %0|%0, %1, %2, 0x0}"
-  [(set_attr "type" "sselog")
+  [(set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "type" "sselog")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex,evex")
@@ -27504,12 +27576,14 @@ (define_insn "vec_set_hi_v32qi"
 		       (const_int 10) (const_int 11)
 		       (const_int 12) (const_int 13)
 		       (const_int 14) (const_int 15)]))
-	  (match_operand:V16QI 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:V16QI 2 "nonimmediate_operand" "xjm,vm")))]
   "TARGET_AVX"
   "@
    vinsert%~128\t{$0x1, %2, %1, %0|%0, %1, %2, 0x1}
    vinserti32x4\t{$0x1, %2, %1, %0|%0, %1, %2, 0x1}"
-  [(set_attr "type" "sselog")
+  [(set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "gpr32" "0")
+   (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex,evex")
@@ -27519,7 +27593,7 @@ (define_insn "<avx_avx2>_maskload<ssemodesuffix><avxsizesuffix>"
   [(set (match_operand:V48_128_256 0 "register_operand" "=x")
 	(unspec:V48_128_256
 	  [(match_operand:<sseintvecmode> 2 "register_operand" "x")
-	   (match_operand:V48_128_256 1 "memory_operand" "m")]
+	   (match_operand:V48_128_256 1 "memory_operand" "jm")]
 	  UNSPEC_MASKMOV))]
   "TARGET_AVX"
 {
@@ -27529,13 +27603,14 @@ (define_insn "<avx_avx2>_maskload<ssemodesuffix><avxsizesuffix>"
     return "vmaskmov<ssefltmodesuffix>\t{%1, %2, %0|%0, %2, %1}";
 }
   [(set_attr "type" "sselog1")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "vex")
    (set_attr "btver2_decode" "vector")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<avx_avx2>_maskstore<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:V48_128_256 0 "memory_operand" "+m")
+  [(set (match_operand:V48_128_256 0 "memory_operand" "+jm")
 	(unspec:V48_128_256
 	  [(match_operand:<sseintvecmode> 1 "register_operand" "x")
 	   (match_operand:V48_128_256 2 "register_operand" "x")
@@ -27549,6 +27624,7 @@ (define_insn "<avx_avx2>_maskstore<ssemodesuffix><avxsizesuffix>"
     return "vmaskmov<ssefltmodesuffix>\t{%2, %1, %0|%0, %1, %2}";
 }
   [(set_attr "type" "sselog1")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "vex")
    (set_attr "btver2_decode" "vector") 
@@ -27806,7 +27882,7 @@ (define_insn "avx_vec_concat<mode>"
   [(set (match_operand:V_256_512 0 "register_operand" "=x,v,x,Yv")
 	(vec_concat:V_256_512
 	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "x,v,xm,vm")
-	  (match_operand:<ssehalfvecmode> 2 "nonimm_or_0_operand" "xBt,vm,C,C")))]
+	  (match_operand:<ssehalfvecmode> 2 "nonimm_or_0_operand" "xjm,vm,C,C")))]
   "TARGET_AVX
    && (operands[2] == CONST0_RTX (<ssehalfvecmode>mode)
        || !MEM_P (operands[1]))"
@@ -28145,7 +28221,7 @@ (define_insn "*avx2_gathersi<VEC_GATHER_MODE:mode>"
 	  [(match_operand:VEC_GATHER_MODE 2 "register_operand" "0")
 	   (match_operator:<ssescalarmode> 7 "vsib_mem_operator"
 	     [(unspec:P
-		[(match_operand:P 3 "vsib_address_operand" "Tv")
+		[(match_operand:P 3 "vsib_address_operand" "jb")
 		 (match_operand:<VEC_GATHER_IDXSI> 4 "register_operand" "x")
 		 (match_operand:SI 6 "const1248_operand")]
 		UNSPEC_VSIBADDR)])
@@ -28156,6 +28232,7 @@ (define_insn "*avx2_gathersi<VEC_GATHER_MODE:mode>"
   "TARGET_AVX2"
   "%M3v<sseintprefix>gatherd<ssemodesuffix>\t{%1, %7, %0|%0, %7, %1}"
   [(set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
@@ -28165,7 +28242,7 @@ (define_insn "*avx2_gathersi<VEC_GATHER_MODE:mode>_2"
 	  [(pc)
 	   (match_operator:<ssescalarmode> 6 "vsib_mem_operator"
 	     [(unspec:P
-		[(match_operand:P 2 "vsib_address_operand" "Tv")
+		[(match_operand:P 2 "vsib_address_operand" "jb")
 		 (match_operand:<VEC_GATHER_IDXSI> 3 "register_operand" "x")
 		 (match_operand:SI 5 "const1248_operand")]
 		UNSPEC_VSIBADDR)])
@@ -28176,6 +28253,7 @@ (define_insn "*avx2_gathersi<VEC_GATHER_MODE:mode>_2"
   "TARGET_AVX2"
   "%M2v<sseintprefix>gatherd<ssemodesuffix>\t{%1, %6, %0|%0, %6, %1}"
   [(set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
@@ -28206,7 +28284,7 @@ (define_insn "*avx2_gatherdi<VEC_GATHER_MODE:mode>"
 	  [(match_operand:<VEC_GATHER_SRCDI> 2 "register_operand" "0")
 	   (match_operator:<ssescalarmode> 7 "vsib_mem_operator"
 	     [(unspec:P
-		[(match_operand:P 3 "vsib_address_operand" "Tv")
+		[(match_operand:P 3 "vsib_address_operand" "jb")
 		 (match_operand:<VEC_GATHER_IDXDI> 4 "register_operand" "x")
 		 (match_operand:SI 6 "const1248_operand")]
 		UNSPEC_VSIBADDR)])
@@ -28217,6 +28295,7 @@ (define_insn "*avx2_gatherdi<VEC_GATHER_MODE:mode>"
   "TARGET_AVX2"
   "%M3v<sseintprefix>gatherq<ssemodesuffix>\t{%5, %7, %2|%2, %7, %5}"
   [(set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
@@ -28226,7 +28305,7 @@ (define_insn "*avx2_gatherdi<VEC_GATHER_MODE:mode>_2"
 	  [(pc)
 	   (match_operator:<ssescalarmode> 6 "vsib_mem_operator"
 	     [(unspec:P
-		[(match_operand:P 2 "vsib_address_operand" "Tv")
+		[(match_operand:P 2 "vsib_address_operand" "jb")
 		 (match_operand:<VEC_GATHER_IDXDI> 3 "register_operand" "x")
 		 (match_operand:SI 5 "const1248_operand")]
 		UNSPEC_VSIBADDR)])
@@ -28241,6 +28320,7 @@ (define_insn "*avx2_gatherdi<VEC_GATHER_MODE:mode>_2"
   return "%M2v<sseintprefix>gatherq<ssemodesuffix>\t{%4, %6, %0|%0, %6, %4}";
 }
   [(set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
@@ -28251,7 +28331,7 @@ (define_insn "*avx2_gatherdi<VI4F_256:mode>_3"
 	    [(match_operand:<VEC_GATHER_SRCDI> 2 "register_operand" "0")
 	     (match_operator:<ssescalarmode> 7 "vsib_mem_operator"
 	       [(unspec:P
-		  [(match_operand:P 3 "vsib_address_operand" "Tv")
+		  [(match_operand:P 3 "vsib_address_operand" "jb")
 		   (match_operand:<VEC_GATHER_IDXDI> 4 "register_operand" "x")
 		   (match_operand:SI 6 "const1248_operand")]
 		  UNSPEC_VSIBADDR)])
@@ -28264,6 +28344,7 @@ (define_insn "*avx2_gatherdi<VI4F_256:mode>_3"
   "TARGET_AVX2"
   "%M3v<sseintprefix>gatherq<ssemodesuffix>\t{%5, %7, %0|%0, %7, %5}"
   [(set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
@@ -28274,7 +28355,7 @@ (define_insn "*avx2_gatherdi<VI4F_256:mode>_4"
 	    [(pc)
 	     (match_operator:<ssescalarmode> 6 "vsib_mem_operator"
 	       [(unspec:P
-		  [(match_operand:P 2 "vsib_address_operand" "Tv")
+		  [(match_operand:P 2 "vsib_address_operand" "jb")
 		   (match_operand:<VEC_GATHER_IDXDI> 3 "register_operand" "x")
 		   (match_operand:SI 5 "const1248_operand")]
 		  UNSPEC_VSIBADDR)])
@@ -28287,6 +28368,7 @@ (define_insn "*avx2_gatherdi<VI4F_256:mode>_4"
   "TARGET_AVX2"
   "%M2v<sseintprefix>gatherq<ssemodesuffix>\t{%4, %6, %0|%0, %6, %4}"
   [(set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-09-22 10:56 ` [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class Hongyu Wang
@ 2023-09-22 16:02   ` Vladimir Makarov
  2023-10-07  8:22     ` Hongyu Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Vladimir Makarov @ 2023-09-22 16:02 UTC (permalink / raw)
  To: Hongyu Wang, gcc-patches; +Cc: ubizjak, jakub, Kong Lingling, Hongtao Liu


On 9/22/23 06:56, Hongyu Wang wrote:
> From: Kong Lingling <lingling.kong@intel.com>
>
> Current reload infrastructure does not support selective base_reg_class
> for backend insn. Add new macros with insn parameters to base_reg_class
> for lra/reload usage.
>
> gcc/ChangeLog:
>
> 	* addresses.h (base_reg_class): Add insn argument and new macro
> 	INSN_BASE_REG_CLASS.
> 	(regno_ok_for_base_p_1): Add insn argument and new macro
> 	REGNO_OK_FOR_INSN_BASE_P.
> 	(regno_ok_for_base_p): Add insn argument and parse to ok_for_base_p_1.
> 	* doc/tm.texi: Document INSN_BASE_REG_CLASS and
> 	REGNO_OK_FOR_INSN_BASE_P.
> 	* doc/tm.texi.in: Ditto.
> 	* lra-constraints.cc (process_address_1): Pass insn to
> 	base_reg_class.
> 	(curr_insn_transform): Ditto.
> 	* reload.cc (find_reloads): Ditto.
> 	(find_reloads_address): Ditto.
> 	(find_reloads_address_1): Ditto.
> 	(find_reloads_subreg_address): Ditto.
> 	* reload1.cc (maybe_fix_stack_asms): Ditto.
>
The patch is ok for committing to the trunk.  Thank you.

It would be nice to add to the documentation that INSN_BASE_REG_CLASS,
INSN_INDEX_REG_CLASS, and REGNO_OK_FOR_INSN_BASE_P, if defined, take
priority over the older corresponding macros, just as is already
documented for REGNO_MODE_CODE_OK_FOR_BASE_P in relation to
REGNO_OK_FOR_BASE_P.  But this small issue can be addressed later.
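
For illustration, a target that opts into the new hooks would define
something along these lines (a minimal sketch: the my_target_* helper
names are placeholders, not taken from the posted patches):

  /* Hypothetical target.h fragment.  When these insn-aware macros are
     defined they take priority over the classic BASE_REG_CLASS /
     INDEX_REG_CLASS / REGNO_OK_FOR_BASE_P macros, which is the behavior
     the note above asks to have documented.  */
  #define INSN_BASE_REG_CLASS(INSN) \
    my_target_insn_base_reg_class (INSN)
  #define REGNO_OK_FOR_INSN_BASE_P(NUM, INSN) \
    my_target_regno_ok_for_insn_base_p ((NUM), (INSN))
  #define INSN_INDEX_REG_CLASS(INSN) \
    my_target_insn_index_reg_class (INSN)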



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument.
  2023-09-22 10:56 ` [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument Hongyu Wang
@ 2023-09-22 16:03   ` Vladimir Makarov
  0 siblings, 0 replies; 28+ messages in thread
From: Vladimir Makarov @ 2023-09-22 16:03 UTC (permalink / raw)
  To: Hongyu Wang, gcc-patches; +Cc: ubizjak, jakub, Kong Lingling, Hongtao Liu


On 9/22/23 06:56, Hongyu Wang wrote:
> Like base_reg_class, INDEX_REG_CLASS also does not support backend insn.
> Add index_reg_class with insn argument for lra/reload usage.
>
> gcc/ChangeLog:
>
> 	* addresses.h (index_reg_class): New wrapper function like
> 	base_reg_class.
> 	* doc/tm.texi: Document INSN_INDEX_REG_CLASS.
> 	* doc/tm.texi.in: Ditto.
> 	* lra-constraints.cc (index_part_to_reg): Pass index_class.
> 	(process_address_1): Calls index_reg_class with curr_insn and
> 	replace INDEX_REG_CLASS with its return value index_cl.
> 	* reload.cc (find_reloads_address): Likewise.
> 	(find_reloads_address_1): Likewise.
>
The patch is ok to commit to the trunk.  Thank you.

So all changes to the RA have been reviewed.  You just need approval for
the rest of the patches from an x86-64 maintainer.
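
For context, the call-site shape described in the ChangeLog looks roughly
like the sketch below; it is assembled from the entries above rather than
copied from the patch, so the exact argument list is an assumption:

  /* In LRA/reload address processing, the base and index register
     classes are now queried per insn instead of only via the global
     BASE_REG_CLASS/INDEX_REG_CLASS macros.  */
  enum reg_class base_cl
    = base_reg_class (mode, as, outer_code, index_code, curr_insn);
  enum reg_class index_cl = index_reg_class (curr_insn);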


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 00/13] Support Intel APX EGPR
  2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
                   ` (12 preceding siblings ...)
  2023-09-22 10:56 ` [PATCH 13/13] [APX EGPR] Handle vex insns that only support GPR16 (5/5) Hongyu Wang
@ 2023-09-25  2:02 ` Hongtao Liu
  13 siblings, 0 replies; 28+ messages in thread
From: Hongtao Liu @ 2023-09-25  2:02 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: gcc-patches, ubizjak, vmakarov, jakub

On Fri, Sep 22, 2023 at 6:56 PM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> Hi,
>
> This is a v2 patch for APX support which follows-up previous discussion in
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628904.html
>
> As discussed in previous thread, the inverse approach to extend base/index reg
> support with new memory constraints requrires much more effort both in
> middle-end and backend, so we keep the backend implementation logic. Also for
> inline asm, it is hard to provide memory constraint to use EGPR by force due to
> the limitation of base/index reg class hook, so we just add one register class
> that allows EGPR usage by force.
>
> The major changes are
>
>   1. Add new macros INSN_BASE_REG_CLASS/REGNO_OK_FOR_INSN_BASE_P instead of
>   using old MODE_CODE macros.
>   2. Add a series of constraints that prohibits EGPR as counterparts of common
>   constraints that may involve register/memory/address. All those are prefixed
>   with "j" to avoid confilts with previous constraints.
>   3. Support constraint mapping for all gpr related common constraints in
>   inline asm.
>
> Bootstrapped/regtested x86_64-linux-gnu.
> Ok for trunk?
I've pushed the patches to a vendor branch
b6dcc29fe45c214629bdb3ff9cf5bb08d06c0b81        refs/vendors/ix86/heads/apx

>
> Hongyu Wang (2):
>   [APX EGPR] middle-end: Add index_reg_class with insn argument.
>   [APX EGPR] Handle GPR16 only vector move insns
>
> Kong Lingling (11):
>   [APX EGPR] middle-end: Add insn argument to base_reg_class
>   [APX_EGPR] Initial support for APX_F
>   [APX EGPR] Add 16 new integer general purpose registers
>   [APX EGPR] Add register and memory constraints that disallow EGPR
>   [APX EGPR] Add backend hook for base_reg_class/index_reg_class.
>   [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR
>     constraint.
>   [APX EGPR] Handle legacy insn that only support GPR16 (1/5)
>   [APX EGPR] Handle legacy insns that only support GPR16 (2/5)
>   [APX EGPR] Handle legacy insns that only support GPR16 (3/5)
>   [APX_EGPR] Handle legacy insns that only support GPR16 (4/5)
>   [APX EGPR] Handle vex insns that only support GPR16 (5/5)
>
>  gcc/addresses.h                               |  29 +-
>  gcc/common/config/i386/cpuinfo.h              |  12 +-
>  gcc/common/config/i386/i386-common.cc         |  17 +
>  gcc/common/config/i386/i386-cpuinfo.h         |   1 +
>  gcc/common/config/i386/i386-isas.h            |   1 +
>  gcc/config/i386/constraints.md                |  65 +-
>  gcc/config/i386/cpuid.h                       |   1 +
>  gcc/config/i386/i386-isa.def                  |   1 +
>  gcc/config/i386/i386-options.cc               |  18 +
>  gcc/config/i386/i386-opts.h                   |   8 +
>  gcc/config/i386/i386-protos.h                 |   5 +
>  gcc/config/i386/i386.cc                       | 303 ++++++-
>  gcc/config/i386/i386.h                        |  69 +-
>  gcc/config/i386/i386.md                       | 131 ++-
>  gcc/config/i386/i386.opt                      |  30 +
>  gcc/config/i386/mmx.md                        | 154 ++--
>  gcc/config/i386/sse.md                        | 792 ++++++++++++------
>  gcc/doc/invoke.texi                           |  11 +-
>  gcc/doc/tm.texi                               |  21 +
>  gcc/doc/tm.texi.in                            |  21 +
>  gcc/lra-constraints.cc                        |  32 +-
>  gcc/reload.cc                                 |  34 +-
>  gcc/reload1.cc                                |   2 +-
>  gcc/testsuite/gcc.target/i386/apx-1.c         |   8 +
>  .../gcc.target/i386/apx-egprs-names.c         |  17 +
>  .../gcc.target/i386/apx-inline-gpr-norex2.c   |  25 +
>  .../gcc.target/i386/apx-interrupt-1.c         | 102 +++
>  .../i386/apx-legacy-insn-check-norex2-asm.c   |   5 +
>  .../i386/apx-legacy-insn-check-norex2.c       | 181 ++++
>  .../gcc.target/i386/apx-spill_to_egprs-1.c    |  25 +
>  gcc/testsuite/lib/target-supports.exp         |  10 +
>  31 files changed, 1683 insertions(+), 448 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-egprs-names.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
>
> --
> 2.31.1
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 03/13] [APX_EGPR] Initial support for APX_F
  2023-09-22 10:56 ` [PATCH 03/13] [APX_EGPR] Initial support for APX_F Hongyu Wang
@ 2023-10-07  2:35   ` Hongtao Liu
  0 siblings, 0 replies; 28+ messages in thread
From: Hongtao Liu @ 2023-10-07  2:35 UTC (permalink / raw)
  To: Hongyu Wang
  Cc: gcc-patches, ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

On Fri, Sep 22, 2023 at 6:58 PM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> From: Kong Lingling <lingling.kong@intel.com>
>
> Add a -mapx-features= enumeration to separate the subfeatures of APX_F.
> -mapxf is treated like other ISA flags, and in addition it sets
> -mapx-features=apx_all, which enables all subfeatures.
Ok for this and the rest of the patches (04-13).
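
For illustration, a minimal sketch of how the new flag can be exercised
(the function name and body are placeholders; the per-function form
relies on the "apxf" entry added to ix86_valid_target_attribute_inner_p
further down in this patch):

/* Illustrative only.  -mapxf on the command line sets
   -mapx-features=apx_all as described above, and is rejected for
   32-bit code (see the new error in ix86_option_override_internal).  */
__attribute__((target("apxf")))
void
apx_only_function (void)
{
  /* This function is compiled with all APX subfeatures (EGPR,
     PUSH2POP2, NDD) enabled.  */
}
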
>
> gcc/ChangeLog:
>
>         * common/config/i386/cpuinfo.h (XSTATE_APX_F): New macro.
>         (XCR_APX_F_ENABLED_MASK): Likewise.
>         (get_available_features): Detect APX_F under
>         * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_APX_F_SET): New.
>         (OPTION_MASK_ISA2_APX_F_UNSET): Likewise.
>         (ix86_handle_option): Handle -mapxf.
>         * common/config/i386/i386-cpuinfo.h (FEATURE_APX_F): New.
>         * common/config/i386/i386-isas.h: Add entry for APX_F.
>         * config/i386/cpuid.h (bit_APX_F): New.
> >         * config/i386/i386.h (TARGET_APX_EGPR,
>         TARGET_APX_PUSH2POP2, TARGET_APX_NDD): New define.
>         * config/i386/i386-opts.h (enum apx_features): New enum.
>         * config/i386/i386-isa.def (APX_F): New DEF_PTA.
>         * config/i386/i386-options.cc (ix86_function_specific_save):
>         Save ix86_apx_features.
>         (ix86_function_specific_restore): Restore it.
>         (ix86_valid_target_attribute_inner_p): Add mapxf.
>         (ix86_option_override_internal): Set ix86_apx_features for PTA
>         and TARGET_APX_F. Also reports error when APX_F is set but not
>         having TARGET_64BIT.
>         * config/i386/i386.opt: (-mapxf): New ISA flag option.
>         (-mapx=): New enumeration option.
>         (apx_features): New enum type.
>         (apx_none): New enum value.
>         (apx_egpr): Likewise.
>         (apx_push2pop2): Likewise.
>         (apx_ndd): Likewise.
>         (apx_all): Likewise.
>         * doc/invoke.texi: Document mapxf.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/apx-1.c: New test.
>
> Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
> Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
> ---
>  gcc/common/config/i386/cpuinfo.h      | 12 +++++++++++-
>  gcc/common/config/i386/i386-common.cc | 17 +++++++++++++++++
>  gcc/common/config/i386/i386-cpuinfo.h |  1 +
>  gcc/common/config/i386/i386-isas.h    |  1 +
>  gcc/config/i386/cpuid.h               |  1 +
>  gcc/config/i386/i386-isa.def          |  1 +
>  gcc/config/i386/i386-options.cc       | 18 ++++++++++++++++++
>  gcc/config/i386/i386-opts.h           |  8 ++++++++
>  gcc/config/i386/i386.h                |  4 ++++
>  gcc/config/i386/i386.opt              | 25 +++++++++++++++++++++++++
>  gcc/doc/invoke.texi                   | 11 +++++++----
>  gcc/testsuite/gcc.target/i386/apx-1.c |  8 ++++++++
>  12 files changed, 102 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-1.c
>
> diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
> index 24ae0dbf0ac..141d3743316 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -678,6 +678,7 @@ get_available_features (struct __processor_model *cpu_model,
>  #define XSTATE_HI_ZMM                  0x80
>  #define XSTATE_TILECFG                 0x20000
>  #define XSTATE_TILEDATA                0x40000
> +#define XSTATE_APX_F                   0x80000
>
>  #define XCR_AVX_ENABLED_MASK \
>    (XSTATE_SSE | XSTATE_YMM)
> @@ -685,11 +686,13 @@ get_available_features (struct __processor_model *cpu_model,
>    (XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM)
>  #define XCR_AMX_ENABLED_MASK \
>    (XSTATE_TILECFG | XSTATE_TILEDATA)
> +#define XCR_APX_F_ENABLED_MASK XSTATE_APX_F
>
> -  /* Check if AVX and AVX512 are usable.  */
> +  /* Check if AVX, AVX512 and APX are usable.  */
>    int avx_usable = 0;
>    int avx512_usable = 0;
>    int amx_usable = 0;
> +  int apx_usable = 0;
>    /* Check if KL is usable.  */
>    int has_kl = 0;
>    if ((ecx & bit_OSXSAVE))
> @@ -709,6 +712,8 @@ get_available_features (struct __processor_model *cpu_model,
>         }
>        amx_usable = ((xcrlow & XCR_AMX_ENABLED_MASK)
>                     == XCR_AMX_ENABLED_MASK);
> +      apx_usable = ((xcrlow & XCR_APX_F_ENABLED_MASK)
> +                   == XCR_APX_F_ENABLED_MASK);
>      }
>
>  #define set_feature(f) \
> @@ -922,6 +927,11 @@ get_available_features (struct __processor_model *cpu_model,
>               if (edx & bit_AMX_COMPLEX)
>                 set_feature (FEATURE_AMX_COMPLEX);
>             }
> +         if (apx_usable)
> +           {
> +             if (edx & bit_APX_F)
> +               set_feature (FEATURE_APX_F);
> +           }
>         }
>      }
>
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index 95468b7c405..86596e96ad1 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -123,6 +123,7 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_SM3_SET OPTION_MASK_ISA2_SM3
>  #define OPTION_MASK_ISA2_SHA512_SET OPTION_MASK_ISA2_SHA512
>  #define OPTION_MASK_ISA2_SM4_SET OPTION_MASK_ISA2_SM4
> +#define OPTION_MASK_ISA2_APX_F_SET OPTION_MASK_ISA2_APX_F
>
>  /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
>     as -msse4.2.  */
> @@ -309,6 +310,7 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_SM3_UNSET OPTION_MASK_ISA2_SM3
>  #define OPTION_MASK_ISA2_SHA512_UNSET OPTION_MASK_ISA2_SHA512
>  #define OPTION_MASK_ISA2_SM4_UNSET OPTION_MASK_ISA2_SM4
> +#define OPTION_MASK_ISA2_APX_F_UNSET OPTION_MASK_ISA2_APX_F
>
>  /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
>     as -mno-sse4.1. */
> @@ -1341,6 +1343,21 @@ ix86_handle_option (struct gcc_options *opts,
>         }
>        return true;
>
> +    case OPT_mapxf:
> +      if (value)
> +       {
> +         opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_APX_F_SET;
> +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_APX_F_SET;
> +         opts->x_ix86_apx_features = apx_all;
> +       }
> +      else
> +       {
> +         opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_APX_F_UNSET;
> +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_APX_F_UNSET;
> +         opts->x_ix86_apx_features = apx_none;
> +       }
> +      return true;
> +
>      case OPT_mfma:
>        if (value)
>         {
> diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
> index 9153b4d0a54..8bf592191ab 100644
> --- a/gcc/common/config/i386/i386-cpuinfo.h
> +++ b/gcc/common/config/i386/i386-cpuinfo.h
> @@ -261,6 +261,7 @@ enum processor_features
>    FEATURE_SM3,
>    FEATURE_SHA512,
>    FEATURE_SM4,
> +  FEATURE_APX_F,
>    CPU_FEATURE_MAX
>  };
>
> diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
> index 2297903a45e..47e0cbd6f5b 100644
> --- a/gcc/common/config/i386/i386-isas.h
> +++ b/gcc/common/config/i386/i386-isas.h
> @@ -191,4 +191,5 @@ ISA_NAMES_TABLE_START
>    ISA_NAMES_TABLE_ENTRY("sm3", FEATURE_SM3, P_NONE, "-msm3")
>    ISA_NAMES_TABLE_ENTRY("sha512", FEATURE_SHA512, P_NONE, "-msha512")
>    ISA_NAMES_TABLE_ENTRY("sm4", FEATURE_SM4, P_NONE, "-msm4")
> +  ISA_NAMES_TABLE_ENTRY("apxf", FEATURE_APX_F, P_NONE, "-mapxf")
>  ISA_NAMES_TABLE_END
> diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> index 73c15480350..f3d3a2a1c22 100644
> --- a/gcc/config/i386/cpuid.h
> +++ b/gcc/config/i386/cpuid.h
> @@ -149,6 +149,7 @@
>  #define bit_AVXNECONVERT       (1 << 5)
>  #define bit_AVXVNNIINT16       (1 << 10)
>  #define bit_PREFETCHI  (1 << 14)
> +#define bit_APX_F      (1 << 21)
>
>  /* Extended State Enumeration Sub-leaf (%eax == 0xd, %ecx == 1) */
>  #define bit_XSAVEOPT   (1 << 0)
> diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def
> index aeafcf870ac..c581f343339 100644
> --- a/gcc/config/i386/i386-isa.def
> +++ b/gcc/config/i386/i386-isa.def
> @@ -121,3 +121,4 @@ DEF_PTA(AVXVNNIINT16)
>  DEF_PTA(SM3)
>  DEF_PTA(SHA512)
>  DEF_PTA(SM4)
> +DEF_PTA(APX_F)
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index e47f9ed5d5f..b9727d5f1f3 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -694,6 +694,7 @@ ix86_function_specific_save (struct cl_target_option *ptr,
>    ptr->branch_cost = ix86_branch_cost;
>    ptr->tune_defaulted = ix86_tune_defaulted;
>    ptr->arch_specified = ix86_arch_specified;
> +  ptr->x_ix86_apx_features = opts->x_ix86_apx_features;
>    ptr->x_ix86_isa_flags_explicit = opts->x_ix86_isa_flags_explicit;
>    ptr->x_ix86_isa_flags2_explicit = opts->x_ix86_isa_flags2_explicit;
>    ptr->x_recip_mask_explicit = opts->x_recip_mask_explicit;
> @@ -832,6 +833,7 @@ ix86_function_specific_restore (struct gcc_options *opts,
>    ix86_prefetch_sse = ptr->prefetch_sse;
>    ix86_tune_defaulted = ptr->tune_defaulted;
>    ix86_arch_specified = ptr->arch_specified;
> +  opts->x_ix86_apx_features = ptr->x_ix86_apx_features;
>    opts->x_ix86_isa_flags_explicit = ptr->x_ix86_isa_flags_explicit;
>    opts->x_ix86_isa_flags2_explicit = ptr->x_ix86_isa_flags2_explicit;
>    opts->x_recip_mask_explicit = ptr->x_recip_mask_explicit;
> @@ -1109,6 +1111,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
>      IX86_ATTR_ISA ("sm3", OPT_msm3),
>      IX86_ATTR_ISA ("sha512", OPT_msha512),
>      IX86_ATTR_ISA ("sm4", OPT_msm4),
> +    IX86_ATTR_ISA ("apxf", OPT_mapxf),
>
>      /* enum options */
>      IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_),
> @@ -2080,6 +2083,9 @@ ix86_option_override_internal (bool main_args_p,
>        opts->x_ix86_stringop_alg = no_stringop;
>      }
>
> +  if (TARGET_APX_F && !TARGET_64BIT)
> +    error ("%<-mapxf%> is not supported for 32-bit code");
> +
>    if (TARGET_UINTR && !TARGET_64BIT)
>      error ("%<-muintr%> not supported for 32-bit code");
>
> @@ -2293,6 +2299,14 @@ ix86_option_override_internal (bool main_args_p,
>               SET_TARGET_POPCNT (opts);
>           }
>
> +       /* Enable apx if apxf or apx_features are not
> +          explicitly set for -march.  */
> +       if (TARGET_64BIT_P (opts->x_ix86_isa_flags)
> +            && ((processor_alias_table[i].flags & PTA_APX_F) != 0)
> +            && !TARGET_EXPLICIT_APX_F_P (opts)
> +            && !OPTION_SET_P (ix86_apx_features))
> +         opts->x_ix86_apx_features = apx_all;
> +
>         if ((processor_alias_table[i].flags
>            & (PTA_PREFETCH_SSE | PTA_SSE)) != 0)
>           ix86_prefetch_sse = true;
> @@ -2444,6 +2458,10 @@ ix86_option_override_internal (bool main_args_p,
>    /* Arrange to set up i386_stack_locals for all functions.  */
>    init_machine_status = ix86_init_machine_status;
>
> +  /* Override APX flag here if ISA bit is set.  */
> +  if (TARGET_APX_F && !OPTION_SET_P (ix86_apx_features))
> +    opts->x_ix86_apx_features = apx_all;
> +
>    /* Validate -mregparm= value.  */
>    if (opts_set->x_ix86_regparm)
>      {
> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
> index be359f3e3d5..2ec76a16bce 100644
> --- a/gcc/config/i386/i386-opts.h
> +++ b/gcc/config/i386/i386-opts.h
> @@ -134,4 +134,12 @@ enum lam_type {
>    lam_u57
>  };
>
> +enum apx_features {
> +  apx_none = 0,
> +  apx_egpr = 1 << 0,
> +  apx_push2pop2 = 1 << 1,
> +  apx_ndd = 1 << 2,
> +  apx_all = apx_egpr | apx_push2pop2 | apx_ndd,
> +};
> +
>  #endif
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 3e8488f2ae8..8c7ed541a8f 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -51,6 +51,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>
>  #define TARGET_MMX_WITH_SSE    (TARGET_64BIT && TARGET_SSE2)
>
> +#define TARGET_APX_EGPR (ix86_apx_features & apx_egpr)
> +#define TARGET_APX_PUSH2POP2 (ix86_apx_features & apx_push2pop2)
> +#define TARGET_APX_NDD (ix86_apx_features & apx_ndd)
> +
>  #include "config/vxworks-dummy.h"
>
>  #include "config/i386/i386-opts.h"
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 78b499304a4..d89b5bbc5e8 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1310,3 +1310,28 @@ Enable vectorization for gather instruction.
>  mscatter
>  Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
>  Enable vectorization for scatter instruction.
> +
> +mapxf
> +Target Mask(ISA2_APX_F) Var(ix86_isa_flags2) Save
> +Support APX code generation.
> +
> +mapx-features=
> +Target Undocumented Joined Enum(apx_features) EnumSet Var(ix86_apx_features) Init(apx_none) Save
> +
> +Enum
> +Name(apx_features) Type(int)
> +
> +EnumValue
> +Enum(apx_features) String(none) Value(apx_none) Set(1)
> +
> +EnumValue
> +Enum(apx_features) String(egpr) Value(apx_egpr) Set(2)
> +
> +EnumValue
> +Enum(apx_features) String(push2pop2) Value(apx_push2pop2) Set(3)
> +
> +EnumValue
> +Enum(apx_features) String(ndd) Value(apx_ndd) Set(4)
> +
> +EnumValue
> +Enum(apx_features) String(all) Value(apx_all) Set(1)
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index ba7984bcb7e..afe6b321b14 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1443,7 +1443,7 @@ See RS/6000 and PowerPC Options.
>  -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk
>  -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni
>  -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd -mamx-fp16
> --mprefetchi -mraoint -mamx-complex -mavxvnniint16 -msm3 -msha512 -msm4
> +-mprefetchi -mraoint -mamx-complex -mavxvnniint16 -msm3 -msha512 -msm4 -mapxf
>  -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops
>  -minline-stringops-dynamically  -mstringop-strategy=@var{alg}
>  -mkl -mwidekl
> @@ -33802,6 +33802,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
>  @need 200
>  @opindex msm4
>  @itemx -msm4
> +@need 200
> +@opindex mapxf
> +@itemx -mapxf
>  These switches enable the use of instructions in the MMX, SSE,
>  AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA,
>  AES, PCLMUL, CLFLUSHOPT, CLWB, FSGSBASE, PTWRITE, RDRND, F16C, FMA, PCONFIG,
> @@ -33812,9 +33815,9 @@ GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16,
>  ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE,
>  UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512-FP16,
>  AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, AMX-FP16, PREFETCHI, RAOINT,
> -AMX-COMPLEX, AVXVNNIINT16, SM3, SHA512, SM4 or CLDEMOTE extended instruction
> -sets. Each has a corresponding @option{-mno-} option to disable use of these
> -instructions.
> +AMX-COMPLEX, AVXVNNIINT16, SM3, SHA512, SM4, APX_F or CLDEMOTE extended
> +instruction sets. Each has a corresponding @option{-mno-} option to disable
> +use of these instructions.
>
>  These extensions are also available as built-in functions: see
>  @ref{x86 Built-in Functions}, for details of the functions enabled and
> diff --git a/gcc/testsuite/gcc.target/i386/apx-1.c b/gcc/testsuite/gcc.target/i386/apx-1.c
> new file mode 100644
> index 00000000000..4e580ecdf37
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/apx-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mapxf" } */
> +/* { dg-error "'-mapxf' is not supported for 32-bit code" "" { target ia32 } 0 } */
> +
> +void
> +apx_handler ()
> +{
> +}
> --
> 2.31.1
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-09-22 16:02   ` Vladimir Makarov
@ 2023-10-07  8:22     ` Hongyu Wang
  0 siblings, 0 replies; 28+ messages in thread
From: Hongyu Wang @ 2023-10-07  8:22 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Hongyu Wang, gcc-patches, ubizjak, jakub, Kong Lingling, Hongtao Liu

> It would be nice to add to the documentation that INSN_BASE_REG_CLASS,
> INSN_INDEX_REG_CLASS, and REGNO_OK_FOR_INSN_BASE_P, if defined, take
> priority over the older corresponding macros, as is already documented for
> REGNO_MODE_CODE_OK_FOR_BASE_P in relation to REGNO_OK_FOR_BASE_P. But this
> small issue can be addressed later.
>

Thanks, I will add the descriptions below when committing.

+@defmac INSN_BASE_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+base register for the specified @var{insn} must belong.  This macro is
+used when some backend insns have more limited base register usage
+than other insns.  If you define this macro, the compiler will use it
+instead of all other defined macros that relate to
+@code{BASE_REG_CLASS}.
+@end defmac
+

+@defmac REGNO_OK_FOR_INSN_BASE_P (@var{num}, @var{insn})
+A C expression which is nonzero if register number @var{num} is
+suitable for use as a base register in operand addresses for the
+specified @var{insn}.  This macro is used when some backend insns have
+more limited base register usage than other insns.  If you define this
+macro, the compiler will use it instead of all other defined macros
+that relate to @code{REGNO_OK_FOR_BASE_P}.
+@end defmac
+

+@defmac INSN_INDEX_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+index register for the specified @var{insn} must belong.  This macro is
+used when some backend insns have more limited index register usage
+than other insns.  If you define this macro, the compiler will use it
+instead of @code{INDEX_REG_CLASS}.
+@end defmac
+
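
As a rough illustration of how these three macros are meant to be wired
up in a backend (the example_* helper names below are invented for the
example and are not part of the patch):

/* Hypothetical backend sketch; only the macro names are real, the
   example_* helpers are placeholders.  */
#define INSN_BASE_REG_CLASS(INSN) \
  example_insn_base_reg_class (INSN)

#define INSN_INDEX_REG_CLASS(INSN) \
  example_insn_index_reg_class (INSN)

#define REGNO_OK_FOR_INSN_BASE_P(NUM, INSN) \
  example_regno_ok_for_insn_base_p (NUM, INSN)

When these are defined, the compiler prefers them over the older
MODE_CODE_BASE_REG_CLASS, REGNO_MODE_CODE_OK_FOR_BASE_P and
INDEX_REG_CLASS macros, which is what the descriptions above document.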

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-09-10  4:49     ` Hongyu Wang
@ 2023-09-14 12:09       ` Vladimir Makarov
  0 siblings, 0 replies; 28+ messages in thread
From: Vladimir Makarov @ 2023-09-14 12:09 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: Hongyu Wang, gcc-patches, jakub, hongtao.liu


On 9/10/23 00:49, Hongyu Wang wrote:
> Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Sat, Sep 9, 2023 at 01:04:
>>
>> On 8/31/23 04:20, Hongyu Wang wrote:
>>> @@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
>>>    of an address, @code{ADDRESS} for something that occurs in an
>>>    @code{address_operand}).  @var{index_code} is the code of the corresponding
>>>    index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
>>> +@code{insn} indicates insn specific base register class should be subset
>>> +of the original base register class.
>>>    @end defmac
>> I'd prefer a more general description of the 'insn' argument for the macros.
>> Something like this:
>>
>> @code{insn} can be used to define an insn-specific base register class.
>>
> Sure, will adjust in the V2 patch.
> Also, we currently reuse the old macro MODE_CODE_BASE_REG_CLASS; do
> you think we need a new macro like INSN_BASE_REG_CLASS, since the other
> parameters are actually unused?  Then we wouldn't need to change other
> targets like avr/gcn.
>
I thought about this too.  Adding new macros would definitely be
worthwhile, especially since you are already adding INSN_INDEX_REG_CLASS.

The names INSN_BASE_REG_CLASS instead of MODE_CODE_BASE_REG_CLASS and 
REGNO_OK_FOR_INSN_BASE_P instead of REGNO_MODE_CODE_OK_FOR_BASE_P are ok 
for me too.

When you submit the v2 patch, I'll review the RA part as soon as
possible (actually I have already looked at it) and will most probably
approve it, because I prefer your current approach for RA over
introducing new memory constraints.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-09-08 17:03   ` Vladimir Makarov
@ 2023-09-10  4:49     ` Hongyu Wang
  2023-09-14 12:09       ` Vladimir Makarov
  0 siblings, 1 reply; 28+ messages in thread
From: Hongyu Wang @ 2023-09-10  4:49 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Hongyu Wang, gcc-patches, jakub, hongtao.liu

Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Sat, Sep 9, 2023 at 01:04:
>
>
> On 8/31/23 04:20, Hongyu Wang wrote:
> > @@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
> >   of an address, @code{ADDRESS} for something that occurs in an
> >   @code{address_operand}).  @var{index_code} is the code of the corresponding
> >   index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
> > +@code{insn} indicates insn specific base register class should be subset
> > +of the original base register class.
> >   @end defmac
>
> I'd prefer a more general description of the 'insn' argument for the macros.
> Something like this:
>
> @code{insn} can be used to define an insn-specific base register class.
>

Sure, will adjust in the V2 patch.
Also, we currently reuse the old macro MODE_CODE_BASE_REG_CLASS; do
you think we need a new macro like INSN_BASE_REG_CLASS, since the other
parameters are actually unused?  Then we wouldn't need to change other
targets like avr/gcn.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-08-31  8:20 ` [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class Hongyu Wang
  2023-08-31 10:15   ` Uros Bizjak
@ 2023-09-08 17:03   ` Vladimir Makarov
  2023-09-10  4:49     ` Hongyu Wang
  1 sibling, 1 reply; 28+ messages in thread
From: Vladimir Makarov @ 2023-09-08 17:03 UTC (permalink / raw)
  To: Hongyu Wang, gcc-patches
  Cc: hongtao.liu, ubizjak, hubicka, jakub, Kong Lingling



On 8/31/23 04:20, Hongyu Wang wrote:
> @@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
>   of an address, @code{ADDRESS} for something that occurs in an
>   @code{address_operand}).  @var{index_code} is the code of the corresponding
>   index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
> +@code{insn} indicates insn specific base register class should be subset
> +of the original base register class.
>   @end defmac

I'd prefer a more general description of the 'insn' argument for the macros.
Something like this:

@code{insn} can be used to define an insn-specific base register class.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-09-07  6:23         ` Uros Bizjak
@ 2023-09-07 12:13           ` Vladimir Makarov
  0 siblings, 0 replies; 28+ messages in thread
From: Vladimir Makarov @ 2023-09-07 12:13 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Hongyu Wang, Hongyu Wang, jakub, gcc-patches, hongtao.liu, hubicka


On 9/7/23 02:23, Uros Bizjak wrote:
> On Wed, Sep 6, 2023 at 9:43 PM Vladimir Makarov <vmakarov@redhat.com> wrote:
>>
>> On 9/1/23 05:07, Hongyu Wang wrote:
>>>
>> I think the approach proposed by the Intel developers is better.  In
>> some ways we already use such an approach when we pass the memory mode
>> to get the base reg class, although we could use different memory
>> constraints for different modes when the possible base regs differ for
>> some memory modes.
>>
>> Using special memory constraints could probably be implemented too (I
>> understand the attractiveness of such an approach for the readability
>> of the machine description).  But in my opinion it would require much
>> more work in IRA/LRA/reload.  It would also significantly slow down RA,
>> as we would need to process insn constraints for each memory operand in
>> many places (e.g. for the calculation of reg classes and costs in IRA).
>> I also think there would be a few cases where this approach results in
>> a bigger probability of assigning a hard reg outside the specific base
>> reg class, and this would result in additional reloads.
>>
>> So the approach proposed by Intel is OK with me.  That said, if the x86
>> maintainers are strongly against this approach and the changes in the
>> x86 machine-dependent code, and the Intel developers implement Uros's
>> approach, I am ready to review that instead.  But I still prefer the
>> Intel developers' current approach for the reasons mentioned above.
> My above proposal is more or less a wish from a target maintainer PoV.
> Ideally, we would have a bunch of different memory constraints, and a
> target hook that returns corresponding BASE/INDEX reg classes.
> However, I have no idea about the complexity of the implementation in
> the infrastructure part of the compiler.
>
Basically, it would require introducing new hooks that return base and
index classes for special memory constraints.  When we process a memory
operand in an insn (in a lot of places in IRA, LRA, and reload) we would
have to consider all possible memory constraints of the insn, take the
intersection of the base and index reg classes for those constraints, and
use that instead of the default base and index reg classes.
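
A minimal sketch of the shape such hooks could take; none of these names
or signatures exist in GCC, they are only meant to illustrate the
constraint-based alternative:

/* Hypothetical only: per-constraint register class queries a target
   could provide ...  */
extern enum reg_class constraint_base_reg_class (enum constraint_num);
extern enum reg_class constraint_index_reg_class (enum constraint_num);

/* ... plus, in every place that currently calls base_reg_class (),
   RA would have to walk the memory constraints of the insn's
   alternatives, intersect the classes they allow, e.g.

     cl = ALL_REGS;
     for each memory constraint CN in the alternative:
       cl = intersection of cl and constraint_base_reg_class (CN);

   and then reload the address register into CL.  */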

The required functionality is absent in reload too.

I would say that it is a moderate-size project (1-2 months for me).  It
still requires introducing new hooks, and I guess there would be a few
cases where we still assign hard regs outside the desirable base class
for address pseudos, which would result in additional reload insns.  It
also means many more changes in the RA source code and in the x86
machine-dependent files.

With this approach there will probably also be edge cases where we need
to solve new PRs because of LRA failing to generate correct code, but I
believe they can be solved.

Therefore I lean toward the current Intel approach, in which we pass the
insn as a parameter, in addition to the memory mode, to get the base reg
class.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-09-06 19:43       ` Vladimir Makarov
@ 2023-09-07  6:23         ` Uros Bizjak
  2023-09-07 12:13           ` Vladimir Makarov
  0 siblings, 1 reply; 28+ messages in thread
From: Uros Bizjak @ 2023-09-07  6:23 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Hongyu Wang, Hongyu Wang, jakub, gcc-patches, hongtao.liu, hubicka

On Wed, Sep 6, 2023 at 9:43 PM Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>
> On 9/1/23 05:07, Hongyu Wang wrote:
> > Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Thu, Aug 31, 2023 at 18:16:
> >> On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
> >>> From: Kong Lingling <lingling.kong@intel.com>
> >>>
> >>> Current reload infrastructure does not support selective base_reg_class
> >>> for backend insn. Add insn argument to base_reg_class for
> >>> lra/reload usage.
> >> I don't think this is the correct approach. Ideally, a memory
> >> constraint should somehow encode its BASE/INDEX register class.
> >> Instead of passing "insn", simply a different constraint could be used
> >> in the constraint string of the relevant insn.
> > We tried a constraint-only approach at the beginning, but then we found
> > that the reload infrastructure does not work like that.
> >
> > The BASE/INDEX reg classes are determined before choosing alternatives, in
> > process_address under curr_insn_transform. Process_address creates the mem
> > operand according to the BASE/INDEX reg class. Then, the memory operand
> > constraint check will evaluate the mem op with targetm.legitimate_address_p.
> >
> > If we want to make use of EGPR in base/index we need to either extend BASE/INDEX
> > reg class in the backend, or, for specific insns, add a target hook to
> > tell reload
> > that the extended reg class with EGPR can be used to construct memory operand.
> >
> > CC'd Vladimir as git send-mail failed to add recipient.
> >
> >
> I think the approach proposed by the Intel developers is better.  In
> some ways we already use such an approach when we pass the memory mode
> to get the base reg class, although we could use different memory
> constraints for different modes when the possible base regs differ for
> some memory modes.
>
> Using special memory constraints could probably be implemented too (I
> understand the attractiveness of such an approach for the readability
> of the machine description).  But in my opinion it would require much
> more work in IRA/LRA/reload.  It would also significantly slow down RA,
> as we would need to process insn constraints for each memory operand in
> many places (e.g. for the calculation of reg classes and costs in IRA).
> I also think there would be a few cases where this approach results in
> a bigger probability of assigning a hard reg outside the specific base
> reg class, and this would result in additional reloads.
>
> So the approach proposed by Intel is OK with me.  That said, if the x86
> maintainers are strongly against this approach and the changes in the
> x86 machine-dependent code, and the Intel developers implement Uros's
> approach, I am ready to review that instead.  But I still prefer the
> Intel developers' current approach for the reasons mentioned above.

My above proposal is more or less a wish from a target maintainer PoV.
Ideally, we would have a bunch of different memory constraints, and a
target hook that returns corresponding BASE/INDEX reg classes.
However, I have no idea about the complexity of the implementation in
the infrastructure part of the compiler.

Uros.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-09-01  9:07     ` Hongyu Wang
@ 2023-09-06 19:43       ` Vladimir Makarov
  2023-09-07  6:23         ` Uros Bizjak
  0 siblings, 1 reply; 28+ messages in thread
From: Vladimir Makarov @ 2023-09-06 19:43 UTC (permalink / raw)
  To: Hongyu Wang, Uros Bizjak
  Cc: Hongyu Wang, jakub, gcc-patches, hongtao.liu, hubicka


On 9/1/23 05:07, Hongyu Wang wrote:
> Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Thu, Aug 31, 2023 at 18:16:
>> On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>>> From: Kong Lingling <lingling.kong@intel.com>
>>>
>>> Current reload infrastructure does not support selective base_reg_class
>>> for backend insn. Add insn argument to base_reg_class for
>>> lra/reload usage.
>> I don't think this is the correct approach. Ideally, a memory
>> constraint should somehow encode its BASE/INDEX register class.
>> Instead of passing "insn", simply a different constraint could be used
>> in the constraint string of the relevant insn.
> We tried a constraint-only approach at the beginning, but then we found
> that the reload infrastructure does not work like that.
>
> The BASE/INDEX reg classes are determined before choosing alternatives, in
> process_address under curr_insn_transform. Process_address creates the mem
> operand according to the BASE/INDEX reg class. Then, the memory operand
> constraint check will evaluate the mem op with targetm.legitimate_address_p.
>
> If we want to make use of EGPR in base/index we need to either extend BASE/INDEX
> reg class in the backend, or, for specific insns, add a target hook to
> tell reload
> that the extended reg class with EGPR can be used to construct memory operand.
>
> CC'd Vladimir as git send-mail failed to add recipient.
>
>
I think the approach proposed by the Intel developers is better.  In
some ways we already use such an approach when we pass the memory mode
to get the base reg class, although we could use different memory
constraints for different modes when the possible base regs differ for
some memory modes.

Using special memory constraints could probably be implemented too (I
understand the attractiveness of such an approach for the readability
of the machine description).  But in my opinion it would require much
more work in IRA/LRA/reload.  It would also significantly slow down RA,
as we would need to process insn constraints for each memory operand in
many places (e.g. for the calculation of reg classes and costs in IRA).
I also think there would be a few cases where this approach results in
a bigger probability of assigning a hard reg outside the specific base
reg class, and this would result in additional reloads.

So the approach proposed by Intel is OK with me.  That said, if the x86
maintainers are strongly against this approach and the changes in the
x86 machine-dependent code, and the Intel developers implement Uros's
approach, I am ready to review that instead.  But I still prefer the
Intel developers' current approach for the reasons mentioned above.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-08-31 10:15   ` Uros Bizjak
@ 2023-09-01  9:07     ` Hongyu Wang
  2023-09-06 19:43       ` Vladimir Makarov
  0 siblings, 1 reply; 28+ messages in thread
From: Hongyu Wang @ 2023-09-01  9:07 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Hongyu Wang, jakub, gcc-patches, hongtao.liu, hubicka, vmakarov

Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Thu, Aug 31, 2023 at 18:16:
>
> On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
> >
> > From: Kong Lingling <lingling.kong@intel.com>
> >
> > Current reload infrastructure does not support selective base_reg_class
> > for backend insn. Add insn argument to base_reg_class for
> > lra/reload usage.
>
> I don't think this is the correct approach. Ideally, a memory
> constraint should somehow encode its BASE/INDEX register class.
> Instead of passing "insn", simply a different constraint could be used
> in the constraint string of the relevant insn.

We tried a constraint-only approach at the beginning, but then we found
that the reload infrastructure does not work like that.

The BASE/INDEX reg classes are determined before choosing alternatives, in
process_address under curr_insn_transform. Process_address creates the mem
operand according to the BASE/INDEX reg class. Then, the memory operand
constraint check will evaluate the mem op with targetm.legitimate_address_p.

If we want to make use of EGPR in base/index we need to either extend BASE/INDEX
reg class in the backend, or, for specific insns, add a target hook to
tell reload
that the extended reg class with EGPR can be used to construct memory operand.
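
As a rough sketch of what the insn-specific route looks like on the
backend side (the predicate and the narrower class below are invented
names for illustration, not code from the patch):

/* Rough sketch only.  Insns whose encoding cannot reach the new
   registers keep a legacy base register class; everything else may
   also use the extended GPRs as base registers.  */
static enum reg_class
example_insn_base_reg_class (rtx_insn *insn)
{
  if (insn && recog_memoized (insn) >= 0
      && !insn_supports_egpr_p (insn))      /* hypothetical predicate */
    return LEGACY_GPR_BASE_REGS;            /* hypothetical class */
  return GENERAL_REGS;
}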

CC'd Vladimir as git send-mail failed to add recipient.

>
> Uros.
> >
> > gcc/ChangeLog:
> >
> >         * addresses.h (base_reg_class):  Add insn argument.
> >         Pass to MODE_CODE_BASE_REG_CLASS.
> >         (regno_ok_for_base_p_1): Add insn argument.
> >         Pass to REGNO_MODE_CODE_OK_FOR_BASE_P.
> >         (regno_ok_for_base_p): Add insn argument and pass it to ok_for_base_p_1.
> >         * config/avr/avr.h (MODE_CODE_BASE_REG_CLASS): Add insn argument.
> >         (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
> >         * config/gcn/gcn.h (MODE_CODE_BASE_REG_CLASS): Ditto.
> >         (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
> >         * config/rl78/rl78.h (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
> >         (MODE_CODE_BASE_REG_CLASS): Ditto.
> >         * doc/tm.texi: Add insn argument for MODE_CODE_BASE_REG_CLASS
> >         and REGNO_MODE_CODE_OK_FOR_BASE_P.
> >         * doc/tm.texi.in: Ditto.
> >         * lra-constraints.cc (process_address_1): Pass insn to
> >         base_reg_class.
> >         (curr_insn_transform): Ditto.
> >         * reload.cc (find_reloads): Ditto.
> >         (find_reloads_address): Ditto.
> >         (find_reloads_address_1): Ditto.
> >         (find_reloads_subreg_address): Ditto.
> >         * reload1.cc (maybe_fix_stack_asms): Ditto.
> > ---
> >  gcc/addresses.h        | 15 +++++++++------
> >  gcc/config/avr/avr.h   |  5 +++--
> >  gcc/config/gcn/gcn.h   |  4 ++--
> >  gcc/config/rl78/rl78.h |  6 ++++--
> >  gcc/doc/tm.texi        |  8 ++++++--
> >  gcc/doc/tm.texi.in     |  8 ++++++--
> >  gcc/lra-constraints.cc | 15 +++++++++------
> >  gcc/reload.cc          | 30 ++++++++++++++++++------------
> >  gcc/reload1.cc         |  2 +-
> >  9 files changed, 58 insertions(+), 35 deletions(-)
> >
> > diff --git a/gcc/addresses.h b/gcc/addresses.h
> > index 3519c241c6d..08b100cfe6d 100644
> > --- a/gcc/addresses.h
> > +++ b/gcc/addresses.h
> > @@ -28,11 +28,12 @@ inline enum reg_class
> >  base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
> >                 addr_space_t as ATTRIBUTE_UNUSED,
> >                 enum rtx_code outer_code ATTRIBUTE_UNUSED,
> > -               enum rtx_code index_code ATTRIBUTE_UNUSED)
> > +               enum rtx_code index_code ATTRIBUTE_UNUSED,
> > +               rtx_insn *insn ATTRIBUTE_UNUSED = NULL)
> >  {
> >  #ifdef MODE_CODE_BASE_REG_CLASS
> >    return MODE_CODE_BASE_REG_CLASS (MACRO_MODE (mode), as, outer_code,
> > -                                  index_code);
> > +                                  index_code, insn);
> >  #else
> >  #ifdef MODE_BASE_REG_REG_CLASS
> >    if (index_code == REG)
> > @@ -56,11 +57,12 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
> >                  machine_mode mode ATTRIBUTE_UNUSED,
> >                  addr_space_t as ATTRIBUTE_UNUSED,
> >                  enum rtx_code outer_code ATTRIBUTE_UNUSED,
> > -                enum rtx_code index_code ATTRIBUTE_UNUSED)
> > +                enum rtx_code index_code ATTRIBUTE_UNUSED,
> > +                rtx_insn* insn ATTRIBUTE_UNUSED = NULL)
> >  {
> >  #ifdef REGNO_MODE_CODE_OK_FOR_BASE_P
> >    return REGNO_MODE_CODE_OK_FOR_BASE_P (regno, MACRO_MODE (mode), as,
> > -                                       outer_code, index_code);
> > +                                       outer_code, index_code, insn);
> >  #else
> >  #ifdef REGNO_MODE_OK_FOR_REG_BASE_P
> >    if (index_code == REG)
> > @@ -79,12 +81,13 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
> >
> >  inline bool
> >  regno_ok_for_base_p (unsigned regno, machine_mode mode, addr_space_t as,
> > -                    enum rtx_code outer_code, enum rtx_code index_code)
> > +                    enum rtx_code outer_code, enum rtx_code index_code,
> > +                    rtx_insn* insn = NULL)
> >  {
> >    if (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0)
> >      regno = reg_renumber[regno];
> >
> > -  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code);
> > +  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code, insn);
> >  }
> >
> >  #endif /* GCC_ADDRESSES_H */
> > diff --git a/gcc/config/avr/avr.h b/gcc/config/avr/avr.h
> > index 8e7e00db13b..1d090fe0838 100644
> > --- a/gcc/config/avr/avr.h
> > +++ b/gcc/config/avr/avr.h
> > @@ -280,12 +280,13 @@ enum reg_class {
> >
> >  #define REGNO_REG_CLASS(R) avr_regno_reg_class(R)
> >
> > -#define MODE_CODE_BASE_REG_CLASS(mode, as, outer_code, index_code)   \
> > +#define MODE_CODE_BASE_REG_CLASS(mode, as, outer_code, index_code, insn)   \
> >    avr_mode_code_base_reg_class (mode, as, outer_code, index_code)
> >
> >  #define INDEX_REG_CLASS NO_REGS
> >
> > -#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mode, as, outer_code, index_code) \
> > +#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mode, as, outer_code,         \
> > +                                     index_code, insn)                   \
> >    avr_regno_mode_code_ok_for_base_p (num, mode, as, outer_code, index_code)
> >
> >  #define REGNO_OK_FOR_INDEX_P(NUM) 0
> > diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
> > index 4ff9a5d4d12..b56702a77fd 100644
> > --- a/gcc/config/gcn/gcn.h
> > +++ b/gcc/config/gcn/gcn.h
> > @@ -437,9 +437,9 @@ enum reg_class
> >       0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0 }}
> >
> >  #define REGNO_REG_CLASS(REGNO) gcn_regno_reg_class (REGNO)
> > -#define MODE_CODE_BASE_REG_CLASS(MODE, AS, OUTER, INDEX) \
> > +#define MODE_CODE_BASE_REG_CLASS(MODE, AS, OUTER, INDEX, INSN) \
> >          gcn_mode_code_base_reg_class (MODE, AS, OUTER, INDEX)
> > -#define REGNO_MODE_CODE_OK_FOR_BASE_P(NUM, MODE, AS, OUTER, INDEX) \
> > +#define REGNO_MODE_CODE_OK_FOR_BASE_P(NUM, MODE, AS, OUTER, INDEX, INSN) \
> >          gcn_regno_mode_code_ok_for_base_p (NUM, MODE, AS, OUTER, INDEX)
> >  #define INDEX_REG_CLASS VGPR_REGS
> >  #define REGNO_OK_FOR_INDEX_P(regno) regno_ok_for_index_p (regno)
> > diff --git a/gcc/config/rl78/rl78.h b/gcc/config/rl78/rl78.h
> > index 7a7c6a44ba2..d0ed9162292 100644
> > --- a/gcc/config/rl78/rl78.h
> > +++ b/gcc/config/rl78/rl78.h
> > @@ -375,10 +375,12 @@ enum reg_class
> >
> >  #define REGNO_OK_FOR_INDEX_P(regno)    REGNO_OK_FOR_BASE_P (regno)
> >
> > -#define REGNO_MODE_CODE_OK_FOR_BASE_P(regno, mode, address_space, outer_code, index_code) \
> > +#define REGNO_MODE_CODE_OK_FOR_BASE_P(regno, mode, address_space, outer_code, \
> > +                                     index_code, insn)                       \
> >    rl78_regno_mode_code_ok_for_base_p (regno, mode, address_space, outer_code, index_code)
> >
> > -#define MODE_CODE_BASE_REG_CLASS(mode, address_space, outer_code, index_code) \
> > +#define MODE_CODE_BASE_REG_CLASS(mode, address_space, outer_code, index_code, \
> > +                                insn)                                        \
> >    rl78_mode_code_base_reg_class (mode, address_space, outer_code, index_code)
> >
> >  #define RETURN_ADDR_RTX(COUNT, FRAMEADDR)                              \
> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > index d0d47b0d471..a4239e3de10 100644
> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi
> > @@ -2533,7 +2533,7 @@ register address.  You should define this macro if base plus index
> >  addresses have different requirements than other base register uses.
> >  @end defmac
> >
> > -@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> > +@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
> >  A C expression whose value is the register class to which a valid
> >  base register for a memory reference in mode @var{mode} to address
> >  space @var{address_space} must belong.  @var{outer_code} and @var{index_code}
> > @@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
> >  of an address, @code{ADDRESS} for something that occurs in an
> >  @code{address_operand}).  @var{index_code} is the code of the corresponding
> >  index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
> > +@code{insn} indicates insn specific base register class should be subset
> > +of the original base register class.
> >  @end defmac
> >
> >  @defmac INDEX_REG_CLASS
> > @@ -2579,7 +2581,7 @@ Use of this macro is deprecated; please use the more general
> >  @code{REGNO_MODE_CODE_OK_FOR_BASE_P}.
> >  @end defmac
> >
> > -@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> > +@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
> >  A C expression which is nonzero if register number @var{num} is
> >  suitable for use as a base register in operand addresses, accessing
> >  memory in mode @var{mode} in address space @var{address_space}.
> > @@ -2592,6 +2594,8 @@ address, @code{ADDRESS} for something that occurs in an
> >  corresponding index expression if @var{outer_code} is @code{PLUS};
> >  @code{SCRATCH} otherwise.  The mode may be @code{VOIDmode} for addresses
> >  that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
> > +@code{insn} indicates insn specific base register class should be subset
> > +of the original base register class.
> >  @end defmac
> >
> >  @defmac REGNO_OK_FOR_INDEX_P (@var{num})
> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > index 4ac96dc357d..72898f3adba 100644
> > --- a/gcc/doc/tm.texi.in
> > +++ b/gcc/doc/tm.texi.in
> > @@ -2128,7 +2128,7 @@ register address.  You should define this macro if base plus index
> >  addresses have different requirements than other base register uses.
> >  @end defmac
> >
> > -@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> > +@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
> >  A C expression whose value is the register class to which a valid
> >  base register for a memory reference in mode @var{mode} to address
> >  space @var{address_space} must belong.  @var{outer_code} and @var{index_code}
> > @@ -2137,6 +2137,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
> >  of an address, @code{ADDRESS} for something that occurs in an
> >  @code{address_operand}).  @var{index_code} is the code of the corresponding
> >  index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
> > +@code{insn} indicates insn specific base register class should be subset
> > +of the original base register class.
> >  @end defmac
> >
> >  @defmac INDEX_REG_CLASS
> > @@ -2174,7 +2176,7 @@ Use of this macro is deprecated; please use the more general
> >  @code{REGNO_MODE_CODE_OK_FOR_BASE_P}.
> >  @end defmac
> >
> > -@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> > +@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
> >  A C expression which is nonzero if register number @var{num} is
> >  suitable for use as a base register in operand addresses, accessing
> >  memory in mode @var{mode} in address space @var{address_space}.
> > @@ -2187,6 +2189,8 @@ address, @code{ADDRESS} for something that occurs in an
> >  corresponding index expression if @var{outer_code} is @code{PLUS};
> >  @code{SCRATCH} otherwise.  The mode may be @code{VOIDmode} for addresses
> >  that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
> > +@code{insn} indicates insn specific base register class should be subset
> > +of the original base register class.
> >  @end defmac
> >
> >  @defmac REGNO_OK_FOR_INDEX_P (@var{num})
> > diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> > index c718bedff32..9e7915ce934 100644
> > --- a/gcc/lra-constraints.cc
> > +++ b/gcc/lra-constraints.cc
> > @@ -3672,7 +3672,7 @@ process_address_1 (int nop, bool check_only_p,
> >                                      REGNO (*ad.base_term)) != NULL_RTX)
> >             ? after : NULL),
> >            base_reg_class (ad.mode, ad.as, ad.base_outer_code,
> > -                          get_index_code (&ad)))))
> > +                          get_index_code (&ad), curr_insn))))
> >      {
> >        change_p = true;
> >        if (ad.base_term2 != NULL)
> > @@ -3722,7 +3722,8 @@ process_address_1 (int nop, bool check_only_p,
> >           rtx_insn *last = get_last_insn ();
> >           int code = -1;
> >           enum reg_class cl = base_reg_class (ad.mode, ad.as,
> > -                                             SCRATCH, SCRATCH);
> > +                                             SCRATCH, SCRATCH,
> > +                                             curr_insn);
> >           rtx addr = *ad.inner;
> >
> >           new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
> > @@ -3785,7 +3786,8 @@ process_address_1 (int nop, bool check_only_p,
> >           /* index * scale + disp => new base + index * scale,
> >              case (1) above.  */
> >           enum reg_class cl = base_reg_class (ad.mode, ad.as, PLUS,
> > -                                             GET_CODE (*ad.index));
> > +                                             GET_CODE (*ad.index),
> > +                                             curr_insn);
> >
> >           lra_assert (INDEX_REG_CLASS != NO_REGS);
> >           new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "disp");
> > @@ -3846,7 +3848,7 @@ process_address_1 (int nop, bool check_only_p,
> >               *ad.base_term = XEXP (SET_SRC (set), 0);
> >               *ad.disp_term = XEXP (SET_SRC (set), 1);
> >               cl = base_reg_class (ad.mode, ad.as, ad.base_outer_code,
> > -                                  get_index_code (&ad));
> > +                                  get_index_code (&ad), curr_insn);
> >               regno = REGNO (*ad.base_term);
> >               if (regno >= FIRST_PSEUDO_REGISTER
> >                   && cl != lra_get_allocno_class (regno))
> > @@ -3890,7 +3892,8 @@ process_address_1 (int nop, bool check_only_p,
> >    else
> >      {
> >        enum reg_class cl = base_reg_class (ad.mode, ad.as,
> > -                                         SCRATCH, SCRATCH);
> > +                                         SCRATCH, SCRATCH,
> > +                                         curr_insn);
> >        rtx addr = *ad.inner;
> >
> >        new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
> > @@ -4639,7 +4642,7 @@ curr_insn_transform (bool check_only_p)
> >
> >           push_to_sequence (before);
> >           rclass = base_reg_class (GET_MODE (op), MEM_ADDR_SPACE (op),
> > -                                  MEM, SCRATCH);
> > +                                  MEM, SCRATCH, curr_insn);
> >           if (GET_RTX_CLASS (code) == RTX_AUTOINC)
> >             new_reg = emit_inc (rclass, *loc, *loc,
> >                                 /* This value does not matter for MODIFY.  */
> > diff --git a/gcc/reload.cc b/gcc/reload.cc
> > index 2126bdd117c..72f7e27af15 100644
> > --- a/gcc/reload.cc
> > +++ b/gcc/reload.cc
> > @@ -3321,7 +3321,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
> >                        were handled in find_reloads_address.  */
> >                     this_alternative[i]
> >                       = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
> > -                                       ADDRESS, SCRATCH);
> > +                                       ADDRESS, SCRATCH, insn);
> >                     win = 1;
> >                     badop = 0;
> >                     break;
> > @@ -3508,7 +3508,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
> >                            the address into a base register.  */
> >                         this_alternative[i]
> >                           = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
> > -                                           ADDRESS, SCRATCH);
> > +                                           ADDRESS, SCRATCH, insn);
> >                         badop = 0;
> >                         break;
> >
> > @@ -4018,7 +4018,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
> >             operand_reloadnum[i]
> >               = push_reload (XEXP (recog_data.operand[i], 0), NULL_RTX,
> >                              &XEXP (recog_data.operand[i], 0), (rtx*) 0,
> > -                            base_reg_class (VOIDmode, as, MEM, SCRATCH),
> > +                            base_reg_class (VOIDmode, as, MEM, SCRATCH, insn),
> >                              address_mode,
> >                              VOIDmode, 0, 0, i, RELOAD_OTHER);
> >             rld[operand_reloadnum[i]].inc
> > @@ -4897,7 +4897,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
> >        if (reg_equiv_constant (regno) != 0)
> >         {
> >           find_reloads_address_part (reg_equiv_constant (regno), loc,
> > -                                    base_reg_class (mode, as, MEM, SCRATCH),
> > +                                    base_reg_class (mode, as, MEM,
> > +                                                    SCRATCH, insn),
> >                                      GET_MODE (ad), opnum, type, ind_levels);
> >           return 1;
> >         }
> > @@ -4966,7 +4967,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
> >
> >        /* If we do not have one of the cases above, we must do the reload.  */
> >        push_reload (ad, NULL_RTX, loc, (rtx*) 0,
> > -                  base_reg_class (mode, as, MEM, SCRATCH),
> > +                  base_reg_class (mode, as, MEM, SCRATCH, insn),
> >                    GET_MODE (ad), VOIDmode, 0, 0, opnum, type);
> >        return 1;
> >      }
> > @@ -5123,7 +5124,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
> >              reload the sum into a base reg.
> >              That will at least work.  */
> >           find_reloads_address_part (ad, loc,
> > -                                    base_reg_class (mode, as, MEM, SCRATCH),
> > +                                    base_reg_class (mode, as, MEM,
> > +                                                    SCRATCH, insn),
> >                                      GET_MODE (ad), opnum, type, ind_levels);
> >         }
> >        return ! removed_and;
> > @@ -5203,7 +5205,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
> >                                  op_index == 0 ? addend : offset_reg);
> >           *loc = ad;
> >
> > -         cls = base_reg_class (mode, as, MEM, GET_CODE (addend));
> > +         cls = base_reg_class (mode, as, MEM, GET_CODE (addend), insn);
> >           find_reloads_address_part (XEXP (ad, op_index),
> >                                      &XEXP (ad, op_index), cls,
> >                                      GET_MODE (ad), opnum, type, ind_levels);
> > @@ -5261,7 +5263,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
> >         }
> >
> >        find_reloads_address_part (ad, loc,
> > -                                base_reg_class (mode, as, MEM, SCRATCH),
> > +                                base_reg_class (mode, as, MEM,
> > +                                                SCRATCH, insn),
> >                                  address_mode, opnum, type, ind_levels);
> >        return ! removed_and;
> >      }
> > @@ -5513,7 +5516,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
> >    if (context == 1)
> >      context_reg_class = INDEX_REG_CLASS;
> >    else
> > -    context_reg_class = base_reg_class (mode, as, outer_code, index_code);
> > +    context_reg_class = base_reg_class (mode, as, outer_code, index_code,
> > +                                       insn);
> >
> >    switch (code)
> >      {
> > @@ -5738,7 +5742,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
> >                 reloadnum = push_reload (tem, tem, &XEXP (x, 0),
> >                                          &XEXP (op1, 0),
> >                                          base_reg_class (mode, as,
> > -                                                        code, index_code),
> > +                                                        code, index_code,
> > +                                                        insn),
> >                                          GET_MODE (x), GET_MODE (x), 0,
> >                                          0, opnum, RELOAD_OTHER);
> >
> > @@ -5756,7 +5761,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
> >             reloadnum = push_reload (XEXP (op1, 0), XEXP (x, 0),
> >                                      &XEXP (op1, 0), &XEXP (x, 0),
> >                                      base_reg_class (mode, as,
> > -                                                    code, index_code),
> > +                                                    code, index_code,
> > +                                                    insn),
> >                                      GET_MODE (x), GET_MODE (x), 0, 0,
> >                                      opnum, RELOAD_OTHER);
> >
> > @@ -6216,7 +6222,7 @@ find_reloads_subreg_address (rtx x, int opnum, enum reload_type type,
> >      {
> >        push_reload (XEXP (tem, 0), NULL_RTX, &XEXP (tem, 0), (rtx*) 0,
> >                    base_reg_class (GET_MODE (tem), MEM_ADDR_SPACE (tem),
> > -                                  MEM, SCRATCH),
> > +                                  MEM, SCRATCH, insn),
> >                    GET_MODE (XEXP (tem, 0)), VOIDmode, 0, 0, opnum, type);
> >        reloaded = 1;
> >      }
> > diff --git a/gcc/reload1.cc b/gcc/reload1.cc
> > index 9ba822d1ff7..f41f4a4de22 100644
> > --- a/gcc/reload1.cc
> > +++ b/gcc/reload1.cc
> > @@ -1382,7 +1382,7 @@ maybe_fix_stack_asms (void)
> >                   if (insn_extra_address_constraint (cn))
> >                     cls = (int) reg_class_subunion[cls]
> >                       [(int) base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
> > -                                            ADDRESS, SCRATCH)];
> > +                                            ADDRESS, SCRATCH, chain->insn)];
> >                   else
> >                     cls = (int) reg_class_subunion[cls]
> >                       [reg_class_for_constraint (cn)];
> > --
> > 2.31.1
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-08-31  8:20 ` [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class Hongyu Wang
@ 2023-08-31 10:15   ` Uros Bizjak
  2023-09-01  9:07     ` Hongyu Wang
  2023-09-08 17:03   ` Vladimir Makarov
  1 sibling, 1 reply; 28+ messages in thread
From: Uros Bizjak @ 2023-08-31 10:15 UTC (permalink / raw)
  To: Hongyu Wang
  Cc: gcc-patches, hongtao.liu, hubicka, vmakarov, jakub, Kong Lingling

On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> From: Kong Lingling <lingling.kong@intel.com>
>
> The current reload infrastructure does not support a selective
> base_reg_class per backend insn. Add an insn argument to base_reg_class
> for lra/reload usage.

I don't think this is the correct approach. Ideally, a memory
constraint should somehow encode its BASE/INDEX register class.
Instead of passing "insn", a different constraint could simply be used
in the constraint string of the relevant insn.
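
For illustration only, such a constraint might look roughly like the
sketch below; the constraint name "jm" and the predicate
ix86_gpr16_address_p are made-up placeholders, not names taken from
this series.

;; Hypothetical sketch: a memory constraint that only accepts addresses
;; built from the legacy (non-extended) general purpose registers.
(define_memory_constraint "jm"
  "@internal Memory operand whose address uses no extended GPR."
  (and (match_code "mem")
       (match_test "ix86_gpr16_address_p (XEXP (op, 0))")))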

Uros.
>
> gcc/ChangeLog:
>
>         * addresses.h (base_reg_class): Add insn argument.
>         Pass to MODE_CODE_BASE_REG_CLASS.
>         (ok_for_base_p_1): Add insn argument.
>         Pass to REGNO_MODE_CODE_OK_FOR_BASE_P.
>         (regno_ok_for_base_p): Add insn argument and pass to ok_for_base_p_1.
>         * config/avr/avr.h (MODE_CODE_BASE_REG_CLASS): Add insn argument.
>         (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
>         * config/gcn/gcn.h (MODE_CODE_BASE_REG_CLASS): Ditto.
>         (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
>         * config/rl78/rl78.h (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
>         (MODE_CODE_BASE_REG_CLASS): Ditto.
>         * doc/tm.texi: Add insn argument for MODE_CODE_BASE_REG_CLASS
>         and REGNO_MODE_CODE_OK_FOR_BASE_P.
>         * doc/tm.texi.in: Ditto.
>         * lra-constraints.cc (process_address_1): Pass insn to
>         base_reg_class.
>         (curr_insn_transform): Ditto.
>         * reload.cc (find_reloads): Ditto.
>         (find_reloads_address): Ditto.
>         (find_reloads_address_1): Ditto.
>         (find_reloads_subreg_address): Ditto.
>         * reload1.cc (maybe_fix_stack_asms): Ditto.
> ---
>  gcc/addresses.h        | 15 +++++++++------
>  gcc/config/avr/avr.h   |  5 +++--
>  gcc/config/gcn/gcn.h   |  4 ++--
>  gcc/config/rl78/rl78.h |  6 ++++--
>  gcc/doc/tm.texi        |  8 ++++++--
>  gcc/doc/tm.texi.in     |  8 ++++++--
>  gcc/lra-constraints.cc | 15 +++++++++------
>  gcc/reload.cc          | 30 ++++++++++++++++++------------
>  gcc/reload1.cc         |  2 +-
>  9 files changed, 58 insertions(+), 35 deletions(-)
>
> diff --git a/gcc/addresses.h b/gcc/addresses.h
> index 3519c241c6d..08b100cfe6d 100644
> --- a/gcc/addresses.h
> +++ b/gcc/addresses.h
> @@ -28,11 +28,12 @@ inline enum reg_class
>  base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
>                 addr_space_t as ATTRIBUTE_UNUSED,
>                 enum rtx_code outer_code ATTRIBUTE_UNUSED,
> -               enum rtx_code index_code ATTRIBUTE_UNUSED)
> +               enum rtx_code index_code ATTRIBUTE_UNUSED,
> +               rtx_insn *insn ATTRIBUTE_UNUSED = NULL)
>  {
>  #ifdef MODE_CODE_BASE_REG_CLASS
>    return MODE_CODE_BASE_REG_CLASS (MACRO_MODE (mode), as, outer_code,
> -                                  index_code);
> +                                  index_code, insn);
>  #else
>  #ifdef MODE_BASE_REG_REG_CLASS
>    if (index_code == REG)
> @@ -56,11 +57,12 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
>                  machine_mode mode ATTRIBUTE_UNUSED,
>                  addr_space_t as ATTRIBUTE_UNUSED,
>                  enum rtx_code outer_code ATTRIBUTE_UNUSED,
> -                enum rtx_code index_code ATTRIBUTE_UNUSED)
> +                enum rtx_code index_code ATTRIBUTE_UNUSED,
> +                rtx_insn* insn ATTRIBUTE_UNUSED = NULL)
>  {
>  #ifdef REGNO_MODE_CODE_OK_FOR_BASE_P
>    return REGNO_MODE_CODE_OK_FOR_BASE_P (regno, MACRO_MODE (mode), as,
> -                                       outer_code, index_code);
> +                                       outer_code, index_code, insn);
>  #else
>  #ifdef REGNO_MODE_OK_FOR_REG_BASE_P
>    if (index_code == REG)
> @@ -79,12 +81,13 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
>
>  inline bool
>  regno_ok_for_base_p (unsigned regno, machine_mode mode, addr_space_t as,
> -                    enum rtx_code outer_code, enum rtx_code index_code)
> +                    enum rtx_code outer_code, enum rtx_code index_code,
> +                    rtx_insn* insn = NULL)
>  {
>    if (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0)
>      regno = reg_renumber[regno];
>
> -  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code);
> +  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code, insn);
>  }
>
>  #endif /* GCC_ADDRESSES_H */
> diff --git a/gcc/config/avr/avr.h b/gcc/config/avr/avr.h
> index 8e7e00db13b..1d090fe0838 100644
> --- a/gcc/config/avr/avr.h
> +++ b/gcc/config/avr/avr.h
> @@ -280,12 +280,13 @@ enum reg_class {
>
>  #define REGNO_REG_CLASS(R) avr_regno_reg_class(R)
>
> -#define MODE_CODE_BASE_REG_CLASS(mode, as, outer_code, index_code)   \
> +#define MODE_CODE_BASE_REG_CLASS(mode, as, outer_code, index_code, insn)   \
>    avr_mode_code_base_reg_class (mode, as, outer_code, index_code)
>
>  #define INDEX_REG_CLASS NO_REGS
>
> -#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mode, as, outer_code, index_code) \
> +#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mode, as, outer_code,         \
> +                                     index_code, insn)                   \
>    avr_regno_mode_code_ok_for_base_p (num, mode, as, outer_code, index_code)
>
>  #define REGNO_OK_FOR_INDEX_P(NUM) 0
> diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
> index 4ff9a5d4d12..b56702a77fd 100644
> --- a/gcc/config/gcn/gcn.h
> +++ b/gcc/config/gcn/gcn.h
> @@ -437,9 +437,9 @@ enum reg_class
>       0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0 }}
>
>  #define REGNO_REG_CLASS(REGNO) gcn_regno_reg_class (REGNO)
> -#define MODE_CODE_BASE_REG_CLASS(MODE, AS, OUTER, INDEX) \
> +#define MODE_CODE_BASE_REG_CLASS(MODE, AS, OUTER, INDEX, INSN) \
>          gcn_mode_code_base_reg_class (MODE, AS, OUTER, INDEX)
> -#define REGNO_MODE_CODE_OK_FOR_BASE_P(NUM, MODE, AS, OUTER, INDEX) \
> +#define REGNO_MODE_CODE_OK_FOR_BASE_P(NUM, MODE, AS, OUTER, INDEX, INSN) \
>          gcn_regno_mode_code_ok_for_base_p (NUM, MODE, AS, OUTER, INDEX)
>  #define INDEX_REG_CLASS VGPR_REGS
>  #define REGNO_OK_FOR_INDEX_P(regno) regno_ok_for_index_p (regno)
> diff --git a/gcc/config/rl78/rl78.h b/gcc/config/rl78/rl78.h
> index 7a7c6a44ba2..d0ed9162292 100644
> --- a/gcc/config/rl78/rl78.h
> +++ b/gcc/config/rl78/rl78.h
> @@ -375,10 +375,12 @@ enum reg_class
>
>  #define REGNO_OK_FOR_INDEX_P(regno)    REGNO_OK_FOR_BASE_P (regno)
>
> -#define REGNO_MODE_CODE_OK_FOR_BASE_P(regno, mode, address_space, outer_code, index_code) \
> +#define REGNO_MODE_CODE_OK_FOR_BASE_P(regno, mode, address_space, outer_code, \
> +                                     index_code, insn)                       \
>    rl78_regno_mode_code_ok_for_base_p (regno, mode, address_space, outer_code, index_code)
>
> -#define MODE_CODE_BASE_REG_CLASS(mode, address_space, outer_code, index_code) \
> +#define MODE_CODE_BASE_REG_CLASS(mode, address_space, outer_code, index_code, \
> +                                insn)                                        \
>    rl78_mode_code_base_reg_class (mode, address_space, outer_code, index_code)
>
>  #define RETURN_ADDR_RTX(COUNT, FRAMEADDR)                              \
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index d0d47b0d471..a4239e3de10 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -2533,7 +2533,7 @@ register address.  You should define this macro if base plus index
>  addresses have different requirements than other base register uses.
>  @end defmac
>
> -@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> +@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
>  A C expression whose value is the register class to which a valid
>  base register for a memory reference in mode @var{mode} to address
>  space @var{address_space} must belong.  @var{outer_code} and @var{index_code}
> @@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
>  of an address, @code{ADDRESS} for something that occurs in an
>  @code{address_operand}).  @var{index_code} is the code of the corresponding
>  index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
> +@var{insn}, when nonnull, allows an insn-specific base register class to
> +be returned; it should be a subset of the original base register class.
>  @end defmac
>
>  @defmac INDEX_REG_CLASS
> @@ -2579,7 +2581,7 @@ Use of this macro is deprecated; please use the more general
>  @code{REGNO_MODE_CODE_OK_FOR_BASE_P}.
>  @end defmac
>
> -@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> +@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
>  A C expression which is nonzero if register number @var{num} is
>  suitable for use as a base register in operand addresses, accessing
>  memory in mode @var{mode} in address space @var{address_space}.
> @@ -2592,6 +2594,8 @@ address, @code{ADDRESS} for something that occurs in an
>  corresponding index expression if @var{outer_code} is @code{PLUS};
>  @code{SCRATCH} otherwise.  The mode may be @code{VOIDmode} for addresses
>  that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
> +@var{insn}, when nonnull, allows an insn-specific base register class to
> +be returned; it should be a subset of the original base register class.
>  @end defmac
>
>  @defmac REGNO_OK_FOR_INDEX_P (@var{num})
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 4ac96dc357d..72898f3adba 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -2128,7 +2128,7 @@ register address.  You should define this macro if base plus index
>  addresses have different requirements than other base register uses.
>  @end defmac
>
> -@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> +@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
>  A C expression whose value is the register class to which a valid
>  base register for a memory reference in mode @var{mode} to address
>  space @var{address_space} must belong.  @var{outer_code} and @var{index_code}
> @@ -2137,6 +2137,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
>  of an address, @code{ADDRESS} for something that occurs in an
>  @code{address_operand}).  @var{index_code} is the code of the corresponding
>  index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
> +@var{insn}, when nonnull, allows an insn-specific base register class to
> +be returned; it should be a subset of the original base register class.
>  @end defmac
>
>  @defmac INDEX_REG_CLASS
> @@ -2174,7 +2176,7 @@ Use of this macro is deprecated; please use the more general
>  @code{REGNO_MODE_CODE_OK_FOR_BASE_P}.
>  @end defmac
>
> -@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> +@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
>  A C expression which is nonzero if register number @var{num} is
>  suitable for use as a base register in operand addresses, accessing
>  memory in mode @var{mode} in address space @var{address_space}.
> @@ -2187,6 +2189,8 @@ address, @code{ADDRESS} for something that occurs in an
>  corresponding index expression if @var{outer_code} is @code{PLUS};
>  @code{SCRATCH} otherwise.  The mode may be @code{VOIDmode} for addresses
>  that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
> +@var{insn}, when nonnull, allows an insn-specific base register class to
> +be returned; it should be a subset of the original base register class.
>  @end defmac
>
>  @defmac REGNO_OK_FOR_INDEX_P (@var{num})
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> index c718bedff32..9e7915ce934 100644
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -3672,7 +3672,7 @@ process_address_1 (int nop, bool check_only_p,
>                                      REGNO (*ad.base_term)) != NULL_RTX)
>             ? after : NULL),
>            base_reg_class (ad.mode, ad.as, ad.base_outer_code,
> -                          get_index_code (&ad)))))
> +                          get_index_code (&ad), curr_insn))))
>      {
>        change_p = true;
>        if (ad.base_term2 != NULL)
> @@ -3722,7 +3722,8 @@ process_address_1 (int nop, bool check_only_p,
>           rtx_insn *last = get_last_insn ();
>           int code = -1;
>           enum reg_class cl = base_reg_class (ad.mode, ad.as,
> -                                             SCRATCH, SCRATCH);
> +                                             SCRATCH, SCRATCH,
> +                                             curr_insn);
>           rtx addr = *ad.inner;
>
>           new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
> @@ -3785,7 +3786,8 @@ process_address_1 (int nop, bool check_only_p,
>           /* index * scale + disp => new base + index * scale,
>              case (1) above.  */
>           enum reg_class cl = base_reg_class (ad.mode, ad.as, PLUS,
> -                                             GET_CODE (*ad.index));
> +                                             GET_CODE (*ad.index),
> +                                             curr_insn);
>
>           lra_assert (INDEX_REG_CLASS != NO_REGS);
>           new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "disp");
> @@ -3846,7 +3848,7 @@ process_address_1 (int nop, bool check_only_p,
>               *ad.base_term = XEXP (SET_SRC (set), 0);
>               *ad.disp_term = XEXP (SET_SRC (set), 1);
>               cl = base_reg_class (ad.mode, ad.as, ad.base_outer_code,
> -                                  get_index_code (&ad));
> +                                  get_index_code (&ad), curr_insn);
>               regno = REGNO (*ad.base_term);
>               if (regno >= FIRST_PSEUDO_REGISTER
>                   && cl != lra_get_allocno_class (regno))
> @@ -3890,7 +3892,8 @@ process_address_1 (int nop, bool check_only_p,
>    else
>      {
>        enum reg_class cl = base_reg_class (ad.mode, ad.as,
> -                                         SCRATCH, SCRATCH);
> +                                         SCRATCH, SCRATCH,
> +                                         curr_insn);
>        rtx addr = *ad.inner;
>
>        new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
> @@ -4639,7 +4642,7 @@ curr_insn_transform (bool check_only_p)
>
>           push_to_sequence (before);
>           rclass = base_reg_class (GET_MODE (op), MEM_ADDR_SPACE (op),
> -                                  MEM, SCRATCH);
> +                                  MEM, SCRATCH, curr_insn);
>           if (GET_RTX_CLASS (code) == RTX_AUTOINC)
>             new_reg = emit_inc (rclass, *loc, *loc,
>                                 /* This value does not matter for MODIFY.  */
> diff --git a/gcc/reload.cc b/gcc/reload.cc
> index 2126bdd117c..72f7e27af15 100644
> --- a/gcc/reload.cc
> +++ b/gcc/reload.cc
> @@ -3321,7 +3321,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
>                        were handled in find_reloads_address.  */
>                     this_alternative[i]
>                       = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
> -                                       ADDRESS, SCRATCH);
> +                                       ADDRESS, SCRATCH, insn);
>                     win = 1;
>                     badop = 0;
>                     break;
> @@ -3508,7 +3508,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
>                            the address into a base register.  */
>                         this_alternative[i]
>                           = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
> -                                           ADDRESS, SCRATCH);
> +                                           ADDRESS, SCRATCH, insn);
>                         badop = 0;
>                         break;
>
> @@ -4018,7 +4018,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
>             operand_reloadnum[i]
>               = push_reload (XEXP (recog_data.operand[i], 0), NULL_RTX,
>                              &XEXP (recog_data.operand[i], 0), (rtx*) 0,
> -                            base_reg_class (VOIDmode, as, MEM, SCRATCH),
> +                            base_reg_class (VOIDmode, as, MEM, SCRATCH, insn),
>                              address_mode,
>                              VOIDmode, 0, 0, i, RELOAD_OTHER);
>             rld[operand_reloadnum[i]].inc
> @@ -4897,7 +4897,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
>        if (reg_equiv_constant (regno) != 0)
>         {
>           find_reloads_address_part (reg_equiv_constant (regno), loc,
> -                                    base_reg_class (mode, as, MEM, SCRATCH),
> +                                    base_reg_class (mode, as, MEM,
> +                                                    SCRATCH, insn),
>                                      GET_MODE (ad), opnum, type, ind_levels);
>           return 1;
>         }
> @@ -4966,7 +4967,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
>
>        /* If we do not have one of the cases above, we must do the reload.  */
>        push_reload (ad, NULL_RTX, loc, (rtx*) 0,
> -                  base_reg_class (mode, as, MEM, SCRATCH),
> +                  base_reg_class (mode, as, MEM, SCRATCH, insn),
>                    GET_MODE (ad), VOIDmode, 0, 0, opnum, type);
>        return 1;
>      }
> @@ -5123,7 +5124,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
>              reload the sum into a base reg.
>              That will at least work.  */
>           find_reloads_address_part (ad, loc,
> -                                    base_reg_class (mode, as, MEM, SCRATCH),
> +                                    base_reg_class (mode, as, MEM,
> +                                                    SCRATCH, insn),
>                                      GET_MODE (ad), opnum, type, ind_levels);
>         }
>        return ! removed_and;
> @@ -5203,7 +5205,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
>                                  op_index == 0 ? addend : offset_reg);
>           *loc = ad;
>
> -         cls = base_reg_class (mode, as, MEM, GET_CODE (addend));
> +         cls = base_reg_class (mode, as, MEM, GET_CODE (addend), insn);
>           find_reloads_address_part (XEXP (ad, op_index),
>                                      &XEXP (ad, op_index), cls,
>                                      GET_MODE (ad), opnum, type, ind_levels);
> @@ -5261,7 +5263,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
>         }
>
>        find_reloads_address_part (ad, loc,
> -                                base_reg_class (mode, as, MEM, SCRATCH),
> +                                base_reg_class (mode, as, MEM,
> +                                                SCRATCH, insn),
>                                  address_mode, opnum, type, ind_levels);
>        return ! removed_and;
>      }
> @@ -5513,7 +5516,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
>    if (context == 1)
>      context_reg_class = INDEX_REG_CLASS;
>    else
> -    context_reg_class = base_reg_class (mode, as, outer_code, index_code);
> +    context_reg_class = base_reg_class (mode, as, outer_code, index_code,
> +                                       insn);
>
>    switch (code)
>      {
> @@ -5738,7 +5742,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
>                 reloadnum = push_reload (tem, tem, &XEXP (x, 0),
>                                          &XEXP (op1, 0),
>                                          base_reg_class (mode, as,
> -                                                        code, index_code),
> +                                                        code, index_code,
> +                                                        insn),
>                                          GET_MODE (x), GET_MODE (x), 0,
>                                          0, opnum, RELOAD_OTHER);
>
> @@ -5756,7 +5761,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
>             reloadnum = push_reload (XEXP (op1, 0), XEXP (x, 0),
>                                      &XEXP (op1, 0), &XEXP (x, 0),
>                                      base_reg_class (mode, as,
> -                                                    code, index_code),
> +                                                    code, index_code,
> +                                                    insn),
>                                      GET_MODE (x), GET_MODE (x), 0, 0,
>                                      opnum, RELOAD_OTHER);
>
> @@ -6216,7 +6222,7 @@ find_reloads_subreg_address (rtx x, int opnum, enum reload_type type,
>      {
>        push_reload (XEXP (tem, 0), NULL_RTX, &XEXP (tem, 0), (rtx*) 0,
>                    base_reg_class (GET_MODE (tem), MEM_ADDR_SPACE (tem),
> -                                  MEM, SCRATCH),
> +                                  MEM, SCRATCH, insn),
>                    GET_MODE (XEXP (tem, 0)), VOIDmode, 0, 0, opnum, type);
>        reloaded = 1;
>      }
> diff --git a/gcc/reload1.cc b/gcc/reload1.cc
> index 9ba822d1ff7..f41f4a4de22 100644
> --- a/gcc/reload1.cc
> +++ b/gcc/reload1.cc
> @@ -1382,7 +1382,7 @@ maybe_fix_stack_asms (void)
>                   if (insn_extra_address_constraint (cn))
>                     cls = (int) reg_class_subunion[cls]
>                       [(int) base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
> -                                            ADDRESS, SCRATCH)];
> +                                            ADDRESS, SCRATCH, chain->insn)];
>                   else
>                     cls = (int) reg_class_subunion[cls]
>                       [reg_class_for_constraint (cn)];
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-08-31  8:20 [PATCH 00/13] [RFC] " Hongyu Wang
@ 2023-08-31  8:20 ` Hongyu Wang
  2023-08-31 10:15   ` Uros Bizjak
  2023-09-08 17:03   ` Vladimir Makarov
  0 siblings, 2 replies; 28+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

The current reload infrastructure does not support a selective
base_reg_class per backend insn. Add an insn argument to base_reg_class
for lra/reload usage.
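
As a rough illustration only (not part of this patch), a target could
key the returned class off the new argument, e.g.:

/* Hypothetical target macro; INSN_GPR16_ONLY_P and GENERAL_GPR16 are
   illustrative names, not definitions added by this patch.  */
#define MODE_CODE_BASE_REG_CLASS(MODE, AS, OUTER_CODE, INDEX_CODE, INSN) \
  ((INSN) && INSN_GPR16_ONLY_P (INSN) ? GENERAL_GPR16 : GENERAL_REGS)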

gcc/ChangeLog:

	* addresses.h (base_reg_class): Add insn argument.
	Pass to MODE_CODE_BASE_REG_CLASS.
	(ok_for_base_p_1): Add insn argument.
	Pass to REGNO_MODE_CODE_OK_FOR_BASE_P.
	(regno_ok_for_base_p): Add insn argument and pass to ok_for_base_p_1.
	* config/avr/avr.h (MODE_CODE_BASE_REG_CLASS): Add insn argument.
	(REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
	* config/gcn/gcn.h (MODE_CODE_BASE_REG_CLASS): Ditto.
	(REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
	* config/rl78/rl78.h (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
	(MODE_CODE_BASE_REG_CLASS): Ditto.
	* doc/tm.texi: Add insn argument for MODE_CODE_BASE_REG_CLASS
	and REGNO_MODE_CODE_OK_FOR_BASE_P.
	* doc/tm.texi.in: Ditto.
	* lra-constraints.cc (process_address_1): Pass insn to
	base_reg_class.
	(curr_insn_transform): Ditto.
	* reload.cc (find_reloads): Ditto.
	(find_reloads_address): Ditto.
	(find_reloads_address_1): Ditto.
	(find_reloads_subreg_address): Ditto.
	* reload1.cc (maybe_fix_stack_asms): Ditto.
---
 gcc/addresses.h        | 15 +++++++++------
 gcc/config/avr/avr.h   |  5 +++--
 gcc/config/gcn/gcn.h   |  4 ++--
 gcc/config/rl78/rl78.h |  6 ++++--
 gcc/doc/tm.texi        |  8 ++++++--
 gcc/doc/tm.texi.in     |  8 ++++++--
 gcc/lra-constraints.cc | 15 +++++++++------
 gcc/reload.cc          | 30 ++++++++++++++++++------------
 gcc/reload1.cc         |  2 +-
 9 files changed, 58 insertions(+), 35 deletions(-)

diff --git a/gcc/addresses.h b/gcc/addresses.h
index 3519c241c6d..08b100cfe6d 100644
--- a/gcc/addresses.h
+++ b/gcc/addresses.h
@@ -28,11 +28,12 @@ inline enum reg_class
 base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
 		addr_space_t as ATTRIBUTE_UNUSED,
 		enum rtx_code outer_code ATTRIBUTE_UNUSED,
-		enum rtx_code index_code ATTRIBUTE_UNUSED)
+		enum rtx_code index_code ATTRIBUTE_UNUSED,
+		rtx_insn *insn ATTRIBUTE_UNUSED = NULL)
 {
 #ifdef MODE_CODE_BASE_REG_CLASS
   return MODE_CODE_BASE_REG_CLASS (MACRO_MODE (mode), as, outer_code,
-				   index_code);
+				   index_code, insn);
 #else
 #ifdef MODE_BASE_REG_REG_CLASS
   if (index_code == REG)
@@ -56,11 +57,12 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
 		 machine_mode mode ATTRIBUTE_UNUSED,
 		 addr_space_t as ATTRIBUTE_UNUSED,
 		 enum rtx_code outer_code ATTRIBUTE_UNUSED,
-		 enum rtx_code index_code ATTRIBUTE_UNUSED)
+		 enum rtx_code index_code ATTRIBUTE_UNUSED,
+		 rtx_insn* insn ATTRIBUTE_UNUSED = NULL)
 {
 #ifdef REGNO_MODE_CODE_OK_FOR_BASE_P
   return REGNO_MODE_CODE_OK_FOR_BASE_P (regno, MACRO_MODE (mode), as,
-					outer_code, index_code);
+					outer_code, index_code, insn);
 #else
 #ifdef REGNO_MODE_OK_FOR_REG_BASE_P
   if (index_code == REG)
@@ -79,12 +81,13 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
 
 inline bool
 regno_ok_for_base_p (unsigned regno, machine_mode mode, addr_space_t as,
-		     enum rtx_code outer_code, enum rtx_code index_code)
+		     enum rtx_code outer_code, enum rtx_code index_code,
+		     rtx_insn* insn = NULL)
 {
   if (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0)
     regno = reg_renumber[regno];
 
-  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code);
+  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code, insn);
 }
 
 #endif /* GCC_ADDRESSES_H */
diff --git a/gcc/config/avr/avr.h b/gcc/config/avr/avr.h
index 8e7e00db13b..1d090fe0838 100644
--- a/gcc/config/avr/avr.h
+++ b/gcc/config/avr/avr.h
@@ -280,12 +280,13 @@ enum reg_class {
 
 #define REGNO_REG_CLASS(R) avr_regno_reg_class(R)
 
-#define MODE_CODE_BASE_REG_CLASS(mode, as, outer_code, index_code)   \
+#define MODE_CODE_BASE_REG_CLASS(mode, as, outer_code, index_code, insn)   \
   avr_mode_code_base_reg_class (mode, as, outer_code, index_code)
 
 #define INDEX_REG_CLASS NO_REGS
 
-#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mode, as, outer_code, index_code) \
+#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mode, as, outer_code,	  \
+				      index_code, insn)			  \
   avr_regno_mode_code_ok_for_base_p (num, mode, as, outer_code, index_code)
 
 #define REGNO_OK_FOR_INDEX_P(NUM) 0
diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index 4ff9a5d4d12..b56702a77fd 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -437,9 +437,9 @@ enum reg_class
      0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0 }}
 
 #define REGNO_REG_CLASS(REGNO) gcn_regno_reg_class (REGNO)
-#define MODE_CODE_BASE_REG_CLASS(MODE, AS, OUTER, INDEX) \
+#define MODE_CODE_BASE_REG_CLASS(MODE, AS, OUTER, INDEX, INSN) \
 	 gcn_mode_code_base_reg_class (MODE, AS, OUTER, INDEX)
-#define REGNO_MODE_CODE_OK_FOR_BASE_P(NUM, MODE, AS, OUTER, INDEX) \
+#define REGNO_MODE_CODE_OK_FOR_BASE_P(NUM, MODE, AS, OUTER, INDEX, INSN) \
 	 gcn_regno_mode_code_ok_for_base_p (NUM, MODE, AS, OUTER, INDEX)
 #define INDEX_REG_CLASS VGPR_REGS
 #define REGNO_OK_FOR_INDEX_P(regno) regno_ok_for_index_p (regno)
diff --git a/gcc/config/rl78/rl78.h b/gcc/config/rl78/rl78.h
index 7a7c6a44ba2..d0ed9162292 100644
--- a/gcc/config/rl78/rl78.h
+++ b/gcc/config/rl78/rl78.h
@@ -375,10 +375,12 @@ enum reg_class
 
 #define REGNO_OK_FOR_INDEX_P(regno)	REGNO_OK_FOR_BASE_P (regno)
 
-#define REGNO_MODE_CODE_OK_FOR_BASE_P(regno, mode, address_space, outer_code, index_code) \
+#define REGNO_MODE_CODE_OK_FOR_BASE_P(regno, mode, address_space, outer_code, \
+				      index_code, insn)			      \
   rl78_regno_mode_code_ok_for_base_p (regno, mode, address_space, outer_code, index_code)
 
-#define MODE_CODE_BASE_REG_CLASS(mode, address_space, outer_code, index_code) \
+#define MODE_CODE_BASE_REG_CLASS(mode, address_space, outer_code, index_code, \
+				 insn) 					      \
   rl78_mode_code_base_reg_class (mode, address_space, outer_code, index_code)
 
 #define RETURN_ADDR_RTX(COUNT, FRAMEADDR)				\
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index d0d47b0d471..a4239e3de10 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2533,7 +2533,7 @@ register address.  You should define this macro if base plus index
 addresses have different requirements than other base register uses.
 @end defmac
 
-@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
+@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
 A C expression whose value is the register class to which a valid
 base register for a memory reference in mode @var{mode} to address
 space @var{address_space} must belong.  @var{outer_code} and @var{index_code}
@@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
 of an address, @code{ADDRESS} for something that occurs in an
 @code{address_operand}).  @var{index_code} is the code of the corresponding
 index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
+@var{insn}, when nonnull, allows an insn-specific base register class to
+be returned; it should be a subset of the original base register class.
 @end defmac
 
 @defmac INDEX_REG_CLASS
@@ -2579,7 +2581,7 @@ Use of this macro is deprecated; please use the more general
 @code{REGNO_MODE_CODE_OK_FOR_BASE_P}.
 @end defmac
 
-@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
+@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
 A C expression which is nonzero if register number @var{num} is
 suitable for use as a base register in operand addresses, accessing
 memory in mode @var{mode} in address space @var{address_space}.
@@ -2592,6 +2594,8 @@ address, @code{ADDRESS} for something that occurs in an
 corresponding index expression if @var{outer_code} is @code{PLUS};
 @code{SCRATCH} otherwise.  The mode may be @code{VOIDmode} for addresses
 that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
+@var{insn}, when nonnull, allows an insn-specific base register class to
+be returned; it should be a subset of the original base register class.
 @end defmac
 
 @defmac REGNO_OK_FOR_INDEX_P (@var{num})
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 4ac96dc357d..72898f3adba 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2128,7 +2128,7 @@ register address.  You should define this macro if base plus index
 addresses have different requirements than other base register uses.
 @end defmac
 
-@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
+@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
 A C expression whose value is the register class to which a valid
 base register for a memory reference in mode @var{mode} to address
 space @var{address_space} must belong.  @var{outer_code} and @var{index_code}
@@ -2137,6 +2137,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
 of an address, @code{ADDRESS} for something that occurs in an
 @code{address_operand}).  @var{index_code} is the code of the corresponding
 index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
+@var{insn}, when nonnull, allows an insn-specific base register class to
+be returned; it should be a subset of the original base register class.
 @end defmac
 
 @defmac INDEX_REG_CLASS
@@ -2174,7 +2176,7 @@ Use of this macro is deprecated; please use the more general
 @code{REGNO_MODE_CODE_OK_FOR_BASE_P}.
 @end defmac
 
-@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
+@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
 A C expression which is nonzero if register number @var{num} is
 suitable for use as a base register in operand addresses, accessing
 memory in mode @var{mode} in address space @var{address_space}.
@@ -2187,6 +2189,8 @@ address, @code{ADDRESS} for something that occurs in an
 corresponding index expression if @var{outer_code} is @code{PLUS};
 @code{SCRATCH} otherwise.  The mode may be @code{VOIDmode} for addresses
 that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
+@var{insn}, when nonnull, allows an insn-specific base register class to
+be returned; it should be a subset of the original base register class.
 @end defmac
 
 @defmac REGNO_OK_FOR_INDEX_P (@var{num})
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index c718bedff32..9e7915ce934 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3672,7 +3672,7 @@ process_address_1 (int nop, bool check_only_p,
 				     REGNO (*ad.base_term)) != NULL_RTX)
 	    ? after : NULL),
 	   base_reg_class (ad.mode, ad.as, ad.base_outer_code,
-			   get_index_code (&ad)))))
+			   get_index_code (&ad), curr_insn))))
     {
       change_p = true;
       if (ad.base_term2 != NULL)
@@ -3722,7 +3722,8 @@ process_address_1 (int nop, bool check_only_p,
 	  rtx_insn *last = get_last_insn ();
 	  int code = -1;
 	  enum reg_class cl = base_reg_class (ad.mode, ad.as,
-					      SCRATCH, SCRATCH);
+					      SCRATCH, SCRATCH,
+					      curr_insn);
 	  rtx addr = *ad.inner;
 
 	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
@@ -3785,7 +3786,8 @@ process_address_1 (int nop, bool check_only_p,
 	  /* index * scale + disp => new base + index * scale,
 	     case (1) above.  */
 	  enum reg_class cl = base_reg_class (ad.mode, ad.as, PLUS,
-					      GET_CODE (*ad.index));
+					      GET_CODE (*ad.index),
+					      curr_insn);
 
 	  lra_assert (INDEX_REG_CLASS != NO_REGS);
 	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "disp");
@@ -3846,7 +3848,7 @@ process_address_1 (int nop, bool check_only_p,
 	      *ad.base_term = XEXP (SET_SRC (set), 0);
 	      *ad.disp_term = XEXP (SET_SRC (set), 1);
 	      cl = base_reg_class (ad.mode, ad.as, ad.base_outer_code,
-				   get_index_code (&ad));
+				   get_index_code (&ad), curr_insn);
 	      regno = REGNO (*ad.base_term);
 	      if (regno >= FIRST_PSEUDO_REGISTER
 		  && cl != lra_get_allocno_class (regno))
@@ -3890,7 +3892,8 @@ process_address_1 (int nop, bool check_only_p,
   else
     {
       enum reg_class cl = base_reg_class (ad.mode, ad.as,
-					  SCRATCH, SCRATCH);
+					  SCRATCH, SCRATCH,
+					  curr_insn);
       rtx addr = *ad.inner;
       
       new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
@@ -4639,7 +4642,7 @@ curr_insn_transform (bool check_only_p)
 
 	  push_to_sequence (before);
 	  rclass = base_reg_class (GET_MODE (op), MEM_ADDR_SPACE (op),
-				   MEM, SCRATCH);
+				   MEM, SCRATCH, curr_insn);
 	  if (GET_RTX_CLASS (code) == RTX_AUTOINC)
 	    new_reg = emit_inc (rclass, *loc, *loc,
 				/* This value does not matter for MODIFY.  */
diff --git a/gcc/reload.cc b/gcc/reload.cc
index 2126bdd117c..72f7e27af15 100644
--- a/gcc/reload.cc
+++ b/gcc/reload.cc
@@ -3321,7 +3321,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
 		       were handled in find_reloads_address.  */
 		    this_alternative[i]
 		      = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
-					ADDRESS, SCRATCH);
+					ADDRESS, SCRATCH, insn);
 		    win = 1;
 		    badop = 0;
 		    break;
@@ -3508,7 +3508,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
 			   the address into a base register.  */
 			this_alternative[i]
 			  = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
-					    ADDRESS, SCRATCH);
+					    ADDRESS, SCRATCH, insn);
 			badop = 0;
 			break;
 
@@ -4018,7 +4018,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
 	    operand_reloadnum[i]
 	      = push_reload (XEXP (recog_data.operand[i], 0), NULL_RTX,
 			     &XEXP (recog_data.operand[i], 0), (rtx*) 0,
-			     base_reg_class (VOIDmode, as, MEM, SCRATCH),
+			     base_reg_class (VOIDmode, as, MEM, SCRATCH, insn),
 			     address_mode,
 			     VOIDmode, 0, 0, i, RELOAD_OTHER);
 	    rld[operand_reloadnum[i]].inc
@@ -4897,7 +4897,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
       if (reg_equiv_constant (regno) != 0)
 	{
 	  find_reloads_address_part (reg_equiv_constant (regno), loc,
-				     base_reg_class (mode, as, MEM, SCRATCH),
+				     base_reg_class (mode, as, MEM,
+						     SCRATCH, insn),
 				     GET_MODE (ad), opnum, type, ind_levels);
 	  return 1;
 	}
@@ -4966,7 +4967,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 
       /* If we do not have one of the cases above, we must do the reload.  */
       push_reload (ad, NULL_RTX, loc, (rtx*) 0,
-		   base_reg_class (mode, as, MEM, SCRATCH),
+		   base_reg_class (mode, as, MEM, SCRATCH, insn),
 		   GET_MODE (ad), VOIDmode, 0, 0, opnum, type);
       return 1;
     }
@@ -5123,7 +5124,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 	     reload the sum into a base reg.
 	     That will at least work.  */
 	  find_reloads_address_part (ad, loc,
-				     base_reg_class (mode, as, MEM, SCRATCH),
+				     base_reg_class (mode, as, MEM,
+						     SCRATCH, insn),
 				     GET_MODE (ad), opnum, type, ind_levels);
 	}
       return ! removed_and;
@@ -5203,7 +5205,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 				 op_index == 0 ? addend : offset_reg);
 	  *loc = ad;
 
-	  cls = base_reg_class (mode, as, MEM, GET_CODE (addend));
+	  cls = base_reg_class (mode, as, MEM, GET_CODE (addend), insn);
 	  find_reloads_address_part (XEXP (ad, op_index),
 				     &XEXP (ad, op_index), cls,
 				     GET_MODE (ad), opnum, type, ind_levels);
@@ -5261,7 +5263,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 	}
 
       find_reloads_address_part (ad, loc,
-				 base_reg_class (mode, as, MEM, SCRATCH),
+				 base_reg_class (mode, as, MEM,
+						 SCRATCH, insn),
 				 address_mode, opnum, type, ind_levels);
       return ! removed_and;
     }
@@ -5513,7 +5516,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
   if (context == 1)
     context_reg_class = INDEX_REG_CLASS;
   else
-    context_reg_class = base_reg_class (mode, as, outer_code, index_code);
+    context_reg_class = base_reg_class (mode, as, outer_code, index_code,
+					insn);
 
   switch (code)
     {
@@ -5738,7 +5742,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
 		reloadnum = push_reload (tem, tem, &XEXP (x, 0),
 					 &XEXP (op1, 0),
 					 base_reg_class (mode, as,
-							 code, index_code),
+							 code, index_code,
+							 insn),
 					 GET_MODE (x), GET_MODE (x), 0,
 					 0, opnum, RELOAD_OTHER);
 
@@ -5756,7 +5761,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
 	    reloadnum = push_reload (XEXP (op1, 0), XEXP (x, 0),
 				     &XEXP (op1, 0), &XEXP (x, 0),
 				     base_reg_class (mode, as,
-						     code, index_code),
+						     code, index_code,
+						     insn),
 				     GET_MODE (x), GET_MODE (x), 0, 0,
 				     opnum, RELOAD_OTHER);
 
@@ -6216,7 +6222,7 @@ find_reloads_subreg_address (rtx x, int opnum, enum reload_type type,
     {
       push_reload (XEXP (tem, 0), NULL_RTX, &XEXP (tem, 0), (rtx*) 0,
 		   base_reg_class (GET_MODE (tem), MEM_ADDR_SPACE (tem),
-				   MEM, SCRATCH),
+				   MEM, SCRATCH, insn),
 		   GET_MODE (XEXP (tem, 0)), VOIDmode, 0, 0, opnum, type);
       reloaded = 1;
     }
diff --git a/gcc/reload1.cc b/gcc/reload1.cc
index 9ba822d1ff7..f41f4a4de22 100644
--- a/gcc/reload1.cc
+++ b/gcc/reload1.cc
@@ -1382,7 +1382,7 @@ maybe_fix_stack_asms (void)
 		  if (insn_extra_address_constraint (cn))
 		    cls = (int) reg_class_subunion[cls]
 		      [(int) base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
-					     ADDRESS, SCRATCH)];
+					     ADDRESS, SCRATCH, chain->insn)];
 		  else
 		    cls = (int) reg_class_subunion[cls]
 		      [reg_class_for_constraint (cn)];
-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2023-10-07  8:22 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-22 10:56 [PATCH v2 00/13] Support Intel APX EGPR Hongyu Wang
2023-09-22 10:56 ` [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class Hongyu Wang
2023-09-22 16:02   ` Vladimir Makarov
2023-10-07  8:22     ` Hongyu Wang
2023-09-22 10:56 ` [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument Hongyu Wang
2023-09-22 16:03   ` Vladimir Makarov
2023-09-22 10:56 ` [PATCH 03/13] [APX_EGPR] Initial support for APX_F Hongyu Wang
2023-10-07  2:35   ` Hongtao Liu
2023-09-22 10:56 ` [PATCH 04/13] [APX EGPR] Add 16 new integer general purpose registers Hongyu Wang
2023-09-22 10:56 ` [PATCH 05/13] [APX EGPR] Add register and memory constraints that disallow EGPR Hongyu Wang
2023-09-22 10:56 ` [PATCH 06/13] [APX EGPR] Add backend hook for base_reg_class/index_reg_class Hongyu Wang
2023-09-22 10:56 ` [PATCH 07/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint Hongyu Wang
2023-09-22 10:56 ` [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns Hongyu Wang
2023-09-22 10:56 ` [PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5) Hongyu Wang
2023-09-22 10:56 ` [PATCH 10/13] [APX EGPR] Handle legacy insns that only support GPR16 (2/5) Hongyu Wang
2023-09-22 10:56 ` [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5) Hongyu Wang
2023-09-22 10:56 ` [PATCH 12/13] [APX_EGPR] Handle legacy insns that only support GPR16 (4/5) Hongyu Wang
2023-09-22 10:56 ` [PATCH 13/13] [APX EGPR] Handle vex insns that only support GPR16 (5/5) Hongyu Wang
2023-09-25  2:02 ` [PATCH v2 00/13] Support Intel APX EGPR Hongtao Liu
  -- strict thread matches above, loose matches on Subject: below --
2023-08-31  8:20 [PATCH 00/13] [RFC] " Hongyu Wang
2023-08-31  8:20 ` [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class Hongyu Wang
2023-08-31 10:15   ` Uros Bizjak
2023-09-01  9:07     ` Hongyu Wang
2023-09-06 19:43       ` Vladimir Makarov
2023-09-07  6:23         ` Uros Bizjak
2023-09-07 12:13           ` Vladimir Makarov
2023-09-08 17:03   ` Vladimir Makarov
2023-09-10  4:49     ` Hongyu Wang
2023-09-14 12:09       ` Vladimir Makarov
