public inbox for gcc-patches@gcc.gnu.org
* [PATCH 00/13] [RFC] Support Intel APX EGPR
@ 2023-08-31  8:20 Hongyu Wang
  2023-08-31  8:20 ` [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class Hongyu Wang
                   ` (13 more replies)
  0 siblings, 14 replies; 49+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub

Intel Advanced Performance Extensions (APX) have been released in [1].
APX contains several extensions, such as 16 extended general purpose
registers (EGPRs), push2/pop2, a new data destination (NDD), and
conditional compare (CCMP/CTEST) combined with no-flags (NF) versions of
common instructions that suppress flag writes. This RFC focuses on the
EGPR implementation in GCC.

APX introduces a REX2 prefix to encode EGPRs in several legacy/SSE
instructions. For the remaining ones, it promotes some of them to an
EVEX-prefixed form to access EGPRs.  The main issue with APX is that not
all legacy/SSE/VEX instructions support EGPRs. For example, instructions
in legacy opcode maps 2/3 cannot use the REX2 prefix, since REX2 has only
one bit to select between map 0 and map 1, e.g., pinsrd. Also, for most
vector extensions, EGPRs are supported in their EVEX forms but not their
VEX forms, which means mnemonics with no EVEX form also cannot use
EGPRs, e.g., vphaddw.

This limitation brings some challenges to the current GCC infrastructure.
Generally, we use constraints to guide register allocation behavior. For
a register operand, it is easy to add a new constraint to certain insns
and limit it to legacy or REX registers. But for a memory operand, if we
only use a constraint to limit the base/index register choice, reload has
no fallback when process_address allocates an EGPR to the base/index
register, and any post-reload pass would then ICE on the constraint.

Here is what we did to address the issue: 

Middle-end:
-	Add an rtx_insn parameter to base_reg_class, and reuse the
MODE_CODE_BASE_REG_CLASS macro with the rtx_insn parameter.
-	Add index_reg_class, analogous to base_reg_class, which calls the
new INSN_INDEX_REG_CLASS macro with the rtx_insn parameter.
-	In process_address_1, add the rtx_insn parameter to call sites of
base_reg_class, and replace uses of INDEX_REG_CLASS with
index_reg_class taking the rtx_insn parameter.
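The intent of the new argument can be sketched with a small stand-alone
model (toy enums and types, not the real GCC classes or hooks): the
wrapper forwards the insn to the target, which may return a narrower
base register class for insns that cannot encode EGPRs.

```cpp
// Toy register classes standing in for the backend's
// GENERAL_REGS / GENERAL_GPR16 classes.
enum toy_reg_class { NO_REGS, GPR16_REGS, ALL_GPR_REGS };

// Toy insn carrying only the one property the decision needs.
struct toy_insn { bool supports_egpr; };

// Stand-in for base_reg_class (..., rtx_insn *insn = NULL):
// with no insn context, stay permissive; with context, narrow
// the class when the insn cannot encode r16-r31 in its address.
toy_reg_class base_reg_class (const toy_insn *insn = nullptr)
{
  if (insn && !insn->supports_egpr)
    return GPR16_REGS;     // insn cannot use EGPRs as base/index
  return ALL_GPR_REGS;     // default: full GPR set including EGPRs
}
```

Because the returned class is always a subset of the default, reload and
LRA can use it directly without any new backoff machinery.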

Back-end:
-	Extend GENERAL_REG_CLASS, INDEX_REG_CLASS and their supersets with
the corresponding regno checks for EGPRs.
-	Add GENERAL_GPR16/INDEX_GPR16 classes for the old 16 GPRs.
-	The whole component is controlled by -mapxf/TARGET_APX_EGPR. If it
is not enabled, r16-r31 are cleared in accessible_reg_set.
-	New register constraint "h" and memory constraint "Bt" that
disallow EGPRs in operands.
-	New asm_gpr32 flag option to enable/disable gpr32 for inline asm,
  disabled by default.
-	If asm_gpr32 is disabled, replace constraint "r" with "h", and
"m"/"memory" with "Bt".
-	New insn attribute gpr32; value 0 indicates that the alternative
cannot use EGPRs.
-	Add target functions for base_reg_class and index_reg_class that
call a helper function to verify whether an insn can use EGPRs in its
memory operand.
-	In the helper function, the verification works as follows:
    1. Return true if APX_EGPR is disabled or the insn is null.
    2. If the insn is inline asm, return the asm_gpr32 flag.
    3. Return false for an unrecognizable insn.
    4. Save recog_data and which_alternative, extract the insn, and restore
    them before returning.
    5. Loop through all enabled alternatives; if any enabled alternative
    has attr_gpr32 0, return false, otherwise return true.
-	For insn alternatives that cannot use gpr32 in a register_operand,
use the "h" constraint instead of "r".
-	For insn alternatives that cannot use gpr32 in a memory operand, use
the "Bt" constraint instead of "m", and set the corresponding attr_gpr32
to 0.
-	Split the output template on %v if the SSE form of a mnemonic cannot
use gpr32.
-	For insn alternatives that cannot use gpr32 in a memory operand,
refine the isa attribute and split alternatives into noavx,
avx_noavx512f, etc., so the helper function can properly loop through
the enabled alternative mask.
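As a self-contained illustration of steps 1-5 above (a toy model, not
the actual i386 backend code; the real helper works on rtx_insn,
recog_data and the gpr32 attribute), the decision procedure is:

```cpp
#include <vector>

// Toy model of the "can this insn use an EGPR in its memory operand"
// check; an insn is reduced to the fields the decision needs.
struct toy_insn {
  bool is_asm;                 // inline asm statement?
  bool recognizable;           // would recog () succeed?
  std::vector<int> alt_gpr32;  // gpr32 attr value per enabled alternative
};

// Mirrors the five steps: disabled feature or null insn -> allow;
// inline asm -> follow the asm_gpr32 flag; unrecognizable -> disallow;
// otherwise disallow iff any enabled alternative has gpr32 == 0.
bool insn_can_use_egpr_in_mem (const toy_insn *insn,
                               bool apx_egpr_enabled,
                               bool asm_gpr32)
{
  if (!apx_egpr_enabled || insn == nullptr)  // step 1
    return true;
  if (insn->is_asm)                          // step 2
    return asm_gpr32;
  if (!insn->recognizable)                   // step 3
    return false;
  // Step 4 (saving/restoring recog_data) is bookkeeping, elided here.
  for (int g : insn->alt_gpr32)              // step 5
    if (g == 0)
      return false;
  return true;
}
```

Note that a single alternative with gpr32 0 disallows EGPRs for the
whole insn's address, which is the conservative choice for reload.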

Specifically for inline asm, we currently just map the "r/m/memory"
constraints as an example. Eventually we will support mapping all
common constraints, if this mapping method is accepted.

Also, for VEX instructions, we currently assume EGPRs are supported when
an EVEX counterpart exists, since any APX-enabled machine will have
AVX10 support for all the EVEX encodings. We just disable the mnemonics
that do not support EGPRs. So EGPRs will be allowed under -mavx2 -mapxf
for many VEX mnemonics.

We haven't disabled EGPRs for 3DNOW/XOP/LWP/FMA4/TBM instructions, as
they can be used together with -mapxf. We can disable EGPRs for them if
the AMD maintainers require it.

For testing, we ran the GCC testsuite and SPEC2017 with -mapxf under the
SDE simulator, with no new errors. Also, we inverted the register
allocation order to force r31 to be allocated first, and saw no errors
except for the AMD-only instructions. We will conduct further tests,
such as changing all do-compile tests to do-assemble, and add more tests
to gcc/testsuite in the future.

This RFC intends to describe our approach to the EGPR component of the
APX implementation. It may still have potential issues or bugs and
require further optimization. Any comments are appreciated.

[1]. https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html.

Hongyu Wang (2):
  [APX EGPR] middle-end: Add index_reg_class with insn argument.
  [APX EGPR] Handle GPR16 only vector move insns

Kong Lingling (11):
  [APX EGPR] middle-end: Add insn argument to base_reg_class
  [APX_EGPR] Initial support for APX_F
  [APX EGPR] Add 16 new integer general purpose registers
  [APX EGPR] Add register and memory constraints that disallow EGPR
  [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR
    constraint.
  [APX EGPR] Add backend hook for base_reg_class/index_reg_class.
  [APX EGPR] Handle legacy insn that only support GPR16 (1/5)
  [APX EGPR] Handle legacy insns that only support GPR16 (2/5)
  [APX EGPR] Handle legacy insns that only support GPR16 (3/5)
  [APX_EGPR] Handle legacy insns that only support GPR16 (4/5)
  [APX EGPR] Handle vex insns that only support GPR16 (5/5)

 gcc/addresses.h                               |  25 +-
 gcc/common/config/i386/cpuinfo.h              |  12 +-
 gcc/common/config/i386/i386-common.cc         |  17 +
 gcc/common/config/i386/i386-cpuinfo.h         |   1 +
 gcc/common/config/i386/i386-isas.h            |   1 +
 gcc/config/avr/avr.h                          |   5 +-
 gcc/config/gcn/gcn.h                          |   4 +-
 gcc/config/i386/constraints.md                |  26 +-
 gcc/config/i386/cpuid.h                       |   1 +
 gcc/config/i386/i386-isa.def                  |   1 +
 gcc/config/i386/i386-options.cc               |  15 +
 gcc/config/i386/i386-opts.h                   |   8 +
 gcc/config/i386/i386-protos.h                 |   9 +
 gcc/config/i386/i386.cc                       | 253 +++++-
 gcc/config/i386/i386.h                        |  69 +-
 gcc/config/i386/i386.md                       | 144 ++-
 gcc/config/i386/i386.opt                      |  30 +
 gcc/config/i386/mmx.md                        | 170 ++--
 gcc/config/i386/sse.md                        | 859 ++++++++++++------
 gcc/config/rl78/rl78.h                        |   6 +-
 gcc/doc/invoke.texi                           |  11 +-
 gcc/doc/tm.texi                               |  17 +-
 gcc/doc/tm.texi.in                            |  17 +-
 gcc/lra-constraints.cc                        |  32 +-
 gcc/reload.cc                                 |  34 +-
 gcc/reload1.cc                                |   2 +-
 gcc/testsuite/gcc.target/i386/apx-1.c         |   8 +
 .../gcc.target/i386/apx-egprs-names.c         |  17 +
 .../gcc.target/i386/apx-inline-gpr-norex2.c   | 108 +++
 .../gcc.target/i386/apx-interrupt-1.c         | 102 +++
 .../i386/apx-legacy-insn-check-norex2-asm.c   |   5 +
 .../i386/apx-legacy-insn-check-norex2.c       | 181 ++++
 .../gcc.target/i386/apx-spill_to_egprs-1.c    |  25 +
 gcc/testsuite/lib/target-supports.exp         |  10 +
 34 files changed, 1747 insertions(+), 478 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-egprs-names.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c

-- 
2.31.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
@ 2023-08-31  8:20 ` Hongyu Wang
  2023-08-31 10:15   ` Uros Bizjak
  2023-09-08 17:03   ` Vladimir Makarov
  2023-08-31  8:20 ` [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument Hongyu Wang
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 49+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

The current reload infrastructure does not support a per-insn
base_reg_class for backend insns. Add an insn argument to
base_reg_class for lra/reload usage.

gcc/ChangeLog:

	* addresses.h (base_reg_class): Add insn argument.
	Pass to MODE_CODE_BASE_REG_CLASS.
	(ok_for_base_p_1): Add insn argument.
	Pass to REGNO_MODE_CODE_OK_FOR_BASE_P.
	(regno_ok_for_base_p): Add insn argument and pass to ok_for_base_p_1.
	* config/avr/avr.h (MODE_CODE_BASE_REG_CLASS): Add insn argument.
	(REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
	* config/gcn/gcn.h (MODE_CODE_BASE_REG_CLASS): Ditto.
	(REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
	* config/rl78/rl78.h (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
	(MODE_CODE_BASE_REG_CLASS): Ditto.
	* doc/tm.texi: Add insn argument for MODE_CODE_BASE_REG_CLASS
	and REGNO_MODE_CODE_OK_FOR_BASE_P.
	* doc/tm.texi.in: Ditto.
	* lra-constraints.cc (process_address_1): Pass insn to
	base_reg_class.
	(curr_insn_transform): Ditto.
	* reload.cc (find_reloads): Ditto.
	(find_reloads_address): Ditto.
	(find_reloads_address_1): Ditto.
	(find_reloads_subreg_address): Ditto.
	* reload1.cc (maybe_fix_stack_asms): Ditto.
---
 gcc/addresses.h        | 15 +++++++++------
 gcc/config/avr/avr.h   |  5 +++--
 gcc/config/gcn/gcn.h   |  4 ++--
 gcc/config/rl78/rl78.h |  6 ++++--
 gcc/doc/tm.texi        |  8 ++++++--
 gcc/doc/tm.texi.in     |  8 ++++++--
 gcc/lra-constraints.cc | 15 +++++++++------
 gcc/reload.cc          | 30 ++++++++++++++++++------------
 gcc/reload1.cc         |  2 +-
 9 files changed, 58 insertions(+), 35 deletions(-)

diff --git a/gcc/addresses.h b/gcc/addresses.h
index 3519c241c6d..08b100cfe6d 100644
--- a/gcc/addresses.h
+++ b/gcc/addresses.h
@@ -28,11 +28,12 @@ inline enum reg_class
 base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
 		addr_space_t as ATTRIBUTE_UNUSED,
 		enum rtx_code outer_code ATTRIBUTE_UNUSED,
-		enum rtx_code index_code ATTRIBUTE_UNUSED)
+		enum rtx_code index_code ATTRIBUTE_UNUSED,
+		rtx_insn *insn ATTRIBUTE_UNUSED = NULL)
 {
 #ifdef MODE_CODE_BASE_REG_CLASS
   return MODE_CODE_BASE_REG_CLASS (MACRO_MODE (mode), as, outer_code,
-				   index_code);
+				   index_code, insn);
 #else
 #ifdef MODE_BASE_REG_REG_CLASS
   if (index_code == REG)
@@ -56,11 +57,12 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
 		 machine_mode mode ATTRIBUTE_UNUSED,
 		 addr_space_t as ATTRIBUTE_UNUSED,
 		 enum rtx_code outer_code ATTRIBUTE_UNUSED,
-		 enum rtx_code index_code ATTRIBUTE_UNUSED)
+		 enum rtx_code index_code ATTRIBUTE_UNUSED,
+		 rtx_insn* insn ATTRIBUTE_UNUSED = NULL)
 {
 #ifdef REGNO_MODE_CODE_OK_FOR_BASE_P
   return REGNO_MODE_CODE_OK_FOR_BASE_P (regno, MACRO_MODE (mode), as,
-					outer_code, index_code);
+					outer_code, index_code, insn);
 #else
 #ifdef REGNO_MODE_OK_FOR_REG_BASE_P
   if (index_code == REG)
@@ -79,12 +81,13 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
 
 inline bool
 regno_ok_for_base_p (unsigned regno, machine_mode mode, addr_space_t as,
-		     enum rtx_code outer_code, enum rtx_code index_code)
+		     enum rtx_code outer_code, enum rtx_code index_code,
+		     rtx_insn* insn = NULL)
 {
   if (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0)
     regno = reg_renumber[regno];
 
-  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code);
+  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code, insn);
 }
 
 #endif /* GCC_ADDRESSES_H */
diff --git a/gcc/config/avr/avr.h b/gcc/config/avr/avr.h
index 8e7e00db13b..1d090fe0838 100644
--- a/gcc/config/avr/avr.h
+++ b/gcc/config/avr/avr.h
@@ -280,12 +280,13 @@ enum reg_class {
 
 #define REGNO_REG_CLASS(R) avr_regno_reg_class(R)
 
-#define MODE_CODE_BASE_REG_CLASS(mode, as, outer_code, index_code)   \
+#define MODE_CODE_BASE_REG_CLASS(mode, as, outer_code, index_code, insn)   \
   avr_mode_code_base_reg_class (mode, as, outer_code, index_code)
 
 #define INDEX_REG_CLASS NO_REGS
 
-#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mode, as, outer_code, index_code) \
+#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mode, as, outer_code,	  \
+				      index_code, insn)			  \
   avr_regno_mode_code_ok_for_base_p (num, mode, as, outer_code, index_code)
 
 #define REGNO_OK_FOR_INDEX_P(NUM) 0
diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index 4ff9a5d4d12..b56702a77fd 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -437,9 +437,9 @@ enum reg_class
      0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0 }}
 
 #define REGNO_REG_CLASS(REGNO) gcn_regno_reg_class (REGNO)
-#define MODE_CODE_BASE_REG_CLASS(MODE, AS, OUTER, INDEX) \
+#define MODE_CODE_BASE_REG_CLASS(MODE, AS, OUTER, INDEX, INSN) \
 	 gcn_mode_code_base_reg_class (MODE, AS, OUTER, INDEX)
-#define REGNO_MODE_CODE_OK_FOR_BASE_P(NUM, MODE, AS, OUTER, INDEX) \
+#define REGNO_MODE_CODE_OK_FOR_BASE_P(NUM, MODE, AS, OUTER, INDEX, INSN) \
 	 gcn_regno_mode_code_ok_for_base_p (NUM, MODE, AS, OUTER, INDEX)
 #define INDEX_REG_CLASS VGPR_REGS
 #define REGNO_OK_FOR_INDEX_P(regno) regno_ok_for_index_p (regno)
diff --git a/gcc/config/rl78/rl78.h b/gcc/config/rl78/rl78.h
index 7a7c6a44ba2..d0ed9162292 100644
--- a/gcc/config/rl78/rl78.h
+++ b/gcc/config/rl78/rl78.h
@@ -375,10 +375,12 @@ enum reg_class
 
 #define REGNO_OK_FOR_INDEX_P(regno)	REGNO_OK_FOR_BASE_P (regno)
 
-#define REGNO_MODE_CODE_OK_FOR_BASE_P(regno, mode, address_space, outer_code, index_code) \
+#define REGNO_MODE_CODE_OK_FOR_BASE_P(regno, mode, address_space, outer_code, \
+				      index_code, insn)			      \
   rl78_regno_mode_code_ok_for_base_p (regno, mode, address_space, outer_code, index_code)
 
-#define MODE_CODE_BASE_REG_CLASS(mode, address_space, outer_code, index_code) \
+#define MODE_CODE_BASE_REG_CLASS(mode, address_space, outer_code, index_code, \
+				 insn) 					      \
   rl78_mode_code_base_reg_class (mode, address_space, outer_code, index_code)
 
 #define RETURN_ADDR_RTX(COUNT, FRAMEADDR)				\
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index d0d47b0d471..a4239e3de10 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2533,7 +2533,7 @@ register address.  You should define this macro if base plus index
 addresses have different requirements than other base register uses.
 @end defmac
 
-@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
+@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
 A C expression whose value is the register class to which a valid
 base register for a memory reference in mode @var{mode} to address
 space @var{address_space} must belong.  @var{outer_code} and @var{index_code}
@@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
 of an address, @code{ADDRESS} for something that occurs in an
 @code{address_operand}).  @var{index_code} is the code of the corresponding
 index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
+@code{insn} indicates insn specific base register class should be subset
+of the original base register class.
 @end defmac
 
 @defmac INDEX_REG_CLASS
@@ -2579,7 +2581,7 @@ Use of this macro is deprecated; please use the more general
 @code{REGNO_MODE_CODE_OK_FOR_BASE_P}.
 @end defmac
 
-@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
+@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
 A C expression which is nonzero if register number @var{num} is
 suitable for use as a base register in operand addresses, accessing
 memory in mode @var{mode} in address space @var{address_space}.
@@ -2592,6 +2594,8 @@ address, @code{ADDRESS} for something that occurs in an
 corresponding index expression if @var{outer_code} is @code{PLUS};
 @code{SCRATCH} otherwise.  The mode may be @code{VOIDmode} for addresses
 that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
+@code{insn} indicates insn specific base register class should be subset
+of the original base register class.
 @end defmac
 
 @defmac REGNO_OK_FOR_INDEX_P (@var{num})
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 4ac96dc357d..72898f3adba 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2128,7 +2128,7 @@ register address.  You should define this macro if base plus index
 addresses have different requirements than other base register uses.
 @end defmac
 
-@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
+@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
 A C expression whose value is the register class to which a valid
 base register for a memory reference in mode @var{mode} to address
 space @var{address_space} must belong.  @var{outer_code} and @var{index_code}
@@ -2137,6 +2137,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
 of an address, @code{ADDRESS} for something that occurs in an
 @code{address_operand}).  @var{index_code} is the code of the corresponding
 index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
+@code{insn} indicates insn specific base register class should be subset
+of the original base register class.
 @end defmac
 
 @defmac INDEX_REG_CLASS
@@ -2174,7 +2176,7 @@ Use of this macro is deprecated; please use the more general
 @code{REGNO_MODE_CODE_OK_FOR_BASE_P}.
 @end defmac
 
-@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
+@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
 A C expression which is nonzero if register number @var{num} is
 suitable for use as a base register in operand addresses, accessing
 memory in mode @var{mode} in address space @var{address_space}.
@@ -2187,6 +2189,8 @@ address, @code{ADDRESS} for something that occurs in an
 corresponding index expression if @var{outer_code} is @code{PLUS};
 @code{SCRATCH} otherwise.  The mode may be @code{VOIDmode} for addresses
 that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
+@code{insn} indicates insn specific base register class should be subset
+of the original base register class.
 @end defmac
 
 @defmac REGNO_OK_FOR_INDEX_P (@var{num})
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index c718bedff32..9e7915ce934 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3672,7 +3672,7 @@ process_address_1 (int nop, bool check_only_p,
 				     REGNO (*ad.base_term)) != NULL_RTX)
 	    ? after : NULL),
 	   base_reg_class (ad.mode, ad.as, ad.base_outer_code,
-			   get_index_code (&ad)))))
+			   get_index_code (&ad), curr_insn))))
     {
       change_p = true;
       if (ad.base_term2 != NULL)
@@ -3722,7 +3722,8 @@ process_address_1 (int nop, bool check_only_p,
 	  rtx_insn *last = get_last_insn ();
 	  int code = -1;
 	  enum reg_class cl = base_reg_class (ad.mode, ad.as,
-					      SCRATCH, SCRATCH);
+					      SCRATCH, SCRATCH,
+					      curr_insn);
 	  rtx addr = *ad.inner;
 
 	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
@@ -3785,7 +3786,8 @@ process_address_1 (int nop, bool check_only_p,
 	  /* index * scale + disp => new base + index * scale,
 	     case (1) above.  */
 	  enum reg_class cl = base_reg_class (ad.mode, ad.as, PLUS,
-					      GET_CODE (*ad.index));
+					      GET_CODE (*ad.index),
+					      curr_insn);
 
 	  lra_assert (INDEX_REG_CLASS != NO_REGS);
 	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "disp");
@@ -3846,7 +3848,7 @@ process_address_1 (int nop, bool check_only_p,
 	      *ad.base_term = XEXP (SET_SRC (set), 0);
 	      *ad.disp_term = XEXP (SET_SRC (set), 1);
 	      cl = base_reg_class (ad.mode, ad.as, ad.base_outer_code,
-				   get_index_code (&ad));
+				   get_index_code (&ad), curr_insn);
 	      regno = REGNO (*ad.base_term);
 	      if (regno >= FIRST_PSEUDO_REGISTER
 		  && cl != lra_get_allocno_class (regno))
@@ -3890,7 +3892,8 @@ process_address_1 (int nop, bool check_only_p,
   else
     {
       enum reg_class cl = base_reg_class (ad.mode, ad.as,
-					  SCRATCH, SCRATCH);
+					  SCRATCH, SCRATCH,
+					  curr_insn);
       rtx addr = *ad.inner;
       
       new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
@@ -4639,7 +4642,7 @@ curr_insn_transform (bool check_only_p)
 
 	  push_to_sequence (before);
 	  rclass = base_reg_class (GET_MODE (op), MEM_ADDR_SPACE (op),
-				   MEM, SCRATCH);
+				   MEM, SCRATCH, curr_insn);
 	  if (GET_RTX_CLASS (code) == RTX_AUTOINC)
 	    new_reg = emit_inc (rclass, *loc, *loc,
 				/* This value does not matter for MODIFY.  */
diff --git a/gcc/reload.cc b/gcc/reload.cc
index 2126bdd117c..72f7e27af15 100644
--- a/gcc/reload.cc
+++ b/gcc/reload.cc
@@ -3321,7 +3321,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
 		       were handled in find_reloads_address.  */
 		    this_alternative[i]
 		      = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
-					ADDRESS, SCRATCH);
+					ADDRESS, SCRATCH, insn);
 		    win = 1;
 		    badop = 0;
 		    break;
@@ -3508,7 +3508,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
 			   the address into a base register.  */
 			this_alternative[i]
 			  = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
-					    ADDRESS, SCRATCH);
+					    ADDRESS, SCRATCH, insn);
 			badop = 0;
 			break;
 
@@ -4018,7 +4018,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
 	    operand_reloadnum[i]
 	      = push_reload (XEXP (recog_data.operand[i], 0), NULL_RTX,
 			     &XEXP (recog_data.operand[i], 0), (rtx*) 0,
-			     base_reg_class (VOIDmode, as, MEM, SCRATCH),
+			     base_reg_class (VOIDmode, as, MEM, SCRATCH, insn),
 			     address_mode,
 			     VOIDmode, 0, 0, i, RELOAD_OTHER);
 	    rld[operand_reloadnum[i]].inc
@@ -4897,7 +4897,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
       if (reg_equiv_constant (regno) != 0)
 	{
 	  find_reloads_address_part (reg_equiv_constant (regno), loc,
-				     base_reg_class (mode, as, MEM, SCRATCH),
+				     base_reg_class (mode, as, MEM,
+						     SCRATCH, insn),
 				     GET_MODE (ad), opnum, type, ind_levels);
 	  return 1;
 	}
@@ -4966,7 +4967,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 
       /* If we do not have one of the cases above, we must do the reload.  */
       push_reload (ad, NULL_RTX, loc, (rtx*) 0,
-		   base_reg_class (mode, as, MEM, SCRATCH),
+		   base_reg_class (mode, as, MEM, SCRATCH, insn),
 		   GET_MODE (ad), VOIDmode, 0, 0, opnum, type);
       return 1;
     }
@@ -5123,7 +5124,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 	     reload the sum into a base reg.
 	     That will at least work.  */
 	  find_reloads_address_part (ad, loc,
-				     base_reg_class (mode, as, MEM, SCRATCH),
+				     base_reg_class (mode, as, MEM,
+						     SCRATCH, insn),
 				     GET_MODE (ad), opnum, type, ind_levels);
 	}
       return ! removed_and;
@@ -5203,7 +5205,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 				 op_index == 0 ? addend : offset_reg);
 	  *loc = ad;
 
-	  cls = base_reg_class (mode, as, MEM, GET_CODE (addend));
+	  cls = base_reg_class (mode, as, MEM, GET_CODE (addend), insn);
 	  find_reloads_address_part (XEXP (ad, op_index),
 				     &XEXP (ad, op_index), cls,
 				     GET_MODE (ad), opnum, type, ind_levels);
@@ -5261,7 +5263,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 	}
 
       find_reloads_address_part (ad, loc,
-				 base_reg_class (mode, as, MEM, SCRATCH),
+				 base_reg_class (mode, as, MEM,
+						 SCRATCH, insn),
 				 address_mode, opnum, type, ind_levels);
       return ! removed_and;
     }
@@ -5513,7 +5516,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
   if (context == 1)
     context_reg_class = INDEX_REG_CLASS;
   else
-    context_reg_class = base_reg_class (mode, as, outer_code, index_code);
+    context_reg_class = base_reg_class (mode, as, outer_code, index_code,
+					insn);
 
   switch (code)
     {
@@ -5738,7 +5742,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
 		reloadnum = push_reload (tem, tem, &XEXP (x, 0),
 					 &XEXP (op1, 0),
 					 base_reg_class (mode, as,
-							 code, index_code),
+							 code, index_code,
+							 insn),
 					 GET_MODE (x), GET_MODE (x), 0,
 					 0, opnum, RELOAD_OTHER);
 
@@ -5756,7 +5761,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
 	    reloadnum = push_reload (XEXP (op1, 0), XEXP (x, 0),
 				     &XEXP (op1, 0), &XEXP (x, 0),
 				     base_reg_class (mode, as,
-						     code, index_code),
+						     code, index_code,
+						     insn),
 				     GET_MODE (x), GET_MODE (x), 0, 0,
 				     opnum, RELOAD_OTHER);
 
@@ -6216,7 +6222,7 @@ find_reloads_subreg_address (rtx x, int opnum, enum reload_type type,
     {
       push_reload (XEXP (tem, 0), NULL_RTX, &XEXP (tem, 0), (rtx*) 0,
 		   base_reg_class (GET_MODE (tem), MEM_ADDR_SPACE (tem),
-				   MEM, SCRATCH),
+				   MEM, SCRATCH, insn),
 		   GET_MODE (XEXP (tem, 0)), VOIDmode, 0, 0, opnum, type);
       reloaded = 1;
     }
diff --git a/gcc/reload1.cc b/gcc/reload1.cc
index 9ba822d1ff7..f41f4a4de22 100644
--- a/gcc/reload1.cc
+++ b/gcc/reload1.cc
@@ -1382,7 +1382,7 @@ maybe_fix_stack_asms (void)
 		  if (insn_extra_address_constraint (cn))
 		    cls = (int) reg_class_subunion[cls]
 		      [(int) base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
-					     ADDRESS, SCRATCH)];
+					     ADDRESS, SCRATCH, chain->insn)];
 		  else
 		    cls = (int) reg_class_subunion[cls]
 		      [reg_class_for_constraint (cn)];
-- 
2.31.1



* [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument.
  2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
  2023-08-31  8:20 ` [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class Hongyu Wang
@ 2023-08-31  8:20 ` Hongyu Wang
  2023-08-31  8:20 ` [PATCH 03/13] [APX_EGPR] Initial support for APX_F Hongyu Wang
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub

Like base_reg_class, INDEX_REG_CLASS does not support per-insn
selection. Add an index_reg_class wrapper with an insn argument for
lra/reload usage.

gcc/ChangeLog:

	* addresses.h (index_reg_class): New wrapper function like
	base_reg_class.
	* doc/tm.texi: Document INSN_INDEX_REG_CLASS.
	* doc/tm.texi.in: Ditto.
	* lra-constraints.cc (index_part_to_reg): Pass index_class.
	(process_address_1): Calls index_reg_class with curr_insn and
	replace INDEX_REG_CLASS with its return value index_cl.
	* reload.cc (find_reloads_address): Likewise.
	(find_reloads_address_1): Likewise.
---
 gcc/addresses.h        | 10 ++++++++++
 gcc/doc/tm.texi        |  9 +++++++++
 gcc/doc/tm.texi.in     |  9 +++++++++
 gcc/lra-constraints.cc | 17 +++++++++--------
 gcc/reload.cc          |  4 ++--
 5 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/gcc/addresses.h b/gcc/addresses.h
index 08b100cfe6d..4bd96a3fc83 100644
--- a/gcc/addresses.h
+++ b/gcc/addresses.h
@@ -47,6 +47,16 @@ base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
 #endif
 }
 
+inline enum reg_class
+index_reg_class (rtx_insn *insn ATTRIBUTE_UNUSED = NULL)
+{
+#ifdef INSN_INDEX_REG_CLASS
+  return INSN_INDEX_REG_CLASS (insn);
+#else
+  return INDEX_REG_CLASS;
+#endif
+}
+
 /* Wrapper function to unify target macros REGNO_MODE_CODE_OK_FOR_BASE_P,
    REGNO_MODE_OK_FOR_REG_BASE_P, REGNO_MODE_OK_FOR_BASE_P and
    REGNO_OK_FOR_BASE_P.
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index a4239e3de10..5a50f5cf7f3 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2553,6 +2553,15 @@ address where its value is either multiplied by a scale factor or
 added to another register (as well as added to a displacement).
 @end defmac
 
+@defmac INSN_INDEX_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+index register must belong. An index register is one used in an
+address where its value is either multiplied by a scale factor or
+added to another register (as well as added to a displacement).
+@code{insn} indicates insn specific index register class should be
+subset of the original index register class.
+@end defmac
+
 @defmac REGNO_OK_FOR_BASE_P (@var{num})
 A C expression which is nonzero if register number @var{num} is
 suitable for use as a base register in operand addresses.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 72898f3adba..65748e19ccd 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2148,6 +2148,15 @@ address where its value is either multiplied by a scale factor or
 added to another register (as well as added to a displacement).
 @end defmac
 
+@defmac INSN_INDEX_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+index register must belong. An index register is one used in an
+address where its value is either multiplied by a scale factor or
+added to another register (as well as added to a displacement).
+@code{insn} indicates insn specific index register class should be
+subset of the original index register class.
+@end defmac
+
 @defmac REGNO_OK_FOR_BASE_P (@var{num})
 A C expression which is nonzero if register number @var{num} is
 suitable for use as a base register in operand addresses.
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 9e7915ce934..161b67d8b73 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3390,12 +3390,12 @@ base_plus_disp_to_reg (struct address_info *ad, rtx disp)
 /* Make reload of index part of address AD.  Return the new
    pseudo.  */
 static rtx
-index_part_to_reg (struct address_info *ad)
+index_part_to_reg (struct address_info *ad, enum reg_class index_class)
 {
   rtx new_reg;
 
   new_reg = lra_create_new_reg (GET_MODE (*ad->index), NULL_RTX,
-				INDEX_REG_CLASS, NULL, "index term");
+				index_class, NULL, "index term");
   expand_mult (GET_MODE (*ad->index), *ad->index_term,
 	       GEN_INT (get_index_scale (ad)), new_reg, 1);
   return new_reg;
@@ -3650,13 +3650,14 @@ process_address_1 (int nop, bool check_only_p,
   /* If INDEX_REG_CLASS is assigned to base_term already and isn't to
      index_term, swap them so to avoid assigning INDEX_REG_CLASS to both
      when INDEX_REG_CLASS is a single register class.  */
+  enum reg_class index_cl = index_reg_class (curr_insn);
   if (ad.base_term != NULL
       && ad.index_term != NULL
-      && ira_class_hard_regs_num[INDEX_REG_CLASS] == 1
+      && ira_class_hard_regs_num[index_cl] == 1
       && REG_P (*ad.base_term)
       && REG_P (*ad.index_term)
-      && in_class_p (*ad.base_term, INDEX_REG_CLASS, NULL)
-      && ! in_class_p (*ad.index_term, INDEX_REG_CLASS, NULL))
+      && in_class_p (*ad.base_term, index_cl, NULL)
+      && ! in_class_p (*ad.index_term, index_cl, NULL))
     {
       std::swap (ad.base, ad.index);
       std::swap (ad.base_term, ad.index_term);
@@ -3680,7 +3681,7 @@ process_address_1 (int nop, bool check_only_p,
     }
   if (ad.index_term != NULL
       && process_addr_reg (ad.index_term, check_only_p,
-			   before, NULL, INDEX_REG_CLASS))
+			   before, NULL, index_cl))
     change_p = true;
 
   /* Target hooks sometimes don't treat extra-constraint addresses as
@@ -3789,7 +3790,7 @@ process_address_1 (int nop, bool check_only_p,
 					      GET_CODE (*ad.index),
 					      curr_insn);
 
-	  lra_assert (INDEX_REG_CLASS != NO_REGS);
+	  lra_assert (index_cl != NO_REGS);
 	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "disp");
 	  lra_emit_move (new_reg, *ad.disp);
 	  *ad.inner = simplify_gen_binary (PLUS, GET_MODE (new_reg),
@@ -3885,7 +3886,7 @@ process_address_1 (int nop, bool check_only_p,
       changed pseudo on the equivalent memory and a subreg of the
       pseudo onto the memory of different mode for which the scale is
       prohibitted.  */
-      new_reg = index_part_to_reg (&ad);
+      new_reg = index_part_to_reg (&ad, index_cl);
       *ad.inner = simplify_gen_binary (PLUS, GET_MODE (new_reg),
 				       *ad.base_term, new_reg);
     }
diff --git a/gcc/reload.cc b/gcc/reload.cc
index 72f7e27af15..66b484b12fa 100644
--- a/gcc/reload.cc
+++ b/gcc/reload.cc
@@ -5114,7 +5114,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 	  /* Reload the displacement into an index reg.
 	     We assume the frame pointer or arg pointer is a base reg.  */
 	  find_reloads_address_part (XEXP (ad, 1), &XEXP (ad, 1),
-				     INDEX_REG_CLASS, GET_MODE (ad), opnum,
+				     index_reg_class (insn), GET_MODE (ad), opnum,
 				     type, ind_levels);
 	  return 0;
 	}
@@ -5514,7 +5514,7 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
   bool reloaded_inner_of_autoinc = false;
 
   if (context == 1)
-    context_reg_class = INDEX_REG_CLASS;
+    context_reg_class = index_reg_class (insn);
   else
     context_reg_class = base_reg_class (mode, as, outer_code, index_code,
 					insn);
-- 
2.31.1
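[Editorial sketch, not part of the patch.]  The insn-parameterized
index_reg_class used throughout the lra-constraints.cc and reload.cc
changes above can be pictured with a small stand-alone sketch.  The
types and names below are hypothetical stand-ins (the real reg_class
enum and insn representation live in the target headers and RTL); the
point is only the shape of the hook: narrow the index class for insns
whose encodings cannot reach the extended GPRs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's reg_class values.  */
enum reg_class { NO_REGS, LEGACY_INDEX_REGS, INDEX_REGS };

/* Hypothetical insn descriptor; real code would inspect the insn's
   encoding (e.g. a legacy map2/3 opcode or a VEX-only mnemonic).  */
struct insn { bool egpr_ok; };

/* Sketch of an INSN_INDEX_REG_CLASS-style hook: return a subset of
   the usual index class when this insn cannot encode EGPRs.  A null
   insn keeps the conservative pre-APX behavior of the full class.  */
static enum reg_class
index_reg_class_for (const struct insn *insn)
{
  if (insn == NULL || insn->egpr_ok)
    return INDEX_REGS;          /* full class, r16-r31 included.  */
  return LEGACY_INDEX_REGS;     /* encodings without REX2/EVEX.  */
}
```

With such a hook, process_address_1 can ask for the per-insn class up
front and reload never hands an EGPR to an insn that cannot encode it.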


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 03/13] [APX_EGPR] Initial support for APX_F
  2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
  2023-08-31  8:20 ` [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class Hongyu Wang
  2023-08-31  8:20 ` [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument Hongyu Wang
@ 2023-08-31  8:20 ` Hongyu Wang
  2023-08-31  8:20 ` [PATCH 04/13] [APX EGPR] Add 16 new integer general purpose registers Hongyu Wang
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

Add the -mapx-features= enumeration to separate the subfeatures of
APX_F.  -mapxf is treated the same as the other ISA flags; it sets
-mapx-features=apx_all, which enables all subfeatures.
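[Editorial sketch, not part of the patch.]  The subfeature split is a
plain bitmask.  The enum below mirrors the apx_features enum this patch
adds to i386-opts.h; handle_mapxf is a hypothetical helper illustrating
how -mapxf maps onto apx_all (the real logic lives in
ix86_handle_option):

```c
#include <assert.h>

/* Mirrors the apx_features enum added to i386-opts.h.  */
enum apx_features {
  apx_none = 0,
  apx_egpr = 1 << 0,
  apx_push2pop2 = 1 << 1,
  apx_ndd = 1 << 2,
  apx_all = apx_egpr | apx_push2pop2 | apx_ndd
};

/* Hypothetical helper: -mapxf enables every subfeature,
   -mno-apxf clears them all.  */
static enum apx_features
handle_mapxf (int value)
{
  return value ? apx_all : apx_none;
}
```

Because the subfeatures are independent bits, later patches can test
individual capabilities (TARGET_APX_EGPR, TARGET_APX_NDD, ...) without
caring how the mask was set.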

gcc/ChangeLog:

	* common/config/i386/cpuinfo.h (XSTATE_APX_F): New macro.
	(XCR_APX_F_ENABLED_MASK): Likewise.
	(get_available_features): Detect APX_F.
	* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_APX_F_SET): New.
	(OPTION_MASK_ISA2_APX_F_UNSET): Likewise.
	(ix86_handle_option): Handle -mapxf.
	* common/config/i386/i386-cpuinfo.h (FEATURE_APX_F): New.
	* common/config/i386/i386-isas.h: Add entry for APX_F.
	* config/i386/cpuid.h (bit_APX_F): New.
	* config/i386/i386.h (TARGET_APX_EGPR, TARGET_APX_PUSH2POP2,
	TARGET_APX_NDD): New defines.
	* config/i386/i386-opts.h (enum apx_features): New enum.
	* config/i386/i386-isa.def (APX_F): New DEF_PTA.
	* config/i386/i386-options.cc (ix86_function_specific_save):
	Save ix86_apx_features.
	(ix86_function_specific_restore): Restore it.
	(ix86_valid_target_attribute_inner_p): Add mapxf.
	(ix86_option_override_internal): Set ix86_apx_features for PTA
	and TARGET_APX_F.  Also report an error when APX_F is set
	without TARGET_64BIT.
	* config/i386/i386.opt (-mapxf): New ISA flag option.
	(-mapx=): New enumeration option.
	(apx_features): New enum type.
	(apx_none): New enum value.
	(apx_egpr): Likewise.
	(apx_push2pop2): Likewise.
	(apx_ndd): Likewise.
	(apx_all): Likewise.
	* doc/invoke.texi: Document mapxf.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-1.c: New test.
---
 gcc/common/config/i386/cpuinfo.h      | 12 +++++++++++-
 gcc/common/config/i386/i386-common.cc | 17 +++++++++++++++++
 gcc/common/config/i386/i386-cpuinfo.h |  1 +
 gcc/common/config/i386/i386-isas.h    |  1 +
 gcc/config/i386/cpuid.h               |  1 +
 gcc/config/i386/i386-isa.def          |  1 +
 gcc/config/i386/i386-options.cc       | 15 +++++++++++++++
 gcc/config/i386/i386-opts.h           |  8 ++++++++
 gcc/config/i386/i386.h                |  4 ++++
 gcc/config/i386/i386.opt              | 25 +++++++++++++++++++++++++
 gcc/doc/invoke.texi                   | 11 +++++++----
 gcc/testsuite/gcc.target/i386/apx-1.c |  8 ++++++++
 12 files changed, 99 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-1.c

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 24ae0dbf0ac..141d3743316 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -678,6 +678,7 @@ get_available_features (struct __processor_model *cpu_model,
 #define XSTATE_HI_ZMM			0x80
 #define XSTATE_TILECFG			0x20000
 #define XSTATE_TILEDATA		0x40000
+#define XSTATE_APX_F			0x80000
 
 #define XCR_AVX_ENABLED_MASK \
   (XSTATE_SSE | XSTATE_YMM)
@@ -685,11 +686,13 @@ get_available_features (struct __processor_model *cpu_model,
   (XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM)
 #define XCR_AMX_ENABLED_MASK \
   (XSTATE_TILECFG | XSTATE_TILEDATA)
+#define XCR_APX_F_ENABLED_MASK XSTATE_APX_F
 
-  /* Check if AVX and AVX512 are usable.  */
+  /* Check if AVX, AVX512 and APX are usable.  */
   int avx_usable = 0;
   int avx512_usable = 0;
   int amx_usable = 0;
+  int apx_usable = 0;
   /* Check if KL is usable.  */
   int has_kl = 0;
   if ((ecx & bit_OSXSAVE))
@@ -709,6 +712,8 @@ get_available_features (struct __processor_model *cpu_model,
 	}
       amx_usable = ((xcrlow & XCR_AMX_ENABLED_MASK)
 		    == XCR_AMX_ENABLED_MASK);
+      apx_usable = ((xcrlow & XCR_APX_F_ENABLED_MASK)
+		    == XCR_APX_F_ENABLED_MASK);
     }
 
 #define set_feature(f) \
@@ -922,6 +927,11 @@ get_available_features (struct __processor_model *cpu_model,
 	      if (edx & bit_AMX_COMPLEX)
 		set_feature (FEATURE_AMX_COMPLEX);
 	    }
+	  if (apx_usable)
+	    {
+	      if (edx & bit_APX_F)
+		set_feature (FEATURE_APX_F);
+	    }
 	}
     }
 
diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 95468b7c405..86596e96ad1 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -123,6 +123,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SM3_SET OPTION_MASK_ISA2_SM3
 #define OPTION_MASK_ISA2_SHA512_SET OPTION_MASK_ISA2_SHA512
 #define OPTION_MASK_ISA2_SM4_SET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_APX_F_SET OPTION_MASK_ISA2_APX_F
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2.  */
@@ -309,6 +310,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SM3_UNSET OPTION_MASK_ISA2_SM3
 #define OPTION_MASK_ISA2_SHA512_UNSET OPTION_MASK_ISA2_SHA512
 #define OPTION_MASK_ISA2_SM4_UNSET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_APX_F_UNSET OPTION_MASK_ISA2_APX_F
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -1341,6 +1343,21 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mapxf:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_APX_F_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_APX_F_SET;
+	  opts->x_ix86_apx_features = apx_all;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_APX_F_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_APX_F_UNSET;
+	  opts->x_ix86_apx_features = apx_none;
+	}
+      return true;
+
     case OPT_mfma:
       if (value)
 	{
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index 9153b4d0a54..8bf592191ab 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -261,6 +261,7 @@ enum processor_features
   FEATURE_SM3,
   FEATURE_SHA512,
   FEATURE_SM4,
+  FEATURE_APX_F,
   CPU_FEATURE_MAX
 };
 
diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
index 2297903a45e..47e0cbd6f5b 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -191,4 +191,5 @@ ISA_NAMES_TABLE_START
   ISA_NAMES_TABLE_ENTRY("sm3", FEATURE_SM3, P_NONE, "-msm3")
   ISA_NAMES_TABLE_ENTRY("sha512", FEATURE_SHA512, P_NONE, "-msha512")
   ISA_NAMES_TABLE_ENTRY("sm4", FEATURE_SM4, P_NONE, "-msm4")
+  ISA_NAMES_TABLE_ENTRY("apxf", FEATURE_APX_F, P_NONE, "-mapxf")
 ISA_NAMES_TABLE_END
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index 73c15480350..f3d3a2a1c22 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -149,6 +149,7 @@
 #define bit_AVXNECONVERT	(1 << 5)
 #define bit_AVXVNNIINT16	(1 << 10)
 #define bit_PREFETCHI	(1 << 14)
+#define bit_APX_F	(1 << 21)
 
 /* Extended State Enumeration Sub-leaf (%eax == 0xd, %ecx == 1) */
 #define bit_XSAVEOPT	(1 << 0)
diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def
index aeafcf870ac..c581f343339 100644
--- a/gcc/config/i386/i386-isa.def
+++ b/gcc/config/i386/i386-isa.def
@@ -121,3 +121,4 @@ DEF_PTA(AVXVNNIINT16)
 DEF_PTA(SM3)
 DEF_PTA(SHA512)
 DEF_PTA(SM4)
+DEF_PTA(APX_F)
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index e47f9ed5d5f..8881462e3b0 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -694,6 +694,7 @@ ix86_function_specific_save (struct cl_target_option *ptr,
   ptr->branch_cost = ix86_branch_cost;
   ptr->tune_defaulted = ix86_tune_defaulted;
   ptr->arch_specified = ix86_arch_specified;
+  ptr->x_ix86_apx_features = opts->x_ix86_apx_features;
   ptr->x_ix86_isa_flags_explicit = opts->x_ix86_isa_flags_explicit;
   ptr->x_ix86_isa_flags2_explicit = opts->x_ix86_isa_flags2_explicit;
   ptr->x_recip_mask_explicit = opts->x_recip_mask_explicit;
@@ -832,6 +833,7 @@ ix86_function_specific_restore (struct gcc_options *opts,
   ix86_prefetch_sse = ptr->prefetch_sse;
   ix86_tune_defaulted = ptr->tune_defaulted;
   ix86_arch_specified = ptr->arch_specified;
+  opts->x_ix86_apx_features = ptr->x_ix86_apx_features;
   opts->x_ix86_isa_flags_explicit = ptr->x_ix86_isa_flags_explicit;
   opts->x_ix86_isa_flags2_explicit = ptr->x_ix86_isa_flags2_explicit;
   opts->x_recip_mask_explicit = ptr->x_recip_mask_explicit;
@@ -1109,6 +1111,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("sm3", OPT_msm3),
     IX86_ATTR_ISA ("sha512", OPT_msha512),
     IX86_ATTR_ISA ("sm4", OPT_msm4),
+    IX86_ATTR_ISA ("apxf", OPT_mapxf),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=",	OPT_mfpmath_),
@@ -2080,6 +2083,9 @@ ix86_option_override_internal (bool main_args_p,
       opts->x_ix86_stringop_alg = no_stringop;
     }
 
+  if (TARGET_APX_F && !TARGET_64BIT)
+    error ("%<-mapxf%> is not supported for 32-bit code");
+
   if (TARGET_UINTR && !TARGET_64BIT)
     error ("%<-muintr%> not supported for 32-bit code");
 
@@ -2293,6 +2299,11 @@ ix86_option_override_internal (bool main_args_p,
 	      SET_TARGET_POPCNT (opts);
 	  }
 
+	if (TARGET_64BIT_P (opts->x_ix86_isa_flags)
+	     && ((processor_alias_table[i].flags & PTA_APX_F) != 0)
+	     && !TARGET_EXPLICIT_APX_F_P (opts))
+	  opts->x_ix86_apx_features = apx_all;
+
 	if ((processor_alias_table[i].flags
 	   & (PTA_PREFETCH_SSE | PTA_SSE)) != 0)
 	  ix86_prefetch_sse = true;
@@ -2444,6 +2455,10 @@ ix86_option_override_internal (bool main_args_p,
   /* Arrange to set up i386_stack_locals for all functions.  */
   init_machine_status = ix86_init_machine_status;
 
+  /* Override APX flag here if ISA bit is set.  */
+  if (TARGET_APX_F && opts->x_ix86_apx_features != apx_all)
+    opts->x_ix86_apx_features = apx_all;
+
   /* Validate -mregparm= value.  */
   if (opts_set->x_ix86_regparm)
     {
diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index be359f3e3d5..2ec76a16bce 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -134,4 +134,12 @@ enum lam_type {
   lam_u57
 };
 
+enum apx_features {
+  apx_none = 0,
+  apx_egpr = 1 << 0,
+  apx_push2pop2 = 1 << 1,
+  apx_ndd = 1 << 2,
+  apx_all = apx_egpr | apx_push2pop2 | apx_ndd,
+};
+
 #endif
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3e8488f2ae8..8c7ed541a8f 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -51,6 +51,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #define TARGET_MMX_WITH_SSE	(TARGET_64BIT && TARGET_SSE2)
 
+#define TARGET_APX_EGPR (ix86_apx_features & apx_egpr)
+#define TARGET_APX_PUSH2POP2 (ix86_apx_features & apx_push2pop2)
+#define TARGET_APX_NDD (ix86_apx_features & apx_ndd)
+
 #include "config/vxworks-dummy.h"
 
 #include "config/i386/i386-opts.h"
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 78b499304a4..1ee4d90186e 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1310,3 +1310,28 @@ Enable vectorization for gather instruction.
 mscatter
 Target Alias(mtune-ctrl=, use_scatter, ^use_scatter)
 Enable vectorization for scatter instruction.
+
+mapxf
+Target Mask(ISA2_APX_F) Var(ix86_isa_flags2) Save
+Support APX code generation.
+
+mapx=
+Target Joined Enum(apx_features) EnumSet Var(ix86_apx_features) Init(apx_none) Save
+
+Enum
+Name(apx_features) Type(int)
+
+EnumValue
+Enum(apx_features) String(none) Value(apx_none) Set(1)
+
+EnumValue
+Enum(apx_features) String(egpr) Value(apx_egpr) Set(2)
+
+EnumValue
+Enum(apx_features) String(push2pop2) Value(apx_push2pop2) Set(3)
+
+EnumValue
+Enum(apx_features) String(ndd) Value(apx_ndd) Set(4)
+
+EnumValue
+Enum(apx_features) String(all) Value(apx_all) Set(1)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 16aa92b5e86..48d7ccc3be8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1438,7 +1438,7 @@ See RS/6000 and PowerPC Options.
 -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk
 -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni
 -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd -mamx-fp16
--mprefetchi -mraoint -mamx-complex -mavxvnniint16 -msm3 -msha512 -msm4
+-mprefetchi -mraoint -mamx-complex -mavxvnniint16 -msm3 -msha512 -msm4 -mapxf
 -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops
 -minline-stringops-dynamically  -mstringop-strategy=@var{alg}
 -mkl -mwidekl
@@ -33688,6 +33688,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @need 200
 @opindex msm4
 @itemx -msm4
+@need 200
+@opindex mapxf
+@itemx -mapxf
 These switches enable the use of instructions in the MMX, SSE,
 AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA,
 AES, PCLMUL, CLFLUSHOPT, CLWB, FSGSBASE, PTWRITE, RDRND, F16C, FMA, PCONFIG,
@@ -33698,9 +33701,9 @@ GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16,
 ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE,
 UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512-FP16,
 AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, AMX-FP16, PREFETCHI, RAOINT,
-AMX-COMPLEX, AVXVNNIINT16, SM3, SHA512, SM4 or CLDEMOTE extended instruction
-sets. Each has a corresponding @option{-mno-} option to disable use of these
-instructions.
+AMX-COMPLEX, AVXVNNIINT16, SM3, SHA512, SM4, APX_F or CLDEMOTE extended
+instruction sets. Each has a corresponding @option{-mno-} option to disable
+use of these instructions.
 
 These extensions are also available as built-in functions: see
 @ref{x86 Built-in Functions}, for details of the functions enabled and
diff --git a/gcc/testsuite/gcc.target/i386/apx-1.c b/gcc/testsuite/gcc.target/i386/apx-1.c
new file mode 100644
index 00000000000..956229ab6e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mapxf" } */
+/* { dg-error "'-mapxf' is not supported for 32-bit code" "" { target ia32 } 0 } */
+
+void
+apx_handler ()
+{
+}
-- 
2.31.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 04/13] [APX EGPR] Add 16 new integer general purpose registers
  2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
                   ` (2 preceding siblings ...)
  2023-08-31  8:20 ` [PATCH 03/13] [APX_EGPR] Initial support for APX_F Hongyu Wang
@ 2023-08-31  8:20 ` Hongyu Wang
  2023-08-31  8:20 ` [PATCH 05/13] [APX EGPR] Add register and memory constraints that disallow EGPR Hongyu Wang
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

Extend GENERAL_REGS with the 16 new r16-r31 registers.  They are
handled like the REX registers and named REX2 registers; they are
only enabled under TARGET_APX_EGPR.
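[Editorial sketch, not part of the patch.]  To keep the numbering
straight: per the i386.md change in this patch, r16-r31 become hard
registers 76..91 and FIRST_PSEUDO_REG moves to 92.  A plain-C stand-in
for the REX2_INT_REGNO_P range check added to i386.h:

```c
#include <assert.h>
#include <stdbool.h>

/* Hard register numbers from the i386.md change in this patch.  */
#define R16_REG 76
#define R31_REG 91

/* Stand-in for the REX2_INT_REGNO_P macro: true iff REGNO is one of
   the new APX extended GPRs r16-r31.  */
static bool
rex2_int_regno_p (int regno)
{
  return regno >= R16_REG && regno <= R31_REG;
}
```

GENERAL_REGNO_P and INDEX_REGNO_P then simply OR this predicate into
their existing legacy/REX range checks.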

gcc/ChangeLog:

	* config/i386/i386-protos.h (x86_extended_rex2reg_mentioned_p):
	New function prototype.
	* config/i386/i386.cc (regclass_map): Add mapping for 16 new
	general registers.
	(debugger64_register_map): Likewise.
	(ix86_conditional_register_usage): Clear REX2 register when APX
	disabled.
	(ix86_code_end): Add handling for REX2 reg.
	(print_reg): Likewise.
	(ix86_output_jmp_thunk_or_indirect): Likewise.
	(ix86_output_indirect_branch_via_reg): Likewise.
	(ix86_attr_length_vex_default): Likewise.
	(ix86_emit_save_regs): Adjust to allow saving r31.
	(ix86_register_priority): Set REX2 reg priority same as REX.
	(x86_extended_reg_mentioned_p): Add check for REX2 regs.
	(x86_extended_rex2reg_mentioned_p): New function.
	* config/i386/i386.h (CALL_USED_REGISTERS): Add new extended
	registers.
	(REG_ALLOC_ORDER): Likewise.
	(FIRST_REX2_INT_REG): Define.
	(LAST_REX2_INT_REG): Ditto.
	(GENERAL_REGS): Add 16 new registers.
	(INT_SSE_REGS): Likewise.
	(FLOAT_INT_REGS): Likewise.
	(FLOAT_INT_SSE_REGS): Likewise.
	(INT_MASK_REGS): Likewise.
	(ALL_REGS): Likewise.
	(REX2_INT_REG_P): Define.
	(REX2_INT_REGNO_P): Ditto.
	(GENERAL_REGNO_P): Add REX2_INT_REGNO_P.
	(REGNO_OK_FOR_INDEX_P): Ditto.
	(REG_OK_FOR_INDEX_NONSTRICT_P): Add new extended registers.
	* config/i386/i386.md: Add 16 new integer general
	registers.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-egprs-names.c: New test.
	* gcc.target/i386/apx-spill_to_egprs-1.c: Likewise.
	* gcc.target/i386/apx-interrupt-1.c: Likewise.
---
 gcc/config/i386/i386-protos.h                 |   1 +
 gcc/config/i386/i386.cc                       |  67 ++++++++++--
 gcc/config/i386/i386.h                        |  47 +++++---
 gcc/config/i386/i386.md                       |  18 +++-
 .../gcc.target/i386/apx-egprs-names.c         |  17 +++
 .../gcc.target/i386/apx-interrupt-1.c         | 102 ++++++++++++++++++
 .../gcc.target/i386/apx-spill_to_egprs-1.c    |  25 +++++
 7 files changed, 253 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-egprs-names.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 9ffb125fc2b..bd4782800c4 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -64,6 +64,7 @@ extern bool symbolic_reference_mentioned_p (rtx);
 extern bool extended_reg_mentioned_p (rtx);
 extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
 extern bool x86_extended_reg_mentioned_p (rtx);
+extern bool x86_extended_rex2reg_mentioned_p (rtx);
 extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
 extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 1bc3f11ff07..d26d9ab0d9d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -169,7 +169,12 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   ALL_SSE_REGS, ALL_SSE_REGS, ALL_SSE_REGS, ALL_SSE_REGS,
   /* Mask registers.  */
   ALL_MASK_REGS, MASK_REGS, MASK_REGS, MASK_REGS,
-  MASK_REGS, MASK_REGS, MASK_REGS, MASK_REGS
+  MASK_REGS, MASK_REGS, MASK_REGS, MASK_REGS,
+  /* REX2 registers */
+  GENERAL_REGS, GENERAL_REGS, GENERAL_REGS, GENERAL_REGS,
+  GENERAL_REGS, GENERAL_REGS, GENERAL_REGS, GENERAL_REGS,
+  GENERAL_REGS, GENERAL_REGS, GENERAL_REGS, GENERAL_REGS,
+  GENERAL_REGS, GENERAL_REGS, GENERAL_REGS, GENERAL_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -227,7 +232,10 @@ int const debugger64_register_map[FIRST_PSEUDO_REGISTER] =
   /* AVX-512 registers 24-31 */
   75, 76, 77, 78, 79, 80, 81, 82,
   /* Mask registers */
-  118, 119, 120, 121, 122, 123, 124, 125
+  118, 119, 120, 121, 122, 123, 124, 125,
+  /* REX2 extended integer registers.  */
+  130, 131, 132, 133, 134, 135, 136, 137,
+  138, 139, 140, 141, 142, 143, 144, 145
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -521,6 +529,13 @@ ix86_conditional_register_usage (void)
 
       accessible_reg_set &= ~reg_class_contents[ALL_MASK_REGS];
     }
+
+  /* If APX_EGPR is disabled, make the REX2 registers inaccessible.  */
+  if (! (TARGET_APX_EGPR && TARGET_64BIT))
+    {
+      for (i = FIRST_REX2_INT_REG; i <= LAST_REX2_INT_REG; i++)
+	CLEAR_HARD_REG_BIT (accessible_reg_set, i);
+    }
 }
 
 /* Canonicalize a comparison from one we don't have to one we do have.  */
@@ -6179,6 +6194,13 @@ ix86_code_end (void)
 					regno, false);
     }
 
+  for (regno = FIRST_REX2_INT_REG; regno <= LAST_REX2_INT_REG; regno++)
+    {
+      if (TEST_HARD_REG_BIT (indirect_thunks_used, regno))
+	output_indirect_thunk_function (indirect_thunk_prefix_none,
+					regno, false);
+    }
+
   for (regno = FIRST_INT_REG; regno <= LAST_INT_REG; regno++)
     {
       char name[32];
@@ -7190,10 +7212,10 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align,
 static void
 ix86_emit_save_regs (void)
 {
-  unsigned int regno;
+  int regno;
   rtx_insn *insn;
 
-  for (regno = FIRST_PSEUDO_REGISTER - 1; regno-- > 0; )
+  for (regno = FIRST_PSEUDO_REGISTER - 1; regno >= 0; regno--)
     if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
       {
 	insn = emit_insn (gen_push (gen_rtx_REG (word_mode, regno)));
@@ -13037,7 +13059,7 @@ print_reg (rtx x, int code, FILE *file)
 
   /* Irritatingly, AMD extended registers use
      different naming convention: "r%d[bwd]"  */
-  if (REX_INT_REGNO_P (regno))
+  if (REX_INT_REGNO_P (regno) || REX2_INT_REGNO_P (regno))
     {
       gcc_assert (TARGET_64BIT);
       switch (msize)
@@ -16251,7 +16273,7 @@ ix86_output_jmp_thunk_or_indirect (const char *thunk_name, const int regno)
 {
   if (thunk_name != NULL)
     {
-      if (REX_INT_REGNO_P (regno)
+      if ((REX_INT_REGNO_P (regno) || REX2_INT_REGNO_P (regno))
 	  && ix86_indirect_branch_cs_prefix)
 	fprintf (asm_out_file, "\tcs\n");
       fprintf (asm_out_file, "\tjmp\t");
@@ -16303,7 +16325,7 @@ ix86_output_indirect_branch_via_reg (rtx call_op, bool sibcall_p)
     {
       if (thunk_name != NULL)
 	{
-	  if (REX_INT_REGNO_P (regno)
+	  if ((REX_INT_REGNO_P (regno) || REX2_INT_REGNO_P (regno))
 	      && ix86_indirect_branch_cs_prefix)
 	    fprintf (asm_out_file, "\tcs\n");
 	  fprintf (asm_out_file, "\tcall\t");
@@ -17060,19 +17082,26 @@ ix86_attr_length_vex_default (rtx_insn *insn, bool has_0f_opcode,
   for (i = recog_data.n_operands - 1; i >= 0; --i)
     if (REG_P (recog_data.operand[i]))
       {
-	/* REX.W bit uses 3 byte VEX prefix.  */
+	/* REX.W bit uses 3 byte VEX prefix.
+	   With REX2 registers, the extended EVEX prefix is used,
+	   which is 4 bytes.  */
 	if (GET_MODE (recog_data.operand[i]) == DImode
 	    && GENERAL_REG_P (recog_data.operand[i]))
 	  return 3 + 1;
 
 	/* REX.B bit requires 3-byte VEX. Right here we don't know which
-	   operand will be encoded using VEX.B, so be conservative.  */
+	   operand will be encoded using VEX.B, so be conservative.
+	   With REX2 registers, the extended EVEX prefix is used,
+	   which is 4 bytes.  */
 	if (REX_INT_REGNO_P (recog_data.operand[i])
+	    || REX2_INT_REGNO_P (recog_data.operand[i])
 	    || REX_SSE_REGNO_P (recog_data.operand[i]))
 	  reg_only = 3 + 1;
       }
     else if (MEM_P (recog_data.operand[i]))
       {
+	/* REX2.X or REX2.B bits require the 4-byte extended EVEX prefix.  */
+	if (x86_extended_rex2reg_mentioned_p (recog_data.operand[i]))
+	  return 4;
+
 	/* REX.X or REX.B bits use 3 byte VEX prefix.  */
 	if (x86_extended_reg_mentioned_p (recog_data.operand[i]))
 	  return 3 + 1;
@@ -19509,6 +19538,8 @@ ix86_register_priority (int hard_regno)
   /* New x86-64 int registers result in bigger code size.  Discourage them.  */
   if (REX_INT_REGNO_P (hard_regno))
     return 2;
+  if (REX2_INT_REGNO_P (hard_regno))
+    return 2;
   /* New x86-64 SSE registers result in bigger code size.  Discourage them.  */
   if (REX_SSE_REGNO_P (hard_regno))
     return 2;
@@ -22755,7 +22786,23 @@ x86_extended_reg_mentioned_p (rtx insn)
     {
       const_rtx x = *iter;
       if (REG_P (x)
-	  && (REX_INT_REGNO_P (REGNO (x)) || REX_SSE_REGNO_P (REGNO (x))))
+	  && (REX_INT_REGNO_P (REGNO (x)) || REX_SSE_REGNO_P (REGNO (x))
+	      || REX2_INT_REGNO_P (REGNO (x))))
+	return true;
+    }
+  return false;
+}
+
+/* Return true when INSN mentions a register that must be encoded using
+   the REX2 prefix.  */
+bool
+x86_extended_rex2reg_mentioned_p (rtx insn)
+{
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, INSN_P (insn) ? PATTERN (insn) : insn, NONCONST)
+    {
+      const_rtx x = *iter;
+      if (REG_P (x) && REX2_INT_REGNO_P (REGNO (x)))
 	return true;
     }
   return false;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 8c7ed541a8f..1ab291177f5 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -948,7 +948,11 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
      0,   0,    0,    0,    0,    0,    0,    0,		\
 /*  k0,  k1, k2, k3, k4, k5, k6, k7*/				\
-     0,  0,   0,  0,  0,  0,  0,  0 }
+     0,  0,   0,  0,  0,  0,  0,  0,				\
+/*  r16,  r17, r18, r19, r20, r21, r22, r23*/			\
+     0,   0,   0,   0,   0,   0,   0,   0,			\
+/*  r24,  r25, r26, r27, r28, r29, r30, r31*/			\
+     0,   0,   0,   0,   0,   0,   0,   0}			\
 
 /* 1 for registers not available across function calls.
    These must include the FIXED_REGISTERS and also any
@@ -985,7 +989,11 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/		\
      1,    1,     1,    1,    1,    1,    1,    1,		\
  /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/			\
-     1,   1,   1,   1,   1,   1,   1,   1 }
+     1,   1,   1,   1,   1,   1,   1,   1,			\
+/*  r16,  r17, r18, r19, r20, r21, r22, r23*/			\
+     1,   1,   1,   1,   1,   1,   1,   1,			\
+/*  r24,  r25, r26, r27, r28, r29, r30, r31*/			\
+     1,   1,   1,   1,   1,   1,   1,   1}			\
 
 /* Order in which to allocate registers.  Each register must be
    listed once, even those in FIXED_REGISTERS.  List frame pointer
@@ -1001,7 +1009,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
   16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,	\
   32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,	\
   48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,	\
-  64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75 }
+  64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,	\
+  80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91}
 
 /* ADJUST_REG_ALLOC_ORDER is a macro which permits reg_alloc_order
    to be rearranged based on a particular function.  When using sse math,
@@ -1203,6 +1212,9 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #define FIRST_MASK_REG  MASK0_REG
 #define LAST_MASK_REG   MASK7_REG
 
+#define FIRST_REX2_INT_REG  R16_REG
+#define LAST_REX2_INT_REG   R31_REG
+
 /* Override this in other tm.h files to cope with various OS lossage
    requiring a frame pointer.  */
 #ifndef SUBTARGET_FRAME_POINTER_REQUIRED
@@ -1280,7 +1292,9 @@ enum reg_class
   INDEX_REGS,			/* %eax %ebx %ecx %edx %esi %edi %ebp */
   LEGACY_REGS,			/* %eax %ebx %ecx %edx %esi %edi %ebp %esp */
   GENERAL_REGS,			/* %eax %ebx %ecx %edx %esi %edi %ebp %esp
-				   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15 */
+				   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15
+				   %r16 %r17 %r18 %r19 %r20 %r21 %r22 %r23
+				   %r24 %r25 %r26 %r27 %r28 %r29 %r30 %r31 */
   FP_TOP_REG, FP_SECOND_REG,	/* %st(0) %st(1) */
   FLOAT_REGS,
   SSE_FIRST_REG,
@@ -1380,7 +1394,7 @@ enum reg_class
       { 0x7e,      0xff0,   0x0 },	/* TLS_GOTBASE_REGS */		\
       { 0x7f,      0xff0,   0x0 },	/* INDEX_REGS */		\
    { 0x900ff,        0x0,   0x0 },	/* LEGACY_REGS */		\
-   { 0x900ff,      0xff0,   0x0 },	/* GENERAL_REGS */		\
+   { 0x900ff,      0xff0,   0xffff000 },	/* GENERAL_REGS */		\
      { 0x100,        0x0,   0x0 },	/* FP_TOP_REG */		\
      { 0x200,        0x0,   0x0 },	/* FP_SECOND_REG */		\
     { 0xff00,        0x0,   0x0 },	/* FLOAT_REGS */		\
@@ -1390,13 +1404,13 @@ enum reg_class
  { 0xff00000, 0xfffff000,   0xf },	/* ALL_SSE_REGS */		\
 { 0xf0000000,        0xf,   0x0 },	/* MMX_REGS */			\
  { 0xff0ff00, 0xfffff000,   0xf },	/* FLOAT_SSE_REGS */		\
- {   0x9ffff,      0xff0,   0x0 },	/* FLOAT_INT_REGS */		\
- { 0xff900ff, 0xfffffff0,   0xf },	/* INT_SSE_REGS */		\
- { 0xff9ffff, 0xfffffff0,   0xf },	/* FLOAT_INT_SSE_REGS */	\
+ {   0x9ffff,      0xff0,   0xffff000 },	/* FLOAT_INT_REGS */		\
+ { 0xff900ff, 0xfffffff0,   0xffff00f },	/* INT_SSE_REGS */		\
+ { 0xff9ffff, 0xfffffff0,   0xffff00f },	/* FLOAT_INT_SSE_REGS */	\
        { 0x0,        0x0, 0xfe0 },	/* MASK_REGS */			\
        { 0x0,        0x0, 0xff0 },	/* ALL_MASK_REGS */		\
-   { 0x900ff,      0xff0, 0xff0 },	/* INT_MASK_REGS */	\
-{ 0xffffffff, 0xffffffff, 0xfff }	/* ALL_REGS  */			\
+   { 0x900ff,      0xff0, 0xffffff0 },	/* INT_MASK_REGS */	\
+{ 0xffffffff, 0xffffffff, 0xfffffff }	/* ALL_REGS  */			\
 }
 
 /* The same information, inverted:
@@ -1426,13 +1440,17 @@ enum reg_class
 #define REX_INT_REGNO_P(N) \
   IN_RANGE ((N), FIRST_REX_INT_REG, LAST_REX_INT_REG)
 
+#define REX2_INT_REG_P(X) (REG_P (X) && REX2_INT_REGNO_P (REGNO (X)))
+#define REX2_INT_REGNO_P(N) \
+  IN_RANGE ((N), FIRST_REX2_INT_REG, LAST_REX2_INT_REG)
+
 #define GENERAL_REG_P(X) (REG_P (X) && GENERAL_REGNO_P (REGNO (X)))
 #define GENERAL_REGNO_P(N) \
-  (LEGACY_INT_REGNO_P (N) || REX_INT_REGNO_P (N))
+  (LEGACY_INT_REGNO_P (N) || REX_INT_REGNO_P (N) || REX2_INT_REGNO_P (N))
 
 #define INDEX_REG_P(X) (REG_P (X) && INDEX_REGNO_P (REGNO (X)))
 #define INDEX_REGNO_P(N) \
-  (LEGACY_INDEX_REGNO_P (N) || REX_INT_REGNO_P (N))
+  (LEGACY_INDEX_REGNO_P (N) || REX_INT_REGNO_P (N) || REX2_INT_REGNO_P (N))
 
 #define ANY_QI_REG_P(X) (REG_P (X) && ANY_QI_REGNO_P (REGNO (X)))
 #define ANY_QI_REGNO_P(N) \
@@ -1698,6 +1716,7 @@ typedef struct ix86_args {
    has been allocated, which happens in reginfo.cc during register
    allocation.  */
 
+
 #define REGNO_OK_FOR_INDEX_P(REGNO)					\
   (INDEX_REGNO_P (REGNO)						\
    || INDEX_REGNO_P (reg_renumber[(REGNO)]))
@@ -1990,7 +2009,9 @@ do {							\
  "xmm20", "xmm21", "xmm22", "xmm23",					\
  "xmm24", "xmm25", "xmm26", "xmm27",					\
  "xmm28", "xmm29", "xmm30", "xmm31",					\
- "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7" }
+ "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7",			\
+ "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23",		\
+ "r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31" }
 
 #define REGISTER_NAMES HI_REGISTER_NAMES
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index eef8a0e01eb..e3270658cb7 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -464,7 +464,23 @@ (define_constants
    (MASK5_REG			73)
    (MASK6_REG			74)
    (MASK7_REG			75)
-   (FIRST_PSEUDO_REG		76)
+   (R16_REG			76)
+   (R17_REG			77)
+   (R18_REG			78)
+   (R19_REG			79)
+   (R20_REG			80)
+   (R21_REG			81)
+   (R22_REG			82)
+   (R23_REG			83)
+   (R24_REG			84)
+   (R25_REG			85)
+   (R26_REG			86)
+   (R27_REG			87)
+   (R28_REG			88)
+   (R29_REG			89)
+   (R30_REG			90)
+   (R31_REG			91)
+   (FIRST_PSEUDO_REG		92)
   ])
 
 ;; Insn callee abi index.
diff --git a/gcc/testsuite/gcc.target/i386/apx-egprs-names.c b/gcc/testsuite/gcc.target/i386/apx-egprs-names.c
new file mode 100644
index 00000000000..445bcf2c250
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-egprs-names.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mapxf -m64" } */
+/* { dg-final { scan-assembler "r31" } } */
+/* { dg-final { scan-assembler "r30" } } */
+/* { dg-final { scan-assembler "r29" } } */
+/* { dg-final { scan-assembler "r28" } } */
+void foo ()
+{
+  register long a __asm ("r31");
+  register int b __asm ("r30");
+  register short c __asm ("r29");
+  register char d __asm ("r28");
+  __asm__ __volatile__ ("mov %0, %%rax" : : "r" (a) : "rax");
+  __asm__ __volatile__ ("mov %0, %%eax" : : "r" (b) : "eax");
+  __asm__ __volatile__ ("mov %0, %%eax" : : "r" (c) : "eax");
+  __asm__ __volatile__ ("mov %0, %%eax" : : "r" (d) : "eax");
+}
diff --git a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
new file mode 100644
index 00000000000..441dbf04bf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
@@ -0,0 +1,102 @@
+/* { dg-do compile } */
+/* { dg-options "-mapxf -m64 -O2 -mgeneral-regs-only -mno-cld -mno-push-args -maccumulate-outgoing-args" } */
+
+extern void foo (void *) __attribute__ ((interrupt));
+extern int bar (int);
+
+void foo (void *frame)
+{
+  int a,b,c,d,e,f,i;
+  a = bar (5);
+  b = bar (a);
+  c = bar (b);
+  d = bar (c);
+  e = bar (d);
+  f = bar (e);
+  for (i = 1; i < 10; i++)
+  {
+    a += bar (a + i) + bar (b + i) +
+	 bar (c + i) + bar (d + i) +
+	 bar (e + i) + bar (f + i);
+  }
+}
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)ax" 1 } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)cx" 1 } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)dx" 1 } } */
+/* { dg-final { scan-assembler-times "push(?:l|q)\[\\t \]*%(?:e|r)si" 1 } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%rdi" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r8" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r9" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r10" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r11" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r15" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r16" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r17" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r18" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r19" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r20" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r21" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r22" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r23" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r24" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r25" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r26" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r27" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r28" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r29" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r30" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "pushq\[\\t \]*%r31" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 145, -16} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 144, -24} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 143, -32} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 142, -40} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 141, -48} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 140, -56} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 139, -64} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 138, -72} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 137, -80} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 136, -88} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 135, -96} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 134, -104} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 133, -112} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 132, -120} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 131, -128} 1 } } */
+/* { dg-final { scan-assembler-times {\t\.cfi_offset 130, -136} 1 } } */
+/* { dg-final { scan-assembler-times ".cfi_restore" 15} } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)ax" 1 } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)cx" 1 } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)dx" 1 } } */
+/* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)si" 1 } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%rdi" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r8" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r9" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r10" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r11" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r15" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r16" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r17" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r18" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r19" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r20" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r21" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r22" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r23" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r24" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r25" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r26" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r27" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r28" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r29" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r30" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "popq\[\\t \]*%r31" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "iret" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "iretq" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "\tcld" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c b/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
new file mode 100644
index 00000000000..290863d63a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile  { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512 -mapxf -DDTYPE32" } */
+
+#include "spill_to_mask-1.c"
+
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r16d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r17d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r18d" } } */
+/* { dg-final { scan-assembler "movq\[ \t]+\[^\\n\\r\]*, %r19" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r20d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r21d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r22d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r23d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r24d" } } */
+/* { dg-final { scan-assembler "addl\[ \t]+\[^\\n\\r\]*, %r25d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r26d" } } */
+/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r27d" } } */
+/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r28d" } } */
+/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r29d" } } */
+/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r30d" } } */
+/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r31d" } } */
+/* { dg-final { scan-assembler-not "knot" } } */
+/* { dg-final { scan-assembler-not "kxor" } } */
+/* { dg-final { scan-assembler-not "kor" } } */
+/* { dg-final { scan-assembler-not "kandn" } } */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 05/13] [APX EGPR] Add register and memory constraints that disallow EGPR
  2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
                   ` (3 preceding siblings ...)
  2023-08-31  8:20 ` [PATCH 04/13] [APX EGPR] Add 16 new integer general purpose registers Hongyu Wang
@ 2023-08-31  8:20 ` Hongyu Wang
  2023-08-31  8:20 ` [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint Hongyu Wang
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

For APX, since we have extended GENERAL_REG_CLASS, new constraints are
needed to restrict insns that cannot use EGPRs in either their register
or memory operands.

gcc/ChangeLog:

	* config/i386/constraints.md (h): New register constraint
	for GENERAL_GPR16.
	(Bt): New non-EGPR memory constraint.
	(BT): Likewise for Bm constraint.
	* config/i386/i386.h (enum reg_class): Add new reg class
	GENERAL_GPR16.
---
 gcc/config/i386/constraints.md | 19 ++++++++++++++++++-
 gcc/config/i386/i386.h         |  4 ++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index fd490f39110..f487bf2e5a3 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;           H
-;;;           h j               z
+;;;           j               z
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -165,6 +165,8 @@ (define_register_constraint "YW"
 ;;  k  TLS address that allows insn using non-integer registers
 ;;  n  Memory operand without REX prefix
 ;;  r  Broadcast memory operand
+;;  t  Memory operand without EGPR
+;;  T  Vector memory operand without EGPR
 ;;  s  Sibcall memory operand, not valid for TARGET_X32
 ;;  w  Call memory operand, not valid for TARGET_X32
 ;;  z  Constant call address operand.
@@ -201,6 +203,18 @@ (define_special_memory_constraint "Bn"
   "@internal Memory operand without REX prefix."
   (match_operand 0 "norex_memory_operand"))
 
+(define_memory_constraint "Bt"
+  "@internal Memory operand without GPR32."
+  (and (match_operand 0 "memory_operand")
+       (not (and (match_test "TARGET_APX_EGPR")
+		 (match_test "x86_extended_rex2reg_mentioned_p (op)")))))
+
+(define_special_memory_constraint "BT"
+  "@internal vector memory operand without GPR32."
+  (and (match_operand 0 "vector_memory_operand")
+       (not (and (match_test "TARGET_APX_EGPR")
+		 (match_test "x86_extended_rex2reg_mentioned_p (op)")))))
+
 (define_special_memory_constraint "Br"
   "@internal bcst memory operand."
   (match_operand 0 "bcst_mem_operand"))
@@ -371,3 +385,6 @@ (define_address_constraint "Tv"
 (define_address_constraint "Ts"
   "Address operand without segment register"
   (match_operand 0 "address_no_seg_operand"))
+
+(define_register_constraint  "h"
+ "TARGET_APX_EGPR ? GENERAL_GPR16 : GENERAL_REGS")
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 1ab291177f5..7ec3086641c 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1295,6 +1295,8 @@ enum reg_class
 				   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15
 				   %r16 %r17 %r18 %r19 %r20 %r21 %r22 %r23
 				   %r24 %r25 %r26 %r27 %r28 %r29 %r30 %r31 */
+  GENERAL_GPR16,		/* %eax %ebx %ecx %edx %esi %edi %ebp %esp
+				   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15 */
   FP_TOP_REG, FP_SECOND_REG,	/* %st(0) %st(1) */
   FLOAT_REGS,
   SSE_FIRST_REG,
@@ -1357,6 +1359,7 @@ enum reg_class
    "INDEX_REGS",			\
    "LEGACY_REGS",			\
    "GENERAL_REGS",			\
+   "GENERAL_GPR16",			\
    "FP_TOP_REG", "FP_SECOND_REG",	\
    "FLOAT_REGS",			\
    "SSE_FIRST_REG",			\
@@ -1395,6 +1398,7 @@ enum reg_class
       { 0x7f,      0xff0,   0x0 },	/* INDEX_REGS */		\
    { 0x900ff,        0x0,   0x0 },	/* LEGACY_REGS */		\
    { 0x900ff,      0xff0,   0xffff000 },	/* GENERAL_REGS */		\
+   { 0x900ff,      0xff0,   0x0 },	/* GENERAL_GPR16 */		\
      { 0x100,        0x0,   0x0 },	/* FP_TOP_REG */		\
      { 0x200,        0x0,   0x0 },	/* FP_SECOND_REG */		\
     { 0xff00,        0x0,   0x0 },	/* FLOAT_REGS */		\
-- 
2.31.1



* [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.
  2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
                   ` (4 preceding siblings ...)
  2023-08-31  8:20 ` [PATCH 05/13] [APX EGPR] Add register and memory constraints that disallow EGPR Hongyu Wang
@ 2023-08-31  8:20 ` Hongyu Wang
  2023-08-31  9:17   ` Jakub Jelinek
  2023-08-31  8:20 ` [PATCH 07/13] [APX EGPR] Add backend hook for base_reg_class/index_reg_class Hongyu Wang
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

In inline asm, we do not know whether the insn can use EGPRs, so disable
EGPR usage by default by mapping the common reg/mem constraints to
non-EGPR constraints.  Use the flag -mapx-inline-asm-use-gpr32 to enable
EGPR usage for inline asm.

gcc/ChangeLog:

	* config/i386/i386.cc (INCLUDE_STRING): Add include for
	ix86_md_asm_adjust.
	(ix86_md_asm_adjust): When APX EGPR enabled without specifying the
	target option, map reg/mem constraints to non-EGPR constraints.
	* config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-inline-gpr-norex2.c: New test.
---
 gcc/config/i386/i386.cc                       |  44 +++++++
 gcc/config/i386/i386.opt                      |   5 +
 .../gcc.target/i386/apx-inline-gpr-norex2.c   | 107 ++++++++++++++++++
 3 files changed, 156 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index d26d9ab0d9d..9460ebbfda4 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+#define INCLUDE_STRING
 #define IN_TARGET_CODE 1
 
 #include "config.h"
@@ -23077,6 +23078,49 @@ ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
   bool saw_asm_flag = false;
 
   start_sequence ();
+  /* TODO: For now we just map the general r/m constraints to non-EGPR
+     constraints; eventually all usable constraints will be mapped.  */
+  if (TARGET_APX_EGPR && !ix86_apx_inline_asm_use_gpr32)
+    {
+      /* Map "r" constraint in inline asm to "h" that disallows r16-r31
+	 and replace only r, exclude Br and Yr.  */
+      for (unsigned i = 0; i < constraints.length (); i++)
+	{
+	  std::string *s = new std::string (constraints[i]);
+	  size_t pos = s->find ('r');
+	  while (pos != std::string::npos)
+	    {
+	      if (pos > 0
+		  && (s->at (pos - 1) == 'Y' || s->at (pos - 1) == 'B'))
+		pos = s->find ('r', pos + 1);
+	      else
+		{
+		  s->replace (pos, 1, "h");
+		  constraints[i] = (const char*) s->c_str ();
+		  break;
+		}
+	    }
+	}
+      /* Also map "m/memory/Bm" constraint that may use GPR32, replace them with
+	 "Bt/Bt/BT".  */
+      for (unsigned i = 0; i < constraints.length (); i++)
+	{
+	  std::string *s = new std::string (constraints[i]);
+	  size_t pos = s->find ("m");
+	  size_t pos2 = s->find ("memory");
+	  if (pos != std::string::npos)
+	    {
+	      if (pos > 0 && (s->at (pos - 1) == 'B'))
+		  s->replace (pos - 1, 2, "BT");
+	      else if (pos2 != std::string::npos)
+		  s->replace (pos, 6, "Bt");
+	      else
+		  s->replace (pos, 1, "Bt");
+	      constraints[i] = (const char*) s->c_str ();
+	    }
+	}
+     }
+
   for (unsigned i = 0, n = outputs.length (); i < n; ++i)
     {
       const char *con = constraints[i];
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 1ee4d90186e..5c8d3a207e3 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1335,3 +1335,8 @@ Enum(apx_features) String(ndd) Value(apx_ndd) Set(4)
 
 EnumValue
 Enum(apx_features) String(all) Value(apx_all) Set(1)
+
+mapx-inline-asm-use-gpr32
+Target Var(ix86_apx_inline_asm_use_gpr32) Init(0)
+Enable GPR32 in inline asm when APX_EGPR is enabled; do not map
+reg or mem constraints in inline asm to GPR16.
diff --git a/gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c b/gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
new file mode 100644
index 00000000000..21534450045
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
@@ -0,0 +1,107 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mapxf -m64 -march=skylake-avx512 -DDTYPE32" } */
+
+typedef unsigned int u32;
+typedef unsigned long long u64;
+
+#ifdef DTYPE32
+typedef u32 DTYPE;
+#define byteswap byteswapu32
+#endif
+
+#define R(x,n) ( (x >> n) | (x << (32 - n)))
+
+#define S0(x) (R(x, 2) ^ R(x,13) ^ R(x,22))
+#define S1(x) (R(x, 6) ^ R(x,11) ^ R(x,25))
+
+#define TT(a,b,c,d,e,f,g,h,x,K)                 \
+{                                                        \
+    tmp1 = h + S1(e) + (g ^ (e & (f ^ g))) + K + x;                \
+    tmp2 = S0(a) + ((a & b) | (c & (a | b)));                           \
+    h  = tmp1 + tmp2;                                    \
+    d += tmp1;                                           \
+}
+
+static inline u32 byteswapu32(u32 x)
+{
+  x = (x & 0x0000FFFF) << 16 | (x & 0xFFFF0000) >> 16;
+  x = (x & 0x00FF00FF) << 8 | (x & 0xFF00FF00) >> 8;  
+  return x;
+}
+
+void foo (DTYPE in[16], DTYPE out[8], const DTYPE C[16])
+{
+    DTYPE tmp1 = 0, tmp2 = 0, a, b, c, d, e, f, g, h;
+    DTYPE w0, w1, w2, w3, w4, w5, w6, w7,
+	w8, w9, w10, w11, w12, w13, w14, w15;
+    w0  = byteswap(in[0]);
+    w1  = byteswap(in[1]);
+    w2  = byteswap(in[2]);
+    w3  = byteswap(in[3]);
+    w4  = byteswap(in[4]);
+    w5  = byteswap(in[5]);
+    w6  = byteswap(in[6]);
+    w7  = byteswap(in[7]);
+    w8  = byteswap(in[8]);
+    w9  = byteswap(in[9]);
+    w10 = byteswap(in[10]);
+    w11 = byteswap(in[11]);
+    w12 = byteswap(in[12]);
+    w13 = byteswap(in[13]);
+    w14 = byteswap(in[14]);
+    w15 = byteswap(in[15]);
+    a = out[0];
+    b = out[1];
+    c = out[2];
+    d = out[3];
+    e = out[4];
+    f = out[5];
+    g = out[6];
+    h = out[7];
+    
+    TT(a, b, c, d, e, f, g, h,  w0, C[0]);
+    TT(h, a, b, c, d, e, f, g,  w1, C[1]);
+    TT(g, h, a, b, c, d, e, f,  w2, C[2]);
+    TT(f, g, h, a, b, c, d, e,  w3, C[3]);
+    TT(e, f, g, h, a, b, c, d,  w4, C[4]);
+    TT(d, e, f, g, h, a, b, c,  w5, C[5]);
+    TT(c, d, e, f, g, h, a, b,  w6, C[6]);
+    TT(b, c, d, e, f, g, h, a,  w7, C[7]);
+    TT(a, b, c, d, e, f, g, h,  w8, C[8]);
+    TT(h, a, b, c, d, e, f, g,  w9, C[9]);
+    TT(g, h, a, b, c, d, e, f, w10, C[10]);
+    TT(f, g, h, a, b, c, d, e, w11, C[11]);
+    TT(e, f, g, h, a, b, c, d, w12, C[12]);
+    TT(d, e, f, g, h, a, b, c, w13, C[13]);
+    TT(c, d, e, f, g, h, a, b, w14, C[14]);
+    TT(b, c, d, e, f, g, h, a, w15, C[15]);
+
+    out[0] += a;
+    out[1] += b;
+    out[2] += c;
+    out[3] += d;
+    out[4] += e;
+    out[5] += f;
+    out[6] += g;
+    out[7] += h;
+
+    __asm__ __volatile__ ("test_asm_xmm %0, %%rax" : : "Yr" (out[7]) : "rax");
+    __asm__ __volatile__ ("test_asm_Brr %0, %%rax" : : "Brr" (w14) : "rbx");
+    __asm__ __volatile__ ("test_asm_rBr %0, %%rax" : : "rBr" (w13) : "rbx");
+    __asm__ __volatile__ ("test_asm_r %0, %%rax" : : "r" (w15) : "rbx");
+    __asm__ __volatile__ ("test_asm_m %0, %%rax" : : "m" (out[0]) : "rbx");
+    __asm__ __volatile__ ("test_asm_mem %0, %%rax" : : "memory" (out[1]) : "rbx");
+}
+
+/* { dg-final { scan-assembler-not "knot" } } */
+/* { dg-final { scan-assembler-not "kxor" } } */
+/* { dg-final { scan-assembler-not "kor" } } */
+/* { dg-final { scan-assembler-not "kandn" } } */
+/* { dg-final { scan-assembler-times "test_asm_xmm %xmm5, %rax" 1 } } */
+/* { dg-final { scan-assembler-times "test_asm_Brr %r15d, %rax" 1 } } */
+/* { dg-final { scan-assembler-times "test_asm_rBr %r14d, %rax" 1 } } */
+/* { dg-final { scan-assembler-times "test_asm_r %r13d, %rax" 1 } } */
+/* { dg-final { scan-assembler-not "test_asm_rBr %r31d, %rax" } } */
+/* { dg-final { scan-assembler-not "test_asm_r %r30d, %rax" } } */
+/* { dg-final { scan-assembler-not "test_asm_m \\(%r29d\\), %rax" } } */
+/* { dg-final { scan-assembler-not "test_asm_mem \\(%r28d\\), %rax" } } */
-- 
2.31.1



* [PATCH 07/13] [APX EGPR] Add backend hook for base_reg_class/index_reg_class.
  2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
                   ` (5 preceding siblings ...)
  2023-08-31  8:20 ` [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint Hongyu Wang
@ 2023-08-31  8:20 ` Hongyu Wang
  2023-08-31  8:20 ` [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns Hongyu Wang
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

Add backend helper functions that verify whether an rtx_insn can use
EGPRs as the base/index registers of its memory operands. The
verification rules are:
  1. For asm insns, enable/disable EGPRs according to
  ix86_apx_inline_asm_use_gpr32.
  2. Disable EGPRs for unrecognized insns.
  3. If which_alternative is not yet decided, loop through the enabled
  alternatives and check their gpr32 attribute. Enable EGPRs only when
  all enabled alternatives have attr_gpr32 = 1.
  4. If which_alternative is decided, enable/disable EGPRs according to
  its attr_gpr32.

gcc/ChangeLog:

	* config/i386/i386-protos.h (ix86_mode_code_base_reg_class): New
	prototype.
	(ix86_regno_mode_code_ok_for_base_p): Likewise.
	(ix86_insn_index_reg_class): Likewise.
	* config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p):
	New helper function to scan the insn.
	(ix86_mode_code_base_reg_class): New function to choose BASE_REG_CLASS.
	(ix86_regno_mode_code_ok_for_base_p): Likewise for base regno.
	(ix86_insn_index_reg_class): Likewise for INDEX_REG_CLASS.
	* config/i386/i386.h (MODE_CODE_BASE_REG_CLASS): Define.
	(REGNO_MODE_CODE_OK_FOR_BASE_P): Likewise.
	(INSN_INDEX_REG_CLASS): Likewise.
	(enum reg_class): Add INDEX_GPR16.
	(GENERAL_GPR16_REGNO_P): Define.
	* config/i386/i386.md (gpr32): New attribute.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-inline-gpr-norex2.c: Adjust.
---
 gcc/config/i386/i386-protos.h                 |  7 ++
 gcc/config/i386/i386.cc                       | 98 +++++++++++++++++++
 gcc/config/i386/i386.h                        | 16 ++-
 gcc/config/i386/i386.md                       |  3 +
 .../gcc.target/i386/apx-inline-gpr-norex2.c   |  7 +-
 5 files changed, 127 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index bd4782800c4..78eb3e0f584 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -79,6 +79,13 @@ extern bool ix86_expand_set_or_cpymem (rtx, rtx, rtx, rtx, rtx, rtx,
 				       rtx, rtx, rtx, rtx, bool);
 extern bool ix86_expand_cmpstrn_or_cmpmem (rtx, rtx, rtx, rtx, rtx, bool);
 
+extern enum reg_class ix86_mode_code_base_reg_class (machine_mode, addr_space_t,
+						     RTX_CODE, RTX_CODE,
+						     rtx_insn *);
+extern bool ix86_regno_mode_code_ok_for_base_p (int, machine_mode, addr_space_t,
+						RTX_CODE, RTX_CODE,
+						rtx_insn *);
+extern enum reg_class ix86_insn_index_reg_class (rtx_insn *);
 extern bool constant_address_p (rtx);
 extern bool legitimate_pic_operand_p (rtx);
 extern bool legitimate_pic_address_disp_p (rtx);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 9460ebbfda4..412f3aefc43 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -11054,6 +11054,104 @@ ix86_validate_address_register (rtx op)
   return NULL_RTX;
 }
 
+/* Return true if insn memory address can use any available reg
+   in BASE_REG_CLASS or INDEX_REG_CLASS, otherwise false.
+   For APX, some instruction can't be encoded with gpr32
+   which is BASE_REG_CLASS or INDEX_REG_CLASS, for that case
+   returns false.  */
+static bool
+ix86_memory_address_use_extended_reg_class_p (rtx_insn* insn)
+{
+  /* LRA will do some initialization with insn == NULL,
+     return the maximum reg class for that.
+     For other cases, real insn will be passed and checked.  */
+  bool ret = true;
+  if (TARGET_APX_EGPR && insn)
+    {
+      if (asm_noperands (PATTERN (insn)) >= 0
+	  || GET_CODE (PATTERN (insn)) == ASM_INPUT)
+	return ix86_apx_inline_asm_use_gpr32;
+
+      if (INSN_CODE (insn) < 0)
+	return false;
+
+      /* Try recog the insn before calling get_attr_gpr32. Save
+	 the current recog_data first.  */
+      /* Also save which_alternative for current recog.  */
+
+      struct recog_data_d recog_data_save = recog_data;
+      int which_alternative_saved = which_alternative;
+
+      /* Update the recog_data for alternative check. */
+      if (recog_data.insn != insn)
+	extract_insn_cached (insn);
+
+      /* If alternative is not set, loop through each alternative
+	 of the insn and get the gpr32 attr for all enabled alternatives.
+	 If any enabled alternative has a 0 value for gpr32, disallow
+	 gpr32 for addressing.  */
+      if (which_alternative_saved == -1)
+	{
+	  alternative_mask enabled = get_enabled_alternatives (insn);
+	  bool curr_insn_gpr32 = false;
+	  for (int i = 0; i < recog_data.n_alternatives; i++)
+	    {
+	      if (!TEST_BIT (enabled, i))
+		continue;
+	      which_alternative = i;
+	      curr_insn_gpr32 = get_attr_gpr32 (insn);
+	      if (!curr_insn_gpr32)
+		ret = false;
+	    }
+	}
+      else
+	{
+	  which_alternative = which_alternative_saved;
+	  ret = get_attr_gpr32 (insn);
+	}
+
+      recog_data = recog_data_save;
+      which_alternative = which_alternative_saved;
+    }
+
+  return ret;
+}
+
+/* For APX, some instructions can't be encoded with gpr32.  */
+enum reg_class
+ix86_mode_code_base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
+			       addr_space_t as ATTRIBUTE_UNUSED,
+			       enum rtx_code outer_code ATTRIBUTE_UNUSED,
+			       enum rtx_code index_code ATTRIBUTE_UNUSED,
+			       rtx_insn* insn)
+{
+  if (ix86_memory_address_use_extended_reg_class_p (insn))
+    return BASE_REG_CLASS;
+  return GENERAL_GPR16;
+}
+
+bool
+ix86_regno_mode_code_ok_for_base_p (int regno,
+				    machine_mode mode ATTRIBUTE_UNUSED,
+				    addr_space_t as ATTRIBUTE_UNUSED,
+				    enum rtx_code outer_code ATTRIBUTE_UNUSED,
+				    enum rtx_code index_code ATTRIBUTE_UNUSED,
+				    rtx_insn* insn)
+{
+
+  if (ix86_memory_address_use_extended_reg_class_p (insn))
+    return GENERAL_REGNO_P (regno);
+  return GENERAL_GPR16_REGNO_P (regno);
+}
+
+enum reg_class
+ix86_insn_index_reg_class (rtx_insn* insn)
+{
+  if (ix86_memory_address_use_extended_reg_class_p (insn))
+    return INDEX_REG_CLASS;
+  return INDEX_GPR16;
+}
+
 /* Recognizes RTL expressions that are valid memory addresses for an
    instruction.  The MODE argument is the machine mode for the MEM
    expression that wants to use this address.
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 7ec3086641c..c8362ef451c 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1018,6 +1018,13 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define ADJUST_REG_ALLOC_ORDER x86_order_regs_for_local_alloc ()
 
+#define MODE_CODE_BASE_REG_CLASS(MODE, AS, OUTER, INDEX, INSN) \
+  ix86_mode_code_base_reg_class (MODE, AS, OUTER, INDEX, INSN)
+#define REGNO_MODE_CODE_OK_FOR_BASE_P(NUM, MODE, AS, OUTER, INDEX, INSN) \
+  ix86_regno_mode_code_ok_for_base_p (NUM, MODE, AS, OUTER, INDEX, INSN)
+
+#define INSN_INDEX_REG_CLASS(INSN) \
+  ix86_insn_index_reg_class (INSN)
 
 #define OVERRIDE_ABI_FORMAT(FNDECL) ix86_call_abi_override (FNDECL)
 
@@ -1297,6 +1304,8 @@ enum reg_class
 				   %r24 %r25 %r26 %r27 %r28 %r29 %r30 %r31 */
   GENERAL_GPR16,		/* %eax %ebx %ecx %edx %esi %edi %ebp %esp
 				   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15 */
+  INDEX_GPR16,			/* %eax %ebx %ecx %edx %esi %edi %ebp
+				   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15 */
   FP_TOP_REG, FP_SECOND_REG,	/* %st(0) %st(1) */
   FLOAT_REGS,
   SSE_FIRST_REG,
@@ -1360,6 +1369,7 @@ enum reg_class
    "LEGACY_REGS",			\
    "GENERAL_REGS",			\
    "GENERAL_GPR16",			\
+   "INDEX_GPR16",			\
    "FP_TOP_REG", "FP_SECOND_REG",	\
    "FLOAT_REGS",			\
    "SSE_FIRST_REG",			\
@@ -1395,10 +1405,11 @@ enum reg_class
       { 0x0f,        0x0,   0x0 },	/* Q_REGS */			\
    { 0x900f0,        0x0,   0x0 },	/* NON_Q_REGS */		\
       { 0x7e,      0xff0,   0x0 },	/* TLS_GOTBASE_REGS */		\
-      { 0x7f,      0xff0,   0x0 },	/* INDEX_REGS */		\
+      { 0x7f,      0xff0,   0xffff000 },	/* INDEX_REGS */		\
    { 0x900ff,        0x0,   0x0 },	/* LEGACY_REGS */		\
    { 0x900ff,      0xff0,   0xffff000 },	/* GENERAL_REGS */		\
    { 0x900ff,      0xff0,   0x0 },	/* GENERAL_GPR16 */		\
+   { 0x0007f,      0xff0,   0x0 },	/* INDEX_GPR16 */		\
      { 0x100,        0x0,   0x0 },	/* FP_TOP_REG */		\
      { 0x200,        0x0,   0x0 },	/* FP_SECOND_REG */		\
     { 0xff00,        0x0,   0x0 },	/* FLOAT_REGS */		\
@@ -1456,6 +1467,9 @@ enum reg_class
 #define INDEX_REGNO_P(N) \
   (LEGACY_INDEX_REGNO_P (N) || REX_INT_REGNO_P (N) || REX2_INT_REGNO_P (N))
 
+#define GENERAL_GPR16_REGNO_P(N) \
+  (LEGACY_INT_REGNO_P (N) || REX_INT_REGNO_P (N))
+
 #define ANY_QI_REG_P(X) (REG_P (X) && ANY_QI_REGNO_P (REGNO (X)))
 #define ANY_QI_REGNO_P(N) \
   (TARGET_64BIT ? GENERAL_REGNO_P (N) : QI_REGNO_P (N))
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e3270658cb7..b9eaea78f00 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -873,6 +873,9 @@ (define_attr "use_carry" "0,1" (const_string "0"))
 ;; Define attribute to indicate unaligned ssemov insns
 (define_attr "movu" "0,1" (const_string "0"))
 
+;; Define attribute to indicate gpr32 insns.
+(define_attr "gpr32" "0,1" (const_string "1"))
+
 ;; Define instruction set of MMX instructions
 (define_attr "mmx_isa" "base,native,sse,sse_noavx,avx"
   (const_string "base"))
diff --git a/gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c b/gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
index 21534450045..6dfc6714c2f 100644
--- a/gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
+++ b/gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
@@ -98,9 +98,10 @@ void foo (DTYPE in[16], DTYPE out[8], const DTYPE C[16])
 /* { dg-final { scan-assembler-not "kor" } } */
 /* { dg-final { scan-assembler-not "kandn" } } */
 /* { dg-final { scan-assembler-times "test_asm_xmm %xmm5, %rax" 1 } } */
-/* { dg-final { scan-assembler-times "test_asm_Brr %r15d, %rax" 1 } } */
-/* { dg-final { scan-assembler-times "test_asm_rBr %r14d, %rax" 1 } } */
-/* { dg-final { scan-assembler-times "test_asm_r %r13d, %rax" 1 } } */
+/* { dg-final { scan-assembler-times "test_asm_Brr %r12d, %rax" 1 } } */
+/* { dg-final { scan-assembler-times "test_asm_rBr %eax, %rax" 1 } } */
+/* { dg-final { scan-assembler-times "test_asm_r %eax, %rax" 1 } } */
+/* { dg-final { scan-assembler-times "test_asm_m \\(%rax\\), %rax" 1 } } */
 /* { dg-final { scan-assembler-not "test_asm_rBr %r31d, %rax" } } */
 /* { dg-final { scan-assembler-not "test_asm_r %r30d, %rax" } } */
 /* { dg-final { scan-assembler-not "test_asm_m \\(%r29d\\), %rax" } } */
-- 
2.31.1



* [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns
  2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
                   ` (6 preceding siblings ...)
  2023-08-31  8:20 ` [PATCH 07/13] [APX EGPR] Add backend hook for base_reg_class/index_reg_class Hongyu Wang
@ 2023-08-31  8:20 ` Hongyu Wang
  2023-08-31  9:43   ` Jakub Jelinek
  2023-08-31  8:20 ` [PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5) Hongyu Wang
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub

For vector move insns like vmovdqa/vmovdqu, their evex counterparts
require an explicit suffix 64/32/16/8.  The use of these suffix-less
instructions is prohibited under AVX10_1 or AVX512F, so for AVX2+APX_F
we select vmovaps/vmovups for vector load/store insns that contain EGPR.
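To make the effect concrete, here is a hedged user-level sketch (not part of the patch; the function name and the `__APX_F__` macro guard are assumptions) of the kind of code affected: an unaligned vector load whose base pointer is pinned to an extended GPR, which with -mapxf should be emitted as vmovups rather than the suffix-less vmovdqu that has no evex form:

```c
#include <immintrin.h>

/* Hypothetical example, not from the patch: a 128-bit unaligned load.
   When the base register is an EGPR (r16-r31), the suffix-less vmovdqu
   cannot be encoded, so the compiler must pick vmovups instead.  */
__m128i
load_via_maybe_egpr (const char *p)
{
#if defined (__APX_F__)
  /* Pin the base to an extended GPR; only valid when APX is enabled.  */
  register const char *base __asm__ ("r16") = p;
#else
  const char *base = p;		/* portable fallback without APX */
#endif
  return _mm_loadu_si128 ((const __m128i *) base);
}
```

On a non-APX toolchain the fallback path keeps the example compilable; the EGPR path only illustrates the register choice the patch has to handle.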

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_get_ssemov): Check if egpr is used,
	adjust mnemonic for vmovdqu/vmovdqa.
	* config/i386/sse.md (*<extract_type>_vinsert<shuffletype><extract_suf>_0):
	Check if egpr is used, adjust mnemonic for vmovdqu/vmovdqa.
	(avx_vec_concat<mode>): Likewise, and separate alternative 0 to
	avx_noavx512f.
---
 gcc/config/i386/i386.cc | 31 ++++++++++++++++++++++++++++++-
 gcc/config/i386/sse.md  | 34 ++++++++++++++++++++++++----------
 2 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 412f3aefc43..f5d642948bc 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5469,6 +5469,11 @@ ix86_get_ssemov (rtx *operands, unsigned size,
   bool evex_reg_p = (size == 64
 		     || EXT_REX_SSE_REG_P (operands[0])
 		     || EXT_REX_SSE_REG_P (operands[1]));
+
+  bool egpr_p = (TARGET_APX_EGPR
+		 && (x86_extended_rex2reg_mentioned_p (operands[0])
+		     || x86_extended_rex2reg_mentioned_p (operands[1])));
+
   machine_mode scalar_mode;
 
   const char *opcode = NULL;
@@ -5547,6 +5552,12 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 			 ? "vmovdqu16"
 			 : "vmovdqu64")
 		      : "vmovdqa64");
+	  else if (egpr_p)
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW
+			 ? "vmovdqu16"
+			 : "%vmovups")
+		      : "%vmovaps");
 	  else
 	    opcode = (misaligned_p
 		      ? (TARGET_AVX512BW
@@ -5563,6 +5574,8 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 	case E_TFmode:
 	  if (evex_reg_p)
 	    opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	  else if (egpr_p)
+	    opcode = misaligned_p ? "%vmovups" : "%vmovaps";
 	  else
 	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
 	  break;
@@ -5581,6 +5594,12 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 			 ? "vmovdqu8"
 			 : "vmovdqu64")
 		      : "vmovdqa64");
+	  else if (egpr_p)
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW
+			 ? "vmovdqu8"
+			 : "%vmovups")
+		      : "%vmovaps");
 	  else
 	    opcode = (misaligned_p
 		      ? (TARGET_AVX512BW
@@ -5589,12 +5608,18 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 		      : "%vmovdqa");
 	  break;
 	case E_HImode:
 	  if (evex_reg_p)
 	    opcode = (misaligned_p
 		      ? (TARGET_AVX512BW
 			 ? "vmovdqu16"
 			 : "vmovdqu64")
 		      : "vmovdqa64");
+	  else if (egpr_p)
+	    opcode = (misaligned_p
+		      ? (TARGET_AVX512BW
+			 ? "vmovdqu16"
+			 : "%vmovups")
+		      : "%vmovaps");
 	  else
 	    opcode = (misaligned_p
 		      ? (TARGET_AVX512BW
@@ -5605,6 +5630,8 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 	case E_SImode:
 	  if (evex_reg_p)
 	    opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
+	  else if (egpr_p)
+	    opcode = misaligned_p ? "%vmovups" : "%vmovaps";
 	  else
 	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
 	  break;
@@ -5613,6 +5640,8 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 	case E_OImode:
 	  if (evex_reg_p)
 	    opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	  else if (egpr_p)
+	    opcode = misaligned_p ? "%vmovups" : "%vmovaps";
 	  else
 	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
 	  break;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 192e746fda3..bd6674d34f9 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18918,6 +18918,12 @@ (define_insn "*<extract_type>_vinsert<shuffletype><extract_suf>_0"
 {
   if (which_alternative == 0)
     return "vinsert<shuffletype><extract_suf>\t{$0, %2, %1, %0|%0, %1, %2, 0}";
+  bool egpr_used = (TARGET_APX_EGPR
+		    && x86_extended_rex2reg_mentioned_p (operands[2]));
+  const char *align_templ = egpr_used ? "vmovaps\t{%2, %x0|%x0, %2}"
+				      : "vmovdqa\t{%2, %x0|%x0, %2}";
+  const char *unalign_templ = egpr_used ? "vmovups\t{%2, %x0|%x0, %2}"
+					: "vmovdqu\t{%2, %x0|%x0, %2}";
   switch (<MODE>mode)
     {
     case E_V8DFmode:
@@ -18933,17 +18939,17 @@ (define_insn "*<extract_type>_vinsert<shuffletype><extract_suf>_0"
     case E_V8DImode:
       if (misaligned_operand (operands[2], <ssequartermode>mode))
 	return which_alternative == 2 ? "vmovdqu64\t{%2, %x0|%x0, %2}"
-				      : "vmovdqu\t{%2, %x0|%x0, %2}";
+				      : unalign_templ;
       else
 	return which_alternative == 2 ? "vmovdqa64\t{%2, %x0|%x0, %2}"
-				      : "vmovdqa\t{%2, %x0|%x0, %2}";
+				      : align_templ;
     case E_V16SImode:
       if (misaligned_operand (operands[2], <ssequartermode>mode))
 	return which_alternative == 2 ? "vmovdqu32\t{%2, %x0|%x0, %2}"
-				      : "vmovdqu\t{%2, %x0|%x0, %2}";
+				      : unalign_templ;
       else
 	return which_alternative == 2 ? "vmovdqa32\t{%2, %x0|%x0, %2}"
-				      : "vmovdqa\t{%2, %x0|%x0, %2}";
+				      : align_templ;
     default:
       gcc_unreachable ();
     }
@@ -27652,11 +27658,13 @@ (define_insn "avx_vec_concat<mode>"
   [(set (match_operand:V_256_512 0 "register_operand" "=x,v,x,Yv")
 	(vec_concat:V_256_512
 	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "x,v,xm,vm")
-	  (match_operand:<ssehalfvecmode> 2 "nonimm_or_0_operand" "xm,vm,C,C")))]
+	  (match_operand:<ssehalfvecmode> 2 "nonimm_or_0_operand" "xBt,vm,C,C")))]
   "TARGET_AVX
    && (operands[2] == CONST0_RTX (<ssehalfvecmode>mode)
        || !MEM_P (operands[1]))"
 {
+  bool egpr_used = (TARGET_APX_EGPR
+		    && x86_extended_rex2reg_mentioned_p (operands[1]));
   switch (which_alternative)
     {
     case 0:
@@ -27704,7 +27712,8 @@ (define_insn "avx_vec_concat<mode>"
 	  if (misaligned_operand (operands[1], <ssehalfvecmode>mode))
 	    {
 	      if (which_alternative == 2)
-		return "vmovdqu\t{%1, %t0|%t0, %1}";
+		return egpr_used ? "vmovups\t{%1, %t0|%t0, %1}"
+				 : "vmovdqu\t{%1, %t0|%t0, %1}";
 	      else if (GET_MODE_SIZE (<ssescalarmode>mode) == 8)
 		return "vmovdqu64\t{%1, %t0|%t0, %1}";
 	      else
@@ -27713,7 +27722,8 @@ (define_insn "avx_vec_concat<mode>"
 	  else
 	    {
 	      if (which_alternative == 2)
-		return "vmovdqa\t{%1, %t0|%t0, %1}";
+		return egpr_used ? "vmovaps\t{%1, %t0|%t0, %1}"
+				 : "vmovdqa\t{%1, %t0|%t0, %1}";
 	      else if (GET_MODE_SIZE (<ssescalarmode>mode) == 8)
 		return "vmovdqa64\t{%1, %t0|%t0, %1}";
 	      else
@@ -27723,7 +27733,8 @@ (define_insn "avx_vec_concat<mode>"
 	  if (misaligned_operand (operands[1], <ssehalfvecmode>mode))
 	    {
 	      if (which_alternative == 2)
-		return "vmovdqu\t{%1, %x0|%x0, %1}";
+		return egpr_used ? "vmovups\t{%1, %x0|%x0, %1}"
+				 : "vmovdqu\t{%1, %x0|%x0, %1}";
 	      else if (GET_MODE_SIZE (<ssescalarmode>mode) == 8)
 		return "vmovdqu64\t{%1, %x0|%x0, %1}";
 	      else
@@ -27732,7 +27743,8 @@ (define_insn "avx_vec_concat<mode>"
 	  else
 	    {
 	      if (which_alternative == 2)
-		return "vmovdqa\t{%1, %x0|%x0, %1}";
+		return egpr_used ? "vmovaps\t{%1, %x0|%x0, %1}"
+				 : "vmovdqa\t{%1, %x0|%x0, %1}";
 	      else if (GET_MODE_SIZE (<ssescalarmode>mode) == 8)
 		return "vmovdqa64\t{%1, %x0|%x0, %1}";
 	      else
@@ -27745,7 +27757,9 @@ (define_insn "avx_vec_concat<mode>"
       gcc_unreachable ();
     }
 }
-  [(set_attr "type" "sselog,sselog,ssemov,ssemov")
+  [(set_attr "isa" "noavx512f,avx512f,*,*")
+   (set_attr "gpr32" "0,1,1,1")
+   (set_attr "type" "sselog,sselog,ssemov,ssemov")
    (set_attr "prefix_extra" "1,1,*,*")
    (set_attr "length_immediate" "1,1,*,*")
    (set_attr "prefix" "maybe_evex")
-- 
2.31.1



* [PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5)
  2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
                   ` (7 preceding siblings ...)
  2023-08-31  8:20 ` [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns Hongyu Wang
@ 2023-08-31  8:20 ` Hongyu Wang
  2023-08-31 10:06   ` Uros Bizjak
  2023-08-31  8:20 ` [PATCH 10/13] [APX EGPR] Handle legacy insns that only support GPR16 (2/5) Hongyu Wang
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

These legacy insns in opcode map0/1 only support GPR16 and have no
vex/evex counterparts, so directly adjust the constraints and add the
gpr32 attr to their patterns.

insn list:
1. xsave/xsave64, xrstor/xrstor64
2. xsaves/xsaves64, xrstors/xrstors64
3. xsavec/xsavec64
4. xsaveopt/xsaveopt64
5. fxsave64/fxrstor64
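As a hedged aside (assumed code, not part of the patch), the new apxf effective-target check this patch adds to target-supports.exp probes whether the toolchain accepts r16-r31; user code that must stay portable can guard EGPR usage the same way, behind the `__APX_F__` predefine:

```c
/* Hypothetical sketch: compile-time guard for APX extended GPRs, in the
   spirit of the check_effective_target_apxf probe added below.  */
long
add_maybe_egpr (long a, long b)
{
#if defined (__APX_F__)
  /* With -mapxf the register allocator may place a/b in r16-r31;
     the inline asm itself is a plain AT&T-syntax add.  */
  __asm__ ("add\t%1, %0" : "+r" (a) : "r" (b));
#else
  a += b;			/* portable fallback without APX */
#endif
  return a;
}
```

Either branch computes the same sum, so the sketch compiles and runs on non-APX targets as well.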

gcc/ChangeLog:

	* config/i386/i386.md (<xsave>): Set attr gpr32 0 and constraint
	Bt.
	(<xsave>_rex64): Likewise.
	(<xrstor>_rex64): Likewise.
	(<xrstor>64): Likewise.
	(fxsave64): Likewise.
	(fxrstor64): Likewise.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp: Add apxf check.
	* gcc.target/i386/apx-legacy-insn-check-norex2.c: New test.
	* gcc.target/i386/apx-legacy-insn-check-norex2-asm.c: New assembler test.
---
 gcc/config/i386/i386.md                       | 18 +++++++----
 .../i386/apx-legacy-insn-check-norex2-asm.c   |  5 ++++
 .../i386/apx-legacy-insn-check-norex2.c       | 30 +++++++++++++++++++
 gcc/testsuite/lib/target-supports.exp         | 10 +++++++
 4 files changed, 57 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b9eaea78f00..83ad01b43c1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -25626,11 +25626,12 @@ (define_insn "fxsave"
         (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn "fxsave64"
-  [(set (match_operand:BLK 0 "memory_operand" "=m")
+  [(set (match_operand:BLK 0 "memory_operand" "=Bt")
 	(unspec_volatile:BLK [(const_int 0)] UNSPECV_FXSAVE64))]
   "TARGET_64BIT && TARGET_FXSR"
   "fxsave64\t%0"
   [(set_attr "type" "other")
+   (set_attr "gpr32" "0")
    (set_attr "memory" "store")
    (set (attr "length")
         (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
@@ -25646,11 +25647,12 @@ (define_insn "fxrstor"
         (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn "fxrstor64"
-  [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "m")]
+  [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "Bt")]
 		    UNSPECV_FXRSTOR64)]
   "TARGET_64BIT && TARGET_FXSR"
   "fxrstor64\t%0"
   [(set_attr "type" "other")
+   (set_attr "gpr32" "0")
    (set_attr "memory" "load")
    (set (attr "length")
         (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
@@ -25704,7 +25706,7 @@ (define_insn "<xsave>"
         (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn "<xsave>_rex64"
-  [(set (match_operand:BLK 0 "memory_operand" "=m")
+  [(set (match_operand:BLK 0 "memory_operand" "=Bt")
 	(unspec_volatile:BLK
 	 [(match_operand:SI 1 "register_operand" "a")
 	  (match_operand:SI 2 "register_operand" "d")]
@@ -25713,11 +25715,12 @@ (define_insn "<xsave>_rex64"
   "<xsave>\t%0"
   [(set_attr "type" "other")
    (set_attr "memory" "store")
+   (set_attr "gpr32" "0")
    (set (attr "length")
         (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn "<xsave>"
-  [(set (match_operand:BLK 0 "memory_operand" "=m")
+  [(set (match_operand:BLK 0 "memory_operand" "=Bt")
 	(unspec_volatile:BLK
 	 [(match_operand:SI 1 "register_operand" "a")
 	  (match_operand:SI 2 "register_operand" "d")]
@@ -25726,6 +25729,7 @@ (define_insn "<xsave>"
   "<xsave>\t%0"
   [(set_attr "type" "other")
    (set_attr "memory" "store")
+   (set_attr "gpr32" "0")
    (set (attr "length")
         (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
 
@@ -25743,7 +25747,7 @@ (define_insn "<xrstor>"
 
 (define_insn "<xrstor>_rex64"
    [(unspec_volatile:BLK
-     [(match_operand:BLK 0 "memory_operand" "m")
+     [(match_operand:BLK 0 "memory_operand" "Bt")
       (match_operand:SI 1 "register_operand" "a")
       (match_operand:SI 2 "register_operand" "d")]
      ANY_XRSTOR)]
@@ -25751,12 +25755,13 @@ (define_insn "<xrstor>_rex64"
   "<xrstor>\t%0"
   [(set_attr "type" "other")
    (set_attr "memory" "load")
+   (set_attr "gpr32" "0")
    (set (attr "length")
         (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn "<xrstor>64"
    [(unspec_volatile:BLK
-     [(match_operand:BLK 0 "memory_operand" "m")
+     [(match_operand:BLK 0 "memory_operand" "Bt")
       (match_operand:SI 1 "register_operand" "a")
       (match_operand:SI 2 "register_operand" "d")]
      ANY_XRSTOR64)]
@@ -25764,6 +25769,7 @@ (define_insn "<xrstor>64"
   "<xrstor>64\t%0"
   [(set_attr "type" "other")
    (set_attr "memory" "load")
+   (set_attr "gpr32" "0")
    (set (attr "length")
         (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
 
diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
new file mode 100644
index 00000000000..7ecc861435f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
@@ -0,0 +1,5 @@
+/* { dg-do assemble { target apxf } } */
+/* { dg-options "-O1 -mapxf -m64 -DDTYPE32" } */
+
+#include "apx-legacy-insn-check-norex2.c"
+
diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
new file mode 100644
index 00000000000..1e5450dfb73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mapxf -m64 -DDTYPE32" } */
+
+#include <immintrin.h>
+
+typedef unsigned int u32;
+typedef unsigned long long u64;
+
+#ifndef DTYPE32
+#define DTYPE32
+#endif
+
+#ifdef DTYPE32
+typedef u32 DTYPE;
+#endif
+
+__attribute__((target("xsave,fxsr")))
+void legacy_test ()
+{
+  register DTYPE* val __asm__("r16");
+  _xsave64 (val, 1);
+  _xrstor64 (val, 1);
+  _fxsave64 (val);
+  _fxrstor64 (val);
+}
+
+/* { dg-final { scan-assembler-not "xsave64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "xrstor64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "fxsave64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "fxrstor64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d353cc0aaf0..6359408542a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9938,6 +9938,16 @@ proc check_effective_target_sm4 { } {
     } "-msm4" ]
 }
 
+proc check_effective_target_apxf { } {
+    return [check_no_compiler_messages apxf object {
+	void
+	foo ()
+	{
+	  __asm__ volatile ("add\t%%r16, %%r31" ::);
+	}
+    } "-mapxf" ]
+}
+
 # Return 1 if sse instructions can be compiled.
 proc check_effective_target_sse { } {
     return [check_no_compiler_messages sse object {
-- 
2.31.1



* [PATCH 10/13] [APX EGPR] Handle legacy insns that only support GPR16 (2/5)
  2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
                   ` (8 preceding siblings ...)
  2023-08-31  8:20 ` [PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5) Hongyu Wang
@ 2023-08-31  8:20 ` Hongyu Wang
  2023-08-31  8:20 ` [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5) Hongyu Wang
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

These legacy insns in opcode map2/3 have vex but no evex
counterparts; disable EGPR for them by adjusting the alternatives and
the gpr32 attr.

insn list:
1. phaddw/vphaddw, phaddd/vphaddd, phaddsw/vphaddsw
2. phsubw/vphsubw, phsubd/vphsubd, phsubsw/vphsubsw
3. psignb/vpsignb, psignw/vpsignw, psignd/vpsignd
4. blendps/vblendps, blendpd/vblendpd
5. blendvps/vblendvps, blendvpd/vblendvpd
6. pblendvb/vpblendvb, pblendw/vpblendw
7. mpsadbw/vmpsadbw
8. dpps/vdpps, dppd/vdppd
9. pcmpeqq/vpcmpeqq, pcmpgtq/vpcmpgtq
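For context, a hedged user-level sketch (assumed code, not from the patch) of one affected mnemonic: vpsignb has no evex form, so with -mapxf its memory operand must not use an EGPR base, which the Bt/BT constraints and gpr32 attr below enforce during register allocation:

```c
#include <immintrin.h>

/* Hypothetical example of psignb/vpsignb semantics: for each byte,
   negate a where b is negative, zero it where b is zero, and keep it
   where b is positive.  Since the insn has no evex counterpart, an
   EGPR may not appear in its memory operand.  */
__attribute__((target("ssse3")))
__m128i
sign_demo (__m128i a, __m128i b)
{
  return _mm_sign_epi8 (a, b);
}
```

The function-level target attribute keeps the example compilable without a global -mssse3 flag, mirroring the style of the testcases in this series.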

gcc/ChangeLog:

	* config/i386/sse.md (avx2_ph<plusminus_mnemonic>wv16hi3): Set
	attr gpr32 0 and constraint Bt/BM to all mem alternatives.
	(ssse3_ph<plusminus_mnemonic>wv8hi3): Likewise.
	(ssse3_ph<plusminus_mnemonic>wv4hi3): Likewise.
	(avx2_ph<plusminus_mnemonic>dv8si3): Likewise.
	(ssse3_ph<plusminus_mnemonic>dv4si3): Likewise.
	(ssse3_ph<plusminus_mnemonic>dv2si3): Likewise.
	(<ssse3_avx2>_psign<mode>3): Likewise.
	(ssse3_psign<mode>3): Likewise.
	(<sse4_1>_blend<ssemodesuffix><avxsizesuffix>): Likewise.
	(<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>): Likewise.
	(*<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>_lt): Likewise.
	(*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_not_ltint): Likewise.
	(<sse4_1>_dp<ssemodesuffix><avxsizesuffix>): Likewise.
	(<sse4_1_avx2>_mpsadbw): Likewise.
	(<sse4_1_avx2>_pblendvb): Likewise.
	(*<sse4_1_avx2>_pblendvb_lt): Likewise.
	(sse4_1_pblend<ssemodesuffix>): Likewise.
	(*avx2_pblend<ssemodesuffix>): Likewise.
	(avx2_permv2ti): Likewise.
	(*avx_vperm2f128<mode>_nozero): Likewise.
	(*avx2_eq<mode>3): Likewise.
	(*sse4_1_eqv2di3): Likewise.
	(sse4_2_gtv2di3): Likewise.
	(avx2_gt<mode>3): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-legacy-insn-check-norex2.c: Add
	sse/vex intrinsic tests.
---
 gcc/config/i386/sse.md                        |  80 ++++++++-----
 .../i386/apx-legacy-insn-check-norex2.c       | 106 ++++++++++++++++++
 2 files changed, 159 insertions(+), 27 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index bd6674d34f9..05963de9219 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16837,7 +16837,7 @@ (define_insn "*avx2_eq<mode>3"
   [(set (match_operand:VI_256 0 "register_operand" "=x")
 	(eq:VI_256
 	  (match_operand:VI_256 1 "nonimmediate_operand" "%x")
-	  (match_operand:VI_256 2 "nonimmediate_operand" "xm")))]
+	  (match_operand:VI_256 2 "nonimmediate_operand" "xBt")))]
   "TARGET_AVX2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vpcmpeq<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "ssecmp")
@@ -16845,6 +16845,7 @@ (define_insn "*avx2_eq<mode>3"
      (if_then_else (eq (const_string "<MODE>mode") (const_string "V4DImode"))
 		   (const_string "1")
 		   (const_string "*")))
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
@@ -17027,7 +17028,7 @@ (define_insn "*sse4_1_eqv2di3"
   [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
 	(eq:V2DI
 	  (match_operand:V2DI 1 "vector_operand" "%0,0,x")
-	  (match_operand:V2DI 2 "vector_operand" "YrBm,*xBm,xm")))]
+	  (match_operand:V2DI 2 "vector_operand" "YrBT,*xBT,xBt")))]
   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
    pcmpeqq\t{%2, %0|%0, %2}
@@ -17035,6 +17036,7 @@ (define_insn "*sse4_1_eqv2di3"
    vpcmpeqq\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecmp")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
@@ -17043,7 +17045,7 @@ (define_insn "*sse2_eq<mode>3"
   [(set (match_operand:VI124_128 0 "register_operand" "=x,x")
 	(eq:VI124_128
 	  (match_operand:VI124_128 1 "vector_operand" "%0,x")
-	  (match_operand:VI124_128 2 "vector_operand" "xBm,xm")))]
+	  (match_operand:VI124_128 2 "vector_operand" "xBm,xBt")))]
   "TARGET_SSE2
    && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
@@ -17058,7 +17060,7 @@ (define_insn "sse4_2_gtv2di3"
   [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
 	(gt:V2DI
 	  (match_operand:V2DI 1 "register_operand" "0,0,x")
-	  (match_operand:V2DI 2 "vector_operand" "YrBm,*xBm,xm")))]
+	  (match_operand:V2DI 2 "vector_operand" "YrBT,*xBT,xBt")))]
   "TARGET_SSE4_2"
   "@
    pcmpgtq\t{%2, %0|%0, %2}
@@ -17066,6 +17068,7 @@ (define_insn "sse4_2_gtv2di3"
    vpcmpgtq\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecmp")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
@@ -17074,7 +17077,7 @@ (define_insn "avx2_gt<mode>3"
   [(set (match_operand:VI_256 0 "register_operand" "=x")
 	(gt:VI_256
 	  (match_operand:VI_256 1 "register_operand" "x")
-	  (match_operand:VI_256 2 "nonimmediate_operand" "xm")))]
+	  (match_operand:VI_256 2 "nonimmediate_operand" "xBt")))]
   "TARGET_AVX2"
   "vpcmpgt<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "ssecmp")
@@ -17082,6 +17085,7 @@ (define_insn "avx2_gt<mode>3"
      (if_then_else (eq (const_string "<MODE>mode") (const_string "V4DImode"))
 		   (const_string "1")
 		   (const_string "*")))
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
@@ -17105,7 +17109,7 @@ (define_insn "*sse2_gt<mode>3"
   [(set (match_operand:VI124_128 0 "register_operand" "=x,x")
 	(gt:VI124_128
 	  (match_operand:VI124_128 1 "register_operand" "0,x")
-	  (match_operand:VI124_128 2 "vector_operand" "xBm,xm")))]
+	  (match_operand:VI124_128 2 "vector_operand" "xBm,xBt")))]
   "TARGET_SSE2"
   "@
    pcmpgt<ssemodesuffix>\t{%2, %0|%0, %2}
@@ -21228,7 +21232,7 @@ (define_insn "avx2_ph<plusminus_mnemonic>wv16hi3"
 	  (vec_select:V16HI
 	    (vec_concat:V32HI
 	      (match_operand:V16HI 1 "register_operand" "x")
-	      (match_operand:V16HI 2 "nonimmediate_operand" "xm"))
+	      (match_operand:V16HI 2 "nonimmediate_operand" "xBt"))
 	    (parallel
 	      [(const_int 0) (const_int 2) (const_int 4) (const_int 6)
 	       (const_int 16) (const_int 18) (const_int 20) (const_int 22)
@@ -21244,6 +21248,7 @@ (define_insn "avx2_ph<plusminus_mnemonic>wv16hi3"
   "TARGET_AVX2"
   "vph<plusminus_mnemonic>w\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
@@ -21254,7 +21259,7 @@ (define_insn "ssse3_ph<plusminus_mnemonic>wv8hi3"
 	  (vec_select:V8HI
 	    (vec_concat:V16HI
 	      (match_operand:V8HI 1 "register_operand" "0,x")
-	      (match_operand:V8HI 2 "vector_operand" "xBm,xm"))
+	      (match_operand:V8HI 2 "vector_operand" "xBT,xBt"))
 	    (parallel
 	      [(const_int 0) (const_int 2) (const_int 4) (const_int 6)
 	       (const_int 8) (const_int 10) (const_int 12) (const_int 14)]))
@@ -21269,6 +21274,7 @@ (define_insn "ssse3_ph<plusminus_mnemonic>wv8hi3"
    vph<plusminus_mnemonic>w\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0")
    (set_attr "atom_unit" "complex")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,vex")
@@ -21280,7 +21286,7 @@ (define_insn_and_split "ssse3_ph<plusminus_mnemonic>wv4hi3"
 	  (vec_select:V4HI
 	    (vec_concat:V8HI
 	      (match_operand:V4HI 1 "register_operand" "0,0,x")
-	      (match_operand:V4HI 2 "register_mmxmem_operand" "ym,x,x"))
+	      (match_operand:V4HI 2 "register_mmxmem_operand" "yBt,x,x"))
 	    (parallel
 	      [(const_int 0) (const_int 2) (const_int 4) (const_int 6)]))
 	  (vec_select:V4HI
@@ -21309,6 +21315,7 @@ (define_insn_and_split "ssse3_ph<plusminus_mnemonic>wv4hi3"
 }
   [(set_attr "mmx_isa" "native,sse_noavx,avx")
    (set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0")
    (set_attr "atom_unit" "complex")
    (set_attr "prefix_extra" "1")
    (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
@@ -21320,7 +21327,7 @@ (define_insn "avx2_ph<plusminus_mnemonic>dv8si3"
 	  (vec_select:V8SI
 	    (vec_concat:V16SI
 	      (match_operand:V8SI 1 "register_operand" "x")
-	      (match_operand:V8SI 2 "nonimmediate_operand" "xm"))
+	      (match_operand:V8SI 2 "nonimmediate_operand" "xBt"))
 	    (parallel
 	      [(const_int 0) (const_int 2) (const_int 8) (const_int 10)
 	       (const_int 4) (const_int 6) (const_int 12) (const_int 14)]))
@@ -21332,6 +21339,7 @@ (define_insn "avx2_ph<plusminus_mnemonic>dv8si3"
   "TARGET_AVX2"
   "vph<plusminus_mnemonic>d\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
@@ -21342,7 +21350,7 @@ (define_insn "ssse3_ph<plusminus_mnemonic>dv4si3"
 	  (vec_select:V4SI
 	    (vec_concat:V8SI
 	      (match_operand:V4SI 1 "register_operand" "0,x")
-	      (match_operand:V4SI 2 "vector_operand" "xBm,xm"))
+	      (match_operand:V4SI 2 "vector_operand" "xBT,xBt"))
 	    (parallel
 	      [(const_int 0) (const_int 2) (const_int 4) (const_int 6)]))
 	  (vec_select:V4SI
@@ -21355,6 +21363,7 @@ (define_insn "ssse3_ph<plusminus_mnemonic>dv4si3"
    vph<plusminus_mnemonic>d\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0")
    (set_attr "atom_unit" "complex")
    (set_attr "prefix_data16" "1,*")
    (set_attr "prefix_extra" "1")
@@ -21367,7 +21376,7 @@ (define_insn_and_split "ssse3_ph<plusminus_mnemonic>dv2si3"
 	  (vec_select:V2SI
 	    (vec_concat:V4SI
 	      (match_operand:V2SI 1 "register_operand" "0,0,x")
-	      (match_operand:V2SI 2 "register_mmxmem_operand" "ym,x,x"))
+	      (match_operand:V2SI 2 "register_mmxmem_operand" "yBt,x,x"))
 	    (parallel [(const_int 0) (const_int 2)]))
 	  (vec_select:V2SI
 	    (vec_concat:V4SI (match_dup 1) (match_dup 2))
@@ -21394,6 +21403,7 @@ (define_insn_and_split "ssse3_ph<plusminus_mnemonic>dv2si3"
 }
   [(set_attr "mmx_isa" "native,sse_noavx,avx")
    (set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0")
    (set_attr "atom_unit" "complex")
    (set_attr "prefix_extra" "1")
    (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
@@ -21848,7 +21858,7 @@ (define_insn "<ssse3_avx2>_psign<mode>3"
   [(set (match_operand:VI124_AVX2 0 "register_operand" "=x,x")
 	(unspec:VI124_AVX2
 	  [(match_operand:VI124_AVX2 1 "register_operand" "0,x")
-	   (match_operand:VI124_AVX2 2 "vector_operand" "xBm,xm")]
+	   (match_operand:VI124_AVX2 2 "vector_operand" "xBT,xBt")]
 	  UNSPEC_PSIGN))]
   "TARGET_SSSE3"
   "@
@@ -21856,6 +21866,7 @@ (define_insn "<ssse3_avx2>_psign<mode>3"
    vpsign<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sselog1")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -21864,7 +21875,7 @@ (define_insn "ssse3_psign<mode>3"
   [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,x")
 	(unspec:MMXMODEI
 	  [(match_operand:MMXMODEI 1 "register_operand" "0,0,x")
-	   (match_operand:MMXMODEI 2 "register_mmxmem_operand" "ym,x,x")]
+	   (match_operand:MMXMODEI 2 "register_mmxmem_operand" "yBt,x,x")]
 	  UNSPEC_PSIGN))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
   "@
@@ -21874,6 +21885,7 @@ (define_insn "ssse3_psign<mode>3"
   [(set_attr "isa" "*,noavx,avx")
    (set_attr "mmx_isa" "native,*,*")
    (set_attr "type" "sselog1")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
    (set_attr "mode" "DI,TI,TI")])
@@ -22153,7 +22165,7 @@ (define_mode_attr blendbits
 (define_insn "<sse4_1>_blend<ssemodesuffix><avxsizesuffix>"
   [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(vec_merge:VF_128_256
-	  (match_operand:VF_128_256 2 "vector_operand" "YrBm,*xBm,xm")
+	  (match_operand:VF_128_256 2 "vector_operand" "YrBT,*xBT,xBt")
 	  (match_operand:VF_128_256 1 "register_operand" "0,0,x")
 	  (match_operand:SI 3 "const_0_to_<blendbits>_operand")))]
   "TARGET_SSE4_1"
@@ -22163,6 +22175,7 @@ (define_insn "<sse4_1>_blend<ssemodesuffix><avxsizesuffix>"
    vblend<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
@@ -22173,7 +22186,7 @@ (define_insn "<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>"
   [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(unspec:VF_128_256
 	  [(match_operand:VF_128_256 1 "register_operand" "0,0,x")
-	   (match_operand:VF_128_256 2 "vector_operand" "YrBm,*xBm,xm")
+	   (match_operand:VF_128_256 2 "vector_operand" "YrBT,*xBT,xBt")
 	   (match_operand:VF_128_256 3 "register_operand" "Yz,Yz,x")]
 	  UNSPEC_BLENDV))]
   "TARGET_SSE4_1"
@@ -22183,6 +22196,7 @@ (define_insn "<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>"
    vblendv<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
@@ -22234,7 +22248,7 @@ (define_insn_and_split "*<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>_lt"
   [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(unspec:VF_128_256
 	  [(match_operand:VF_128_256 1 "register_operand" "0,0,x")
-	   (match_operand:VF_128_256 2 "vector_operand" "YrBm,*xBm,xm")
+	   (match_operand:VF_128_256 2 "vector_operand" "YrBT,*xBT,xBt")
 	   (lt:VF_128_256
 	     (match_operand:<sseintvecmode> 3 "register_operand" "Yz,Yz,x")
 	     (match_operand:<sseintvecmode> 4 "const0_operand"))]
@@ -22248,6 +22262,7 @@ (define_insn_and_split "*<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>_lt"
   "operands[3] = gen_lowpart (<MODE>mode, operands[3]);"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
@@ -22266,7 +22281,7 @@ (define_insn_and_split "*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_ltint"
   [(set (match_operand:<ssebytemode> 0 "register_operand" "=Yr,*x,x")
 	(unspec:<ssebytemode>
 	  [(match_operand:<ssebytemode> 1 "register_operand" "0,0,x")
-	   (match_operand:<ssebytemode> 2 "vector_operand" "YrBm,*xBm,xm")
+	   (match_operand:<ssebytemode> 2 "vector_operand" "YrBT,*xBT,xBt")
 	   (subreg:<ssebytemode>
 	     (lt:VI48_AVX
 	       (match_operand:VI48_AVX 3 "register_operand" "Yz,Yz,x")
@@ -22286,6 +22301,7 @@ (define_insn_and_split "*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_ltint"
 }
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
@@ -22324,7 +22340,7 @@ (define_insn "<sse4_1>_dp<ssemodesuffix><avxsizesuffix>"
   [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(unspec:VF_128_256
 	  [(match_operand:VF_128_256 1 "vector_operand" "%0,0,x")
-	   (match_operand:VF_128_256 2 "vector_operand" "YrBm,*xBm,xm")
+	   (match_operand:VF_128_256 2 "vector_operand" "YrBT,*xBT,xBt")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_DP))]
   "TARGET_SSE4_1"
@@ -22334,6 +22350,7 @@ (define_insn "<sse4_1>_dp<ssemodesuffix><avxsizesuffix>"
    vdp<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemul")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
@@ -22362,7 +22379,7 @@ (define_insn "<sse4_1_avx2>_mpsadbw"
   [(set (match_operand:VI1_AVX2 0 "register_operand" "=Yr,*x,x")
 	(unspec:VI1_AVX2
 	  [(match_operand:VI1_AVX2 1 "register_operand" "0,0,x")
-	   (match_operand:VI1_AVX2 2 "vector_operand" "YrBm,*xBm,xm")
+	   (match_operand:VI1_AVX2 2 "vector_operand" "YrBT,*xBT,xBt")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_MPSADBW))]
   "TARGET_SSE4_1"
@@ -22372,6 +22389,7 @@ (define_insn "<sse4_1_avx2>_mpsadbw"
    vmpsadbw\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog1")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,vex")
@@ -22400,7 +22418,7 @@ (define_insn "<sse4_1_avx2>_pblendvb"
   [(set (match_operand:VI1_AVX2 0 "register_operand" "=Yr,*x,x")
 	(unspec:VI1_AVX2
 	  [(match_operand:VI1_AVX2 1 "register_operand"  "0,0,x")
-	   (match_operand:VI1_AVX2 2 "vector_operand" "YrBm,*xBm,xm")
+	   (match_operand:VI1_AVX2 2 "vector_operand" "YrBT,*xBT,xBt")
 	   (match_operand:VI1_AVX2 3 "register_operand" "Yz,Yz,x")]
 	  UNSPEC_BLENDV))]
   "TARGET_SSE4_1"
@@ -22410,6 +22428,7 @@ (define_insn "<sse4_1_avx2>_pblendvb"
    vpblendvb\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "*,*,1")
    (set_attr "prefix" "orig,orig,vex")
@@ -22449,7 +22468,7 @@ (define_insn_and_split "*<sse4_1_avx2>_pblendvb_lt"
   [(set (match_operand:VI1_AVX2 0 "register_operand" "=Yr,*x,x")
 	(unspec:VI1_AVX2
 	  [(match_operand:VI1_AVX2 1 "register_operand"  "0,0,x")
-	   (match_operand:VI1_AVX2 2 "vector_operand" "YrBm,*xBm,xm")
+	   (match_operand:VI1_AVX2 2 "vector_operand" "YrBT,*xBT,xBt")
 	   (lt:VI1_AVX2 (match_operand:VI1_AVX2 3 "register_operand" "Yz,Yz,x")
 			(match_operand:VI1_AVX2 4 "const0_operand"))]
 	  UNSPEC_BLENDV))]
@@ -22462,6 +22481,7 @@ (define_insn_and_split "*<sse4_1_avx2>_pblendvb_lt"
   ""
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "*,*,1")
    (set_attr "prefix" "orig,orig,vex")
@@ -22493,7 +22513,7 @@ (define_insn_and_split "*<sse4_1_avx2>_pblendvb_lt_subreg_not"
 (define_insn "sse4_1_pblend<ssemodesuffix>"
   [(set (match_operand:V8_128 0 "register_operand" "=Yr,*x,x")
 	(vec_merge:V8_128
-	  (match_operand:V8_128 2 "vector_operand" "YrBm,*xBm,xm")
+	  (match_operand:V8_128 2 "vector_operand" "YrBT,*xBT,xBt")
 	  (match_operand:V8_128 1 "register_operand" "0,0,x")
 	  (match_operand:SI 3 "const_0_to_255_operand")))]
   "TARGET_SSE4_1"
@@ -22503,6 +22523,7 @@ (define_insn "sse4_1_pblend<ssemodesuffix>"
    vpblendw\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,orig,vex")
@@ -22565,7 +22586,7 @@ (define_expand "avx2_pblend<ssemodesuffix>_1"
 (define_insn "*avx2_pblend<ssemodesuffix>"
   [(set (match_operand:V16_256 0 "register_operand" "=x")
 	(vec_merge:V16_256
-	  (match_operand:V16_256 2 "nonimmediate_operand" "xm")
+	  (match_operand:V16_256 2 "nonimmediate_operand" "xBt")
 	  (match_operand:V16_256 1 "register_operand" "x")
 	  (match_operand:SI 3 "avx2_pblendw_operand")))]
   "TARGET_AVX2"
@@ -22574,6 +22595,7 @@ (define_insn "*avx2_pblend<ssemodesuffix>"
   return "vpblendw\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
   [(set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
@@ -22582,7 +22604,7 @@ (define_insn "*avx2_pblend<ssemodesuffix>"
 (define_insn "avx2_pblendd<mode>"
   [(set (match_operand:VI4_AVX2 0 "register_operand" "=x")
 	(vec_merge:VI4_AVX2
-	  (match_operand:VI4_AVX2 2 "nonimmediate_operand" "xm")
+	  (match_operand:VI4_AVX2 2 "nonimmediate_operand" "xBt")
 	  (match_operand:VI4_AVX2 1 "register_operand" "x")
 	  (match_operand:SI 3 "const_0_to_255_operand")))]
   "TARGET_AVX2"
@@ -26443,11 +26465,13 @@ (define_insn "avx512f_perm<mode>_1<mask_name>"
    (set_attr "prefix" "<mask_prefix2>")
    (set_attr "mode" "<sseinsnmode>")])
 
+;; TODO (APX): vmovaps supports EGPR while the other alternatives do not;
+;; the pattern could be split to enable gpr32 for the vmovaps one.
 (define_insn "avx2_permv2ti"
   [(set (match_operand:V4DI 0 "register_operand" "=x")
 	(unspec:V4DI
 	  [(match_operand:V4DI 1 "register_operand" "x")
-	   (match_operand:V4DI 2 "nonimmediate_operand" "xm")
+	   (match_operand:V4DI 2 "nonimmediate_operand" "xBt")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_VPERMTI))]
   "TARGET_AVX2"
@@ -26474,6 +26498,7 @@ (define_insn "avx2_permv2ti"
     return "vperm2i128\t{%3, %2, %1, %0|%0, %1, %2, %3}";
   }
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
@@ -27089,7 +27114,7 @@ (define_insn "*avx_vperm2f128<mode>_nozero"
 	(vec_select:AVX256MODE2P
 	  (vec_concat:<ssedoublevecmode>
 	    (match_operand:AVX256MODE2P 1 "register_operand" "x")
-	    (match_operand:AVX256MODE2P 2 "nonimmediate_operand" "xm"))
+	    (match_operand:AVX256MODE2P 2 "nonimmediate_operand" "xBt"))
 	  (match_parallel 3 ""
 	    [(match_operand 4 "const_int_operand")])))]
   "TARGET_AVX
@@ -27106,6 +27131,7 @@ (define_insn "*avx_vperm2f128<mode>_nozero"
   return "vperm2<i128>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 }
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
index 1e5450dfb73..510213a6ca7 100644
--- a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
+++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
@@ -28,3 +28,109 @@ void legacy_test ()
 /* { dg-final { scan-assembler-not "xrstor64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
 /* { dg-final { scan-assembler-not "fxsave64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
 /* { dg-final { scan-assembler-not "fxrstor64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+
+#ifdef DTYPE
+#undef DTYPE
+#define DTYPE u64
+#endif
+
+typedef union
+{
+  __m128i xi[8];
+  __m128 xf[8];
+  __m128d xd[8];
+  __m256i yi[4];
+  __m256 yf[4];
+  __m256d yd[4];
+  DTYPE a[16];
+} tmp_u;
+
+__attribute__((target("sse4.2")))
+void sse_test ()
+{
+  register tmp_u *tdst __asm__("%r16");
+  register tmp_u *src1 __asm__("%r17");
+  register tmp_u *src2 __asm__("%r18");
+ 
+  src1->xi[0] = _mm_hadd_epi16 (tdst->xi[2], src2->xi[3]);
+  src1->xi[1] = _mm_hadd_epi32 (tdst->xi[0], src2->xi[1]);
+  tdst->xi[2] = _mm_hadds_epi16 (src1->xi[4], src2->xi[5]);
+  tdst->xi[3] = _mm_hsub_epi16 (src1->xi[6], src2->xi[7]);
+  tdst->xi[4] = _mm_hsub_epi32 (src1->xi[0], src2->xi[1]);
+  tdst->xi[5] = _mm_hsubs_epi16 (src1->xi[2], src2->xi[3]);
+
+  src1->xi[6] = _mm_cmpeq_epi64 (tdst->xi[4], src2->xi[5]);
+  src1->xi[7] = _mm_cmpgt_epi64 (tdst->xi[6], src2->xi[7]);
+
+  tdst->xf[0] = _mm_dp_ps (src1->xf[0], src2->xf[1], 0xbf);
+  tdst->xd[1] = _mm_dp_pd (src1->xd[2], src2->xd[3], 0xae);
+
+  tdst->xi[2] = _mm_mpsadbw_epu8 (src1->xi[4], src2->xi[5], 0xc1);
+
+  tdst->xi[3] = _mm_blend_epi16 (src1->xi[6], src2->xi[7], 0xc);
+  tdst->xi[4] = _mm_blendv_epi8 (src1->xi[0], src2->xi[1], tdst->xi[2]);
+  tdst->xf[5] = _mm_blend_ps (src1->xf[3], src2->xf[4], 0x4);
+  tdst->xf[6] = _mm_blendv_ps (src1->xf[5], src2->xf[6], tdst->xf[7]);
+  tdst->xd[7] = _mm_blend_pd (tdst->xd[0], src1->xd[1], 0x1);
+  tdst->xd[0] = _mm_blendv_pd (src1->xd[2], src2->xd[3], tdst->xd[4]);
+
+  tdst->xi[1] = _mm_sign_epi8 (src1->xi[5], src2->xi[6]);
+  tdst->xi[2] = _mm_sign_epi16 (src1->xi[7], src2->xi[0]);
+  tdst->xi[3] = _mm_sign_epi32 (src1->xi[1], src2->xi[2]);
+}
+
+__attribute__((target("avx2")))
+void vex_test ()
+{
+
+  register tmp_u *tdst __asm__("%r16");
+  register tmp_u *src1 __asm__("%r17");
+  register tmp_u *src2 __asm__("%r18");
+  
+  src1->yi[1] = _mm256_hadd_epi16 (tdst->yi[2], src2->yi[3]);
+  src1->yi[2] = _mm256_hadd_epi32 (tdst->yi[0], src2->yi[1]);
+  tdst->yi[3] = _mm256_hadds_epi16 (src1->yi[1], src2->yi[2]);
+  tdst->yi[0] = _mm256_hsub_epi16 (src1->yi[3], src2->yi[0]);
+  tdst->yi[1] = _mm256_hsub_epi32 (src1->yi[0], src2->yi[1]);
+  tdst->yi[2] = _mm256_hsubs_epi16 (src1->yi[2], src2->yi[3]);
+
+  src1->yi[2] = _mm256_cmpeq_epi64 (tdst->yi[1], src2->yi[2]);
+  src1->yi[1] = _mm256_cmpgt_epi64 (tdst->yi[3], src2->yi[0]);
+
+  tdst->yf[2] = _mm256_dp_ps (src1->yf[0], src2->yf[1], 0xbf);
+  tdst->xd[3] = _mm_dp_pd (src1->xd[0], src2->xd[1], 0xbf);
+
+  tdst->yi[3] = _mm256_mpsadbw_epu8 (src1->yi[1], src2->yi[1], 0xc1);
+
+  tdst->yi[0] = _mm256_blend_epi16 (src1->yi[1], src2->yi[2], 0xc);
+  tdst->yi[1] = _mm256_blendv_epi8 (src1->yi[1], src2->yi[2], tdst->yi[0]);
+  tdst->yf[2] = _mm256_blend_ps (src1->yf[0], src2->yf[1], 0x4);
+  tdst->yf[3] = _mm256_blendv_ps (src1->yf[2], src2->yf[3], tdst->yf[1]);
+  tdst->yd[3] = _mm256_blend_pd (tdst->yd[1], src1->yd[0], 0x1);
+  tdst->yd[1] = _mm256_blendv_pd (src1->yd[2], src2->yd[3], tdst->yd[2]);
+
+  tdst->yi[2] = _mm256_sign_epi8 (src1->yi[0], src2->yi[1]);
+  tdst->yi[3] = _mm256_sign_epi16 (src1->yi[2], src2->yi[3]);
+  tdst->yi[0] = _mm256_sign_epi32 (src1->yi[0], src2->yi[1]);
+}
+
+/* { dg-final { scan-assembler-not "v?pcmpeqq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pcmpgtq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phaddw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phaddd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phaddsw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phsubw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phsubd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phsubsw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?dpps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?dppd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?psadbw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pblendw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pblendvb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?blendps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?blendvps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?blendpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?blendvpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?psignb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?psignw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?psignd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)
  2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
                   ` (9 preceding siblings ...)
  2023-08-31  8:20 ` [PATCH 10/13] [APX EGPR] Handle legacy insns that only support GPR16 (2/5) Hongyu Wang
@ 2023-08-31  8:20 ` Hongyu Wang
  2023-08-31  9:26   ` Richard Biener
  2023-08-31  8:20 ` [PATCH 12/13] [APX_EGPR] Handle legacy insns that only support GPR16 (4/5) Hongyu Wang
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

Disable EGPR usage for the legacy insns below, which live in opcode map2/3
and have a vex encoding but no evex counterpart.

insn list:
1. phminposuw/vphminposuw
2. ptest/vptest
3. roundps/vroundps, roundpd/vroundpd,
   roundss/vroundss, roundsd/vroundsd
4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
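
For reference, the restriction is applied in the same way throughout this
series; a minimal sketch of the pattern shape (the insn name and
UNSPEC_EXAMPLE are hypothetical, not taken from the patch) is:

;; The "Bt" constraint restricts base/index registers of the memory
;; operand to the legacy GPRs, and (set_attr "gpr32" "0") marks the
;; insn as unable to encode r16-r31, so reload and post-reload passes
;; agree on the register-class limitation.
(define_insn "example_map2_insn"
  [(set (match_operand:V16QI 0 "register_operand" "=x")
	(unspec:V16QI [(match_operand:V16QI 1 "vector_operand" "xBt")]
		      UNSPEC_EXAMPLE))]
  "TARGET_SSE4_1"
  "%vexample\t{%1, %0|%0, %1}"
  [(set_attr "type" "sselog1")
   (set_attr "gpr32" "0")
   (set_attr "prefix" "maybe_vex")
   (set_attr "mode" "TI")])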

gcc/ChangeLog:

	* config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
	prototype.
	* config/i386/i386.cc (x86_evex_reg_mentioned_p): New
	function.
	* config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0
	and constraint Bt/BM to all non-evex alternatives, adjust
	alternative outputs if evex reg is mentioned.
	* config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0
	and constraint Bt/BM to all non-evex alternatives.
	(ptesttf2): Likewise.
	(<sse4_1>_round<ssemodesuffix><avxsizesuffix>): Likewise.
	(sse4_1_round<ssescalarmodesuffix>): Likewise.
	(sse4_2_pcmpestri): Likewise.
	(sse4_2_pcmpestrm): Likewise.
	(sse4_2_pcmpestr_cconly): Likewise.
	(sse4_2_pcmpistr): Likewise.
	(sse4_2_pcmpistri): Likewise.
	(sse4_2_pcmpistrm): Likewise.
	(sse4_2_pcmpistr_cconly): Likewise.
	(aesimc): Likewise.
	(aeskeygenassist): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
	tests.
---
 gcc/config/i386/i386-protos.h                 |  1 +
 gcc/config/i386/i386.cc                       | 13 +++
 gcc/config/i386/i386.md                       |  3 +-
 gcc/config/i386/sse.md                        | 93 +++++++++++++------
 .../i386/apx-legacy-insn-check-norex2.c       | 55 ++++++++++-
 5 files changed, 132 insertions(+), 33 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 78eb3e0f584..bbb219e3039 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
 extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
 extern bool x86_extended_reg_mentioned_p (rtx);
 extern bool x86_extended_rex2reg_mentioned_p (rtx);
+extern bool x86_evex_reg_mentioned_p (rtx [], int);
 extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
 extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index f5d642948bc..ec93c5bab97 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -22936,6 +22936,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
   return false;
 }
 
+/* Return true when the rtx operands mention a register that must be
+   encoded using the evex prefix.  */
+bool
+x86_evex_reg_mentioned_p (rtx operands[], int nops)
+{
+  int i;
+  for (i = 0; i < nops; i++)
+    if (EXT_REX_SSE_REG_P (operands[i])
+	|| x86_extended_rex2reg_mentioned_p (operands[i]))
+      return true;
+  return false;
+}
+
 /* If profitable, negate (without causing overflow) integer constant
    of mode MODE at location LOC.  Return true in this case.  */
 bool
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 83ad01b43c1..4c305e72389 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -21603,7 +21603,7 @@ (define_expand "significand<mode>2"
 (define_insn "sse4_1_round<mode>2"
   [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
 	(unspec:MODEFH
-	  [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
+	  [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,Bt,v,m")
 	   (match_operand:SI 2 "const_0_to_15_operand")]
 	  UNSPEC_ROUND))]
   "TARGET_SSE4_1"
@@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round<mode>2"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix_extra" "1,1,1,*,*")
    (set_attr "length_immediate" "1")
+   (set_attr "gpr32" "1,1,0,1,1")
    (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
    (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
    (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 05963de9219..456713b991a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -22617,11 +22617,12 @@ (define_insn "avx2_pblendd<mode>"
 
 (define_insn "sse4_1_phminposuw"
   [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
-	(unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,xm")]
+	(unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBT,*xBT,xBt")]
 		     UNSPEC_PHMINPOSUW))]
   "TARGET_SSE4_1"
   "%vphminposuw\t{%1, %0|%0, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,vex")
@@ -23810,12 +23811,13 @@ (define_insn "avx_vtest<ssemodesuffix><avxsizesuffix>"
 (define_insn "*<sse4_1>_ptest<mode>"
   [(set (reg FLAGS_REG)
 	(unspec [(match_operand:V_AVX 0 "register_operand" "Yr, *x, x")
-		 (match_operand:V_AVX 1 "vector_operand" "YrBm, *xBm, xm")]
+		 (match_operand:V_AVX 1 "vector_operand" "YrBT, *xBT, xBt")]
 		UNSPEC_PTEST))]
   "TARGET_SSE4_1 && ix86_match_ptest_ccmode (insn)"
   "%vptest\t{%1, %0|%0, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecomi")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,vex")
    (set (attr "btver2_decode")
@@ -23852,12 +23854,13 @@ (define_expand "<sse4_1>_ptest<mode>"
 (define_insn "ptesttf2"
   [(set (reg:CC FLAGS_REG)
 	(unspec:CC [(match_operand:TF 0 "register_operand" "Yr, *x, x")
-		    (match_operand:TF 1 "vector_operand" "YrBm, *xBm, xm")]
+		    (match_operand:TF 1 "vector_operand" "YrBT, *xBT, xBt")]
 		   UNSPEC_PTEST))]
   "TARGET_SSE4_1"
   "%vptest\t{%1, %0|%0, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecomi")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
@@ -23968,13 +23971,14 @@ (define_expand "lrint<mode><sseintvecmodelower>2"
 (define_insn "<sse4_1>_round<ssemodesuffix><avxsizesuffix>"
   [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
 	(unspec:VF_128_256
-	  [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm,xm")
+	  [(match_operand:VF_128_256 1 "vector_operand" "YrBT,*xBT,xBt")
 	   (match_operand:SI 2 "const_0_to_15_operand")]
 	  UNSPEC_ROUND))]
   "TARGET_SSE4_1"
   "%vround<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecvt")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -24061,19 +24065,32 @@ (define_insn "sse4_1_round<ssescalarmodesuffix>"
   [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
-	    [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
+	    [(match_operand:VF_128 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
 	     (match_operand:SI 3 "const_0_to_15_operand")]
 	    UNSPEC_ROUND)
 	  (match_operand:VF_128 1 "register_operand" "0,0,x,v")
 	  (const_int 1)))]
   "TARGET_SSE4_1"
-  "@
-   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
-   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
-   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
-   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
-  [(set_attr "isa" "noavx,noavx,avx,avx512f")
+{
+  switch (which_alternative)
+    {
+      case 0:
+      case 1:
+	return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}";
+      case 2:
+	return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
+      case 3:
+	if (x86_evex_reg_mentioned_p (operands, 3))
+	  return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
+	else
+	  return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
+      default:
+	gcc_unreachable ();
+    }
+}
+  [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
    (set_attr "type" "ssecvt")
+   (set_attr "gpr32" "0,0,0,1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_data16" "1,1,*,*")
    (set_attr "prefix_extra" "1")
@@ -24085,19 +24102,32 @@ (define_insn "*sse4_1_round<ssescalarmodesuffix>"
 	(vec_merge:VFH_128
 	  (vec_duplicate:VFH_128
 	    (unspec:<ssescalarmode>
-	      [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
+	      [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
 	       (match_operand:SI 3 "const_0_to_15_operand")]
 	      UNSPEC_ROUND))
 	  (match_operand:VFH_128 1 "register_operand" "0,0,x,v")
 	  (const_int 1)))]
   "TARGET_SSE4_1"
-  "@
-   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,noavx,avx,avx512f")
+{
+  switch (which_alternative)
+    {
+      case 0:
+      case 1:
+	return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}";
+      case 2:
+	return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+      case 3:
+	if (x86_evex_reg_mentioned_p (operands, 3) || <MODE>mode == V8HFmode)
+	  return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+	else
+	  return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+      default:
+	gcc_unreachable ();
+    }
+}
+  [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
    (set_attr "type" "ssecvt")
+   (set_attr "gpr32" "0,0,0,1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix_data16" "1,1,*,*")
    (set_attr "prefix_extra" "1")
@@ -24318,7 +24348,7 @@ (define_insn "sse4_2_pcmpestri"
 	(unspec:SI
 	  [(match_operand:V16QI 1 "register_operand" "x,x")
 	   (match_operand:SI 2 "register_operand" "a,a")
-	   (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
+	   (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
 	   (match_operand:SI 4 "register_operand" "d,d")
 	   (match_operand:SI 5 "const_0_to_255_operand")]
 	  UNSPEC_PCMPESTR))
@@ -24333,6 +24363,7 @@ (define_insn "sse4_2_pcmpestri"
   "TARGET_SSE4_2"
   "%vpcmpestri\t{%5, %3, %1|%1, %3, %5}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_vex")
    (set_attr "length_immediate" "1")
@@ -24345,7 +24376,7 @@ (define_insn "sse4_2_pcmpestrm"
 	(unspec:V16QI
 	  [(match_operand:V16QI 1 "register_operand" "x,x")
 	   (match_operand:SI 2 "register_operand" "a,a")
-	   (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
+	   (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
 	   (match_operand:SI 4 "register_operand" "d,d")
 	   (match_operand:SI 5 "const_0_to_255_operand")]
 	  UNSPEC_PCMPESTR))
@@ -24360,6 +24391,7 @@ (define_insn "sse4_2_pcmpestrm"
   "TARGET_SSE4_2"
   "%vpcmpestrm\t{%5, %3, %1|%1, %3, %5}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -24372,7 +24404,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
 	(unspec:CC
 	  [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
 	   (match_operand:SI 3 "register_operand" "a,a,a,a")
-	   (match_operand:V16QI 4 "nonimmediate_operand" "x,m,x,m")
+	   (match_operand:V16QI 4 "nonimmediate_operand" "x,Bt,x,Bt")
 	   (match_operand:SI 5 "register_operand" "d,d,d,d")
 	   (match_operand:SI 6 "const_0_to_255_operand")]
 	  UNSPEC_PCMPESTR))
@@ -24385,6 +24417,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
    %vpcmpestri\t{%6, %4, %2|%2, %4, %6}
    %vpcmpestri\t{%6, %4, %2|%2, %4, %6}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "memory" "none,load,none,load")
@@ -24396,7 +24429,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
   [(set (match_operand:SI 0 "register_operand" "=c,c")
 	(unspec:SI
 	  [(match_operand:V16QI 2 "register_operand" "x,x")
-	   (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
+	   (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
 	   (match_operand:SI 4 "const_0_to_255_operand")]
 	  UNSPEC_PCMPISTR))
    (set (match_operand:V16QI 1 "register_operand" "=Yz,Yz")
@@ -24439,6 +24472,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
   DONE;
 }
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "memory" "none,load")
@@ -24448,7 +24482,7 @@ (define_insn "sse4_2_pcmpistri"
   [(set (match_operand:SI 0 "register_operand" "=c,c")
 	(unspec:SI
 	  [(match_operand:V16QI 1 "register_operand" "x,x")
-	   (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
+	   (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_PCMPISTR))
    (set (reg:CC FLAGS_REG)
@@ -24460,6 +24494,7 @@ (define_insn "sse4_2_pcmpistri"
   "TARGET_SSE4_2"
   "%vpcmpistri\t{%3, %2, %1|%1, %2, %3}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -24471,7 +24506,7 @@ (define_insn "sse4_2_pcmpistrm"
   [(set (match_operand:V16QI 0 "register_operand" "=Yz,Yz")
 	(unspec:V16QI
 	  [(match_operand:V16QI 1 "register_operand" "x,x")
-	   (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
+	   (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_PCMPISTR))
    (set (reg:CC FLAGS_REG)
@@ -24483,6 +24518,7 @@ (define_insn "sse4_2_pcmpistrm"
   "TARGET_SSE4_2"
   "%vpcmpistrm\t{%3, %2, %1|%1, %2, %3}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -24494,7 +24530,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
   [(set (reg:CC FLAGS_REG)
 	(unspec:CC
 	  [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
-	   (match_operand:V16QI 3 "nonimmediate_operand" "x,m,x,m")
+	   (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt,x,Bt")
 	   (match_operand:SI 4 "const_0_to_255_operand")]
 	  UNSPEC_PCMPISTR))
    (clobber (match_scratch:V16QI 0 "=Yz,Yz,X,X"))
@@ -24506,6 +24542,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
    %vpcmpistri\t{%4, %3, %2|%2, %3, %4}
    %vpcmpistri\t{%4, %3, %2|%2, %3, %4}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "memory" "none,load,none,load")
@@ -25990,23 +26027,25 @@ (define_insn "aesdeclast"
 
 (define_insn "aesimc"
   [(set (match_operand:V2DI 0 "register_operand" "=x")
-	(unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")]
+	(unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")]
 		      UNSPEC_AESIMC))]
   "TARGET_AES"
   "%vaesimc\t{%1, %0|%0, %1}"
   [(set_attr "type" "sselog1")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
 (define_insn "aeskeygenassist"
   [(set (match_operand:V2DI 0 "register_operand" "=x")
-	(unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")
+	(unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")
 		      (match_operand:SI 2 "const_0_to_255_operand")]
 		     UNSPEC_AESKEYGENASSIST))]
   "TARGET_AES"
   "%vaeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sselog1")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
index 510213a6ca7..771bcb078e1 100644
--- a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
+++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
@@ -45,13 +45,22 @@ typedef union
   DTYPE a[16];
 } tmp_u;
 
-__attribute__((target("sse4.2")))
+__attribute__((target("sse4.2,aes")))
 void sse_test ()
 {
   register tmp_u *tdst __asm__("%r16");
   register tmp_u *src1 __asm__("%r17");
   register tmp_u *src2 __asm__("%r18");
- 
+
+  src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
+  src1->a[2] = _mm_testc_si128 (src1->xi[3], src2->xi[4]);
+  src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
+			      _MM_FROUND_CUR_DIRECTION);
+  src1->xf[4] = _mm_round_ps (src1->xf[7], _MM_FROUND_CUR_DIRECTION);
+  src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
+			      _MM_FROUND_CUR_DIRECTION);
+  src1->xd[1] = _mm_round_pd (src1->xd[4], _MM_FROUND_CUR_DIRECTION);
+
   src1->xi[0] = _mm_hadd_epi16 (tdst->xi[2], src2->xi[3]);
   src1->xi[1] = _mm_hadd_epi32 (tdst->xi[0], src2->xi[1]);
   tdst->xi[2] = _mm_hadds_epi16 (src1->xi[4], src2->xi[5]);
@@ -77,16 +86,33 @@ void sse_test ()
   tdst->xi[1] = _mm_sign_epi8 (src1->xi[5], src2->xi[6]);
   tdst->xi[2] = _mm_sign_epi16 (src1->xi[7], src2->xi[0]);
   tdst->xi[3] = _mm_sign_epi32 (src1->xi[1], src2->xi[2]);
+
+  tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
+  tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
+  tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
+  tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
+
+  tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
+  tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
 }
 
-__attribute__((target("avx2")))
+__attribute__((target("avx2,aes")))
 void vex_test ()
 {
 
   register tmp_u *tdst __asm__("%r16");
   register tmp_u *src1 __asm__("%r17");
   register tmp_u *src2 __asm__("%r18");
-  
+ 
+  src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
+  src1->a[2] = _mm256_testc_si256 (src1->yi[2], src2->yi[3]);
+  src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
+			      _MM_FROUND_CUR_DIRECTION);
+  src1->yf[4] = _mm256_round_ps (src1->yf[2], _MM_FROUND_CUR_DIRECTION);
+  src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
+			      _MM_FROUND_CUR_DIRECTION);
+  src1->yd[1] = _mm256_round_pd (src1->yd[3], _MM_FROUND_CUR_DIRECTION);
+ 
   src1->yi[1] = _mm256_hadd_epi16 (tdst->yi[2], src2->yi[3]);
   src1->yi[2] = _mm256_hadd_epi32 (tdst->yi[0], src2->yi[1]);
   tdst->yi[3] = _mm256_hadds_epi16 (src1->yi[1], src2->yi[2]);
@@ -98,7 +124,6 @@ void vex_test ()
   src1->yi[1] = _mm256_cmpgt_epi64 (tdst->yi[3], src2->yi[0]);
 
   tdst->yf[2] = _mm256_dp_ps (src1->yf[0], src2->yf[1], 0xbf);
-  tdst->xd[3] = _mm_dp_pd (src1->xd[0], src2->xd[1], 0xbf);
 
   tdst->yi[3] = _mm256_mpsadbw_epu8 (src1->yi[1], src2->yi[1], 0xc1);
 
@@ -112,6 +137,14 @@ void vex_test ()
   tdst->yi[2] = _mm256_sign_epi8 (src1->yi[0], src2->yi[1]);
   tdst->yi[3] = _mm256_sign_epi16 (src1->yi[2], src2->yi[3]);
   tdst->yi[0] = _mm256_sign_epi32 (src1->yi[0], src2->yi[1]);
+
+  tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
+  tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
+  tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
+  tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
+
+  tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
+  tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
 }
 
 /* { dg-final { scan-assembler-not "v?pcmpeqq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
@@ -134,3 +167,15 @@ void vex_test ()
 /* { dg-final { scan-assembler-not "v?psignb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
 /* { dg-final { scan-assembler-not "v?psignw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
 /* { dg-final { scan-assembler-not "v?psignd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phminposuw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?ptest\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?roundss\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?roundsd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?roundps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?roundpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pcmpestri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pcmpistri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pcmpestrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pcmpistrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?aesimc\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?aeskeygenassist\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
-- 
2.31.1



* [PATCH 12/13] [APX_EGPR] Handle legacy insns that only support GPR16 (4/5)
  2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
                   ` (10 preceding siblings ...)
  2023-08-31  8:20 ` [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5) Hongyu Wang
@ 2023-08-31  8:20 ` Hongyu Wang
  2023-08-31  8:20 ` [PATCH 13/13] [APX EGPR] Handle vex insns that only support GPR16 (5/5) Hongyu Wang
  2023-08-31  9:19 ` [PATCH 00/13] [RFC] Support Intel APX EGPR Richard Biener
  13 siblings, 0 replies; 49+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

APX-enabled hardware is also expected to be AVX10-enabled, so for map2/3
insns that have an evex counterpart we assume auto promotion to EGPR under
APX_F when the insn uses GPR32.  For the insns listed below, we therefore
disable EGPR usage for their SSE mnemonics while still allowing EGPR
generation for their v-prefixed mnemonics.

insn list:
1. pabsb/pabsw/pabsd
2. pextrb/pextrw/pextrd/pextrq
3. pinsrb/pinsrd/pinsrq
4. pshufb
5. extractps/insertps
6. pmaddubsw
7. pmulhrsw
8. packusdw
9. palignr
10. movntdqa
11. mpsadbw
12. pmuldq/pmulld
13. pmaxsb/pmaxsd, pminsb/pminsd
    pmaxud/pmaxuw, pminud/pminuw
14. (pmovsxbw/pmovsxbd/pmovsxbq,
     pmovsxwd/pmovsxwq, pmovsxdq
     pmovzxbw/pmovzxbd/pmovzxbq,
     pmovzxwd/pmovzxwq, pmovzxdq)
15. aesdec/aesdeclast, aesenc/aesenclast
16. pclmulqdq
17. gf2p8affineqb/gf2p8affineinvqb/gf2p8mulb

gcc/ChangeLog:

	* config/i386/i386.md (*movhi_internal): Split the pextrw
	alternative with a mem constraint that does not support EGPR
	into avx/noavx alternatives; use Bt and set attr gpr32 to 0
	for the noavx alternative.
	(*mov<mode>_internal): Likewise.
	* config/i386/mmx.md (mmx_pshufbv8qi3): Change "r/m/Bm" to
	"h/Bt/BT" and set_attr gpr32 0 for noavx alternative.
	(mmx_pshufbv4qi3): Likewise.
	(*mmx_pinsrd): Likewise.
	(*mmx_pinsrb): Likewise.
	(*pinsrb): Likewise.
	(mmx_pshufbv8qi3): Likewise.
	(mmx_pshufbv4qi3): Likewise.
	(@sse4_1_insertps_<mode>): Likewise.
	(*mmx_pextrw): Split alternatives and map non-EGPR
	constraints, attr_gpr32 and attr_isa to noavx mnemonics.
	(*movv2qi_internal): Likewise.
	(*pextrw): Likewise.
	(*mmx_pextrb): Likewise.
	(*mmx_pextrb_zext): Likewise.
	(*pextrb): Likewise.
	(*pextrb_zext): Likewise.
	(vec_extractv2si_1): Likewise.
	(vec_extractv2si_1_zext): Likewise.
	* config/i386/sse.md (vi128_h_r): New mode attr for
	pinsr{bw}/pextr{bw} with reg operand.
	(*abs<mode>2): Split alternatives and %v in mnemonics, map
	non-EGPR constraints, gpr32 and isa attrs to noavx mnemonics.
	(*vec_extract<mode>): Likewise.
	(*vec_extract<mode>): Likewise for HFBF pattern.
	(*vec_extract<PEXTR_MODE12:mode>_zext): Likewise.
	(*vec_extractv4si_1): Likewise.
	(*vec_extractv4si_zext): Likewise.
	(*vec_extractv2di_1): Likewise.
	(*vec_concatv2si_sse4_1): Likewise.
	(<sse2p4_1>_pinsr<ssemodesuffix>): Likewise.
	(vec_concatv2di): Likewise.
	(*sse4_1_<code>v2qiv2di2<mask_name>_1): Likewise.
	(<ssse3_avx2>_pshufb<mode>3<mask_name>): Change "r/m/Bm" to
	"h/Bt/BT" and set_attr gpr32 0 for noavx alternative, split
	%v for avx/noavx alternatives if necessary.
	(*vec_concatv2sf_sse4_1): Likewise.
	(*sse4_1_extractps): Likewise.
	(vec_set<mode>_0): Likewise for VI4F_128.
	(*vec_setv4sf_sse4_1): Likewise.
	(@sse4_1_insertps<mode>): Likewise.
	(ssse3_pmaddubsw128): Likewise.
	(*<ssse3_avx2>_pmulhrsw<mode>3<mask_name>): Likewise.
	(<sse4_1_avx2>_packusdw<mask_name>): Likewise.
	(<ssse3_avx2>_palignr<mode>): Likewise.
	(<vi8_sse4_1_avx2_avx512>_movntdqa): Likewise.
	(<sse4_1_avx2>_mpsadbw): Likewise.
	(*sse4_1_mulv2siv2di3<mask_name>): Likewise.
	(*<sse4_1_avx2>_mul<mode>3<mask_name>): Likewise.
	(*sse4_1_<code><mode>3<mask_name>): Likewise.
	(*<code>v8hi3): Likewise.
	(*<code>v16qi3): Likewise.
	(*sse4_1_<code>v8qiv8hi2<mask_name>_1): Likewise.
	(*sse4_1_zero_extendv8qiv8hi2_3): Likewise.
	(*sse4_1_zero_extendv8qiv8hi2_4): Likewise.
	(*sse4_1_<code>v4qiv4si2<mask_name>_1): Likewise.
	(*sse4_1_<code>v4hiv4si2<mask_name>_1): Likewise.
	(*sse4_1_zero_extendv4hiv4si2_3): Likewise.
	(*sse4_1_zero_extendv4hiv4si2_4): Likewise.
	(*sse4_1_<code>v2hiv2di2<mask_name>_1): Likewise.
	(*sse4_1_<code>v2siv2di2<mask_name>_1): Likewise.
	(*sse4_1_zero_extendv2siv2di2_3): Likewise.
	(*sse4_1_zero_extendv2siv2di2_4): Likewise.
	(aesdec): Likewise.
	(aesdeclast): Likewise.
	(aesenc): Likewise.
	(aesenclast): Likewise.
	(pclmulqdq): Likewise.
	(vgf2p8affineinvqb_<mode><mask_name>): Likewise.
	(vgf2p8affineqb_<mode><mask_name>): Likewise.
	(vgf2p8mulb_<mode><mask_name>): Likewise.
---
 gcc/config/i386/i386.md |  50 ++++---
 gcc/config/i386/mmx.md  | 159 ++++++++++++--------
 gcc/config/i386/sse.md  | 315 ++++++++++++++++++++++++++--------------
 3 files changed, 339 insertions(+), 185 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4c305e72389..8ec249b268d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2868,9 +2868,9 @@ (define_peephole2
 
 (define_insn "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand"
-    "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*v,*v,*v,m")
+    "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*v,*v,*v,Bt,m")
 	(match_operand:HI 1 "general_operand"
-    "r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r  ,C ,*v,m ,*v"))]
+    "r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r  ,C ,*v,m ,*x,*v"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))
    && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
@@ -2904,8 +2904,10 @@ (define_insn "*movhi_internal"
 
       if (SSE_REG_P (operands[0]))
 	return "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}";
+      else if (!TARGET_AVX)
+	return "pextrw\t{$0, %1, %0|%0, %1, 0}";
       else
-	return "%vpextrw\t{$0, %1, %0|%0, %1, 0}";
+	return "vpextrw\t{$0, %1, %0|%0, %1, 0}";
 
     case TYPE_MSKLOG:
       if (operands[1] == const0_rtx)
@@ -2925,15 +2927,21 @@ (define_insn "*movhi_internal"
 	(cond [(eq_attr "alternative" "9,10,11,12,13")
 		  (const_string "sse2")
 	       (eq_attr "alternative" "14")
-		  (const_string "sse4")
+		  (const_string "sse4_noavx")
+	       (eq_attr "alternative" "15")
+		  (const_string "avx")
 	       ]
 	       (const_string "*")))
+   (set (attr "gpr32")
+	(if_then_else (eq_attr "alternative" "14")
+		      (const_string "0")
+		      (const_string "1")))
    (set (attr "type")
      (cond [(eq_attr "alternative" "4,5,6,7")
 	      (const_string "mskmov")
 	    (eq_attr "alternative" "8")
 	      (const_string "msklog")
-	    (eq_attr "alternative" "13,14")
+	    (eq_attr "alternative" "13,14,15")
 	      (if_then_else (match_test "TARGET_AVX512FP16")
 		(const_string "ssemov")
 		(const_string "sselog1"))
@@ -2958,7 +2966,7 @@ (define_insn "*movhi_internal"
    (set (attr "prefix")
 	(cond [(eq_attr "alternative" "4,5,6,7,8")
 		 (const_string "vex")
-	       (eq_attr "alternative" "9,10,11,12,13,14")
+	       (eq_attr "alternative" "9,10,11,12,13,14,15")
 		 (const_string "maybe_evex")
 	      ]
 	      (const_string "orig")))
@@ -2967,7 +2975,7 @@ (define_insn "*movhi_internal"
 	      (if_then_else (match_test "TARGET_AVX512FP16")
 		(const_string "HI")
 		(const_string "SI"))
-	    (eq_attr "alternative" "13,14")
+	    (eq_attr "alternative" "13,14,15")
 	      (if_then_else (match_test "TARGET_AVX512FP16")
 		(const_string "HI")
 		(const_string "TI"))
@@ -4320,9 +4328,9 @@ (define_mode_attr hfbfconstf
 
 (define_insn "*mov<mode>_internal"
  [(set (match_operand:HFBF 0 "nonimmediate_operand"
-	 "=?r,?r,?r,?m,v,v,?r,m,?v,v")
+	 "=?r,?r,?r,?m,v,v,?r,Bt,m,?v,v")
        (match_operand:HFBF 1 "general_operand"
-	 "r  ,F ,m ,r<hfbfconstf>,C,v, v,v,r ,m"))]
+	 "r  ,F ,m ,r<hfbfconstf>,C,v, v,v,v,r ,m"))]
  "!(MEM_P (operands[0]) && MEM_P (operands[1]))
   && (lra_in_progress
       || reload_completed
@@ -4347,8 +4355,10 @@ (define_insn "*mov<mode>_internal"
 
       if (SSE_REG_P (operands[0]))
 	return "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}";
+      else if (!TARGET_AVX)
+	return "pextrw\t{$0, %1, %0|%0, %1, 0}";
       else
-	return "%vpextrw\t{$0, %1, %0|%0, %1, 0}";
+	return "vpextrw\t{$0, %1, %0|%0, %1, 0}";
 
     default:
       if (get_attr_mode (insn) == MODE_SI)
@@ -4358,18 +4368,24 @@ (define_insn "*mov<mode>_internal"
     }
 }
   [(set (attr "isa")
-	(cond [(eq_attr "alternative" "4,5,6,8,9")
+	(cond [(eq_attr "alternative" "4,5,6,9,10")
 		 (const_string "sse2")
 	       (eq_attr "alternative" "7")
-		 (const_string "sse4")
+		 (const_string "sse4_noavx")
+	       (eq_attr "alternative" "8")
+		 (const_string "avx")
 	      ]
 	      (const_string "*")))
+   (set (attr "gpr32")
+	(if_then_else (eq_attr "alternative" "8")
+		      (const_string "0")
+		      (const_string "1")))
    (set (attr "type")
 	(cond [(eq_attr "alternative" "4")
 		 (const_string "sselog1")
-	       (eq_attr "alternative" "5,6,8")
+	       (eq_attr "alternative" "5,6,9")
 		 (const_string "ssemov")
-	       (eq_attr "alternative" "7,9")
+	       (eq_attr "alternative" "7,8,10")
 		 (if_then_else
 		   (match_test ("TARGET_AVX512FP16"))
 		   (const_string "ssemov")
@@ -4389,19 +4405,19 @@ (define_insn "*mov<mode>_internal"
 		 ]
 	      (const_string "imov")))
    (set (attr "prefix")
-	(cond [(eq_attr "alternative" "4,5,6,7,8,9")
+	(cond [(eq_attr "alternative" "4,5,6,7,8,9,10")
 		 (const_string "maybe_vex")
 	      ]
 	      (const_string "orig")))
    (set (attr "mode")
 	(cond [(eq_attr "alternative" "4")
 		 (const_string "V4SF")
-	       (eq_attr "alternative" "6,8")
+	       (eq_attr "alternative" "6,9")
 		 (if_then_else
 		   (match_test "TARGET_AVX512FP16")
 		   (const_string "HI")
 		   (const_string "SI"))
-	       (eq_attr "alternative" "7,9")
+	       (eq_attr "alternative" "7,8,10")
 		 (if_then_else
 		   (match_test "TARGET_AVX512FP16")
 		   (const_string "HI")
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index ef578222945..63803c89f2b 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -418,9 +418,9 @@ (define_expand "movv2qi"
 
 (define_insn "*movv2qi_internal"
   [(set (match_operand:V2QI 0 "nonimmediate_operand"
-    "=r,r,r,m ,v,v,v,m,r,v")
+    "=r,r,r,m ,v,v,v,Bt,m,r,v")
 	(match_operand:V2QI 1 "general_operand"
-    "r ,C,m,rC,C,v,m,v,v,r"))]
+    "r ,C,m,rC,C,v,m,x,v,v,r"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -442,8 +442,10 @@ (define_insn "*movv2qi_internal"
 
       if (SSE_REG_P (operands[0]))
 	return "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}";
+      else if (!TARGET_AVX)
+	return "pextrw\t{$0, %1, %0|%0, %1, 0}";
       else
-	return "%vpextrw\t{$0, %1, %0|%0, %1, 0}";
+	return "vpextrw\t{$0, %1, %0|%0, %1, 0}";
 
     case TYPE_SSEMOV:
       return ix86_output_ssemov (insn, operands);
@@ -453,20 +455,26 @@ (define_insn "*movv2qi_internal"
     }
 }
   [(set (attr "isa")
-	(cond [(eq_attr "alternative" "6,8,9")
+	(cond [(eq_attr "alternative" "6,9,10")
 		  (const_string "sse2")
 	       (eq_attr "alternative" "7")
-		  (const_string "sse4")
+		  (const_string "sse4_noavx")
+	       (eq_attr "alternative" "8")
+		  (const_string "avx")
 	       ]
 	       (const_string "*")))
+   (set (attr "gpr32")
+	(if_then_else (eq_attr "alternative" "7")
+		      (const_string "0")
+		      (const_string "1")))
    (set (attr "type")
-     (cond [(eq_attr "alternative" "6,7")
+     (cond [(eq_attr "alternative" "6,7,8")
 	      (if_then_else (match_test "TARGET_AVX512FP16")
 		(const_string "ssemov")
 		(const_string "sselog1"))
 	    (eq_attr "alternative" "4")
 	      (const_string "sselog1")
-	    (eq_attr "alternative" "5,8,9")
+	    (eq_attr "alternative" "5,9,10")
 	      (const_string "ssemov")
 	    (match_test "optimize_function_for_size_p (cfun)")
 	      (const_string "imov")
@@ -483,16 +491,16 @@ (define_insn "*movv2qi_internal"
 	   ]
 	   (const_string "imov")))
    (set (attr "prefix")
-	(cond [(eq_attr "alternative" "4,5,6,7,8,9")
+	(cond [(eq_attr "alternative" "4,5,6,7,8,9,10")
 		 (const_string "maybe_evex")
 	      ]
 	      (const_string "orig")))
    (set (attr "mode")
-     (cond [(eq_attr "alternative" "6,7")
+     (cond [(eq_attr "alternative" "6,7,8")
 	      (if_then_else (match_test "TARGET_AVX512FP16")
 		(const_string "HI")
 		(const_string "TI"))
-	    (eq_attr "alternative" "8,9")
+	    (eq_attr "alternative" "9,10")
 	      (if_then_else (match_test "TARGET_AVX512FP16")
 		(const_string "HI")
 		(const_string "SI"))
@@ -526,9 +534,9 @@ (define_insn "*movv2qi_internal"
 	    ]
 	    (const_string "HI")))
    (set (attr "preferred_for_speed")
-     (cond [(eq_attr "alternative" "8")
+     (cond [(eq_attr "alternative" "9")
 	      (symbol_ref "TARGET_INTER_UNIT_MOVES_FROM_VEC")
-	    (eq_attr "alternative" "9")
+	    (eq_attr "alternative" "10")
 	      (symbol_ref "TARGET_INTER_UNIT_MOVES_TO_VEC")
 	   ]
 	   (symbol_ref "true")))])
@@ -1167,7 +1175,7 @@ (define_expand "vcond<mode>v2sf"
 (define_insn "@sse4_1_insertps_<mode>"
   [(set (match_operand:V2FI 0 "register_operand" "=Yr,*x,v")
 	(unspec:V2FI
-	  [(match_operand:V2FI 2 "nonimmediate_operand" "Yrm,*xm,vm")
+	  [(match_operand:V2FI 2 "nonimmediate_operand" "YrBt,*xBt,vm")
 	   (match_operand:V2FI 1 "register_operand" "0,0,v")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_INSERTPS))]
@@ -1193,6 +1201,7 @@ (define_insn "@sse4_1_insertps_<mode>"
     }
 }
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "sselog")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
@@ -3952,7 +3961,7 @@ (define_insn "*mmx_pinsrd"
   [(set (match_operand:V2SI 0 "register_operand" "=x,Yv")
         (vec_merge:V2SI
           (vec_duplicate:V2SI
-            (match_operand:SI 2 "nonimmediate_operand" "rm,rm"))
+            (match_operand:SI 2 "nonimmediate_operand" "hBt,rm"))
 	  (match_operand:V2SI 1 "register_operand" "0,Yv")
           (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE
@@ -3971,6 +3980,7 @@ (define_insn "*mmx_pinsrd"
     }
 }
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix_extra" "1")
    (set_attr "type" "sselog")
    (set_attr "length_immediate" "1")
@@ -4031,7 +4041,7 @@ (define_insn "*mmx_pinsrb"
   [(set (match_operand:V8QI 0 "register_operand" "=x,YW")
         (vec_merge:V8QI
           (vec_duplicate:V8QI
-            (match_operand:QI 2 "nonimmediate_operand" "rm,rm"))
+            (match_operand:QI 2 "nonimmediate_operand" "hBt,rm"))
 	  (match_operand:V8QI 1 "register_operand" "0,YW")
           (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE
@@ -4057,28 +4067,31 @@ (define_insn "*mmx_pinsrb"
 }
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sselog")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*mmx_pextrw"
-  [(set (match_operand:HI 0 "register_sse4nonimm_operand" "=r,r,m")
+  [(set (match_operand:HI 0 "register_sse4nonimm_operand" "=r,r,Bt,m")
 	(vec_select:HI
-	  (match_operand:V4HI 1 "register_operand" "y,YW,YW")
+	  (match_operand:V4HI 1 "register_operand" "y,YW,YW,YW")
 	  (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
    && (TARGET_SSE || TARGET_3DNOW_A)"
   "@
    pextrw\t{%2, %1, %k0|%k0, %1, %2}
    %vpextrw\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextrw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "*,sse2,sse4")
-   (set_attr "mmx_isa" "native,*,*")
-   (set_attr "type" "mmxcvt,sselog1,sselog1")
+   pextrw\t{%2, %1, %0|%0, %1, %2}
+   vpextrw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,sse2,sse4_noavx,avx")
+   (set_attr "gpr32" "1,1,0,1")
+   (set_attr "mmx_isa" "native,*,*,*")
+   (set_attr "type" "mmxcvt,sselog1,sselog1,sselog1")
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,maybe_vex,maybe_vex")
-   (set_attr "mode" "DI,TI,TI")])
+   (set_attr "prefix" "orig,maybe_vex,maybe_vex,maybe_evex")
+   (set_attr "mode" "DI,TI,TI,TI")])
 
 (define_insn "*mmx_pextrw_zext"
   [(set (match_operand:SWI48 0 "register_operand" "=r,r")
@@ -4099,29 +4112,36 @@ (define_insn "*mmx_pextrw_zext"
    (set_attr "mode" "DI,TI")])
 
 (define_insn "*mmx_pextrb"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,m")
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=h,Bt,r,m")
 	(vec_select:QI
-	  (match_operand:V8QI 1 "register_operand" "YW,YW")
+	  (match_operand:V8QI 1 "register_operand" "YW,YW,YW,YW")
 	  (parallel [(match_operand:SI 2 "const_0_to_7_operand")])))]
   "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
   "@
-   %vpextrb\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextrb\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "sselog1")
+   pextrb\t{%2, %1, %k0|%k0, %1, %2}
+   pextrb\t{%2, %1, %0|%0, %1, %2}
+   vpextrb\t{%2, %1, %k0|%k0, %1, %2}
+   vpextrb\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,noavx,avx,avx")
+   (set_attr "gpr32" "1,0,1,1")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*mmx_pextrb_zext"
-  [(set (match_operand:SWI248 0 "register_operand" "=r")
+  [(set (match_operand:SWI248 0 "register_operand" "=h,r")
 	(zero_extend:SWI248
 	  (vec_select:QI
-	    (match_operand:V8QI 1 "register_operand" "YW")
+	    (match_operand:V8QI 1 "register_operand" "YW,YW")
 	    (parallel [(match_operand:SI 2 "const_0_to_7_operand")]))))]
   "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
-  "%vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
+  "@
+   pextrb\t{%2, %1, %k0|%k0, %1, %2}
+   vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -4131,13 +4151,14 @@ (define_insn "mmx_pshufbv8qi3"
   [(set (match_operand:V8QI 0 "register_operand" "=x,Yw")
 	(unspec:V8QI
 	  [(match_operand:V8QI 1 "register_operand" "0,Yw")
-	   (match_operand:V16QI 2 "vector_operand" "xBm,Ywm")]
+	   (match_operand:V16QI 2 "vector_operand" "xBT,Ywm")]
 	  UNSPEC_PSHUFB))]
   "TARGET_SSSE3 && TARGET_MMX_WITH_SSE"
   "@
    pshufb\t{%2, %0|%0, %2}
    vpshufb\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,maybe_evex")
@@ -4148,13 +4169,14 @@ (define_insn "mmx_pshufbv4qi3"
   [(set (match_operand:V4QI 0 "register_operand" "=x,Yw")
 	(unspec:V4QI
 	  [(match_operand:V4QI 1 "register_operand" "0,Yw")
-	   (match_operand:V16QI 2 "vector_operand" "xBm,Ywm")]
+	   (match_operand:V16QI 2 "vector_operand" "xBT,Ywm")]
 	  UNSPEC_PSHUFB))]
   "TARGET_SSSE3"
   "@
    pshufb\t{%2, %0|%0, %2}
    vpshufb\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,maybe_evex")
@@ -4414,29 +4436,31 @@ (define_split
 ;; Avoid combining registers from different units in a single alternative,
 ;; see comment above inline_secondary_memory_needed function in i386.cc
 (define_insn "*vec_extractv2si_1"
-  [(set (match_operand:SI 0 "nonimmediate_operand"     "=y,rm,x,x,y,x,r")
+  [(set (match_operand:SI 0 "nonimmediate_operand"     "=y,hBt,rm,x,x,y,x,r")
 	(vec_select:SI
-	  (match_operand:V2SI 1 "nonimmediate_operand" " 0,x ,x,0,o,o,o")
+	  (match_operand:V2SI 1 "nonimmediate_operand" " 0,x,  x ,x,0,o,o,o")
 	  (parallel [(const_int 1)])))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
    && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "@
    punpckhdq\t%0, %0
-   %vpextrd\t{$1, %1, %0|%0, %1, 1}
+   pextrd\t{$1, %1, %0|%0, %1, 1}
+   vpextrd\t{$1, %1, %0|%0, %1, 1}
    %vpshufd\t{$0xe5, %1, %0|%0, %1, 0xe5}
    shufps\t{$0xe5, %0, %0|%0, %0, 0xe5}
    #
    #
    #"
-  [(set_attr "isa" "*,sse4,sse2,noavx,*,*,*")
-   (set_attr "mmx_isa" "native,*,*,*,native,*,*")
-   (set_attr "type" "mmxcvt,ssemov,sseshuf1,sseshuf1,mmxmov,ssemov,imov")
+  [(set_attr "isa" "*,sse4_noavx,avx,sse2,noavx,*,*,*")
+   (set_attr "gpr32" "1,0,1,1,1,1,1,1")
+   (set_attr "mmx_isa" "native,*,*,*,*,native,*,*")
+   (set_attr "type" "mmxcvt,ssemov,ssemov,sseshuf1,sseshuf1,mmxmov,ssemov,imov")
    (set (attr "length_immediate")
-     (if_then_else (eq_attr "alternative" "1,2,3")
+     (if_then_else (eq_attr "alternative" "1,2,3,4")
 		   (const_string "1")
 		   (const_string "*")))
-   (set_attr "prefix" "orig,maybe_vex,maybe_vex,orig,orig,orig,orig")
-   (set_attr "mode" "DI,TI,TI,V4SF,SI,SI,SI")])
+   (set_attr "prefix" "orig,orig,maybe_evex,maybe_vex,orig,orig,orig,orig")
+   (set_attr "mode" "DI,TI,TI,TI,V4SF,SI,SI,SI")])
 
 (define_split
   [(set (match_operand:SI 0 "register_operand")
@@ -4448,15 +4472,18 @@ (define_split
   "operands[1] = adjust_address (operands[1], SImode, 4);")
 
 (define_insn "*vec_extractv2si_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=h,r")
 	(zero_extend:DI
 	  (vec_select:SI
-	    (match_operand:V2SI 1 "register_operand" "x")
+	    (match_operand:V2SI 1 "register_operand" "x,x")
 	    (parallel [(const_int 1)]))))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
    && TARGET_64BIT && TARGET_SSE4_1"
-  "%vpextrd\t{$1, %1, %k0|%k0, %1, 1}"
-  [(set_attr "type" "sselog1")
+  "@
+   pextrd\t{$1, %1, %k0|%k0, %1, 1}
+   vpextrd\t{$1, %1, %k0|%k0, %1, 1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -4606,7 +4633,7 @@ (define_insn "*pinsrb"
   [(set (match_operand:V4QI 0 "register_operand" "=x,YW")
         (vec_merge:V4QI
           (vec_duplicate:V4QI
-            (match_operand:QI 2 "nonimmediate_operand" "rm,rm"))
+            (match_operand:QI 2 "nonimmediate_operand" "hBt,rm"))
 	  (match_operand:V4QI 1 "register_operand" "0,YW")
           (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE4_1
@@ -4631,6 +4658,7 @@ (define_insn "*pinsrb"
     }
 }
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -4638,15 +4666,17 @@ (define_insn "*pinsrb"
    (set_attr "mode" "TI")])
 
 (define_insn "*pextrw"
-  [(set (match_operand:HI 0 "register_sse4nonimm_operand" "=r,m")
+  [(set (match_operand:HI 0 "register_sse4nonimm_operand" "=r,Bt,m")
 	(vec_select:HI
-	  (match_operand:V2HI 1 "register_operand" "YW,YW")
+	  (match_operand:V2HI 1 "register_operand" "YW,YW,YW")
 	  (parallel [(match_operand:SI 2 "const_0_to_1_operand")])))]
   "TARGET_SSE2"
   "@
    %vpextrw\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextrw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "*,sse4")
+   pextrw\t{%2, %1, %0|%0, %1, %2}
+   vpextrw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,sse4_noavx,avx")
+   (set_attr "gpr32" "1,0,1")
    (set_attr "type" "sselog1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -4666,29 +4696,36 @@ (define_insn "*pextrw_zext"
    (set_attr "mode" "TI")])
 
 (define_insn "*pextrb"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,m")
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=h,Bt,r,m")
 	(vec_select:QI
-	  (match_operand:V4QI 1 "register_operand" "YW,YW")
+	  (match_operand:V4QI 1 "register_operand" "YW,YW,YW,YW")
 	  (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
   "TARGET_SSE4_1"
   "@
-   %vpextrb\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextrb\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "sselog1")
+   pextrb\t{%2, %1, %k0|%k0, %1, %2}
+   pextrb\t{%2, %1, %0|%0, %1, %2}
+   vpextrb\t{%2, %1, %k0|%k0, %1, %2}
+   vpextrb\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,noavx,avx,avx")
+   (set_attr "gpr32" "1,0,1,1")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*pextrb_zext"
-  [(set (match_operand:SWI248 0 "register_operand" "=r")
+  [(set (match_operand:SWI248 0 "register_operand" "=h,r")
 	(zero_extend:SWI248
 	  (vec_select:QI
-	    (match_operand:V4QI 1 "register_operand" "YW")
+	    (match_operand:V4QI 1 "register_operand" "YW,YW")
 	    (parallel [(match_operand:SI 2 "const_0_to_3_operand")]))))]
   "TARGET_SSE4_1"
-  "%vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
+  "@
+   pextrb\t{%2, %1, %k0|%k0, %1, %2}
+   vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 456713b991a..4913c34ed37 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10840,7 +10840,7 @@ (define_insn "*vec_concatv2sf_sse4_1"
 	  (match_operand:SF 1 "nonimmediate_operand"
 	  "  0, 0,Yv, 0,0, v,m, 0 , m")
 	  (match_operand:SF 2 "nonimm_or_0_operand"
-	  " Yr,*x,Yv, m,m, m,C,*ym, C")))]
+	  " Yr,*x,Yv, Bt,Bt, m,C,*ym, C")))]
   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
    unpcklps\t{%2, %0|%0, %2}
@@ -10872,6 +10872,10 @@ (define_insn "*vec_concatv2sf_sse4_1"
      (if_then_else (eq_attr "alternative" "7,8")
 		   (const_string "native")
 		   (const_string "*")))
+   (set (attr "gpr32")
+     (if_then_else (eq_attr "alternative" "3,4")
+		   (const_string "0")
+		   (const_string "1")))
    (set (attr "prefix_data16")
      (if_then_else (eq_attr "alternative" "3,4")
 		   (const_string "1")
@@ -10963,7 +10967,7 @@ (define_insn "vec_set<mode>_0"
 	(vec_merge:VI4F_128
 	  (vec_duplicate:VI4F_128
 	    (match_operand:<ssescalarmode> 2 "general_operand"
-	  " Yr,*x,v,m,r ,m,x,v,?rm,?rm,?rm,!x,?re,!*fF"))
+	  " Yr,*x,v,m,r ,m,x,v,?hBt,?hBt,?rm,!x,?re,!*fF"))
 	  (match_operand:VI4F_128 1 "nonimm_or_0_operand"
 	  " C , C,C,C,C ,C,0,v,0  ,0  ,x  ,0 ,0   ,0")
 	  (const_int 1)))]
@@ -11003,6 +11007,10 @@ (define_insn "vec_set<mode>_0"
 	      (const_string "fmov")
 	   ]
 	   (const_string "ssemov")))
+   (set (attr "gpr32")
+     (if_then_else (eq_attr "alternative" "8,9")
+		   (const_string "0")
+		   (const_string "1")))
    (set (attr "prefix_extra")
      (if_then_else (eq_attr "alternative" "8,9,10")
 		   (const_string "1")
@@ -11175,7 +11183,7 @@ (define_insn "*vec_setv4sf_sse4_1"
   [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,v")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (match_operand:SF 2 "nonimmediate_operand" "Yrm,*xm,vm"))
+	    (match_operand:SF 2 "nonimmediate_operand" "YrBt,*xBt,vm"))
 	  (match_operand:V4SF 1 "register_operand" "0,0,v")
 	  (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE4_1
@@ -11196,6 +11204,7 @@ (define_insn "*vec_setv4sf_sse4_1"
 }
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sselog")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -11270,7 +11279,7 @@ (define_insn_and_split "*vec_setv2di_0_zero_extendsi_1"
 (define_insn "@sse4_1_insertps_<mode>"
   [(set (match_operand:VI4F_128 0 "register_operand" "=Yr,*x,v")
 	(unspec:VI4F_128
-	  [(match_operand:VI4F_128 2 "nonimmediate_operand" "Yrm,*xm,vm")
+	  [(match_operand:VI4F_128 2 "nonimmediate_operand" "YrBt,*xBt,vm")
 	   (match_operand:VI4F_128 1 "register_operand" "0,0,v")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_INSERTPS))]
@@ -11296,6 +11305,7 @@ (define_insn "@sse4_1_insertps_<mode>"
     }
 }
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "sselog")
    (set_attr "prefix_data16" "1,1,*")
    (set_attr "prefix_extra" "1")
@@ -11373,7 +11383,7 @@ (define_insn_and_split "*vec_extractv4sf_0"
   "operands[1] = gen_lowpart (SFmode, operands[1]);")
 
 (define_insn_and_split "*sse4_1_extractps"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,Yv,Yv")
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=hBt,hBt,rm,Yv,Yv")
 	(vec_select:SF
 	  (match_operand:V4SF 1 "register_operand" "Yr,*x,v,0,v")
 	  (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
@@ -11407,6 +11417,7 @@ (define_insn_and_split "*sse4_1_extractps"
   DONE;
 }
   [(set_attr "isa" "noavx,noavx,avx,noavx,avx")
+   (set_attr "gpr32" "0,0,1,1,1")
    (set_attr "type" "sselog,sselog,sselog,*,*")
    (set_attr "prefix_data16" "1,1,1,*,*")
    (set_attr "prefix_extra" "1,1,1,*,*")
@@ -12271,9 +12282,9 @@ (define_insn_and_split "*vec_extract<mode>_0"
   "operands[1] = gen_lowpart (<ssescalarmode>mode, operands[1]);")
 
 (define_insn "*vec_extract<mode>"
-  [(set (match_operand:HFBF 0 "register_sse4nonimm_operand" "=?r,m,x,v")
+  [(set (match_operand:HFBF 0 "register_sse4nonimm_operand" "=?r,Bt,m,x,v")
 	(vec_select:HFBF
-	  (match_operand:<ssevecmode> 1 "register_operand" "v,v,0,v")
+	  (match_operand:<ssevecmode> 1 "register_operand" "v,x,v,0,v")
 	  (parallel
 	    [(match_operand:SI 2 "const_0_to_7_operand")])))]
   "TARGET_SSE2"
@@ -12283,12 +12294,14 @@ (define_insn "*vec_extract<mode>"
     case 0:
       return "%vpextrw\t{%2, %1, %k0|%k0, %1, %2}";
     case 1:
-      return "%vpextrw\t{%2, %1, %0|%0, %1, %2}";
-
+      return "pextrw\t{%2, %1, %0|%0, %1, %2}";
     case 2:
+      return "vpextrw\t{%2, %1, %0|%0, %1, %2}";
+
+    case 3:
       operands[2] = GEN_INT (INTVAL (operands[2]) * 2);
       return "psrldq\t{%2, %0|%0, %2}";
-    case 3:
+    case 4:
       operands[2] = GEN_INT (INTVAL (operands[2]) * 2);
       return "vpsrldq\t{%2, %1, %0|%0, %1, %2}";
 
@@ -12296,8 +12309,9 @@ (define_insn "*vec_extract<mode>"
       gcc_unreachable ();
    }
 }
-  [(set_attr "isa" "*,sse4,noavx,avx")
-   (set_attr "type" "sselog1,sselog1,sseishft1,sseishft1")
+  [(set_attr "isa" "*,sse4_noavx,avx,noavx,avx")
+   (set_attr "gpr32" "1,0,1,1,1")
+   (set_attr "type" "sselog1,sselog1,sselog1,sseishft1,sseishft1")
    (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "TI")])
 
@@ -15659,7 +15673,7 @@ (define_insn "*sse4_1_mulv2siv2di3<mask_name>"
 	      (parallel [(const_int 0) (const_int 2)])))
 	  (sign_extend:V2DI
 	    (vec_select:V2SI
-	      (match_operand:V4SI 2 "vector_operand" "YrBm,*xBm,vm")
+	      (match_operand:V4SI 2 "vector_operand" "YrBT,*xBT,vm")
 	      (parallel [(const_int 0) (const_int 2)])))))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>
    && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
@@ -15668,6 +15682,7 @@ (define_insn "*sse4_1_mulv2siv2di3<mask_name>"
    pmuldq\t{%2, %0|%0, %2}
    vpmuldq\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,vex")
@@ -15905,7 +15920,7 @@ (define_insn "*<sse4_1_avx2>_mul<mode>3<mask_name>"
   [(set (match_operand:VI4_AVX512F 0 "register_operand" "=Yr,*x,v")
 	(mult:VI4_AVX512F
 	  (match_operand:VI4_AVX512F 1 "bcst_vector_operand" "%0,0,v")
-	  (match_operand:VI4_AVX512F 2 "bcst_vector_operand" "YrBm,*xBm,vmBr")))]
+	  (match_operand:VI4_AVX512F 2 "bcst_vector_operand" "YrBT,*xBT,vmBr")))]
   "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, <MODE>mode, operands)
   && <mask_mode512bit_condition>"
   "@
@@ -15913,6 +15928,7 @@ (define_insn "*<sse4_1_avx2>_mul<mode>3<mask_name>"
    pmulld\t{%2, %0|%0, %2}
    vpmulld\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "<bcst_mask_prefix4>")
@@ -16717,7 +16733,7 @@ (define_insn "*sse4_1_<code><mode>3<mask_name>"
   [(set (match_operand:VI14_128 0 "register_operand" "=Yr,*x,<v_Yw>")
 	(smaxmin:VI14_128
 	  (match_operand:VI14_128 1 "vector_operand" "%0,0,<v_Yw>")
-	  (match_operand:VI14_128 2 "vector_operand" "YrBm,*xBm,<v_Yw>m")))]
+	  (match_operand:VI14_128 2 "vector_operand" "YrBT,*xBT,<v_Yw>m")))]
   "TARGET_SSE4_1
    && <mask_mode512bit_condition>
    && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
@@ -16728,6 +16744,7 @@ (define_insn "*sse4_1_<code><mode>3<mask_name>"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseiadd")
    (set_attr "prefix_extra" "1")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
 
@@ -16735,13 +16752,14 @@ (define_insn "*<code>v8hi3"
   [(set (match_operand:V8HI 0 "register_operand" "=x,Yw")
 	(smaxmin:V8HI
 	  (match_operand:V8HI 1 "vector_operand" "%0,Yw")
-	  (match_operand:V8HI 2 "vector_operand" "xBm,Ywm")))]
+	  (match_operand:V8HI 2 "vector_operand" "xBT,Ywm")))]
   "TARGET_SSE2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
    p<maxmin_int>w\t{%2, %0|%0, %2}
    vp<maxmin_int>w\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
 
@@ -16809,6 +16827,7 @@ (define_insn "*sse4_1_<code><mode>3<mask_name>"
    vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "sseiadd")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "prefix_extra" "1,1,*")
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "mode" "TI")])
@@ -16817,12 +16836,13 @@ (define_insn "*<code>v16qi3"
   [(set (match_operand:V16QI 0 "register_operand" "=x,Yw")
 	(umaxmin:V16QI
 	  (match_operand:V16QI 1 "vector_operand" "%0,Yw")
-	  (match_operand:V16QI 2 "vector_operand" "xBm,Ywm")))]
+	  (match_operand:V16QI 2 "vector_operand" "xBT,Ywm")))]
   "TARGET_SSE2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
    p<maxmin_int>b\t{%2, %0|%0, %2}
    vp<maxmin_int>b\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sseiadd")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
@@ -18813,7 +18833,7 @@ (define_insn "<sse2p4_1>_pinsr<ssemodesuffix>"
   [(set (match_operand:PINSR_MODE 0 "register_operand" "=x,x,x,x,v,v,&x")
 	(vec_merge:PINSR_MODE
 	  (vec_duplicate:PINSR_MODE
-	    (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "r,m,r,m,r,m,x"))
+	    (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "h,Bt,r,m,r,m,x"))
 	  (match_operand:PINSR_MODE 1 "register_operand" "0,0,x,x,v,v,x")
 	  (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE2
@@ -18850,6 +18870,7 @@ (define_insn "<sse2p4_1>_pinsr<ssemodesuffix>"
 }
   [(set_attr "isa" "noavx,noavx,avx,avx,<pinsr_evex_isa>,<pinsr_evex_isa>,avx2")
    (set_attr "type" "sselog")
+   (set_attr "gpr32" "0,0,1,1,1,1,1")
    (set (attr "prefix_rex")
      (if_then_else
        (and (not (match_test "TARGET_AVX"))
@@ -20010,17 +20031,23 @@ (define_insn_and_split "*vec_extract<mode>_0_mem"
   operands[4] = gen_lowpart (<ssescalarmode>mode, operands[2]);
 })
 
+(define_mode_attr vi128_h_r
+  [(V16QI "h") (V8HI "r")])
+
 (define_insn "*vec_extract<mode>"
-  [(set (match_operand:<ssescalarmode> 0 "register_sse4nonimm_operand" "=r,m")
+  [(set (match_operand:<ssescalarmode> 0 "register_sse4nonimm_operand" "=<vi128_h_r>,r,Bt,m")
 	(vec_select:<ssescalarmode>
-	  (match_operand:PEXTR_MODE12 1 "register_operand" "YW,YW")
+	  (match_operand:PEXTR_MODE12 1 "register_operand" "YW,YW,YW,YW")
 	  (parallel
 	    [(match_operand:SI 2 "const_0_to_<ssescalarnummask>_operand")])))]
   "TARGET_SSE2"
   "@
-   %vpextr<ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextr<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "*,sse4")
+   pextr<ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}
+   vpextr<ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}
+   pextr<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}
+   vpextr<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "sse2_noavx,avx,sse4_noavx,avx")
+   (set_attr "gpr32" "1,1,0,1")
    (set_attr "type" "sselog1")
    (set (attr "prefix_extra")
      (if_then_else
@@ -20028,20 +20055,23 @@ (define_insn "*vec_extract<mode>"
        (const_string "*")
        (const_string "1")))
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex,maybe_vex")
+   (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*vec_extract<PEXTR_MODE12:mode>_zext"
-  [(set (match_operand:SWI48 0 "register_operand" "=r")
+  [(set (match_operand:SWI48 0 "register_operand" "=<vi128_h_r>,r")
 	(zero_extend:SWI48
 	  (vec_select:<PEXTR_MODE12:ssescalarmode>
-	    (match_operand:PEXTR_MODE12 1 "register_operand" "YW")
+	    (match_operand:PEXTR_MODE12 1 "register_operand" "YW,YW")
 	    (parallel
 	      [(match_operand:SI 2
 		"const_0_to_<PEXTR_MODE12:ssescalarnummask>_operand")]))))]
   "TARGET_SSE2"
-  "%vpextr<PEXTR_MODE12:ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
+  "@
+   pextr<PEXTR_MODE12:ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}
+   vpextr<PEXTR_MODE12:ssemodesuffix>\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog1")
    (set (attr "prefix_extra")
      (if_then_else
        (eq (const_string "<PEXTR_MODE12:MODE>mode") (const_string "V8HImode"))
@@ -20052,15 +20082,18 @@ (define_insn "*vec_extract<PEXTR_MODE12:mode>_zext"
    (set_attr "mode" "TI")])
 
 (define_insn "*vec_extractv16qi_zext"
-  [(set (match_operand:HI 0 "register_operand" "=r")
+  [(set (match_operand:HI 0 "register_operand" "=h,r")
 	(zero_extend:HI
 	  (vec_select:QI
-	    (match_operand:V16QI 1 "register_operand" "YW")
+	    (match_operand:V16QI 1 "register_operand" "YW,YW")
 	    (parallel
 	      [(match_operand:SI 2 "const_0_to_15_operand")]))))]
   "TARGET_SSE4_1"
-  "%vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
+  "@
+   pextrb\t{%2, %1, %k0|%k0, %1, %2}
+   vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
@@ -20166,24 +20199,26 @@ (define_split
   "operands[1] = gen_lowpart (SImode, operands[1]);")
 
 (define_insn "*vec_extractv4si"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,rm,Yr,*x,Yw")
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=hBt,rm,rm,Yr,*x,Yw")
 	(vec_select:SI
-	  (match_operand:V4SI 1 "register_operand" "  x, v, 0, 0,Yw")
+	  (match_operand:V4SI 1 "register_operand" "x,   x, v, 0, 0, Yw")
 	  (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
   "TARGET_SSE4_1"
 {
   switch (which_alternative)
     {
     case 0:
+      return "pextrd\t{%2, %1, %0|%0, %1, %2}";
     case 1:
-      return "%vpextrd\t{%2, %1, %0|%0, %1, %2}";
-
     case 2:
+      return "vpextrd\t{%2, %1, %0|%0, %1, %2}";
+
     case 3:
+    case 4:
       operands[2] = GEN_INT (INTVAL (operands[2]) * 4);
       return "psrldq\t{%2, %0|%0, %2}";
 
-    case 4:
+    case 5:
       operands[2] = GEN_INT (INTVAL (operands[2]) * 4);
       return "vpsrldq\t{%2, %1, %0|%0, %1, %2}";
 
@@ -20191,25 +20226,29 @@ (define_insn "*vec_extractv4si"
       gcc_unreachable ();
     }
 }
-  [(set_attr "isa" "*,avx512dq,noavx,noavx,avx")
-   (set_attr "type" "sselog1,sselog1,sseishft1,sseishft1,sseishft1")
+  [(set_attr "isa" "noavx,avx,avx512dq,noavx,noavx,avx")
+   (set_attr "type" "sselog1,sselog1,sselog1,sseishft1,sseishft1,sseishft1")
+   (set_attr "gpr32" "0,1,1,1,1,1")
    (set (attr "prefix_extra")
-     (if_then_else (eq_attr "alternative" "0,1")
+     (if_then_else (eq_attr "alternative" "0,1,2")
 		   (const_string "1")
 		   (const_string "*")))
    (set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex,evex,orig,orig,maybe_vex")
+   (set_attr "prefix" "orig,vex,evex,orig,orig,maybe_vex")
    (set_attr "mode" "TI")])
 
 (define_insn "*vec_extractv4si_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI 0 "register_operand" "=h,r,r")
 	(zero_extend:DI
 	  (vec_select:SI
-	    (match_operand:V4SI 1 "register_operand" "x,v")
+	    (match_operand:V4SI 1 "register_operand" "x,x,v")
 	    (parallel [(match_operand:SI 2 "const_0_to_3_operand")]))))]
   "TARGET_64BIT && TARGET_SSE4_1"
-  "%vpextrd\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "isa" "*,avx512dq")
+  "@
+   pextrd\t{%2, %1, %k0|%k0, %1, %2}
+   vpextrd\t{%2, %1, %k0|%k0, %1, %2}
+   vpextrd\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "noavx,avx,avx512dq")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -20239,13 +20278,14 @@ (define_insn_and_split "*vec_extractv4si_zext_mem"
 })
 
 (define_insn "*vec_extractv2di_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand"     "=rm,rm,m,x,x,Yv,x,v,r")
+  [(set (match_operand:DI 0 "nonimmediate_operand"     "=hBt,rm,rm,m,x,x,Yv,x,v,r")
 	(vec_select:DI
-	  (match_operand:V2DI 1 "nonimmediate_operand"  "x ,v ,v,0,x, v,x,o,o")
+	  (match_operand:V2DI 1 "nonimmediate_operand"  "x, x ,v ,v,0,x, v,x,o,o")
 	  (parallel [(const_int 1)])))]
   "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "@
-   %vpextrq\t{$1, %1, %0|%0, %1, 1}
+   pextrq\t{$1, %1, %0|%0, %1, 1}
+   vpextrq\t{$1, %1, %0|%0, %1, 1}
    vpextrq\t{$1, %1, %0|%0, %1, 1}
    %vmovhps\t{%1, %0|%0, %1}
    psrldq\t{$8, %0|%0, 8}
@@ -20256,44 +20296,47 @@ (define_insn "*vec_extractv2di_1"
    #"
   [(set (attr "isa")
      (cond [(eq_attr "alternative" "0")
-	      (const_string "x64_sse4")
+	      (const_string "x64_sse4_noavx")
 	    (eq_attr "alternative" "1")
+	      (const_string "x64_avx")
+	    (eq_attr "alternative" "2")
 	      (const_string "x64_avx512dq")
-	    (eq_attr "alternative" "3")
-	      (const_string "sse2_noavx")
 	    (eq_attr "alternative" "4")
-	      (const_string "avx")
+	      (const_string "sse2_noavx")
 	    (eq_attr "alternative" "5")
-	      (const_string "avx512bw")
+	      (const_string "avx")
 	    (eq_attr "alternative" "6")
-	      (const_string "noavx")
+	      (const_string "avx512bw")
 	    (eq_attr "alternative" "8")
+	      (const_string "noavx")
+	    (eq_attr "alternative" "9")
 	      (const_string "x64")
 	   ]
 	   (const_string "*")))
    (set (attr "type")
-     (cond [(eq_attr "alternative" "2,6,7")
+     (cond [(eq_attr "alternative" "3,7,8")
 	      (const_string "ssemov")
-	    (eq_attr "alternative" "3,4,5")
+	    (eq_attr "alternative" "4,5,6")
 	      (const_string "sseishft1")
-	    (eq_attr "alternative" "8")
+	    (eq_attr "alternative" "9")
 	      (const_string "imov")
 	   ]
 	   (const_string "sselog1")))
+   (set_attr "gpr32" "0,1,1,1,1,1,1,1,1,1")
    (set (attr "length_immediate")
-     (if_then_else (eq_attr "alternative" "0,1,3,4,5")
+     (if_then_else (eq_attr "alternative" "0,1,2,4,5,6")
 		   (const_string "1")
 		   (const_string "*")))
    (set (attr "prefix_rex")
-     (if_then_else (eq_attr "alternative" "0,1")
+     (if_then_else (eq_attr "alternative" "0")
 		   (const_string "1")
 		   (const_string "*")))
    (set (attr "prefix_extra")
-     (if_then_else (eq_attr "alternative" "0,1")
+     (if_then_else (eq_attr "alternative" "0")
 		   (const_string "1")
 		   (const_string "*")))
-   (set_attr "prefix" "maybe_vex,evex,maybe_vex,orig,vex,evex,orig,*,*")
-   (set_attr "mode" "TI,TI,V2SF,TI,TI,TI,V4SF,DI,DI")])
+   (set_attr "prefix" "orig,maybe_evex,evex,maybe_vex,orig,vex,evex,orig,*,*")
+   (set_attr "mode" "TI,TI,TI,V2SF,TI,TI,TI,V4SF,DI,DI")])
 
 (define_split
   [(set (match_operand:<ssescalarmode> 0 "register_operand")
@@ -20411,7 +20454,7 @@ (define_insn "*vec_concatv2si_sse4_1"
 	  (match_operand:SI 1 "nonimmediate_operand"
 	  "  0, 0, x,Yv, 0, 0,Yv,rm,  0,rm")
 	  (match_operand:SI 2 "nonimm_or_0_operand"
-	  " rm,rm,rm,rm,Yr,*x,Yv, C,*ym, C")))]
+	  " hBt,hBt,rm,rm,Yr,*x,Yv, C,*ym, C")))]
   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
    pinsrd\t{$1, %2, %0|%0, %2, 1}
@@ -20438,6 +20481,10 @@ (define_insn "*vec_concatv2si_sse4_1"
 	      (const_string "mmxmov")
 	   ]
 	   (const_string "sselog")))
+   (set (attr "gpr32")
+     (if_then_else (eq_attr "alternative" "0,1")
+		   (const_string "0")
+		   (const_string "1")))
    (set (attr "prefix_extra")
      (if_then_else (eq_attr "alternative" "0,1,2,3")
 		   (const_string "1")
@@ -20562,7 +20609,7 @@ (define_insn "vec_concatv2di"
 	  (match_operand:DI 1 "register_operand"
 	  "  0, 0,x ,Yv,0,Yv,0,0,v")
 	  (match_operand:DI 2 "nonimmediate_operand"
-	  " rm,rm,rm,rm,x,Yv,x,m,m")))]
+	  " hm,hm,rm,rm,x,Yv,x,m,m")))]
   "TARGET_SSE"
   "@
    pinsrq\t{$1, %2, %0|%0, %2, 1}
@@ -20592,6 +20639,10 @@ (define_insn "vec_concatv2di"
        (eq_attr "alternative" "0,1,2,3,4,5")
        (const_string "sselog")
        (const_string "ssemov")))
+   (set (attr "gpr32")
+     (if_then_else (eq_attr "alternative" "0,1")
+		   (const_string "0")
+		   (const_string "1")))
    (set (attr "prefix_rex")
      (if_then_else (eq_attr "alternative" "0,1,2,3")
 		   (const_string "1")
@@ -21525,7 +21576,7 @@ (define_insn "ssse3_pmaddubsw128"
 			   (const_int 12) (const_int 14)])))
 	    (sign_extend:V8HI
 	      (vec_select:V8QI
-		(match_operand:V16QI 2 "vector_operand" "xBm,Ywm")
+		(match_operand:V16QI 2 "vector_operand" "xBT,Ywm")
 		(parallel [(const_int 0) (const_int 2)
 			   (const_int 4) (const_int 6)
 			   (const_int 8) (const_int 10)
@@ -21548,6 +21599,7 @@ (define_insn "ssse3_pmaddubsw128"
    pmaddubsw\t{%2, %0|%0, %2}
    vpmaddubsw\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sseiadd")
    (set_attr "atom_unit" "simul")
    (set_attr "prefix_extra" "1")
@@ -21666,7 +21718,7 @@ (define_insn "*<ssse3_avx2>_pmulhrsw<mode>3<mask_name>"
 		  (sign_extend:<ssedoublemode>
 		    (match_operand:VI2_AVX2_AVX512BW 1 "vector_operand" "%0,<v_Yw>"))
 		  (sign_extend:<ssedoublemode>
-		    (match_operand:VI2_AVX2_AVX512BW 2 "vector_operand" "xBm,<v_Yw>m")))
+		    (match_operand:VI2_AVX2_AVX512BW 2 "vector_operand" "xBT,<v_Yw>m")))
 		(const_int 14))
 	      (match_operand:VI2_AVX2_AVX512BW 3 "const1_operand"))
 	    (const_int 1))))]
@@ -21676,6 +21728,7 @@ (define_insn "*<ssse3_avx2>_pmulhrsw<mode>3<mask_name>"
    pmulhrsw\t{%2, %0|%0, %2}
    vpmulhrsw\t{%2, %1, %0<mask_operand4>|%0<mask_operand4>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sseimul")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,maybe_evex")
@@ -21792,13 +21845,14 @@ (define_insn "<ssse3_avx2>_pshufb<mode>3<mask_name>"
   [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,<v_Yw>")
 	(unspec:VI1_AVX512
 	  [(match_operand:VI1_AVX512 1 "register_operand" "0,<v_Yw>")
-	   (match_operand:VI1_AVX512 2 "vector_operand" "xBm,<v_Yw>m")]
+	   (match_operand:VI1_AVX512 2 "vector_operand" "xBT,<v_Yw>m")]
 	  UNSPEC_PSHUFB))]
   "TARGET_SSSE3 && <mask_mode512bit_condition> && <mask_avx512bw_condition>"
   "@
    pshufb\t{%2, %0|%0, %2}
    vpshufb\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,maybe_evex")
@@ -21915,7 +21969,7 @@ (define_insn "<ssse3_avx2>_palignr<mode>"
   [(set (match_operand:VIMAX_AVX2_AVX512BW 0 "register_operand" "=x,<v_Yw>")
 	(unspec:VIMAX_AVX2_AVX512BW
 	  [(match_operand:VIMAX_AVX2_AVX512BW 1 "register_operand" "0,<v_Yw>")
-	   (match_operand:VIMAX_AVX2_AVX512BW 2 "vector_operand" "xBm,<v_Yw>m")
+	   (match_operand:VIMAX_AVX2_AVX512BW 2 "vector_operand" "xBT,<v_Yw>m")
 	   (match_operand:SI 3 "const_0_to_255_mul_8_operand")]
 	  UNSPEC_PALIGNR))]
   "TARGET_SSSE3"
@@ -21933,6 +21987,7 @@ (define_insn "<ssse3_avx2>_palignr<mode>"
     }
 }
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "type" "sseishft")
    (set_attr "atom_unit" "sishuf")
    (set_attr "prefix_extra" "1")
@@ -22007,6 +22062,7 @@ (define_insn_and_split "ssse3_palignrdi"
 }
   [(set_attr "mmx_isa" "native,sse_noavx,avx")
    (set_attr "type" "sseishft")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "atom_unit" "sishuf")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
@@ -22022,12 +22078,16 @@ (define_mode_iterator VI1248_AVX512VL_AVX512BW
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
 (define_insn "*abs<mode>2"
-  [(set (match_operand:VI1248_AVX512VL_AVX512BW 0 "register_operand" "=<v_Yw>")
+  [(set (match_operand:VI1248_AVX512VL_AVX512BW 0 "register_operand" "=x,<v_Yw>")
 	(abs:VI1248_AVX512VL_AVX512BW
-	  (match_operand:VI1248_AVX512VL_AVX512BW 1 "vector_operand" "<v_Yw>Bm")))]
+	  (match_operand:VI1248_AVX512VL_AVX512BW 1 "vector_operand" "xBT,<v_Yw>Bm")))]
   "TARGET_SSSE3"
-  "%vpabs<ssemodesuffix>\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sselog1")
+  "@
+   pabs<ssemodesuffix>\t{%1, %0|%0, %1}
+   vpabs<ssemodesuffix>\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -22365,11 +22425,15 @@ (define_mode_attr vi8_sse4_1_avx2_avx512
 
 (define_insn "<vi8_sse4_1_avx2_avx512>_movntdqa"
   [(set (match_operand:VI8_AVX2_AVX512F 0 "register_operand" "=Yr,*x,v")
-	(unspec:VI8_AVX2_AVX512F [(match_operand:VI8_AVX2_AVX512F 1 "memory_operand" "m,m,m")]
+	(unspec:VI8_AVX2_AVX512F [(match_operand:VI8_AVX2_AVX512F 1 "memory_operand" "Bt,Bt,m")]
 		     UNSPEC_MOVNTDQA))]
   "TARGET_SSE4_1"
-  "%vmovntdqa\t{%1, %0|%0, %1}"
+  "@
+   movntdqa\t{%1, %0|%0, %1}
+   movntdqa\t{%1, %0|%0, %1}
+   vmovntdqa\t{%1, %0|%0, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,maybe_evex")
@@ -22388,6 +22452,6 @@ (define_insn "<sse4_1_avx2>_mpsadbw"
   mpsadbw\t{%3, %2, %0|%0, %2, %3}
   vmpsadbw\t{%3, %2, %1, %0|%0, %1, %2, %3}
  [(set_attr "isa" "noavx,noavx,avx")
   (set_attr "type" "sselog1")
   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
@@ -22401,7 +22466,7 @@ (define_insn "<sse4_1_avx2>_packusdw<mask_name>"
   [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand" "=Yr,*x,<v_Yw>")
 	(unspec:VI2_AVX2_AVX512BW
 	  [(match_operand:<sseunpackmode> 1 "register_operand" "0,0,<v_Yw>")
-	   (match_operand:<sseunpackmode> 2 "vector_operand" "YrBm,*xBm,<v_Yw>m")]
+	   (match_operand:<sseunpackmode> 2 "vector_operand" "YrBT,*xBT,<v_Yw>m")]
 	   UNSPEC_US_TRUNCATE))]
   "TARGET_SSE4_1 && <mask_mode512bit_condition> && <mask_avx512bw_condition>"
   "@
@@ -22409,6 +22474,7 @@ (define_insn "<sse4_1_avx2>_packusdw<mask_name>"
    packusdw\t{%2, %0|%0, %2}
    vpackusdw\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,<mask_prefix>")
@@ -22755,10 +22821,14 @@ (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
 (define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>_1"
   [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,Yw")
 	(any_extend:V8HI
-	  (match_operand:V8QI 1 "memory_operand" "m,m,m")))]
+	  (match_operand:V8QI 1 "memory_operand" "Bt,Bt,m")))]
   "TARGET_SSE4_1 && <mask_avx512bw_condition> && <mask_avx512vl_condition>"
-  "%vpmov<extsuffix>bw\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "@
+   pmov<extsuffix>bw\t{%1, %0|%0, %1}
+   pmov<extsuffix>bw\t{%1, %0|%0, %1}
+   vpmov<extsuffix>bw\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,maybe_evex")
@@ -22788,7 +22858,7 @@ (define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_3"
   [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,Yw")
 	(vec_select:V16QI
 	  (vec_concat:V32QI
-	    (match_operand:V16QI 1 "vector_operand" "YrBm,*xBm,Ywm")
+	    (match_operand:V16QI 1 "vector_operand" "YrBT,*xBT,Ywm")
 	    (match_operand:V16QI 2 "const0_operand"))
 	  (match_parallel 3 "pmovzx_parallel"
 	    [(match_operand 4 "const_int_operand")])))]
@@ -22813,7 +22883,8 @@ (define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_3"
       DONE;
     }
 }
-  [(set_attr "isa" "noavx,noavx,avx")])
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")])
 
 (define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_4"
   [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,Yw")
@@ -22821,7 +22892,7 @@ (define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_4"
 	  (vec_concat:V32QI
 	    (subreg:V16QI
 	      (vec_concat:VI248_128
-		(match_operand:<ssehalfvecmode> 1 "vector_operand" "YrBm,*xBm,Ywm")
+		(match_operand:<ssehalfvecmode> 1 "vector_operand" "YrBT,*xBT,Ywm")
 		(match_operand:<ssehalfvecmode> 2 "const0_operand")) 0)
 	    (match_operand:V16QI 3 "const0_operand"))
 	  (match_parallel 4 "pmovzx_parallel"
@@ -22848,7 +22919,8 @@ (define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_4"
     }
   operands[1] = lowpart_subreg (V16QImode, operands[1], <ssehalfvecmode>mode);
 }
-  [(set_attr "isa" "noavx,noavx,avx")])
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")])
 
 (define_expand "<insn>v8qiv8hi2"
   [(set (match_operand:V8HI 0 "register_operand")
@@ -22967,10 +23039,14 @@ (define_insn "sse4_1_<code>v4qiv4si2<mask_name>"
 (define_insn "*sse4_1_<code>v4qiv4si2<mask_name>_1"
   [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
 	(any_extend:V4SI
-	  (match_operand:V4QI 1 "memory_operand" "m,m,m")))]
+	  (match_operand:V4QI 1 "memory_operand" "Bt,Bt,m")))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
-  "%vpmov<extsuffix>bd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "@
+   pmov<extsuffix>bd\t{%1, %0|%0, %1}
+   pmov<extsuffix>bd\t{%1, %0|%0, %1}
+   vpmov<extsuffix>bd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,maybe_evex")
@@ -23139,10 +23212,14 @@ (define_insn "sse4_1_<code>v4hiv4si2<mask_name>"
 (define_insn "*sse4_1_<code>v4hiv4si2<mask_name>_1"
   [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
 	(any_extend:V4SI
-	  (match_operand:V4HI 1 "memory_operand" "m,m,m")))]
+	  (match_operand:V4HI 1 "memory_operand" "Bt,Bt,m")))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
-  "%vpmov<extsuffix>wd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "@
+   pmov<extsuffix>wd\t{%1, %0|%0, %1}
+   pmov<extsuffix>wd\t{%1, %0|%0, %1}
+   vpmov<extsuffix>wd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,maybe_evex")
@@ -23191,7 +23268,7 @@ (define_insn_and_split "*sse4_1_zero_extendv4hiv4si2_3"
   [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
 	(vec_select:V8HI
 	  (vec_concat:V16HI
-	    (match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,vm")
+	    (match_operand:V8HI 1 "vector_operand" "YrBT,*xBT,vm")
 	    (match_operand:V8HI 2 "const0_operand"))
 	  (match_parallel 3 "pmovzx_parallel"
 	    [(match_operand 4 "const_int_operand")])))]
@@ -23214,7 +23291,8 @@ (define_insn_and_split "*sse4_1_zero_extendv4hiv4si2_3"
       DONE;
     }
 }
-  [(set_attr "isa" "noavx,noavx,avx")])
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")])
 
 (define_insn_and_split "*sse4_1_zero_extendv4hiv4si2_4"
   [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
@@ -23222,7 +23300,7 @@ (define_insn_and_split "*sse4_1_zero_extendv4hiv4si2_4"
 	  (vec_concat:V16HI
 	    (subreg:V8HI
 	      (vec_concat:VI148_128
-		(match_operand:<ssehalfvecmode> 1 "vector_operand" "YrBm,*xBm,vm")
+		(match_operand:<ssehalfvecmode> 1 "vector_operand" "YrBT,*xBT,vm")
 		(match_operand:<ssehalfvecmode> 2 "const0_operand")) 0)
 	    (match_operand:V8HI 3 "const0_operand"))
 	  (match_parallel 4 "pmovzx_parallel"
@@ -23247,7 +23325,8 @@ (define_insn_and_split "*sse4_1_zero_extendv4hiv4si2_4"
     }
   operands[1] = lowpart_subreg (V8HImode, operands[1], <ssehalfvecmode>mode);
 }
-  [(set_attr "isa" "noavx,noavx,avx")])
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")])
 
 (define_insn "avx512f_<code>v8qiv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
@@ -23385,12 +23464,16 @@ (define_insn "sse4_1_<code>v2qiv2di2<mask_name>"
    (set_attr "mode" "TI")])
 
 (define_insn "*sse4_1_<code>v2qiv2di2<mask_name>_1"
-  [(set (match_operand:V2DI 0 "register_operand" "=v")
+  [(set (match_operand:V2DI 0 "register_operand" "=x,v")
 	(any_extend:V2DI
-	 (match_operand:V2QI 1 "memory_operand" "m")))]
+	 (match_operand:V2QI 1 "memory_operand" "Bt,m")))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
-  "%vpmov<extsuffix>bq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
-  [(set_attr "type" "ssemov")
+  "@
+   pmov<extsuffix>bq\t{%1, %0|%0, %1}
+   vpmov<extsuffix>bq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "TI")])
@@ -23524,10 +23607,14 @@ (define_insn "sse4_1_<code>v2hiv2di2<mask_name>"
 (define_insn "*sse4_1_<code>v2hiv2di2<mask_name>_1"
   [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
 	(any_extend:V2DI
-	  (match_operand:V2HI 1 "memory_operand" "m,m,m")))]
+	  (match_operand:V2HI 1 "memory_operand" "Bt,Bt,m")))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
-  "%vpmov<extsuffix>wq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "@
+   pmov<extsuffix>wq\t{%1, %0|%0, %1}
+   pmov<extsuffix>wq\t{%1, %0|%0, %1}
+   vpmov<extsuffix>wq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,maybe_evex")
@@ -23689,10 +23776,14 @@ (define_insn "sse4_1_<code>v2siv2di2<mask_name>"
 (define_insn "*sse4_1_<code>v2siv2di2<mask_name>_1"
   [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
 	(any_extend:V2DI
-	  (match_operand:V2SI 1 "memory_operand" "m,m,m")))]
+	  (match_operand:V2SI 1 "memory_operand" "Bt,Bt,m")))]
   "TARGET_SSE4_1 && <mask_avx512vl_condition>"
-  "%vpmov<extsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
+  "@
+   pmov<extsuffix>dq\t{%1, %0|%0, %1}
+   pmov<extsuffix>dq\t{%1, %0|%0, %1}
+   vpmov<extsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")
    (set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,orig,maybe_evex")
@@ -23719,7 +23810,7 @@ (define_insn_and_split "*sse4_1_zero_extendv2siv2di2_3"
   [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
 	(vec_select:V4SI
 	  (vec_concat:V8SI
-	    (match_operand:V4SI 1 "vector_operand" "YrBm,*xBm,vm")
+	    (match_operand:V4SI 1 "vector_operand" "YrBT,*xBT,vm")
 	    (match_operand:V4SI 2 "const0_operand"))
 	  (match_parallel 3 "pmovzx_parallel"
 	    [(match_operand 4 "const_int_operand")])))]
@@ -23740,14 +23831,15 @@ (define_insn_and_split "*sse4_1_zero_extendv2siv2di2_3"
       DONE;
     }
 }
-  [(set_attr "isa" "noavx,noavx,avx")])
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")])
 
 (define_insn_and_split "*sse4_1_zero_extendv2siv2di2_4"
   [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
 	(vec_select:V4SI
 	  (vec_concat:V8SI
 	    (vec_concat:V4SI
-	      (match_operand:V2SI 1 "vector_operand" "YrBm, *xBm, vm")
+	      (match_operand:V2SI 1 "vector_operand" "YrBT, *xBT, vm")
 	      (match_operand:V2SI 2 "const0_operand"))
 	    (match_operand:V4SI 3 "const0_operand"))
 	  (match_parallel 4 "pmovzx_parallel"
@@ -23769,7 +23861,8 @@ (define_insn_and_split "*sse4_1_zero_extendv2siv2di2_4"
     }
   operands[1] = lowpart_subreg (V4SImode, operands[1], V2SImode);
 }
-  [(set_attr "isa" "noavx,noavx,avx")])
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0,0,1")])
 
 (define_expand "<insn>v2siv2di2"
   [(set (match_operand:V2DI 0 "register_operand")
@@ -25960,7 +26053,7 @@ (define_insn "xop_vpermil2<mode>3"
 (define_insn "aesenc"
   [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
+		       (match_operand:V2DI 2 "vector_operand" "xBT,xm,vm")]
 		      UNSPEC_AESENC))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
@@ -25969,6 +26062,7 @@ (define_insn "aesenc"
    vaesenc\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,aes,avx512vl")
    (set_attr "type" "sselog1")
+   (set_attr "gpr32" "0,1,1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,vex,evex")
    (set_attr "btver2_decode" "double,double,double")
@@ -25977,7 +26071,7 @@ (define_insn "aesenc"
 (define_insn "aesenclast"
   [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
+		       (match_operand:V2DI 2 "vector_operand" "xBT,xm,vm")]
 		      UNSPEC_AESENCLAST))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
@@ -25986,6 +26080,7 @@ (define_insn "aesenclast"
    vaesenclast\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,aes,avx512vl")
    (set_attr "type" "sselog1")
+   (set_attr "gpr32" "0,1,1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,vex,evex")
    (set_attr "btver2_decode" "double,double,double") 
@@ -25994,7 +26089,7 @@ (define_insn "aesenclast"
 (define_insn "aesdec"
   [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
+		       (match_operand:V2DI 2 "vector_operand" "xBT,xm,vm")]
 		      UNSPEC_AESDEC))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
@@ -26003,6 +26098,7 @@ (define_insn "aesdec"
    vaesdec\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,aes,avx512vl")
    (set_attr "type" "sselog1")
+   (set_attr "gpr32" "0,1,1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,vex,evex")
    (set_attr "btver2_decode" "double,double,double") 
@@ -26011,7 +26107,7 @@ (define_insn "aesdec"
 (define_insn "aesdeclast"
   [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
+		       (match_operand:V2DI 2 "vector_operand" "xBT,xm,vm")]
 		      UNSPEC_AESDECLAST))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
@@ -26019,6 +26115,7 @@ (define_insn "aesdeclast"
    vaesdeclast\t{%2, %1, %0|%0, %1, %2}
    vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,aes,avx512vl")
+   (set_attr "gpr32" "0,1,1")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,vex,evex")
@@ -26054,7 +26151,7 @@ (define_insn "aeskeygenassist"
 (define_insn "pclmulqdq"
   [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		      (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")
+		      (match_operand:V2DI 2 "vector_operand" "xBT,xm,vm")
 		      (match_operand:SI 3 "const_0_to_255_operand")]
 		     UNSPEC_PCLMUL))]
   "TARGET_PCLMUL"
@@ -26064,6 +26161,7 @@ (define_insn "pclmulqdq"
    vpclmulqdq\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,avx,vpclmulqdqvl")
    (set_attr "type" "sselog1")
+   (set_attr "gpr32" "0,1,1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex,evex")
@@ -29395,7 +29493,7 @@ (define_insn "vgf2p8affineinvqb_<mode><mask_name>"
   [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512F
 	  [(match_operand:VI1_AVX512F 1 "register_operand" "0,v")
-	   (match_operand:VI1_AVX512F 2 "vector_operand" "xBm,vm")
+	   (match_operand:VI1_AVX512F 2 "vector_operand" "xBT,vm")
 	   (match_operand 3 "const_0_to_255_operand")]
 	  UNSPEC_GF2P8AFFINEINV))]
   "TARGET_GFNI"
@@ -29403,6 +29501,7 @@ (define_insn "vgf2p8affineinvqb_<mode><mask_name>"
    gf2p8affineinvqb\t{%3, %2, %0| %0, %2, %3}
    vgf2p8affineinvqb\t{%3, %2, %1, %0<mask_operand4>| %0<mask_operand4>, %1, %2, %3}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,maybe_evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -29411,7 +29510,7 @@ (define_insn "vgf2p8affineqb_<mode><mask_name>"
   [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512F
 	  [(match_operand:VI1_AVX512F 1 "register_operand" "0,v")
-	   (match_operand:VI1_AVX512F 2 "vector_operand" "xBm,vm")
+	   (match_operand:VI1_AVX512F 2 "vector_operand" "xBT,vm")
 	   (match_operand 3 "const_0_to_255_operand")]
 	  UNSPEC_GF2P8AFFINE))]
   "TARGET_GFNI"
@@ -29419,6 +29518,7 @@ (define_insn "vgf2p8affineqb_<mode><mask_name>"
    gf2p8affineqb\t{%3, %2, %0| %0, %2, %3}
    vgf2p8affineqb\t{%3, %2, %1, %0<mask_operand4>| %0<mask_operand4>, %1, %2, %3}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,maybe_evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -29427,13 +29527,14 @@ (define_insn "vgf2p8mulb_<mode><mask_name>"
   [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512F
 	  [(match_operand:VI1_AVX512F 1 "register_operand" "%0,v")
-	   (match_operand:VI1_AVX512F 2 "vector_operand" "xBm,vm")]
+	   (match_operand:VI1_AVX512F 2 "vector_operand" "xBT,vm")]
 	  UNSPEC_GF2P8MUL))]
   "TARGET_GFNI"
   "@
    gf2p8mulb\t{%2, %0| %0, %2}
    vgf2p8mulb\t{%2, %1, %0<mask_operand3>| %0<mask_operand3>, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "orig,maybe_evex")
    (set_attr "mode" "<sseinsnmode>")])
-- 
2.31.1



* [PATCH 13/13] [APX EGPR] Handle vex insns that only support GPR16 (5/5)
  2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
                   ` (11 preceding siblings ...)
  2023-08-31  8:20 ` [PATCH 12/13] [APX_EGPR] Handle legacy insns that only support GPR16 (4/5) Hongyu Wang
@ 2023-08-31  8:20 ` Hongyu Wang
  2023-08-31  9:19 ` [PATCH 00/13] [RFC] Support Intel APX EGPR Richard Biener
  13 siblings, 0 replies; 49+ messages in thread
From: Hongyu Wang @ 2023-08-31  8:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak, hubicka, vmakarov, jakub, Kong Lingling

From: Kong Lingling <lingling.kong@intel.com>

These vex insns may have legacy counterparts that can support EGPR,
but they have no evex counterparts. Split the vex parts out of such
patterns and mark them as not supporting EGPR by adjusting their
constraints and the gpr32 attribute.

insn list:
1. vmovmskpd/vmovmskps
2. vpmovmskb
3. vrsqrtss/vrsqrtps
4. vrcpss/vrcpps
5. vhaddpd/vhaddps, vhsubpd/vhsubps
6. vldmxcsr/vstmxcsr
7. vaddsubpd/vaddsubps
8. vlddqu
9. vtestps/vtestpd
10. vmaskmovps/vmaskmovpd, vpmaskmovd/vpmaskmovq
11. vperm2f128/vperm2i128
12. vinserti128/vinsertf128
13. vbroadcasti128/vbroadcastf128
14. vcmppd/vcmpps, vcmpss/vcmpsd
15. vgatherdps/vgatherqps, vgatherdpd/vgatherqpd
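
For illustration, the splitting scheme looks roughly like the
following simplified sketch (based on the <sse>_rcp<mode>2 hunk in
this patch, with other attributes omitted): the single combined
alternative is split so that the vex form gets its own constraint and
gpr32 value, letting reload avoid EGPRs for its memory operand.

```
(define_insn "<sse>_rcp<mode>2"
  [(set (match_operand:VF1_128_256 0 "register_operand" "=x,x")
	(unspec:VF1_128_256
	  [(match_operand:VF1_128_256 1 "vector_operand" "xBm,xBT")]
	  UNSPEC_RCP))]
  "TARGET_SSE"
  "@
   rcpps\t{%1, %0|%0, %1}
   vrcpps\t{%1, %0|%0, %1}"
  ;; Alternative 0 (noavx, legacy encoding) may use REX2/EGPR in the
  ;; address; alternative 1 (vex encoding) may not, so gpr32 is 0.
  [(set_attr "isa" "noavx,avx")
   (set_attr "gpr32" "1,0")])
```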

gcc/ChangeLog:

	* config/i386/constraints.md (TV): New constraint for vsib memory
	that does not allow gpr32.
	* config/i386/i386.md (setcc_<mode>_sse): Replace m with Bt for
	the avx alternative and set its gpr32 attr to 0.
	(movmsk_df): Split avx/noavx alternatives and replace "r" with
	"h" for the avx alternative.
	(*rcpsf2_sse): Split avx/noavx alternatives, replace "m/Bm"
	with "Bt/BT" for the avx alternative, and set its gpr32 attr
	to 0.
	(*rsqrtsf2_sse): Likewise.
	* config/i386/mmx.md (mmx_pmovmskb): Split alternative 1 into
	avx/noavx alternatives and assign the r/h constraints to the dest.
	* config/i386/sse.md (<sse>_movmsk<ssemodesuffix><avxsizesuffix>):
	Split avx/noavx alternatives and replace "r" with "h" for the avx
	alternative.
	(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext): Likewise.
	(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_lt): Likewise.
	(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_lt): Likewise.
	(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_shift): Likewise.
	(*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_shift): Likewise.
	(<sse2_avx2>_pmovmskb): Likewise.
	(*<sse2_avx2>_pmovmskb_zext): Likewise.
	(*sse2_pmovmskb_ext): Likewise.
	(*<sse2_avx2>_pmovmskb_lt): Likewise.
	(*<sse2_avx2>_pmovmskb_zext_lt): Likewise.
	(*sse2_pmovmskb_ext_lt): Likewise.
	(<sse>_rcp<mode>2): Split avx/noavx alternatives, replace "m/Bm"
	with "Bt/BT" for the avx alternative, and set its gpr32 attr to 0.
	(sse_vmrcpv4sf2): Likewise.
	(*sse_vmrcpv4sf2): Likewise.
	(rsqrt<mode>2): Likewise.
	(sse_vmrsqrtv4sf2): Likewise.
	(*sse_vmrsqrtv4sf2): Likewise.
	(avx_h<insn>v4df3): Likewise.
	(sse3_hsubv2df3): Likewise.
	(avx_h<insn>v8sf3): Likewise.
	(sse3_h<insn>v4sf3): Likewise.
	(<sse3>_lddqu<avxsizesuffix>): Likewise.
	(*sse2_gt<mode>3): Likewise.
	(sse_ldmxcsr): Likewise.
	(sse_stmxcsr): Likewise.
	(avx_vtest<ssemodesuffix><avxsizesuffix>): Replace m with Bt for
	the avx alternative and set its gpr32 attr to 0.
	(avx2_permv2ti): Likewise.
	(*avx_vperm2f128<mode>_full): Likewise.
	(*avx_vperm2f128<mode>_nozero): Likewise.
	(vec_set_lo_v32qi): Likewise.
	(<avx_avx2>_maskload<ssemodesuffix><avxsizesuffix>): Likewise.
	(<avx_avx2>_maskstore<ssemodesuffix><avxsizesuffix>): Likewise.
	(avx_cmp<mode>3): Likewise.
	(avx_vmcmp<mode>3): Likewise.
	(*<sse>_maskcmp<mode>3_comm): Likewise.
	(*avx2_gathersi<VEC_GATHER_MODE:mode>): Replace Tv with TV and
	set its gpr32 attr to 0.
	(*avx2_gathersi<VEC_GATHER_MODE:mode>_2): Likewise.
	(*avx2_gatherdi<VEC_GATHER_MODE:mode>): Likewise.
	(*avx2_gatherdi<VEC_GATHER_MODE:mode>_2): Likewise.
	(*avx2_gatherdi<VI4F_256:mode>_3): Likewise.
	(*avx2_gatherdi<VI4F_256:mode>_4): Likewise.
	(avx_vbroadcastf128_<mode>): Restrict the non-egpr alternative to
	noavx512vl, set its constraint to Bt, and set its gpr32 attr to 0.
	(vec_set_lo_<mode><mask_name>): Likewise.
	(vec_set_lo_<mode><mask_name>): Likewise for SF/SI modes.
	(vec_set_hi_<mode><mask_name>): Likewise.
	(vec_set_hi_<mode><mask_name>): Likewise for SF/SI modes.
	(vec_set_hi_<mode>): Likewise.
	(vec_set_lo_<mode>): Likewise.
	(avx2_set_hi_v32qi): Likewise.
---
 gcc/config/i386/constraints.md |   7 +
 gcc/config/i386/i386.md        |  52 +++--
 gcc/config/i386/mmx.md         |  11 +-
 gcc/config/i386/sse.md         | 337 +++++++++++++++++++++------------
 4 files changed, 261 insertions(+), 146 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index f487bf2e5a3..052b6a95841 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -374,6 +374,7 @@ (define_constraint "Z"
 
 ;; T prefix is used for different address constraints
 ;;   v - VSIB address
+;;   V - VSIB address with no rex2 register
 ;;   s - address with no segment register
 ;;   i - address with no index and no rip
 ;;   b - address with no base and no rip
@@ -386,5 +387,11 @@ (define_address_constraint "Ts"
   "Address operand without segment register"
   (match_operand 0 "address_no_seg_operand"))
 
+(define_address_constraint "TV"
+  "VSIB address operand that does not allow EGPR"
+  (and (match_operand 0 "vsib_address_operand")
+       (not (and (match_test "TARGET_APX_EGPR")
+		 (match_test "x86_extended_rex2reg_mentioned_p (op)")))))
+
 (define_register_constraint  "h"
  "TARGET_APX_EGPR ? GENERAL_GPR16 : GENERAL_REGS")
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8ec249b268d..d31c1910026 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -554,7 +554,8 @@ (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
 		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
 		    avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl,
 		    avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16,avxifma,
-		    avx512ifmavl,avxneconvert,avx512bf16vl,vpclmulqdqvl"
+		    avx512ifmavl,avxneconvert,avx512bf16vl,vpclmulqdqvl,
+		    avx_noavx512f,avx_noavx512vl"
   (const_string "base"))
 
 ;; The (bounding maximum) length of an instruction immediate.
@@ -908,6 +909,8 @@ (define_attr "enabled" ""
 	 (eq_attr "isa" "sse4_noavx")
 	   (symbol_ref "TARGET_SSE4_1 && !TARGET_AVX")
 	 (eq_attr "isa" "avx") (symbol_ref "TARGET_AVX")
+	 (eq_attr "isa" "avx_noavx512f")
+	   (symbol_ref "TARGET_AVX && !TARGET_AVX512F")
 	 (eq_attr "isa" "noavx") (symbol_ref "!TARGET_AVX")
 	 (eq_attr "isa" "avx2") (symbol_ref "TARGET_AVX2")
 	 (eq_attr "isa" "noavx2") (symbol_ref "!TARGET_AVX2")
@@ -16665,12 +16668,13 @@ (define_insn "setcc_<mode>_sse"
   [(set (match_operand:MODEF 0 "register_operand" "=x,x")
 	(match_operator:MODEF 3 "sse_comparison_operator"
 	  [(match_operand:MODEF 1 "register_operand" "0,x")
-	   (match_operand:MODEF 2 "nonimmediate_operand" "xm,xm")]))]
+	   (match_operand:MODEF 2 "nonimmediate_operand" "xm,xBt")]))]
   "SSE_FLOAT_MODE_P (<MODE>mode)"
   "@
    cmp%D3<ssemodesuffix>\t{%2, %0|%0, %2}
    vcmp%D3<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "1,0")
    (set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
@@ -20126,24 +20130,28 @@ (define_insn "*<insn>hf"
    (set_attr "mode" "HF")])
 
 (define_insn "*rcpsf2_sse"
-  [(set (match_operand:SF 0 "register_operand" "=x,x,x")
-	(unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m")]
+  [(set (match_operand:SF 0 "register_operand" "=x,x,x,x")
+	(unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m,BT")]
 		   UNSPEC_RCP))]
   "TARGET_SSE && TARGET_SSE_MATH"
   "@
    %vrcpss\t{%d1, %0|%0, %d1}
    %vrcpss\t{%d1, %0|%0, %d1}
-   %vrcpss\t{%1, %d0|%d0, %1}"
-  [(set_attr "type" "sse")
+   rcpss\t{%1, %d0|%d0, %1}
+   vrcpss\t{%1, %d0|%d0, %1}"
+  [(set_attr "isa" "*,*,noavx,avx")
+   (set_attr "gpr32" "1,1,1,0")
+   (set_attr "type" "sse")
    (set_attr "atom_sse_attr" "rcp")
    (set_attr "btver2_sse_attr" "rcp")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SF")
-   (set_attr "avx_partial_xmm_update" "false,false,true")
+   (set_attr "avx_partial_xmm_update" "false,false,true,true")
    (set (attr "preferred_for_speed")
       (cond [(match_test "TARGET_AVX")
 	       (symbol_ref "true")
-	     (eq_attr "alternative" "1,2")
+	     (eq_attr "alternative" "1,2,3")
 	       (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
 	    ]
 	    (symbol_ref "true")))])
@@ -20386,24 +20394,27 @@ (define_insn "sqrtxf2"
    (set_attr "bdver1_decode" "direct")])
 
 (define_insn "*rsqrtsf2_sse"
-  [(set (match_operand:SF 0 "register_operand" "=x,x,x")
-	(unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m")]
+  [(set (match_operand:SF 0 "register_operand" "=x,x,x,x")
+	(unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "0,x,m,BT")]
 		   UNSPEC_RSQRT))]
   "TARGET_SSE && TARGET_SSE_MATH"
   "@
    %vrsqrtss\t{%d1, %0|%0, %d1}
    %vrsqrtss\t{%d1, %0|%0, %d1}
-   %vrsqrtss\t{%1, %d0|%d0, %1}"
-  [(set_attr "type" "sse")
+   rsqrtss\t{%1, %d0|%d0, %1}
+   vrsqrtss\t{%1, %d0|%d0, %1}"
+  [(set_attr "isa" "*,*,noavx,avx")
+   (set_attr "gpr32" "1,1,1,0")
+   (set_attr "type" "sse")
    (set_attr "atom_sse_attr" "rcp")
    (set_attr "btver2_sse_attr" "rcp")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "SF")
-   (set_attr "avx_partial_xmm_update" "false,false,true")
+   (set_attr "avx_partial_xmm_update" "false,false,true,true")
    (set (attr "preferred_for_speed")
       (cond [(match_test "TARGET_AVX")
 	       (symbol_ref "true")
-	     (eq_attr "alternative" "1,2")
+	     (eq_attr "alternative" "1,2,3")
 	       (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
 	    ]
 	    (symbol_ref "true")))])
@@ -22107,14 +22118,17 @@ (define_expand "signbitxf2"
 })
 
 (define_insn "movmsk_df"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,h")
 	(unspec:SI
-	  [(match_operand:DF 1 "register_operand" "x")]
+	  [(match_operand:DF 1 "register_operand" "x,x")]
 	  UNSPEC_MOVMSK))]
   "SSE_FLOAT_MODE_P (DFmode) && TARGET_SSE_MATH"
-  "%vmovmskpd\t{%1, %0|%0, %1}"
-  [(set_attr "type" "ssemov")
-   (set_attr "prefix" "maybe_vex")
+  "@
+   movmskpd\t{%1, %0|%0, %1}
+   vmovmskpd\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix" "orig,vex")
    (set_attr "mode" "DF")])
 
 ;; Use movmskpd in SSE mode to avoid store forwarding stall
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 63803c89f2b..9dcb165d270 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -5182,13 +5182,14 @@ (define_expand "usadv8qi"
 })
 
 (define_insn_and_split "mmx_pmovmskb"
-  [(set (match_operand:SI 0 "register_operand" "=r,r")
-	(unspec:SI [(match_operand:V8QI 1 "register_operand" "y,x")]
+  [(set (match_operand:SI 0 "register_operand" "=r,r,h")
+	(unspec:SI [(match_operand:V8QI 1 "register_operand" "y,x,x")]
 		   UNSPEC_MOVMSK))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
    && (TARGET_SSE || TARGET_3DNOW_A)"
   "@
    pmovmskb\t{%1, %0|%0, %1}
+   #
    #"
   "TARGET_SSE2 && reload_completed
    && SSE_REGNO_P (REGNO (operands[1]))"
@@ -5203,9 +5204,9 @@ (define_insn_and_split "mmx_pmovmskb"
   operands[2] = lowpart_subreg (QImode, operands[0],
 				GET_MODE (operands[0]));
 }
-  [(set_attr "mmx_isa" "native,sse")
-   (set_attr "type" "mmxcvt,ssemov")
-   (set_attr "mode" "DI,TI")])
+  [(set_attr "mmx_isa" "native,sse_noavx,avx")
+   (set_attr "type" "mmxcvt,ssemov,ssemov")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_maskmovq"
   [(set (match_operand:V8QI 0 "memory_operand")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4913c34ed37..4b6bed36061 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1845,12 +1845,16 @@ (define_peephole2
   "operands[4] = adjust_address (operands[0], V2DFmode, 0);")
 
 (define_insn "<sse3>_lddqu<avxsizesuffix>"
-  [(set (match_operand:VI1 0 "register_operand" "=x")
-	(unspec:VI1 [(match_operand:VI1 1 "memory_operand" "m")]
+  [(set (match_operand:VI1 0 "register_operand" "=x,x")
+	(unspec:VI1 [(match_operand:VI1 1 "memory_operand" "m,Bt")]
 		    UNSPEC_LDDQU))]
   "TARGET_SSE3"
-  "%vlddqu\t{%1, %0|%0, %1}"
-  [(set_attr "type" "ssemov")
+  "@
+   lddqu\t{%1, %0|%0, %1}
+   vlddqu\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "gpr32" "1,0")
    (set_attr "movu" "1")
    (set (attr "prefix_data16")
      (if_then_else
@@ -2519,12 +2523,16 @@ (define_insn "<sse>_div<mode>3<mask_name><round_name>"
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse>_rcp<mode>2"
-  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
+  [(set (match_operand:VF1_128_256 0 "register_operand" "=x,x")
 	(unspec:VF1_128_256
-	  [(match_operand:VF1_128_256 1 "vector_operand" "xBm")] UNSPEC_RCP))]
+	  [(match_operand:VF1_128_256 1 "vector_operand" "xBm,xBT")] UNSPEC_RCP))]
   "TARGET_SSE"
-  "%vrcpps\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sse")
+  "@
+   rcpps\t{%1, %0|%0, %1}
+   vrcpps\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "atom_sse_attr" "rcp")
    (set_attr "btver2_sse_attr" "rcp")
    (set_attr "prefix" "maybe_vex")
@@ -2543,6 +2551,7 @@ (define_insn "sse_vmrcpv4sf2"
    vrcpss\t{%1, %2, %0|%0, %2, %k1}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "atom_sse_attr" "rcp")
    (set_attr "btver2_sse_attr" "rcp")
    (set_attr "prefix" "orig,vex")
@@ -2562,6 +2571,7 @@ (define_insn "*sse_vmrcpv4sf2"
    vrcpss\t{%1, %2, %0|%0, %2, %1}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "atom_sse_attr" "rcp")
    (set_attr "btver2_sse_attr" "rcp")
    (set_attr "prefix" "orig,vex")
@@ -2738,12 +2748,16 @@ (define_expand "rsqrt<mode>2"
   "TARGET_AVX512FP16")
 
 (define_insn "<sse>_rsqrt<mode>2"
-  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
+  [(set (match_operand:VF1_128_256 0 "register_operand" "=x,x")
 	(unspec:VF1_128_256
-	  [(match_operand:VF1_128_256 1 "vector_operand" "xBm")] UNSPEC_RSQRT))]
+	  [(match_operand:VF1_128_256 1 "vector_operand" "xBm,xBT")] UNSPEC_RSQRT))]
   "TARGET_SSE"
-  "%vrsqrtps\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sse")
+  "@
+   rsqrtps\t{%1, %0|%0, %1}
+   vrsqrtps\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
@@ -2802,7 +2816,7 @@ (define_insn "rsqrt14_<mode>_mask"
 (define_insn "sse_vmrsqrtv4sf2"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x")
 	(vec_merge:V4SF
-	  (unspec:V4SF [(match_operand:V4SF 1 "nonimmediate_operand" "xm,xm")]
+	  (unspec:V4SF [(match_operand:V4SF 1 "nonimmediate_operand" "xm,xBt")]
 		       UNSPEC_RSQRT)
 	  (match_operand:V4SF 2 "register_operand" "0,x")
 	  (const_int 1)))]
@@ -2812,6 +2826,7 @@ (define_insn "sse_vmrsqrtv4sf2"
    vrsqrtss\t{%1, %2, %0|%0, %2, %k1}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "SF")])
 
@@ -2819,7 +2834,7 @@ (define_insn "*sse_vmrsqrtv4sf2"
   [(set (match_operand:V4SF 0 "register_operand" "=x,x")
 	(vec_merge:V4SF
 	  (vec_duplicate:V4SF
-	    (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
+	    (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xBt")]
 		         UNSPEC_RSQRT))
 	  (match_operand:V4SF 2 "register_operand" "0,x")
 	  (const_int 1)))]
@@ -2829,6 +2844,7 @@ (define_insn "*sse_vmrsqrtv4sf2"
    vrsqrtss\t{%1, %2, %0|%0, %2, %1}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "SF")])
 
@@ -3004,7 +3020,7 @@ (define_insn "vec_addsub<mode>3"
         (vec_merge:VF_128_256
 	  (minus:VF_128_256
 	    (match_operand:VF_128_256 1 "register_operand" "0,x")
-	    (match_operand:VF_128_256 2 "vector_operand" "xBm, xm"))
+	    (match_operand:VF_128_256 2 "vector_operand" "xBm, xBt"))
 	  (plus:VF_128_256 (match_dup 1) (match_dup 2))
 	  (const_int <addsub_cst>)))]
   "TARGET_SSE3"
@@ -3013,6 +3029,7 @@ (define_insn "vec_addsub<mode>3"
    vaddsub<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
+   (set_attr "gpr32" "1,0")
    (set (attr "atom_unit")
      (if_then_else
        (match_test "<MODE>mode == V2DFmode")
@@ -3156,7 +3173,7 @@ (define_insn "avx_h<insn>v4df3"
 	      (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))
 	    (plusminus:DF
 	      (vec_select:DF
-		(match_operand:V4DF 2 "nonimmediate_operand" "xm")
+		(match_operand:V4DF 2 "nonimmediate_operand" "xBt")
 		(parallel [(const_int 0)]))
 	      (vec_select:DF (match_dup 2) (parallel [(const_int 1)]))))
 	  (vec_concat:V2DF
@@ -3169,6 +3186,7 @@ (define_insn "avx_h<insn>v4df3"
   "TARGET_AVX"
   "vh<plusminus_mnemonic>pd\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseadd")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V4DF")])
 
@@ -3199,7 +3217,7 @@ (define_insn "*sse3_haddv2df3"
 	      (parallel [(match_operand:SI 4 "const_0_to_1_operand")])))
 	  (plus:DF
 	    (vec_select:DF
-	      (match_operand:V2DF 2 "vector_operand" "xBm,xm")
+	      (match_operand:V2DF 2 "vector_operand" "xBm,xBt")
 	      (parallel [(match_operand:SI 5 "const_0_to_1_operand")]))
 	    (vec_select:DF
 	      (match_dup 2)
@@ -3211,6 +3229,7 @@ (define_insn "*sse3_haddv2df3"
    haddpd\t{%2, %0|%0, %2}
    vhaddpd\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "1,0")
    (set_attr "type" "sseadd")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V2DF")])
@@ -3225,7 +3244,7 @@ (define_insn "sse3_hsubv2df3"
 	    (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))
 	  (minus:DF
 	    (vec_select:DF
-	      (match_operand:V2DF 2 "vector_operand" "xBm,xm")
+	      (match_operand:V2DF 2 "vector_operand" "xBm,xBt")
 	      (parallel [(const_int 0)]))
 	    (vec_select:DF (match_dup 2) (parallel [(const_int 1)])))))]
   "TARGET_SSE3"
@@ -3234,6 +3253,7 @@ (define_insn "sse3_hsubv2df3"
    vhsubpd\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
+   (set_attr "gpr32" "1,0")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V2DF")])
 
@@ -3290,7 +3310,7 @@ (define_insn "avx_h<insn>v8sf3"
 	    (vec_concat:V2SF
 	      (plusminus:SF
 		(vec_select:SF
-		  (match_operand:V8SF 2 "nonimmediate_operand" "xm")
+		  (match_operand:V8SF 2 "nonimmediate_operand" "xBt")
 		  (parallel [(const_int 0)]))
 		(vec_select:SF (match_dup 2) (parallel [(const_int 1)])))
 	      (plusminus:SF
@@ -3314,6 +3334,7 @@ (define_insn "avx_h<insn>v8sf3"
   "TARGET_AVX"
   "vh<plusminus_mnemonic>ps\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseadd")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
 
@@ -3332,7 +3353,7 @@ (define_insn "sse3_h<insn>v4sf3"
 	  (vec_concat:V2SF
 	    (plusminus:SF
 	      (vec_select:SF
-		(match_operand:V4SF 2 "vector_operand" "xBm,xm")
+		(match_operand:V4SF 2 "vector_operand" "xBm,xBt")
 		(parallel [(const_int 0)]))
 	      (vec_select:SF (match_dup 2) (parallel [(const_int 1)])))
 	    (plusminus:SF
@@ -3344,6 +3365,7 @@ (define_insn "sse3_h<insn>v4sf3"
    vh<plusminus_mnemonic>ps\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
+   (set_attr "gpr32" "1,0")
    (set_attr "atom_unit" "complex")
    (set_attr "prefix" "orig,vex")
    (set_attr "prefix_rep" "1,*")
@@ -3537,12 +3559,13 @@ (define_insn "avx_cmp<mode>3"
   [(set (match_operand:VF_128_256 0 "register_operand" "=x")
 	(unspec:VF_128_256
 	  [(match_operand:VF_128_256 1 "register_operand" "x")
-	   (match_operand:VF_128_256 2 "nonimmediate_operand" "xm")
+	   (match_operand:VF_128_256 2 "nonimmediate_operand" "xBt")
 	   (match_operand:SI 3 "const_0_to_31_operand")]
 	  UNSPEC_PCMP))]
   "TARGET_AVX"
   "vcmp<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssecmp")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<MODE>")])
@@ -3748,7 +3771,7 @@ (define_insn "avx_vmcmp<mode>3"
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "x")
-	     (match_operand:VF_128 2 "nonimmediate_operand" "xm")
+	     (match_operand:VF_128 2 "nonimmediate_operand" "xBt")
 	     (match_operand:SI 3 "const_0_to_31_operand")]
 	    UNSPEC_PCMP)
 	 (match_dup 1)
@@ -3756,6 +3779,7 @@ (define_insn "avx_vmcmp<mode>3"
   "TARGET_AVX"
   "vcmp<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
   [(set_attr "type" "ssecmp")
+   (set_attr "gpr32" "0")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<ssescalarmode>")])
@@ -3764,13 +3788,14 @@ (define_insn "*<sse>_maskcmp<mode>3_comm"
   [(set (match_operand:VF_128_256 0 "register_operand" "=x,x")
 	(match_operator:VF_128_256 3 "sse_comparison_operator"
 	  [(match_operand:VF_128_256 1 "register_operand" "%0,x")
-	   (match_operand:VF_128_256 2 "vector_operand" "xBm,xm")]))]
+	   (match_operand:VF_128_256 2 "vector_operand" "xBm,xBt")]))]
   "TARGET_SSE
    && GET_RTX_CLASS (GET_CODE (operands[3])) == RTX_COMM_COMPARE"
   "@
    cmp%D3<ssemodesuffix>\t{%2, %0|%0, %2}
    vcmp%D3<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "1,0")
    (set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
@@ -3780,12 +3805,13 @@ (define_insn "<sse>_maskcmp<mode>3"
   [(set (match_operand:VF_128_256 0 "register_operand" "=x,x")
 	(match_operator:VF_128_256 3 "sse_comparison_operator"
 	  [(match_operand:VF_128_256 1 "register_operand" "0,x")
-	   (match_operand:VF_128_256 2 "vector_operand" "xBm,xm")]))]
+	   (match_operand:VF_128_256 2 "vector_operand" "xBm,xBt")]))]
   "TARGET_SSE"
   "@
    cmp%D3<ssemodesuffix>\t{%2, %0|%0, %2}
    vcmp%D3<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "1,0")
    (set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
@@ -3796,7 +3822,7 @@ (define_insn "<sse>_vmmaskcmp<mode>3"
 	(vec_merge:VF_128
 	 (match_operator:VF_128 3 "sse_comparison_operator"
 	   [(match_operand:VF_128 1 "register_operand" "0,x")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm")])
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,xBt")])
 	 (match_dup 1)
 	 (const_int 1)))]
   "TARGET_SSE"
@@ -3804,6 +3830,7 @@ (define_insn "<sse>_vmmaskcmp<mode>3"
    cmp%D3<ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
    vcmp%D3<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %<iptr>2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "1,0")
    (set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1,*")
    (set_attr "prefix" "orig,vex")
@@ -4721,7 +4748,7 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
 	(and:VFB_128_256
 	  (not:VFB_128_256
 	    (match_operand:VFB_128_256 1 "register_operand" "0,x,v,v"))
-	  (match_operand:VFB_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
+	  (match_operand:VFB_128_256 2 "vector_operand" "xBm,xBt,vm,vm")))]
   "TARGET_SSE && <mask_avx512vl_condition>
    && (!<mask_applied> || <ssescalarmode>mode != HFmode)"
 {
@@ -4765,7 +4792,8 @@ (define_insn "<sse>_andnot<mode>3<mask_name>"
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx512dq,avx512f")
+  [(set_attr "isa" "noavx,avx_noavx512f,avx512dq,avx512f")
+   (set_attr "gpr32" "1,0,1,1")
    (set_attr "type" "sselog")
    (set_attr "prefix" "orig,maybe_vex,evex,evex")
    (set (attr "mode")
@@ -5075,7 +5103,7 @@ (define_insn "*andnot<mode>3"
   [(set (match_operand:ANDNOT_MODE 0 "register_operand" "=x,x,v,v")
 	(and:ANDNOT_MODE
 	  (not:ANDNOT_MODE (match_operand:ANDNOT_MODE 1 "register_operand" "0,x,v,v"))
-	  (match_operand:ANDNOT_MODE 2 "vector_operand" "xBm,xm,vm,v")))]
+	  (match_operand:ANDNOT_MODE 2 "vector_operand" "xBm,xBt,vm,v")))]
   "TARGET_SSE"
 {
   char buf[128];
@@ -5104,7 +5132,8 @@ (define_insn "*andnot<mode>3"
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx512vl,avx512f")
+  [(set_attr "isa" "noavx,avx_noavx512f,avx512vl,avx512f")
+   (set_attr "gpr32" "1,0,1,1")
    (set_attr "type" "sselog")
    (set (attr "prefix_data16")
      (if_then_else
@@ -12246,7 +12275,7 @@ (define_insn_and_split "vec_extract_lo_v32qi"
   "operands[1] = gen_lowpart (V16QImode, operands[1]);")
 
 (define_insn "vec_extract_hi_v32qi"
-  [(set (match_operand:V16QI 0 "nonimmediate_operand" "=xm,vm")
+  [(set (match_operand:V16QI 0 "nonimmediate_operand" "=xBt,vm")
 	(vec_select:V16QI
 	  (match_operand:V32QI 1 "register_operand" "x,v")
 	  (parallel [(const_int 16) (const_int 17)
@@ -12264,7 +12293,8 @@ (define_insn "vec_extract_hi_v32qi"
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "isa" "*,avx512vl")
+   (set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "gpr32" "0,1")
    (set_attr "prefix" "vex,evex")
    (set_attr "mode" "OI")])
 
@@ -17135,6 +17165,7 @@ (define_insn "*sse2_gt<mode>3"
    pcmpgt<ssemodesuffix>\t{%2, %0|%0, %2}
    vpcmpgt<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "1,0")
    (set_attr "type" "ssecmp")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
@@ -17451,7 +17482,7 @@ (define_insn "*andnot<mode>3"
   [(set (match_operand:VI 0 "register_operand" "=x,x,v,v,v")
 	(and:VI
 	  (not:VI (match_operand:VI 1 "bcst_vector_operand" "0,x,v,m,Br"))
-	  (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr,0,0")))]
+	  (match_operand:VI 2 "bcst_vector_operand" "xBm,xBt,vmBr,0,0")))]
   "TARGET_SSE
    && (register_operand (operands[1], <MODE>mode)
        || register_operand (operands[2], <MODE>mode))"
@@ -17538,7 +17569,8 @@ (define_insn "*andnot<mode>3"
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx,*,*")
+  [(set_attr "isa" "noavx,avx_noavx512f,avx512f,*,*")
+   (set_attr "gpr32" "1,0,1,1,1")
    (set_attr "type" "sselog")
    (set (attr "prefix_data16")
      (if_then_else
@@ -17693,7 +17725,7 @@ (define_insn "*<code><mode>3<mask_name>"
   [(set (match_operand:VI48_AVX_AVX512F 0 "register_operand" "=x,x,v")
 	(any_logic:VI48_AVX_AVX512F
 	  (match_operand:VI48_AVX_AVX512F 1 "bcst_vector_operand" "%0,x,v")
-	  (match_operand:VI48_AVX_AVX512F 2 "bcst_vector_operand" "xBm,xm,vmBr")))]
+	  (match_operand:VI48_AVX_AVX512F 2 "bcst_vector_operand" "xBm,xBt,vmBr")))]
   "TARGET_SSE && <mask_mode512bit_condition>
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
 {
@@ -17723,9 +17755,11 @@ (define_insn "*<code><mode>3<mask_name>"
 	case E_V4DImode:
 	case E_V4SImode:
 	case E_V2DImode:
-	  ssesuffix = (TARGET_AVX512VL
-		       && (<mask_applied> || which_alternative == 2)
-		       ? "<ssemodesuffix>" : "");
+	  ssesuffix = ((TARGET_AVX512VL
+		        && (<mask_applied> || which_alternative == 2))
+		       || (MEM_P (operands[2]) && which_alternative == 2
+			   && x86_extended_rex2reg_mentioned_p (operands[2])))
+		       ? "<ssemodesuffix>" : "";
 	  break;
 	default:
 	  gcc_unreachable ();
@@ -17765,7 +17799,8 @@ (define_insn "*<code><mode>3<mask_name>"
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx")
+  [(set_attr "isa" "noavx,avx_noavx512f,avx512f")
+   (set_attr "gpr32" "1,0,1")
    (set_attr "type" "sselog")
    (set (attr "prefix_data16")
      (if_then_else
@@ -17792,7 +17827,7 @@ (define_insn "*<code><mode>3"
   [(set (match_operand:VI12_AVX_AVX512F 0 "register_operand" "=x,x,v")
 	(any_logic:VI12_AVX_AVX512F
 	  (match_operand:VI12_AVX_AVX512F 1 "vector_operand" "%0,x,v")
-	  (match_operand:VI12_AVX_AVX512F 2 "vector_operand" "xBm,xm,vm")))]
+	  (match_operand:VI12_AVX_AVX512F 2 "vector_operand" "xBm,xBt,vm")))]
   "TARGET_SSE && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
 {
   char buf[64];
@@ -17821,7 +17856,10 @@ (define_insn "*<code><mode>3"
 	case E_V16HImode:
 	case E_V16QImode:
 	case E_V8HImode:
-	  ssesuffix = TARGET_AVX512VL && which_alternative == 2 ? "q" : "";
+	  ssesuffix = (((TARGET_AVX512VL && which_alternative == 2)
+		       || (MEM_P (operands[2]) && which_alternative == 2
+			   && x86_extended_rex2reg_mentioned_p (operands[2]))))
+		       ? "q" : "";
 	  break;
 	default:
 	  gcc_unreachable ();
@@ -17858,7 +17896,8 @@ (define_insn "*<code><mode>3"
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx")
+  [(set_attr "isa" "noavx,avx_noavx512f,avx512f")
+   (set_attr "gpr32" "1,0,1")
    (set_attr "type" "sselog")
    (set (attr "prefix_data16")
      (if_then_else
@@ -17885,13 +17924,14 @@ (define_insn "<code>v1ti3"
   [(set (match_operand:V1TI 0 "register_operand" "=x,x,v")
 	(any_logic:V1TI
 	  (match_operand:V1TI 1 "register_operand" "%0,x,v")
-	  (match_operand:V1TI 2 "vector_operand" "xBm,xm,vm")))]
+	  (match_operand:V1TI 2 "vector_operand" "xBm,xBt,vm")))]
   "TARGET_SSE2"
   "@
    p<logic>\t{%2, %0|%0, %2}
    vp<logic>\t{%2, %1, %0|%0, %1, %2}
    vp<logic>d\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx,avx512vl")
+  [(set_attr "isa" "noavx,avx_noavx512vl,avx512vl")
+   (set_attr "gpr32" "1,0,1")
    (set_attr "prefix" "orig,vex,evex")
    (set_attr "prefix_data16" "1,*,*")
    (set_attr "type" "sselog")
@@ -20878,33 +20918,39 @@ (define_insn "*<sse2_avx2>_psadbw"
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<sse>_movmsk<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,h")
 	(unspec:SI
-	  [(match_operand:VF_128_256 1 "register_operand" "x")]
+	  [(match_operand:VF_128_256 1 "register_operand" "x,x")]
 	  UNSPEC_MOVMSK))]
   "TARGET_SSE"
-  "%vmovmsk<ssemodesuffix>\t{%1, %0|%0, %1}"
-  [(set_attr "type" "ssemov")
-   (set_attr "prefix" "maybe_vex")
+  "@
+   movmsk<ssemodesuffix>\t{%1, %0|%0, %1}
+   vmovmsk<ssemodesuffix>\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,h")
 	(any_extend:DI
 	  (unspec:SI
-	    [(match_operand:VF_128_256 1 "register_operand" "x")]
+	    [(match_operand:VF_128_256 1 "register_operand" "x,x")]
 	    UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE"
-  "%vmovmsk<ssemodesuffix>\t{%1, %k0|%k0, %1}"
-  [(set_attr "type" "ssemov")
-   (set_attr "prefix" "maybe_vex")
+  "@
   movmsk<ssemodesuffix>\t{%1, %k0|%k0, %1}
   vmovmsk<ssemodesuffix>\t{%1, %k0|%k0, %1}
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_lt"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,h")
 	(unspec:SI
 	  [(lt:VF_128_256
-	     (match_operand:<sseintvecmode> 1 "register_operand" "x")
+	     (match_operand:<sseintvecmode> 1 "register_operand" "x,x")
 	     (match_operand:<sseintvecmode> 2 "const0_operand"))]
 	  UNSPEC_MOVMSK))]
   "TARGET_SSE"
@@ -20913,16 +20959,17 @@ (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_lt"
   [(set (match_dup 0)
 	(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))]
   "operands[1] = gen_lowpart (<MODE>mode, operands[1]);"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_lt"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,h")
 	(any_extend:DI
 	  (unspec:SI
 	    [(lt:VF_128_256
-	       (match_operand:<sseintvecmode> 1 "register_operand" "x")
+	       (match_operand:<sseintvecmode> 1 "register_operand" "x,x")
 	       (match_operand:<sseintvecmode> 2 "const0_operand"))]
 	    UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE"
@@ -20931,16 +20978,17 @@ (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_lt"
   [(set (match_dup 0)
 	(any_extend:DI (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK)))]
   "operands[1] = gen_lowpart (<MODE>mode, operands[1]);"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_shift"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,h")
 	(unspec:SI
 	  [(subreg:VF_128_256
 	     (ashiftrt:<sseintvecmode>
-	       (match_operand:<sseintvecmode> 1 "register_operand" "x")
+	       (match_operand:<sseintvecmode> 1 "register_operand" "x,x")
 	       (match_operand:QI 2 "const_int_operand")) 0)]
 	  UNSPEC_MOVMSK))]
   "TARGET_SSE"
@@ -20949,17 +20997,18 @@ (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_shift"
   [(set (match_dup 0)
 	(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))]
   "operands[1] = gen_lowpart (<MODE>mode, operands[1]);"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_shift"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,h")
 	(any_extend:DI
 	  (unspec:SI
 	    [(subreg:VF_128_256
 	       (ashiftrt:<sseintvecmode>
-		 (match_operand:<sseintvecmode> 1 "register_operand" "x")
+		 (match_operand:<sseintvecmode> 1 "register_operand" "x,x")
 	       (match_operand:QI 2 "const_int_operand")) 0)]
 	    UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE"
@@ -20968,18 +21017,22 @@ (define_insn_and_split "*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_shift
   [(set (match_dup 0)
 	(any_extend:DI (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK)))]
   "operands[1] = gen_lowpart (<MODE>mode, operands[1]);"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse2_avx2>_pmovmskb"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,h")
 	(unspec:SI
-	  [(match_operand:VI1_AVX2 1 "register_operand" "x")]
+	  [(match_operand:VI1_AVX2 1 "register_operand" "x,x")]
 	  UNSPEC_MOVMSK))]
   "TARGET_SSE2"
-  "%vpmovmskb\t{%1, %0|%0, %1}"
-  [(set_attr "type" "ssemov")
+  "@
+   pmovmskb\t{%1, %0|%0, %1}
+   vpmovmskb\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set (attr "prefix_data16")
      (if_then_else
        (match_test "TARGET_AVX")
@@ -20989,14 +21042,17 @@ (define_insn "<sse2_avx2>_pmovmskb"
    (set_attr "mode" "SI")])
 
 (define_insn "*<sse2_avx2>_pmovmskb_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,h")
 	(zero_extend:DI
 	  (unspec:SI
-	    [(match_operand:VI1_AVX2 1 "register_operand" "x")]
+	    [(match_operand:VI1_AVX2 1 "register_operand" "x,x")]
 	    UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE2"
-  "%vpmovmskb\t{%1, %k0|%k0, %1}"
-  [(set_attr "type" "ssemov")
+  "@
+   pmovmskb\t{%1, %k0|%k0, %1}
+   vpmovmskb\t{%1, %k0|%k0, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set (attr "prefix_data16")
      (if_then_else
        (match_test "TARGET_AVX")
@@ -21006,14 +21062,17 @@ (define_insn "*<sse2_avx2>_pmovmskb_zext"
    (set_attr "mode" "SI")])
 
 (define_insn "*sse2_pmovmskb_ext"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,h")
 	(sign_extend:DI
 	  (unspec:SI
-	    [(match_operand:V16QI 1 "register_operand" "x")]
+	    [(match_operand:V16QI 1 "register_operand" "x,x")]
 	    UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE2"
-  "%vpmovmskb\t{%1, %k0|%k0, %1}"
-  [(set_attr "type" "ssemov")
+  "@
+   pmovmskb\t{%1, %k0|%k0, %1}
+   vpmovmskb\t{%1, %k0|%k0, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set (attr "prefix_data16")
      (if_then_else
        (match_test "TARGET_AVX")
@@ -21098,9 +21157,9 @@ (define_split
 })
 
 (define_insn_and_split "*<sse2_avx2>_pmovmskb_lt"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,h")
 	(unspec:SI
-	  [(lt:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand" "x")
+	  [(lt:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand" "x,x")
 			(match_operand:VI1_AVX2 2 "const0_operand"))]
 	  UNSPEC_MOVMSK))]
   "TARGET_SSE2"
@@ -21109,7 +21168,8 @@ (define_insn_and_split "*<sse2_avx2>_pmovmskb_lt"
   [(set (match_dup 0)
 	(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))]
   ""
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set (attr "prefix_data16")
      (if_then_else
        (match_test "TARGET_AVX")
@@ -21119,10 +21179,10 @@ (define_insn_and_split "*<sse2_avx2>_pmovmskb_lt"
    (set_attr "mode" "SI")])
 
 (define_insn_and_split "*<sse2_avx2>_pmovmskb_zext_lt"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,h")
 	(zero_extend:DI
 	  (unspec:SI
-	    [(lt:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand" "x")
+	    [(lt:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand" "x,x")
 			  (match_operand:VI1_AVX2 2 "const0_operand"))]
 	    UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE2"
@@ -21131,7 +21191,8 @@ (define_insn_and_split "*<sse2_avx2>_pmovmskb_zext_lt"
   [(set (match_dup 0)
 	(zero_extend:DI (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK)))]
   ""
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set (attr "prefix_data16")
      (if_then_else
        (match_test "TARGET_AVX")
@@ -21141,10 +21202,10 @@ (define_insn_and_split "*<sse2_avx2>_pmovmskb_zext_lt"
    (set_attr "mode" "SI")])
 
 (define_insn_and_split "*sse2_pmovmskb_ext_lt"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,h")
 	(sign_extend:DI
 	  (unspec:SI
-	    [(lt:V16QI (match_operand:V16QI 1 "register_operand" "x")
+	    [(lt:V16QI (match_operand:V16QI 1 "register_operand" "x,x")
 		       (match_operand:V16QI 2 "const0_operand"))]
 	    UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE2"
@@ -21153,7 +21214,8 @@ (define_insn_and_split "*sse2_pmovmskb_ext_lt"
   [(set (match_dup 0)
 	(sign_extend:DI (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK)))]
   ""
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssemov")
    (set (attr "prefix_data16")
      (if_then_else
        (match_test "TARGET_AVX")
@@ -21214,21 +21276,28 @@ (define_insn "*sse2_maskmovdqu"
    (set_attr "mode" "TI")])
 
 (define_insn "sse_ldmxcsr"
-  [(unspec_volatile [(match_operand:SI 0 "memory_operand" "m")]
+  [(unspec_volatile [(match_operand:SI 0 "memory_operand" "m,Bt")]
 		    UNSPECV_LDMXCSR)]
   "TARGET_SSE"
-  "%vldmxcsr\t%0"
-  [(set_attr "type" "sse")
+  "@
+  ldmxcsr\t%0
+  vldmxcsr\t%0"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "atom_sse_attr" "mxcsr")
    (set_attr "prefix" "maybe_vex")
    (set_attr "memory" "load")])
 
 (define_insn "sse_stmxcsr"
-  [(set (match_operand:SI 0 "memory_operand" "=m")
+  [(set (match_operand:SI 0 "memory_operand" "=m,Bt")
 	(unspec_volatile:SI [(const_int 0)] UNSPECV_STMXCSR))]
   "TARGET_SSE"
-  "%vstmxcsr\t%0"
+  "@
+  stmxcsr\t%0
+  vstmxcsr\t%0"
   [(set_attr "type" "sse")
+   (set_attr "gpr32" "1,0")
    (set_attr "atom_sse_attr" "mxcsr")
    (set_attr "prefix" "maybe_vex")
    (set_attr "memory" "store")])
@@ -23890,11 +23959,12 @@ (define_expand "<insn>v2siv2di2"
 (define_insn "avx_vtest<ssemodesuffix><avxsizesuffix>"
   [(set (reg:CC FLAGS_REG)
 	(unspec:CC [(match_operand:VF_128_256 0 "register_operand" "x")
-		    (match_operand:VF_128_256 1 "nonimmediate_operand" "xm")]
+		    (match_operand:VF_128_256 1 "nonimmediate_operand" "xBt")]
 		   UNSPEC_VTESTP))]
   "TARGET_AVX"
   "vtest<ssemodesuffix>\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssecomi")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<MODE>")])
@@ -26955,7 +27025,7 @@ (define_split
 (define_insn "avx_vbroadcastf128_<mode>"
   [(set (match_operand:V_256 0 "register_operand" "=x,x,x,v,v,v,v")
 	(vec_concat:V_256
-	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "m,0,?x,m,0,m,0")
+	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "Bt,0,?x,m,0,m,0")
 	  (match_dup 1)))]
   "TARGET_AVX"
   "@
@@ -26966,8 +27036,9 @@ (define_insn "avx_vbroadcastf128_<mode>"
    vinsert<i128vldq>\t{$1, %1, %0, %0|%0, %0, %1, 1}
    vbroadcast<shuffletype>32x4\t{%1, %0|%0, %1}
    vinsert<shuffletype>32x4\t{$1, %1, %0, %0|%0, %0, %1, 1}"
-  [(set_attr "isa" "*,*,*,avx512dq,avx512dq,avx512vl,avx512vl")
+  [(set_attr "isa" "noavx512vl,*,*,avx512dq,avx512dq,avx512vl,avx512vl")
    (set_attr "type" "ssemov,sselog1,sselog1,ssemov,sselog1,ssemov,sselog1")
+   (set_attr "gpr32" "0,1,1,1,1,1,1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "0,1,1,0,1,0,1")
    (set_attr "prefix" "vex,vex,vex,evex,evex,evex,evex")
@@ -27235,12 +27306,13 @@ (define_insn "*avx_vperm2f128<mode>_full"
   [(set (match_operand:AVX256MODE2P 0 "register_operand" "=x")
 	(unspec:AVX256MODE2P
 	  [(match_operand:AVX256MODE2P 1 "register_operand" "x")
-	   (match_operand:AVX256MODE2P 2 "nonimmediate_operand" "xm")
+	   (match_operand:AVX256MODE2P 2 "nonimmediate_operand" "xBt")
 	   (match_operand:SI 3 "const_0_to_255_operand")]
 	  UNSPEC_VPERMIL2F128))]
   "TARGET_AVX"
   "vperm2<i128>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
@@ -27357,11 +27429,11 @@ (define_expand "avx_vinsertf128<mode>"
 })
 
 (define_insn "vec_set_lo_<mode><mask_name>"
-  [(set (match_operand:VI8F_256 0 "register_operand" "=v")
+  [(set (match_operand:VI8F_256 0 "register_operand" "=x,v")
 	(vec_concat:VI8F_256
-	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "vm")
+	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xBt,vm")
 	  (vec_select:<ssehalfvecmode>
-	    (match_operand:VI8F_256 1 "register_operand" "v")
+	    (match_operand:VI8F_256 1 "register_operand" "x,v")
 	    (parallel [(const_int 2) (const_int 3)]))))]
   "TARGET_AVX && <mask_avx512dq_condition>"
 {
@@ -27372,7 +27444,9 @@ (define_insn "vec_set_lo_<mode><mask_name>"
   else
     return "vinsert<i128>\t{$0x0, %2, %1, %0|%0, %1, %2, 0x0}";
 }
-  [(set_attr "type" "sselog")
+  [(set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
@@ -27401,11 +27475,11 @@ (define_insn "vec_set_hi_<mode><mask_name>"
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "vec_set_lo_<mode><mask_name>"
-  [(set (match_operand:VI4F_256 0 "register_operand" "=v")
+  [(set (match_operand:VI4F_256 0 "register_operand" "=x,v")
 	(vec_concat:VI4F_256
-	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "vm")
+	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xBt,vm")
 	  (vec_select:<ssehalfvecmode>
-	    (match_operand:VI4F_256 1 "register_operand" "v")
+	    (match_operand:VI4F_256 1 "register_operand" "x,v")
 	    (parallel [(const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))))]
   "TARGET_AVX"
@@ -27415,20 +27489,22 @@ (define_insn "vec_set_lo_<mode><mask_name>"
   else
     return "vinsert<i128>\t{$0x0, %2, %1, %0|%0, %1, %2, 0x0}";
 }
-  [(set_attr "type" "sselog")
+  [(set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "vec_set_hi_<mode><mask_name>"
-  [(set (match_operand:VI4F_256 0 "register_operand" "=v")
+  [(set (match_operand:VI4F_256 0 "register_operand" "=x,v")
 	(vec_concat:VI4F_256
 	  (vec_select:<ssehalfvecmode>
-	    (match_operand:VI4F_256 1 "register_operand" "v")
+	    (match_operand:VI4F_256 1 "register_operand" "x,v")
 	    (parallel [(const_int 0) (const_int 1)
 		       (const_int 2) (const_int 3)]))
-	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "vm")))]
+	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xBt,vm")))]
   "TARGET_AVX"
 {
   if (TARGET_AVX512VL)
@@ -27436,7 +27512,9 @@ (define_insn "vec_set_hi_<mode><mask_name>"
   else
     return "vinsert<i128>\t{$0x1, %2, %1, %0|%0, %1, %2, 0x1}";
 }
-  [(set_attr "type" "sselog")
+  [(set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
@@ -27445,7 +27523,7 @@ (define_insn "vec_set_hi_<mode><mask_name>"
 (define_insn "vec_set_lo_<mode>"
   [(set (match_operand:V16_256 0 "register_operand" "=x,v")
 	(vec_concat:V16_256
-	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xm,vm")
+	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xBt,vm")
 	  (vec_select:<ssehalfvecmode>
 	    (match_operand:V16_256 1 "register_operand" "x,v")
 	    (parallel [(const_int 8) (const_int 9)
@@ -27456,7 +27534,9 @@ (define_insn "vec_set_lo_<mode>"
   "@
    vinsert%~128\t{$0x0, %2, %1, %0|%0, %1, %2, 0x0}
    vinserti32x4\t{$0x0, %2, %1, %0|%0, %1, %2, 0x0}"
-  [(set_attr "type" "sselog")
+  [(set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex,evex")
@@ -27471,12 +27551,14 @@ (define_insn "vec_set_hi_<mode>"
 		       (const_int 2) (const_int 3)
 		       (const_int 4) (const_int 5)
 		       (const_int 6) (const_int 7)]))
-	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "xBt,vm")))]
   "TARGET_AVX"
   "@
    vinsert%~128\t{$0x1, %2, %1, %0|%0, %1, %2, 0x1}
    vinserti32x4\t{$0x1, %2, %1, %0|%0, %1, %2, 0x1}"
-  [(set_attr "type" "sselog")
+  [(set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex,evex")
@@ -27485,7 +27567,7 @@ (define_insn "vec_set_hi_<mode>"
 (define_insn "vec_set_lo_v32qi"
   [(set (match_operand:V32QI 0 "register_operand" "=x,v")
 	(vec_concat:V32QI
-	  (match_operand:V16QI 2 "nonimmediate_operand" "xm,v")
+	  (match_operand:V16QI 2 "nonimmediate_operand" "xBt,v")
 	  (vec_select:V16QI
 	    (match_operand:V32QI 1 "register_operand" "x,v")
 	    (parallel [(const_int 16) (const_int 17)
@@ -27501,6 +27583,7 @@ (define_insn "vec_set_lo_v32qi"
    vinsert%~128\t{$0x0, %2, %1, %0|%0, %1, %2, 0x0}
    vinserti32x4\t{$0x0, %2, %1, %0|%0, %1, %2, 0x0}"
   [(set_attr "type" "sselog")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex,evex")
@@ -27519,12 +27602,14 @@ (define_insn "vec_set_hi_v32qi"
 		       (const_int 10) (const_int 11)
 		       (const_int 12) (const_int 13)
 		       (const_int 14) (const_int 15)]))
-	  (match_operand:V16QI 2 "nonimmediate_operand" "xm,vm")))]
+	  (match_operand:V16QI 2 "nonimmediate_operand" "xBt,vm")))]
   "TARGET_AVX"
   "@
    vinsert%~128\t{$0x1, %2, %1, %0|%0, %1, %2, 0x1}
    vinserti32x4\t{$0x1, %2, %1, %0|%0, %1, %2, 0x1}"
-  [(set_attr "type" "sselog")
+  [(set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "gpr32" "0,1")
+   (set_attr "type" "sselog")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex,evex")
@@ -27534,7 +27619,7 @@ (define_insn "<avx_avx2>_maskload<ssemodesuffix><avxsizesuffix>"
   [(set (match_operand:V48_128_256 0 "register_operand" "=x")
 	(unspec:V48_128_256
 	  [(match_operand:<sseintvecmode> 2 "register_operand" "x")
-	   (match_operand:V48_128_256 1 "memory_operand" "m")]
+	   (match_operand:V48_128_256 1 "memory_operand" "Bt")]
 	  UNSPEC_MASKMOV))]
   "TARGET_AVX"
 {
@@ -27544,13 +27629,14 @@ (define_insn "<avx_avx2>_maskload<ssemodesuffix><avxsizesuffix>"
     return "vmaskmov<ssefltmodesuffix>\t{%1, %2, %0|%0, %2, %1}";
 }
   [(set_attr "type" "sselog1")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "vex")
    (set_attr "btver2_decode" "vector")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<avx_avx2>_maskstore<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:V48_128_256 0 "memory_operand" "+m")
+  [(set (match_operand:V48_128_256 0 "memory_operand" "+Bt")
 	(unspec:V48_128_256
 	  [(match_operand:<sseintvecmode> 1 "register_operand" "x")
 	   (match_operand:V48_128_256 2 "register_operand" "x")
@@ -27564,6 +27650,7 @@ (define_insn "<avx_avx2>_maskstore<ssemodesuffix><avxsizesuffix>"
     return "vmaskmov<ssefltmodesuffix>\t{%2, %1, %0|%0, %1, %2}";
 }
   [(set_attr "type" "sselog1")
+   (set_attr "gpr32" "0")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "vex")
    (set_attr "btver2_decode" "vector") 
@@ -28160,7 +28247,7 @@ (define_insn "*avx2_gathersi<VEC_GATHER_MODE:mode>"
 	  [(match_operand:VEC_GATHER_MODE 2 "register_operand" "0")
 	   (match_operator:<ssescalarmode> 7 "vsib_mem_operator"
 	     [(unspec:P
-		[(match_operand:P 3 "vsib_address_operand" "Tv")
+		[(match_operand:P 3 "vsib_address_operand" "TV")
 		 (match_operand:<VEC_GATHER_IDXSI> 4 "register_operand" "x")
 		 (match_operand:SI 6 "const1248_operand")]
 		UNSPEC_VSIBADDR)])
@@ -28171,6 +28258,7 @@ (define_insn "*avx2_gathersi<VEC_GATHER_MODE:mode>"
   "TARGET_AVX2"
   "%M3v<sseintprefix>gatherd<ssemodesuffix>\t{%1, %7, %0|%0, %7, %1}"
   [(set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
@@ -28180,7 +28268,7 @@ (define_insn "*avx2_gathersi<VEC_GATHER_MODE:mode>_2"
 	  [(pc)
 	   (match_operator:<ssescalarmode> 6 "vsib_mem_operator"
 	     [(unspec:P
-		[(match_operand:P 2 "vsib_address_operand" "Tv")
+		[(match_operand:P 2 "vsib_address_operand" "TV")
 		 (match_operand:<VEC_GATHER_IDXSI> 3 "register_operand" "x")
 		 (match_operand:SI 5 "const1248_operand")]
 		UNSPEC_VSIBADDR)])
@@ -28191,6 +28279,7 @@ (define_insn "*avx2_gathersi<VEC_GATHER_MODE:mode>_2"
   "TARGET_AVX2"
   "%M2v<sseintprefix>gatherd<ssemodesuffix>\t{%1, %6, %0|%0, %6, %1}"
   [(set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
@@ -28221,7 +28310,7 @@ (define_insn "*avx2_gatherdi<VEC_GATHER_MODE:mode>"
 	  [(match_operand:<VEC_GATHER_SRCDI> 2 "register_operand" "0")
 	   (match_operator:<ssescalarmode> 7 "vsib_mem_operator"
 	     [(unspec:P
-		[(match_operand:P 3 "vsib_address_operand" "Tv")
+		[(match_operand:P 3 "vsib_address_operand" "TV")
 		 (match_operand:<VEC_GATHER_IDXDI> 4 "register_operand" "x")
 		 (match_operand:SI 6 "const1248_operand")]
 		UNSPEC_VSIBADDR)])
@@ -28232,6 +28321,7 @@ (define_insn "*avx2_gatherdi<VEC_GATHER_MODE:mode>"
   "TARGET_AVX2"
   "%M3v<sseintprefix>gatherq<ssemodesuffix>\t{%5, %7, %2|%2, %7, %5}"
   [(set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
@@ -28241,7 +28331,7 @@ (define_insn "*avx2_gatherdi<VEC_GATHER_MODE:mode>_2"
 	  [(pc)
 	   (match_operator:<ssescalarmode> 6 "vsib_mem_operator"
 	     [(unspec:P
-		[(match_operand:P 2 "vsib_address_operand" "Tv")
+		[(match_operand:P 2 "vsib_address_operand" "TV")
 		 (match_operand:<VEC_GATHER_IDXDI> 3 "register_operand" "x")
 		 (match_operand:SI 5 "const1248_operand")]
 		UNSPEC_VSIBADDR)])
@@ -28256,6 +28346,7 @@ (define_insn "*avx2_gatherdi<VEC_GATHER_MODE:mode>_2"
   return "%M2v<sseintprefix>gatherq<ssemodesuffix>\t{%4, %6, %0|%0, %6, %4}";
 }
   [(set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
@@ -28266,7 +28357,7 @@ (define_insn "*avx2_gatherdi<VI4F_256:mode>_3"
 	    [(match_operand:<VEC_GATHER_SRCDI> 2 "register_operand" "0")
 	     (match_operator:<ssescalarmode> 7 "vsib_mem_operator"
 	       [(unspec:P
-		  [(match_operand:P 3 "vsib_address_operand" "Tv")
+		  [(match_operand:P 3 "vsib_address_operand" "TV")
 		   (match_operand:<VEC_GATHER_IDXDI> 4 "register_operand" "x")
 		   (match_operand:SI 6 "const1248_operand")]
 		  UNSPEC_VSIBADDR)])
@@ -28279,6 +28370,7 @@ (define_insn "*avx2_gatherdi<VI4F_256:mode>_3"
   "TARGET_AVX2"
   "%M3v<sseintprefix>gatherq<ssemodesuffix>\t{%5, %7, %0|%0, %7, %5}"
   [(set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
@@ -28289,7 +28381,7 @@ (define_insn "*avx2_gatherdi<VI4F_256:mode>_4"
 	    [(pc)
 	     (match_operator:<ssescalarmode> 6 "vsib_mem_operator"
 	       [(unspec:P
-		  [(match_operand:P 2 "vsib_address_operand" "Tv")
+		  [(match_operand:P 2 "vsib_address_operand" "TV")
 		   (match_operand:<VEC_GATHER_IDXDI> 3 "register_operand" "x")
 		   (match_operand:SI 5 "const1248_operand")]
 		  UNSPEC_VSIBADDR)])
@@ -28302,6 +28394,7 @@ (define_insn "*avx2_gatherdi<VI4F_256:mode>_4"
   "TARGET_AVX2"
   "%M2v<sseintprefix>gatherq<ssemodesuffix>\t{%4, %6, %0|%0, %6, %4}"
   [(set_attr "type" "ssemov")
+   (set_attr "gpr32" "0")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-- 
2.31.1



* Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.
  2023-08-31  8:20 ` [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint Hongyu Wang
@ 2023-08-31  9:17   ` Jakub Jelinek
  2023-08-31 10:00     ` Uros Bizjak
  2023-09-01  9:04     ` Hongyu Wang
  0 siblings, 2 replies; 49+ messages in thread
From: Jakub Jelinek @ 2023-08-31  9:17 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: gcc-patches, hongtao.liu, hubicka

On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
> From: Kong Lingling <lingling.kong@intel.com>
> 
> In inline asm, we do not know whether the insn can use EGPR, so disable EGPR
> usage by default by mapping the common reg/mem constraints to non-EGPR
> constraints. Use the flag -mapx-inline-asm-use-gpr32 to enable EGPR usage
> for inline asm.
> 
> gcc/ChangeLog:
> 
> 	* config/i386/i386.cc (INCLUDE_STRING): Add include for
> 	ix86_md_asm_adjust.
> 	(ix86_md_asm_adjust): When APX EGPR enabled without specifying the
> 	target option, map reg/mem constraints to non-EGPR constraints.
> 	* config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/i386/apx-inline-gpr-norex2.c: New test.
> ---
>  gcc/config/i386/i386.cc                       |  44 +++++++
>  gcc/config/i386/i386.opt                      |   5 +
>  .../gcc.target/i386/apx-inline-gpr-norex2.c   | 107 ++++++++++++++++++
>  3 files changed, 156 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
> 
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index d26d9ab0d9d..9460ebbfda4 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
>  along with GCC; see the file COPYING3.  If not see
>  <http://www.gnu.org/licenses/>.  */
>  
> +#define INCLUDE_STRING
>  #define IN_TARGET_CODE 1
>  
>  #include "config.h"
> @@ -23077,6 +23078,49 @@ ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
>    bool saw_asm_flag = false;
>  
>    start_sequence ();
> +  /* TODO: Here we just mapped the general r/m constraints to non-EGPR
> +   constraints, will eventually map all the usable constraints in the future. */

I think there should be some constraint which explicitly has all the 32
GPRs, like there is one for just all 16 GPRs (h), so that regardless of
-mapx-inline-asm-use-gpr32 one can be explicit about what the inline asm wants.

Also, what about the "g" constraint?  Shouldn't there be another for "g"
without r16..r31?  What about the various other memory
constraints ("<", "o", ...)?

> +  if (TARGET_APX_EGPR && !ix86_apx_inline_asm_use_gpr32)
> +    {
> +      /* Map "r" constraint in inline asm to "h" that disallows r16-r31
> +	 and replace only r, exclude Br and Yr.  */
> +      for (unsigned i = 0; i < constraints.length (); i++)
> +	{
> +	  std::string *s = new std::string (constraints[i]);

Doesn't this leak memory (all the time)?
I must say I don't really understand why you need to use std::string here,
but certainly it shouldn't leak.

> +	  size_t pos = s->find ('r');
> +	  while (pos != std::string::npos)
> +	    {
> +	      if (pos > 0
> +		  && (s->at (pos - 1) == 'Y' || s->at (pos - 1) == 'B'))
> +		pos = s->find ('r', pos + 1);
> +	      else
> +		{
> +		  s->replace (pos, 1, "h");
> +		  constraints[i] = (const char*) s->c_str ();

Formatting (space before *).  The usual way for constraints is ggc_strdup on
some string in a buffer.  Also, one could have several copies of r (or m, memory
(doesn't that appear just in clobbers?  And that doesn't look like something
that should be replaced), Bm), e.g. in various alternatives.  So you need to
change them all, not just the first hit: "r,r,r,m" and the like.
Normally, one would simply walk the constraint string, parsing the special
letters (+, =, & etc.) and the single-letter and two-letter constraints using
the CONSTRAINT_LEN macro (tons of examples in GCC sources).
Either do it in 2 passes, where the first one counts how long a constraint
string one will need after the adjustments (and whether to adjust anything at
all), then if needed XALLOCAVEC it and adjust in there, or say use an
auto_vec<char, 32> for it.
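The constraint walk being suggested could look roughly like the following standalone sketch. This is not GCC code: `constraint_len` is a stand-in for GCC's CONSTRAINT_LEN macro (here only 'B' and 'Y' introduce two-letter constraints), and for brevity the result is built in a std::string instead of the counted-length XALLOCAVEC buffer; the r→h, m→Bt, Bm→BT mapping follows the patch.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for GCC's CONSTRAINT_LEN: in this model "B?" and "Y?"
// are two-letter constraints, everything else is a single letter.
static int
constraint_len (const char *p)
{
  return ((*p == 'B' || *p == 'Y') && p[1] != '\0') ? 2 : 1;
}

// Walk the whole constraint string, so every alternative is handled
// ("r,r,r,m" and the like), and map r -> h, m -> Bt, Bm -> BT.
// Special letters (+, =, &, ',') and other constraints (Yr, ...) are
// copied through unchanged.
std::string
map_to_non_egpr (const char *cons)
{
  std::string out;
  for (const char *p = cons; *p; p += constraint_len (p))
    {
      int len = constraint_len (p);
      if (len == 1 && *p == 'r')
	out += 'h';
      else if (len == 1 && *p == 'm')
	out += "Bt";
      else if (len == 2 && p[0] == 'B' && p[1] == 'm')
	out += "BT";
      else
	out.append (p, len);
    }
  return out;
}
```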

> +		  break;
> +		}
> +	    }
> +	}
> +      /* Also map "m/memory/Bm" constraint that may use GPR32, replace them with
> +	 "Bt/Bt/BT".  */
> +      for (unsigned i = 0; i < constraints.length (); i++)
> +	{
> +	  std::string *s = new std::string (constraints[i]);
> +	  size_t pos = s->find ("m");
> +	  size_t pos2 = s->find ("memory");
> +	  if (pos != std::string::npos)
> +	    {
> +	      if (pos > 0 && (s->at (pos - 1) == 'B'))
> +		  s->replace (pos - 1, 2, "BT");
> +	      else if (pos2 != std::string::npos)
> +		  s->replace (pos, 6, "Bt");
> +	      else
> +		  s->replace (pos, 1, "Bt");

Formatting, the s->replace calls are indented too much.

	Jakub


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/13] [RFC] Support Intel APX EGPR
  2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
                   ` (12 preceding siblings ...)
  2023-08-31  8:20 ` [PATCH 13/13] [APX EGPR] Handle vex insns that only support GPR16 (5/5) Hongyu Wang
@ 2023-08-31  9:19 ` Richard Biener
  2023-09-01  8:55   ` Hongyu Wang
  13 siblings, 1 reply; 49+ messages in thread
From: Richard Biener @ 2023-08-31  9:19 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: gcc-patches, jakub, hongtao.liu, hubicka

On Thu, Aug 31, 2023 at 10:22 AM Hongyu Wang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Intel Advanced Performance Extensions (APX) has been released in [1].
> It contains several extensions such as extended 16 general purpose registers
> (EGPRs), push2/pop2, new data destination (NDD), conditional compare
> (CCMP/CTEST) combined with suppress flags write version of common instructions
> (NF). This RFC focuses on the EGPR implementation in GCC.
>
> APX introduces a REX2 prefix to help represent EGPR for several legacy/SSE
> instructions. For the remaining ones, it promotes some of them using evex
> prefix for EGPR.  The main issue in APX is that not all legacy/sse/vex
> instructions support EGPR. For example, instructions in legacy opcode map2/3
> cannot use the REX2 prefix since there is only 1 bit in REX2 to indicate map0/1
> instructions, e.g., pinsrd. Also, for most vector extensions, EGPR is supported
> in their evex forms but not vex forms, which means the mnemonics with no evex
> forms also cannot use EGPR, e.g., vphaddw.
>
> Such a limitation brings some challenges to the current GCC infrastructure.
> Generally, we use constraints to guide register allocation behavior. For
> register operand, it is easy to add a new constraint to certain insn and limit
> it to legacy or REX registers. But for memory operand, if we only use
> constraint to limit base/index register choice, reload has no backoff when
> process_address allocates any EGPRs to the base/index reg, and then any post-reload
> pass would get ICE from the constraint.

How realistic would it be to simply disable instructions not supporting EGPR?
I hope there are alternatives that would be available in actual APX
implementations?
Otherwise this design limitation doesn't shed a very positive light on
the designers ...

How sure are we that actual implementations with APX will appear (just
remembering SSE5...)?
I'm quite sure it's not going to be 2024, so would it be realistic to
postpone APX work to next stage1, targeting GCC 15 only?

> Here is what we did to address the issue:
>
> Middle-end:
> -       Add rtx_insn parameter to base_reg_class, reuse the
> MODE_CODE_BASE_REG_CLASS macro with rtx_insn parameter.
> -       Add index_reg_class like base_reg_class, calls new INSN_INDEX_REG_CLASS
> macro with rtx_insn parameter.
> -       In process_address_1, add rtx_insn parameter to call sites of
> base_reg_class, replace usage of INDEX_REG_CLASS to index_reg_class with
> rtx_insn parameter.
>
> Back-end:
> -       Extend GENERAL_REG_CLASS, INDEX_REG_CLASS and their supersets with
> corresponding regno checks for EGPRs.
> -       Add GENERAL_GPR16/INDEX_GPR16 class for old 16 GPRs.
> -       Whole component is controlled under -mapxf/TARGET_APX_EGPR. If it is
> not enabled, clear r16-r31 in accessible_reg_set.
> -       New register_constraint “h” and memory_constraint “Bt” that disallow
> EGPRs in operands.
> -       New asm_gpr32 flag option to enable/disable gpr32 for inline asm,
>   disabled by default.
> -       If asm_gpr32 is disabled, replace constraints “r” to “h”, and
> “m/memory” to “Bt”.
> -       Extra insn attribute gpr32, value 0 indicates the alternative cannot
> use EGPRs.
> -       Add target functions for base_reg_class and index_reg_class, calls a
> helper function to verify whether the insn can use EGPR in its memory_operand.
> -       In the helper function, the verify process works as follow:
>     1. Returns true if APX_EGPR disabled or insn is null.
>     2. If the insn is inline asm, returns asm_gpr32 flag.
>     3. Returns false for unrecognizable insn.
>     4. Save recog_data and which_alternative, extract the insn, and restore them
>     before return.
>     5. Loop through all enabled alternatives; if one of the enabled alternatives
>     has attr_gpr32 0, returns false, otherwise returns true.
> -       For insn alternatives that cannot use gpr32 in register_operand, use h
> constraint instead of r.
> -       For insn alternatives that cannot use gpr32 in memory operand, use Bt
> constraint instead of m, and set corresponding attr_gpr32 to 0.
> -       Split output template with %v if the sse version of mnemonic cannot use
> gpr32.
> -       For insn alternatives that cannot use gpr32 in memory operand, classify
> the isa attribute and split alternatives to noavx, avx_noavx512f and etc., so
> the helper function can properly loop through the available enabled mask.
>
> Specifically for inline asm, we currently just map the “r/m/memory” constraints
> as an example. Eventually we will support the full mapping of all common
> constraints if the mapping method is accepted.
>
> Also, for vex instructions, we currently assume EGPR is supported if they have
> an evex counterpart, since any APX-enabled machine will have AVX10 support for
> all the evex encodings. We just disabled those mnemonics that don’t support
> EGPR. So EGPR will be allowed under -mavx2 -mapxf for many vex mnemonics.
>
> We haven’t disabled EGPR for 3DNOW/XOP/LWP/FMA4/TBM instructions, as they can
> co-operate with -mapxf. We can disable EGPR for them if AMD folks require it.

I think most of these are retired by now, so it's unlikely that an
implementation providing these and also APX will appear.

I have no comments on the implementation other than that having instructions
that do not support the upper GPRs is quite ugly.  I don't know of any other
target with this kind of restriction; if there is any, we could see how it
deals with such a situation.

Richard.

> For testing, we currently ran the GCC testsuite and spec2017 with -mapxf under
> the SDE simulator and saw no new errors. Also, we inverted the register
> allocation order to force r31 to be allocated first, and saw no errors except
> for those AMD-only instructions. We will conduct further tests, like changing
> all do-compile to do-assemble, and add more to gcc/testsuite in the future.
>
> The RFC intends to describe our approach to the APX implementation for the
> EGPR component. It may still have potential issues or bugs and requires further
> optimization. Any comments are very much appreciated.
>
> [1]. https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html.
>
> Hongyu Wang (2):
>   [APX EGPR] middle-end: Add index_reg_class with insn argument.
>   [APX EGPR] Handle GPR16 only vector move insns
>
> Kong Lingling (11):
>   [APX EGPR] middle-end: Add insn argument to base_reg_class
>   [APX_EGPR] Initial support for APX_F
>   [APX EGPR] Add 16 new integer general purpose registers
>   [APX EGPR] Add register and memory constraints that disallow EGPR
>   [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR
>     constraint.
>   [APX EGPR] Add backend hook for base_reg_class/index_reg_class.
>   [APX EGPR] Handle legacy insn that only support GPR16 (1/5)
>   [APX EGPR] Handle legacy insns that only support GPR16 (2/5)
>   [APX EGPR] Handle legacy insns that only support GPR16 (3/5)
>   [APX_EGPR] Handle legacy insns that only support GPR16 (4/5)
>   [APX EGPR] Handle vex insns that only support GPR16 (5/5)
>
>  gcc/addresses.h                               |  25 +-
>  gcc/common/config/i386/cpuinfo.h              |  12 +-
>  gcc/common/config/i386/i386-common.cc         |  17 +
>  gcc/common/config/i386/i386-cpuinfo.h         |   1 +
>  gcc/common/config/i386/i386-isas.h            |   1 +
>  gcc/config/avr/avr.h                          |   5 +-
>  gcc/config/gcn/gcn.h                          |   4 +-
>  gcc/config/i386/constraints.md                |  26 +-
>  gcc/config/i386/cpuid.h                       |   1 +
>  gcc/config/i386/i386-isa.def                  |   1 +
>  gcc/config/i386/i386-options.cc               |  15 +
>  gcc/config/i386/i386-opts.h                   |   8 +
>  gcc/config/i386/i386-protos.h                 |   9 +
>  gcc/config/i386/i386.cc                       | 253 +++++-
>  gcc/config/i386/i386.h                        |  69 +-
>  gcc/config/i386/i386.md                       | 144 ++-
>  gcc/config/i386/i386.opt                      |  30 +
>  gcc/config/i386/mmx.md                        | 170 ++--
>  gcc/config/i386/sse.md                        | 859 ++++++++++++------
>  gcc/config/rl78/rl78.h                        |   6 +-
>  gcc/doc/invoke.texi                           |  11 +-
>  gcc/doc/tm.texi                               |  17 +-
>  gcc/doc/tm.texi.in                            |  17 +-
>  gcc/lra-constraints.cc                        |  32 +-
>  gcc/reload.cc                                 |  34 +-
>  gcc/reload1.cc                                |   2 +-
>  gcc/testsuite/gcc.target/i386/apx-1.c         |   8 +
>  .../gcc.target/i386/apx-egprs-names.c         |  17 +
>  .../gcc.target/i386/apx-inline-gpr-norex2.c   | 108 +++
>  .../gcc.target/i386/apx-interrupt-1.c         | 102 +++
>  .../i386/apx-legacy-insn-check-norex2-asm.c   |   5 +
>  .../i386/apx-legacy-insn-check-norex2.c       | 181 ++++
>  .../gcc.target/i386/apx-spill_to_egprs-1.c    |  25 +
>  gcc/testsuite/lib/target-supports.exp         |  10 +
>  34 files changed, 1747 insertions(+), 478 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-egprs-names.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
>
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)
  2023-08-31  8:20 ` [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5) Hongyu Wang
@ 2023-08-31  9:26   ` Richard Biener
  2023-08-31  9:28     ` Richard Biener
  2023-08-31  9:31     ` Jakub Jelinek
  0 siblings, 2 replies; 49+ messages in thread
From: Richard Biener @ 2023-08-31  9:26 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: gcc-patches, jakub, hongtao.liu, hubicka

On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> From: Kong Lingling <lingling.kong@intel.com>
>
> Disable EGPR usage for below legacy insns in opcode map2/3 that have vex
> but no evex counterpart.
>
> insn list:
> 1. phminposuw/vphminposuw
> 2. ptest/vptest
> 3. roundps/vroundps, roundpd/vroundpd,
>    roundss/vroundss, roundsd/vroundsd
> 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm

How are GPRs involved in the above?  Or did I misunderstand something?

> 6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
>
> gcc/ChangeLog:
>
>         * config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
>         prototype.
>         * config/i386/i386.cc (x86_evex_reg_mentioned_p): New
>         function.
>         * config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0
>         and constraint Bt/BM to all non-evex alternatives, adjust
>         alternative outputs if evex reg is mentioned.
>         * config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0
>         and constraint Bt/BM to all non-evex alternatives.
>         (ptesttf2): Likewise.
>         (<sse4_1>_round<ssemodesuffix><avxsizesuffix): Likewise.
>         (sse4_1_round<ssescalarmodesuffix>): Likewise.
>         (sse4_2_pcmpestri): Likewise.
>         (sse4_2_pcmpestrm): Likewise.
>         (sse4_2_pcmpestr_cconly): Likewise.
>         (sse4_2_pcmpistr): Likewise.
>         (sse4_2_pcmpistri): Likewise.
>         (sse4_2_pcmpistrm): Likewise.
>         (sse4_2_pcmpistr_cconly): Likewise.
>         (aesimc): Likewise.
>         (aeskeygenassist): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
>         tests.
> ---
>  gcc/config/i386/i386-protos.h                 |  1 +
>  gcc/config/i386/i386.cc                       | 13 +++
>  gcc/config/i386/i386.md                       |  3 +-
>  gcc/config/i386/sse.md                        | 93 +++++++++++++------
>  .../i386/apx-legacy-insn-check-norex2.c       | 55 ++++++++++-
>  5 files changed, 132 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> index 78eb3e0f584..bbb219e3039 100644
> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h
> @@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
>  extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
>  extern bool x86_extended_reg_mentioned_p (rtx);
>  extern bool x86_extended_rex2reg_mentioned_p (rtx);
> +extern bool x86_evex_reg_mentioned_p (rtx [], int);
>  extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
>  extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index f5d642948bc..ec93c5bab97 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22936,6 +22936,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
>    return false;
>  }
>
> +/* Return true when rtx operands mentions register that must be encoded using
> +   evex prefix.  */
> +bool
> +x86_evex_reg_mentioned_p (rtx operands[], int nops)
> +{
> +  int i;
> +  for (i = 0; i < nops; i++)
> +    if (EXT_REX_SSE_REG_P (operands[i])
> +       || x86_extended_rex2reg_mentioned_p (operands[i]))
> +      return true;
> +  return false;
> +}
> +
>  /* If profitable, negate (without causing overflow) integer constant
>     of mode MODE at location LOC.  Return true in this case.  */
>  bool
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 83ad01b43c1..4c305e72389 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -21603,7 +21603,7 @@ (define_expand "significand<mode>2"
>  (define_insn "sse4_1_round<mode>2"
>    [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
>         (unspec:MODEFH
> -         [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
> +         [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,Bt,v,m")
>            (match_operand:SI 2 "const_0_to_15_operand")]
>           UNSPEC_ROUND))]
>    "TARGET_SSE4_1"
> @@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round<mode>2"
>    [(set_attr "type" "ssecvt")
>     (set_attr "prefix_extra" "1,1,1,*,*")
>     (set_attr "length_immediate" "1")
> +   (set_attr "gpr32" "1,1,0,1,1")
>     (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
>     (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
>     (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 05963de9219..456713b991a 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -22617,11 +22617,12 @@ (define_insn "avx2_pblendd<mode>"
>
>  (define_insn "sse4_1_phminposuw"
>    [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
> -       (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,xm")]
> +       (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBT,*xBT,xBt")]
>                      UNSPEC_PHMINPOSUW))]
>    "TARGET_SSE4_1"
>    "%vphminposuw\t{%1, %0|%0, %1}"
>    [(set_attr "isa" "noavx,noavx,avx")
> +   (set_attr "gpr32" "0")
>     (set_attr "type" "sselog1")
>     (set_attr "prefix_extra" "1")
>     (set_attr "prefix" "orig,orig,vex")
> @@ -23810,12 +23811,13 @@ (define_insn "avx_vtest<ssemodesuffix><avxsizesuffix>"
>  (define_insn "*<sse4_1>_ptest<mode>"
>    [(set (reg FLAGS_REG)
>         (unspec [(match_operand:V_AVX 0 "register_operand" "Yr, *x, x")
> -                (match_operand:V_AVX 1 "vector_operand" "YrBm, *xBm, xm")]
> +                (match_operand:V_AVX 1 "vector_operand" "YrBT, *xBT, xBt")]
>                 UNSPEC_PTEST))]
>    "TARGET_SSE4_1 && ix86_match_ptest_ccmode (insn)"
>    "%vptest\t{%1, %0|%0, %1}"
>    [(set_attr "isa" "noavx,noavx,avx")
>     (set_attr "type" "ssecomi")
> +   (set_attr "gpr32" "0")
>     (set_attr "prefix_extra" "1")
>     (set_attr "prefix" "orig,orig,vex")
>     (set (attr "btver2_decode")
> @@ -23852,12 +23854,13 @@ (define_expand "<sse4_1>_ptest<mode>"
>  (define_insn "ptesttf2"
>    [(set (reg:CC FLAGS_REG)
>         (unspec:CC [(match_operand:TF 0 "register_operand" "Yr, *x, x")
> -                   (match_operand:TF 1 "vector_operand" "YrBm, *xBm, xm")]
> +                   (match_operand:TF 1 "vector_operand" "YrBT, *xBT, xBt")]
>                    UNSPEC_PTEST))]
>    "TARGET_SSE4_1"
>    "%vptest\t{%1, %0|%0, %1}"
>    [(set_attr "isa" "noavx,noavx,avx")
>     (set_attr "type" "ssecomi")
> +   (set_attr "gpr32" "0")
>     (set_attr "prefix_extra" "1")
>     (set_attr "prefix" "orig,orig,vex")
>     (set_attr "mode" "TI")])
> @@ -23968,13 +23971,14 @@ (define_expand "lrint<mode><sseintvecmodelower>2"
>  (define_insn "<sse4_1>_round<ssemodesuffix><avxsizesuffix>"
>    [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
>         (unspec:VF_128_256
> -         [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm,xm")
> +         [(match_operand:VF_128_256 1 "vector_operand" "YrBT,*xBT,xBt")
>            (match_operand:SI 2 "const_0_to_15_operand")]
>           UNSPEC_ROUND))]
>    "TARGET_SSE4_1"
>    "%vround<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
>    [(set_attr "isa" "noavx,noavx,avx")
>     (set_attr "type" "ssecvt")
> +   (set_attr "gpr32" "0")
>     (set_attr "prefix_data16" "1,1,*")
>     (set_attr "prefix_extra" "1")
>     (set_attr "length_immediate" "1")
> @@ -24061,19 +24065,32 @@ (define_insn "sse4_1_round<ssescalarmodesuffix>"
>    [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
>         (vec_merge:VF_128
>           (unspec:VF_128
> -           [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> +           [(match_operand:VF_128 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
>              (match_operand:SI 3 "const_0_to_15_operand")]
>             UNSPEC_ROUND)
>           (match_operand:VF_128 1 "register_operand" "0,0,x,v")
>           (const_int 1)))]
>    "TARGET_SSE4_1"
> -  "@
> -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> -   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
> -   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
> -  [(set_attr "isa" "noavx,noavx,avx,avx512f")
> +{
> +  switch (which_alternative)
> +    {
> +      case 0:
> +      case 1:
> +       return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}";
> +      case 2:
> +       return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> +      case 3:
> +       if (x86_evex_reg_mentioned_p (operands, 3))
> +         return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> +       else
> +         return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> +      default:
> +       gcc_unreachable ();
> +    }
> +}
> +  [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
>     (set_attr "type" "ssecvt")
> +   (set_attr "gpr32" "0,0,0,1")
>     (set_attr "length_immediate" "1")
>     (set_attr "prefix_data16" "1,1,*,*")
>     (set_attr "prefix_extra" "1")
> @@ -24085,19 +24102,32 @@ (define_insn "*sse4_1_round<ssescalarmodesuffix>"
>         (vec_merge:VFH_128
>           (vec_duplicate:VFH_128
>             (unspec:<ssescalarmode>
> -             [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> +             [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
>                (match_operand:SI 3 "const_0_to_15_operand")]
>               UNSPEC_ROUND))
>           (match_operand:VFH_128 1 "register_operand" "0,0,x,v")
>           (const_int 1)))]
>    "TARGET_SSE4_1"
> -  "@
> -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
> -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
> -   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
> -   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
> -  [(set_attr "isa" "noavx,noavx,avx,avx512f")
> +{
> +  switch (which_alternative)
> +    {
> +      case 0:
> +      case 1:
> +       return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}";
> +      case 2:
> +       return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> +      case 3:
> +       if (x86_evex_reg_mentioned_p (operands, 3) || <MODE>mode == V8HFmode)
> +         return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> +       else
> +         return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> +      default:
> +       gcc_unreachable ();
> +    }
> +}
> +  [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
>     (set_attr "type" "ssecvt")
> +   (set_attr "gpr32" "0,0,0,1")
>     (set_attr "length_immediate" "1")
>     (set_attr "prefix_data16" "1,1,*,*")
>     (set_attr "prefix_extra" "1")
> @@ -24318,7 +24348,7 @@ (define_insn "sse4_2_pcmpestri"
>         (unspec:SI
>           [(match_operand:V16QI 1 "register_operand" "x,x")
>            (match_operand:SI 2 "register_operand" "a,a")
> -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
>            (match_operand:SI 4 "register_operand" "d,d")
>            (match_operand:SI 5 "const_0_to_255_operand")]
>           UNSPEC_PCMPESTR))
> @@ -24333,6 +24363,7 @@ (define_insn "sse4_2_pcmpestri"
>    "TARGET_SSE4_2"
>    "%vpcmpestri\t{%5, %3, %1|%1, %3, %5}"
>    [(set_attr "type" "sselog")
> +   (set_attr "gpr32" "0")
>     (set_attr "prefix_extra" "1")
>     (set_attr "prefix" "maybe_vex")
>     (set_attr "length_immediate" "1")
> @@ -24345,7 +24376,7 @@ (define_insn "sse4_2_pcmpestrm"
>         (unspec:V16QI
>           [(match_operand:V16QI 1 "register_operand" "x,x")
>            (match_operand:SI 2 "register_operand" "a,a")
> -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
>            (match_operand:SI 4 "register_operand" "d,d")
>            (match_operand:SI 5 "const_0_to_255_operand")]
>           UNSPEC_PCMPESTR))
> @@ -24360,6 +24391,7 @@ (define_insn "sse4_2_pcmpestrm"
>    "TARGET_SSE4_2"
>    "%vpcmpestrm\t{%5, %3, %1|%1, %3, %5}"
>    [(set_attr "type" "sselog")
> +   (set_attr "gpr32" "0")
>     (set_attr "prefix_extra" "1")
>     (set_attr "length_immediate" "1")
>     (set_attr "prefix" "maybe_vex")
> @@ -24372,7 +24404,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
>         (unspec:CC
>           [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
>            (match_operand:SI 3 "register_operand" "a,a,a,a")
> -          (match_operand:V16QI 4 "nonimmediate_operand" "x,m,x,m")
> +          (match_operand:V16QI 4 "nonimmediate_operand" "x,Bt,x,Bt")
>            (match_operand:SI 5 "register_operand" "d,d,d,d")
>            (match_operand:SI 6 "const_0_to_255_operand")]
>           UNSPEC_PCMPESTR))
> @@ -24385,6 +24417,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
>     %vpcmpestri\t{%6, %4, %2|%2, %4, %6}
>     %vpcmpestri\t{%6, %4, %2|%2, %4, %6}"
>    [(set_attr "type" "sselog")
> +   (set_attr "gpr32" "0")
>     (set_attr "prefix_extra" "1")
>     (set_attr "length_immediate" "1")
>     (set_attr "memory" "none,load,none,load")
> @@ -24396,7 +24429,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
>    [(set (match_operand:SI 0 "register_operand" "=c,c")
>         (unspec:SI
>           [(match_operand:V16QI 2 "register_operand" "x,x")
> -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
>            (match_operand:SI 4 "const_0_to_255_operand")]
>           UNSPEC_PCMPISTR))
>     (set (match_operand:V16QI 1 "register_operand" "=Yz,Yz")
> @@ -24439,6 +24472,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
>    DONE;
>  }
>    [(set_attr "type" "sselog")
> +   (set_attr "gpr32" "0")
>     (set_attr "prefix_extra" "1")
>     (set_attr "length_immediate" "1")
>     (set_attr "memory" "none,load")
> @@ -24448,7 +24482,7 @@ (define_insn "sse4_2_pcmpistri"
>    [(set (match_operand:SI 0 "register_operand" "=c,c")
>         (unspec:SI
>           [(match_operand:V16QI 1 "register_operand" "x,x")
> -          (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> +          (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
>            (match_operand:SI 3 "const_0_to_255_operand")]
>           UNSPEC_PCMPISTR))
>     (set (reg:CC FLAGS_REG)
> @@ -24460,6 +24494,7 @@ (define_insn "sse4_2_pcmpistri"
>    "TARGET_SSE4_2"
>    "%vpcmpistri\t{%3, %2, %1|%1, %2, %3}"
>    [(set_attr "type" "sselog")
> +   (set_attr "gpr32" "0")
>     (set_attr "prefix_extra" "1")
>     (set_attr "length_immediate" "1")
>     (set_attr "prefix" "maybe_vex")
> @@ -24471,7 +24506,7 @@ (define_insn "sse4_2_pcmpistrm"
>    [(set (match_operand:V16QI 0 "register_operand" "=Yz,Yz")
>         (unspec:V16QI
>           [(match_operand:V16QI 1 "register_operand" "x,x")
> -          (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> +          (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
>            (match_operand:SI 3 "const_0_to_255_operand")]
>           UNSPEC_PCMPISTR))
>     (set (reg:CC FLAGS_REG)
> @@ -24483,6 +24518,7 @@ (define_insn "sse4_2_pcmpistrm"
>    "TARGET_SSE4_2"
>    "%vpcmpistrm\t{%3, %2, %1|%1, %2, %3}"
>    [(set_attr "type" "sselog")
> +   (set_attr "gpr32" "0")
>     (set_attr "prefix_extra" "1")
>     (set_attr "length_immediate" "1")
>     (set_attr "prefix" "maybe_vex")
> @@ -24494,7 +24530,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
>    [(set (reg:CC FLAGS_REG)
>         (unspec:CC
>           [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m,x,m")
> +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt,x,Bt")
>            (match_operand:SI 4 "const_0_to_255_operand")]
>           UNSPEC_PCMPISTR))
>     (clobber (match_scratch:V16QI 0 "=Yz,Yz,X,X"))
> @@ -24506,6 +24542,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
>     %vpcmpistri\t{%4, %3, %2|%2, %3, %4}
>     %vpcmpistri\t{%4, %3, %2|%2, %3, %4}"
>    [(set_attr "type" "sselog")
> +   (set_attr "gpr32" "0")
>     (set_attr "prefix_extra" "1")
>     (set_attr "length_immediate" "1")
>     (set_attr "memory" "none,load,none,load")
> @@ -25990,23 +26027,25 @@ (define_insn "aesdeclast"
>
>  (define_insn "aesimc"
>    [(set (match_operand:V2DI 0 "register_operand" "=x")
> -       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")]
> +       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")]
>                       UNSPEC_AESIMC))]
>    "TARGET_AES"
>    "%vaesimc\t{%1, %0|%0, %1}"
>    [(set_attr "type" "sselog1")
> +   (set_attr "gpr32" "0")
>     (set_attr "prefix_extra" "1")
>     (set_attr "prefix" "maybe_vex")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aeskeygenassist"
>    [(set (match_operand:V2DI 0 "register_operand" "=x")
> -       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")
> +       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")
>                       (match_operand:SI 2 "const_0_to_255_operand")]
>                      UNSPEC_AESKEYGENASSIST))]
>    "TARGET_AES"
>    "%vaeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
>    [(set_attr "type" "sselog1")
> +   (set_attr "gpr32" "0")
>     (set_attr "prefix_extra" "1")
>     (set_attr "length_immediate" "1")
>     (set_attr "prefix" "maybe_vex")
> diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> index 510213a6ca7..771bcb078e1 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> @@ -45,13 +45,22 @@ typedef union
>    DTYPE a[16];
>  } tmp_u;
>
> -__attribute__((target("sse4.2")))
> +__attribute__((target("sse4.2,aes")))
>  void sse_test ()
>  {
>    register tmp_u *tdst __asm__("%r16");
>    register tmp_u *src1 __asm__("%r17");
>    register tmp_u *src2 __asm__("%r18");
> -
> +
> +  src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> +  src1->a[2] = _mm_testc_si128 (src1->xi[3], src2->xi[4]);
> +  src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> +                             _MM_FROUND_CUR_DIRECTION);
> +  src1->xf[4] = _mm_round_ps (src1->xf[7], _MM_FROUND_CUR_DIRECTION);
> +  src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> +                             _MM_FROUND_CUR_DIRECTION);
> +  src1->xd[1] = _mm_round_pd (src1->xd[4], _MM_FROUND_CUR_DIRECTION);
> +
>    src1->xi[0] = _mm_hadd_epi16 (tdst->xi[2], src2->xi[3]);
>    src1->xi[1] = _mm_hadd_epi32 (tdst->xi[0], src2->xi[1]);
>    tdst->xi[2] = _mm_hadds_epi16 (src1->xi[4], src2->xi[5]);
> @@ -77,16 +86,33 @@ void sse_test ()
>    tdst->xi[1] = _mm_sign_epi8 (src1->xi[5], src2->xi[6]);
>    tdst->xi[2] = _mm_sign_epi16 (src1->xi[7], src2->xi[0]);
>    tdst->xi[3] = _mm_sign_epi32 (src1->xi[1], src2->xi[2]);
> +
> +  tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> +  tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> +  tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> +  tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> +
> +  tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> +  tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
>  }
>
> -__attribute__((target("avx2")))
> +__attribute__((target("avx2,aes")))
>  void vex_test ()
>  {
>
>    register tmp_u *tdst __asm__("%r16");
>    register tmp_u *src1 __asm__("%r17");
>    register tmp_u *src2 __asm__("%r18");
> -
> +
> +  src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> +  src1->a[2] = _mm256_testc_si256 (src1->yi[2], src2->yi[3]);
> +  src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> +                             _MM_FROUND_CUR_DIRECTION);
> +  src1->yf[4] = _mm256_round_ps (src1->yf[2], _MM_FROUND_CUR_DIRECTION);
> +  src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> +                             _MM_FROUND_CUR_DIRECTION);
> +  src1->yd[1] = _mm256_round_pd (src1->yd[3], _MM_FROUND_CUR_DIRECTION);
> +
>    src1->yi[1] = _mm256_hadd_epi16 (tdst->yi[2], src2->yi[3]);
>    src1->yi[2] = _mm256_hadd_epi32 (tdst->yi[0], src2->yi[1]);
>    tdst->yi[3] = _mm256_hadds_epi16 (src1->yi[1], src2->yi[2]);
> @@ -98,7 +124,6 @@ void vex_test ()
>    src1->yi[1] = _mm256_cmpgt_epi64 (tdst->yi[3], src2->yi[0]);
>
>    tdst->yf[2] = _mm256_dp_ps (src1->yf[0], src2->yf[1], 0xbf);
> -  tdst->xd[3] = _mm_dp_pd (src1->xd[0], src2->xd[1], 0xbf);
>
>    tdst->yi[3] = _mm256_mpsadbw_epu8 (src1->yi[1], src2->yi[1], 0xc1);
>
> @@ -112,6 +137,14 @@ void vex_test ()
>    tdst->yi[2] = _mm256_sign_epi8 (src1->yi[0], src2->yi[1]);
>    tdst->yi[3] = _mm256_sign_epi16 (src1->yi[2], src2->yi[3]);
>    tdst->yi[0] = _mm256_sign_epi32 (src1->yi[0], src2->yi[1]);
> +
> +  tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> +  tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> +  tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> +  tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> +
> +  tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> +  tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
>  }
>
>  /* { dg-final { scan-assembler-not "v?pcmpeqq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> @@ -134,3 +167,15 @@ void vex_test ()
>  /* { dg-final { scan-assembler-not "v?psignb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
>  /* { dg-final { scan-assembler-not "v?psignw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
>  /* { dg-final { scan-assembler-not "v?psignd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?phminposuw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?ptest\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?roundss\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?roundsd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?roundps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?roundpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?pcmpestri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?pcmpistri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?pcmpestrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?pcmpistrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?aesimc\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?aeskeygenassist\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)
  2023-08-31  9:26   ` Richard Biener
@ 2023-08-31  9:28     ` Richard Biener
  2023-09-01  9:03       ` Hongyu Wang
  2023-09-01 10:38       ` Hongtao Liu
  2023-08-31  9:31     ` Jakub Jelinek
  1 sibling, 2 replies; 49+ messages in thread
From: Richard Biener @ 2023-08-31  9:28 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: gcc-patches, jakub, hongtao.liu, hubicka

On Thu, Aug 31, 2023 at 11:26 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > From: Kong Lingling <lingling.kong@intel.com>
> >
> > Disable EGPR usage for the legacy insns below in opcode map2/3 that have a vex
> > but no evex counterpart.
> >
> > insn list:
> > 1. phminposuw/vphminposuw
> > 2. ptest/vptest
> > 3. roundps/vroundps, roundpd/vroundpd,
> >    roundss/vroundss, roundsd/vroundsd
> > 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> > 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
>
> How are GPRs involved in the above?  Or did I misunderstand something?

Following up on myself - for the memory operand alternatives, I guess.  How
about simply disabling the memory alternatives when EGPR is active?
Wouldn't that simplify the initial patchset a lot?  Re-enabling them when
deemed important could be done as a followup then?

Richard.

> > 6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
> >         prototype.
> >         * config/i386/i386.cc (x86_evex_reg_mentioned_p): New
> >         function.
> >         * config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0
> >         and constraint Bt/BM to all non-evex alternatives, adjust
> >         alternative outputs if evex reg is mentioned.
> >         * config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0
> >         and constraint Bt/BM to all non-evex alternatives.
> >         (ptesttf2): Likewise.
> >         (<sse4_1>_round<ssemodesuffix><avxsizesuffix): Likewise.
> >         (sse4_1_round<ssescalarmodesuffix>): Likewise.
> >         (sse4_2_pcmpestri): Likewise.
> >         (sse4_2_pcmpestrm): Likewise.
> >         (sse4_2_pcmpestr_cconly): Likewise.
> >         (sse4_2_pcmpistr): Likewise.
> >         (sse4_2_pcmpistri): Likewise.
> >         (sse4_2_pcmpistrm): Likewise.
> >         (sse4_2_pcmpistr_cconly): Likewise.
> >         (aesimc): Likewise.
> >         (aeskeygenassist): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
> >         tests.
> > ---
> >  gcc/config/i386/i386-protos.h                 |  1 +
> >  gcc/config/i386/i386.cc                       | 13 +++
> >  gcc/config/i386/i386.md                       |  3 +-
> >  gcc/config/i386/sse.md                        | 93 +++++++++++++------
> >  .../i386/apx-legacy-insn-check-norex2.c       | 55 ++++++++++-
> >  5 files changed, 132 insertions(+), 33 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> > index 78eb3e0f584..bbb219e3039 100644
> > --- a/gcc/config/i386/i386-protos.h
> > +++ b/gcc/config/i386/i386-protos.h
> > @@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
> >  extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
> >  extern bool x86_extended_reg_mentioned_p (rtx);
> >  extern bool x86_extended_rex2reg_mentioned_p (rtx);
> > +extern bool x86_evex_reg_mentioned_p (rtx [], int);
> >  extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
> >  extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index f5d642948bc..ec93c5bab97 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -22936,6 +22936,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
> >    return false;
> >  }
> >
> > +/* Return true when rtx operands mentions register that must be encoded using
> > +   evex prefix.  */
> > +bool
> > +x86_evex_reg_mentioned_p (rtx operands[], int nops)
> > +{
> > +  int i;
> > +  for (i = 0; i < nops; i++)
> > +    if (EXT_REX_SSE_REG_P (operands[i])
> > +       || x86_extended_rex2reg_mentioned_p (operands[i]))
> > +      return true;
> > +  return false;
> > +}
> > +
> >  /* If profitable, negate (without causing overflow) integer constant
> >     of mode MODE at location LOC.  Return true in this case.  */
> >  bool
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index 83ad01b43c1..4c305e72389 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -21603,7 +21603,7 @@ (define_expand "significand<mode>2"
> >  (define_insn "sse4_1_round<mode>2"
> >    [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
> >         (unspec:MODEFH
> > -         [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
> > +         [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,Bt,v,m")
> >            (match_operand:SI 2 "const_0_to_15_operand")]
> >           UNSPEC_ROUND))]
> >    "TARGET_SSE4_1"
> > @@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round<mode>2"
> >    [(set_attr "type" "ssecvt")
> >     (set_attr "prefix_extra" "1,1,1,*,*")
> >     (set_attr "length_immediate" "1")
> > +   (set_attr "gpr32" "1,1,0,1,1")
> >     (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
> >     (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
> >     (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 05963de9219..456713b991a 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -22617,11 +22617,12 @@ (define_insn "avx2_pblendd<mode>"
> >
> >  (define_insn "sse4_1_phminposuw"
> >    [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
> > -       (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,xm")]
> > +       (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBT,*xBT,xBt")]
> >                      UNSPEC_PHMINPOSUW))]
> >    "TARGET_SSE4_1"
> >    "%vphminposuw\t{%1, %0|%0, %1}"
> >    [(set_attr "isa" "noavx,noavx,avx")
> > +   (set_attr "gpr32" "0")
> >     (set_attr "type" "sselog1")
> >     (set_attr "prefix_extra" "1")
> >     (set_attr "prefix" "orig,orig,vex")
> > @@ -23810,12 +23811,13 @@ (define_insn "avx_vtest<ssemodesuffix><avxsizesuffix>"
> >  (define_insn "*<sse4_1>_ptest<mode>"
> >    [(set (reg FLAGS_REG)
> >         (unspec [(match_operand:V_AVX 0 "register_operand" "Yr, *x, x")
> > -                (match_operand:V_AVX 1 "vector_operand" "YrBm, *xBm, xm")]
> > +                (match_operand:V_AVX 1 "vector_operand" "YrBT, *xBT, xBt")]
> >                 UNSPEC_PTEST))]
> >    "TARGET_SSE4_1 && ix86_match_ptest_ccmode (insn)"
> >    "%vptest\t{%1, %0|%0, %1}"
> >    [(set_attr "isa" "noavx,noavx,avx")
> >     (set_attr "type" "ssecomi")
> > +   (set_attr "gpr32" "0")
> >     (set_attr "prefix_extra" "1")
> >     (set_attr "prefix" "orig,orig,vex")
> >     (set (attr "btver2_decode")
> > @@ -23852,12 +23854,13 @@ (define_expand "<sse4_1>_ptest<mode>"
> >  (define_insn "ptesttf2"
> >    [(set (reg:CC FLAGS_REG)
> >         (unspec:CC [(match_operand:TF 0 "register_operand" "Yr, *x, x")
> > -                   (match_operand:TF 1 "vector_operand" "YrBm, *xBm, xm")]
> > +                   (match_operand:TF 1 "vector_operand" "YrBT, *xBT, xBt")]
> >                    UNSPEC_PTEST))]
> >    "TARGET_SSE4_1"
> >    "%vptest\t{%1, %0|%0, %1}"
> >    [(set_attr "isa" "noavx,noavx,avx")
> >     (set_attr "type" "ssecomi")
> > +   (set_attr "gpr32" "0")
> >     (set_attr "prefix_extra" "1")
> >     (set_attr "prefix" "orig,orig,vex")
> >     (set_attr "mode" "TI")])
> > @@ -23968,13 +23971,14 @@ (define_expand "lrint<mode><sseintvecmodelower>2"
> >  (define_insn "<sse4_1>_round<ssemodesuffix><avxsizesuffix>"
> >    [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
> >         (unspec:VF_128_256
> > -         [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm,xm")
> > +         [(match_operand:VF_128_256 1 "vector_operand" "YrBT,*xBT,xBt")
> >            (match_operand:SI 2 "const_0_to_15_operand")]
> >           UNSPEC_ROUND))]
> >    "TARGET_SSE4_1"
> >    "%vround<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
> >    [(set_attr "isa" "noavx,noavx,avx")
> >     (set_attr "type" "ssecvt")
> > +   (set_attr "gpr32" "0")
> >     (set_attr "prefix_data16" "1,1,*")
> >     (set_attr "prefix_extra" "1")
> >     (set_attr "length_immediate" "1")
> > @@ -24061,19 +24065,32 @@ (define_insn "sse4_1_round<ssescalarmodesuffix>"
> >    [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> >         (vec_merge:VF_128
> >           (unspec:VF_128
> > -           [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > +           [(match_operand:VF_128 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
> >              (match_operand:SI 3 "const_0_to_15_operand")]
> >             UNSPEC_ROUND)
> >           (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> >           (const_int 1)))]
> >    "TARGET_SSE4_1"
> > -  "@
> > -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > -   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
> > -   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
> > -  [(set_attr "isa" "noavx,noavx,avx,avx512f")
> > +{
> > +  switch (which_alternative)
> > +    {
> > +      case 0:
> > +      case 1:
> > +       return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}";
> > +      case 2:
> > +       return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > +      case 3:
> > +       if (x86_evex_reg_mentioned_p (operands, 3))
> > +         return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > +       else
> > +         return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > +      default:
> > +       gcc_unreachable ();
> > +    }
> > +}
> > +  [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
> >     (set_attr "type" "ssecvt")
> > +   (set_attr "gpr32" "0,0,0,1")
> >     (set_attr "length_immediate" "1")
> >     (set_attr "prefix_data16" "1,1,*,*")
> >     (set_attr "prefix_extra" "1")
> > @@ -24085,19 +24102,32 @@ (define_insn "*sse4_1_round<ssescalarmodesuffix>"
> >         (vec_merge:VFH_128
> >           (vec_duplicate:VFH_128
> >             (unspec:<ssescalarmode>
> > -             [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > +             [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
> >                (match_operand:SI 3 "const_0_to_15_operand")]
> >               UNSPEC_ROUND))
> >           (match_operand:VFH_128 1 "register_operand" "0,0,x,v")
> >           (const_int 1)))]
> >    "TARGET_SSE4_1"
> > -  "@
> > -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
> > -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
> > -   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
> > -   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
> > -  [(set_attr "isa" "noavx,noavx,avx,avx512f")
> > +{
> > +  switch (which_alternative)
> > +    {
> > +      case 0:
> > +      case 1:
> > +       return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}";
> > +      case 2:
> > +       return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > +      case 3:
> > +       if (x86_evex_reg_mentioned_p (operands, 3) || <MODE>mode == V8HFmode)
> > +         return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > +       else
> > +         return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > +      default:
> > +       gcc_unreachable ();
> > +    }
> > +}
> > +  [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
> >     (set_attr "type" "ssecvt")
> > +   (set_attr "gpr32" "0,0,0,1")
> >     (set_attr "length_immediate" "1")
> >     (set_attr "prefix_data16" "1,1,*,*")
> >     (set_attr "prefix_extra" "1")
> > @@ -24318,7 +24348,7 @@ (define_insn "sse4_2_pcmpestri"
> >         (unspec:SI
> >           [(match_operand:V16QI 1 "register_operand" "x,x")
> >            (match_operand:SI 2 "register_operand" "a,a")
> > -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> >            (match_operand:SI 4 "register_operand" "d,d")
> >            (match_operand:SI 5 "const_0_to_255_operand")]
> >           UNSPEC_PCMPESTR))
> > @@ -24333,6 +24363,7 @@ (define_insn "sse4_2_pcmpestri"
> >    "TARGET_SSE4_2"
> >    "%vpcmpestri\t{%5, %3, %1|%1, %3, %5}"
> >    [(set_attr "type" "sselog")
> > +   (set_attr "gpr32" "0")
> >     (set_attr "prefix_extra" "1")
> >     (set_attr "prefix" "maybe_vex")
> >     (set_attr "length_immediate" "1")
> > @@ -24345,7 +24376,7 @@ (define_insn "sse4_2_pcmpestrm"
> >         (unspec:V16QI
> >           [(match_operand:V16QI 1 "register_operand" "x,x")
> >            (match_operand:SI 2 "register_operand" "a,a")
> > -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> >            (match_operand:SI 4 "register_operand" "d,d")
> >            (match_operand:SI 5 "const_0_to_255_operand")]
> >           UNSPEC_PCMPESTR))
> > @@ -24360,6 +24391,7 @@ (define_insn "sse4_2_pcmpestrm"
> >    "TARGET_SSE4_2"
> >    "%vpcmpestrm\t{%5, %3, %1|%1, %3, %5}"
> >    [(set_attr "type" "sselog")
> > +   (set_attr "gpr32" "0")
> >     (set_attr "prefix_extra" "1")
> >     (set_attr "length_immediate" "1")
> >     (set_attr "prefix" "maybe_vex")
> > @@ -24372,7 +24404,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
> >         (unspec:CC
> >           [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> >            (match_operand:SI 3 "register_operand" "a,a,a,a")
> > -          (match_operand:V16QI 4 "nonimmediate_operand" "x,m,x,m")
> > +          (match_operand:V16QI 4 "nonimmediate_operand" "x,Bt,x,Bt")
> >            (match_operand:SI 5 "register_operand" "d,d,d,d")
> >            (match_operand:SI 6 "const_0_to_255_operand")]
> >           UNSPEC_PCMPESTR))
> > @@ -24385,6 +24417,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
> >     %vpcmpestri\t{%6, %4, %2|%2, %4, %6}
> >     %vpcmpestri\t{%6, %4, %2|%2, %4, %6}"
> >    [(set_attr "type" "sselog")
> > +   (set_attr "gpr32" "0")
> >     (set_attr "prefix_extra" "1")
> >     (set_attr "length_immediate" "1")
> >     (set_attr "memory" "none,load,none,load")
> > @@ -24396,7 +24429,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
> >    [(set (match_operand:SI 0 "register_operand" "=c,c")
> >         (unspec:SI
> >           [(match_operand:V16QI 2 "register_operand" "x,x")
> > -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> >            (match_operand:SI 4 "const_0_to_255_operand")]
> >           UNSPEC_PCMPISTR))
> >     (set (match_operand:V16QI 1 "register_operand" "=Yz,Yz")
> > @@ -24439,6 +24472,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
> >    DONE;
> >  }
> >    [(set_attr "type" "sselog")
> > +   (set_attr "gpr32" "0")
> >     (set_attr "prefix_extra" "1")
> >     (set_attr "length_immediate" "1")
> >     (set_attr "memory" "none,load")
> > @@ -24448,7 +24482,7 @@ (define_insn "sse4_2_pcmpistri"
> >    [(set (match_operand:SI 0 "register_operand" "=c,c")
> >         (unspec:SI
> >           [(match_operand:V16QI 1 "register_operand" "x,x")
> > -          (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> > +          (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
> >            (match_operand:SI 3 "const_0_to_255_operand")]
> >           UNSPEC_PCMPISTR))
> >     (set (reg:CC FLAGS_REG)
> > @@ -24460,6 +24494,7 @@ (define_insn "sse4_2_pcmpistri"
> >    "TARGET_SSE4_2"
> >    "%vpcmpistri\t{%3, %2, %1|%1, %2, %3}"
> >    [(set_attr "type" "sselog")
> > +   (set_attr "gpr32" "0")
> >     (set_attr "prefix_extra" "1")
> >     (set_attr "length_immediate" "1")
> >     (set_attr "prefix" "maybe_vex")
> > @@ -24471,7 +24506,7 @@ (define_insn "sse4_2_pcmpistrm"
> >    [(set (match_operand:V16QI 0 "register_operand" "=Yz,Yz")
> >         (unspec:V16QI
> >           [(match_operand:V16QI 1 "register_operand" "x,x")
> > -          (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> > +          (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
> >            (match_operand:SI 3 "const_0_to_255_operand")]
> >           UNSPEC_PCMPISTR))
> >     (set (reg:CC FLAGS_REG)
> > @@ -24483,6 +24518,7 @@ (define_insn "sse4_2_pcmpistrm"
> >    "TARGET_SSE4_2"
> >    "%vpcmpistrm\t{%3, %2, %1|%1, %2, %3}"
> >    [(set_attr "type" "sselog")
> > +   (set_attr "gpr32" "0")
> >     (set_attr "prefix_extra" "1")
> >     (set_attr "length_immediate" "1")
> >     (set_attr "prefix" "maybe_vex")
> > @@ -24494,7 +24530,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
> >    [(set (reg:CC FLAGS_REG)
> >         (unspec:CC
> >           [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> > -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m,x,m")
> > +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt,x,Bt")
> >            (match_operand:SI 4 "const_0_to_255_operand")]
> >           UNSPEC_PCMPISTR))
> >     (clobber (match_scratch:V16QI 0 "=Yz,Yz,X,X"))
> > @@ -24506,6 +24542,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
> >     %vpcmpistri\t{%4, %3, %2|%2, %3, %4}
> >     %vpcmpistri\t{%4, %3, %2|%2, %3, %4}"
> >    [(set_attr "type" "sselog")
> > +   (set_attr "gpr32" "0")
> >     (set_attr "prefix_extra" "1")
> >     (set_attr "length_immediate" "1")
> >     (set_attr "memory" "none,load,none,load")
> > @@ -25990,23 +26027,25 @@ (define_insn "aesdeclast"
> >
> >  (define_insn "aesimc"
> >    [(set (match_operand:V2DI 0 "register_operand" "=x")
> > -       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")]
> > +       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")]
> >                       UNSPEC_AESIMC))]
> >    "TARGET_AES"
> >    "%vaesimc\t{%1, %0|%0, %1}"
> >    [(set_attr "type" "sselog1")
> > +   (set_attr "gpr32" "0")
> >     (set_attr "prefix_extra" "1")
> >     (set_attr "prefix" "maybe_vex")
> >     (set_attr "mode" "TI")])
> >
> >  (define_insn "aeskeygenassist"
> >    [(set (match_operand:V2DI 0 "register_operand" "=x")
> > -       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")
> > +       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")
> >                       (match_operand:SI 2 "const_0_to_255_operand")]
> >                      UNSPEC_AESKEYGENASSIST))]
> >    "TARGET_AES"
> >    "%vaeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
> >    [(set_attr "type" "sselog1")
> > +   (set_attr "gpr32" "0")
> >     (set_attr "prefix_extra" "1")
> >     (set_attr "length_immediate" "1")
> >     (set_attr "prefix" "maybe_vex")
> > diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > index 510213a6ca7..771bcb078e1 100644
> > --- a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > +++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > @@ -45,13 +45,22 @@ typedef union
> >    DTYPE a[16];
> >  } tmp_u;
> >
> > -__attribute__((target("sse4.2")))
> > +__attribute__((target("sse4.2,aes")))
> >  void sse_test ()
> >  {
> >    register tmp_u *tdst __asm__("%r16");
> >    register tmp_u *src1 __asm__("%r17");
> >    register tmp_u *src2 __asm__("%r18");
> > -
> > +
> > +  src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> > +  src1->a[2] = _mm_testc_si128 (src1->xi[3], src2->xi[4]);
> > +  src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> > +                             _MM_FROUND_CUR_DIRECTION);
> > +  src1->xf[4] = _mm_round_ps (src1->xf[7], _MM_FROUND_CUR_DIRECTION);
> > +  src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> > +                             _MM_FROUND_CUR_DIRECTION);
> > +  src1->xd[1] = _mm_round_pd (src1->xd[4], _MM_FROUND_CUR_DIRECTION);
> > +
> >    src1->xi[0] = _mm_hadd_epi16 (tdst->xi[2], src2->xi[3]);
> >    src1->xi[1] = _mm_hadd_epi32 (tdst->xi[0], src2->xi[1]);
> >    tdst->xi[2] = _mm_hadds_epi16 (src1->xi[4], src2->xi[5]);
> > @@ -77,16 +86,33 @@ void sse_test ()
> >    tdst->xi[1] = _mm_sign_epi8 (src1->xi[5], src2->xi[6]);
> >    tdst->xi[2] = _mm_sign_epi16 (src1->xi[7], src2->xi[0]);
> >    tdst->xi[3] = _mm_sign_epi32 (src1->xi[1], src2->xi[2]);
> > +
> > +  tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> > +  tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> > +  tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> > +  tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> > +
> > +  tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> > +  tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
> >  }
> >
> > -__attribute__((target("avx2")))
> > +__attribute__((target("avx2,aes")))
> >  void vex_test ()
> >  {
> >
> >    register tmp_u *tdst __asm__("%r16");
> >    register tmp_u *src1 __asm__("%r17");
> >    register tmp_u *src2 __asm__("%r18");
> > -
> > +
> > +  src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> > +  src1->a[2] = _mm256_testc_si256 (src1->yi[2], src2->yi[3]);
> > +  src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> > +                             _MM_FROUND_CUR_DIRECTION);
> > +  src1->yf[4] = _mm256_round_ps (src1->yf[2], _MM_FROUND_CUR_DIRECTION);
> > +  src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> > +                             _MM_FROUND_CUR_DIRECTION);
> > +  src1->yd[1] = _mm256_round_pd (src1->yd[3], _MM_FROUND_CUR_DIRECTION);
> > +
> >    src1->yi[1] = _mm256_hadd_epi16 (tdst->yi[2], src2->yi[3]);
> >    src1->yi[2] = _mm256_hadd_epi32 (tdst->yi[0], src2->yi[1]);
> >    tdst->yi[3] = _mm256_hadds_epi16 (src1->yi[1], src2->yi[2]);
> > @@ -98,7 +124,6 @@ void vex_test ()
> >    src1->yi[1] = _mm256_cmpgt_epi64 (tdst->yi[3], src2->yi[0]);
> >
> >    tdst->yf[2] = _mm256_dp_ps (src1->yf[0], src2->yf[1], 0xbf);
> > -  tdst->xd[3] = _mm_dp_pd (src1->xd[0], src2->xd[1], 0xbf);
> >
> >    tdst->yi[3] = _mm256_mpsadbw_epu8 (src1->yi[1], src2->yi[1], 0xc1);
> >
> > @@ -112,6 +137,14 @@ void vex_test ()
> >    tdst->yi[2] = _mm256_sign_epi8 (src1->yi[0], src2->yi[1]);
> >    tdst->yi[3] = _mm256_sign_epi16 (src1->yi[2], src2->yi[3]);
> >    tdst->yi[0] = _mm256_sign_epi32 (src1->yi[0], src2->yi[1]);
> > +
> > +  tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> > +  tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> > +  tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> > +  tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> > +
> > +  tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> > +  tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
> >  }
> >
> >  /* { dg-final { scan-assembler-not "v?pcmpeqq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > @@ -134,3 +167,15 @@ void vex_test ()
> >  /* { dg-final { scan-assembler-not "v?psignb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> >  /* { dg-final { scan-assembler-not "v?psignw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> >  /* { dg-final { scan-assembler-not "v?psignd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?phminposuw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?ptest\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?roundss\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?roundsd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?roundps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?roundpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?pcmpestri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?pcmpistri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?pcmpestrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?pcmpistrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?aesimc\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?aeskeygenassist\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > --
> > 2.31.1
> >

* Re: [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)
  2023-08-31  9:26   ` Richard Biener
  2023-08-31  9:28     ` Richard Biener
@ 2023-08-31  9:31     ` Jakub Jelinek
  1 sibling, 0 replies; 49+ messages in thread
From: Jakub Jelinek @ 2023-08-31  9:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: Hongyu Wang, gcc-patches, hongtao.liu, hubicka

On Thu, Aug 31, 2023 at 11:26:26AM +0200, Richard Biener wrote:
> On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > From: Kong Lingling <lingling.kong@intel.com>
> >
> > Disable EGPR usage for below legacy insns in opcode map2/3 that have vex
> > but no evex counterpart.
> >
> > insn list:
> > 1. phminposuw/vphminposuw
> > 2. ptest/vptest
> > 3. roundps/vroundps, roundpd/vroundpd,
> >    roundss/vroundss, roundsd/vroundsd
> > 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> > 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
> 
> How are GPRs involved in the above?  Or did I misunderstand something?

Those instructions allow memory operands, and, say, vptest (%r18), %xmm7
isn't supported.

	Jakub



* Re: [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns
  2023-08-31  8:20 ` [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns Hongyu Wang
@ 2023-08-31  9:43   ` Jakub Jelinek
  2023-09-01  9:07     ` Hongyu Wang
  0 siblings, 1 reply; 49+ messages in thread
From: Jakub Jelinek @ 2023-08-31  9:43 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: gcc-patches, hongtao.liu, hubicka

On Thu, Aug 31, 2023 at 04:20:19PM +0800, Hongyu Wang via Gcc-patches wrote:
> For vector move insns like vmovdqa/vmovdqu, their evex counterparts
> require an explicit suffix 64/32/16/8.  The usage of these instructions
> is prohibited under AVX10_1 or AVX512F, so for AVX2+APX_F we select
> vmovaps/vmovups for vector load/store insns that contain EGPR.

Why not make it dependent on AVX512VL?
I.e. if egpr_p && TARGET_AVX512VL, still use vmovdqu16 or vmovdqa16
and the like, and only if !evex_reg_p && egpr_p && !TARGET_AVX512VL
fall back to what you're doing?
> 
> gcc/ChangeLog:
> 
> 	* config/i386/i386.cc (ix86_get_ssemov): Check if egpr is used,
> 	adjust mnemonic for vmovduq/vmovdqa.
> 	* config/i386/sse.md (*<extract_type>_vinsert<shuffletype><extract_suf>_0):
> 	Check if egpr is used, adjust mnemonic for vmovdqu/vmovdqa.
> 	(avx_vec_concat<mode>): Likewise, and separate alternative 0 to
> 	avx_noavx512f.

	Jakub



* Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.
  2023-08-31  9:17   ` Jakub Jelinek
@ 2023-08-31 10:00     ` Uros Bizjak
  2023-09-01  9:04       ` Hongyu Wang
  2023-09-01 11:03       ` Richard Sandiford
  2023-09-01  9:04     ` Hongyu Wang
  1 sibling, 2 replies; 49+ messages in thread
From: Uros Bizjak @ 2023-08-31 10:00 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Hongyu Wang, gcc-patches, hongtao.liu, hubicka

On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
> > From: Kong Lingling <lingling.kong@intel.com>
> >
> > In inline asm, we do not know if the insn can use EGPR, so disable
> > EGPR usage by default by mapping the common reg/mem constraints to
> > non-EGPR constraints.  Use the flag mapx-inline-asm-use-gpr32 to
> > enable EGPR usage for inline asm.
> >
> > gcc/ChangeLog:
> >
> >       * config/i386/i386.cc (INCLUDE_STRING): Add include for
> >       ix86_md_asm_adjust.
> >       (ix86_md_asm_adjust): When APX EGPR enabled without specifying the
> >       target option, map reg/mem constraints to non-EGPR constraints.
> >       * config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/i386/apx-inline-gpr-norex2.c: New test.
> > ---
> >  gcc/config/i386/i386.cc                       |  44 +++++++
> >  gcc/config/i386/i386.opt                      |   5 +
> >  .../gcc.target/i386/apx-inline-gpr-norex2.c   | 107 ++++++++++++++++++
> >  3 files changed, 156 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index d26d9ab0d9d..9460ebbfda4 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
> >  along with GCC; see the file COPYING3.  If not see
> >  <http://www.gnu.org/licenses/>.  */
> >
> > +#define INCLUDE_STRING
> >  #define IN_TARGET_CODE 1
> >
> >  #include "config.h"
> > @@ -23077,6 +23078,49 @@ ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
> >    bool saw_asm_flag = false;
> >
> >    start_sequence ();
> > +  /* TODO: Here we just mapped the general r/m constraints to non-EGPR
> > +   constraints, will eventually map all the usable constraints in the future. */
>
> I think there should be some constraint which explicitly has all the 32
> GPRs, like there is one for just all 16 GPRs (h), so that regardless of
> -mapx-inline-asm-use-gpr32 one can be explicit what the inline asm wants.
>
> Also, what about the "g" constraint?  Shouldn't there be another for "g"
> without r16..r31?  What about the various other memory
> constraints ("<", "o", ...)?

I think we should leave all existing constraints as they are, so "r"
covers only GPR16 and "m" and "o" only use GPR16.  We can then
introduce "h" for instructions that have the ability to handle EGPR.
This would be somewhat similar to the SSE -> AVX512F transition, where
we still have "x" for SSE16 and "v" was introduced as a separate
register class for EVEX SSE registers.  This way, asm will stay
compatible when "r", "m", "o" and "g" are used.  The new memory
constraint "Bt" should allow the new registers, and should be added to
the constraint string as a separate constraint, conditionally enabled
by the relevant "isa" (AKA "enabled") attribute.

Uros.

> > +  if (TARGET_APX_EGPR && !ix86_apx_inline_asm_use_gpr32)
> > +    {
> > +      /* Map "r" constraint in inline asm to "h" that disallows r16-r31
> > +      and replace only r, exclude Br and Yr.  */
> > +      for (unsigned i = 0; i < constraints.length (); i++)
> > +     {
> > +       std::string *s = new std::string (constraints[i]);
>
> Doesn't this leak memory (all the time)?
> I must say I don't really understand why you need to use std::string here,
> but certainly it shouldn't leak.
>
> > +       size_t pos = s->find ('r');
> > +       while (pos != std::string::npos)
> > +         {
> > +           if (pos > 0
> > +               && (s->at (pos - 1) == 'Y' || s->at (pos - 1) == 'B'))
> > +             pos = s->find ('r', pos + 1);
> > +           else
> > +             {
> > +               s->replace (pos, 1, "h");
> > +               constraints[i] = (const char*) s->c_str ();
>
> Formatting (space before *).  The usual way for constraints is ggc_strdup on
> some string in a buffer.  Also, one could have several copies of r (or m, memory (doesn't
> that appear just in clobbers?  And that doesn't look like something that
> should be replaced), Bm, e.g. in various alternatives.  So, you
> need to change them all, not just the first hit.  "r,r,r,m" and the like.
> Normally, one would simply walk the constraint string, parsing the special
> letters (+, =, & etc.) and single letter constraints and 2 letter
> constraints using CONSTRAINT_LEN macro (tons of examples in GCC sources).
> Either do it in 2 passes, first one counts how long constraint string one
> will need after the adjustments (and whether to adjust something at all),
> then if needed XALLOCAVEC it and adjust in there, or say use an
> auto_vec<char, 32> for it.
>
> > +               break;
> > +             }
> > +         }
> > +     }
> > +      /* Also map "m/memory/Bm" constraint that may use GPR32, replace them with
> > +      "Bt/Bt/BT".  */
> > +      for (unsigned i = 0; i < constraints.length (); i++)
> > +     {
> > +       std::string *s = new std::string (constraints[i]);
> > +       size_t pos = s->find ("m");
> > +       size_t pos2 = s->find ("memory");
> > +       if (pos != std::string::npos)
> > +         {
> > +           if (pos > 0 && (s->at (pos - 1) == 'B'))
> > +               s->replace (pos - 1, 2, "BT");
> > +           else if (pos2 != std::string::npos)
> > +               s->replace (pos, 6, "Bt");
> > +           else
> > +               s->replace (pos, 1, "Bt");
>
> Formatting, the s->replace calls are indented too much.
>
>         Jakub
>


* Re: [PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5)
  2023-08-31  8:20 ` [PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5) Hongyu Wang
@ 2023-08-31 10:06   ` Uros Bizjak
  0 siblings, 0 replies; 49+ messages in thread
From: Uros Bizjak @ 2023-08-31 10:06 UTC (permalink / raw)
  To: Hongyu Wang
  Cc: gcc-patches, hongtao.liu, hubicka, vmakarov, jakub, Kong Lingling

On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> From: Kong Lingling <lingling.kong@intel.com>
>
> These legacy insns in opcode map0/1 only support GPR16 and have no
> vex/evex counterparts; directly adjust their constraints and add the
> gpr32 attr to the patterns.
>
> insn list:
> 1. xsave/xsave64, xrstor/xrstor64
> 2. xsaves/xsaves64, xrstors/xrstors64
> 3. xsavec/xsavec64
> 4. xsaveopt/xsaveopt64
> 5. fxsave64/fxrstor64

IMO, instructions should be handled with the reverse approach: add an
"h" constraint (and a memory constraint that can handle EGPR) to
instructions that CAN use EGPR, together with a relevant "enabled"
attribute.  We took the same approach with the "x" to "v" transition
for SSE registers.  If we "forgot" to add "v" to an instruction, it
still worked, but not to its full potential w.r.t. available
registers.

Uros.
>
> gcc/ChangeLog:
>
>         * config/i386/i386.md (<xsave>): Set attr gpr32 0 and constraint
>         Bt.
>         (<xsave>_rex64): Likewise.
>         (<xrstor>_rex64): Likewise.
>         (<xrstor>64): Likewise.
>         (fxsave64): Likewise.
>         (fxstore64): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>         * lib/target-supports.exp: Add apxf check.
>         * gcc.target/i386/apx-legacy-insn-check-norex2.c: New test.
>         * gcc.target/i386/apx-legacy-insn-check-norex2-asm.c: New assembler test.
> ---
>  gcc/config/i386/i386.md                       | 18 +++++++----
>  .../i386/apx-legacy-insn-check-norex2-asm.c   |  5 ++++
>  .../i386/apx-legacy-insn-check-norex2.c       | 30 +++++++++++++++++++
>  gcc/testsuite/lib/target-supports.exp         | 10 +++++++
>  4 files changed, 57 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index b9eaea78f00..83ad01b43c1 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -25626,11 +25626,12 @@ (define_insn "fxsave"
>          (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
>
>  (define_insn "fxsave64"
> -  [(set (match_operand:BLK 0 "memory_operand" "=m")
> +  [(set (match_operand:BLK 0 "memory_operand" "=Bt")
>         (unspec_volatile:BLK [(const_int 0)] UNSPECV_FXSAVE64))]
>    "TARGET_64BIT && TARGET_FXSR"
>    "fxsave64\t%0"
>    [(set_attr "type" "other")
> +   (set_attr "gpr32" "0")
>     (set_attr "memory" "store")
>     (set (attr "length")
>          (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
> @@ -25646,11 +25647,12 @@ (define_insn "fxrstor"
>          (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
>
>  (define_insn "fxrstor64"
> -  [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "m")]
> +  [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "Bt")]
>                     UNSPECV_FXRSTOR64)]
>    "TARGET_64BIT && TARGET_FXSR"
>    "fxrstor64\t%0"
>    [(set_attr "type" "other")
> +   (set_attr "gpr32" "0")
>     (set_attr "memory" "load")
>     (set (attr "length")
>          (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
> @@ -25704,7 +25706,7 @@ (define_insn "<xsave>"
>          (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
>
>  (define_insn "<xsave>_rex64"
> -  [(set (match_operand:BLK 0 "memory_operand" "=m")
> +  [(set (match_operand:BLK 0 "memory_operand" "=Bt")
>         (unspec_volatile:BLK
>          [(match_operand:SI 1 "register_operand" "a")
>           (match_operand:SI 2 "register_operand" "d")]
> @@ -25713,11 +25715,12 @@ (define_insn "<xsave>_rex64"
>    "<xsave>\t%0"
>    [(set_attr "type" "other")
>     (set_attr "memory" "store")
> +   (set_attr "gpr32" "0")
>     (set (attr "length")
>          (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
>
>  (define_insn "<xsave>"
> -  [(set (match_operand:BLK 0 "memory_operand" "=m")
> +  [(set (match_operand:BLK 0 "memory_operand" "=Bt")
>         (unspec_volatile:BLK
>          [(match_operand:SI 1 "register_operand" "a")
>           (match_operand:SI 2 "register_operand" "d")]
> @@ -25726,6 +25729,7 @@ (define_insn "<xsave>"
>    "<xsave>\t%0"
>    [(set_attr "type" "other")
>     (set_attr "memory" "store")
> +   (set_attr "gpr32" "0")
>     (set (attr "length")
>          (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
>
> @@ -25743,7 +25747,7 @@ (define_insn "<xrstor>"
>
>  (define_insn "<xrstor>_rex64"
>     [(unspec_volatile:BLK
> -     [(match_operand:BLK 0 "memory_operand" "m")
> +     [(match_operand:BLK 0 "memory_operand" "Bt")
>        (match_operand:SI 1 "register_operand" "a")
>        (match_operand:SI 2 "register_operand" "d")]
>       ANY_XRSTOR)]
> @@ -25751,12 +25755,13 @@ (define_insn "<xrstor>_rex64"
>    "<xrstor>\t%0"
>    [(set_attr "type" "other")
>     (set_attr "memory" "load")
> +   (set_attr "gpr32" "0")
>     (set (attr "length")
>          (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
>
>  (define_insn "<xrstor>64"
>     [(unspec_volatile:BLK
> -     [(match_operand:BLK 0 "memory_operand" "m")
> +     [(match_operand:BLK 0 "memory_operand" "Bt")
>        (match_operand:SI 1 "register_operand" "a")
>        (match_operand:SI 2 "register_operand" "d")]
>       ANY_XRSTOR64)]
> @@ -25764,6 +25769,7 @@ (define_insn "<xrstor>64"
>    "<xrstor>64\t%0"
>    [(set_attr "type" "other")
>     (set_attr "memory" "load")
> +   (set_attr "gpr32" "0")
>     (set (attr "length")
>          (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
>
> diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
> new file mode 100644
> index 00000000000..7ecc861435f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
> @@ -0,0 +1,5 @@
> +/* { dg-do assemble { target apxf } } */
> +/* { dg-options "-O1 -mapxf -m64 -DDTYPE32" } */
> +
> +#include "apx-legacy-insn-check-norex2.c"
> +
> diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> new file mode 100644
> index 00000000000..1e5450dfb73
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mapxf -m64 -DDTYPE32" } */
> +
> +#include <immintrin.h>
> +
> +typedef unsigned int u32;
> +typedef unsigned long long u64;
> +
> +#ifndef DTYPE32
> +#define DTYPE32
> +#endif
> +
> +#ifdef DTYPE32
> +typedef u32 DTYPE;
> +#endif
> +
> +__attribute__((target("xsave,fxsr")))
> +void legacy_test ()
> +{
> +  register DTYPE* val __asm__("r16");
> +  _xsave64 (val, 1);
> +  _xrstor64 (val, 1);
> +  _fxsave64 (val);
> +  _fxrstor64 (val);
> +}
> +
> +/* { dg-final { scan-assembler-not "xsave64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "xrstor64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "fxsave64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "fxrstor64\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index d353cc0aaf0..6359408542a 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -9938,6 +9938,16 @@ proc check_effective_target_sm4 { } {
>      } "-msm4" ]
>  }
>
> +proc check_effective_target_apxf { } {
> +    return [check_no_compiler_messages apxf object {
> +       void
> +       foo ()
> +       {
> +         __asm__ volatile ("add\t%%r16, %%r31" ::);
> +       }
> +    } "-mapxf" ]
> +}
> +
>  # Return 1 if sse instructions can be compiled.
>  proc check_effective_target_sse { } {
>      return [check_no_compiler_messages sse object {
> --
> 2.31.1
>


* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-08-31  8:20 ` [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class Hongyu Wang
@ 2023-08-31 10:15   ` Uros Bizjak
  2023-09-01  9:07     ` Hongyu Wang
  2023-09-08 17:03   ` Vladimir Makarov
  1 sibling, 1 reply; 49+ messages in thread
From: Uros Bizjak @ 2023-08-31 10:15 UTC (permalink / raw)
  To: Hongyu Wang
  Cc: gcc-patches, hongtao.liu, hubicka, vmakarov, jakub, Kong Lingling

On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> From: Kong Lingling <lingling.kong@intel.com>
>
> Current reload infrastructure does not support a selective
> base_reg_class for a backend insn.  Add an insn argument to
> base_reg_class for lra/reload usage.

I don't think this is the correct approach. Ideally, a memory
constraint should somehow encode its BASE/INDEX register class.
Instead of passing "insn", simply a different constraint could be used
in the constraint string of the relevant insn.

Uros.
>
> gcc/ChangeLog:
>
>         * addresses.h (base_reg_class):  Add insn argument.
>         Pass to MODE_CODE_BASE_REG_CLASS.
>         (regno_ok_for_base_p_1): Add insn argument.
>         Pass to REGNO_MODE_CODE_OK_FOR_BASE_P.
>         (regno_ok_for_base_p): Add insn argument and parse to ok_for_base_p_1.
>         * config/avr/avr.h (MODE_CODE_BASE_REG_CLASS): Add insn argument.
>         (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
>         * config/gcn/gcn.h (MODE_CODE_BASE_REG_CLASS): Ditto.
>         (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
>         * config/rl78/rl78.h (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
>         (MODE_CODE_BASE_REG_CLASS): Ditto.
>         * doc/tm.texi: Add insn argument for MODE_CODE_BASE_REG_CLASS
>         and REGNO_MODE_CODE_OK_FOR_BASE_P.
>         * doc/tm.texi.in: Ditto.
>         * lra-constraints.cc (process_address_1): Pass insn to
>         base_reg_class.
>         (curr_insn_transform): Ditto.
>         * reload.cc (find_reloads): Ditto.
>         (find_reloads_address): Ditto.
>         (find_reloads_address_1): Ditto.
>         (find_reloads_subreg_address): Ditto.
>         * reload1.cc (maybe_fix_stack_asms): Ditto.
> ---
>  gcc/addresses.h        | 15 +++++++++------
>  gcc/config/avr/avr.h   |  5 +++--
>  gcc/config/gcn/gcn.h   |  4 ++--
>  gcc/config/rl78/rl78.h |  6 ++++--
>  gcc/doc/tm.texi        |  8 ++++++--
>  gcc/doc/tm.texi.in     |  8 ++++++--
>  gcc/lra-constraints.cc | 15 +++++++++------
>  gcc/reload.cc          | 30 ++++++++++++++++++------------
>  gcc/reload1.cc         |  2 +-
>  9 files changed, 58 insertions(+), 35 deletions(-)
>
> diff --git a/gcc/addresses.h b/gcc/addresses.h
> index 3519c241c6d..08b100cfe6d 100644
> --- a/gcc/addresses.h
> +++ b/gcc/addresses.h
> @@ -28,11 +28,12 @@ inline enum reg_class
>  base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
>                 addr_space_t as ATTRIBUTE_UNUSED,
>                 enum rtx_code outer_code ATTRIBUTE_UNUSED,
> -               enum rtx_code index_code ATTRIBUTE_UNUSED)
> +               enum rtx_code index_code ATTRIBUTE_UNUSED,
> +               rtx_insn *insn ATTRIBUTE_UNUSED = NULL)
>  {
>  #ifdef MODE_CODE_BASE_REG_CLASS
>    return MODE_CODE_BASE_REG_CLASS (MACRO_MODE (mode), as, outer_code,
> -                                  index_code);
> +                                  index_code, insn);
>  #else
>  #ifdef MODE_BASE_REG_REG_CLASS
>    if (index_code == REG)
> @@ -56,11 +57,12 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
>                  machine_mode mode ATTRIBUTE_UNUSED,
>                  addr_space_t as ATTRIBUTE_UNUSED,
>                  enum rtx_code outer_code ATTRIBUTE_UNUSED,
> -                enum rtx_code index_code ATTRIBUTE_UNUSED)
> +                enum rtx_code index_code ATTRIBUTE_UNUSED,
> +                rtx_insn* insn ATTRIBUTE_UNUSED = NULL)
>  {
>  #ifdef REGNO_MODE_CODE_OK_FOR_BASE_P
>    return REGNO_MODE_CODE_OK_FOR_BASE_P (regno, MACRO_MODE (mode), as,
> -                                       outer_code, index_code);
> +                                       outer_code, index_code, insn);
>  #else
>  #ifdef REGNO_MODE_OK_FOR_REG_BASE_P
>    if (index_code == REG)
> @@ -79,12 +81,13 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
>
>  inline bool
>  regno_ok_for_base_p (unsigned regno, machine_mode mode, addr_space_t as,
> -                    enum rtx_code outer_code, enum rtx_code index_code)
> +                    enum rtx_code outer_code, enum rtx_code index_code,
> +                    rtx_insn* insn = NULL)
>  {
>    if (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0)
>      regno = reg_renumber[regno];
>
> -  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code);
> +  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code, insn);
>  }
>
>  #endif /* GCC_ADDRESSES_H */
> diff --git a/gcc/config/avr/avr.h b/gcc/config/avr/avr.h
> index 8e7e00db13b..1d090fe0838 100644
> --- a/gcc/config/avr/avr.h
> +++ b/gcc/config/avr/avr.h
> @@ -280,12 +280,13 @@ enum reg_class {
>
>  #define REGNO_REG_CLASS(R) avr_regno_reg_class(R)
>
> -#define MODE_CODE_BASE_REG_CLASS(mode, as, outer_code, index_code)   \
> +#define MODE_CODE_BASE_REG_CLASS(mode, as, outer_code, index_code, insn)   \
>    avr_mode_code_base_reg_class (mode, as, outer_code, index_code)
>
>  #define INDEX_REG_CLASS NO_REGS
>
> -#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mode, as, outer_code, index_code) \
> +#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mode, as, outer_code,         \
> +                                     index_code, insn)                   \
>    avr_regno_mode_code_ok_for_base_p (num, mode, as, outer_code, index_code)
>
>  #define REGNO_OK_FOR_INDEX_P(NUM) 0
> diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
> index 4ff9a5d4d12..b56702a77fd 100644
> --- a/gcc/config/gcn/gcn.h
> +++ b/gcc/config/gcn/gcn.h
> @@ -437,9 +437,9 @@ enum reg_class
>       0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0 }}
>
>  #define REGNO_REG_CLASS(REGNO) gcn_regno_reg_class (REGNO)
> -#define MODE_CODE_BASE_REG_CLASS(MODE, AS, OUTER, INDEX) \
> +#define MODE_CODE_BASE_REG_CLASS(MODE, AS, OUTER, INDEX, INSN) \
>          gcn_mode_code_base_reg_class (MODE, AS, OUTER, INDEX)
> -#define REGNO_MODE_CODE_OK_FOR_BASE_P(NUM, MODE, AS, OUTER, INDEX) \
> +#define REGNO_MODE_CODE_OK_FOR_BASE_P(NUM, MODE, AS, OUTER, INDEX, INSN) \
>          gcn_regno_mode_code_ok_for_base_p (NUM, MODE, AS, OUTER, INDEX)
>  #define INDEX_REG_CLASS VGPR_REGS
>  #define REGNO_OK_FOR_INDEX_P(regno) regno_ok_for_index_p (regno)
> diff --git a/gcc/config/rl78/rl78.h b/gcc/config/rl78/rl78.h
> index 7a7c6a44ba2..d0ed9162292 100644
> --- a/gcc/config/rl78/rl78.h
> +++ b/gcc/config/rl78/rl78.h
> @@ -375,10 +375,12 @@ enum reg_class
>
>  #define REGNO_OK_FOR_INDEX_P(regno)    REGNO_OK_FOR_BASE_P (regno)
>
> -#define REGNO_MODE_CODE_OK_FOR_BASE_P(regno, mode, address_space, outer_code, index_code) \
> +#define REGNO_MODE_CODE_OK_FOR_BASE_P(regno, mode, address_space, outer_code, \
> +                                     index_code, insn)                       \
>    rl78_regno_mode_code_ok_for_base_p (regno, mode, address_space, outer_code, index_code)
>
> -#define MODE_CODE_BASE_REG_CLASS(mode, address_space, outer_code, index_code) \
> +#define MODE_CODE_BASE_REG_CLASS(mode, address_space, outer_code, index_code, \
> +                                insn)                                        \
>    rl78_mode_code_base_reg_class (mode, address_space, outer_code, index_code)
>
>  #define RETURN_ADDR_RTX(COUNT, FRAMEADDR)                              \
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index d0d47b0d471..a4239e3de10 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -2533,7 +2533,7 @@ register address.  You should define this macro if base plus index
>  addresses have different requirements than other base register uses.
>  @end defmac
>
> -@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> +@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
>  A C expression whose value is the register class to which a valid
>  base register for a memory reference in mode @var{mode} to address
>  space @var{address_space} must belong.  @var{outer_code} and @var{index_code}
> @@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
>  of an address, @code{ADDRESS} for something that occurs in an
>  @code{address_operand}).  @var{index_code} is the code of the corresponding
>  index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
> +@code{insn} indicates insn specific base register class should be subset
> +of the original base register class.
>  @end defmac
>
>  @defmac INDEX_REG_CLASS
> @@ -2579,7 +2581,7 @@ Use of this macro is deprecated; please use the more general
>  @code{REGNO_MODE_CODE_OK_FOR_BASE_P}.
>  @end defmac
>
> -@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> +@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
>  A C expression which is nonzero if register number @var{num} is
>  suitable for use as a base register in operand addresses, accessing
>  memory in mode @var{mode} in address space @var{address_space}.
> @@ -2592,6 +2594,8 @@ address, @code{ADDRESS} for something that occurs in an
>  corresponding index expression if @var{outer_code} is @code{PLUS};
>  @code{SCRATCH} otherwise.  The mode may be @code{VOIDmode} for addresses
>  that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
> +@code{insn} indicates insn specific base register class should be subset
> +of the original base register class.
>  @end defmac
>
>  @defmac REGNO_OK_FOR_INDEX_P (@var{num})
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 4ac96dc357d..72898f3adba 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -2128,7 +2128,7 @@ register address.  You should define this macro if base plus index
>  addresses have different requirements than other base register uses.
>  @end defmac
>
> -@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> +@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
>  A C expression whose value is the register class to which a valid
>  base register for a memory reference in mode @var{mode} to address
>  space @var{address_space} must belong.  @var{outer_code} and @var{index_code}
> @@ -2137,6 +2137,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
>  of an address, @code{ADDRESS} for something that occurs in an
>  @code{address_operand}).  @var{index_code} is the code of the corresponding
>  index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
> +@code{insn} indicates insn specific base register class should be subset
> +of the original base register class.
>  @end defmac
>
>  @defmac INDEX_REG_CLASS
> @@ -2174,7 +2176,7 @@ Use of this macro is deprecated; please use the more general
>  @code{REGNO_MODE_CODE_OK_FOR_BASE_P}.
>  @end defmac
>
> -@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> +@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
>  A C expression which is nonzero if register number @var{num} is
>  suitable for use as a base register in operand addresses, accessing
>  memory in mode @var{mode} in address space @var{address_space}.
> @@ -2187,6 +2189,8 @@ address, @code{ADDRESS} for something that occurs in an
>  corresponding index expression if @var{outer_code} is @code{PLUS};
>  @code{SCRATCH} otherwise.  The mode may be @code{VOIDmode} for addresses
>  that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
> +@code{insn} indicates insn specific base register class should be subset
> +of the original base register class.
>  @end defmac
>
>  @defmac REGNO_OK_FOR_INDEX_P (@var{num})
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> index c718bedff32..9e7915ce934 100644
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -3672,7 +3672,7 @@ process_address_1 (int nop, bool check_only_p,
>                                      REGNO (*ad.base_term)) != NULL_RTX)
>             ? after : NULL),
>            base_reg_class (ad.mode, ad.as, ad.base_outer_code,
> -                          get_index_code (&ad)))))
> +                          get_index_code (&ad), curr_insn))))
>      {
>        change_p = true;
>        if (ad.base_term2 != NULL)
> @@ -3722,7 +3722,8 @@ process_address_1 (int nop, bool check_only_p,
>           rtx_insn *last = get_last_insn ();
>           int code = -1;
>           enum reg_class cl = base_reg_class (ad.mode, ad.as,
> -                                             SCRATCH, SCRATCH);
> +                                             SCRATCH, SCRATCH,
> +                                             curr_insn);
>           rtx addr = *ad.inner;
>
>           new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
> @@ -3785,7 +3786,8 @@ process_address_1 (int nop, bool check_only_p,
>           /* index * scale + disp => new base + index * scale,
>              case (1) above.  */
>           enum reg_class cl = base_reg_class (ad.mode, ad.as, PLUS,
> -                                             GET_CODE (*ad.index));
> +                                             GET_CODE (*ad.index),
> +                                             curr_insn);
>
>           lra_assert (INDEX_REG_CLASS != NO_REGS);
>           new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "disp");
> @@ -3846,7 +3848,7 @@ process_address_1 (int nop, bool check_only_p,
>               *ad.base_term = XEXP (SET_SRC (set), 0);
>               *ad.disp_term = XEXP (SET_SRC (set), 1);
>               cl = base_reg_class (ad.mode, ad.as, ad.base_outer_code,
> -                                  get_index_code (&ad));
> +                                  get_index_code (&ad), curr_insn);
>               regno = REGNO (*ad.base_term);
>               if (regno >= FIRST_PSEUDO_REGISTER
>                   && cl != lra_get_allocno_class (regno))
> @@ -3890,7 +3892,8 @@ process_address_1 (int nop, bool check_only_p,
>    else
>      {
>        enum reg_class cl = base_reg_class (ad.mode, ad.as,
> -                                         SCRATCH, SCRATCH);
> +                                         SCRATCH, SCRATCH,
> +                                         curr_insn);
>        rtx addr = *ad.inner;
>
>        new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
> @@ -4639,7 +4642,7 @@ curr_insn_transform (bool check_only_p)
>
>           push_to_sequence (before);
>           rclass = base_reg_class (GET_MODE (op), MEM_ADDR_SPACE (op),
> -                                  MEM, SCRATCH);
> +                                  MEM, SCRATCH, curr_insn);
>           if (GET_RTX_CLASS (code) == RTX_AUTOINC)
>             new_reg = emit_inc (rclass, *loc, *loc,
>                                 /* This value does not matter for MODIFY.  */
> diff --git a/gcc/reload.cc b/gcc/reload.cc
> index 2126bdd117c..72f7e27af15 100644
> --- a/gcc/reload.cc
> +++ b/gcc/reload.cc
> @@ -3321,7 +3321,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
>                        were handled in find_reloads_address.  */
>                     this_alternative[i]
>                       = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
> -                                       ADDRESS, SCRATCH);
> +                                       ADDRESS, SCRATCH, insn);
>                     win = 1;
>                     badop = 0;
>                     break;
> @@ -3508,7 +3508,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
>                            the address into a base register.  */
>                         this_alternative[i]
>                           = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
> -                                           ADDRESS, SCRATCH);
> +                                           ADDRESS, SCRATCH, insn);
>                         badop = 0;
>                         break;
>
> @@ -4018,7 +4018,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
>             operand_reloadnum[i]
>               = push_reload (XEXP (recog_data.operand[i], 0), NULL_RTX,
>                              &XEXP (recog_data.operand[i], 0), (rtx*) 0,
> -                            base_reg_class (VOIDmode, as, MEM, SCRATCH),
> +                            base_reg_class (VOIDmode, as, MEM, SCRATCH, insn),
>                              address_mode,
>                              VOIDmode, 0, 0, i, RELOAD_OTHER);
>             rld[operand_reloadnum[i]].inc
> @@ -4897,7 +4897,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
>        if (reg_equiv_constant (regno) != 0)
>         {
>           find_reloads_address_part (reg_equiv_constant (regno), loc,
> -                                    base_reg_class (mode, as, MEM, SCRATCH),
> +                                    base_reg_class (mode, as, MEM,
> +                                                    SCRATCH, insn),
>                                      GET_MODE (ad), opnum, type, ind_levels);
>           return 1;
>         }
> @@ -4966,7 +4967,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
>
>        /* If we do not have one of the cases above, we must do the reload.  */
>        push_reload (ad, NULL_RTX, loc, (rtx*) 0,
> -                  base_reg_class (mode, as, MEM, SCRATCH),
> +                  base_reg_class (mode, as, MEM, SCRATCH, insn),
>                    GET_MODE (ad), VOIDmode, 0, 0, opnum, type);
>        return 1;
>      }
> @@ -5123,7 +5124,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
>              reload the sum into a base reg.
>              That will at least work.  */
>           find_reloads_address_part (ad, loc,
> -                                    base_reg_class (mode, as, MEM, SCRATCH),
> +                                    base_reg_class (mode, as, MEM,
> +                                                    SCRATCH, insn),
>                                      GET_MODE (ad), opnum, type, ind_levels);
>         }
>        return ! removed_and;
> @@ -5203,7 +5205,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
>                                  op_index == 0 ? addend : offset_reg);
>           *loc = ad;
>
> -         cls = base_reg_class (mode, as, MEM, GET_CODE (addend));
> +         cls = base_reg_class (mode, as, MEM, GET_CODE (addend), insn);
>           find_reloads_address_part (XEXP (ad, op_index),
>                                      &XEXP (ad, op_index), cls,
>                                      GET_MODE (ad), opnum, type, ind_levels);
> @@ -5261,7 +5263,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
>         }
>
>        find_reloads_address_part (ad, loc,
> -                                base_reg_class (mode, as, MEM, SCRATCH),
> +                                base_reg_class (mode, as, MEM,
> +                                                SCRATCH, insn),
>                                  address_mode, opnum, type, ind_levels);
>        return ! removed_and;
>      }
> @@ -5513,7 +5516,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
>    if (context == 1)
>      context_reg_class = INDEX_REG_CLASS;
>    else
> -    context_reg_class = base_reg_class (mode, as, outer_code, index_code);
> +    context_reg_class = base_reg_class (mode, as, outer_code, index_code,
> +                                       insn);
>
>    switch (code)
>      {
> @@ -5738,7 +5742,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
>                 reloadnum = push_reload (tem, tem, &XEXP (x, 0),
>                                          &XEXP (op1, 0),
>                                          base_reg_class (mode, as,
> -                                                        code, index_code),
> +                                                        code, index_code,
> +                                                        insn),
>                                          GET_MODE (x), GET_MODE (x), 0,
>                                          0, opnum, RELOAD_OTHER);
>
> @@ -5756,7 +5761,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
>             reloadnum = push_reload (XEXP (op1, 0), XEXP (x, 0),
>                                      &XEXP (op1, 0), &XEXP (x, 0),
>                                      base_reg_class (mode, as,
> -                                                    code, index_code),
> +                                                    code, index_code,
> +                                                    insn),
>                                      GET_MODE (x), GET_MODE (x), 0, 0,
>                                      opnum, RELOAD_OTHER);
>
> @@ -6216,7 +6222,7 @@ find_reloads_subreg_address (rtx x, int opnum, enum reload_type type,
>      {
>        push_reload (XEXP (tem, 0), NULL_RTX, &XEXP (tem, 0), (rtx*) 0,
>                    base_reg_class (GET_MODE (tem), MEM_ADDR_SPACE (tem),
> -                                  MEM, SCRATCH),
> +                                  MEM, SCRATCH, insn),
>                    GET_MODE (XEXP (tem, 0)), VOIDmode, 0, 0, opnum, type);
>        reloaded = 1;
>      }
> diff --git a/gcc/reload1.cc b/gcc/reload1.cc
> index 9ba822d1ff7..f41f4a4de22 100644
> --- a/gcc/reload1.cc
> +++ b/gcc/reload1.cc
> @@ -1382,7 +1382,7 @@ maybe_fix_stack_asms (void)
>                   if (insn_extra_address_constraint (cn))
>                     cls = (int) reg_class_subunion[cls]
>                       [(int) base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
> -                                            ADDRESS, SCRATCH)];
> +                                            ADDRESS, SCRATCH, chain->insn)];
>                   else
>                     cls = (int) reg_class_subunion[cls]
>                       [reg_class_for_constraint (cn)];
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/13] [RFC] Support Intel APX EGPR
  2023-08-31  9:19 ` [PATCH 00/13] [RFC] Support Intel APX EGPR Richard Biener
@ 2023-09-01  8:55   ` Hongyu Wang
  0 siblings, 0 replies; 49+ messages in thread
From: Hongyu Wang @ 2023-09-01  8:55 UTC (permalink / raw)
  To: Richard Biener; +Cc: Hongyu Wang, gcc-patches, jakub, hongtao.liu, hubicka

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Thu, Aug 31, 2023 at 17:21:
>
> On Thu, Aug 31, 2023 at 10:22 AM Hongyu Wang via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Intel Advanced performance extension (APX) has been released in [1].
> > It contains several extensions such as extended 16 general purpose registers
> > (EGPRs), push2/pop2, new data destination (NDD), conditional compare
> > (CCMP/CTEST) combined with suppress flags write version of common instructions
> > (NF). This RFC focused on EGPR implementation in GCC.
> >
> > APX introduces a REX2 prefix to help represent EGPR for several legacy/SSE
> > instructions. For the remaining ones, it promotes some of them using evex
> > prefix for EGPR.  The main issue in APX is that not all legacy/sse/vex
> > instructions support EGPR. For example, instructions in legacy opcode map2/3
> > cannot use REX2 prefix since there is only 1bit in REX2 to indicate map0/1
> > instructions, e.g., pinsrd. Also, for most vector extensions, EGPR is supported
> > in their evex forms but not vex forms, which means the mnemonics with no evex
> > forms also cannot use EGPR, e.g., vphaddw.
> >
> > Such limitation brings some challenge with current GCC infrastructure.
> > Generally, we use constraints to guide register allocation behavior. For
> > register operand, it is easy to add a new constraint to certain insn and limit
> > it to legacy or REX registers. But for memory operand, if we only use
> > constraint to limit base/index register choice, reload has no backoff when
> > process_address allocates any egprs to base/index reg, and then any post-reload
> > pass would get ICE from the constraint.
>
> How realistic would it be to simply disable instructions not supporting EGPR?

Some of the SSE and AVX instructions that lack an EVEX counterpart do not
support EGPR. In this RFC we prohibit EGPR usage for them under -mapxf,
but I don't think it is realistic to disable the instructions themselves.

> I hope there are alternatives that would be available in actual APX
> implementations?
> Otherwise this design limitation doesn't shed a very positive light on
> the designers ...

I'm a bit confused by "alternatives" here: did you mean alternative
instructions for all the SSE/AVX ones that cannot use EGPR?

> How sure are we actual implementations with APX will appear (just
> remembering SSE5...)?
> I'm quite sure it's not going to be 2024 so would it be realistic to
> post-pone APX work
> to next stage1, targeting GCC 15 only?

APX is a pretty big feature that contains several separate sub-features
and takes a lot of effort to implement. I doubt that a single release
timeframe can accommodate all of them, so I would prefer to split the
work into phases and implement it in GCC phase by phase. Phase 1 will
include the fundamental features, e.g. EGPR, NDD and PUSH2/POP2; we plan
to implement and land it in GCC 14 if it becomes good enough. Phase 2
will include CCMP/CTEST/NF and target GCC 15. The advantage is that
interested users can then use GCC 14 to try the fundamental EGPR
feature.

>
> > Here is what we did to address the issue:
> >
> > Middle-end:
> > -       Add rtx_insn parameter to base_reg_class, reuse the
> > MODE_CODE_BASE_REG_CLASS macro with rtx_insn parameter.
> > -       Add index_reg_class like base_reg_class, calls new INSN_INDEX_REG_CLASS
> > macro with rtx_insn parameter.
> > -       In process_address_1, add rtx_insn parameter to call sites of
> > base_reg_class, replace usage of INDEX_REG_CLASS to index_reg_class with
> > rtx_insn parameter.
> >
> > Back-end:
> > -       Extend GENERAL_REG_CLASS, INDEX_REG_CLASS and their supersets with
> > corresponding regno checks for EGPRs.
> > -       Add GENERAL_GPR16/INDEX_GPR16 class for old 16 GPRs.
> > -       Whole component is controlled under -mapxf/TARGET_APX_EGPR. If it is
> > not enabled, clear r16-r31 in accessible_reg_set.
> > -       New register_constraint “h” and memory_constraint “Bt” that disallows
> > EGPRs in operand.
> > -       New asm_gpr32 flag option to enable/disable gpr32 for inline asm,
> >   disabled by default.
> > -       If asm_gpr32 is disabled, replace constraints “r” to “h”, and
> > “m/memory” to “Bt”.
> > -       Extra insn attribute gpr32, value 0 indicates the alternative cannot
> > use EGPRs.
> > -       Add target functions for base_reg_class and index_reg_class, calls a
> > helper function to verify if insn can use EGPR in its memory_operand.
> > -       In the helper function, the verify process works as follow:
> >     1. Returns true if APX_EGPR disabled or insn is null.
> >     2. If the insn is inline asm, returns asm_gpr32 flag.
> >     3. Returns false for unrecognizable insn.
> >     4. Save recog_data and which_alternative, extract the insn, and restore them
> >     before return.
> >     5. Loop through all enabled alternatives, if one of the enabled alternatives
> >     have attr_gpr32 0, returns false, otherwise returns true.
> > -       For insn alternatives that cannot use gpr32 in register_operand, use h
> > constraint instead of r.
> > -       For insn alternatives that cannot use gpr32 in memory operand, use Bt
> > constraint instead of m, and set corresponding attr_gpr32 to 0.
> > -       Split output template with %v if the sse version of mnemonic cannot use
> > gpr32.
> > -       For insn alternatives that cannot use gpr32 in memory operand, classify
> > the isa attribute and split alternatives to noavx, avx_noavx512f and etc., so
> > the helper function can properly loop through the available enabled mask.
> >
> > Specifically for inline asm, we currently just map “r/m/memory” constraints as
> > an example. Eventually we will support entire mapping of all common constraints
> > if the mapping method was accepted.
> >
> > Also, for vex instructions, currently we assume egpr was supported if they have
> > evex counterpart, since any APX enabled machine will have AVX10 support for all
> > the evex encodings. We just disabled those mnemonics that doesn’t support EGPR.
> > So EGPR will be allowed under -mavx2 -mapxf for many vex mnemonics.
> >
> > We haven’t disabled EGPR for 3DNOW/XOP/LWP/FMA4/TBM instructions, as they will
> > be co-operated with -mapxf. We can disable EGPR for them if AMD guys requires.
>
> I think most of these are retired by now, so it's unlikely an
> implementation providing
> these and also APX will appear.

Thanks, that could reduce much effort.

>
> I have no comments on the implementation other than having instructions
> that do not support the upper GPRs is quite ugly.  I don't know of any other
> target with this kind of restriction, if there is any we could see how it deals
> with such situation.

We tried to find one but failed. We would also like to know if there is
a better solution.

>
> Richard.
>
> > For testing, currently we tested GCC testsuite and spec2017 with -maxf+sde
> > simulater and no more errors. Also, we inverted the register allocation order
> > to force r31 to be allocated first, and no more error except those AMD only
> > instructions. We will conduct further tests like changing all do-compile to
> > do-assemble and add more to gcc/testsuite in the future.
> >
> > The RFC intends to describe our approach for APX implementation for EGPR
> > component. It may still have potential issues or bugs and requires futher
> > optimization. Any comments are very appreciated.
> >
> > [1]. https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html.
> >
> > Hongyu Wang (2):
> >   [APX EGPR] middle-end: Add index_reg_class with insn argument.
> >   [APX EGPR] Handle GPR16 only vector move insns
> >
> > Kong Lingling (11):
> >   [APX EGPR] middle-end: Add insn argument to base_reg_class
> >   [APX_EGPR] Initial support for APX_F
> >   [APX EGPR] Add 16 new integer general purpose registers
> >   [APX EGPR] Add register and memory constraints that disallow EGPR
> >   [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR
> >     constraint.
> >   [APX EGPR] Add backend hook for base_reg_class/index_reg_class.
> >   [APX EGPR] Handle legacy insn that only support GPR16 (1/5)
> >   [APX EGPR] Handle legacy insns that only support GPR16 (2/5)
> >   [APX EGPR] Handle legacy insns that only support GPR16 (3/5)
> >   [APX_EGPR] Handle legacy insns that only support GPR16 (4/5)
> >   [APX EGPR] Handle vex insns that only support GPR16 (5/5)
> >
> >  gcc/addresses.h                               |  25 +-
> >  gcc/common/config/i386/cpuinfo.h              |  12 +-
> >  gcc/common/config/i386/i386-common.cc         |  17 +
> >  gcc/common/config/i386/i386-cpuinfo.h         |   1 +
> >  gcc/common/config/i386/i386-isas.h            |   1 +
> >  gcc/config/avr/avr.h                          |   5 +-
> >  gcc/config/gcn/gcn.h                          |   4 +-
> >  gcc/config/i386/constraints.md                |  26 +-
> >  gcc/config/i386/cpuid.h                       |   1 +
> >  gcc/config/i386/i386-isa.def                  |   1 +
> >  gcc/config/i386/i386-options.cc               |  15 +
> >  gcc/config/i386/i386-opts.h                   |   8 +
> >  gcc/config/i386/i386-protos.h                 |   9 +
> >  gcc/config/i386/i386.cc                       | 253 +++++-
> >  gcc/config/i386/i386.h                        |  69 +-
> >  gcc/config/i386/i386.md                       | 144 ++-
> >  gcc/config/i386/i386.opt                      |  30 +
> >  gcc/config/i386/mmx.md                        | 170 ++--
> >  gcc/config/i386/sse.md                        | 859 ++++++++++++------
> >  gcc/config/rl78/rl78.h                        |   6 +-
> >  gcc/doc/invoke.texi                           |  11 +-
> >  gcc/doc/tm.texi                               |  17 +-
> >  gcc/doc/tm.texi.in                            |  17 +-
> >  gcc/lra-constraints.cc                        |  32 +-
> >  gcc/reload.cc                                 |  34 +-
> >  gcc/reload1.cc                                |   2 +-
> >  gcc/testsuite/gcc.target/i386/apx-1.c         |   8 +
> >  .../gcc.target/i386/apx-egprs-names.c         |  17 +
> >  .../gcc.target/i386/apx-inline-gpr-norex2.c   | 108 +++
> >  .../gcc.target/i386/apx-interrupt-1.c         | 102 +++
> >  .../i386/apx-legacy-insn-check-norex2-asm.c   |   5 +
> >  .../i386/apx-legacy-insn-check-norex2.c       | 181 ++++
> >  .../gcc.target/i386/apx-spill_to_egprs-1.c    |  25 +
> >  gcc/testsuite/lib/target-supports.exp         |  10 +
> >  34 files changed, 1747 insertions(+), 478 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-egprs-names.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
> >
> > --
> > 2.31.1
> >


* Re: [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)
  2023-08-31  9:28     ` Richard Biener
@ 2023-09-01  9:03       ` Hongyu Wang
  2023-09-01 10:38       ` Hongtao Liu
  1 sibling, 0 replies; 49+ messages in thread
From: Hongyu Wang @ 2023-09-01  9:03 UTC (permalink / raw)
  To: Richard Biener; +Cc: Hongyu Wang, gcc-patches, jakub, hongtao.liu, hubicka

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Thu, Aug 31, 2023 at 17:31:
>
> On Thu, Aug 31, 2023 at 11:26 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > From: Kong Lingling <lingling.kong@intel.com>
> > >
> > > Disable EGPR usage for below legacy insns in opcode map2/3 that have vex
> > > but no evex counterpart.
> > >
> > > insn list:
> > > 1. phminposuw/vphminposuw
> > > 2. ptest/vptest
> > > 3. roundps/vroundps, roundpd/vroundpd,
> > >    roundss/vroundss, roundsd/vroundsd
> > > 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> > > 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
> >
> > How are GPRs involved in the above?  Or did I misunderstand something?
>
> Following up myself - for the memory operand alternatives I guess.  How about
> simply disabling the memory alternatives when EGPR is active?  Wouldn't
> that simplify the initial patchset a lot?  Re-enabling them when
> deemed important
> could be done as followup then?
>

That would also require per-pattern changes to set the isa attr of the
memory alternatives to something like "noapx_egpr". In addition, we have
a series of patterns, such as the commonly used vec_set patterns, in
which some alternatives support EGPR and others do not, so we would
still need to identify, per alternative, whether EGPR is supported.
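
For illustration, here is a rough sketch of what the suggested approach
might look like on one pattern. The "noapx_egpr" isa value is
hypothetical and not part of this patchset; it would need a matching
entry in the isa attribute and the "enabled" attribute mapping in
i386.md, and the pattern shown is a simplified variant of the real one:

```lisp
;; Hypothetical sketch: instead of the new Bt constraint, split the
;; memory alternative out and tag it with a dedicated isa value so the
;; "enabled" attribute can turn it off when APX EGPR is active.
(define_insn "sse4_1_phminposuw"
  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x,x")
	(unspec:V8HI
	  [(match_operand:V8HI 1 "vector_operand" "Yr,*x,x,m")]
	  UNSPEC_PHMINPOSUW))]
  "TARGET_SSE4_1"
  "%vphminposuw\t{%1, %0|%0, %1}"
  ;; "noapx_egpr" would be defined as enabled only when
  ;; !TARGET_APX_EGPR, disabling the memory alternative entirely.
  [(set_attr "isa" "noavx,noavx,avx,noapx_egpr")
   (set_attr "type" "sselog1")
   (set_attr "prefix" "orig,orig,vex,vex")])
```

As noted above, this still touches every affected pattern, and it gives
up the memory form under -mapxf rather than merely restricting its
base/index registers.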


> Richard.
>
> > > 6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
> > >         prototype.
> > >         * config/i386/i386.cc (x86_evex_reg_mentioned_p): New
> > >         function.
> > >         * config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0
> > >         and constraint Bt/BM to all non-evex alternatives, adjust
> > >         alternative outputs if evex reg is mentioned.
> > >         * config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0
> > >         and constraint Bt/BM to all non-evex alternatives.
> > >         (ptesttf2): Likewise.
> > >         (<sse4_1>_round<ssemodesuffix><avxsizesuffix): Likewise.
> > >         (sse4_1_round<ssescalarmodesuffix>): Likewise.
> > >         (sse4_2_pcmpestri): Likewise.
> > >         (sse4_2_pcmpestrm): Likewise.
> > >         (sse4_2_pcmpestr_cconly): Likewise.
> > >         (sse4_2_pcmpistr): Likewise.
> > >         (sse4_2_pcmpistri): Likewise.
> > >         (sse4_2_pcmpistrm): Likewise.
> > >         (sse4_2_pcmpistr_cconly): Likewise.
> > >         (aesimc): Likewise.
> > >         (aeskeygenassist): Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
> > >         tests.
> > > ---
> > >  gcc/config/i386/i386-protos.h                 |  1 +
> > >  gcc/config/i386/i386.cc                       | 13 +++
> > >  gcc/config/i386/i386.md                       |  3 +-
> > >  gcc/config/i386/sse.md                        | 93 +++++++++++++------
> > >  .../i386/apx-legacy-insn-check-norex2.c       | 55 ++++++++++-
> > >  5 files changed, 132 insertions(+), 33 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> > > index 78eb3e0f584..bbb219e3039 100644
> > > --- a/gcc/config/i386/i386-protos.h
> > > +++ b/gcc/config/i386/i386-protos.h
> > > @@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
> > >  extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
> > >  extern bool x86_extended_reg_mentioned_p (rtx);
> > >  extern bool x86_extended_rex2reg_mentioned_p (rtx);
> > > +extern bool x86_evex_reg_mentioned_p (rtx [], int);
> > >  extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
> > >  extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index f5d642948bc..ec93c5bab97 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -22936,6 +22936,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
> > >    return false;
> > >  }
> > >
> > > +/* Return true when rtx operands mentions register that must be encoded using
> > > +   evex prefix.  */
> > > +bool
> > > +x86_evex_reg_mentioned_p (rtx operands[], int nops)
> > > +{
> > > +  int i;
> > > +  for (i = 0; i < nops; i++)
> > > +    if (EXT_REX_SSE_REG_P (operands[i])
> > > +       || x86_extended_rex2reg_mentioned_p (operands[i]))
> > > +      return true;
> > > +  return false;
> > > +}
> > > +
> > >  /* If profitable, negate (without causing overflow) integer constant
> > >     of mode MODE at location LOC.  Return true in this case.  */
> > >  bool
> > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > index 83ad01b43c1..4c305e72389 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -21603,7 +21603,7 @@ (define_expand "significand<mode>2"
> > >  (define_insn "sse4_1_round<mode>2"
> > >    [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
> > >         (unspec:MODEFH
> > > -         [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
> > > +         [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,Bt,v,m")
> > >            (match_operand:SI 2 "const_0_to_15_operand")]
> > >           UNSPEC_ROUND))]
> > >    "TARGET_SSE4_1"
> > > @@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round<mode>2"
> > >    [(set_attr "type" "ssecvt")
> > >     (set_attr "prefix_extra" "1,1,1,*,*")
> > >     (set_attr "length_immediate" "1")
> > > +   (set_attr "gpr32" "1,1,0,1,1")
> > >     (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
> > >     (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
> > >     (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index 05963de9219..456713b991a 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -22617,11 +22617,12 @@ (define_insn "avx2_pblendd<mode>"
> > >
> > >  (define_insn "sse4_1_phminposuw"
> > >    [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
> > > -       (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,xm")]
> > > +       (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBT,*xBT,xBt")]
> > >                      UNSPEC_PHMINPOSUW))]
> > >    "TARGET_SSE4_1"
> > >    "%vphminposuw\t{%1, %0|%0, %1}"
> > >    [(set_attr "isa" "noavx,noavx,avx")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "type" "sselog1")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "prefix" "orig,orig,vex")
> > > @@ -23810,12 +23811,13 @@ (define_insn "avx_vtest<ssemodesuffix><avxsizesuffix>"
> > >  (define_insn "*<sse4_1>_ptest<mode>"
> > >    [(set (reg FLAGS_REG)
> > >         (unspec [(match_operand:V_AVX 0 "register_operand" "Yr, *x, x")
> > > -                (match_operand:V_AVX 1 "vector_operand" "YrBm, *xBm, xm")]
> > > +                (match_operand:V_AVX 1 "vector_operand" "YrBT, *xBT, xBt")]
> > >                 UNSPEC_PTEST))]
> > >    "TARGET_SSE4_1 && ix86_match_ptest_ccmode (insn)"
> > >    "%vptest\t{%1, %0|%0, %1}"
> > >    [(set_attr "isa" "noavx,noavx,avx")
> > >     (set_attr "type" "ssecomi")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "prefix" "orig,orig,vex")
> > >     (set (attr "btver2_decode")
> > > @@ -23852,12 +23854,13 @@ (define_expand "<sse4_1>_ptest<mode>"
> > >  (define_insn "ptesttf2"
> > >    [(set (reg:CC FLAGS_REG)
> > >         (unspec:CC [(match_operand:TF 0 "register_operand" "Yr, *x, x")
> > > -                   (match_operand:TF 1 "vector_operand" "YrBm, *xBm, xm")]
> > > +                   (match_operand:TF 1 "vector_operand" "YrBT, *xBT, xBt")]
> > >                    UNSPEC_PTEST))]
> > >    "TARGET_SSE4_1"
> > >    "%vptest\t{%1, %0|%0, %1}"
> > >    [(set_attr "isa" "noavx,noavx,avx")
> > >     (set_attr "type" "ssecomi")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "prefix" "orig,orig,vex")
> > >     (set_attr "mode" "TI")])
> > > @@ -23968,13 +23971,14 @@ (define_expand "lrint<mode><sseintvecmodelower>2"
> > >  (define_insn "<sse4_1>_round<ssemodesuffix><avxsizesuffix>"
> > >    [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
> > >         (unspec:VF_128_256
> > > -         [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm,xm")
> > > +         [(match_operand:VF_128_256 1 "vector_operand" "YrBT,*xBT,xBt")
> > >            (match_operand:SI 2 "const_0_to_15_operand")]
> > >           UNSPEC_ROUND))]
> > >    "TARGET_SSE4_1"
> > >    "%vround<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
> > >    [(set_attr "isa" "noavx,noavx,avx")
> > >     (set_attr "type" "ssecvt")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_data16" "1,1,*")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > > @@ -24061,19 +24065,32 @@ (define_insn "sse4_1_round<ssescalarmodesuffix>"
> > >    [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> > >         (vec_merge:VF_128
> > >           (unspec:VF_128
> > > -           [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > > +           [(match_operand:VF_128 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
> > >              (match_operand:SI 3 "const_0_to_15_operand")]
> > >             UNSPEC_ROUND)
> > >           (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> > >           (const_int 1)))]
> > >    "TARGET_SSE4_1"
> > > -  "@
> > > -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > > -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > > -   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
> > > -   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
> > > -  [(set_attr "isa" "noavx,noavx,avx,avx512f")
> > > +{
> > > +  switch (which_alternative)
> > > +    {
> > > +      case 0:
> > > +      case 1:
> > > +       return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}";
> > > +      case 2:
> > > +       return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > > +      case 3:
> > > +       if (x86_evex_reg_mentioned_p (operands, 3))
> > > +         return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > > +       else
> > > +         return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > > +      default:
> > > +       gcc_unreachable ();
> > > +    }
> > > +}
> > > +  [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
> > >     (set_attr "type" "ssecvt")
> > > +   (set_attr "gpr32" "0,0,0,1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "prefix_data16" "1,1,*,*")
> > >     (set_attr "prefix_extra" "1")
> > > @@ -24085,19 +24102,32 @@ (define_insn "*sse4_1_round<ssescalarmodesuffix>"
> > >         (vec_merge:VFH_128
> > >           (vec_duplicate:VFH_128
> > >             (unspec:<ssescalarmode>
> > > -             [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > > +             [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
> > >                (match_operand:SI 3 "const_0_to_15_operand")]
> > >               UNSPEC_ROUND))
> > >           (match_operand:VFH_128 1 "register_operand" "0,0,x,v")
> > >           (const_int 1)))]
> > >    "TARGET_SSE4_1"
> > > -  "@
> > > -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
> > > -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
> > > -   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
> > > -   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
> > > -  [(set_attr "isa" "noavx,noavx,avx,avx512f")
> > > +{
> > > +  switch (which_alternative)
> > > +    {
> > > +      case 0:
> > > +      case 1:
> > > +       return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}";
> > > +      case 2:
> > > +       return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > > +      case 3:
> > > +       if (x86_evex_reg_mentioned_p (operands, 3) || <MODE>mode == V8HFmode)
> > > +         return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > > +       else
> > > +         return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > > +      default:
> > > +       gcc_unreachable ();
> > > +    }
> > > +}
> > > +  [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
> > >     (set_attr "type" "ssecvt")
> > > +   (set_attr "gpr32" "0,0,0,1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "prefix_data16" "1,1,*,*")
> > >     (set_attr "prefix_extra" "1")
> > > @@ -24318,7 +24348,7 @@ (define_insn "sse4_2_pcmpestri"
> > >         (unspec:SI
> > >           [(match_operand:V16QI 1 "register_operand" "x,x")
> > >            (match_operand:SI 2 "register_operand" "a,a")
> > > -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > > +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> > >            (match_operand:SI 4 "register_operand" "d,d")
> > >            (match_operand:SI 5 "const_0_to_255_operand")]
> > >           UNSPEC_PCMPESTR))
> > > @@ -24333,6 +24363,7 @@ (define_insn "sse4_2_pcmpestri"
> > >    "TARGET_SSE4_2"
> > >    "%vpcmpestri\t{%5, %3, %1|%1, %3, %5}"
> > >    [(set_attr "type" "sselog")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "prefix" "maybe_vex")
> > >     (set_attr "length_immediate" "1")
> > > @@ -24345,7 +24376,7 @@ (define_insn "sse4_2_pcmpestrm"
> > >         (unspec:V16QI
> > >           [(match_operand:V16QI 1 "register_operand" "x,x")
> > >            (match_operand:SI 2 "register_operand" "a,a")
> > > -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > > +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> > >            (match_operand:SI 4 "register_operand" "d,d")
> > >            (match_operand:SI 5 "const_0_to_255_operand")]
> > >           UNSPEC_PCMPESTR))
> > > @@ -24360,6 +24391,7 @@ (define_insn "sse4_2_pcmpestrm"
> > >    "TARGET_SSE4_2"
> > >    "%vpcmpestrm\t{%5, %3, %1|%1, %3, %5}"
> > >    [(set_attr "type" "sselog")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "prefix" "maybe_vex")
> > > @@ -24372,7 +24404,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
> > >         (unspec:CC
> > >           [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> > >            (match_operand:SI 3 "register_operand" "a,a,a,a")
> > > -          (match_operand:V16QI 4 "nonimmediate_operand" "x,m,x,m")
> > > +          (match_operand:V16QI 4 "nonimmediate_operand" "x,Bt,x,Bt")
> > >            (match_operand:SI 5 "register_operand" "d,d,d,d")
> > >            (match_operand:SI 6 "const_0_to_255_operand")]
> > >           UNSPEC_PCMPESTR))
> > > @@ -24385,6 +24417,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
> > >     %vpcmpestri\t{%6, %4, %2|%2, %4, %6}
> > >     %vpcmpestri\t{%6, %4, %2|%2, %4, %6}"
> > >    [(set_attr "type" "sselog")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "memory" "none,load,none,load")
> > > @@ -24396,7 +24429,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
> > >    [(set (match_operand:SI 0 "register_operand" "=c,c")
> > >         (unspec:SI
> > >           [(match_operand:V16QI 2 "register_operand" "x,x")
> > > -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > > +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> > >            (match_operand:SI 4 "const_0_to_255_operand")]
> > >           UNSPEC_PCMPISTR))
> > >     (set (match_operand:V16QI 1 "register_operand" "=Yz,Yz")
> > > @@ -24439,6 +24472,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
> > >    DONE;
> > >  }
> > >    [(set_attr "type" "sselog")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "memory" "none,load")
> > > @@ -24448,7 +24482,7 @@ (define_insn "sse4_2_pcmpistri"
> > >    [(set (match_operand:SI 0 "register_operand" "=c,c")
> > >         (unspec:SI
> > >           [(match_operand:V16QI 1 "register_operand" "x,x")
> > > -          (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> > > +          (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
> > >            (match_operand:SI 3 "const_0_to_255_operand")]
> > >           UNSPEC_PCMPISTR))
> > >     (set (reg:CC FLAGS_REG)
> > > @@ -24460,6 +24494,7 @@ (define_insn "sse4_2_pcmpistri"
> > >    "TARGET_SSE4_2"
> > >    "%vpcmpistri\t{%3, %2, %1|%1, %2, %3}"
> > >    [(set_attr "type" "sselog")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "prefix" "maybe_vex")
> > > @@ -24471,7 +24506,7 @@ (define_insn "sse4_2_pcmpistrm"
> > >    [(set (match_operand:V16QI 0 "register_operand" "=Yz,Yz")
> > >         (unspec:V16QI
> > >           [(match_operand:V16QI 1 "register_operand" "x,x")
> > > -          (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> > > +          (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
> > >            (match_operand:SI 3 "const_0_to_255_operand")]
> > >           UNSPEC_PCMPISTR))
> > >     (set (reg:CC FLAGS_REG)
> > > @@ -24483,6 +24518,7 @@ (define_insn "sse4_2_pcmpistrm"
> > >    "TARGET_SSE4_2"
> > >    "%vpcmpistrm\t{%3, %2, %1|%1, %2, %3}"
> > >    [(set_attr "type" "sselog")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "prefix" "maybe_vex")
> > > @@ -24494,7 +24530,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
> > >    [(set (reg:CC FLAGS_REG)
> > >         (unspec:CC
> > >           [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> > > -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m,x,m")
> > > +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt,x,Bt")
> > >            (match_operand:SI 4 "const_0_to_255_operand")]
> > >           UNSPEC_PCMPISTR))
> > >     (clobber (match_scratch:V16QI 0 "=Yz,Yz,X,X"))
> > > @@ -24506,6 +24542,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
> > >     %vpcmpistri\t{%4, %3, %2|%2, %3, %4}
> > >     %vpcmpistri\t{%4, %3, %2|%2, %3, %4}"
> > >    [(set_attr "type" "sselog")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "memory" "none,load,none,load")
> > > @@ -25990,23 +26027,25 @@ (define_insn "aesdeclast"
> > >
> > >  (define_insn "aesimc"
> > >    [(set (match_operand:V2DI 0 "register_operand" "=x")
> > > -       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")]
> > > +       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")]
> > >                       UNSPEC_AESIMC))]
> > >    "TARGET_AES"
> > >    "%vaesimc\t{%1, %0|%0, %1}"
> > >    [(set_attr "type" "sselog1")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "prefix" "maybe_vex")
> > >     (set_attr "mode" "TI")])
> > >
> > >  (define_insn "aeskeygenassist"
> > >    [(set (match_operand:V2DI 0 "register_operand" "=x")
> > > -       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")
> > > +       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")
> > >                       (match_operand:SI 2 "const_0_to_255_operand")]
> > >                      UNSPEC_AESKEYGENASSIST))]
> > >    "TARGET_AES"
> > >    "%vaeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
> > >    [(set_attr "type" "sselog1")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "prefix" "maybe_vex")
> > > diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > > index 510213a6ca7..771bcb078e1 100644
> > > --- a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > > +++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > > @@ -45,13 +45,22 @@ typedef union
> > >    DTYPE a[16];
> > >  } tmp_u;
> > >
> > > -__attribute__((target("sse4.2")))
> > > +__attribute__((target("sse4.2,aes")))
> > >  void sse_test ()
> > >  {
> > >    register tmp_u *tdst __asm__("%r16");
> > >    register tmp_u *src1 __asm__("%r17");
> > >    register tmp_u *src2 __asm__("%r18");
> > > -
> > > +
> > > +  src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> > > +  src1->a[2] = _mm_testc_si128 (src1->xi[3], src2->xi[4]);
> > > +  src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> > > +                             _MM_FROUND_CUR_DIRECTION);
> > > +  src1->xf[4] = _mm_round_ps (src1->xf[7], _MM_FROUND_CUR_DIRECTION);
> > > +  src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> > > +                             _MM_FROUND_CUR_DIRECTION);
> > > +  src1->xd[1] = _mm_round_pd (src1->xd[4], _MM_FROUND_CUR_DIRECTION);
> > > +
> > >    src1->xi[0] = _mm_hadd_epi16 (tdst->xi[2], src2->xi[3]);
> > >    src1->xi[1] = _mm_hadd_epi32 (tdst->xi[0], src2->xi[1]);
> > >    tdst->xi[2] = _mm_hadds_epi16 (src1->xi[4], src2->xi[5]);
> > > @@ -77,16 +86,33 @@ void sse_test ()
> > >    tdst->xi[1] = _mm_sign_epi8 (src1->xi[5], src2->xi[6]);
> > >    tdst->xi[2] = _mm_sign_epi16 (src1->xi[7], src2->xi[0]);
> > >    tdst->xi[3] = _mm_sign_epi32 (src1->xi[1], src2->xi[2]);
> > > +
> > > +  tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> > > +  tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> > > +  tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> > > +  tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> > > +
> > > +  tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> > > +  tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
> > >  }
> > >
> > > -__attribute__((target("avx2")))
> > > +__attribute__((target("avx2,aes")))
> > >  void vex_test ()
> > >  {
> > >
> > >    register tmp_u *tdst __asm__("%r16");
> > >    register tmp_u *src1 __asm__("%r17");
> > >    register tmp_u *src2 __asm__("%r18");
> > > -
> > > +
> > > +  src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> > > +  src1->a[2] = _mm256_testc_si256 (src1->yi[2], src2->yi[3]);
> > > +  src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> > > +                             _MM_FROUND_CUR_DIRECTION);
> > > +  src1->yf[4] = _mm256_round_ps (src1->yf[2], _MM_FROUND_CUR_DIRECTION);
> > > +  src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> > > +                             _MM_FROUND_CUR_DIRECTION);
> > > +  src1->yd[1] = _mm256_round_pd (src1->yd[3], _MM_FROUND_CUR_DIRECTION);
> > > +
> > >    src1->yi[1] = _mm256_hadd_epi16 (tdst->yi[2], src2->yi[3]);
> > >    src1->yi[2] = _mm256_hadd_epi32 (tdst->yi[0], src2->yi[1]);
> > >    tdst->yi[3] = _mm256_hadds_epi16 (src1->yi[1], src2->yi[2]);
> > > @@ -98,7 +124,6 @@ void vex_test ()
> > >    src1->yi[1] = _mm256_cmpgt_epi64 (tdst->yi[3], src2->yi[0]);
> > >
> > >    tdst->yf[2] = _mm256_dp_ps (src1->yf[0], src2->yf[1], 0xbf);
> > > -  tdst->xd[3] = _mm_dp_pd (src1->xd[0], src2->xd[1], 0xbf);
> > >
> > >    tdst->yi[3] = _mm256_mpsadbw_epu8 (src1->yi[1], src2->yi[1], 0xc1);
> > >
> > > @@ -112,6 +137,14 @@ void vex_test ()
> > >    tdst->yi[2] = _mm256_sign_epi8 (src1->yi[0], src2->yi[1]);
> > >    tdst->yi[3] = _mm256_sign_epi16 (src1->yi[2], src2->yi[3]);
> > >    tdst->yi[0] = _mm256_sign_epi32 (src1->yi[0], src2->yi[1]);
> > > +
> > > +  tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> > > +  tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> > > +  tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> > > +  tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> > > +
> > > +  tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> > > +  tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
> > >  }
> > >
> > >  /* { dg-final { scan-assembler-not "v?pcmpeqq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > @@ -134,3 +167,15 @@ void vex_test ()
> > >  /* { dg-final { scan-assembler-not "v?psignb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > >  /* { dg-final { scan-assembler-not "v?psignw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > >  /* { dg-final { scan-assembler-not "v?psignd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?phminposuw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?ptest\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundss\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundsd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpestri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpistri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpestrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpistrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?aesimc\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?aeskeygenassist\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > --
> > > 2.31.1
> > >

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.
  2023-08-31  9:17   ` Jakub Jelinek
  2023-08-31 10:00     ` Uros Bizjak
@ 2023-09-01  9:04     ` Hongyu Wang
  1 sibling, 0 replies; 49+ messages in thread
From: Hongyu Wang @ 2023-09-01  9:04 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Hongyu Wang, gcc-patches, hongtao.liu, hubicka

On Thu, Aug 31, 2023 at 17:18, Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
> > From: Kong Lingling <lingling.kong@intel.com>
> >
> > In inline asm, we do not know if the insn can use EGPR, so disable EGPR
> > usage by default from mapping the common reg/mem constraint to non-EGPR
> > constraints. Use a flag mapx-inline-asm-use-gpr32 to enable EGPR usage
> > for inline asm.
> >
> > gcc/ChangeLog:
> >
> >       * config/i386/i386.cc (INCLUDE_STRING): Add include for
> >       ix86_md_asm_adjust.
> >       (ix86_md_asm_adjust): When APX EGPR enabled without specifying the
> >       target option, map reg/mem constraints to non-EGPR constraints.
> >       * config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/i386/apx-inline-gpr-norex2.c: New test.
> > ---
> >  gcc/config/i386/i386.cc                       |  44 +++++++
> >  gcc/config/i386/i386.opt                      |   5 +
> >  .../gcc.target/i386/apx-inline-gpr-norex2.c   | 107 ++++++++++++++++++
> >  3 files changed, 156 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index d26d9ab0d9d..9460ebbfda4 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
> >  along with GCC; see the file COPYING3.  If not see
> >  <http://www.gnu.org/licenses/>.  */
> >
> > +#define INCLUDE_STRING
> >  #define IN_TARGET_CODE 1
> >
> >  #include "config.h"
> > @@ -23077,6 +23078,49 @@ ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
> >    bool saw_asm_flag = false;
> >
> >    start_sequence ();
> > +  /* TODO: Here we just mapped the general r/m constraints to non-EGPR
> > +   constraints, will eventually map all the usable constraints in the future. */
>
> I think there should be some constraint which explicitly has all the 32
> GPRs, like there is one for just all 16 GPRs (h), so that regardless of
> -mapx-inline-asm-use-gpr32 one can be explicit what the inline asm wants.
>

Yes, we will add new register constraints. Memory constraints require
some special handling in ix86_memory_address_use_extended_reg_class_p.

> Also, what about the "g" constraint?  Shouldn't there be another for "g"
> without r16..r31?  What about the various other memory
> constraints ("<", "o", ...)?

We will support full mapping of all common constraints, refining the
current mapping, in the V2 patch.

>
> > +  if (TARGET_APX_EGPR && !ix86_apx_inline_asm_use_gpr32)
> > +    {
> > +      /* Map "r" constraint in inline asm to "h" that disallows r16-r31
> > +      and replace only r, exclude Br and Yr.  */
> > +      for (unsigned i = 0; i < constraints.length (); i++)
> > +     {
> > +       std::string *s = new std::string (constraints[i]);
>
> Doesn't this leak memory (all the time)?
> I must say I don't really understand why you need to use std::string here,
> but certainly it shouldn't leak.

std::string just makes the code shorter than using the C string functions.
The current code will be completely refactored to support mapping more
constraints.

>
> > +       size_t pos = s->find ('r');
> > +       while (pos != std::string::npos)
> > +         {
> > +           if (pos > 0
> > +               && (s->at (pos - 1) == 'Y' || s->at (pos - 1) == 'B'))
> > +             pos = s->find ('r', pos + 1);
> > +           else
> > +             {
> > +               s->replace (pos, 1, "h");
> > +               constraints[i] = (const char*) s->c_str ();
>
> Formatting (space before *).  The usual way for constraints is ggc_strdup on
> some string in a buffer.  Also, one could have several copies or r (or m, memory (doesn't
> that appear just in clobbers?  And that doesn't look like something that
> should be replaced), Bm, e.g. in various alternatives.  So, you
> need to change them all, not just the first hit.  "r,r,r,m" and the like.
> Normally, one would simply walk the constraint string, parsing the special
> letters (+, =, & etc.) and single letter constraints and 2 letter
> constraints using CONSTRAINT_LEN macro (tons of examples in GCC sources).
> Either do it in 2 passes, first one counts how long constraint string one
> will need after the adjustments (and whether to adjust something at all),
> then if needed XALLOCAVEC it and adjust in there, or say use a
> auto_vec<char, 32> for
> it.

Thanks for your guidance. Previously we thought constraints[i] was a single
split-out constraint, but clearly it is not.
We will refer to an existing example of this and rewrite the current code.

>
> > +               break;
> > +             }
> > +         }
> > +     }
> > +      /* Also map "m/memory/Bm" constraint that may use GPR32, replace them with
> > +      "Bt/Bt/BT".  */
> > +      for (unsigned i = 0; i < constraints.length (); i++)
> > +     {
> > +       std::string *s = new std::string (constraints[i]);
> > +       size_t pos = s->find ("m");
> > +       size_t pos2 = s->find ("memory");
> > +       if (pos != std::string::npos)
> > +         {
> > +           if (pos > 0 && (s->at (pos - 1) == 'B'))
> > +               s->replace (pos - 1, 2, "BT");
> > +           else if (pos2 != std::string::npos)
> > +               s->replace (pos, 6, "Bt");
> > +           else
> > +               s->replace (pos, 1, "Bt");
>
> Formatting, the s->replace calls are indented too much.
>
>         Jakub
>


* Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.
  2023-08-31 10:00     ` Uros Bizjak
@ 2023-09-01  9:04       ` Hongyu Wang
  2023-09-01  9:38         ` Uros Bizjak
  2023-09-01 11:03       ` Richard Sandiford
  1 sibling, 1 reply; 49+ messages in thread
From: Hongyu Wang @ 2023-09-01  9:04 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Jakub Jelinek, Hongyu Wang, gcc-patches, hongtao.liu, hubicka

On Thu, Aug 31, 2023 at 18:01, Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
> > > From: Kong Lingling <lingling.kong@intel.com>
> > >
> > > In inline asm, we do not know if the insn can use EGPR, so disable EGPR
> > > usage by default from mapping the common reg/mem constraint to non-EGPR
> > > constraints. Use a flag mapx-inline-asm-use-gpr32 to enable EGPR usage
> > > for inline asm.
> > >
> > > gcc/ChangeLog:
> > >
> > >       * config/i386/i386.cc (INCLUDE_STRING): Add include for
> > >       ix86_md_asm_adjust.
> > >       (ix86_md_asm_adjust): When APX EGPR enabled without specifying the
> > >       target option, map reg/mem constraints to non-EGPR constraints.
> > >       * config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >       * gcc.target/i386/apx-inline-gpr-norex2.c: New test.
> > > ---
> > >  gcc/config/i386/i386.cc                       |  44 +++++++
> > >  gcc/config/i386/i386.opt                      |   5 +
> > >  .../gcc.target/i386/apx-inline-gpr-norex2.c   | 107 ++++++++++++++++++
> > >  3 files changed, 156 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index d26d9ab0d9d..9460ebbfda4 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
> > >  along with GCC; see the file COPYING3.  If not see
> > >  <http://www.gnu.org/licenses/>.  */
> > >
> > > +#define INCLUDE_STRING
> > >  #define IN_TARGET_CODE 1
> > >
> > >  #include "config.h"
> > > @@ -23077,6 +23078,49 @@ ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
> > >    bool saw_asm_flag = false;
> > >
> > >    start_sequence ();
> > > +  /* TODO: Here we just mapped the general r/m constraints to non-EGPR
> > > +   constraints, will eventually map all the usable constraints in the future. */
> >
> > I think there should be some constraint which explicitly has all the 32
> > GPRs, like there is one for just all 16 GPRs (h), so that regardless of
> > -mapx-inline-asm-use-gpr32 one can be explicit what the inline asm wants.
> >
> > Also, what about the "g" constraint?  Shouldn't there be another for "g"
> > without r16..r31?  What about the various other memory
> > constraints ("<", "o", ...)?
>
> I think we should leave all existing constraints as they are, so "r"
> covers only GPR16, "m" and "o" to only use GPR16. We can then
> introduce "h" to instructions that have the ability to handle EGPR.
> This would be somehow similar to the SSE -> AVX512F transition, where
> we still have "x" for SSE16 and "v" was introduced as a separate
> register class for EVEX SSE registers. This way, asm will be
> compatible, when "r", "m", "o" and "g" are used. The new memory
> constraint "Bt", should allow new registers, and should be added to
> the constraint string as a separate constraint, and conditionally
> enabled by relevant "isa" (AKA "enabled") attribute.

The extended constraint can work for registers, but for memory it is more
complicated.

If we want to use new mem constraints that allow gpr32, the BASE/INDEX
reg class still requires per-insn verification, which means changes
to all patterns with "vm" and to the SSE patterns in opcode maps 0/1. Also,
several legacy insns that are promoted to the EVEX encoding space need to be
changed. The overall implementation could be 10 times larger than the
current one, which would be quite hard to maintain.

>
> Uros.
>
> > > +  if (TARGET_APX_EGPR && !ix86_apx_inline_asm_use_gpr32)
> > > +    {
> > > +      /* Map "r" constraint in inline asm to "h" that disallows r16-r31
> > > +      and replace only r, exclude Br and Yr.  */
> > > +      for (unsigned i = 0; i < constraints.length (); i++)
> > > +     {
> > > +       std::string *s = new std::string (constraints[i]);
> >
> > Doesn't this leak memory (all the time)?
> > I must say I don't really understand why you need to use std::string here,
> > but certainly it shouldn't leak.
> >
> > > +       size_t pos = s->find ('r');
> > > +       while (pos != std::string::npos)
> > > +         {
> > > +           if (pos > 0
> > > +               && (s->at (pos - 1) == 'Y' || s->at (pos - 1) == 'B'))
> > > +             pos = s->find ('r', pos + 1);
> > > +           else
> > > +             {
> > > +               s->replace (pos, 1, "h");
> > > +               constraints[i] = (const char*) s->c_str ();
> >
> > Formatting (space before *).  The usual way for constraints is ggc_strdup on
> > some string in a buffer.  Also, one could have several copies or r (or m, memory (doesn't
> > that appear just in clobbers?  And that doesn't look like something that
> > should be replaced), Bm, e.g. in various alternatives.  So, you
> > need to change them all, not just the first hit.  "r,r,r,m" and the like.
> > Normally, one would simply walk the constraint string, parsing the special
> > letters (+, =, & etc.) and single letter constraints and 2 letter
> > constraints using CONSTRAINT_LEN macro (tons of examples in GCC sources).
> > Either do it in 2 passes, first one counts how long constraint string one
> > will need after the adjustments (and whether to adjust something at all),
> > then if needed XALLOCAVEC it and adjust in there, or say use a
> > auto_vec<char, 32> for
> > it.
> >
> > > +               break;
> > > +             }
> > > +         }
> > > +     }
> > > +      /* Also map "m/memory/Bm" constraint that may use GPR32, replace them with
> > > +      "Bt/Bt/BT".  */
> > > +      for (unsigned i = 0; i < constraints.length (); i++)
> > > +     {
> > > +       std::string *s = new std::string (constraints[i]);
> > > +       size_t pos = s->find ("m");
> > > +       size_t pos2 = s->find ("memory");
> > > +       if (pos != std::string::npos)
> > > +         {
> > > +           if (pos > 0 && (s->at (pos - 1) == 'B'))
> > > +               s->replace (pos - 1, 2, "BT");
> > > +           else if (pos2 != std::string::npos)
> > > +               s->replace (pos, 6, "Bt");
> > > +           else
> > > +               s->replace (pos, 1, "Bt");
> >
> > Formatting, the s->replace calls are indented too much.
> >
> >         Jakub
> >


* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-08-31 10:15   ` Uros Bizjak
@ 2023-09-01  9:07     ` Hongyu Wang
  2023-09-06 19:43       ` Vladimir Makarov
  0 siblings, 1 reply; 49+ messages in thread
From: Hongyu Wang @ 2023-09-01  9:07 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Hongyu Wang, jakub, gcc-patches, hongtao.liu, hubicka, vmakarov

On Thu, Aug 31, 2023 at 18:16, Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
> >
> > From: Kong Lingling <lingling.kong@intel.com>
> >
> > Current reload infrastructure does not support selective base_reg_class
> > for backend insn. Add insn argument to base_reg_class for
> > lra/reload usage.
>
> I don't think this is the correct approach. Ideally, a memory
> constraint should somehow encode its BASE/INDEX register class.
> Instead of passing "insn", simply a different constraint could be used
> in the constraint string of the relevant insn.

We tried a constraint-only approach at the beginning, but then found that
the reload infrastructure does not work like that.

The BASE/INDEX reg classes are determined before choosing alternatives, in
process_address under curr_insn_transform. process_address creates the mem
operand according to the BASE/INDEX reg class, and the memory operand
constraint check then evaluates the mem op with targetm.legitimate_address_p.

If we want to make use of EGPRs in base/index registers, we need to either
extend the BASE/INDEX reg class in the backend or, for specific insns, add a
target hook to tell reload that the extended reg class with EGPRs can be used
to construct the memory operand.

CC'd Vladimir, as git send-email failed to add the recipient.

>
> Uros.
> >
> > gcc/ChangeLog:
> >
> >         * addresses.h (base_reg_class):  Add insn argument.
> >         Pass to MODE_CODE_BASE_REG_CLASS.
> >         (regno_ok_for_base_p_1): Add insn argument.
> >         Pass to REGNO_MODE_CODE_OK_FOR_BASE_P.
> >         (regno_ok_for_base_p): Add insn argument and parse to ok_for_base_p_1.
> >         * config/avr/avr.h (MODE_CODE_BASE_REG_CLASS): Add insn argument.
> >         (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
> >         * config/gcn/gcn.h (MODE_CODE_BASE_REG_CLASS): Ditto.
> >         (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
> >         * config/rl78/rl78.h (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
> >         (MODE_CODE_BASE_REG_CLASS): Ditto.
> >         * doc/tm.texi: Add insn argument for MODE_CODE_BASE_REG_CLASS
> >         and REGNO_MODE_CODE_OK_FOR_BASE_P.
> >         * doc/tm.texi.in: Ditto.
> >         * lra-constraints.cc (process_address_1): Pass insn to
> >         base_reg_class.
> >         (curr_insn_transform): Ditto.
> >         * reload.cc (find_reloads): Ditto.
> >         (find_reloads_address): Ditto.
> >         (find_reloads_address_1): Ditto.
> >         (find_reloads_subreg_address): Ditto.
> >         * reload1.cc (maybe_fix_stack_asms): Ditto.
> > ---
> >  gcc/addresses.h        | 15 +++++++++------
> >  gcc/config/avr/avr.h   |  5 +++--
> >  gcc/config/gcn/gcn.h   |  4 ++--
> >  gcc/config/rl78/rl78.h |  6 ++++--
> >  gcc/doc/tm.texi        |  8 ++++++--
> >  gcc/doc/tm.texi.in     |  8 ++++++--
> >  gcc/lra-constraints.cc | 15 +++++++++------
> >  gcc/reload.cc          | 30 ++++++++++++++++++------------
> >  gcc/reload1.cc         |  2 +-
> >  9 files changed, 58 insertions(+), 35 deletions(-)
> >
> > diff --git a/gcc/addresses.h b/gcc/addresses.h
> > index 3519c241c6d..08b100cfe6d 100644
> > --- a/gcc/addresses.h
> > +++ b/gcc/addresses.h
> > @@ -28,11 +28,12 @@ inline enum reg_class
> >  base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
> >                 addr_space_t as ATTRIBUTE_UNUSED,
> >                 enum rtx_code outer_code ATTRIBUTE_UNUSED,
> > -               enum rtx_code index_code ATTRIBUTE_UNUSED)
> > +               enum rtx_code index_code ATTRIBUTE_UNUSED,
> > +               rtx_insn *insn ATTRIBUTE_UNUSED = NULL)
> >  {
> >  #ifdef MODE_CODE_BASE_REG_CLASS
> >    return MODE_CODE_BASE_REG_CLASS (MACRO_MODE (mode), as, outer_code,
> > -                                  index_code);
> > +                                  index_code, insn);
> >  #else
> >  #ifdef MODE_BASE_REG_REG_CLASS
> >    if (index_code == REG)
> > @@ -56,11 +57,12 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
> >                  machine_mode mode ATTRIBUTE_UNUSED,
> >                  addr_space_t as ATTRIBUTE_UNUSED,
> >                  enum rtx_code outer_code ATTRIBUTE_UNUSED,
> > -                enum rtx_code index_code ATTRIBUTE_UNUSED)
> > +                enum rtx_code index_code ATTRIBUTE_UNUSED,
> > +                rtx_insn* insn ATTRIBUTE_UNUSED = NULL)
> >  {
> >  #ifdef REGNO_MODE_CODE_OK_FOR_BASE_P
> >    return REGNO_MODE_CODE_OK_FOR_BASE_P (regno, MACRO_MODE (mode), as,
> > -                                       outer_code, index_code);
> > +                                       outer_code, index_code, insn);
> >  #else
> >  #ifdef REGNO_MODE_OK_FOR_REG_BASE_P
> >    if (index_code == REG)
> > @@ -79,12 +81,13 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
> >
> >  inline bool
> >  regno_ok_for_base_p (unsigned regno, machine_mode mode, addr_space_t as,
> > -                    enum rtx_code outer_code, enum rtx_code index_code)
> > +                    enum rtx_code outer_code, enum rtx_code index_code,
> > +                    rtx_insn* insn = NULL)
> >  {
> >    if (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0)
> >      regno = reg_renumber[regno];
> >
> > -  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code);
> > +  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code, insn);
> >  }
> >
> >  #endif /* GCC_ADDRESSES_H */
> > diff --git a/gcc/config/avr/avr.h b/gcc/config/avr/avr.h
> > index 8e7e00db13b..1d090fe0838 100644
> > --- a/gcc/config/avr/avr.h
> > +++ b/gcc/config/avr/avr.h
> > @@ -280,12 +280,13 @@ enum reg_class {
> >
> >  #define REGNO_REG_CLASS(R) avr_regno_reg_class(R)
> >
> > -#define MODE_CODE_BASE_REG_CLASS(mode, as, outer_code, index_code)   \
> > +#define MODE_CODE_BASE_REG_CLASS(mode, as, outer_code, index_code, insn)   \
> >    avr_mode_code_base_reg_class (mode, as, outer_code, index_code)
> >
> >  #define INDEX_REG_CLASS NO_REGS
> >
> > -#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mode, as, outer_code, index_code) \
> > +#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mode, as, outer_code,         \
> > +                                     index_code, insn)                   \
> >    avr_regno_mode_code_ok_for_base_p (num, mode, as, outer_code, index_code)
> >
> >  #define REGNO_OK_FOR_INDEX_P(NUM) 0
> > diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
> > index 4ff9a5d4d12..b56702a77fd 100644
> > --- a/gcc/config/gcn/gcn.h
> > +++ b/gcc/config/gcn/gcn.h
> > @@ -437,9 +437,9 @@ enum reg_class
> >       0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0 }}
> >
> >  #define REGNO_REG_CLASS(REGNO) gcn_regno_reg_class (REGNO)
> > -#define MODE_CODE_BASE_REG_CLASS(MODE, AS, OUTER, INDEX) \
> > +#define MODE_CODE_BASE_REG_CLASS(MODE, AS, OUTER, INDEX, INSN) \
> >          gcn_mode_code_base_reg_class (MODE, AS, OUTER, INDEX)
> > -#define REGNO_MODE_CODE_OK_FOR_BASE_P(NUM, MODE, AS, OUTER, INDEX) \
> > +#define REGNO_MODE_CODE_OK_FOR_BASE_P(NUM, MODE, AS, OUTER, INDEX, INSN) \
> >          gcn_regno_mode_code_ok_for_base_p (NUM, MODE, AS, OUTER, INDEX)
> >  #define INDEX_REG_CLASS VGPR_REGS
> >  #define REGNO_OK_FOR_INDEX_P(regno) regno_ok_for_index_p (regno)
> > diff --git a/gcc/config/rl78/rl78.h b/gcc/config/rl78/rl78.h
> > index 7a7c6a44ba2..d0ed9162292 100644
> > --- a/gcc/config/rl78/rl78.h
> > +++ b/gcc/config/rl78/rl78.h
> > @@ -375,10 +375,12 @@ enum reg_class
> >
> >  #define REGNO_OK_FOR_INDEX_P(regno)    REGNO_OK_FOR_BASE_P (regno)
> >
> > -#define REGNO_MODE_CODE_OK_FOR_BASE_P(regno, mode, address_space, outer_code, index_code) \
> > +#define REGNO_MODE_CODE_OK_FOR_BASE_P(regno, mode, address_space, outer_code, \
> > +                                     index_code, insn)                       \
> >    rl78_regno_mode_code_ok_for_base_p (regno, mode, address_space, outer_code, index_code)
> >
> > -#define MODE_CODE_BASE_REG_CLASS(mode, address_space, outer_code, index_code) \
> > +#define MODE_CODE_BASE_REG_CLASS(mode, address_space, outer_code, index_code, \
> > +                                insn)                                        \
> >    rl78_mode_code_base_reg_class (mode, address_space, outer_code, index_code)
> >
> >  #define RETURN_ADDR_RTX(COUNT, FRAMEADDR)                              \
> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > index d0d47b0d471..a4239e3de10 100644
> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi
> > @@ -2533,7 +2533,7 @@ register address.  You should define this macro if base plus index
> >  addresses have different requirements than other base register uses.
> >  @end defmac
> >
> > -@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> > +@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
> >  A C expression whose value is the register class to which a valid
> >  base register for a memory reference in mode @var{mode} to address
> >  space @var{address_space} must belong.  @var{outer_code} and @var{index_code}
> > @@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
> >  of an address, @code{ADDRESS} for something that occurs in an
> >  @code{address_operand}).  @var{index_code} is the code of the corresponding
> >  index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
> > +@var{insn}, if nonnull, indicates that the insn-specific base register
> > +class should be a subset of the original base register class.
> >  @end defmac
> >
> >  @defmac INDEX_REG_CLASS
> > @@ -2579,7 +2581,7 @@ Use of this macro is deprecated; please use the more general
> >  @code{REGNO_MODE_CODE_OK_FOR_BASE_P}.
> >  @end defmac
> >
> > -@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> > +@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
> >  A C expression which is nonzero if register number @var{num} is
> >  suitable for use as a base register in operand addresses, accessing
> >  memory in mode @var{mode} in address space @var{address_space}.
> > @@ -2592,6 +2594,8 @@ address, @code{ADDRESS} for something that occurs in an
> >  corresponding index expression if @var{outer_code} is @code{PLUS};
> >  @code{SCRATCH} otherwise.  The mode may be @code{VOIDmode} for addresses
> >  that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
> > +@var{insn}, if nonnull, indicates that the insn-specific base register
> > +class should be a subset of the original base register class.
> >  @end defmac
> >
> >  @defmac REGNO_OK_FOR_INDEX_P (@var{num})
> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > index 4ac96dc357d..72898f3adba 100644
> > --- a/gcc/doc/tm.texi.in
> > +++ b/gcc/doc/tm.texi.in
> > @@ -2128,7 +2128,7 @@ register address.  You should define this macro if base plus index
> >  addresses have different requirements than other base register uses.
> >  @end defmac
> >
> > -@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> > +@defmac MODE_CODE_BASE_REG_CLASS (@var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
> >  A C expression whose value is the register class to which a valid
> >  base register for a memory reference in mode @var{mode} to address
> >  space @var{address_space} must belong.  @var{outer_code} and @var{index_code}
> > @@ -2137,6 +2137,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
> >  of an address, @code{ADDRESS} for something that occurs in an
> >  @code{address_operand}).  @var{index_code} is the code of the corresponding
> >  index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
> > +@var{insn}, if nonnull, indicates that the insn-specific base register
> > +class should be a subset of the original base register class.
> >  @end defmac
> >
> >  @defmac INDEX_REG_CLASS
> > @@ -2174,7 +2176,7 @@ Use of this macro is deprecated; please use the more general
> >  @code{REGNO_MODE_CODE_OK_FOR_BASE_P}.
> >  @end defmac
> >
> > -@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code})
> > +@defmac REGNO_MODE_CODE_OK_FOR_BASE_P (@var{num}, @var{mode}, @var{address_space}, @var{outer_code}, @var{index_code}, @var{insn})
> >  A C expression which is nonzero if register number @var{num} is
> >  suitable for use as a base register in operand addresses, accessing
> >  memory in mode @var{mode} in address space @var{address_space}.
> > @@ -2187,6 +2189,8 @@ address, @code{ADDRESS} for something that occurs in an
> >  corresponding index expression if @var{outer_code} is @code{PLUS};
> >  @code{SCRATCH} otherwise.  The mode may be @code{VOIDmode} for addresses
> >  that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
> > +@var{insn}, if nonnull, indicates that the insn-specific base register
> > +class should be a subset of the original base register class.
> >  @end defmac
> >
> >  @defmac REGNO_OK_FOR_INDEX_P (@var{num})
> > diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> > index c718bedff32..9e7915ce934 100644
> > --- a/gcc/lra-constraints.cc
> > +++ b/gcc/lra-constraints.cc
> > @@ -3672,7 +3672,7 @@ process_address_1 (int nop, bool check_only_p,
> >                                      REGNO (*ad.base_term)) != NULL_RTX)
> >             ? after : NULL),
> >            base_reg_class (ad.mode, ad.as, ad.base_outer_code,
> > -                          get_index_code (&ad)))))
> > +                          get_index_code (&ad), curr_insn))))
> >      {
> >        change_p = true;
> >        if (ad.base_term2 != NULL)
> > @@ -3722,7 +3722,8 @@ process_address_1 (int nop, bool check_only_p,
> >           rtx_insn *last = get_last_insn ();
> >           int code = -1;
> >           enum reg_class cl = base_reg_class (ad.mode, ad.as,
> > -                                             SCRATCH, SCRATCH);
> > +                                             SCRATCH, SCRATCH,
> > +                                             curr_insn);
> >           rtx addr = *ad.inner;
> >
> >           new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
> > @@ -3785,7 +3786,8 @@ process_address_1 (int nop, bool check_only_p,
> >           /* index * scale + disp => new base + index * scale,
> >              case (1) above.  */
> >           enum reg_class cl = base_reg_class (ad.mode, ad.as, PLUS,
> > -                                             GET_CODE (*ad.index));
> > +                                             GET_CODE (*ad.index),
> > +                                             curr_insn);
> >
> >           lra_assert (INDEX_REG_CLASS != NO_REGS);
> >           new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "disp");
> > @@ -3846,7 +3848,7 @@ process_address_1 (int nop, bool check_only_p,
> >               *ad.base_term = XEXP (SET_SRC (set), 0);
> >               *ad.disp_term = XEXP (SET_SRC (set), 1);
> >               cl = base_reg_class (ad.mode, ad.as, ad.base_outer_code,
> > -                                  get_index_code (&ad));
> > +                                  get_index_code (&ad), curr_insn);
> >               regno = REGNO (*ad.base_term);
> >               if (regno >= FIRST_PSEUDO_REGISTER
> >                   && cl != lra_get_allocno_class (regno))
> > @@ -3890,7 +3892,8 @@ process_address_1 (int nop, bool check_only_p,
> >    else
> >      {
> >        enum reg_class cl = base_reg_class (ad.mode, ad.as,
> > -                                         SCRATCH, SCRATCH);
> > +                                         SCRATCH, SCRATCH,
> > +                                         curr_insn);
> >        rtx addr = *ad.inner;
> >
> >        new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
> > @@ -4639,7 +4642,7 @@ curr_insn_transform (bool check_only_p)
> >
> >           push_to_sequence (before);
> >           rclass = base_reg_class (GET_MODE (op), MEM_ADDR_SPACE (op),
> > -                                  MEM, SCRATCH);
> > +                                  MEM, SCRATCH, curr_insn);
> >           if (GET_RTX_CLASS (code) == RTX_AUTOINC)
> >             new_reg = emit_inc (rclass, *loc, *loc,
> >                                 /* This value does not matter for MODIFY.  */
> > diff --git a/gcc/reload.cc b/gcc/reload.cc
> > index 2126bdd117c..72f7e27af15 100644
> > --- a/gcc/reload.cc
> > +++ b/gcc/reload.cc
> > @@ -3321,7 +3321,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
> >                        were handled in find_reloads_address.  */
> >                     this_alternative[i]
> >                       = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
> > -                                       ADDRESS, SCRATCH);
> > +                                       ADDRESS, SCRATCH, insn);
> >                     win = 1;
> >                     badop = 0;
> >                     break;
> > @@ -3508,7 +3508,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
> >                            the address into a base register.  */
> >                         this_alternative[i]
> >                           = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
> > -                                           ADDRESS, SCRATCH);
> > +                                           ADDRESS, SCRATCH, insn);
> >                         badop = 0;
> >                         break;
> >
> > @@ -4018,7 +4018,7 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
> >             operand_reloadnum[i]
> >               = push_reload (XEXP (recog_data.operand[i], 0), NULL_RTX,
> >                              &XEXP (recog_data.operand[i], 0), (rtx*) 0,
> > -                            base_reg_class (VOIDmode, as, MEM, SCRATCH),
> > +                            base_reg_class (VOIDmode, as, MEM, SCRATCH, insn),
> >                              address_mode,
> >                              VOIDmode, 0, 0, i, RELOAD_OTHER);
> >             rld[operand_reloadnum[i]].inc
> > @@ -4897,7 +4897,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
> >        if (reg_equiv_constant (regno) != 0)
> >         {
> >           find_reloads_address_part (reg_equiv_constant (regno), loc,
> > -                                    base_reg_class (mode, as, MEM, SCRATCH),
> > +                                    base_reg_class (mode, as, MEM,
> > +                                                    SCRATCH, insn),
> >                                      GET_MODE (ad), opnum, type, ind_levels);
> >           return 1;
> >         }
> > @@ -4966,7 +4967,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
> >
> >        /* If we do not have one of the cases above, we must do the reload.  */
> >        push_reload (ad, NULL_RTX, loc, (rtx*) 0,
> > -                  base_reg_class (mode, as, MEM, SCRATCH),
> > +                  base_reg_class (mode, as, MEM, SCRATCH, insn),
> >                    GET_MODE (ad), VOIDmode, 0, 0, opnum, type);
> >        return 1;
> >      }
> > @@ -5123,7 +5124,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
> >              reload the sum into a base reg.
> >              That will at least work.  */
> >           find_reloads_address_part (ad, loc,
> > -                                    base_reg_class (mode, as, MEM, SCRATCH),
> > +                                    base_reg_class (mode, as, MEM,
> > +                                                    SCRATCH, insn),
> >                                      GET_MODE (ad), opnum, type, ind_levels);
> >         }
> >        return ! removed_and;
> > @@ -5203,7 +5205,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
> >                                  op_index == 0 ? addend : offset_reg);
> >           *loc = ad;
> >
> > -         cls = base_reg_class (mode, as, MEM, GET_CODE (addend));
> > +         cls = base_reg_class (mode, as, MEM, GET_CODE (addend), insn);
> >           find_reloads_address_part (XEXP (ad, op_index),
> >                                      &XEXP (ad, op_index), cls,
> >                                      GET_MODE (ad), opnum, type, ind_levels);
> > @@ -5261,7 +5263,8 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
> >         }
> >
> >        find_reloads_address_part (ad, loc,
> > -                                base_reg_class (mode, as, MEM, SCRATCH),
> > +                                base_reg_class (mode, as, MEM,
> > +                                                SCRATCH, insn),
> >                                  address_mode, opnum, type, ind_levels);
> >        return ! removed_and;
> >      }
> > @@ -5513,7 +5516,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
> >    if (context == 1)
> >      context_reg_class = INDEX_REG_CLASS;
> >    else
> > -    context_reg_class = base_reg_class (mode, as, outer_code, index_code);
> > +    context_reg_class = base_reg_class (mode, as, outer_code, index_code,
> > +                                       insn);
> >
> >    switch (code)
> >      {
> > @@ -5738,7 +5742,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
> >                 reloadnum = push_reload (tem, tem, &XEXP (x, 0),
> >                                          &XEXP (op1, 0),
> >                                          base_reg_class (mode, as,
> > -                                                        code, index_code),
> > +                                                        code, index_code,
> > +                                                        insn),
> >                                          GET_MODE (x), GET_MODE (x), 0,
> >                                          0, opnum, RELOAD_OTHER);
> >
> > @@ -5756,7 +5761,8 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
> >             reloadnum = push_reload (XEXP (op1, 0), XEXP (x, 0),
> >                                      &XEXP (op1, 0), &XEXP (x, 0),
> >                                      base_reg_class (mode, as,
> > -                                                    code, index_code),
> > +                                                    code, index_code,
> > +                                                    insn),
> >                                      GET_MODE (x), GET_MODE (x), 0, 0,
> >                                      opnum, RELOAD_OTHER);
> >
> > @@ -6216,7 +6222,7 @@ find_reloads_subreg_address (rtx x, int opnum, enum reload_type type,
> >      {
> >        push_reload (XEXP (tem, 0), NULL_RTX, &XEXP (tem, 0), (rtx*) 0,
> >                    base_reg_class (GET_MODE (tem), MEM_ADDR_SPACE (tem),
> > -                                  MEM, SCRATCH),
> > +                                  MEM, SCRATCH, insn),
> >                    GET_MODE (XEXP (tem, 0)), VOIDmode, 0, 0, opnum, type);
> >        reloaded = 1;
> >      }
> > diff --git a/gcc/reload1.cc b/gcc/reload1.cc
> > index 9ba822d1ff7..f41f4a4de22 100644
> > --- a/gcc/reload1.cc
> > +++ b/gcc/reload1.cc
> > @@ -1382,7 +1382,7 @@ maybe_fix_stack_asms (void)
> >                   if (insn_extra_address_constraint (cn))
> >                     cls = (int) reg_class_subunion[cls]
> >                       [(int) base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
> > -                                            ADDRESS, SCRATCH)];
> > +                                            ADDRESS, SCRATCH, chain->insn)];
> >                   else
> >                     cls = (int) reg_class_subunion[cls]
> >                       [reg_class_for_constraint (cn)];
> > --
> > 2.31.1
> >
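
[Editor's note] The default-argument plumbing in the addresses.h hunk above can
be sketched in isolation. This is a hypothetical, stripped-down model (all
names and types invented, not GCC's real ones): a trailing insn parameter with
a NULL default keeps every existing caller compiling unchanged, while new
callers can pass the insn so the target may return a narrower, insn-specific
register class.

```cpp
#include <cassert>

// Invented stand-ins for GCC's real types, for illustration only.
enum reg_class { NO_REGS, LEGACY_GPR16, GENERAL_REGS };

struct rtx_insn
{
  bool gpr16_only;   // e.g. an insn whose encoding cannot reach r16-r31
};

// Mirrors the patch's pattern: the new trailing parameter defaults to
// nullptr, so call sites that do not pass an insn keep the old behavior.
static reg_class
base_reg_class (rtx_insn *insn = nullptr)
{
  // The insn-specific class must be a subset of the full base class.
  if (insn && insn->gpr16_only)
    return LEGACY_GPR16;
  return GENERAL_REGS;   // legacy behavior when no insn is supplied
}
```

Call sites such as reload or LRA would thread the current insn through
exactly as the diff does, and targets that ignore the extra macro argument
are unaffected.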

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns
  2023-08-31  9:43   ` Jakub Jelinek
@ 2023-09-01  9:07     ` Hongyu Wang
  2023-09-01  9:20       ` Jakub Jelinek
  0 siblings, 1 reply; 49+ messages in thread
From: Hongyu Wang @ 2023-09-01  9:07 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Hongyu Wang, gcc-patches, hongtao.liu, hubicka

Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> 于2023年8月31日周四 17:44写道:
>
> On Thu, Aug 31, 2023 at 04:20:19PM +0800, Hongyu Wang via Gcc-patches wrote:
> > For vector move insns like vmovdqa/vmovdqu, their evex counterparts
> > require an explicit suffix 64/32/16/8. The usage of these instructions
> > is prohibited under AVX10_1 or AVX512F, so for AVX2+APX_F we select
> > vmovaps/vmovups for vector load/store insns that contain EGPR.
>
> Why not make it dependent on AVX512VL?
> I.e. if egpr_p && TARGET_AVX512VL, still use vmovdqu16 or vmovdqa16
> and the like, and only if !evex_reg_p && egpr_p && !TARGET_AVX512VL
> fall back to what you're doing?

I'm not sure it is necessary, as on hardware there is no difference between
vmovdqu16/vmovups. If vmovups already has the capability to represent
EGPR, why do we need to distinguish them under VL?

> >
> > gcc/ChangeLog:
> >
> >       * config/i386/i386.cc (ix86_get_ssemov): Check if egpr is used,
> >       adjust mnemonic for vmovduq/vmovdqa.
> >       * config/i386/sse.md (*<extract_type>_vinsert<shuffletype><extract_suf>_0):
> >       Check if egpr is used, adjust mnemonic for vmovdqu/vmovdqa.
> >       (avx_vec_concat<mode>): Likewise, and separate alternative 0 to
> >       avx_noavx512f.
>
>         Jakub
>


* Re: [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns
  2023-09-01  9:07     ` Hongyu Wang
@ 2023-09-01  9:20       ` Jakub Jelinek
  2023-09-01 11:34         ` Hongyu Wang
  0 siblings, 1 reply; 49+ messages in thread
From: Jakub Jelinek @ 2023-09-01  9:20 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: Hongyu Wang, gcc-patches, hongtao.liu, hubicka

On Fri, Sep 01, 2023 at 05:07:53PM +0800, Hongyu Wang wrote:
> Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> 于2023年8月31日周四 17:44写道:
> >
> > On Thu, Aug 31, 2023 at 04:20:19PM +0800, Hongyu Wang via Gcc-patches wrote:
> > > For vector move insns like vmovdqa/vmovdqu, their evex counterparts
> > > require an explicit suffix 64/32/16/8. The usage of these instructions
> > > is prohibited under AVX10_1 or AVX512F, so for AVX2+APX_F we select
> > > vmovaps/vmovups for vector load/store insns that contain EGPR.
> >
> > Why not make it dependent on AVX512VL?
> > I.e. if egpr_p && TARGET_AVX512VL, still use vmovdqu16 or vmovdqa16
> > and the like, and only if !evex_reg_p && egpr_p && !TARGET_AVX512VL
> > fall back to what you're doing?
> 
> I'm not sure it is necessary, as on hardware there is no difference between
> vmovdqu16/vmovups. If vmovups already has the capability to represent
> EGPR, why do we need to distinguish them under VL?

On the Intel HW you're currently planning.
Will that be the case for AMD as well?
Some insns are documented to move float or double vectors while others
integer vectors (of different element sizes).
Or is vmovups with GPR32 at least encoded smaller than vmovdqu{16,32,64}?

	Jakub
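
[Editor's note] The mnemonic-selection trade-off being debated here can be
summarized with a small sketch. This is an invented helper, not the real
ix86_get_ssemov, and the conditions are simplified from the discussion: the
plain vmovdqu/vmovdqa forms have no REX2 encoding for EGPR, the suffixed
forms need AVX512F, and vmovups/vmovaps behave identically for full-register
moves, so they serve as the AVX2+APX_F fallback.

```cpp
#include <cassert>
#include <string>

// Simplified, illustrative mnemonic choice for an integer vector load/store
// (not the actual GCC logic).
static std::string
choose_int_vmov (bool egpr_in_addr, bool have_avx512f, bool aligned)
{
  if (!egpr_in_addr)
    return aligned ? "vmovdqa" : "vmovdqu";     // legacy/VEX form suffices
  if (have_avx512f)
    return aligned ? "vmovdqa64" : "vmovdqu64"; // EVEX form can hold EGPR
  return aligned ? "vmovaps" : "vmovups";       // AVX2+APX_F fallback
}
```

Jakub's suggestion corresponds to the middle branch: prefer the suffixed
EVEX mnemonic whenever AVX512VL/AVX512F makes it available, and fall back
only otherwise.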



* Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.
  2023-09-01  9:04       ` Hongyu Wang
@ 2023-09-01  9:38         ` Uros Bizjak
  2023-09-01 10:35           ` Hongtao Liu
  0 siblings, 1 reply; 49+ messages in thread
From: Uros Bizjak @ 2023-09-01  9:38 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: Jakub Jelinek, Hongyu Wang, gcc-patches, hongtao.liu, hubicka

On Fri, Sep 1, 2023 at 11:10 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
>
> Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> 于2023年8月31日周四 18:01写道:
> >
> > On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
> > > > From: Kong Lingling <lingling.kong@intel.com>
> > > >
> > > > In inline asm, we do not know if the insn can use EGPR, so disable EGPR
> > > > usage by default by mapping the common reg/mem constraints to non-EGPR
> > > > constraints. Use the flag mapx-inline-asm-use-gpr32 to enable EGPR usage
> > > > for inline asm.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >       * config/i386/i386.cc (INCLUDE_STRING): Add include for
> > > >       ix86_md_asm_adjust.
> > > >       (ix86_md_asm_adjust): When APX EGPR enabled without specifying the
> > > >       target option, map reg/mem constraints to non-EGPR constraints.
> > > >       * config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >       * gcc.target/i386/apx-inline-gpr-norex2.c: New test.
> > > > ---
> > > >  gcc/config/i386/i386.cc                       |  44 +++++++
> > > >  gcc/config/i386/i386.opt                      |   5 +
> > > >  .../gcc.target/i386/apx-inline-gpr-norex2.c   | 107 ++++++++++++++++++
> > > >  3 files changed, 156 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
> > > >
> > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > > index d26d9ab0d9d..9460ebbfda4 100644
> > > > --- a/gcc/config/i386/i386.cc
> > > > +++ b/gcc/config/i386/i386.cc
> > > > @@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
> > > >  along with GCC; see the file COPYING3.  If not see
> > > >  <http://www.gnu.org/licenses/>.  */
> > > >
> > > > +#define INCLUDE_STRING
> > > >  #define IN_TARGET_CODE 1
> > > >
> > > >  #include "config.h"
> > > > @@ -23077,6 +23078,49 @@ ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
> > > >    bool saw_asm_flag = false;
> > > >
> > > >    start_sequence ();
> > > > +  /* TODO: For now we only map the general r/m constraints to non-EGPR
> > > > +     constraints; eventually all usable constraints will be mapped.  */
> > >
> > > I think there should be some constraint which explicitly has all the 32
> > > GPRs, like there is one for just all 16 GPRs (h), so that regardless of
> > > -mapx-inline-asm-use-gpr32 one can be explicit what the inline asm wants.
> > >
> > > Also, what about the "g" constraint?  Shouldn't there be another for "g"
> > > without r16..r31?  What about the various other memory
> > > constraints ("<", "o", ...)?
> >
> > I think we should leave all existing constraints as they are, so "r"
> > covers only GPR16, "m" and "o" to only use GPR16. We can then
> > introduce "h" to instructions that have the ability to handle EGPR.
> > This would be somehow similar to the SSE -> AVX512F transition, where
> > we still have "x" for SSE16 and "v" was introduced as a separate
> > register class for EVEX SSE registers. This way, asm will be
> > compatible, when "r", "m", "o" and "g" are used. The new memory
> > constraint "Bt", should allow new registers, and should be added to
> > the constraint string as a separate constraint, and conditionally
> > enabled by relevant "isa" (AKA "enabled") attribute.
>
> The extended constraint can work for registers, but for memory it is more
> complicated.

Yes, unfortunately. The compiler assumes that an unchangeable register
class is used for BASE/INDEX registers. I have hit this limitation
when trying to implement memory support for instructions involving
8-bit high registers (%ah, %bh, %ch, %dh), which do not support REX
registers, also inside memory operand. (You can see the "hack" in e.g.
*extzvqi_mem_rex64" and corresponding peephole2 with the original
*extzvqi pattern). I am aware that dynamic insn-dependent BASE/INDEX
register class is the major limitation in the compiler, so perhaps the
strategy on how to override this limitation should be discussed with
the register allocator author first. Perhaps adding an insn attribute
to insn RTX pattern to specify different BASE/INDEX register sets can
be a better solution than passing insn RTX to the register allocator.

The above idea still does not solve the asm problem on how to select
correct BASE/INDEX register set for memory operands.

Uros.
>
> If we want to use new mem constraints that allow gpr32, the BASE/INDEX
> reg class still requires per-insn verification, which means changes
> to all patterns with "vm" and to the SSE patterns in opcode map0/1. Also,
> several legacy insns that are promoted to the EVEX encoding space need to
> be changed. The overall implementation could be ten times larger than the
> current one, which would be quite hard to maintain.
>
> >
> > Uros.
> >
> > > > +  if (TARGET_APX_EGPR && !ix86_apx_inline_asm_use_gpr32)
> > > > +    {
> > > > +      /* Map "r" constraint in inline asm to "h" that disallows r16-r31
> > > > +      and replace only r, exclude Br and Yr.  */
> > > > +      for (unsigned i = 0; i < constraints.length (); i++)
> > > > +     {
> > > > +       std::string *s = new std::string (constraints[i]);
> > >
> > > Doesn't this leak memory (all the time)?
> > > I must say I don't really understand why you need to use std::string here,
> > > but certainly it shouldn't leak.
> > >
> > > > +       size_t pos = s->find ('r');
> > > > +       while (pos != std::string::npos)
> > > > +         {
> > > > +           if (pos > 0
> > > > +               && (s->at (pos - 1) == 'Y' || s->at (pos - 1) == 'B'))
> > > > +             pos = s->find ('r', pos + 1);
> > > > +           else
> > > > +             {
> > > > +               s->replace (pos, 1, "h");
> > > > +               constraints[i] = (const char*) s->c_str ();
> > >
> > > Formatting (space before *).  The usual way for constraints is ggc_strdup on
> > > some string in a buffer.  Also, one could have several copies or r (or m, memory (doesn't
> > > that appear just in clobbers?  And that doesn't look like something that
> > > should be replaced), Bm, e.g. in various alternatives.  So, you
> > > need to change them all, not just the first hit.  "r,r,r,m" and the like.
> > > Normally, one would simply walk the constraint string, parsing the special
> > > letters (+, =, & etc.) and single letter constraints and 2 letter
> > > constraints using CONSTRAINT_LEN macro (tons of examples in GCC sources).
> > > Either do it in 2 passes, first one counts how long constraint string one
> > > will need after the adjustments (and whether to adjust something at all),
> > > then if needed XALLOCAVEC it and adjust in there, or say use a
> > > auto_vec<char, 32> for
> > > it.
> > >
> > > > +               break;
> > > > +             }
> > > > +         }
> > > > +     }
> > > > +      /* Also map "m/memory/Bm" constraint that may use GPR32, replace them with
> > > > +      "Bt/Bt/BT".  */
> > > > +      for (unsigned i = 0; i < constraints.length (); i++)
> > > > +     {
> > > > +       std::string *s = new std::string (constraints[i]);
> > > > +       size_t pos = s->find ("m");
> > > > +       size_t pos2 = s->find ("memory");
> > > > +       if (pos != std::string::npos)
> > > > +         {
> > > > +           if (pos > 0 && (s->at (pos - 1) == 'B'))
> > > > +               s->replace (pos - 1, 2, "BT");
> > > > +           else if (pos2 != std::string::npos)
> > > > +               s->replace (pos, 6, "Bt");
> > > > +           else
> > > > +               s->replace (pos, 1, "Bt");
> > >
> > > Formatting, the s->replace calls are indented too much.
> > >
> > >         Jakub
> > >
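
[Editor's note] The leak-free, single-walk rewrite Jakub asks for can be
sketched as follows. This is an assumed simplification: the real code would
use CONSTRAINT_LEN and ggc_strdup (both GCC internals), so a plain
std::string stands in here, and only the 'Y'/'B'-prefixed two-letter
constraints are modeled.

```cpp
#include <cassert>
#include <string>

// Walk the constraint string once, mapping every bare 'r' to 'h' (legacy 16
// GPRs) while leaving two-letter constraints like "Yr" and "Br" untouched.
// Handles multiple alternatives ("r,r,m") in one pass, unlike a find/replace
// that stops at the first hit.
static std::string
map_r_to_h (const std::string &s)
{
  std::string out;
  for (size_t i = 0; i < s.size (); i++)
    {
      if (s[i] == 'Y' || s[i] == 'B')
        {
          // Two-letter constraint: copy both characters unchanged.
          out += s[i];
          if (i + 1 < s.size ())
            out += s[++i];
        }
      else if (s[i] == 'r')
        out += 'h';   // restrict to the legacy 16 GPRs
      else
        out += s[i];  // modifiers (=, +, &), commas, other constraints
    }
  return out;
}
```

The two-pass variant Jakub mentions would first measure whether any change
is needed and how long the result is, then fill an XALLOCAVEC'd buffer; the
single-pass version above shows only the character-level logic.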


* Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.
  2023-09-01  9:38         ` Uros Bizjak
@ 2023-09-01 10:35           ` Hongtao Liu
  2023-09-01 11:27             ` Uros Bizjak
  0 siblings, 1 reply; 49+ messages in thread
From: Hongtao Liu @ 2023-09-01 10:35 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Hongyu Wang, Jakub Jelinek, Hongyu Wang, gcc-patches,
	hongtao.liu, hubicka

On Fri, Sep 1, 2023 at 5:38 PM Uros Bizjak via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Fri, Sep 1, 2023 at 11:10 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
> >
> > Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> 于2023年8月31日周四 18:01写道:
> > >
> > > On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
> > > > > From: Kong Lingling <lingling.kong@intel.com>
> > > > >
> > > > > In inline asm, we do not know if the insn can use EGPR, so disable EGPR
> > > > > usage by default by mapping the common reg/mem constraints to non-EGPR
> > > > > constraints. Use the flag mapx-inline-asm-use-gpr32 to enable EGPR usage
> > > > > for inline asm.
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >       * config/i386/i386.cc (INCLUDE_STRING): Add include for
> > > > >       ix86_md_asm_adjust.
> > > > >       (ix86_md_asm_adjust): When APX EGPR enabled without specifying the
> > > > >       target option, map reg/mem constraints to non-EGPR constraints.
> > > > >       * config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >       * gcc.target/i386/apx-inline-gpr-norex2.c: New test.
> > > > > ---
> > > > >  gcc/config/i386/i386.cc                       |  44 +++++++
> > > > >  gcc/config/i386/i386.opt                      |   5 +
> > > > >  .../gcc.target/i386/apx-inline-gpr-norex2.c   | 107 ++++++++++++++++++
> > > > >  3 files changed, 156 insertions(+)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
> > > > >
> > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > > > index d26d9ab0d9d..9460ebbfda4 100644
> > > > > --- a/gcc/config/i386/i386.cc
> > > > > +++ b/gcc/config/i386/i386.cc
> > > > > @@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
> > > > >  along with GCC; see the file COPYING3.  If not see
> > > > >  <http://www.gnu.org/licenses/>.  */
> > > > >
> > > > > +#define INCLUDE_STRING
> > > > >  #define IN_TARGET_CODE 1
> > > > >
> > > > >  #include "config.h"
> > > > > @@ -23077,6 +23078,49 @@ ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
> > > > >    bool saw_asm_flag = false;
> > > > >
> > > > >    start_sequence ();
> > > > > +  /* TODO: Here we just mapped the general r/m constraints to non-EGPR
> > > > > +   constraints, will eventually map all the usable constraints in the future. */
> > > >
> > > > I think there should be some constraint which explicitly has all the 32
> > > > GPRs, like there is one for just all 16 GPRs (h), so that regardless of
> > > > -mapx-inline-asm-use-gpr32 one can be explicit what the inline asm wants.
> > > >
> > > > Also, what about the "g" constraint?  Shouldn't there be another for "g"
> > > > without r16..r31?  What about the various other memory
> > > > constraints ("<", "o", ...)?
> > >
> > > I think we should leave all existing constraints as they are, so "r"
> > > covers only GPR16, "m" and "o" to only use GPR16. We can then
> > > introduce "h" to instructions that have the ability to handle EGPR.
> > > This would be somehow similar to the SSE -> AVX512F transition, where
> > > we still have "x" for SSE16 and "v" was introduced as a separate
> > > register class for EVEX SSE registers. This way, asm will be
> > > compatible, when "r", "m", "o" and "g" are used. The new memory
> > > constraint "Bt", should allow new registers, and should be added to
> > > the constraint string as a separate constraint, and conditionally
> > > enabled by relevant "isa" (AKA "enabled") attribute.
> >
> > The extended constraint can work for registers, but for memory it is more
> > complicated.
>
> Yes, unfortunately. The compiler assumes that an unchangeable register
> class is used for BASE/INDEX registers. I have hit this limitation
> when trying to implement memory support for instructions involving
> 8-bit high registers (%ah, %bh, %ch, %dh), which do not support REX
> registers, also inside memory operand. (You can see the "hack" in e.g.
> *extzvqi_mem_rex64" and corresponding peephole2 with the original
> *extzvqi pattern). I am aware that dynamic insn-dependent BASE/INDEX
> register class is the major limitation in the compiler, so perhaps the
> strategy on how to override this limitation should be discussed with
> the register allocator author first. Perhaps adding an insn attribute
> to insn RTX pattern to specify different BASE/INDEX register sets can
> be a better solution than passing insn RTX to the register allocator.
>
> The above idea still does not solve the asm problem on how to select
> correct BASE/INDEX register set for memory operands.
The current approach disables gpr32 for memory operands in asm_operand
by default, but it can be turned on with the option
-mapx-inline-asm-use-gpr32 (users need to guarantee that the
instruction supports gpr32).
Only ~5% of all instructions don't support gpr32, so the reversed
approach would only get more complicated.
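
The rewrite Jakub outlines above can be sketched as follows: step through the constraint string one constraint at a time (GCC itself would use the CONSTRAINT_LEN macro; the two-letter cases are hard-coded here for illustration), so every alternative in a string like "r,r,m" gets mapped, not just the first hit, and no per-call heap string is leaked. This is a minimal sketch, not the committed implementation; the "memory" clobber word (which, as Jakub notes, appears only in clobbers) is deliberately not covered, and the constraint names follow the patch: "h" is the GPR16-only register class, "Bt"/"BT" are the GPR16-only memory constraints.

```cpp
#include <string>

// Map inline-asm constraints to their non-EGPR counterparts, walking the
// string constraint by constraint instead of using std::string::find, so
// all alternatives ("r,r,m") are rewritten, not only the first match.
static std::string
map_asm_constraint (const std::string &c)
{
  std::string out;
  size_t i = 0;
  while (i < c.size ())
    {
      if ((c[i] == 'B' || c[i] == 'Y') && i + 1 < c.size ())
        {
          // Two-letter constraints consume two characters.
          if (c.compare (i, 2, "Bm") == 0)
            out += "BT";              // memory that must avoid r16-r31
          else
            out.append (c, i, 2);     // Br, Yr, ... stay as they are
          i += 2;
        }
      else if (c[i] == 'r')
        {
          out += 'h';                 // register class without r16-r31
          i++;
        }
      else if (c[i] == 'm')
        {
          out += "Bt";                // memory without r16-r31 base/index
          i++;
        }
      else
        out += c[i++];                // '=', '+', '&', ',', digits, ...
    }
  return out;
}
```

With this shape a second counting pass (or an auto_vec<char> buffer, as Jakub suggests) slots in naturally, since the loop already knows the output length it produces.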

>
> Uros.
> >
> > If we want to use new mem constraints that allow gpr32, then BASE/INDEX
> > reg class still requires per-insn verification, so it means changes
> > on all patterns with vm, and those SSE patterns on opcode map0/1. Also,
> > several legacy insns that are promoted to EVEX encoding space need to be
> > changed. The overall implementation could be 10 times larger than current,
> > which would be quite hard for maintenance.
> >
> > >
> > > Uros.
> > >
> > > > > +  if (TARGET_APX_EGPR && !ix86_apx_inline_asm_use_gpr32)
> > > > > +    {
> > > > > +      /* Map "r" constraint in inline asm to "h" that disallows r16-r31
> > > > > +      and replace only r, exclude Br and Yr.  */
> > > > > +      for (unsigned i = 0; i < constraints.length (); i++)
> > > > > +     {
> > > > > +       std::string *s = new std::string (constraints[i]);
> > > >
> > > > Doesn't this leak memory (all the time)?
> > > > I must say I don't really understand why you need to use std::string here,
> > > > but certainly it shouldn't leak.
> > > >
> > > > > +       size_t pos = s->find ('r');
> > > > > +       while (pos != std::string::npos)
> > > > > +         {
> > > > > +           if (pos > 0
> > > > > +               && (s->at (pos - 1) == 'Y' || s->at (pos - 1) == 'B'))
> > > > > +             pos = s->find ('r', pos + 1);
> > > > > +           else
> > > > > +             {
> > > > > +               s->replace (pos, 1, "h");
> > > > > +               constraints[i] = (const char*) s->c_str ();
> > > >
> > > > Formatting (space before *).  The usual way for constraints is ggc_strdup on
> > > > some string in a buffer.  Also, one could have several copies or r (or m, memory (doesn't
> > > > that appear just in clobbers?  And that doesn't look like something that
> > > > should be replaced), Bm, e.g. in various alternatives.  So, you
> > > > need to change them all, not just the first hit.  "r,r,r,m" and the like.
> > > > Normally, one would simply walk the constraint string, parsing the special
> > > > letters (+, =, & etc.) and single letter constraints and 2 letter
> > > > constraints using CONSTRAINT_LEN macro (tons of examples in GCC sources).
> > > > Either do it in 2 passes, first one counts how long constraint string one
> > > > will need after the adjustments (and whether to adjust something at all),
> > > > then if needed XALLOCAVEC it and adjust in there, or say use a
> > > > auto_vec<char, 32> for
> > > > it.
> > > >
> > > > > +               break;
> > > > > +             }
> > > > > +         }
> > > > > +     }
> > > > > +      /* Also map "m/memory/Bm" constraint that may use GPR32, replace them with
> > > > > +      "Bt/Bt/BT".  */
> > > > > +      for (unsigned i = 0; i < constraints.length (); i++)
> > > > > +     {
> > > > > +       std::string *s = new std::string (constraints[i]);
> > > > > +       size_t pos = s->find ("m");
> > > > > +       size_t pos2 = s->find ("memory");
> > > > > +       if (pos != std::string::npos)
> > > > > +         {
> > > > > +           if (pos > 0 && (s->at (pos - 1) == 'B'))
> > > > > +               s->replace (pos - 1, 2, "BT");
> > > > > +           else if (pos2 != std::string::npos)
> > > > > +               s->replace (pos, 6, "Bt");
> > > > > +           else
> > > > > +               s->replace (pos, 1, "Bt");
> > > >
> > > > Formatting, the s->replace calls are indented too much.
> > > >
> > > >         Jakub
> > > >



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)
  2023-08-31  9:28     ` Richard Biener
  2023-09-01  9:03       ` Hongyu Wang
@ 2023-09-01 10:38       ` Hongtao Liu
  1 sibling, 0 replies; 49+ messages in thread
From: Hongtao Liu @ 2023-09-01 10:38 UTC (permalink / raw)
  To: Richard Biener; +Cc: Hongyu Wang, gcc-patches, jakub, hongtao.liu, hubicka

On Thu, Aug 31, 2023 at 5:31 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 31, 2023 at 11:26 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > From: Kong Lingling <lingling.kong@intel.com>
> > >
> > > Disable EGPR usage for below legacy insns in opcode map2/3 that have vex
> > > but no evex counterpart.
> > >
> > > insn list:
> > > 1. phminposuw/vphminposuw
> > > 2. ptest/vptest
> > > 3. roundps/vroundps, roundpd/vroundpd,
> > >    roundss/vroundss, roundsd/vroundsd
> > > 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> > > 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
> >
> > How are GPRs involved in the above?  Or did I misunderstand something?
>
> Following up myself - for the memory operand alternatives I guess.  How about
> simply disabling the memory alternatives when EGPR is active?  Wouldn't
> that simplify the initial patchset a lot?  Re-enabling them when
> deemed important
> could be done as followup then?
>
There are instructions that only support a memory operand but don't
support gpr32 (e.g. xsave).
We still need to handle them in the initial patch.
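
The mechanism such cases rely on — consulting the insn when choosing the BASE/INDEX register class for its memory operand — can be modeled roughly as below. This is a toy sketch under stated assumptions: the names and the boolean attribute are illustrative only; GCC's real hook is base_reg_class with an added rtx_insn parameter (patch 01/13), and the per-pattern "gpr32" insn attribute from the series drives the decision.

```c
#include <stdbool.h>

/* Toy model: an insn whose encoding has no REX2/EVEX form (e.g. xsave,
   or legacy map2/3 insns) must keep its address in r0-r15; a
   REX2-promotable insn may use the full 32-register set.  */
enum reg_class { GENERAL_GPR16, GENERAL_REGS };

struct insn
{
  const char *name;
  bool gpr32;   /* models the "gpr32" insn attribute */
};

/* Models base_reg_class taking the insn into account: the register
   allocator asks which BASE/INDEX class this insn's memory operand
   may use, instead of assuming one fixed class for all insns.  */
static enum reg_class
insn_base_reg_class (const struct insn *insn)
{
  return insn->gpr32 ? GENERAL_REGS : GENERAL_GPR16;
}
```

The point of threading the insn through is exactly the xsave case: the operand is memory-only, so the only way an extended GPR can leak in is via the address, and that has to be caught when the address is legitimized, not by an operand constraint alone.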
> Richard.
>
> > > 6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
> > >         prototype.
> > >         * config/i386/i386.cc (x86_evex_reg_mentioned_p): New
> > >         function.
> > >         * config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0
> > >         and constraint Bt/BM to all non-evex alternatives, adjust
> > >         alternative outputs if evex reg is mentioned.
> > >         * config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0
> > >         and constraint Bt/BM to all non-evex alternatives.
> > >         (ptesttf2): Likewise.
> > >         (<sse4_1>_round<ssemodesuffix><avxsizesuffix): Likewise.
> > >         (sse4_1_round<ssescalarmodesuffix>): Likewise.
> > >         (sse4_2_pcmpestri): Likewise.
> > >         (sse4_2_pcmpestrm): Likewise.
> > >         (sse4_2_pcmpestr_cconly): Likewise.
> > >         (sse4_2_pcmpistr): Likewise.
> > >         (sse4_2_pcmpistri): Likewise.
> > >         (sse4_2_pcmpistrm): Likewise.
> > >         (sse4_2_pcmpistr_cconly): Likewise.
> > >         (aesimc): Likewise.
> > >         (aeskeygenassist): Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
> > >         tests.
> > > ---
> > >  gcc/config/i386/i386-protos.h                 |  1 +
> > >  gcc/config/i386/i386.cc                       | 13 +++
> > >  gcc/config/i386/i386.md                       |  3 +-
> > >  gcc/config/i386/sse.md                        | 93 +++++++++++++------
> > >  .../i386/apx-legacy-insn-check-norex2.c       | 55 ++++++++++-
> > >  5 files changed, 132 insertions(+), 33 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> > > index 78eb3e0f584..bbb219e3039 100644
> > > --- a/gcc/config/i386/i386-protos.h
> > > +++ b/gcc/config/i386/i386-protos.h
> > > @@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
> > >  extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
> > >  extern bool x86_extended_reg_mentioned_p (rtx);
> > >  extern bool x86_extended_rex2reg_mentioned_p (rtx);
> > > +extern bool x86_evex_reg_mentioned_p (rtx [], int);
> > >  extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
> > >  extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index f5d642948bc..ec93c5bab97 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -22936,6 +22936,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
> > >    return false;
> > >  }
> > >
> > > +/* Return true when rtx operands mentions register that must be encoded using
> > > +   evex prefix.  */
> > > +bool
> > > +x86_evex_reg_mentioned_p (rtx operands[], int nops)
> > > +{
> > > +  int i;
> > > +  for (i = 0; i < nops; i++)
> > > +    if (EXT_REX_SSE_REG_P (operands[i])
> > > +       || x86_extended_rex2reg_mentioned_p (operands[i]))
> > > +      return true;
> > > +  return false;
> > > +}
> > > +
> > >  /* If profitable, negate (without causing overflow) integer constant
> > >     of mode MODE at location LOC.  Return true in this case.  */
> > >  bool
> > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > index 83ad01b43c1..4c305e72389 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -21603,7 +21603,7 @@ (define_expand "significand<mode>2"
> > >  (define_insn "sse4_1_round<mode>2"
> > >    [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
> > >         (unspec:MODEFH
> > > -         [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
> > > +         [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,Bt,v,m")
> > >            (match_operand:SI 2 "const_0_to_15_operand")]
> > >           UNSPEC_ROUND))]
> > >    "TARGET_SSE4_1"
> > > @@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round<mode>2"
> > >    [(set_attr "type" "ssecvt")
> > >     (set_attr "prefix_extra" "1,1,1,*,*")
> > >     (set_attr "length_immediate" "1")
> > > +   (set_attr "gpr32" "1,1,0,1,1")
> > >     (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
> > >     (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
> > >     (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index 05963de9219..456713b991a 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -22617,11 +22617,12 @@ (define_insn "avx2_pblendd<mode>"
> > >
> > >  (define_insn "sse4_1_phminposuw"
> > >    [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
> > > -       (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,xm")]
> > > +       (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBT,*xBT,xBt")]
> > >                      UNSPEC_PHMINPOSUW))]
> > >    "TARGET_SSE4_1"
> > >    "%vphminposuw\t{%1, %0|%0, %1}"
> > >    [(set_attr "isa" "noavx,noavx,avx")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "type" "sselog1")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "prefix" "orig,orig,vex")
> > > @@ -23810,12 +23811,13 @@ (define_insn "avx_vtest<ssemodesuffix><avxsizesuffix>"
> > >  (define_insn "*<sse4_1>_ptest<mode>"
> > >    [(set (reg FLAGS_REG)
> > >         (unspec [(match_operand:V_AVX 0 "register_operand" "Yr, *x, x")
> > > -                (match_operand:V_AVX 1 "vector_operand" "YrBm, *xBm, xm")]
> > > +                (match_operand:V_AVX 1 "vector_operand" "YrBT, *xBT, xBt")]
> > >                 UNSPEC_PTEST))]
> > >    "TARGET_SSE4_1 && ix86_match_ptest_ccmode (insn)"
> > >    "%vptest\t{%1, %0|%0, %1}"
> > >    [(set_attr "isa" "noavx,noavx,avx")
> > >     (set_attr "type" "ssecomi")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "prefix" "orig,orig,vex")
> > >     (set (attr "btver2_decode")
> > > @@ -23852,12 +23854,13 @@ (define_expand "<sse4_1>_ptest<mode>"
> > >  (define_insn "ptesttf2"
> > >    [(set (reg:CC FLAGS_REG)
> > >         (unspec:CC [(match_operand:TF 0 "register_operand" "Yr, *x, x")
> > > -                   (match_operand:TF 1 "vector_operand" "YrBm, *xBm, xm")]
> > > +                   (match_operand:TF 1 "vector_operand" "YrBT, *xBT, xBt")]
> > >                    UNSPEC_PTEST))]
> > >    "TARGET_SSE4_1"
> > >    "%vptest\t{%1, %0|%0, %1}"
> > >    [(set_attr "isa" "noavx,noavx,avx")
> > >     (set_attr "type" "ssecomi")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "prefix" "orig,orig,vex")
> > >     (set_attr "mode" "TI")])
> > > @@ -23968,13 +23971,14 @@ (define_expand "lrint<mode><sseintvecmodelower>2"
> > >  (define_insn "<sse4_1>_round<ssemodesuffix><avxsizesuffix>"
> > >    [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
> > >         (unspec:VF_128_256
> > > -         [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm,xm")
> > > +         [(match_operand:VF_128_256 1 "vector_operand" "YrBT,*xBT,xBt")
> > >            (match_operand:SI 2 "const_0_to_15_operand")]
> > >           UNSPEC_ROUND))]
> > >    "TARGET_SSE4_1"
> > >    "%vround<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
> > >    [(set_attr "isa" "noavx,noavx,avx")
> > >     (set_attr "type" "ssecvt")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_data16" "1,1,*")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > > @@ -24061,19 +24065,32 @@ (define_insn "sse4_1_round<ssescalarmodesuffix>"
> > >    [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> > >         (vec_merge:VF_128
> > >           (unspec:VF_128
> > > -           [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > > +           [(match_operand:VF_128 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
> > >              (match_operand:SI 3 "const_0_to_15_operand")]
> > >             UNSPEC_ROUND)
> > >           (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> > >           (const_int 1)))]
> > >    "TARGET_SSE4_1"
> > > -  "@
> > > -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > > -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > > -   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
> > > -   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
> > > -  [(set_attr "isa" "noavx,noavx,avx,avx512f")
> > > +{
> > > +  switch (which_alternative)
> > > +    {
> > > +      case 0:
> > > +      case 1:
> > > +       return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}";
> > > +      case 2:
> > > +       return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > > +      case 3:
> > > +       if (x86_evex_reg_mentioned_p (operands, 3))
> > > +         return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > > +       else
> > > +         return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > > +      default:
> > > +       gcc_unreachable ();
> > > +    }
> > > +}
> > > +  [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
> > >     (set_attr "type" "ssecvt")
> > > +   (set_attr "gpr32" "0,0,0,1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "prefix_data16" "1,1,*,*")
> > >     (set_attr "prefix_extra" "1")
> > > @@ -24085,19 +24102,32 @@ (define_insn "*sse4_1_round<ssescalarmodesuffix>"
> > >         (vec_merge:VFH_128
> > >           (vec_duplicate:VFH_128
> > >             (unspec:<ssescalarmode>
> > > -             [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > > +             [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
> > >                (match_operand:SI 3 "const_0_to_15_operand")]
> > >               UNSPEC_ROUND))
> > >           (match_operand:VFH_128 1 "register_operand" "0,0,x,v")
> > >           (const_int 1)))]
> > >    "TARGET_SSE4_1"
> > > -  "@
> > > -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
> > > -   round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
> > > -   vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
> > > -   vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
> > > -  [(set_attr "isa" "noavx,noavx,avx,avx512f")
> > > +{
> > > +  switch (which_alternative)
> > > +    {
> > > +      case 0:
> > > +      case 1:
> > > +       return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}";
> > > +      case 2:
> > > +       return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > > +      case 3:
> > > +       if (x86_evex_reg_mentioned_p (operands, 3) || <MODE>mode == V8HFmode)
> > > +         return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > > +       else
> > > +         return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > > +      default:
> > > +       gcc_unreachable ();
> > > +    }
> > > +}
> > > +  [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
> > >     (set_attr "type" "ssecvt")
> > > +   (set_attr "gpr32" "0,0,0,1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "prefix_data16" "1,1,*,*")
> > >     (set_attr "prefix_extra" "1")
> > > @@ -24318,7 +24348,7 @@ (define_insn "sse4_2_pcmpestri"
> > >         (unspec:SI
> > >           [(match_operand:V16QI 1 "register_operand" "x,x")
> > >            (match_operand:SI 2 "register_operand" "a,a")
> > > -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > > +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> > >            (match_operand:SI 4 "register_operand" "d,d")
> > >            (match_operand:SI 5 "const_0_to_255_operand")]
> > >           UNSPEC_PCMPESTR))
> > > @@ -24333,6 +24363,7 @@ (define_insn "sse4_2_pcmpestri"
> > >    "TARGET_SSE4_2"
> > >    "%vpcmpestri\t{%5, %3, %1|%1, %3, %5}"
> > >    [(set_attr "type" "sselog")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "prefix" "maybe_vex")
> > >     (set_attr "length_immediate" "1")
> > > @@ -24345,7 +24376,7 @@ (define_insn "sse4_2_pcmpestrm"
> > >         (unspec:V16QI
> > >           [(match_operand:V16QI 1 "register_operand" "x,x")
> > >            (match_operand:SI 2 "register_operand" "a,a")
> > > -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > > +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> > >            (match_operand:SI 4 "register_operand" "d,d")
> > >            (match_operand:SI 5 "const_0_to_255_operand")]
> > >           UNSPEC_PCMPESTR))
> > > @@ -24360,6 +24391,7 @@ (define_insn "sse4_2_pcmpestrm"
> > >    "TARGET_SSE4_2"
> > >    "%vpcmpestrm\t{%5, %3, %1|%1, %3, %5}"
> > >    [(set_attr "type" "sselog")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "prefix" "maybe_vex")
> > > @@ -24372,7 +24404,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
> > >         (unspec:CC
> > >           [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> > >            (match_operand:SI 3 "register_operand" "a,a,a,a")
> > > -          (match_operand:V16QI 4 "nonimmediate_operand" "x,m,x,m")
> > > +          (match_operand:V16QI 4 "nonimmediate_operand" "x,Bt,x,Bt")
> > >            (match_operand:SI 5 "register_operand" "d,d,d,d")
> > >            (match_operand:SI 6 "const_0_to_255_operand")]
> > >           UNSPEC_PCMPESTR))
> > > @@ -24385,6 +24417,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
> > >     %vpcmpestri\t{%6, %4, %2|%2, %4, %6}
> > >     %vpcmpestri\t{%6, %4, %2|%2, %4, %6}"
> > >    [(set_attr "type" "sselog")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "memory" "none,load,none,load")
> > > @@ -24396,7 +24429,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
> > >    [(set (match_operand:SI 0 "register_operand" "=c,c")
> > >         (unspec:SI
> > >           [(match_operand:V16QI 2 "register_operand" "x,x")
> > > -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > > +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> > >            (match_operand:SI 4 "const_0_to_255_operand")]
> > >           UNSPEC_PCMPISTR))
> > >     (set (match_operand:V16QI 1 "register_operand" "=Yz,Yz")
> > > @@ -24439,6 +24472,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
> > >    DONE;
> > >  }
> > >    [(set_attr "type" "sselog")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "memory" "none,load")
> > > @@ -24448,7 +24482,7 @@ (define_insn "sse4_2_pcmpistri"
> > >    [(set (match_operand:SI 0 "register_operand" "=c,c")
> > >         (unspec:SI
> > >           [(match_operand:V16QI 1 "register_operand" "x,x")
> > > -          (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> > > +          (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
> > >            (match_operand:SI 3 "const_0_to_255_operand")]
> > >           UNSPEC_PCMPISTR))
> > >     (set (reg:CC FLAGS_REG)
> > > @@ -24460,6 +24494,7 @@ (define_insn "sse4_2_pcmpistri"
> > >    "TARGET_SSE4_2"
> > >    "%vpcmpistri\t{%3, %2, %1|%1, %2, %3}"
> > >    [(set_attr "type" "sselog")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "prefix" "maybe_vex")
> > > @@ -24471,7 +24506,7 @@ (define_insn "sse4_2_pcmpistrm"
> > >    [(set (match_operand:V16QI 0 "register_operand" "=Yz,Yz")
> > >         (unspec:V16QI
> > >           [(match_operand:V16QI 1 "register_operand" "x,x")
> > > -          (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> > > +          (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
> > >            (match_operand:SI 3 "const_0_to_255_operand")]
> > >           UNSPEC_PCMPISTR))
> > >     (set (reg:CC FLAGS_REG)
> > > @@ -24483,6 +24518,7 @@ (define_insn "sse4_2_pcmpistrm"
> > >    "TARGET_SSE4_2"
> > >    "%vpcmpistrm\t{%3, %2, %1|%1, %2, %3}"
> > >    [(set_attr "type" "sselog")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "prefix" "maybe_vex")
> > > @@ -24494,7 +24530,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
> > >    [(set (reg:CC FLAGS_REG)
> > >         (unspec:CC
> > >           [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> > > -          (match_operand:V16QI 3 "nonimmediate_operand" "x,m,x,m")
> > > +          (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt,x,Bt")
> > >            (match_operand:SI 4 "const_0_to_255_operand")]
> > >           UNSPEC_PCMPISTR))
> > >     (clobber (match_scratch:V16QI 0 "=Yz,Yz,X,X"))
> > > @@ -24506,6 +24542,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
> > >     %vpcmpistri\t{%4, %3, %2|%2, %3, %4}
> > >     %vpcmpistri\t{%4, %3, %2|%2, %3, %4}"
> > >    [(set_attr "type" "sselog")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "memory" "none,load,none,load")
> > > @@ -25990,23 +26027,25 @@ (define_insn "aesdeclast"
> > >
> > >  (define_insn "aesimc"
> > >    [(set (match_operand:V2DI 0 "register_operand" "=x")
> > > -       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")]
> > > +       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")]
> > >                       UNSPEC_AESIMC))]
> > >    "TARGET_AES"
> > >    "%vaesimc\t{%1, %0|%0, %1}"
> > >    [(set_attr "type" "sselog1")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "prefix" "maybe_vex")
> > >     (set_attr "mode" "TI")])
> > >
> > >  (define_insn "aeskeygenassist"
> > >    [(set (match_operand:V2DI 0 "register_operand" "=x")
> > > -       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")
> > > +       (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")
> > >                       (match_operand:SI 2 "const_0_to_255_operand")]
> > >                      UNSPEC_AESKEYGENASSIST))]
> > >    "TARGET_AES"
> > >    "%vaeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
> > >    [(set_attr "type" "sselog1")
> > > +   (set_attr "gpr32" "0")
> > >     (set_attr "prefix_extra" "1")
> > >     (set_attr "length_immediate" "1")
> > >     (set_attr "prefix" "maybe_vex")
> > > diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > > index 510213a6ca7..771bcb078e1 100644
> > > --- a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > > +++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > > @@ -45,13 +45,22 @@ typedef union
> > >    DTYPE a[16];
> > >  } tmp_u;
> > >
> > > -__attribute__((target("sse4.2")))
> > > +__attribute__((target("sse4.2,aes")))
> > >  void sse_test ()
> > >  {
> > >    register tmp_u *tdst __asm__("%r16");
> > >    register tmp_u *src1 __asm__("%r17");
> > >    register tmp_u *src2 __asm__("%r18");
> > > -
> > > +
> > > +  src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> > > +  src1->a[2] = _mm_testc_si128 (src1->xi[3], src2->xi[4]);
> > > +  src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> > > +                             _MM_FROUND_CUR_DIRECTION);
> > > +  src1->xf[4] = _mm_round_ps (src1->xf[7], _MM_FROUND_CUR_DIRECTION);
> > > +  src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> > > +                             _MM_FROUND_CUR_DIRECTION);
> > > +  src1->xd[1] = _mm_round_pd (src1->xd[4], _MM_FROUND_CUR_DIRECTION);
> > > +
> > >    src1->xi[0] = _mm_hadd_epi16 (tdst->xi[2], src2->xi[3]);
> > >    src1->xi[1] = _mm_hadd_epi32 (tdst->xi[0], src2->xi[1]);
> > >    tdst->xi[2] = _mm_hadds_epi16 (src1->xi[4], src2->xi[5]);
> > > @@ -77,16 +86,33 @@ void sse_test ()
> > >    tdst->xi[1] = _mm_sign_epi8 (src1->xi[5], src2->xi[6]);
> > >    tdst->xi[2] = _mm_sign_epi16 (src1->xi[7], src2->xi[0]);
> > >    tdst->xi[3] = _mm_sign_epi32 (src1->xi[1], src2->xi[2]);
> > > +
> > > +  tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> > > +  tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> > > +  tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> > > +  tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> > > +
> > > +  tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> > > +  tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
> > >  }
> > >
> > > -__attribute__((target("avx2")))
> > > +__attribute__((target("avx2,aes")))
> > >  void vex_test ()
> > >  {
> > >
> > >    register tmp_u *tdst __asm__("%r16");
> > >    register tmp_u *src1 __asm__("%r17");
> > >    register tmp_u *src2 __asm__("%r18");
> > > -
> > > +
> > > +  src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> > > +  src1->a[2] = _mm256_testc_si256 (src1->yi[2], src2->yi[3]);
> > > +  src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> > > +                             _MM_FROUND_CUR_DIRECTION);
> > > +  src1->yf[4] = _mm256_round_ps (src1->yf[2], _MM_FROUND_CUR_DIRECTION);
> > > +  src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> > > +                             _MM_FROUND_CUR_DIRECTION);
> > > +  src1->yd[1] = _mm256_round_pd (src1->yd[3], _MM_FROUND_CUR_DIRECTION);
> > > +
> > >    src1->yi[1] = _mm256_hadd_epi16 (tdst->yi[2], src2->yi[3]);
> > >    src1->yi[2] = _mm256_hadd_epi32 (tdst->yi[0], src2->yi[1]);
> > >    tdst->yi[3] = _mm256_hadds_epi16 (src1->yi[1], src2->yi[2]);
> > > @@ -98,7 +124,6 @@ void vex_test ()
> > >    src1->yi[1] = _mm256_cmpgt_epi64 (tdst->yi[3], src2->yi[0]);
> > >
> > >    tdst->yf[2] = _mm256_dp_ps (src1->yf[0], src2->yf[1], 0xbf);
> > > -  tdst->xd[3] = _mm_dp_pd (src1->xd[0], src2->xd[1], 0xbf);
> > >
> > >    tdst->yi[3] = _mm256_mpsadbw_epu8 (src1->yi[1], src2->yi[1], 0xc1);
> > >
> > > @@ -112,6 +137,14 @@ void vex_test ()
> > >    tdst->yi[2] = _mm256_sign_epi8 (src1->yi[0], src2->yi[1]);
> > >    tdst->yi[3] = _mm256_sign_epi16 (src1->yi[2], src2->yi[3]);
> > >    tdst->yi[0] = _mm256_sign_epi32 (src1->yi[0], src2->yi[1]);
> > > +
> > > +  tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> > > +  tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> > > +  tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> > > +  tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> > > +
> > > +  tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> > > +  tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
> > >  }
> > >
> > >  /* { dg-final { scan-assembler-not "v?pcmpeqq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > @@ -134,3 +167,15 @@ void vex_test ()
> > >  /* { dg-final { scan-assembler-not "v?psignb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > >  /* { dg-final { scan-assembler-not "v?psignw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > >  /* { dg-final { scan-assembler-not "v?psignd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?phminposuw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?ptest\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundss\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundsd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpestri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpistri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpestrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpistrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?aesimc\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?aeskeygenassist\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > --
> > > 2.31.1
> > >



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.
  2023-08-31 10:00     ` Uros Bizjak
  2023-09-01  9:04       ` Hongyu Wang
@ 2023-09-01 11:03       ` Richard Sandiford
  2023-09-04  1:03         ` Hongtao Liu
  1 sibling, 1 reply; 49+ messages in thread
From: Richard Sandiford @ 2023-09-01 11:03 UTC (permalink / raw)
  To: Uros Bizjak via Gcc-patches
  Cc: Jakub Jelinek, Uros Bizjak, Hongyu Wang, hongtao.liu, hubicka

Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
>> > From: Kong Lingling <lingling.kong@intel.com>
>> >
>> > In inline asm, we do not know if the insn can use EGPR, so disable EGPR
>> > usage by default from mapping the common reg/mem constraint to non-EGPR
>> > constraints. Use a flag mapx-inline-asm-use-gpr32 to enable EGPR usage
>> > for inline asm.
>> >
>> > gcc/ChangeLog:
>> >
>> >       * config/i386/i386.cc (INCLUDE_STRING): Add include for
>> >       ix86_md_asm_adjust.
>> >       (ix86_md_asm_adjust): When APX EGPR enabled without specifying the
>> >       target option, map reg/mem constraints to non-EGPR constraints.
>> >       * config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >       * gcc.target/i386/apx-inline-gpr-norex2.c: New test.
>> > ---
>> >  gcc/config/i386/i386.cc                       |  44 +++++++
>> >  gcc/config/i386/i386.opt                      |   5 +
>> >  .../gcc.target/i386/apx-inline-gpr-norex2.c   | 107 ++++++++++++++++++
>> >  3 files changed, 156 insertions(+)
>> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
>> >
>> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
>> > index d26d9ab0d9d..9460ebbfda4 100644
>> > --- a/gcc/config/i386/i386.cc
>> > +++ b/gcc/config/i386/i386.cc
>> > @@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
>> >  along with GCC; see the file COPYING3.  If not see
>> >  <http://www.gnu.org/licenses/>.  */
>> >
>> > +#define INCLUDE_STRING
>> >  #define IN_TARGET_CODE 1
>> >
>> >  #include "config.h"
>> > @@ -23077,6 +23078,49 @@ ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
>> >    bool saw_asm_flag = false;
>> >
>> >    start_sequence ();
>> > +  /* TODO: Here we just mapped the general r/m constraints to non-EGPR
>> > +   constraints, will eventually map all the usable constraints in the future. */
>>
>> I think there should be some constraint which explicitly has all the 32
>> GPRs, like there is one for just all 16 GPRs (h), so that regardless of
>> -mapx-inline-asm-use-gpr32 one can be explicit what the inline asm wants.
>>
>> Also, what about the "g" constraint?  Shouldn't there be another for "g"
>> without r16..r31?  What about the various other memory
>> constraints ("<", "o", ...)?
>
> I think we should leave all existing constraints as they are, so "r"
> covers only GPR16, "m" and "o" to only use GPR16. We can then
> introduce "h" to instructions that have the ability to handle EGPR.

Yeah.  I'm jumping in without having read the full thread, sorry,
but the current mechanism for handling this is TARGET_MEM_CONSTRAINT
(added for s390).  That is, TARGET_MEM_CONSTRAINT can be defined to some
new constraint that is more general than the traditional "m" constraint.
This constraint is then the one that is associated with memory_operand
etc.  "m" can then be defined explicitly to the old definition,
so that existing asms continue to work.

So if the port wants generic internal memory addresses to use the
EGPR set (sounds reasonable), then TARGET_MEM_CONSTRAINT would be
a new constraint that maps to those addresses.

Thanks,
Richard
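For concreteness, a minimal sketch of the mechanism Richard describes might look as follows. The constraint letter, predicate and operand-class names here are hypothetical illustrations, not taken from the posted patches:

```
/* i386.h: pick a new letter for the most general memory constraint,
   so that memory_operand etc. may use EGPR-based addresses.
   (The letter 'B' and all names below are hypothetical.)  */
#define TARGET_MEM_CONSTRAINT 'B'

;; constraints.md: keep "m" with its traditional meaning, so that
;; existing inline asm continues to address memory with GPR16 only.
(define_memory_constraint "m"
  "Memory operand whose address uses only the legacy 16 GPRs."
  (match_operand 0 "gpr16_memory_operand"))  ;; hypothetical predicate
```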


* Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.
  2023-09-01 10:35           ` Hongtao Liu
@ 2023-09-01 11:27             ` Uros Bizjak
  2023-09-04  0:28               ` Hongtao Liu
  0 siblings, 1 reply; 49+ messages in thread
From: Uros Bizjak @ 2023-09-01 11:27 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Hongyu Wang, Jakub Jelinek, Hongyu Wang, gcc-patches,
	hongtao.liu, hubicka

On Fri, Sep 1, 2023 at 12:36 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Sep 1, 2023 at 5:38 PM Uros Bizjak via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Fri, Sep 1, 2023 at 11:10 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
> > >
> > > Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Thu, Aug 31, 2023 at 18:01:
> > > >
> > > > On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
> > > > > > From: Kong Lingling <lingling.kong@intel.com>
> > > > > >
> > > > > > In inline asm, we do not know if the insn can use EGPR, so disable EGPR
> > > > > > usage by default from mapping the common reg/mem constraint to non-EGPR
> > > > > > constraints. Use a flag mapx-inline-asm-use-gpr32 to enable EGPR usage
> > > > > > for inline asm.
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >       * config/i386/i386.cc (INCLUDE_STRING): Add include for
> > > > > >       ix86_md_asm_adjust.
> > > > > >       (ix86_md_asm_adjust): When APX EGPR enabled without specifying the
> > > > > >       target option, map reg/mem constraints to non-EGPR constraints.
> > > > > >       * config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > >       * gcc.target/i386/apx-inline-gpr-norex2.c: New test.
> > > > > > ---
> > > > > >  gcc/config/i386/i386.cc                       |  44 +++++++
> > > > > >  gcc/config/i386/i386.opt                      |   5 +
> > > > > >  .../gcc.target/i386/apx-inline-gpr-norex2.c   | 107 ++++++++++++++++++
> > > > > >  3 files changed, 156 insertions(+)
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
> > > > > >
> > > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > > > > index d26d9ab0d9d..9460ebbfda4 100644
> > > > > > --- a/gcc/config/i386/i386.cc
> > > > > > +++ b/gcc/config/i386/i386.cc
> > > > > > @@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
> > > > > >  along with GCC; see the file COPYING3.  If not see
> > > > > >  <http://www.gnu.org/licenses/>.  */
> > > > > >
> > > > > > +#define INCLUDE_STRING
> > > > > >  #define IN_TARGET_CODE 1
> > > > > >
> > > > > >  #include "config.h"
> > > > > > @@ -23077,6 +23078,49 @@ ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
> > > > > >    bool saw_asm_flag = false;
> > > > > >
> > > > > >    start_sequence ();
> > > > > > +  /* TODO: Here we just mapped the general r/m constraints to non-EGPR
> > > > > > +   constraints, will eventually map all the usable constraints in the future. */
> > > > >
> > > > > I think there should be some constraint which explicitly has all the 32
> > > > > GPRs, like there is one for just all 16 GPRs (h), so that regardless of
> > > > > -mapx-inline-asm-use-gpr32 one can be explicit what the inline asm wants.
> > > > >
> > > > > Also, what about the "g" constraint?  Shouldn't there be another for "g"
> > > > > without r16..r31?  What about the various other memory
> > > > > constraints ("<", "o", ...)?
> > > >
> > > > I think we should leave all existing constraints as they are, so "r"
> > > > covers only GPR16, "m" and "o" to only use GPR16. We can then
> > > > introduce "h" to instructions that have the ability to handle EGPR.
> > > > This would be somehow similar to the SSE -> AVX512F transition, where
> > > > we still have "x" for SSE16 and "v" was introduced as a separate
> > > > register class for EVEX SSE registers. This way, asm will be
> > > > compatible, when "r", "m", "o" and "g" are used. The new memory
> > > > constraint "Bt", should allow new registers, and should be added to
> > > > the constraint string as a separate constraint, and conditionally
> > > > enabled by relevant "isa" (AKA "enabled") attribute.
> > >
> > > The extended constraint can work for registers, but for memory it is more
> > > complicated.
> >
> > Yes, unfortunately. The compiler assumes that an unchangeable register
> > class is used for BASE/INDEX registers. I have hit this limitation
> > when trying to implement memory support for instructions involving
> > 8-bit high registers (%ah, %bh, %ch, %dh), which do not support REX
> > registers, also inside memory operand. (You can see the "hack" in e.g.
> > *extzvqi_mem_rex64" and corresponding peephole2 with the original
> > *extzvqi pattern). I am aware that dynamic insn-dependent BASE/INDEX
> > register class is the major limitation in the compiler, so perhaps the
> > strategy on how to override this limitation should be discussed with
> > the register allocator author first. Perhaps adding an insn attribute
> > to insn RTX pattern to specify different BASE/INDEX register sets can
> > be a better solution than passing insn RTX to the register allocator.
> >
> > The above idea still does not solve the asm problem on how to select
> > correct BASE/INDEX register set for memory operands.
> The current approach disables gpr32 for memory operand in asm_operand
> by default. but can be turned on by options
> ix86_apx_inline_asm_use_gpr32(users need to guarantee the instruction
> supports gpr32).
> Only ~ 5% of total instructions don't support gpr32, reversed approach
> only gonna get more complicated.

I'm not referring to the reversed approach; I just want to point out
that the same approach as you proposed w.r.t. memory operands can be
achieved using some named insn attribute that would affect BASE/INDEX
register class selection. The attribute could default to gpr32 with
APX, unless the insn-specific attribute has e.g. a nogpr32 value. See
for example how the "enabled" and "preferred_for_*" attributes are used.
Perhaps this new attribute could also be applied to separate
alternatives.

Uros.


* Re: [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns
  2023-09-01  9:20       ` Jakub Jelinek
@ 2023-09-01 11:34         ` Hongyu Wang
  2023-09-01 11:41           ` Jakub Jelinek
  0 siblings, 1 reply; 49+ messages in thread
From: Hongyu Wang @ 2023-09-01 11:34 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Hongyu Wang, gcc-patches, hongtao.liu, hubicka

Jakub Jelinek <jakub@redhat.com> wrote on Fri, Sep 1, 2023 at 17:20:
>
> On Fri, Sep 01, 2023 at 05:07:53PM +0800, Hongyu Wang wrote:
> > Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Thu, Aug 31, 2023 at 17:44:
> > >
> > > On Thu, Aug 31, 2023 at 04:20:19PM +0800, Hongyu Wang via Gcc-patches wrote:
> > > > For vector move insns like vmovdqa/vmovdqu, their evex counterparts
> > > > require an explicit suffix 64/32/16/8. The use of these instructions
> > > > is prohibited under AVX10_1 or AVX512F, so for AVX2+APX_F we select
> > > > vmovaps/vmovups for vector load/store insns that contain EGPR.
> > >
> > > Why not make it dependent on AVX512VL?
> > > I.e. if egpr_p && TARGET_AVX512VL, still use vmovdqu16 or vmovdqa16
> > > and the like, and only if !evex_reg_p && egpr_p && !TARGET_AVX512VL
> > > fall back to what you're doing?
> >
> > I'm not sure if it is necessary, as on hardware there is no difference between
> > vmovdqu16/vmovups. If vmovups already has the capability to represent
> > EGPR why do we need to distinguish them under VL?
>
> On the Intel HW you're currently planning.
> Will that be the case for AMD as well?
> Some insns are documented to move float or double vectors while others
> integer vectors (of different element sizes).
> Or is vmovups with GPR32 at least encoded smaller than vmovdqu{16,32,64}?

With GPR32 they have the same encoding size. If we need to strictly
follow the meaning of the mnemonics, I will adjust as you suggested.
Thanks.


>
>         Jakub
>


* Re: [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns
  2023-09-01 11:34         ` Hongyu Wang
@ 2023-09-01 11:41           ` Jakub Jelinek
  0 siblings, 0 replies; 49+ messages in thread
From: Jakub Jelinek @ 2023-09-01 11:41 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: Hongyu Wang, gcc-patches, hongtao.liu, hubicka

On Fri, Sep 01, 2023 at 07:34:16PM +0800, Hongyu Wang wrote:
> > On Fri, Sep 01, 2023 at 05:07:53PM +0800, Hongyu Wang wrote:
> > > Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Thu, Aug 31, 2023 at 17:44:
> > > >
> > > > On Thu, Aug 31, 2023 at 04:20:19PM +0800, Hongyu Wang via Gcc-patches wrote:
> > > > > For vector move insns like vmovdqa/vmovdqu, their evex counterparts
> > > > > require an explicit suffix 64/32/16/8. The use of these instructions
> > > > > is prohibited under AVX10_1 or AVX512F, so for AVX2+APX_F we select
> > > > > vmovaps/vmovups for vector load/store insns that contain EGPR.
> > > >
> > > > Why not make it dependent on AVX512VL?
> > > > I.e. if egpr_p && TARGET_AVX512VL, still use vmovdqu16 or vmovdqa16
> > > > and the like, and only if !evex_reg_p && egpr_p && !TARGET_AVX512VL
> > > > fall back to what you're doing?
> > >
> > > I'm not sure if it is necessary, as on hardware there is no difference between
> > > vmovdqu16/vmovups. If vmovups already has the capability to represent
> > > EGPR why do we need to distinguish them under VL?
> >
> > On the Intel HW you're currently planning.
> > Will that be the case for AMD as well?
> > Some insns are documented to move float or double vectors while others
> > integer vectors (of different element sizes).
> > Or is vmovups with GPR32 at least encoded smaller than vmovdqu{16,32,64}?
> 
> With GPR32 they have same encoding size. If we need to strictly follow
> the meaning of mnemonics,
> I will adjust as you suggested. Thanks.

I think it is useful, even if just for those who try to read the
assembler/disassembler.  Of course, if there are cases where only one of
those has to be used (say -mavx -mno-avx2 and 256-bit integer vector moves),
there is no way around that and one just uses what is available.

	Jakub



* Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.
  2023-09-01 11:27             ` Uros Bizjak
@ 2023-09-04  0:28               ` Hongtao Liu
  2023-09-04  8:57                 ` Uros Bizjak
  0 siblings, 1 reply; 49+ messages in thread
From: Hongtao Liu @ 2023-09-04  0:28 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Hongyu Wang, Jakub Jelinek, Hongyu Wang, gcc-patches,
	hongtao.liu, hubicka

On Fri, Sep 1, 2023 at 7:27 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Fri, Sep 1, 2023 at 12:36 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Fri, Sep 1, 2023 at 5:38 PM Uros Bizjak via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Fri, Sep 1, 2023 at 11:10 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
> > > >
> > > > Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Thu, Aug 31, 2023 at 18:01:
> > > > >
> > > > > On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > > On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
> > > > > > > From: Kong Lingling <lingling.kong@intel.com>
> > > > > > >
> > > > > > > In inline asm, we do not know if the insn can use EGPR, so disable EGPR
> > > > > > > usage by default from mapping the common reg/mem constraint to non-EGPR
> > > > > > > constraints. Use a flag mapx-inline-asm-use-gpr32 to enable EGPR usage
> > > > > > > for inline asm.
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > >       * config/i386/i386.cc (INCLUDE_STRING): Add include for
> > > > > > >       ix86_md_asm_adjust.
> > > > > > >       (ix86_md_asm_adjust): When APX EGPR enabled without specifying the
> > > > > > >       target option, map reg/mem constraints to non-EGPR constraints.
> > > > > > >       * config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > >       * gcc.target/i386/apx-inline-gpr-norex2.c: New test.
> > > > > > > ---
> > > > > > >  gcc/config/i386/i386.cc                       |  44 +++++++
> > > > > > >  gcc/config/i386/i386.opt                      |   5 +
> > > > > > >  .../gcc.target/i386/apx-inline-gpr-norex2.c   | 107 ++++++++++++++++++
> > > > > > >  3 files changed, 156 insertions(+)
> > > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
> > > > > > >
> > > > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > > > > > index d26d9ab0d9d..9460ebbfda4 100644
> > > > > > > --- a/gcc/config/i386/i386.cc
> > > > > > > +++ b/gcc/config/i386/i386.cc
> > > > > > > @@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
> > > > > > >  along with GCC; see the file COPYING3.  If not see
> > > > > > >  <http://www.gnu.org/licenses/>.  */
> > > > > > >
> > > > > > > +#define INCLUDE_STRING
> > > > > > >  #define IN_TARGET_CODE 1
> > > > > > >
> > > > > > >  #include "config.h"
> > > > > > > @@ -23077,6 +23078,49 @@ ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
> > > > > > >    bool saw_asm_flag = false;
> > > > > > >
> > > > > > >    start_sequence ();
> > > > > > > +  /* TODO: Here we just mapped the general r/m constraints to non-EGPR
> > > > > > > +   constraints, will eventually map all the usable constraints in the future. */
> > > > > >
> > > > > > I think there should be some constraint which explicitly has all the 32
> > > > > > GPRs, like there is one for just all 16 GPRs (h), so that regardless of
> > > > > > -mapx-inline-asm-use-gpr32 one can be explicit what the inline asm wants.
> > > > > >
> > > > > > Also, what about the "g" constraint?  Shouldn't there be another for "g"
> > > > > > without r16..r31?  What about the various other memory
> > > > > > constraints ("<", "o", ...)?
> > > > >
> > > > > I think we should leave all existing constraints as they are, so "r"
> > > > > covers only GPR16, "m" and "o" to only use GPR16. We can then
> > > > > introduce "h" to instructions that have the ability to handle EGPR.
> > > > > This would be somehow similar to the SSE -> AVX512F transition, where
> > > > > we still have "x" for SSE16 and "v" was introduced as a separate
> > > > > register class for EVEX SSE registers. This way, asm will be
> > > > > compatible, when "r", "m", "o" and "g" are used. The new memory
> > > > > constraint "Bt", should allow new registers, and should be added to
> > > > > the constraint string as a separate constraint, and conditionally
> > > > > enabled by relevant "isa" (AKA "enabled") attribute.
> > > >
> > > > The extended constraint can work for registers, but for memory it is more
> > > > complicated.
> > >
> > > Yes, unfortunately. The compiler assumes that an unchangeable register
> > > class is used for BASE/INDEX registers. I have hit this limitation
> > > when trying to implement memory support for instructions involving
> > > 8-bit high registers (%ah, %bh, %ch, %dh), which do not support REX
> > > registers, also inside memory operand. (You can see the "hack" in e.g.
> > > *extzvqi_mem_rex64" and corresponding peephole2 with the original
> > > *extzvqi pattern). I am aware that dynamic insn-dependent BASE/INDEX
> > > register class is the major limitation in the compiler, so perhaps the
> > > strategy on how to override this limitation should be discussed with
> > > the register allocator author first. Perhaps adding an insn attribute
> > > to insn RTX pattern to specify different BASE/INDEX register sets can
> > > be a better solution than passing insn RTX to the register allocator.
> > >
> > > The above idea still does not solve the asm problem on how to select
> > > correct BASE/INDEX register set for memory operands.
> > The current approach disables gpr32 for memory operand in asm_operand
> > by default. but can be turned on by options
> > ix86_apx_inline_asm_use_gpr32(users need to guarantee the instruction
> > supports gpr32).
> > Only ~ 5% of total instructions don't support gpr32, reversed approach
> > only gonna get more complicated.
>
> I'm not referring to the reversed approach, just want to point out
> that the same approach as you proposed w.r.t. to memory operand can be
> achieved using some named insn attribute that would affect BASE/INDEX
> register class selection. The attribute could default to gpr32 with
> APX, unless the insn specific attribute has e.g. nogpr32 value. See
> for example how "enabled" and "preferred_for_*" attributes are used.
> Perhaps this new attribute can also be applied to separate
> alternatives.
Yes, for xop/fma4/3dnow instructions, I think we can use an isa attr like
(define_attr "gpr32" "0,1"
  (cond [(eq_attr "isa" "fma4")
           (const_string "0")]
        (const_string "1")))

But still, we need to adjust the memory constraints in the patterns.
Ideally, GCC would include the encoding information for every
instruction (i.e. map0/map1), so that we could determine the value of
the gpr32 attribute directly from that information.
>
> Uros.



-- 
BR,
Hongtao


* Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.
  2023-09-01 11:03       ` Richard Sandiford
@ 2023-09-04  1:03         ` Hongtao Liu
  0 siblings, 0 replies; 49+ messages in thread
From: Hongtao Liu @ 2023-09-04  1:03 UTC (permalink / raw)
  To: Richard Sandiford, Uros Bizjak via Gcc-patches, Jakub Jelinek,
	Uros Bizjak, Hongyu Wang, hongtao.liu, hubicka

On Fri, Sep 1, 2023 at 7:03 PM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
> >> > From: Kong Lingling <lingling.kong@intel.com>
> >> >
> >> > In inline asm, we do not know if the insn can use EGPR, so disable EGPR
> >> > usage by default from mapping the common reg/mem constraint to non-EGPR
> >> > constraints. Use a flag mapx-inline-asm-use-gpr32 to enable EGPR usage
> >> > for inline asm.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >       * config/i386/i386.cc (INCLUDE_STRING): Add include for
> >> >       ix86_md_asm_adjust.
> >> >       (ix86_md_asm_adjust): When APX EGPR enabled without specifying the
> >> >       target option, map reg/mem constraints to non-EGPR constraints.
> >> >       * config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> >       * gcc.target/i386/apx-inline-gpr-norex2.c: New test.
> >> > ---
> >> >  gcc/config/i386/i386.cc                       |  44 +++++++
> >> >  gcc/config/i386/i386.opt                      |   5 +
> >> >  .../gcc.target/i386/apx-inline-gpr-norex2.c   | 107 ++++++++++++++++++
> >> >  3 files changed, 156 insertions(+)
> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
> >> >
> >> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> >> > index d26d9ab0d9d..9460ebbfda4 100644
> >> > --- a/gcc/config/i386/i386.cc
> >> > +++ b/gcc/config/i386/i386.cc
> >> > @@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
> >> >  along with GCC; see the file COPYING3.  If not see
> >> >  <http://www.gnu.org/licenses/>.  */
> >> >
> >> > +#define INCLUDE_STRING
> >> >  #define IN_TARGET_CODE 1
> >> >
> >> >  #include "config.h"
> >> > @@ -23077,6 +23078,49 @@ ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
> >> >    bool saw_asm_flag = false;
> >> >
> >> >    start_sequence ();
> >> > +  /* TODO: Here we just mapped the general r/m constraints to non-EGPR
> >> > +   constraints, will eventually map all the usable constraints in the future. */
> >>
> >> I think there should be some constraint which explicitly has all the 32
> >> GPRs, like there is one for just all 16 GPRs (h), so that regardless of
> >> -mapx-inline-asm-use-gpr32 one can be explicit what the inline asm wants.
> >>
> >> Also, what about the "g" constraint?  Shouldn't there be another for "g"
> >> without r16..r31?  What about the various other memory
> >> constraints ("<", "o", ...)?
> >
> > I think we should leave all existing constraints as they are, so "r"
> > covers only GPR16, "m" and "o" to only use GPR16. We can then
> > introduce "h" to instructions that have the ability to handle EGPR.
>
> Yeah.  I'm jumping in without having read the full thread, sorry,
> but the current mechanism for handling this is TARGET_MEM_CONSTRAINT
> (added for s390).  That is, TARGET_MEM_CONSTRAINT can be defined to some
Thanks for the comments.
> new constraint that is more general than the traditional "m" constraint.
> This constraint is then the one that is associated with memory_operand
> etc.  "m" can then be defined explicitly to the old definition,
> so that existing asms continue to work.
>
> So if the port wants generic internal memory addresses to use the
> EGPR set (sounds reasonable), then TARGET_MEM_CONSTRAINT would be
> a new constraint that maps to those addresses.
But still we need to enhance the current reload infrastructure to
support a selective base_reg_class/index_reg_class; refer to [1].
The good thing about using TARGET_MEM_CONSTRAINT is that we don't have
to remap the memory constraints for inline asm, but the bad thing is
that we would need to modify a lot of backend patterns, because only
~5% of the instructions don't support gpr32, and the other 95% would
need to be changed to the new memory constraint.
It feels like the cons outweigh the pros.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629040.html
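The selective base_reg_class direction referenced in [1] can be sketched roughly as follows. The hook shape, the helper predicate and the class name are illustrative only; see the linked patch for the real interface:

```
/* Hypothetical sketch of an insn-dependent base register class:
   reload queries the backend per insn, and insns whose encodings
   cannot take a REX2/EVEX prefix (legacy map2/3, VEX-only forms)
   get a legacy GPR16-only class for their address registers.  */
static reg_class_t
ix86_insn_base_reg_class (rtx_insn *insn)
{
  if (insn && !insn_supports_egpr_p (insn))  /* hypothetical test */
    return GENERAL_GPR16;   /* legacy r0-r15 only */
  return GENERAL_REGS;      /* full APX register file */
}
```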

>
> Thanks,
> Richard



-- 
BR,
Hongtao


* Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.
  2023-09-04  0:28               ` Hongtao Liu
@ 2023-09-04  8:57                 ` Uros Bizjak
  2023-09-04  9:10                   ` Hongtao Liu
  0 siblings, 1 reply; 49+ messages in thread
From: Uros Bizjak @ 2023-09-04  8:57 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Hongyu Wang, Jakub Jelinek, Hongyu Wang, gcc-patches,
	hongtao.liu, hubicka

On Mon, Sep 4, 2023 at 2:28 AM Hongtao Liu <crazylht@gmail.com> wrote:

> > > > > > > I think there should be some constraint which explicitly has all the 32
> > > > > > > GPRs, like there is one for just all 16 GPRs (h), so that regardless of
> > > > > > > -mapx-inline-asm-use-gpr32 one can be explicit what the inline asm wants.
> > > > > > >
> > > > > > > Also, what about the "g" constraint?  Shouldn't there be another for "g"
> > > > > > > without r16..r31?  What about the various other memory
> > > > > > > constraints ("<", "o", ...)?
> > > > > >
> > > > > > I think we should leave all existing constraints as they are, so "r"
> > > > > > covers only GPR16, "m" and "o" to only use GPR16. We can then
> > > > > > introduce "h" to instructions that have the ability to handle EGPR.
> > > > > > This would be somehow similar to the SSE -> AVX512F transition, where
> > > > > > we still have "x" for SSE16 and "v" was introduced as a separate
> > > > > > register class for EVEX SSE registers. This way, asm will be
> > > > > > compatible, when "r", "m", "o" and "g" are used. The new memory
> > > > > > constraint "Bt", should allow new registers, and should be added to
> > > > > > the constraint string as a separate constraint, and conditionally
> > > > > > enabled by relevant "isa" (AKA "enabled") attribute.
> > > > >
> > > > > The extended constraint can work for registers, but for memory it is more
> > > > > complicated.
> > > >
> > > > Yes, unfortunately. The compiler assumes that an unchangeable register
> > > > class is used for BASE/INDEX registers. I have hit this limitation
> > > > when trying to implement memory support for instructions involving
> > > > 8-bit high registers (%ah, %bh, %ch, %dh), which do not support REX
> > > > registers, also inside memory operand. (You can see the "hack" in e.g.
> > > > *extzvqi_mem_rex64" and corresponding peephole2 with the original
> > > > *extzvqi pattern). I am aware that dynamic insn-dependent BASE/INDEX
> > > > register class is the major limitation in the compiler, so perhaps the
> > > > strategy on how to override this limitation should be discussed with
> > > > the register allocator author first. Perhaps adding an insn attribute
> > > > to insn RTX pattern to specify different BASE/INDEX register sets can
> > > > be a better solution than passing insn RTX to the register allocator.
> > > >
> > > > The above idea still does not solve the asm problem on how to select
> > > > correct BASE/INDEX register set for memory operands.
> > > The current approach disables gpr32 for memory operand in asm_operand
> > > by default. but can be turned on by options
> > > ix86_apx_inline_asm_use_gpr32(users need to guarantee the instruction
> > > supports gpr32).
> > > Only ~ 5% of total instructions don't support gpr32, reversed approach
> > > only gonna get more complicated.
> >
> > I'm not referring to the reversed approach, just want to point out
> > that the same approach as you proposed w.r.t. to memory operand can be
> > achieved using some named insn attribute that would affect BASE/INDEX
> > register class selection. The attribute could default to gpr32 with
> > APX, unless the insn specific attribute has e.g. nogpr32 value. See
> > for example how "enabled" and "preferred_for_*" attributes are used.
> > Perhaps this new attribute can also be applied to separate
> > alternatives.
> Yes, for xop/fma4/3dnow instructions, I think we can use isa attr like
> (define_attr "gpr32" "0, 1"
>   (cond [(eq_attr "isa" "fma4")
>            (const_string "0")]
>       (const_string "1")))

Just a nit, can the member be named "map0" and "map1"? The code will
then look like:

if (get_attr_gpr32 (insn) == GPR32_MAP0) ...

instead of:

if (get_attr_gpr32 (insn) == GPR32_0) ...

> But still, we need to adjust memory constraints in the pattern.

I guess the gpr32 property is the same for all alternatives of the
insn pattern.  In this case, the "m", "g" and "a" constraints could
remain as they are; the final register class would be adjusted (by
some target hook?) based on the value of the gpr32 attribute.

> Ideally, gcc would include encoding information for every instruction
> (i.e. map0/map1), so that we could determine the attribute value of
> gpr32 directly from this information.

I think the right tool for this is the attribute infrastructure of insn
patterns. We can set the default, set a precise value for individual
insns, or calculate the attribute from some other attribute in a quite
flexible way.
Other than that, adjusting the BASE/INDEX register class in the RA pass
is an infrastructure change, but perhaps similar to the one you proposed.

Uros.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.
  2023-09-04  8:57                 ` Uros Bizjak
@ 2023-09-04  9:10                   ` Hongtao Liu
  0 siblings, 0 replies; 49+ messages in thread
From: Hongtao Liu @ 2023-09-04  9:10 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Hongyu Wang, Jakub Jelinek, Hongyu Wang, gcc-patches,
	hongtao.liu, hubicka

On Mon, Sep 4, 2023 at 4:57 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Sep 4, 2023 at 2:28 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> > > > > > > > I think there should be some constraint which explicitly has all the 32
> > > > > > > > GPRs, like there is one for just all 16 GPRs (h), so that regardless of
> > > > > > > > -mapx-inline-asm-use-gpr32 one can be explicit what the inline asm wants.
> > > > > > > >
> > > > > > > > Also, what about the "g" constraint?  Shouldn't there be another for "g"
> > > > > > > > without r16..r31?  What about the various other memory
> > > > > > > > constraints ("<", "o", ...)?
> > > > > > >
> > > > > > > I think we should leave all existing constraints as they are, so "r"
> > > > > > > covers only GPR16, "m" and "o" to only use GPR16. We can then
> > > > > > > introduce "h" to instructions that have the ability to handle EGPR.
> > > > > > > This would be somehow similar to the SSE -> AVX512F transition, where
> > > > > > > we still have "x" for SSE16 and "v" was introduced as a separate
> > > > > > > register class for EVEX SSE registers. This way, asm will be
> > > > > > > compatible, when "r", "m", "o" and "g" are used. The new memory
> > > > > > > constraint "Bt", should allow new registers, and should be added to
> > > > > > > the constraint string as a separate constraint, and conditionally
> > > > > > > enabled by relevant "isa" (AKA "enabled") attribute.
> > > > > >
> > > > > > The extended constraint can work for registers, but for memory it is more
> > > > > > complicated.
> > > > >
> > > > > Yes, unfortunately. The compiler assumes that an unchangeable register
> > > > > class is used for BASE/INDEX registers. I have hit this limitation
> > > > > when trying to implement memory support for instructions involving
> > > > > 8-bit high registers (%ah, %bh, %ch, %dh), which do not support REX
> > > > > registers, also inside memory operand. (You can see the "hack" in e.g.
> > > > > *extzvqi_mem_rex64" and corresponding peephole2 with the original
> > > > > *extzvqi pattern). I am aware that dynamic insn-dependent BASE/INDEX
> > > > > register class is the major limitation in the compiler, so perhaps the
> > > > > strategy on how to override this limitation should be discussed with
> > > > > the register allocator author first. Perhaps adding an insn attribute
> > > > > to insn RTX pattern to specify different BASE/INDEX register sets can
> > > > > be a better solution than passing insn RTX to the register allocator.
> > > > >
> > > > > The above idea still does not solve the asm problem on how to select
> > > > > correct BASE/INDEX register set for memory operands.
> > > > The current approach disables gpr32 for memory operands in asm_operand
> > > > by default, but it can be turned on with the option
> > > > ix86_apx_inline_asm_use_gpr32 (users need to guarantee that the
> > > > instruction supports gpr32).
> > > > Only ~5% of all instructions don't support gpr32, so the reversed
> > > > approach would only get more complicated.
> > >
> > > I'm not referring to the reversed approach, just want to point out
> > > that the same approach as you proposed w.r.t. memory operands can be
> > > achieved using some named insn attribute that would affect BASE/INDEX
> > > register class selection. The attribute could default to gpr32 with
> > > APX, unless the insn specific attribute has e.g. nogpr32 value. See
> > > for example how "enabled" and "preferred_for_*" attributes are used.
> > > Perhaps this new attribute can also be applied to separate
> > > alternatives.
> > Yes, for xop/fma4/3dnow instructions, I think we can use isa attr like
> > (define_attr "gpr32" "0, 1"
> >   (cond [(eq_attr "isa" "fma4")
> >            (const_string "0")]
> >       (const_string "1")))
>
> Just a nit, can the member be named "map0" and "map1"? The code will
> then look like:
>
> if (get_attr_gpr32 (insn) == GPR32_MAP0) ...
>
> instead of:
>
> if (get_attr_gpr32 (insn) == GPR32_0) ...
>
> > But still, we need to adjust memory constraints in the pattern.
>
> I guess the gpr32 property is the same for all alternatives of the
> insn pattern. In this case, the "m", "g" and "a" constraints could remain
> as they are; the final register class would be adjusted (by some target
> hook?) based on the value of the gpr32 attribute.
I'm worried that not all post-reload rtl optimizers will respect
base/index_reg_class with regard to the insn they belong to.
If they just check whether an address is legitimate (the current
legitimate_address hook doesn't have a corresponding insn to pass down),
m/g/a will still generate invalid instructions.
So defensive programming argues for explicitly modifying the constraints.
>
> > Ideally, gcc would include encoding information for every instruction
> > (i.e. map0/map1), so that we could determine the attribute value of
> > gpr32 directly from this information.
>
> I think the right tool for this is the attribute infrastructure of insn
> patterns. We can set the default, set a precise value for individual
> insns, or calculate the attribute from some other attribute in a quite
> flexible way.
> Other than that, adjusting the BASE/INDEX register class in the RA pass
> is an infrastructure change, but perhaps similar to the one you proposed.
Yes.
>
> Uros.



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-09-01  9:07     ` Hongyu Wang
@ 2023-09-06 19:43       ` Vladimir Makarov
  2023-09-07  6:23         ` Uros Bizjak
  0 siblings, 1 reply; 49+ messages in thread
From: Vladimir Makarov @ 2023-09-06 19:43 UTC (permalink / raw)
  To: Hongyu Wang, Uros Bizjak
  Cc: Hongyu Wang, jakub, gcc-patches, hongtao.liu, hubicka


On 9/1/23 05:07, Hongyu Wang wrote:
> Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> 于2023年8月31日周四 18:16写道:
>> On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>>> From: Kong Lingling <lingling.kong@intel.com>
>>>
>>> Current reload infrastructure does not support selective base_reg_class
>>> for backend insn. Add insn argument to base_reg_class for
>>> lra/reload usage.
>> I don't think this is the correct approach. Ideally, a memory
>> constraint should somehow encode its BASE/INDEX register class.
>> Instead of passing "insn", simply a different constraint could be used
>> in the constraint string of the relevant insn.
> We tried a constraint-only approach at the beginning, but then we found
> that the reload infrastructure does not work like that.
>
> The BASE/INDEX reg classes are determined before choosing alternatives, in
> process_address under curr_insn_transform. Process_address creates the mem
> operand according to the BASE/INDEX reg class. Then, the memory operand
> constraint check will evaluate the mem op with targetm.legitimate_address_p.
>
> If we want to make use of EGPR in base/index we need to either extend BASE/INDEX
> reg class in the backend, or, for specific insns, add a target hook to
> tell reload
> that the extended reg class with EGPR can be used to construct memory operand.
>
> CC'd Vladimir as git send-mail failed to add recipient.
>
>
I think the approach proposed by the Intel developers is better.  In some
way we already use such an approach when we pass the memory mode to get
the base reg class, although we could use different memory constraints
for different modes when the possible base regs differ for some memory
modes.

Using special memory constraints could probably be implemented too (I
understand the attractiveness of such an approach for readability of the
machine description), but in my opinion it would require much bigger
changes in IRA/LRA/reload.  It would also significantly slow down RA, as
we would need to process insn constraints for each memory operand in
many places (e.g. for the calculation of reg classes and costs in IRA).
Still, I think there would be a few cases where this approach results in
a bigger probability of assigning a hard reg outside the specific base
reg class, and this would result in additional reloads.

So the approach proposed by Intel is ok for me.  Although, if the x86
maintainers are strongly against this approach and the changes in the
x86 machine-dependent code, and the Intel developers implement Uros'
approach, I am ready to review that as well.  But I still prefer the
current Intel developers' approach, for the reasons I mentioned above.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-09-06 19:43       ` Vladimir Makarov
@ 2023-09-07  6:23         ` Uros Bizjak
  2023-09-07 12:13           ` Vladimir Makarov
  0 siblings, 1 reply; 49+ messages in thread
From: Uros Bizjak @ 2023-09-07  6:23 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Hongyu Wang, Hongyu Wang, jakub, gcc-patches, hongtao.liu, hubicka

On Wed, Sep 6, 2023 at 9:43 PM Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>
> On 9/1/23 05:07, Hongyu Wang wrote:
> > Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> 于2023年8月31日周四 18:16写道:
> >> On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
> >>> From: Kong Lingling <lingling.kong@intel.com>
> >>>
> >>> Current reload infrastructure does not support selective base_reg_class
> >>> for backend insn. Add insn argument to base_reg_class for
> >>> lra/reload usage.
> >> I don't think this is the correct approach. Ideally, a memory
> >> constraint should somehow encode its BASE/INDEX register class.
> >> Instead of passing "insn", simply a different constraint could be used
> >> in the constraint string of the relevant insn.
> > We tried a constraint-only approach at the beginning, but then we found
> > that the reload infrastructure does not work like that.
> >
> > The BASE/INDEX reg classes are determined before choosing alternatives, in
> > process_address under curr_insn_transform. Process_address creates the mem
> > operand according to the BASE/INDEX reg class. Then, the memory operand
> > constraint check will evaluate the mem op with targetm.legitimate_address_p.
> >
> > If we want to make use of EGPR in base/index we need to either extend BASE/INDEX
> > reg class in the backend, or, for specific insns, add a target hook to
> > tell reload
> > that the extended reg class with EGPR can be used to construct memory operand.
> >
> > CC'd Vladimir as git send-mail failed to add recipient.
> >
> >
> I think the approach proposed by the Intel developers is better.  In some
> way we already use such an approach when we pass the memory mode to get
> the base reg class, although we could use different memory constraints
> for different modes when the possible base regs differ for some memory
> modes.
>
> Using special memory constraints could probably be implemented too (I
> understand the attractiveness of such an approach for readability of the
> machine description), but in my opinion it would require much bigger
> changes in IRA/LRA/reload.  It would also significantly slow down RA, as
> we would need to process insn constraints for each memory operand in
> many places (e.g. for the calculation of reg classes and costs in IRA).
> Still, I think there would be a few cases where this approach results in
> a bigger probability of assigning a hard reg outside the specific base
> reg class, and this would result in additional reloads.
>
> So the approach proposed by Intel is ok for me.  Although, if the x86
> maintainers are strongly against this approach and the changes in the
> x86 machine-dependent code, and the Intel developers implement Uros'
> approach, I am ready to review that as well.  But I still prefer the
> current Intel developers' approach, for the reasons I mentioned above.

My above proposal is more or less a wish from a target maintainer PoV.
Ideally, we would have a bunch of different memory constraints, and a
target hook that returns corresponding BASE/INDEX reg classes.
However, I have no idea about the complexity of the implementation in
the infrastructure part of the compiler.

Uros.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-09-07  6:23         ` Uros Bizjak
@ 2023-09-07 12:13           ` Vladimir Makarov
  0 siblings, 0 replies; 49+ messages in thread
From: Vladimir Makarov @ 2023-09-07 12:13 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Hongyu Wang, Hongyu Wang, jakub, gcc-patches, hongtao.liu, hubicka


On 9/7/23 02:23, Uros Bizjak wrote:
> On Wed, Sep 6, 2023 at 9:43 PM Vladimir Makarov <vmakarov@redhat.com> wrote:
>>
>> On 9/1/23 05:07, Hongyu Wang wrote:
>>>
>> I think the approach proposed by the Intel developers is better.  In some
>> way we already use such an approach when we pass the memory mode to get
>> the base reg class, although we could use different memory constraints
>> for different modes when the possible base regs differ for some memory
>> modes.
>>
>> Using special memory constraints could probably be implemented too (I
>> understand the attractiveness of such an approach for readability of the
>> machine description), but in my opinion it would require much bigger
>> changes in IRA/LRA/reload.  It would also significantly slow down RA, as
>> we would need to process insn constraints for each memory operand in
>> many places (e.g. for the calculation of reg classes and costs in IRA).
>> Still, I think there would be a few cases where this approach results in
>> a bigger probability of assigning a hard reg outside the specific base
>> reg class, and this would result in additional reloads.
>>
>> So the approach proposed by Intel is ok for me.  Although, if the x86
>> maintainers are strongly against this approach and the changes in the
>> x86 machine-dependent code, and the Intel developers implement Uros'
>> approach, I am ready to review that as well.  But I still prefer the
>> current Intel developers' approach, for the reasons I mentioned above.
> My above proposal is more or less a wish from a target maintainer PoV.
> Ideally, we would have a bunch of different memory constraints, and a
> target hook that returns corresponding BASE/INDEX reg classes.
> However, I have no idea about the complexity of the implementation in
> the infrastructure part of the compiler.
>
Basically, it requires introducing new hooks which return the base and
index classes for special memory constraints.  When we process memory in
an insn (in a lot of places in IRA, LRA and reload) we should consider
all possible memory insn constraints, take the intersection of the base
and index reg classes for those constraints, and use them instead of the
default base and index reg classes.

The required functionality is absent in reload too.

I would say that it is a moderate-size project (1-2 months for me).  It
still requires introducing new hooks, and I guess there are a few cases
where we would still assign hard regs outside the desirable base class
for address pseudos, and this would result in the generation of
additional reload insns.  It also means many more changes in the RA
source code and the x86 machine-dependent files.

Probably, with this approach there would also be edge cases where we
would need to solve new PRs because of LRA failures to generate correct
code, but I believe they could be solved.

Therefore I lean toward the current Intel approach, in which, to get the
base reg class, we pass the insn as a parameter in addition to the
memory mode.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-08-31  8:20 ` [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class Hongyu Wang
  2023-08-31 10:15   ` Uros Bizjak
@ 2023-09-08 17:03   ` Vladimir Makarov
  2023-09-10  4:49     ` Hongyu Wang
  1 sibling, 1 reply; 49+ messages in thread
From: Vladimir Makarov @ 2023-09-08 17:03 UTC (permalink / raw)
  To: Hongyu Wang, gcc-patches
  Cc: hongtao.liu, ubizjak, hubicka, jakub, Kong Lingling

[-- Attachment #1: Type: text/plain, Size: 673 bytes --]


On 8/31/23 04:20, Hongyu Wang wrote:
> @@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
>   of an address, @code{ADDRESS} for something that occurs in an
>   @code{address_operand}).  @var{index_code} is the code of the corresponding
>   index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
> +@code{insn} indicates insn specific base register class should be subset
> +of the original base register class.
>   @end defmac

I'd prefer a more general description of the 'insn' argument for the
macros.  Something like this:

@code{insn} can be used to define an insn-specific base register class.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-09-08 17:03   ` Vladimir Makarov
@ 2023-09-10  4:49     ` Hongyu Wang
  2023-09-14 12:09       ` Vladimir Makarov
  0 siblings, 1 reply; 49+ messages in thread
From: Hongyu Wang @ 2023-09-10  4:49 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Hongyu Wang, gcc-patches, jakub, hongtao.liu

Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org> 于2023年9月9日周六 01:04写道:
>
>
> On 8/31/23 04:20, Hongyu Wang wrote:
> > @@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
> >   of an address, @code{ADDRESS} for something that occurs in an
> >   @code{address_operand}).  @var{index_code} is the code of the corresponding
> >   index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
> > +@code{insn} indicates insn specific base register class should be subset
> > +of the original base register class.
> >   @end defmac
>
> I'd prefer a more general description of the 'insn' argument for the
> macros.  Something like this:
>
> @code{insn} can be used to define an insn-specific base register class.
>

Sure, I will adjust this in the V2 patch.
Also, currently we reuse the old macro MODE_CODE_BASE_REG_CLASS; do
you think we need a new macro like INSN_BASE_REG_CLASS, as the other
parameters are actually unused? Then we wouldn't need to change other
targets like avr/gcn.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class
  2023-09-10  4:49     ` Hongyu Wang
@ 2023-09-14 12:09       ` Vladimir Makarov
  0 siblings, 0 replies; 49+ messages in thread
From: Vladimir Makarov @ 2023-09-14 12:09 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: Hongyu Wang, gcc-patches, jakub, hongtao.liu


On 9/10/23 00:49, Hongyu Wang wrote:
> Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org> 于2023年9月9日周六 01:04写道:
>>
>> On 8/31/23 04:20, Hongyu Wang wrote:
>>> @@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression (@code{MEM} for the top level
>>>    of an address, @code{ADDRESS} for something that occurs in an
>>>    @code{address_operand}).  @var{index_code} is the code of the corresponding
>>>    index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
>>> +@code{insn} indicates insn specific base register class should be subset
>>> +of the original base register class.
>>>    @end defmac
>> I'd prefer more general description of 'insn' argument for the macros.
>> Something like that:
>>
>> @code{insn} can be used to define an insn-specific base register class.
>>
> Sure, I will adjust this in the V2 patch.
> Also, currently we reuse the old macro MODE_CODE_BASE_REG_CLASS; do
> you think we need a new macro like INSN_BASE_REG_CLASS, as the other
> parameters are actually unused? Then we wouldn't need to change other
> targets like avr/gcn.
>
I thought about this too.  New macros would definitely be worth adding,
especially since you are already adding INSN_INDEX_REG_CLASS.

The names INSN_BASE_REG_CLASS instead of MODE_CODE_BASE_REG_CLASS and 
REGNO_OK_FOR_INSN_BASE_P instead of REGNO_MODE_CODE_OK_FOR_BASE_P are ok 
for me too.

When you submit the v2 patch, I'll review the RA part as soon as
possible (actually I already looked at it) and will most probably give
my approval for the RA part, because I prefer your current approach for
the RA instead of introducing new memory constraints.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument.
  2023-09-22 10:56 ` [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument Hongyu Wang
@ 2023-09-22 16:03   ` Vladimir Makarov
  0 siblings, 0 replies; 49+ messages in thread
From: Vladimir Makarov @ 2023-09-22 16:03 UTC (permalink / raw)
  To: Hongyu Wang, gcc-patches; +Cc: ubizjak, jakub, Kong Lingling, Hongtao Liu


On 9/22/23 06:56, Hongyu Wang wrote:
> Like base_reg_class, INDEX_REG_CLASS also does not support backend insn.
> Add index_reg_class with insn argument for lra/reload usage.
>
> gcc/ChangeLog:
>
> 	* addresses.h (index_reg_class): New wrapper function like
> 	base_reg_class.
> 	* doc/tm.texi: Document INSN_INDEX_REG_CLASS.
> 	* doc/tm.texi.in: Ditto.
> 	* lra-constraints.cc (index_part_to_reg): Pass index_class.
> 	(process_address_1): Calls index_reg_class with curr_insn and
> 	replace INDEX_REG_CLASS with its return value index_cl.
> 	* reload.cc (find_reloads_address): Likewise.
> 	(find_reloads_address_1): Likewise.
>
The patch is ok for me to commit it to the trunk.  Thank you.

So all changes to the RA have been reviewed.  You just need approval of
the remaining patches from an x86-64 maintainer.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument.
  2023-09-22 10:56 [PATCH v2 00/13] " Hongyu Wang
@ 2023-09-22 10:56 ` Hongyu Wang
  2023-09-22 16:03   ` Vladimir Makarov
  0 siblings, 1 reply; 49+ messages in thread
From: Hongyu Wang @ 2023-09-22 10:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, vmakarov, jakub, Kong Lingling, Hongtao Liu

Like base_reg_class, INDEX_REG_CLASS also does not support backend insn.
Add index_reg_class with insn argument for lra/reload usage.

gcc/ChangeLog:

	* addresses.h (index_reg_class): New wrapper function like
	base_reg_class.
	* doc/tm.texi: Document INSN_INDEX_REG_CLASS.
	* doc/tm.texi.in: Ditto.
	* lra-constraints.cc (index_part_to_reg): Pass index_class.
	(process_address_1): Calls index_reg_class with curr_insn and
	replace INDEX_REG_CLASS with its return value index_cl.
	* reload.cc (find_reloads_address): Likewise.
	(find_reloads_address_1): Likewise.

Co-authored-by: Kong Lingling <lingling.kong@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
 gcc/addresses.h        | 10 ++++++++++
 gcc/doc/tm.texi        |  7 +++++++
 gcc/doc/tm.texi.in     |  7 +++++++
 gcc/lra-constraints.cc | 17 +++++++++--------
 gcc/reload.cc          |  4 ++--
 5 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/gcc/addresses.h b/gcc/addresses.h
index 2c92927bd51..08bf39cd56c 100644
--- a/gcc/addresses.h
+++ b/gcc/addresses.h
@@ -51,6 +51,16 @@ base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
 #endif
 }
 
+inline enum reg_class
+index_reg_class (rtx_insn *insn ATTRIBUTE_UNUSED = NULL)
+{
+#ifdef INSN_INDEX_REG_CLASS
+  return INSN_INDEX_REG_CLASS (insn);
+#else
+  return INDEX_REG_CLASS;
+#endif
+}
+
 /* Wrapper function to unify target macros REGNO_MODE_CODE_OK_FOR_BASE_P,
    REGNO_MODE_OK_FOR_REG_BASE_P, REGNO_MODE_OK_FOR_BASE_P and
    REGNO_OK_FOR_BASE_P.
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 5b1e2a11f89..c566f7a1105 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2582,6 +2582,13 @@ address where its value is either multiplied by a scale factor or
 added to another register (as well as added to a displacement).
 @end defmac
 
+@defmac INSN_INDEX_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+index register for a specified @var{insn} must belong. This macro is
+used when some backend insns may have limited usage of index register
+compared with other insns.
+@end defmac
+
 @defmac REGNO_OK_FOR_BASE_P (@var{num})
 A C expression which is nonzero if register number @var{num} is
 suitable for use as a base register in operand addresses.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f6e63ad8871..3182d0d7c75 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2164,6 +2164,13 @@ address where its value is either multiplied by a scale factor or
 added to another register (as well as added to a displacement).
 @end defmac
 
+@defmac INSN_INDEX_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+index register for a specified @var{insn} must belong. This macro is
+used when some backend insns may have limited usage of index register
+compared with other insns.
+@end defmac
+
 @defmac REGNO_OK_FOR_BASE_P (@var{num})
 A C expression which is nonzero if register number @var{num} is
 suitable for use as a base register in operand addresses.
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 6dc77af86cd..0c8e28e0194 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3399,12 +3399,12 @@ base_plus_disp_to_reg (struct address_info *ad, rtx disp)
 /* Make reload of index part of address AD.  Return the new
    pseudo.  */
 static rtx
-index_part_to_reg (struct address_info *ad)
+index_part_to_reg (struct address_info *ad, enum reg_class index_class)
 {
   rtx new_reg;
 
   new_reg = lra_create_new_reg (GET_MODE (*ad->index), NULL_RTX,
-				INDEX_REG_CLASS, NULL, "index term");
+				index_class, NULL, "index term");
   expand_mult (GET_MODE (*ad->index), *ad->index_term,
 	       GEN_INT (get_index_scale (ad)), new_reg, 1);
   return new_reg;
@@ -3659,13 +3659,14 @@ process_address_1 (int nop, bool check_only_p,
   /* If INDEX_REG_CLASS is assigned to base_term already and isn't to
      index_term, swap them so to avoid assigning INDEX_REG_CLASS to both
      when INDEX_REG_CLASS is a single register class.  */
+  enum reg_class index_cl = index_reg_class (curr_insn);
   if (ad.base_term != NULL
       && ad.index_term != NULL
-      && ira_class_hard_regs_num[INDEX_REG_CLASS] == 1
+      && ira_class_hard_regs_num[index_cl] == 1
       && REG_P (*ad.base_term)
       && REG_P (*ad.index_term)
-      && in_class_p (*ad.base_term, INDEX_REG_CLASS, NULL)
-      && ! in_class_p (*ad.index_term, INDEX_REG_CLASS, NULL))
+      && in_class_p (*ad.base_term, index_cl, NULL)
+      && ! in_class_p (*ad.index_term, index_cl, NULL))
     {
       std::swap (ad.base, ad.index);
       std::swap (ad.base_term, ad.index_term);
@@ -3689,7 +3690,7 @@ process_address_1 (int nop, bool check_only_p,
     }
   if (ad.index_term != NULL
       && process_addr_reg (ad.index_term, check_only_p,
-			   before, NULL, INDEX_REG_CLASS))
+			   before, NULL, index_cl))
     change_p = true;
 
   /* Target hooks sometimes don't treat extra-constraint addresses as
@@ -3798,7 +3799,7 @@ process_address_1 (int nop, bool check_only_p,
 					      GET_CODE (*ad.index),
 					      curr_insn);
 
-	  lra_assert (INDEX_REG_CLASS != NO_REGS);
+	  lra_assert (index_cl != NO_REGS);
 	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "disp");
 	  lra_emit_move (new_reg, *ad.disp);
 	  *ad.inner = simplify_gen_binary (PLUS, GET_MODE (new_reg),
@@ -3894,7 +3895,7 @@ process_address_1 (int nop, bool check_only_p,
       changed pseudo on the equivalent memory and a subreg of the
       pseudo onto the memory of different mode for which the scale is
       prohibitted.  */
-      new_reg = index_part_to_reg (&ad);
+      new_reg = index_part_to_reg (&ad, index_cl);
       *ad.inner = simplify_gen_binary (PLUS, GET_MODE (new_reg),
 				       *ad.base_term, new_reg);
     }
diff --git a/gcc/reload.cc b/gcc/reload.cc
index 72f7e27af15..66b484b12fa 100644
--- a/gcc/reload.cc
+++ b/gcc/reload.cc
@@ -5114,7 +5114,7 @@ find_reloads_address (machine_mode mode, rtx *memrefloc, rtx ad,
 	  /* Reload the displacement into an index reg.
 	     We assume the frame pointer or arg pointer is a base reg.  */
 	  find_reloads_address_part (XEXP (ad, 1), &XEXP (ad, 1),
-				     INDEX_REG_CLASS, GET_MODE (ad), opnum,
+				     index_reg_class (insn), GET_MODE (ad), opnum,
 				     type, ind_levels);
 	  return 0;
 	}
@@ -5514,7 +5514,7 @@ find_reloads_address_1 (machine_mode mode, addr_space_t as,
   bool reloaded_inner_of_autoinc = false;
 
   if (context == 1)
-    context_reg_class = INDEX_REG_CLASS;
+    context_reg_class = index_reg_class (insn);
   else
     context_reg_class = base_reg_class (mode, as, outer_code, index_code,
 					insn);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2023-09-22 16:03 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-31  8:20 [PATCH 00/13] [RFC] Support Intel APX EGPR Hongyu Wang
2023-08-31  8:20 ` [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class Hongyu Wang
2023-08-31 10:15   ` Uros Bizjak
2023-09-01  9:07     ` Hongyu Wang
2023-09-06 19:43       ` Vladimir Makarov
2023-09-07  6:23         ` Uros Bizjak
2023-09-07 12:13           ` Vladimir Makarov
2023-09-08 17:03   ` Vladimir Makarov
2023-09-10  4:49     ` Hongyu Wang
2023-09-14 12:09       ` Vladimir Makarov
2023-08-31  8:20 ` [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument Hongyu Wang
2023-08-31  8:20 ` [PATCH 03/13] [APX_EGPR] Initial support for APX_F Hongyu Wang
2023-08-31  8:20 ` [PATCH 04/13] [APX EGPR] Add 16 new integer general purpose registers Hongyu Wang
2023-08-31  8:20 ` [PATCH 05/13] [APX EGPR] Add register and memory constraints that disallow EGPR Hongyu Wang
2023-08-31  8:20 ` [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint Hongyu Wang
2023-08-31  9:17   ` Jakub Jelinek
2023-08-31 10:00     ` Uros Bizjak
2023-09-01  9:04       ` Hongyu Wang
2023-09-01  9:38         ` Uros Bizjak
2023-09-01 10:35           ` Hongtao Liu
2023-09-01 11:27             ` Uros Bizjak
2023-09-04  0:28               ` Hongtao Liu
2023-09-04  8:57                 ` Uros Bizjak
2023-09-04  9:10                   ` Hongtao Liu
2023-09-01 11:03       ` Richard Sandiford
2023-09-04  1:03         ` Hongtao Liu
2023-09-01  9:04     ` Hongyu Wang
2023-08-31  8:20 ` [PATCH 07/13] [APX EGPR] Add backend hook for base_reg_class/index_reg_class Hongyu Wang
2023-08-31  8:20 ` [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns Hongyu Wang
2023-08-31  9:43   ` Jakub Jelinek
2023-09-01  9:07     ` Hongyu Wang
2023-09-01  9:20       ` Jakub Jelinek
2023-09-01 11:34         ` Hongyu Wang
2023-09-01 11:41           ` Jakub Jelinek
2023-08-31  8:20 ` [PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5) Hongyu Wang
2023-08-31 10:06   ` Uros Bizjak
2023-08-31  8:20 ` [PATCH 10/13] [APX EGPR] Handle legacy insns that only support GPR16 (2/5) Hongyu Wang
2023-08-31  8:20 ` [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5) Hongyu Wang
2023-08-31  9:26   ` Richard Biener
2023-08-31  9:28     ` Richard Biener
2023-09-01  9:03       ` Hongyu Wang
2023-09-01 10:38       ` Hongtao Liu
2023-08-31  9:31     ` Jakub Jelinek
2023-08-31  8:20 ` [PATCH 12/13] [APX_EGPR] Handle legacy insns that only support GPR16 (4/5) Hongyu Wang
2023-08-31  8:20 ` [PATCH 13/13] [APX EGPR] Handle vex insns that only support GPR16 (5/5) Hongyu Wang
2023-08-31  9:19 ` [PATCH 00/13] [RFC] Support Intel APX EGPR Richard Biener
2023-09-01  8:55   ` Hongyu Wang
2023-09-22 10:56 [PATCH v2 00/13] " Hongyu Wang
2023-09-22 10:56 ` [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument Hongyu Wang
2023-09-22 16:03   ` Vladimir Makarov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).